About This Module

This module provides foundational knowledge of computer programming concepts and software engineering practices. It introduces students to the two principal data science programming languages, Python and R, and to the associated workflows, with a focus on social science data and research questions. The module covers basic and intermediate programming concepts, such as object types, functions, control flow, testing and debugging. Particular emphasis is placed on data handling and analytical tasks that arise in social science problems. Homework will include hands-on coding exercises. In addition, students will apply their programming knowledge to a research project at the end of the module.

Instructors

Module Meetings

  • 11 two-hour lectures
    • Monday at 11:00 in Lloyd Institute LB04
  • 11 one-hour tutorials
    • Group 1: Wednesday 14:00 in Lloyd Institute LB04
    • Group 2: Thursday 10:00 in Lloyd Institute LB04
  • No lecture/tutorial in Week 7

Week | Language  | Topic
1    | -         | What is computation?
2    | Python    | Python Basics
3    | Python    | Control Flow in Python
4    | Python    | Functions in Python
5    | Python    | Debugging and Testing in Python
6    | Python    | Data Wrangling in Python
7    | -         | -
8    | R         | Fundamentals of R Programming I
9    | R         | Fundamentals of R Programming II
10   | R         | Data Wrangling in R
11   | Python, R | Performance and Complexity
12   | Python, R | Web scraping

Prerequisites

This is an introductory class and no prior experience with programming is required.

Hardware and Software

  • Laptop running Windows, macOS or Linux (no Chromebooks)
  • Software:
    • Python (version 3+) - versatile programming language
    • R (version 4+) - statistical programming language
    • Jupyter - web-based interactive computational environment
    • RStudio - integrated development environment
    • Git - version control system
    • GitHub - Git-based online platform for code hosting

Materials

The following texts provide a good introduction to Python and R programming with a focus on data analysis applications:

  • Guttag, John. 2021. Introduction to Computation and Programming Using Python: With Application to Computational Modeling and Understanding Data. 3rd ed. Cambridge, MA: The MIT Press.

  • Matloff, Norman. 2011. The Art of R Programming: A Tour of Statistical Software Design. San Francisco, CA: No Starch Press.

  • McKinney, Wes. 2017. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. 2nd ed. Sebastopol, CA: O’Reilly Media.

  • Sweigart, Al. 2019. Automate the Boring Stuff with Python. 2nd ed. San Francisco, CA: No Starch Press.

  • Wickham, Hadley, and Garrett Grolemund. 2017. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. Sebastopol, CA: O’Reilly Media.

Additional online resources:

Assessment

  • 5 problem sets (50%)
    • Programming assignments set roughly every two weeks
    • Due on Blackboard at 11:00 on Monday of Weeks 3, 5, 7, 10 and 12
  • Research project (50%)
    • Final Python/R project demonstrating familiarity with programming concepts and ability to communicate results
    • Due at 11:00 on Monday, 20 December 2021

Assessment criteria

  1. ✔️ Code exists
  2. ⌚ Code runs and does what it is supposed to do
  3. 📜 Code is legible (meaningful naming, comments)
  4. ⚙️ Code is modular (no redundancies, use of abstractions)
  5. 🏎️ Code is optimized (no needless loops, runs fast)
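
As a rough illustration of criteria 3 to 5, the short Python sketch below computes answer shares for a list of survey responses. The function names and the survey answers are invented for this example and are not part of the module materials; it shows one possible style, not a required one.

```python
from collections import Counter


def count_responses(responses):
    """Return how many times each answer occurs in a list of responses."""
    # Counter tallies everything in a single pass, instead of calling
    # responses.count(answer) once per distinct answer (a needless loop).
    return Counter(responses)


def response_shares(responses):
    """Return each answer's share of all responses (fraction from 0 to 1)."""
    counts = count_responses(responses)  # reuse the helper, no duplicated logic
    total = sum(counts.values())
    if total == 0:
        return {}  # no responses, so there are no shares to report
    return {answer: n / total for answer, n in counts.items()}


# Hypothetical survey answers, invented for illustration
answers = ["agree", "disagree", "agree", "neutral", "agree"]
shares = response_shares(answers)
print(shares["agree"])  # 3 of 5 answers, prints 0.6
```

Descriptive names and docstrings address legibility, the reused helper addresses modularity, and the single-pass count addresses optimization.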

Marks at Trinity: https://www.tcd.ie/academicregistry/exams/student-guide/

Plagiarism

  • Plagiarising computer code is as serious as plagiarising text (see Google LLC v. Oracle America, Inc.)
  • All submitted programming assignments and the final project must be completed individually
  • You may discuss general approaches to solutions with your peers
  • But do not share or view each other's code
  • You may use online resources, but credit them in your code comments
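
One possible way to give credit in comments is sketched below in Python. The comment layout is only a suggestion, the angle-bracket fields are placeholders to fill in with the page actually consulted, and the `flatten` function is a hypothetical example.

```python
# Source: adapted from <URL of the page you consulted> (accessed <date>)
# Changes: renamed variables and added a docstring.
def flatten(nested):
    """Flatten a list of lists into a single list."""
    return [item for sublist in nested for item in sublist]


print(flatten([[1, 2], [3]]))  # prints [1, 2, 3]
```

Placing the credit directly above the borrowed code makes it clear which lines it covers.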