Databases and Data Science Practicum

Department of Applied Statistics, Social Science, and Humanities
APSTA-GE 2017

Spring 2026
2 units

Course description

This course provides a hands-on introduction to extracting, transforming, and visualizing data using real-world datasets. Students learn to query databases and join datasets using SQL, and to summarize, visualize, and map data in R using the tidyverse. Students also gain experience with git, enhancing their ability to work in modern collaborative environments. Alongside these modules, the course places an ongoing emphasis on practicing how to be a curious, skeptical, and articulate data scientist.

Student learning outcomes

Upon completion of the course, students will be able to:

  1. Perform file management functions on the command line
  2. Use git to pull code from and push code to GitHub
  3. Conduct SQL queries on a database server, including joins
  4. Transform and summarize data in the tidyverse in R
  5. Use ggplot to make data visualizations

Prerequisites

APSTA-GE 2352 (Practicum in Applied Statistics: Statistical Computing) or equivalent.

Course schedule

Introduction

  • Week 1:
    • Tue., Jan. 20: Course syllabus + motivation
    • Fri., Jan. 23: Assignment 0 due at 11:59pm ET

Command line interfaces

  • Week 2:
    • Tue., Jan. 27: Introduction + syntax
  • Week 3:
    • Tue., Feb. 3: Git and GitHub
    • Fri., Feb. 6: Assignment 1 due at 11:59pm ET

Structured Query Language (SQL)

  • Week 4:
    • Tue., Feb. 10: CLI wrap-up / SQL introduction
    • (Optional) NYU Libraries’ “Love Data Week 2026” events, including:
      • Mon., Feb. 9, 1–3pm: NYC Open Data, Scavenger Hunt, and Intro to Python and APIs
      • Wed., Feb. 11, 12–1:30pm: Careers in Data Alumni Panel
  • Week off for Presidents’ Day
  • Week 5:
    • Tue., Feb. 24: Syntax + summaries
  • Week 6:
    • Tue., Mar. 3: Joins
    • Fri., Mar. 6: Assignment 2 due at 11:59pm ET

Tidyverse

  • Week 7:
    • Tue., Mar. 10: Introduction + syntax
  • Week off for Spring Break
  • Week 8:
    • Tue., Mar. 24: Messy data
  • Week 9:
    • Tue., Mar. 31: Pivoting + summarization
  • Week 10:
    • Tue., Apr. 7: Functions + programming
    • Fri., Apr. 10: Assignment 3 due at 11:59pm ET

Visualization

  • Week 11:
    • Tue., Apr. 14: Introduction + grammar of graphics
  • Week 12:
    • Tue., Apr. 21: Best practices with visualization
  • Week 13:
    • Tue., Apr. 28: Geospatial mapping
    • Wed., Apr. 29: Assignment 4 due at 11:59pm ET

Final project

  • Week 14:
    • Tue., May 5: No class: Office hours for final projects
  • Week 15:
    • Sun., May 10: Final project due at 11:59pm ET