The Lede Program is an intensive 10-week course of study in data and computation. During the program, students complete coursework in the following four segments:

Foundations of Computing

During this introduction to the ins and outs of the Python programming language, students build a foundation upon which the later, more coding-intensive sections depend. Students clean, parse and process dirty, real-world data sets while recreating modern journalistic projects. The course also touches upon basic visualization and mapping, and on how to use public resources such as Google and StackOverflow to build self-reliance.

Focus: Familiarize yourself with the data-driven landscape
Topics & tools include: Python, basic statistical analysis, OpenRefine, Carto, pandas, HTML, CSVs, APIs, csvkit, git/GitHub, cron, StackOverflow, data cleaning, command line tools, and more

Data and Databases

Students will become familiar with a variety of data formats and methods for storing, accessing and processing information. Topics covered include comma-separated documents, interaction with website APIs and JSON, raw-text document dumps, regular expressions, text mining, SQL databases, and more. Students will also tackle less accessible data by building web scrapers and converting difficult-to-use PDFs into usable information.

Focus: Finding and working with data
Topics & tools include: SQL, APIs, CSVs, regular expressions, text mining, PDF processing, pandas, Python, HTML, BeautifulSoup, Jupyter/IPython Notebooks, and more

Algorithms

Machine learning and data science are integral to processing and understanding large data sets. Whether you’re clustering schools or crime data, analyzing relationships between people or businesses, or searching for a needle in a haystack of documents, algorithms can help. Through supervised and unsupervised learning, students will generate leads, create insights, and figure out how best to focus their efforts with large data sets. Students will also develop a critical eye toward applications of algorithms, learning to spot the pitfalls and biases in their own and others’ work.

Focus: Analyzing your data
Topics & tools include: linear regression, clustering, text mining, natural language processing, decision trees, machine learning, scikit-learn, Python, and more

Data Analysis Studio

In this project-driven course, students refine their creative workflow on personal work, from obtaining and cleaning data to final presentation. Data is explored not only as the basis for visualization, but also as a lead-generating foundation, requiring further investigative or research-oriented work. Regular critiques from instructors and visiting professionals are a critical piece of the course.

Focus: Applying your skillset
Topics & tools include: pandas, matplotlib, Adobe Illustrator, web scraping, mapping, Carto, GIS/QGIS, data cleaning, documentation, and more