Tools for Data and Analysis

While it’d be nice to have your data stories dropped in your lap, ready to go, that rarely happens. Typically you’re up to your elbows in data, coding (or cleaning) up a storm.

pandas

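pandas is a high-performance data analysis library for Python. Its core structure, the DataFrame, is a labeled table that you can filter, group, reshape, and summarize in a few lines of code.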
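Here is a minimal sketch, assuming pandas is installed. It builds a small DataFrame from made-up contribution data (the donor names, states, and amounts are all hypothetical) and totals it by state with `groupby`:

```python
import pandas as pd

# Hypothetical example data: individual contributions by donor and state
contributions = pd.DataFrame(
    {
        "donor": ["Ada", "Ben", "Cy", "Dee", "Eve"],
        "state": ["NY", "CA", "NY", "TX", "CA"],
        "amount": [250.0, 100.0, 500.0, 75.0, 300.0],
    }
)

# Total and average contribution per state, largest total first
by_state = (
    contributions.groupby("state")["amount"]
    .agg(["sum", "mean"])
    .sort_values("sum", ascending=False)
)
print(by_state)
```

The same group-then-aggregate pattern scales from five rows to millions, which is where pandas’ performance matters.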

IPython Notebook

IPython Notebook (now developed as part of Project Jupyter) is an interactive programming environment that encourages documentation, transparency, and reproducibility: your code, its output, and your notes live together in a single document. When you’re done with your analysis, you can publish the notebook for everyone to see (and check).

Natural Language Toolkit (NLTK)

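The Natural Language Toolkit (NLTK) is a Python library for working with human-language text: splitting it into words and sentences, counting terms, tagging parts of speech, and more. Whether you’re analyzing Congressional bills, Twitter outrages, or Shakespearean plays, NLTK has you covered.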
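Here is a minimal sketch, assuming NLTK is installed. It splits a short made-up passage into words and counts them. It uses `RegexpTokenizer` rather than `word_tokenize` because the latter needs extra tokenizer data downloaded first. The sample text is hypothetical.

```python
from nltk import FreqDist
from nltk.tokenize import RegexpTokenizer

# Hypothetical sample text, e.g. a snippet from a bill's history
text = "The bill was read twice. The bill was referred to committee."

# Treat runs of word characters as tokens; punctuation is dropped
tokenizer = RegexpTokenizer(r"\w+")
tokens = [token.lower() for token in tokenizer.tokenize(text)]

# FreqDist counts how often each token occurs
freq = FreqDist(tokens)
print(freq.most_common(3))
```

Swap in a full bill or play and the same few lines produce a word-frequency table for the whole text.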

OpenRefine

OpenRefine (formerly Google Refine) is free, open-source software that you download and run on your own computer. It helps you sort and sift through dirty data, cleaning it up to the point where you can start your actual analysis.
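OpenRefine is mostly point-and-click, but the idea behind one of its most useful cleaning features, clustering near-duplicate values, is easy to illustrate. The Python sketch below roughly approximates the idea behind its “fingerprint” key-collision method: normalize each value, sort its words, and group values whose keys match. The function names `fingerprint` and `cluster` and the sample values are hypothetical, and OpenRefine’s own implementation differs in its details.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Hypothetical approximation of a fingerprint key for one cell value."""
    # Trim, lowercase, and strip accents (e.g. "é" -> "e")
    value = unicodedata.normalize("NFKD", value.strip().lower())
    value = "".join(ch for ch in value if not unicodedata.combining(ch))
    # Drop punctuation, then sort the unique words and rejoin them
    value = re.sub(r"[^\w\s]", "", value)
    return " ".join(sorted(set(value.split())))


def cluster(values):
    """Group values that share a fingerprint; keep only groups of 2+."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


names = [
    "New York Times",
    "new york times",
    "Times, New York",
    "The Guardian",
    "Guardian, The",
    "Reuters",
]
print(cluster(names))
```

In OpenRefine you would review each proposed cluster and pick one canonical spelling. The sketch only finds the candidates.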

scikit-learn

scikit-learn is a Python package for machine learning and data analysis. It’s the Swiss Army knife of data science: it covers classification, regression, clustering, dimensionality reduction, and so much more.
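Here is a minimal sketch, assuming scikit-learn and NumPy are installed and using made-up toy data. A regression model and a clustering model are both trained through the same `fit` method, and that shared interface is a large part of what makes the library feel like one toolkit.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# Regression: toy points lying exactly on y = 2x + 1
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = 2 * X.ravel() + 1
model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)
print(model.predict(np.array([[10.0]])))

# Clustering: two well-separated toy groups of points
points = np.array(
    [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float
)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
labels = kmeans.labels_
print(labels)
```

Swapping one model for another (say, a decision tree for the linear regression) usually means changing only the constructor line.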