Tools for Data and Analysis

While it’d be nice if your data stories landed in your lap, ready to go, that rarely happens. Typically you’re up to your elbows in data, coding (or cleaning) up a storm.

pandas

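pandas is a high-performance data analysis library for Python. It’s built around the DataFrame, a labeled table you can load, clean, filter, and aggregate in a few lines.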
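To give a feel for what that looks like in practice, here is a minimal sketch. The table of bills and its column names (`sponsor_party`, `cosponsors`) are made up for illustration; only the pandas calls themselves are real.

```python
import pandas as pd

# Hypothetical sample data: one row per bill, with one missing value.
bills = pd.DataFrame({
    "sponsor_party": ["D", "R", "D", "R", "D"],
    "cosponsors": [12, 3, None, 8, 20],
})

# Cleaning step: drop rows where the cosponsor count is missing.
clean = bills.dropna(subset=["cosponsors"])

# Analysis step: average number of cosponsors per party.
avg_by_party = clean.groupby("sponsor_party")["cosponsors"].mean()
print(avg_by_party)
```

Dropping incomplete rows is only one way to handle missing values; depending on the story, filling them in or flagging them may be the better call.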

IPython Notebook

IPython Notebook (now developed as Jupyter Notebook) is an interactive programming environment that encourages documentation, transparency, and reproducibility of work. When you’re done with your analysis, you can publish the notebook for everyone to see (and check).

NLTK

The Natural Language Toolkit (NLTK) is a Python library built to process large amounts of text. Whether you’re analyzing Congressional bills, Twitter outrages, or Shakespearean plays, NLTK has you covered.
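As a small taste, the sketch below counts word frequencies in a line of Shakespeare. It assumes NLTK is installed and sticks to `wordpunct_tokenize` and `FreqDist`, which shouldn't need any extra corpus downloads; the sample sentence is just a stand-in for your own text.

```python
from nltk import FreqDist
from nltk.tokenize import wordpunct_tokenize

# Stand-in text; in practice this would be a bill, a tweet archive, or a play.
text = "To be, or not to be, that is the question."

# Split into tokens, keep only alphabetic ones, and lowercase them.
tokens = [t.lower() for t in wordpunct_tokenize(text) if t.isalpha()]

# Count how often each word appears.
freq = FreqDist(tokens)
print(freq.most_common(3))
```

For real analysis you would likely also remove stop words such as "the" and "is", which NLTK supports through a separately downloaded word list.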

OpenRefine

OpenRefine (previously Google Refine) is downloadable software that helps you sort and sift dirty data, cleaning it to the point where you can start your actual analysis.

scikit-learn

scikit-learn is a Python package for machine learning and data analysis. It’s the Swiss Army knife of data science: it covers classification, regression, clustering, dimensionality reduction, model selection, and preprocessing.
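Here is a rough sketch of the classification side, using the iris dataset that ships with scikit-learn. The choice of logistic regression, the 25% test split, and the random seed are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small built-in dataset: flower measurements (X) and species labels (y).
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows so the model is scored on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a simple classifier and measure accuracy on the held-out rows.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.2f}")
```

Most scikit-learn models share this same `fit` and `score` pattern, so swapping in a different classifier usually means changing a single line.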