Spe Chen
We all know apartment hunting is hard in New York City. This is especially true for international students like me. I came to New York this May and needed to find a place to settle down within a week, before my course started. (Those were the most miserable days. The only thing I knew about New York was that it was the home of the best bagels on earth.) Even though I googled as much as I could, saw apartments in person and met landlords, it was never possible to get the whole picture of a neighborhood before I lived there.
Landlords and agents surely want to rent out their apartments as soon as possible, so it is unlikely they will tell you the dark side of the area. So should people accept this information inequality as it is? Is there any way to know the drawbacks of a neighborhood before you sign the contract and pay the first month's rent?
First, let’s change the mindset of typical apartment hunting. Most of the time we want to find a place with some nice features, such as convenience, safety and large windows, maybe because that is how agents normally promote their listings. However, as smart apartment hunters, we cannot be confined by this thinking. So in this project I propose a new way of apartment hunting: not finding the best place, but finding the least bad neighborhood, that is, the area with the fewest 311 complaints of the types you care about.
Here is the map of sampled complaints from NYC’s 311 service portal since 2010. I grouped the 200+ complaint types into four categories, which correspond to four different kinds of people.
- Noise (red dots) – Quiet people
- Sanitation (blue dots) – Clean people
- Traffic (yellow dots) – Car owners
- Utilities (green dots) – Homebodies
The challenge of this project was not the mapping, because I had CartoDB, a tool for one-click happy mapping! (Reminder: If you are a student, don’t forget to get an educational account.) Instead, data processing was the most difficult part for a programming beginner like me. I downloaded the huge CSV (6 GB) from NYC’s OpenData portal and used the command line to get a quick peek at what I had. Next, I used IPython Notebook and the pandas library to clean, filter, cluster and classify the types of complaints. The process is very much like tree pruning, which I had never done in my life. I cut off the unwanted branches and transformed an ugly tree into a nicer, desired shape. Here is an illustration of my analogy for data processing:
After that tedious work (and clueless hours of staring at massive and messy data, for sure), I had a data set with columns containing geo information, and it was ready for mapping. Because of the storage limit of my CartoDB account, I randomly selected 10k rows and imported them into CartoDB. Bang! Here comes the map!
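The cleaning, classifying and sampling steps described above could look roughly like this in pandas. This is only a minimal sketch, not the exact notebook code: the column names (`Complaint Type`, `Latitude`, `Longitude`) are assumed from the 311 export, and the keyword lists, the helper names (`classify`, `load_classified`, `sample_rows`) and the chunk size are hypothetical choices made for illustration.

```python
import pandas as pd

# Hypothetical keyword lists: the real grouping of the 200+ complaint
# types into four categories is not spelled out in this post.
CATEGORY_KEYWORDS = {
    "Noise": ["noise"],
    "Sanitation": ["sanitation", "dirty", "rodent"],
    "Traffic": ["traffic", "parking", "street condition"],
    "Utilities": ["heat", "water", "electric", "plumbing"],
}


def classify(complaint_type):
    """Return the first category whose keyword appears in the text, else None."""
    text = str(complaint_type).lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return None


def load_classified(csv_source, chunksize=100_000):
    """Read a large CSV in chunks, keeping geocoded rows in a known category."""
    columns = ["Complaint Type", "Latitude", "Longitude"]  # assumed column names
    kept = []
    for chunk in pd.read_csv(csv_source, usecols=columns, chunksize=chunksize):
        # Prune rows that cannot be placed on a map.
        chunk = chunk.dropna(subset=["Latitude", "Longitude"])
        # Label each row, then prune rows outside the four categories.
        chunk = chunk.assign(category=chunk["Complaint Type"].map(classify))
        kept.append(chunk.dropna(subset=["category"]))
    return pd.concat(kept, ignore_index=True)


def sample_rows(df, n=10_000, seed=0):
    """Randomly pick at most n rows so the result fits a storage limit."""
    return df.sample(n=min(n, len(df)), random_state=seed)
```

Reading in chunks keeps memory use manageable for a multi-gigabyte file, and something like `sample_rows(load_classified("311.csv")).to_csv("sample.csv", index=False)` (file names are placeholders) would produce a file small enough to upload.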
Final thoughts
- Don’t judge a book by its cover. Clean-looking data might contain lots of useless stuff.
- Try Test-Break-Fix-Apply programming when wrangling large data sets.
- Other than Google and StackOverflow, CartoDB is my new friend now.
If you’re curious about learning more about the intersection of data, coding and visualization, check out the Lede Program, an intensive certification program at Columbia’s School of Journalism, in conjunction with the Department of Computer Science. Find out more on our main page. Applications open soon!