1) Cab refusals, and cab complaints in general, declined from 2010 to 2013 (and appear to still be trending that way)

This could reflect several transit changes at play. Usually weather and time of day are the biggest factors in cab refusals (try hailing a cab when it starts to rain). But since the data was collected, New York City has rolled out CitiBike, a bike-sharing program, and the green boro cabs, which serve areas above the Manhattan central business district and the outer boros (although recent data suggest they’re heavily serving gentrifying areas such as Astoria, Harlem, Park Slope and Williamsburg). Ride services such as Lyft and Uber have become popular and have been a point of consternation for the taxi industry. The decline in complaints may reflect a declining reliance on the taxi infrastructure.

2) Cab refusals, like cab rides in general, are heavily concentrated in Manhattan

This is to be expected: the sheer number of cab rides in Manhattan all but assures a correspondingly high number of complaints.

3) Complaints peak around 4pm on weekdays and weekends, during shift change, and again on weekend evenings as cabs become scarcer

The number of complaints relative to rides is extraordinarily small. Any given day can have tens of thousands of cab rides in the city. With this data set, we can see there were 72,506 complaints total from 2010 to 2014. Of those, the largest category was refused rides, totaling 16,136 for the 4-year period. In the 2013 data set, refusals make up only 3,393 complaints for that year.
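To put those counts in proportion, here is a quick back-of-the-envelope calculation in Python. It only uses the three figures quoted above; the variable names are mine, and no per-ride rate is computed because the text gives no exact ride count.

```python
# Figures quoted in the text above.
total_complaints = 72_506   # all complaints, 2010 to 2014
refusals_total = 16_136     # refused-ride complaints over the same period
refusals_2013 = 3_393       # refused-ride complaints in the 2013 data set

# Share of all complaints that were about refused rides.
refusal_share = refusals_total / total_complaints

# Share of all refusal complaints that fall in 2013.
share_2013 = refusals_2013 / refusals_total

print(f"Refusals as a share of all complaints: {refusal_share:.1%}")  # 22.3%
print(f"2013 as a share of all refusals: {share_2013:.1%}")           # 21.0%
```

So refusals are a bit more than a fifth of everything people complained about, and 2013 accounts for about a fifth of those refusals.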

4) Brooklynites like to complain. A LOT.

It’s important to remember that the people who use cabs are usually, though not exclusively, those with some amount of disposable income. Of the 2013 complaints, which were classified by intended destination (if reasonably discernible!), refusals of rides headed for Brooklyn still accounted for the greatest number of complaints:

NO DEST-2069








The k-means categorization clusters reveal a funny number of Brooklyn references as well.

Also, when mapped out, the points of origin for a lot of the complaints are still in Brooklyn. This is indicative of issues with inter-boro transit, which may have been addressed by Green Cabs and services such as Uber.

5) Location data is flawed.

Unlike the GPS-based data of the taxi rides data set, a good chunk of this data has transposed locations. The above map shows complaints located in Brooklyn about trying to GET to Brooklyn.

6) Things I’m sad I didn’t have time to do…

  1. Download historic weather data from the weather.io api, create a dictionary of days with precipitation, and compare it to days with complaints to determine the correlation of complaints to rain and snow days.
  2. Get the most recent data for 2014 and 2015, bind them to areas such as census tracts, then track the rate of decline per tract. Then take the public data sets for 2013, 2014 and 2015 rides, see if the overall number of cab rides has remained the same for those tracts, and compare it with green cab and Uber data (which is also recently available for 2015!). It would offer a view of whether, amid the Uber debate, people are opting to use other ride options, translating to fewer complaints. Heck, even the Citibike rides are tracked, too!
  3. Create a Twitter bot that, when users tweet a medallion number at it, replies with the complaints for that medallion.
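
For the weather idea in item 1, a minimal sketch of the comparison might look like the following. It assumes the weather data has already been downloaded and boiled down to a dictionary of dates flagged wet or dry; the function name, the dictionary layout and the sample dates are all made up for illustration and are not real complaint or weather records. It also compares average complaints per wet day against dry days, which is a simpler stand-in for a proper correlation measure.

```python
from collections import Counter
from datetime import date
from statistics import mean


def complaints_by_weather(complaint_dates, wet_days):
    """Return (mean complaints per wet day, mean complaints per dry day).

    complaint_dates: iterable of datetime.date, one entry per complaint.
    wet_days: dict mapping datetime.date -> bool (True if there was rain
        or snow). Every day in the study period should be a key, including
        days that had no complaints at all.
    """
    per_day = Counter(complaint_dates)  # missing days count as 0
    wet = [per_day[d] for d, is_wet in wet_days.items() if is_wet]
    dry = [per_day[d] for d, is_wet in wet_days.items() if not is_wet]
    return (mean(wet) if wet else None, mean(dry) if dry else None)


# Hypothetical sample data, for illustration only.
wet_days = {
    date(2013, 1, 1): False,
    date(2013, 1, 2): True,
    date(2013, 1, 3): False,
    date(2013, 1, 4): True,
}
complaints = [
    date(2013, 1, 2), date(2013, 1, 2), date(2013, 1, 2),
    date(2013, 1, 3),
    date(2013, 1, 4),
]

wet_mean, dry_mean = complaints_by_weather(complaints, wet_days)
print(f"wet days: {wet_mean} per day, dry days: {dry_mean} per day")
```

Keying on the weather dictionary rather than on the complaints matters here: days with zero complaints still count toward the averages, so quiet dry days are not silently dropped.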