Weibo Text Mining

The objective of our project is to evaluate and predict public attitudes on a specific social issue through Weibo, a Chinese online social platform similar to Twitter.

Topic
"Man’s brutal beating of female driver divides Chinese public after different car videos emerge." Public opinion on this topic split into two camps:
– The woman deserved it
– The man lost his mind

Data
7,000 tweets posted from May 3 to June 3, including usernames, IDs, publication date and time, repost counts, like counts, content, etc.

Data Collection
– Access to the Weibo API: to apply natural language processing techniques to Weibo content, we first tried the Weibo API and later web scraping to collect what people posted on this topic. Both attempts failed to produce a usable dataset because Weibo provides very little data this way.
– We then found a dataset that someone had already compiled and posted online; it contains over 7,000 tweets on this topic.
– We used TF-IDF to extract Chinese keywords from these tweets.

Method: Supervised Learning
We randomly selected one tenth of the tweets in the database and labeled the attitude of each one:
1: The woman deserved it
-1: The man lost his mind
We read each tweet, decided its attitude, and skipped tweets whose attitude was ambiguous (e.g. "I think both A and B were wrong; I can’t decide who is more at fault.").

Processing Data
Cleaning: we need to get rid of the reposted content and also pay attention to the punctuation in special...
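The repost cleaning and TF-IDF keyword step described above can be sketched in Python as follows. This is a minimal illustration, not the project's own code: it assumes reposted content follows the usual Weibo "//@username:" chain marker, keeps only Chinese characters, and uses character bigrams as a stand-in for a proper Chinese word segmenter (a library such as jieba would normally be used). The function names `clean`, `bigrams` and `top_keywords` are hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: a Weibo repost chain starts at the first "//@username:" marker;
# everything from that marker onward is treated as reposted content and dropped.
REPOST_MARKER = re.compile(r"//@.*", re.DOTALL)
# Keep CJK ideographs only; punctuation, emoji, Latin letters and digits
# are replaced by spaces (an assumption made for this sketch).
NON_CJK = re.compile(r"[^\u4e00-\u9fff]+")


def clean(text: str) -> str:
    """Strip the repost chain and everything that is not a Chinese character."""
    text = REPOST_MARKER.sub("", text)
    return NON_CJK.sub(" ", text).strip()


def bigrams(text: str) -> list[str]:
    """Character bigrams within each run of Chinese characters.

    A crude stand-in for real word segmentation (e.g. jieba).
    """
    tokens = []
    for run in text.split():
        tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


def top_keywords(posts: list[str], k: int = 5) -> list[tuple[str, float]]:
    """Rank terms by their TF-IDF score summed over all posts."""
    docs = [bigrams(clean(p)) for p in posts]
    n_docs = len(docs)
    # Document frequency: in how many posts each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    scores: Counter = Counter()
    for doc in docs:
        if not doc:
            continue
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / df[term]) + 1.0  # +1 keeps terms found in every post
            scores[term] += tf * idf
    return scores.most_common(k)
```

Summing per-post scores favors terms that are both frequent and concentrated in some posts; with thousands of tweets, swapping the bigram tokenizer for a real segmenter would give far more readable keywords.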
NYC Taxi Complaint Data

Elliot Ramos

Refusing to see the truth at refusals

In October 2013 and January 2014, I obtained a series of files from the TLC, 311 and DoITT. The agencies collaborated to provide an extensive dataset that included key fields not found on New York City’s open data portal: specifically the “descriptor” fields, which include TLC’s categorization of taxi complaints, and the verbatim narrative field, which is filled out by a 311 dispatcher or via online form submission by residents. The dataset is extensive and at some points required months of manual work. For the purposes of the class project, I’m focusing on analyzing and slicing the data using pandas.

Here is the data provided by the City of New York. At first, the city provided two files, split into complaints with summonses and complaints without summonses. Specific locations were not provided, but Service Request numbers were. The data goes back to January 2010 by incident date; a handful of earlier records were included in the set but excluded from the overall analysis as noise.

Excel files: using Excel, the two files were stacked on top of each other, with a flag originally set to TRUE for records that resulted in a summons. Subsequent requests were made for additional data containing both the Service Request numbers and the Open Data portal Unique IDs. This allowed the data to be merged, using csvkit, with the open data set, which includes location data such as x, y coordinates.

Open Data taxi complaint set: https://data.cityofnewyork.us/Social-Services/311-Taxi-Complaints/uppf-z66u

Subsequent requests were made to fill...
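The author did the stacking in Excel and the merging with csvkit; since the analysis itself uses pandas, here is a minimal pandas sketch of the same three steps (stack and flag, drop pre-2010 noise, join Service Request numbers to Unique IDs and then to coordinates). The column names `service_request`, `unique_key`, `incident_date`, `x` and `y` and the function name `build_complaints` are hypothetical placeholders, not the actual field names in the TLC, 311 or Open Data files.

```python
import pandas as pd

# Hypothetical column names; the real files may label these fields differently.
SR = "service_request"
KEY = "unique_key"


def build_complaints(with_summons: pd.DataFrame,
                     without_summons: pd.DataFrame,
                     crosswalk: pd.DataFrame,
                     open_data: pd.DataFrame) -> pd.DataFrame:
    """Stack the two complaint files, flag summonses, and attach x/y coordinates."""
    # Stack both files, marking which records resulted in a summons.
    stacked = pd.concat(
        [with_summons.assign(summons=True), without_summons.assign(summons=False)],
        ignore_index=True,
    )
    # Exclude the handful of records dated before January 2010 as noise.
    stacked["incident_date"] = pd.to_datetime(stacked["incident_date"])
    stacked = stacked[stacked["incident_date"] >= pd.Timestamp("2010-01-01")]
    # Map Service Request numbers to Open Data Unique IDs ...
    merged = stacked.merge(crosswalk[[SR, KEY]], on=SR, how="left")
    # ... then pull the location columns from the Open Data set.
    return merged.merge(open_data[[KEY, "x", "y"]], on=KEY, how="left")
```

Left joins keep complaints that have no matching record, leaving their coordinates empty (NaN) instead of silently dropping them, which makes gaps in the city's crosswalk easy to count.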
Apartment Hunting in Reverse

Spe Chen

We all know apartment hunting is hard in New York City, and it is especially hard for international students like me. I came to New York this May and needed to find a place to settle down within a week before my course started. (Those were the most miserable days. The only thing I knew about New York was that it was home to the best bagels on earth.) Even though I googled as much as I could, saw apartments in person and met landlords, it was never possible to get the whole picture of a neighborhood before living there. Naturally, landlords and agents want to rent out their apartments as soon as possible, so they are unlikely to tell you about the dark side of the area. So should people accept this information asymmetry as it is? Is there any way to learn the drawbacks of a neighborhood before you sign the lease and pay the first month’s rent?

First, let’s change the mindset of typical apartment hunting. Most of the time we look for a place with nice features: convenience, safety, large windows and so on, perhaps because that is how agents normally promote their listings. As smart apartment hunters, however, we cannot be confined by this thinking. So in this project I propose a new way of apartment hunting: not finding the best place, but finding the least bad neighborhood, that is, the area with the fewest 311 complaints of the types you care about. Here is the map of sampled complaints from NYC’s 311 service portal since 2010. I categorized those 200+ complaints into four categories, which...
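The "least bad neighborhood" idea boils down to counting only the complaint categories a renter cares about and sorting neighborhoods in ascending order. Below is a minimal Python sketch, assuming each complaint has already been tagged with a neighborhood and a category; the function name `rank_neighborhoods` and the neighborhood/category labels in the sample are hypothetical, not taken from the project's data.

```python
from collections import Counter
from collections.abc import Iterable


def rank_neighborhoods(
    complaints: Iterable[tuple[str, str]],
    concerns: set[str],
) -> list[str]:
    """Order neighborhoods from fewest to most complaints in `concerns`.

    `complaints` holds (neighborhood, category) pairs. Neighborhoods with no
    complaints of concern still appear, ranked first with a count of 0.
    """
    complaints = list(complaints)  # materialize so generators can be read twice
    counts = Counter(hood for hood, category in complaints if category in concerns)
    hoods = {hood for hood, _ in complaints}
    # Sort by count, then by name so ties are broken deterministically.
    return sorted(hoods, key=lambda hood: (counts[hood], hood))


if __name__ == "__main__":
    sample = [  # hypothetical records, not real 311 data
        ("Astoria", "noise"), ("Astoria", "noise"), ("Harlem", "heat"),
        ("Bushwick", "noise"), ("Harlem", "noise"),
    ]
    print(rank_neighborhoods(sample, {"noise"}))  # ['Bushwick', 'Harlem', 'Astoria']
```

Raw counts favor small or sparsely populated areas; dividing each count by population or number of housing units would make neighborhoods more directly comparable.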