Mapping home-delivery postal service in Switzerland

Fanny Giroud In Switzerland, La Poste has a legal obligation to make sure that everybody has access to a nearby post office. In some remote areas, though, La Poste has invented a system to provide postal services directly at home through the mail carrier, who will ring the bell at your door if you have placed a little sign on your mailbox. The villages where this home-delivery service is in place now account for a third of all postal access points in Switzerland. While La Poste is saving money by shutting down post offices and replacing them with home delivery, people who aren’t at home all day long (unlike, say, the unemployed or retired) can’t easily send a package or pay a bill. La Poste refused to give me the complete list of these villages, so I decided to scrape all the points on the map that they provide (which is not the best for visualization). As of August 26th, 2015, it looks like La Poste is migrating its websites to a new platform; the map that we are interested in is still up, but doesn’t seem to appear on the new website (see notebook for links). In the last weeks, the map also seemed to be undergoing lots of maintenance work: the query URL parameters were modified; the number of villages increased by two; one erroneous point with coordinates in Somalia was removed; and layers of security certificates were added. These changes slowed me down significantly, as I believed my own code was creating the mistakes, but I added code for every possible mistake...
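
A minimal sketch in Python of the scraping approach described above, assuming the map serves its points through a JSON endpoint; the URL, query parameter, and field names are hypothetical placeholders, especially since the real parameters changed during the project:

# Sketch: collect every point the map serves and keep only plausible ones.
# The endpoint URL and field names below are hypothetical placeholders.
import json
import requests

MAP_URL = "https://example.post.ch/map/points"  # hypothetical endpoint

response = requests.get(MAP_URL, params={"service": "home-delivery"})
response.raise_for_status()

points = response.json()  # assume a list of {"name": ..., "lat": ..., "lon": ...}

# Defensive check of the kind mentioned above: a rough bounding box around
# Switzerland would have caught the stray point with coordinates in Somalia.
villages = [p for p in points
            if 45.8 <= p["lat"] <= 47.9 and 5.9 <= p["lon"] <= 10.6]

with open("home_delivery_villages.json", "w") as f:
    json.dump(villages, f, ensure_ascii=False, indent=2)

print(len(villages), "villages scraped")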
Exploring Global Terrorism

Aliza Goldberg RAND Corporation’s database of all terrorist attacks from 1968-2009 reveals that the most common weapon used by terrorists was explosives and that the most terrorist attacks occurred in 2006. The most fatalities from a single attack happened on 9/11, but the most overall deaths from terrorist attacks happened in Iraq. The number of attacks over the course of 41 years is truly staggering, as seen from map visualizations of the database. RAND Corporation, a well-known American think tank focused on international military affairs, began this database in 1980, after the Cabinet Committee to Combat Terrorism was formed in 1972. RAND uses the common academic definition of terrorism, quoting terrorism scholar Bruce Hoffman to specify that terrorism is:

1. violent
2. meant to create fear
3. intended to coerce counteraction
4. politically motivated
5. against civilians
6. carried out by either a group or an individual

Since terrorism can be difficult to classify, the data may be skewed. Other terrorist attacks may have gone unreported. Borders and names of countries have changed over the last 41 years, which may have led to some analysis errors. The database ends in 2009, so terrorist attacks since then, such as the rise of ISIL, are not included. I cleaned the database to turn the dates into recognizable times and, only for the analysis of terrorist groups, to eliminate the “unknown” and “other” perpetrators. With bar graphs, I charted the “year of terror” (2006), the weapons used, how fatal those weapons were, and the deadliest countries. Using a MapQuest API key, I geocoded all of the terrorist attacks by city or country. I used a pivot table...
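
A hedged sketch, in Python with pandas, of the cleaning and pivoting steps described above; the CSV file name and column names are assumptions about the export, not the actual RAND schema:

# Sketch: clean the RAND export and summarize it for bar graphs.
# File name and column names are assumptions about the export format.
import pandas as pd

attacks = pd.read_csv("rand_terrorism.csv")

# Turn the date strings into recognizable datetimes.
attacks["Date"] = pd.to_datetime(attacks["Date"], errors="coerce")
attacks["Year"] = attacks["Date"].dt.year

# Only for the group-level analysis, drop unattributed incidents.
groups = attacks[~attacks["Perpetrator"].isin(["Unknown", "Other"])]

# Pivot table: total fatalities per weapon type per year.
pivot = attacks.pivot_table(index="Year", columns="Weapon",
                            values="Fatalities", aggfunc="sum")

print(attacks.groupby("Year").size().idxmax())  # the "year of terror"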
Scrapers Used on Github

Sebastian Muñoz-Najar Galvez (See code for this project here) The objective of this project is to create a database of all available GitHub repositories explicitly devoted to the development or implementation of web scrapers, in order to (1) identify the languages used for scraping and (2) identify the themes and websites frequently scraped. Scrapers are a genre of code used to collect, aggregate, and organize information from a website. Scrapers capitalize on regular patterns of site layout and other principles of progressive-enhancement design to automate requests and aggregate information that is available only piecemeal on a site. An alternative to scrapers is interaction with an API, where available. The web is not an archive through and through; some regions resist archival work (see ‘Swiss Scraper’ above). It therefore becomes relevant to identify the regions of the web that have been scraped, and how researchers went about doing so.

Working with GitHub’s API

GitHub’s API is a very thorough archive of repositories, users, and code. Authorized applications can make 30 search requests per minute, and a search of GitHub’s repositories returns a JSON document with a list of up to 1,000 elements. However, the total number of results for any given query may exceed 1,000, so for such queries it is necessary to make several ordered requests. I used the date of creation to segment my query of scraping repositories. This process involved a great deal of trial and error, since I didn’t know how many scrapers were built in any particular interval of time. The keywords for every request were ‘scrape OR scraper OR scraping’. The API looked for...
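
A minimal sketch of the date-segmented search described above, assuming a GitHub OAuth token; the one-month interval is only illustrative, since finding the right granularity took trial and error:

# Sketch: page through GitHub's repository search, segmented by creation
# date, to work around the 1,000-result cap per query. TOKEN is a placeholder.
import time
import requests

TOKEN = "..."  # GitHub OAuth token (placeholder)
SEARCH_URL = "https://api.github.com/search/repositories"

def search_interval(start, end):
    """Collect all repositories matching the scraper keywords created in [start, end]."""
    repos, page = [], 1
    while True:
        resp = requests.get(SEARCH_URL, params={
            "q": "scrape OR scraper OR scraping created:%s..%s" % (start, end),
            "per_page": 100,
            "page": page,
        }, headers={"Authorization": "token " + TOKEN})
        resp.raise_for_status()
        items = resp.json()["items"]
        repos.extend(items)
        if len(items) < 100 or page == 10:  # the search API returns at most 1,000 results
            break
        page += 1
        time.sleep(2)  # stay under 30 search requests per minute
    return repos

# Shrink the interval until each query returns fewer than 1,000 results.
batch = search_interval("2014-01-01", "2014-01-31")
print(len(batch), "repositories found")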
Airbnb Data

Adam Stoddard The Airbnb marketplace is very diverse: listings can consist of anything from a bedbug-ridden couch to a glamorous full-floor penthouse. How do we quantify this database? Some of the most interesting factors to look at are text-based: Airbnb includes text descriptions of the apartments and ‘about’ sections for the hosts. The apartment descriptions, analyzed using both cosine similarity and topic modeling, are about what you would expect: they consist of the words people use to describe housing: beds, baths, location, access, nearby restaurants, subways, bars, etc. But topic modeling on the host descriptions can be enlightening, allowing us to see how people think of themselves. Some hosts group themselves into categories, which could involve being a “professional” who “enjoys” “traveling”, or an “artist” in “Brooklyn” who spends time with “girlfriends.”

Host topic modeling using gensim:

0 place family friends much girlfriends school good entrepreneur ive give
1 really de going also huge always living et vous things
2 living brooklyn ny manhattan people two well park best walk
3 things people time reading make moved meeting year amazing see
4 month great place also architect kyle couple married home ny
5 live favorite years enjoy garden travel home living like life
6 travel professional easy going organized time make clean park slope
7 great work host travel neighborhood see manhattan good currently trip
8 stayed living writer good dogs editor comfortable shows owner magazine
9 great restaurants yorker home space please event traveling way easy

What does the marketplace look like? The following histogram shows the number of bedrooms: one-bedrooms clearly dominate, with far more units than either studios or two...
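
A hedged sketch of the host-description topic modeling with gensim; host_texts and the crude tokenization below are stand-ins for the real data and preprocessing:

# Sketch: LDA topic modeling of host "about" texts with gensim.
# host_texts is assumed to be a list of host-description strings.
from gensim import corpora, models

host_texts = ["I am an artist living in Brooklyn ...", "..."]  # placeholder data

# Tokenize and drop very short tokens as a crude stop-word filter.
docs = [[w for w in text.lower().split() if len(w) > 2] for text in host_texts]

dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

lda = models.LdaModel(corpus, num_topics=10, id2word=dictionary, passes=10)

for topic_id in range(10):
    print(topic_id, lda.print_topic(topic_id, topn=10))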

Political Donors in Norway

Gunn Kari Hegvik As data on private political donors in Norway are deficient, neither collected nor sorted, I want to build a database of all donors going back to 2011, and I want to finish it before the national election in 2017. My first step is to scrape the donor data for the Conservative Party, the party that currently holds the prime minister’s office: first, because they have the most donors, and second, because their donors tend to be wealthy people and companies in shipping, real estate, and investment banking. Inspired by the NYTimes story “Small Pool of Rich Donors Dominates Election Giving,” I wanted to find out how many families dominate election giving to the Conservative Party. I also wanted to find out who the most faithful donors are, which donors stopped donating, and who the newcomers are. Unlike in the US, where donor data are made public on given dates, the Norwegian political parties have to make donations public no more than four weeks after a donation is made. So the second part of my project was to build a newsbot, using Mandrill, that would email me if any changes were made to the Conservative Party’s 2015 donors website. To build the bot, I was inspired by Quakebot, the LA Times newsbot that we worked on during the first part of the summer. I set the bot to print out a sentence, which can be published as part of a story, if one or several new donors are made public. The newsbot scrapes the 2015 site every five minutes and runs on an EC2 server. As there is an election for local municipalities coming...
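
A minimal sketch of such a newsbot loop, assuming the donor page lists one donor per table row; the URL, email addresses, selector, and API key are placeholders:

# Sketch: poll the donor page every five minutes and, when new names appear,
# email a publishable sentence via Mandrill. URL, selector, and key are placeholders.
import time
import requests
from bs4 import BeautifulSoup

DONOR_URL = "https://example.no/hoyre/donors-2015"  # placeholder
MANDRILL_KEY = "..."                                # placeholder

def fetch_donors():
    soup = BeautifulSoup(requests.get(DONOR_URL).text, "html.parser")
    return {row.get_text(strip=True) for row in soup.select("table tr")}

def send_alert(new_donors):
    sentence = "New donors to the Conservative Party: %s." % ", ".join(sorted(new_donors))
    requests.post("https://mandrillapp.com/api/1.0/messages/send.json", json={
        "key": MANDRILL_KEY,
        "message": {"from_email": "bot@example.com",
                    "to": [{"email": "me@example.com"}],
                    "subject": "New donor published",
                    "text": sentence},
    })

known = fetch_donors()
while True:
    time.sleep(300)  # five minutes
    current = fetch_donors()
    if current - known:
        send_alert(current - known)
    known = current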
Civic Engagement Measures

Rashida Kamal

Purpose & Goals

While there are challenges to measuring the resilience of a particular community, academics, NGOs, and government agencies have identified several critical indicators of resilience. In their paper “Measuring Capacities for Community Resilience,” Sherrieb et al. propose four such indicators: social capital, economic development, communication, and community competency. For this project, I was particularly interested in social capital, defined as both formal and informal networks of social support. I looked at volunteer rates across the United States and a few other questions about civic life from the Current Population Survey. Specifically, I was curious to see whether these items were affected by different demographic distributions from community to community. Given that several major metropolitan areas in the U.S. have experienced or are currently experiencing gentrification, it would be interesting to see how civic life changes as a community changes.

The Dataset & Methodology

The Current Population Survey is conducted by the U.S. Census Bureau and the Bureau of Labor Statistics. For the most part, the survey is concerned with data around employment, but in September and November of each year a Volunteer Supplement and a Civic Engagement Supplement are conducted in addition to the main survey. Unfortunately, while data from the Volunteer Supplement is available for 2014, 2013, 2012, 2011, and 2010, only the 2013, 2011, and 2010 Civic Engagement data is readily available on the Current Population Survey FTP. Each year’s data for each survey includes over 100,000 individuals. The data came in a fixed-width .dat file, made intelligible by the accompanying documentation. Each type of response for each question was assigned a...
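
A hedged sketch of reading one of the fixed-width supplement files with pandas; the file name, column positions, and response codes are illustrative placeholders, since the real layout comes from the supplement’s documentation:

# Sketch: parse a CPS fixed-width .dat file into a DataFrame.
# Column positions, names, and codes below are illustrative placeholders.
import pandas as pd

colspecs = [(0, 15), (15, 17), (17, 19)]   # record id, state code, volunteer answer
names = ["record_id", "state", "volunteered"]

cps = pd.read_fwf("volunteer_supplement.dat", colspecs=colspecs, names=names)

# Each type of response is a numeric code; map the ones we care about to labels.
cps["volunteered"] = cps["volunteered"].map({1: "yes", 2: "no"}).fillna("no answer")

print(cps["volunteered"].value_counts(normalize=True))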