Meghan Bongartz
Code available at: https://github.com/mbongartz/final-project
The conversation about disease in the United States tends to revolve only around diseases that pose a current threat. This means that, on average, we spend far more time talking about measles than Ebola, but it also makes it far more terrifying when Ebola does come up, because that means it is suddenly posing a threat and we do not have the infrastructure to deal with an outbreak. For some diseases that we don’t currently consider threats in the United States, the way they spread makes it difficult to predict when they could become problems. However, other diseases may spread or move with climate change, and we should be able to plan for these.
My goal was to investigate the risk for spread of tropical diseases in the United States as climate changes over time. There is a plethora of diseases that could be impacted by climate change for various reasons, but I narrowed my area of interest down to vector-borne diseases and, more specifically, mosquito-borne diseases, because mosquitos show a stronger climate preference than some other vectors, such as ticks. I looked at two different vectors: Tiger Mosquitos and Southern House Mosquitos.
Before addressing the vectors, though, I needed data on the rate at which climate is changing in the United States. This was available from the National Oceanic and Atmospheric Administration here: http://www.ncdc.noaa.gov/cag/time-series/us. NOAA has information about temperature and precipitation since 1895 that can be downloaded in a nicely formatted CSV file; however, the type of information, time scale, and state or region must be selected manually. Initially, I downloaded the yearly mean temperature and precipitation for January, for July, and for the full year on a regional basis, because this was manageable to do manually. Upon exploring the data sets, though, I discovered that some regions had very strong correlations between time and temperature or precipitation change, and others had virtually no correlation. While this was not unexpected, it did lead me to decide that I should look at the data for individual states for more accuracy. In the future, I would even be interested in looking at smaller areas within the states.
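As a rough illustration of the kind of check described above, the strength of the relationship between year and temperature can be summarized with a Pearson correlation coefficient. This is only a sketch: the function name pearson_r and the sample numbers are made up for the example, and it is not necessarily how the correlations in this project were computed.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Made-up values for illustration only, not NOAA data.
years = [1895, 1896, 1897, 1898]
steady_warming = [50.0, 50.5, 51.0, 51.5]  # rises every year
no_trend = [50.0, 51.0, 51.0, 50.0]        # no overall direction

strong = pearson_r(years, steady_warming)
weak = pearson_r(years, no_trend)
```

A value near 1 or -1 suggests a region where a straight-line trend is a reasonable description; a value near 0 suggests one where it is not.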
In order to get around the manual download form and the authentication that went with it, I wrote a scraper to pull the CSVs I was interested in. The permalinks for the CSVs took the form “http://www.ncdc.noaa.gov/cag/time-series/us/” + fips_code + “/00/” + parameter + “/” + time_scale + “/” + month + “/1895-2015.csv?base_prd=true&firstbaseyear=1901&lastbaseyear=2000”, so I had to work out where each of the form selections fit into the URL. I was initially under the impression that the site used FIPS codes as state identifiers, but this is not the case, and the mistake resulted in a collection of wrongly labeled files. The states are actually just numbered in alphabetical order (excluding Alaska and Hawaii).
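The URL pattern can be sketched in a few lines of Python. This is a minimal illustration rather than the scraper from the repository: the function names are hypothetical, the numbering is assumed to start at 1 with no zero padding, and the parameter, time scale, and month values in the usage line ("tavg", 1, 7) are placeholders, since the actual codes the site expects are not listed here.

```python
BASE_URL = "http://www.ncdc.noaa.gov/cag/time-series/us/"
SUFFIX = "/1895-2015.csv?base_prd=true&firstbaseyear=1901&lastbaseyear=2000"

# The 48 contiguous states; Alaska and Hawaii are excluded.
CONTIGUOUS_STATES = sorted([
    "Alabama", "Arizona", "Arkansas", "California", "Colorado",
    "Connecticut", "Delaware", "Florida", "Georgia", "Idaho", "Illinois",
    "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana", "Maine",
    "Maryland", "Massachusetts", "Michigan", "Minnesota", "Mississippi",
    "Missouri", "Montana", "Nebraska", "Nevada", "New Hampshire",
    "New Jersey", "New Mexico", "New York", "North Carolina",
    "North Dakota", "Ohio", "Oklahoma", "Oregon", "Pennsylvania",
    "Rhode Island", "South Carolina", "South Dakota", "Tennessee", "Texas",
    "Utah", "Vermont", "Virginia", "Washington", "West Virginia",
    "Wisconsin", "Wyoming",
])


def state_number(state):
    """Position of the state in alphabetical order (assumed to start at 1)."""
    return CONTIGUOUS_STATES.index(state) + 1


def build_csv_url(state, parameter, time_scale, month):
    """Fill the permalink template with one combination of form selections."""
    return (BASE_URL + str(state_number(state)) + "/00/" + parameter + "/"
            + str(time_scale) + "/" + str(month) + SUFFIX)


# Placeholder selections for illustration only.
example_url = build_csv_url("Texas", "tavg", 1, 7)
```

Looking states up by name rather than by a hand-typed number is one way to avoid the mislabeled files that the FIPS mix-up produced.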
Once I had my climate data, I needed to find information about mosquitos. I found conflicting information on a number of websites, and one question that I needed to address was whether to use the ideal climate for my disease vectors or a tolerable climate. I settled on a combination of climate factors that would allow for the widest allowable climate window and therefore err on the side of predicting more states to fall into the range of risk for disease-spreading mosquitos. Because the purpose of this project is to plan for potential outbreaks, it would be better to predict a mosquito-supporting climate in a place that does not wind up having that climate than the reverse. The following websites were used for information about mosquitos: http://www.cabi.org/isc/datasheet/86848, http://www.cabi.org/isc/datasheet/94897, http://www.who.int/mediacentre/factsheets/fs387/en/, http://www.climatecentral.org/gallery/graphics/mosquito-season-getting-longer, http://invasivespeciesireland.com/news/predicting-the-spread-of-the-tiger-mosquito-in-europe/, http://digital.csic.es/handle/10261/60982. Lists of current mosquito locations were taken from CABI for comparison.
I used prediction models based on linear regression to calculate when each state would fall into the tolerated climate range for both Tiger Mosquitos and Southern House Mosquitos based on three predictors. For Tiger Mosquitos, these were warm month temperature range, cold month minimum, and minimum annual rain. For Southern House Mosquitos, these were temperature range over the whole year (important for larval development), warm month temperature, and minimum precipitation. I then created functions to calculate the year in which each state would fall into the range for each predictor.
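To make the idea concrete, here is one way such a calculation could look: fit a straight line to a predictor over time, then solve for the first year in which the fitted value reaches a minimum threshold. This is a sketch under stated assumptions, not the code from the repository. The function names, the sample numbers, and the choice to report the current year when the threshold is already met are all assumptions for illustration.

```python
import math


def fit_line(years, values):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def year_reaching_minimum(years, values, minimum, current_year=2015):
    """First year in which the fitted trend is at or above `minimum`.

    Returns current_year if the fitted value already meets the threshold,
    and None if the trend is flat or moving away from it.
    """
    slope, intercept = fit_line(years, values)
    if slope * current_year + intercept >= minimum:
        return current_year
    if slope <= 0:
        return None
    # Solve slope * year + intercept = minimum, rounding up to a whole year.
    return math.ceil((minimum - intercept) / slope)


# Made-up series for illustration: rises 0.1 units per year.
sample_years = [2000, 2005, 2010, 2015]
sample_values = [10.0, 10.5, 11.0, 11.5]

already_met = year_reaching_minimum(sample_years, sample_values, 11.0)
met_later = year_reaching_minimum(sample_years, sample_values, 12.02)
```

An upper bound (for example, the top of a temperature range) could be handled the same way with the comparisons reversed, and a state would count as in range only once every predictor for that mosquito is satisfied.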
The results of my analysis were not exactly what I expected. My model produced a long list of states that should be in the climate range for each type of mosquito this year, and only a few states with climate threshold dates in the future. In a way, this is unexciting because there is very little being predicted; however, it’s also a reminder that we may be closer to a climate that is hospitable to tropical diseases than we think. The states at climate risk produced by the model are as follows:
Tiger mosquitos:
Mississippi 2015
Oklahoma 2015
Delaware 2015
Arkansas 2015
Louisiana 2015
Texas 2015
California 2015
Georgia 2015
Maryland 2042
Virginia 2015
Oregon 2088
South Carolina 2015
Florida 2015
Alabama 2015
North Carolina 2015
Tennessee 2015
House mosquitos:
Mississippi 2015
Oklahoma 2015
Delaware 2015
Illinois 2015
Arkansas 2015
Indiana 2015
Louisiana 2015
Texas 2015
Kansas 2015
Connecticut 2027
California 2015
West Virginia 2015
Georgia 2015
Pennsylvania 2096
Missouri 2015
New Jersey 2015
Maryland 2015
Virginia 2015
Massachusetts 2077
South Carolina 2015
Florida 2015
Kentucky 2015
Rhode Island 2015
Nebraska 2066
Ohio 2015
Alabama 2015
North Carolina 2015
Tennessee 2015
These states are at higher risk for diseases including dengue fever, yellow fever, chikungunya, St. Louis encephalitis, West Nile virus, lymphatic filariasis, and Japanese encephalitis.
This analysis is certainly far from perfect. It would be worth looking at areas smaller than states, as some states cover a very large area and may have differing climates within their borders (Texas comes to mind, but states like California, Illinois, and Indiana are also quite long). I would also like to do further research into the preferred climates of the two mosquito types in order to make the models more accurate. Some options might be to enter states that already host mosquitos into the model in order to train it or to use the climates of countries where diseases spread by these mosquitos are a major problem currently in order to model the preferred climates. Regardless, it should give us pause that these diseases are still considered “tropical” or “exotic” when parts of the United States could be at very real risk for them.
If you’re interested in learning more about the intersection of data, coding, and visualization, check out the Lede Program, an intensive certification program at Columbia’s School of Journalism, in conjunction with the Department of Computer Science. Find out more on our main page. Applications are open soon!