Finishing the Course

After 10 busy weeks, our project is complete! We’re proud to have worked with the Food Bank to better understand this important policy issue.
Let’s recap what we’ve accomplished and why.

The Atlanta Community Food Bank (ACFB) aspires to eliminate hunger in its service area by 2025, and to help achieve this goal, the food bank is raising awareness about SNAP among its clients and donors. SNAP is a federal program that helps low-income families purchase food. The food bank asked us to gauge public opinion on SNAP and to determine what sorts of arguments were being made for and against it. They were also interested in learning about Georgia politicians’ opinions on SNAP. To analyze public opinion, we examined Twitter data and news articles. To track politicians’ opinions, we created a tool that allows the Food Bank to see local politicians’ voting records on bills related to food insecurity. We analyzed the sentiment of the tweets and news articles and visualized our results to show how sentiment changed in response to current events. We also used the data to create a map that shows how sentiment varies across different news outlets. We displayed our results in an R Shiny web app that is accessible to the food bank and the public.

 

Sentiment Analysis
Sentiment analysis is a form of text analysis that determines the subjectivity, polarity (positive or negative), and polarity strength (weakly positive, mildly positive, strongly positive, etc.) of a text. In other words, sentiment analysis tries to gauge the tone of the writer. To conduct our sentiment analysis, we scraped news articles and tweets that contained key words such as “SNAP”, “food stamps”, and “EBT”.

The Vader and AFINN packages in Python were used to conduct unsupervised sentiment analysis. Vader, short for Valence Aware Dictionary and sEntiment Reasoner, is a lexicon- and rule-based sentiment analysis tool. AFINN is a dictionary that rates the connotation of words on a scale from -5 to 5; the AFINN score of a sentence is the sum of the scores of its words. Vader gauges the overall sentiment of a sentence from its syntax more than from individual word choice, whereas AFINN gauges the kinds of words being used and their intensity. Sentences containing key words (words relating to SNAP) were given a higher weight so that sentiment towards this issue would be amplified.

Each article was tokenized to the sentence level, and each sentence was given a sentiment score by each of the two tools. The sentence scores were then aggregated for each article using the weight assigned to each sentence. This aggregated score represents the sentiment of the article. To account for the impact of each article, the article scores were then aggregated with respect to the traffic level of the website and the reading level of the article. This process is visualized below.

Sentiment Analysis Process
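
The sentence-level scoring and weighting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny lexicon stands in for the real AFINN word list, the keyword weight of 2.0 is a made-up value, and combining sentences with a weighted average (rather than a weighted sum) is our guess at one reasonable aggregation. All function names are hypothetical.

```python
import re

# Toy stand-in for an AFINN-style lexicon (word -> score from -5 to 5).
# These entries are illustrative, not the real AFINN values.
TOY_LEXICON = {"help": 2, "good": 3, "fraud": -4, "cut": -1, "hunger": -2}

# Terms that mark a sentence as being about SNAP.
KEY_TERMS = ("snap", "food stamp", "ebt")
KEYWORD_WEIGHT = 2.0  # assumed weight; the actual value used is not stated


def split_sentences(text):
    """Naive sentence tokenizer: split after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_score(sentence):
    """AFINN-style score: the sum of the lexicon scores of the words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(TOY_LEXICON.get(word, 0) for word in words)


def sentence_weight(sentence):
    """Give sentences that mention a key term a higher weight.

    Uses a plain substring match, so "snapshot" would also count as "snap".
    """
    lowered = sentence.lower()
    return KEYWORD_WEIGHT if any(term in lowered for term in KEY_TERMS) else 1.0


def article_score(text):
    """Weighted average of sentence scores for one article."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    weights = [sentence_weight(s) for s in sentences]
    scores = [sentence_score(s) for s in sentences]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
```

Calling `article_score("SNAP benefits help families. The weather was good.")` counts the first sentence twice as heavily as the second because it mentions SNAP.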

Additionally, information on the arguments and topics in these articles would be very useful to the ACFB. To provide this, we performed preliminary topic modeling (Latent Dirichlet Allocation) to extract topical words from the set of texts. It returns a set of words, each with a probabilistic weight indicating its importance. Bigram collocation was used to detect the most frequent and meaningful two-word pairs. Term frequency-inverse document frequency (TF-IDF) was used to detect important words across all the documents. Named Entity Recognition (NER) from the Stanford Natural Language Processing Group and gensim were used to detect key people or locations mentioned in the articles. After generating these statistics, each word from TF-IDF, bigram collocation, and NER was multiplied by the weight computed for its document. Then, all the words were aggregated into a list. Using this list, a word cloud can be generated to visualize meaningful words. Word clouds are of particular interest to our partners at the food bank. Aggregating the word cloud by date will help the viewer understand the subject of the sentiment and better decipher public opinion about SNAP.
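
To make the TF-IDF step and the document weighting more concrete, here is a minimal standard-library sketch. It assumes the textbook definition of TF-IDF (relative term frequency times the log of inverse document frequency), which may differ from the variant a library such as gensim applies, and the `aggregate` helper and its document weights are hypothetical illustrations of the "multiply by the document weight, then combine" step.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf(documents):
    """Return one {word: tf-idf} dict per document.

    tf  = count of the word in the document / number of tokens in the document
    idf = log(number of documents / number of documents containing the word)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each word once per document

    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        results.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return results


def aggregate(per_doc_scores, doc_weights):
    """Multiply each word score by its document's weight and sum across documents."""
    totals = Counter()
    for scores, weight in zip(per_doc_scores, doc_weights):
        for word, value in scores.items():
            totals[word] += weight * value
    return totals.most_common()  # highest-weighted words first
```

The list returned by `aggregate` is the kind of word-to-weight table a word cloud can be drawn from. Words that appear in every document get a score of zero, which is why generic terms drop out.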

Sentiment Visualization

 

Spatial Analysis
The AFINN and Vader scores were linked to the geocoded news outlets. Using ArcMap 10.4, spatial analysis was conducted on the outlets to determine whether there was any clustering of articles that had positive or negative sentiment about SNAP. To do this, a hexagon grid was created over the extent of a U.S. shapefile, and a spatial join was conducted to join the number of news outlets to the hexagon polygons. After the spatial join, hot spot analysis was done by calculating the Getis-Ord Gi* statistic. The Getis-Ord Gi* statistic determines where hot spots and cold spots cluster by looking at each feature in relation to its neighboring features.
The outputs of the Getis-Ord Gi* statistic are z-scores: areas with statistically significant high z-scores are hot spots, while areas with statistically significant low z-scores are cold spots. Significance is determined by comparing the local sum for a feature and its neighbors proportionally to the sum of all the features. If the difference between the calculated sum and the expected sum is too large to be the result of random chance, the z-score is statistically significant. In the context of this research, hot spots are areas in which the articles have a positive sentiment on SNAP and cold spots are areas in which the articles have a negative sentiment on SNAP.
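
For readers curious about what the hot spot tool computes, the standard Gi* z-score formula can be written out directly. The sketch below is not how ArcMap is invoked; it is a plain-Python rendering of the published formula, and it assumes binary spatial weights (1 for the feature itself and its neighbors, 0 otherwise) and a single numeric value per hexagon, both of which are our assumptions for illustration.

```python
import math


def getis_ord_gi_star(values, weights_row):
    """Gi* z-score for one feature.

    values      -- attribute value x_j for every feature (e.g. one value per hexagon)
    weights_row -- spatial weight w_ij between the target feature i and every
                   feature j, including i itself (that is what the * denotes)
    """
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation of the attribute over all features.
    std = math.sqrt(sum(x * x for x in values) / n - mean ** 2)
    w_sum = sum(weights_row)
    w_sq_sum = sum(w * w for w in weights_row)
    # Observed local sum minus the local sum expected from the global mean.
    numerator = sum(w * x for w, x in zip(weights_row, values)) - mean * w_sum
    denominator = std * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return numerator / denominator
```

With `values = [1, 1, 1, 10, 10]`, a neighborhood covering the last two features gives a positive z-score (a candidate hot spot), while a neighborhood covering the first two gives a negative one. The function divides by zero if every feature has the same value or if the neighborhood covers all features, so a real analysis would need to guard against those cases.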

 

SNAP InfoMap

 

Politician Tracking Tool
The voting records of Georgia state representatives were collected through Open States, a site that collects data on state representatives. Bills were selected if they contained the phrases “food stamps”, “SNAP”, “food bank”, “food desert,” “hunger,” “food insecurity,” or “georgia peach card”. Bills with no votes were removed, and votes by representatives no longer in office were removed.
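
The selection rules above amount to a short filter. The sketch below assumes a hypothetical in-memory layout (each bill is a dict with a `title` and a list of `votes`, each vote naming a `legislator`); it does not reflect the actual Open States response schema, and matching on the title alone is also an assumption.

```python
SEARCH_PHRASES = (
    "food stamps", "snap", "food bank", "food desert",
    "hunger", "food insecurity", "georgia peach card",
)


def mentions_food_policy(text):
    """True if the text contains any search phrase (case-insensitive substring match)."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in SEARCH_PHRASES)


def select_bills(bills, current_legislators):
    """Keep relevant bills that have votes, and drop votes by former legislators.

    bills               -- list of dicts with hypothetical keys "title" and "votes"
    current_legislators -- set of names of legislators still in office
    """
    selected = []
    for bill in bills:
        if not mentions_food_policy(bill["title"]):
            continue  # not about food policy
        if not bill["votes"]:
            continue  # bills with no votes are removed
        votes = [v for v in bill["votes"] if v["legislator"] in current_legislators]
        selected.append({**bill, "votes": votes})
    return selected
```

Because this is a plain substring match, a short phrase such as "snap" would also match unrelated words like "snapshot", so the selected bills are worth a manual check.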

On the web app, users can select which chamber of the Georgia General Assembly they want (House or Senate). Then, they can choose a politician to learn about. The web app will then display the legislator’s voting record on bills relating to food stamps, and will link the user to further resources such as the text of the bills and the legislator’s site.

Politician Tracking Tool

 

The food bank is planning on using our tools to inform their interaction with media outlets, to prepare for meetings with politicians, and to adjust their social media and outreach messaging.
We are proud to have been able to work alongside the food bank to create this web app. Frequent feedback and discussions with the Atlanta Community Food Bank helped us to shape our project to suit their needs.

 

Thank you!
We thank our mentor, Carl DiSalvo, Associate Professor and Coordinator for the MS in Human-Computer Interaction at Georgia Tech, for his guidance and advice. We also thank our Food Bank partners Lauren Waits, Director of Government Affairs; Allison Young, Marketing Manager; and Jocelyn Leitch, Data and Insights Analyst, for educating us about food policy, food insecurity in Atlanta and across the nation, and the work of the Atlanta Community Food Bank. Finally, we would like to thank the staff and students participating in the Data Science for Social Good – Atlanta program for their support and assistance.

Nearing the finish line

This past week we’ve been creating visualizations of the data we’ve collected and preparing them for the R Shiny SNAP app.

 

Below is an analysis we created for understanding SNAP sentiment over time. The y axis shows the Vader score, and the x axis shows the date. When you hover over one of the bars, you will be able to see the most frequent words from that time frame. This is important because positive sentiment can be due to many things – for example, people may be speaking positively about budget cuts to SNAP, or they may be speaking positively about SNAP itself. Showing the most frequent words will help to tease out the meaning.
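
The data behind a chart like this can be prepared with a simple group-by-date step. The following is a rough sketch rather than our production code: the `(date, score, text)` record layout, the short stopword list, and the function name are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

# Small illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "to", "of", "and", "is", "for", "in", "on"}


def summarize_by_date(records, top_n=3):
    """Average score and most frequent words for each date.

    records -- iterable of (date_string, score, text) tuples (hypothetical layout)
    """
    scores = defaultdict(list)
    words = defaultdict(Counter)
    for date, score, text in records:
        scores[date].append(score)
        tokens = re.findall(r"[a-z]+", text.lower())
        words[date].update(t for t in tokens if t not in STOPWORDS)
    return {
        date: {
            "mean_score": sum(vals) / len(vals),
            "top_words": [w for w, _ in words[date].most_common(top_n)],
        }
        for date, vals in sorted(scores.items())
    }
```

Each date then maps to one bar height (`mean_score`) and one hover label (`top_words`).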

Additionally, we continued to work on our map of news outlets. We added information about sentiment and then created a hot spot map of sentiment. In the map below, blue is a cold spot (negative about SNAP), and red is a hot spot (positive about SNAP). We are also planning on adding the top words to start extracting the meaning of this sentiment.

Finally, our politician tracking tool is coming along nicely. The data has been cleaned and is being displayed in R Shiny. Below is a screenshot of the application: you choose whether you are researching a member of the Senate or the House of Representatives, and then you choose the specific legislator. Going forward we will include more detail on the bills and the representatives.

Food Bank: initial sentiment and network analyses

We’ve made a lot of progress since last week. Most of our work has been devoted to sentiment analysis and network analysis.

Sentiment Analysis
To examine themes in SNAP/food stamp coverage, we first scraped articles from the past month that included the words “food stamp” or “food stamps” in the title and calculated how often each stemmed word appeared. As a first visualization of this information, we made word clouds. The word cloud below shows the most common words in conservative articles about food stamps:

In order to further analyze the content of the news articles and social media posts that we’ve scraped, we’re doing sentiment analysis on the text. To do this, we examined various metrics about these texts such as the complexity of the words, the reading level, the punctuation, and whether the sentences in the articles are positive or negative.

To quantify the tone of the articles, we used Vader, a sentiment analysis metric from the Natural Language Toolkit, as well as AFINN, another sentiment metric. For Vader, sentence scores range from -1 (negative) to 1 (positive). For AFINN, each word is scored from -5 (negative) to 5 (positive), and a sentence’s score is the sum of its word scores. For both metrics, a score above 0 indicates that the sentence has a positive sentiment. We placed each article in a category (e.g., Economy, Opinion, News) and found the average score for each category. The graph below shows the average total AFINN score vs. the average total Vader score. The size of each bubble reflects the number of articles.

Interestingly, the Vader score suggests that all article categories had a positive sentiment (all > 0), while the AFINN score suggests that only the local, international, and opinion categories were positive, on average.

Georgia Representatives
When we talked with the food bank last week, they expressed interest in an analysis of how Georgia politicians speak about SNAP on Twitter. Our research suggests that Georgia politicians do not speak on the issue frequently enough to give us sufficient data to analyze. Instead, we are considering creating a visualization that tracks how representatives have voted on legislation regarding SNAP. We are doing this using the Open States API, which has data on bills, legislators, and events in state governments, and the ProPublica Congress API, which has national data.

We meet with the Atlanta Community Food Bank again tomorrow, and will consult with them to better understand how they currently follow food policy and how we could use these APIs to analyze and present this information for them.

Network Analysis

Another strategy we are using to analyze news articles is a Term Frequency-Inverse Document Frequency network analysis in Gephi of Washington Post articles about food stamps. In the above graph, the bigger, darker circles are more connected. Words are connected if they appear in the same sentence. We were unsure why Perdue and Southerland were so connected. After researching these names, we learned that Steve Southerland is a Florida congressman who wants to impose work requirements on those who get SNAP, and Sonny Perdue is the Secretary of Agriculture.
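
The "connected if they appear in the same sentence" rule can be expressed as a small edge-counting routine. This sketch assumes the vocabulary of interest has already been chosen (for example, the highest-scoring TF-IDF terms); the function names are hypothetical, and the network layout itself would still be done in a tool such as Gephi.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_edges(text, vocabulary):
    """Count how often each pair of vocabulary words shares a sentence."""
    edges = Counter()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        # Keep only the words we care about, once per sentence.
        words = set(re.findall(r"[a-z]+", sentence.lower())) & vocabulary
        for pair in combinations(sorted(words), 2):
            edges[pair] += 1
    return edges


def degree(edges):
    """Number of distinct neighbors per word (how connected each node is)."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return {word: len(linked) for word, linked in neighbors.items()}
```

The edge counts can serve as edge weights, and the degree of each word corresponds to how large and dark its circle would appear in a graph like the one above.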

 

Next Steps

Going forward, we hope to do a more granular sentiment analysis that can help us extract arguments from our text. We also need to clean the data that we’ve scraped from Facebook, and we are starting to learn how Google Trends can be useful to us.