Data Blog

Sentiment Analysis of Chicago Gang Tweets (Methods)

Clay Gibson

The study of gang communication through Twitter data is still in its infancy, so researchers often have to develop new methods of analysis. In an ideal world, researchers could search a database of Tweets for those containing a certain reference word (for example, “liljojo”) during a specified time period (say, a few weeks before and a few weeks after Lil JoJo was shot). This methodology would let us trace the escalation and impact of particular conflicts between gangs. At present, however, the Twitter APIs do not support this kind of search.

Twitter offers developers two ways to access its data: the search API and the streaming API. The search API allows you to perform a search for certain keywords, usernames, or locations on a limited subset of past Tweets. The streaming API lets you collect Tweets roughly in real time. You can later subset the stream based on keywords, usernames, or locations. Both APIs have their advantages and disadvantages. The search API gives you access to the Twitter history of a specific user. For example, if you specifically wanted to look at Lil JoJo’s Tweets around his death, you could search his Twitter handle, @OsoArrogantJoJo, and subset the data based on time. Patton (2015) takes advantage of this functionality. He first creates a list of known gang members in Detroit, searches the names of the gang members on Twitter to find profiles, and then gathers Tweets from those profiles using Twitter’s search API.

Unfortunately, I did not have access to a list of gang members, so I instead planned to find gang Tweets based on keywords. The search API is less useful for this approach than one might expect. It can look for Tweets containing a substring, but it returns at most 100 Tweets per search and only covers Tweets from the last seven days.[1] I therefore used the streaming API to collect a larger sample of related Tweets.[2] The downside of the streaming API is that it offers no access to past data, only to Tweets posted after the listener was created.

I searched the following keywords: otf, gdk, bdk, ebt, oblock, 3hunna, and JoJogang. Each has a meaning within Chicago gang culture. The following chart summarizes the terms, their meanings, and gang affiliations:

Using the Tweepy module for the Python programming language, I set up a Twitter listener that gathered real-time Tweets containing any of the above keywords. The listener only recorded data when the computer running the code was connected to the internet, and thus the data is not temporally continuous. The data was recorded over a period of four weeks, for a total of 16,900 Tweets. During the analysis, I noticed that “ebt” is frequently used in a non-gang context to mean “Electronic Benefit Transfer.” Most Tweets containing “ebt” were related to “ebt cards.” As a result, Tweets with the keyword “ebt” were excluded from the analysis. Of the original 16,900, only 3,224 were unique Tweets (i.e., not re-Tweets). Of the 3,224 unique Tweets, 277 had an associated geo-tag. The latitude and longitude of those Tweets were placed over a map of the United States to offer a geographic understanding of the terms.
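The filtering steps described above can be sketched in a few lines of Python. This is a minimal illustration and not the code used for the analysis. It assumes each recorded Tweet is a dictionary with a `text` field, an optional `retweeted_status` field, and an optional GeoJSON-style `coordinates` field, modeled on the classic Tweet JSON layout. The function names, the whole-token keyword matching, and the choice to drop any Tweet containing “ebt” are assumptions made for the example.

```python
import re

# Keywords tracked by the listener, lowercased for matching.
KEYWORDS = {"otf", "gdk", "bdk", "ebt", "oblock", "3hunna", "jojogang"}
# Keyword dropped from the analysis because of its non-gang meaning.
EXCLUDED = {"ebt"}


def matched_keywords(text):
    """Return the tracked keywords that appear as whole tokens in the text."""
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    return tokens & KEYWORDS


def is_retweet(tweet):
    """Treat a Tweet as a re-Tweet if it carries retweet metadata or an 'RT @' prefix."""
    return "retweeted_status" in tweet or tweet["text"].startswith("RT @")


def filter_tweets(tweets):
    """Keep original Tweets that match a keyword and do not contain an excluded one."""
    kept = []
    for tweet in tweets:
        hits = matched_keywords(tweet["text"])
        if not hits or hits & EXCLUDED:
            continue  # no tracked keyword, or contains "ebt" (assumed rule)
        if is_retweet(tweet):
            continue
        kept.append(tweet)
    return kept


def lat_lon(tweet):
    """Return (latitude, longitude) for a geo-tagged Tweet, or None."""
    point = tweet.get("coordinates")
    if not point:
        return None
    lon, lat = point["coordinates"]  # GeoJSON order is longitude, latitude
    return (lat, lon)
```

Calling `filter_tweets` on the recorded Tweets and then `lat_lon` on each result yields the points that would be plotted on the map. Tweets without a geo-tag return `None` and can be skipped.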

For the next part of the analysis, a series of regular expressions was used to strip the 3,224 unique Tweets of unnecessary syntax (for example, hashtags, links, and punctuation). The resulting text of the Tweets was passed to a simple voter algorithm defined in the “sentiment” package for the R programming language. The package contains a dataset of words assigned to six emotion categories: anger, joy, sadness, disgust, fear, and surprise. Based on the frequency of the words in the text, the algorithm guesses the most probable emotion of the entire passage. The text from all 3,224 unique Tweets was then combined, and the frequency of each word was counted. The “wordcloud” package was used to display the words that appear most often in the Tweets.
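To make the cleaning and voting steps concrete, here is a rough Python sketch of the same idea. It is not the R “sentiment” package and does not reproduce its lexicon or scoring. The tiny word list, the function names, and the decision to delete hashtags entirely (rather than only the “#” symbol) are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical toy lexicon; the real package ships a much larger word list.
EMOTION_LEXICON = {
    "hate": "anger", "mad": "anger",
    "happy": "joy", "love": "joy",
    "miss": "sadness", "cry": "sadness",
    "gross": "disgust",
    "scared": "fear",
    "shocked": "surprise",
}


def clean_tweet(text):
    """Lowercase the text and strip links, hashtags, and punctuation."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # links
    text = re.sub(r"#\w+", " ", text)          # hashtags (assumed: drop the whole tag)
    text = re.sub(r"[^\w\s]", " ", text)       # punctuation
    return re.sub(r"\s+", " ", text).strip()   # collapse leftover whitespace


def vote_emotion(text):
    """Return the emotion with the most matching words, or None if no word matches."""
    votes = Counter(
        EMOTION_LEXICON[word] for word in text.split() if word in EMOTION_LEXICON
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def word_frequencies(texts):
    """Count how often each word appears across all cleaned texts."""
    return Counter(word for text in texts for word in text.split())
```

The counts returned by `word_frequencies` are the kind of input a word cloud needs: each word paired with the number of times it appears.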

Lastly, I created a list of gun-related words (glock, gat, piece, nine, forty, heater, nina, heat, gauge, rosco, rod, strap, bullet, cap, AK, gun, firearm, weapon, mac, pistol, rifle, pop, mm, caliber, milla). I looked at the proportion of Tweets that contained these words, as well as a mourning-related word (“RIP”) on each day of data collection. The proportions were graphed over time to understand how gang members might be using Twitter to discuss the murder of Lil Durk’s manager, Chino. 
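The daily proportions can be computed along the following lines. This sketch assumes the Tweets are available as `(date, text)` pairs and that a Tweet counts as a match when any of its lowercase alphabetic tokens is in the word list. Both the data layout and the whole-token matching rule are assumptions, and the function names are hypothetical.

```python
import re
from collections import defaultdict
from datetime import date

# Gun-related words from the list above, lowercased.
GUN_WORDS = {
    "glock", "gat", "piece", "nine", "forty", "heater", "nina", "heat",
    "gauge", "rosco", "rod", "strap", "bullet", "cap", "ak", "gun",
    "firearm", "weapon", "mac", "pistol", "rifle", "pop", "mm", "caliber",
    "milla",
}
MOURNING_WORDS = {"rip"}


def tokenize(text):
    """Split text into a set of lowercase alphabetic tokens ('9mm' yields 'mm')."""
    return set(re.findall(r"[a-z]+", text.lower()))


def daily_proportions(tweets, vocabulary):
    """Map each day to the share of that day's Tweets containing a vocabulary word."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for day, text in tweets:
        totals[day] += 1
        if tokenize(text) & vocabulary:
            hits[day] += 1
    return {day: hits[day] / totals[day] for day in sorted(totals)}
```

Running `daily_proportions` once with `GUN_WORDS` and once with `MOURNING_WORDS` gives the two series to graph over time. Several words on the list (for example “piece,” “heat,” and “pop”) also have everyday meanings, so whole-token matching of this kind will count some unrelated Tweets.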

[1] See for more information. 

[2] It is important to note that even with the streaming API, developers do not have access to the full “firehose” of Tweets. The documentation states that the API returns around 1% of the total flow of Tweets. For research purposes, we would hope that the 1% is a randomly selected subset of all of the data; however, it is unclear whether that is the case.