Data Blog


Sentiment Analysis of Chicago Gang Tweets (Methods)

Clay Gibson

Understanding gang communication through Twitter data is still in its infancy, so researchers often have to develop new methods of analysis. In an ideal world, researchers could search a database of Tweets for those that contained a certain reference word (for example, “liljojo”) during a specified time period (say, a few weeks before and a few weeks after Lil JoJo got shot). This methodology would allow us to understand the escalation and impact of certain conflicts between gangs. However, doing so with the Twitter APIs is not currently possible.

Twitter offers developers two ways to access its data: the search API and the streaming API. The search API allows you to search a limited subset of past Tweets for certain keywords, usernames, or locations. The streaming API lets you collect Tweets roughly in real time. You can then subset the stream by keywords, usernames, or locations. Both APIs have their advantages and disadvantages. The search API gives you access to the Twitter history of a specific user. For example, if you wanted to look at Lil JoJo’s Tweets around his death, you could search his Twitter handle, @OsoArrogantJoJo, and subset the data by time. Patton (2015) takes advantage of this functionality. He first creates a list of known gang members in Detroit, searches for the names of the gang members on Twitter to find their profiles, and then gathers Tweets from those profiles using Twitter’s search API.

Unfortunately, I did not have access to a list of gang members, so I instead planned to find gang Tweets based on keywords. The search API is much less useful for this approach. One can use the search API to look for Tweets containing a substring; however, the API returns at most 100 Tweets per search and only covers Tweets posted in the last seven days.[1] I therefore used the streaming API to collect a larger sample of related Tweets.[2] The downside of the streaming API is that it offers no access to past data, only to Tweets posted after the listener was created.

I searched the following keywords: otf, gdk, bdk, ebt, oblock, 3hunna, and JoJogang. Each has a meaning within Chicago gang culture. The following chart summarizes the terms, their meanings, and gang affiliations:

Using the Tweepy module for the Python programming language, I set up a Twitter listener that gathered real-time Tweets containing any of the above keywords. The listener only recorded data when the computer running the code was connected to the internet, so the data is not temporally continuous. The data was recorded over a period of four weeks, and a total of 16,900 Tweets were collected. During the analysis, I noticed that “ebt” is frequently used in a non-gang context to mean “Electronic Benefit Transfer.” Most Tweets containing “ebt” were related to “ebt cards.” As a result, Tweets with the keyword “ebt” were excluded from the analysis. Of the original 16,900, only 3,224 were unique Tweets (i.e., not re-Tweets). Of the 3,224 unique Tweets, 277 had an associated geo-tag. The latitude and longitude of those Tweets were placed over a map of the United States to offer a spatial understanding of the terms.
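The filtering steps described above can be sketched in plain Python. This is a minimal illustration, not the code used for the post: it assumes each Tweet is a dictionary shaped like the Twitter JSON payload (with `text`, `retweeted_status`, and `coordinates` fields), that keywords are matched as whole tokens, and that any Tweet containing “ebt” is dropped. The helper names are hypothetical.

```python
import re

# Keywords from the post; "ebt" is tracked but later excluded from analysis.
TRACKED_KEYWORDS = {"otf", "gdk", "bdk", "ebt", "oblock", "3hunna", "jojogang"}
EXCLUDED_KEYWORDS = {"ebt"}


def tokens(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_retweet(tweet):
    """Assumption: a re-Tweet carries 'retweeted_status' or starts with 'RT @'."""
    return "retweeted_status" in tweet or tweet["text"].startswith("RT @")


def filter_tweets(tweets):
    """Keep original (non-re-Tweet) Tweets that match a non-excluded keyword."""
    kept = []
    for tweet in tweets:
        words = tokens(tweet["text"])
        if not words & TRACKED_KEYWORDS:
            continue  # no tracked keyword at all
        if words & EXCLUDED_KEYWORDS:
            continue  # assumption: any Tweet mentioning "ebt" is dropped
        if is_retweet(tweet):
            continue
        kept.append(tweet)
    return kept


def geotagged(tweets):
    """Return Tweets with a point geo-tag (GeoJSON order: longitude, latitude)."""
    return [t for t in tweets if t.get("coordinates")]


# Made-up sample data for illustration only.
sample = [
    {"text": "otf all day", "coordinates": None},
    {"text": "RT @someone: otf all day", "coordinates": None},
    {"text": "lost my ebt card", "coordinates": None},
    {"text": "#3hunna", "coordinates": {"type": "Point", "coordinates": [-87.6, 41.8]}},
    {"text": "gdk", "retweeted_status": {"text": "gdk"}},
]
unique = filter_tweets(sample)
print(len(unique), len(geotagged(unique)))
```

Matching on tokens rather than substrings keeps a word like “debt” from counting as “ebt”; whether the original listener behaved this way is an assumption.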

For the next part of the analysis, a series of regular expressions was used to strip the 3,224 unique Tweets of unnecessary syntax (for example, hashtags, links, and punctuation). The resulting text of the Tweets was passed to a simple voter algorithm defined in the “sentiment” package for the R programming language. The package contains a dataset of words assigned to six emotion categories: anger, joy, sadness, disgust, fear, and surprise. Based on the frequency of these words in the text, the algorithm guesses the most probable emotion of the entire passage. The text from all 3,224 unique Tweets was then combined, and the frequency of each word was counted. The “wordcloud” package was used to display the words that appear most often in the Tweets.
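To make the cleaning and voting steps concrete, here is a rough Python sketch. It is a simplified stand-in for the R “sentiment” package, not a port of it: the tiny `EMOTION_LEXICON` is a made-up placeholder for the package's word list, the regular expressions are my own guesses at what “unnecessary syntax” covers (here whole hashtags are removed), and the function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy lexicon; the real package ships a much larger word list.
EMOTION_LEXICON = {
    "mad": "anger", "hate": "anger",
    "happy": "joy", "love": "joy",
    "miss": "sadness", "cry": "sadness",
    "scared": "fear",
}


def clean_tweet(text):
    """Strip links, hashtags, and punctuation; return lowercase words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # links
    text = re.sub(r"#\w+", " ", text)          # hashtags (assumption: drop the whole tag)
    text = re.sub(r"[^a-z\s]", " ", text)      # punctuation and digits
    return " ".join(text.split())


def classify_emotion(text):
    """Each lexicon word casts one vote; the emotion with the most votes wins."""
    votes = Counter(
        EMOTION_LEXICON[word]
        for word in clean_tweet(text).split()
        if word in EMOTION_LEXICON
    )
    if not votes:
        return None  # no lexicon words found
    return votes.most_common(1)[0][0]


def word_frequencies(texts):
    """Count words across all cleaned Tweets (the input to a word cloud)."""
    counts = Counter()
    for text in texts:
        counts.update(clean_tweet(text).split())
    return counts


print(clean_tweet("I love this!! #otf http://t.co/abc"))
print(classify_emotion("I hate this, I hate it, but love u"))
```

A vote count like this ignores negation and slang spellings, which matters for Tweets written in non-standard English.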

Lastly, I created a list of gun-related words (glock, gat, piece, nine, forty, heater, nina, heat, gauge, rosco, rod, strap, bullet, cap, AK, gun, firearm, weapon, mac, pistol, rifle, pop, mm, caliber, milla). For each day of data collection, I looked at the proportion of Tweets that contained any of these words, as well as the proportion that contained a mourning-related word (“RIP”). The proportions were graphed over time to understand how gang members might be using Twitter to discuss the murder of Lil Durk’s manager, Chino.
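The daily proportions could be computed along the following lines. This is a sketch under assumptions: Tweets are represented as hypothetical `(day, text)` pairs, words are matched as whole lowercase tokens (so “9mm” would not count as “mm”), and the sample dates and texts are invented for illustration.

```python
import re
from collections import defaultdict
from datetime import date

# Gun-related words from the post, lowercased for matching.
GUN_WORDS = {
    "glock", "gat", "piece", "nine", "forty", "heater", "nina", "heat",
    "gauge", "rosco", "rod", "strap", "bullet", "cap", "ak", "gun",
    "firearm", "weapon", "mac", "pistol", "rifle", "pop", "mm", "caliber",
    "milla",
}
MOURNING_WORDS = {"rip"}


def tokens(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def daily_proportions(tweets, vocabulary):
    """For each day, the share of Tweets containing any word in vocabulary."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for day, text in tweets:
        totals[day] += 1
        if tokens(text) & vocabulary:
            hits[day] += 1
    return {day: hits[day] / totals[day] for day in sorted(totals)}


# Made-up sample data; the dates are placeholders, not the collection period.
sample = [
    (date(2000, 1, 1), "got the glock on me"),
    (date(2000, 1, 1), "RIP bro"),
    (date(2000, 1, 2), "good morning"),
]
print(daily_proportions(sample, GUN_WORDS))
print(daily_proportions(sample, MOURNING_WORDS))
```

Several words on the list (“piece,” “pop,” “cap,” “heat”) are common outside any gun context, so a token match like this will overcount to some degree.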

[1] See for more information. 

[2] It is important to note that even with the streaming API, developers do not have access to the full firehose of Tweets. The documentation states that the API returns around 1% of the total flow of Tweets. For research purposes, we would hope that the 1% is a randomly selected subset of all of the data; however, it is unclear whether that is the case.



Sentiment Analysis of Chicago Gang Tweets (An Introduction)

Clay Gibson

The gang problem in the United States is characterized by delinquency and violence: self-reported delinquency is much higher for gang members than for other youths, and gang-related killings account for around 12 percent of annual homicides. John Hagedorn defines a gang as a group of individuals socialized by non-conventional institutions (the street or prisons), but our understanding of this socialization need not be confined to physical interactions: socialization of the street now happens online. Over 66% of fourth- to ninth-graders can access the internet from their bedroom. As of 2015, nearly a quarter of teens report being online “almost constantly.” Social media sites such as Facebook, Instagram, Snapchat, and Twitter are especially popular among adolescents.

Gang members have social media accounts too, and on them we can see classic illustrations of group processes. Efforts to earn or maintain status, avoid ridicule, and gain acceptance are all part of what Desmond Patton calls “internet banging”: using social media to brag and to issue threats and dares. One study found that 74% of gang members who use the internet report using social media to show or gain respect for their gang. With social media comes a large amount of “unstructured and unsupervised peer socialization,” which can undermine intervention strategies for delinquent youths.

These musings on the internet are not detached from real life consequences. On more than one occasion, gang members have gotten shot because of what they said on social media sites. In 2012, Lil JoJo, a member of the Gangster Disciples in Chicago, was shot to death on the back of a friend’s bicycle shortly after Tweeting his location.  The day before she was gunned down, Gakirah “Lil Snoop” Barnes Tweeted, “I Dne seen 2 many of my niggaz n a casket…In da end we DIE.” Social media, personal interactions, and intergroup conflict are all entwined, and as the ubiquity of social media rises, so too does the importance of understanding and using social media for social change. 

Of the popular social media sites, Twitter offers especially useful data. Users are asked one simple question, “What’s happening?”, and are allowed to answer in 140 characters or fewer. Their updates, called Tweets, are, in essence, real-time micro-news. Recently, researchers have been able to leverage this data to make predictions about future events, citing potential uses in responding to storms, fires, traffic jams, riots, and earthquakes. But it need not stop there. According to researchers at the University of Tokyo, Twitter data can be used to predict any sort of event that has the following three properties: it affects many users; it influences daily life enough for people to Tweet about it; and it takes place spatially and/or temporally. Twitter could therefore possibly be used as a social sensor to understand, and perhaps predict, gang conflicts.

However, there is little research on how social media is used to fuel gang mentality and behaviors. The geographic location of Tweets has been used to estimate the physical boundaries between rival gangs in Los Angeles. Desmond Patton looks at the Twitter profiles of Detroit gang members and notes three common themes in the content: arguing with other gang members, mourning the loss of loved ones, and discussing firearms or drugs. In the next few posts, I will look at the location, frequency, and content of Chicago gang-related Tweets by tracking certain key terms in the streaming Twitter application programming interface (API).