Taking a close look at the millions of 140-character messages generated on Twitter every day is akin to trying to take a drink from the torrent of a fire hydrant’s discharge. But thanks to sophisticated software named in homage to the researcher it is designed to relieve (the one transfixed in front of a computer monitor), the “TwitterZombie” is helping researchers at Drexel collect tweets without being washed away.
Because tweets drop out of Twitter’s public search after six days, or once 1,500 tweets on the same topic have been posted, it is difficult for researchers to examine more than a sampling of existing tweets. Drexel’s program searches, sorts and stores high volumes of tweets, enabling iSchool researchers to focus on a variety of topics, among them the political debates and the upcoming election, pop culture, and sporting events.
Alan Black, a doctoral student at the iSchool-College of Information Science and Technology, wrote the TwitterZombie program to expand data collection capacity for the Big Social Data Warehouse, which Dr. Sean Goggins, a professor in the iSchool, began developing with master’s student Michael Gallagher in late 2009. Since spring 2012, TwitterZombie has been used by several iSchool researchers who study social media.
“Previously, collecting all that data was as unwieldy as trying to contain the flow of a fire hose,” Black said. “The TwitterZombie is more like using a bunch of straws to suck out just the data flows that we want to examine.”
The software gathers data from a series of searches that are scheduled to run at regular intervals on Amazon’s cloud computing platform. One of the program’s advantages is that it can capture nearly all of the data from thousands of search phrases simultaneously.
Additionally, TwitterZombie can run searches for high-volume queries more frequently than queries that return fewer results, which helps it collect a more complete data set without missing tweets.
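The article does not describe how TwitterZombie decides how often to re-run each search, so the following is only a minimal sketch of the general idea of adaptive polling, not the program’s actual code. It assumes a hypothetical per-search result cap (set to 1,500 to echo the search limit mentioned above), hypothetical minimum and maximum intervals in minutes, and a stand-in `fetch` function in place of any real Twitter or Amazon API call. A priority queue keeps each query ordered by its next run time; a query that nearly fills the cap is checked twice as often, and one that returns very little is checked half as often.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable

# Hypothetical bounds (in minutes) on how often a query may be re-run.
MIN_INTERVAL = 5.0
MAX_INTERVAL = 240.0
# Hypothetical per-search result cap; a search that nearly fills it may have missed tweets.
RESULT_CAP = 1500


@dataclass(order=True)
class ScheduledQuery:
    next_run: float                                   # only field used for heap ordering
    phrase: str = field(compare=False)
    interval: float = field(compare=False, default=60.0)


def adjust_interval(interval: float, n_results: int) -> float:
    """Poll busy queries more often and quiet queries less often, within bounds."""
    if n_results >= RESULT_CAP * 0.8:    # close to the cap: tweets may be slipping past
        interval /= 2
    elif n_results < RESULT_CAP * 0.1:   # very few hits: back off
        interval *= 2
    return max(MIN_INTERVAL, min(MAX_INTERVAL, interval))


def run_scheduler(phrases: Iterable[str],
                  fetch: Callable[[str, float], int],
                  until: float) -> Dict[str, int]:
    """Simulate the schedule up to time `until`; return how many times each query ran.

    fetch(phrase, now) stands in for a real search and returns the number of tweets found.
    """
    heap = [ScheduledQuery(0.0, p) for p in phrases]
    heapq.heapify(heap)
    runs = {q.phrase: 0 for q in heap}
    while heap and heap[0].next_run <= until:
        q = heapq.heappop(heap)
        n = fetch(q.phrase, q.next_run)
        runs[q.phrase] += 1
        q.interval = adjust_interval(q.interval, n)
        q.next_run += q.interval
        heapq.heappush(heap, q)
    return runs


def fake_fetch(phrase: str, now: float) -> int:
    # Stand-in for a real search call: one busy topic, one quiet one.
    return 1500 if phrase == "#debate" else 3


print(run_scheduler(["#debate", "#curling"], fake_fetch, until=600))
# prints {'#debate': 113, '#curling': 4}
```

Over ten simulated hours, the busy hashtag settles at the five-minute floor while the quiet one backs off to the four-hour ceiling. The halving and doubling rule is just one simple choice; a real collector could instead scale the interval to the observed tweet rate.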
With millions of tweets collected by TwitterZombie, Drexel researchers can quickly run textual analyses and sift through the 140-character messages to discern trends in communication, the formation of social networks, and the frequency of tweets surrounding events, and, more generally, to paint a picture of how society is reflected on Twitter.
“Before we started using TwitterZombie, we were running queries on our laptops in a conference room,” said doctoral student Christopher Mascaro. “Now when we need to gather data, we just tell the Zombie what to look for and it does the rest.”
Black, Mascaro, Gallagher and Goggins will present TwitterZombie at the ACM GROUP conference this fall, and they plan to expand its usability by making the Zombie and its data accessible via cloud storage.