derek ruths || network dynamics

CSS 2012 Tutorial – From Tweets to Results


Since its creation in 2006, Twitter has become one of the most popular microblogging platforms in the world. By virtue of its popularity, the relative structural simplicity of its posts, and a tendency towards relaxed privacy settings, Twitter has also become a popular data source for research on a range of topics in sociology, psychology, political science, and anthropology. Despite this widespread use in the research community, however, there are many pitfalls when working with Twitter data.

In this day-long workshop, we will lead participants through the entire Twitter-based research pipeline: from obtaining Twitter data all the way through performing some of the sophisticated analyses that have been featured in recent high-profile publications. In the morning, we will cover the nuts and bolts of obtaining and working with a Twitter dataset including: using the Twitter API, the firehose, and rate limits; strategies for storing and filtering Twitter data; and how to publish your dataset for other researchers to use. In the afternoon, we will delve into techniques for analyzing Twitter content including constructing retweet, mention, and follower networks; measuring the sentiment of tweets; and inferring the gender of users from their profiles and unstructured text.

We assume that participants will have little to no prior experience with mining Twitter or other social network datasets. As the workshop will be interactive, participants are encouraged to bring a laptop. Code examples and exercises will be given in Python, so participants should have some familiarity with the language. However, all concepts and techniques covered will be language-independent, so anyone with some background in scripting or programming will benefit from the workshop.

Pre-tutorial setup

While you can certainly treat the tutorial as a lecture, we encourage you to come with a laptop configured so that you can follow along with some of the examples and exercises that will be shown over the course of the morning. To do this, please have the following libraries and tools installed:

  • Python 2.7
  • Other libraries are required, but are now bundled in with the data and scripts (see below)

In addition, you should do the following:

  • Set up a Twitter account (keep the username/password handy)
  • Download the data and scripts that will be used during the workshop. (UPDATED with httplib2 and the modifications made to the scripts during the workshop)
  • Download the lecture slides that will be used during the workshop.

As time will be limited during the tutorial itself, we will be setting aside only a few minutes for participants to set up their machines. Therefore, we encourage participants to come with their machines set up and the files downloaded.


The following is a brief list of the topics that are slated to be covered during the workshop.

  1. Introduction to Twitter
    1. Why use Twitter?
    2. Twitter Fundamentals
  2. Getting Twitter data
    1. Using the Twitter API
    2. Using the Twitter firehose
  3. Managing Twitter data
    1. Storage strategies
    2. How to publish a dataset
  4. Analyzing Twitter Data
    1. Inferring user device platforms
    2. Geo-locating users
    3. Topic classification
    4. Measuring sentiment
    5. Inferring user gender
    6. Constructing Twitter-based networks
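To give a flavor of the last topic, here is a minimal sketch of building a mention network from tweet text. It is an illustration only, not the workshop's own script: the function name `mention_edges`, the `(author, text)` input shape, and the sample tweets are all assumptions made for this example. It treats every `@username` in a tweet as a directed edge from the author to the mentioned user, so an old-style retweet ("RT @user: ...") is counted as a mention as well.

```python
import re
from collections import Counter

# Twitter usernames are at most 15 letters, digits, or underscores.
# This simple pattern will also match the "@" in e-mail addresses,
# which is acceptable for a sketch but not for production use.
MENTION_RE = re.compile(r'@([A-Za-z0-9_]{1,15})')

def mention_edges(tweets):
    """Count directed (author -> mentioned user) edges.

    `tweets` is an iterable of (author, text) pairs. This input
    shape is an assumption made for the example.
    """
    edges = Counter()
    for author, text in tweets:
        for target in MENTION_RE.findall(text):
            # Usernames are case-insensitive, so normalize both ends.
            src, dst = author.lower(), target.lower()
            if src != dst:  # skip self-mentions
                edges[(src, dst)] += 1
    return edges

# Made-up sample data, for illustration only.
sample_tweets = [
    ("alice", "@bob did you see this?"),
    ("bob", "RT @alice: network data is fun"),
    ("carol", "thanks @alice and @Bob!"),
    ("alice", "@bob one more thing"),
]

edges = mention_edges(sample_tweets)
for (src, dst), weight in sorted(edges.items()):
    print("%s -> %s (weight %d)" % (src, dst, weight))
```

The resulting weighted edge list can be loaded into any graph library for further analysis. Separating retweets from ordinary mentions would require an extra check on the tweet text or metadata.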