Pages

TweetMapping




Twitter incessantly produces copious amount of data. The locations of tweets can help with some interesting questions. One that comes to mind, and one that I plan to do when the time is right is- What part of the world is interested in The Champions League final vs The World Cup final. The rationale for this is the amount of debate currently happening on this topic.
Basically, this post will answer "where in the world are people searching for [something]?" Also, all explanation will be in the comments itself.

Code


library(ROAuth)
library(twitteR)
library(ggplot2)
library(maps)
library(dismo)

# Check to see if R is connected to twitter
registerTwitterOAuth(twitCred)

# searchString parameter for Twitter API
key <- "#UCLfinal"

# requesting Twitter API
tag <- searchTwitter(key, n = 2000, lang= "en") 

# Tweets data frame
# At this stage it is quite possible to get rid of all the non-geotagged tweets.
# However, a very very small portion of users geotag tweets. Therefore, another approach
# is used here. In the next step, the location in their profile description will
# be extracted.
df <- do.call("rbind", lapply(tag, as.data.frame))

# User data frame
userInfo <- do.call("rbind", lapply(lookupUsers(df$screenName), as.data.frame))   

# geocoding all users with some sort of location identification
# also creating "Interpreted Place"
# All locations with invalid location will be dropped after this step
# Package dismo used here
# Although using oneRecord=T decreases the size, it produces more
# reliable location data frame
locations <- geocode(userInfo$location, progress="text", oneRecord=T)

# getting rid of all rows with na
locations <- locations[complete.cases(locations),]

# also getting rid of all tweets that only have country name as locations
# For example, it is good to avoid the center of Australia (which is very very sparse) 
# showing massive number of tweets.
# The easiest way to do this is to get rid of all rows that do not have a comma
locations <- locations[grep("\\,",locations$interpretedPlace),]

# Map of the world
# It is also possible to do the same with country/places maps.
# However, it is necessary to make sure that the coordinates are correct
result <- ggplot(map_data("world")) + geom_path(aes(x = long, y = lat, group = group))

# Adding Tweet locations
result <- result + geom_point(data = locations, aes(x = longitude, y = latitude),
                              color = "red", alpha = .2, size = 3)
result <- result + ggtitle(key) + theme_minimal() + theme(axis.text=element_blank(),
                                                          axis.ticks=element_blank(),
                                                          axis.title=element_blank())  

# Time Stamping the file name
filename <- paste(format(Sys.Date(),"%d%m%y"),format(Sys.time(), "%H%M%S"),".png",sep="")

# Saving the file
ggsave(filename, units="in", width=8.15, height=5.20, dpi=300)



After coupling the result with the wordcloud code (Click here to go to that blog post) and Photoshop, these are the products.

#ChampionshipPlayOffFinal






Ayush Subedi

Coffee Connoisseur