Twitter incessantly produces copious amounts of data, and the locations of tweets can help answer some interesting questions. One that comes to mind, and one that I plan to tackle when the time is right: which parts of the world are more interested in the Champions League final than the World Cup final? The rationale for this is the amount of debate currently happening on that topic.
Basically, this post will answer "where in the world are people searching for [something]?" All explanation is in the code comments themselves.



# Check to see if R is connected to Twitter
library(twitteR)
library(dismo)    # for geocode()
library(ggplot2)  # for the map

# searchString parameter for Twitter API
key <- "#UCLfinal"

# requesting Twitter API
tag <- searchTwitter(key, n = 2000, lang= "en") 

# Tweets data frame
# At this stage it is quite possible to get rid of all the non-geotagged tweets.
# However, only a very small portion of users geotag their tweets. Therefore,
# another approach is used here: in the next step, the location in each user's
# profile description will be extracted.
df <-"rbind", lapply(tag,

# User data frame
userInfo <-"rbind", lapply(lookupUsers(df$screenName),

# geocoding all users with some sort of location identification
# also creating "Interpreted Place"
# All locations with invalid location will be dropped after this step
# Package dismo used here
# Although using oneRecord=TRUE decreases the size of the result, it produces
# a more reliable location data frame
locations <- geocode(userInfo$location, progress="text", oneRecord=TRUE)

# getting rid of all rows with na
locations <- locations[complete.cases(locations),]

# also getting rid of all tweets that only have country name as locations
# For example, it is good to avoid the center of Australia (which is very sparse)
# showing up with a massive number of tweets.
# The easiest way to do this is to get rid of all rows that do not have a comma
locations <- locations[grep("\\,",locations$interpretedPlace),]
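To see what those two filters do together, here is a toy run on a hand-made data frame. The column names mirror what dismo's geocode() returns, but the rows are invented for illustration:

```r
# Toy stand-in for the geocode() result: one row has missing coordinates,
# one row is only a country name (no comma), one is a proper "city, country"
locations <- data.frame(
  longitude        = c(151.21, NA, 133.77),
  latitude         = c(-33.87, NA, -25.27),
  interpretedPlace = c("Sydney, Australia", "Nowhere", "Australia"),
  stringsAsFactors = FALSE)

# drop rows with any NA
locations <- locations[complete.cases(locations), ]

# drop rows whose interpreted place has no comma (country-only matches)
locations <- locations[grep("\\,", locations$interpretedPlace), ]

locations$interpretedPlace   # only "Sydney, Australia" survives
```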

# Map of the world
# It is also possible to do the same with country/places maps.
# However, it is necessary to make sure that the coordinates are correct
result <- ggplot(map_data("world")) + geom_path(aes(x = long, y = lat, group = group))

# Adding Tweet locations
result <- result + geom_point(data = locations, aes(x = longitude, y = latitude),
                              color = "red", alpha = .2, size = 3)
result <- result + ggtitle(key) + theme_minimal() +
          theme(axis.text = element_blank(), axis.title = element_blank())

# Time Stamping the file name
filename <- paste(format(Sys.Date(),"%d%m%y"),format(Sys.time(), "%H%M%S"),".png",sep="")
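For reference, that pattern always yields a 16-character name of the form ddmmyyHHMMSS.png:

```r
# e.g. running this on 24 May 2014 at 21:05:09 would yield "240514210509.png"
filename <- paste(format(Sys.Date(), "%d%m%y"),
                  format(Sys.time(), "%H%M%S"), ".png", sep = "")
nchar(filename)   # 16
```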

# Saving the file
ggsave(filename, units="in", width=8.15, height=5.20, dpi=300)

After coupling the result with the wordcloud code (Click here to go to that blog post) and Photoshop, these are the products.


This is the second version of my only app in the Google Play Store that is somehow not down in the abyss. A recent comment made me realize that certain functionality in the previous version did not work on newer Android devices. It was because newer devices do not allow the main thread to perform network tasks (which is a great restriction. Kudos, Android). In my case, that task was connecting to to fetch random numbers. So I had to jettison the previous code and re-implement it using AsyncTask.


Code for DIY part

package com.randomayush;

import android.content.Context;
import android.os.AsyncTask;
import android.os.Bundle;
import android.view.View;
import android.view.inputmethod.InputMethodManager;
import android.widget.Button;
import android.widget.EditText;
import android.widget.TextView;


public class diy extends MainActivity {
    EditText min, max;
    Button submit;
    TextView box, status;
    String minimum, maximum;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.diy); // layout resource name assumed

        // widget IDs assumed; the original listing dropped them
        min = (EditText) findViewById(;
        max = (EditText) findViewById(;
        submit = (Button) findViewById(;
        box = (TextView) findViewById(;
        status = (TextView) findViewById(;

        submit.setOnClickListener(new View.OnClickListener() {

            @Override
            public void onClick(View v) {

                // hide the soft keyboard once submit is pressed
                InputMethodManager mgr = (InputMethodManager)
                        getSystemService(Context.INPUT_METHOD_SERVICE);
                mgr.hideSoftInputFromWindow(min.getWindowToken(), 0);

                minimum = min.getText().toString();
                maximum = max.getText().toString();
                if (minimum.equals("") || maximum.equals("")) {
                    status.setText("Result: Invalid Input. Missing Data...");
                } else if (Integer.parseInt(maximum) <=
                           Integer.parseInt(minimum)) {
                    status.setText("Result: Invalid Input. Check Max/Min Values ..");
                } else {
                    new MyTask().execute();
                }
            }
        });
    }

    private class MyTask extends AsyncTask<String, String, String> {
        String statusMsg = "";

        @Override
        protected String doInBackground(String... params) {
            statusMsg = "Status: Checking connection...";
            if (haveNetworkConnection()) {
                statusMsg = "Status: network available... connecting now...";
                try {
                    statusMsg = "Status: Connecting to";
                    return dome(minimum, maximum);
                } catch (Exception e) {
                    statusMsg = "Status: Connection failed...";
                    return "";
                }
            }
            statusMsg = "Status: Connection failed...";
            return "";
        }

        @Override
        protected void onPostExecute(String result) {

            if (result.equals("")) {
                // no result from fall back to a local
                // pseudo-random number in [minimum, maximum]
                status.setText("DIY Random-Type: Pseudo");
                box.setText("" + (Integer.parseInt(minimum) +
                        (int) (Math.random() * (Integer.parseInt(maximum) -
                                Integer.parseInt(minimum)) + .5)));
            } else {
                status.setText("DIY Random-Type: Real");
                box.setText(result);
            }
        }

        @Override
        protected void onPreExecute() {
        }

        @Override
        protected void onProgressUpdate(String... text) {
        }
    }

    public boolean haveNetworkConnection() {
        ConnectivityManager cm = (ConnectivityManager)
                getSystemService(Context.CONNECTIVITY_SERVICE);
        return cm.getActiveNetworkInfo() != null &&
               cm.getActiveNetworkInfo().isConnected();
    }

    // fetches one true random integer in [min, max] from
    public String dome(String min, String max) throws Exception {
        String sendto = "" +
                min + "&max=" + max + "&col=1&base=10&format=plain&rnd=new";
        URL url = new URL(sendto);
        BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream()));
        return in.readLine();
    }
}
This is an austere implementation of mining tweets using R. I will work on making it better sometime in the future (once I digest R data structures more).

Connecting to Twitter API

The first step was to get a consumer key and consumer secret from Twitter; these are used for authentication. To get them, I just created a new app on Twitter.
Apparently, on Windows systems the authentication requires an extra step. Thanks to this blog post for clarifying that. See below.


#necessary step for Windows
download.file(url="", destfile="cacert.pem")

#to get your consumerKey and consumerSecret see the twitteR documentation for instructions
credentials <- OAuthFactory$new(consumerKey = consumerKey,
                                consumerSecret = consumerSecret,
                                requestURL = "",
                                accessURL = "",
                                authURL = "")
credentials$handshake(cainfo = "cacert.pem")

save(credentials, file="twitter authentication.Rdata")

After following the link provided, allowing access to the Twitter app and copying the security key produced back in R, everything was all set. To check:

> registerTwitterOAuth(credentials)
[1] TRUE

Tweet Cloud


tag<- searchTwitter("#AFC", n=100, cainfo="cacert.pem")
df <-"rbind", lapply(tag,
twitterCorpus <- Corpus(VectorSource(df$text))
twitterCorpus <- tm_map(twitterCorpus, tolower)
# remove punctuation
twitterCorpus <- tm_map(twitterCorpus, removePunctuation)
# remove numbers
twitterCorpus <- tm_map(twitterCorpus, removeNumbers)
tdm <- TermDocumentMatrix(twitterCorpus, control = list(minWordLength = 1))
m <- as.matrix(tdm)
# calculate the frequency of words
v <- sort(rowSums(m), decreasing=TRUE)
words <- names(v)
d <- data.frame(word=words, freq=v)
wordcloud(d$word, d$freq, min.freq=3)
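The frequency step above is just row sums over the term-document matrix; a toy matrix (terms as rows, tweets as columns, with made-up counts) makes the idea concrete:

```r
# rows = terms, columns = documents (tweets); counts are invented
m <- matrix(c(2, 1, 0,
              1, 0, 0,
              3, 2, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("arsenal", "goal", "win"), NULL))

# total occurrences of each term, most frequent first
v <- sort(rowSums(m), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v)
d$freq[1]   # 6, for "win"
```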



Although it generates the cloud, at this point it is not effective enough for analysis. One way to make it better is to get rid of common English words such as "and", "but", etc. Getting rid of Unicode and non-English words would also be an improvement. And there is more that needs to be done.
Hopefully soon.
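A sketch of both clean-ups using base R only. The stop-word list here is a tiny stand-in (tm ships a full one via stopwords()), and the word vector is made up:

```r
words <- c("and", "plane", "but", "búsqueda", "missing")

# stand-in stop-word list; tm's stopwords("en") is the real thing
stops <- c("and", "but", "the", "a")
words <- words[!(words %in% stops)]

# keep only pure-ASCII words: converting to ASCII with sub="" strips
# non-ASCII characters, so changed words are non-English and get dropped
words <- words[words ==, "ASCII", sub = "")]

words   # "plane" "missing"
```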


I just learnt that the common words mentioned above are referred to as stop-words. And it is pretty easy to handle them. After removing them and adding some colors, the clouds look more beautiful. I also got rid of all words containing the substring "http" (maybe not in the most efficient way.. I will have to look into it).

# First of all make sure the connection to twitter is authenticated.
# see file twitterconnection.R
# if that has been done previously and this is a new session, open .Rdata
# that has stored credentials
#if registerTwitterOAuth(credentials) is TRUE, you are good to go


# check connection

tag<- searchTwitter("#MH370", n=700, lang= "en", cainfo="cacert.pem")
df <-"rbind", lapply(tag,

twitterCorpus <- Corpus(VectorSource(df$text))
tdm <- TermDocumentMatrix(
  twitterCorpus, control = list(minWordLength = 1,
                                removePunctuation = TRUE,
                                removeNumbers = TRUE,
                                stopwords = TRUE,
                                tolower = TRUE))

m <- as.matrix(tdm)

# frequency of words
frequency <- sort(rowSums(m), decreasing=TRUE)
words <- names(frequency)

# get rid of words that contains "http"
# hopefully there is a better way to do it
words[which(grepl("http",names(frequency)))] <- ""

d <- data.frame(word=words, freq=frequency)

#save the image in png format
png("mh370.png", width=12, height=12, units="in", res=300)
wordcloud(d$word, d$freq, scale=c(10,.4), min.freq=10,
          max.words=Inf, random.order=FALSE, rot.per=.3, colors=brewer.pal(8, "Dark2"))

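As the comment in the code suspects, there is a tidier way to drop the "http" terms: build a logical index once and subset both vectors with it, instead of blanking entries (toy vectors below, made up for illustration):

```r
# invented term frequencies standing in for the rowSums() result
frequency <- c(mh370 = 40, http = 25, plane = 12)
words <- names(frequency)

# keep only terms that do not contain "http"
keep <- !grepl("http", words, fixed = TRUE)
d <- data.frame(word = words[keep], freq = frequency[keep])

d$word   # "mh370" "plane"
```

This also keeps `word` and `freq` the same length without empty strings sneaking into the cloud.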



wordcloud(d$word, d$freq,scale=c(8,.1),min.freq=7,
          max.words=Inf, random.order=FALSE, 
          rot.per=.3, vfont=c("gothic english","plain"),
          colors=brewer.pal(8, "Dark2"))

After changing the font, inverting the color and messing around in Photoshop: