Pages

RelearningR

I used R for two of my classes in college: Probability and Statistics and Stochastic Probabilities. During that time, I was in reverence with the way it handled lists. However, I did not use it much after getting done with those classes. I have decided to relearn it.

In this post, I will start off with some simple programs. After ascertaining some comfort with the syntax and semantics of R, I will try some complex ones. Also, since I am in the process of learning myself, if you decide to use some of these codes, just keep in mind that there might be several better ways of doing the same thing.

1. Dice Rolls: Probability of getting a side f number of times in n dice rolls

# DiceRolls.R
# Probability of getting a side f number of times in n dice rolls

#inputs
side <- 4
f <- 2
n <- 5

# Binomial Distribution
# using R's function for binomial distribution 
# where P(X=k)
 probability <- dbinom(frequency, n, 1/6)
 probability

# Monte Carlo Method
rep <- 10000 # Number of repetitions
list <- numeric(rep) # Initialize vector
for (i in 1:rep) {
trial <- sample(1:6, n, replace=TRUE)
#1 if number of times a side appears is f
success <- if ((sum(trial==side))==f) 1 else 0
list[i] <- success 
}
mean(list)

>  probability
[1] 0.160751
> mean(list)
[1] 0.1647

2. One dimentional random walks

# Five random walks in one dimension starting at 0

# creating a color string vector
color <- c("red","green","blue","black","darkgoldenrod1")
# cleaning the plot; this is necessary because we are using par(new=TRUE)
plot.new()
for (i in 1:5) {
x0<-0
I<-1
T <-1000
x=c(x0,rnorm(T-1)) # T-1 randomly generated normal distributions
y=cumsum(x)
par(new=TRUE) # so that the previous graphs are not deleted
plot(y,type="l",col=color[i], main="Five random walks in one dimension starting at 0", xlab="", ylim=c(-60, 60))
}




3. Two dimensional random walk


I stumbled upon a presentation slideshow that discusses how R can be used more efficiently. They perform analysis on several implementations that simulate two dimensional random walk. I started out with what they call "naive implementation". According to them, this is how a person who codes in C, C++ or Java would think of doing it. I totally agree. This exercise is great because it helps understand the positives of R in comparison to other languages that I am comfortable with. Link to the aforementioned presentation slideshow. 

Naive implementation:
# naive implementation (loops are slow in R)
naive <- function(n){

# numeric creates a double-precision vector of the specified length with each element equal to 0
xpos = ypos = numeric(n)

# will be used to randomly generate direction
xdir = c(TRUE,FALSE)
nextdir = c(1,-1)

# xpos and ypos are already 0 (see comment on numeric)
for(i in 2:n){

 if(sample(xdir,1)){
  xpos[i]=xpos[i-1]+sample(nextdir,1)
  ypos[i]=ypos[i-1]
 }
 else{
  xpos[i]=xpos[i-1]
  ypos[i]=ypos[i-1]+sample(nextdir,1)
 }
}
plot(xpos,ypos,type="l",col="blue",main="Two Dimensional Random Walk", xlab="x", ylab="y")
# starting point
points(0,0,pch = 19,col="red")
}

> system.time(naive(50000))
   user  system elapsed 
   0.84    0.17    1.02 


> system.time(naive(100000))
   user  system elapsed 
   1.70    0.32    2.03 
Vector implementation:

This implementation gets rid of the for loop. It is similar to the code in one dimensional random walk code above.

# Vectorization Process
factorization = function(n){ 
 steps = sample(c(-1,1), n-1, replace = T)
 xdir=sample(c(TRUE,FALSE), n-1,replace=T)
 xpos=c(0,cumsum(ifelse(xdir,steps,0)))
 ypos=c(0,cumsum(ifelse(xdir,0,steps)))
 plot(xpos,ypos,col="green",type="l",main="Two Dimensional Random Walk", xlab="x", ylab="y")
 points(0,0,pch = 19,col="red")
}

> system.time(vectorization(50000))
   user  system elapsed 
   0.05    0.19    0.25 


> system.time(vectorization(100000))
   user  system elapsed 
   0.09    0.31    0.40 

4. Birthday Probability

Click here to go to my post on Birthday Probability using Java. Using R, I was really surprised how easy it was.

How a java programmer would have done it in R

# Birthday Probability
birthday <- function(numofpeople, numofrooms){
  count <- 0
  for (i in 1:numofrooms){
    bday <- sample(1:365,numofpeople, replace=T)
    if (any(duplicated(bday)))
      count<-count+1;
  }  
probability = count/numofrooms
probability
}

How a R programmer would have done it (I think)

# Birthday Probability
birthday <- function(numofpeople, numofrooms){ 
  list <- numeric(numofrooms)
  
  for (i in 1:numofrooms){
    bday <- sample(1:365,numofpeople, replace=T)
    list[i] <- if (any(duplicated(bday))) 1 else 0   
  }   
mean(list)
}

Result:

> source("birthday.r")
> birthday(23,5000)
[1] 0.5044

5. Mandelbrot Set Area

Click here to go to my previous blog post on the area of the Mandelbrot Set using Java and Javascript. 

monte.Carlo <- 5000
x <- runif(monte.Carlo, -2, 2)
y <- runif(monte.Carlo, -2, 2)
list <- numeric(monte.Carlo)

for (j in 1:monte.Carlo){
  list[j] <- if (inmandelbrotset(complex(real = x[j], imaginary = y[j]))) 1 else 0
}
area<-mean(list)*16

# function that checks if a point E mandelbrot set
inmandelbrotset <- function(c)
{
  dwell.limit <- 2048
  z <- 0  
  for (i in 1:dwell.limit)
  { 
    z <- z ** 2 + c
    if (Mod(z) > 2)
    {
      return(FALSE)
    }
  }  
  return(TRUE)
}

Result:

> system.time(source("mandelbrot.r"))
   user  system elapsed 
   1.91    0.00    1.90 
> area
[1] 1.5168

Ayush Subedi

Coffee Connoisseur