Installing R

The R programming language is officially distributed through the Comprehensive R Archive Network (CRAN), where it can be downloaded for free.

When you open the R "console", you should see the installed version and a "prompt" (>). The prompt tells you R is ready to receive your input. It now works like a big calculator!

Installing RStudio

Everything that you do with R ultimately happens in the console. However, working directly and only in the console is not the most user-friendly option.

Luckily, companion programs exist that enhance the R experience. These are often called integrated development environments (IDEs). The most popular, and the one we will use, is RStudio. Like R, the desktop version of RStudio is free and cross-platform. It can be downloaded from the RStudio (Posit) website.

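RStudio has 4 windows (also called panes). By default these are the script editor (top left), the console (bottom left), the environment and history (top right), and the files, plots, packages and help (bottom right).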

R syntax

Like any other language, R has syntax. This is the arrangement of words, symbols and phrases to create meaning (e.g. a sentence). Here are some of the basics:

Spaces, indents and line breaks

  • Code is run line-by-line and from left to right.
  • Extra spaces (including indents) between the parts of an expression have no meaning!
# this is the same:
1+1
## [1] 2
# as this:
  1     +  1
## [1] 2

Comments

# This is a comment!

1+1 # This is also a comment
## [1] 2

Case-sensitivity

R is case-sensitive.

# this will work:
print("Hello")
## [1] "Hello"
# This will not:
Print("Hello")
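
To see what R actually reports when the case is wrong, we can wrap the failing call in tryCatch(), which captures the error message instead of stopping the script. This is only a small sketch, and the object name msg is made up for illustration.

```r
# print() exists, Print() does not: R treats them as two different names
msg <- tryCatch(
  Print("Hello"),
  error = function(e) conditionMessage(e)  # keep only the text of the error
)
msg  # the error text: could not find function "Print"
```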

Storing information

The true power of R comes from storing information. To store information, we use the arrow operator <-.

a<-1
b<-2

a+b
## [1] 3
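
Stored values can be reused and overwritten. The sketch below (the name total is just an example) shows that a result is calculated once, at the moment it is stored, and does not change when one of its inputs is overwritten later.

```r
a <- 1
b <- 2

# store the result of a calculation in a new object
total <- a + b
total        # 3

# assigning again overwrites the old value of a ...
a <- 10
a + b        # 12

# ... but total was calculated earlier and does not update by itself
total        # still 3
```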

Operators

We have already seen a few operators, such as + or <-. Most are pretty self-explanatory, but I will highlight a few more:

# multiplication
2*3
## [1] 6
# division
2/3
## [1] 0.6666667
# exponents 
2^3
## [1] 8

Equally important are comparison operators (often grouped with the logical operators). These are ones that will result in a TRUE or FALSE. For example:

# less than
2<3
## [1] TRUE
# greater than or equal to
2>=3
## [1] FALSE
# equal to
3==3
3==3
## [1] TRUE
# not equal to
3!=2
## [1] TRUE
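
Because every comparison gives back TRUE or FALSE, comparisons can be combined with & (and), | (or) and ! (not). A minimal sketch; the object names are only for illustration:

```r
# AND: TRUE only if both sides are TRUE
both <- (2 < 3) & (3 == 3)
both      # TRUE

# OR: TRUE if at least one side is TRUE
either <- (2 >= 3) | (3 != 2)
either    # TRUE

# NOT: flips TRUE and FALSE
flipped <- !(2 < 3)
flipped   # FALSE
```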

The basic building blocks

As we have already mentioned, R is an object-oriented language! That means information is mostly stored as "objects". To master R, we therefore have to get familiar with the different types of objects. Here are a few of the most important:

Vectors

The most basic object type is a vector. This is one-dimensional storage (think of it as a single line of text, or a single column in Excel). We have already seen this when we stored a number under a letter:

a<-1
a
## [1] 1

However, we can make this more than just a single number!

a<-1:100

Tip: RStudio will often tell you what kind of structure the data in an object has (see the "Environment" window)! In this case, a series of 100 integers!
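
The same information can also be requested from the console. A short sketch using a few base R functions:

```r
a <- 1:100

class(a)    # the type of data: "integer"
length(a)   # the number of elements: 100
head(a)     # the first six values: 1 2 3 4 5 6
str(a)      # a compact summary of type, size and first values
```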

Vectors are not limited to numbers; they can also be characters, or "strings" as they are often called.

b<-"Hello, World!"

IMPORTANT: notice how characters are wrapped in quotation marks (single or double), whereas numerics are not.

If we want to store a series of values, we need to combine them with the c() function.

# for numerics, this:
d<-1:4
# is the same as this:
e<-c(1,2,3,4)
# but for characters, we need to use c()
f<-c("Hello, World!", "Hello, Tanzania!", "Hello, Zanzibar!", "Hello, UDSM!")
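
A small detail worth knowing: 1:4 and c(1,2,3,4) hold the same values, but R stores them as slightly different number types. The sketch below also shows how square brackets pick elements out of a vector by position.

```r
d <- 1:4
e <- c(1, 2, 3, 4)
f <- c("Hello, World!", "Hello, Tanzania!", "Hello, Zanzibar!", "Hello, UDSM!")

# the values are equal, element by element ...
all(d == e)   # TRUE
# ... although the storage type differs
class(d)      # "integer"
class(e)      # "numeric"

# square brackets select elements by position (counting starts at 1)
f[2]          # "Hello, Tanzania!"
e[2:3]        # 2 3
```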

Two other important types of vectors worth mentioning are logical vectors and factors.

# this is a logical vector
g<-c(TRUE, FALSE)

# this is a factor (categorical variables)
h<-as.factor(c("a","b"))
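
To give a feel for why these two types are useful, here is a brief sketch with slightly longer versions of g and h (the extra values are only for illustration):

```r
g <- c(TRUE, FALSE, TRUE)
sum(g)      # TRUE counts as 1 and FALSE as 0, so this gives 2

h <- as.factor(c("a", "b", "a"))
levels(h)   # the distinct categories: "a" "b"
table(h)    # how often each category occurs
```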

Matrix and Dataframe

Rarely will our data be one-dimensional. Most of the time, we will be working with tables, that is, 2D objects with rows and columns. For example, we can combine two 1D objects of equal length into a data frame:

df<-data.frame(e, f)
df
##   e                f
## 1 1    Hello, World!
## 2 2 Hello, Tanzania!
## 3 3 Hello, Zanzibar!
## 4 4     Hello, UDSM!

Note: a data frame can combine different types of data (here, numbers and strings), with one type per column.
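
Once the data frame exists, a few base R functions help to inspect it, and the $ operator pulls out a single column as a vector. A minimal sketch that rebuilds the same data frame so it can be run on its own:

```r
e <- c(1, 2, 3, 4)
f <- c("Hello, World!", "Hello, Tanzania!", "Hello, Zanzibar!", "Hello, UDSM!")
df <- data.frame(e, f)

nrow(df)     # number of rows: 4
ncol(df)     # number of columns: 2
df$f         # one column, returned as a vector
class(df$e)  # "numeric"
str(df)      # the type of every column at a glance
```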

Functions

We will come back to data frames in more detail later, but now is a good time to introduce some more syntax: functions. It is very likely that we are going to do the same things over and over again, like adding numbers together.

4+6
## [1] 10
7+8
## [1] 15
10+50
## [1] 60

To make this more practical, we can come up with a flexible way to do this for any two numbers:

add<-function(a,b) a+b

There is a lot to unpack here. We have introduced the function() syntax, the arguments of a function (a and b) and the body (a+b), which describes what the function does.

Note: you have already seen functions, with data.frame() and as.factor()

We have now stored the function as an object called add and can now easily use it!

add(a=2, b=9)
## [1] 11
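
Longer functions are usually written over several lines inside curly brackets, and arguments can be given default values. The sketch below uses a made-up name, add_numbers, and an arbitrary default of b = 1 purely for illustration.

```r
# add_numbers is a hypothetical example; b = 1 is used when b is not supplied
add_numbers <- function(a, b = 1) {
  result <- a + b   # the body can span several lines
  return(result)    # the value handed back to the caller
}

add_numbers(a = 2, b = 9)  # 11
add_numbers(5)             # 6, because b falls back to 1
```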

Obviously, this is a fairly useless function because + does the same thing. In fact, R comes loaded with many useful functions, including sum(). This does exactly the same:

sum(c(2,9))
## [1] 11

Tip: To know what arguments are available for a specific function, use the "Help" window on the bottom right.
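
Help can also be requested from the console. As a sketch, the code below lists the arguments of sum() and then uses na.rm, an argument described on the help page of mean(), to ignore missing values (NA).

```r
# typing ?mean or help("mean") in the console opens the help page for mean()

# show the arguments of a function directly in the console
args(sum)

# by default, a missing value makes the result missing as well
mean(c(2, 9, NA))                 # NA
# na.rm = TRUE removes missing values before calculating
mean(c(2, 9, NA), na.rm = TRUE)   # 5.5
```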

Some other useful functions include:

mean(c(2,9,9))
## [1] 6.666667
median(c(2,9,9))
## [1] 9
length(c(2,9,9))
## [1] 3

Libraries and documentation

Sets of related functions are often stored together in a "package" or "library". Some of these come shipped with R, such as the stats package. One of the most powerful things about R is that anyone can write a package and make it available to the whole community. This is why R is always evolving and has become one of the leading tools for statistics, data visualization, phylogenetics, GIS and spatial data, and even web applications.

To make use of these libraries, we first need to install them (done only once) and then load them (needs to be done in every new session). Let's spend some time looking at the following:

# installing a package
install.packages("readxl")

# loading a package
library(readxl)
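
Since installing is only needed once, it can be handy to check first whether a package is already available. The helper below, install_if_missing, is a hypothetical name and just one possible way to do this. It is demonstrated with the stats package, which ships with R, so nothing is downloaded when the example runs.

```r
# Hypothetical helper: install a package only if it is not yet available
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # report (invisibly) whether the package can now be used
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" comes with R, so this call does not install anything
install_if_missing("stats")

# package::function() uses one function without loading the whole package
stats::median(c(2, 9, 9))   # 9
```

The same helper could be called with "readxl" before library(readxl).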

Common mistakes

Sometimes code just won't work. This could be because there is something inherently wrong with your data and/or a function you are trying to use. In this case, sometimes the error messages can be helpful, other times not so much. This often depends on how well documented a function is, and how well the author can anticipate ways in which the function may break. For example, we cannot calculate the mean of a vector that contains non-numerics:

mean(c(1,2,3))
## [1] 2
mean(c(1, 2, "hello"))
## Warning in mean.default(c(1, 2, "hello")): argument is not numeric or logical:
## returning NA
## [1] NA
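
The reason for this warning is that a vector can only hold one type of data. When numbers and text are mixed inside c(), R quietly turns everything into text. A short sketch of how to check this before calling mean():

```r
x <- c(1, 2, "hello")
class(x)        # "character": the numbers were turned into text
x               # "1" "2" "hello"
is.numeric(x)   # FALSE, so mean(x) cannot work

y <- c(1, 2, 3)
is.numeric(y)   # TRUE
mean(y)         # 2
```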

Other times, errors arise because your syntax is incorrect. In such cases, RStudio has become incredibly sophisticated and tries to help you, even before you run your code, with red underlines under the offending section of code, or a red cross at the start of the line. Let's look at some very common problems:
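
The three mistakes below (a missing closing bracket, missing commas and a missing closing quotation mark) are illustrative choices and not an exhaustive list. To keep the example runnable, each line of code is stored as text and checked with parse(), which reads code without running it. The helper name is_valid_syntax is made up for this sketch.

```r
# Hypothetical helper: TRUE if the text is valid R syntax, FALSE otherwise.
# parse() only reads the code, it does not run it.
is_valid_syntax <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

is_valid_syntax("mean(c(1, 2, 3))")   # TRUE:  correct
is_valid_syntax("mean(c(1, 2, 3)")    # FALSE: a closing bracket is missing
is_valid_syntax("c(1 2 3)")           # FALSE: commas between the values are missing
is_valid_syntax("print(\"Hello)")     # FALSE: the closing quotation mark is missing
```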

Where to get help?

No one expects you to write all of your code from scratch. In fact, most of what you want to do, many people have already done. R users are a vibrant, open community that is generally happy to share code. As such, if there is something that you want to do but don't know how... google it! Besides the many dedicated webpages and online books on R, there are many platforms and forums like Stack Overflow where people post code that didn't work for them, with others providing solutions.

Another, very recent option is to use ChatGPT! For common problems, it is often very good at writing R code.

Although copying code is therefore generally encouraged, for your own development it is important to take some time to understand the code you are using.