Programming basics in R - Functions, Plots and Loops

Tiago Maié

Johannes Schöneich

04 November 2021


5 - Level

We finished the last class touching very lightly on functions. Today we’ll go deeper on this subject. Functions are one of the most powerful tools in R.

5.1 More functions

In R the basic trigonometry functions use Radians instead of Degrees. The relationship between radians and degrees is the following:

Angle in radians = Angle in degrees * (pi/180)

Try to write the code to compute the value of 45 degrees in radians.


Now… To build our first function! If you want to a) subtract two numbers and b) obtain the absolute value of the operation done in a); you could do so with a few lines of code like:

Number_1 = -168
Number_2 = -126
Result = Number_1 - Number_2
Final_result = abs(Result)

Every time you would want to repeat this operation but with different numbers you would have to use the code above but changing the values of Number_1 and Number_2.

Number_1 = 1191
Number_2 = 1233
Result = Number_1 - Number_2
Final_result = abs(Result)
print(x=Final_result)
## [1] 42

If you write a function to do it however, if you want to repeat the operation you can do so with a single line of code later on. To do the same operation as the one described above using a function, you could do:

My_function = function(Number_1, Number_2){
  Result = Number_1 - Number_2
  Final_result = abs(Result)
  return(Final_result)
}

My_function_but_more_advanced = function(Number_1, Number_2){
  return(abs(Number_1 - Number_2))
}

Both functions do the same, but the cool thing is that after doing this, you can obtain the same results as above with a single line of code for each of the examples as:

My_final_result_1 = My_function(Number_1=-168, Number_2=-126)
My_final_result_2 = My_function(Number_1=1191, Number_2=1233)
print(x=My_final_result_1)
## [1] 42
print(x=My_final_result_2)
## [1] 42

Writing our first function in high-school math notation gives us:

\(f(x, y) = |g(x, y)|\)

\(g(x, y) = x - y\)

Or if we simplify: \(f(x,y) = |x-y|\)

There are a couple of things that you should know about functions. Let us use kitchen robots again as an example. In R, these kitchen robots clean after themselves, that is, if you give it the ingredients, they’ll give you a meal and afterwars, inside there won’t be anything, just the instructions to execute the recipe. Also if you don’t provide the correct ingredients, the robot will not know what to do. If your robot makes pasta and if you provide as ingredients rice and a shoe it will probably stop working.

5.2 Exercise

Write a function that takes in degrees and gives back/returns radians.

Use it to compute the radian values of a 90, 45 and 0 degree angles.

5.3 Exercise

Use these basic trigonometry functions (they only accept radians):

  • cos(x) - cosine;
  • sin(x) - sine;
  • tan(x) - tangent

and write the code to compute the sine, cosine and tangent of a 90 degree angle and check the result.


6 - Level

At this point you should be able to start working on real problems, so let’s do just that!

6.1 Loading data

Reading data from a file

First lets load some data. If you still have not downloaded the data, gene_expression_data.csv, you can do so in this link.

gene_data = read.csv("gene_expression_data.csv")

Help

To check how the read.csv function works check the read.csv page by typing ?read.csv in the console.

?read.csv

This is just a simple description. The help page has much more useful information!

read.table(file, header = FALSE, sep = "", quote = "\"'",
           dec = ".", numerals = c("allow.loss", "warn.loss", "no.loss"),
           row.names, col.names, as.is = !stringsAsFactors,
           na.strings = "NA", colClasses = NA, nrows = -1,
           skip = 0, check.names = TRUE, fill = !blank.lines.skip,
           strip.white = FALSE, blank.lines.skip = TRUE,
           comment.char = "#",
           allowEscapes = FALSE, flush = FALSE,
           stringsAsFactors = FALSE,
           fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)

read.csv(file, header = TRUE, sep = ",", quote = "\"",
         dec = ".", fill = TRUE, comment.char = "", ...)

read.csv2(file, header = TRUE, sep = ";", quote = "\"",
          dec = ",", fill = TRUE, comment.char = "", ...)

read.delim(file, header = TRUE, sep = "\t", quote = "\"",
           dec = ".", fill = TRUE, comment.char = "", ...)

read.delim2(file, header = TRUE, sep = "\t", quote = "\"",
            dec = ",", fill = TRUE, comment.char = "", ...)

6.2 Inspecting what we loaded

What’s inside?

We can quickly check what is inside this object with a few commands.

dim(gene_data)     # gives us the dimensions of the table
## [1] 305   4
head(gene_data)    # shows us the first 6 rows of the table
##           X     geneA     geneB   cell_type
## 1 FIBRO-9DW 0.7542634 0.9423952 FIBROBLASTS
## 2 FIBRO-TPA 0.7476058 0.9396038 FIBROBLASTS
## 3 FIBRO-TYL 0.7476938 0.9457088 FIBROBLASTS
## 4 FIBRO-4LI 0.7667875 0.9727001 FIBROBLASTS
## 5 FIBRO-7CJ 0.7712265 0.9645454 FIBROBLASTS
## 6 FIBRO-50S 0.7625049 0.9724643 FIBROBLASTS
summary(gene_data) # shows us information about the different columns of the table
##       X                 geneA            geneB         cell_type        
##  Length:305         Min.   :0.7318   Min.   :0.9116   Length:305        
##  Class :character   1st Qu.:0.7630   1st Qu.:0.9560   Class :character  
##  Mode  :character   Median :1.3015   Median :0.9798   Mode  :character  
##                     Mean   :1.0982   Mean   :0.9822                     
##                     3rd Qu.:1.3158   3rd Qu.:0.9926                     
##                     Max.   :1.3348   Max.   :1.2552

With the previous commands we can already say quite a few things about the data we have loaded. It has 305 rows and 4 columns. The 1st column is an experiment identifier, the 2nd and 3rd columns is the gene expression information about two genes (geneA and geneB). Finally, the 4th column is named “cell_type”, which contains the cell type associated with each different sample.

We can use the command “table” on the column “cell_type” of the data we loaded to obtain a count of each of the different cell types present in this data in an easy to read form.

If you remember, from the previous exercises and the lecture, this is done with “[]”. Further since we are looking at some sort of matrix and we want a column, we know that we should place the name of the column after a comma (data[,“name_of_column”]).

count_cell_types = table(gene_data[,"cell_type"])
count_cell_types
## 
## BLOOD.CELLS FIBROBLASTS        IPSC 
##         184         108          13

Help

For detailed help use ?NAME_OF_FUNCTION.

?dim
?head
?table
?summary

dim

dim(x)
dim(x) <- value

head

head(x, ...)
## Default S3 method:
head(x, n = 6L, ...)

## S3 method for class 'matrix'
head(x, n = 6L, ...) # is exported as head.matrix()
## NB: The methods for 'data.frame' and 'array'  are identical to the 'matrix' one

## S3 method for class 'ftable'
head(x, n = 6L, ...)
## S3 method for class 'function'
head(x, n = 6L, ...)


tail(x, ...)
## Default S3 method:
tail(x, n = 6L, keepnums = FALSE, addrownums, ...)
## S3 method for class 'matrix'
tail(x, n = 6L, keepnums = TRUE, addrownums, ...) # exported as tail.matrix()
## NB: The methods for 'data.frame', 'array', and 'table'
##     are identical to the  'matrix'  one

## S3 method for class 'ftable'
tail(x, n = 6L, keepnums = FALSE, addrownums, ...)
## S3 method for class 'function'
tail(x, n = 6L, ...)

table

table(...,
      exclude = if (useNA == "no") c(NA, NaN),
      useNA = c("no", "ifany", "always"),
      dnn = list.names(...), deparse.level = 1)

as.table(x, ...)
is.table(x)

## S3 method for class 'table'
as.data.frame(x, row.names = NULL, ...,
              responseName = "Freq", stringsAsFactors = TRUE,
              sep = "", base = list(LETTERS))

summary

summary(object, ...)

## Default S3 method:
summary(object, ..., digits, quantile.type = 7)
## S3 method for class 'data.frame'
summary(object, maxsum = 7,
       digits = max(3, getOption("digits")-3), ...)

## S3 method for class 'factor'
summary(object, maxsum = 100, ...)

## S3 method for class 'matrix'
summary(object, ...)

## S3 method for class 'summaryDefault'
format(x, digits = max(3L, getOption("digits") - 3L), ...)
 ## S3 method for class 'summaryDefault'
print(x, digits = max(3L, getOption("digits") - 3L), ...)

6.3 Basic information about the data

With the information gathered through the few commands we ran, we can say that:

  • the table has the gene expression values for genes A and B for 305 different samples from 3 different cell types.
  • there are 184 blood cells samples, 108 fibroblasts and 13 IPSCs in this dataset.

6.4 Plotting

Basic plots

There are several different functions that you can use to analyse this data. For example we can use the count data that we generated above with the table function to quickly generate a pie chart or bar plot with the number of samples per cell type in our data.

pie(count_cell_types)

barplot(count_cell_types)

These are fine but we can define our own colours that we can use here and throughout the whole exercise to keep the plots consistent with each other. R has a number of predefined colours that you can call by name (e.g. “red”,“black”,“green”) and which you can check in this link. You can also use a hexadecimal colour code, commonly called hexcolor, to define over 16 million colours. In “hexcolor”, red can be “#ff0000”, black is “#000000” and green can be “#42853c”. Hexcolour codes always start with a “#” symbol followed by 6 alfanumerical characters.

In this exercise we will use the colors pre-defined in R but feel free to change these to colours you like. Lets define cell_colours and use it throughout the exercise.

cell_colours = c("red", "blue", "green")

Lets repeat the previous plots but with colours.

pie(count_cell_types, col=cell_colours)     # the col argument defines the color

barplot(count_cell_types, col=cell_colours) # the col argument defines the color

A scatter plot, where we show the expression of the two genes for all samples, provides the best vizualisation for this dataset. To do this, we can use the function plot. It has many arguments and you should check its help page. To start with we will just use the x and y arguments.

plot(
  x=gene_data[,"geneA"], # data in the x axis
  y=gene_data[,"geneB"]  # data in the y axis
)