We finished the last class touching very lightly on functions. Today we’ll go deeper on this subject. Functions are one of the most powerful tools in R.
In R the basic trigonometry functions use Radians instead of Degrees. The relationship between radians and degrees is the following:
Angle in radians = Angle in degrees * (pi/180)
Try to write the code to compute the value of 45 degrees in radians.
Now… To build our first function! If you want to a) subtract two numbers and b) obtain the absolute value of the operation done in a); you could do so with a few lines of code like:
Number_1 = -168
Number_2 = -126
Result = Number_1 - Number_2
Final_result = abs(Result)
Every time you would want to repeat this operation but with different numbers you would have to use the code above but changing the values of Number_1 and Number_2.
Number_1 = 1191
Number_2 = 1233
Result = Number_1 - Number_2
Final_result = abs(Result)
print(x=Final_result)
## [1] 42
If you write a function to do it however, if you want to repeat the operation you can do so with a single line of code later on. To do the same operation as the one described above using a function, you could do:
My_function = function(Number_1, Number_2){
Result = Number_1 - Number_2
Final_result = abs(Result)
return(Final_result)
}
My_function_but_more_advanced = function(Number_1, Number_2){
return(abs(Number_1 - Number_2))
}
Both functions do the same, but the cool thing is that after doing this, you can obtain the same results as above with a single line of code for each of the examples as:
My_final_result_1 = My_function(Number_1=-168, Number_2=-126)
My_final_result_2 = My_function(Number_1=1191, Number_2=1233)
print(x=My_final_result_1)
## [1] 42
print(x=My_final_result_2)
## [1] 42
Writing our first function in high-school math notation gives us:
\(f(x, y) = |g(x, y)|\)
\(g(x, y) = x - y\)
Or if we simplify: \(f(x,y) = |x-y|\)
There are a couple of things that you should know about functions. Let us use kitchen robots again as an example. In R, these kitchen robots clean after themselves, that is, if you give it the ingredients, they’ll give you a meal and afterwars, inside there won’t be anything, just the instructions to execute the recipe. Also if you don’t provide the correct ingredients, the robot will not know what to do. If your robot makes pasta and if you provide as ingredients rice and a shoe it will probably stop working.
Write a function that takes in degrees and gives back/returns radians.
Use it to compute the radian values of a 90, 45 and 0 degree angles.
Use these basic trigonometry functions (they only accept radians):
and write the code to compute the sine, cosine and tangent of a 90 degree angle and check the result.
At this point you should be able to start working on real problems, so let’s do just that!
First lets load some data. If you still have not downloaded the data, gene_expression_data.csv, you can do so in this link.
gene_data = read.csv("resources/gene_expression_data.csv")
To check how the read.csv function works check the read.csv page by typing ?read.csv in the console.
?read.csv
This is just a simple description. The help page has much more useful information!
read.table(file, header = FALSE, sep = "", quote = "\"'",
dec = ".", numerals = c("allow.loss", "warn.loss", "no.loss"),
row.names, col.names, as.is = !stringsAsFactors, tryLogical = TRUE,
na.strings = "NA", colClasses = NA, nrows = -1,
skip = 0, check.names = TRUE, fill = !blank.lines.skip,
strip.white = FALSE, blank.lines.skip = TRUE,
comment.char = "#",
allowEscapes = FALSE, flush = FALSE,
stringsAsFactors = FALSE,
fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)
read.csv(file, header = TRUE, sep = ",", quote = "\"",
dec = ".", fill = TRUE, comment.char = "", ...)
read.csv2(file, header = TRUE, sep = ";", quote = "\"",
dec = ",", fill = TRUE, comment.char = "", ...)
read.delim(file, header = TRUE, sep = "\t", quote = "\"",
dec = ".", fill = TRUE, comment.char = "", ...)
read.delim2(file, header = TRUE, sep = "\t", quote = "\"",
dec = ",", fill = TRUE, comment.char = "", ...)
We can quickly check what is inside this object with a few commands.
dim(gene_data) # gives us the dimensions of the table
## [1] 305 4
head(gene_data) # shows us the first 6 rows of the table
## X geneA geneB cell_type
## 1 FIBRO-9DW 0.7542634 0.9423952 FIBROBLASTS
## 2 FIBRO-TPA 0.7476058 0.9396038 FIBROBLASTS
## 3 FIBRO-TYL 0.7476938 0.9457088 FIBROBLASTS
## 4 FIBRO-4LI 0.7667875 0.9727001 FIBROBLASTS
## 5 FIBRO-7CJ 0.7712265 0.9645454 FIBROBLASTS
## 6 FIBRO-50S 0.7625049 0.9724643 FIBROBLASTS
summary(gene_data) # shows us information about the different columns of the table
## X geneA geneB cell_type
## Length:305 Min. :0.7318 Min. :0.9116 Length:305
## Class :character 1st Qu.:0.7630 1st Qu.:0.9560 Class :character
## Mode :character Median :1.3015 Median :0.9798 Mode :character
## Mean :1.0982 Mean :0.9822
## 3rd Qu.:1.3158 3rd Qu.:0.9926
## Max. :1.3348 Max. :1.2552
With the previous commands we can already say quite a few things about the data we have loaded. It has 305 rows and 4 columns. The 1st column is an experiment identifier, the 2nd and 3rd columns is the gene expression information about two genes (geneA and geneB). Finally, the 4th column is named “cell_type”, which contains the cell type associated with each different sample.
We can use the command “table” on the column “cell_type” of the data we loaded to obtain a count of each of the different cell types present in this data in an easy to read form.
If you remember, from the previous exercises and the lecture, this is done with “[]”. Further since we are looking at some sort of matrix and we want a column, we know that we should place the name of the column after a comma (data[,“name_of_column”]).
count_cell_types = table(gene_data[,"cell_type"])
count_cell_types
##
## BLOOD.CELLS FIBROBLASTS IPSC
## 184 108 13
For detailed help use ?NAME_OF_FUNCTION.
?dim
?head
?table
?summary
dim
dim(x)
dim(x) <- value
head
head(x, ...)
## Default S3 method:
head(x, n = 6L, ...)
## S3 method for class 'matrix'
head(x, n = 6L, ...) # is exported as head.matrix()
## NB: The methods for 'data.frame' and 'array' are identical to the 'matrix' one
## S3 method for class 'ftable'
head(x, n = 6L, ...)
## S3 method for class 'function'
head(x, n = 6L, ...)
tail(x, ...)
## Default S3 method:
tail(x, n = 6L, keepnums = FALSE, addrownums, ...)
## S3 method for class 'matrix'
tail(x, n = 6L, keepnums = TRUE, addrownums, ...) # exported as tail.matrix()
## NB: The methods for 'data.frame', 'array', and 'table'
## are identical to the 'matrix' one
## S3 method for class 'ftable'
tail(x, n = 6L, keepnums = FALSE, addrownums, ...)
## S3 method for class 'function'
tail(x, n = 6L, ...)
table
table(...,
exclude = if (useNA == "no") c(NA, NaN),
useNA = c("no", "ifany", "always"),
dnn = list.names(...), deparse.level = 1)
as.table(x, ...)
is.table(x)
## S3 method for class 'table'
as.data.frame(x, row.names = NULL, ...,
responseName = "Freq", stringsAsFactors = TRUE,
sep = "", base = list(LETTERS))
summary
summary(object, ...)
## Default S3 method:
summary(object, ..., digits, quantile.type = 7)
## S3 method for class 'data.frame'
summary(object, maxsum = 7,
digits = max(3, getOption("digits")-3), ...)
## S3 method for class 'factor'
summary(object, maxsum = 100, ...)
## S3 method for class 'matrix'
summary(object, ...)
## S3 method for class 'summaryDefault'
format(x, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'summaryDefault'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
With the information gathered through the few commands we ran, we can say that:
There are several different functions that you can use to analyse this data. For example we can use the count data that we generated above with the table function to quickly generate a pie chart or bar plot with the number of samples per cell type in our data.
pie(count_cell_types)
barplot(count_cell_types)
These are fine but we can define our own colours that we can use here and throughout the whole exercise to keep the plots consistent with each other. R has a number of predefined colours that you can call by name (e.g. “red”,“black”,“green”) and which you can check in this link. You can also use a hexadecimal colour code, commonly called hexcolor, to define over 16 million colours. In “hexcolor”, red can be “#ff0000”, black is “#000000” and green can be “#42853c”. Hexcolour codes always start with a “#” symbol followed by 6 alfanumerical characters.
In this exercise we will use the colors pre-defined in R but feel free to change these to colours you like. Lets define cell_colours and use it throughout the exercise.
cell_colours = c("red", "blue", "green")
Lets repeat the previous plots but with colours.
pie(count_cell_types, col=cell_colours) # the col argument defines the color
barplot(count_cell_types, col=cell_colours) # the col argument defines the color