# Useful Code

# The R language

Learning to code can help enormously in many tasks that a scientist has to do such as doing statistics and analysing data. The R programming language is particularly well suited to this task. There are many good resources online to pick up the basics of the R language, Try R is an online course that will teach you what you need to know to get yourself started.

While the code on this page can be used with minimum knowledge of R, to utilise its full power and customisability a little programming knowledge is required. This is particularly true for customising your own plots, for which the R package ggplot2 is extremely good. A very concise pdf tutorial of ggplot2 can be found here.

## Aequorin Calibration function

This R code aims to replace a previous excel spreadsheet that was being used by members of the lab. The functionality should be identical as the code is based on the calculations made in the spreadsheet. The code is written as follows:

**aequorin_cal_funct.R**

aequorin_calibration_function<-function(file_path,rep_type,rep_custom,data_type,plot_offset){ #Assuming all data is of the same type data_file <- read.csv(file_path, header=T) if(is.na(data_file[1,dim(data_file)][2])==TRUE){ data_length<-dim(data_file)[2]-4} else{ data_length<-dim(data_file)[2]-3} rep1<-list() if(rep_type=="Numbers"){ num_u<-unique(data_file[,2]) for (num_u_i in 1:length(num_u)){ rep1[[num_u_i]]<-(which(data_file[,2]==num_u[num_u_i]))} }else if(rep_type=="Letters"){ let_u<-unique(data_file[,3]) for (let_u_i in 1:length(let_u)){ rep1[[let_u_i]]<-(which(data_file[,3]==let_u[let_u_i]))} } else if(rep_type=="Custom"){ rep1<-rep_custom } else{ return("ERROR: You have not specified replicate type in the second arguement. \"Numbers\" indicate replicates are ordered by numbers, \"Letters\" by letters. Choose \"Custom\" if replicates are ordered by some other means and please specify which in the next argument.") } # Define constants - Note only works right now if all samples are of the same data type, would need to specify a list of "WT" or "Mut" etc. result_mat1_list<-list() for (ml1 in 1:length(rep1)){ if(data_type[ml1]=="WT"){ KR22<-7330000 KR37<-KR22*sqrt(2) KTR<-120 Slope<-2.99 Ratec<-1 }else if(data_type[ml1]=="Mut"){ KR22<-1.61*10^7 KR37<-KR22*sqrt(2) KTR<-22008 Slope<-1.43 Ratec<-1 }else if(data_type[ml1]=="Mut_coeln"){ KR22<-8.47*10^7 KR37<-KR22*sqrt(2) KTR<-165600 Slope<-1.2038 Ratec<-0.138 }else{ return("ERROR: data type has not been specified, pleas put either \"WT\", \"Mut\" or \"Mut_coeln\".") } result_mat1<-matrix(0,data_length,4*length(rep1[[ml1]]),) for (i in 1:length(rep1[[ml1]])){ result_mat1[,i]<-as.numeric(data_file[rep1[[ml1]][i],][-c(1:3,184)]) min_ratio<-min(result_mat1[,i]) result_mat1[1,i+length(rep1[[ml1]])]<-((result_mat1[1,i]-(min_ratio*Ratec))/(sum(result_mat1[,i])))^(1/Slope) result_mat1[1,i+(2*length(rep1[[ml1]]))]<-(result_mat1[1,i+length(rep1[[ml1]])]+(result_mat1[1,i+length(rep1[[ml1]])]*KTR)-1)/(KR37*(1-result_mat1[1,i+length(rep1[[ml1]])])) result_mat1[1,i+(3*length(rep1[[ml1]]))]<-result_mat1[1,i+(2*length(rep1[[ml1]]))]*10^6 for (j in 2:data_length){ result_mat1[j,i+length(rep1[[ml1]])]<-((result_mat1[j,i]-(min_ratio*Ratec))/(sum(result_mat1[-c(1:(j-1)),i])))^(1/Slope) result_mat1[j,i+(2*length(rep1[[ml1]]))]<-(result_mat1[j,i+length(rep1[[ml1]])]+(result_mat1[j,i+length(rep1[[ml1]])]*KTR)-1)/(KR37*(1-result_mat1[j,i+length(rep1[[ml1]])])) result_mat1[j,i+(3*length(rep1[[ml1]]))]<-result_mat1[j,i+(2*length(rep1[[ml1]]))]*10^6 } } result_mat1_list[[(ml1*2)-1]]<-result_mat1 a<-plot_offset RN<-rep(1,(data_length-a)) ratio<-result_mat1[c(1:(data_length-a)),length(rep1[[ml1]])+1] C2pC<-result_mat1[c(1:(data_length-a)),length(rep1[[ml1]])*3+1] for (i1 in 2:length(rep1[[ml1]])){ RN<-c(RN,rep(i1,(data_length-a))) ratio<-c(ratio,result_mat1[c(1:(data_length-a)),length(rep1[[ml1]])+i1]) C2pC<-c(C2pC,result_mat1[c(1:(data_length-a)),length(rep1[[ml1]])*3+i1]) } DF_plot<-data.frame(time=rep(c(1:(data_length-a)),length(rep1[[ml1]])),ReplicateNumber=RN,Ratio=ratio,Ca2plusConc=C2pC) result_mat1_list[[(ml1*2)]]<-ggplot(DF_plot, aes(x=time, y=Ca2plusConc, colour=factor(ReplicateNumber), group=factor(ReplicateNumber)))+ scale_colour_brewer(palette="Set1")+ theme_bw()+ geom_line(size=1) + guides(color=guide_legend(title="Replicates"))+ geom_point() +ylab("Ca2+ concentration") +xlab("Time") +labs(title=as.character(ml1)) } return(result_mat1_list) }

## Instructions for use

The code itself is very easy to use, you need just two things a working copy of R on your computer and a CSV file containing the data. If you do not have R install it right away, it is an open source software and won't cost you anything and is much better than excel. I recommend also installing RStudio as a user friendly interface for working with R

**Step 1**

Make sure the CSV file is in the right format, particularly R when it reads a CSV format reads only the comma separated values (hence the name CSV) anything else before these values will cause R to fail. Make sure your CSV does not contain something at the beginning like this:

User: USER,Path: C:\Program Files\BMG\OPTIMA\User\Data\,Test run no.: 249

Test name: GYORGY,Date: 22/11/2012,Time: 17:03:21

Luminescence

To fix just delete these lines and any blank lines preceding the data.

**Step 2**

With the CSV file prepared the code is nearly ready to run, before the code itself is run two things much be done. The R library ggplots2 must be installed and loaded as this is needed for plotting the data, to do this just run the following code:

install.packages("ggplot2") library(ggplot2)

The final preparation is simple, the function we are using must be loaded into R, this can be done by downloading the code above as an R file (.R extension) and then typing into R command line:

source('aequorin_cal_funct.R')

Note for this to work the R file must be in your current working directory. Your current working directory can be changed with the command setwd()

**Step 3**

Now the code can be run, the code I just ran to test everything is in order looks as follows:

output<-aequorin_calibration_function("~/Downloads/acq6.CSV","Letters",NA,rep("WT",5),100)

Note there are 5 arguments that need to be entered and also to get results we must assign them to the output with an arrow "<-"

1. The first argument is the path to the CSV file, here as you can see it is in my Downloads folder, the path must be in quotation marks.

2. The second argument refers to how the replicates are ordered, if they are ordered by column they will have the same number (e.g A1,B1,C1,...) in this case the value "Numbers" must be written as the second argument. If alternatively all replicates are in the same row they each will share the same letter (e.g. A1,A2,A3,...) in this case write "Letters" as the second argument. If your replicates are not ordered by column or row write "Custom" instead.

3. The third argument is only used if you selected "Custom" in the second argument. If you selected "Numbers" or "Letters" just write NA and move on the the 4th argument. If you did select "Custom" you need to put in a list defining where each of your replicates are, this requires more work and is generally not advised unless it can not be avoided. To create a list in R do as follows:

mylist<-list() mylist[[1]]<-c(1,2,3,4,12) mylist[[2]]<-c(5,6,7,8,9,11) mylist[[3]]<-c(10,13,14,15)

This list entered into the function would specify 3 replicates. Each entry in the list states in which rows in the CSV file those replicates are. The CSV file orders letters together (it counts along the rows), for example the first 15 rows would be A1,A2,A3,A4,A5,B1,B2,B3,B4,B5,C1,C2,C3,C4,C5 to use a custom list you need to know exactly how your replicates are arranged in the CSV file.

4. The fourth argument concerns what type of experiment you are analysing. "WT" = wild type aequorin, wild type coelenterazine, "Mut" = mutant aequorin, wild type coelenterazine, "Mut_coeln" = mutant aequorin, mutant coelenterazine. If your sets of replicates are all of the same type just type rep("WT",n) where n is the number of sets of replicates you are analysing. If you are analysing experiments of different types input a vector like c("WT","MuT","WT","MuT","WT") but be careful that the first set of replicates (e.g the first row) is "WT" and so on. **WARNING** The excel spread sheet this code is based on did not have a functioning "Mut_coeln" for unknown reasons, I need to still check if "Mut_coeln" here actually works as it is supposed to.

5. The fifth and final argument is a plotting variable. It is a number which specifies how many of the final time points you do not wish to plot. For instance if 180 time points were collected but only the first 80 need to be plotted put 100 in this argument.

**Step 4**

Finally you need to deal with the output. The output is given out as a list of length 2 times the number of sets of replicates with 2 bits of data representing each set. The first bit of data is the raw values of the calculations if they are required for any reason. These are accessed from the on entries of the list :

output[[1]]

Note to access data from lists you need to use double square brackets. Also this will output a matrix as follows, for a set containing n replicates the first n columns will be the raw data itself, the next n will be the ratio calculation, the next n will be the calculating of the Ca2+ and the final n columns will be the Ca2+ concentration itself. These columns reflect the steps of the calculation done in the original excel spreadsheet if you wish to compare the result to excel.

Finally the even numbered entries to the list contain instructions for plotting, just type:

output[[2]]

and a graph will appear on your screen.

If you have any other questions on how this works or wish some added functionality or find some bugs in this code please let me know

Bobby 27/11/2012