Is it possible to normalize this table in R based on the last column, samples (the number of sequenced genomes)? I want a normalized distribution of all the genes across all the conditions.
Simplified example of my data:
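dat1 <- read.table(text = " gene1 gene2 gene3 samples
condition1 1 1 8 120
condition2 18 4 1 118
condition3 0 0 1 75
condition4 32 1 1 130", header = TRUE)
I tried:
library(BBmisc)  # assumed source of normalize(); the arguments below match BBmisc::normalize
dat1_norm <- normalize(dat1, method = "standardize", range = c(0, 1), margin = 1L, on.constant = "quiet")  # stored separately so dat1 keeps the raw counts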
But the results include negative values, and I am not sure how useful this approach is. Can anyone suggest how I should normalize my data to get meaningful results?
Thanks a lot and apologies if it is a dumb question.
Using your data, first write a min-max function, which rescales a vector so that its smallest value becomes 0 and its largest becomes 1:
minmax <- function(x) { (x - min(x)) / (max(x) - min(x)) }
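A quick sanity check of minmax() on a toy vector. One caveat: a column whose values are all equal has zero range, so the plain version computes 0/0 and returns NaN. minmax_safe below is a hypothetical guard (not part of the original answer) that returns zeros in that case:

```r
minmax <- function(x) (x - min(x)) / (max(x) - min(x))

# Hypothetical guard: a constant column has zero range, so return all zeros
# instead of the NaN values that 0/0 would produce.
minmax_safe <- function(x) {
  rng <- max(x) - min(x)
  if (rng == 0) return(rep(0, length(x)))
  (x - min(x)) / rng
}

print(minmax(c(2, 4, 6)))       # smallest -> 0, largest -> 1
print(minmax(c(3, 3, 3)))       # zero range: NaN NaN NaN
print(minmax_safe(c(3, 3, 3)))  # zero range handled: 0 0 0
```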
Then divide each gene column by samples (giving counts per sequenced genome) and apply minmax() to each column:
norm <- data.frame(lapply(dat1[, 1:3], function(i) minmax(i / dat1$samples)))
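The same computation can keep the condition names as row names, so the result is labelled by condition rather than 1 to 4. This is a sketch assuming the dat1 from the question (rebuilt here so the block runs on its own); the intermediate rates object and the explicit column selection are illustrative choices, not part of the original answer:

```r
dat1 <- read.table(text = " gene1 gene2 gene3 samples
condition1 1 1 8 120
condition2 18 4 1 118
condition3 0 0 1 75
condition4 32 1 1 130", header = TRUE)

minmax <- function(x) (x - min(x)) / (max(x) - min(x))

# Dividing a data frame by a vector of length nrow() divides each column
# element-wise, so every gene count becomes a count per sequenced genome.
rates <- dat1[, c("gene1", "gene2", "gene3")] / dat1$samples

# Min-max scale each gene column; row.names keeps condition1..condition4
norm <- data.frame(lapply(rates, minmax), row.names = rownames(dat1))
print(norm)
```

The values should match the printed table in this answer, only with condition1 to condition4 as row labels.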
This gives the following (each gene column now runs from 0 to 1; rows 1 to 4 are condition1 to condition4):
       gene1     gene2      gene3
1 0.03385417 0.2458333 1.00000000
2 0.61970339 1.0000000 0.01326455
3 0.00000000 0.0000000 0.09565217
4 1.00000000 0.2269231 0.00000000
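For comparison, method = "standardize" in the question's normalize() call is a z-score transformation (subtract each column's mean, divide by its standard deviation), which is what base R's scale() computes. Every value below its column mean therefore becomes negative, which explains the negative numbers in the first attempt. A minimal sketch on the same per-genome rates, using scale() instead of normalize() so no extra package is needed:

```r
dat1 <- read.table(text = " gene1 gene2 gene3 samples
condition1 1 1 8 120
condition2 18 4 1 118
condition3 0 0 1 75
condition4 32 1 1 130", header = TRUE)

# Counts per sequenced genome, as a numeric matrix
rates <- as.matrix(dat1[, c("gene1", "gene2", "gene3")] / dat1$samples)

# z-scores: each column centred on its mean and divided by its sd
z <- scale(rates)
print(round(z, 3))

# Each column now has mean 0 (up to rounding error),
# so every condition below a gene's average is negative
print(round(colMeans(z), 10))
```

Z-scores describe how far each condition sits from that gene's average, while min-max scaling keeps every value in [0, 1]; which one is more meaningful depends on the comparison you want to make.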