r-factor

Can I programmatically update the type of a set of columns (to factors) in data.table?

ⅰ亾dé卋堺 submitted on 2019-12-23 09:50:06
Question: I would like to modify a set of columns inside a data.table to be factors. If I knew the names of the columns in advance, I think this would be straightforward:
    library(data.table)
    dt1 <- data.table(a = (1:4), b = rep(c('a','b')), c = rep(c(0,1)))
    dt1[,class(b)]
    dt1[,b:=factor(b)]
    dt1[,class(b)]
But I don't, and instead have a list of the variable names:
    vars.factors <- c('b','c')
I can apply the factor function to them without a problem ...
    lapply(vars.factors, function(x) dt1[,class(get(x))]
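One possible approach, sketched here rather than taken from the thread, is to assign to the whole set of columns by reference with `:=`, using `.SD` and `.SDcols`. The table below mirrors the one in the question, except that `rep()` is given an explicit count of 2 so the column lengths match without relying on recycling.

```r
library(data.table)

# Same shape as the question's table; lengths are made explicit.
dt1 <- data.table(a = 1:4, b = rep(c("a", "b"), 2), c = rep(c(0, 1), 2))
vars.factors <- c("b", "c")

# Wrapping the character vector in parentheses tells data.table to treat
# it as a set of column names; .SDcols restricts .SD to those columns.
dt1[, (vars.factors) := lapply(.SD, factor), .SDcols = vars.factors]

sapply(dt1, class)
```

Because `:=` modifies `dt1` in place, no reassignment is needed.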

Using ifelse on factor in R

社会主义新天地 submitted on 2019-12-22 06:32:40
Question: I am restructuring a dataset of species names. It has a column with Latin names and a column with trivial names when those are available. I would like to make a third column which gives the trivial name when available, otherwise the Latin name. Both trivial names and Latin names are of class factor. I have tried with an if statement:
    if(art2$trivname==""){
      art2$artname=trivname
    }else{
      art2$artname=latname
    }
It gives me the correct trivnames, but only gives NA when supplying Latin names. And when I use
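A minimal sketch of one way around this, assuming the columns are called `latname` and `trivname` and that a missing trivial name is stored as an empty string. The species rows are made up for illustration. `if` only inspects a single condition, and `ifelse()` applied directly to factors returns their integer codes, so the sketch converts both columns to character first.

```r
# Hypothetical data; only the column layout follows the question.
art2 <- data.frame(
  latname  = factor(c("Parus major", "Pica pica", "Sitta europaea")),
  trivname = factor(c("great tit", "", "nuthatch"))
)

lat  <- as.character(art2$latname)
triv <- as.character(art2$trivname)

# Vectorised choice: use the Latin name wherever the trivial name is empty.
art2$artname <- factor(ifelse(triv == "", lat, triv))

art2
```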

R proportion confidence interval factor

一曲冷凌霜 submitted on 2019-12-21 16:51:46
Question: I am trying to summarise data from a household survey, and as such most of my data is categorical (factor) data. I was looking to summarise it with plots of frequencies of responses to certain questions (e.g., a bar plot of percentages of households answering certain questions, with error bars showing confidence intervals). I found this excellent tutorial, which I had thought was the answer to my prayers (http://www.cookbook-r.com/Manipulating_data/Summarizing_data/), but it turns out this is only
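As a rough sketch of how percentages with confidence intervals could be computed for a factor, the code below uses base R's `binom.test()`, which gives an exact (Clopper-Pearson) interval for each level's proportion. The responses are invented, and the sketch assumes a simple random sample with no survey weights, which may not hold for a real household survey.

```r
# Hypothetical responses to one survey question.
answers <- factor(c("yes", "no", "yes", "yes", "unsure",
                    "no", "yes", "yes", "no", "yes"))

tab <- table(answers)
k   <- as.vector(tab)   # count per level
n   <- sum(k)           # total number of responses

# One 95% interval per level; each row of ci is (lower, upper).
ci <- t(vapply(k, function(ki) as.numeric(binom.test(ki, n)$conf.int),
               numeric(2)))

resp <- data.frame(
  response = names(tab),
  pct      = 100 * k / n,
  lower    = 100 * ci[, 1],
  upper    = 100 * ci[, 2]
)
resp
```

The `lower` and `upper` columns can then be passed to whatever plotting function draws the error bars.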

subsetting based on number of observations in a factor variable

≯℡__Kan透↙ submitted on 2019-12-20 04:55:36
Question: How do you subset based on the number of observations of the levels of a factor variable? I have a dataset with 1,000,000 rows and nearly 3000 levels, and I want to subset out the levels with fewer than, say, 200 observations.
    data <- read.csv("~/Dropbox/Shared/data.csv", sep=";")
    summary(as.factor(data$factor))
    10001 10002 10003 10004 10005 10006 10007 10009 10010 10011 10012 10013 10014 10016 10017 10018 10019 10020
      414   741  2202   205   159   591   194   678   581   774   778   738  1133   997   381   157   522     6
    10021 10022
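One way this could be done, sketched on a small stand-in data frame rather than the asker's file: count the observations per level with `table()`, keep the names of the levels that reach the threshold, and filter with `%in%`. The three levels and their counts are borrowed from the summary above; the threshold of 200 and the reading of "subset out" as "drop the rare levels" are assumptions.

```r
# Stand-in data: three levels with 414, 205 and 159 observations.
data <- data.frame(
  factor = factor(rep(c("10001", "10004", "10005"), times = c(414, 205, 159)))
)

min_obs <- 200
counts  <- table(data$factor)
keep    <- names(counts)[counts >= min_obs]

# Keep rows whose level is frequent enough, then drop the unused levels.
subset_data <- droplevels(data[data$factor %in% keep, , drop = FALSE])

table(subset_data$factor)
```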

Reorder factor levels using names

和自甴很熟 submitted on 2019-12-18 09:09:17
Question: I can reorder the levels of a factor using their indices like this:
    factor(iris$Species,levels(iris$Species)[c(3:1)])
However, if I try to reorder the same factor by name, it does not work:
    factor(iris$Species,levels(iris$Species)[c("virginica", "versicolor", "setosa")])
Is there a way to reorder the levels of a factor using their names?
Answer 1: Why don't you use the basic variant of giving the new level names:
    factor(iris$Species, levels=c("virginica", "versicolor", "setosa"))
Be sure to list all
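A short illustration of why the name-based indexing fails and the answer's variant works, using the built-in `iris` data: `levels()` returns an unnamed character vector, so indexing it by name yields only `NA`, whereas passing the names directly as `levels` reorders without changing any values.

```r
lv <- levels(iris$Species)

# An unnamed vector has nothing to match the names against.
lv[c("virginica", "versicolor", "setosa")]

# Supplying the level names themselves gives the desired order.
reordered <- factor(iris$Species,
                    levels = c("virginica", "versicolor", "setosa"))

levels(reordered)
```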

Sort a factor based on value in one or more other columns

痞子三分冷 submitted on 2019-12-17 18:26:11
Question: I've looked through a number of posts about ordering factors, but haven't quite found a match for my problem. Unfortunately, my knowledge of R is still pretty rudimentary. I have a subset of an archaeological artifact catalog that I'm working with. I'm trying to cross-tabulate diagnostic historical artifact types and site testing locations. Easy enough with ddply or tapply. My problem is that I want to sort the artifact types (a factor) by their mean diagnostic date (number/year), and I keep
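A possible approach, sketched on invented data since the catalog itself is not shown: base R's `reorder()` re-levels a factor by a summary of another column, here the mean date per type. The column names `type`, `site` and `date` are hypothetical.

```r
# Hypothetical catalog excerpt.
artifacts <- data.frame(
  type = factor(c("nail", "bottle", "ceramic", "nail", "bottle", "ceramic")),
  site = c("A", "A", "B", "B", "A", "B"),
  date = c(1870, 1895, 1780, 1890, 1905, 1800)
)

# Order the levels of type by the mean date within each type.
artifacts$type <- reorder(artifacts$type, artifacts$date, FUN = mean)

levels(artifacts$type)

# The cross-tabulation now lists types from oldest to youngest mean date.
table(artifacts$type, artifacts$site)
```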

ggplot: arranging boxplots of multiple y-variables for each group of a continuous x

时光毁灭记忆、已成空白 submitted on 2019-12-17 17:38:44
Question: I would like to create boxplots of multiple variables for groups of a continuous x-variable. The boxplots should be arranged next to each other for each group of x. The data looks like this:
    require (ggplot2)
    require (plyr)
    library(reshape2)
    set.seed(1234)
    x <- rnorm(100)
    y.1 <- rnorm(100)
    y.2 <- rnorm(100)
    y.3 <- rnorm(100)
    y.4 <- rnorm(100)
    df <- as.data.frame(cbind(x,y.1,y.2,y.3,y.4))
which I then melted:
    dfmelt <- melt(df, measure.vars=2:5)
The facet_wrap as shown in this solution (
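One way such a plot might be built, given as a sketch and not as the thread's accepted solution: bin the continuous `x` into groups with `cut()`, keep that grouping as an id variable when melting, and map the melted variable to `fill` so that `geom_boxplot()` places the boxes side by side within each group. The choice of four equal-width bins and the column name `x.group` are assumptions.

```r
library(ggplot2)
library(reshape2)

set.seed(1234)
df <- data.frame(x = rnorm(100), y.1 = rnorm(100), y.2 = rnorm(100),
                 y.3 = rnorm(100), y.4 = rnorm(100))

# Turn the continuous x into a factor with four intervals.
df$x.group <- cut(df$x, breaks = 4)

# Long format: one row per (observation, y-variable) pair.
dfmelt <- melt(df, id.vars = c("x", "x.group"))

# Boxes for y.1 to y.4 are dodged next to each other inside each x group.
p <- ggplot(dfmelt, aes(x = x.group, y = value, fill = variable)) +
  geom_boxplot()
p
```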

Sort data frame column by factor

我怕爱的太早我们不能终老 submitted on 2019-12-17 16:26:10
Question: Suppose I have a data frame with 3 columns (name, y, sex) where name is character, y is a numeric value and sex is a factor.
    sex<-c("M","M","F","M","F","M","M","M","F")
    x<-c("MARK","TOM","SUSAN","LARRY","EMMA","LEONARD","TIM","MATT","VIOLET")
    name<-as.character(x)
    y<-rnorm(9,8,1)
    score<-data.frame(name,y,sex)
    score
         name         y sex
    1    MARK  6.767086   M
    2     TOM  7.613928   M
    3   SUSAN  7.447405   F
    4   LARRY  8.040069   M
    5    EMMA  8.306875   F
    6 LEONARD  8.697268   M
    7     TIM 10.385221   M
    8    MATT  7.497702   M
    9  VIOLET 10.177969   F
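The question is cut off before it states the desired order, so the following is only a sketch of the general idea: `order()` on a factor sorts rows by level order, and rebuilding the factor with different `levels` changes which group comes first. The `y` values are copied from the printed output above.

```r
score <- data.frame(
  name = c("MARK", "TOM", "SUSAN", "LARRY", "EMMA",
           "LEONARD", "TIM", "MATT", "VIOLET"),
  y    = c(6.767086, 7.613928, 7.447405, 8.040069, 8.306875,
           8.697268, 10.385221, 7.497702, 10.177969),
  sex  = factor(c("M", "M", "F", "M", "F", "M", "M", "M", "F"))
)

# Default level order is alphabetical, so the F rows come first.
by_sex <- score[order(score$sex), ]

# Putting "M" first in the levels puts the M rows first.
m_first <- score[order(factor(score$sex, levels = c("M", "F"))), ]

by_sex
m_first
```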

Concatenate rows of a data frame

扶醉桌前 submitted on 2019-12-17 10:53:32
Question: I would like to take a data frame with characters and numbers, and concatenate all of the elements of each row into a single string, which would be stored as a single element in a vector. As an example, I make a data frame of letters and numbers, and then I would like to concatenate the first row via the paste function, and hopefully return the value "A1":
    df <- data.frame(letters = LETTERS[1:5], numbers = 1:5)
    df
    ##   letters numbers
    ## 1       A       1
    ## 2       B       2
    ## 3       C       3
    ## 4       D       4
    ## 5       E       5
    paste(df[1,]
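A minimal sketch of one way to get one string per row: pass the columns of the data frame to `paste0()` as separate arguments via `do.call()`, so the pasting happens element-wise across columns. The column names are removed first so that a column that happens to be called `collapse` cannot be mistaken for an argument of `paste0()`.

```r
df <- data.frame(letters = LETTERS[1:5], numbers = 1:5)

# Each column becomes one argument of paste0(); the result has one
# element per row of df.
row_strings <- do.call(paste0, unname(as.list(df)))

row_strings
```

Factor columns are converted through their labels, so this also works when `letters` is stored as a factor.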

Convert factor to integer in a data frame

自作多情 submitted on 2019-12-17 09:52:43
Question: I have the following code:
    anna.table<-data.frame (anna1,anna2)
    write.table(anna.table, file="anna.file.txt",sep='\t', quote=FALSE)
My table in the end contains numbers such as the following:
    chr   start     end       score
    chr2  41237927  41238801  151
    chr1  36976262  36977889  226
    chr8  83023623  83025129  185
and so on. After that I am trying to get only the values which fit some criteria, such as a score less than a specific value, so I am doing the following:
    anna3<-"data/anna/anna.file.txt"
    anna.total<
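The rest of the question is cut off, so this is only a sketch of the conversion named in the title. Calling `as.integer()` directly on a factor returns the internal level codes, not the printed numbers; going through `as.character()` first recovers the values. The small data frame and the threshold of 200 are hypothetical, with scores taken from the table above.

```r
# Hypothetical data frame in which score was read in as a factor.
anna.total <- data.frame(
  chr   = c("chr2", "chr1", "chr8"),
  score = factor(c("151", "226", "185"))
)

# Level codes, not the scores: the levels sort as "151", "185", "226".
codes <- as.integer(anna.total$score)

# Convert via character to get the actual numbers.
anna.total$score <- as.integer(as.character(anna.total$score))

# Numeric comparison now behaves as expected.
low <- anna.total[anna.total$score < 200, ]
low
```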