r-factor | 易学教程

Can I programmatically update the type of a set of columns (to factors) in data.table?

Question: I would like to modify a set of columns inside a data.table to be factors. If I knew the names of the columns in advance, I think this would be straightforward:

    library(data.table)
    dt1 <- data.table(a = (1:4), b = rep(c('a','b')), c = rep(c(0,1)))
    dt1[,class(b)]
    dt1[,b:=factor(b)]
    dt1[,class(b)]

But I don't, and instead have a list of the variable names:

    vars.factors <- c('b','c')

I can apply the factor function to them without a problem ...

    lapply(vars.factors, function(x) dt1[,class(get(x))]
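
One possible way to do what the question asks, sketched under the assumption that the data.table package is installed: select the columns with `.SDcols` and assign the converted columns back by reference. The toy table mirrors the one in the question, except that `rep()` is given an explicit repeat count here.

```r
library(data.table)

# Toy table mirroring the question (explicit repeat counts are an assumption)
dt1 <- data.table(a = 1:4, b = rep(c("a", "b"), 2), c = rep(c(0, 1), 2))
vars.factors <- c("b", "c")

# Convert every column named in vars.factors to a factor, by reference.
# The parentheses around vars.factors make := use the names stored in it.
dt1[, (vars.factors) := lapply(.SD, factor), .SDcols = vars.factors]

# Inspect the resulting column classes
sapply(dt1, class)
```

Columns not listed in `vars.factors` (here `a`) are left untouched.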

Using ifelse on factor in R

Question: I am restructuring a dataset of species names. It has a column with Latin names and a column with trivial names when those are available. I would like to make a third column which gives the trivial name when available, otherwise the Latin name. Both trivial names and Latin names are of class factor. I have tried with an if statement:

    if(art2$trivname==""){
      art2$artname=trivname
    }else{
      art2$artname=latname
    }

It gives me the correct trivnames, but only gives NA when supplying Latin names. And when I use
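
A minimal sketch of one way to build such a column, not necessarily the answer given in the original thread: `ifelse()` is vectorised but drops attributes, so factors are converted to character first. The data frame below is a hypothetical stand-in for `art2`; only the column names come from the question.

```r
# Hypothetical stand-in for art2; the species rows are made up for illustration
art2 <- data.frame(
  latname  = factor(c("Parus major", "Pica pica", "Sitta europaea")),
  trivname = factor(c("great tit", "", "nuthatch"))
)

# Work on character vectors so ifelse() returns labels, not integer codes
triv <- as.character(art2$trivname)
lat  <- as.character(art2$latname)

# Trivial name when available, otherwise the Latin name
art2$artname <- ifelse(triv == "", lat, triv)
art2
```

The new column is character; wrap it in `factor()` if a factor is needed downstream.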

R proportion confidence interval factor

Question: I am trying to summarise data from a household survey, and as such most of my data is categorical (factor) data. I was looking to summarise it with plots of frequencies of responses to certain questions (e.g., a bar plot of percentages of households answering certain questions, with error bars showing confidence intervals). I found this excellent tutorial, which I had thought was the answer to my prayers (http://www.cookbook-r.com/Manipulating_data/Summarizing_data/), but it turns out this is only

subsetting based on number of observations in a factor variable

Question: How do you subset based on the number of observations of the levels of a factor variable? I have a dataset with 1,000,000 rows and nearly 3000 levels, and I want to subset out the levels with fewer than, say, 200 observations.

    data <- read.csv("~/Dropbox/Shared/data.csv", sep=";")
    summary(as.factor(data$factor))
    10001 10002 10003 10004 10005 10006 10007 10009 10010 10011 10012 10013 10014 10016 10017 10018 10019 10020
      414   741  2202   205   159   591   194   678   581   774   778   738  1133   997   381   157   522     6
    10021 10022
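
A minimal base R sketch of one approach: count the rows per level with `table()`, then keep only rows whose level meets the threshold. The data here is a small made-up stand-in (the column name `factor` follows the question), and the sketch assumes "subset out" means dropping the rare levels; flip the comparison to keep them instead.

```r
# Made-up stand-in data: three levels with 500, 300 and 50 rows
data <- data.frame(factor = rep(c(10001, 10002, 10003), times = c(500, 300, 50)))

# Number of observations per level
counts <- table(data$factor)

# Names of the levels with at least 200 observations
keep <- names(counts)[counts >= 200]

# Keep only the rows belonging to those levels
data_sub <- data[data$factor %in% keep, , drop = FALSE]
table(data_sub$factor)
```

If the column is a factor, calling `droplevels()` on the result removes the now-empty levels.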

Reorder factor levels using names

Question: I can reorder the levels of a factor using their indices like this:

    factor(iris$Species,levels(iris$Species)[c(3:1)])

However, if I try to reorder the same factor by name, it does not work:

    factor(iris$Species,levels(iris$Species)[c("virginica", "versicolor", "setosa")])

Is there a way to reorder the levels of a factor using their names?

Answer 1: Why don't you use the basic variant of giving the new level names directly:

    factor(iris$Species, levels=c("virginica", "versicolor", "setosa"))

Be sure to list all
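
To illustrate why the name-based indexing in the question fails and what the answer's call does, here is a short sketch using the built-in `iris` data. `levels()` returns an unnamed character vector, so indexing it by name yields `NA`; passing the names straight to `levels =` works.

```r
# levels() has no names attribute, so lookup by name returns NA
by_name <- levels(iris$Species)[c("virginica", "versicolor", "setosa")]
by_name

# Passing the desired order directly as the levels argument works
sp <- factor(iris$Species, levels = c("virginica", "versicolor", "setosa"))
levels(sp)

# The underlying values are unchanged; only the level order differs
all(as.character(sp) == as.character(iris$Species))
```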

Sort a factor based on value in one or more other columns

Question: I've looked through a number of posts about ordering factors, but haven't quite found a match for my problem. Unfortunately, my knowledge of R is still pretty rudimentary. I have a subset of an archaeological artifact catalog that I'm working with. I'm trying to cross-tabulate diagnostic historical artifact types and site testing locations. Easy enough with ddply or tapply. My problem is that I want to sort the artifact types (a factor) by their mean diagnostic date (number/year), and I keep

ggplot: arranging boxplots of multiple y-variables for each group of a continuous x

Question: I would like to create boxplots of multiple variables for groups of a continuous x-variable. The boxplots should be arranged next to each other for each group of x. The data looks like this:

    require(ggplot2)
    require(plyr)
    library(reshape2)
    set.seed(1234)
    x <- rnorm(100)
    y.1 <- rnorm(100)
    y.2 <- rnorm(100)
    y.3 <- rnorm(100)
    y.4 <- rnorm(100)
    df <- as.data.frame(cbind(x,y.1,y.2,y.3,y.4))

which I then melted:

    dfmelt <- melt(df, measure.vars=2:5)

The facet_wrap as shown in this solution (

Sort data frame column by factor

Question: Suppose I have a data frame with 3 columns (name, y, sex), where name is character, y is a numeric value and sex is a factor.

    sex<-c("M","M","F","M","F","M","M","M","F")
    x<-c("MARK","TOM","SUSAN","LARRY","EMMA","LEONARD","TIM","MATT","VIOLET")
    name<-as.character(x)
    y<-rnorm(9,8,1)
    score<-data.frame(name,y,sex)
    score
         name         y sex
    1    MARK  6.767086   M
    2     TOM  7.613928   M
    3   SUSAN  7.447405   F
    4   LARRY  8.040069   M
    5    EMMA  8.306875   F
    6 LEONARD  8.697268   M
    7     TIM 10.385221   M
    8    MATT  7.497702   M
    9  VIOLET 10.177969   F
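
The question text is cut off here, so the exact requirement is unknown. As a hedged sketch of the general technique in the title, rows of a data frame can be sorted by a factor column with `order()`, which sorts a factor by the order of its levels. The values below are copied from the printed output above; breaking ties by `name` is an assumption added for illustration.

```r
# Rebuild the data frame from the values printed in the question
score <- data.frame(
  name = c("MARK", "TOM", "SUSAN", "LARRY", "EMMA",
           "LEONARD", "TIM", "MATT", "VIOLET"),
  y    = c(6.767086, 7.613928, 7.447405, 8.040069, 8.306875,
           8.697268, 10.385221, 7.497702, 10.177969),
  sex  = factor(c("M", "M", "F", "M", "F", "M", "M", "M", "F"))
)

# Sort rows by the factor's level order (F before M), then by name within sex
sorted <- score[order(score$sex, score$name), ]
sorted
```

To put "M" first, redefine the factor with `levels = c("M", "F")` before ordering.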

Concatenate rows of a data frame

Question: I would like to take a data frame with characters and numbers, and concatenate all of the elements of each row into a single string, which would be stored as a single element in a vector. As an example, I make a data frame of letters and numbers, and then I would like to concatenate the first row via the paste function, and hopefully return the value "A1":

    df <- data.frame(letters = LETTERS[1:5], numbers = 1:5)
    df
    ##   letters numbers
    ## 1       A       1
    ## 2       B       2
    ## 3       C       3
    ## 4       D       4
    ## 5       E       5
    paste(df[1,]
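
One way to get a string per row, offered as a sketch and not necessarily the thread's accepted answer: because a data frame is a list of columns, `do.call()` can hand those columns to `paste0()`, which then pastes them element by element, that is, row by row.

```r
df <- data.frame(letters = LETTERS[1:5], numbers = 1:5)

# Pass each column as a separate argument to paste0(); the result has
# one string per row. This works whether letters is character or factor.
row_strings <- do.call(paste0, df)
row_strings

# The first row on its own
row_strings[1]
```

Use `paste` with a `sep` argument in place of `paste0` if a separator between fields is wanted.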

Convert factor to integer in a data frame

Question: I have the following code:

    anna.table<-data.frame(anna1,anna2)
    write.table(anna.table, file="anna.file.txt",sep='\t', quote=FALSE)

My table in the end contains numbers such as the following:

    chr   start     end       score
    chr2  41237927  41238801  151
    chr1  36976262  36977889  226
    chr8  83023623  83025129  185

and so on. After that, I am trying to get only the values which fit some criterion, such as a score less than a specific value, so I am doing the following:

    anna3<-"data/anna/anna.file.txt"
    anna.total<
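
The question is truncated, but the conversion named in its title has a well-known pitfall that a short sketch can show: calling `as.integer()` directly on a factor returns the internal level codes, not the numbers printed as labels. Converting through `as.character()` first recovers the values. The `score` vector below is a stand-in built from the three scores shown above.

```r
# Stand-in for a score column that was read in as a factor
score <- factor(c("151", "226", "185"))

# Direct conversion gives the level codes (position in the sorted levels)
codes <- as.integer(score)
codes

# Going through character recovers the actual numbers
values <- as.integer(as.character(score))
values

# The numeric values can then be used in a comparison
values[values < 200]
```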