r-factor

Subsetting based on number of observations in a factor variable

半城伤御伤魂 posted on 2019-12-02 10:44:53
How do you subset based on the number of observations in each level of a factor variable? I have a dataset with 1,000,000 rows and nearly 3,000 levels, and I want to drop the levels with fewer than, say, 200 observations.

    data <- read.csv("~/Dropbox/Shared/data.csv", sep=";")
    summary(as.factor(data$factor))

    10001 10002 10003 10004 10005 10006 10007 10009 10010 10011 10012 10013 10014 10016 10017 10018 10019 10020
      414   741  2202   205   159   591   194   678   581   774   778   738  1133   997   381   157   522     6
    10021 10022 10023 10024 10025 10026 10027 10028 10029 10030 10031 10032 10033 10034 10035 10036 10037 10038
      398   416
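A minimal base R sketch of one way to approach this, assuming the goal is to keep only rows whose level has at least a given number of observations. The toy data frame and the threshold `min_obs` are illustrative assumptions, not the asker's data; the question itself uses 200.

```r
# Toy data standing in for the asker's file (hypothetical values).
data <- data.frame(
  factor = c(rep("10001", 5), rep("10002", 2), rep("10003", 4)),
  x = seq_len(11)
)
min_obs <- 3  # assumed threshold for the toy data; the question uses 200

counts <- table(data$factor)               # observations per level
keep <- names(counts)[counts >= min_obs]   # levels with enough observations

# Keep only rows belonging to those levels, then drop the now-empty levels.
subset_data <- data[data$factor %in% keep, ]
subset_data$factor <- droplevels(as.factor(subset_data$factor))
```

`droplevels()` matters here because subsetting rows alone leaves the removed levels in the factor with a count of zero.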

Remove rows based on factor-levels

血红的双手。 posted on 2019-12-01 20:18:20
Question:

I have a data.frame df in "long" format.

    df <- data.frame(site = rep(c("A","B","C"), 1, 7), time = c(11,11,11,22,22,22,33), value = ceiling(rnorm(7)*10))
    df <- df[order(df$site), ]
    df

      site time value
    1    A   11    12
    2    A   22   -24
    3    A   33   -30
    4    B   11     3
    5    B   22    16
    6    C   11     3
    7    C   22     9

Question

How do I remove the rows where a unique element of df$time is not present for each of the levels of df$site? In this case I want to remove df[3,], because for df$time the timestamp 33 is only present for site A and
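One possible base R sketch, assuming the aim is to keep only timestamps that occur for every site. The values are copied from the printed output above rather than regenerated with `rnorm()`; the helper names `n_sites`, `sites_per_time` and `complete_times` are introduced here for illustration.

```r
df <- data.frame(
  site  = c("A", "A", "A", "B", "B", "C", "C"),
  time  = c(11, 22, 33, 11, 22, 11, 22),
  value = c(12, -24, -30, 3, 16, 3, 9)
)

n_sites <- length(unique(df$site))

# For each timestamp, count how many distinct sites report it.
sites_per_time <- tapply(df$site, df$time, function(s) length(unique(s)))

# Timestamps present for every site.
complete_times <- as.numeric(names(sites_per_time)[sites_per_time == n_sites])

df_complete <- df[df$time %in% complete_times, ]
```

With this toy data the row with time 33 is dropped, since only site A has it.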

Incorrect Conversion of Date as a Factor to a Date

冷暖自知 posted on 2019-11-30 23:23:18
I am having trouble calculating with a date that is imported from a .csv file. What I want to do is take the date in the factor DateClosed and generate a date in a date field (a). For example, if a=203, I want the date to be the equivalent of DateClosed-203. However, I am having trouble with the code listed below. DateClosed is a factor.

    > head(DateClosed)
    [1] 7/30/2007 12/12/2007 5/8/2009 6/24/2009 6/24/2009 2/29/2008
    165 Levels: 1/12/2010 1/15/2011 1/15/2013 1/17/2009 1/18/2008 1/19/2012 1/2/2013 1/21/2013 1/22/2010 1/24/2013 1/26/2014 ... 9/7/2010
    > head(as.Date(DateClosed,format="%m/%d/%y"))
    [1]
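The output of the asker's call is cut off above, so the following is only a sketch under an assumption: that the problem comes from the format string. In `strptime`-style formats, `%y` is a two-digit year and `%Y` is a four-digit year, and the dates shown have four-digit years. The sample values are the six shown by `head(DateClosed)`; `closed` and `shifted` are names introduced here.

```r
DateClosed <- factor(c("7/30/2007", "12/12/2007", "5/8/2009",
                       "6/24/2009", "6/24/2009", "2/29/2008"))
a <- 203  # number of days to subtract, as in the question's example

# Convert via character and use %Y (four-digit year) instead of %y.
closed <- as.Date(as.character(DateClosed), format = "%m/%d/%Y")

# Date arithmetic works in days.
shifted <- closed - a
```

Converting through `as.character()` makes it explicit that the level labels, not the internal integer codes, are being parsed.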

Incorrect Conversion of Date as a Factor to a Date

萝らか妹 posted on 2019-11-30 19:43:45
Question:

I am having trouble calculating with a date that is imported from a .csv file. What I want to do is take the date in the factor DateClosed and generate a date in a date field (a). For example, if a=203, I want the date to be the equivalent of DateClosed-203. However, I am having trouble with the code listed below. DateClosed is a factor.

    > head(DateClosed)
    [1] 7/30/2007 12/12/2007 5/8/2009 6/24/2009 6/24/2009 2/29/2008
    165 Levels: 1/12/2010 1/15/2011 1/15/2013 1/17/2009 1/18/2008 1/19/2012 1/2/2013 1

R: how to change one of the levels to NA

断了今生、忘了曾经 posted on 2019-11-30 11:33:24
I have a data set, and one of its columns has the factor levels "a" "b" "c" "NotPerformed". How can I change all the "NotPerformed" values to NA?

Set the level to NA:

    x <- factor(c("a", "b", "c", "NotPerformed"))
    x
    ## [1] a b c NotPerformed
    ## Levels: a b c NotPerformed
    levels(x)[levels(x)=='NotPerformed'] <- NA
    x
    ## [1] a b c <NA>
    ## Levels: a b c

Note that the factor level is removed.

I revise my old answer and provide what you can do as of September 2016. With the development of the dplyr package, now you can use recode_factor() to do the job.

    x <- factor(c("a", "b", "c", "NotPerformed"))
    # [1]

R: how to change one of the levels to NA

假如想象 posted on 2019-11-29 17:06:22
Question:

I have a data set, and one of its columns has the factor levels "a" "b" "c" "NotPerformed". How can I change all the "NotPerformed" values to NA?

Answer 1:

Set the level to NA:

    x <- factor(c("a", "b", "c", "NotPerformed"))
    x
    ## [1] a b c NotPerformed
    ## Levels: a b c NotPerformed
    levels(x)[levels(x)=='NotPerformed'] <- NA
    x
    ## [1] a b c <NA>
    ## Levels: a b c

Note that the factor level is removed.

Answer 2:

I revise my old answer and provide what you can do as of September 2016. With the development of the

Reorder factor levels using names

佐手、 posted on 2019-11-29 16:12:01
I can reorder the levels of a factor using their indices like this:

    factor(iris$Species,levels(iris$Species)[c(3:1)])

However, if I try to reorder the same factor by name, it does not work:

    factor(iris$Species,levels(iris$Species)[c("virginica", "versicolor", "setosa")])

Is there a way to reorder the levels of a factor using their names?

Why don't you use the basic variant and give the new level order directly:

    factor(iris$Species, levels=c("virginica", "versicolor", "setosa"))

Be sure to list all level names, though. Otherwise, you will end up with NA values. However, for completeness: If you rely on the
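A short sketch illustrating the two points made above, using the built-in `iris` data. The variable names `lv`, `by_name`, `reordered` and `partial` are introduced here for illustration. The name-based indexing fails because `levels()` returns a plain character vector without names, so indexing it by name finds nothing.

```r
lv <- levels(iris$Species)

# Indexing an unnamed vector by name yields only NA, which is why the
# asker's second call does not work.
by_name <- lv[c("virginica", "versicolor", "setosa")]

# Passing the names directly as the new level order works.
reordered <- factor(iris$Species,
                    levels = c("virginica", "versicolor", "setosa"))

# Omitting a level turns every observation of that level into NA.
partial <- factor(iris$Species, levels = c("virginica", "versicolor"))
```

Here every `setosa` row becomes NA in `partial`, which is the pitfall the answer warns about.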

Recoding dummy variable to ordered factor

前提是你 posted on 2019-11-29 14:58:26
I need some help with coding factors for a logistic regression. What I have are six dummy variables representing income brackets. I want to convert these into a single ordered factor for use in a logistic regression. My data frame looks like:

      INC1 INC2 INC3 INC4 INC5 INC6
    1    0    0    1    0    0    0
    2   NA   NA   NA   NA   NA   NA
    3    0    0    0    0    0    1
    4    0    0    0    0    0    1
    5    0    0    1    0    0    0
    6    0    0    0    1    0    0
    7    0    0    1    0    0    0
    8    0    0    0    1    0    0

What I want it to look like:

       INC
    1 INC3
    2   NA
    3 INC6
    4 INC6
    5 INC3
    6 INC4
    7 INC3
    8 INC4

This must be a common (and simple) operation, but my searches have not turned up a concise answer for how to perform this
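One way this could be sketched in base R, assuming each row has at most one dummy set to 1 and that all-NA rows should stay NA. The data frame `d` reproduces the eight rows shown above; `idx` and `INC` are names chosen for this example.

```r
d <- data.frame(
  INC1 = c(0, NA, 0, 0, 0, 0, 0, 0),
  INC2 = c(0, NA, 0, 0, 0, 0, 0, 0),
  INC3 = c(1, NA, 0, 0, 1, 0, 1, 0),
  INC4 = c(0, NA, 0, 0, 0, 1, 0, 1),
  INC5 = c(0, NA, 0, 0, 0, 0, 0, 0),
  INC6 = c(0, NA, 1, 1, 0, 0, 0, 0)
)

# For each row, find the column that holds the single 1.
# Rows with no 1 (including all-NA rows) get NA.
idx <- apply(as.matrix(d), 1, function(r) {
  hit <- unname(which(r == 1))
  if (length(hit) == 1) hit else NA_integer_
})

# Map column positions to column names and build an ordered factor whose
# level order follows the column order INC1 < INC2 < ... < INC6.
INC <- factor(names(d)[idx], levels = names(d), ordered = TRUE)
```

Setting `levels = names(d)` keeps all six brackets as levels even when some do not occur in the data, so the ordering stays meaningful for the regression.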

How can I convert a factor column that contains decimal numbers to numeric?

梦想的初衷 posted on 2019-11-29 04:36:11
I have a csv file, and when I use this command

    SOLK<-read.table('Book1.csv',header=TRUE,sep=';')

I get this output:

    > SOLK
              Time Close Volume
    1   10:27:03,6  0,99   1000
    2   10:32:58,4  0,98    100
    3   10:34:16,9  0,98    600
    4   10:35:46,0  0,97    500
    5   10:35:50,6  0,96     50
    6   10:35:50,6  0,96   1000
    7   10:36:10,3  0,95     40
    8   10:36:10,3  0,95    100
    9   10:36:10,4  0,95    500
    10  10:36:10,4  0,95    100
    . . . . . . . . . . . .
    285 17:09:44,0  0,96    404

and str(SOLK) gives this:

    'data.frame': 285 obs. of 3 variables:
     $ Time : Factor w/ 174 levels "10:27:03,6","10:32:58,4",..: 1 2 3 4 5 5 6 6 7 7 ...
     $ Close : Factor w/ 8 levels "0,92","0,93
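A minimal sketch of two common approaches, assuming the comma in `Close` is a decimal separator. The small vectors and the inline `text =` input are stand-ins for the asker's `Book1.csv`; `close_num`, `codes` and `SOLK2` are names introduced here.

```r
Close <- factor(c("0,99", "0,98", "0,98", "0,97"))

# Pitfall: as.numeric() on a factor returns the internal level codes,
# not the numbers shown in the labels.
codes <- as.numeric(Close)

# Convert the labels instead: factor -> character -> replace "," -> numeric.
close_num <- as.numeric(sub(",", ".", as.character(Close), fixed = TRUE))

# Alternatively, tell read.table about the decimal comma at import time.
SOLK2 <- read.table(text = "Close;Volume\n0,99;1000\n0,98;100",
                    header = TRUE, sep = ";", dec = ",")
```

The `dec = ","` route avoids the factor conversion for `Close` altogether; a column such as `Time` would still need separate handling because it is not a plain number.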

R: factor levels, recode rest to 'other'

家住魔仙堡 posted on 2019-11-29 01:53:35
I use factors somewhat infrequently and generally find them comprehensible, but I am often fuzzy about the details of specific operations. Currently, I am collapsing categories with few observations into "other" and am looking for a quick way to do that. I have perhaps 20 levels of a variable, but am interested in collapsing a bunch of them into one.

    data <- data.frame(employees = sample.int(1000,500), naics = sample(c('621111','621112','621210','621310','621320','621330','621340','621391','621399','621410','621420','621491','621492','621493','621498','621511','621512','621610','621910
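A small base R sketch of one way to collapse levels, assuming you already know which levels to keep. The shortened `naics` vector and the `keep` set are hypothetical; they are not taken from the truncated code above.

```r
naics <- factor(c("621111", "621111", "621111", "621112",
                  "621210", "621210", "621310"))
keep <- c("621111", "621210")  # hypothetical levels to leave untouched

collapsed <- naics

# Assigning the same label to several levels merges them into one level.
levels(collapsed)[!(levels(collapsed) %in% keep)] <- "other"

table(collapsed)
```

To collapse by frequency instead, `keep` could be derived from `table(naics)`, for example by taking the names of levels whose count exceeds a chosen cutoff.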