r-factor

How to fill NAs with LOCF by factors in data frame, split by country

我怕爱的太早我们不能终老 submitted on 2019-11-28 18:48:53
I have the following data frame (simplified), with the country variable as a factor and missing values in the value variable:

    country value
    AUT     NA
    AUT     5
    AUT     NA
    AUT     NA
    GER     NA
    GER     NA
    GER     7
    GER     NA
    GER     NA

The following generates the above data frame:

    data <- data.frame(country=c("AUT", "AUT", "AUT", "AUT", "GER", "GER", "GER", "GER", "GER"),
                       value=c(NA, 5, NA, NA, NA, NA, 7, NA, NA))

Now, I would like to replace the NA values within each country subset using last observation carried forward (LOCF). I know the function na.locf in the zoo package. data <- na.locf(data) would give me the
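A minimal base-R sketch of LOCF within each country, assuming the `data` frame built in the question. The helper `locf()` is hypothetical (written here for illustration); `ave()` applies it to each country subset separately, so a value is never carried from AUT into GER. With the zoo package installed, `zoo::na.locf(x, na.rm = FALSE)` could be passed as `FUN` instead; `na.rm = FALSE` keeps the leading NAs, so each group comes back at its original length.

```r
data <- data.frame(country = c("AUT", "AUT", "AUT", "AUT",
                               "GER", "GER", "GER", "GER", "GER"),
                   value = c(NA, 5, NA, NA, NA, NA, 7, NA, NA))

# Hypothetical helper: last observation carried forward for one vector.
# For each position, find the index of the most recent non-NA value
# (0 if none has been seen yet) and look the value up at that index.
locf <- function(x) {
  last_seen <- cummax(ifelse(is.na(x), 0L, seq_along(x)))
  x[replace(last_seen, last_seen == 0L, NA)]
}

# ave() splits value by country, applies locf() to each piece,
# and reassembles the result in the original row order.
data$value <- ave(data$value, data$country, FUN = locf)
print(data)
```

Leading NAs in a group (the first AUT row, the first two GER rows) stay NA, because there is no earlier observation in that country to carry forward.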

Unseen factor levels when appending new records with unseen string values to a dataframe cause a warning and result in NA

微笑、不失礼 submitted on 2019-11-28 15:28:40
I have a dataframe (14.5K rows by 15 columns) containing billing data from 2001 to 2007. I append new 2008 data to it with:

    alltime <- rbind(alltime,all2008)

Unfortunately that generates a warning:

    Warning message:
    In `[<-.factor`(`*tmp*`, ri, value = c(NA, NA, NA, NA, NA, NA, NA,  :
      invalid factor level, NAs generated

My guess is that some new patients' names were not in the previous dataframe, so R does not know which level to give them. Similarly new unseen
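One common workaround, sketched here on hypothetical toy data (`patient` and `amount` are invented column names, and `unfactor()` is a hypothetical helper): convert factor columns to character before combining, then rebuild the factor once all rows are present, so its levels are recomputed from the full data. While the rows are being combined, no fixed level set is involved that could reject a new name.

```r
# Hypothetical toy tables standing in for the billing data
alltime <- data.frame(patient = factor(c("Ann", "Bob")), amount = c(10, 20))
all2008 <- data.frame(patient = factor(c("Bob", "Cy")), amount = c(5, 7))

# Hypothetical helper: turn every factor column into a character column
unfactor <- function(df) {
  for (col in names(df)) {
    if (is.factor(df[[col]])) df[[col]] <- as.character(df[[col]])
  }
  df
}

alltime <- rbind(unfactor(alltime), unfactor(all2008))

# Rebuild the factor afterwards; its levels now cover every patient name
alltime$patient <- factor(alltime$patient)
print(levels(alltime$patient))
```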

How do I get discrete factor levels to be treated as continuous?

筅森魡賤 submitted on 2019-11-28 10:08:39
I have a data frame with columns initially labeled arbitrarily. Later on, I want to change these levels to numerical values. The following script illustrates the problem.

    library(ggplot2)
    library(reshape2)
    m <- 10
    n <- 6
    nam <- list(c(),letters[1:n])
    var <- as.data.frame(matrix(sort(rnorm(m*n)),m,n,F,nam))
    dtf <- data.frame(t=seq(m)*0.1, var)
    mdf <- melt(dtf, id=c('t'))
    xs <- c(0.25,0.5,1.0,2.0,4.0,8.0)
    levels(mdf$variable) <- xs
    g <- ggplot(mdf,aes(variable,value,group=variable,colour=t))
    g +
      geom_point() +
      #scale_x_continuous() +
      opts()

This plot is produced. The 'variable' quantities are
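The usual pitfall here is that `as.numeric()` on a factor returns its internal integer codes (1, 2, 3, ...), not the numbers written in its labels. A minimal base-R sketch, using the `xs` values from the question on a small hypothetical factor `f`: the idiom `as.numeric(levels(f))[f]`, recommended in R FAQ 7.10, recovers the labels as numbers. For the plot, one would then map such a numeric column (for example a hypothetical `mdf$x <- as.numeric(levels(mdf$variable))[mdf$variable]`) to the x aesthetic, keeping `group = variable`, so that `scale_x_continuous()` can apply.

```r
xs <- c(0.25, 0.5, 1.0, 2.0, 4.0, 8.0)

# Hypothetical factor with six arbitrary labels, each used twice
f <- factor(rep(letters[1:6], 2))
levels(f) <- xs          # labels become "0.25", "0.5", "1", "2", "4", "8"

codes  <- as.numeric(f)              # internal codes 1..6, not the labels
values <- as.numeric(levels(f))[f]   # the labels converted to numbers

print(codes)
print(values)
```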

geom_boxplot() from ggplot2: forcing an empty level to appear

此生再无相见时 submitted on 2019-11-28 10:05:31
I can't find a way to make ggplot2 show an empty level in a boxplot without adding actual missing values to my dataframe. Here is reproducible code:

    # fake data
    dftest <- expand.grid(time=1:10,measure=1:50)
    dftest$value <- rnorm(dim(dftest)[1],3+0.1*dftest$time,1)
    # and let's suppose we didn't observe anything at time 2
    # doesn't work even when forcing with factor(..., levels=...)
    p <- ggplot(data=dftest[dftest$time!=2,],aes(x=factor(time,levels=1:10),y=value))
    p + geom_boxplot()
    # only way seems to have at least one actual missing value in the dataframe
    dftest2 <- dftest
    dftest2
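In ggplot2, a discrete scale drops unused factor levels by default; `scale_x_discrete(drop = FALSE)` keeps them. A minimal sketch rebuilding the question's data (the `set.seed()` call and the `time_f` column name are additions for illustration): declare all ten time points as levels in a column of the data frame, confirm with `table()` that level "2" exists with zero rows, then build the plot. The ggplot2 part only runs if the package is installed; call `print(p)` to draw it.

```r
set.seed(1)
dftest <- expand.grid(time = 1:10, measure = 1:50)
dftest$value <- rnorm(nrow(dftest), 3 + 0.1 * dftest$time, 1)

# Drop time 2 entirely, as in the question
obs <- dftest[dftest$time != 2, ]

# Store the factor in the data, declaring all ten levels explicitly
obs$time_f <- factor(obs$time, levels = 1:10)
counts <- table(obs$time_f)   # level "2" is kept, with a count of 0
print(counts)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(obs, aes(x = time_f, y = value)) +
    geom_boxplot() +
    scale_x_discrete(drop = FALSE)   # keep the empty level on the x axis
}
```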

Recoding dummy variable to ordered factor

妖精的绣舞 submitted on 2019-11-28 09:18:41
I need some help with coding factors for a logistic regression. I have six dummy variables representing income brackets, and I want to convert them into a single ordered factor for use in the logistic regression. My data frame looks like this:

      INC1 INC2 INC3 INC4 INC5 INC6
    1    0    0    1    0    0    0
    2   NA   NA   NA   NA   NA   NA
    3    0    0    0    0    0    1
    4    0    0    0    0    0    1
    5    0    0    1    0    0    0
    6    0    0    0    1    0    0
    7    0    0    1    0    0    0
    8    0    0    0    1    0    0

What I want it to look like:

       INC
    1 INC3
    2   NA
    3 INC6
    4 INC6
    5 INC3
    6 INC4
    7 INC3
    8 INC4

This must be a
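A minimal base-R sketch, assuming the dummy columns are named `INC1` to `INC6` as shown and that each non-missing row contains exactly one 1 (the data below re-enters the example table; `dummies`, `brackets` and `idx` are hypothetical names): find the column holding the 1 in each row, then build an ordered factor whose level order follows the column order.

```r
dummies <- data.frame(
  INC1 = c(0, NA, 0, 0, 0, 0, 0, 0),
  INC2 = c(0, NA, 0, 0, 0, 0, 0, 0),
  INC3 = c(1, NA, 0, 0, 1, 0, 1, 0),
  INC4 = c(0, NA, 0, 0, 0, 1, 0, 1),
  INC5 = c(0, NA, 0, 0, 0, 0, 0, 0),
  INC6 = c(0, NA, 1, 1, 0, 0, 0, 0)
)
brackets <- names(dummies)

# Column position of the first 1 in each row; NA when the row has no 1 (e.g. all NA)
idx <- apply(as.matrix(dummies), 1, function(r) which(r == 1)[1])

# Ordered factor INC1 < INC2 < ... < INC6; NA positions stay NA
dummies$INC <- factor(brackets[idx], levels = brackets, ordered = TRUE)
print(dummies["INC"])
```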

Sort a factor based on value in one or more other columns

旧时模样 submitted on 2019-11-28 07:15:34
I've looked through a number of posts about ordering factors, but haven't quite found a match for my problem. Unfortunately, my knowledge of R is still pretty rudimentary. I have a subset of an archaeological artifact catalog that I'm working with. I'm trying to cross-tabulate diagnostic historical artifact types and site testing locations. Easy enough with ddply or tapply. My problem is that I want to sort the artifact types (a factor) by their mean diagnostic date (number/year), and I keep getting them alphabetically. I know I need to make it an ordered factor, but can't figure out how to
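Base R's `reorder()` sets the level order of a factor from a summary statistic of another variable, which is what sorting artifact types by mean date needs; tables built afterwards with `table()` or `tapply()` follow that order instead of the alphabetical default. A minimal sketch on hypothetical data (the type labels, dates and sites below are invented, not from the catalog):

```r
# Hypothetical stand-in for the catalog: artifact type, diagnostic date, site
artifacts <- data.frame(
  type = c("alpha", "alpha", "beta", "beta", "gamma", "gamma"),
  date = c(1840, 1860, 1770, 1790, 1800, 1820),
  site = c("S1", "S2", "S1", "S2", "S1", "S2")
)

# Reorder the levels by each type's mean date, earliest first
artifacts$type <- reorder(factor(artifacts$type), artifacts$date, FUN = mean)
print(levels(artifacts$type))

# The cross-tabulation now lists types from earliest to latest mean date
tab <- with(artifacts, table(type, site))
print(tab)
```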

ggplot: arranging boxplots of multiple y-variables for each group of a continuous x

我与影子孤独终老i submitted on 2019-11-28 04:35:17
I would like to create boxplots of multiple variables for groups of a continuous x-variable. The boxplots should be arranged next to each other for each group of x. The data looks like this:

    require(ggplot2)
    require(plyr)
    library(reshape2)
    set.seed(1234)
    x <- rnorm(100)
    y.1 <- rnorm(100)
    y.2 <- rnorm(100)
    y.3 <- rnorm(100)
    y.4 <- rnorm(100)
    df <- as.data.frame(cbind(x,y.1,y.2,y.3,y.4))

which I then melted:

    dfmelt <- melt(df, measure.vars=2:5)

The facet_wrap approach shown in this solution (Multiple plots by factor in ggplot (facets)) gives me each variable in a separate plot, but I would
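One way to get the boxplots side by side within each x group is to bin the continuous `x` with `cut()` and map the variable name to `fill`; `geom_boxplot()` dodges boxes that share an x position. A minimal sketch, assuming quartile bins (the bin choice and the `long`/`x_group` names are illustrative), using base R's `stack()` in place of `reshape2::melt()`. The plotting step only runs if ggplot2 is installed.

```r
set.seed(1234)
df <- data.frame(x = rnorm(100), y.1 = rnorm(100), y.2 = rnorm(100),
                 y.3 = rnorm(100), y.4 = rnorm(100))

# Long format: stack() puts y.1..y.4 into `values`, with the column name in `ind`;
# x is repeated once per stacked column so rows stay aligned
long <- data.frame(x = rep(df$x, 4), stack(df[, 2:5]))

# Bin the continuous x into quartile groups
breaks <- quantile(df$x, probs = seq(0, 1, 0.25))
long$x_group <- cut(long$x, breaks = breaks, include.lowest = TRUE)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Boxes sharing an x_group are dodged, one per variable
  p <- ggplot(long, aes(x = x_group, y = values, fill = ind)) +
    geom_boxplot()
}
```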

Convert Factor to Date/Time in R

自闭症网瘾萝莉.ら submitted on 2019-11-27 22:13:50
This is the information contained within my dataframe:

    ## minuteofday: factor w/ 89501 levels "2013-06-01 08:07:00",...
    ## dDdt: num 7.8564 2.318 ...
    ## minutes: POSIXlt, format: NA NA NA

I need to convert the minuteofday column to a date/time format:

    minuteave$minutes <- as.POSIXlt(as.character(minuteave$minuteofday), format="%m/%d/%Y %H:%M:%S")

I've tried as.POSIXlt, as.POSIXct and as.Date; none of them worked. Does anyone have any thoughts? The goal is to plot minutes vs. dDdt, but as long as the column is a factor it won't let me plot the specific time period I want. I have no idea what to try
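The `format` string has to match the text exactly. The factor levels shown above (for example `2013-06-01 08:07:00`) are in year-month-day order with dashes, whereas the call uses `%m/%d/%Y`; a format mismatch makes the parser return NA, which is consistent with the `NA NA NA` in the `minutes` column. A minimal sketch with two hypothetical timestamps in the same layout; the `tz = "UTC"` choice is an assumption. R's documentation notes that `POSIXct` is the more convenient class for data frame columns.

```r
# Hypothetical factor in the same layout as the minuteofday levels
minuteofday <- factor(c("2013-06-01 08:07:00", "2013-06-01 08:08:00"))

# Convert via character; the format mirrors "YYYY-MM-DD HH:MM:SS"
minutes <- as.POSIXct(as.character(minuteofday),
                      format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# The mismatched format from the question yields NA instead
wrong <- as.POSIXct(as.character(minuteofday),
                    format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

print(minutes)
print(wrong)
```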

Sort data frame column by factor

旧城冷巷雨未停 submitted on 2019-11-27 22:07:44
Suppose I have a data frame with 3 columns (name, y, sex), where name is character, y is a numeric value and sex is a factor.

    sex<-c("M","M","F","M","F","M","M","M","F")
    x<-c("MARK","TOM","SUSAN","LARRY","EMMA","LEONARD","TIM","MATT","VIOLET")
    name<-as.character(x)
    y<-rnorm(9,8,1)
    score<-data.frame(x,y,sex)
    score
         name         y sex
    1    MARK  6.767086   M
    2     TOM  7.613928   M
    3   SUSAN  7.447405   F
    4   LARRY  8.040069   M
    5    EMMA  8.306875   F
    6 LEONARD  8.697268   M
    7     TIM 10.385221   M
    8    MATT  7.497702   M
    9  VIOLET 10.177969   F

If I wanted to order it by y I would use:

    score[order(score$y),]
          x        y sex
    1  MARK 6.767086   M
    3 SUSAN 7
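`order()` accepts several keys, so sorting by `sex` first and then by `y` within each sex is a single call. A factor key sorts by its level order (alphabetical by default, so F before M); setting the levels explicitly, e.g. `factor(sex, levels = c("M", "F"))`, changes that. The sketch below reuses the names and the `y` values printed in the question so the result is reproducible.

```r
sex <- c("M", "M", "F", "M", "F", "M", "M", "M", "F")
name <- c("MARK", "TOM", "SUSAN", "LARRY", "EMMA", "LEONARD", "TIM", "MATT", "VIOLET")
# y values copied from the printed output in the question
y <- c(6.767086, 7.613928, 7.447405, 8.040069, 8.306875,
       8.697268, 10.385221, 7.497702, 10.177969)
score <- data.frame(name, y, sex = factor(sex), stringsAsFactors = FALSE)

# Primary key: sex (by level order); secondary key: y, ascending
sorted <- score[order(score$sex, score$y), ]
print(sorted)
```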

Change stringsAsFactors settings for data.frame

天涯浪子 submitted on 2019-11-27 19:43:34
I have a function in which I define a data.frame that I then fill with data in loops. At some point I get this warning:

    Warning messages:
    1: In `[<-.factor`(`*tmp*`, iseq, value = "CHANGE") :
      invalid factor level, NAs generated

Therefore, when I define my data.frame, I'd like to set the option stringsAsFactors to FALSE, but I don't understand how to do it. I have tried:

    DataFrame = data.frame(stringsAsFactors=FALSE)

and also:

    options(stringsAsFactors=FALSE)

What is the correct way to set the stringsAsFactors option?

MvG: It depends on how you fill your data frame, for which you haven't
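`stringsAsFactors` is an argument of `data.frame()` that acts on the character columns created in that same call; `data.frame(stringsAsFactors = FALSE)` with no columns creates an empty data frame, so the setting has nothing to act on there. (Since R 4.0.0 the default is `FALSE` anyway.) A minimal sketch with a hypothetical `status` column, contrasting a character column with a factor column when a new string such as `"CHANGE"` is assigned:

```r
# Character column: any new string can be assigned
df_chr <- data.frame(id = 1:2, status = c("OK", "OK"), stringsAsFactors = FALSE)
df_chr$status[2] <- "CHANGE"

# Factor column: "CHANGE" is not an existing level, so the assignment warns
# (outside tryCatch() it would go ahead and store NA)
df_fac <- data.frame(id = 1:2, status = factor(c("OK", "OK")))
msg <- tryCatch({
  df_fac$status[2] <- "CHANGE"
  "no warning"
}, warning = function(w) conditionMessage(w))

print(df_chr)
print(msg)
```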