r-factor | 易学教程

How to fill NAs with LOCF by factors in data frame, split by country

I have the following data frame (simplified), in which the country variable is a factor and the value variable has missing values:

    country value
    AUT     NA
    AUT     5
    AUT     NA
    AUT     NA
    GER     NA
    GER     NA
    GER     7
    GER     NA
    GER     NA

The following generates the above data frame:

    data <- data.frame(country=c("AUT", "AUT", "AUT", "AUT", "GER", "GER", "GER", "GER", "GER"),
                       value=c(NA, 5, NA, NA, NA, NA, 7, NA, NA))

Now, I would like to replace the NA values in each country subset using the last observation carried forward (LOCF) method. I know the command na.locf in the zoo package. data <- na.locf(data) would give me the

Unseen factor levels when appending new records with unseen string values to a dataframe, cause Warning and result in NA

Question: I have a dataframe (14.5K rows by 15 columns) containing billing data from 2001 to 2007. I append new 2008 data to it with:

    alltime <- rbind(alltime,all2008)

Unfortunately that generates a warning:

    Warning message:
    In `[<-.factor`(`*tmp*`, ri, value = c(NA, NA, NA, NA, NA, NA, NA, :
      invalid factor level, NAs generated

My guess is that there are some new patients whose names were not in the previous dataframe, so it would not know what level to give them. Similarly new unseen

How do I get discrete factor levels to be treated as continuous?

I have a data frame with columns initially labeled arbitrarily. Later on, I want to change these levels to numerical values. The following script illustrates the problem.

    library(ggplot2)
    library(reshape2)
    m <- 10
    n <- 6
    nam <- list(c(),letters[1:n])
    var <- as.data.frame(matrix(sort(rnorm(m*n)),m,n,F,nam))
    dtf <- data.frame(t=seq(m)*0.1, var)
    mdf <- melt(dtf, id=c('t'))
    xs <- c(0.25,0.5,1.0,2.0,4.0,8.0)
    levels(mdf$variable) <- xs
    g <- ggplot(mdf,aes(variable,value,group=variable,colour=t))
    g + geom_point() +
      #scale_x_continuous() +
      opts()

This plot is produced. The 'variable' quantities are

geom_boxplot() from ggplot2 : forcing an empty level to appear

I can't find a way to ask ggplot2 to show an empty level in a boxplot without imputing my dataframe with actual missing values. Here is reproducible code:

    # fake data
    dftest <- expand.grid(time=1:10,measure=1:50)
    dftest$value <- rnorm(dim(dftest)[1],3+0.1*dftest$time,1)

    # and let's suppose we didn't observe anything at time 2
    # doesn't work even when forcing with factor(..., levels=...)
    p <- ggplot(data=dftest[dftest$time!=2,],aes(x=factor(time,levels=1:10),y=value))
    p + geom_boxplot()

    # only way seems to have at least one actual missing value in the dataframe
    dftest2 <- dftest
    dftest2

Recoding dummy variable to ordered factor

Question: I need some help with coding factors for a logistic regression. What I have are six dummy variables representing income brackets. I want to convert these into a single ordered factor for use in a logistic regression. My data frame looks like:

      INC1 INC2 INC3 INC4 INC5 INC6
    1    0    0    1    0    0    0
    2   NA   NA   NA   NA   NA   NA
    3    0    0    0    0    0    1
    4    0    0    0    0    0    1
    5    0    0    1    0    0    0
    6    0    0    0    1    0    0
    7    0    0    1    0    0    0
    8    0    0    0    1    0    0

What I want it to look like:

       INC
    1 INC3
    2   NA
    3 INC6
    4 INC6
    5 INC3
    6 INC4
    7 INC3
    8 INC4

This must be a

Sort a factor based on value in one or more other columns

I've looked through a number of posts about ordering factors, but haven't quite found a match for my problem. Unfortunately, my knowledge of R is still pretty rudimentary. I have a subset of an archaeological artifact catalog that I'm working with. I'm trying to cross-tabulate diagnostic historical artifact types and site testing locations. Easy enough with ddply or tapply. My problem is that I want to sort the artifact types (a factor) by their mean diagnostic date (number/year), and I keep getting them alphabetically. I know I need to make it an ordered factor, but can't figure out how to

ggplot: arranging boxplots of multiple y-variables for each group of a continuous x

I would like to create boxplots of multiple variables for groups of a continuous x-variable. The boxplots should be arranged next to each other for each group of x. The data looks like this:

    require (ggplot2)
    require (plyr)
    library(reshape2)
    set.seed(1234)
    x <- rnorm(100)
    y.1 <- rnorm(100)
    y.2 <- rnorm(100)
    y.3 <- rnorm(100)
    y.4 <- rnorm(100)
    df <- as.data.frame(cbind(x,y.1,y.2,y.3,y.4))

which I then melted

    dfmelt <- melt(df, measure.vars=2:5)

The facet_wrap as shown in this solution (Multiple plots by factor in ggplot (facets)) gives me each variable in an individual plot, but I would

Convert Factor to Date/Time in R

This is the information contained within my dataframe:

    ## minuteofday: factor w/ 89501 levels "2013-06-01 08:07:00",...
    ## dDdt: num 7.8564 2.318 ...
    ## minutes: POSIXlt, format: NA NA NA

I need to convert the minute of day column to a date/time format:

    minuteave$minutes <- as.POSIXlt(as.character(minuteave$minuteofday), format="%m/%d/%Y %H:%M:%S")

I've tried as.POSIXlt, as.POSIXct and as.Date, none of which worked. Does anyone have any thoughts? The goal is to plot minutes vs. dDdt, but it won't let me plot in the specified time period that I want to as a factor. I have no idea what to try

Sort data frame column by factor

Suppose I have a data frame with 3 columns (name, y, sex) where name is character, y is a numeric value and sex is a factor.

    sex<-c("M","M","F","M","F","M","M","M","F")
    x<-c("MARK","TOM","SUSAN","LARRY","EMMA","LEONARD","TIM","MATT","VIOLET")
    name<-as.character(x)
    y<-rnorm(9,8,1)
    score<-data.frame(x,y,sex)
    score
         name         y sex
    1    MARK  6.767086   M
    2     TOM  7.613928   M
    3   SUSAN  7.447405   F
    4   LARRY  8.040069   M
    5    EMMA  8.306875   F
    6 LEONARD  8.697268   M
    7     TIM 10.385221   M
    8    MATT  7.497702   M
    9  VIOLET 10.177969   F

If I wanted to order it by y I would use:

    score[order(score$y),]
          x        y sex
    1  MARK 6.767086   M
    3 SUSAN 7

Change stringsAsFactors settings for data.frame

I have a function in which I define a data.frame that I use loops to fill with data. At some point I get the warning message:

    Warning messages:
    1: In `[<-.factor`(`*tmp*`, iseq, value = "CHANGE") :
      invalid factor level, NAs generated

Therefore, when I define my data.frame, I'd like to set the option stringsAsFactors to FALSE, but I don't understand how to do it. I have tried:

    DataFrame = data.frame(stringsAsFactors=FALSE)

and also:

    options(stringsAsFactors=FALSE)

What is the correct way to set the stringsAsFactors option?

MvG: It depends on how you fill your data frame, for which you haven't
