r-factor | 易学教程

Unexpected conversion to chars instead of factors in data frames and matrices

Question: I am not a novice user of R, but the following is most confusing. I have a data frame (although the problem is equally present for matrices) of categorical variables taking the values +1/-1, which I'd like to convert into factors.

    mat <- matrix(sample(c(-1, +1), 16, replace = TRUE), nrow = 4)
    mat <- data.frame(mat)

However, using

    mat <- apply(mat, 2, factor)

turns integers into characters instead of factors:

    > mat
         [,1] [,2] [,3] [,4]
    [1,] "-1" "1"  "-1" "1"
    [2,] "-1" "-1" "-1" "-1"
    [3,] "-1" "1"
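A possible explanation, as a minimal sketch: `apply()` assembles its result into a matrix, and a matrix cannot hold factors, so the values come back as character strings. Converting column by column with `lapply()` keeps the data frame and yields factors. The seed and the name `as_chars` are assumptions added for illustration.

```r
# Reproduce the setup from the question (seed added for repeatability).
set.seed(1)
mat <- matrix(sample(c(-1, 1), 16, replace = TRUE), nrow = 4)
df <- data.frame(mat)

# apply() returns a matrix here, so the factor class is lost
# and the entries become character strings.
as_chars <- apply(df, 2, factor)
is.character(as_chars)

# lapply() converts each column separately; assigning into df[]
# keeps the data frame structure and column names.
df[] <- lapply(df, factor, levels = c(-1, 1))
str(df)
```

Passing `levels = c(-1, 1)` explicitly gives every column the same two levels even if a sampled column happens to contain only one of the values.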

One of the factor's levels is an empty string; how to replace it with non-missing value?

Question: Data frame AEbySOC contains two columns: factor SOC with character levels and integer count Count:

    > str(AEbySOC)
    'data.frame': 19 obs. of 2 variables:
     $ SOC  : Factor w/ 19 levels "","Blood and lymphatic system disorders",..: 1 2 3 4 5 6 7 8 9 10 ...
     $ Count: int 25 50 7 3 1 49 49 2 1 9 ...

One of the levels of SOC is an empty character string:

    > l = levels(AEbySOC$SOC)
    > l[1]
    [1] ""

I want to replace the value of this level by a non-empty string, say, "Not specified". This does not work: >
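One way this is commonly handled, as a minimal sketch: assign to `levels()` of the column itself, since changing a copied vector such as `l` would leave the factor untouched. The three-row data frame below is a made-up stand-in for the real AEbySOC, which is not shown in full; "Cardiac disorders" is an assumed extra level.

```r
# Small stand-in for AEbySOC (values partly invented for illustration).
AEbySOC <- data.frame(
  SOC = factor(c("", "Blood and lymphatic system disorders", "Cardiac disorders")),
  Count = c(25L, 50L, 7L)
)

# Replace the empty level in place on the factor column.
levels(AEbySOC$SOC)[levels(AEbySOC$SOC) == ""] <- "Not specified"
levels(AEbySOC$SOC)
```

The observations that carried the empty level now report "Not specified"; the counts and row order are unchanged.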

Converting Factor to Date in R

Question: I have a dataset imported from a large group of .csv files. The date imports as a factor, but the data is in the following format:

    , 11, 4480, - 4570,NE, 12525,LB, , 10, , , , 0, 7:26A,26OC11,
    , 11, 7090, - 7290,NE, 5250,LB, , 9, , , , 0, 7:28A,26OC11,
    , 11, 5050, - 5065,NE, 50,LB, , 7, , , , 0, 7:31A,26OC11,
    , 12, 5440, - 5530,NE, 13225,LB, , 6, , , , 0, 8:10A,26OC11,
    , 12, 1020, - 1220,NE, 12020,LB, , 14, , , , 0, 8:12A,26OC11,
    , 12, 50, - 25,NE, 12040,LB, , 15, , , , 0, 8:13A,26OC11,

4 For

predict and model.matrix give different predicted means within levels of a factor variable

Question: This question arose as a result of another question posted here: "non-conformable arguments error from lmer when trying to extract information from the model matrix". When trying to obtain predicted means from an lmer model containing a factor variable, the output varies depending on how the factor variable is specified. I have a variable agegroup, which can be specified using the groups "Children <15 years", "Adults 15-49 years", "Elderly 50+ years" or "0-15y", "15-49y", "50+y". My choice

Using do.call factor to scale - resetting value error

Question: This is an extension of the question that I asked here: "Getting Factor Means into the dataset after calculation". Now that I have basically normalized all of the stats that I am interested in using, I want to search the data set for people that intersect with these. Thus I am searching the dataset like this:

    base3[((base3$ScaledAVG>2)&(base3$ScaledOBP>2)&(base3$ScaledK.AB<.20)),]

looking for the players that have all three of those things true, yet when I run this it resets the Scaled K.AB value

How to separate a vector based on conditions in R

Question: I have a dataframe with some event dates. I used difftime to compute the delay between each event, but now I want to create a factor with each first event. Here is my attempt:

    dataframe$delay.event.A = difftime(dataframe$dateA, dataframe$dateStart, units = "days")
    dataframe$delay.event.B = difftime(dataframe$dateB, dataframe$dateStart, units = "days")
    dataframe$delay.event.C = difftime(dataframe$dateC, dataframe$dateStart, units = "days")
    dataframe$delay.first.event = pmin.int(dataframe
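Since the excerpt is cut off, the following is only a sketch of one reading of the goal: label each row with whichever event (A, B or C) has the smallest delay. The sample dates, the helper matrix `delays` and the column name `first.event` are all assumptions made for illustration, and rows with missing dates are not handled.

```r
# Made-up example data using the column names from the question.
dataframe <- data.frame(
  dateStart = as.Date(c("2020-01-01", "2020-01-01", "2020-01-01")),
  dateA = as.Date(c("2020-01-05", "2020-02-01", "2020-01-20")),
  dateB = as.Date(c("2020-01-10", "2020-01-15", "2020-01-25")),
  dateC = as.Date(c("2020-03-01", "2020-01-20", "2020-01-02"))
)

# One column of delays (in days) per event, collected into a numeric matrix.
delays <- sapply(c(A = "dateA", B = "dateB", C = "dateC"), function(col) {
  as.numeric(difftime(dataframe[[col]], dataframe$dateStart, units = "days"))
})

# For each row, find the column with the smallest delay and turn it into a factor.
first_idx <- apply(delays, 1, which.min)
dataframe$first.event <- factor(colnames(delays)[first_idx],
                                levels = colnames(delays))
dataframe$first.event
```

`which.min()` returns the first minimum, so ties go to the event listed first.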

How to convert a factor variable to numeric while preserving the numbers in R [duplicate]

Question: This question already has answers here: "How to convert a factor to integer\numeric without loss of information?" (8 answers). Closed 5 years ago. I wanted to merge two datasets by a variable ICPSR, but since ICPSR was a factor I had to change it to a numeric variable. So I used as.numeric, and after doing that, my ICPSR values changed to totally different values. I googled and found I need to use as.numeric(levels(dv$ICPSR)). But it only returns the unique values, not every value. So I was
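A minimal sketch of the usual idiom: `as.numeric()` on a factor returns its internal integer codes, so the labels have to be recovered first. The ICPSR values below are invented, and `icpsr_num` and `icpsr_num2` are hypothetical names.

```r
# Made-up ICPSR codes stored as a factor.
dv <- data.frame(ICPSR = factor(c("1048", "29132", "1048", "14226")))

# This gives the internal level codes, not the original numbers.
as.numeric(dv$ICPSR)

# Convert via the labels to recover the original values for every row.
icpsr_num <- as.numeric(as.character(dv$ICPSR))

# Equivalent: convert the levels once, then index them by the factor.
icpsr_num2 <- as.numeric(levels(dv$ICPSR))[dv$ICPSR]
```

The second form converts each distinct level only once, which can matter for long vectors with few levels.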

R: “Binning” categorical variables

Question: I have a data.frame which has 13 columns with factors. One of the columns contains credit rating data and has 54 different values:

    levels(TR_factor$crclscod)
     [1] "A" "A2" "AA" "B" "B2" "BA" "C" "C2" "C5" "CA" "CC" "CY" "D"
    [14] "D2" "D4" "D5" "DA" "E" "E2" "E4" "EA" "EC" "EF" "EM" "G" "GA"
    [27] "GY" "H" "I" "IF" "J" "JF" "K" "L" "M" "O" "P1" "TP" "U"
    [40] "U1" "V" "V1" "W" "Y" "Z" "Z1" "Z2" "Z4" "Z5" "ZA" "ZY"

What I want is to "bin" those categories into something like levels(TR_factor
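The desired bins are cut off in the excerpt, so the grouping below is purely hypothetical. As a sketch, assigning a named list to `levels()` merges several old levels into one new level; `crclscod` here is a short invented vector and `binned` an assumed name.

```r
# A few of the credit rating codes from the question, as a factor.
crclscod <- factor(c("A", "A2", "AA", "B", "B2", "C5", "ZA"))

# Each list name is a new level; its value lists the old levels to merge.
binned <- crclscod
levels(binned) <- list(
  A     = c("A", "A2", "AA"),
  B     = c("B", "B2"),
  Other = c("C5", "ZA")
)
table(binned)
```

Old levels that appear in none of the list entries become `NA`, so the list should cover every level that needs to be kept.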

Strange behavior between functions cut and ifelse in R

Question: I am working in R with a dataframe composed of a numeric variable and a character variable. My dataframe DF looks like this (I add the dput version at the end):

       a1    b1
    1   a 10.15
    2   a 25.10
    3   a 32.40
    4   a 56.70
    5   a 89.02
    6   b 90.50
    7   b 78.53
    8   b 98.12
    9   b 34.30
    10  b 99.75

In DF the variable a1 is a group variable and b1 is a numeric variable. Then the dilemma appears. I want to create a new variable named c1 by using the cut function and considering the group saved in a1. For this reason I combine
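The excerpt stops before the actual code, so this is only a guess at the surprise involved: `ifelse()` drops attributes, which means factor results from `cut()` come back as bare integer codes. A minimal sketch, with hypothetical break points and the assumed helper names `codes` and `labels_chr`:

```r
# Data from the question.
DF <- data.frame(
  a1 = rep(c("a", "b"), each = 5),
  b1 = c(10.15, 25.10, 32.40, 56.70, 89.02, 90.50, 78.53, 98.12, 34.30, 99.75)
)

# ifelse() strips the factor class, leaving only the integer codes.
codes <- ifelse(DF$a1 == "a",
                cut(DF$b1, c(0, 50, 100)),   # hypothetical breaks for group a
                cut(DF$b1, c(0, 80, 100)))   # hypothetical breaks for group b

# Converting to character first preserves the interval labels.
labels_chr <- ifelse(DF$a1 == "a",
                     as.character(cut(DF$b1, c(0, 50, 100))),
                     as.character(cut(DF$b1, c(0, 80, 100))))
DF$c1 <- factor(labels_chr)
```

Because the codes from the two `cut()` calls refer to different sets of intervals, the integer result is ambiguous as well as unlabeled, which is why going through character labels is safer.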

Subset data frame to include only levels of one factor that have values in both levels of another factor

Question: I am working with a data frame that deals with numeric measurements. Some individuals have been measured several times, both as juveniles and adults. A reproducible example:

    ID <- c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a6", "a1", "a3")
    age <- rep(c("juvenile", "adult"), each=5)
    size <- rnorm(10)
    # e.g. a1 is measured 3 times, twice as a juvenile, once as an adult.
    d <- data.frame(ID, age, size)

My goal is to subset that data frame by selecting the IDs that appear at least once as a
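Going by the title, the goal appears to be keeping only the IDs measured in both age classes. A minimal sketch under that assumption, using base R set operations; the seed and the names `both` and `d_both` are additions for illustration.

```r
# Example data from the question (seed added for repeatability).
set.seed(42)
ID <- c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a6", "a1", "a3")
age <- rep(c("juvenile", "adult"), each = 5)
size <- rnorm(10)
d <- data.frame(ID, age, size)

# IDs that occur at least once as a juvenile and at least once as an adult.
both <- intersect(as.character(d$ID[d$age == "juvenile"]),
                  as.character(d$ID[d$age == "adult"]))

# Keep every measurement belonging to those IDs.
d_both <- d[d$ID %in% both, ]
d_both
```

If `ID` is stored as a factor, calling `droplevels(d_both)` afterwards removes the levels that no longer occur.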