r-factor

Unexpected conversion to chars instead of factors in data frames and matrices

非 Y 不嫁゛ posted on 2019-12-13 21:28:29
Question: I am not a novice user of R, but the following is most confusing. I have a data frame (although the problem is equally present for matrices) of categorical variables taking the values +1/-1, which I'd like to convert into factors.
    mat <- matrix(sample(c(-1, +1), 16, replace = TRUE), nrow = 4)
    mat <- data.frame(mat)
However, using
    mat <- apply(mat, 2, factor)
turns integers into characters instead of factors:
    > mat
         [,1] [,2] [,3] [,4]
    [1,] "-1" "1"  "-1" "1"
    [2,] "-1" "-1" "-1" "-1"
    [3,] "-1" "1"
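The behaviour described in this excerpt can be reproduced with a minimal sketch. It assumes the cause is that `apply()` first coerces the data frame to a matrix and then flattens the returned factors back into a plain array, which cannot hold factors. The seed and the names `chr` and `fac` are illustrative additions, not part of the original question. One common alternative is to loop over the columns with `lapply()` and assign the result back into the data frame.

```r
set.seed(1)  # assumption: seed added only to make the sketch reproducible

mat <- data.frame(matrix(sample(c(-1, 1), 16, replace = TRUE), nrow = 4))

# apply() works on as.matrix(mat) and collapses the per-column results into
# an array, so the factor class is lost and a character matrix comes back.
chr <- apply(mat, 2, factor)

# lapply() visits each column separately; assigning into fac[] keeps the
# data.frame structure, and every column becomes a factor.
fac <- mat
fac[] <- lapply(fac, factor)

str(fac)
```

Assigning into `fac[]` rather than `fac` is what preserves the data frame; a plain `fac <- lapply(fac, factor)` would return a list instead.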

One of the factor's levels is an empty string; how to replace it with non-missing value?

倖福魔咒の posted on 2019-12-13 13:17:13
Question: Data frame AEbySOC contains two columns - factor SOC with character levels and integer count Count:
    > str(AEbySOC)
    'data.frame': 19 obs. of 2 variables:
     $ SOC  : Factor w/ 19 levels "","Blood and lymphatic system disorders",..: 1 2 3 4 5 6 7 8 9 10 ...
     $ Count: int 25 50 7 3 1 49 49 2 1 9 ...
One of the levels of SOC is an empty character string:
    > l = levels(AEbySOC$SOC)
    > l[1]
    [1] ""
I want to replace the value of this level with a non-empty string, say, "Not specified". This does not work:
    >
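The excerpt is cut off before the failing attempt, so what follows is only a sketch of one standard way to rename a level, not a diagnosis of the asker's code. The three-row `AEbySOC` below is a hypothetical stand-in for the real 19-row data frame. The key point is to assign into `levels()` of the factor itself rather than into a copy of the levels vector such as `l`.

```r
# Hypothetical stand-in for the data frame described in the question.
AEbySOC <- data.frame(
  SOC   = factor(c("", "Blood and lymphatic system disorders", "")),
  Count = c(25L, 50L, 7L)
)

# Rename the empty level in place; the underlying integer codes are untouched,
# so every row that carried "" now carries "Not specified".
levels(AEbySOC$SOC)[levels(AEbySOC$SOC) == ""] <- "Not specified"

levels(AEbySOC$SOC)
```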

Converting Factor to Date in R

久未见 posted on 2019-12-13 05:29:26
Question: I have a dataset imported from a large group of .csv files. The date imports as a factor, but the data is in the following format
    , 11, 4480, - 4570,NE, 12525,LB, , 10, , , , 0, 7:26A,26OC11,
    , 11, 7090, - 7290,NE, 5250,LB, , 9, , , , 0, 7:28A,26OC11,
    , 11, 5050, - 5065,NE, 50,LB, , 7, , , , 0, 7:31A,26OC11,
    , 12, 5440, - 5530,NE, 13225,LB, , 6, , , , 0, 8:10A,26OC11,
    , 12, 1020, - 1220,NE, 12020,LB, , 14, , , , 0, 8:12A,26OC11,
    , 12, 50, - 25,NE, 12040,LB, , 15, , , , 0, 8:13A,26OC11,
4 For

predict and model.matrix give different predicted means within levels of a factor variable

生来就可爱ヽ(ⅴ<●) posted on 2019-12-13 01:46:57
Question: This question arose as a result of another question posted here: "non-conformable arguments error from lmer when trying to extract information from the model matrix". When trying to obtain predicted means from an lmer model containing a factor variable, the output varies depending on how the factor variable is specified. I have a variable agegroup, which can be specified using the groups "Children <15 years", "Adults 15-49 years", "Elderly 50+ years" or "0-15y", "15-49y", "50+y". My choice

Using do.call factor to scale - resetting value error

梦想与她 posted on 2019-12-12 04:57:43
Question: This is an extension of the question that I asked here: "Getting Factor Means into the dataset after calculation". Now that I have basically normalized all of the stats that I am interested in using, I want to search the data set for people that intersect with these. Thus I am searching the dataset like this:
    base3[((base3$ScaledAVG>2)&(base3$ScaledOBP>2)&(base3$ScaledK.AB<.20)),]
looking for the players that have all three of those things true, yet when I run this it resets the Scaled K.AB value

How to separate a vector based on conditions in R

岁酱吖の posted on 2019-12-11 06:58:20
Question: I have a dataframe with some event dates. I used difftime to compute the delay between each event, but now I want to create a factor with each first event. Here is my attempt:
    dataframe$delay.event.A = difftime(dataframe$dateA, dataframe$dateStart, units = "days")
    dataframe$delay.event.B = difftime(dataframe$dateB, dataframe$dateStart, units = "days")
    dataframe$delay.event.C = difftime(dataframe$dateC, dataframe$dateStart, units = "days")
    dataframe$delay.first.event = pmin.int(dataframe

How to convert a factor variable to numeric while preserving the numbers in R [duplicate]

霸气de小男生 posted on 2019-12-11 05:13:01
Question: This question already has answers here: How to convert a factor to integer\numeric without loss of information? (8 answers) Closed 5 years ago. I wanted to merge two datasets by a variable ICPSR, but since ICPSR was a factor I had to change it to a numeric variable. So I used as.numeric, and after doing that my ICPSR was changed to totally different values. I googled and found I need to use as.numeric(level(dv$ICPSR)). But it only returns the unique values, not every value. So I was
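The symptom in this excerpt matches a well-known property of factors: `as.numeric()` on a factor returns the internal integer codes, and `levels()` returns one entry per distinct value. A short sketch on a hypothetical vector `f` (the ICPSR-like codes are invented for illustration) shows the two conversions usually suggested for recovering the original numbers.

```r
# Hypothetical ICPSR-like codes stored as a factor.
f <- factor(c("1021", "14", "1021", "300"))

codes <- as.numeric(f)                   # internal level codes, not the labels
via_chr <- as.numeric(as.character(f))   # convert labels to text, then to numbers
via_lvl <- as.numeric(levels(f))[f]      # convert each level once, then index by code

codes
via_chr
via_lvl
```

Both `via_chr` and `via_lvl` have one value per observation; the second form converts each distinct level only once, which can matter for long vectors.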

R: “Binning” categorical variables

做~自己de王妃 posted on 2019-12-10 21:46:26
Question: I have a data.frame which has 13 columns with factors. One of the columns contains credit rating data and has 54 different values:
    levels(TR_factor$crclscod)
     [1] "A"  "A2" "AA" "B"  "B2" "BA" "C"  "C2" "C5" "CA" "CC" "CY" "D"
    [14] "D2" "D4" "D5" "DA" "E"  "E2" "E4" "EA" "EC" "EF" "EM" "G"  "GA"
    [27] "GY" "H"  "I"  "IF" "J"  "JF" "K"  "L"  "M"  "O"  "P1" "TP" "U"
    [40] "U1" "V"  "V1" "W"  "Y"  "Z"  "Z1" "Z2" "Z4" "Z5" "ZA" "ZY"
What I want is to "bin" those categories into something like
    levels(TR_factor
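The target grouping is cut off in this excerpt, so the bins below are purely hypothetical. As a sketch of one way to merge factor levels in base R, a named list can be assigned to `levels()`: each list name becomes a new level and absorbs the old levels listed under it. The small `crclscod` vector stands in for the real column.

```r
# Small stand-in for TR_factor$crclscod; the values are taken from the level list.
crclscod <- factor(c("A", "A2", "AA", "B", "B2", "Z5"))

binned <- crclscod
# Hypothetical bins: the grouping in the original question is not visible.
# Any old level left out of the list would become NA.
levels(binned) <- list(
  A     = c("A", "A2", "AA"),
  B     = c("B", "B2"),
  Other = "Z5"
)

table(binned)
```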

Strange behavior between functions cut and ifelse in R

旧街凉风 posted on 2019-12-10 21:03:58
Question: I am working in R with a dataframe composed of a numeric variable and a character variable. My dataframe DF looks like this (I add the dput version at the end):
       a1    b1
    1   a 10.15
    2   a 25.10
    3   a 32.40
    4   a 56.70
    5   a 89.02
    6   b 90.50
    7   b 78.53
    8   b 98.12
    9   b 34.30
    10  b 99.75
In DF the variable a1 is a group variable and b1 is a numeric variable. Then the dilemma appears. I want to create a new variable named c1 by using the cut function and considering the group saved in a1. For this reason I combine

Subset data frame to include only levels of one factor that have values in both levels of another factor

心已入冬 posted on 2019-12-10 11:44:03
Question: I am working with a data frame that deals with numeric measurements. Some individuals have been measured several times, both as juveniles and adults. A reproducible example:
    ID <- c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a6", "a1", "a3")
    age <- rep(c("juvenile", "adult"), each=5)
    size <- rnorm(10)
    # e.g. a1 is measured 3 times, twice as a juvenile, once as an adult.
    d <- data.frame(ID, age, size)
My goal is to subset that data frame by selecting the IDs that appear at least once as a
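The sentence is truncated, so this sketch assumes, based on the question title, that the goal is to keep the IDs measured at least once as a juvenile and at least once as an adult. It reuses the reproducible example from the excerpt; the seed and the names `both` and `d_both` are illustrative additions.

```r
set.seed(42)  # assumption: seed added only for reproducibility

ID   <- c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a6", "a1", "a3")
age  <- rep(c("juvenile", "adult"), each = 5)
size <- rnorm(10)
d    <- data.frame(ID, age, size)

# IDs present in both age classes; as.character() guards against ID
# having been stored as a factor.
juv  <- as.character(d$ID[d$age == "juvenile"])
adu  <- as.character(d$ID[d$age == "adult"])
both <- intersect(juv, adu)

# Keep every measurement belonging to those IDs.
d_both <- d[d$ID %in% both, ]
d_both
```

If `ID` is a factor, `droplevels(d_both)` can be applied afterwards to discard the levels that no longer occur.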