data-manipulation | 易学教程

R: create unique combinations of IDs within a given class (not all combinations are being created)

Question: Hi, I have a dataset like the following:

    library(gtools)
    z=c(120,122,124,126)
    ID=as.character(c(1,2,3,4,5,6,7,8,9,10,11,12))
    IQ=c(120.5,123,125,122.5,122.1,121.7,123.2,123.7,120.7,122.3,120.1,122)
    Section=c("A","A","B","B","A","B","B","A","B","A","B","B")
    zz=data.frame(ID,IQ,Section)

I am trying to create unique combinations of the IDs that lie in each of the given classes: 120-122, 122-124 and 124-126.

    combin_list=list("list",length(z))
    Initial_IQ=0
    jj=1
    for (IQ1 in z){
      obs_list=zz[(zz$IQ<IQ1
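One way to get every combination per class is to bin the IQ values first and then enumerate combinations inside each bin. This is a minimal base R sketch, not the asker's loop: it assumes the classes are left-closed intervals [120,122), [122,124), [124,126) (so an IQ of exactly 122 lands in the second class) and that "combinations" means pairs. It uses base combn() instead of gtools, and the names ids_by_class and pairs_by_class are illustrative.

```r
# Data from the question
z  <- c(120, 122, 124, 126)
ID <- as.character(1:12)
IQ <- c(120.5, 123, 125, 122.5, 122.1, 121.7, 123.2, 123.7, 120.7, 122.3, 120.1, 122)
Section <- c("A", "A", "B", "B", "A", "B", "B", "A", "B", "A", "B", "B")
zz <- data.frame(ID, IQ, Section, stringsAsFactors = FALSE)

# Assumption: classes are left-closed, i.e. [120,122), [122,124), [124,126)
zz$class <- cut(zz$IQ, breaks = z, right = FALSE)

# Group the IDs by class (empty classes are kept as empty vectors)
ids_by_class <- split(zz$ID, zz$class)

# All pairs of IDs within each class; combn() needs at least 2 elements,
# so classes with fewer IDs get an empty 2 x 0 matrix
pairs_by_class <- lapply(ids_by_class, function(ids) {
  if (length(ids) < 2) return(matrix(character(0), nrow = 2, ncol = 0))
  combn(ids, 2)
})
```

Each column of pairs_by_class[["[120,122)"]] is one pair; passing a different m to combn() gives triples and so on.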

Getting nested elements from a list

Question: I am trying to get nested elements from a list. I can extract the elements using unlist(pull_lists[[i]]$content[[n]]['sha']); however, it seems that I cannot insert them into a nested list. I have extracted a single element of the list into a gist, which creates the reproducible example below. Here is what I have so far:

    library("devtools")
    pull_lists <- list(source_gist("669dfeccad88cd4348f7"))
    sha_list <- list()
    for (i in length(pull_lists)){
      for (n in length(pull_lists[[i]]$content)){
        sha
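Two details in the loop above are worth checking: for (i in length(x)) visits only the last index (seq_along(x) visits all of them), and ['sha'] returns a one-element list while [["sha"]] returns the value itself. The sketch below uses a hypothetical stand-in for the gist data, since its real structure is not shown; it only assumes each element has a $content list whose entries carry a sha field.

```r
# Hypothetical stand-in for the gist data (structure assumed, not taken from the gist)
pull_lists <- list(
  list(content = list(list(sha = "a1"), list(sha = "b2"))),
  list(content = list(list(sha = "c3")))
)

# Loop version: pre-allocate the nested list and fill it element by element
sha_list <- vector("list", length(pull_lists))
for (i in seq_along(pull_lists)) {
  sha_list[[i]] <- vector("list", length(pull_lists[[i]]$content))
  for (n in seq_along(pull_lists[[i]]$content)) {
    # [[ ]] extracts the value; [ ] would return a sub-list
    sha_list[[i]][[n]] <- pull_lists[[i]]$content[[n]][["sha"]]
  }
}

# Equivalent nested lapply(), no explicit indices needed
sha_list2 <- lapply(pull_lists, function(pl) lapply(pl$content, `[[`, "sha"))
```

Both versions keep the nesting (one inner list per element of pull_lists); wrapping the inner call in unlist() would flatten each inner list to a character vector instead.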

How to identify sequences within each cluster?

Question: Using the biofam dataset that comes as part of TraMineR:

    library(TraMineR)
    data(biofam)
    lab <- c("P","L","M","LM","C","LC","LMC","D")
    biofam.seq <- seqdef(biofam[,10:25], states=lab)
    head(biofam.seq)
         Sequence
    1167 P-P-P-P-P-P-P-P-P-LM-LMC-LMC-LMC-LMC-LMC-LMC
    514  P-L-L-L-L-L-L-L-L-L-L-LM-LMC-LMC-LMC-LMC
    1013 P-P-P-P-P-P-P-L-L-L-L-L-LM-LMC-LMC-LMC
    275  P-P-P-P-P-L-L-L-L-L-L-L-L-L-L-L
    2580 P-P-P-P-P-L-L-L-L-L-L-L-L-LMC-LMC-LMC
    773  P-P-P-P-P-P-P-P-P-P-P-P-P-P-P-P

I can perform a cluster analysis:
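In TraMineR the usual route is a dissimilarity matrix from seqdist() followed by a clustering step, but that part is cut off in the question. The sketch below swaps both for base R stand-ins (a hand-made dist object and hclust()) so it runs without TraMineR; the four toy sequences and their distances are invented. The point it illustrates: cutree() returns one cluster label per row, in row order, so split() or logical indexing recovers the sequences in each cluster.

```r
# Toy stand-in for biofam.seq, rows named by case id (invented data)
seqs <- data.frame(
  Sequence = c("P-P-P-LM", "P-L-L-L", "P-P-L-L", "P-P-P-P"),
  row.names = c("1167", "514", "1013", "773"),
  stringsAsFactors = FALSE
)

# Stand-in for a seqdist() result: a symmetric dissimilarity matrix
m <- matrix(c(0, 3, 2, 1,
              3, 0, 1, 3,
              2, 1, 0, 2,
              1, 3, 2, 0),
            nrow = 4, dimnames = list(rownames(seqs), rownames(seqs)))
d <- as.dist(m)

# Hierarchical clustering, cut into 2 groups: one label per row
cl <- cutree(hclust(d, method = "ward.D2"), k = 2)

# Case ids per cluster, and the sequences belonging to cluster 1
by_cluster <- split(rownames(seqs), cl)
cluster1 <- seqs[cl == 1, , drop = FALSE]
```

With the real data, the same indexing (for example biofam.seq[cl == 1, ]) should pick out a cluster's sequences, assuming cl was computed from a distance matrix built on biofam.seq so its labels are aligned with the rows.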

Assign an ID based on two columns in R

Question: I have some data that looks like this. I want to assign an 'ID' by email and wk_id.

    row_num email    wk_id
    1       aaaa    1/4/15
    2       aaaa   1/11/15
    3       aaaa   1/25/15
    4       bbbb   6/29/14
    5       bbbb    9/7/14
    6       cccc  11/16/14
    7       cccc  11/30/14
    8       cccc   12/7/14
    9       cccc  12/14/14
    10      cccc  12/21/14
    11      cccc  12/28/14
    12      cccc    1/4/15
    13      cccc   1/25/15

I want the data to look like this:

    row_num email    wk_id ID
    1       aaaa    1/4/15  1
    2       aaaa   1/11/15  2
    3       aaaa   1/25/15  3
    4       bbbb   6/29/14  1
    5       bbbb    9/7/14  2
    6       cccc  11/16/14  1
    7       cccc  11/30/14  2
    8       cccc   12/7/14
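What is being asked for is a running counter that restarts for each email. A base R sketch with ave() follows; it assumes wk_id is month/day/two-digit-year text and that IDs should follow chronological order, so rows are sorted by the parsed date before numbering. The wk_date column is an added helper.

```r
df <- data.frame(
  row_num = 1:13,
  email = c(rep("aaaa", 3), rep("bbbb", 2), rep("cccc", 8)),
  wk_id = c("1/4/15", "1/11/15", "1/25/15", "6/29/14", "9/7/14",
            "11/16/14", "11/30/14", "12/7/14", "12/14/14", "12/21/14",
            "12/28/14", "1/4/15", "1/25/15"),
  stringsAsFactors = FALSE
)

# Parse the week dates (assumed m/d/yy) so sorting is chronological, not alphabetical
df$wk_date <- as.Date(df$wk_id, format = "%m/%d/%y")
df <- df[order(df$email, df$wk_date), ]

# Number rows 1, 2, 3, ... within each email
df$ID <- ave(seq_len(nrow(df)), df$email, FUN = seq_along)
```

Sorting on the parsed Date matters: as plain text, "1/25/15" would sort before "11/16/14".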

Replace accents in a string vector with LaTeX code

Question: Define:

    df <- data.frame(name=c("México","Michoacán"),dat=c(1,2))

such that

    > df
           name dat
    1    México   1
    2 Michoacán   2

When I print this table to a .tex file using xtable, the accented characters get garbled, which is no surprise. I would like to replace the accents with proper LaTeX formatting, e.g.:

    > df
               name dat
    1    M\'{e}xico   1
    2 Michoac\'{a}n   2

Please note that the real dataset has many different names with different accented letters, but all with the same type of accent (acute, i.e. forward-slash), so the only thing
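Because only acute accents occur, a small lookup table from accented character to LaTeX command is enough. This is a sketch under that assumption; to_latex_acute is a hypothetical helper name, and the accented characters are written as Unicode escapes so the script does not depend on the source file's encoding.

```r
to_latex_acute <- function(x) {
  # Lookup: accented character -> LaTeX acute-accent command.
  # In R source "\\'{a}" is the five characters \'{a}; dotless \i is the
  # traditional base for an accented i.
  map <- c(
    "\u00e1" = "\\'{a}", "\u00e9" = "\\'{e}", "\u00ed" = "\\'{\\i}",
    "\u00f3" = "\\'{o}", "\u00fa" = "\\'{u}",
    "\u00c1" = "\\'{A}", "\u00c9" = "\\'{E}", "\u00cd" = "\\'{I}",
    "\u00d3" = "\\'{O}", "\u00da" = "\\'{U}"
  )
  # Split each string into characters, swap the accented ones, rejoin
  vapply(strsplit(x, ""), function(chars) {
    hit <- chars %in% names(map)
    chars[hit] <- map[chars[hit]]
    paste(chars, collapse = "")
  }, character(1))
}

df <- data.frame(name = c("M\u00e9xico", "Michoac\u00e1n"), dat = c(1, 2),
                 stringsAsFactors = FALSE)
df$name <- to_latex_acute(df$name)
```

When the result goes through xtable, its default text sanitizing would likely escape the backslashes again, so something like print(xtable(df), sanitize.text.function = identity) is probably needed; check the xtable documentation for your version.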

Computationally efficient way of resetting a sequence when a condition is met (R)

Question: I want to reset a (1,2) sequence when a condition is met (the subject changes). I have for loops and if statements that do this but, unsurprisingly, that method is very slow. Any suggestions (e.g., involving the apply family) for a more efficient method?

Current:

    subj odd_even
    a
    a
    a
    b
    b
    b
    b
    c
    c
    c

Goal:

    subj odd_even
    a    1
    a    2
    a    1
    b    1
    b    2
    b    1
    b    2
    c    1
    c    2
    c    1

    df = data.frame(
      subj = c("a","a","a","b","b","b","b", "c","c","c"),
      odd_even = ""
    )

Answer 1: I like the sequence function for this: df$odd_even <-
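A vectorized route is to compute each row's position within its run of identical subjects and then fold that position onto the 1, 2 pattern with modular arithmetic. This is one possible sketch, not necessarily the code Answer 1 goes on to show; it assumes each subject's rows are contiguous, because rle() counts runs rather than groups.

```r
df <- data.frame(
  subj = c("a", "a", "a", "b", "b", "b", "b", "c", "c", "c"),
  odd_even = "",
  stringsAsFactors = FALSE
)

# Position of each row within its run of identical subjects: 1 2 3 1 2 3 4 1 2 3
pos <- sequence(rle(df$subj)$lengths)

# Fold positions onto the alternating pattern 1 2 1 2 ...
df$odd_even <- (pos - 1) %% 2 + 1
```

If subjects can be interleaved, ave(seq_along(df$subj), df$subj, FUN = seq_along) gives the within-group position without the contiguity assumption.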

Filling “implied missing values” in a data frame that has varying observations per time unit

Question: I have a large dataset with spatiotemporal data. Each set of coordinates is associated with an id (a player id in a computer game). Unfortunately, the coordinates for each id aren't logged at every time unit. If a reading is not available for a specific id at time stamp x, then that row was omitted from the dataset entirely rather than logged as NA. I would like to have exactly as many observations per time unit as there are unique ids (i.e. inserting "implied missing NAs"). On time
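The implied rows can be made explicit by building the full grid of time units and ids and left-joining the observed data onto it, so every missing combination comes back as NA. The sketch below uses invented toy data with columns time, id, x and y (assumed names).

```r
# Toy data (invented): id 2 has no reading at time 1, id 1 none at time 3
obs <- data.frame(
  time = c(1, 2, 2, 3),
  id   = c(1, 1, 2, 2),
  x    = c(10, 11, 20, 21),
  y    = c(5, 6, 7, 8)
)

# Every combination of observed time unit and unique id
grid <- expand.grid(time = sort(unique(obs$time)), id = sort(unique(obs$id)))

# Left join: combinations missing from obs get NA coordinates
full <- merge(grid, obs, by = c("time", "id"), all.x = TRUE)
full <- full[order(full$time, full$id), ]
```

This only covers time units that appear somewhere in the data; if whole time stamps can be absent, building the grid from seq(min(obs$time), max(obs$time)) instead would fill those too.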

Appending a row of sums for each level of a factor

Question: I want to append a row of sums for each Reg, like this:

    Reg Res Pop
    1 Total 1000915
    2 A Urban 500414
    3 A Rural 500501
    4 Total 999938
    5 B Urban 499922
    6 B Rural 500016
    7 Total 1000912
    8 C Urban 501638
    9 C Rural 499274
    10 Total 999629
    11 D Urban 499804
    12 D Rural 499825
    13 Total 1000303
    14 E Urban 499917
    15 E Rural 500386

An MWE is below:

    Reg <- rep(LETTERS[1:5], each = 2)
    Res <- rep(c("Urban", "Rural"), times = 5)
    set.seed(12345)
    Pop <- rpois(n = 10, lambda = 500000)
    df <- data.frame(Reg, Res, Pop)
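A split-apply-combine sketch in base R: split the data by Reg, prepend a one-row data frame holding the group sum, and bind the pieces back together. The desired output does not make clear which column the "Total" label sits in, so this assumes it goes in Res and that the total row keeps its region letter in Reg (blank it afterwards if preferred).

```r
Reg <- rep(LETTERS[1:5], each = 2)
Res <- rep(c("Urban", "Rural"), times = 5)
set.seed(12345)
Pop <- rpois(n = 10, lambda = 500000)
df <- data.frame(Reg, Res, Pop, stringsAsFactors = FALSE)

# For each region, put a "Total" row (sum of Pop) in front of its rows.
# Assumption: the total row keeps the region letter in Reg and "Total" in Res.
pieces <- lapply(split(df, df$Reg), function(g) {
  total <- data.frame(Reg = g$Reg[1], Res = "Total", Pop = sum(g$Pop),
                      stringsAsFactors = FALSE)
  rbind(total, g)
})

# Stack the regions back together and reset the row names to 1..n
out <- do.call(rbind, pieces)
rownames(out) <- NULL
```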

Working with dataframes in a list: Drop variables, add new ones

Question: Define a list dats with two data frames, df1 and df2:

    dats <- list(
      df1 = data.frame(a=sample(1:3), b = sample(11:13)),
      df2 = data.frame(a=sample(1:3), b = sample(11:13)))

    > dats
    $df1
      a  b
    1 2 12
    2 3 11
    3 1 13

    $df2
      a  b
    1 3 13
    2 2 11
    3 1 12

I would like to drop variable a in each data frame. Next, I would like to add a variable with the id of each data frame, taken from an external data frame like:

    ids <- data.frame(id=c("id1","id2"),df=c("df1","df2"))

    > ids
       id  df
    1 id1 df1
    2 id2 df2

To drop unnecessary
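Both steps can be done without explicit loops: lapply() applies the column drop to every data frame, and Map() walks the data frames together with their names so each one can look up its id in ids by matching the list name against ids$df. A minimal sketch, assuming the list names always appear in ids$df.

```r
dats <- list(
  df1 = data.frame(a = sample(1:3), b = sample(11:13)),
  df2 = data.frame(a = sample(1:3), b = sample(11:13))
)
ids <- data.frame(id = c("id1", "id2"), df = c("df1", "df2"),
                  stringsAsFactors = FALSE)

# Drop column 'a' from every data frame (drop = FALSE keeps a data frame)
dats <- lapply(dats, function(d) d[, setdiff(names(d), "a"), drop = FALSE])

# Add each data frame's id, looked up via its name in the list
dats <- Map(function(d, nm) {
  d$id <- ids$id[match(nm, ids$df)]
  d
}, dats, names(dats))
```

Map() keeps the names of its first argument, so the result is still addressable as dats$df1 and dats$df2.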

Insert a missing category for each group in a pandas DataFrame

Question: I need to insert a missing category for each group. Here is an example:

    import pandas as pd
    import numpy as np
    df = pd.DataFrame({
        "group": [1, 1, 1, 2, 2],
        "cat": ['a', 'b', 'c', 'a', 'c'],
        "value": range(5),
        "value2": np.array(range(5)) * 2})
    df  # test dataframe

    cat  group  value  value2
    a    1      0      0
    b    1      1      2
    c    1      2      4
    a    2      3      6
    c    2      4      8

Say I have some categories = ['a', 'b', 'c', 'd']. If the cat column does not contain a category from the list, I would like to insert a row for each group with value 0. how to