data-wrangling

R Help converting factor data from long to wide and assigning logical value

别来无恙 submitted on 2021-02-17 05:54:28
Question: I have data in long format as seen below:

Data:

    id  code
    1   EP
    2   EP
    3   EP
    4   UM
    5   UM
    1   UM
    2   UM
    10  UM
    6   BZ
    7   BZ
    14  BZ
    2   BZ
    8   TVOL
    9   TVOL
    16  TVOL
    10  NW
    11  NW
    7   NW
    12  SM
    13  SM
    3   SM
    14  GS
    15  GS
    1   GS
    2   GS
    9   GS

I would like to create a wide dataframe with each "code" as its own column, marked TRUE/FALSE depending on whether there's an associated "id", as seen in the minimal example below:

    id  code.EP  code.UM  code.BZ  code.TVOL  code.NW  code.SM  code.GS
    1   TRUE     TRUE     FALSE    FALSE      FALSE    FALSE    TRUE
    2   TRUE     FALSE

Is there a way to build a pairwise data frame based on shared values in another data frame in R?

感情迁移 submitted on 2021-02-16 18:24:27
Question: For example, DF1 is:

    Id1  Id2
    1    10
    2    10
    3    7
    4    7
    5    10

And I want DF2:

    Id1  Id2
    1    2
    1    5
    2    5
    3    4

The data frame DF2 is a pairwise set of values from the Id1 column of DF1 that share a common value in Id2 of DF1. My attempt:

    temp <- do.call("cbind", split(DF1, rep(c(1,2), length.out = nrow(DF1))))
    (DF2 <- temp %>% select("1.Id1", "2.Id2"))

But this does not generate a pairwise data frame:

    Id1  Id2
    1    2
    3    4

Answer 1: Here is another tidyverse method using full_join.

    library(dplyr)
    library(purrr)
    dat2 <- dat %>%

Percentage of factor levels by group in R [duplicate]

我们两清 submitted on 2021-02-08 10:25:10
Question: This question already has answers here: Relative frequencies / proportions with dplyr (9 answers); Extend contigency table with proportions (percentages) (6 answers). Closed 7 months ago.

I am trying to calculate the percentage of the different levels of a factor within a group. I have nested data and would like to see what percentage of schools in each country are private schools (a factor with 2 levels). However, I cannot figure out how to do that.

    # my data:
    CNT <- c("A", "A", "A", "A", "A", "B"

Preparing an aggregate dataframe for publication

◇◆丶佛笑我妖孽 submitted on 2021-02-02 09:36:25
Question: I have a Pandas aggregate dataframe like this:

    import pandas as pd
    agg_df = pd.DataFrame({'v1': ['item', 'item', 'item', 'item', 'location', 'status', 'status'],
                           'v2': ['bed', 'lamp', 'candle', 'chair', 'home', 'new', 'used'],
                           'count': ['2', '2', '2', '1', '7', '4', '3']})
    agg_df

I want to prepare it for academic publication and I need a new dataframe like this:

    # item      bed     2
    #           lamp    2
    #           candle  2
    #           chair   1
    # location  home    7
    # status    new     4
    #           used    3

How can I create such a dataframe?

Answer 1: For
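The answer text is cut off in this listing, so the following is only a minimal sketch of one possible approach, not the approach given in the original answer. It assumes the goal is to show each `v1` label once per group and leave it blank on the following rows; the name `pub_df` is hypothetical and introduced here only for illustration.

```python
import pandas as pd

# Same aggregate dataframe as in the question.
agg_df = pd.DataFrame({
    'v1': ['item', 'item', 'item', 'item', 'location', 'status', 'status'],
    'v2': ['bed', 'lamp', 'candle', 'chair', 'home', 'new', 'used'],
    'count': ['2', '2', '2', '1', '7', '4', '3'],
})

# Work on a copy so the original aggregate stays untouched.
# `pub_df` is a hypothetical name, not taken from the question.
pub_df = agg_df.copy()

# Blank out a v1 label whenever it equals the label on the row directly above.
# shift() yields NaN for the first row, so the first label is always kept.
repeated = pub_df['v1'].eq(pub_df['v1'].shift())
pub_df['v1'] = pub_df['v1'].mask(repeated, '')

# Print without the index or header, close to the layout shown in the question.
print(pub_df.to_string(index=False, header=False))
```

This relies on rows of the same group being adjacent, as they are in the sample data. If they might not be, sorting by `v1` first, or using `agg_df.set_index(['v1', 'v2'])` to get a hierarchical index that pandas displays with repeated labels suppressed, may be a better fit.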

Preparing an aggregate dataframe for publication

那年仲夏 submitted on 2021-02-02 09:35:54
Question: I have a Pandas aggregate dataframe like this:

    import pandas as pd
    agg_df = pd.DataFrame({'v1': ['item', 'item', 'item', 'item', 'location', 'status', 'status'],
                           'v2': ['bed', 'lamp', 'candle', 'chair', 'home', 'new', 'used'],
                           'count': ['2', '2', '2', '1', '7', '4', '3']})
    agg_df

I want to prepare it for academic publication and I need a new dataframe like this:

    # item      bed     2
    #           lamp    2
    #           candle  2
    #           chair   1
    # location  home    7
    # status    new     4
    #           used    3

How can I create such a dataframe?

Answer 1: For

accelerating code execution in R to provide counting of all possible combinations of events belonging to certain ID

戏子无情 submitted on 2021-01-28 07:37:21
Question: I have a data set that has 3 columns (ID, D, AE).

    sample = data.frame(
      ID = c(1, 1, 1, 2, 2, 2),
      D  = c('a', 'b', 'c', 'a', 'c', 'c'),
      AE = c('m', 'x', 'w', 'y', 'm', 'f')
    )

I want to count the number of IDs for every possible combination made up of any two drugs within a certain ID and the AEs corresponding to that ID. Please see the image to understand exactly what I mean. Someone could help me with a code that worked perfectly on the small dataset

How can I concatenate the rows in a pyspark dataframe with multiple columns using groupby and aggregate

ぃ、小莉子 submitted on 2020-07-10 03:11:13
Question: I have a pyspark dataframe with multiple columns, for example the one below.

    from pyspark.sql import Row
    l = [('Jack',"a","p"),('Jack',"b","q"),('Bell',"c","r"),('Bell',"d","s")]
    rdd = sc.parallelize(l)
    score_rdd = rdd.map(lambda x: Row(name=x[0], letters1=x[1], letters2=x[2]))
    score_card = sqlContext.createDataFrame(score_rdd)

    +----+--------+--------+
    |name|letters1|letters2|
    +----+--------+--------+
    |Jack|       a|       p|
    |Jack|       b|       q|
    |Bell|       c|       r|
    |Bell|       d|       s|
    +----+--------+--------+

Now I want to

Reshaping data.frame with a by-group where id variable repeats [duplicate]

蹲街弑〆低调 submitted on 2020-06-09 05:37:25
Question: This question already has answers here: How to reshape data from long to wide format (11 answers). Closed 21 days ago.

I want to reshape/rearrange a dataset that is stored as a data.frame with 2 columns:

    id (non-unique, i.e. can repeat over several rows) --> stored as character
    value --> stored as numeric value (range 1:3)

Sample data:

    id <- as.character(1001:1003)
    val_list <- data.frame(sample(1:3, size=12, replace=TRUE))
    have <- data.frame(cbind(rep(id, 4), val_list))
    colnames(have) <- c