data-wrangling | 易学教程

R Help converting factor data from long to wide and assigning logical value

Question: I have data in long format, as seen below:

Data:

    id code
    1  EP
    2  EP
    3  EP
    4  UM
    5  UM
    1  UM
    2  UM
    10 UM
    6  BZ
    7  BZ
    14 BZ
    2  BZ
    8  TVOL
    9  TVOL
    16 TVOL
    10 NW
    11 NW
    7  NW
    12 SM
    13 SM
    3  SM
    14 GS
    15 GS
    1  GS
    2  GS
    9  GS

I would like to create a wide data frame with each "code" as its own column, marked TRUE/FALSE depending on whether there is an associated "id", as in the minimal example below:

    id code.EP code.UM code.BZ code.TVOL code.NW code.SM code.GS
    1  TRUE    TRUE    FALSE   FALSE     FALSE   FALSE   TRUE
    2  TRUE    FALSE
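The answers to this thread are not included here. As a rough illustration of the technique in pandas (swapped in for R, so this is not the thread's solution), cross-tabulating id against code gives one column per code, and comparing the counts with zero turns them into TRUE/FALSE flags. The `code.` prefix and column order simply mirror the asker's example; the variable names are hypothetical.

```python
import pandas as pd

# Long-format (id, code) pairs, copied from the question
long_df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 1, 2, 10, 6, 7, 14, 2, 8, 9, 16,
           10, 11, 7, 12, 13, 3, 14, 15, 1, 2, 9],
    "code": ["EP"] * 3 + ["UM"] * 5 + ["BZ"] * 4 + ["TVOL"] * 3
            + ["NW"] * 3 + ["SM"] * 3 + ["GS"] * 5,
})

# Count each (id, code) pair, then turn "count > 0" into a logical flag
wide = pd.crosstab(long_df["id"], long_df["code"]).gt(0)

# crosstab sorts columns alphabetically; restore first-appearance order and add the prefix
wide = wide.reindex(columns=long_df["code"].unique()).add_prefix("code.")
wide.columns.name = None
print(wide.head())
```

With the data as posted, id 2 gets TRUE under code.UM as well, because the pair `2 UM` appears in the long table, whereas the asker's example row shows FALSE there.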

Is there a way to build a pairwise data frame based on shared values in another data frame in R?

Question: For example, DF1 is:

    Id1 Id2
    1   10
    2   10
    3   7
    4   7
    5   10

and I want DF2:

    Id1 Id2
    1   2
    1   5
    2   5
    3   4

DF2 holds every pair of Id1 values from DF1 that share a common Id2 value.

My attempt:

    temp <- do.call("cbind", split(DF1, rep(c(1,2), length.out = nrow(DF1))))
    (DF2 <- temp %>% select("1.Id1", "2.Id2"))

But this does not generate a pairwise data frame:

    Id1 Id2
    1   2
    3   4

Answer 1: Here is another tidyverse method using full_join.

    library(dplyr)
    library(purrr)
    dat2 <- dat %>%
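The answer's tidyverse code is cut off above. A comparable idea in pandas (swapped in for illustration, not the answer's full_join code) is a self-join of DF1 on Id2: every Id1 gets paired with every Id1 in the same group, and keeping only rows where the first Id1 is smaller drops self-pairs and mirrored duplicates. This assumes Id1 values are unique within each group and can be ordered; the suffixes `_a`/`_b` are arbitrary.

```python
import pandas as pd

df1 = pd.DataFrame({"Id1": [1, 2, 3, 4, 5], "Id2": [10, 10, 7, 7, 10]})

# Self-join on the shared key: all Id1 combinations within each Id2 group
pairs = df1.merge(df1, on="Id2", suffixes=("_a", "_b"))

# Keep each unordered pair once (drops (x, x) and the mirrored (y, x))
pairs = pairs[pairs["Id1_a"] < pairs["Id1_b"]]

df2 = (pairs[["Id1_a", "Id1_b"]]
       .rename(columns={"Id1_a": "Id1", "Id1_b": "Id2"})
       .sort_values(["Id1", "Id2"])
       .reset_index(drop=True))
print(df2)
```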

Percentage of factor levels by group in R [duplicate]

Question: This question already has answers here: Relative frequencies / proportions with dplyr (9 answers); Extend contigency table with proportions (percentages) (6 answers). Closed 7 months ago.

I am trying to calculate the percentage of each level of a factor within a group. I have nested data and would like to see, for each country, the percentage of schools that are private schools (a factor with 2 levels). However, I cannot figure out how to do that.

    # my data:
    CNT <- c("A", "A", "A", "A", "A", "B"
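Since the question's data is cut off, here is a pandas sketch (a swapped-in language, not the linked dplyr answers) on made-up values: a cross-tabulation normalized by row gives the share of each factor level within each country. The `private` column and its `yes`/`no` levels are hypothetical.

```python
import pandas as pd

# Hypothetical data: the question's vectors are cut off, so these values are made up
schools = pd.DataFrame({
    "CNT": ["A", "A", "A", "A", "A", "B", "B", "B"],
    "private": ["yes", "no", "no", "yes", "no", "no", "no", "yes"],
})

# normalize="index" divides each row by its total, i.e. shares within each country
pct = pd.crosstab(schools["CNT"], schools["private"], normalize="index") * 100
print(pct.round(1))
```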

Preparing an aggregate dataframe for publication

Question: I have a Pandas aggregate dataframe like this:

    import pandas as pd
    agg_df = pd.DataFrame({'v1':['item', 'item', 'item', 'item', 'location', 'status', 'status'],
                           'v2' :['bed', 'lamp', 'candle', 'chair', 'home', 'new', 'used' ],
                           'count':['2', '2', '2', '1', '7', '4', '3' ]})
    agg_df

I want to prepare it for academic publication, and I need a new dataframe like this:

    # item      bed     2
    #           lamp    2
    #           candle  2
    #           chair   1
    # location  home    7
    # status    new     4
    #           used    3

How can I create such a dataframe?

Answer 1: For
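The answer is cut off above. One possible approach, sketched here as an assumption rather than the answer's code, blanks out a `v1` label whenever it repeats the row directly above, which yields the grouped look of the target table; comparing with `shift()` instead of using `duplicated()` keeps a label that reappears later in a non-adjacent block.

```python
import pandas as pd

agg_df = pd.DataFrame({
    "v1": ["item", "item", "item", "item", "location", "status", "status"],
    "v2": ["bed", "lamp", "candle", "chair", "home", "new", "used"],
    "count": ["2", "2", "2", "1", "7", "4", "3"],
})

pub = agg_df.copy()
# The counts are stored as strings in the question; make them numeric
pub["count"] = pub["count"].astype(int)
# Blank a v1 label when it equals the label on the previous row
pub["v1"] = pub["v1"].mask(pub["v1"].eq(pub["v1"].shift()), "")
print(pub.to_string(index=False, header=False))
```

An alternative is `agg_df.set_index(["v1", "v2"])`, whose MultiIndex display already hides repeated outer labels.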

accelerating code execution in R to provide counting of all possible combinations of events belonging to certain ID

Question: I have a data set with 3 columns (ID, D, AE).

    sample=data.frame(
      ID=c(1,1,1,2,2,2),
      D=c('a','b','c','a','c','c'),
      AE=c('m','x','w','y','m','f')
    )

For every possible combination of any two drugs within a given ID with the AEs corresponding to that ID, I want to count the number of IDs in which that combination occurs. Please see the image to understand exactly what I mean. Someone was able to help me with code that worked perfectly on the small dataset
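The referenced image is missing, so the exact target is uncertain. Under the assumption that a "combination" means one unordered pair of distinct drugs in an ID plus one AE reported for that ID, this pandas sketch (a swapped-in language, not the code mentioned in the question) builds the set of combinations per ID and counts how many IDs contain each one. The output column names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product

import pandas as pd

sample = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2],
    "D": ["a", "b", "c", "a", "c", "c"],
    "AE": ["m", "x", "w", "y", "m", "f"],
})

counts = Counter()
for _, grp in sample.groupby("ID"):
    drugs = sorted(grp["D"].unique())   # distinct drugs for this ID, in a fixed order
    aes = grp["AE"].unique()            # AEs reported for this ID
    # A set, so each (drug pair, AE) combination counts at most once per ID
    counts.update(set(product(combinations(drugs, 2), aes)))

result = pd.DataFrame(
    [(d1, d2, ae, n) for ((d1, d2), ae), n in counts.items()],
    columns=["drug1", "drug2", "AE", "n_ids"],
).sort_values(["drug1", "drug2", "AE"]).reset_index(drop=True)
print(result)
```

The work per ID grows with (number of drug pairs) × (number of AEs), so IDs with many drugs dominate the run time.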

Check if values of one dataframe exist in another dataframe in exact order

Source: https://stackoverflow.com/questions/63078838/check-if-values-of-one-dataframe-exist-in-another-dataframe-in-exact-order

How can I concatenate the rows in a pyspark dataframe with multiple columns using groupby and aggregate

Question: I have a pyspark dataframe with multiple columns, for example the one below.

    from pyspark.sql import Row
    # Assumes sc (SparkContext) and sqlContext (SQLContext) already exist, e.g. in an interactive pyspark shell
    l = [('Jack',"a","p"),('Jack',"b","q"),('Bell',"c","r"),('Bell',"d","s")]
    rdd = sc.parallelize(l)
    score_rdd = rdd.map(lambda x: Row(name=x[0], letters1=x[1], letters2=x[2]))
    score_card = sqlContext.createDataFrame(score_rdd)

    +----+--------+--------+
    |name|letters1|letters2|
    +----+--------+--------+
    |Jack|       a|       p|
    |Jack|       b|       q|
    |Bell|       c|       r|
    |Bell|       d|       s|
    +----+--------+--------+

Now I want to
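The question is cut off, so the desired output is unknown. In PySpark the usual building blocks for this are `DataFrame.groupBy` combined with `collect_list` and `concat_ws` from `pyspark.sql.functions`. To keep the example self-contained, here is the same grouping logic as a pandas analogue (swapped in, not PySpark code); the `","` separator is an assumption.

```python
import pandas as pd

# Same rows as the question's score_card, as a pandas DataFrame
score_card = pd.DataFrame({
    "name": ["Jack", "Jack", "Bell", "Bell"],
    "letters1": ["a", "b", "c", "d"],
    "letters2": ["p", "q", "r", "s"],
})

# Join the values of every non-key column within each name, keeping first-seen group order
concatenated = (score_card
                .groupby("name", sort=False)[["letters1", "letters2"]]
                .agg(",".join)
                .reset_index())
print(concatenated)
```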

Reshaping data.frame with a by-group where id variable repeats [duplicate]

Question: This question already has answers here: How to reshape data from long to wide format (11 answers). Closed 21 days ago.

I want to reshape/rearrange a dataset that is stored as a data.frame with 2 columns:

- id (non-unique, i.e. can repeat over several rows) --> stored as character
- value --> stored as numeric value (range 1:3)

Sample data:

    id <- as.character(1001:1003)
    val_list <- data.frame(sample(1:3, size=12, replace=TRUE))
    have <- data.frame(cbind(rep(id, 4), val_list))
    colnames(have) <- c
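The target layout is cut off; assuming one row per id with one column per occurrence, the standard trick for a repeating id is to number each id's rows first, so every (id, occurrence) pair is unique and a plain pivot works. This is a pandas sketch (swapped in for R) with fixed stand-in values, since the question draws them at random; the `value` and `occurrence` column names are hypothetical.

```python
import pandas as pd

# Fixed stand-in values: the question draws them with sample(), so they are random there
have = pd.DataFrame({
    "id": [str(i) for i in range(1001, 1004)] * 4,   # same order as rep(id, 4) in R
    "value": [1, 3, 2, 2, 2, 1, 3, 1, 3, 1, 2, 2],
})

# Number the repeats of each id (1, 2, 3, ...) so each (id, occurrence) pair is unique
have["occurrence"] = have.groupby("id").cumcount() + 1

wide = have.pivot(index="id", columns="occurrence", values="value").add_prefix("value_")
print(wide)
```

In R, the same idea is usually a per-id row counter followed by a long-to-wide reshape, as in the linked duplicate.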