reshape | 易学教程

Aggregate adjacent rows, ignoring certain columns

Question: I have a df like the one below:

    > head(df)
      OrderId           Timestamp ErrorCode
    1 3000000 1455594300434609920        NA
    2 3000001 1455594300434614272        NA
    3 3000000 1455594300440175104         0
    4 3000001 1455594300440179712         0
    5 3000002 1455594303468741120        NA
    6 3000002 1455594303469326848         0

I need to collapse the rows so that the output looks something like this:

    > head(df)
    OrderId Timestamp1          Timestamp2          ErrorCode Diff
    3000000 1455594300434609920 1455594300440175104 0
    3000001 1455594300434614272 1455594300440179712 0
    3000002

reshape r dataframe to wide format

Question: Is there a simple way to reshape this

    id date
    A  Jan 2012
    B  Jan 2012
    C  Jan 2012
    A  Feb 2012
    B  Feb 2012
    A  Mar 2012
    B  Mar 2012

into

    id Jan 2012 Feb 2012 Mar 2012
    A  T        T        T
    B  T        T        T
    C  T        F        F

dcast and reshape require an aggregate function that I don't think I need (?)

Answer 1: Using dcast as you suggest...

    # Please provide reproducible data next time!
    set.seed(123)
    dt <- data.frame( id = rep(c("A","B","C"),3 ), date = sample( month.name[1:3] , 9 , replace = TRUE ) , stringsAsFactors = FALSE )
    # id date
    #1 A

Speed up reshape/ not use reshape Matlab

Question: I have this operation, which is called multiple times:

    longRowVector;
    matrix = reshape(longRowVector, n, n)';
    answer = matrix(:);

This operation using reshape is slow. Is there a way to get to answer without using reshape?

Answer 1: There is no easy way to speed that up. If n exceeds a certain number (defined by your relevant cache size), the order of the memory accesses during the transpose operator becomes the problem. The cost is actually incurred in the transpose operation. Below I plot this cost
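The reshape, transpose, flatten sequence in the question amounts to a fixed permutation of the vector's elements. The following is a minimal NumPy sketch of that equivalence, not the answerer's method. The function names `transpose_flatten` and `transpose_permutation` are hypothetical, and the sketch assumes the vector has exactly n*n elements. It makes no speed claim: gathering through a precomputed index touches memory in the same strided order as the transpose does.

```python
import numpy as np

def transpose_flatten(v, n):
    # Mirror of the MATLAB code: reshape column-major, transpose,
    # then flatten column-major (what matrix(:) does).
    m = np.reshape(v, (n, n), order="F").T
    return m.ravel(order="F")

def transpose_permutation(n):
    # Index vector idx such that v[idx] equals transpose_flatten(v, n).
    # It depends only on n, so it can be computed once and reused.
    return np.arange(n * n).reshape(n, n).T.ravel()

v = np.arange(1, 10)
idx = transpose_permutation(3)
print(v[idx])
print(transpose_flatten(v, 3))
```

If the operation runs many times with the same n, the index vector only needs to be built once; whether that helps in practice would have to be measured.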

Combine two data frames with different number of rows in R [duplicate]

Question: This question already has answers here: How to join (merge) data frames (inner, outer, left, right) (13 answers). Closed 3 years ago.

I have two data frames, link and body. link is like this:

    wpt     ID
    1       1235
    mediate 4562
    mediate 0928
    2       6351
    3       3826
    mediate 0835

body is like this:

    wpt fuel distance
    1   2221 53927
    2   4821 48261
    3   8362 47151

The output I expect is like this:

    wpt     fuel distance ID
    1       2221 53927    1235
    mediate NA   NA       4562
    mediate NA   NA       0928
    2       4821 48261    6351
    3       8362 47151    3826
    mediate NA

How to find common variables in a list of datasets & reshape them in R?

Question:

    setwd("C:\\Users\\DATA")
    temp = list.files(pattern="*.dta")
    for (i in 1:length(temp)) assign(temp[i], read.dta13(temp[i], nonint.factors = TRUE))
    grep(pattern="_m", temp, value=TRUE)

Here I create a list of my datasets and read them into R. I then attempt to use grep to find all variable names with the pattern _m. Obviously this doesn't work, because it simply returns all filenames with the pattern _m. So essentially what I want is for my code to loop through the list of databases, find

Gather connected IDs across different rows of data frame

Question: Given an R data frame like this:

    DF.a <- data.frame(ID1 = c("A","B","C","D","E","F","G","H"),
                       ID2 = c("D",NA,"G",NA,NA,NA,"H",NA),
                       ID3 = c("F",NA,NA,NA,NA,NA,NA,NA))
    > DF.a
      ID1  ID2  ID3
    1   A    D    F
    2   B <NA> <NA>
    3   C    G <NA>
    4   D <NA> <NA>
    5   E <NA> <NA>
    6   F <NA> <NA>
    7   G    H <NA>
    8   H <NA> <NA>

I would like to simplify/reshape it into the following:

    DF.b <- data.frame(ID1 = c("A","B","C","E"),
                       ID2 = c("D",NA,"G",NA),
                       ID3 = c("F",NA,"H",NA))
    > DF.b
      ID1  ID2  ID3
    1   A    D    F
    2   B <NA> <NA>
    3   C    G    H
    4   E <NA> <NA>
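One way to read the desired result is as grouping IDs into connected components, where two IDs are connected if they share a row. The sketch below illustrates that reading with a small union-find in plain Python; it is an assumption about the intent, not an answer taken from the original thread. The function name `gather_connected` is hypothetical, and `None` stands in for R's NA.

```python
def gather_connected(rows):
    """Group IDs that are linked, directly or transitively, by sharing a row."""
    parent = {}

    def find(x):
        # Return the representative of x's group, compressing the path.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    order = []  # IDs in order of first appearance
    for row in rows:
        ids = [v for v in row if v is not None]
        for v in ids:
            if v not in parent:
                order.append(v)
            find(v)
        # Link every ID in the row to the row's first ID.
        for other in ids[1:]:
            root_a, root_b = find(ids[0]), find(other)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for v in order:
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())


rows = [
    ("A", "D", "F"), ("B", None, None), ("C", "G", None), ("D", None, None),
    ("E", None, None), ("F", None, None), ("G", "H", None), ("H", None, None),
]
for group in gather_connected(rows):
    print(group)
```

Each returned group corresponds to one row of DF.b, with members listed in order of first appearance; padding shorter groups with NA would give the wide layout.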

Multiplying Mat matrices using reshape, Mat type issue in OpenCV

Question: I'm trying to implement color conversion from RGB to LMS and back from LMS to RGB, using reshape for the matrix multiplication, following the answer to this question: Fastest way to apply color matrix to RGB image using OpenCV 3.0? My ori Mat object comes from an image with 3 channels (RGB), and I need to multiply it by a 1-channel matrix (lms); it seems I have an issue with the matrix type. I've read the reshape docs and questions related to this issue, like Issues multiplying Mat matrices, and I

Reshaping data in CSV to multiple columns

Question: 0 19 1 19 2 19 3 19 How can I change the above CSV data in Python to - 0 19 1 19 2 19 3 19 Now I need help with reshaping my dataset, which looks like this - 0 100 1 100 2 100 3 100 4 100 5 100 6 200 7 200 8 200 9 200 0 200 1 200 ..... I want to reshape my dataset into the following format - 0 100 1 100 2 100 3 100 4 100 5 100 .. 6 200 7 200 8 200 9 200 0 200 1 200 ...

Answer 1: from io import StringIO txt = """0 19 1 19 2 19 3 19 """ df = pd.read_csv(StringIO(txt),header=None,sep=' ') df=df.dropna

Broadcasting a 1D array to a particular dimension of a varying nD array via .reshape(generator)

Question: I have a large nD matrix of shape (2,2,2,...n) whose number of dimensions often varies. I also receive incoming data that is always a 1D array of shape (2,). I want to multiply the nD matrix by the 1D array via reshape, and I also have an 'index' naming the particular dimension I want to broadcast over and modify. Thus I'm doing the following (within a loop): matrix_nd *= array_1d.reshape(1 if i!=index else dimension for i, dimension in enumerate(matrix_nd
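NumPy's `reshape` expects ints or a tuple of ints, so one plausible fix is to materialize the generator into a tuple first. The sketch below assumes that is the obstacle; the question is truncated, so this is a guess at the intent rather than the thread's accepted answer. The function name `scale_along_axis` is hypothetical.

```python
import numpy as np

def scale_along_axis(matrix_nd, array_1d, index):
    # Build a shape that is 1 everywhere except at `index`, where it keeps
    # the matrix's own dimension, e.g. (1, 2, 1) for index=1 in a 3D matrix.
    shape = tuple(
        dimension if i == index else 1
        for i, dimension in enumerate(matrix_nd.shape)
    )
    # Broadcasting stretches the size-1 axes, so the 1D array is applied
    # along axis `index` only. The multiplication happens in place.
    matrix_nd *= array_1d.reshape(shape)
    return matrix_nd


m = np.ones((2, 2, 2))
scale_along_axis(m, np.array([2.0, 3.0]), index=1)
print(m[:, 0, :])  # entries along axis 1, position 0
print(m[:, 1, :])  # entries along axis 1, position 1
```

Because the shape is derived from `matrix_nd.shape` on every call, the same function works when the number of dimensions changes between iterations.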

dcast changes content of dataframe

Question: I tried using the reshape package to reshape a dataframe, but when I use it, numbers in the dataframe are changed that should not be. The dataframe contains several variables, each measured multiple times: for each person there are 6 rows, one for each time that person was measured. Now I want to reshape the dataframe so there is only one row for each person instead of 6, which means every variable should appear 6 times (once for every measurement),