data-manipulation | 易学教程

Using R to insert a value for missing data with a value from another data frame

Question: All, I have a question that I fear might be too pedestrian to ask here, but searching for it elsewhere keeps leading me astray; I may not be using the right search terms. I have a panel data frame (country-year) in R with some missing values in a given variable. I'm trying to impute them with the values from a vector in another data frame. Here's an illustration of what I am trying to do. Assume Data is the data frame of interest, which has missing values in a given vector that I'm trying
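A minimal sketch of the lookup-and-fill logic, written in plain Python rather than R so it runs anywhere. It assumes hypothetical rows keyed by `country` and `year` with a `value` column, where `None` stands in for R's NA; each missing value is looked up in a second (donor) table on the same keys. In R the same idea is commonly expressed as a merge/join followed by replacing NA with the looked-up column.

```python
# Hypothetical panel rows; None marks a missing value (NA in R).
data = [
    {"country": "A", "year": 2000, "value": 1.5},
    {"country": "A", "year": 2001, "value": None},
    {"country": "B", "year": 2000, "value": None},
]
# Hypothetical donor table holding replacement values for the same keys.
donor = [
    {"country": "A", "year": 2001, "value": 2.0},
    {"country": "B", "year": 2000, "value": 3.5},
]

def fill_missing(rows, donor_rows, keys=("country", "year"), col="value"):
    """Return copies of rows where a missing `col` is taken from donor_rows."""
    lookup = {tuple(d[k] for k in keys): d[col] for d in donor_rows}
    filled = []
    for r in rows:
        r = dict(r)  # copy so the caller's rows are not mutated
        if r[col] is None:
            # Stays None if the donor table has no row for this key.
            r[col] = lookup.get(tuple(r[k] for k in keys))
        filled.append(r)
    return filled

result = fill_missing(data, donor)
print([r["value"] for r in result])  # [1.5, 2.0, 3.5]
```

Observed values are never overwritten; only cells that are missing in the original table are filled.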

Reordering columns in data frame once again

Question: I want to reorder the columns of my data frame, but what I have found so far is not satisfactory. My data frame looks like this:

    cnt <- as.factor(c("Country 1", "Country 2", "Country 3", "Country 1", "Country 2", "Country 3"))
    bnk <- as.factor(c("bank 1", "bank 2", "bank 3", "bank 1", "bank 2", "bank 3"))
    mayData <- data.frame(age=c(10,12,13,10,11,15), Country=cnt, Bank=bnk, q10=c(1,1,1,2,2,2), q11=c(1,1,1,2,2,2), q1=c(1,1,1,2,2,2), q9=c(1,1,1,2,2,2), q6=c(1,1,1,2,2,2), year=c(1950,1960,1970,1980,1990

R: create a data frame out of a rolling window

Question: Let's say I have a data frame with the following structure:

    DF <- data.frame(x = 0:4, y = 5:9)
    > DF
      x y
    1 0 5
    2 1 6
    3 2 7
    4 3 8
    5 4 9

What is the most efficient way to turn 'DF' into a data frame with the following structure:

    w x y
    1 0 5
    1 1 6
    2 1 6
    2 2 7
    3 2 7
    3 3 8
    4 3 8
    4 4 9

where w is a window of length 2 rolling through 'DF'? The window length should be arbitrary; for example, a length of 3 yields:

    w x y
    1 0 5
    1 1 6
    1 2 7
    2 1 6
    2 2 7
    2 3 8
    3 2 7
    3 3 8
    3 4 9

I am a bit stumped by
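The expected output is every length-`width` slice of the rows, stacked, with the 1-based slice number as `w`. A minimal plain-Python sketch of that stacking (the function name `rolling_windows` is hypothetical), using the same `x` and `y` values as 'DF':

```python
def rolling_windows(rows, width):
    """Stack every length-`width` window of `rows`, tagging each copy with window id w (1-based)."""
    out = []
    # There are len(rows) - width + 1 complete windows.
    for start in range(len(rows) - width + 1):
        for row in rows[start:start + width]:
            out.append((start + 1,) + row)
    return out

df = list(zip(range(0, 5), range(5, 10)))  # same as DF: x = 0:4, y = 5:9
for w, x, y in rolling_windows(df, 2):
    print(w, x, y)
```

With `width=3` the same call yields the nine rows of the second table. The output has (n - width + 1) * width rows, so it grows with the window length.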

R: Combine columns based on different information in another column of a dataframe

Question: I'm trying to find an easier way to do the following data manipulation. The data frame looks like this:

        "object"  "Date_In" "Date_out" "label" "room" "test"
    "1" "LEU_A"    6  9     "Up"    "11z"  "c"
    "2" "LEU_A"    1  10    "Down"  "14x"  "c"
    "3" "LEU_B"    6  8     "Up"    "11z"  "a1"
    "4" "LEU_B"    10 13    "Down"  "14x"  "a1"
    "5" "ALL_A"    7  8     "Up"    "11z"  "c"
    "6" "ALL_A"    1  26    "Down"  "1g"   "c"
    "7" "CLMIA_A"  5  15    "Up"    "11z"  "a2"
    "8" "CLMIA_A"  10 10    "Down"  "14x"  "a2"
    "9" "CLMIA_A"  10 12    "Down"  "13w"  "a2"

For all rows with "Up" label

Generating and ordering a variable simultaneously

Question: I would like to avoid reordering the data to place the generated variable in the first column:

    sysuse auto, clear
    gen random = runiform()
    order random

Is it possible to generate a variable and order it at the same time? The idea is to be able to see the generated variable directly when I browse the data in the editor, which is not easy when I have several variables.

Answer 1: You can use the before() option:

    sysuse auto, clear
    generate random = runiform(), before(make)

You can also further

How to create a column/index based on either of two conditions being met (to enable clustering of matched pairs within the same data frame)?

Question: I have a large dataset of matched pairs (id1 and id2) and would like to create an index variable so that I can merge these pairs into rows. The first row would get index 1, and from then on the index increases by 1 unless either id1 or id2 matches any of the values in previous rows; in that case, the previously assigned index should be applied. I have looked for weeks, and most solutions seem to fall short of what I need. Here's some data to replicate what I have: id1 <
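The rule described above can be applied in a single pass by remembering which index each id has already received. A minimal plain-Python sketch under stated assumptions: the pairs are hypothetical, and when id1 and id2 both already carry (different) indices, the sketch keeps id1's index, which the question does not specify.

```python
def pair_index(pairs):
    """Assign a group index to each (id1, id2) row using the sequential rule from the question."""
    seen = {}      # id -> index assigned in an earlier row
    next_idx = 1
    out = []
    for id1, id2 in pairs:
        if id1 in seen:
            idx = seen[id1]          # assumption: id1's earlier index wins
        elif id2 in seen:
            idx = seen[id2]
        else:
            idx = next_idx           # neither id seen before: open a new index
            next_idx += 1
        # Record both ids; ids that already have an index keep it.
        seen.setdefault(id1, idx)
        seen.setdefault(id2, idx)
        out.append(idx)
    return out

pairs = [("a", "b"), ("c", "d"), ("b", "e"), ("f", "c"), ("g", "h")]  # hypothetical data
print(pair_index(pairs))  # [1, 2, 1, 2, 3]
```

This follows the rule literally and does not merge two existing groups when a later row links them. If that transitive merging is needed, a union-find (disjoint-set) structure over the ids is the usual tool.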

Convert a string to a dict, then access key:value pairs? How to access data in a <class 'dict'> in Python?

Question: I am having trouble accessing data inside a dictionary.

System: MacBook (2012)
Python: Python 3.5.1 :: Continuum Analytics, Inc.

I am working with a dask.dataframe created from a CSV file.

Edit: how I got to this point. Assume I start out with a pandas Series:

    df.Coordinates
    130    {u'type': u'Point', u'coordinates': [-43.30175...
    278    {u'type': u'Point', u'coordinates': [-51.17913...
    425    {u'type': u'Point', u'coordinates': [-43.17986...
    440    {u'type': u'Point', u'coordinates': [-51.16376...
    877    {u
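A column read from a CSV usually holds the text of a dict rather than a dict object, so indexing it by key fails. Assuming that is the case here, `ast.literal_eval` from the standard library parses Python literal syntax (including the `u'...'` prefixes shown above) into a real `dict`. The cell value below is hypothetical, with illustrative coordinates that are not from the original data:

```python
import ast

# Hypothetical cell value as read from a CSV: a string, not a dict.
raw = "{u'type': u'Point', u'coordinates': [-43.3, -22.9]}"

point = ast.literal_eval(raw)   # evaluates literals only, never arbitrary code
print(type(point).__name__)     # dict
print(point["type"])            # Point
lon, lat = point["coordinates"]
print(lon, lat)                 # -43.3 -22.9
```

For a whole pandas Series the same function can be applied per value, e.g. `df.Coordinates.apply(ast.literal_eval)`. If the values are already dicts, no parsing is needed and `value['coordinates']` works directly.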

Faster equivalent to group_by %>% expand in R

Question: I am trying to create a sequence of years for multiple IDs in R. My input table has a single row for each ID and gives a Start_year. It looks like this:

    ID Start_year
    01 1999
    02 2004
    03 2015
    04 2007
    etc...

I need to create a table with multiple rows for each ID, showing each year from its Start_year up to 2015. I will then use this to join to another table. So in my example, ID1 would have 17 rows with the years 1999:2015, ID2 would have 12 rows (2004:2015), ID3 would have 1 row (2015), and
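Each ID contributes (2015 - Start_year + 1) rows, so the expansion is a flat loop with no grouping step. A minimal plain-Python sketch using the example IDs above (`expand_years` is a hypothetical name, and 2015 is the end year stated in the question):

```python
def expand_years(start_years, end_year=2015):
    """Return one (ID, year) row for every year from each ID's Start_year through end_year."""
    return [(id_, year)
            for id_, start in start_years
            for year in range(start, end_year + 1)]  # range end is exclusive, hence +1

table = [("01", 1999), ("02", 2004), ("03", 2015), ("04", 2007)]
rows = expand_years(table)
print(len(rows))  # 39 = 17 + 12 + 1 + 9
```

The same idea can be vectorised in R without group_by: with n <- 2015 - Start_year + 1, the expanded columns are rep(ID, n) and rep(Start_year, n) + sequence(n) - 1.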

How to do Group By Rollup in R? (Like SQL)

Question: I have a dataset and I want to compute aggregate values with something like SQL's GROUP BY ROLLUP. Below is a reproducible example. I know aggregate works really well, as explained here, but it is not a satisfactory fit for my case.

    year <- c('2016','2016','2016','2016','2017','2017','2017','2017')
    month <- c('1','1','1','1','2','2','2','2')
    region <- c('east','west','east','west','east','west','east','west')
    sales <- c(100,200,300,400,200,400,600,800)
    df <- data.frame(year,month,region,sales
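GROUP BY ROLLUP(year, month, region) returns the totals for every prefix of the grouping columns: (year, month, region), (year, month), (year) and a grand total. A minimal plain-Python sketch of that semantics on the same vectors, marking rolled-up columns with `None` the way SQL shows NULL (the `rollup` helper is hypothetical):

```python
from collections import defaultdict

year   = ['2016'] * 4 + ['2017'] * 4
month  = ['1'] * 4 + ['2'] * 4
region = ['east', 'west', 'east', 'west'] * 2
sales  = [100, 200, 300, 400, 200, 400, 600, 800]

def rollup(keys, values, n_levels):
    """Sum values for every prefix of the grouping keys, like SQL GROUP BY ROLLUP."""
    totals = defaultdict(int)
    for key, v in zip(keys, values):
        # Level n keeps the first n key columns; level 0 is the grand total.
        for n in range(n_levels, -1, -1):
            totals[key[:n] + (None,) * (n_levels - n)] += v
    return dict(totals)

result = rollup(list(zip(year, month, region)), sales, 3)
for k in sorted(result, key=lambda t: tuple(x is None for x in t)):
    print(k, result[k])
```

For this data the result has 9 rows: four (year, month, region) groups, two (year, month) subtotals, two year subtotals and the grand total of 3000.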