data-manipulation

Using R to insert a value for missing data with a value from another data frame

我的未来我决定 submitted on 2020-01-29 05:28:24
Question: All, I have a question that I fear might be too pedestrian to ask here, but searching for it elsewhere is leading me astray. I may not be using the right search terms. I have a panel data frame (country-year) in R with some missing values on a given variable. I'm trying to impute them with the value from another vector in another data frame. Here's an illustration of what I am trying to do. Assume Data is the data frame of interest, which has missing values on a given vector that I'm trying

Reordering columns in data frame once again

纵然是瞬间 submitted on 2020-01-23 02:43:45
Question: I want to reorder the columns in my data frame, but what I have found so far is not satisfactory. My data frame looks like this:
    cnt <- as.factor(c("Country 1", "Country 2", "Country 3", "Country 1", "Country 2", "Country 3"))
    bnk <- as.factor(c("bank 1", "bank 2", "bank 3", "bank 1", "bank 2", "bank 3"))
    mayData <- data.frame(age=c(10,12,13,10,11,15), Country=cnt, Bank=bnk, q10=c(1,1,1,2,2,2), q11=c(1,1,1,2,2,2), q1=c(1,1,1,2,2,2), q9=c(1,1,1,2,2,2), q6=c(1,1,1,2,2,2), year=c(1950,1960,1970,1980,1990

R: create a data frame out of a rolling window

北城余情 submitted on 2020-01-21 03:23:10
Question: Let's say I have a data frame with the following structure:
    DF <- data.frame(x = 0:4, y = 5:9)
    > DF
      x y
    1 0 5
    2 1 6
    3 2 7
    4 3 8
    5 4 9
What is the most efficient way to turn 'DF' into a data frame with the following structure:
    w x y
    1 0 5
    1 1 6
    2 1 6
    2 2 7
    3 2 7
    3 3 8
    4 3 8
    4 4 9
where w is a length-2 window rolling through the data frame 'DF'? The length of the window should be arbitrary, i.e. a length of 3 yields
    w x y
    1 0 5
    1 1 6
    1 2 7
    2 1 6
    2 2 7
    2 3 8
    3 2 7
    3 3 8
    3 4 9
I am a bit stumped by
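
The windowing rule described in the question can be sketched language-neutrally. The following is a minimal Python illustration, not the R solution the asker is looking for; the function name `rolling_windows` and the tuple representation of rows are assumptions made for this example. Each window of the given width starts one row later than the previous one, and every row in it is tagged with the window number w.

```python
def rolling_windows(rows, width):
    """Return (w, x, y) tuples: every row of each length-`width` window,
    tagged with the 1-based window number w."""
    out = []
    # There are len(rows) - width + 1 full windows.
    for start in range(len(rows) - width + 1):
        for x, y in rows[start:start + width]:
            out.append((start + 1, x, y))
    return out


# Same values as DF in the question: x = 0:4, y = 5:9.
rows = list(zip(range(0, 5), range(5, 10)))

for w, x, y in rolling_windows(rows, 2):
    print(w, x, y)
```

Calling `rolling_windows(rows, 3)` produces the nine-row layout shown for a window of length 3.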

R: Combine columns based on different information in another column of a dataframe

放肆的年华 submitted on 2020-01-17 04:44:06
Question: I'm trying to find an easier way to do the following data manipulation. The data frame looks like this:
        "object"  "Date_In" "Date_out" "label" "room" "test"
    "1" "LEU_A"   6         9          "Up"    "11z"  "c"
    "2" "LEU_A"   1         10         "Down"  "14x"  "c"
    "3" "LEU_B"   6         8          "Up"    "11z"  "a1"
    "4" "LEU_B"   10        13         "Down"  "14x"  "a1"
    "5" "ALL_A"   7         8          "Up"    "11z"  "c"
    "6" "ALL_A"   1         26         "Down"  "1g"   "c"
    "7" "CLMIA_A" 5         15         "Up"    "11z"  "a2"
    "8" "CLMIA_A" 10        10         "Down"  "14x"  "a2"
    "9" "CLMIA_A" 10        12         "Down"  "13w"  "a2"
For all rows with "Up" label

Generating and ordering a variable simultaneously

风格不统一 submitted on 2020-01-15 11:55:07
Question: I would like to avoid re-ordering the data to place the generated variable in the first column:
    sysuse auto, clear
    gen random = runiform()
    order random
Is it possible to generate a variable and order it at the same time? The idea is to be able to see the generated variable directly when I browse the data in the editor, which is not easy when I have several variables.
Answer 1: You can use the before() option:
    sysuse auto, clear
    generate random = runiform(), before(make)
You can also further

How to create a column/index based on either of two conditions being met (to enable clustering of matched pairs within same dataframe)?

眉间皱痕 submitted on 2020-01-14 18:52:32
Question: I have a large dataset of matched pairs (id1 and id2) and would like to create an index variable to enable me to merge these pairs into rows. As such, the first row would be index 1, and from then on the index will increase by 1 unless either id1 or id2 matches any of the values in previous rows. Where this is the case, the previously attributed index should be applied. I have looked for weeks and most solutions seem to fall short of what I need. Here's some data to replicate what I have: id1 <

Convert string to dict, then access key:values??? How to access data in a <class 'dict'> for Python?

让人想犯罪 __ submitted on 2020-01-14 07:29:08
Question: I am having issues accessing data inside a dictionary.
    Sys: Macbook 2012
    Python: Python 3.5.1 :: Continuum Analytics, Inc.
I am working with a dask.dataframe created from a csv.
Edit Question How I got to this point
Assume I start out with a Pandas Series:
    df.Coordinates
    130 {u'type': u'Point', u'coordinates': [-43.30175...
    278 {u'type': u'Point', u'coordinates': [-51.17913...
    425 {u'type': u'Point', u'coordinates': [-43.17986...
    440 {u'type': u'Point', u'coordinates': [-51.16376...
    877 {u
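
As a rough illustration of the task in the title (turning a string into a dict and then reading its keys), here is a minimal sketch using only the standard library. It assumes each cell holds the text of a Python dict literal like the ones printed above. The sample string and the helper name `parse_point` are hypothetical, since the coordinates in the question are truncated, and this is not necessarily the approach the original thread settled on.

```python
import ast

# Hypothetical cell value; the real coordinates are truncated in the question.
raw = "{u'type': u'Point', u'coordinates': [-43.5, -22.9]}"


def parse_point(text):
    """Parse a dict literal stored as text and return its 'coordinates' list."""
    # ast.literal_eval only evaluates literals (dicts, lists, strings, numbers),
    # so it is safer than eval() for untrusted text.
    record = ast.literal_eval(text)
    return record["coordinates"]


coords = parse_point(raw)
print(type(ast.literal_eval(raw)), coords)
```

With a pandas or dask Series of such strings, a function like this could presumably be applied element-wise, but that step is not shown here.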

Faster equivalent to group_by %>% expand in R

[亡魂溺海] submitted on 2020-01-13 13:13:36
Question: I am trying to create a sequence of years for multiple IDs in R. My input table has a single row for each ID and gives a Start_year. It looks like this:
    ID Start_year
    01 1999
    02 2004
    03 2015
    04 2007
    etc...
I need to create a table with multiple rows for each ID, showing each year from their Start_year up to 2015. I will then use this to join to another table. So in my example, ID1 would have 17 rows with the years 1999:2015. ID2 would have 12 rows 2004:2015, ID3 would have 1 row 2015, and
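
The expansion described above (one row per ID and year, from Start_year through 2015) can be sketched as follows. This is a small Python illustration of the logic only, not the faster R equivalent the question asks for; the name `expand_years` and the list-of-tuples input are assumptions made for the example.

```python
def expand_years(start_years, end_year=2015):
    """Return one (id, year) pair for every year from each ID's
    start year up to and including `end_year`."""
    return [
        (id_, year)
        for id_, start in start_years
        for year in range(start, end_year + 1)
    ]


# The four rows shown in the question.
start_years = [("01", 1999), ("02", 2004), ("03", 2015), ("04", 2007)]

expanded = expand_years(start_years)

# Count the rows produced per ID.
rows_per_id = {}
for id_, _ in expanded:
    rows_per_id[id_] = rows_per_id.get(id_, 0) + 1
print(rows_per_id)
```

The per-ID counts for the first three IDs should match the 17, 12 and 1 rows stated in the question.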

How to do Group By Rollup in R? (Like SQL)

爱⌒轻易说出口 submitted on 2020-01-10 04:17:06
Question: I have a dataset and I want to perform something like SQL's GROUP BY ROLLUP for aggregate values. Below is a reproducible example. I know aggregate works really well, as explained here, but it is not a satisfactory fit for my case.
    year <- c('2016','2016','2016','2016','2017','2017','2017','2017')
    month <- c('1','1','1','1','2','2','2','2')
    region <- c('east','west','east','west','east','west','east','west')
    sales <- c(100,200,300,400,200,400,600,800)
    df <- data.frame(year,month,region,sales
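
To make the intended result concrete: a rollup over (year, month, region) sums sales for every prefix of the grouping columns, that is (year, month, region), (year, month), (year) and the grand total. The sketch below shows that idea in plain Python on the data from the question. It is not an R solution, and the function name `rollup` and the dict-based rows are assumptions for illustration.

```python
from collections import defaultdict


def rollup(rows, keys, value):
    """Sum `value` for every prefix of `keys`, similar in spirit to
    SQL's GROUP BY ROLLUP. The empty tuple holds the grand total."""
    totals = defaultdict(int)
    for row in rows:
        # depth = len(keys) is the finest grouping, depth = 0 the grand total.
        for depth in range(len(keys), -1, -1):
            group = tuple(row[k] for k in keys[:depth])
            totals[group] += row[value]
    return dict(totals)


# Same values as the vectors in the question.
year = ['2016'] * 4 + ['2017'] * 4
month = ['1'] * 4 + ['2'] * 4
region = ['east', 'west'] * 4
sales = [100, 200, 300, 400, 200, 400, 600, 800]

rows = [
    {"year": y, "month": m, "region": r, "sales": s}
    for y, m, r, s in zip(year, month, region, sales)
]

totals = rollup(rows, ["year", "month", "region"], "sales")
for group in sorted(totals):
    print(group, totals[group])
```

Here `totals[()]` is the grand total and `totals[('2016',)]` is the subtotal for 2016. Using tuples of different lengths as keys is one simple way to tell the subtotal levels apart.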