data-manipulation

Postgres: convert single row to multiple rows (unpivot)

孤人 submitted on 2019-12-29 08:47:04
Question: I have a table:

Table_Name: price_list
---------------------------------------------------
| id | price_type_a | price_type_b | price_type_c |
---------------------------------------------------
| 1  | 1234         | 5678         | 9012         |
| 2  | 3456         | 7890         | 1234         |
| 3  | 5678         | 9012         | 3456         |
---------------------------------------------------

I need a SELECT query in Postgres that gives a result like this:

---------------------------
| id | price_type | price |
---------------------------
| 1  | type_a     | 1234  |
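The reshaping the question asks for (one output row per price column of each input row) can be sketched outside the database. The snippet below is a minimal Python illustration of that unpivot logic, not the Postgres query itself; the function name `unpivot` and its `prefix` parameter are hypothetical, while the rows and column names are copied from the question.

```python
# Rows copied from the price_list table in the question.
price_list = [
    {"id": 1, "price_type_a": 1234, "price_type_b": 5678, "price_type_c": 9012},
    {"id": 2, "price_type_a": 3456, "price_type_b": 7890, "price_type_c": 1234},
    {"id": 3, "price_type_a": 5678, "price_type_b": 9012, "price_type_c": 3456},
]


def unpivot(rows, id_col="id", prefix="price_"):
    """Return one (id, price_type, price) tuple per price column of each row."""
    long_rows = []
    for row in rows:
        for col, value in row.items():
            if col == id_col:
                continue
            # Strip the assumed "price_" prefix: "price_type_a" -> "type_a".
            long_rows.append((row[id_col], col[len(prefix):], value))
    return long_rows


result = unpivot(price_list)
for r in result:
    print(r)
```

Each wide row yields three long rows, so the three input rows become nine output rows, ordered by id and then by column.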

Multiple data.frames from one with almost random selection criteria

感情迁移 submitted on 2019-12-25 05:15:46
Question: This is a follow-up to "Extract multiple data.frames from one with selection criteria". Let's say the data is the same as in that example:

    df <- data.frame(x1 = runif(1000), x2 = runif(1000), x3 = runif(1000),
                     split = sample(c('SPLITMEHERE', 'OBS'), 1000, replace=TRUE, prob=c(0.1, 0.9)))

Basically, I need a more general solution than the one in the quoted example. Namely, some counties in some months (every month is a .txt file) have only 1 table, and therefore only one 'SPLITMEHERE'

Imputing missing values keeping circular trend in mind

谁说我不能喝 submitted on 2019-12-24 13:42:18
Question: Think of a picture of a sunrise where a red circle is surrounded by a thick yellow ring and then a blue background. Take red as 3, yellow as 2, and blue as 1.

11111111111
11111211111
11112221111
11222322211
22223332222
11222322221
11112221111
11111211111

This is the desired output. But the record/file/data has missing values (30% of all elements are missing). How can we impute the missing values so as to get this desired output, keeping the circular trend in mind?

Answer 1: This is how I would solve a

Collapse data frame by group using different functions on each variable

不羁岁月 submitted on 2019-12-24 10:28:00
Question: Define

    df<-read.table(textConnection('egg 1 20 a
    egg 2 30 a
    jap 3 50 b
    jap 1 60 b'))

s.t.

    > df
       V1 V2 V3 V4
    1 egg  1 20  a
    2 egg  2 30  a
    3 jap  3 50  b
    4 jap  1 60  b

My data has no factors, so I convert factors to characters:

    > df$V1 <- as.character(df$V1)
    > df$V4 <- as.character(df$V4)

I would like to "collapse" the data frame by V1, keeping:

- The max of V2
- The mean of V3
- The mode of V4 (this value does not actually change within V1 groups, so first, last, etc. might do also.)

Please note this is a
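The collapse described in the question (group by V1, then apply a different aggregate to each remaining column) can be sketched as follows. This is a minimal Python illustration using only the standard library, not an R answer; the function name `collapse` is hypothetical, and the rows are the four records from the question.

```python
from collections import defaultdict
from statistics import mean, mode

# The four rows from the question as (V1, V2, V3, V4) tuples.
rows = [
    ("egg", 1, 20, "a"),
    ("egg", 2, 30, "a"),
    ("jap", 3, 50, "b"),
    ("jap", 1, 60, "b"),
]


def collapse(rows):
    """Group rows by V1 and aggregate each column with its own function."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)

    collapsed = {}
    for key, members in groups.items():
        collapsed[key] = {
            "V2": max([r[1] for r in members]),   # max of V2
            "V3": mean([r[2] for r in members]),  # mean of V3
            "V4": mode([r[3] for r in members]),  # most common V4
        }
    return collapsed


collapsed = collapse(rows)
for key, values in collapsed.items():
    print(key, values)
```

Keeping the per-column aggregates in one dictionary literal makes it easy to swap any of them, for example replacing `mode` with "first value" when V4 is constant within a group.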

Conditional calculation in R based on Row values and categories

孤者浪人 submitted on 2019-12-24 08:19:36
Question: I have this dataframe:

    df<-data.frame(a=c("a1","a2","a3","a4","b1","b2","b3","b4","a1","a2","a3","a4","b1","b2","b3","b4"),
                   b=c("x1","x2","x3","total","x1","x2","x3","total", "x1","x2","x3","total","x1","x2","x3","total"),
                   reg=c("A","A","A","A","A","A","A","A","B", "B","B","B","B","B","B","B"),
                   c=c(1:16))

which looks like:

        a     b reg  c
    1  a1    x1   A  1
    2  a2    x2   A  2
    3  a3    x3   A  3
    4  a4 total   A  4
    5  b1    x1   A  5
    6  b2    x2   A  6
    7  b3    x3   A  7
    8  b4 total   A  8
    9  a1    x1   B  9
    10 a2    x2   B 10
    11 a3    x3   B 11
    12 a4 total   B 12
    13

Extract multiple data.frames from one with selection criteria

我的未来我决定 submitted on 2019-12-24 06:41:40
Question: Let this be my data set:

    df <- data.frame(x1 = runif(1000), x2 = runif(1000), x3 = runif(1000),
                     split = sample(c('SPLITMEHERE', 'OBS'), 1000, replace=TRUE, prob=c(0.04, 0.96)))

So, I have some variables (in my case, 15), and criteria by which I want to split the data.frame into multiple data.frames. My criteria are the following: each other time the 'SPLITMEHERE' appears I want to take all the values, or all 'OBS' below it, and get a data.frame from just these observations. So, if there's 20

Map numerics to categorical values in R, based on different ranges for the numerics [duplicate]

自作多情 submitted on 2019-12-24 02:38:19
Question: This question already has answers here: Add column which contains binned values of an integer column (3 answers). Closed 2 years ago.

Hope my title makes sense. I have a dataframe with a column of numeric values, and I would like to use this column to create a new column whereby the numeric values are 'mapped' to different buckets based on their values. Below is some test data, as well as a rough-around-the-edges nested ifelse() approach that I am currently using to solve this problem. I am
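Mapping numbers to buckets by range is usually cleaner with a sorted list of breakpoints than with nested conditionals. The sketch below shows that idea in Python with `bisect`; the question's test data is not part of this excerpt, so the breakpoints in `BREAKS` and the names in `LABELS` are purely hypothetical.

```python
from bisect import bisect_right

# Hypothetical breakpoints and labels; the question's real ranges are not shown.
BREAKS = [10, 20, 30]
LABELS = ["low", "medium", "high", "very high"]  # one more label than breaks


def bucket(value):
    """Return the label of the half-open interval [lower, upper) containing value."""
    # bisect_right counts how many breakpoints are <= value,
    # which is exactly the index of the matching label.
    return LABELS[bisect_right(BREAKS, value)]


values = [5, 10, 29.9, 30, 100]
buckets = [bucket(v) for v in values]
print(list(zip(values, buckets)))
```

With these assumed ranges, a value equal to a breakpoint falls into the upper bucket; using `bisect_left` instead would put it into the lower one.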

Pivoting a CSV file using R

家住魔仙堡 submitted on 2019-12-24 00:59:24
Question: I have a file that looks like this:

                    type          created_at repository_name
    1        IssuesEvent 2012-03-11 06:48:31       bootstrap
    2        IssuesEvent 2012-03-11 06:48:31       bootstrap
    3        IssuesEvent 2012-03-11 06:48:31       bootstrap
    4        IssuesEvent 2012-03-11 06:52:50       bootstrap
    5        IssuesEvent 2012-03-11 06:52:50       bootstrap
    6        IssuesEvent 2012-03-11 06:52:50       bootstrap
    7  IssueCommentEvent 2012-03-11 07:03:57       bootstrap
    8  IssueCommentEvent 2012-03-11 07:03:57       bootstrap
    9  IssueCommentEvent 2012-03-11 07:03:57       bootstrap
    10 IssuesEvent

R - Match values from 2 dataframes based on multiple conditions (when the order of lookup IDs is random)

杀马特。学长 韩版系。学妹 submitted on 2019-12-23 12:58:32
Question: Hi, I have two data frames:

    df1 = data.frame(PersonId1=c(1,2,3,4,5,6,7,8,9,10,1),PersonId2=c(11,12,13,14,15,16,17,18,19,20,11),
                     Played_together = c(1,0,0,1,1,0,0,0,1,0,1),
                     Event=c(1,1,1,1,2,2,2,2,2,2,2),
                     Utility=c(20,-2,-5,10,30,2,1,.5,50,-1,60))
    df2 = data.frame(PersonId1=c(11,15,9,1),PersonId2=c(1,5,19,11),
                     Played_together = c(1,1,1,1),
                     Event=c(1,2,2,2))

Where df1 looks like this:

      PersonId1 PersonId2 Played_together Event Utility
    1         1        11               1     1    20.0
    2         2        12               0     1    -2.0
    3         3        13               0     1    -5.0
    4         4        14               1     1    10
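One way to match rows when the two ID columns may appear in either order is to build an order-independent key from the pair. The sketch below illustrates this in Python, assuming the goal is to look up `Utility` from df1 for each row of df2 (the excerpt is cut off before the desired output, so that goal is an assumption). The helper `pair_key` is hypothetical; the data is copied from the question.

```python
# Columns copied from df1 and df2 in the question.
df1 = list(zip(
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1],            # PersonId1
    [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 11],  # PersonId2
    [1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1],             # Played_together
    [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2],             # Event
    [20, -2, -5, 10, 30, 2, 1, 0.5, 50, -1, 60],   # Utility
))
df2 = list(zip(
    [11, 15, 9, 1],   # PersonId1
    [1, 5, 19, 11],   # PersonId2
    [1, 1, 1, 1],     # Played_together
    [1, 2, 2, 2],     # Event
))


def pair_key(id1, id2, played, event):
    """Build a key that ignores the order of the two person IDs."""
    return (frozenset((id1, id2)), played, event)


# Index df1 by the order-independent key.
utility_by_key = {pair_key(a, b, p, e): u for a, b, p, e, u in df1}

# Look up each df2 row; None marks rows with no match in df1.
matched = [utility_by_key.get(pair_key(a, b, p, e)) for a, b, p, e in df2]
print(matched)
```

Because `frozenset((1, 11))` equals `frozenset((11, 1))`, the pair (11, 1) in df2 finds the row stored as (1, 11) in df1, while `Played_together` and `Event` still have to match exactly.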

Deleting an extra character in each row?

試著忘記壹切 submitted on 2019-12-23 06:12:56
Question: I have a variable, and for some reason R has added an extra "X" at the beginning of each value. Is this a common occurrence that I could have avoided? Anyhow, below is my data (currently the variable is stored in a list):

    X1 X5 X33 X37 ...

    > str(rc1_output)
     chr [1:63, 1:3] "X1" "X5" "X33" "X37" "X52" "X645" "X646" ...
     - attr(*, "dimnames")=List of 2
      ..$ : chr [1:63] "X1" "X5" "X33" "X37" ...
      ..$ : chr [1:3] "" "Entropy" "Subseq."
    > dput(head(rc1_output))
    structure(c("X1", "X5", "X33", "X37", "X52",
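Removing the extra character is a simple anchored string substitution. The sketch below shows it in Python on the labels visible in the question's `str()` output; the function name `strip_leading_x` is hypothetical, and the choice to strip the "X" only when a digit follows is an assumption made so that ordinary names starting with "X" stay intact.

```python
import re

# Labels visible in the question's str() output.
labels = ["X1", "X5", "X33", "X37", "X52", "X645", "X646"]


def strip_leading_x(label):
    """Remove one leading "X", but only when it is directly followed by a digit."""
    return re.sub(r"^X(?=\d)", "", label)


cleaned = [strip_leading_x(s) for s in labels]
print(cleaned)
```

The lookahead `(?=\d)` checks for a digit without consuming it, so only the "X" itself is removed.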