dataframe

Python Pandas: How to convert my table from a long format to wide format (specific example below)?

我与影子孤独终老i submitted on 2021-02-17 07:09:49
Question: Pretty much the title. I am attaching the spreadsheet here. I need to convert the "Input" sheet to the "Output" sheet. I know about pandas wide_to_long, but I haven't been able to get the desired output with it; the rows get scrambled in the output.
    import pandas as pd
    df=pd.read_excel('../../Downloads/test.xlsx',sheet_name='Input', header=0)
    newdf=pd.wide_to_long(df, [str(i) for i in range(2022,2028)], 'Hotel Name', 'value', sep='', suffix='.+')\
        .reset_index()\
        .sort_values('Hotel Name')\
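The spreadsheet itself is not part of this excerpt, so the following is only a minimal sketch of how the wide_to_long call in the question behaves. The hotel names and the column names such as "2022Rooms" are hypothetical stand-ins for the "Input" sheet, chosen so that they match the question's stub names, sep='' and suffix='.+'. Sorting on both the hotel and the new "value" column, rather than on the hotel alone, gives a deterministic row order; whether that order matches the "Output" sheet is an assumption.

```python
import pandas as pd

# Hypothetical stand-in for the "Input" sheet: one row per hotel,
# one column per year/measure pair (e.g. "2022Rooms").
df = pd.DataFrame({
    "Hotel Name": ["Beta Inn", "Alpha Hotel"],
    "2022Rooms": [120, 80],
    "2022Rate": [95.0, 110.0],
    "2023Rooms": [125, 82],
    "2023Rate": [99.0, 115.0],
})

years = ["2022", "2023"]  # stub names: the leading part of each wide column

long_df = (
    pd.wide_to_long(df, stubnames=years, i="Hotel Name", j="value",
                    sep="", suffix=".+")
    .reset_index()
    # Sort on both keys so rows within a hotel have a fixed order.
    .sort_values(["Hotel Name", "value"])
    .reset_index(drop=True)
)
print(long_df)
```

Each hotel ends up with one row per measure ("Rate", "Rooms") and one column per year.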

R How to group_by, split or subset by row values

拜拜、爱过 submitted on 2021-02-17 07:08:07
Question: This continues my previous question, "R, how to group by row value? Split?". The input data frame has changed to:
    id = str_c("x",1:22)
    val = c(rep("NO1", 2), "START", rep("yes1", 2), "STOP", "NO", "START","NO1", "START", rep("yes2", 3), "STOP", "NO1", "START", rep("NO3",3), "STOP", "NO1", "STOP")
    data = data.frame(id,val)
The expected output is a data frame whose val column is as follows:
    val = c("START", rep("yes1", 2), "STOP", "START","NO1", "START", rep("yes2", 3), "STOP", "START", rep("NO3",3), "STOP",

Simultaneously merge multiple data.frames in a list

微笑、不失礼 submitted on 2021-02-17 07:04:54
Question: I have a list of many data.frames that I want to merge. The issue here is that each data.frame differs in terms of the number of rows and columns, but they all share the key variables (which I've called "var1" and "var2" in the code below). If the data.frames were identical in terms of columns, I could merely rbind, for which plyr's rbind.fill would do the job, but that's not the case with these data. Because the merge command only works on 2 data.frames, I turned to the Internet for ideas.

Simultaneously merge multiple data.frames in a list

假如想象 submitted on 2021-02-17 07:04:31
Question: I have a list of many data.frames that I want to merge. The issue here is that each data.frame differs in terms of the number of rows and columns, but they all share the key variables (which I've called "var1" and "var2" in the code below). If the data.frames were identical in terms of columns, I could merely rbind, for which plyr's rbind.fill would do the job, but that's not the case with these data. Because the merge command only works on 2 data.frames, I turned to the Internet for ideas.

getting ratio by iterating over two columns

此生再无相见时 submitted on 2021-02-17 07:04:07
Question: Hi, my data frame is as below:
    Date      Key  y
    1/2/2013  A    1
    1/2/2013  B    2
    1/2/2013  C    1
    2/2/2013  A    1
    2/2/2013  c    1
    2/2/2013  B    3
I now want to create a new column "ratio". For a given date (1/2/2013), the ratio of key A would be y(A)/(y(A)+y(B)+y(C)), which is 1/(1+2+1), i.e. 0.25. My final df would be as follows:
    Date      Key  y  ratio
    1/2/2013  A    1  0.25
    1/2/2013  B    2  0.5
    1/2/2013  C    1  0.25
    2/2/2013  A    1  0.2
    2/2/2013  c    1  0.2
    2/2/2013  B    3  0.6
I really appreciate the help.
Answer 1: You can use groupby().transform('sum')
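The answer excerpt is cut off after naming groupby().transform('sum'). A minimal sketch of that approach, using the sample data from the question, might look like the following; how the original answer continued is not known, so treat this as one plausible reading.

```python
import pandas as pd

# Sample data copied from the question (including the lowercase "c").
df = pd.DataFrame({
    "Date": ["1/2/2013"] * 3 + ["2/2/2013"] * 3,
    "Key": ["A", "B", "C", "A", "c", "B"],
    "y": [1, 2, 1, 1, 1, 3],
})

# transform('sum') returns the per-date total aligned to every original row,
# so the division is element-wise and needs no merge.
df["ratio"] = df["y"] / df.groupby("Date")["y"].transform("sum")
print(df)
```

Because transform keeps the original index, the result can be assigned straight back as a new column.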

Add rep vector to dataframe with uneven total rows

空扰寡人 submitted on 2021-02-17 06:05:24
Question: I'm trying to find a way to automatically add a two-level factor to a large dataset, but the data may contain an odd number of rows. I've tried to do this with the 'rep' function, but this only works if the data frame has an even number of rows.
    x<-c(1,3,5,7,9)
    y<-c(2,4,6,8,10)
    df<-data.frame(x,y)
    df$state<-factor(rep(1:2))
    Error in `$<-.data.frame`(`*tmp*`, state, value = 1:2) : replacement has 2 rows, data has 5
How do I get the data.frame to recycle 1 into row 5 instead of raising an error?
Answer 1: rep()'s length.out argument

Generating variable names for dataframes based on the loop number in a loop in R

情到浓时终转凉″ submitted on 2021-02-17 06:04:40
Question: I am working on developing and optimizing a linear model using the lm() function and subsequently the step() function for optimization. I have added a variable to my dataframe using a random generator of 0s and 1s (50% chance each). I use this variable to subset the dataframe into a training set and a validation set. If a record is not assigned to the training set, it is assigned to the validation set. By using these subsets I am able to estimate how good the fit of the model is (by using

make upper case and replace space in column dataframe

走远了吗. submitted on 2021-02-17 06:02:51
Question: For a specific column of a pandas dataframe, I would like to make all the elements uppercase and remove the spaces.
    import pandas as pd
    df = pd.DataFrame(data=[['AA 123',00],[99,10],['bb 12',10]],columns=['A','B'],index=[0,1,2])
    # find elements 'A' that are string
    temp1 = [isinstance(s, str) for s in df['A'].values]
    # Make upper case and replace any space
    temp2 = df['A'][temp1].str.upper()
    temp2 = temp2.str.replace(r'\s', '')
    # replace in dataframe
    df['A'].loc[temp2.index] = temp2.values
I get
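The question is cut off before the error or warning it mentions, so the following is only a sketch of one common way to do this, not the accepted answer. It uses a boolean mask with a single df.loc assignment instead of the chained df['A'].loc[...] assignment, and passes regex=True explicitly because the default of str.replace differs between pandas versions. The name is_str is introduced here for illustration.

```python
import pandas as pd

# Same data as in the question: column A mixes strings and integers.
df = pd.DataFrame(
    data=[["AA 123", 0], [99, 10], ["bb 12", 10]],
    columns=["A", "B"],
)

# Boolean mask marking the rows of A that hold strings.
is_str = df["A"].map(lambda v: isinstance(v, str))

# Upper-case and strip whitespace on those rows only, then write back
# with one .loc call so no intermediate copy is modified.
df.loc[is_str, "A"] = (
    df.loc[is_str, "A"].str.upper().str.replace(r"\s+", "", regex=True)
)
print(df)
```

Non-string values such as 99 are left untouched because the mask excludes them.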

Spark How to Specify Number of Resulting Files for DataFrame While/After Writing

蓝咒 submitted on 2021-02-17 05:25:06
Question: I saw several Q&As about writing a single file to HDFS, and it seems that using coalesce(1) is sufficient. E.g.:
    df.coalesce(1).write.mode("overwrite").format(format).save(location)
But how can I specify the "exact" number of files that will be written by the save operation? So my questions are: If I have a dataframe which consists of 100 partitions, will a write operation write 100 files? If I have a dataframe which consists of 100 partitions, when I make a write operation after calling repartition(50)/coalesce

Selecting first row with groupby and NaN columns

拥有回忆 submitted on 2021-02-17 05:19:40
Question: I'm trying to select the first row of each group of a data frame.
    import pandas as pd
    import numpy as np
    x = [{'id':"a",'val':np.nan, 'val2':-1},{'id':"a",'val':'TREE','val2':15}]
    df = pd.DataFrame(x)
    #   id   val  val2
    # 0  a   NaN    -1
    # 1  a  TREE    15
When I try to do this with groupby, I get
    df.groupby('id', as_index=False).first()
    #   id   val  val2
    # 0  a  TREE    -1
The row returned to me is nowhere in the original data frame. Do I need to do something special with NaN values in columns other than the
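The excerpt ends before any answer, so what follows is a hedged sketch rather than the thread's resolution. GroupBy.first() returns the first non-null value of each column within a group, which is why it can combine values from different rows. One way to get the literal first row of each group, NaN included, is GroupBy.head(1). The variable names collapsed and first_rows are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame([
    {"id": "a", "val": np.nan, "val2": -1},
    {"id": "a", "val": "TREE", "val2": 15},
])

# first() picks the first non-null entry per column, so "val" comes from
# row 1 while "val2" comes from row 0.
collapsed = df.groupby("id", as_index=False).first()

# head(1) keeps the first row of each group exactly as it appears in df.
first_rows = df.groupby("id").head(1)

print(collapsed)
print(first_rows)
```

head(1) also preserves the original index, which makes it easy to trace each returned row back to the source frame.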