dataframe | 易学教程

Python Pandas: How to convert my table from a long format to wide format (specific example below)?

Question: Pretty much the title. I am attaching the spreadsheet here. I need to convert the "Input" sheet to the "Output" sheet. I know about pandas' wide_to_long, but I haven't been able to get the desired output with it; the rows get scrambled in the output.
import pandas as pd
df = pd.read_excel('../../Downloads/test.xlsx', sheet_name='Input', header=0)
newdf = pd.wide_to_long(df, [str(i) for i in range(2022, 2028)], 'Hotel Name', 'value', sep='', suffix='.+')\
    .reset_index()\
    .sort_values('Hotel Name')\

R How to group_by, split or subset by row values

Question: This continues my previous question, "R, how to group by row value? Split?". The input data frame has changed to:
id = str_c("x",1:22)
val = c(rep("NO1", 2), "START", rep("yes1", 2), "STOP", "NO", "START","NO1", "START", rep("yes2", 3), "STOP", "NO1", "START", rep("NO3",3), "STOP", "NO1", "STOP")
data = data.frame(id,val)
The expected output is a data frame whose val column is as follows:
val = c("START", rep("yes1", 2), "STOP", "START","NO1", "START", rep("yes2", 3), "STOP", "START", rep("NO3",3), "STOP",

Simultaneously merge multiple data.frames in a list

Question: I have a list of many data.frames that I want to merge. The issue is that each data.frame differs in its number of rows and columns, but they all share the key variables (which I've called "var1" and "var2" in the code below). If the data.frames had identical columns, I could simply rbind them, and plyr's rbind.fill would do the job, but that's not the case with these data. Because the merge command only works on 2 data.frames, I turned to the Internet for ideas.

getting ratio by iterating over two columns

Question: Hi, my data frame is as below:
Date      Key  y
1/2/2013  A    1
1/2/2013  B    2
1/2/2013  C    1
2/2/2013  A    1
2/2/2013  c    1
2/2/2013  B    3
I now want to create a new column "ratio". For a given date (1/2/2013), the ratio for key A would be y(A)/(y(A)+y(B)+y(C)), which is 1/(1+2+1), i.e. 0.25. My final df would be as follows:
Date      Key  y  ratio
1/2/2013  A    1  0.25
1/2/2013  B    2  0.5
1/2/2013  C    1  0.25
2/2/2013  A    1  0.2
2/2/2013  c    1  0.2
2/2/2013  B    3  0.6
I'd really appreciate the help.
Answer 1: You can use groupby().transform('sum')
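
A minimal sketch of how groupby().transform('sum') can produce the requested column, using the sample rows from the question. This is only one reading of the truncated answer, not necessarily the answerer's full code: transform('sum') returns the per-date total aligned to every row, so an ordinary division yields the ratio.

```python
import pandas as pd

# Sample data copied from the question (including the lowercase "c" key).
df = pd.DataFrame({
    "Date": ["1/2/2013"] * 3 + ["2/2/2013"] * 3,
    "Key": ["A", "B", "C", "A", "c", "B"],
    "y": [1, 2, 1, 1, 1, 3],
})

# transform("sum") broadcasts each date's total of y back onto its rows,
# so the result has the same length and index as df.
date_total = df.groupby("Date")["y"].transform("sum")

# Each row's share of its date's total.
df["ratio"] = df["y"] / date_total

print(df)
```

Because the grouping is on Date only, the differing case of the keys ("C" versus "c") does not affect the result.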

Add rep vector to dataframe with uneven total rows

Question: I'm trying to find a way to automatically add two factor levels to a large dataset, but the data may contain an odd number of rows. I've tried to do this with the 'rep' function, but it only works if the data frame has an even number of rows.
x<-c(1,3,5,7,9)
y<-c(2,4,6,8,10)
df<-data.frame(x,y)
df$state<-factor(rep(1:2))
Error in `$<-.data.frame`(`*tmp*`, state, value = 1:2) : replacement has 2 rows, data has 5
How do I get the data.frame to recycle 1 into row 5 instead of raising an error?
Answer 1: rep()'s length.out argument

Generating variable names for dataframes based on the loop number in a loop in R

Question: I am developing and optimizing a linear model using the lm() function, followed by the step() function for optimization. I have added a variable to my data frame using a random generator of 0s and 1s (50% chance each). I use this variable to split the data frame into a training set and a validation set. If a record is not assigned to the training set, it is assigned to the validation set. By using these subsets I am able to estimate how good the fit of the model is (by using

make upper case and replace space in column dataframe

Question: For a specific column of a pandas DataFrame, I would like to make all the elements uppercase and remove the spaces.
import pandas as pd
df = pd.DataFrame(data=[['AA 123',00],[99,10],['bb 12',10]],columns=['A','B'],index=[0,1,2])
# find elements of 'A' that are strings
temp1 = [isinstance(s, str) for s in df['A'].values]
# make upper case and replace any space
temp2 = df['A'][temp1].str.upper()
temp2 = temp2.str.replace(r'\s', '')
# replace in dataframe
df['A'].loc[temp2.index] = temp2.values
I get
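
The excerpt cuts off before the asker's output, so the following is only a sketch of one common way to write this, not the accepted answer. It builds a boolean mask for the string entries and assigns through a single .loc call instead of the chained df['A'].loc[...] form; regex=True is passed explicitly so that r'\s' is treated as a pattern. The name is_str is introduced here for illustration.

```python
import pandas as pd

# Same sample frame as in the question; column A mixes strings and ints.
df = pd.DataFrame(
    data=[['AA 123', 0], [99, 10], ['bb 12', 10]],
    columns=['A', 'B'],
    index=[0, 1, 2],
)

# Boolean mask: True where the value in A is a string.
is_str = df['A'].map(lambda v: isinstance(v, str))

# Upper-case and strip whitespace only on the string rows,
# writing back with one .loc assignment on the frame itself.
df.loc[is_str, 'A'] = (
    df.loc[is_str, 'A']
    .str.upper()
    .str.replace(r'\s', '', regex=True)
)

print(df)
```

Non-string entries such as 99 are left untouched because the mask excludes them.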

Spark How to Specify Number of Resulting Files for DataFrame While/After Writing

Question: I saw several Q&As about writing a single file to HDFS, and it seems that using coalesce(1) is sufficient, e.g.:
df.coalesce(1).write.mode("overwrite").format(format).save(location)
But how can I specify the "exact" number of files that will be written by the save operation? So my questions are: If I have a DataFrame that consists of 100 partitions, will a write operation produce 100 files? If I have a DataFrame that consists of 100 partitions and I write after calling repartition(50)/coalesce

Selecting first row with groupby and NaN columns

Question: I'm trying to select the first row of each group of a data frame.
import pandas as pd
import numpy as np
x = [{'id':"a",'val':np.nan, 'val2':-1},{'id':"a",'val':'TREE','val2':15}]
df = pd.DataFrame(x)
#   id   val  val2
# 0  a   NaN    -1
# 1  a  TREE    15
When I try to do this with groupby, I get
df.groupby('id', as_index=False).first()
#   id   val  val2
# 0  a  TREE    -1
The row returned to me is nowhere in the original data frame. Do I need to do something special with NaN values in columns other than the
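
For context, GroupBy.first() takes the first non-null value of each column separately, which is why it can combine values from different rows. A minimal sketch of one alternative, assuming the goal is the first physical row of each group with NaN kept as-is, is groupby(...).head(1). The name first_rows is introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
x = [{'id': "a", 'val': np.nan, 'val2': -1},
     {'id': "a", 'val': 'TREE', 'val2': 15}]
df = pd.DataFrame(x)

# head(1) keeps the first row of every group unchanged,
# including any NaN values, and preserves the original index.
first_rows = df.groupby('id').head(1)

print(first_rows)
```

Unlike first(), head(1) returns rows of the original frame rather than an aggregated result, so the group key stays an ordinary column.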