Question
Essentially, I'm trying to build a new dataframe from two others but the situation is a little complicated and I'm not sure what the best way to do this is.
In DF1, each row is data about objects defined by IDs, and it looks something like this:
ID  Name  datafield1  datafield2
1   Foo   info1       info2
2   bar   info3       info4
3   Foos  info5       info6
DF2 has monthly data about each object formatted like this:
ID  Name  Month  data
1   Foo   1/20   53.6
1   Foo   2/20   47.2
1   Foo   3/20   12.7
1   Foo   4/20   3.2
2   Bar   1/20   82.2
2   Bar   2/20   65.0
2   Bar   3/20   41.7
2   Bar   4/20   28.4
What I want to do is look up each ID from DF1 in DF2, then combine the monthly data from DF2 with a couple of important columns from DF1 in a new dataframe.
This is what I had so far but from what I've read this is a bad approach:
IDs = df1['ID'].unique()
df3 = pd.DataFrame(index=IDs)
for id, df in df1.groupby('ID'):
    if (df2['ID'] == id).any():
        pass  # not sure what to put here
So it sounds like creating an empty dataframe is a bad approach, but I'm not sure how else to approach it. How should I create this new dataframe? And which is the smarter approach: converting the monthly data into columns so there is a single row for each ID, or keeping each month as a separate row and adding a couple of columns from DF1 to each row?
Answer 1:
See whether the lines below help you add the columns from DF1 to a new frame. I loaded the frames from Excel files, but you can load them however you like. The data I used is shown in the image.
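import pandas as pd

# Load the two frames (here from Excel; use whatever source you have)
df1 = pd.read_excel('frame1.xlsx')
df2 = pd.read_excel('frame2.xlsx')
# Attach the DF1 columns to every monthly row of DF2, matching on ID
df = pd.merge(df2, df1[['ID', 'datafield1', 'datafield2']], on='ID', how='left')
print(df)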
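For anyone without the Excel files, here is a minimal self-contained sketch of the same merge, using the sample rows from the question typed in by hand. It also shows the other layout the question asks about (one row per ID with a column per month) via `pivot`. The names `long_df` and `wide_df` are made up for illustration; which layout is better probably depends on what you do next, so treat this as one possible approach rather than the definitive one.

```python
import pandas as pd

# Sample data copied from the question (stands in for the Excel files).
df1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "Name": ["Foo", "bar", "Foos"],
    "datafield1": ["info1", "info3", "info5"],
    "datafield2": ["info2", "info4", "info6"],
})
df2 = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2, 2, 2],
    "Name": ["Foo"] * 4 + ["Bar"] * 4,
    "Month": ["1/20", "2/20", "3/20", "4/20"] * 2,
    "data": [53.6, 47.2, 12.7, 3.2, 82.2, 65.0, 41.7, 28.4],
})

# Columns from DF1 that should travel with the monthly data.
extra = df1[["ID", "datafield1", "datafield2"]]

# Long layout: one row per (ID, Month), DF1 columns repeated on each row.
long_df = df2.merge(extra, on="ID", how="left")

# Wide layout: one row per ID, one column per month, then the DF1 columns.
wide_df = (
    df2.pivot(index="ID", columns="Month", values="data")
    .reset_index()
    .merge(extra, on="ID", how="left")
)

print(long_df)
print(wide_df)
```

Note that ID 3 has no rows in DF2, so it does not appear in either result. If you want to keep it (with missing monthly values), merging with `how="right"` or starting from DF1 instead should do that. Also, the month columns in the wide layout are sorted as plain strings, so converting `Month` to a real date type first may be worthwhile once the data spans more months.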
Source: https://stackoverflow.com/questions/61923715/python-the-best-way-to-create-a-new-dataframe-from-two-other-dataframes-with-d