Question
I have two CSV files, each with a single column, as shown below:
csv file 1 : adam3us.csv
created_at
6/7/2018 19:00
6/6/2018 12:00
6/6/2018 9:00
6/6/2018 9:00
6/6/2018 5:00
6/5/2018 16:00
6/5/2018 7:00
6/4/2018 16:00
csv file 2 : Bitcoin Hourly Based
created_at
1/8/2017 0:00
1/8/2017 1:00
1/8/2017 2:00
1/8/2017 3:00
1/8/2017 4:00
1/8/2017 5:00
1/8/2017 6:00
6/7/2018 19:00
I am trying to write a Python script that uses a loop to compare each value in csv file 2 with every entry in csv file 1. Whenever the entries match, it should increment a variable called count. It should then write to a new CSV file with two columns: created_at, containing the time that appears in both files, and count, containing the value of count.
For example, one iteration takes a row of csv file 2, say 6/7/2018 19:00, and compares it with every row in csv file 1. If it matches any row of csv file 1, count should be incremented. Here it matches the first row of csv file 1, so count goes from 0 to 1, and the created_at value and the count are written to a new, separate CSV file called output. For this example the output file should look like this:
output.csv
created_at count
6/7/2018 19:00 1
The count variable should be reset to 0 at the start of every iteration, and the process should repeat for each row of csv file 2.
My current code is below:
import csv

count=0

path1 = r'C:\Users\Ahmed Ismail Khalid\Desktop\Bullcrap Testing Delete Later\Bitcoin Prices Hourly Based.csv'
path2 = r'C:\Users\Ahmed Ismail Khalid\Desktop\Bullcrap Testing Delete Later\adam3us.csv'
path3 = r'C:\Users\Ahmed Ismail Khalid\Desktop\output.csv'

with open(path1,'rt',encoding='utf-8') as csvin:
    reader1 = csv.reader(csvin)
    for row in reader1:
        b=row[0]
        with open(path2,'rt',encoding='utf-8') as csvinpu:
            with open(path3, 'w', newline='',encoding='utf-8') as csvoutput:
                writer = csv.writer(csvoutput, lineterminator='\n')
                reader2 = csv.reader(csvinpu)

                all = []
                row = next(reader2)
                row.append('count')
                all.append(row)

                for row in reader2:
                    d=row[0]
                    if(b==d) :
                        count+=1
                        row.append(count)
                        all.append(row)
                    else:
                        row.append(count)
                        all.append(row)
                writer.writerows(all)
Any and all help would be appreciated.
Thanks
Answer 1:
Use pandas for this kind of manipulation.
Load the two CSV files into two DataFrames with pandas and take the intersection of the shared column. pandas has a built-in function for this: pd.merge.
import pandas as pd

df1 = pd.read_csv(file1)  # file1 and file2 are the paths to the two CSV files
df2 = pd.read_csv(file2)
# column_name must exist in both DataFrames (in the question: 'created_at').
# how="inner" keeps only values present in both files (an INNER JOIN).
output = pd.merge(df1, df2, how="inner", on="column_name")
# Count how many merged rows share each value of column_name.
output['count'] = output.groupby('column_name')['column_name'].transform('size')
final_output = output.drop_duplicates()  # remove duplicate rows
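As a rough illustration of the same idea on the sample data from the question, here is a minimal sketch. It assumes that count should be the number of times a timestamp occurs in the first file (adam3us.csv). The variable names tweets and hourly are made up for this example, and io.StringIO stands in for the real file paths.

```python
import io

import pandas as pd

# Sample data copied from the question. With real files, pass the file
# paths to pd.read_csv instead of io.StringIO objects.
tweets_csv = """created_at
6/7/2018 19:00
6/6/2018 12:00
6/6/2018 9:00
6/6/2018 9:00
6/6/2018 5:00
6/5/2018 16:00
6/5/2018 7:00
6/4/2018 16:00
"""
hourly_csv = """created_at
1/8/2017 0:00
1/8/2017 1:00
1/8/2017 2:00
1/8/2017 3:00
1/8/2017 4:00
1/8/2017 5:00
1/8/2017 6:00
6/7/2018 19:00
"""

tweets = pd.read_csv(io.StringIO(tweets_csv))
hourly = pd.read_csv(io.StringIO(hourly_csv))

# Assumption: count = number of occurrences of each timestamp in the first file.
counts = tweets.groupby("created_at").size().reset_index(name="count")

# The inner merge keeps only timestamps that appear in both files.
result = hourly.drop_duplicates().merge(counts, on="created_at", how="inner")

print(result)
# To save the result: result.to_csv("output.csv", index=False)
```

Counting before merging keeps the merged frame small, since each timestamp then appears at most once on either side.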
Hope this helps.
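If pandas is not an option, the loop described in the question can also be sketched with only the standard library. This is one possible approach, not the method from the answer above: it counts the values of the first file once with collections.Counter instead of re-reading the file for every row. The function name count_matches and its parameters are hypothetical, and count is again assumed to mean the number of occurrences in the first file.

```python
import csv
import io
from collections import Counter


def count_matches(file1, file2, out, column="created_at"):
    """Write each distinct `column` value of file2 that also occurs in
    file1, together with how many times it occurs in file1."""
    # One pass over file1: value -> number of occurrences.
    counts = Counter(row[column] for row in csv.DictReader(file1))

    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([column, "count"])

    seen = set()  # avoid writing the same timestamp twice
    for row in csv.DictReader(file2):
        value = row[column]
        if counts[value] and value not in seen:
            seen.add(value)
            writer.writerow([value, counts[value]])


# Small in-memory demo; with real files, pass objects opened with
# open(path, newline="", encoding="utf-8") instead.
file1 = io.StringIO("created_at\n6/7/2018 19:00\n6/6/2018 9:00\n6/6/2018 9:00\n")
file2 = io.StringIO("created_at\n1/8/2017 0:00\n6/6/2018 9:00\n6/7/2018 19:00\n")
out = io.StringIO()
count_matches(file1, file2, out)
print(out.getvalue())
```

Because each file is read only once, this avoids the nested loop over both files, which matters when the hourly file is large.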
Source: https://stackoverflow.com/questions/50785498/python-comparing-columns-of-2-csv-files-and-writing-to-a-new-csv