Question
I have two CSV files, shown below.
- I want a new column, Difference, where:
  - if a mobile number appears within the date range Book_Date...App_Date, Difference = App_Date - Occur_Date;
  - or NaN if it doesn't occur in that date range.
- I also want to filter it based on a unique Category and Mobile_Number.
csv_1
Mobile_Number Book_Date App_Date
503477334 2018-10-12 2018-10-18
506002884 2018-10-12 2018-10-19
501022162 2018-10-12 2018-10-16
503487338 2018-10-13 2018-10-13
506012887 2018-10-13 2018-10-21
503427339 2018-10-14 2018-10-17
csv_2
Mobile_Number Occur_Date
503477334 2018-10-16
506002884 2018-10-21
501022162 2018-10-15
503487338 2018-10-13
501428449 2018-10-18
506012887 2018-10-14
I want a new column in csv_1. If a mobile number appears in csv_2 with an Occur_Date within the date range from Book_Date to App_Date, the column should hold the difference between App_Date and Occur_Date (in days). If it doesn't occur in that date range, it should be NaN. The output should be:
Output
Mobile_Number Book_Date App_Date Difference
503477334 2018-10-12 2018-10-18 2
506002884 2018-10-12 2018-10-19 -2
501022162 2018-10-12 2018-10-16 1
503487338 2018-10-13 2018-10-13 0
506012887 2018-10-13 2018-10-21 7
503427339 2018-10-14 2018-10-17 NaN
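The answer's code works on DataFrames named df1 (csv_1) and df2 (csv_2) without showing how they are loaded. Here is a minimal sketch of loading them. It assumes the files are whitespace-separated exactly as printed above. The inline strings csv_1_text and csv_2_text are hypothetical stand-ins for the real files. If the real files are comma-separated, pass their paths to pd.read_csv and drop the sep argument.

```python
import io

import pandas as pd

# Hypothetical inline copies of the question's tables, standing in for the files.
csv_1_text = """Mobile_Number Book_Date App_Date
503477334 2018-10-12 2018-10-18
506002884 2018-10-12 2018-10-19
501022162 2018-10-12 2018-10-16
503487338 2018-10-13 2018-10-13
506012887 2018-10-13 2018-10-21
503427339 2018-10-14 2018-10-17
"""
csv_2_text = """Mobile_Number Occur_Date
503477334 2018-10-16
506002884 2018-10-21
501022162 2018-10-15
503487338 2018-10-13
501428449 2018-10-18
506012887 2018-10-14
"""

# sep=r"\s+" splits on runs of whitespace; parse_dates converts the date
# columns to datetime64 so they can be compared and subtracted.
df1 = pd.read_csv(io.StringIO(csv_1_text), sep=r"\s+",
                  parse_dates=["Book_Date", "App_Date"])
df2 = pd.read_csv(io.StringIO(csv_2_text), sep=r"\s+",
                  parse_dates=["Occur_Date"])

print(df1.dtypes)
print(df2.dtypes)
```

With parse_dates already applied, the pd.to_datetime calls in the answer become no-ops on these columns. They are harmless to keep.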
EDIT:
What if I want to filter based on a unique Category and Mobile_Number in the two CSV files above? How can I do the same?
csv_1
Category Mobile_Number Book_Date App_Date
A 503477334 2018-10-12 2018-10-18
B 503477334 2018-10-07 2018-10-16
C 501022162 2018-10-12 2018-10-16
A 503487338 2018-10-13 2018-10-13
C 506012887 2018-10-13 2018-10-21
E 503427339 2018-10-14 2018-10-17
csv_2
Category Mobile_Number Occur_Date
A 503477334 2018-10-16
B 503477334 2018-10-13
A 501022162 2018-10-15
A 503487338 2018-10-13
F 501428449 2018-10-18
C 506012887 2018-10-14
I want the output to be filtered based on both Mobile_Number and Category:
Output
Category Mobile_Number Book_Date App_Date Difference
A 503477334 2018-10-12 2018-10-18 2
B 503477334 2018-10-07 2018-10-16 3
C 501022162 2018-10-12 2018-10-16 NaN
A 503487338 2018-10-13 2018-10-13 0
C 506012887 2018-10-13 2018-10-21 7
E 503427339 2018-10-14 2018-10-17 NaN
Answer 1:
Use Series.map to build a new Series of Occur_Date values matched by Mobile_Number. Test whether each value lies between the two date columns with Series.between, then assign values by that mask with numpy.where:
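import numpy as np
import pandas as pd

# df1 is csv_1, df2 is csv_2
df1['Book_Date'] = pd.to_datetime(df1['Book_Date'])
df1['App_Date'] = pd.to_datetime(df1['App_Date'])
df2['Occur_Date'] = pd.to_datetime(df2['Occur_Date'])
# Occur_Date per Mobile_Number (first occurrence if a number repeats)
s1 = df2.drop_duplicates('Mobile_Number').set_index('Mobile_Number')['Occur_Date']
# matched Occur_Date for each row of df1, NaT if the number is not in df2
s2 = df1['Mobile_Number'].map(s1)
# True where Occur_Date lies in [Book_Date, App_Date]
m = s2.between(df1['Book_Date'], df1['App_Date'])
# matched date, shown for reference
df1['Occur_Date'] = s2
# solution with no mask: difference in days for every matched number
df1['Difference1'] = df1['App_Date'].sub(s2).dt.days
# solution with the between test: NaN outside the date range
df1['Difference2'] = np.where(m, df1['App_Date'].sub(s2).dt.days, np.nan)
print(df1)
Mobile_Number Book_Date App_Date Occur_Date Difference1 Difference2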
0 503477334 2018-10-12 2018-10-18 2018-10-16 2.0 2.0
1 506002884 2018-10-12 2018-10-19 2018-10-21 -2.0 NaN
2 501022162 2018-10-12 2018-10-16 2018-10-15 1.0 1.0
3 503487338 2018-10-13 2018-10-13 2018-10-13 0.0 0.0
4 506012887 2018-10-13 2018-10-21 2018-10-14 7.0 7.0
5 503427339 2018-10-14 2018-10-17 NaT NaN NaN
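As an alternative to numpy.where, the same masking can be done with pandas' own Series.where. That method keeps values where the condition is True and puts NaN elsewhere. This swaps in a different method from the one the answer uses, and the result should match Difference2. Below is a self-contained sketch with the question's data typed in directly. Series.between is inclusive on both ends by default, which is why 503487338 (Book_Date = App_Date = Occur_Date) gets 0 rather than NaN.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Mobile_Number": [503477334, 506002884, 501022162, 503487338, 506012887, 503427339],
    "Book_Date": pd.to_datetime(["2018-10-12", "2018-10-12", "2018-10-12",
                                 "2018-10-13", "2018-10-13", "2018-10-14"]),
    "App_Date": pd.to_datetime(["2018-10-18", "2018-10-19", "2018-10-16",
                                "2018-10-13", "2018-10-21", "2018-10-17"]),
})
df2 = pd.DataFrame({
    "Mobile_Number": [503477334, 506002884, 501022162, 503487338, 501428449, 506012887],
    "Occur_Date": pd.to_datetime(["2018-10-16", "2018-10-21", "2018-10-15",
                                  "2018-10-13", "2018-10-18", "2018-10-14"]),
})

# Look up each number's Occur_Date (first occurrence if a number repeats);
# numbers missing from df2 become NaT.
occur = df1["Mobile_Number"].map(
    df2.drop_duplicates("Mobile_Number").set_index("Mobile_Number")["Occur_Date"]
)

# True only where Occur_Date lies in [Book_Date, App_Date]; NaT compares False.
in_range = occur.between(df1["Book_Date"], df1["App_Date"])

# Days from Occur_Date to App_Date, kept only where in_range is True.
df1["Difference"] = (df1["App_Date"] - occur).dt.days.where(in_range)
print(df1)
```

Series.where keeps the whole computation inside pandas, so the result stays a Series aligned on df1's index instead of a bare NumPy array.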
EDIT:
You can use merge instead of map to join on two columns, Category and Mobile_Number:
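import numpy as np
import pandas as pd

df1['Book_Date'] = pd.to_datetime(df1['Book_Date'])
df1['App_Date'] = pd.to_datetime(df1['App_Date'])
df2['Occur_Date'] = pd.to_datetime(df2['Occur_Date'])
# left join: keep every row of df1, NaT where no (Category, Mobile_Number) match
df3 = df1.merge(df2, on=['Category','Mobile_Number'], how='left')
print(df3)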
Category Mobile_Number Book_Date App_Date Occur_Date
0 A 503477334 2018-10-12 2018-10-18 2018-10-16
1 B 503477334 2018-10-07 2018-10-16 2018-10-13
2 C 501022162 2018-10-12 2018-10-16 NaT
3 A 503487338 2018-10-13 2018-10-13 2018-10-13
4 C 506012887 2018-10-13 2018-10-21 2018-10-14
5 E 503427339 2018-10-14 2018-10-17 NaT
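# True where Occur_Date lies in [Book_Date, App_Date]; NaT gives False
m = df3['Occur_Date'].between(df3['Book_Date'], df3['App_Date'])
#print (m)
# difference in days inside the range, NaN otherwise
df3['Difference2'] = np.where(m, df3['App_Date'].sub(df3['Occur_Date']).dt.days, np.nan)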
print (df3)
Category Mobile_Number Book_Date App_Date Occur_Date Difference2
0 A 503477334 2018-10-12 2018-10-18 2018-10-16 2.0
1 B 503477334 2018-10-07 2018-10-16 2018-10-13 3.0
2 C 501022162 2018-10-12 2018-10-16 NaT NaN
3 A 503487338 2018-10-13 2018-10-13 2018-10-13 0.0
4 C 506012887 2018-10-13 2018-10-21 2018-10-14 7.0
5 E 503427339 2018-10-14 2018-10-17 NaT NaN
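One caveat with the merge version is that, unlike the map solution, it does not call drop_duplicates. If csv_2 ever contained the same (Category, Mobile_Number) pair more than once, the left merge would repeat the matching csv_1 row once per duplicate. The sketch below is a variation, not part of the original answer. It keeps the first occurrence to mirror the map solution and passes validate="many_to_one" so merge raises a MergeError if the right-hand keys are still not unique. It also passes indicator=True, so the _merge column shows which csv_1 rows found a partner.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Category": ["A", "B", "C", "A", "C", "E"],
    "Mobile_Number": [503477334, 503477334, 501022162, 503487338, 506012887, 503427339],
    "Book_Date": pd.to_datetime(["2018-10-12", "2018-10-07", "2018-10-12",
                                 "2018-10-13", "2018-10-13", "2018-10-14"]),
    "App_Date": pd.to_datetime(["2018-10-18", "2018-10-16", "2018-10-16",
                                "2018-10-13", "2018-10-21", "2018-10-17"]),
})
df2 = pd.DataFrame({
    "Category": ["A", "B", "A", "A", "F", "C"],
    "Mobile_Number": [503477334, 503477334, 501022162, 503487338, 501428449, 506012887],
    "Occur_Date": pd.to_datetime(["2018-10-16", "2018-10-13", "2018-10-15",
                                  "2018-10-13", "2018-10-18", "2018-10-14"]),
})

keys = ["Category", "Mobile_Number"]

# One Occur_Date per key pair, so the left merge cannot multiply df1's rows;
# validate raises MergeError if the right side still has duplicate keys.
df3 = df1.merge(
    df2.drop_duplicates(keys),
    on=keys,
    how="left",
    validate="many_to_one",
    indicator=True,
)

# Same masking as the answer: days inside the range, NaN otherwise.
m = df3["Occur_Date"].between(df3["Book_Date"], df3["App_Date"])
df3["Difference"] = np.where(m, (df3["App_Date"] - df3["Occur_Date"]).dt.days, np.nan)
print(df3)
```

In the _merge column, row C 501022162 shows left_only, because csv_2 lists that number only under Category A. That is why its Difference is NaN.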
Source: https://stackoverflow.com/questions/59456738/occurrence-of-a-number-between-two-specific-datetime-ranges-in-pandas