Question
I have two DataFrames that look like this:
Data1:
col1 col2
['A_2'] ['C_8']
['A_2','B_3'] ['C_7']
['B_5'] ['A_3']
Data2:
A B C
1 2 8
2 3 8
3 5 7
3 5 7
1 6 7
2 3 8
3 7 9
4 8 9
2 3 7
10 5 11
Here `Data1` has two columns whose values have the form `<column name in Data2>_<value in Data2>`, referring to columns and values of `Data2`.
I need to write a SQL query, or use pandas, to create an output table `table3` like this:
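To make the encoding concrete, here is a minimal sketch of splitting such a cell into (column, value) pairs. The helper name `parse_tokens` is hypothetical, and the sketch assumes the cells are stored as strings like `"['A_2','B_3']"`, that the value is whatever follows the last underscore, and that values are integers.

```python
import re

def parse_tokens(cell):
    """Turn a cell such as "['A_2','B_3']" into [('A', 2), ('B', 3)].

    Hypothetical helper: assumes every token is <column>_<integer value>
    and that the value follows the last underscore.
    """
    pairs = []
    # Take everything between single quotes; brackets and commas are skipped.
    for token in re.findall(r"'([^']+)'", cell):
        # rpartition splits on the LAST underscore, so column names
        # that contain underscores themselves stay intact.
        column, _, value = token.rpartition("_")
        pairs.append((column, int(value)))
    return pairs

print(parse_tokens("['A_2','B_3']"))  # [('A', 2), ('B', 3)]
```

Splitting on the last underscore is a design choice that keeps a column name such as `my_col` in one piece.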
col1 col2 count1 error
['A_2'] ['C_8'] 2 1
['A_2','B_3'] ['C_7'] 1 2
['B_5'] ['A_3'] 2 1
For the `count1` column: take row 1 of `Data1` and count the rows in `Data2` where column A is 2 and column C is 8. I have to repeat this for every row in `Data1` to build the output `table3` shown above.
For the `error` column: count the rows in `Data2` where column A is 2 but column C is not 8.
I am trying to solve this with either pandas or SQL from Python.
Here is my approach so far, but I am stuck on the SQL part:
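The two definitions above can be sketched with pandas boolean masks. This is only an illustration under assumptions: the (column, value) pairs are written out by hand rather than parsed from `Data1`, the function name `count_and_error` is hypothetical, and for a multi-condition `col2` the error is taken to mean "not all `col2` conditions hold".

```python
import pandas as pd

# Data2 exactly as shown in the question.
data2 = pd.DataFrame({
    "A": [1, 2, 3, 3, 1, 2, 3, 4, 2, 10],
    "B": [2, 3, 5, 5, 6, 3, 7, 8, 3, 5],
    "C": [8, 8, 7, 7, 7, 8, 9, 9, 7, 11],
})

def count_and_error(df, col1_pairs, col2_pairs):
    """Return (count1, error) for one row of Data1 (hypothetical helper)."""
    # Rows that satisfy every col1 condition.
    left = pd.Series(True, index=df.index)
    for column, value in col1_pairs:
        left &= df[column] == value
    # Rows that satisfy every col2 condition.
    right = pd.Series(True, index=df.index)
    for column, value in col2_pairs:
        right &= df[column] == value
    count1 = int((left & right).sum())   # col1 holds and col2 holds
    error = int((left & ~right).sum())   # col1 holds but col2 does not
    return count1, error

# The three rows of Data1, already split into (column, value) pairs.
rules = [
    ([("A", 2)], [("C", 8)]),
    ([("A", 2), ("B", 3)], [("C", 7)]),
    ([("B", 5)], [("A", 3)]),
]
results = [count_and_error(data2, c1, c2) for c1, c2 in rules]
print(results)  # [(2, 1), (1, 2), (2, 1)]
```

The printed pairs agree with the `count1` and `error` columns of the expected output table.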
```python
import pandasql as ps

# Strip the surrounding brackets from the list-like strings.
Data1['col1'] = Data1['col1'].apply(lambda x: x.replace('[', '').replace(']', ''))
Data1['col2'] = Data1['col2'].apply(lambda x: x.replace('[', '').replace(']', ''))

q1 = "SELECT col1, col2, confidence FROM Data1"
Data1q1 = ps.sqldf(q1, locals())
Data1list = Data1q1.values.tolist()
print(Data1list[0])

# Work on the first row only: split each "column_value" token on "_".
part1 = Data1list[0]
ant = part1[0].split("_")
cons = part1[1].split("_")
print(ant, cons)

# The last piece is the value; everything before it is the column name.
if len(ant) > 2:
    val1 = ant[-1].replace("'", '')
    var1 = "_".join(ant[:-1]).replace("'", '')
else:
    val1 = ant[-1].replace("'", '')
    var1 = ant[0].replace("'", '')

if len(cons) > 2:
    val2 = cons[-1].replace("'", '')
    var2 = "_".join(cons[:-1]).replace("'", '')
else:
    val2 = cons[-1].replace("'", '')
    var2 = cons[0].replace("'", '')
print(val1, val2, var1, var2)

# var1, val1, var2 and val2 appear literally in the query text here.
q2 = "SELECT count(*) as error FROM Data2 where var1=val1 and var2=val2"
out = ps.sqldf(q2, locals())
```
But this approach does not let me build the table dynamically, and I also get an error saying that var1 and val1 are not defined. Is there a Pythonic way to do this?
Source: https://stackoverflow.com/questions/57079588/writing-sql-query-subquery-for-pandas-multiple-dataframe
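On the SQL side, names written inside a query string are read by the SQL engine as column names, not as Python variables, so the column names and values have to be put into the query from Python. Below is a minimal sketch of one way to do that. It uses the standard-library `sqlite3` module instead of pandasql to stay self-contained, and the helper names `where_clause` and `count_and_error_sql`, the `ALLOWED_COLUMNS` whitelist, and the hand-written (column, value) pairs are all assumptions made for illustration.

```python
import sqlite3

# Data2 from the question, loaded into an in-memory SQLite table.
rows = [(1, 2, 8), (2, 3, 8), (3, 5, 7), (3, 5, 7), (1, 6, 7),
        (2, 3, 8), (3, 7, 9), (4, 8, 9), (2, 3, 7), (10, 5, 11)]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Data2 (A INTEGER, B INTEGER, C INTEGER)")
conn.executemany("INSERT INTO Data2 VALUES (?, ?, ?)", rows)

ALLOWED_COLUMNS = {"A", "B", "C"}  # guards the column names placed in the SQL text

def where_clause(pairs):
    """Build '"A" = ? AND "B" = ?' and its parameter list from (column, value) pairs."""
    for column, _ in pairs:
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {column}")
    sql = " AND ".join(f'"{column}" = ?' for column, _ in pairs)
    return sql, [value for _, value in pairs]

def count_and_error_sql(conn, col1_pairs, col2_pairs):
    """Return (count1, error) for one row of Data1 using two COUNT(*) queries."""
    left_sql, left_args = where_clause(col1_pairs)
    right_sql, right_args = where_clause(col2_pairs)
    args = left_args + right_args
    # Column names go into the query text; values are bound as parameters.
    count1 = conn.execute(
        f"SELECT COUNT(*) FROM Data2 WHERE {left_sql} AND {right_sql}", args
    ).fetchone()[0]
    error = conn.execute(
        f"SELECT COUNT(*) FROM Data2 WHERE {left_sql} AND NOT ({right_sql})", args
    ).fetchone()[0]
    return count1, error

print(count_and_error_sql(conn, [("A", 2)], [("C", 8)]))  # (2, 1)
```

Calling the function once per row of `Data1` and collecting the pairs would fill the `count1` and `error` columns row by row.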