Writing a SQL query/subquery for multiple pandas DataFrames

自作多情 submitted on 2019-12-23 04:49:20

Question


I have two DataFrames that look like this:

Data1:

         col1             col2 
        ['A_2']           ['C_8']
        ['A_2','B_3']     ['C_7']
        ['B_5']           ['A_3']

Data2:

        A    B    C   
        1    2    8
        2    3    8
        3    5    7
        3    5    7
        1    6    7
        2    3    8
        3    7    9
        4    8    9
        2    3    7
        10   5    11

Here, Data1 has two columns whose values have the form colnameofData2_valueofData2: a column name from Data2 and a value from that column, joined by an underscore.
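To make the encoding concrete, here is a minimal sketch of splitting such a token back into a column name and a value. The helper names `parse_condition` and `parse_cell` are hypothetical, and the sketch assumes each Data1 cell is a Python list of strings (if the cells are stored as text such as "['A_2','B_3']", `ast.literal_eval` could turn them into lists first). Splitting on the last underscore only is a choice made here so that column names containing underscores would still parse.

```python
def parse_condition(token):
    """Split 'A_2' into ('A', '2'); only the last underscore separates name and value."""
    name, value = token.rsplit("_", 1)
    return name, value


def parse_cell(cell):
    """Turn one Data1 cell, e.g. ['A_2', 'B_3'], into {'A': '2', 'B': '3'}."""
    return dict(parse_condition(token) for token in cell)


print(parse_cell(["A_2", "B_3"]))    # {'A': '2', 'B': '3'}
print(parse_condition("my_col_7"))   # ('my_col', '7')
```

The values come back as strings; converting them with `int` would be a further assumption about the data in Data2.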

I need to write a SQL query, or use pandas, to create an output table3 like this:

         col1             col2       count1   error
        ['A_2']           ['C_8']    2        1
        ['A_2','B_3']     ['C_7']    1        2
        ['B_5']           ['A_3']    2        1

For the count1 column: take row 1 of Data1 and count the rows in Data2 where column A is 2 and column C is 8. This is repeated for every row in Data1 to build the output table3 above.

For the error column: count the rows in Data2 where the col1 condition holds but the col2 condition does not. For row 1, that means the rows where column A is 2 and column C is not 8.
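The two rules can be sketched in pandas with boolean masks. This is one possible reading rather than a definitive solution: it assumes the Data1 cells are Python lists of strings, that all values in Data2 are integers, and that "error" means the col1 conditions all hold while the col2 conditions do not all hold. The names `mask_for` and `count_and_error` are hypothetical helpers introduced for illustration.

```python
import pandas as pd

data1 = pd.DataFrame({
    "col1": [["A_2"], ["A_2", "B_3"], ["B_5"]],
    "col2": [["C_8"], ["C_7"], ["A_3"]],
})
data2 = pd.DataFrame({
    "A": [1, 2, 3, 3, 1, 2, 3, 4, 2, 10],
    "B": [2, 3, 5, 5, 6, 3, 7, 8, 3, 5],
    "C": [8, 8, 7, 7, 7, 8, 9, 9, 7, 11],
})


def mask_for(tokens, frame):
    """Boolean mask of the rows in `frame` that match every 'column_value' token."""
    mask = pd.Series(True, index=frame.index)
    for token in tokens:
        column, value = token.rsplit("_", 1)
        mask &= frame[column] == int(value)  # assumes integer values in Data2
    return mask


def count_and_error(row):
    """Compute count1 and error for one row of Data1."""
    left = mask_for(row["col1"], data2)
    right = mask_for(row["col2"], data2)
    return pd.Series({
        "count1": int((left & right).sum()),   # col1 and col2 conditions both hold
        "error": int((left & ~right).sum()),   # col1 holds, col2 does not
    })


table3 = pd.concat([data1, data1.apply(count_and_error, axis=1)], axis=1)
print(table3)
```

On the sample data this gives count1 values of 2, 1, 2 and error values of 1, 2, 1, which matches the expected table3.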

I am trying to solve this either with pandas or with SQL from Python.

Here is my approach, but I am stuck on the SQL:

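```python
import pandasql as ps

# Strip the list brackets from the string-typed cells, e.g. "['A_2']" -> "'A_2'"
Data1['col1'] = Data1['col1'].apply(lambda x: x.replace('[', '').replace(']', ''))
Data1['col2'] = Data1['col2'].apply(lambda x: x.replace('[', '').replace(']', ''))

# Note: 'confidence' does not appear in the Data1 sample shown above
q1 = "SELECT col1, col2, confidence FROM Data1"
Data1q1 = ps.sqldf(q1, locals())
Data1list = Data1q1.values.tolist()
print(Data1list[0])

# Only the first row of Data1 is handled here
part1 = Data1list[0]
ant = part1[0].split("_")
cons = part1[1].split("_")

print(ant, cons)

# The last piece is the value; everything before it is the column name
if len(ant) > 2:
    val1 = ant[-1].replace("'", '')
    var1 = "_".join(ant[:-1]).replace("'", '')
else:
    val1 = ant[-1].replace("'", '')
    var1 = ant[0].replace("'", '')

if len(cons) > 2:
    val2 = cons[-1].replace("'", '')
    var2 = "_".join(cons[:-1]).replace("'", '')
else:
    val2 = cons[-1].replace("'", '')
    var2 = cons[0].replace("'", '')

print(val1, val2, var1, var2)

# var1, val1, var2 and val2 appear here as literal text in the SQL string,
# so they are read as column names of Data2; this is the query that fails
q2 = "SELECT count(*) AS error FROM Data2 WHERE var1=val1 AND var2=val2"
out = ps.sqldf(q2, locals())
```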

But this approach does not let me build the table dynamically, and I also get an error saying that var1 and val1 are not defined. Is there a Pythonic way to do this?
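On the SQL side, one likely cause of the error is that the names var1 and val1 sit inside the query text, so the database looks for columns with those names instead of using the Python values. A minimal sketch of building the WHERE clause dynamically follows. It deliberately swaps pandasql for the standard-library `sqlite3` module with an in-memory table, so that values can be passed as `?` parameters; the helper names `where_clause` and `count_and_error`, the `COLUMNS` whitelist, and the integer conversion are all assumptions made for illustration.

```python
import sqlite3

rows = [(1, 2, 8), (2, 3, 8), (3, 5, 7), (3, 5, 7), (1, 6, 7),
        (2, 3, 8), (3, 7, 9), (4, 8, 9), (2, 3, 7), (10, 5, 11)]
rules = [(["A_2"], ["C_8"]), (["A_2", "B_3"], ["C_7"]), (["B_5"], ["A_3"])]
COLUMNS = {"A", "B", "C"}  # whitelist: column names cannot be bound as parameters

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Data2 (A INTEGER, B INTEGER, C INTEGER)")
conn.executemany("INSERT INTO Data2 VALUES (?, ?, ?)", rows)


def where_clause(tokens):
    """Build 'A = ? AND B = ?' and its parameter list from 'column_value' tokens."""
    parts, params = [], []
    for token in tokens:
        column, value = token.rsplit("_", 1)
        if column not in COLUMNS:
            raise ValueError(f"unknown column: {column}")
        parts.append(f"{column} = ?")
        params.append(int(value))  # assumes integer values in Data2
    return " AND ".join(parts), params


def count_and_error(col1, col2):
    """Run one query for count1 and one for error for a single Data1 row."""
    left_sql, left_params = where_clause(col1)
    right_sql, right_params = where_clause(col2)
    params = left_params + right_params
    count1 = conn.execute(
        f"SELECT COUNT(*) FROM Data2 WHERE ({left_sql}) AND ({right_sql})", params
    ).fetchone()[0]
    error = conn.execute(
        f"SELECT COUNT(*) FROM Data2 WHERE ({left_sql}) AND NOT ({right_sql})", params
    ).fetchone()[0]
    return count1, error


results = [count_and_error(col1, col2) for col1, col2 in rules]
print(results)  # [(2, 1), (1, 2), (2, 1)]
```

Looping over all rows of Data1 in this way builds the count1 and error columns one row at a time; the resulting pairs can then be attached to Data1 as two new columns.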

Source: https://stackoverflow.com/questions/57079588/writing-sql-query-subquery-for-pandas-multiple-dataframe
