Create different combinations / patterns between the data of two columns of a CSV file with Python

Posted by 南楼画角 on 2019-12-11 11:47:09

Question


I have a .csv file that contains five columns: a_id, b_id, var, lo, and up. I would like to create different combinations / patterns between two variables based on a_id, b_id, and var.

First, I would like to delete the records that have no duplicate based on a_id and b_id, because without a duplicate no combination or matching can be created. As a result, the first record in dataFile.csv is deleted, because it has no duplicate.
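
As an illustration of this filtering step, here is a minimal sketch (added for clarity, not code from the original post). It assumes the rows have already been read into a list of lists in the column order a_id, b_id, var, lo, up; the function name drop_singletons and the in-memory rows list are hypothetical.

```python
from collections import Counter

# Sample rows in the dataFile.csv column order: a_id, b_id, var, lo, up.
rows = [
    ["103", "190", "dwel", "0", "236"],
    ["103", "195", "ses", "1", "3"],
    ["103", "195", "ses", "4", "113"],
    ["103", "195", "pv", "1", "5"],
    ["103", "195", "pv", "6", "29"],
]


def drop_singletons(rows):
    """Keep only rows whose (a_id, b_id) key occurs more than once."""
    counts = Counter((r[0], r[1]) for r in rows)
    return [r for r in rows if counts[(r[0], r[1])] > 1]


kept = drop_singletons(rows)
```

With the sample rows, the single record for a_id = 103, b_id = 190 is dropped and the four records for b_id = 195 remain. Counting keys first means the rows do not have to be sorted for this step.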

For the combinations / patterns between two variables, I would first like to create a single combination for each record of each a_id and b_id. In this case, the values of the second variable are null, as shown in the resultFile. For example, the combinations / patterns created from records 2 to 5, that is, where a_id = 103 and b_id = 195, can be seen in the resultFile. The other combinations / patterns based on a_id, b_id, and var are created in the same way, as in resultFile.csv.

In the result file, the 1, 2, and 3 appended to the variable names are used only to identify the variables; they are not required in the resultFile. The blank row between patterns is also not required; I added it only to make the patterns easier to see. I have shown the different combinations of two variables based on a_id and b_id. The real data contains many different a_id and b_id values.

Any advice or suggestions are appreciated.
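
To make the pattern rules concrete, the following sketch (an illustration under stated assumptions, not code from the original post) builds the patterns for the a_id = 103, b_id = 195 group in memory. The function name build_patterns and the use of None for null are hypothetical choices. It pairs each record of one variable with each record of every other variable, using unordered pairs of variables.

```python
from itertools import combinations, product

# Rows for a_id = 103, b_id = 195, as (var, lo, up) tuples.
group = [("ses", 1, 3), ("ses", 4, 113), ("pv", 1, 5), ("pv", 6, 29)]


def build_patterns(group):
    """Return the single patterns followed by the cross-variable pairs."""
    null = (None, None, None)
    # Single patterns: the second variable is null.
    patterns = [row + null for row in group]

    # Bucket rows by variable name, preserving first-seen order.
    by_var = {}
    for row in group:
        by_var.setdefault(row[0], []).append(row)

    # Pair each row of one variable with each row of every later variable.
    for rows_a, rows_b in combinations(by_var.values(), 2):
        for a, b in product(rows_a, rows_b):
            patterns.append(a + b)
    return patterns


patterns = build_patterns(group)
```

For this group the sketch yields four single patterns and four ses/pv pairs, matching the first block of the resultFile. The 103/266 block of the resultFile also lists each pair of variables in both orders; if that is wanted, itertools.permutations could replace combinations here.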

dataFile.csv.

   
a_id      b_id      var      lo      up
103       190       dwel     0       236

103       195       ses      1       3
103       195       ses      4       113
103       195       pv       1       5
103       195       pv       6       29

103       266       dwl      15      92
103       266       dwl      93      144
103       266       dwl      145     521
103       266       ses      1       2
103       266       ses      3       6
103       266       pv       1       2
103       266       pv       3       9
103       266       pv       10      23
103       266       pv       24      33
103       266       Elp      142     711
103       266       Elp      712     885

107       272       dwl      15      95
107       272       dwl      96      624
107       272       ses      1       2
107       272       ses      3       6
107       272       pv       1       2
107       272       pv       3       9
.         .         .        .       .
.         .         .        .       .

resultFile.csv.

The resultFile.csv should be as follows:

   
a_id    b_id    var1    lo    up    var2    lo      up
103     195     ses1    1     3     null    null    null
103     195     ses2    4     113   null    null    null
103     195     pv1     1     5     null    null    null
103     195     pv2     6     29    null    null    null
103     195     ses1    1     3     pv1     1       5
103     195     ses1    1     3     pv2     6       29
103     195     ses2    4     113   pv1     1       5
103     195     ses2    4     113   pv2     6       29

103     266     dwl1    15    92    null    null    null
103     266     dwl2    93    144   null    null    null
103     266     dwl3    145   521   null    null    null
103     266     ses1    1     2     null    null    null
103     266     ses2    3     6     null    null    null
103     266     pv1     1     2     null    null    null
103     266     pv2     3     9     null    null    null
103     266     pv3     10    23    null    null    null
103     266     pv4     24    33    null    null    null
103     266     elp1    142   711   null    null    null
103     266     elp2    712   885   null    null    null
103     266     dwl1    15    92    ses1    1       2
103     266     dwl1    15    92    ses2    3       6
103     266     dwl2    93    144   ses1    1       2
103     266     dwl2    93    144   ses2    3       6
103     266     dwl3    145   521   ses1    1       2
103     266     dwl3    145   521   ses2    3       6
103     266     dwl1    15    92    pv1     1       2
103     266     dwl1    15    92    pv2     3       9
103     266     dwl1    15    92    pv3     10      23
103     266     dwl1    15    92    pv4     24      33
103     266     dwl2    93    144   pv1     1       2
103     266     dwl2    93    144   pv2     3       9
103     266     dwl2    93    144   pv3     10      23
103     266     dwl2    93    144   pv4     24      33
103     266     dwl3    145   521   pv1     1       2
103     266     dwl3    145   521   pv2     3       9
103     266     dwl3    145   521   pv3     10      23
103     266     dwl3    145   521   pv4     24      33
103     266     dwl1    15    92    elp1    142     711
103     266     dwl1    15    92    elp2    712     885
103     266     dwl2    93    144   elp1    142     711
103     266     dwl2    93    144   elp2    712     885
103     266     dwl3    145   521   elp1    142     711
103     266     dwl3    145   521   elp2    712     885
103     266     ses1    1     2     pv1     1       2
103     266     ses1    1     2     pv2     3       9
103     266     ses1    1     2     pv3     10      23
103     266     ses1    1     2     pv4     24      33
103     266     ses2    3     6     pv1     1       2
103     266     ses2    3     6     pv2     3       9
103     266     ses2    3     6     pv3     10      23
103     266     ses2    3     6     pv4     24      33
103     266     ses1    1     2     dwl1    15      92
103     266     ses1    1     2     dwl2    93      144
103     266     ses1    1     2     dwl3    145     521
103     266     ses2    3     6     dwl1    15      92
103     266     ses2    3     6     dwl2    93      144
103     266     ses2    3     6     dwl3    145     521
103     266     ses1    1     2     elp1    142     711
103     266     ses1    1     2     elp2    712     885
103     266     ses2    3     6     elp1    142     711
103     266     ses2    3     6     elp2    712     885
103     266     elp1    142   711   pv1     1       2
103     266     elp1    142   711   pv2     3       9
103     266     elp1    142   711   pv3     10      23
103     266     elp1    142   711   pv4     24      33
103     266     elp2    712   885   pv1     1       2
103     266     elp2    712   885   pv2     3       9
103     266     elp2    712   885   pv3     10      23
103     266     elp2    712   885   pv4     24      33
103     266     elp1    142   711   ses1    1       2
103     266     elp1    142   711   ses2    3       6
103     266     elp2    712   885   ses1    1       2
103     266     elp2    712   885   ses2    3       6
103     266     elp1    142   711   dwl1    15      92
103     266     elp1    142   711   dwl2    93      144
103     266     elp1    142   711   dwl3    145     521
103     266     elp2    712   885   dwl1    15      92
103     266     elp2    712   885   dwl2    93      144
103     266     elp2    712   885   dwl3    145     521
103     266     pv1     1     2     dwl1    15      92
103     266     pv1     1     2     dwl2    93      144
103     266     pv1     1     2     dwl3    145     521
103     266     pv2     3     9     dwl1    15      92
103     266     pv2     3     9     dwl2    93      144
103     266     pv2     3     9     dwl3    145     521
103     266     pv3     10    23    dwl1    15      92
103     266     pv3     10    23    dwl2    93      144
103     266     pv3     10    23    dwl3    145     521
103     266     pv4     24    33    dwl1    15      92
103     266     pv4     24    33    dwl2    93      144
103     266     pv4     24    33    dwl3    145     521
103     266     pv1     1     2     ses1    1       2
103     266     pv1     1     2     ses2    3       6
103     266     pv2     3     9     ses1    1       2
103     266     pv2     3     9     ses2    3       6
103     266     pv3     10    23    ses1    1       2
103     266     pv3     10    23    ses2    3       6
103     266     pv4     24    33    ses1    1       2
103     266     pv4     24    33    ses2    3       6
103     266     pv1     1     2     elp1    142     711
103     266     pv1     1     2     elp2    712     885
103     266     pv2     3     9     elp1    142     711
103     266     pv2     3     9     elp2    712     885
103     266     pv3     10    23    elp1    142     711
103     266     pv3     10    23    elp2    712     885
103     266     pv4     24    33    elp1    142     711
103     266     pv4     24    33    elp2    712     885


Answer 1:


The following Python solution should get you started:

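from itertools import combinations, groupby, product
import csv

output_header = ["a_id", "b_id", "var1", "lo", "up", "var2", "lo", "up"]

# Python 2 style: csv files are opened in binary mode
f_input = open('dataFile.csv', 'rb')
csv_input = csv.reader(f_input)
input_header = next(csv_input)

f_output = open('resultFile.csv', 'wb')
csv_output = csv.writer(f_output)
csv_output.writerow(output_header)

# Group consecutive rows that share the same (a_id, b_id)
for k1, g1 in groupby(csv_input, key=lambda x: (x[0], x[1])):
    group1 = list(g1)

    # Skip (a_id, b_id) keys that have only a single row
    if len(group1) > 1:
        # Single patterns: the second variable is null
        for row in group1:
            csv_output.writerow(row + ['null'] * 3)

        # Split the group into one list of rows per var
        p = [list(g2) for k2, g2 in groupby(group1, key=lambda x: x[2])]

        # Pair every row of one var with every row of each later var
        for rows1, rows2 in combinations(p, 2):
            for row1, row2 in product(rows1, rows2):
                csv_output.writerow(row1 + row2[2:])

f_input.close()
f_output.close()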

This will give you a resultFile.csv file starting as follows:

a_id,b_id,var1,lo,up,var2,lo,up
103,195,ses,1,3,null,null,null
103,195,ses,4,113,null,null,null
103,195,pv,1,5,null,null,null
103,195,pv,6,29,null,null,null
103,195,ses,1,3,pv,1,5
103,195,ses,1,3,pv,6,29
103,195,ses,4,113,pv,1,5
103,195,ses,4,113,pv,6,29
103,266,dwl,15,92,null,null,null
103,266,dwl,93,144,null,null,null
103,266,dwl,145,521,null,null,null
...

Tested using Python 2.6.6 (which I believe the OP is using)
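
For readers on Python 3, the sketch below is one possible adaptation; it is not part of the original answer and has not been run against the OP's real data. It assumes a comma-separated input file whose rows are already ordered so that equal (a_id, b_id) values are adjacent, which groupby requires. The function name write_patterns is hypothetical. Files are opened in text mode with newline='' as the csv module documentation recommends, blank rows are skipped, and every two distinct var groups are paired via itertools.combinations.

```python
import csv
from itertools import combinations, groupby, product

OUTPUT_HEADER = ["a_id", "b_id", "var1", "lo", "up", "var2", "lo", "up"]


def write_patterns(in_path, out_path):
    """Write single and pairwise patterns for each (a_id, b_id) group."""
    with open(in_path, newline="") as f_in, \
            open(out_path, "w", newline="") as f_out:
        reader = csv.reader(f_in)
        writer = csv.writer(f_out)
        next(reader)  # skip the input header
        writer.writerow(OUTPUT_HEADER)

        # Ignore blank separator lines, which csv.reader yields as [].
        rows = (r for r in reader if r)

        # groupby only merges adjacent rows with the same (a_id, b_id).
        for _, g1 in groupby(rows, key=lambda r: (r[0], r[1])):
            group = list(g1)
            if len(group) < 2:
                continue  # no duplicate, so nothing to combine

            # Single patterns: the second variable is null.
            for row in group:
                writer.writerow(row + ["null"] * 3)

            # One list of rows per var, then all cross-var pairs.
            by_var = [list(g2) for _, g2 in groupby(group, key=lambda r: r[2])]
            for rows_a, rows_b in combinations(by_var, 2):
                for a, b in product(rows_a, rows_b):
                    writer.writerow(a + b[2:])
```

Calling write_patterns('dataFile.csv', 'resultFile.csv') would then produce the result file. Each pair of variables is written in one order only; swapping combinations for itertools.permutations would emit both orders, as the 103/266 block of the question's resultFile does.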



Source: https://stackoverflow.com/questions/34309176/create-different-combination-patterns-between-the-data-of-two-columns-of-a-csv
