Python Killed: 9 when running a code using dictionaries created from 2 csv files

Back-end · Open · 3 answers · 1699 views
余生分开走 2021-01-17 17:41

I am running a script that has always worked for me. This time I ran it on 2 .csv files: "data" (24 MB) and "data1" (475 MB). "data" has 3 columns of about 680000 elements

3 Answers
  •  别那么骄傲
    2021-01-17 18:17

    Most likely the kernel kills it because your script consumes too much memory. You need to take a different approach and minimize the amount of data held in memory at once.

    You may also find this question useful: Very large matrices using Python and NumPy

    In the following code snippet I tried to avoid loading the huge data1.csv into memory by processing it line by line. Give it a try.

    import csv
    from collections import OrderedDict  # preserve insertion order of keys
    
    # Build a lookup dict from the small file, keyed on column 2
    with open('data.csv', newline='') as csvfile:
        reader = csv.reader(csvfile, delimiter=',')
        next(reader)  # skip header
        d = OrderedDict((row[2], {"val": row[1], "flag": False}) for row in reader)
    
    # Stream the large file one row at a time instead of loading it whole
    with open('data1.csv', newline='') as csvfile:
        reader = csv.reader(csvfile, delimiter=',')
        next(reader)  # skip header
        for row in reader:
            if row[0] in d:
                d[row[0]]["flag"] = True
    
    # Write matches with csv.writer rather than redirecting sys.stdout
    with open("rs_pos_ref_alt.csv", "w", newline='') as out:
        writer = csv.writer(out)
        for k, v in d.items():
            if v["flag"]:
                writer.writerow([v["val"], k])
    
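    To sanity-check the streaming-join idea on a small scale, here is a self-contained sketch that generates two tiny sample files (the column layout is an assumption based on the indices used above: column 2 of data.csv matches column 0 of data1.csv) and applies the same logic in memory:

    ```python
    import csv
    from collections import OrderedDict

    # Hypothetical sample files, just to exercise the approach.
    with open('data.csv', 'w', newline='') as f:
        w = csv.writer(f)
        w.writerow(['id', 'val', 'key'])   # assumed layout: key lives in column 2
        w.writerow(['1', 'A', 'rs1'])
        w.writerow(['2', 'B', 'rs2'])
        w.writerow(['3', 'C', 'rs3'])

    with open('data1.csv', 'w', newline='') as f:
        w = csv.writer(f)
        w.writerow(['key', 'other'])       # assumed layout: key lives in column 0
        w.writerow(['rs1', 'x'])
        w.writerow(['rs3', 'y'])

    # Build the lookup from the small file.
    with open('data.csv', newline='') as f:
        reader = csv.reader(f)
        next(reader)  # skip header
        d = OrderedDict((row[2], {"val": row[1], "flag": False}) for row in reader)

    # Stream the "large" file and flag keys that appear in the lookup.
    with open('data1.csv', newline='') as f:
        reader = csv.reader(f)
        next(reader)  # skip header
        for row in reader:
            if row[0] in d:
                d[row[0]]["flag"] = True

    matches = [(v["val"], k) for k, v in d.items() if v["flag"]]
    print(matches)  # [('A', 'rs1'), ('C', 'rs3')]
    ```

    Only the lookup dict built from the small file stays resident in memory; the large file is never held in memory in full, which is what keeps the process under the kernel's limit.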
