Operations on a very large CSV with pandas

面向向阳花 2021-01-29 09:42

I have been using pandas on csv files to get some values out of them. My data looks like this:

\"A\",23.495,41.995,\"this is a sentence with some words\"
\"B\",         


        
2 Answers
  • 2021-01-29 09:52

    Okay, I misunderstood the chunksize parameter. I solved it like this:

    import pandas as pd
    from collections import Counter

    frame = pd.DataFrame()
    chunks = pd.read_csv("csvfile.txt", sep=",", header=None,
                         names=["group", "val1", "val2", "text"],
                         chunksize=1000000)
    for df in chunks:
        # Per-chunk tallies: overall group frequency, plus per-group row
        # counts for each search word (na=False skips rows with missing text)
        freq = Counter(df["group"])
        word1 = df[df["text"].str.contains("WORD1", na=False)].groupby("group").size()
        word2 = df[df["text"].str.contains("WORD2", na=False)].groupby("group").size()
        df1 = pd.concat([pd.Series(freq), word1, word2], axis=1)
        # Accumulate results across chunks; fill_value=0 handles groups
        # that do not appear in every chunk
        frame = frame.add(df1, fill_value=0)

    frame.to_csv("csv_out.txt", sep=",", encoding="utf-8")
    
  • 2021-01-29 10:16

    You can specify a chunksize option in the read_csv call. See the pandas read_csv documentation for details.
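    A minimal sketch of that option (the filename, column names, and chunk size here are placeholders):

    import pandas as pd

    # With chunksize set, read_csv returns an iterator of DataFrames
    # instead of loading the entire file into memory at once.
    reader = pd.read_csv("csvfile.txt", sep=",", header=None,
                         names=["group", "val1", "val2", "text"],
                         chunksize=100000)
    for chunk in reader:
        print(len(chunk))  # each chunk is an ordinary DataFrame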

    Alternatively, you could use Python's built-in csv library, create a csv.reader or csv.DictReader, and read the data in whatever chunk size you choose, as sketched below.
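
    A minimal sketch of that approach, assuming the same file as in the question and an arbitrary chunk size of 100,000 rows:

    import csv
    from collections import Counter
    from itertools import islice

    counts = Counter()
    with open("csvfile.txt", newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        while True:
            # Pull up to 100,000 rows at a time; islice never
            # reads the whole file into memory
            chunk = list(islice(reader, 100000))
            if not chunk:
                break
            counts.update(row[0] for row in chunk)  # tally the "group" column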
