Operations on a very large csv with pandas

面向向阳花 2021-01-29 09:42

I have been using pandas on csv files to get some values out of them. My data looks like this:

"A",23.495,41.995,"this is a sentence with some words"
"B",


        
2 answers
  •  太阳男子
    2021-01-29 09:52

    Okay, I misunderstood the chunksize parameter. I solved it by doing this:

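    from collections import Counter

    import pandas as pd

    frame = pd.DataFrame()
    # Read the file 1,000,000 rows at a time so it never has to fit in memory at once.
    chunks = pd.read_csv("csvfile.txt", sep=",", header=None,
                         names=["group", "val1", "val2", "text"], chunksize=1000000)
    for df in chunks:
        # Number of rows per group in this chunk.
        freq = Counter(df["group"])
        # Rows per group whose text contains each word (na=False skips missing text).
        word1 = df[df["text"].str.contains("WORD1", na=False)].groupby("group").size()
        word2 = df[df["text"].str.contains("WORD2", na=False)].groupby("group").size()
        df1 = pd.concat([pd.Series(freq), word1, word2], axis=1)
        # Add this chunk's counts to the running totals; a count missing on one side is treated as 0.
        frame = frame.add(df1, fill_value=0)

    frame.to_csv("csv_out.txt", sep=",", encoding="utf-8")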
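One way to check the chunked approach is to run the same loop over a small in-memory sample with two different chunk sizes and confirm the totals agree. The sketch below is only an illustration: the sample rows, the helper name count_in_chunks, and the column labels rows/word1/word2 are assumptions made for this example, not part of the original post. It also fills the remaining NaN values with 0 at the end, which the code above does not do.

```python
import io
from collections import Counter

import pandas as pd

# Hypothetical sample rows in the question's layout: group, val1, val2, text.
SAMPLE = (
    '"A",23.495,41.995,"this is a sentence with WORD1"\n'
    '"B",11.0,12.0,"nothing to see here"\n'
    '"A",1.5,2.5,"WORD1 and WORD2 together"\n'
    '"B",3.0,4.0,"only WORD2 here"\n'
    '"A",5.0,6.0,"plain text"\n'
)


def count_in_chunks(csv_text, chunksize):
    """Accumulate per-group counts chunk by chunk (illustrative helper)."""
    frame = pd.DataFrame()
    chunks = pd.read_csv(io.StringIO(csv_text), header=None,
                         names=["group", "val1", "val2", "text"],
                         chunksize=chunksize)
    for df in chunks:
        # Rows per group in this chunk.
        freq = pd.Series(Counter(df["group"]))
        # Rows per group whose text contains each word.
        word1 = df[df["text"].str.contains("WORD1", na=False)].groupby("group").size()
        word2 = df[df["text"].str.contains("WORD2", na=False)].groupby("group").size()
        df1 = pd.concat([freq, word1, word2], axis=1)
        df1.columns = ["rows", "word1", "word2"]  # labels assumed for readability
        # Running total; a count missing on one side is treated as 0.
        frame = frame.add(df1, fill_value=0)
    # Groups that never matched a word are still NaN here, so fill them with 0.
    return frame.fillna(0).astype(int)


totals_small = count_in_chunks(SAMPLE, chunksize=2)    # three chunks
totals_whole = count_in_chunks(SAMPLE, chunksize=100)  # a single chunk
print(totals_small)
print(totals_whole)
```

Both calls should print the same table, since the chunk size only changes how much is held in memory at once, not the accumulated counts.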
    
