I have been using pandas on csv files to get some values out of them. My data looks like this:
\"A\",23.495,41.995,\"this is a sentence with some words\"
\"B\",
Okay I misunderstood the chunk parameter. I solved it by doing this:
frame = pd.DataFrame()
chunks = pd.read_csv("csvfile.txt", sep=",", header = None,names=
["group","val1","val2","text"],chunksize=1000000)
for df in chunks:
freq=Counter(df['group'])
word1=df[df["text"].str.contains("WORD1")].groupby("group").size()
word2=df[df["text"].str.contains("WORD2")].groupby("group").size()
df1 = pd.concat([pd.Series(freq),word1,word2], axis=1)
frame = frame.add(df1,fill_value=0)
outfile = open("csv_out.txt","w", encoding='utf-8')
frame.to_csv(outfile, sep=",")
outfile.close()
You can specify a chunksize
option in the read_csv
call. See here for details
Alternatively you could use the Python csv library and create your own csv Reader or DictReader and then use that to read in data in whatever chunk size you choose.