Question
I am trying to read a large dataset in .csv format, which is updated automatically, using the pandas library. The problem is that in my data the first column is a string without double quotation marks, while the other columns are strings with double quotation marks. It is not possible for me to adjust the .csv file manually.
A simplified dataset would look like this:
- A,"B","C","D"
- comp_a,"tree","house","door"
- comp_b,"truck","red","blue"
I need the data to be stored as separate columns without the quotation marks like this:
- A B C D
- comp_a tree house door
- comp_b truck red blue
I tried using:
import pandas as pd
df_csv = pd.read_csv(path_to_file, delimiter=',')
which reads the complete header as a single column name and keeps the quotation marks in the values:
- A,"B","C","D"
- comp_a "tree" "house" "door"
- comp_b "truck" "red" "blue"
The closest result to what I need came from the following:
df_csv = pd.read_csv(path_to_file, delimiter=',', quoting=3)  # 3 == csv.QUOTE_NONE
which correctly recognizes each column but adds a lot of extra double quotes:
- "A ""B"" ""C"" ""D"""
- "comp_a ""tree"" ""house"" ""door"""
- "comp_b ""truck"" ""red"" ""blue"""
Setting quoting to a value from 0 to 2 just reads an entire row as a single column.
Does anyone know how I can remove all quotation marks when reading the .csv file?
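The integer values for quoting mentioned above are the csv module's quoting constants. A minimal sketch, assuming the simplified sample from the question held in an io.StringIO buffer instead of a file (the variable names are illustrative), showing that quoting=3 (csv.QUOTE_NONE) treats " as an ordinary character. On this sample the quotes stay attached to the header and cell text; the real file may behave differently.

```python
import csv
import io

import pandas as pd

# Simplified sample from the question, kept in memory instead of a file.
raw = 'A,"B","C","D"\ncomp_a,"tree","house","door"\ncomp_b,"truck","red","blue"\n'

# The integer codes accepted by `quoting` map to these csv constants.
print(csv.QUOTE_MINIMAL, csv.QUOTE_ALL, csv.QUOTE_NONNUMERIC, csv.QUOTE_NONE)

# QUOTE_NONE disables quote handling, so the " characters become part
# of both the column names and the cell values.
df_none = pd.read_csv(io.StringIO(raw), quoting=csv.QUOTE_NONE)
print(list(df_none.columns))
print(df_none.iloc[0, 1])
```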
Answer 1:
Just load the data with pd.read_csv() and then use .replace('"', '', regex=True).
In one line it would be:
df = pd.read_csv(filename, sep=',').replace('"','', regex=True)
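A minimal sketch of why regex=True matters in that one-liner, using a small hypothetical DataFrame whose cells still contain literal quotes (as they would after reading with quoting=3). Without the flag, replace() only swaps cells whose entire value equals '"'; with it, the quote is matched inside each string.

```python
import pandas as pd

# Hypothetical frame whose cells still contain literal double quotes.
df = pd.DataFrame({"A": ["comp_a", "comp_b"], "B": ['"tree"', '"truck"']})

# regex=False (the default): only cells exactly equal to '"' would change,
# so '"tree"' is left untouched.
unchanged = df.replace('"', '')

# regex=True: the pattern is substituted inside each string value.
cleaned = df.replace('"', '', regex=True)

print(unchanged.loc[0, "B"])  # "tree"
print(cleaned.loc[0, "B"])    # tree
```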
If the header ends up as the first data row (for example when reading with header=None), set the column names from it:
df.columns = df.iloc[0]
And drop row 0:
df = df.drop(index=0).reset_index(drop=True)
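A minimal sketch of this header-promotion step, assuming the file is read with header=None so that the header line lands in row 0. The sample data from the question is held in an io.StringIO buffer instead of a file.

```python
import io

import pandas as pd

raw = 'A,"B","C","D"\ncomp_a,"tree","house","door"\ncomp_b,"truck","red","blue"\n'

# header=None: no line is used as the header, so "A,B,C,D" becomes row 0.
# The replace is a no-op on this sample (the parser already strips the quotes).
df = pd.read_csv(io.StringIO(raw), header=None).replace('"', '', regex=True)

# Promote row 0 to column labels, then drop it and renumber the rows.
df.columns = df.iloc[0]
df = df.drop(index=0).reset_index(drop=True)
df.columns.name = None  # otherwise the column Index is named 0 (the old row label)

print(df)
```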
Answer 2:
You can replace the " characters after read_csv and then save the file again using df_csv.to_csv('fname'):
df_csv = df_csv.apply(lambda x: x.str.replace('"', '', regex=False) if pd.api.types.is_string_dtype(x) else x)
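A minimal sketch of this approach on a small hypothetical frame that also has a numeric column. The .str accessor only works on string columns, so the lambda leaves other columns alone; apply() returns a new DataFrame, so the result has to be assigned back. An io.StringIO buffer stands in for the output file 'fname'.

```python
import io

import pandas as pd

# Hypothetical frame: quoted strings plus a numeric column.
df_csv = pd.DataFrame({
    "A": ["comp_a", "comp_b"],
    "B": ['"tree"', '"truck"'],
    "n": [1, 2],  # .str would raise AttributeError on this column
})

# Strip quotes only from string columns; keep the returned frame.
df_csv = df_csv.apply(
    lambda col: col.str.replace('"', '', regex=False)
    if pd.api.types.is_string_dtype(col) else col
)

# Save the cleaned data; index=False omits the row index column.
buf = io.StringIO()
df_csv.to_csv(buf, index=False)
print(buf.getvalue())
```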
Answer 3:
Consider your data in a file data.csv like this:
$> more data.csv
A,"B","C","D"
comp_a,"tree","house","door"
comp_b,"truck","red","blue"
Perhaps a newer pandas version solves your problem by itself; for example, with pd.__version__ == '0.23.1':
In [1]: import pandas as pd
In [2]: pd.read_csv('data.csv')
Out[2]:
        A      B      C     D
0  comp_a   tree  house  door
1  comp_b  truck    red  blue
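A self-contained sketch of the default-parsing path, using the same three lines in an io.StringIO buffer instead of data.csv. It relies only on the defaults quotechar='"' and quoting=0 (csv.QUOTE_MINIMAL): a field wrapped in "..." is unquoted by the parser, and an unquoted field such as comp_a is read as-is, so mixing both on one line is fine. I have not checked which older pandas versions, if any, behave differently.

```python
import io

import pandas as pd

# The sample file from above, held in memory instead of on disk.
raw = 'A,"B","C","D"\ncomp_a,"tree","house","door"\ncomp_b,"truck","red","blue"\n'

# Default settings: quoted fields have their quotes stripped by the parser.
df = pd.read_csv(io.StringIO(raw))
print(list(df.columns))    # ['A', 'B', 'C', 'D']
print(df.loc[1].tolist())  # ['comp_b', 'truck', 'red', 'blue']
```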
Otherwise, apply a replace to the result:
pd.read_csv('data.csv').replace('"', '', regex=True)
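DataFrame.replace only rewrites cell values, not column labels. If the quotes also end up in the header (for example after reading with quoting=3, as in the question), a minimal sketch that cleans both, assuming the same sample data held in an io.StringIO buffer:

```python
import csv
import io

import pandas as pd

raw = 'A,"B","C","D"\ncomp_a,"tree","house","door"\ncomp_b,"truck","red","blue"\n'

# Assumed worst case: quote handling disabled, so quotes remain everywhere.
df = pd.read_csv(io.StringIO(raw), quoting=csv.QUOTE_NONE)

# Clean the cell values (substring match needs regex=True) ...
df = df.replace('"', '', regex=True)
# ... and the column labels, which replace() does not touch.
df.columns = df.columns.str.replace('"', '', regex=False)

print(df)
```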
Source: https://stackoverflow.com/questions/51359010/pandas-data-with-double-quote