I have a CSV file that I want to read using Python pandas. The header and lines look like the following:
A ^B^C^D^E ^F ^G ^H^I^J^K
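Read the file as you have done, then strip the extra whitespace from the column names and from every string column. Note that apply passes each column to the lambda as a Series, so the test has to look at the column's dtype; isinstance(x, str) is never true there:
df = pd.read_csv('input.csv', sep='^')
df.columns = df.columns.str.strip()  # strip padding from header names
df = df.apply(lambda x: x.str.strip() if pd.api.types.is_string_dtype(x) else x)  # strip text columns only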
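A small self-contained sketch of the read-then-strip approach, run on a hypothetical in-memory sample (io.StringIO stands in for input.csv) with padding around some carets in both the header and the data:

```python
import io
import pandas as pd

# Hypothetical sample in the question's format: padded header names and values.
sample = "A ^B^C \n1^x ^3\n4^ y^6\n"

# A single-character sep is treated literally, so the fast C engine can be used.
df = pd.read_csv(io.StringIO(sample), sep="^")

# Header names keep their padding ("A ", "C ") unless stripped explicitly.
df.columns = df.columns.str.strip()

# Strip only text columns; numeric columns such as A and C pass through unchanged.
df = df.apply(lambda x: x.str.strip() if pd.api.types.is_string_dtype(x) else x)
print(df)
```

Because only the header and the text columns are touched, numeric columns keep the dtype pandas inferred for them.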
Use the regex \s*\^, which matches zero or more whitespace characters followed by ^. Regex separators are only supported by the Python engine, so specify engine='python' to avoid a warning about falling back from the C engine:
In [152]:
import io
import pandas as pd
t = """A ^B^C^D^E ^F ^G ^H^I^J^K^L^M^N"""
df = pd.read_csv(io.StringIO(t), sep=r'\s*\^', engine='python')
df.columns
Out[152]:
Index(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N'], dtype='object')
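The session above only parses a header. The sketch below also reads data rows from a hypothetical sample, and uses \s*\^\s*, a variant of the pattern above (my assumption, not part of the original answer) that also swallows whitespace after the caret. If the padding only ever precedes the caret, as in the question's header, the original \s*\^ is enough.

```python
import io
import pandas as pd

# Hypothetical sample: whitespace on either side of some carets.
sample = "A ^B^ C\n1 ^ x^3\n"

# \s*\^\s* treats a caret plus any surrounding whitespace as one separator.
df = pd.read_csv(io.StringIO(sample), sep=r"\s*\^\s*", engine="python")
print(df.columns.tolist())  # ['A', 'B', 'C']
print(df)
```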
Can't you supply a regex as the separator? read_csv takes the pattern as a string:
df = pd.read_csv('input.csv', sep=r'[\^\s]+', engine='python')
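To see what the pattern [\^\s]+ does to a line, plain re.split gives a rough picture of how each line is cut into fields. The sample strings are hypothetical; the second one shows the catch with this pattern, namely that a value containing a space is split into two fields.

```python
import re

# One or more carets or whitespace characters count as a single separator.
pattern = r"[\^\s]+"

print(re.split(pattern, "A ^B^C ^D"))     # ['A', 'B', 'C', 'D']
# A value with an embedded space ("New York") is broken into two fields.
print(re.split(pattern, "New York ^NY"))  # ['New', 'York', 'NY']
```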
Your separator can be a regular expression, so try something like this:
df = pd.read_csv('input.csv', sep="[ ^]+", engine='python')
This regular expression treats any run of one or more spaces or carets (^) as a single separator. Inside the brackets the ^ is not the first character, so it matches a literal caret.
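A minimal runnable version of this answer, assuming a hypothetical in-memory sample in place of input.csv:

```python
import io
import pandas as pd

# Hypothetical sample: spaces before or after some carets.
sample = "A ^B^C\n1 ^x^3\n4^y ^6\n"

# [ ^]+ collapses any run of spaces and carets into one separator.
df = pd.read_csv(io.StringIO(sample), sep="[ ^]+", engine="python")
print(df)
```

Like the previous pattern, this one splits values that contain spaces, so it only fits files where spaces never appear inside a field.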
If the only whitespace in your file is the padding between columns (i.e. no value contains a space), an easy fix is to remove all spaces and tabs from the file before reading it. An example command to do that would be:
<input.csv tr -d '[:blank:]' > new_input.txt
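If tr is not available (for example on Windows), the same clean-up can be done in Python before parsing. This is a sketch under the same assumption that no value contains a space; read_without_blanks is a hypothetical helper name, and it deletes only spaces and tabs, the characters that [:blank:] covers in the POSIX locale.

```python
import io
import pandas as pd

def read_without_blanks(text: str) -> pd.DataFrame:
    """Delete spaces and tabs (keeping newlines), then parse with a literal '^'."""
    cleaned = text.replace(" ", "").replace("\t", "")
    return pd.read_csv(io.StringIO(cleaned), sep="^")

# Hypothetical sample; for a real file use read_without_blanks(open("input.csv").read()).
sample = "A ^B^C \n1 ^x ^3\n"
df = read_without_blanks(sample)
print(df)
```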