I have a CSV file from a database that I've converted into a Pandas DataFrame and am trying to clean up. One of the issues is that multiple values have been input into single cells
I would be inclined to use a lookahead; how you do so depends on your expected data.
This is a negative lookahead. It matches "a comma that is not followed by whitespace". It is the better choice if you are sure that every comment containing a comma has whitespace after that comma, and you want "red,green" to be split.
```python
data.str.split(r'[,](?!\s)').apply(pd.Series)
```
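A minimal sketch of the negative-lookahead split on a small, made-up column (the sample values are hypothetical, not from the question). It uses `expand=True`, which returns a DataFrame directly instead of going through `.apply(pd.Series)`, and passes `regex=True` explicitly, which assumes pandas 1.4 or later:

```python
import pandas as pd

# Hypothetical sample column: some cells hold several comma-separated values,
# others hold free-text comments whose commas are followed by a space.
data = pd.Series(["1,2,3", "4", "red, green and blue", "red,green"])

# Split only on commas NOT followed by whitespace (negative lookahead).
# expand=True builds the DataFrame directly; short rows are padded with missing values.
parts = data.str.split(r"[,](?!\s)", expand=True, regex=True)
print(parts)
```

The comment "red, green and blue" stays in one cell, while "1,2,3" and "red,green" are split into separate columns.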
Another option is a positive lookahead for something that looks like a valid value. Your example was numbers, so, for instance, this would split only on a comma that is followed by a digit:
```python
data.str.split(r'[,](?=\d)').apply(pd.Series)
```
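Note that a positive lookahead is written `(?=...)`; `(?:...)` is a non-capturing group, which still consumes the characters it matches. pandas passes regex patterns to Python's `re` module, so `re.split` is a quick way to see the difference on a plain string (the sample strings are illustrative):

```python
import re

text = "1,2,3"

# (?:\d) is a non-capturing group: it matches AND consumes the digit,
# so every digit right after a comma disappears from the result.
consumed = re.split(r"[,](?:\d)", text)

# (?=\d) is a positive lookahead: it checks for a digit without consuming it.
kept = re.split(r"[,](?=\d)", text)

# With the lookahead, a comma followed by a letter is left alone.
unsplit = re.split(r"[,](?=\d)", "red,green")

print(consumed)  # ['1', '', '']
print(kept)      # ['1', '2', '3']
print(unsplit)   # ['red,green']
```

Unlike the negative lookahead above, this version leaves "red,green" intact, because the comma is not followed by a digit.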
Regular expressions are very powerful, but honestly, I am not sure this solution will serve you well if this is a long-term problem. Getting most cases right should be fine for a one-time migration, but longer term I would try to fix the problem upstream, before the data reaches this point. Anyway, here is Debuggex's Python regex cheat sheet, in case it is useful to you: https://www.debuggex.com/cheatsheet/regex/python