Please forgive my pandas newbie question, but I have a column of U.S. towns and states, such as the truncated version shown below (for some strange reason, the name of the co
Without much context or access to your data, I'd suggest something along these lines. First, modify the code that reads your data:
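import pandas as pd
# header=None reads the first row as data rather than as column names
# (header=False is not a valid value; passing names= already implies header=None)
df = pd.read_csv(..., header=None, names=['RegionName'])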
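To check what header=None and names= do, here is a minimal sketch that reads a few lines from an in-memory buffer. The sample rows are hypothetical (made up for illustration, not your actual file), and io.StringIO simply stands in for the real file path:

```python
import io

import pandas as pd

# Hypothetical sample lines standing in for the real file.
raw = "Alabama[edit]\nAuburn (Auburn University)[1]\nFlorence (University of North Alabama)\n"

# header=None: no line is consumed as a header, so the first line stays a data row.
# names=[...]: the single resulting column is labelled 'RegionName'.
df = pd.read_csv(io.StringIO(raw), header=None, names=['RegionName'])

print(df.columns.tolist())      # ['RegionName']
print(len(df))                  # 3
print(df.loc[0, 'RegionName'])  # Alabama[edit]
```

Note that read_csv splits on commas by default, so if some of your lines contain commas you may need a different sep.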
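Now, extract the state name using str.extract. Because the pattern ends in a lookahead, it only captures a name when that name is followed by the substring "[edit]"; every other row yields NaN. You can then forward fill those NaN values with ffill, so each town row picks up the most recent state name above it.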
# capture the text before "[edit]" (NaN for other rows), then forward fill
df['State'] = df['RegionName'].str.extract(
    r'(?P<State>.*?)(?=\s*\[edit\])', expand=False
).ffill()
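Here is a self-contained sketch of the extract-then-ffill step on a handful of hypothetical rows (invented for illustration, not taken from your data). It assumes state lines end in "[edit]" and town lines do not; expand=False is used so that the single capture group comes back as a Series rather than a one-column DataFrame:

```python
import io

import pandas as pd

# Hypothetical sample rows: state lines end in "[edit]", town lines do not.
raw = (
    "Alabama[edit]\n"
    "Auburn (Auburn University)[1]\n"
    "Florence (University of North Alabama)\n"
    "Alaska[edit]\n"
    "Fairbanks (University of Alaska Fairbanks)[2]\n"
)
df = pd.read_csv(io.StringIO(raw), header=None, names=['RegionName'])

# The lookahead (?=...) requires "[edit]" to follow but does not include it in the match.
# Rows without "[edit]" do not match and come back as missing values.
states = df['RegionName'].str.extract(r'(?P<State>.*?)(?=\s*\[edit\])', expand=False)

# Forward fill: each missing value takes the last non-missing value above it.
df['State'] = states.ffill()

print(df['State'].tolist())
# ['Alabama', 'Alabama', 'Alabama', 'Alaska', 'Alaska']
```

The non-greedy .*? keeps any whitespace before "[edit]" out of the captured name; with the greedy .* a line such as "Alabama [edit]" would capture a trailing space.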