Best way to handle path with pandas

问题

When I have a pd.DataFrame with paths, I end up doing a lot of .map(lambda path: Path(path).{method_name}, or apply(axis=1) e.g:

(
    pd.DataFrame({'base_dir': ['dir_A', 'dir_B'], 'file_name': ['file_0', 'file_1']})
    .assign(full_path=lambda df: df.apply(lambda row: Path(row.base_dir) / row.file_name, axis=1))
)
  base_dir file_name     full_path
0    dir_A    file_0  dir_A/file_0
1    dir_B    file_1  dir_B/file_1

It seems odd to me especially because pathlib does implement / so that something like df.base_dir / df.file_name would be more pythonic and natural.

I have not found any path type implemented in pandas, is there something I am missing?

EDIT

I have found it may be better to once for all do sort of a astype(path) then at least for path concatenation with pathlib it is vectorized:

(
    pd.DataFrame({'base_dir': ['dir_A', 'dir_B'], 'file_name': ['file_0', 'file_1']})
    # this is where I would expect `astype({'base_dir': Path})`
    .assign(**{col_name:lambda df: df[col_name].map(Path) for col_name in ["base_dir", "file_name"]})
    .assign(full_path=lambda df: df.base_dir / df.file_name)
)

回答1:

It seems like the easiest way would be:

df.base_dir.map(Path) / df.file_name.map(Path)

It saves the need for a lambda function, but you still need to map to 'Path'.

Alternatively, just do:

df.base_dir.str.cat(df.file_name, sep="/")

The latter won't work on Windows (who cares, right? :) but will probably run faster.

回答2:

import pandas as pd
import os
   
df = pd.DataFrame({"p1":["path1"],"p2":["path2"]})
df.apply(lambda x:os.path.join(x.p1, x.p2), axis=1)

Output:

0    path1\path2
dtype: object

Edit:

After being told to not use assign you can try this

See .to_json() docs

import os
import pandas as pd       
df = pd.DataFrame({"p1":["path1", "path3"],"p2":["path2", "path4"]})
print(df.to_json(orient="values"))

Output

[["path1","path2"],["path3","path4"]]

From here it's simple, just use map(lambda x:os.path.join(*x), ...) and you get a list of paths.

来源：https://stackoverflow.com/questions/61475633/best-way-to-handle-path-with-pandas

标签

python

pandas

pathlib