How do you append/update to a parquet file with pyarrow?
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
tabl
Generally speaking, Parquet datasets consist of multiple files, so you append by writing an additional file into the same directory where the data belongs. It would be useful to be able to concatenate multiple files easily. I opened https://issues.apache.org/jira/browse/PARQUET-1154 to make this easy to do in C++ (and therefore Python).