pyarrow

How to install pyarrow on an Alpine Docker image?

梦想的初衷 submitted on 2021-01-04 04:31:45
Question: I am trying to install pyarrow using pip in my Alpine Docker image, but pip is unable to find the package. I'm using the following Dockerfile:

    FROM python:3.6-alpine3.7
    RUN apk add --no-cache musl-dev linux-headers g++
    RUN pip install pyarrow

Output:

    Sending build context to Docker daemon 4.096kB
    Step 1/3 : FROM python:3.6-alpine3.7
    3.6-alpine3.7: Pulling from library/python
    ff3a5c916c92: Pull complete
    471170bb1257: Pull complete
    d487cc70216e: Pull complete
    9358b3ca3321: Pull complete

Is there a more idiomatic way to select rows from a PyArrow table based on contents of a column?

拜拜、爱过 submitted on 2021-01-01 07:02:38
Question: I have a large PyArrow table with one column called index that I would like to use to partition the table; each separate value of index represents a different quantity in the table. Is there an idiomatic way to select rows from a PyArrow table based on contents of a column? Here's an example table:

    import pyarrow as pa
    import pyarrow.parquet as pq
    import pandas as pd
    import numpy as np

    # Example table for data schema
    irow = np.arange(2**20)
    dt = 17
    df0 = pd.DataFrame({'timestamp': np.array(