fastparquet

error with snappy while importing fastparquet in python

瘦欲@ submitted on 2019-11-29 14:02:27
I have installed the following modules on my EC2 server, which already has Python 3.6 and Anaconda installed: snappy, pyarrow, s3fs, fastparquet. Everything except fastparquet imports without problems. When I try to import fastparquet, it throws the following error:

[username@ip8 ~]$ conda -V
conda 4.2.13
[username@ip-~]$ python
Python 3.6.0 |Anaconda custom (64-bit)| (default, Dec 23 2016, 12:22:00)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import fastparquet
Traceback (most recent call last):
  File "<stdin>",
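The traceback in this excerpt is cut off, so the actual cause is not shown. As a first diagnostic step, a minimal sketch like the following could be used to see which of the listed modules really import and what error each one raises. The helper name `try_imports` is hypothetical and not part of any of these libraries; it only uses the standard library's `importlib`.

```python
import importlib


def try_imports(names):
    """Try to import each module name.

    Returns a dict mapping each name to None on success, or to a short
    "ExceptionType: message" string when the import fails.
    """
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:
            # Catch broadly: a compiled extension can fail with errors
            # other than ImportError (for example OSError when a shared
            # library cannot be loaded).
            results[name] = f"{type(exc).__name__}: {exc}"
    return results
```

Calling `try_imports(["snappy", "pyarrow", "s3fs", "fastparquet"])` on the server would show whether `snappy` itself fails to load or whether the failure only appears inside fastparquet.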

How to read partitioned parquet files from S3 using pyarrow in python

*爱你&永不变心* submitted on 2019-11-28 20:22:50
Question: I am looking for ways to read data from multiple partitioned directories on S3 using Python.

data_folder/serial_number=1/cur_date=20-12-2012/abcdsd0324324.snappy.parquet
data_folder/serial_number=2/cur_date=27-12-2012/asdsdfsd0324324.snappy.parquet

pyarrow's ParquetDataset module has the capability to read from partitions, so I tried the following code:

>>> import pandas as pd
>>> import pyarrow.parquet as pq
>>> import s3fs
>>> a = "s3://my_bucker/path/to/data_folder/"
>>> dataset = pq
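The directory layout above encodes column values in `key=value` path segments. The sketch below illustrates how such segments can be turned back into column values. It is not pyarrow's implementation, and the function name `parse_partitions` is hypothetical; it uses only the standard library and treats every value as a string.

```python
from pathlib import PurePosixPath


def parse_partitions(path):
    """Return {key: value} for each 'key=value' directory segment in path.

    The file name itself (the last path component) is ignored.
    """
    partitions = {}
    for segment in PurePosixPath(path).parent.parts:
        key, sep, value = segment.partition("=")
        if sep and key:  # keep only segments of the form key=value
            partitions[key] = value
    return partitions
```

For the first path in the question this yields `serial_number` mapped to `"1"` and `cur_date` mapped to `"20-12-2012"`. Values stay as strings here, so any conversion to integers or dates would be a separate step.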

error with snappy while importing fastparquet in python

夙愿已清 submitted on 2019-11-28 07:52:12
Question: I have installed the following modules on my EC2 server, which already has Python 3.6 and Anaconda installed: snappy, pyarrow, s3fs, fastparquet. Everything except fastparquet imports without problems. When I try to import fastparquet, it throws the following error:

[username@ip8 ~]$ conda -V
conda 4.2.13
[username@ip-~]$ python
Python 3.6.0 |Anaconda custom (64-bit)| (default, Dec 23 2016, 12:22:00)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or