In my Python environment, the RPy and SciPy packages are already installed.
The problem I want to tackle is as follows:
1) A huge set of financial data is stored in
As @gsk3 noted, bigmemory is a great package for this, along with the packages biganalytics and bigtabulate (there are more, but these are worth checking out). There's also ff, though that isn't as easy to use.
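bigmemory and biganalytics are R packages, so nothing here is their API. As a rough Python analogue of a file-backed big.matrix, `numpy.memmap` keeps an array in a file on disk and pages in only the slices you touch. The sketch below assumes NumPy is available (SciPy depends on it). It writes a hypothetical price matrix to a temporary file and computes column means chunk by chunk, similar in spirit to the column summaries biganalytics provides. The file name, shape, and chunk size are made up for illustration.

```python
import os
import tempfile

import numpy as np


def chunked_column_means(path, shape, dtype=np.float64, chunk_rows=10_000):
    """Compute per-column means of an on-disk matrix without loading it all."""
    # mode="r" maps the existing file read-only; only touched pages are read.
    data = np.memmap(path, dtype=dtype, mode="r", shape=shape)
    totals = np.zeros(shape[1], dtype=np.float64)
    for start in range(0, shape[0], chunk_rows):
        block = data[start:start + chunk_rows]  # one slab of rows
        totals += block.sum(axis=0)
    return totals / shape[0]


if __name__ == "__main__":
    n_rows, n_cols = 50_000, 4  # hypothetical: rows = days, cols = tickers
    path = os.path.join(tempfile.mkdtemp(), "prices.dat")  # hypothetical file

    # mode="w+" creates (or overwrites) the backing file with the given shape.
    out = np.memmap(path, dtype=np.float64, mode="w+", shape=(n_rows, n_cols))
    out[:] = np.arange(n_rows, dtype=np.float64)[:, None] + np.arange(n_cols)
    out.flush()  # push pending writes to disk
    del out      # close the writable mapping

    print(chunked_column_means(path, (n_rows, n_cols)))
```

Only one slab of `chunk_rows` rows is materialized at a time, so memory use is bounded by the chunk size rather than the file size.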
Common to both R and Python is support for HDF5 (see the ncdf4 or NetCDF4 packages in R), which makes it fast and easy to access massive data sets on disk. Personally, I primarily use bigmemory, though that's R-specific. As HDF5 is available in Python and is very, very fast, it's probably going to be your best bet in Python.
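The answer doesn't name a Python binding for HDF5; h5py is one widely used option (PyTables is another), so the sketch below assumes h5py and NumPy are installed. It writes a hypothetical price matrix to an HDF5 file in row slabs and then streams it back to compute column means, so the whole matrix never has to sit in RAM. The file path, the dataset name `prices`, and the sizes are made up for illustration.

```python
import os
import tempfile

import h5py
import numpy as np


def write_prices(path, n_rows, n_cols, chunk_rows=10_000):
    """Write a hypothetical price matrix to an HDF5 file, one row slab at a time."""
    with h5py.File(path, "w") as f:
        dset = f.create_dataset(
            "prices",  # hypothetical dataset name
            shape=(n_rows, n_cols),
            dtype="f8",
            # On-disk chunk layout; a chunk may not exceed the data shape.
            chunks=(min(chunk_rows, n_rows), n_cols),
        )
        for start in range(0, n_rows, chunk_rows):
            stop = min(start + chunk_rows, n_rows)
            # Build (or ingest) one slab at a time so memory use stays bounded.
            rows = np.arange(start, stop, dtype="f8")[:, None]
            dset[start:stop] = rows + np.arange(n_cols)


def column_means(path, chunk_rows=10_000):
    """Stream the dataset back from disk and compute per-column means."""
    with h5py.File(path, "r") as f:
        dset = f["prices"]
        n_rows, n_cols = dset.shape
        totals = np.zeros(n_cols)
        for start in range(0, n_rows, chunk_rows):
            stop = min(start + chunk_rows, n_rows)
            # Slicing an h5py Dataset reads only that slab into memory.
            totals += dset[start:stop].sum(axis=0)
        return totals / n_rows


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "prices.h5")  # hypothetical file
    write_prices(path, n_rows=50_000, n_cols=4)
    print(column_means(path))
```

Matching the chunk shape to the way you usually read the data (here, whole rows) tends to matter for speed, since reads that cut across many chunks make HDF5 touch more of the file.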