Python: handling a large set of data. Scipy or Rpy? And how?

我在风中等你 2021-02-04 17:47

In my Python environment, the Rpy and SciPy packages are already installed.

The problem I want to tackle is this:

1) A huge set of financial data are stored in

6 answers
  •  情话喂你
    2021-02-04 18:22

    I don't know anything about Rpy. I do know that SciPy is used to do serious number-crunching with truly large data sets, so it should work for your problem.

    As zephyr noted, you may not need either one; if you just need to keep some running sums, you can probably do it in plain Python. If it is a CSV file or another common file format, check whether there is a Python module that will parse it for you (for CSV, the standard library's csv module does this), and then write a loop that sums the appropriate values.
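A minimal sketch of that loop, assuming the data is a CSV file with a header row. The column names (ticker, price, volume) and the sample rows are made up for illustration, since the question does not show the real layout; an in-memory string stands in for the file.

```python
import csv
import io


def running_sums(lines, columns):
    """Sum the named columns row by row, without loading the whole file."""
    totals = {name: 0.0 for name in columns}
    for row in csv.DictReader(lines):  # reads one row at a time
        for name in columns:
            totals[name] += float(row[name])
    return totals


# Hypothetical sample data standing in for the real file.
sample = io.StringIO(
    "ticker,price,volume\n"
    "AAA,10.5,100\n"
    "BBB,20.0,250\n"
    "AAA,11.5,50\n"
)
totals = running_sums(sample, ["price", "volume"])
print(totals)
```

For a real file you would pass an open file object instead of the StringIO, for example one opened with open(path, newline=""). Memory use stays flat because only the current row and the totals are held at any time.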

    I'm not sure how to get the top ten rows. Can you gather them on the fly as you go, or do you need to compute the sums and then choose the rows? To gather them you might want to use a dictionary to keep track of the current 10 best rows, and use the keys to store the metric you used to rank them (to make it easy to find and toss out a row if another row supersedes it). If you need to find the rows after the computation is done, slurp all the data into a numpy.array, or else just take a second pass through the file to pull out the ten rows.
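One way to gather the best rows on the fly is sketched below. Note that it swaps the dictionary idea for the standard library's heapq.nlargest, which only keeps n candidates in memory while it walks the rows. The column names and sample rows are again hypothetical, and the ranking metric is assumed to be a single numeric column.

```python
import csv
import heapq
import io


def top_rows(lines, metric, n=10):
    """Return the n rows with the largest value in the `metric` column."""
    reader = csv.DictReader(lines)
    # nlargest consumes the reader lazily and keeps at most n rows around.
    return heapq.nlargest(n, reader, key=lambda row: float(row[metric]))


# Hypothetical sample data standing in for the real file.
sample = io.StringIO(
    "ticker,volume\n"
    "AAA,100\n"
    "BBB,250\n"
    "CCC,50\n"
    "DDD,175\n"
)
best = top_rows(sample, "volume", n=2)
for row in best:
    print(row["ticker"], row["volume"])
```

If the ranking depends on sums that are only known after a full pass, this would have to run as a second pass over the file, as described above.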
