How to create lazy_evaluated dataframe columns in Pandas

后端未结

关注

 2  1943

逝去的感伤 2021-02-05 11:16

A lot of times, I have a big dataframe df to hold the basic data, and need to create many more columns to hold the derivative data calculated by basic data columns.

2条回答

情书的邮戳 (楼主)

2021-02-05 11:59

Starting in 0.13 (releasing very soon), you can do something like this. This is using generators to evaluate a dynamic formula. In-line assignment via eval will be an additional feature in 0.13, see here

In [19]: df = DataFrame(randn(5, 2), columns=['a', 'b'])

In [20]: df
Out[20]: 
          a         b
0 -1.949107 -0.763762
1 -0.382173 -0.970349
2  0.202116  0.094344
3 -1.225579 -0.447545
4  1.739508 -0.400829

In [21]: formulas = [ ('c','a+b'), ('d', 'a*c')]

Create a generator that evaluates a formula using eval; assigns the result, then yields the result.

In [22]: def lazy(x, formulas):
   ....:     for col, f in formulas:
   ....:         x[col] = x.eval(f)
   ....:         yield x
   ....:

In action

In [23]: gen = lazy(df,formulas)

In [24]: gen.next()
Out[24]: 
          a         b         c
0 -1.949107 -0.763762 -2.712869
1 -0.382173 -0.970349 -1.352522
2  0.202116  0.094344  0.296459
3 -1.225579 -0.447545 -1.673123
4  1.739508 -0.400829  1.338679

In [25]: gen.next()
Out[25]: 
          a         b         c         d
0 -1.949107 -0.763762 -2.712869  5.287670
1 -0.382173 -0.970349 -1.352522  0.516897
2  0.202116  0.094344  0.296459  0.059919
3 -1.225579 -0.447545 -1.673123  2.050545
4  1.739508 -0.400829  1.338679  2.328644

So its user determined ordering for the evaluation (and not on-demand). In theory numba is going to support this, so pandas possibly support this as a backend for eval (which currently uses numexpr for immediate evaluation).

my 2c.

lazy evaluation is nice, but can easily be achived by using python's own continuation/generate features, so building it into pandas, while possible, is quite tricky, and would need a really nice usecase to be generally useful.

0 讨论(0)

查看其它2个回答