Pandas HDF5 Select with Where on non natural-named columns

佐手、 提交于 2019-12-14 03:52:58

问题


in my continuing spree of exotic pandas/HDF5 issues, I encountered the following:

I have a series of non-natural named columns (nb: because of a good reason, with negative numbers being "system" ids etc), which normally doesn't give an issue:

fact_hdf.select('store_0_0', columns=['o', 'a-6', 'm-13'])

however, my select statement does fall over it:

>>> fact_hdf.select('store_0_0', columns=['o', 'a-6', 'm-13'], where=[('a-6', '=', [0, 25, 28])])
blablabla
File "/srv/www/li/venv/local/lib/python2.7/site-packages/tables/table.py", line 1251, in _required_expr_vars
    raise NameError("name ``%s`` is not defined" % var)
NameError: name ``a`` is not defined

Is there any way to work around it? I could rename my negative value from "a-1" to a "a_1" but that means reloading all of the data in my system. Which is rather much! :)

Suggestions are very welcome!


回答1:


Here's a test table

In [1]: df = DataFrame({ 'a-6' : [1,2,3,np.nan] })

In [2]: df
Out[2]: 
   a-6
0    1
1    2
2    3
3  NaN

In [3]: df.to_hdf('test.h5','df',mode='w',table=True)

 In [5]: df.to_hdf('test.h5','df',mode='w',table=True,data_columns=True)
/usr/local/lib/python2.7/site-packages/tables/path.py:99: NaturalNameWarning: object name is not a valid Python identifier: 'a-6'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
  NaturalNameWarning)
/usr/local/lib/python2.7/site-packages/tables/path.py:99: NaturalNameWarning: object name is not a valid Python identifier: 'a-6_kind'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
  NaturalNameWarning)
/usr/local/lib/python2.7/site-packages/tables/path.py:99: NaturalNameWarning: object name is not a valid Python identifier: 'a-6_dtype'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
  NaturalNameWarning)

There is a very way, but would to build this into the code itself. You can do a variable substitution on the column names as follows. Here is the existing routine (in master)

   def select(self):
        """
        generate the selection
        """
        if self.condition is not None:
            return self.table.table.readWhere(self.condition.format(), start=self.start, stop=self.stop)
        elif self.coordinates is not None:
            return self.table.table.readCoordinates(self.coordinates)
        return self.table.table.read(start=self.start, stop=self.stop)

If instead you do this

(Pdb) self.table.table.readWhere("(x>2.0)",
      condvars={ 'x' : getattr(self.table.table.cols,'a-6')})
array([(2, 3.0)], 
      dtype=[('index', '<i8'), ('a-6', '<f8')])

e.g. by subsituting x with the column reference, you can get the data.

This could be done on detection of invalid column names, but is pretty tricky.

Unfortunately I would suggest renaming your columns.



来源:https://stackoverflow.com/questions/19277059/pandas-hdf5-select-with-where-on-non-natural-named-columns

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!