Missing data in pandas.crosstab

太阳男子 2021-02-03 13:35

I'm making some crosstabs with pandas:

a = np.array(['foo', 'foo', 'foo', 'bar', 'bar', 'foo', 'foo'], dtype=object)
b = np.array(['one', 'one'…

2 answers
  • 2021-02-03 13:51

    The crosstab function has a parameter called dropna, which is True by default. It controls whether columns whose entries are all empty (such as the one-shiny column) are dropped from the result (True) or kept (False).

    I tried calling the function like this:

    pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'], dropna=False)
    

    and this is what I got:

    b     one          two       
    c    dull  shiny  dull  shiny
    a                            
    bar     1      0     1      0
    foo     2      0     1      2
    

    Hope that was still helpful.
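
To make the effect of dropna easier to check, here is a minimal self-contained sketch. The question's arrays are truncated in this copy, so the values of a, b and c below are an assumption, chosen only to be consistent with the output shown above; the names default and full are likewise just illustrative.

```python
import numpy as np
import pandas as pd

# Assumed sample data (the original arrays are cut off in the question).
a = np.array(['foo', 'foo', 'foo', 'bar', 'bar', 'foo', 'foo'], dtype=object)
b = np.array(['one', 'one', 'two', 'one', 'two', 'two', 'two'], dtype=object)
c = np.array(['dull', 'dull', 'dull', 'dull', 'dull', 'shiny', 'shiny'], dtype=object)

# Default behaviour: the (one, shiny) combination never occurs, so its column is dropped.
default = pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])

# dropna=False keeps every combination of the column levels, filled with 0.
full = pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'], dropna=False)

print(default.shape)  # (2, 3)
print(full.shape)     # (2, 4)
print(full)
```

With this data the only difference between the two tables should be the extra all-zero (one, shiny) column in full.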

  • 2021-02-03 14:02

    I don't think there is a way to do this, and crosstab calls pivot_table in the source, which doesn't seem to offer this either. I raised it as an issue here.

    A hacky workaround (which may or may not be the same as you were already using...):

    from itertools import product
    ct = pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])
    a_x_b = list(product(np.unique(b), np.unique(c)))
    a_x_b = pd.MultiIndex.from_tuples(a_x_b)
    
    In [15]: ct.reindex(columns=a_x_b, fill_value=0)
    Out[15]:
          one          two
         dull  shiny  dull  shiny
    a
    bar     1      0     1      0
    foo     2      0     1      2
    

    If product is too slow, here is a numpy implementation of it.
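
The linked numpy implementation is not preserved in this copy, so the following is only a sketch of one way to build the same product with numpy, not the code the answer pointed to. The helper cartesian_product and the sample arrays are assumptions made for illustration; it uses np.repeat and np.tile to enumerate every (b, c) pair and then reindexes the crosstab columns against that full set.

```python
import numpy as np
import pandas as pd

def cartesian_product(x, y):
    """Hypothetical helper: all (x_i, y_j) pairs as two aligned 1-D arrays.

    x varies slowest, matching the order of itertools.product(x, y).
    """
    return np.repeat(x, len(y)), np.tile(y, len(x))

# Assumed sample data (the original arrays are cut off in the question).
a = np.array(['foo', 'foo', 'foo', 'bar', 'bar', 'foo', 'foo'], dtype=object)
b = np.array(['one', 'one', 'two', 'one', 'two', 'two', 'two'], dtype=object)
c = np.array(['dull', 'dull', 'dull', 'dull', 'dull', 'shiny', 'shiny'], dtype=object)

ct = pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])

# Every combination of the observed b and c values, whether or not it occurs.
b_vals, c_vals = cartesian_product(np.unique(b), np.unique(c))
all_cols = pd.MultiIndex.from_arrays([b_vals, c_vals], names=['b', 'c'])

# Missing combinations become columns filled with 0 instead of NaN.
full = ct.reindex(columns=all_cols, fill_value=0)
print(full)
```

If building the index directly in pandas is acceptable, pd.MultiIndex.from_product([np.unique(b), np.unique(c)], names=['b', 'c']) should produce the same column index without a helper.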
