Altair/Vega-Lite bar chart: filter top K bars from aggregated field

Posted by 删除回忆录丶 on 2019-12-10 11:34:51

Question


I'm visualizing a dataset that has, for instance, a categorical field. I want to create a bar chart that shows the categories of that field with their cardinality, sorted in 'ascending' or 'descending' order. This is easy to achieve with Altair:

import pandas as pd
import altair as alt

data = {0:{'Name':'Mary', 'Sport':'Tennis'},
    1:{'Name':'Cal', 'Sport':'Tennis'},
    2:{'Name':'John', 'Sport':'Tennis'},
    3:{'Name':'Jane', 'Sport':'Tennis'},
    4:{'Name':'Bob', 'Sport':'Golf'},
    5:{'Name':'Jerry', 'Sport':'Golf'},
    6:{'Name':'Gustavo', 'Sport':'Golf'},
    7:{'Name':'Walter', 'Sport':'Swimming'},
    8:{'Name':'Jessy', 'Sport':'Swimming'},
    9:{'Name':'Patric', 'Sport':'Running'},
    10:{'Name':'John', 'Sport':'Shooting'}}

df = pd.DataFrame(data).T

bars = alt.Chart(df).mark_bar().encode(
    x=alt.X('count():Q', axis=alt.Axis(format='.0d', tickCount=4)),
    y=alt.Y('Sport:N', 
        # the sort field takes the bare column name, without the ':N' type suffix
        sort=alt.SortField(op='count', field='Sport', order='descending'))
)
bars

Now suppose I'm interested only in the three most numerous categories. It seemed reasonable to use "transform_window" and "transform_filter" to filter the data, but I was unable to find a way to do so. I also tried to adapt the Vega-Lite Top K example, without success (my "best" attempt is shown below).

bars.transform_window(window=[alt.WindowFieldDef(op='count', 
                                                 field='Sport:N',
                                                 **{'as':'cardinality'})],
                      frame=[None, None])

bars.transform_window(window=[alt.WindowFieldDef(op='rank',
                                                 field='cardinality',
                                                 **{'as':'rank'})],
                      frame=[None, None],
                      sort=[alt.WindowSortField(field='rank',
                                                order='descending')])

bars.transform_filter( ..... what??? .....)

Answer 1:


I would probably do this by first using an aggregate transform to compute the number of people in each group, and then proceeding along the lines of the top-K example you linked to.

alt.Chart(df).mark_bar().encode(
    x='count:Q',
    y=alt.Y('Sport:N',
        sort=alt.SortField(field='count', order='descending', op='sum')
    ),
).transform_aggregate(
    count='count()',
    groupby=['Sport']
).transform_window(
    window=[{'op': 'rank', 'as': 'rank'}],
    sort=[{'field': 'count', 'order': 'descending'}]
).transform_filter('datum.rank <= 3')
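To see which rows survive that chain, the three steps (aggregate, rank, filter) can be mirrored in plain Python. This is only a sketch of the logic, not how Vega-Lite implements it: the helper name top_k_categories is made up for illustration, and it assumes that the 'rank' window op gives tied values the same rank and then skips ahead (1, 2, 2, 4, ...).

```python
from collections import Counter

# Same sample data as in the question, reduced to the 'Sport' column.
sports = ['Tennis'] * 4 + ['Golf'] * 3 + ['Swimming'] * 2 + ['Running', 'Shooting']


def top_k_categories(values, k):
    """Hypothetical helper mimicking aggregate -> rank window -> filter."""
    # Step 1 (transform_aggregate): one row per category with its count.
    counts = Counter(values)
    # Step 2 (transform_window with op='rank'): order by count, descending.
    # Assumption: tied counts share a rank and the next rank skips ahead.
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    ranked = []
    for position, (category, count) in enumerate(ordered, start=1):
        if ranked and ranked[-1][1] == count:
            rank = ranked[-1][2]  # same count as previous row: reuse its rank
        else:
            rank = position
        ranked.append((category, count, rank))
    # Step 3 (transform_filter): keep rows whose rank is at most k.
    return [(category, count) for category, count, rank in ranked if rank <= k]


top3 = top_k_categories(sports, 3)
print(top3)  # [('Tennis', 4), ('Golf', 3), ('Swimming', 2)]
```

One consequence of filtering on rank: ties at the cutoff all pass. With this data, asking for the top 4 keeps both Running and Shooting (each has count 1 and rank 4), so five bars would remain.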

Note that in Altair version 2.2 (which has not yet been released as I write this), alt.SortField will be renamed to alt.EncodingSortField because of a change in the underlying Vega-Lite schema.

(Side note: the Altair API for sorting and window transforms is pretty clunky at the moment, but we are thinking hard about how to improve that.)



Source: https://stackoverflow.com/questions/50855610/altair-vega-lite-bar-chart-filter-top-k-bars-from-aggregated-field
