How can I replace a factor's levels with the top n levels (by some metric), plus [other]?

甜味超标 2020-12-31 14:04

For a factor with more levels than can sensibly be colored in a graph, I want to replace any levels that are not in the 'top 10' with 'other'.

Alternat

1 answer
  •  醉梦人生
    2020-12-31 14:48

    So after going through several iterations and searching the web, I have created this nice short one.

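    hotfactor <- function(fac, by, n = 10, o = "other") {
      # Sum `by` within each level of `fac`, rank the sums in descending order,
      # and relabel every level ranked below the top n as `o`.
      levels(fac)[rank(-xtabs(by ~ fac))[levels(fac)] > n] <- o
      fac
    }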
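
To make the one-liner easier to follow, here is a sketch that unpacks it into separate steps on a small made-up dataset. The variable names (totals, ranks, keep_n) and the data are illustrative assumptions, not part of the original function.

```r
# Toy data (made up for illustration): one factor and one numeric metric.
fac <- factor(c("a", "b", "c", "d", "a", "b", "c", "a"))
by  <- c(5, 3, 2, 1, 5, 3, 2, 5)

# Step 1: total of the metric per level, named by level.
totals <- xtabs(by ~ fac)        # a = 15, b = 6, c = 4, d = 1

# Step 2: rank so that the largest total gets rank 1.
ranks <- rank(-totals)           # a = 1, b = 2, c = 3, d = 4

# Step 3: give every level ranked below the top n the same label.
# Assigning a duplicated label merges those levels into one.
n <- 2
keep_n <- fac
levels(keep_n)[ranks[levels(keep_n)] > n] <- "other"

levels(keep_n)                   # "a" "b" "other"
table(keep_n)                    # a = 3, b = 2, other = 3
```

Note that rank() averages tied ranks by default, so levels with equal totals near the cutoff can all fall on the same side of n and you may end up with fewer or more than n named levels.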
    

    It's great for summarising data, and it lets you use the RColorBrewer color schemes, each of which has a limited number of carefully selected colors.


    Usage notes:

    fac should be a factor; the function works best when it has no empty levels. You may want to run droplevels(as.factor(mydata)) first.

    It doesn't sort the factor levels. For best results in bar charts, run the following on the output factor.

    x <- hotfactor(f,val)
    x <- reorder(x,-val,sum)
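
As a rough end-to-end illustration of the two steps above, the following sketch runs them on made-up data. The vectors f and val and the choice n = 3 are assumptions for the example; hotfactor is repeated so the block runs on its own.

```r
hotfactor <- function(fac, by, n = 10, o = "other") {
  levels(fac)[rank(-xtabs(by ~ fac))[levels(fac)] > n] <- o
  fac
}

# Made-up data: five categories with per-row values.
# Totals per level: a = 4, b = 10, c = 6, d = 5, e = 3.
f   <- factor(c("e", "d", "c", "b", "a", "c", "d", "e", "e"))
val <- c(1, 2, 3, 10, 4, 3, 3, 1, 1)

# Keep the top 3 levels by total; "a" and "e" are merged into "other".
x <- hotfactor(f, val, n = 3)
levels(x)                 # "other" "b" "c" "d" (not sorted by total)

# Reorder levels by descending total so a bar chart reads largest first.
x <- reorder(x, -val, sum)
levels(x)                 # "b" "other" "c" "d"

# Totals per level, in the new level order.
tapply(val, x, sum)       # b = 10, other = 7, c = 6, d = 5
```

Because "other" pools several levels, its total can exceed that of a kept level, so it will not necessarily sort last. Passing the result of tapply(val, x, sum) to barplot() is one way to draw the chart.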
    
