Understanding algorithms for measuring trends

醉梦人生 2021-01-30 02:17

What's the rationale behind the formula used in the hive_trend_mapper.py program of this Hadoop tutorial on calculating Wikipedia trends?

There are actually…

4 answers
  •  轻奢々 (OP)
    2021-01-30 02:20

    The reason for moderating the measure by the volume of clicks is not to penalise popular pages but to make sure that you can compare large and small changes with a single measure. If you just use y2 - y1, you will only ever see the click changes on high-volume pages. What the formula is trying to express is "significant" change. A change of 1,000 clicks on a page that normally attracts 100 clicks is really significant; a change of 1,000 clicks on a page that attracts 100,000 is less so. The formula is trying to make both of these visible on a single scale.
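    As a rough illustration of this argument, here is a small Python sketch comparing the raw difference y2 - y1 with a change divided by the baseline volume y1. The helper names `raw_change` and `scaled_change`, and the choice of dividing by y1, are assumptions for illustration only; this is not necessarily the formula used in hive_trend_mapper.py.

```python
def raw_change(y1: float, y2: float) -> float:
    """Unscaled change in clicks between two periods."""
    return y2 - y1


def scaled_change(y1: float, y2: float) -> float:
    """Change divided by the baseline volume (hypothetical normalization).

    Dividing by y1 turns the absolute change into a relative one, so a
    small page and a large page can be compared on a single scale.
    max(y1, 1) guards against division by zero for pages with no baseline.
    """
    return (y2 - y1) / max(y1, 1)


# The two cases from the answer: +1,000 clicks on a small vs. a large baseline.
cases = {"small page": (100, 1_100), "large page": (100_000, 101_000)}

for name, (y1, y2) in cases.items():
    print(f"{name}: raw={raw_change(y1, y2)}, scaled={scaled_change(y1, y2):.2f}")
```

    Both pages gain the same 1,000 clicks, so the raw measure cannot tell them apart (raw=1000 for each), while the scaled measure gives 10.00 for the small page and 0.01 for the large one.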

    Try it out at a few different scales in Excel and you'll get a good feel for how it behaves.
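    The same experiment can be run in code instead of a spreadsheet. This hypothetical sketch sweeps a few baseline volumes (y1) and tabulates the raw and scaled change for two scenarios: every page gaining the same 1,000 clicks, and every page growing by the same 10%. The relative-change measure is again an assumed stand-in, not the tutorial's own formula.

```python
def scaled_change(y1, y2):
    # Hypothetical volume-normalized measure (relative change); an assumption,
    # not necessarily the formula in hive_trend_mapper.py.
    return (y2 - y1) / max(y1, 1)


def sweep(baselines, make_y2):
    """Return (baseline, raw change, scaled change) rows, like spreadsheet columns."""
    rows = []
    for y1 in baselines:
        y2 = make_y2(y1)
        rows.append((y1, y2 - y1, scaled_change(y1, y2)))
    return rows


baselines = [100, 1_000, 10_000, 100_000]

# Scenario A: every page gains the same absolute 1,000 clicks.
same_absolute = sweep(baselines, lambda y1: y1 + 1_000)
# Scenario B: every page grows by the same 10% (integer arithmetic keeps it exact).
same_relative = sweep(baselines, lambda y1: y1 * 11 // 10)

for label, rows in [("+1000 clicks", same_absolute), ("+10%", same_relative)]:
    print(label)
    for y1, raw, scaled in rows:
        print(f"  y1={y1:>7}  raw={raw:>6}  scaled={scaled:.3f}")
```

    With a fixed absolute gain, the scaled value shrinks as the baseline grows; with a fixed 10% gain, the raw value grows with volume while the scaled value stays at 0.100 for every page.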

    Hope that helps.
