Understanding algorithms for measuring trends

醉梦人生 2021-01-30 02:17

What's the rationale behind the formula used in the hive_trend_mapper.py program of this Hadoop tutorial on calculating Wikipedia trends?

There are actuall

4 answers
  •  说谎 (original poster)
    2021-01-30 02:40

    The code implements a statistical measure (in this case the "baseline trend"). Once you have read up on the underlying statistics, the formula becomes much clearer. Wikibooks has a good introduction.

    The algorithm takes into account that new pages are by definition less popular than existing ones (for example, because they are linked from relatively few other places) and expects those new pages to grow in popularity over time.

    error is the error margin the system expects for its predictions. The higher error is, the less likely the trend is to continue as expected.
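
To make the relationship between a trend and its error margin concrete, here is a minimal sketch. It is not the formula from hive_trend_mapper.py: it assumes a plain least-squares line fitted to daily page views, and the names `linear_trend` and `daily_views` are hypothetical. The slope stands in for the trend, and the residual standard error stands in for error.

```python
import math


def linear_trend(daily_views):
    """Fit a least-squares line to daily page views.

    Returns (slope, error): slope is the estimated change in views per day,
    error is the residual standard error of the fit. This is an illustrative
    stand-in, not the tutorial's actual formula.
    """
    n = len(daily_views)
    if n < 3:
        raise ValueError("need at least three data points")

    xs = range(n)  # day indices 0, 1, ..., n-1
    mean_x = (n - 1) / 2
    mean_y = sum(daily_views) / n

    # Standard least-squares sums.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_views))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # How far the observed views scatter around the fitted line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, daily_views))
    error = math.sqrt(sse / (n - 2))
    return slope, error


steady_slope, steady_error = linear_trend([10, 20, 30, 40])
noisy_slope, noisy_error = linear_trend([10, 30, 20, 40])
print(steady_slope, steady_error)
print(noisy_slope, noisy_error)
```

Both series grow over the four days, but the second one scatters around its fitted line, so its error is larger and the fitted trend is a less reliable guide to what happens next.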
