Wikipedia Category Hierarchy from dumps

野性不改 2021-02-05 16:28

Using Wikipedia's dumps, I want to build a hierarchy of its categories. I have downloaded the main dump (enwiki-latest-pages-articles) and the category SQL dump (enwiki-latest-

2 Answers
  • 2021-02-05 17:13

    The category hierarchy information in MediaWiki is stored in the categorylinks table, so you're going to need the categorylinks dump.
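    The categorylinks and page dumps are SQL files made of multi-row INSERT statements rather than XML. The sketch below turns one such line into Python tuples so the rows can be streamed without loading them into MySQL first. It assumes each INSERT statement sits on a single line and uses MySQL backslash escaping; the function names are hypothetical, not part of any MediaWiki tool.

```python
# Minimal parser for the multi-row INSERT statements in MediaWiki .sql dumps
# (e.g. the categorylinks dump). Assumption: one INSERT statement per line.

# MySQL backslash escapes used inside quoted strings; any other escaped
# character (backslash, single quote, double quote) stands for itself.
_ESCAPES = {"0": "\0", "n": "\n", "r": "\r", "t": "\t", "Z": "\x1a"}


def _to_value(text, quoted):
    """Convert one raw field: quoted -> str, NULL -> None, else int/float."""
    if quoted:
        return text
    if text == "NULL":
        return None
    try:
        return int(text)
    except ValueError:
        return float(text)


def parse_insert_values(line):
    """Return the rows of one `INSERT INTO ... VALUES (...),(...);` line as tuples."""
    marker = " VALUES "
    start = line.find(marker)
    if not line.startswith("INSERT INTO") or start == -1:
        return []  # comments, CREATE TABLE, LOCK TABLES, etc.
    rows, row, buf = [], [], []
    in_row = in_str = quoted = False
    i = start + len(marker)
    while i < len(line):
        c = line[i]
        if in_str:
            if c == "\\" and i + 1 < len(line):
                # Backslash escape: decode it and skip both characters.
                buf.append(_ESCAPES.get(line[i + 1], line[i + 1]))
                i += 2
                continue
            if c == "'":
                in_str = False  # closing quote
            else:
                buf.append(c)
        elif not in_row:
            if c == "(":
                in_row, row, buf, quoted = True, [], [], False
            # commas between rows and the final ';' are ignored
        elif c == "'":
            in_str = quoted = True
        elif c in ",)":
            # End of a field; ')' also ends the current row.
            row.append(_to_value("".join(buf), quoted))
            buf, quoted = [], False
            if c == ")":
                rows.append(tuple(row))
                in_row = False
        else:
            buf.append(c)
        i += 1
    return rows
```

    To stream a compressed dump, open it with `gzip.open(path, "rt", encoding="utf-8", errors="replace")` and pass each line to `parse_insert_values`; `errors="replace"` is there because binary sort-key columns are not guaranteed to be valid UTF-8. Check the CREATE TABLE statement at the top of the file for the actual column order.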

    You're also going to need the page dump (not pages-articles) for the page ID-to-title mapping.
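    A minimal sketch of that join, assuming the classic schema: in categorylinks, `cl_from` is the page ID of the member, `cl_to` is the parent category title (without the `Category:` prefix) and `cl_type` is `'subcat'` when the member is itself a category; in page, category pages have namespace 14. The function names and the pre-extracted `(page_id, page_namespace, page_title)` and `(cl_from, cl_to, cl_type)` tuples are assumptions for illustration. Because the category graph is not a strict tree and can contain cycles, the traversal keeps a visited set.

```python
from collections import defaultdict

CATEGORY_NS = 14  # MediaWiki namespace number of Category: pages


def build_category_tree(pages, links):
    """Return {parent_category: set_of_child_categories}.

    pages: iterable of (page_id, page_namespace, page_title)
    links: iterable of (cl_from, cl_to, cl_type)
    """
    # Only category pages can be subcategories, so index namespace 14 only.
    cat_title = {pid: title for pid, ns, title in pages if ns == CATEGORY_NS}
    children = defaultdict(set)
    for cl_from, cl_to, cl_type in links:
        if cl_type != "subcat":
            continue  # ordinary pages and files are not part of the hierarchy
        child = cat_title.get(cl_from)
        if child is not None:  # skip links whose page row is missing
            children[cl_to].add(child)
    return dict(children)


def descendants(tree, root):
    """All categories reachable below `root` (may include `root` if a cycle returns to it)."""
    seen, stack = set(), [root]
    while stack:
        for child in tree.get(stack.pop(), ()):
            if child not in seen:  # the visited set stops infinite loops on cycles
                seen.add(child)
                stack.append(child)
    return seen
```

    For example, with pages `(2, 14, "Physics")` and `(4, 14, "Quantum_mechanics")` and links `(2, "Science", "subcat")` and `(4, "Physics", "subcat")`, `descendants(tree, "Science")` yields both subcategories.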

  • 2021-02-05 17:19

    Loading the category-links dump and the related tables to build a Wikipedia hierarchy takes a very long time (even if it is interesting).

    I found a faster path that gives good results: I rely on Wikipedia's vital articles hierarchy. See, for instance, sensimark for an example use.
