Finding subcategories of a Wikipedia category using the category and categorylinks tables

倾然丶 夕夏残阳落幕 submitted on 2020-01-13 06:46:18

Question


I downloaded the sql.gz dump files for the category and categorylinks tables from MediaWiki and generated the required tables:

category and categorylinks

Manual for the tables: CategoryLinks, Category

Consider the category page for NoSQL. The parent categories of this page are Database and Database management. How can I get this information from the two tables? The manual for the category table says the following, but I am unable to get that information from it:

"Note: The pages and sub-categories are stored in the categorylinks table."


Answer 1:


Categories alone have no hierarchy. It’s the category pages that make the subcategorization work, so you will also need the page_id from the page table to resolve this relation.

It essentially works like this:

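  1. A category’s cat_title is also a page title.
  2. Find that page_title in the page table (category pages live in namespace 14) and get its page_id.
  3. Use the page_id to find the matching categorylinks rows via cl_from.
  4. Read the parent category title from cl_to.
  5. Repeat from step 2 with that parent title to walk further up the hierarchy.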
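
The steps above can be sketched as a join between page and categorylinks. The following is only an illustration, not the answerer's own code: it uses Python's sqlite3 module with a tiny in-memory database instead of the real MySQL dump, and the rows are made-up toy data. It assumes the older categorylinks layout described in the answer (cl_from, cl_to, cl_type); newer MediaWiki schema versions may store the target differently. The helper names (build_toy_db, parent_categories, subcategories, ancestor_categories) are hypothetical.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory stand-in for the page and categorylinks tables.

    Only the columns needed for the lookup are included; the rows are
    invented toy data, not real dump contents.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (
            page_id INTEGER PRIMARY KEY,
            page_namespace INTEGER,
            page_title TEXT
        );
        CREATE TABLE categorylinks (
            cl_from INTEGER,  -- page_id of the member page
            cl_to TEXT,       -- title of the category it belongs to
            cl_type TEXT      -- 'page', 'subcat' or 'file'
        );
    """)
    conn.executemany("INSERT INTO page VALUES (?, ?, ?)", [
        (1, 14, "NoSQL"),                # namespace 14 = category pages
        (2, 14, "Database"),
        (3, 14, "Database_management"),
        (4, 14, "Key-value_databases"),
        (5, 0, "NoSQL"),                 # namespace 0 = ordinary article
    ])
    conn.executemany("INSERT INTO categorylinks VALUES (?, ?, ?)", [
        (1, "Database", "subcat"),
        (1, "Database_management", "subcat"),
        (4, "NoSQL", "subcat"),
        (5, "NoSQL", "page"),
    ])
    return conn


def parent_categories(conn, title):
    """Steps 2 to 4: category title -> page_id -> cl_from -> cl_to."""
    rows = conn.execute("""
        SELECT cl.cl_to
        FROM page AS p
        JOIN categorylinks AS cl ON cl.cl_from = p.page_id
        WHERE p.page_namespace = 14 AND p.page_title = ?
        ORDER BY cl.cl_to
    """, (title,))
    return [row[0] for row in rows]


def subcategories(conn, title):
    """The reverse direction: category pages whose cl_to is this title."""
    rows = conn.execute("""
        SELECT p.page_title
        FROM categorylinks AS cl
        JOIN page AS p ON p.page_id = cl.cl_from
        WHERE cl.cl_to = ? AND cl.cl_type = 'subcat'
        ORDER BY p.page_title
    """, (title,))
    return [row[0] for row in rows]


def ancestor_categories(conn, title):
    """Step 5: repeat the parent lookup, tracking visited titles
    so that a cycle in the category graph cannot loop forever."""
    seen = set()
    todo = [title]
    while todo:
        current = todo.pop()
        for parent in parent_categories(conn, current):
            if parent not in seen:
                seen.add(parent)
                todo.append(parent)
    return seen


conn = build_toy_db()
print(parent_categories(conn, "NoSQL"))  # ['Database', 'Database_management']
print(subcategories(conn, "NoSQL"))      # ['Key-value_databases']
```

Against the real dump the same two SELECT statements should carry over to MySQL with its own parameter placeholders. Titles in the dump use underscores instead of spaces, so "Database management" would be looked up as Database_management.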


Source: https://stackoverflow.com/questions/21782410/finding-subcategories-of-a-wikipedia-category-using-category-and-categorylinks-t
