Question
I downloaded the sql.gz dumps of the category and categorylinks tables from MediaWiki and generated the two tables from them: category and categorylinks.
Manuals for the tables: CategoryLinks, Category
Consider the following category page: NoSQL. The parent categories of this page are Database and Database management. How can I get this information from the two tables? The manual for the category table says the following, but I am unable to work out the answer from it:
"Note: The pages and sub-categories are stored in the categorylinks table."
Answer 1:
Categories alone have no hierarchy. It is the category pages that make the subcategorization work, so you will also have to get the page_id from the page table to be able to resolve this relation.
It essentially works like this:
1. The category's cat_title is a page title.
2. Find that page_title in the page table and get the page_id.
3. Use the page_id to find the category links in cl_from.
4. Get the parent category title from cl_to.
5. Repeat from 2.
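
The steps above can be sketched as a single join between page and categorylinks. This is a minimal illustration rather than a definitive implementation: it uses an in-memory SQLite database with made-up sample rows in place of the imported dumps, and the function names (build_sample_db, parent_categories, subcategories, ancestors) are hypothetical. It assumes the schema in which cl_to stores the parent category title as text, and that category pages live in namespace 14 with underscores instead of spaces in their titles.

```python
import sqlite3


def build_sample_db():
    """Create a tiny stand-in for the imported dumps (sample rows are invented)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (
            page_id INTEGER PRIMARY KEY,
            page_namespace INTEGER,
            page_title TEXT
        );
        CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
    """)
    conn.executemany("INSERT INTO page VALUES (?, ?, ?)", [
        (1, 14, "NoSQL"),                # category page (namespace 14)
        (2, 14, "Database"),
        (3, 14, "Database_management"),
        (4, 0, "NoSQL"),                 # ordinary article with the same title
    ])
    conn.executemany("INSERT INTO categorylinks VALUES (?, ?)", [
        (1, "Database"),                 # Category:NoSQL is in Category:Database
        (1, "Database_management"),
        (3, "Database"),
        (4, "NoSQL"),                    # the article is in Category:NoSQL
    ])
    return conn


def parent_categories(conn, title):
    """Steps 2 to 4: page_title -> page_id -> cl_from -> cl_to."""
    rows = conn.execute("""
        SELECT cl.cl_to
        FROM page AS p
        JOIN categorylinks AS cl ON cl.cl_from = p.page_id
        WHERE p.page_namespace = 14 AND p.page_title = ?
        ORDER BY cl.cl_to
    """, (title,)).fetchall()
    return [row[0] for row in rows]


def subcategories(conn, title):
    """Reverse direction: category pages whose links point at this title."""
    rows = conn.execute("""
        SELECT p.page_title
        FROM categorylinks AS cl
        JOIN page AS p ON p.page_id = cl.cl_from
        WHERE cl.cl_to = ? AND p.page_namespace = 14
        ORDER BY p.page_title
    """, (title,)).fetchall()
    return [row[0] for row in rows]


def ancestors(conn, title):
    """Step 5: repeat the lookup until no new parent appears."""
    seen = set()
    stack = [title]
    while stack:
        current = stack.pop()
        for parent in parent_categories(conn, current):
            if parent not in seen:   # guards against cycles in the category graph
                seen.add(parent)
                stack.append(parent)
    return seen


conn = build_sample_db()
print(parent_categories(conn, "NoSQL"))
print(subcategories(conn, "Database"))
```

Filtering on page_namespace matters because an article and a category can share the same page_title, as the sample rows show. Against a real import the same queries should apply to the MySQL tables, although the title columns there may come back as binary strings that need decoding.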
Source: https://stackoverflow.com/questions/21782410/finding-subcategories-of-a-wikipedia-category-using-category-and-categorylinks-t