Question
I downloaded the category and categorylinks table dumps (sql.gz files) from MediaWiki and generated the two tables from them: category and categorylinks.
Manual for the tables: CategoryLinks Category
Consider the category page NoSQL. Its parent categories are Database and Database management. How can I get this information from the two tables? The manual for the category table says the following, but I am unable to get that information from it:
"Note: The pages and sub-categories are stored in the categorylinks table."
Answer 1:
Categories alone have no hierarchy. It's the category pages that make the subcategorization work. So you will also have to get the page_id from the page table to be able to resolve this relation.
It essentially works like this:
1. A category's cat_title is a page title.
2. Find that page_title in the page table and get the page_id.
3. Use the page_id to get the category link in cl_from.
4. Get the parent category title from cl_to.
5. Repeat from 2.
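The steps above can be sketched as a join between page and categorylinks. This is only an illustration under stated assumptions: it uses an in-memory SQLite database with a cut-down, hypothetical subset of the columns rather than the real MySQL dump, the rows are toy data (Made_up_root is invented), and the helper names are made up for this example. It also assumes category pages live in namespace 14 and that titles use underscores instead of spaces, as in the dumps.

```python
import sqlite3

CATEGORY_NS = 14  # namespace number MediaWiki uses for category pages


def build_toy_db():
    """Create an in-memory DB with a reduced page/categorylinks schema (toy data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (
            page_id INTEGER PRIMARY KEY,
            page_namespace INTEGER NOT NULL,
            page_title TEXT NOT NULL
        );
        CREATE TABLE categorylinks (
            cl_from INTEGER NOT NULL,  -- page_id of the member page
            cl_to TEXT NOT NULL        -- title of the category it belongs to
        );
        """
    )
    conn.executemany(
        "INSERT INTO page VALUES (?, ?, ?)",
        [
            (1, CATEGORY_NS, "NoSQL"),
            (2, CATEGORY_NS, "Database"),
            (3, CATEGORY_NS, "Database_management"),
            (4, 0, "NoSQL"),  # ordinary article with the same title
        ],
    )
    conn.executemany(
        "INSERT INTO categorylinks VALUES (?, ?)",
        [
            (1, "Database"),
            (1, "Database_management"),
            (2, "Made_up_root"),  # invented parent, for the recursion demo
            (4, "NoSQL"),
        ],
    )
    return conn


def parent_categories(conn, cat_title):
    """Steps 2-4: title -> page_id -> cl_from -> cl_to."""
    rows = conn.execute(
        """
        SELECT cl.cl_to
        FROM page AS p
        JOIN categorylinks AS cl ON cl.cl_from = p.page_id
        WHERE p.page_namespace = ? AND p.page_title = ?
        ORDER BY cl.cl_to
        """,
        (CATEGORY_NS, cat_title),
    ).fetchall()
    return [row[0] for row in rows]


def subcategories(conn, cat_title):
    """Reverse direction: category pages whose cl_to is the given title."""
    rows = conn.execute(
        """
        SELECT p.page_title
        FROM categorylinks AS cl
        JOIN page AS p ON p.page_id = cl.cl_from
        WHERE cl.cl_to = ? AND p.page_namespace = ?
        ORDER BY p.page_title
        """,
        (cat_title, CATEGORY_NS),
    ).fetchall()
    return [row[0] for row in rows]


def ancestors(conn, cat_title):
    """Step 5: repeat the lookup for each parent; 'seen' guards against cycles."""
    seen = set()
    pending = [cat_title]
    while pending:
        current = pending.pop()
        for parent in parent_categories(conn, current):
            if parent not in seen:
                seen.add(parent)
                pending.append(parent)
    return sorted(seen)
```

With the toy rows, parent_categories(conn, "NoSQL") gives Database and Database_management, and subcategories(conn, "Database") gives NoSQL. The namespace filter matters because an article and a category can share a title. On the real dump the same joins should apply, though column types and collations differ from this sketch.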
Source: https://stackoverflow.com/questions/21782410/finding-subcategories-of-a-wikipedia-category-using-category-and-categorylinks-t