Question
I have Wikipedia article dumps in several languages. I want to filter them down to the articles that belong to a given category (specifically Category:WikiProject_Biography).
I found a lot of similar questions, for example:
- Wikipedia API to get articles belonging to a category
- How do I get all articles about people from Wikipedia?
However, I would like to do it all offline, that is, using only the dumps, and for different languages.
I have also looked at the category table and the categorylinks table (MediaWiki_1.28.0_database_schema).
Answer 1:
Fetch the page and categorylinks tables from the dump, then run
SELECT
    page_namespace,
    page_title
FROM
    page
    JOIN categorylinks ON page_id = cl_from
WHERE
    cl_to = 'WikiProject_Biography';
to get the list of pages.
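
To try the join without importing a full dump first, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for the MySQL/MariaDB database the dump files are meant to be loaded into. The toy rows and the helper names `build_toy_db` and `pages_in_category` are made up for illustration, and only the columns used by the query above are modelled.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with simplified page/categorylinks tables.

    Assumption: only the columns used by the query are modelled here;
    the real MediaWiki tables have many more columns.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (
            page_id INTEGER PRIMARY KEY,
            page_namespace INTEGER,
            page_title TEXT
        );
        CREATE TABLE categorylinks (
            cl_from INTEGER,
            cl_to TEXT
        );
        """
    )
    # Made-up sample rows, not taken from any real dump.
    conn.executemany(
        "INSERT INTO page VALUES (?, ?, ?)",
        [
            (1, 0, "Ada_Lovelace"),
            (2, 1, "Alan_Turing"),
            (3, 0, "Python_(programming_language)"),
        ],
    )
    conn.executemany(
        "INSERT INTO categorylinks VALUES (?, ?)",
        [
            (1, "WikiProject_Biography"),
            (2, "WikiProject_Biography"),
            (3, "Programming_languages"),
        ],
    )
    return conn


def pages_in_category(conn, category):
    """Return (namespace, title) pairs for pages linked directly to `category`."""
    cursor = conn.execute(
        """
        SELECT page_namespace, page_title
        FROM page
        JOIN categorylinks ON page_id = cl_from
        WHERE cl_to = ?
        ORDER BY page_namespace, page_title
        """,
        (category,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = build_toy_db()
    for namespace, title in pages_in_category(conn, "WikiProject_Biography"):
        print(namespace, title)
```

Note that cl_to holds the category name without the "Category:" prefix and with underscores instead of spaces, and that the query only returns pages linked directly to the category, not members of its subcategories. The page_namespace column in the result distinguishes articles (namespace 0) from other pages such as talk pages (namespace 1). For another language, the same query applies to that wiki's page and categorylinks tables, with cl_to set to the category name used on that wiki.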
Source: https://stackoverflow.com/questions/43178266/extract-wikipedia-articles-belonging-to-a-category-from-offline-dumps