How to group Wikipedia categories in Python?

心在旅途 2021-02-01 06:58

For each concept in my dataset, I have stored the corresponding Wikipedia categories. For example, consider the following 5 concepts and their corresponding Wikipedia categories.

6 answers
  •  遥遥无期
    2021-02-01 07:32

    You could try to classify the Wikipedia categories by the MediaWiki links and backlinks returned for each category:

    import re
    from mediawiki import MediaWiki  # provided by the pymediawiki package

    # TermFind checks whether any title in termList contains the given term
    def TermFind(term, termList):
        # case-insensitive literal match, so 'medicine' also matches 'Medicine'
        pattern = re.compile(re.escape(term), re.IGNORECASE)
        return any(pattern.search(val) for val in termList)

    # BoundedTerm checks whether both the links and the backlinks of a page contain the term
    def BoundedTerm(wikiPage, term):
        return TermFind(term, wikiPage.links) and TermFind(term, wikiPage.backlinks)

    termlist = []      # placeholder: fill in your Wikipedia category titles
    term = 'medicine'  # seed term to look for (biology and disease are other options)

    container = []
    wikipedia = MediaWiki()
    for val in termlist:
        cpage = wikipedia.page(val)  # one network request per title
        if BoundedTerm(cpage, term):
            container.append('medical')
        else:
            container.append('nonmedical')
    

    The idea is to guess a term that is shared by most of the categories; I tried biology, medicine and disease, with good results. You could make multiple calls to BoundedTerm, or a single call that checks several terms, and combine the results for the classification. Hope it helps.
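
    As a rough sketch of the "several terms, combined result" idea, the snippet below checks a list of seed terms against a page's links and backlinks and labels the page by how many terms appear in both. The names contains_term, matched_terms, classify and min_hits are illustrative and not part of the answer's code or of any library, and fake_page holds made-up titles so the example runs offline. It assumes only that a page object exposes .links and .backlinks, as pymediawiki pages do.

```python
import re
from types import SimpleNamespace


def contains_term(term, titles):
    """Return True if any title contains `term` (case-insensitive, literal match)."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return any(pattern.search(title) for title in titles)


def matched_terms(page, terms):
    """Return the terms found in both the links and the backlinks of `page`."""
    return [
        term for term in terms
        if contains_term(term, page.links) and contains_term(term, page.backlinks)
    ]


def classify(page, terms, min_hits=1):
    """Label a page 'medical' when at least `min_hits` terms match on both sides."""
    hits = matched_terms(page, terms)
    return "medical" if len(hits) >= min_hits else "nonmedical"


# Hypothetical offline stand-in for a page object (titles invented for the demo).
# With pymediawiki you would pass MediaWiki().page(title) instead.
fake_page = SimpleNamespace(
    links=["Medicine", "Infectious disease", "Biology"],
    backlinks=["List of diseases", "Outline of medicine"],
)
SEED_TERMS = ["biology", "medicine", "disease"]  # the terms the answer reports trying

print(matched_terms(fake_page, SEED_TERMS))
print(classify(fake_page, SEED_TERMS, min_hits=2))
```

    Raising min_hits makes the label stricter, which may help if a single common term such as "biology" turns out to match too many unrelated categories. The threshold is a tuning choice, not something the original answer specifies.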
