How to write a spaCy Matcher with a POS regex

情歌与酒 2021-01-13 01:32

spaCy has two features I'd like to combine: part-of-speech (POS) tagging and rule-based matching.

How can I combine them in a neat way?

For example - let's say i

2 Answers
  • 2021-01-13 01:55

    Sure, simply use the POS attribute.

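    import spacy
    from spacy.matcher import Matcher

    # spaCy v3 API; assumes the en_core_web_sm model is installed
    nlp = spacy.load("en_core_web_sm")
    matcher = Matcher(nlp.vocab)
    # one pattern: an adjective immediately followed by a noun
    matcher.add("ADJ_NOUN", [[{"POS": "ADJ"}, {"POS": "NOUN"}]])

    doc = nlp("what are the main issues")
    # each match is a (match_id, start, end) tuple of token offsets
    matches = matcher(doc)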
    
  • 2021-01-13 02:17

    Eyal Shulman's answer was helpful, but it makes you hard-code the pattern as a list of token dictionaries rather than write an actual regular expression.

    I wanted to use regular expressions, so I made my own solution:

        import re

        pattern = re.compile(r'(<VERB>)*(<ADV>)*(<PART>)*(<VERB>)+(<PART>)*')

        ## start and end are assumed to be token offsets of a span of
        ## interest in doc (e.g. from a Matcher match)
        sentence = doc[start:end].sent

        ## create a string with the POS tag of every token in the sentence
        posString = ""
        for w in sentence:
            posString += "<" + w.pos_ + ">"

        lstVerb = []
        for m in pattern.finditer(posString):
            ## each m is a verb phrase match, starting at character offset m.start()
            ## count the "<" in m to find how many tokens we want
            numTokensInGroup = m.group().count('<')

            ## then find the number of tokens that came before that group
            numTokensBeforeGroup = posString[:m.start()].count('<')

            ## slice the matched tokens out of the sentence
            verbPhrase = sentence[numTokensBeforeGroup:numTokensBeforeGroup+numTokensInGroup]
            lstVerb.append(verbPhrase)
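
To try the offset arithmetic without loading a spaCy model, the same idea can be sketched as a small standalone function. This is only an illustration: `match_pos_regex` is a hypothetical helper name, not part of spaCy, and the hand-written tag list stands in for `[w.pos_ for w in sentence]`.

```python
import re
from typing import List, Sequence, Tuple


def match_pos_regex(pos_tags: Sequence[str], pattern: str) -> List[Tuple[int, int]]:
    """Return (start, end) token index pairs (end exclusive) for each regex match.

    Hypothetical helper: the regex is run over a string such as
    "<PRON><VERB>" built from the tags, and character offsets are
    mapped back to token offsets by counting "<" characters.
    """
    pos_string = "".join(f"<{tag}>" for tag in pos_tags)
    spans = []
    for m in re.finditer(pattern, pos_string):
        if m.start() == m.end():
            continue  # guard against patterns that can match the empty string
        start = pos_string[:m.start()].count("<")  # tokens before the match
        end = start + m.group().count("<")         # tokens inside the match
        spans.append((start, end))
    return spans


# Assumed example data; with spaCy these would come from the parsed sentence.
words = ["She", "did", "not", "want", "to", "leave", "early"]
tags = ["PRON", "AUX", "PART", "VERB", "PART", "VERB", "ADV"]
VERB_PHRASE = r"(<VERB>)*(<ADV>)*(<PART>)*(<VERB>)+(<PART>)*"

spans = match_pos_regex(tags, VERB_PHRASE)
phrases = [" ".join(words[s:e]) for s, e in spans]
print(spans, phrases)
```

With a real spaCy sentence, `sentence[s:e]` on each returned pair should give the matching `Span`. Counting `<` only works because every tag is wrapped in exactly one pair of angle brackets and the pattern is written in terms of whole `<TAG>` units.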
    