spaCy has two features I'd like to combine - part-of-speech (POS) tagging and rule-based matching.
How can I combine them in a neat way?
For example - let's say I
Sure, simply use the POS attribute.
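import spacy
from spacy.matcher import Matcher

# Load an English pipeline that includes a POS tagger.
nlp = spacy.load('en_core_web_sm')
matcher = Matcher(nlp.vocab)
# Match an adjective immediately followed by a noun.
# This is the spaCy 3 call; spaCy 1.x used matcher.add_pattern(name, pattern).
matcher.add("Adjective and noun", [[{"POS": "ADJ"}, {"POS": "NOUN"}]])
doc = nlp('what are the main issues')
# Each match is a (match_id, start, end) tuple of token indices into doc.
matches = matcher(doc)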
Eyal Shulman's answer was helpful, but it makes you hard-code a matcher pattern rather than write a regular expression.
I wanted to use regular expressions over the POS tags, so I made my own solution:
import re

pattern = r'(<VERB>)*(<ADV>)*(<PART>)*(<VERB>)+(<PART>)*'
## start and end are token indices into doc, defined elsewhere
sentence = doc[start:end].sent
## create a string with the pos of the sentence
posString = ""
for w in sentence:
    posString += "<" + w.pos_ + ">"
lstVerb = []
for m in re.compile(pattern).finditer(posString):
    ## each m is a verb phrase match
    ## count the "<" in m to find how many tokens we want
    numTokensInGroup = m.group().count('<')
    ## then find the number of tokens that came before that group,
    ## i.e. before character offset m.start()
    numTokensBeforeGroup = posString[:m.start()].count('<')
    verbPhrase = sentence[numTokensBeforeGroup:numTokensBeforeGroup+numTokensInGroup]
    lstVerb.append(verbPhrase)
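If you want to try the offset arithmetic without loading a model, here is a minimal sketch of the same idea that runs on a plain list of POS tags. The function name `match_pos_regex`, the constant `VERB_PHRASE`, and the sample sentence with its hand-written tags are my own assumptions for illustration; in real use the tags would come from something like `[t.pos_ for t in sentence]`.

```python
import re

# Same verb-phrase pattern as in the answer above.
VERB_PHRASE = re.compile(r'(<VERB>)*(<ADV>)*(<PART>)*(<VERB>)+(<PART>)*')

def match_pos_regex(pos_tags, pattern=VERB_PHRASE):
    """Return (start, end) token index pairs for each regex match over POS tags."""
    # Encode the tag sequence as one string, e.g. "<PRON><VERB><PART>".
    pos_string = "".join("<" + tag + ">" for tag in pos_tags)
    spans = []
    for m in pattern.finditer(pos_string):
        # Every token contributes exactly one "<", so counting "<" before
        # the match gives the index of the first matched token.
        start = pos_string[:m.start()].count("<")
        # Counting "<" inside the match gives the number of matched tokens.
        end = start + m.group().count("<")
        spans.append((start, end))
    return spans

# Hypothetical example: hand-written tags stand in for spaCy's output.
words = ["She", "wants", "to", "leave", "quickly"]
tags = ["PRON", "VERB", "PART", "VERB", "ADV"]
spans = match_pos_regex(tags)
phrases = [" ".join(words[s:e]) for s, e in spans]
print(spans, phrases)
```

Returning token index pairs instead of text means the result can be used to slice a spaCy `Span` directly, e.g. `sentence[start:end]`.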