Question
I found this reference: https://www.safaribooksonline.com/library/view/regular-expressions-cookbook/9781449327453/ch05s07.html
Is it possible to use it with the kwic() function in the quanteda package to find documents in a corpus containing words that are not adjacent but close to each other, with maybe a few other words in between?
For example, if I give the function two words, I would like to find the documents in the corpus where both words occur, possibly with a few words between them. Given "engine" and "electrical", I would also get the reports in which "electrical synchronous engine" appears, but not the ones in which "engine" and "electrical" appear in completely different contexts.
Answer 1:
quanteda does not have a NEAR operator, but you can do the same thing using the window argument of tokens_select(). In this example, I keep only the tokens within five words of "america*" and then search them using kwic():
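library(quanteda)
# Tokenize the inaugural address corpus that ships with quanteda
toks <- tokens(data_corpus_inaugural)
# Keep each "america*" match plus the five tokens on either side of it
toks_america <- tokens_select(toks, "america*", window = 5)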
kwic(toks_america, "econom*")
# [2013-Obama, 45] has been tested by crises | economic | recovery has begun. America's
kwic(toks_america, "power")
# [1997-Clinton, 85] it can give Americans the | power | to make a government is
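Applied to the "engine" / "electrical" case from the question, the same approach might look like the sketch below. The three short texts, their document names (r1, r2, r3), and the window size of 3 are made up for illustration; they are assumptions, not data from the question.

```r
library(quanteda)

# Hypothetical mini-corpus, for illustration only
txt <- c(
  r1 = "The electrical synchronous engine was replaced during maintenance.",
  r2 = "The engine failed at startup. Much later, after a long series of unrelated checks on the cabin, an electrical fault was logged.",
  r3 = "No relevant terms appear in this report."
)
toks <- tokens(txt, remove_punct = TRUE)

# Keep only tokens that lie within 3 tokens of "engine" (assumed window size)
toks_near <- tokens_select(toks, "engine", window = 3)

# Search the reduced tokens for the second word
hits <- kwic(toks_near, "electrical")

# Names of the documents where both words occur close together
near_docs <- unique(as.character(as.data.frame(hits)$docname))
print(near_docs)
```

With these example texts, only r1 should be reported, because in r2 the two words are much further apart than the window allows. Widening or narrowing window changes how strict "near" is.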
Source: https://stackoverflow.com/questions/49907577/is-it-possible-to-use-kwic-function-to-find-words-near-to-each-other