Question
I'm using the tm package and want to get the Flesch-Kincaid scores for a document in R. I found that the koRpus package offers a lot of metrics, including reading level, and started using it. However, the object it returns is a very complicated S4 object that I don't understand how to parse.
So, I apply this to my corpus:
library(tm)
library(koRpus)
library(foreach)
txt <- system.file("texts", "txt", package = "tm")
(d <- Corpus(DirSource(txt, encoding = "UTF-8"), readerControl = list(language = "lat")))
f <- function(x) tokenize(x, format="obj", lang='en')
g <- function(x) flesch.kincaid(x)
x <- foreach(i=1:5) %dopar% g(f(d[[i]]))
x is then a list of flesch.kincaid results, one for each of the Ovid texts.
> x[[1]]
Flesch-Kincaid Grade Level
Parameters: default
Grade: 13.62
Age: 18.62
Text language: en
How can I get just the return values, grade = 13.62 and age = 18.62? The output of str(x) is so large that it's hard to parse, e.g.:
> str(x[[1]])
Formal class 'kRp.readability' [package "koRpus"] with 49 slots
..@ hyphen :Formal class 'kRp.hyphen' [package "koRpus"] with 3 slots
.. .. ..@ lang : chr "en"
.. .. ..@ desc :List of 5
.. .. .. ..$ num.syll : num 196
.. .. .. ..$ syll.distrib : num [1:6, 1:4] 25 25 65 27.8 27.8 ...
.. .. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ...
.. .. .. .. .. ..$ : chr [1:4] "1" "2" "3" "4"
.. .. .. ..$ syll.uniq.distrib: num [1:6, 1:4] 15 15 61 19.7 19.7 ...
.. .. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ...
.. .. .. .. .. ..$ : chr [1:4] "1" "2" "3" "4"
.. .. .. ..$ avg.syll.word : num 2.18
.. .. .. ..$ syll.per100 : num 218
.. .. ..@ hyphen:'data.frame': 90 obs. of 2 variables:
.. .. .. ..$ syll: num [1:90] 1 1 1 1 2 3 1 2 3 1 ...
.. .. .. ..$ word: chr [1:90] "Si" "quis" "in" "hoc" ...
..@ param :List of 1
.. ..$ Flesch.Kincaid: Named num [1:3] 0.39 11.8 15.59
.. .. ..- attr(*, "names")= chr [1:3] "asl" "asw" "const"
..@ ARI :List of 1
.. ..$ : logi NA
..@ ARI.NRI :List of 1
.. ..$ : logi NA
..@ ARI.simple :List of 1
.. ..$ : logi NA
..@ Bormuth :List of 1
.. ..$ : logi NA
..@ Coleman :List of 1
.. ..$ : logi NA
..@ Coleman.Liau :List of 1
.. ..$ : logi NA
..@ Dale.Chall :List of 1
.. ..$ : logi NA
..@ Dale.Chall.PSK :List of 1
.. ..$ : logi NA
..@ Dale.Chall.old :List of 1
.. ..$ : logi NA
..@ Danielson.Bryan :List of 1
.. ..$ : logi NA
..@ Dickes.Steiwer :List of 1
.. ..$ : logi NA
..@ DRP :List of 1
.. ..$ : logi NA
..@ ELF :List of 1
.. ..$ : logi NA
..@ Flesch :List of 1
.. ..$ : logi NA
..@ Flesch.PSK :List of 1
.. ..$ : logi NA
..@ Flesch.de :List of 1
.. ..$ : logi NA
..@ Flesch.es :List of 1
.. ..$ : logi NA
..@ Flesch.fr :List of 1
.. ..$ : logi NA
..@ Flesch.nl :List of 1
.. ..$ : logi NA
..@ Flesch.Kincaid :List of 3
.. ..$ flavour: chr "default"
.. ..$ grade : num 13.6
.. ..$ age : num 18.6
..@ Farr.Jenkins.Paterson :List of 1
.. ..$ : logi NA
..@ Farr.Jenkins.Paterson.PSK:List of 1
.. ..$ : logi NA
..@ FOG :List of 1
.. ..$ : logi NA
..@ FOG.PSK :List of 1
.. ..$ : logi NA
..@ FOG.NRI :List of 1
.. ..$ : logi NA
..@ FORCAST :List of 1
.. ..$ : logi NA
..@ FORCAST.RGL :List of 1
.. ..$ : logi NA
..@ Fucks :List of 1
.. ..$ : logi NA
..@ Harris.Jacobson :List of 1
.. ..$ : logi NA
..@ Linsear.Write :List of 1
.. ..$ : logi NA
..@ LIX :List of 1
.. ..$ : logi NA
..@ RIX :List of 1
.. ..$ : logi NA
..@ SMOG :List of 1
.. ..$ : logi NA
..@ SMOG.de :List of 1
.. ..$ : logi NA
..@ SMOG.C :List of 1
.. ..$ : logi NA
..@ SMOG.simple :List of 1
.. ..$ : logi NA
..@ Spache :List of 1
.. ..$ : logi NA
..@ Spache.old :List of 1
.. ..$ : logi NA
..@ Strain :List of 1
.. ..$ : logi NA
..@ Traenkle.Bailer :List of 1
.. ..$ : logi NA
..@ TRI :List of 1
.. ..$ : logi NA
..@ Wheeler.Smith :List of 1
.. ..$ : logi NA
..@ Wheeler.Smith.de :List of 1
.. ..$ : logi NA
..@ Wiener.STF :List of 1
.. ..$ : logi NA
..@ lang : chr "en"
..@ desc :List of 26
.. ..$ sentences : int 10
.. ..$ words : int 90
.. ..$ letters : Named num [1:12] 492 0 8 9 14 18 14 9 10 6 ...
.. .. ..- attr(*, "names")= chr [1:12] "all" "l1" "l2" "l3" ...
.. ..$ all.chars : int 692
.. ..$ syllables : Named num [1:5] 196 25 32 25 8
.. .. ..- attr(*, "names")= chr [1:5] "all" "s1" "s2" "s3" ...
.. ..$ lttr.distrib : num [1:6, 1:11] 0 0 90 0 0 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ...
.. .. .. ..$ : chr [1:11] "1" "2" "3" "4" ...
.. ..$ syll.distrib : num [1:6, 1:4] 25 25 65 27.8 27.8 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ...
.. .. .. ..$ : chr [1:4] "1" "2" "3" "4"
.. ..$ syll.uniq.distrib : num [1:6, 1:4] 15 15 61 19.7 19.7 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ...
.. .. .. ..$ : chr [1:4] "1" "2" "3" "4"
.. ..$ punct : int 17
.. ..$ conjunctions : int 0
.. ..$ prepositions : int 0
.. ..$ pronouns : int 0
.. ..$ foreign : int 0
.. ..$ TTR : num 0.844
.. ..$ avg.sentc.length : num 9
.. ..$ avg.word.length : num 5.47
.. ..$ avg.syll.word : num 2.18
.. ..$ sntc.per.word : num 0.111
.. ..$ sntc.per100 : num 11.1
.. ..$ lett.per100 : num 547
.. ..$ syll.per100 : num 218
.. ..$ FOG.hard.words : NULL
.. ..$ Bormuth.NOL : NULL
.. ..$ Dale.Chall.NOL : NULL
.. ..$ Harris.Jacobson.NOL: NULL
.. ..$ Spache.NOL : NULL
..@ TT.res :'data.frame': 107 obs. of 6 variables:
.. ..$ token : chr [1:107] "Si" "quis" "in" "hoc" ...
.. ..$ tag : chr [1:107] "word.kRp" "word.kRp" "word.kRp" "word.kRp" ...
.. ..$ lemma : chr [1:107] "" "" "" "" ...
.. ..$ lttr : num [1:107] 2 4 2 3 5 6 3 5 6 1 ...
.. ..$ wclass: chr [1:107] "word" "word" "word" "word" ...
.. ..$ desc : chr [1:107] "Word (kRp internal)" "Word (kRp internal)" "Word (kRp internal)" "Word (kRp internal)" ...
Ideally, I'd like to assign the F-K score to meta(d) back in tm.
I'd appreciate learning how to understand this return object and pull out its values. If there's another, better, faster way to get an F-K score, I'm all ears!
Answer 1:
Similar to @Paul's answer, but as a one-liner:
sapply(lapply(x,slot,'Flesch.Kincaid'),'[',c('age','grade'))
[,1] [,2] [,3] [,4] [,5]
age 18.61778 17.62351 17.77699 18.29032 18.645
grade 13.61778 12.62351 12.77699 13.29032 13.645
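The one-liner can be unpacked into separate steps. The following is a minimal sketch that uses a hand-made list named fk with made-up values as a stand-in for lapply(x, slot, 'Flesch.Kincaid'), so it runs without koRpus. The names fk, m_list, m_num and scores are illustrative only. It also shows that "[" on a list keeps the values wrapped in lists, and one possible way to get plain numbers instead.

```r
# Stand-in for lapply(x, slot, "Flesch.Kincaid"); values are invented,
# not real koRpus output.
fk <- list(
  list(flavour = "default", grade = 13.6, age = 18.6),
  list(flavour = "default", grade = 12.6, age = 17.6)
)

# "[" on a list returns a list, so sapply() builds a matrix of mode "list".
m_list <- sapply(fk, "[", c("age", "grade"))

# unlist() inside vapply() yields a plain numeric matrix instead
# (rows: age, grade; one column per document).
m_num <- vapply(fk, function(s) unlist(s[c("age", "grade")]), numeric(2))

# Transpose to get one row per document, with columns age and grade.
scores <- as.data.frame(t(m_num))
print(scores)
```

The numeric form is usually easier to work with afterwards, for example when sorting documents by grade.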
Answer 2:
Just use:
slot(x[[1]], "Flesch.Kincaid")
to get the subset of the object that contains these values. To get these in a list for each element in x, do something like:
list_fk = lapply(x, slot, "Flesch.Kincaid")
...and to get a vector with grade:
grades = sapply(list_fk, "[[", "grade")
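For readers unfamiliar with S4, here is a small sketch of how slot access works in general. The class ToyReadability is hypothetical and only mimics the shape of the Flesch.Kincaid slot seen in the str() output; the real kRp.readability class has many more slots.

```r
library(methods)

# Hypothetical stand-in class with two of the slots seen in the str() output.
setClass("ToyReadability",
         slots = c(lang = "character", Flesch.Kincaid = "list"))

toy <- new("ToyReadability",
           lang = "en",
           Flesch.Kincaid = list(flavour = "default", grade = 13.6, age = 18.6))

# slotNames() lists the slots, i.e. the "..@" entries that str() prints.
print(slotNames(toy))

# "@" is the S4 counterpart of "$"; slot() does the same with a string name,
# which is what makes it usable inside lapply().
fk_at   <- toy@Flesch.Kincaid
fk_slot <- slot(toy, "Flesch.Kincaid")

# The slot itself is an ordinary list, so "$" and "[[" work on it.
grade <- fk_at$grade
age   <- fk_slot[["age"]]
```

Once the grades are in a plain numeric vector, something like meta(d, "fk_grade") <- grades should attach them to the tm corpus. This assumes tm's meta replacement function accepts a tag name this way, and the tag fk_grade is a made-up name; check ?meta before relying on it.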
Source: https://stackoverflow.com/questions/14835894/how-do-i-extract-contents-from-a-korpus-object-in-r