I'm looking to assign some different readability scores to text in R, such as the Flesch-Kincaid.
Does anyone know of a way to segment words into syllables using R? I
gsk3 is correct: if you want a correct solution, it is non-trivial.
For example, you have to watch out for strange things like a silent e at the end of a word (e.g. pane), or know when it's not silent, as in finale.
However, if you just want a quick-and-dirty approximation, this will do it:
> nchar( gsub( "[^X]", "", gsub( "[aeiouy]+", "X", tolower( x ))))
[1] 1 1 2 2 1 3
To understand how the parts work, just strip away the function calls from the outside in, starting with nchar and then gsub, etc., until the expression makes sense to you.
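If it helps, here is the same one-liner unrolled from the inside out, one call per line. The vector x is only an illustrative sample (an assumption, since the input isn't shown above), and the intermediate names are made up for readability.

```r
# Illustrative input; any character vector of words would do
x <- c("dog", "cat", "pony", "cracker", "shoe", "Popsicle")

lowered <- tolower(x)                       # lower-case, so a capital X can only be our marker
marked  <- gsub("[aeiouy]+", "X", lowered)  # each run of vowels becomes a single X
only_x  <- gsub("[^X]", "", marked)         # throw away everything that is not an X
counts  <- nchar(only_x)                    # the number of X's is the syllable estimate

counts
# [1] 1 1 2 2 1 3
```

Printing lowered, marked and only_x in turn shows what each layer contributes.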
But my guess is that, pitting R's power against the profusion of exceptions in the English language, you could get a decent answer (maybe 99% right?) parsing through normal text without a lot of work - heck, the simple parser above may get 90%+ right. With a little more work, you could deal with silent e's if you like.
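As a rough sketch of what "a little more work" might look like for silent e's: the function name count_syllables and the two rules in it are my own assumptions, not a tested algorithm. It drops a final e that follows a consonant, leaves consonant + "le" endings alone, and never returns less than one syllable.

```r
# Hypothetical helper: vowel-run counting plus a crude silent-e rule
count_syllables <- function(words) {
  w <- tolower(words)
  # A final "e" after a consonant is treated as silent, except in
  # consonant + "le" endings (e.g. "table"), which usually add a syllable.
  silent <- grepl("[^aeiouy]e$", w) & !grepl("[^aeiouy]le$", w)
  w[silent] <- sub("e$", "", w[silent])
  # Same trick as before: collapse vowel runs to X and count the X's
  n <- nchar(gsub("[^X]", "", gsub("[aeiouy]+", "X", w)))
  pmax(n, 1L)  # words like "the" would otherwise drop to zero
}

count_syllables(c("pane", "table", "the", "finale"))
# [1] 1 2 1 2
```

Note that finale still comes out as 2 rather than 3, which is exactly the kind of exception a simple rule can't know about.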
It all depends on your application - whether this is good enough or you need something more accurate.