Question
I was given a formula called FRES (Flesch reading-ease test), which is used to measure the readability of a document:
FRES = 206.835 - 1.015 * (total words / total sentences) - 84.6 * (total syllables / total words)
My task is to write a Python function that returns the FRES of a text, so I need to convert this formula into a Python function.
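As a point of reference, the arithmetic can be separated from the counting. The sketch below assumes the three counts are already known; the name `fres_from_counts` is hypothetical and not part of the task.

```python
def fres_from_counts(num_words, num_sents, num_syllables):
    """Flesch reading-ease score from three raw counts (hypothetical helper)."""
    if num_words == 0 or num_sents == 0:
        raise ValueError("need at least one word and one sentence")
    words_per_sentence = num_words / num_sents
    syllables_per_word = num_syllables / num_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# 100 words, 5 sentences, 150 syllables:
# 206.835 - 1.015 * 20 - 84.6 * 1.5
score = fres_from_counts(100, 5, 150)
```

Written this way, any difference between two scores for the same text has to come from how words, sentences and syllables are counted, not from the formula.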
I have re-implemented my code based on an answer I got, to show what I have so far and the result it gives me:
import nltk
import collections
nltk.download('punkt')
nltk.download('gutenberg')
nltk.download('brown')
nltk.download('averaged_perceptron_tagger')
nltk.download('universal_tagset')

import re
from itertools import chain
from nltk.corpus import gutenberg

VC = re.compile('[aeiou]+[^aeiou]+', re.I)

def count_syllables(word):
    return len(VC.findall(word))

def compute_fres(text):
    """Return the FRES of a text.
    >>> emma = nltk.corpus.gutenberg.raw('austen-emma.txt')
    >>> compute_fres(emma) # doctest: +ELLIPSIS
    99.40...
    """
    for filename in gutenberg.fileids():
        sents = gutenberg.sents(filename)
        words = gutenberg.words(filename)
        num_sents = len(sents)
        num_words = len(words)
        num_syllables = sum(count_syllables(w) for w in words)
        score = 206.835 - 1.015 * (num_words / num_sents) - 84.6 * (num_syllables / num_words)
    return(score)
After running the code this is the result message I got:
Failure
Expected :99.40...
Actual :92.84866041488623
File "C:/Users/PycharmProjects/a1/a1.py", line 60, in a1.compute_fres
Failed example:
compute_fres(emma) # doctest: +ELLIPSIS
Expected:
99.40...
Got:
92.84866041488623
My function is supposed to pass the doctest and return 99.40... I'm also not allowed to edit the syllables function, since it came with the task:
import re

VC = re.compile('[aeiou]+[^aeiou]+', re.I)

def count_syllables(word):
    return len(VC.findall(word))
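To see what this helper actually counts, a few calls can be checked by hand. The sample words below are illustrative and not part of the task. The regex matches a run of vowels followed by at least one non-vowel, so a vowel at the very end of a token is not counted and punctuation tokens contribute zero.

```python
import re

VC = re.compile('[aeiou]+[^aeiou]+', re.I)

def count_syllables(word):
    return len(VC.findall(word))

# Illustrative inputs, worked out by hand against the regex:
samples = {
    "the": count_syllables("the"),                  # 0: the final "e" has no consonant after it
    "Emma": count_syllables("Emma"),                # 1: only "Emm" matches
    "readability": count_syllables("readability"),  # 4: "ead", "ab", "il", "ity"
    ",": count_syllables(","),                      # 0: no vowel at all
}
```

Because tokens such as "the" and "," count as words with zero syllables, the score depends heavily on how the text is split into words.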
This question has been very tricky, but at least now it's giving me a result instead of an error message. I'm not sure why it's giving me a different result, though.
Any help would be much appreciated. Thank you.
Answer 1:
BTW, there's the textstat library.
from textstat.textstat import textstat
from nltk.corpus import gutenberg

for filename in gutenberg.fileids():
    # Score the file's raw text, not the filename string itself.
    print(filename, textstat.flesch_reading_ease(gutenberg.raw(filename)))
If you're bent on coding up your own, you first have to:
- decide whether punctuation counts as a word
- define how to count the number of syllables in a word.
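The first decision moves the score quite a bit. Here is a minimal sketch on a toy, pre-tokenized input; the `fres` helper, its `keep_punctuation` flag and the `toy` sentences are all hypothetical and only there to illustrate the effect.

```python
import re

VC = re.compile('[aeiou]+[^aeiou]+', re.I)

def count_syllables(word):
    return len(VC.findall(word))

def fres(sentences, keep_punctuation):
    """sentences: a list of token lists (hypothetical toy input)."""
    # Drop tokens with no letter or digit unless punctuation counts as a word.
    words = [tok for sent in sentences for tok in sent
             if keep_punctuation or any(ch.isalnum() for ch in tok)]
    num_syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (num_syllables / len(words)))

toy = [["Emma", "was", "handsome", ",", "clever", ",", "and", "rich", "."],
       ["She", "was", "happy", "."]]

with_punct = fres(toy, keep_punctuation=True)
without_punct = fres(toy, keep_punctuation=False)
print(with_punct, without_punct)
```

On this toy input, counting punctuation as words gives the higher score: the extra tokens add no syllables, so the syllables-per-word term shrinks faster than the words-per-sentence term grows.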
If punctuation counts as a word and syllables are counted by the regex in your question, then:
import re
from itertools import chain
from nltk.corpus import gutenberg

def num_syllables_per_word(word):
    return len(re.findall('[aeiou]+[^aeiou]+', word))

for filename in gutenberg.fileids():
    sents = gutenberg.sents(filename)
    words = gutenberg.words(filename)  # i.e. list(chain(*sents))
    num_sents = len(sents)
    num_words = len(words)
    num_syllables = sum(num_syllables_per_word(w) for w in words)
    score = 206.835 - 1.015 * (num_words / num_sents) - 84.6 * (num_syllables / num_words)
    print(filename, score)
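Note that the `compute_fres(text)` in the question never reads its `text` argument. It loops over the corpus files instead, so whatever it returns does not depend on the string the doctest passes in. Below is a minimal sketch of a version that scores the string it receives. The regex-based sentence and word splitting is an assumption, a simple stand-in for a real tokenizer, and it is not claimed to reproduce the 99.40 the doctest expects.

```python
import re

VC = re.compile('[aeiou]+[^aeiou]+', re.I)

def count_syllables(word):
    return len(VC.findall(word))

def compute_fres(text):
    """Score the given string (sketch; the tokenization is an assumption)."""
    # Assumption: a sentence ends at ., ! or ? followed by whitespace.
    sents = [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]
    # Assumption: a word is a run of letters or apostrophes; punctuation is dropped.
    words = re.findall(r"[A-Za-z']+", text)
    if not sents or not words:
        raise ValueError("text must contain at least one word")
    num_syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sents))
            - 84.6 * (num_syllables / len(words)))

print(compute_fres("Emma was handsome, clever, and rich. She was happy."))
```

Which tokenizer gives the expected value depends on how the task counts words and sentences, so the splitting rules above would need to be swapped for whatever the task intends.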
Source: https://stackoverflow.com/questions/49251629/converting-readability-formula-into-python-function