This should be easy, but somehow I'm not quite getting it.
My assignment is:
Write a function sentenceCapitalizer that has one parameter of type
To allow arbitrary whitespace after the dot, or to capitalize whole words (which might make a difference for Unicode text), you could use regular expressions (the re module):
#!/usr/bin/env python3
import re

def sentenceCapitalizer(text):
    # group 1: start of the text, or a dot followed by whitespace
    # group 2: the first word of the sentence
    return re.sub(r"(\.\s+|^)(\w+)",
                  lambda m: m.group(1) + m.group(2).capitalize(),
                  text)

s = "hEllo. my name is Joe. what is your name?"
print(sentenceCapitalizer(s))
# -> Hello. My name is Joe. What is your name?
Note: PEP 8 recommends lowercase names with underscores for functions, e.g., capitalize_sentence() instead of sentenceCapitalizer().
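The pattern above treats only a dot as the end of a sentence, so a word that follows "!" or "?" is left untouched. A minimal sketch of one way to widen it, assuming those two characters should also end a sentence; the name capitalize_sentences() and the compiled pattern are made up for this illustration and are not part of the assignment:

```python
import re

# Hypothetical variant of sentenceCapitalizer(): a sentence starts at the
# beginning of the text or after ".", "!" or "?" followed by whitespace.
_SENTENCE_START = re.compile(r"(^|[.!?]\s+)(\w+)")

def capitalize_sentences(text):
    """Capitalize the first word of each sentence in *text*."""
    return _SENTENCE_START.sub(
        lambda m: m.group(1) + m.group(2).capitalize(),
        text)

print(capitalize_sentences("hEllo! my name is Joe. what is your name? fine."))
# -> Hello! My name is Joe. What is your name? Fine.
```

Like the original, this uses str.capitalize(), which also lowercases the rest of the first word; it will still capitalize after abbreviations such as "e.g. ", which a plain regex cannot tell apart from a sentence end.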
To accept a larger variety of texts, you could use the nltk package:
# $ pip install nltk
# The tokenizers also need NLTK's Punkt data, e.g. nltk.download('punkt')
# (the resource name may differ between NLTK versions).
from nltk.tokenize import sent_tokenize, word_tokenize

def sent_capitalize(sentence):
    """Capitalize the first word in the *sentence*."""
    words = word_tokenize(sentence)
    if words:
        words[0] = words[0].capitalize()
    # join the words with spaces, then re-attach the last token
    # (the closing punctuation) without a space before it
    return " ".join(words[:-1]) + "".join(words[-1:])

text = "hEllo. my name is Joe. what is your name?"
# split the text into a list of sentences
sentences = sent_tokenize(text)
print(" ".join(map(sent_capitalize, sentences)))
# -> Hello. My name is Joe. What is your name?