Question
How should I approach building a punctuation predictor?
A working demo for this question can be found at this link.
The input text is as follows:
"its been a little while Kirk tells me its actually been
three weeks now that Ive been using this device right here
that is of course the Galaxy S ten I mean Ive just been
living with this phone this has been my phone has the SIM
card in it I took photos I lived live I sent tweets whatsapp
slack email whatever other app this was my smart phone"
Answer 1:
Predicting punctuation for text (in particular for speech transcriptions) is a well-known problem.
You could try using Punctuator2, either with the provided models or by training new models for text from your domain. Look at the bottom of the README for pointers to some related projects.
Grammarly developed a simpler approach for only inserting periods between run-on sentences, described here:
https://www.grammarly.com/blog/nlp-run-on-sentences/
They did some nice experiments with real vs. artificial training data, which is useful because it's easy to generate training data from texts that you know have reliable punctuation at sentence boundaries, like newspaper text.
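To illustrate the idea of generating training data from reliably punctuated text, here is a minimal sketch. It is not the pipeline used by Grammarly or Punctuator2; the function name `make_training_pairs`, the label names, and the choice of tracked marks are all assumptions made for this example. It strips punctuation and casing from the text and labels each word with the mark that originally followed it, which is the kind of (word, label) sequence a tagging model could be trained on.

```python
import re

# Hypothetical label set: the marks we want a model to learn to restore.
PUNCT_LABELS = {".": "PERIOD", ",": "COMMA", "?": "QUESTION"}


def make_training_pairs(text):
    """Turn punctuated text into (lowercased word, label) pairs.

    The label names the tracked punctuation mark that followed the word
    in the original text, or "O" when no tracked mark followed it.
    """
    # Split into words (keeping apostrophes) and the tracked marks.
    tokens = re.findall(r"[A-Za-z0-9']+|[.,?]", text)
    pairs = []
    for tok in tokens:
        if tok in PUNCT_LABELS:
            # Attach the mark to the preceding word, if it has no label yet.
            if pairs and pairs[-1][1] == "O":
                pairs[-1] = (pairs[-1][0], PUNCT_LABELS[tok])
        else:
            pairs.append((tok.lower(), "O"))
    return pairs


sample = "It's been a while. Kirk tells me it's been three weeks, right?"
pairs = make_training_pairs(sample)
for word, label in pairs:
    print(word, label)
```

The words alone form the model input, which resembles an unpunctuated transcript, and the labels are the prediction targets. Real transcripts, like the input in the question, may also drop apostrophes, so a fuller version would probably normalize those as well.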
Source: https://stackoverflow.com/questions/55786659/how-to-add-punctuation-marks-for-the-sentences