Question
I would like to restore commas and full stops in text without punctuation. For example, let's take this sentence:
I am XYZ I want to execute I have a doubt
I would like to detect that one comma and two full stops are missing from the example above:
I am XYZ, I want to execute. I have a doubt.
Can anyone advise me on how to achieve this using Python and NLP concepts?
Answer 1:
If I understand correctly, you want to improve the quality of a sentence by adding the appropriate punctuation. This is sometimes called punctuation restoration.
A good first step is to apply the usual NLP pipeline, namely tokenization, POS tagging, and parsing, using libraries such as NLTK or spaCy.
Once this preprocessing is done, you'll have to apply a rule-based or a machine learning approach to decide where the punctuation should go, based on the features extracted from the NLP pipeline (e.g. sentence boundaries, parse tree, POS tags, etc.).
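To make the rule-based option concrete, here is a minimal sketch. It assumes the tokens have already been POS-tagged (the tags below are written by hand in the Universal POS style; in practice they would come from NLTK or spaCy), and the function name `restore_punctuation` and the single rule it applies are purely illustrative. The rule treats a pronoun directly followed by a verb as the start of a new clause.

```python
# Toy rule-based punctuation restoration over pre-tagged tokens.
# Assumption: input is a list of (word, POS tag) pairs produced by some tagger.

def restore_punctuation(tagged):
    """Put a full stop before each new pronoun + verb clause and at the end."""
    out = []
    for i, (word, tag) in enumerate(tagged):
        starts_clause = (
            i > 0
            and tag == "PRON"
            and i + 1 < len(tagged)
            and tagged[i + 1][1] in {"VERB", "AUX"}
        )
        if starts_clause:
            out[-1] += "."  # close the previous clause
        out.append(word)
    if out:
        out[-1] += "."  # always close the final clause
    return " ".join(out)


# Hand-tagged version of the sentence from the question.
tagged = [
    ("I", "PRON"), ("am", "AUX"), ("XYZ", "PROPN"),
    ("I", "PRON"), ("want", "VERB"), ("to", "PART"), ("execute", "VERB"),
    ("I", "PRON"), ("have", "VERB"), ("a", "DET"), ("doubt", "NOUN"),
]
print(restore_punctuation(tagged))
# I am XYZ. I want to execute. I have a doubt.
```

Note that this rule cannot tell a comma from a full stop, so it writes a full stop after "XYZ" where the question expects a comma. Making that distinction is where parse-tree features or a learned model become necessary.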
However, this is not a trivial task. It can require strong NLP/AI skills if you want to customise your algorithm.
Some examples that can be reused:
- Here is a simple approach using spaCy, mainly based on sentence boundaries.
- Here is a more complex solution, using the Theano deep learning library.
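If you go the machine learning route, one common way to frame the problem (an assumption on my part, not necessarily what the solutions above do) is to label each word with the punctuation mark that follows it, and to build training data by stripping the punctuation from text that is already punctuated. A minimal sketch, where the function name and the label names `COMMA`, `PERIOD` and `O` are illustrative choices:

```python
# Build (word, label) training pairs from punctuated text.
# Assumption: only commas and full stops are of interest, as in the question.

LABELS = {",": "COMMA", ".": "PERIOD"}


def to_labelled_tokens(text):
    """Label each word with the mark that follows it ("O" means none)."""
    pairs = []
    for raw in text.split():
        word = raw.rstrip(".,")        # the bare word the model would see
        mark = raw[len(word):][:1]     # first trailing mark, if any
        if word:
            pairs.append((word, LABELS.get(mark, "O")))
    return pairs


for word, label in to_labelled_tokens("I am XYZ, I want to execute. I have a doubt."):
    print(word, label)
```

A sequence classifier trained on such pairs then predicts one label per word of the unpunctuated input, and the labels are turned back into punctuation marks.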
Source: https://stackoverflow.com/questions/59679563/how-to-restore-punctuation-using-python