How to restore punctuation using Python? [closed]

半腔热情 submitted on 2021-02-04 21:57:26

Question


I would like to restore commas and full stops in text without punctuation. For example, let's take this sentence:

I am XYZ I want to execute I have a doubt

I would like to detect that this example needs one comma and two full stops:

I am XYZ, I want to execute. I have a doubt.

Can anyone advise me on how to achieve this using Python and NLP concepts?


Answer 1:


If I understand correctly, you want to improve the quality of a sentence by adding the appropriate punctuation. This task is sometimes called punctuation restoration.

A good first step is to apply the usual NLP pipeline, namely tokenization, part-of-speech (POS) tagging and parsing, using libraries such as NLTK or spaCy.
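
To make "features" concrete, here is a minimal, dependency-free sketch. It is an assumption-laden stand-in: a regex tokenizer replaces NLTK's or spaCy's tokenizers, and the feature names (`word`, `next_word`, `next_is_capitalised`, `is_last`) and the helper `gap_features` are hypothetical. Each feature dict describes the gap after one token, which is where a comma or full stop might belong; a real setup would add POS tags and parse information from the library.

```python
import re
from typing import Dict, List

# Matches runs of letters, digits and apostrophes; everything else acts as a separator.
TOKEN_RE = re.compile(r"[A-Za-z0-9']+")


def tokenize(text: str) -> List[str]:
    """Tiny regex tokenizer (a stand-in for NLTK's or spaCy's tokenizers)."""
    return TOKEN_RE.findall(text)


def gap_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Hypothetical features for the gap after tokens[i], where punctuation may go."""
    nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
    return {
        "word": tokens[i].lower(),
        "next_word": nxt.lower(),
        # A capitalised next word often signals a new sentence (or a proper noun).
        "next_is_capitalised": nxt[:1].isupper(),
        "is_last": i == len(tokens) - 1,
    }


tokens = tokenize("I am XYZ I want to execute I have a doubt")
print(gap_features(tokens, 2))
# {'word': 'xyz', 'next_word': 'i', 'next_is_capitalised': True, 'is_last': False}
```

Surface features like these are weak on their own (a capitalised word may just be a name), which is why POS tags and parse structure are worth adding.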

Once this preprocessing is done, you can apply a rule-based or a machine learning approach to decide where the punctuation should go, based on the features extracted by the pipeline (e.g. sentence boundaries, parse tree, POS tags).
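
One common way to frame the machine learning variant is token classification: each word gets the label of the punctuation mark that follows it, or `"O"` for none. The sketch below covers only the data preparation around such a model, assuming that only commas and full stops are restored. The helper names `to_labels` and `from_labels` are hypothetical, and the classifier itself (for example a CRF or a neural tagger trained on these pairs) is left out.

```python
from typing import List, Tuple


def to_labels(punctuated: str) -> List[Tuple[str, str]]:
    """Turn punctuated text into (token, label) training pairs.

    The label is the punctuation mark that follows the token ("," or "."),
    or "O" when no mark follows it.
    """
    pairs: List[Tuple[str, str]] = []
    for raw in punctuated.split():
        word = raw.rstrip(",.")      # strip trailing commas / full stops
        mark = raw[len(word):]       # whatever was stripped
        if word:
            pairs.append((word, mark[-1] if mark else "O"))
    return pairs


def from_labels(tokens: List[str], labels: List[str]) -> str:
    """Rebuild text from tokens and (predicted) labels: the inverse of to_labels."""
    return " ".join(tok + ("" if lab == "O" else lab) for tok, lab in zip(tokens, labels))


pairs = to_labels("I am XYZ, I want to execute. I have a doubt.")
tokens = [tok for tok, _ in pairs]
labels = [lab for _, lab in pairs]
print(labels)  # ['O', 'O', ',', 'O', 'O', 'O', '.', 'O', 'O', 'O', '.']
print(from_labels(tokens, labels))  # I am XYZ, I want to execute. I have a doubt.
```

At prediction time, the unpunctuated input is tokenized, the model predicts one label per token, and `from_labels` produces the restored sentence.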

However, this is not a trivial task. It can require strong NLP/AI skills if you want to customise the algorithm.

Some examples that can be reused:

  • A simple approach using spaCy, based mainly on sentence boundaries.
  • A more complex solution using the Theano deep learning library.
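
As a rough illustration of the sentence-boundary idea, here is a toy rule-based sketch in plain Python. It is not a reproduction of the spaCy approach mentioned above: instead of a trained model, it uses a hypothetical word-list rule (start a new sentence before every non-initial `"I"`). On the question's example it produces full stops only, not the comma after "XYZ", which shows how quickly hand-written rules run into their limits.

```python
from typing import Iterable


def restore_by_boundaries(text: str, starters: Iterable[str] = ("I",)) -> str:
    """Toy restorer: open a new sentence before every non-initial starter word.

    Hypothetical heuristic for illustration only; it never inserts commas.
    """
    starter_set = set(starters)
    tokens = text.split()
    if not tokens:
        return ""
    sentences = [[tokens[0]]]
    for tok in tokens[1:]:
        if tok in starter_set:
            sentences.append([tok])  # boundary detected: start a new sentence
        else:
            sentences[-1].append(tok)
    out = []
    for sent in sentences:
        # Capitalise the first word of each sentence and close it with a full stop.
        first = sent[0][:1].upper() + sent[0][1:]
        out.append(" ".join([first] + sent[1:]) + ".")
    return " ".join(out)


print(restore_by_boundaries("I am XYZ I want to execute I have a doubt"))
# I am XYZ. I want to execute. I have a doubt.
```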


Source: https://stackoverflow.com/questions/59679563/how-to-restore-punctuation-using-python
