Rails gem to break a paragraph into series of sentences

孤独总比滥情好 2021-01-06 03:39

I'm trying to split a paragraph into a series of sentences such that each sentence group stays under N characters. In case of a single sentence that is longer than N, it shou

1 Answer
  • 2021-01-06 03:48

    What you are after involves two non-trivial tasks:

    1. splitting a string into sentences
    2. word-wrapping each sentence with extra care for punctuation.

    I think the first one is hard to implement from scratch, so your best bet is probably a natural language processing library, provided that your "third-party language processing service" doesn't already offer such a feature. I don't know of any Rails gem that meets your requirement.
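
    To illustrate why a from-scratch splitter is fragile, here is a minimal sketch using only a regular expression. `naive_sentences` is a hypothetical helper name, not part of any gem, and the rule it applies (break after `.`, `!` or `?` followed by whitespace) is an assumption made for the sake of the example.

```ruby
# Naive sentence splitter: break wherever whitespace follows ., ! or ?.
# Illustration only; this is not how an NLP library splits sentences.
def naive_sentences(text)
  text.split(/(?<=[.!?])\s+/)
end

naive_sentences("Donec ut ligula. Sed et tristique sem.")
# => ["Donec ut ligula.", "Sed et tristique sem."]

# Abbreviations break the rule: "Dr." is treated as a sentence of its own.
naive_sentences("Dr. Smith arrived. He sat down.")
# => ["Dr.", "Smith arrived.", "He sat down."]
```

    Abbreviations, decimal numbers and quoted speech all trip up a rule like this, which is why a library is probably the safer choice.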

    Here is a toy example of splitting a string into sentences using stanford-core-nlp.

    require 'stanford-core-nlp'

    text = "Lorem ipsum, consectetur elit. Donec ut ligula. Sed acumsan posuere tristique. Sed et tristique sem. Aenean sollicitudin, sapien sodales elementum blandit. Fusce urna libero blandit eu aliquet ac rutrum vel tortor."

    # Load only the tokenizer and the sentence splitter.
    pipeline = StanfordCoreNLP.load(:tokenize, :ssplit)
    annotation = StanfordCoreNLP::Annotation.new(text)
    pipeline.annotate(annotation)

    # Map with to_s if you want an array of sentence strings.
    sentences = annotation.get(:sentences).to_a.map(&:to_s)
    # => ["Lorem ipsum, consectetur elit.", "Donec ut ligula.", "Sed acumsan posuere tristique.", "Sed et tristique sem.", "Aenean sollicitudin, sapien sodales elementum blandit.", "Fusce urna libero blandit eu aliquet ac rutrum vel tortor."]

    The second problem is similar to word-wrapping. If it were exactly a word-wrapping problem, it could be solved easily with an existing implementation such as ActionView::Helpers::TextHelper.word_wrap. However, there is an extra requirement concerning punctuation. I don't know of any existing implementation that does exactly what you want, so you may have to come up with your own solution.
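
    To show what plain word-wrapping does and why it is not enough here, below is a dependency-free sketch of a greedy word wrap. `wrap_words` is a hypothetical helper written for this illustration; it is not the Rails implementation of word_wrap, and it returns an array of lines rather than a newline-joined string.

```ruby
# Greedy word wrap: pack whitespace-separated words into lines of at most
# `width` characters. A word longer than `width` gets a line of its own.
def wrap_words(text, width)
  text.split.each_with_object([""]) do |word, lines|
    candidate = lines.last.empty? ? word : "#{lines.last} #{word}"
    if lines.last.empty? || candidate.length <= width
      lines[-1] = candidate   # the word still fits on the current line
    else
      lines << word           # start a new line with this word
    end
  end
end

wrap_words("Aenean sollicitudin, sapien sodales elementum blandit.", 30)
# => ["Aenean sollicitudin, sapien", "sodales elementum blandit."]
```

    Note that the break falls after "sapien" rather than after the comma: a plain word wrap only looks at the width, so punctuation has to be handled separately.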

    My only idea is to first word-wrap each sentence, then split each line at punctuation, and finally join the pieces again while respecting the length limit. I'm not sure whether this would work, though.
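
    A rough sketch of one variation on that idea follows. It changes the order slightly: it splits at punctuation first, cuts between words only when a piece is still too long, and then greedily rejoins the pieces. All method names (`pack`, `split_sentence`, `group_sentences`) are hypothetical, the choice of `,`, `;` and `:` as break points is an assumption, and the sentences are assumed to have been split already (for example by an NLP library).

```ruby
# Greedy packing: join consecutive parts with a space while the result
# stays within `limit`. A single part longer than `limit` is kept whole.
def pack(parts, limit)
  parts.each_with_object([]) do |part, chunks|
    if chunks.any? && chunks.last.length + 1 + part.length <= limit
      chunks[-1] = "#{chunks.last} #{part}"
    else
      chunks << part
    end
  end
end

# Split one sentence into chunks of at most `limit` characters, preferring
# breaks after punctuation and falling back to breaks between words.
def split_sentence(sentence, limit)
  # 1. Cut after commas, semicolons and colons (assumed break points).
  pieces = sentence.split(/(?<=[,;:])\s+/)
  # 2. Any piece that is still too long is cut between words.
  pieces = pieces.flat_map do |piece|
    piece.length <= limit ? [piece] : pack(piece.split, limit)
  end
  # 3. Rejoin neighbouring pieces while they fit.
  pack(pieces, limit)
end

# Group sentences so that each group stays within `limit` characters.
# Oversized sentences are split first; their fragments may end up sharing
# a group with a neighbouring sentence.
def group_sentences(sentences, limit)
  parts = sentences.flat_map do |s|
    s.length <= limit ? [s] : split_sentence(s, limit)
  end
  pack(parts, limit)
end

split_sentence("Aenean sollicitudin, sapien sodales elementum blandit.", 30)
# => ["Aenean sollicitudin,", "sapien sodales elementum", "blandit."]

group_sentences(["Donec ut ligula.", "Sed et tristique sem.", "Sed acumsan posuere tristique."], 40)
# => ["Donec ut ligula. Sed et tristique sem.", "Sed acumsan posuere tristique."]
```

    A single word longer than the limit is left intact, so the caller would still need to decide how to handle that case.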
