Parsing HTML into formatted plaintext using jsoup

小鲜肉 2021-01-13 23:19

I was working on a Maven project that parses HTML data from a website. I was able to parse it using the code below:

public void parseData(){
          


        
1 Answer
  •  孤街浪徒
    2021-01-13 23:42

    The output is not formatted because the formatting lives in the HTML itself, in its tags. Calling .text() on a block element loses that formatting.

    Jsoup has an example HTML to Plain Text converter which you can adapt to your needs by passing it the div element as the focus.
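
    As a rough illustration of that kind of adaptation, here is a minimal sketch that walks one element with a NodeVisitor and emits newlines for a few block tags. It is not the jsoup example itself. The class name DivToPlainText, the method plainText, and the choice of which tags produce line breaks are assumptions made for this illustration, so adjust them to your markup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.NodeVisitor;

// Hypothetical helper: converts one element's subtree to plain text with line breaks.
public class DivToPlainText {

    // Parses the HTML, picks the first element matching cssQuery, and formats it.
    static String plainText(String html, String cssQuery) {
        Element root = Jsoup.parse(html).select(cssQuery).first();
        if (root == null) {
            return ""; // nothing matched the selector
        }
        final StringBuilder out = new StringBuilder();
        root.traverse(new NodeVisitor() {
            @Override
            public void head(Node node, int depth) {
                String name = node.nodeName();
                if (node instanceof TextNode) {
                    out.append(((TextNode) node).text()); // whitespace-normalized text
                } else if (name.equals("li")) {
                    out.append(" * ");                    // bullet for list items
                } else if (name.equals("br")) {
                    out.append('\n');
                }
            }

            @Override
            public void tail(Node node, int depth) {
                String name = node.nodeName();
                // Assumed set of block tags that should end a line.
                if (name.equals("p") || name.equals("li") || name.matches("h[1-6]")) {
                    out.append('\n');
                }
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<div class=\"col-section\"><p>One</p><ul><li>Two</li></ul></div>";
        System.out.print(plainText(html, "div.col-section"));
    }
}
```

    Passing "div.col-section" as the query restricts the conversion to that div, which is what "the div element as the focus" amounts to.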

    Alternatively, you could just select "div.col-section > *", iterate through each Element, and print its text followed by a newline.
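
    A minimal sketch of this second approach, assuming the content sits in direct children of a div with class col-section. The class name ChildTextLines, the method toLines, and the sample HTML are made up for the illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper: one line of text per direct child of div.col-section.
public class ChildTextLines {

    static String toLines(String html) {
        Document doc = Jsoup.parse(html);
        StringBuilder out = new StringBuilder();
        // "div.col-section > *" matches every direct child element of the div.
        for (Element child : doc.select("div.col-section > *")) {
            String text = child.text();
            if (!text.isEmpty()) {
                out.append(text).append('\n'); // each child on its own line
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<div class=\"col-section\">"
                + "<h2>Title</h2><p>First paragraph.</p><p>Second <b>one</b>.</p></div>";
        System.out.print(toLines(html));
    }
}
```

    With the sample HTML above this prints Title, First paragraph. and Second one. on three separate lines. Note that it only breaks between direct children, so nested lists or line breaks inside a child are still flattened.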
