I was working on a Maven project that parses HTML data from a website. I was able to parse it using the code below:
public void parseData(){
The reason that it is not formatted is that the formatting is in the HTML, with <p> and <ol> tags etc. Calling .text() on a block element loses that formatting.
Jsoup has an example HTML to Plain Text converter which you can adapt to your needs by providing the div element as the focus.
Alternatively, you could just select "div.col-section > *", iterate through each Element, and print out its text followed by a newline.
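A minimal sketch of that second approach, assuming jsoup is on the classpath. The class name ChildTextPrinter, the helper childTexts, and the sample HTML are made up for illustration; only Jsoup.parse, select, and text() come from jsoup itself.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ChildTextPrinter {

    // Hypothetical helper: returns the text of each direct child of
    // div.col-section, with one child per line.
    static String childTexts(String html) {
        Document doc = Jsoup.parse(html);
        StringBuilder out = new StringBuilder();
        // "div.col-section > *" matches only the direct children of the div.
        for (Element child : doc.select("div.col-section > *")) {
            out.append(child.text()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample markup invented for this example; in the real project the
        // HTML would come from the fetched page.
        String html = "<div class=\"col-section\">"
                + "<p>First paragraph.</p>"
                + "<p>Second paragraph.</p>"
                + "</div>";
        System.out.print(childTexts(html));
    }
}
```

Each direct child ends up on its own line, but text() still flattens whatever is inside a child, so the items of an <ol> would share one line. If that matters, a selector that also reaches the <li> elements, or the plain-text converter mentioned above, may be a better fit.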