Parsing/extracting an HTML table from a website in Java

深忆病人 2021-02-10 13:55

I want to parse the contents of this HTML table:

"Blockquote"

Here is the f

1 answer
  •  生来不讨喜
    2021-02-10 14:14

    Here are the steps you would need to follow:

    1) You could use any of the following Java libraries for HTML scraping:

    • Tag Soup
    • HtmlUnit
    • Web-Harvest
    • jARVEST
    • jsoup
    • Jericho HTML Parser
    2) Use XPath Helper

    Eg 1: Enter "//tr[1]//td[1]" in the query box; it returns all table cells at position (1,1).

    Eg 2: "/html/body[@class='tt']/center/table[1]/tbody/tr[4]/td[3]/table/tbody/tr/td" returns all 15 values under Montag.

    Eg 3: "/html/body[@class='tt']/center/table[1]/tbody/tr/td/table/tbody/tr/td" returns all 380 entries of the table.
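
To try expressions like these from Java rather than in the browser, here is a minimal sketch that uses only the JDK's built-in `javax.xml.xpath` API. It assumes the markup is well-formed XML, which real-world HTML often is not (one reason to reach for the libraries listed in step 1). The `SAMPLE` string, the class name `XPathTableDemo` and the helper `select` are made up for illustration and are not the timetable page from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathTableDemo {
    // Hypothetical, well-formed sample markup (an assumption, not the real page).
    static final String SAMPLE =
        "<html><body><table>"
        + "<tr><td>Mon</td><td>Tue</td></tr>"
        + "<tr><td>Math</td><td>Physics</td></tr>"
        + "</table></body></html>";

    // Returns the text content of every node matched by the XPath expression.
    static List<String> select(String markup, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            texts.add(nodes.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        // First cell of the first row: prints [Mon]
        System.out.println(select(SAMPLE, "//tr[1]//td[1]"));
        // Every cell, in document order: prints [Mon, Tue, Math, Physics]
        System.out.println(select(SAMPLE, "//tr/td"));
    }
}
```

Note that browsers insert `tbody` elements when they build the DOM, so a path copied from a browser tool (as in Eg 2 and Eg 3) may need its `tbody` steps removed when run against the raw markup.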

    OR

    Example using jsoup:

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    import java.io.IOException;

    public class Main {
        public static void main(String[] args) throws IOException {
            // Fetch the page and parse it into a DOM
            Document doc = Jsoup.connect("http://www.kantschule-falkensee.de/uploads/dmiadgspahw/klassen/A_Klasse_11.htm").get();
            // Select every <tr> in the document, including rows of nested tables
            Elements rows = doc.select("tr");
            for (Element row : rows) {
                Elements columns = row.select("td");
                for (Element column : columns) {
                    // Tab-separate the cells so adjacent values do not run together
                    System.out.print(column.text() + "\t");
                }
                System.out.println();
            }
        }
    }
    
