Parsing/extracting an HTML table from a website in Java


深忆病人

I want to parse the contents of this HTML table:


Here is the f


1 answer

生来不讨喜

2021-02-10 14:14
Here are the steps you need to follow:

1) Pick a Java library for HTML scraping. Any of the following will work:
- TagSoup
- HtmlUnit
- Web-Harvest
- jARVEST
- jsoup
- Jericho HTML Parser

2) Use XPath Helper to work out and test the XPath queries for the cells you want.

Eg 1: Enter "//tr[1]//td[1]" as the query and it returns all table cells at position (1,1).

Eg 2: "/html/body[@class='tt']/center/table[1]/tbody/tr[4]/td[3]/table/tbody/tr/td" gives you all 15 values under Montag.

Eg 3: "/html/body[@class='tt']/center/table[1]/tbody/tr/td/table/tbody/tr/td" gives you all 380 entries of the table.
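To see how queries like these behave outside the browser, here is a minimal sketch that evaluates XPath with the JDK's built-in javax.xml.xpath API. The class name XPathTableDemo, the select helper, and the two-row SAMPLE markup are hypothetical stand-ins, not the real timetable page. It also assumes the input is well-formed XML, which real-world HTML often is not, so a live page would usually need to go through one of the tolerant parsers listed above first.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathTableDemo {

    // Hypothetical, well-formed stand-in for the real page (assumption).
    static final String SAMPLE = "<html><body><table>"
            + "<tr><td>Montag</td><td>Dienstag</td></tr>"
            + "<tr><td>Mathe</td><td>Physik</td></tr>"
            + "</table></body></html>";

    // Evaluates an XPath query and returns the text of every matching node.
    static List<String> select(String xml, String query) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(query, doc, XPathConstants.NODESET);

        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // First cell of the first row
        System.out.println(select(SAMPLE, "//tr[1]//td[1]"));
        // First cell of every row, i.e. the first column
        System.out.println(select(SAMPLE, "//tr/td[1]"));
    }
}
```

On the sample markup the first query matches only the "Montag" cell and the second matches the first column, "Montag" and "Mathe". Note that a browser inserts tbody elements into the DOM, which is why the queries in Eg 2 and Eg 3 contain tbody; a plain XML parser does not, so paths copied from the browser may need adjusting.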

Example using jsoup:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;

public class Main {
    public static void main(String[] args) throws IOException {
        // Fetch the page and parse it into a DOM
        Document doc = Jsoup.connect("http://www.kantschule-falkensee.de/uploads/dmiadgspahw/klassen/A_Klasse_11.htm").get();

        // Select every table row on the page, including rows of nested tables
        Elements rows = doc.select("tr");
        for (Element row : rows) {
            // Select every cell under this row
            Elements columns = row.select("td");
            for (Element column : columns) {
                // Separate cells with a tab so their text does not run together
                System.out.print(column.text() + "\t");
            }
            System.out.println();
        }
    }
}
