Extract Data out of table with JSoup

深忆病人 2021-01-20 03:05

I want to extract this table with the jsoup framework and save its content in a "table" array. The first tr tag is the table header. All following rows (not included) describe

1 answer
  • 2021-01-20 04:10

    Here's some example code showing how to select only the header:

    Element tableHeader = doc.select("tr").first();
    
    
    for( Element element : tableHeader.children() )
    {
        // Here you can do something with each element
        System.out.println(element.text());
    }
    

    You get the Document by ...

    1. parsing a file: Document doc = Jsoup.parse(f, null); (where f is the File and null is the charset name, which lets jsoup detect the charset itself; see the jsoup documentation for more information)

    2. parsing a website: Document doc = Jsoup.connect("http://your.url.here").get(); (don't forget the http:// prefix)

    The output:

    Kl.
    Std.
    Lehrer
    Fach
    Raum
    VLehrer
    VFach
    VRaum
    Info
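
For quick experiments without a file or a network connection, jsoup can also parse HTML straight from a string with Jsoup.parse(String). The sketch below is only an illustration: the class name HeaderDemo, the helper headerTexts and the three-column sample table are made up (the original HTML is not shown here); only the header labels are borrowed from the output above. Because it uses children(), it works whether the header cells are th or td.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class HeaderDemo {

    // Hypothetical sample markup; the real table has more columns and rows.
    static final String HTML =
          "<table>"
        + "<tr><th>Kl.</th><th>Std.</th><th>Lehrer</th></tr>"
        + "<tr><td>5a</td><td>4</td><td>Shne</td></tr>"
        + "</table>";

    // Returns the text of every cell in the first 'tr' of the document.
    static List<String> headerTexts(String html) {
        Document doc = Jsoup.parse(html);
        Element header = doc.select("tr").first();
        // first() returns null when the document contains no 'tr' at all.
        // Note: eachText() leaves out cells that have no text.
        return header == null ? List.of() : header.children().eachText();
    }

    public static void main(String[] args) {
        System.out.println(headerTexts(HTML));
    }
}
```

The null check matters in practice: if the page layout changes and no tr is found, the loop in the original snippet would fail with a NullPointerException.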
    

    Now, if you need an array (or better, a List) of all entries, you can create a new class that stores all the information for one entry. Then parse the HTML with jsoup, fill the fields of a new instance for each row, and add it to the list.

    // Note: all values are Strings here. Real code should use more specific types (int, enum, ...), but Strings are enough for an example.
    public class Entry
    {
        private String klasse;
        private String stunde;
        private String lehrer;
        private String fach;
        private String raum;
        private String vLehrer;
        private String vFach;
        private String vRaum;
        private String info;
    
    
        // constructor(s) and getter / setter
    
        /*
         * Btw. it's a good idea to provide two constructors here: one taking all arguments and one empty. The empty one lets you create an instance before any data is known and fill it in with the setter methods afterwards.
         */
    }
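
The elided constructors, accessors and toString() could look roughly like this. It is a sketch, not the answer author's actual class: the setter names follow the calls used elsewhere in this answer (setvRaum is assumed by analogy with setvLehrer and setvFach), the toString() format is inferred from the sample output, and the small main method exists only to make the example runnable.

```java
public class Entry {
    private String klasse;
    private String stunde;
    private String lehrer;
    private String fach;
    private String raum;
    private String vLehrer;
    private String vFach;
    private String vRaum;
    private String info;

    // Empty constructor: create the object first, fill it with setters later.
    public Entry() { }

    // Full constructor: use it when all values are known up front.
    public Entry(String klasse, String stunde, String lehrer, String fach, String raum,
                 String vLehrer, String vFach, String vRaum, String info) {
        this.klasse = klasse;
        this.stunde = stunde;
        this.lehrer = lehrer;
        this.fach = fach;
        this.raum = raum;
        this.vLehrer = vLehrer;
        this.vFach = vFach;
        this.vRaum = vRaum;
        this.info = info;
    }

    public String getKlasse() { return klasse; }
    public void setKlasse(String klasse) { this.klasse = klasse; }
    public String getStunde() { return stunde; }
    public void setStunde(String stunde) { this.stunde = stunde; }
    public String getLehrer() { return lehrer; }
    public void setLehrer(String lehrer) { this.lehrer = lehrer; }
    public String getFach() { return fach; }
    public void setFach(String fach) { this.fach = fach; }
    public String getRaum() { return raum; }
    public void setRaum(String raum) { this.raum = raum; }
    public String getvLehrer() { return vLehrer; }
    public void setvLehrer(String vLehrer) { this.vLehrer = vLehrer; }
    public String getvFach() { return vFach; }
    public void setvFach(String vFach) { this.vFach = vFach; }
    public String getvRaum() { return vRaum; }
    public void setvRaum(String vRaum) { this.vRaum = vRaum; }
    public String getInfo() { return info; }
    public void setInfo(String info) { this.info = info; }

    // Fields that were never set show up as "null" in this representation.
    @Override
    public String toString() {
        return "Entry{klasse=" + klasse + ", stunde=" + stunde + ", lehrer=" + lehrer
                + ", fach=" + fach + ", raum=" + raum + ", vLehrer=" + vLehrer
                + ", vFach=" + vFach + ", vRaum=" + vRaum + ", info=" + info + "}";
    }

    // Demo only; in a real project this would live in a separate class.
    public static void main(String[] args) {
        Entry entry = new Entry();
        entry.setStunde("4");
        entry.setLehrer("Shne");
        System.out.println(entry);
    }
}
```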
    

    Next, the code which fills your entries (including the list where they are stored):

    List<Entry> entries = new ArrayList<>();        // All entries are saved here
    boolean firstSkipped = false;                   // Used to skip first 'tr' tag
    
    
    for( Element element : doc.select("tr") )       // Select all 'tr' tags from document
    {
         // Skip the first 'tr' tag since it's the header
        if( !firstSkipped )
        {
            firstSkipped = true;
            continue;
        }
    
        int index = 0;                              // Instead of index you can use 0, 1, 2, ...
        Entry tableEntry = new Entry();
        Elements td = element.select("td");         // Select all 'td' tags of the 'tr'
    
        // Fill your entry
        tableEntry.setKlasse(td.get(index++).text());
        tableEntry.setStunde(td.get(index++).text());
        tableEntry.setLehrer(td.get(index++).text());
        tableEntry.setFach(td.get(index++).text());
        tableEntry.setRaum(td.get(index++).text());
        tableEntry.setvLehrer(td.get(index++).text());
        tableEntry.setvFach(td.get(index++).text());
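        // Note: the header has a VRaum column, but no setter is called for it here,
        // so vRaum stays null and info receives the text of the eighth cell.
        // If every row has nine cells, a call such as setvRaum(td.get(index++).text())
        // (assuming Entry has that setter) belongs before the next line.
        tableEntry.setInfo(td.get(index++).text());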
    
        entries.add(tableEntry);                    // Finally add it to the list
    }
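
If a plain two-dimensional array is closer to what the question asks for, the same loop can fill a String[][] instead of Entry objects. This is only a sketch: the class name TableToArray, the method bodyRows and the sample HTML are made up for illustration, and padding short rows with empty strings is an assumption about how missing cells should be handled.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class TableToArray {

    // Converts every 'tr' after the first one (the header) into one array row.
    static String[][] bodyRows(String html, int columns) {
        Document doc = Jsoup.parse(html);
        Elements rows = doc.select("tr");
        int bodyCount = Math.max(0, rows.size() - 1);
        String[][] table = new String[bodyCount][columns];

        for (int r = 1; r < rows.size(); r++) {      // start at 1 to skip the header
            Elements cells = rows.get(r).select("td");
            for (int c = 0; c < columns; c++) {
                // Pad with "" when a row has fewer cells than expected,
                // instead of failing with an IndexOutOfBoundsException.
                table[r - 1][c] = c < cells.size() ? cells.get(c).text() : "";
            }
        }
        return table;
    }

    public static void main(String[] args) {
        // Hypothetical markup; the second body row is deliberately one cell short.
        String html = "<table>"
                + "<tr><th>Kl.</th><th>Std.</th><th>Lehrer</th></tr>"
                + "<tr><td>5a</td><td>4</td><td>Shne</td></tr>"
                + "<tr><td>6b</td><td>2</td></tr>"
                + "</table>";
        for (String[] row : bodyRows(html, 3)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Starting the loop at index 1 replaces the firstSkipped flag, and the bounds check avoids the exception that td.get(index++) throws when a row has fewer cells than the header.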
    

    If you use the HTML from your question, you'll get this output:

    [Entry{klasse= , stunde=4, lehrer=Méta, fach=HU, raum= , vLehrer=Shne, vFach= , vRaum=null, info= }]
    

    Note: I simply used System.out.println(entries); for that, so the format of the output comes from the toString() method of Entry.


    Please see the jsoup documentation, especially the part about the jsoup selector API.
