I want to extract this table with the jsoup framework and save the content in a "table" array. The first tr tag is the table header. All following rows (not included here) describe …
Here's some example code showing how you can select only the header:
Element tableHeader = doc.select("tr").first(); // The first 'tr' tag is the header row
for( Element element : tableHeader.children() )
{
    // Here you can do something with each element
    System.out.println(element.text());
}
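For a self-contained version of the header snippet, here is a minimal sketch. It assumes jsoup is on the classpath; the class name HeaderExample and the sample HTML (a header row reusing the column names from the output, plus one invented data row) are hypothetical, since the original table is not reproduced here. It also guards against first() returning null when the document contains no tr tags.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class HeaderExample {

    // Hypothetical sample table: one header row and one invented data row.
    static final String SAMPLE_HTML =
        "<table>"
        + "<tr><th>Kl.</th><th>Std.</th><th>Lehrer</th><th>Fach</th><th>Raum</th>"
        + "<th>VLehrer</th><th>VFach</th><th>VRaum</th><th>Info</th></tr>"
        + "<tr><td>5a</td><td>4</td><td>Meta</td><td>HU</td><td>101</td>"
        + "<td>Shne</td><td>EN</td><td>102</td><td>moved</td></tr>"
        + "</table>";

    // Returns the text of every cell in the first <tr> of the document.
    static List<String> headerNames(String html) {
        Document doc = Jsoup.parse(html);
        Element headerRow = doc.select("tr").first();
        List<String> names = new ArrayList<>();
        if (headerRow == null) {
            return names; // no rows at all: return an empty list instead of failing
        }
        for (Element cell : headerRow.children()) {
            names.add(cell.text());
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints one header label per line.
        for (String name : headerNames(SAMPLE_HTML)) {
            System.out.println(name);
        }
    }
}
```

Because children() returns whatever elements sit inside the row, this works the same whether the header cells are th or td tags.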
You get the Document by either
- parsing a file: Document doc = Jsoup.parse(f, null);
  (where f is the File and null is the charset; with null, jsoup takes the charset from a meta tag in the file if present and otherwise falls back to UTF-8. See the jsoup documentation for more info)
- parsing a website: Document doc = Jsoup.connect("http://your.url.here").get();
  (don't forget the http://)
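Both ways of getting a Document can be wrapped in small helper methods. This is a sketch assuming jsoup is on the classpath; DocumentLoader, fromFile and fromUrl are hypothetical names, and the 10-second timeout is an arbitrary example value. Both jsoup calls can throw IOException, so the helpers pass it on to the caller.

```java
import java.io.File;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class DocumentLoader {

    // Parse a local file. A null charset lets jsoup read it from a <meta> tag,
    // falling back to UTF-8 if none is found.
    static Document fromFile(File f) throws IOException {
        return Jsoup.parse(f, null);
    }

    // Fetch and parse a web page. The URL must include the scheme (http:// or https://).
    // The timeout (in milliseconds) is an arbitrary example value.
    static Document fromUrl(String url) throws IOException {
        return Jsoup.connect(url).timeout(10_000).get();
    }
}
```

Usage would be, for example, Document doc = DocumentLoader.fromUrl("http://your.url.here"); inside a method that handles or declares IOException.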
The output:
Kl.
Std.
Lehrer
Fach
Raum
VLehrer
VFach
VRaum
Info
Now, if you need an array (or, better, a List) of all entries, you can create a new class that stores all the information of one entry. Next, parse the HTML with jsoup, fill all fields of a new instance for each row, and add it to the list.
// Note: all values are Strings; you may want better types (int, enum, whatever) here, but for an example it's enough.
public class Entry
{
    private String klasse;
    private String stunde;
    private String lehrer;
    private String fach;
    private String raum;
    private String vLehrer;
    private String vFach;
    private String vRaum;
    private String info;

    // constructor(s) and getter / setter
    /*
     * By the way, it's a good idea to provide two constructors here: one taking all arguments and an empty one. That way you can create a new instance without knowing any data and fill it with the setter methods afterwards.
     */
}
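The class above leaves out the constructors and accessors. Here is a minimal sketch of how it could be completed, following the two-constructor advice in the comment. The setter names match the ones used in the loop below (setKlasse, setvLehrer, ...); the getter names and the exact toString() layout are assumptions, chosen to resemble the sample output further down.

```java
// Hypothetical completion of Entry: two constructors, getters/setters and toString().
// All fields are still Strings, as in the original example.
class Entry {
    private String klasse;
    private String stunde;
    private String lehrer;
    private String fach;
    private String raum;
    private String vLehrer;
    private String vFach;
    private String vRaum;
    private String info;

    // Empty constructor: create the instance first, fill it via the setters later.
    public Entry() { }

    // Full constructor: all column values in table order.
    public Entry(String klasse, String stunde, String lehrer, String fach, String raum,
                 String vLehrer, String vFach, String vRaum, String info) {
        this.klasse = klasse;
        this.stunde = stunde;
        this.lehrer = lehrer;
        this.fach = fach;
        this.raum = raum;
        this.vLehrer = vLehrer;
        this.vFach = vFach;
        this.vRaum = vRaum;
        this.info = info;
    }

    public String getKlasse() { return klasse; }
    public void setKlasse(String klasse) { this.klasse = klasse; }
    public String getStunde() { return stunde; }
    public void setStunde(String stunde) { this.stunde = stunde; }
    public String getLehrer() { return lehrer; }
    public void setLehrer(String lehrer) { this.lehrer = lehrer; }
    public String getFach() { return fach; }
    public void setFach(String fach) { this.fach = fach; }
    public String getRaum() { return raum; }
    public void setRaum(String raum) { this.raum = raum; }
    public String getvLehrer() { return vLehrer; }
    public void setvLehrer(String vLehrer) { this.vLehrer = vLehrer; }
    public String getvFach() { return vFach; }
    public void setvFach(String vFach) { this.vFach = vFach; }
    public String getvRaum() { return vRaum; }
    public void setvRaum(String vRaum) { this.vRaum = vRaum; }
    public String getInfo() { return info; }
    public void setInfo(String info) { this.info = info; }

    // "Entry{field=value, ...}" layout; unset fields print as null.
    @Override
    public String toString() {
        return "Entry{klasse=" + klasse + ", stunde=" + stunde + ", lehrer=" + lehrer
                + ", fach=" + fach + ", raum=" + raum + ", vLehrer=" + vLehrer
                + ", vFach=" + vFach + ", vRaum=" + vRaum + ", info=" + info + "}";
    }
}
```

With the empty constructor you can write Entry e = new Entry(); e.setStunde("4"); and set the remaining fields as you read the cells; with the full constructor you pass all nine values at once.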
Next, the code which fills your entries (including the list where they are stored):
import java.util.ArrayList;
import java.util.List;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

List<Entry> entries = new ArrayList<>(); // All entries are saved here
boolean firstSkipped = false; // Used to skip the first 'tr' tag
for( Element element : doc.select("tr") ) // Select all 'tr' tags from the document
{
    // Skip the first 'tr' tag since it's the header
    if( !firstSkipped )
    {
        firstSkipped = true;
        continue;
    }
    int index = 0; // Instead of index you can use 0, 1, 2, ...
    Entry tableEntry = new Entry();
    Elements td = element.select("td"); // Select all 'td' tags of the 'tr'
    // Fill your entry
    tableEntry.setKlasse(td.get(index++).text());
    tableEntry.setStunde(td.get(index++).text());
    tableEntry.setLehrer(td.get(index++).text());
    tableEntry.setFach(td.get(index++).text());
    tableEntry.setRaum(td.get(index++).text());
    tableEntry.setvLehrer(td.get(index++).text());
    tableEntry.setvFach(td.get(index++).text());
    tableEntry.setInfo(td.get(index++).text());
    entries.add(tableEntry); // Finally add it to the list
}
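One thing to watch in the loop above: after setvFach it calls setInfo, so setvRaum is never called. The eighth cell (the VRaum column) therefore ends up in info and vRaum stays null, which is why vRaum=null appears in the sample output. Here is a sketch of a corrected version that fills all nine columns, skips the header by starting at row index 1 instead of using a flag, and skips rows with fewer than nine td cells rather than throwing an IndexOutOfBoundsException. To stay self-contained it uses a Java 16+ record as a compact stand-in for Entry; TableReader and readEntries are hypothetical names, and jsoup is assumed to be on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

class TableReader {

    // Compact stand-in for the Entry class (records need Java 16 or newer).
    record Entry(String klasse, String stunde, String lehrer, String fach, String raum,
                 String vLehrer, String vFach, String vRaum, String info) { }

    static final int COLUMNS = 9; // Kl., Std., Lehrer, Fach, Raum, VLehrer, VFach, VRaum, Info

    static List<Entry> readEntries(Document doc) {
        List<Entry> entries = new ArrayList<>();
        Elements rows = doc.select("tr");
        // Row 0 is the header, so start at 1.
        for (int r = 1; r < rows.size(); r++) {
            Elements td = rows.get(r).select("td");
            if (td.size() < COLUMNS) {
                continue; // short or malformed row: skip it instead of failing
            }
            entries.add(new Entry(
                td.get(0).text(), td.get(1).text(), td.get(2).text(),
                td.get(3).text(), td.get(4).text(), td.get(5).text(),
                td.get(6).text(),
                td.get(7).text(),   // VRaum: the column the loop above never stores
                td.get(8).text())); // Info
        }
        return entries;
    }
}
```

Reading the cells by fixed positions keeps the mapping between columns and fields visible in one place, which makes a skipped column like VRaum easier to spot than with a running index++.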
If you use the HTML from the first post, you'll get this output:
[Entry{klasse= , stunde=4, lehrer=Méta, fach=HU, raum= , vLehrer=Shne, vFach= , vRaum=null, info= }]
Note: I simply used System.out.println(entries); for that, so the format of the output comes from the toString() method of Entry.
Please see the jsoup documentation, especially the part on the jsoup selector API.
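The question asked for a "table" array. If you really need a plain String[][] instead of a list of entries, a minimal sketch could look like this (assuming jsoup is on the classpath; TableArray and toArray are hypothetical names). It reads the first table in the document, keeps the header row as row 0, and accepts both th and td cells, so rows may differ in length.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class TableArray {

    // Returns every row of the first <table> as a String[], header row first.
    // The result is a jagged array: each row has as many entries as it has cells.
    static String[][] toArray(String html) {
        Element table = Jsoup.parse(html).select("table").first();
        if (table == null) {
            return new String[0][]; // no table found
        }
        List<String[]> rows = new ArrayList<>();
        for (Element tr : table.select("tr")) {
            Elements cells = tr.select("th, td"); // header and data cells, in document order
            String[] row = new String[cells.size()];
            for (int i = 0; i < cells.size(); i++) {
                row[i] = cells.get(i).text();
            }
            rows.add(row);
        }
        return rows.toArray(new String[0][]);
    }
}
```

Collecting rows in a List first and converting at the end avoids having to know the number of rows in advance.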