Extracting Information from Websites
Question

Not every website exposes its data well through XML feeds, APIs, and so on. How could I go about extracting information from a website? For example:

    ...
    <div>
      <div>
        <span id="important-data">information here</span>
      </div>
    </div>
    ...

I come from a Java background and have worked with Apache XMLBeans. Is there anything similar for parsing HTML when I know the structure and the data sits inside a known tag?

Thanks

Answer 1:

There are several open source HTML parsers out there for Java. I have used