Question
I was wondering if there is a way to pull specific data from a website using Java (Eclipse), for example stock information from Yahoo Finance or Bloomberg. I've looked around and found some resources, but I haven't been able to get them to work; perhaps I'm missing something, or they're outdated. If possible, I also want to avoid downloading any external resources. I've read up on JSoup and will consider it more seriously if all else fails.
Thanks for the help.
Answer 1:
Yes, there are many different ways to pull data from websites.
There are essentially 2 alternatives no matter the programming language (Java, .NET, Perl...):
- the website has an API: in this case it will be a REST or SOAP API or perhaps a custom one (REST and SOAP probably account for the vast majority). Check out that website's API documentation if any. Also check out Programmable Web for references.
- the website doesn't have an API. You then need to do what is called screen scraping. Essentially, you send a series of HTTP GET or HTTP POST requests, as your browser would. The server replies with a response that contains HTML. From there, you need to parse the HTML to extract the information you need. This will require heavy-duty XPath (if the content is XML) or regular expressions (if the content is HTML or text).
Look at Apache HTTP Components to get you started.
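To make the screen-scraping route concrete, here is a minimal sketch. Since the question asks to avoid external downloads, it uses the JDK's built-in `java.net.http.HttpClient` (Java 11 or later) instead of Apache HTTP Components. The URL, the class name `QuoteScraper`, and the `<span id="price">` markup are all assumptions made up for illustration; a real page will need its own pattern, and a regular expression like this breaks as soon as the markup changes.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteScraper {

    // Hypothetical markup: assumes the page renders the price as
    // <span id="price">123.45</span>. Adjust the pattern to the real page.
    private static final Pattern PRICE = Pattern.compile(
            "<span\\s+id=\"price\"[^>]*>\\s*([0-9][0-9,]*(?:\\.[0-9]+)?)\\s*</span>");

    /** Sends an HTTP GET, as a browser would, and returns the response body as text. */
    public static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", "Mozilla/5.0") // some sites reject requests without one
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    /** Extracts the first price found in the HTML, or empty if the pattern does not match. */
    public static Optional<String> extractPrice(String html) {
        Matcher m = PRICE.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URL, not a real quote service.
        String url = args.length > 0 ? args[0] : "https://example.com/quote/ACME";
        String html = fetch(url);
        System.out.println(extractPrice(html).orElse("price not found"));
    }
}
```

Keeping `fetch` and `extractPrice` separate means the parsing step can be tried against a saved copy of the page before any requests are sent to the live site.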
If all you want is finance information, Google has a JSON/REST API for that, and there's a question on SO that will help you: How can I get stock quotes using Google Finance API?
Yahoo also has one, and there is already a question about it on SO: Yahoo Finance All Currencies quote API Documentation
Source: https://stackoverflow.com/questions/23906609/using-java-to-pull-data-from-web