Question
I want to extract specific data from a website, across its pages...
I don't want all the content of a given page; I only need a certain portion (say, the data inside a table or a content_div), and I want to do this repeatedly across all the pages of the site.
How can I do that?
Answer 1:
Use curl to retrieve the content and XPath to select the individual elements.
Be aware of copyright, though.
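This answer suggests curl plus XPath (a PHP-centric combination); as a rough sketch of the same idea, here is a Python version using the standard library's xml.etree.ElementTree, which supports a limited XPath subset. The sample HTML, the content_div id, and the table layout are assumptions for illustration; in practice you would first fetch the page over HTTP, and the markup must be well-formed for ElementTree to parse it.

```python
import xml.etree.ElementTree as ET

# Sample page (assumption: a real scraper would fetch this with curl
# or an HTTP client); ElementTree requires well-formed markup.
html = """
<html><body>
  <div id="content_div">
    <table>
      <tr><td>Alice</td><td>30</td></tr>
      <tr><td>Bob</td><td>25</td></tr>
    </table>
  </div>
</body></html>
"""

root = ET.fromstring(html)
# Select only the rows inside the div we care about, via a simple XPath.
rows = root.findall(".//div[@id='content_div']/table/tr")
data = [[td.text for td in row.findall("td")] for row in rows]
print(data)  # [['Alice', '30'], ['Bob', '25']]
```

For real-world (often malformed) HTML, a lenient parser such as lxml.html would be a better fit than ElementTree, but the selection step looks the same.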
Answer 2:
"Extracting content from other websites" is called screen scraping or web scraping.
Simple HTML DOM Parser is the easiest way (that I know of) to do it.
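Simple HTML DOM Parser is a PHP library; as a hedged stand-in, the same "walk the DOM and keep only the target element" idea can be sketched with Python's standard-library html.parser, which tolerates messy HTML. The div id "content_div" and the sample markup are assumptions for illustration.

```python
from html.parser import HTMLParser

class DivTextExtractor(HTMLParser):
    """Collects the text inside <div id="content_div">, ignoring the rest."""
    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0          # nesting depth inside the target div
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "div":
                self.depth += 1  # nested div inside the target
        elif tag == "div" and dict(attrs).get("id") == self.target_id:
            self.depth = 1       # entered the target div

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())

parser = DivTextExtractor("content_div")
parser.feed("<body><div id='content_div'><p>Hello</p> world</div><p>skip</p></body>")
print(parser.chunks)  # ['Hello', 'world']
```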
Answer 3:
You need a PHP crawler. The key is to use string manipulation functions such as strstr, strpos, and substr.
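The strpos/substr approach named above can be sketched in Python with str.find and slicing; the marker strings and sample page are assumptions for illustration.

```python
def extract_between(haystack, start_marker, end_marker):
    """Return the substring between two markers (a strpos/substr combo)."""
    start = haystack.find(start_marker)      # analogue of PHP strpos
    if start == -1:
        return None
    start += len(start_marker)
    end = haystack.find(end_marker, start)   # second strpos, offset search
    if end == -1:
        return None
    return haystack[start:end]               # analogue of PHP substr

page = '<div id="price">Price: <b>$19.99</b></div>'
print(extract_between(page, "<b>", "</b>"))  # $19.99
```

String matching like this is brittle against markup changes; it works best when the surrounding HTML is stable and the markers are unambiguous.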
Answer 4:
There are ways to do this. Just for fun, I created a Windows app that went through my account on a well-known social network, looked in the right places, and logged the information into an XML file. That information could then be imported elsewhere. However, this sort of application can be used for motives I don't agree with, so I never released it.
I would recommend using RSS feeds to extract content.
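The RSS suggestion can be sketched with the standard library alone; the feed below is a minimal inline RSS 2.0 sample (an assumption; a real feed would be fetched over HTTP).

```python
import xml.etree.ElementTree as ET

# Minimal RSS 2.0 sample standing in for a fetched feed.
rss = """<rss version="2.0"><channel>
  <title>Example Feed</title>
  <item><title>First post</title><link>https://example.com/1</link></item>
  <item><title>Second post</title><link>https://example.com/2</link></item>
</channel></rss>"""

channel = ET.fromstring(rss).find("channel")
items = [(i.findtext("title"), i.findtext("link"))
         for i in channel.findall("item")]
print(items)
```

The appeal of RSS is that the site publishes its content in a stable, structured format, so no fragile HTML scraping is needed.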
Answer 5:
I think you need to implement something like a spider. You can make an XMLHTTP request, get the content, and then parse it.
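The spider idea amounts to a fetch-parse-follow loop. Here is a minimal Python sketch; the in-memory PAGES dict is a hypothetical stand-in for real HTTP fetches (which would use something like urllib.request), and the regex link extraction is a deliberate simplification of proper HTML parsing.

```python
import re
from collections import deque

# Hypothetical in-memory "site" standing in for real HTTP responses.
PAGES = {
    "/":  '<a href="/a">A</a> <a href="/b">B</a>',
    "/a": '<a href="/b">B</a> data-A',
    "/b": 'data-B',
}

def fetch(url):
    # In a real spider this would be an HTTP request.
    return PAGES[url]

def crawl(start):
    """Breadth-first crawl: fetch each page once, follow every link found."""
    seen, queue, contents = set(), deque([start]), {}
    while queue:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        html = fetch(url)
        contents[url] = html          # extract only the portion you need here
        for link in re.findall(r'href="([^"]+)"', html):
            if link not in seen:
                queue.append(link)
    return contents

result = crawl("/")
print(sorted(result))  # ['/', '/a', '/b']
```

The `seen` set is what keeps the crawl from revisiting pages or looping forever on circular links.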
Source: https://stackoverflow.com/questions/2265463/how-to-extract-content-from-other-websites-automatically