How to extract content from other websites automatically?

Submitted by 纵然是瞬间 on 2020-01-06 06:36:36

Question


I want to extract specific data from a website, page by page.

I don't want to fetch the entire contents of each page; I need only a portion of it (say, the data inside a particular table or a content_div), and I want to do this repeatedly across all the pages of the site.

How can I do that?


Answer 1:


Use cURL to retrieve the content and XPath to select the individual elements.

Be aware of copyright, though.
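
For illustration, a minimal PHP sketch; the URL and the XPath query are placeholders to adapt to the target site:

    <?php
    // Fetch a page with cURL, then pick out one element with XPath.
    $ch = curl_init('https://example.com/page1.html');   // placeholder URL
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);      // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);      // follow redirects
    $html = curl_exec($ch);
    curl_close($ch);

    $doc = new DOMDocument();
    @$doc->loadHTML($html);              // @ suppresses warnings on sloppy real-world HTML
    $xpath = new DOMXPath($doc);

    // Select only the element you care about, e.g. <div id="content_div">...</div>
    foreach ($xpath->query('//div[@id="content_div"]') as $node) {
        echo $doc->saveHTML($node);
    }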




Answer 2:


"extracting content from other websites" is called screen scraping or web scraping.

simple html dom parser is the easiest way(I know) of doing it.
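
A minimal sketch, assuming you have downloaded simple_html_dom.php from the project site and that the data sits inside a div with id content_div (both the URL and the selector are placeholders):

    <?php
    include 'simple_html_dom.php';

    // file_get_html() fetches the page and returns a parsed DOM.
    $html = file_get_html('https://example.com/page1.html');

    // jQuery-like selectors: grab every row of the table inside the target div.
    foreach ($html->find('div#content_div table tr') as $row) {
        echo $row->plaintext . "\n";
    }

    $html->clear();  // free memory; the parser holds circular references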




Answer 3:


You need a PHP crawler. The key is to use string manipulation functions such as strstr, strpos, and substr.
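
A minimal sketch of that approach; the two marker strings are placeholders for whatever delimits the data on the real page:

    <?php
    // Return the text between two known markers, or null if either is missing.
    function extract_between($haystack, $startMarker, $endMarker) {
        $start = strpos($haystack, $startMarker);
        if ($start === false) {
            return null;
        }
        $start += strlen($startMarker);                 // skip past the marker itself
        $end = strpos($haystack, $endMarker, $start);
        if ($end === false) {
            return null;
        }
        return substr($haystack, $start, $end - $start);
    }

    $page = file_get_contents('https://example.com/page1.html');  // placeholder URL
    echo extract_between($page, '<div id="content_div">', '</div>');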




Answer 4:


There are ways to do this. Just for fun, I created a Windows app that went through my account on a well-known social network, looked in the right places, and logged the information to an XML file. That information could then be imported elsewhere. However, this sort of application can be used for motives I don't agree with, so I never released it.

I would recommend using RSS feeds to extract content.
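
Where the site publishes a feed, a minimal sketch with PHP's SimpleXML (the feed URL is a placeholder):

    <?php
    // Load the RSS feed and walk its standard channel/item structure.
    $feed = simplexml_load_file('https://example.com/feed.rss');

    foreach ($feed->channel->item as $item) {
        echo $item->title . "\n";
        echo $item->link . "\n";
        echo $item->description . "\n\n";
    }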




Answer 5:


I think you need to implement something like a spider: make an HTTP request, get the content, and then parse it.
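
A minimal spider sketch in PHP; the start URL, the site prefix, and the page cap are placeholder choices:

    <?php
    $queue   = ['https://example.com/'];   // pages still to visit
    $visited = [];                         // pages already fetched

    while ($queue && count($visited) < 50) {   // hard cap so the sketch terminates
        $url = array_shift($queue);
        if (isset($visited[$url])) {
            continue;
        }
        $visited[$url] = true;

        $html = @file_get_contents($url);
        if ($html === false) {
            continue;
        }

        // ... parse $html here (XPath, string functions) to pull out your data ...

        // Harvest links and stay on the same site.
        $doc = new DOMDocument();
        @$doc->loadHTML($html);
        foreach ($doc->getElementsByTagName('a') as $a) {
            $href = $a->getAttribute('href');
            if (strpos($href, 'https://example.com/') === 0) {
                $queue[] = $href;
            }
        }
    }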



Source: https://stackoverflow.com/questions/2265463/how-to-extract-content-from-other-websites-automatically
