How do I get text from a website using PHP?

北海茫月 2020-12-31 11:33

I'm working on a PHP script, and part of it needs to query a website and then get text from it.

First off, I need to be able to query a certain website

7 Answers
  • 2020-12-31 11:41

    You can use file_get_contents or, if you need a little more control (e.g. to submit POST requests or to set the user agent string), you may want to look at cURL.

    file_get_contents Example:

    $content = file_get_contents('http://www.example.org');
    

    Basic cURL Example:

    $ch = curl_init('http://www.example.org');
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7) AppleWebKit/534.48.3 (KHTML, like Gecko) Version/5.1 Safari/534.48.3');
    // Return the response as a string instead of printing it directly.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    
    $content = curl_exec($ch);
    
    curl_close($ch);
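
    Since cURL is suggested for cases that need more control, here is a minimal sketch of a fetch with basic error handling. The function name fetch_with_curl, the timeout default, and the user agent string are placeholders invented for illustration. The sketch relies on curl_exec returning false on failure and, unless CURLOPT_RETURNTRANSFER is set, printing the response rather than returning it.

```php
<?php
// Hypothetical helper: fetch a URL with cURL and return the body,
// or null if the request fails or the server answers with an error status.
function fetch_with_curl(string $url, int $timeoutSeconds = 10): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,             // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,             // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,  // give up if connecting takes too long
        CURLOPT_TIMEOUT        => $timeoutSeconds,  // give up if the whole transfer takes too long
        CURLOPT_USERAGENT      => 'ExampleFetcher/1.0', // placeholder value
    ]);

    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    // curl_exec returns false on transport errors; 4xx/5xx are treated as failures too.
    if ($body === false || $status >= 400) {
        return null;
    }
    return $body;
}
```

    A caller would then check for null before parsing, for example: $content = fetch_with_curl('http://www.example.org');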
    
  • 2020-12-31 11:46

    The easiest way:

    file_get_contents()

    That will get you the source of the web page.

    You probably want something a bit more complete though, so look into cURL for better error handling, setting the user agent, and so on.

    From there, if you want the text only, you are going to have to parse the page. For that, see: How do you parse and process HTML/XML in PHP?
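
    As a rough sketch of that parsing step, one option is PHP's DOM extension: load the HTML, drop the script and style elements, and read what is left as text. The helper name html_to_text is made up for this example, and the sketch assumes the input is ASCII or otherwise in the encoding DOMDocument expects.

```php
<?php
// Hypothetical helper: return the visible text of an HTML string.
function html_to_text(string $html): string
{
    if (trim($html) === '') {
        return '';
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $doc->documentElement;
    if ($root === null) {
        return '';
    }

    // Remove elements whose content is code rather than readable text.
    $xpath = new DOMXPath($doc);
    foreach (iterator_to_array($xpath->query('//script | //style')) as $node) {
        $node->parentNode->removeChild($node);
    }

    // Collapse the runs of whitespace left behind by the markup.
    return trim(preg_replace('/\s+/', ' ', $root->textContent) ?? '');
}

echo html_to_text('<html><style> h1 { font-style: italic; }</style><h1>stuff in here</h1></html>');
```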

  • 2020-12-31 11:47

    Can this be done by fetching the whole page with the methods already listed above, and then using regex to remove all characters between opening and closing angle brackets?

    A page that looks like this:

    <html><style> h1 { font-style:... }</style><h1>stuff in here</h1></html>
    

    Would then become this after regex:

    h1 { font-style:... }stuff in here
    

    And because we want to remove all of the code in between various tags such as the <style> tag, we could first use regex to remove all characters between <style> and </style>, so that we are just left with:

    stuff in here
    

    Would this work then? Please reply if you think it would or if you foresee errors as I would like to create a tool with this parsing.
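
    The two passes described above could be sketched like this. The function name strip_markup is hypothetical, and the patterns are only an illustration: they handle the sample page, but a ">" inside an attribute value, comments, or malformed markup can break a regex-based approach, which is why a DOM parser is usually the safer choice.

```php
<?php
// Hypothetical helper implementing the two regex passes described above.
function strip_markup(string $html): string
{
    // Pass 1: remove <style>...</style> and <script>...</script> together with their contents.
    $html = preg_replace('#<(style|script)\b[^>]*>.*?</\1\s*>#is', '', $html) ?? '';

    // Pass 2: remove every remaining tag, i.e. everything from "<" to the next ">".
    $text = preg_replace('/<[^>]*>/', '', $html) ?? '';

    return trim($text);
}

echo strip_markup('<html><style> h1 { font-style:... }</style><h1>stuff in here</h1></html>');
```

    PHP's built-in strip_tags() performs roughly the second pass on its own, but it keeps the text inside style and script elements, so the first pass is still needed.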

  • 2020-12-31 11:48

    I would do a DOM search; take a look at http://www.php.net/manual/es/domdocument.load.php. DOMXPath might be very useful too: http://php.net/manual/en/class.domxpath.php

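    $doc = new DOMDocument;
    // loadHTMLFile() tolerates real-world HTML; load() expects well-formed XML.
    $doc->loadHTMLFile("http://mysite.com");
    $xpath = new DOMXpath($doc);
    // "//div" matches the div at any depth in the document.
    $elements = $xpath->query("//div[@id='yourTagIdHere']");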
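
    To actually get text out of the query result, each matched node exposes a textContent property. The following sketch runs on an inline HTML string so it does not depend on any site; the markup and the id value are invented for the example.

```php
<?php
// Sample markup, made up for illustration.
$html = '<html><body>'
      . '<div id="yourTagIdHere">Hello <b>world</b></div>'
      . '<div id="other">skip me</div>'
      . '</body></html>';

$doc = new DOMDocument();
// Collect parser warnings quietly; real pages often contain invalid HTML.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Gather the text of every div with the wanted id, nested tags included.
$texts = [];
foreach ($xpath->query("//div[@id='yourTagIdHere']") as $element) {
    $texts[] = trim($element->textContent);
}

print_r($texts);
```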
    
  • 2020-12-31 11:50

    You need to use cURL. You can get some samples here

  • 2020-12-31 11:51

    If you want more control, use cURL. Otherwise, use file_get_contents:

    $url  = "http://www.example.com/test.php";  // Site URL.
    $site = file_get_contents($url);             // Gets site response.
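
    file_get_contents returns false when the request fails, so it is worth checking the result before using it. Below is a minimal sketch that also sets a timeout and a user agent through a stream context; the function name fetch_page and the option values are assumptions chosen for the example.

```php
<?php
// Hypothetical helper: wrap file_get_contents with a timeout and a failure check.
function fetch_page(string $url, int $timeoutSeconds = 10): ?string
{
    // HTTP context options apply to http:// and https:// URLs.
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeoutSeconds,
            'user_agent' => 'ExampleFetcher/1.0', // placeholder value
        ],
    ]);

    // @ hides the warning emitted on failure; the false return value is checked instead.
    $body = @file_get_contents($url, false, $context);

    return $body === false ? null : $body;
}
```

    Usage would look like $site = fetch_page($url); followed by a null check before any parsing.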
    