Best method for bulk downloading images from a website

暗喜 2021-01-29 01:26

I need to download a large number of images (more than 20,000) from a website to my server, and I'm trying to figure out the best way to do this given how many images there are.

Cu

2 answers
  • 2021-01-29 01:47

    First off, I agree with @Rudy Palacois here: wget would probably be better. That said, if you want to do it in PHP, curl would be much faster than file_get_contents, for two reasons.

    1: Unlike file_get_contents, curl can reuse the same connection to download multiple files. file_get_contents opens and closes a new connection for each download, which takes time, so curl will be faster (as long as you're not using CURLOPT_FORBID_REUSE or CURLOPT_FRESH_CONNECT, anyway).

    2: curl stops the download once the number of bytes given in the Content-Length HTTP header has been received. file_get_contents ignores this header completely and keeps reading until the connection is closed. That can be much slower than curl's approach, because the web server decides when the connection closes, and on some servers that takes A LOT longer than reading Content-Length bytes.

    (Generally, curl is also faster than file_get_contents because curl supports compressed transfers, gzip and deflate, which file_get_contents does not. That is rarely relevant for images, though, since most common image formats are already compressed. .bmp images are a notable exception.)

    like this:

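    $ch = curl_init();
    // An empty string enables every encoding curl supports (gzip, deflate, ...).
    // Only useful for files that benefit from compression, such as .bmp images.
    curl_setopt($ch, CURLOPT_ENCODING, '');
    $path = "images/";
    foreach ($products as $product) {
        $url = $product->img;
        $imgPath = $path . $product->product_id . ".png";

        // CURLOPT_FILE expects an open file handle, not a file path.
        $fp = fopen($imgPath, 'wb');
        if ($fp === false) {
            continue; // could not open the target file; skip this image
        }
        curl_setopt_array($ch, array(
            CURLOPT_URL => $url,
            CURLOPT_FILE => $fp,
        ));
        // Reusing the same $ch lets curl keep the connection open between downloads.
        if (curl_exec($ch) === false) {
            error_log("Download failed for $url: " . curl_error($ch));
        }
        fclose($fp);
        // file_put_contents($imgPath, file_get_contents($url));
    }
    curl_close($ch);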
    

    Edit: fixed a code-breaking typo; the option is called CURLOPT_FILE, not CURLOPT_OUTFILE.

    Edit 2: CURLOPT_FILE wants a file resource, not a file path; fixed that too.

  • 2021-01-29 02:00

    If you have shell access, you could use wget. The main problem with PHP, if you are running this code from a browser, is the execution time: the script may be stopped after a few minutes, or the page may load forever and get stuck. But if you have a complete URL and a pattern, as it appears you do, you can create a file with the URLs, one per line (list.txt, for example), and then run:

    wget -i list.txt
    

    See this answer too: https://stackoverflow.com/a/14578517/5415074
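
The list.txt step described above can be sketched in shell. This is only a minimal sketch, assuming the image URLs follow a simple numeric pattern; the function name make_url_list, the base URL https://example.com/images, and the .png suffix are hypothetical placeholders and are not taken from the original site.

```shell
#!/bin/sh
# Print one URL per line for every id in the range [first, last].
# Assumed (hypothetical) URL pattern: <base_url>/<id>.png
make_url_list() {
    first=$1
    last=$2
    base_url=$3
    i=$first
    while [ "$i" -le "$last" ]; do
        printf '%s/%s.png\n' "$base_url" "$i"
        i=$((i + 1))
    done
}
```

To use it, something like `make_url_list 1 20000 "https://example.com/images" > list.txt` followed by `wget -i list.txt -P images/ -nc --tries=3 --wait=1` should work: -P sets the target directory, -nc skips files that already exist (so an interrupted run can be restarted), --tries limits retries per file, and --wait pauses between requests to go easier on the server. For a long run over SSH, prefixing the wget command with nohup keeps it going after you log out.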
