Best method for bulk downloading images from website

暗喜 2021-01-29 01:26

I need to download a lot of images (20,000+) from a website to my server, and I'm trying to figure out the best way to do this since there are so many images to download.

Cu

2 Answers
  •  广开言路
    2021-01-29 01:47

    First off, I agree with @Rudy Palacois here: wget would probably be better. That said, if you want to do it in PHP, curl would be much faster than file_get_contents, for 2 reasons.

    1: Unlike file_get_contents, curl can reuse the same connection to download multiple files from the same host. With file_get_contents, a new connection is opened and closed for each download, and that takes time, so curl will be faster (as long as you're not using CURLOPT_FORBID_REUSE / CURLOPT_FRESH_CONNECT, anyway).

    2: curl stops the download once the number of bytes given in the Content-Length HTTP header has been received, but file_get_contents completely ignores this header and keeps reading until the connection is closed. That can again be much slower than curl's approach, because it's up to the web server when the connection closes; on some servers that takes A LOT longer than reading Content-Length bytes.

    (And generally, curl is faster than file_get_contents because curl supports compressed transfers, gzip and deflate, which file_get_contents does not do. But that's generally not applicable to images, since most common image formats are already compressed. Notable exceptions include .bmp images, though.)

    like this:

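    $ch = curl_init();
    // If you're downloading files that benefit from compression (like .bmp images),
    // this line enables compressed transfers. '' offers every encoding curl supports.
    curl_setopt($ch, CURLOPT_ENCODING, '');
    $path = "images/";
    foreach ($products as $product) {
        $url = $product->img;
        $imgName = $product->product_id;

        $imgPath = $path . $imgName . ".png";
        $fp = fopen($imgPath, 'wb');
        if ($fp === false) {
            continue; // could not open the target file, skip this image
        }
        curl_setopt_array($ch, array(
            CURLOPT_URL => $url,
            CURLOPT_FILE => $fp, // must be a file resource, not a file path
        ));
        if (curl_exec($ch) === false) {
            error_log("Download failed for $url: " . curl_error($ch));
        }
        fclose($fp);
        // the file_get_contents equivalent would be:
        // file_put_contents($imgPath, file_get_contents($url));
    }
    curl_close($ch);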
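
    With 20,000+ files, some downloads will probably fail, so it may help to collect the failures and retry them later. The following is only a rough sketch of that idea built on the same reused-handle loop. The function name downloadAll, the $jobs map (file name => URL) and the timeout values are assumptions made for illustration; they are not part of the answer or of any library.

    ```php
    <?php
    // Hypothetical helper: downloads every URL in $jobs (file name => URL)
    // into $dir over one reused curl handle and returns the URLs that failed.
    function downloadAll(array $jobs, string $dir): array
    {
        $failed = [];
        $ch = curl_init();
        curl_setopt_array($ch, [
            CURLOPT_FAILONERROR    => true, // treat HTTP status >= 400 as a failure
            CURLOPT_FOLLOWLOCATION => true, // follow redirects
            CURLOPT_CONNECTTIMEOUT => 10,   // seconds; assumed value, tune as needed
            CURLOPT_TIMEOUT        => 60,   // seconds per file; assumed value
        ]);
        foreach ($jobs as $name => $url) {
            // basename() keeps a odd file name from escaping $dir
            $target = rtrim($dir, '/') . '/' . basename((string) $name);
            $fp = fopen($target, 'wb');
            if ($fp === false) {
                $failed[] = $url;
                continue;
            }
            curl_setopt($ch, CURLOPT_URL, $url);
            curl_setopt($ch, CURLOPT_FILE, $fp);
            $ok = curl_exec($ch);
            fclose($fp);
            if ($ok === false) {
                unlink($target); // do not leave a truncated or empty file behind
                $failed[] = $url;
            }
        }
        // $ch goes out of scope here, which releases the handle and its connections
        return $failed;
    }
    ```

    To use it with the data from the question, you would build $jobs from $products (for example $jobs[$product->product_id . '.png'] = $product->img), call downloadAll($jobs, 'images'), and pass whatever comes back to a second run.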
    

    edit: fixed a code-breaking typo, it's called CURLOPT_FILE, not CURLOPT_OUTFILE

    edit 2: CURLOPT_FILE wants a file resource, not a filepath, fixed that too x.x
