file_exists() is too slow in PHP. Can anyone suggest a faster alternative?

迷失自我 2021-02-05 01:18

When displaying images on our website, we check if the file exists with a call to file_exists(). We fall back to a dummy image if the file was missing.

Howe

19 Answers
  • 2021-02-05 01:47

    The fastest way to check for the existence of a local file is stream_resolve_include_path():

    if (false !== stream_resolve_include_path($s3url)) { 
      //do stuff 
    }
    

    Performance results stream_resolve_include_path() vs file_exists():

    Test name       Repeats         Result          Performance     
    stream_resolve  10000           0.051710 sec    +0.00%
    file_exists     10000           0.067452 sec    -30.44%
    

    The test used absolute paths. Test source is here. PHP version:

    PHP 5.4.23-1~dotdeb.1 (cli) (built: Dec 13 2013 21:53:21)
    Copyright (c) 1997-2013 The PHP Group
    Zend Engine v2.4.0, Copyright (c) 1998-2013 Zend Technologies
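
    The linked test source is not reproduced in this thread, so here is a minimal sketch of what such a micro-benchmark could look like. The bench() helper and the temp-file setup are assumptions made for illustration, not the original test script, and the scalar type hints assume PHP 7 or later. Timings will differ by machine and PHP version.

```php
<?php
// Hypothetical helper: time $repeats calls of $check on $path, in seconds.
function bench(callable $check, string $path, int $repeats = 10000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $repeats; $i++) {
        $check($path);
    }
    return microtime(true) - $start;
}

// Use an absolute path to an existing file, as in the test above.
$path = tempnam(sys_get_temp_dir(), 'bench');

$results = [
    'stream_resolve' => bench('stream_resolve_include_path', $path),
    'file_exists'    => bench('file_exists', $path),
];

foreach ($results as $name => $seconds) {
    printf("%-16s %d repeats  %.6f sec\n", $name, 10000, $seconds);
}

unlink($path); // remove the temporary file
```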

  • 2021-02-05 01:49

    When you save a file to a folder, if the upload was successful, you can store the path in a database table.

    Then you will just have to make a query to the database in order to find the path of the requested file.
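
    A rough sketch of this idea, assuming a hypothetical "images" table and using an in-memory SQLite database through PDO as a stand-in for the real one (this needs the pdo_sqlite extension). The function names and paths are made up for illustration.

```php
<?php
// Assumption: an "images" table maps an image id to its stored path.
// An in-memory SQLite database stands in for the real one.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE images (id INTEGER PRIMARY KEY, path TEXT NOT NULL)');

// Called once, after a successful upload.
function remember_upload(PDO $db, int $id, string $path): void
{
    $stmt = $db->prepare('INSERT INTO images (id, path) VALUES (?, ?)');
    $stmt->execute([$id, $path]);
}

// Called when rendering: returns the stored path or the dummy image.
function image_path(PDO $db, int $id, string $dummy): string
{
    $stmt = $db->prepare('SELECT path FROM images WHERE id = ?');
    $stmt->execute([$id]);
    $path = $stmt->fetchColumn(); // false when no row matches
    return $path === false ? $dummy : $path;
}

remember_upload($db, 1, '/uploads/cat.jpg');
echo image_path($db, 1, '/img/dummy.png'), "\n"; // stored path
echo image_path($db, 2, '/img/dummy.png'), "\n"; // falls back to the dummy
```

    The page then never touches the filesystem to decide which image to show; the trade-off is that the table has to be kept in sync when files are removed.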

  • 2021-02-05 01:50

    I think the best way is to keep the image URL in the database and then put it in a session variable, especially when you have authentication. This way you don't have to check each time a page reloads.
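
    Something along these lines, as a sketch only. The cached_image_url() helper and the 'image_urls' key are hypothetical, and a plain array stands in for $_SESSION so the snippet runs outside a web request.

```php
<?php
// Hypothetical helper: look up an image URL once and keep it in $store
// (pass $_SESSION after session_start() in a real request).
function cached_image_url(array &$store, string $key, callable $lookup): string
{
    if (!isset($store['image_urls'][$key])) {
        $store['image_urls'][$key] = $lookup($key);
    }
    return $store['image_urls'][$key];
}

$store = [];   // stands in for $_SESSION in this sketch
$calls = 0;
$lookup = function (string $key) use (&$calls): string {
    $calls++;                       // pretend this is the database query
    return "/uploads/$key.jpg";
};

cached_image_url($store, 'avatar', $lookup);
cached_image_url($store, 'avatar', $lookup);
echo $calls, "\n"; // the lookup ran only once
```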

  • 2021-02-05 01:51

    This is an old question, but I'm going to add an answer here. On PHP 5.3.8, is_file() is an order of magnitude faster than file_exists() for an existing file. For a non-existent file, the times are nearly identical. On PHP 5.1 with eaccelerator, they are a little closer.

    PHP 5.3.8 w & w/o APC

    time ratio (1000 iterations)
    Array
    (
        [3."is_file('exists')"] => 1.00x    (0.002305269241333)
        [5."is_link('exists')"] => 1.21x    (0.0027914047241211)
        [7."stream_resolve_inclu"(exists)] => 2.79x (0.0064241886138916)
        [1."file_exists('exists')"] => 13.35x   (0.030781030654907)
        [8."stream_resolve_inclu"(nonexists)] => 14.19x (0.032708406448364)
        [4."is_file('nonexists')"] => 14.23x (0.032796382904053)
        [6."is_link('nonexists')"] => 14.33x (0.033039808273315)
        [2."file_exists('nonexists')"] => 14.77x (0.034039735794067)
    )
    

    PHP 5.1 w/ eaccelerator

    time ratio (1000x)
    Array
    (
        [3."is_file('exists')"] => 1.00x    (0.000458002090454)
        [5."is_link('exists')"] => 1.22x    (0.000559568405151)
        [6."is_link('nonexists')"] => 3.27x (0.00149989128113)
        [4."is_file('nonexists')"] => 3.36x (0.00153875350952)
        [2."file_exists('nonexists')"] => 3.92x (0.00179600715637)
        [1."file_exists('exists')"] => 4.22x  (0.00193166732788)
    )
    

    There are a couple of caveats.
    1) Not all "files" are regular files. is_file() tests for regular files, not symlinks, so on a *nix system you can't get away with just is_file() unless you are sure that you are only dealing with regular files. For uploads and the like this may be a fair assumption, as it is on a Windows-based server, which does not actually have symlinks. Otherwise, you'll have to test is_file($file) || is_link($file).

    2) If the file is missing, performance definitely degrades for all methods and becomes roughly equal.

    3) The biggest caveat: all the methods cache the file statistics to speed up lookups, so if the file changes regularly or quickly (it is deleted, reappears, is deleted again), then clearstatcache() has to be run to ensure that the correct file existence information is in the cache. So I tested those, leaving out the filenames and such. The important thing is that almost all the times converge, except stream_resolve_include_path() for an existing file, which is about 4x as fast. Again, this server has eaccelerator on it, so YMMV.

    time ratio (1000x)
    Array
    (
        [7."stream_resolve_inclu...;clearstatcache();"] => 1.00x    (0.0066831111907959)
        [1."file_exists(...........;clearstatcache();"] => 4.39x    (0.029333114624023)
        [3."is_file(................;clearstatcache();"] => 4.55x    (0.030423402786255)
        [5."is_link(................;clearstatcache();"] => 4.61x    (0.030798196792603)
        [4."is_file(................;clearstatcache();"] => 4.89x    (0.032709360122681)
        [8."stream_resolve_inclu...;clearstatcache();"] => 4.90x    (0.032740354537964)
        [2."file_exists(...........;clearstatcache();"] => 4.92x    (0.032855272293091)
        [6."is_link(...............;clearstatcache();"] => 5.11x    (0.034154653549194)
    )
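
    The pattern behind caveat 3 can be sketched as a small helper that clears the cached data for one path before checking it. fresh_is_file() is a hypothetical name, not a PHP built-in; clearstatcache() with a filename argument is available from PHP 5.3.

```php
<?php
// Hypothetical helper: drop cached stat data for $path, then check it.
function fresh_is_file(string $path): bool
{
    clearstatcache(true, $path); // true also clears the realpath cache
    return is_file($path);
}

$path = tempnam(sys_get_temp_dir(), 'stat');
var_dump(fresh_is_file($path)); // bool(true): the file was just created
unlink($path);
var_dump(fresh_is_file($path)); // bool(false): the file is gone
```

    Clearing the cache for a single path keeps the cost down compared with a bare clearstatcache(), which throws away everything PHP has cached.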
    

    Basically, if you're 100% sure that the path is a regular file, not a symlink or a directory, and it will in all probability exist, then use is_file(). You'll see a definite gain. If the path could be a file or a symlink at any moment, then is_file() || is_link() costs a failed is_file() (14x) plus an is_link() (14x) and ends up about 2x slower overall. If the file's existence changes A LOT, then use stream_resolve_include_path().

    So it depends on your usage scenario.
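
    As a sketch of how that advice might be wired into the dummy-image fallback from the question: image_or_dummy() and its $maybeSymlink flag are hypothetical names, and the paths are placeholders.

```php
<?php
// Hypothetical helper following the advice above: try the cheap is_file()
// first and only fall back to is_link() when the caller says symlinks
// are possible.
function image_or_dummy(string $path, string $dummy, bool $maybeSymlink = false): string
{
    if (is_file($path) || ($maybeSymlink && is_link($path))) {
        return $path;
    }
    return $dummy;
}

$real = tempnam(sys_get_temp_dir(), 'img');
echo image_or_dummy($real, '/img/dummy.png'), "\n";              // the real path
echo image_or_dummy($real . '.missing', '/img/dummy.png'), "\n"; // the dummy
unlink($real);
```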

  • 2021-02-05 01:51

    If you are only checking for existing files, use is_file(). file_exists() checks for an existing file OR directory, so is_file() may be a little faster.
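
    The difference in behaviour is easy to see with a directory. A quick sketch, using the system temp directory as the example path:

```php
<?php
$dir  = sys_get_temp_dir();    // an existing directory
$file = tempnam($dir, 'chk');  // an existing regular file

var_dump(file_exists($dir));  // bool(true): directories count
var_dump(is_file($dir));      // bool(false): not a regular file
var_dump(file_exists($file)); // bool(true)
var_dump(is_file($file));     // bool(true)

unlink($file);
```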

  • 2021-02-05 01:52

    We fall back to a dummy image if the file was missing

    If you're just interested in falling back to this dummy image, you might want to consider letting the client negotiate with the server by means of a redirect (to the dummy image) on file-not-found.

    That way you'll have only a little redirection overhead and an unnoticeable delay on the client side. At least you'll get rid of the "expensive" (which it isn't, I know) call to file_exists().

    Just a thought.
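
    One way this could look, as a sketch. It assumes the web server is configured to hand not-found requests to a PHP script, and not_found_response(), the extension list, and /img/dummy.png are all made up for illustration. The array destructuring needs PHP 7.1 or later.

```php
<?php
// Hypothetical 404 handler logic: decide the response for a missing path.
// Assumption: the web server routes not-found requests to this script and
// /img/dummy.png is the placeholder URL; both are illustrative.
function not_found_response(string $requestPath, string $dummyUrl): array
{
    $ext = strtolower(pathinfo($requestPath, PATHINFO_EXTENSION));
    if (in_array($ext, ['jpg', 'jpeg', 'png', 'gif'], true)) {
        // Missing image: send the client to the dummy image instead.
        return [302, ['Location: ' . $dummyUrl]];
    }
    return [404, []];
}

[$status, $headers] = not_found_response('/uploads/missing.jpg', '/img/dummy.png');
// In a real handler: http_response_code($status); foreach ($headers as $h) header($h);
echo $status, ' ', implode(', ', $headers), "\n";
```

    Existing images are then served directly by the web server with no PHP check at all, and only the misses pay for the extra round trip.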
