Choosing a random file from a directory (with a large number of files) in Python

前端 未结 5 513
南笙
南笙 2021-02-07 20:58

I have a directory with a large number of files (~1mil). I need to choose a random file from this directory. Since there are so many files, os.listdir naturally tak

5条回答
  •  谎友^
    谎友^ (楼主)
    2021-02-07 21:38

    Alas, I don't think there is a solution to your problem. One, I don't know of portable API that will return you the number of entries in directory (w/o enumerating them first). Two, I don't think there is API to return you directory entry by number and not by name.

    So overall, a program will have to enumerate O(n) directory entries to get a single random one. The trivial approach of determining number of entries and then picking one will either require enough RAM to hold the full listing (os.listdir()) or will have to enumerate 2nd time the directory to find the random(n) item - overall n+n/2 operations on average.

    There is slightly better approach - but only slightly - see randomly-selecting-lines-from-files. In short there is a way to pick random item from list/iterator with unknown length, while reading one item at a time and ensure that any item may be picked with equal probability. But this won't help with os.listdir() because it already returns list in memory that already contains all 1M+ entries - so you can as well ask it about len() ...

提交回复
热议问题