PyPI download counts seem unrealistic

一向 2021-02-01 12:14

I put a package on PyPI for the first time ~2 months ago, and have made some version updates since then. This week I noticed the download count recording, and was surprised to s

4 answers
  •  无人共我
    2021-02-01 12:44

    Starting with Cairnarvon's summarizing statement:

    "It looks like the main reason PyPI needs mirrors is because it has them."

    I would slightly modify this:

    It may be more that the way PyPI actually works, and therefore the way it has to be mirrored, contributes an additional bit (or two :-) to the real traffic.

    At the moment I think you MUST interact with the main index to know what to update in your repository. State is not simply accessible through timestamps on some publicly accessible folder hierarchy. So the bad news is that rsync is out of the equation. The good news is that you MAY talk to the index through JSON, OAuth, XML-RPC or HTTP interfaces.

    For XML-RPC:

    $> python
    >>> import xmlrpc.client  # named xmlrpclib on Python 2
    >>> client = xmlrpc.client.ServerProxy('http://pypi.python.org/pypi')
    >>> client.package_releases('PartitionSets')  # released versions of the package
    ['0.1.1']
    

    For JSON, e.g.:

    $> curl https://pypi.python.org/pypi/PartitionSets/0.1.1/json
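
The same JSON lookup can be sketched in Python. This is a minimal illustration, not a definitive client: the helper names `json_url` and `release_versions` are made up here, the base URL is simply copied from the curl line above, and the code assumes the response carries a top-level `releases` object keyed by version string. The sample payload is hand-written so the snippet runs without network access.

```python
import json
from urllib.parse import quote

# Base URL copied from the curl example; an assumption if the index has moved.
PYPI_BASE = "https://pypi.python.org/pypi"


def json_url(package, version=None):
    """Build the JSON metadata URL for a package, or for one release of it."""
    parts = [PYPI_BASE, quote(package, safe="")]
    if version is not None:
        parts.append(quote(version, safe=""))
    parts.append("json")
    return "/".join(parts)


def release_versions(payload):
    """Return the keys of the "releases" object, sorted as plain strings."""
    return sorted(payload.get("releases", {}))


# Hypothetical payload standing in for a real response body.
sample = json.loads('{"info": {"name": "PartitionSets"}, "releases": {"0.1.1": []}}')

print(json_url("PartitionSets", "0.1.1"))
print(release_versions(sample))
```

To query the live index, the URL from `json_url` could be passed to `urllib.request.urlopen` and the response body to `json.loads`.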
    

    If there are approx. 30,000 packages hosted [1], with some being downloaded 50,000 to 300,000 times a week [2] (like distribute, pip, requests, paramiko, lxml, boto, redis and others), you really need mirrors, at least from an accessibility perspective. Just imagine what a user does when pip install NeedThisPackage fails: wait? Company-wide PyPI mirrors are also quite common, acting as proxies for otherwise unroutable networks. And not to forget the wonderful multi-version checking enabled through virtualenv and friends. These are all, IMO, legitimate and potentially wonderful uses of packages ...

    In the end, you never know what an agent really does with a downloaded package: do N users really use it, or is it just overwritten next time ... and after all, IMHO package authors should care more about the number and nature of uses than the pure number of potential users ;-)
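
To see how automated agents can dominate a download counter, here is a rough back-of-the-envelope sketch. Every number in it is invented for illustration, and it assumes (hypothetically) that each mirror fetches every file of every release exactly once; real mirrors, proxies and test runs may behave quite differently.

```python
def estimated_downloads(human_installs, mirrors, files_per_release, releases, ci_runs):
    """Rough count of download events under the assumptions stated above."""
    # Assumption: each mirror pulls every file of every release once.
    mirror_fetches = mirrors * files_per_release * releases
    return human_installs + mirror_fetches + ci_runs


# Made-up figures: 20 people installing, 50 mirrors, 2 files per release,
# 5 releases, 100 automated test installs (e.g. fresh virtualenvs).
total = estimated_downloads(
    human_installs=20, mirrors=50, files_per_release=2, releases=5, ci_runs=100
)
human_share = 20 / total

print(total)                           # 620
print(round(human_share * 100, 1))     # 3.2 (percent of downloads from people)
```

Under these invented figures only about 3% of the counted downloads come from a person actually installing the package, which is one way a small package can show a surprisingly large number.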


    Refs: The guesstimated numbers are from [1] https://pypi.python.org/pypi (29303 packages) and [2] http://pypi-ranking.info/week (for the weekly numbers, requested 2013-03-23).
