Is CKAN capable of dealing with 100k+ files and TB of data?

终归单人心 · 2021-02-09 11:21

What we want to do is create a local data repository for our lab members to organize, search, access, catalog, and reference our data. I feel that CKAN can do all …

2 Answers
    野趣味 (OP) · 2021-02-09 11:49

    We're using CKAN at the Natural History Museum (data.nhm.ac.uk) for some pretty hefty research datasets - our main specimen collection has 2.8 million records - and it's handling it very well. We did have to extend CKAN with some custom plugins to make this possible, but they're open source and available on GitHub.

    Our datasolr extension moves querying of large datasets into Solr, which handles indexing and searching big datasets better than Postgres (on our infrastructure, anyway): https://github.com/NaturalHistoryMuseum/ckanext-datasolr.
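    To give a rough idea of why this helps, here is a minimal, generic sketch of paging through a large Solr index with cursorMark - this is not the ckanext-datasolr API, and the Solr URL, core name, and field names are all hypothetical:

```python
# Generic sketch: querying a Solr core directly instead of Postgres.
# The URL, core name ("specimens"), and fields are placeholders.
import requests

SOLR_URL = "http://localhost:8983/solr/specimens/select"  # hypothetical core

def search(query: str, rows: int = 100):
    """Stream results using cursorMark paging, which stays fast on
    multi-million-row indexes where deep offset paging would not."""
    cursor = "*"
    while True:
        params = {
            "q": query,
            "rows": rows,
            "sort": "id asc",          # cursorMark requires a unique sort key
            "cursorMark": cursor,
            "wt": "json",
        }
        data = requests.get(SOLR_URL, params=params).json()
        for doc in data["response"]["docs"]:
            yield doc
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:      # cursor unchanged means no more pages
            break
        cursor = next_cursor

if __name__ == "__main__":
    for doc in search("collection:birds"):
        print(doc.get("id"))
```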

    To prevent CKAN from falling over when users download big files, we moved the packaging and download to a separate service and task queue:

    https://github.com/NaturalHistoryMuseum/ckanext-ckanpackager
    https://github.com/NaturalHistoryMuseum/ckanpackager
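    The general pattern looks something like the stdlib-only sketch below: the web request only enqueues a job, and a background worker does the zipping, so the CKAN process never blocks on heavy I/O. This is an illustration of the pattern, not the actual ckanpackager implementation; the file names and the notification step are placeholders.

```python
# Sketch: offload packaging to a background worker via a task queue,
# so the web layer returns immediately instead of streaming big zips.
import queue
import threading
import zipfile
from pathlib import Path

jobs: "queue.Queue[tuple[list[Path], Path]]" = queue.Queue()

def worker() -> None:
    while True:
        files, archive = jobs.get()
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            for f in files:
                zf.write(f, arcname=f.name)
        # In a real service: upload the archive somewhere durable and
        # email the user a download link instead of printing.
        print(f"packaged {len(files)} files into {archive}")
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

def request_download(files: list[Path], archive: Path) -> None:
    """Called from the web layer: returns immediately, work happens async."""
    jobs.put((files, archive))

if __name__ == "__main__":
    # Create dummy input files so the demo is self-contained.
    for name in ("data1.csv", "data2.csv"):
        Path(name).write_text("col\n1\n")
    request_download([Path("data1.csv"), Path("data2.csv")], Path("out.zip"))
    jobs.join()  # block only so the demo finishes before exiting
```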

    So yes, CKAN with a few contributed plugins can definitely handle larger datasets. We haven't tested it with TB+ datasets yet, but we will next year when we use CKAN to release some phylogenetic data.
