Is CKAN capable of dealing with 100k+ files and TB of data?

终归单人心 · 2021-02-09 11:21

What we want to do is create a local data repository for our lab members to organize, search, access, catalog, and reference our data. I feel that CKAN can do all …

2 Answers
    野趣味 (OP) · 2021-02-09 11:49

    We're using CKAN at the Natural History Museum (data.nhm.ac.uk) for some pretty hefty research datasets - our main specimen collection has 2.8 million records - and it's handling it very well. We did have to extend CKAN with some custom plugins to make this possible, but they're open source and available on GitHub.

    Our datasolr extension moves querying of large datasets into Solr, which handles indexing and searching big datasets better than Postgres (on our infrastructure, anyway): https://github.com/NaturalHistoryMuseum/ckanext-datasolr.
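    To give a rough idea of why this helps, here is a minimal, generic sketch of paging through a large Solr index with cursorMark - this is not the ckanext-datasolr API, and the Solr URL, core name, and field names are all hypothetical:

```python
# Generic sketch: querying a Solr core directly instead of Postgres.
# The URL, core name ("specimens"), and fields are placeholders.
import requests

SOLR_URL = "http://localhost:8983/solr/specimens/select"  # hypothetical core

def search(query: str, rows: int = 100):
    """Stream results using cursorMark paging, which stays fast on
    multi-million-row indexes where deep offset paging would not."""
    cursor = "*"
    while True:
        params = {
            "q": query,
            "rows": rows,
            "sort": "id asc",          # cursorMark requires a unique sort key
            "cursorMark": cursor,
            "wt": "json",
        }
        data = requests.get(SOLR_URL, params=params).json()
        for doc in data["response"]["docs"]:
            yield doc
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:      # cursor unchanged means no more pages
            break
        cursor = next_cursor

if __name__ == "__main__":
    for doc in search("collection:birds"):
        print(doc.get("id"))
```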

    To prevent CKAN from falling over when users download big files, we moved the packaging and download to a separate service and task queue:

    https://github.com/NaturalHistoryMuseum/ckanext-ckanpackager
    https://github.com/NaturalHistoryMuseum/ckanpackager
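    The general pattern looks something like the stdlib-only sketch below: the web request only enqueues a job, and a background worker does the zipping, so the CKAN process never blocks on heavy I/O. This is an illustration of the pattern, not the actual ckanpackager implementation; the file names and the notification step are placeholders.

```python
# Sketch: offload packaging to a background worker via a task queue,
# so the web layer returns immediately instead of streaming big zips.
import queue
import threading
import zipfile
from pathlib import Path

jobs: "queue.Queue[tuple[list[Path], Path]]" = queue.Queue()

def worker() -> None:
    while True:
        files, archive = jobs.get()
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            for f in files:
                zf.write(f, arcname=f.name)
        # In a real service: upload the archive somewhere durable and
        # email the user a download link instead of printing.
        print(f"packaged {len(files)} files into {archive}")
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

def request_download(files: list[Path], archive: Path) -> None:
    """Called from the web layer: returns immediately, work happens async."""
    jobs.put((files, archive))

if __name__ == "__main__":
    # Create dummy input files so the demo is self-contained.
    for name in ("data1.csv", "data2.csv"):
        Path(name).write_text("col\n1\n")
    request_download([Path("data1.csv"), Path("data2.csv")], Path("out.zip"))
    jobs.join()  # block only so the demo finishes before exiting
```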

    So yes, CKAN with a few contributed plugins can definitely handle larger datasets. We haven't tested it with TB+ datasets yet, but we will next year when we use CKAN to release some phylogenetic data.
