How to increase the sample size used during schema discovery to 'unlimited'?

Submitted by 微笑、不失礼 on 2019-11-30 09:52:30

Question


I have encountered some errors with the SDP (Schema Discovery Process), where one of the suggested fixes is to increase the sample size used during schema discovery to 'unlimited'.

For more information on these errors, see:

  • No matched schema for {"_id":"...","doc":{...}
  • The value type for json field XXXX was presented as YYYY but the discovered data type of the table's column was ZZZZ
  • XXXX does not exist in the discovered schema. Document has not been imported

Question:

How can I set the sample size? After I have set the sample size, do I need to trigger a rescan?


Answer 1:


These are the steps to change the sample size. Be aware that a larger sample size increases the runtime of the discovery algorithm, and the dashboard gives no indication of progress other than the job remaining in the 'triggered' state for a while.

  1. Verify that the specific load has been stopped and that the dashboard shows its status as stopped (with or without error)

  2. Find the document at https://<account>.cloudant.com/_warehouser/<source>, where <source> matches the name of the Cloudant database you are having trouble with (a scripted version of steps 2 and 3 follows this list)

    Note: Check https://<account>.cloudant.com/_warehouser/_all_docs if the document id is not obvious

  3. Replace "sample_size": null (which scans a random sample of 10,000 documents) with "sample_size": -1 (to scan all documents in your database) or "sample_size": X (to scan X documents, where X is a positive integer)

Once the document is saved, trigger a rescan in the dashboard. A new schema discovery run will then execute using the configured sample size.



Source: https://stackoverflow.com/questions/32866542/how-to-increase-the-sample-size-used-during-schema-discovery-to-unlimited
