How to increase the sample size used during schema discovery to 'unlimited'?

Submitted by 微笑、不失礼 on 2019-11-30 09:52:30

Question


I have encountered some errors with the SDP (Schema Discovery Process), where one of the suggested fixes is to increase the sample size used during schema discovery to 'unlimited'.

For more information on these errors, see:

  • No matched schema for {"_id":"...","doc":{...}
  • The value type for json field XXXX was presented as YYYY but the discovered data type of the table's column was ZZZZ
  • XXXX does not exist in the discovered schema. Document has not been imported

Question:

How can I set the sample size? After I have set the sample size, do I need to trigger a rescan?


Answer 1:


These are the steps to change the sample size. Be aware that a larger sample size increases the runtime of the discovery algorithm, and the dashboard gives no indication of progress other than the job remaining in the 'triggered' state for a while.

  1. Verify that the specific load has been stopped and that the dashboard shows its status as stopped (with or without error)

  2. Find the document at https://<account>.cloudant.com/_warehouser/<source>, where <source> matches the name of the Cloudant database you are having trouble with (a scripted version of steps 2 and 3 follows this list)

    Note: Check https://<account>.cloudant.com/_warehouser/_all_docs if the document id is not obvious

  3. Replace "sample_size": null (which scans a random sample of 10,000 documents) with "sample_size": -1 (to scan all documents in your database) or "sample_size": X (to scan X documents, where X is a positive integer)

Once the document is saved, trigger a rescan in the dashboard. A new schema discovery run will then execute using the configured sample size.



Source: https://stackoverflow.com/questions/32866542/how-to-increase-the-sample-size-used-during-schema-discovery-to-unlimited
