I have encountered some errors with the Schema Discovery Process (SDP) where one of the potential fixes is to increase the sample size used during schema discovery to 'unlimited'.
For more information on these errors, see:
- No matched schema for {"_id":"...","doc":{...}
- The value type for json field XXXX was presented as YYYY but the discovered data type of the table's column was ZZZZ
- XXXX does not exist in the discovered schema. Document has not been imported
Question:
How can I set the sample size? After I have set the sample size, do I need to trigger a rescan?
These are the steps you can follow to change the sample size. Be aware that a larger sample size will increase the runtime of the discovery algorithm, and the dashboard gives no indication of progress other than the job remaining in the 'triggered' state for a while. (A scripted version of these steps is sketched after the list.)
1. Verify that the specific load has been stopped and that the dashboard status shows it as stopped (with or without error).
2. Find the document at https://<account>.cloudant.com/_warehouser/<source>, where <source> matches the name of the Cloudant database you have issues with. Note: check https://<account>.cloudant.com/_warehouser/_all_docs if the document ID is not obvious.
3. Substitute "sample_size": null (which scans a sample of 10,000 random documents) with "sample_size": -1 (to scan all documents in your database) or "sample_size": X (to scan X documents, where X is a positive integer).
4. Save the document and trigger a rescan in the dashboard. A new schema discovery run will execute using the defined sample size.
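If you prefer to script steps 2–4 rather than edit the document in the dashboard, a minimal sketch using Python and the `requests` library might look like the following. The `<account>`, `<source>`, and credential values are placeholders you must fill in; the `_warehouser` database and the `sample_size` field come from the steps above, while everything else here is standard Cloudant (CouchDB-style) document fetch-and-update.

```python
# Minimal sketch: fetch the warehouser document, change sample_size, save it back.
# Placeholders: <account>, <source>, and the credentials are assumptions you must replace.
import requests

ACCOUNT = "<account>"
SOURCE = "<source>"  # name of the Cloudant database you have issues with
AUTH = ("<username>", "<password>")

url = f"https://{ACCOUNT}.cloudant.com/_warehouser/{SOURCE}"

# Fetch the current document; the returned JSON includes its _rev,
# which Cloudant requires for the subsequent update to succeed.
doc = requests.get(url, auth=AUTH).json()

# null/None = default sample of 10,000 random documents,
# -1 = scan all documents, positive integer X = scan X documents.
doc["sample_size"] = -1

# Save the updated document back to the _warehouser database.
resp = requests.put(url, json=doc, auth=AUTH)
resp.raise_for_status()

# Finally, trigger a rescan in the dashboard as described in step 4.
```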
Source: https://stackoverflow.com/questions/32866542/how-to-increase-the-sample-size-used-during-schema-discovery-to-unlimited