Question
We tried Form Recognizer custom model training (API v2.0), following these steps:
https://pnagarjuna.wordpress.com/2020/01/07/azure-form-recognizer-service-custom-model-training-steps/
Training the model succeeds (201), but the subsequent Check Custom Model Status call returns this error:
{
  "modelInfo": {
    "modelId": "f17bd306-3c6a-4067-8ef1-5f2e6ced79e1",
    "status": "invalid",
    "createdDateTime": "2020-02-05T17:24:30Z",
    "lastUpdatedDateTime": "2020-02-05T17:24:31Z"
  },
  "trainResult": {
    "trainingDocuments": [],
    "errors": [{
      "code": "2014",
      "message": "No valid blobs found in the specified Azure blob container. Please conform to the document format/size/page/dimensions requirements."
    }]
  }
}
We also checked the requirements at
https://docs.microsoft.com/en-us/azure/cognitive-services/form-recognizer/overview#custom-model and everything looks okay.
How can we proceed?
Thank you!
Gabor
Answer 1:
Could you check whether the prefix value in your POST train request is consistent with the path in your Azure blob container? If you put the sample files under the root path of your blob container, pass an empty string for prefix. Because the train and get-trained-model requests are asynchronous in Form Recognizer v2.0, some errors related to the POST request arguments can only be retrieved via the get-trained-model request.
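Because errors only surface on the get-trained-model call, a client typically polls that call until the model leaves the "creating" state. Below is a minimal sketch of such a loop, not an official client. The name `wait_for_model` and the injected `fetch_model` callable (which would wrap the HTTP GET and return the parsed JSON) are hypothetical, and the sketch assumes the status values are "creating", "ready" and "invalid".

```python
import time
from typing import Callable


def wait_for_model(fetch_model: Callable[[], dict],
                   interval_s: float = 1.0,
                   max_attempts: int = 30) -> dict:
    """Poll a get-trained-model call until training finishes.

    fetch_model is a hypothetical callable that performs the GET request
    and returns the decoded JSON response as a dict.
    """
    for _ in range(max_attempts):
        model = fetch_model()
        status = model["modelInfo"]["status"]
        if status == "ready":
            return model
        if status == "invalid":
            # Argument errors (such as a wrong prefix) show up here,
            # not in the response to the original POST.
            errors = model.get("trainResult", {}).get("errors", [])
            details = "; ".join(f'{e["code"]}: {e["message"]}' for e in errors)
            raise RuntimeError(f"training failed: {details}")
        time.sleep(interval_s)  # assumed to still be "creating"
    raise TimeoutError("model was not ready after the allowed attempts")
```

Passing the fetch function in keeps the loop independent of any particular HTTP library, so it can be exercised with canned responses before pointing it at the real endpoint.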
Answer 2:
@Nini,
Could you provide an example of a prefix value?
I am facing the same issue as the author.
I use API version 2.0. I generated a SAS for the whole container, then I used the following request to train a custom model:
{
  "source": "https://{resourcename}.blob.core.windows.net/{containername}?sp=rl&st=2020-02-13T11:19:53Z&se=2021-02-14T11:19:00Z&sv=2019-02-02&sr=c&sig={signature}",
  "sourceFilter": {
    "prefix": "/USMF/VendorInvoices/Vendor - 1001/",
    "includeSubFolders": false
  },
  "useLabelFile": false
}
Target folder URI: https://{resourcename}.blob.core.windows.net/{container name}/USMF/VendorInvoices/Vendor - 1001/
Response body:
{
  "modelInfo": {
    "modelId": "4e23f488-d8db-4c98-8018-4cd337d9a655",
    "status": "invalid",
    "createdDateTime": "2020-02-13T12:07:52Z",
    "lastUpdatedDateTime": "2020-02-13T12:07:52Z"
  },
  "keys": {
    "clusters": {}
  },
  "trainResult": {
    "trainingDocuments": [],
    "errors": [{
      "code": "2014",
      "message": "No valid blobs found in the specified Azure blob container. Please conform to the document format/size/page/dimensions requirements."
    }]
  }
}
If I keep the training data set under the root, so that the prefix value is an empty string, then everything is OK.
Answer 3:
Thank you for reporting this. Any chance you can switch from a policy-defined SAS token (one with sig={signature}) to a SAS token with explicit permissions (one with sp={permissionenum})?
Answer 4:
Could you explain your thought in detail?
Here is what I did.
I generated the SAS token without applying any access policy. The SAS is generated for the whole container. I just chose the Read and List permissions from the list and set an expiration date.
What puzzles me is that if I keep the training data set under the root folder, everything is OK, but when I put the files under a folder structure, the Form Recognizer service can't find them.
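To rule out the SAS token as the cause, it can help to inspect what the URL actually grants. The following is a small sketch that only parses the query string locally; it does not contact Azure. The function name `sas_summary` and the example URL are hypothetical. It reads the `sp` (permissions), `sr` (resource type, `c` for a container) and `se` (expiry) parameters, the same ones visible in the train request earlier in this thread.

```python
from urllib.parse import parse_qs, urlsplit


def sas_summary(sas_url: str) -> dict:
    """Summarize the permission-related fields of a SAS URL (local parse only)."""
    query = parse_qs(urlsplit(sas_url).query)
    permissions = query.get("sp", [""])[0]
    return {
        "permissions": permissions,
        "resource": query.get("sr", [""])[0],
        "expires": query.get("se", [""])[0],
        # Read and List are the two permissions chosen in this thread.
        "can_read_and_list": "r" in permissions and "l" in permissions,
    }


# Hypothetical URL shaped like the one in the train request.
example = ("https://example.blob.core.windows.net/forms"
           "?sp=rl&st=2020-02-13T11:19:53Z&se=2021-02-14T11:19:00Z"
           "&sv=2019-02-02&sr=c&sig=placeholder")
print(sas_summary(example))
```

If the summary shows both permissions on a container-level token, as it does here, the token is unlikely to explain why only the files under a folder are missed.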
Answer 5:
The question has been resolved.
It's definitely not a service issue.
First of all, my prefix shouldn't contain the '/' symbol at the beginning. Another important point is that the prefix is case sensitive. In my case I uploaded the files with the "USMF/VendorInvoices/Vendor - 1001/" prefix but requested model training with "usmf/VendorInvoices/Vendor - 1001/". This led to the error message: No valid blobs found in the specified Azure blob container. Please conform to the document format/size/page/dimensions requirements.
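Both pitfalls can be checked locally before sending a train request. The sketch below is a simulation, not the service's actual matching code: it assumes the prefix is compared to blob names as a plain, case-sensitive string prefix, as described above. The function name `diagnose_prefix` and the blob name are hypothetical.

```python
from typing import Iterable


def diagnose_prefix(blob_names: Iterable[str], prefix: str) -> str:
    """Classify why a prefix does or does not match any blob name."""
    names = list(blob_names)
    if any(name.startswith(prefix) for name in names):
        return "ok"
    # Blob names do not start with '/', so a leading slash never matches.
    stripped = prefix.lstrip("/")
    if stripped != prefix and any(name.startswith(stripped) for name in names):
        return "leading-slash"
    # Compare case-insensitively only to detect a casing mistake.
    lowered = stripped.lower()
    if any(name.lower().startswith(lowered) for name in names):
        return "case-mismatch"
    return "no-match"


# Hypothetical blob name following the folder layout in this thread.
blobs = ["USMF/VendorInvoices/Vendor - 1001/invoice-001.pdf"]
print(diagnose_prefix(blobs, "USMF/VendorInvoices/Vendor - 1001/"))
print(diagnose_prefix(blobs, "/USMF/VendorInvoices/Vendor - 1001/"))
print(diagnose_prefix(blobs, "usmf/VendorInvoices/Vendor - 1001/"))
```

In practice the list of names would come from listing the container with the same SAS token that is passed to the train request.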
Source: https://stackoverflow.com/questions/60090693/form-recognizer-invalid-model-status