Regarding the Form Recognizer, OCR, and label tool containers

Question

We are trying to use the container previews of Form Recognizer, OCR, and the label tool, and have the following questions:

Is there any software that can help us classify similar kinds of documents? This would help us categorize documents and create a training dataset.
Is there any way to give the model a user-defined name? The following is the output from the model query API. It is difficult to tie it back to the different kinds of models:

{
    "modelId": "f136f65b-bb94-493b-a798-a3e8023ea1b5",
    "status": "ready",
    "createdDateTime": "2020-05-06T21:35:58+00:00",
    "lastUpdatedDateTime": "2020-05-06T21:36:06+00:00"
}

I can see model files stored in \output\subscriptions\global\models, where the /output directory is shared with the container in the Docker Compose file. Is it possible to import these models into new containers?
- Each model has a json and a gz file with the same name as the model ID
- I am also attaching the Docker Compose file for your reference
Is there a way to fine-tune or update the same custom model (same model ID) with additional training data?
We were also trying the label tool, but it only takes Azure Blob storage as input. Is it possible to provide input the same way we do for Form Recognizer training? We are struggling to get this set up, and if it is not resolved we might have to start looking at alternatives.

Answer 1:

Here are the answers to your questions:

To classify documents, you can use Custom Vision to build a document classifier, or use text classification and OCR. In addition, you can use Form Recognizer train without labels: run it on the training data and use the cluster option within the model to classify similar documents and pages in the training dataset.
A friendly model name is not yet available in Form Recognizer. It is a future feature on our roadmap.
Models can't be copied between containers; you can use the same dataset to train a model in a different container. Models can be copied between subscriptions, resources, and regions when using the Form Recognizer cloud service.
Each training run creates a new model ID so as not to overwrite the previous model, so you can't update existing models.
The Form Recognizer v2.0 release is not yet available in containers; only the Form Recognizer v1.0 release is currently available in containers. Form Recognizer v2.0 will also be available in containers shortly. With the containers release, all data remains on premises, and the labeling tool, once available for the v2.0 containers release, will also take a local or mounted disk as input rather than blob storage.
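
Until friendly model names are supported, one possible workaround is to keep the name-to-ID mapping on the client side. The following is only a minimal sketch, assuming a local JSON file is acceptable for your setup. The helper names (`load_registry`, `register_model`, `friendly_name_for`) and the registry file are hypothetical and are not part of Form Recognizer; the code only reads and writes a local file and does not call the service.

```python
import json
from pathlib import Path


def load_registry(path):
    """Return the friendly-name -> model ID mapping stored at path ({} if absent)."""
    registry_file = Path(path)
    if not registry_file.exists():
        return {}
    return json.loads(registry_file.read_text(encoding="utf-8"))


def register_model(path, friendly_name, model_id):
    """Record (or overwrite) a friendly name for a model ID and save the file."""
    registry = load_registry(path)
    registry[friendly_name] = model_id
    Path(path).write_text(
        json.dumps(registry, indent=2, sort_keys=True), encoding="utf-8"
    )
    return registry


def friendly_name_for(path, model_id):
    """Look up the friendly name recorded for a model ID, or None if unknown."""
    for name, known_id in load_registry(path).items():
        if known_id == model_id:
            return name
    return None
```

For example, after training you could call `register_model("model_names.json", "invoices-v1", "f136f65b-bb94-493b-a798-a3e8023ea1b5")` (the name `invoices-v1` is made up for illustration), and later use `friendly_name_for` to tie a model ID returned by the model query API back to the kind of document it was trained on. Because each training run creates a new model ID, the registry entry needs to be updated after every retrain.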

Thanks! Neta - MSFT

Source: https://stackoverflow.com/questions/61827556/regarding-container-of-form-recogniser-ocr-and-labeltool-containers

Tags

azure

containers

microsoft-cognitive

form-recognizer