Document Classification With Power Automate And Azure AI

Power Automate can perform document classification with the help of Azure AI. Document classification is when AI reviews a document and determines what type of document it is. The result also includes a confidence score. Once a document is classified, we can tag it with the correct document type in SharePoint. We could also use the result to choose the correct data extraction AI model to get data from the document.

Table of Contents
• Introduction: The Document Library With Automated Document Classification
• Open Azure AI Document Intelligence Studio
• Setup A New Project For The Custom Classification Model
• Label Sample Invoices With The Document Type
• Train The Document Classification Model
• Test The Document Classification Model
• Create A SharePoint Document Library For Document Classification
• Trigger A Power Automate Flow When A File Is Added
• Classify A Document Using The HTTP Action
• Obtain The Endpoint And The Subscription Key
• Get The Document Classification Result
• Update The Document Type And Confidence Score File Properties
• Run The Power Automate Flow To Classify A Document




Introduction: The Document Library With Automated Document Classification

Workers at an insurance company receive several types of documents and upload them to a SharePoint library. Once a document is uploaded, Azure AI Document Intelligence classifies it and writes its document type and confidence score to the file’s metadata in SharePoint.




Open Azure AI Document Intelligence Studio

We will begin by training a custom classification model to identify document types. Go to Azure AI Document Intelligence Studio and choose custom classification model.



Then select Create a project.



Setup A New Project For The Custom Classification Model

To start a new project we must enter several details into a project setup wizard. On the Enter project details screen name the project Form Recognizer Tutorial.



Choose an Azure subscription, resource group and Form Recognizer Resource. If none exist, create new ones. Select the API version 2023-07-31 (3.1 General Availability).



Pick an existing storage account and blob container or create new ones. Leave the folder path blank.



Press the Create Project button to exit the project setup wizard.



The new project will appear in the My Projects menu. Open the Form Recognizer Tutorial Project.


Label Sample Invoices With The Document Type

The Form Recognizer Tutorial project opens to the Label data screen. On this screen we must upload a set of invoices. Download the invoices in this GitHub repository for use in this tutorial. Then drag and drop Contoso Invoices #1-5 into the file upload area.



Add a new document type named Contoso Invoice.



Then select each invoice one by one in the drag and drop area and select Contoso Invoice from the dropdown menu that appears. When we do this, the invoice name appears under the document type in the right-side menu.



Also upload Adatum invoices #1-5, add a new type called Adatum Invoice, and label each document so it appears in the right side menu.




Train The Document Classification Model

Now that we have labelled all of the invoices, it's time to train our document classification model. Select the Train button in the upper right corner of the Label data screen.





Assign the Model ID aibuilderinvoices to the classification model. Check the confirmation box and press the Train button.



After a few moments the aibuilderinvoices model appears in the Models tab.




Test The Document Classification Model

We can test the document model before we use it in Power Automate. Go to the Test tab and upload the invoice file named Adatum6.pdf. Press the Run analysis button. The document is correctly classified as an Adatum Invoice with a confidence score of 41.70%.




Create A SharePoint Document Library For Document Classification

The automation we will build requires a SharePoint document library. Create a new library named FormRecognizerTutorial with the following columns:

  • Name (text)
  • Document Type (text)
  • Confidence Score (number)




Trigger A Power Automate Flow When A File Is Added

When an invoice is uploaded to a document library we want to start an automated flow to classify the document. Create a new Power Automate flow and select SharePoint – When a file is created (properties only) as the trigger.



Then add a SharePoint – Get file content using path action to get the file we will perform document classification on.




Classify A Document Using The HTTP Action

Document classification is not included as a standard action in Power Automate. To use the classification model we created in Document Intelligence Studio, we will need to create a new HTTP action and choose the POST method.

For reference, here is a link to the API documentation:
https://westus.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-2023-07-31/operations/ClassifyDocument




Use the following URI and replace {modelName} with aibuilderinvoices. We will learn how to find the proper value for {endpoint} in a moment.

https://{endpoint}/formrecognizer/documentClassifiers/{modelName}:analyze?api-version=2023-07-31



Apply the following Headers. We will also learn where to find the Key shortly.

Key | Value
Content-Type | application/json
Ocp-Apim-Subscription-Key | {Key 1}



Write this code in the Body of the HTTP request to send the file content in Base 64 format. Alternatively, we could supply a link to a document and replace base64Source with urlSource.

{
  "base64Source": "@{base64(body('Get_file_content_using_path'))}"
}
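For readers who want to see the same request outside of Power Automate, here is a minimal Python sketch of what the HTTP POST action assembles. It only builds the URL, headers and body and does not call the service. The function name build_classify_request, the endpoint value and the sample file bytes are assumptions made up for illustration; the URI pattern, header names and base64Source property come from the steps above.

```python
import base64
import json


def build_classify_request(endpoint: str, model_name: str, key: str, file_bytes: bytes) -> dict:
    """Assemble the URL, headers and JSON body for the classify POST call."""
    url = (
        f"https://{endpoint}/formrecognizer/documentClassifiers/"
        f"{model_name}:analyze?api-version=2023-07-31"
    )
    headers = {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": key,
    }
    # base64Source must be a JSON string, so the encoded bytes are decoded to text.
    encoded = base64.b64encode(file_bytes).decode("ascii")
    body = json.dumps({"base64Source": encoded})
    return {"url": url, "headers": headers, "body": body}


request = build_classify_request(
    endpoint="example.cognitiveservices.azure.com",  # hypothetical endpoint
    model_name="aibuilderinvoices",
    key="<Key 1>",  # placeholder, not a real key
    file_bytes=b"%PDF-1.7 sample",  # stand-in for the uploaded file content
)
print(request["url"])
print(request["body"])
```

Note that the base64 text is wrapped in quotes inside the JSON body, which is the same reason the Power Automate expression sits between double quotes.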



Obtain The Endpoint And The Subscription Key

We need the endpoint and the subscription key for use in the HTTP action. Go to portal.azure.com and search for Document Intelligences.




Choose the FormRecognizerTutorial resource we set up earlier in the tutorial.



Go to the Keys and Endpoint tab. Copy the endpoint into the URI property of the HTTP action and Key 1 into the Ocp-Apim-Subscription-Key header.




Get The Document Classification Result

When we run the HTTP POST action to classify the document it does not immediately return a response. Instead, it places the document into a processing queue. We must retrieve the result in another action by using the Result ID.

Create a new Data Operations – Compose action to store the Result ID.



Use this code in the Compose action.

outputs('HTTP:_POST_Classify_Document')['headers']?['apim-request-id']
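The same lookup can be sketched in Python to show what the expression is doing. This is only an illustration: the function name result_id_from_headers and the sample header values are made up, and the fallback to the Operation-Location header is an assumption about the POST response that is not covered in this tutorial.

```python
from urllib.parse import urlparse


def result_id_from_headers(headers: dict) -> str:
    """Return the Result ID from the POST response headers."""
    # HTTP header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    if "apim-request-id" in lowered:
        return lowered["apim-request-id"]
    # Assumed fallback: the last path segment of Operation-Location is the Result ID.
    path = urlparse(lowered["operation-location"]).path
    return path.rstrip("/").split("/")[-1]


# Made-up header values for illustration only.
from_request_id = result_id_from_headers({"Apim-Request-Id": "1234-abcd"})
from_location = result_id_from_headers(
    {
        "Operation-Location": "https://example.cognitiveservices.azure.com/formrecognizer/"
        "documentClassifiers/aibuilderinvoices/analyzeResults/5678-efgh?api-version=2023-07-31"
    }
)
print(from_request_id, from_location)
```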



Add a Schedule – Delay action to the flow. Make the flow wait 10 seconds for the document to be processed.



Then add another HTTP action to the flow but this time make it a GET action to retrieve the result.

For reference, here is a link to the API documentation:
https://westus.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-2023-07-31/operations/GetClassifyDocumentResult




Use the following code in the URI field. Notice that the Result ID is supplied to the analyzeResults endpoint.

https://{endpoint}/formrecognizer/documentClassifiers/{modelName}/analyzeResults/@{outputs('Compose:_Result_ID')}?api-version=2023-07-31



Supply the same Headers as the HTTP POST action.

Key | Value
Content-Type | application/json
Ocp-Apim-Subscription-Key | {Key 1}
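A fixed 10 second delay works for small files, but the result may not always be ready in time. As a rough sketch of an alternative, the Python below builds the result URI and polls until the job finishes. It assumes the GET response carries a status field with values such as running, succeeded and failed; the tutorial does not show that field, so check it against the API documentation. The names result_url and wait_for_result are hypothetical, and fetch stands in for whatever makes the real GET call. In Power Automate the equivalent would be a Do until loop around the Delay and HTTP GET actions.

```python
import time
from typing import Callable


def result_url(endpoint: str, model_name: str, result_id: str) -> str:
    """Build the analyzeResults URI used by the HTTP GET action."""
    return (
        f"https://{endpoint}/formrecognizer/documentClassifiers/"
        f"{model_name}/analyzeResults/{result_id}?api-version=2023-07-31"
    )


def wait_for_result(fetch: Callable[[], dict], attempts: int = 10, delay_seconds: float = 2.0) -> dict:
    """Call fetch() until the job succeeds, fails, or the attempts run out."""
    for _ in range(attempts):
        result = fetch()
        status = result.get("status")  # assumed field name
        if status == "succeeded":
            return result
        if status == "failed":
            raise RuntimeError("document classification failed")
        time.sleep(delay_seconds)
    raise TimeoutError("classification result was not ready in time")


# Simulated responses so the sketch runs without calling the service.
responses = iter([
    {"status": "running"},
    {"status": "succeeded", "analyzeResult": {"documents": []}},
])
final = wait_for_result(lambda: next(responses), delay_seconds=0)
print(final["status"])
```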




Update The Document Type And Confidence Score File Properties

The HTTP GET action returns the document type, the confidence score and other useful information about the document. We want to write the document type and the confidence score to the file’s metadata in SharePoint.

Add two Data Operations – Compose actions to the flow.



Use this code in the first action to get the Document Type.

first(body('HTTP:_GET_Classify_Result')?['analyzeResult']?['documents'])?['docType']



Write this code in the second action to get the Confidence Score.

first(body('HTTP:_GET_Classify_Result')?['analyzeResult']?['documents'])?['confidence']
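To make the two expressions easier to read, here is a small Python sketch that walks the same path through the response: analyzeResult, then documents, then the first item's docType and confidence. The function name first_document and the sample response are made up for illustration, and the sample contains only the properties used by the expressions above. Like the ? operator in the flow, it returns empty values instead of failing when a property is missing.

```python
from typing import Optional, Tuple


def first_document(response: dict) -> Tuple[Optional[str], Optional[float]]:
    """Return (docType, confidence) of the first classified document, if any."""
    documents = (response.get("analyzeResult") or {}).get("documents") or []
    if not documents:
        return None, None
    doc = documents[0]
    return doc.get("docType"), doc.get("confidence")


# Illustrative response containing only the properties read above.
sample = {
    "status": "succeeded",
    "analyzeResult": {
        "documents": [{"docType": "Adatum Invoice", "confidence": 0.417}]
    },
}
doc_type, confidence = first_document(sample)
print(doc_type, confidence)
```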



Finally, add a SharePoint – Update file properties action to the flow and supply the document type and confidence score.




Run The Power Automate Flow To Classify A Document

We are now done building the flow. Perform a test run on the flow and place the document Adatum Invoice #6 in the document library.



After a few moments the document is updated with a document type and a confidence score.




Questions?

If you have any questions or feedback about Document Classification With Power Automate And Azure AI please leave a message in the comments section below. You can post using your email address and are not required to create an account to join the discussion.

Matthew Devaney

15 Comments
Darren
11 months ago

This is good stuff, man. Please always feel free to post links to your blog posts over in the Skool.com community. 👍👍

Peter
11 months ago

Good evening from the Netherlands. I am very interested in this setup, thanks for this. Reason is that I am working for a municipality in the Netherlands and the use case is that records management and retention labeling is maybe too complex. So we are investigating the option of using power automate and AI to get document marked with retention when a project is final and we need assign the documents the correct retention labels. Question is if this scenario of power automate could be applied to retention labels as well?

Thanks!

Peter

Peter
11 months ago

Perfect, this help me to get this usecase worked out. We need to check the Trainable classifiers, but since the Archive Law is so complex we need to check this in detail. But we can at least start with this.

Dennis
10 months ago

Great manual!
Thanks for all you work!

I’ve just test some other models and start them succesfully with Power Automate.

I’m wondering if it is possible to get all the fields that I define in my custom model back as a JSON message?
Now I have the feeling that I am getting a lot of data back, including the positions of the found data, but that this is not in accordance with the model I created.
Do you have an idea if this is possible? And if so how?

FYI I am trying to get a fixed invoice format back in json in such a way that, for example, I have a fixed column containing the VAT value and, for example, also a column for the invoice amount incl. VAT

Sushovan Ray
6 months ago

hi Matthew, I have developed flow with all the correct details and steps you mentioned here. Still HTTP action is giving result of “Resource Not Found” and “model not found” , though my model and resource is present in Azure with proper endpoint and key. What can be the reason of this error? Can you suggest? Thanks in Advance.

Happy
4 months ago
Reply to  Sushovan Ray

I am also facing same issue. now formrecognizer is document intelligence so that uri is making issues. please help

Marie
3 months ago
Reply to  Sushovan Ray

Hi! I am encountering the same error but only in the HTTP GET. I already tested it, and I am successfully getting a result ID from the POST but fails to find the resource in the GET… weird. Any suggestions?

DirkAndTheMac
3 months ago

Hi Matthew,
We have just had a good long training course with Julian Sharp and your blog came up a lot.. So thought I’d check it out. This article is great. My particular interest in this instance is with creating more of a guided approach whereby the user selects a batch of files to upload and process and a standard set of data is obtained; Where recognition fails they can use the Doc AI UI to select data that has not been correctly recognised. We could maybe even use the failures to causing an (eventual) feedback into later models to improve performance over time. Do you have any pointers?

James Hinton
1 month ago

Hi Matthew, do you know whether the https://westus.dev.cognitive.microsoft.com/ site has been deprecated? I received “HTTP Error 503. The service is unavailable” when I tried to browse there.
Thanks in advance.

James Hinton
30 days ago

Thanks Matthew

Maximiliano
28 days ago

Very good work Matt!
I was able to copy your work and do it in mine.
I only had to change the POST and GET urls, I copy them below:

POST:
https://{url}/documentintelligence/documentModels/{modelid}:analyze?api-version=2024-07-31-preview

GET:
https://{url}/documentintelligence/documentModels/{modelid}/analyzeResults/{output-cookie}?api-version=2024-07-31-preview