How To Extract Data From Web Page Using Power Automate

Power Automate cloud flows can be used to extract data from a web page. You can get the web page HTML with an HTTP action, convert it to plain text, and then extract the details you need with an AI Prompt. This technique only works on web pages where the text is included in the initial page load. When the text is generated after the page has loaded (JavaScript, client-side rendering, etc.) you will need to use Power Automate Desktop instead.

Table of Contents
• Introduction: The Get Contract Details Automation
• Get The Web Page HTML Using The HTTP Action
• Convert The Web Page HTML to Plain Text
• Create An AI Prompt To Extract Data From Web Page Text
• Input The Web Page Text As A Variable In An AI Prompt
• Run The Power Automate Flow To Extract Web Page Data




Introduction: The Get Contract Details Automation

A construction firm uses a Power Automate cloud flow to read a government website and extract details about the contracts awarded each day.



The contract details are output in JSON format. From there the construction firm can save the contracts to a SharePoint list, a Dataverse table, etc.




Get The Web Page HTML Using The HTTP Action

Open Power Automate and create a new instant flow named Get Webpage Plain Text.



Add an HTTP action to the flow to get the web page HTML.



Use the GET method to request the web page.

GET



Input the URL of the website we want to extract data from.

https://www.defense.gov/News/Contracts/Contract/Article/4032268/



Then add a header with the key Accept and the value text/html. This tells the server it should respond with the web page HTML.

Key       Value
Accept    text/html
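
For readers who want to see the same request outside of Power Automate, here is a minimal Python sketch of a GET request with the Accept header set to text/html. It is an illustration only, not part of the flow. The function names build_request and fetch_html are hypothetical, and the sketch assumes the server answers a plain request; some servers may also expect other headers, such as a User-Agent.

```python
import urllib.request

# URL taken from the step above.
CONTRACTS_URL = "https://www.defense.gov/News/Contracts/Contract/Article/4032268/"


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request whose Accept header asks the server for HTML."""
    return urllib.request.Request(
        url,
        headers={"Accept": "text/html"},
        method="GET",
    )


def fetch_html(url: str, timeout: float = 30.0) -> str:
    """Send the request and return the response body decoded as text."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch_html(CONTRACTS_URL) would return the raw HTML as a string, which corresponds to the body output of the HTTP action.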




Convert The Web Page HTML to Plain Text

The HTTP action will return the HTML for the web page but we only want the text displayed to the user. Insert an HTML to text action into the flow. Then supply the body of the HTTP action into the content field.



The web page content is output as plain text with all of the HTML tags removed.
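
To illustrate what this conversion does conceptually, here is a rough Python sketch that strips tags with the standard library HTML parser. It is not how the Power Automate action is implemented, and its output will not match the action exactly. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text and ignore the contents of script and style tags."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside script and style blocks.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return "\n".join(self._chunks)


def html_to_text(html: str) -> str:
    """Return the text content of an HTML document, one text chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Passing the HTML returned by the HTTP request to html_to_text gives a plain-text version of the page that can be handed to the prompt.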




Create An AI Prompt To Extract Data From Web Page Text

We want to extract the details of all contracts from the contract text. To do this we can build an AI Prompt. Go to the AI Prompts menu and choose build your own prompt.



Name the AI Prompt Extract Government Contract Announcement Details.



Use this prompt to tell the large language model which details to output.

I want you to read an announcement of contracts awarded by the Department of Defense. Make a list of all the contracts and provide the following information if it is available:

Contractor Name: the name of the company that was awarded the contract.
Contractor City: the city where the Contractor is located.
Contractor State: the state where the Contractor is located. Must be a 2-letter state code.
Amount: the amount awarded to the Contractor. Must be a number value.
Completion Date: when the work is expected to be completed. Date must be in the format YYYY-MM-DD
Work Description: the goods or services being provided by the Contractor
Work City: the city where the work will be performed
Work State: the state where the work will be performed
Contracting Branch: the branch of the military that contracted the work
Contracting Agency: the government agency that contracted the work
Contracting Activity: the alphanumeric unique identifier of the contracting activity
Contract: the alphanumeric unique identifier of the contract

Here is the announcement of contracts text:




Input The Web Page Text As A Variable In An AI Prompt

The web page text must be included in the prompt for the AI to evaluate it. To do this, create a new input variable named ContractsText and insert it at the end of the prompt. Use the web page plain text as sample data.
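
The idea of an input variable placed at the end of the prompt can be sketched in Python as a template with a single placeholder. This is only an analogy for how the ContractsText variable is filled in at run time. The names PROMPT_TEMPLATE and build_prompt are hypothetical, and the field list is shortened here.

```python
from string import Template

# Shortened version of the prompt; the full field list is given in the previous section.
PROMPT_TEMPLATE = Template(
    "I want you to read an announcement of contracts awarded by the "
    "Department of Defense. Make a list of all the contracts and provide "
    "the following information if it is available:\n\n"
    "Contractor Name: the name of the company that was awarded the contract.\n"
    "Amount: the amount awarded to the Contractor. Must be a number value.\n\n"
    "Here is the announcement of contracts text:\n\n"
    "$ContractsText"
)


def build_prompt(contracts_text: str) -> str:
    """Insert the web page plain text at the end of the prompt."""
    return PROMPT_TEMPLATE.substitute(ContractsText=contracts_text.strip())
```

Because the substituted value is inserted as-is, dollar signs inside the contract text (for example in award amounts) are not treated as placeholders.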



Choose the output format JSON and then test the prompt. We can now see the contract details output in the prompt response. Press the save custom prompt button and exit the prompt editor menu.




Run The Power Automate Flow To Extract Web Page Data

We are now done building the flow to extract contract details from the web page. Run the flow to ensure it works.



A JSON array of contracts is output by the AI Prompt and each item can be saved to SharePoint as a new list item or to Dataverse as a new record, etc.
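
Before saving each item, it can be worth checking that the output follows the format rules stated in the prompt. The Python sketch below shows one way to do that. It assumes the JSON keys match the field labels in the prompt (for example "Contractor State"), which the article does not confirm, so adjust the key names to whatever your prompt actually returns. The function names are hypothetical.

```python
import json
import re
from datetime import datetime

STATE_CODE = re.compile(r"[A-Z]{2}")
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_contracts(raw_json: str) -> list:
    """Parse the prompt response and confirm it is a JSON array."""
    contracts = json.loads(raw_json)
    if not isinstance(contracts, list):
        raise ValueError("expected a JSON array of contracts")
    return contracts


def validate_contract(contract: dict) -> list:
    """Return a list of problems found in one contract (empty when it looks valid)."""
    problems = []

    # Assumed key name: the prompt asks for a 2-letter state code.
    state = contract.get("Contractor State")
    if state is not None and not STATE_CODE.fullmatch(str(state)):
        problems.append("Contractor State must be a 2-letter state code")

    # Assumed key name: the prompt asks for a number value.
    amount = contract.get("Amount")
    if amount is not None and (
        isinstance(amount, bool) or not isinstance(amount, (int, float))
    ):
        problems.append("Amount must be a number")

    # Assumed key name: the prompt asks for the format YYYY-MM-DD.
    completion = contract.get("Completion Date")
    if completion is not None:
        text = str(completion)
        try:
            if not ISO_DATE.fullmatch(text):
                raise ValueError(text)
            datetime.strptime(text, "%Y-%m-%d")
        except ValueError:
            problems.append("Completion Date must be in the format YYYY-MM-DD")

    return problems
```

Fields that are missing are not reported as problems, because the prompt only asks for information "if it is available".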





Questions?

If you have any questions or feedback about How To Extract Data From Web Page Using Power Automate please leave a message in the comments section below. You can post using your email address and are not required to create an account to join the discussion.

Matthew Devaney
