How To Extract Data From A Web Page Using Power Automate
Power Automate cloud flows can extract data from a web page. Use an HTTP action to get the web page HTML, then pull out the details you need with an AI Prompt. This technique only works on web pages whose text is included in the HTML returned when the page first loads. When the text is generated after the page loads (by JavaScript, client-side rendering, etc.), you will need to use Power Automate Desktop instead.
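Before building the flow, one rough way to check whether a page qualifies is to search the raw HTML for a phrase you can see in the browser. The Python sketch below is an illustration outside Power Automate; the function name phrase_in_initial_html and the two sample HTML strings are hypothetical.

```python
def phrase_in_initial_html(html: str, phrase: str) -> bool:
    # Case-insensitive substring check on the raw, unrendered HTML.
    # A rough heuristic: entities (&amp;) or tags inside the phrase can cause false negatives.
    return phrase.casefold() in html.casefold()


# Hypothetical samples: text present in the initial HTML vs. text injected later by a script.
server_rendered = "<html><body><p>Contracts awarded today</p></body></html>"
client_rendered = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'

print(phrase_in_initial_html(server_rendered, "contracts awarded"))  # True
print(phrase_in_initial_html(client_rendered, "contracts awarded"))  # False
```

If the phrase is missing from the raw HTML, an HTTP request alone is unlikely to capture it.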
Table of Contents
• Introduction: The Get Contract Details Automation
• Get The Web Page HTML Using The HTTP Action
• Convert The Web Page HTML to Plain Text
• Create An AI Prompt To Extract Data From Web Page Text
• Input The Web Page Text As A Variable In An AI Prompt
• Run The Power Automate Flow To Extract Web Page Data
Introduction: The Get Contract Details Automation
A construction firm uses a Power Automate cloud flow to read a government website and extract details about the contracts awarded each day.
The contract details are output in JSON format. From there, the construction firm can save the contracts to a SharePoint list, a Dataverse table, or another data store.
Get The Web Page HTML Using The HTTP Action
Open Power Automate and create a new instant flow named Get Webpage Plain Text.
Add an HTTP action to the flow to get the web page HTML.
Use the GET method to request the web page.
GET
Input the URL of the web page we want to extract data from.
https://www.defense.gov/News/Contracts/Contract/Article/4032268/
Then add the header Accept with the value text/html. This tells the server it should respond with the web page HTML.
Key | Value
Accept | text/html
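For readers who want to reproduce the HTTP action's request outside Power Automate, here is a minimal Python sketch using the standard library's urllib.request with the same method, URL and Accept header. build_request and fetch_html are hypothetical helper names; fetch_html needs network access, and some sites may reject requests that lack a browser-like User-Agent.

```python
import urllib.request

CONTRACTS_URL = "https://www.defense.gov/News/Contracts/Contract/Article/4032268/"


def build_request(url: str) -> urllib.request.Request:
    # Same settings as the HTTP action: GET method plus an Accept: text/html header.
    return urllib.request.Request(url, method="GET", headers={"Accept": "text/html"})


def fetch_html(url: str) -> str:
    # Send the request and decode the body using the charset the server reports.
    with urllib.request.urlopen(build_request(url), timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


request = build_request(CONTRACTS_URL)
print(request.get_method(), request.get_header("Accept"))  # GET text/html
```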
Convert The Web Page HTML to Plain Text
The HTTP action returns the full HTML of the web page, but we only want the text displayed to the user. Insert an Html to text action into the flow, then supply the Body of the HTTP action in its Content field.
The web page content is output as plain text with all of the HTML tags removed.
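As a rough illustration of what this conversion does (not how the Html to text action is implemented), the following Python sketch uses the standard html.parser module to keep text nodes, decode entities such as &amp;, and skip script and style contents. TextExtractor, html_to_text and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode entities such as &amp;
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-blank text that is not inside a skipped tag.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


sample = (
    "<html><head><style>p {color: red}</style></head><body>"
    "<h1>Contracts</h1><p>ACME &amp; Co. was awarded $1,000.</p>"
    "<script>var x = 1;</script></body></html>"
)
print(html_to_text(sample))
# Contracts
# ACME & Co. was awarded $1,000.
```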
Create An AI Prompt To Extract Data From Web Page Text
We want to extract the details of all contracts from the web page text. To do this, we can build an AI Prompt. Go to the AI Prompts menu and choose Build your own prompt.
Name the AI Prompt Extract Government Contract Announcement Details.
Use this prompt to tell the large language model which details it should output.
I want you to read an announcement of contracts awarded by the Department of Defense. Make a list of all the contracts and provide the following information if it is available:
Contractor Name: the name of the company that was awarded the contract.
Contractor City: the city where the Contractor is located.
Contractor State: the state where the Contractor is located. Must be a 2-letter state code.
Amount: the amount awarded to the Contractor. Must be a number value.
Completion Date: when the work is expected to be completed. Date must be in the format YYYY-MM-DD.
Work Description: the goods or services being provided by the Contractor.
Work City: the city where the work will be performed.
Work State: the state where the work will be performed.
Contracting Branch: the branch of the military that contracted the work.
Contracting Agency: the government agency that contracted the work.
Contracting Activity: the alphanumeric unique identifier of the contracting activity.
Contract: the alphanumeric unique identifier of the contract.
Here is the announcement of contracts text:
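The prompt's formatting rules (2-letter state codes, numeric amounts, YYYY-MM-DD dates) can be spot-checked before the data is saved. This Python sketch is an illustration under assumptions: the Contract dataclass, its snake_case field names, the format_problems helper and the sample values are all hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Contract:
    # One field per detail the prompt asks for; None when it is not in the announcement.
    contractor_name: Optional[str] = None
    contractor_city: Optional[str] = None
    contractor_state: Optional[str] = None  # 2-letter state code
    amount: Optional[float] = None          # number value, not "$1.5M"
    completion_date: Optional[str] = None   # YYYY-MM-DD
    work_description: Optional[str] = None
    work_city: Optional[str] = None
    work_state: Optional[str] = None
    contracting_branch: Optional[str] = None
    contracting_agency: Optional[str] = None
    contracting_activity: Optional[str] = None
    contract: Optional[str] = None


def format_problems(c: Contract) -> list[str]:
    """Return the prompt's formatting rules that this record breaks."""
    problems = []
    if c.contractor_state is not None and not re.fullmatch(r"[A-Z]{2}", c.contractor_state):
        problems.append("contractor_state is not a 2-letter code")
    if c.amount is not None and (isinstance(c.amount, bool) or not isinstance(c.amount, (int, float))):
        problems.append("amount is not a number")
    if c.completion_date is not None:
        valid = re.fullmatch(r"\d{4}-\d{2}-\d{2}", c.completion_date) is not None
        if valid:
            try:
                date.fromisoformat(c.completion_date)  # rejects impossible dates like 2026-02-30
            except ValueError:
                valid = False
        if not valid:
            problems.append("completion_date is not a valid YYYY-MM-DD date")
    return problems


ok = Contract(contractor_name="Example Corp", contractor_state="VA", amount=1500000.0, completion_date="2026-09-30")
bad = Contract(contractor_state="Virginia", amount="$1.5M", completion_date="09/30/2026")
print(format_problems(ok))        # []
print(len(format_problems(bad)))  # 3
```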
Input The Web Page Text As A Variable In An AI Prompt
The web page text must be included in the prompt for the AI to evaluate it. To do this, create a new input variable named ContractsText and insert it at the end of the prompt. Paste the web page plain text in as sample data so the prompt can be tested.
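When the prompt runs, the value of the input variable takes its place in the prompt text. As an illustration, this Python sketch assembles the same kind of final prompt string by appending the page text after the instructions; build_prompt is a hypothetical helper, and the empty-text check is an added assumption, not a Power Automate feature.

```python
def build_prompt(instructions: str, contracts_text: str) -> str:
    """Append the web page text (the ContractsText input) to the end of the instructions."""
    text = contracts_text.strip()
    if not text:
        # An empty page body often means the content is rendered later by JavaScript.
        raise ValueError("ContractsText is empty; nothing for the model to read")
    return f"{instructions.rstrip()}\n{text}"


instructions = "Make a list of all the contracts.\nHere is the announcement of contracts text:"
print(build_prompt(instructions, "  Example Corp was awarded $1,000.  "))
```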
Set the output format to JSON and then test the prompt. The contract details now appear in the prompt response. Press the Save custom prompt button and exit the prompt editor.
Run The Power Automate Flow To Extract Web Page Data
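Back in the flow, add the AI Builder Run a prompt action after the Html to text action. Select the Extract Government Contract Announcement Details prompt and pass the output of the Html to text action into the ContractsText input. The flow to extract contract details from the web page is now complete. Run the flow to make sure it works.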
The AI Prompt outputs a JSON array of contracts. Each item can be saved to SharePoint as a new list item, to Dataverse as a new row, etc.
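As an illustration of this last step, the Python sketch below parses the prompt's JSON array and maps each contract to a dictionary of column values. It assumes the JSON keys match the field names in the prompt (e.g. Contractor Name), which the model is not guaranteed to follow; the SharePoint column names, the helper functions and the sample response are hypothetical.

```python
import json


def parse_contracts(response_text: str) -> list[dict]:
    """Parse the prompt's JSON response and return the contract objects."""
    data = json.loads(response_text)
    # Assumption: accept either a bare array or an object that wraps the array.
    if isinstance(data, dict):
        data = next((v for v in data.values() if isinstance(v, list)), [])
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of contracts")
    return [item for item in data if isinstance(item, dict)]


def to_list_item(contract: dict) -> dict:
    # Hypothetical SharePoint column names; match them to your own list.
    return {
        "Title": contract.get("Contract"),
        "ContractorName": contract.get("Contractor Name"),
        "Amount": contract.get("Amount"),
        "CompletionDate": contract.get("Completion Date"),
    }


response = (
    '[{"Contractor Name": "Example Corp", "Amount": 1500000, '
    '"Completion Date": "2026-09-30", "Contract": "EXAMPLE-0001"}]'
)
items = [to_list_item(c) for c in parse_contracts(response)]
print(items[0]["ContractorName"])  # Example Corp
```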