How To Extract Data From Web Page Using Power Automate
Power Automate cloud flows can extract data from a web page. Get the web page HTML with an HTTP action, then extract its details with an AI Prompt. This technique only works on web pages where the text is included in the initial page load. When the text is generated after the page has loaded (JavaScript, client-side rendering, etc.), you will need to use Power Automate Desktop instead.
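One rough way to check whether a page qualifies is to search the raw HTML response for a phrase you can see in the browser. The Python sketch below illustrates the idea outside of Power Automate. The function name and the sample HTML are made up for illustration, and the tag stripping is deliberately crude.

```python
import re


def phrase_in_initial_html(html: str, phrase: str) -> bool:
    """Return True if `phrase` appears in the visible text of the raw HTML."""
    # Drop script and style blocks so text embedded in code does not count.
    without_code = re.sub(
        r"<(script|style)\b.*?</\1\s*>", " ", html,
        flags=re.DOTALL | re.IGNORECASE,
    )
    # Crudely replace every remaining tag with a space.
    without_tags = re.sub(r"<[^>]+>", " ", without_code)

    def normalize(value: str) -> str:
        # Collapse whitespace and ignore case before comparing.
        return " ".join(value.split()).lower()

    return normalize(phrase) in normalize(without_tags)


# Hypothetical responses: one rendered on the server, one filled in by JavaScript.
server_html = "<html><body><p>Acme Corp. was awarded a contract</p></body></html>"
client_html = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'

print(phrase_in_initial_html(server_html, "awarded a contract"))
print(phrase_in_initial_html(client_html, "awarded a contract"))
```

If the phrase is missing from the raw HTML, the text is probably rendered client-side and the cloud flow approach is unlikely to work.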
Table of Contents
• Introduction: The Get Contract Details Automation
• Get The Web Page HTML Using The HTTP Action
• Convert The Web Page HTML to Plain Text
• Create An AI Prompt To Extract Data From Web Page Text
• Input The Web Page Text As A Variable In An AI Prompt
• Run The Power Automate Flow To Extract Web Page Data
Introduction: The Get Contract Details Automation
A construction firm uses a Power Automate cloud flow to read a government website and extract details about the contracts awarded each day.
The contract details are output in JSON format. From there the construction firm can save the contracts to a SharePoint list, a Dataverse table, etc.
Get The Web Page HTML Using The HTTP Action
Open Power Automate and create a new instant flow named Get Webpage Plain Text.
Add an HTTP action to the flow to get the web page HTML.
Use the GET method to request the web page.
GET
Input the URL of the website we want to extract data from.
https://www.defense.gov/News/Contracts/Contract/Article/4032268/
Then add a header with the key Accept and the value text/html. This tells the server to respond with the web page HTML.
Key | Value |
Accept | text/html |
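For readers who think in code, the request configured above is roughly equivalent to the Python sketch below. It is only an illustration of what the HTTP action sends, not how Power Automate implements it. The function names are hypothetical, and the 30-second timeout is an assumption.

```python
import urllib.request

CONTRACT_URL = "https://www.defense.gov/News/Contracts/Contract/Article/4032268/"


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request that asks the server for HTML."""
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Accept": "text/html"},
    )


def fetch_html(url: str, timeout: float = 30.0) -> str:
    """Send the request and return the response body as text.

    Raises urllib.error.HTTPError when the server refuses the request
    (for example with a 403 status).
    """
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


request = build_request(CONTRACT_URL)
print(request.get_method(), request.full_url)
print(request.get_header("Accept"))
```

Calling fetch_html(CONTRACT_URL) would perform the actual download. It is left out of the example run so the sketch works without network access.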
Convert The Web Page HTML to Plain Text
The HTTP action returns the HTML for the web page, but we only want the text displayed to the user. Insert an HTML to text action into the flow. Then pass the body of the HTTP action into the Content field.
The web page content is output as plain text with all of the HTML tags removed.
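To make the conversion concrete, here is a minimal Python sketch of the same idea using the standard library HTML parser. It is not the implementation behind the HTML to text action. The class name, the list of block-level tags, and the sample HTML are assumptions chosen for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping script and style content."""

    SKIPPED = {"script", "style"}
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")  # block elements start a new line

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return the text of an HTML document with all tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace inside each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


sample_html = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Contracts</h1><p>Acme Corp., <b>Dayton</b>, Ohio</p>"
    "<script>var x = 1;</script></body></html>"
)
print(html_to_text(sample_html))
```

Skipping script and style content matters because that text is never shown to the user and would only add noise to the prompt.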
Create An AI Prompt To Extract Data From Web Page Text
We want to extract the details of all contracts from the contract text. To do this we can build an AI Prompt. Go to the AI Prompts menu and choose build your own prompt.
Name the AI Prompt Extract Government Contract Announcement Details.
Use this prompt to tell the large language model which details to output.
I want you to read an announcement of contracts awarded by the Department of Defense. Make a list of all the contracts and provide the following information if it is available:
Contractor Name: the name of the company who was awarded the contract.
Contractor City: the city where the Contractor is located.
Contractor State: the state where the Contractor is located. Must be a 2-letter state code.
Amount: the amount awarded to the Contractor. Must be a number value.
Completion Date: when the work is expected to be completed. Date must be in the format YYYY-MM-DD
Work Description: the goods or services being provided by the Contractor
Work City: the city where the work will be performed
Work State: the state where the work will be performed
Contracting Branch: the branch of the military that contracted the work
Contracting Agency: the government agency who contracted the work
Contracting Activity: the alphanumeric unique identifier of the contracting activity
Contract: the alphanumeric unique identifier of the contract
Here is the announcement of contracts text:
Input The Web Page Text As A Variable In An AI Prompt
The web page text must be included in the prompt for the AI to evaluate it. To do this, create a new input variable named ContractsText and insert it at the end of the prompt. Use the web page plain text as sample data.
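Conceptually, the input variable works like a placeholder that is filled in each time the prompt runs. The Python sketch below shows that idea with a shortened stand-in for the instructions. How the AI Prompt performs the substitution internally is not documented here, so treat the template syntax and the function name as assumptions.

```python
from string import Template

# Shortened stand-in for the prompt instructions.
# $ContractsText marks where the input variable is inserted.
PROMPT_TEMPLATE = Template(
    "I want you to read an announcement of contracts awarded by the "
    "Department of Defense. Make a list of all the contracts.\n"
    "Here is the announcement of contracts text:\n"
    "$ContractsText"
)


def build_prompt(contracts_text: str) -> str:
    """Insert the web page plain text at the end of the prompt."""
    return PROMPT_TEMPLATE.substitute(ContractsText=contracts_text.strip())


# Made-up sample text standing in for the web page plain text.
prompt = build_prompt("  Acme Corp. was awarded a $5 million contract.  ")
print(prompt)
```

Placing the variable at the end keeps the instructions together and lets the model read them before the long announcement text.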
Choose the output format JSON and then test the prompt. We can now see the contract details output in the prompt response. Press the save custom prompt button and exit the prompt editor menu.
Run The Power Automate Flow To Extract Web Page Data
We are now done building the flow to extract contract details from the web page. Run the flow to ensure it works.
The AI Prompt outputs a JSON array of contracts. Each item can then be saved to SharePoint as a new list item, to Dataverse as a new record, etc.
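Because a language model produces the output, it can be worth checking each item against the rules in the prompt before saving it. The Python sketch below shows one way to do that. The property names (ContractorName, ContractorState, Amount, CompletionDate) and the sample records are assumptions for illustration, since the actual names depend on what the AI Prompt returns.

```python
import json
import re
from datetime import datetime


def parse_contracts(json_text: str) -> list:
    """Parse the prompt response and confirm it is a JSON array."""
    data = json.loads(json_text)
    if not isinstance(data, list):
        raise ValueError("Expected a JSON array of contracts")
    return data


def find_problems(contract: dict) -> list:
    """Return a list of rule violations for one contract (empty if none)."""
    problems = []
    state = contract.get("ContractorState")
    if state is not None and not re.fullmatch(r"[A-Z]{2}", str(state)):
        problems.append("ContractorState is not a 2-letter code")
    amount = contract.get("Amount")
    is_number = isinstance(amount, (int, float)) and not isinstance(amount, bool)
    if amount is not None and not is_number:
        problems.append("Amount is not a number")
    completion = contract.get("CompletionDate")
    if completion is not None:
        try:
            datetime.strptime(str(completion), "%Y-%m-%d")
        except ValueError:
            problems.append("CompletionDate is not YYYY-MM-DD")
    return problems


# Made-up sample response: the first item follows the rules, the second does not.
sample_response = """[
  {"ContractorName": "Example Builders LLC", "ContractorState": "OH",
   "Amount": 1250000, "CompletionDate": "2026-03-31"},
  {"ContractorName": "Sample Co", "ContractorState": "Ohio",
   "Amount": "1,000", "CompletionDate": "03/31/2026"}
]"""

for contract in parse_contracts(sample_response):
    print(contract["ContractorName"], find_problems(contract))
```

In a cloud flow, the equivalent step would be a condition inside the loop that skips or flags items that fail the checks.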
Questions?
If you have any questions or feedback about How To Extract Data From Web Page Using Power Automate please leave a message in the comments section below. You can post using your email address and are not required to create an account to join the discussion.
This is a super cool idea.
Thank you for explaining each step so clearly.
Awesome content! I was able to get data from this very article. However, it does not always work. For example, I wasn’t able to extract data from a local online store. The request returned error 403.
Denis,
Correct, it will not work for every site. You just have to try it and see.
Hello Matthew,
Thank you for this post and everything at your site. I have been assisted many times by the content here. Question about this post … how can you tell if it is a web page where text is included when the initial page is loaded? What about this site? https://community.powerplatform.com/profile/#notifications
Thanks!
Chris,
There’s no quicker way to know than to build the flow and plug in the URL. That’s what I do!
Thanks for the reply!
Rats … this site didn’t work. Desktop here I come I suppose.
community.powerplatform.com/profile/#notifications
Trying to make a synopsis of the day’s posts at that site.
Nice use of an AI prompt. Thanks for sharing!
Great article. I have limited experience with AI Prompts especially with token usage. The Microsoft documentation reads that AI Prompts don’t use tokens in the context of Copilot Studio. Am I interpreting the documentation correctly that using your solution with Power Automate would consume AI Builder credits?
Mike,
Correct, in Power Automate AI Prompts use credits.
The current cost for GPT 4o mini is:
$0.50 per million input tokens
$1.50 per million output tokens