Tutorial: Scraping data
Getting basic data from websites.
In this tutorial, we're going to scrape data from this webpage: https://demo.goless.com/.
To get started, access the extension, open the dashboard and click on "New Workflow".
Your workflow will start with a trigger. A trigger is an action defining when and under what conditions your automation should execute. By default, the trigger is set to "Manual" mode, meaning the automation will only run when you initiate it yourself.
You have the option to select a different trigger or add multiple triggers for your automation. These could include intervals, schedules, context menus (right-click on web pages), specific dates, on browser start-up, or keyboard shortcuts.
Next, add the "New tab" block. When the automation starts, this block opens a new tab at the address you specify, which in this case is the page to scrape: https://demo.goless.com/.
Next, we add a "Loop elements" block. This block will iterate over the elements on the page as a list. We need to capture all blocks with the class .post. Thus, we specify the CSS Selector as .post.
To select this .post class, visit the demo.goless.com site and enable Element Selector in the extension.
We then hover our mouse over the desired block to obtain information about the required class.
Next, within the .post element, we need to get the title. To do this, we add a "Get text" block to our workflow. In the settings, we specify: {{ loopData@items }} .title. Here we instruct the script to take each element from the previous block (items in our case, which must already be set as the Loop ID of the "Loop elements" block) and search within it for the CSS class .title.
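As a rough illustration of what the "Loop elements" and "Get text" blocks do together, here is a minimal Python sketch using only the standard library. It is not how the extension is implemented. The sample markup, the PostTitleParser class and the extract_post_titles function are hypothetical, and the real demo page may be structured differently.

```python
from html.parser import HTMLParser

# Hypothetical markup for illustration; the real demo page may differ.
SAMPLE_HTML = """
<div class="post"><h2 class="title">First post</h2><p>Body one</p></div>
<div class="post"><h2 class="title">Second post</h2><p>Body two</p></div>
<div class="sidebar"><h2 class="title">Not a post</h2></div>
"""

# Tags that never have a closing tag, so they must not be tracked on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class PostTitleParser(HTMLParser):
    """Collects the text of .title elements nested inside .post elements."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._stack = []          # class sets of the currently open elements
        self._buffer = None       # text pieces of the title being read
        self._title_depth = None  # stack depth at which the title was opened

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        inside_post = any("post" in open_classes for open_classes in self._stack)
        self._stack.append(classes)
        if "title" in classes and inside_post and self._buffer is None:
            self._buffer = []
            self._title_depth = len(self._stack)

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._buffer is not None and len(self._stack) == self._title_depth:
            self.titles.append("".join(self._buffer).strip())
            self._buffer = None
        if self._stack:
            self._stack.pop()


def extract_post_titles(html):
    """Return the title text of every .post element, in page order."""
    parser = PostTitleParser()
    parser.feed(html)
    return parser.titles


titles = extract_post_titles(SAMPLE_HTML)
print(titles)
```

The sketch assumes well-formed markup. Note that the .title inside the sidebar is skipped because it is not nested in a .post element, which mirrors scoping the "Get text" selector to the current loop item.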
If you need to capture several fields and export them, you will need to set up a table. First, click the table icon in the top-right corner to define the table columns. Then, in the block settings, select the "Insert to table" checkbox and choose which column the data should be added to.
To terminate the loop, add a "Loop Breakpoint" block and specify the ID of the "Loop elements" block, which is items in our case.
The final block is the data export. You need to add an "Export data" block to download the gathered data upon completion.
And with that, our workflow setup is complete. When run, the automation saves the data from the website to a CSV file.
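For reference, the exported file is a plain table with one row per scraped element. The following Python sketch shows roughly what such an export looks like. It is only an approximation: the column name title, the sample rows and the rows_to_csv helper are assumptions, and the exact layout of the file produced by the extension may differ.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dict rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical table contents: one row per .post element, one column per field.
rows = [{"title": "First post"}, {"title": "Second post"}]
csv_text = rows_to_csv(rows, ["title"])
print(csv_text)
```

Each additional field captured with "Insert to table" would appear here as another key in every row and another name in fieldnames.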