How to Scrape Lists with Browserbear (A No Code Guide)

Some web scraping situations may call for grouping list items together in a field. Here are some tips for setting up nocode list scraping tasks with Browserbear.
by Julianne Youngberg · May 2023


    Web scraping most often involves extracting data from various elements on a webpage, with each one being a separate field that can be deposited in a database. But in some cases, it’s useful to group similar items together in a field, such as in the case of a list.

    Lists can be ordered, unordered, or description-based, and may contain nested lists. You might find them commonly used for things such as:

    • Batched items
    • Product descriptions
    • Recipe ingredients

    Since the number of list items can vary across different pages, it’s more practical to scrape the entire list as a single item rather than line by line.
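    To illustrate the idea in code (a minimal Python sketch using only the standard library — the HTML here is made up for demonstration, not taken from any particular site): however many `<li>` items a page happens to have, they collapse into one newline-separated field that fits a single database column.

    ```python
    from html.parser import HTMLParser

    class ListScraper(HTMLParser):
        """Collect the text of every <li> item encountered in the page."""
        def __init__(self):
            super().__init__()
            self.in_li = False
            self.items = []

        def handle_starttag(self, tag, attrs):
            if tag == "li":
                self.in_li = True
                self.items.append("")

        def handle_endtag(self, tag):
            if tag == "li":
                self.in_li = False

        def handle_data(self, data):
            if self.in_li:
                self.items[-1] += data.strip()

    page = """
    <ul class="ingredients">
      <li>2 cups flour</li>
      <li>1 tsp salt</li>
      <li>3 eggs</li>
    </ul>
    """

    scraper = ListScraper()
    scraper.feed(page)
    # Join the variable number of items into a single database-ready field.
    field = "\n".join(scraper.items)
    print(field)
    ```

    Whether the page has three ingredients or thirty, the output is always one field, which is what makes the list-as-a-single-item approach practical.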

    There are several methods to extract lists from webpages, but using a nocode web scraping tool like Browserbear can make the process much simpler. This article will guide you through setting up a list scraping task in Browserbear and provide tips for creating similar workflows on your own.

    Note: Not all lists are as they seem. A table of contents, for example, might look like a nested list on the frontend without being a true list; it may simply be styled to look like one using CSS or a CMS (content management system). Keep in mind that these can produce different results when scraped.

    How to Set up a List Scraping Task on Browserbear

    When scraping lists, it’s important to set tools up in a way that accurately selects all the data and maintains formatting when pulled into a database. We will use Browserbear to create a task that can be triggered by an event or on a schedule.

    Log into your Browserbear account (or create a free trial account if you don’t have one — no credit card required! 🎉). Click the Tasks tab, then the Create a Task button.

    Give your task a name, then Save.

    Screenshot of Browserbear new task setup with red arrow pointing to save

    You will now be on a page where you can add steps, integrate with other apps, run your task, and review completed runs.

    Let’s create a simple task to scrape a list of ingredients from a recipe site. Click Add Step and set up the following actions:

    Step 1. Go

    Select go as the Action, insert your destination website URL, and choose domcontentloaded as the wait instructions.

    Screenshot of Browserbear go action setup

    Click Save.

    Bear Tip 🐻: We will use domcontentloaded as the wait instructions in this situation because we are only scraping text. Other options include networkidle and load; networkidle is the safest choice in most situations because it waits for network activity to stop before proceeding.

    Step 2. Save Structured Data

    This step defines a parent container and the child elements you want to scrape within it. We only need to set up the action for one container; it will then apply to every other container on the page with the same elements.
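    As a rough mental model (the exact field names are hypothetical here — your output keys will match the names you assign in the Data Picker), the result of a structured-data step is a list with one object per matching parent container, and one key per child element:

    ```python
    import json

    # Illustrative output shape: one object per parent container that matched,
    # keyed by the (made-up) names given to the child elements.
    results = [
        {"title": "Basic Dough", "ingredients": "2 cups flour\n1 tsp salt\n3 eggs"},
        {"title": "Simple Custard", "ingredients": "1 cup sugar\n2 eggs"},
    ]

    # Each object maps naturally onto one database row.
    for row in results:
        print(json.dumps(row))
    ```

    This is why defining the action on a single container is enough: every container with the same child elements contributes one more object to the list.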

    You can also use Zapier to override the task’s initial destination URL and send it to another page with the same elements, which is what was done in this example.

    Bear Tip 🐻: We are using the save_structured_data action to save multiple items in a structured format, but the save_text action will also work if you’re only looking to scrape the list.

    Select save_structured_data as the Action.

    You will need to use the Browserbear Helper Chrome extension to retrieve the helper configs. Activate the extension on the fully loaded destination site.

    Screenshot of Allrecipes site with red arrow pointing to Browserbear Helper Chrome extension

    Hover over the parent container until a blue outline indicates the selection. Take special care to make sure the entire list you are scraping is within the container.

    Screenshot of Allrecipes site with Browserbear Helper blue outline around parent container

    Click the element, then copy the config in the popup window.

    Screenshot of Browserbear Helper config

    Paste the config into the Helper section of your save_structured_data step.

    Screenshot of Browserbear save_structured_data action setup with red box around helper config

    Your parent container has been defined; now, we have to identify the individual HTML elements we want to scrape.

    Return to your destination site and make sure the extension is activated. Hover over the list you want to scrape and click when a blue outline surrounds the entire block.

    Screenshot of Allrecipes site with blue outline around ingredients element

    Copy the config and paste it into the Helper Config section of the Data Picker. Add a name and specify the type of data you want to pull (the default is Text).

    Screenshot of Browserbear save_structured_data action setup with red box around data picker

    Click Add Data.

    Repeat this step for any other child HTML elements you want scraped, then click Save once everything has successfully loaded into the log on the right side of the task builder.

    Screenshot of Browserbear save_structured_data action setup with red box around log

    Return to your task page, then click Run Task to test your automation.

    Screenshot of Browserbear task page

    A run will appear at the bottom of your task page, showing whether it completed successfully.

    Screenshot of Browserbear task page with red arrow pointing to Log

    If you click Log, you’ll be able to see the output of your browser task.

    Note that the output reflects the page’s HTML structure; escape sequences like \n signify line breaks.
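    Because line breaks arrive as \n sequences, a single scraped field can be split back into individual items downstream if you ever need them separately. A minimal Python sketch:

    ```python
    # A scraped list field as it might appear in the task output (illustrative).
    scraped = "2 cups flour\n1 tsp salt\n3 eggs"

    # Split on the line breaks to recover the individual list items.
    items = [line.strip() for line in scraped.split("\n") if line.strip()]
    print(items)  # ['2 cups flour', '1 tsp salt', '3 eggs']
    ```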

    Screenshot of Browserbear log with red box around output

    To make it easier to view, we’ve used Zapier to automatically send the structured data to Airtable.

    Screenshot of Airtable record with red box around ingredients

    As you can see, the list items are still properly formatted with line breaks, making it easy for you to view and manipulate according to your needs.

    3 Tips for Scraping Lists from Web Pages

    Scraping lists from web pages can be challenging because it requires ensuring the accuracy of extracted data as well as correct formatting. Here are some tips to increase the accuracy of your automation and ensure you get the expected output:

    1 - Select Containers that Work with a Variety of Webpages

    Consider the characteristics of the website you're scraping before deciding on the save action and setting up the automation. Does a single page have multiple parent containers with lists you want to scrape? Or is there only one list you're extracting from multiple pages?

    Choosing an action and container that work reliably will help. When using the Chrome extension to select the config of the list element, make sure the blue outline contains the entire list, without any items left out.

    Screenshot of Allrecipes site with blue outline around ingredients container

    Test the selection across multiple pages to ensure lists can be fully extracted. Keep in mind that this is only possible if the webpages have identical structures.

    2 - Override the “Go” Action with Zapier

    When scraping multiple non-consecutive webpages, you'll need to override the initial destination URL. You can do this with the REST API, or with Zapier for a no-code option.

    The zap to override the “go” action is simple: a trigger that activates on a schedule or in response to an event, followed by an action that signals Browserbear to launch a task run.
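    If you take the REST API route instead of Zapier, the run request is what carries the override. The endpoint path and payload field below are assumptions based on common REST conventions, not confirmed Browserbear API details — check the official API reference for the exact shape before using this:

    ```python
    import json
    from urllib import request

    API_KEY = "your_api_key"    # placeholder, not a real key
    TASK_UID = "your_task_uid"  # placeholder task identifier

    # Hypothetical payload: many run-creation APIs accept a URL override like
    # this; confirm the exact field name against Browserbear's API docs.
    payload = {"url": "https://example.com/another-recipe"}

    req = request.Request(
        f"https://api.browserbear.com/v1/tasks/{TASK_UID}/runs",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # request.urlopen(req) would submit the run; it is not called here.
    ```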

    Screenshot of Zapier flow with Airtable new record trigger and Browserbear create run action

    Set up the action to map to a dynamic field with links to other webpages you want to scrape. In this example, we're using Airtable.

    Screenshot of Zapier Browserbear create run action setup

    When links are added to the database, they override the destination URL and trigger a new task run.

    3 - Use Database Formulas to Adjust Formatting

    The list you extract from a webpage is in HTML format and may require some formatting adjustments. If you need to adjust spacing, remove unnecessary text, or make other formatting changes, it's often easier to do so in the database.

    Familiarize yourself with the syntax and a few basic formulas in your database of choice. In Airtable, for instance, \n denotes a line break, and REGEX formulas can be useful for correcting irregularities in your lists. Formulas like TRIM and SUBSTITUTE can also be used to remove unnecessary text.
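    The same cleanup that TRIM and SUBSTITUTE perform can be expressed in Python, which is a handy way to prototype a transformation before translating it into a database formula (the sample string is illustrative):

    ```python
    raw = "  2 cups flour\n1 tsp salt\n3 eggs  "

    # TRIM equivalent: strip leading and trailing whitespace.
    trimmed = raw.strip()

    # SUBSTITUTE equivalent: replace unwanted text, here swapping
    # line breaks for comma separators.
    as_csv = trimmed.replace("\n", ", ")
    print(as_csv)  # 2 cups flour, 1 tsp salt, 3 eggs
    ```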

    Screenshot of Airtable TRIM and SUBSTITUTE formula

    Check the task run log to see how the extracted data is formatted, making it easier to adjust the result to your needs.

    Scrape Lists Accurately with a Browser Automation Tool

    Many data scraping use cases will still involve saving each element as its own item. However, there are still situations where you might have to group similar things together. By selecting the right containers and setting up the appropriate actions, you can easily and accurately extract batched items, product descriptions, recipe ingredients, and other list-based information.

    With these tips and tools, you’ll be on your way to efficiently and effectively scraping lists from the web with Browserbear.

    About the author: Julianne Youngberg (@paradoxicaljul)
    Julianne is a technical content specialist fascinated with digital tools and how they can optimize our lives. She enjoys bridging product-user gaps using the power of words.

