How to Scrape Data from a Website Using Browserbear (Part 1)

Data scraping has become essential for businesses and organizations of all sizes, serving purposes such as e-commerce price comparison, real estate data analysis, and market research. In this article, we'll learn how to scrape data using Browserbear.
by Josephine Loo · February 2023

    Data scraping, also known as web scraping, is a technique used to extract large amounts of data from websites and various sources, transforming unstructured data into a structured format. This process can be automated and makes collecting information from various sources more efficient.

    To make it more efficient, you can use a cloud-based data scraping tool. It provides greater accessibility, scalability, and flexibility. One such tool is Browserbear, a cloud-based browser automation tool that you can use to automate various browser tasks, including scraping data.

    In this tutorial, we will learn how to scrape data from a website using Browserbear. As an example, we will use this Browserbear playground to demonstrate how the job title, company, location, salary, and link of each posting on a job board can be saved as structured data.

    scraping a job board (Browserbear playground)

    What is Browserbear

    Browserbear is a scalable, cloud-based browser automation tool that helps you automate any browser task. From automated website testing and data scraping to scheduled screenshots for archiving, Browserbear can handle almost any repetitive browser task.

    Similar to Bannerbear, the automated image and video generation tool from the same company, you can easily create no-code automated workflows by integrating it with other apps like Google Sheets, Airtable, and more on Zapier. Besides that, you can also use the REST API to trigger tasks to run in the cloud and receive data from the completed tasks.

    Scraping a Website Using Browserbear

    You will need a Browserbear account to follow this tutorial. If you don't have one, you can create a free account.

    1. Create a Task

    After logging in to your account, go to Tasks and create a new task.

    create a task

    Enter a name for your task and click the “Save” button.

    enter a name for the new task

    2. Add Steps - Go to URL

    Click “Add Step” to add your first step. Then, select “go” from the Action dropdown and enter the URL of the website that you want to scrape (https://playground.browserbear.com/jobs/).

    step 1 (go to URL)

    For the Wait Until option, select “networkidle” to wait until no new network requests are made for 500ms.

    Depending on how a website is coded, you can also use other options, like:

    • load - waits for the browser load event to fire.
    • domcontentloaded - waits for the DOMContentLoaded event to fire.

    This will go to the URL entered and wait until there's no new network request for 500ms before triggering the next step. Save the step and you will be brought back to the previous page.

    3. Add Steps - Save Structured Data

    Click “Add Step” to add the second step. For this step, we will need to specify:

    i) A parent container (in Helper)

    ii) The children HTML elements that hold the data (in Data Picker)

    the parent container and children HTML elements of the Browserbear playground job board

    Note: The children HTML elements must be contained within the parent container.

    To locate the parent container and the children HTML elements, we will need to pass a helper config, which is a JSON object that contains the XPath and the HTML snippet of the elements to Browserbear.
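    Based on the description above, a helper config might look something like the sketch below. This is purely illustrative: the real config is generated by the Browserbear Helper extension, and the exact key names here are assumptions.

    ```javascript
    // Hypothetical helper config -- the real object is generated by the
    // Browserbear Helper extension; key names here are assumptions.
    const helperConfig = {
      xpath: '/html/body/main/div/article[1]',         // XPath locating the element
      html: '<article class="job card">...</article>', // HTML snippet of the element
    };
    ```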

    We will use the Browserbear Helper Chrome extension to help us get the helper config. Install and pin it on your browser.

    Browserbear Helper Chrome extension

    In Browserbear, select “save_structured_data” from the Action dropdown and you will see a textbox for Helper. This is where we will add the helper config that points to the parent container.

    step 2 (save structured data)

    On the job board, click on the extension icon and hover your mouse over the card. You should see a blue box surrounding it.

    using the Browserbear Helper extension

    Click on the card to get the helper config. Copy and paste it into the Helper textbox in your Browserbear dashboard.

    the helper config that points to the parent container

    Browserbear should show you the selected element (article[class="job card"]).

    pasting the helper config into the Helper

    This will look for every HTML element with the article tag and the job card class (parent container). However, it doesn’t tell Browserbear which HTML element, in particular, to retrieve the data from.

    Therefore, we need to further specify the children HTML elements in the Data Picker. On the job board, click on the job title to get the helper config that refers to it.

    selecting the children HTML element (job title) using Browserbear Helper extension

    Then, enter a name for the data (job_title) and paste the config. Click “Add Data” to add it to your structured data’s result.

    adding data using the Data Picker

    job title added to the structured data

    Repeat the same process for other texts like company, location, and salary. For the link to the job, choose href to retrieve the URL.

    retrieving the link from an HTML element

    After selecting all the data, click the “Save” button to save the step.

    4. Run the Task

    Click “Run Task” to start scraping data from the URL. You will see the task running when you scroll to the bottom of the page.

    the Browserbear task running

    5. View the Log/Result

    When the task has finished, you can click on the “Log” button to view the result.

    task finished running

    The result will be a Run Object that contains the outputs from various steps in your task. All outputs will be saved in the outputs object and named in the format of [step_id]_[action_name].

    For this task, we will only have outputs from the second step (Save Structured Data) as the first step (Go) is only an action that navigates to a URL.
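    Since every output key follows the [step_id]_[action_name] pattern, you can construct the key in code once you know a step's ID (the ID in the usage example is illustrative):

    ```javascript
    // Build the key under which a step's output appears in the run's outputs object.
    function outputKey(stepId, actionName) {
      return `${stepId}_${actionName}`;
    }

    // e.g. outputKey('E1RNV0rb0GnzOxmQ6X', 'save_structured_data')
    ```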

    the result (structured data) in the log

    Congrats, you have created your first web scraping task on Browserbear and run it successfully!

    Scraping Multiple Pages

    If the website that you want to scrape uses pagination, you can add a few more steps to the task to scrape data from the next page.

    Add a new step to the task and select the “click” action. As before, use the Browserbear Helper extension and click on the button that brings you to the next page to get the config.

    adding a "click" step

    If there is more than one button that has the same class, you can specify the particular button using the Advanced Mode. One way to do so is by specifying the text content of the button, like this:

    using the Advanced Mode to locate a particular button

    This will select only the button that has the specified text.

    Next, we will add a new step for repeating the “Save Structured Data” step.

    Select the “go_to_step” action and choose “save_structured_data”. Then, enter how many times you want to repeat the selected step.

    adding "go_to_step"

    The task will now look like this:

    an overview of the steps of the task

    Running the Task Using REST API

    Once you have created a task and run it successfully from your Browserbear dashboard, you can trigger it by making a POST request to the Browserbear API.

    You will need the API Key and Task ID:

    retrieving the Browserbear API Key

    retrieving the Task ID

    After getting the API Key and Task ID, you can make a POST request to the API:

    async function runTask(body) {
      // TASK_UID and API_KEY come from your Browserbear dashboard
      const res = await fetch(`https://api.browserbear.com/v1/tasks/${TASK_UID}/runs`, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          Authorization: `Bearer ${API_KEY}`,
        },
        body: JSON.stringify(body), // serialize the run options object as JSON
      });
    
      return await res.json();
    }
    
    const run = await runTask(body);
    

    The Browserbear API is asynchronous. To receive the result, you can use either webhook or polling.
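    With polling, you fetch the run repeatedly until its status is no longer "running". The sketch below assumes only that the run object has a status field (as in the sample response shown later); fetchRun is injected, so you can pass any function that retrieves a run by its ID, and the interval and attempt limits are illustrative defaults.

    ```javascript
    // Poll a run until it reaches a terminal status (anything other than 'running').
    // fetchRun: async (runId) => runObject, e.g. a wrapper around the GET request
    // shown later in this article.
    async function pollRun(fetchRun, runId, { intervalMs = 2000, maxAttempts = 30 } = {}) {
      for (let attempt = 0; attempt < maxAttempts; attempt++) {
        const run = await fetchRun(runId);
        if (run.status !== 'running') return run;
        // wait before the next attempt
        await new Promise((resolve) => setTimeout(resolve, intervalMs));
      }
      throw new Error(`Run ${runId} did not finish after ${maxAttempts} attempts`);
    }
    ```

    With a webhook, Browserbear notifies your endpoint instead, so no polling loop is needed.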

    You will need the run's ID to get the result via the API. Since the result of the 'save_structured_data' step is saved as an array under a key following the format [step_id]_save_structured_data, you will also need that step's ID.

    Both of them can be retrieved from the POST request’s response:

    {
      created_at: '2023-02-10T03:15:34.949Z',
      video_url: null,
      webhook_url: null,
      metadata: null,
      uid: 'Pv3RoZnwG1onXDaEzM',
      steps: [
        { action: 'go', uid: 'WpZRN5VygWMyd6JgYv', config: [Object] },
        {
          action: 'save_structured_data',
          uid: 'E1RNV0rb0GnzOxmQ6X',
          config: [Object]
        }
      ],
      status: 'running',
      finished_in_seconds: null,
      task: '76m9KbJwQ3jn5k1RZQ',
      outputs: []
    }
    

    To get the result, send a GET request to the API:

    async function getRun(runId) {
      const res = await fetch(`https://api.browserbear.com/v1/tasks/${TASK_UID}/runs/${runId}`, {
        method: 'GET',
        headers: {
          Authorization: `Bearer ${API_KEY}`,
        },
      });
    
      return await res.json();
    }
    
    const runResult = await getRun(run.uid);
    const structuredData = runResult.outputs[`${SAVE_STRUCTURED_DATA_STEP_ID}_save_structured_data`];
    

    This will be part of the array when you log structuredData:

    [
      {
        job_title: 'Education Representative',
        company: 'Wildebeest Group',
        location: 'Angola',
        link: '/jobs/P6AAxc_iWXY-education-representative/',
        salary: '$51,000 / year'
      },
      {
        job_title: 'International Advertising Supervisor',
        company: 'Fix San and Sons',
        location: "Democratic People's Republic of Korea",
        link: '/jobs/_j_CPB1RFk0-international-advertising-supervisor/',
        salary: '$13,000 / year'
      },
      {
        job_title: 'Farming Strategist',
        company: 'Y Solowarm LLC',
        location: 'Poland',
        link: '/jobs/ocXapDzGUOA-farming-strategist/',
        salary: '$129,000 / year'
      }
      ...
    ]
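    Once retrieved, the structured data is an ordinary array of objects, so you can process it with plain JavaScript. For example, the sketch below filters jobs by salary and resolves the relative links against the playground's base URL; the field names match the Data Picker names chosen earlier, and the salary parsing assumes the '$NN,NNN / year' format shown above.

    ```javascript
    // Parse a salary string like '$51,000 / year' into a number.
    function parseSalary(salary) {
      return Number(salary.replace(/[^0-9]/g, ''));
    }

    // Keep jobs at or above a minimum salary and make their links absolute.
    function filterJobs(jobs, minSalary, baseUrl = 'https://playground.browserbear.com') {
      return jobs
        .filter((job) => parseSalary(job.salary) >= minSalary)
        .map((job) => ({ ...job, link: new URL(job.link, baseUrl).href }));
    }
    ```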
    

    🐻 View the full code that shows you how to receive the result using both webhook and API polling on GitHub.

    Conclusion

    Congratulations! You have successfully learned how to scrape data from a website using Browserbear. With the steps outlined above, you can easily scrape data from any website with a few clicks and minimal coding. To learn more about the Browserbear API, refer to the API Reference.

    🐻 If you'd like to learn the advanced method of using Browserbear to scrape a website, continue with How to Scrape Data from a Website Using Browserbear (Part 2).

    About the author: Josephine Loo
    Josephine is an automation enthusiast. She loves automating stuff and helping people to increase productivity with automation.

