How to Automatically Delete Duplicate Records in Airtable (Part 2)

Learn how to take your Airtable deduplication process a step further by integrating Zapier to automate the scanning and removal process.
by Julianne Youngberg · February 2024

Contents

    Automating the process of removing extra records in a database is helpful in many use cases, particularly for web scraping. Most of the time, your goal is to collect and prepare data for use with minimal manual involvement.

    When you’re scraping and storing data online, there is a higher chance of encountering duplicate records. This could be due to extractions from various sites, overlapping sources, and different formats. You may even scrape the same data multiple times during different task runs.

    Implementing a deduplication process helps maintain a reliable source of information. In Part 1, you learned to build an Airtable base that flags records as duplicates based on specific criteria, such as matching fields or unique identifiers.

    But even after you’ve identified extra records, you still need to repeat the process for new datasets and remove them periodically. In this article, you’ll learn how to connect with Zapier to do this automatically.

    What You Will Create

    Duplicate records affect the accuracy and reliability of your Airtable base. The workflow you build will add new records to the master lookup so they can be cross-checked against others. Any entries flagged as duplicates will periodically be deleted.

    The tools you will use are:

    • Airtable : To store data and identify duplicate records
    • Browserbear (optional): To extract data with browser automation
    • Zapier : To facilitate the movement and deduplication of new data

    At the end of this tutorial, you should have an Airtable base that automatically scans new records for repeat values…

    Screenshot of Airtable base with duplicate values

    …and then deletes them to produce a more accurate database:

    Screenshot of Airtable base without duplicate values

    Let’s get started!

    How to Automate Airtable Deduplication with Zapier

    In Part 1 of this article, you learned how to set an Airtable base up to identify duplicates. But because the solution only scans current records, you’ll need to complete it with a method to automatically link new records, and then delete extras periodically.

    We’ll be picking things up from where we left off, with an Airtable base that’s able to flag duplicate entries. Two zaps will help us complete the automated process.

    Set Up a Zap to Send New Records to Airtable

    This zap will send new batches of data to Airtable in a format that automatically adds it to the lookup value.

    ❗️ Note : This tutorial assumes a web scraping use case where a Browserbear task collects batches of data that need to be stored. However, you can customize the data source and trigger according to your specific requirements.

    Log into your Zapier account, click + Create Zap , and set up the following events:

    Trigger: Run Finished in Browserbear

    Choose Browserbear as the app and Run Finished as the event. You’ll need to connect your Browserbear account using your API key.

    Set up the trigger by selecting the scraping task you created.

    Screenshot of Zapier Browserbear run finished trigger

    Test the trigger to ensure a task run is found, then continue.

    Bear Tip 🐻: Learn more about how to set up a scraping task and send it to Airtable here.

    Action: Create Records (With Line Item Support) in Airtable

    Choose Airtable as the app and Create Records (With Line Item Support) as the event. If you haven’t connected your Airtable account yet, you will be prompted to do so.

    Set up the action by selecting the right base and table, then mapping each output item to its corresponding Airtable field.

    Screenshot of Zapier Airtable create records with line item support action

    In the Lookup Data field, enter the text: Link All Records. This will link your new record to the consolidated lookup value in your Airtable base.

    Screenshot of Zapier Airtable create records with line item support action lookup data field value

    Test the action, then save the zap.

    Bear Tip 🐻: Using this action to create multiple records requires your data to be in a structured format—an array. If you’re pulling from different or multiple data sources, you may have to include a Formatter step.

    Set Up a Zap to Automatically Delete Duplicates

    This zap will find records marked as duplicates, then delete them.

    From your Zapier home page, click + Create Zap , and set up the following events:

    Trigger: Every Day in Schedule by Zapier

    Choose Schedule by Zapier as the app and Every Day as the event.

    Set up the trigger by choosing your preferred trigger time. You can also modify this to be a different frequency.

    Screenshot of Zapier every day in schedule by Zapier trigger

    Test the trigger, then continue.

    Action: Find Many Records (With Line Item Support) in Airtable

    Choose Airtable as the app and Find Many Records (With Line Item Support) as the event.

    Select the right base and table. Insert the text Duplicate? in the Search by Field field and Duplicate in the Search Value field. You can also limit the view to the Duplicate view.

    Screenshot of Zapier Airtable find many records with line item support action setup

    Make sure the action is not considered a “success” when nothing is found. This simply causes the automation to end here if there are no duplicates.

    Screenshot of Zapier Airtable find many records with line item support success field setup

    Test the action, then continue.

    Action: Text in Formatter by Zapier

    Choose Formatter by Zapier as the app and Text as the event.

    Select Spreadsheet-Style Formula as the transform type.

    You will want to produce a URL matching the format provided by Airtable’s documentation:

    https://api.airtable.com/v0/{baseId}/{tableIdOrName}/{recordId}
    

    This will result in a formula that looks something like this:

    https://api.airtable.com/v0/appJ7gJ4RVKWNWNPB/tblLW8LXOKY34d4Em/recaDyXRqA6Oxey8s
    

    Screenshot of Zapier text in formatter by Zapier action formula setup

    Test the action.

    Bear Tip 🐻: Retrieve a base’s data (base IDs, table IDs, field IDs, etc.) by going to the Introduction section of the API Reference and selecting your project.

    Action: Create Loop From Line Items in Looping by Zapier

    Since the number of duplicate values will vary each time the automation is run, we'll use a loop that deletes the records one by one until they are all removed.

    ❗️ Note : There are other methods you can use to delete more than one record, such as by using Airtable's Delete Multiple Records endpoint or appending duplicate records with Zapier's Digest tool until a certain number is accumulated. However, these methods can be challenging given the way Zapier encodes query strings in a request. We have found looping to be the most efficient method for most use cases.

    Choose Looping by Zapier as the app and Create Loop From Line Items as the event.

    Insert records into the left side Values to Loop section and map the line item list of URLs from the previous step into the right.

    Screenshot of Zapier looping by Zapier create loop from line items action setup

    You can also add a limit to the maximum number of loop iterations if you want to ensure your API quotas are managed.

    Test the action. You should see a loop iteration of your values in the output.

    Action: API Request (Beta) in Airtable

    Choose Airtable as the app and API Request (Beta) as the event.

    Set up the action by choosing DELETE as the HTTP Method and inserting the previous step’s output as the URL.

    Screenshot of Zapier Airtable API request action

    You will have to set up one additional request header for authorization:

    Authorization: Bearer {Your_Token_Here}

    Screenshot of Zapier Airtable API request action authorization header

    Click continue, and test the action. Any duplicate test data on your Airtable base should be deleted.

    Screenshot of Airtable base without duplicate values

    When you’re happy with your workflow, publish the zap to take it live!

    Bear Tip 🐻: Create a personal access token by visiting the Developer Hub, and ensure your token scopes include data.records:read and data.records:write.

    Minimizing Manual Database Management

    The best databases are accurate, up-to-date, and easy to understand. Automation can help you improve your data quality by reducing human errors and inconsistencies. Many processes including duplication removal can be automated in several ways, and there's no one method that will work for every use case. Putting in the work to decide what best suits your needs will pay off in the long run.

    The deduplication process is one area of database management that becomes more reliable and time-efficient with less manual intervention. The best part is that setting it up requires zero code!

    About the authorJulianne Youngberg@paradoxicaljul
    Julianne is a technical content specialist fascinated with digital tools and how they can optimize our lives. She enjoys bridging product-user gaps using the power of words.

    Automate & Scale
    Your Web Scraping

    Browserbear helps you get the data you need to run your business, with our nocode task builder and integrations

    How to Automatically Delete Duplicate Records in Airtable (Part 2)
    How to Automatically Delete Duplicate Records in Airtable (Part 2)