How to Dynamically Filter Duplicates in Google Sheets (No Code)
Duplicate records are fairly common when you work with large amounts of data. Some use cases—such as scraping and storing data from online sources—pretty much guarantee that you’ll encounter duplicates on a near-daily basis.
Left unchecked, excess records can quickly cause problems in any database app. They can reduce documentation accuracy and complicate workflow creation. Implementing deduplication measures can streamline your processes and ensure unique records.
This article will highlight the significance of deduplication and show you a simple method to dynamically filter duplicate rows or columns in Google Sheets.
Why Remove Duplicate Records in Google Sheets?
Sheets is a cloud-based spreadsheet app offered by Google as a part of their complete office suite. With its ability to connect and sync data with other apps, Sheets allows users to create automated workflows without the need for complex coding. This integration capability makes it a good option for use cases like data scraping.
Whether due to human error or overlaps in automation runs, duplicate records affect the reliability of your data. Removing or filtering them can be useful for several reasons:
- Enhanced data integrity : Removing duplicates ensures that your data is accurate and reliable, which contributes to higher-quality insights.
- Improved efficiency : Proactive deduplication saves time that would otherwise be spent cleaning up your data before use.
- Streamlined collaboration : One well-maintained source of data ensures there’s no confusion when several people are working with the same material.
Removing duplicates can involve deleting the excess (either manually or automatically), but it can also mean identifying unique rows and creating another set of accurate data. The second approach is the one we will take in this guide.
Deduplication Methods on Google Sheets
Every database software has a different approach to deduplication. Some options you can explore with Google Sheets are:
- Data cleanup tool : This built-in tool allows users to select the columns they want to analyze and remove any duplicate rows. While it is easy and fast, it still requires manual work.
- Paid extensions : Extensions like Ablebits enable automated deduplication for a price.
- Pre-built functions : Functions (or formulas) allow users to customize, automate, and analyze their spreadsheets.
- Custom scripts : Apps Script and the Google Sheets API give users more control over automated spreadsheet actions, but they require a certain level of technical proficiency.
Understanding which approach to use based on technicality and budget is the key to maintaining your database. In this article, we’ll explore the simplest automatic deduplication method: the UNIQUE function.
Advantages & Limitations of the UNIQUE Function
The biggest upside to using UNIQUE is that it’s a single built-in formula that’s easy to set up. It’s fully dynamic, requiring no additional intervention when new rows are added. You also don’t need to integrate other extensions or tools.
Downsides to consider when selecting a deduplication method is that UNIQUE can only be used with data ranges; selecting specific columns to analyze would make this function much more complicated to use. It also only flags exact matches.
Since it essentially filters a dataset and displays unique values, this method doesn’t actively delete duplicates or allow filtered data to be edited. You would have to make changes to the initial dataset.
This combination of advantages and limitations makes UNIQUE perfect for use cases that don’t involve much active database management, such as automated web scraping.
How to Create a Deduplicated Dataset with the UNIQUE Function
Setting up a deduplicated dataset with UNIQUE can be done in a matter of minutes. Log into your Google Sheets account, then access (or create) a spreadsheet containing your data, prepped with header columns.
Click the + button on the left side of the bottom sidebar to add a new sheet.
Copy the headers from any rows or columns you want to display on your deduplicated sheet.
Bear Tip 🐻: This method must be used with a single data range, so ensure all of the data you want to display together is also placed together on your initial dataset.
Insert your UNIQUE formula into cell A2, or the cell where you want your deduplicated dataset to begin.
Bear Tip 🐻: You can use UNIQUE on the same sheet as your initial dataset if you prefer to see them side-by-side.
The UNIQUE function syntax is:
UNIQUE(range, [by_column], [exactly_once]).
| **Part** | **Description** |
| range | The data to filter by unique entries. |
| by_column | Whether to filter the data by columns or by rows. By default, this is false. |
| exactly_once | Whether to return only entries with no duplicates. By default, this is false. |
Your formula will depend on your dataset, but here are some examples to help you build yours:
=UNIQUE(B2:G9, FALSE, TRUE)
Pressing the enter key will apply the formula and display your deduplicated dataset as a range:
And that’s all there is to it!
Tips & Tricks
The UNIQUE formula is an incredibly simple way to deduplicate a dataset, but we can take it a step further. Here are some ideas to add on to your process:
- Order and deduplicate your output data in the same step by nesting your UNIQUE formula within a SORT function.
- Nest your UNIQUE formula within a FILTER function to only display rows or columns that meet pre-specified conditions.
- Highlight duplicate rows or columns by using COUNTIF with conditional formatting.
- Use your deduplicated sheet as the trigger for any new data-based automations to avoid wasting resources on duplicates.
- Write a simple QUERY to only display n rows or columns in your deduplicated dataset.
You can upgrade your deduplication in endless ways within Google Sheets or by integrating other tools.
The Simplest Google Sheets Deduplication Method
Not every situation calls for a complicated or expensive automatic duplicate flagging and removal process. Sometimes, all you need is a way to only display unique information to use for documentation, analysis, or even workflow automation.
Google Sheets’s UNIQUE function is probably the simplest way to deduplicate your dataset, and it might be a solution for you.