If you were stranded on a desert island, this is the Google Sheets formula you’d want to have with you. It’s got it all.
What This Formula Can Do for You
Join two tables
This formula joins two tables to produce results similar to those you’d get from a LEFT JOIN in SQL. That is to say, for every row of data on the left (table “A”) it gives you the first matched row from table “B”. In those instances where a row in table “A” has no matching row in table “B”, you get an empty cell.
Multiple match criteria
The standard way to match up data in Google Sheets is with the VLOOKUP function. With VLOOKUP you specify a “key” from table “A” (e.g. a specific cell like “A2”). VLOOKUP then looks for the first matching value from the first column in table “B”. But what if you need to match against the values from multiple columns from tables A and B?
For example, what if we want to find a row in Table B that matches on both the date and campaign ID?
Table A
Date | Campaign ID | Cost |
---|---|---|
2020-02-01 | 987654321 | 10 |
2020-02-01 | 123456789 | 20 |
2020-02-02 | 987654321 | 30 |
2020-02-02 | 123456789 | 40 |
Table B
Date | Campaign ID | Sessions | Users |
---|---|---|---|
2020-02-01 | 987654321 | 12 | 10 |
2020-02-01 | 123456789 | 24 | 20 |
2020-02-02 | 987654321 | 36 | 30 |
2020-02-02 | 123456789 | 48 | 40 |
Your “ultimate formula” overcomes this apparent limitation, matching against any combination of multiple columns.
Columns in your lookup range can be in any order
Another apparent limitation of VLOOKUP is that the “key” in second table needs to be the first column of the table. But what if it’s not? Your “ultimate formula” circumvents this apparent limitation, as well!
Reduces inefficiency and errors
There’s a principle in programming that says “don’t repeat yourself”. The idea here is that doing things like copying and pasting a formula down a bunch of rows is inefficient and error-prone.
The old-school method is to copy and paste (or drag) formulas down a bunch of rows. But suppose you later need to make a change to your formula. After you make the change to the formula in one cell, you have to remember to copy-and-paste or drag the formula down to the rest of the cells in the column. And what if you accidentally miss a row?
With the ultimate formula, you add the formula to one cell – just one cell – and the formula automatically applies to every row you want it to apply to. So, if you need to change something in the formula, you only need to change it once in one single cell. And every other row is automatically updated.
One formula in one cell gets results for every row
What’s more, copying-and-pasting a formula might get the job done if your data set is always the same size. But what if its size can change? For example, maybe you’re pulling in your data dynamically via an API connector such as Supermetrics, or you’re using IMPORTDATA to import data from a dynamically updated CSV.
With your “ultimate formula”, you never have to worry about your data set growing beyond the point to which you’ve applied your formula. Your “ultimate formula” grows and shrinks with your data set, guaranteeing that your formula will be applied to every row of data regardless of the size of the data set.
Gets as many columns as you want from the lookup table
For every row where a match is found, your “ultimate formula” will return as many columns as you’d like. For example, this one formula can return both the sessions and users from table B in our example above.
Only applies the formula to rows with data
Ever seen that annoying “#N/A” error? You’ll get that when, for example, you apply a formula to a row with no data. You’ll also get the error if a VLOOKUP doesn’t find any matching rows. The formula you’re about to learn eliminates both of these cases so you never have to see the #N/A error again.
The Data Sets
For our example, we’re going to use two sets of data: 1) ad performance data pulled from Facebook Ads, and 2) session data pulled from Google Analytics.
Facebook Ads
Google Analytics
The Formula
Let’s cut the BS and get to the formula already. Once you’ve got the formula and can see what it does, we’ll break it down so you can see how all of the elements work together.
=IFERROR(FILTER(VLOOKUP(A2:A&C2:C&G2:G, {google_analytics!A2:A&google_analytics!B2:B&google_analytics!C2:C,google_analytics!D2:E}, {2,3}, FALSE), A2:A<>""))
You can see it in action here.
Step-by-Step Tutorial
You know the old saying, “Give a man a fish…”
Well, now you’ve got a fish. But rather than just sending you on your way, I’d much rather see you use this “fish” as a vehicle for developing some kick ass Google Sheets knowledge and skills.
So, if you’re up for it, let’s break down step-by-step what this formula is doing.
FILTER
The first function we should talk about is FILTER.
The FILTER function looks at a range of cells and then gives you only those cells that match a set of criteria. For example, you might use the filter function to get a list of all positive numbers from a column.
I use FILTER in almost every formula I write. Seriously. It’s that useful. Most often I use it in a perhaps slightly unconventional way. I use it to apply a formula across all rows that have stuff in them (and only rows that have stuff in them).
Suppose you’ve got 10 rows of data. You want to apply a formula to every row of data. And you only want your formula to be applied to those 10 rows that have data. In other words, you want the formula to be applied to the ten rows, and not to the other 990 empty rows that might be there in your sheet.
In our “ultimate” formula, we’re using the FILTER function to apply the VLOOKUP function to every row where the cells in column A have a value. (More accurately, we’re applying the VLOOKUP to every row where cells A2 through the last cell in column A are greater than or less than “empty”.)
=FILTER( do something, A2:A<>"")
Why might you want to use FILTER instead of, say, ARRAYFORMULA?
Firstly, ARRAYFORMULA will apply your formula to every row in your spreadsheet. A lot of times, what that means is that ARRAYFORMULA will fill every row with “stuff”. Going back to our example where we’ve got 10 rows of data, ARRAYFORMULA can put stuff in all 1,000 rows, effectively changing your table from a table of 10 rows to a table of 1,000. FILTER, on the other hand, fills only the rows we actually want filled, i.e. the 10 rows that have data.
Secondly, because ARRAYFORMULA applies a formula to every row in the spreadsheet, a complex formula (particularly one that uses a function like VLOOKUP) might consume significant resources, hindering performance. I haven’t confirmed this but it’s reasonable to assume that FILTER, which only applies your formula to the rows you want it to, will consume fewer resources so your sheet doesn’t take as much of a performance hit.
VLOOKUP
VLOOKUP is the primary function in our formula. With VLOOKUP, you provide a value that you want to find in another range. VLOOKUP searches for that value in the first column of the range you specify. Then it returns the column (or columns) you request. For example, suppose you’ve got two tables (or “ranges”).
A | |
1 | Book ID |
2 | ABC |
3 | XYZ |
A | B | |
1 | Book ID | Book Name |
2 | DEF | Don Quixote |
3 | ABC | The Great Gatsby |
4 | JKL | Moby Dick |
5 | XYZ | Lolita |
The first table lists the IDs of books that have been sold. The second table lists all of the details about every book in your catalog. You can use VLOOKUP to match each item in your “sold_books” table with its corresponding name from the “book_catalog” table.
A | B | |
1 | Book ID | Book Name |
2 | ABC | =VLOOKUP(A2,book_catalog!A:B,2,FALSE) |
3 | XYZ | =VLOOKUP(A3,book_catalog!A:B,2,FALSE) |
The results of our VLOOKUP function look like this:
A | B | |
1 | Book ID | Book Name |
2 | ABC | The Great Gatsby |
3 | XYZ | Lolita |
So, that’s what VLOOKUP does. Now let’s take a look at each of its parts, learn what their roles are, and how to use them.
Search Key
The first parameter in the VLOOKUP function is the “search key”. You can think of this kind of like a dewey decimal number at a library. It’s an identifier that allows you to locate a specific record (e.g. book). In our example above, the first VLOOKUP is searching for ABC in our catalog of books. And the second VLOOKUP is searching for XYZ.
Range
The second parameter is the “range”. The “range” is analogous to the collection of books in your library. You’re looking for a book with a specific ID (e.g. “ABC”) somewhere within a “range” of books. In our case, the range is columns A through B in our “book_catalog” sheet.
It’s important to note that VLOOKUP is going to look for the search key in the first column of your range.
We include additional columns in our range so that we can get the stuff we want from those other columns. In our example, we want to get the book’s name. In our book_catalog table, the book name is contained in column B. So, we need to include column B in our range.
Index
The third parameter in the VLOOKUP function is the “index”. The index specifies which column number (or numbers) within your range you want to get back. I stress column number because the first column in your range is always column “1”. If your range is A:B, you’ve got two available columns: 1 and 2. If your range is X:Y, you’ve still got just two available columns, and those columns are also 1 and 2.
In our example, we’ve got a range of book_catalog!A:B. So, we’ve got two columns in our range: column 1 and column 2. We want to get the book name of each of our sold books. The book name is in column 2 of our range. So, we set the index to “2”.
=VLOOKUP(A2,book_catalog!A:B,2,FALSE)
is_sorted
The last parameter in our VLOOKUP is “is_sorted”. This parameter tells the VLOOKUP function whether or not the first column of your range is sorted alphanumerically in ascending order (e.g. A, B, C or 1, 2, 3). You’ll set the “is_sorted” parameter to either TRUE or FALSE. TRUE tells the VLOOKUP function that the first column of your range is sorted alphanumerically. Having your range sorted alphanumerically apparently significantly speeds up the VLOOKUP process. However, in practice, it rarely works out that you’ll have a sorted range. So, 99.99% of the time, you’re just going to set this to FALSE.
Concatenation
What if you need to search through your range using a key comprised of multiple columns? In our Facebook Ads / Google Analytics example, none of the columns by themselves are unique. Date, campaign ID, ad ID… The values of each of these are found in multiple rows. There are multiple rows in both tables with the date “2020-02-02”. There are multiple rows with ad ID “23844229530770213”. Our key needs to be unique. But in this case, no single column contains unique values. It’s the combination multiple columns where we find the “uniqueness”. For example, there is only one column in each table where the date is “2020-02-02” and the ad ID is “23844229530770213”.
This is where concatenation comes in. Concatenation combines two or more values into a single value. For example, it can take “2020-02-02” and “23844229530770213”, and combine them into a single value: “2020-02-0223844229530770213”.
My favorite way to concatenate in Google Sheets is with the ampersand (&). Just stick the ampersand between any values or cell references you want to glue together. The formula below combines (i.e. concatenates) cells A2 and B2 into a single value.
=A2&B2
In our ultimate formula, we use concatenation to create a unique key comprised of the values from columns A, C, and G.
A2:A&C2:C&G2:G
This concatenation combines the values in our day, campaign ID, and ad ID columns in our Facebook table into single values of day+campaign ID+ad ID. So, “2020-02-02”, “23844229530770213”, and “23844229530770213” become the unique value “2020-02-022384422953077021323844229530770213”.
Outstanding! We’ve now got a set of unique keys.
VLOOKUP(A2:A&C2:C&G2:G,...
Arrays
The next trick up our sleeve is the “array”. And speaking of tricks… In my opinion, in the context of Google Sheets, arrays are a little bit tricky to understand.
Google describes an array as a “table (consisting of rows and columns) of values.” You can create an array by putting stuff inside curly brackets {}. The brackets group the stuff together.
Columns
If you want to arrange your values or ranges into columns, use a comma to separate the values or ranges. For example, {2, 3} creates two columns, each with one row. The value in the first column is 2. And the value in the second column is 3.
Rows
If you want to arrange your values or ranges into rows, use a semi-colon to separate your values or ranges. For example, {2; 3} creates two rows. The value in the first row is 2. And the value in the second row is 3.
Another way to think of the difference between what the comma does and the semi-colon does is this. The comma aligns things side by side (horizontally). The semi-colon “appends”, i.e. stacks things on top of each other (vertically).
2 | 3 | 4 |
2 |
3 |
4 |
In our ultimate formula, you’ll see two places where we use concatenation inside the VLOOKUP function.
VLOOKUP(A2:A&C2:C&G2:G, {google_analytics!A2:A&google_analytics!B2:B&google_analytics!C2:C,google_analytics!D2:E}, {2,3}, FALSE)
We mentioned that VLOOKUP looks for the “search key” in the first column of the “range”. Then it returns (i.e. “gives you back”) the value(s) from the indexes, i.e. column(s), you ask for.
Let’s dive deeper into these features, explore a couple challenges, and discover a way to overcome those challenges.
Just to refresh… The search key is that first parameter; the thing you’re using to search, e.g. your Dewey decimal number. The range is the stuff you want to search through and which contains the stuff you want to get back. And the index specifies which column(s) within your range you want to get back.
The importance of the first column in the VLOOKUP range
The first challenge with VLOOKUP is that it looks for the search key in the first column of your range only. There’s no option to specify which column to search. It’s always going to look in the first column of your range no matter what.
But what if the column you need to search through isn’t the first column in your range? What if it’s is situated after the column you want to get back?
A | B | |
1 | Book ID | Book Name |
2 | ABC | =VLOOKUP(A2,book_catalog!A:B,… |
3 | XYZ |
A | B | |
1 | Book Name | Book ID |
2 | Don Quixote | DEF |
3 | The Great Gatsby | ABC |
4 | Moby Dick | JKL |
5 | Lolita | XYZ |
One sneaky way to over come this is to create an array. You create an array where you say the first column in your array is the column you want to search through. Then you add the rest of the range as additional columns.
Using our “book” example above, you’d create an array where the first column in your array is column B from your book catalog. Then you’d use a comma (because you’re creating columns in your array). Finally, you’d add the rest of the range that contains the stuff you want to get back.
{book_catalog!B:B, book_catalog!A:A}
1 | 2 | |
1 | Book ID | Book Name |
2 | DEF | Don Quixote |
3 | ABC | The Great Gatsby |
4 | JKL | Moby Dick |
5 | XYZ | Lolita |
Now, the column you need to search through is the first column in this “virtual” range you’ve created. Now you can look up the book name even the “book name” column originally appeared before the column you needed to search through.
How to create the first column in the ultimate formula
Now let’s apply this to our ultimate formula. If you recall, we needed to use concatenation to create our unique search key. Now we need to find that unique search key in our range. The trouble is… There is no single column in our range that will match our concatenated search key. We have to create a column.
To do this, we’re going to combine concatenation with our newly acquired array skills.
First, let’s create a column whose values will match our concatenated search keys. Recall that our search key combined (i.e. concatenated) the “Day”, “Campaign ID”, and “Ad ID” columns from our Facebook table.
We need to create a “virtual” column from our Analytics table whose values will match “Day”, “Campaign ID”, and “Ad ID”.
So, we need to concatenate columns A, B, and C from our Analytics table (ga:date, ga:campaignCode, and ga:adContent).
google_analytics!A:A&google_analytics!B:B&google_analytics!C:C
We’re using our ampersand (&) technique to combine the values of columns A, B, and C into a single column.
Next, we need to create our “virtual table” where our newly concatenated “virtual” Google Analytics column is the first column in that “virtual table”. To do this, we create an array.
{google_analytics!A:A&google_analytics!B:B&google_analytics!C:C, google_analytics!D:E}
We use the curly brackets to tell Google Sheets that we’re creating an array. We set the first “column” in our array to our concatenated “virtual” column. Then we use a comma (because we want to add columns to our array). Then we tack on the part of our Google Analytics table that has the stuff we want to get back, namely columns D and E.
Our new “virtual table” looks something like this behind the scenes.
IFERROR
The final function in our ultimate Google Sheets formula is the IFERROR function. This function simply tells Google Sheets what to do whenever an error is returned in one of the rows to which your formula is applied.
For example, suppose you’re dividing column A by column B. Suppose that in one of the rows, column A is 25 and column B is zero (0). Since you can’t divide a number by zero, Google Sheets returns a “division by zero” error. You can use the IFERROR function to return some other value (such as an empty cell) whenever there’s an error such as “division by zero”.
=IFERROR(25/0)
The formula above will return an empty cell.
Conclusion
There you have it. The ultimate Google Sheets formula for joining two sets of data.
=IFERROR(FILTER(VLOOKUP(A2:A&C2:C&G2:G, {google_analytics!A2:A&google_analytics!B2:B&google_analytics!C2:C,google_analytics!D2:E}, {2,3}, FALSE), A2:A<>""))
I hope you find this formula useful. Please leave a comment below to tell me what you think, offer feedback, or ask questions. Have a fantastic day!
Interesting, but I prefer using MATCH and INDIRECT; for example if the [data] is in [table] column A-G, with the join in column B, then =INDIRECT(“[table]!A” & MATCH([data], [Table]!B:B, 0) & “:G” & MATCH([data], [Table]!B:B, 0))
MATCH([data], [Table]!B:B, 0) will output a number corresponding to the row, giving you something like “[table]!A6:G6”, which in an indirect function will output the appropriate array.
What if, instead of joining rows with the same “key”, we want to exclude them from the actual final table? Like let’s say Table 1 contains all the list of items and Table 2 contains all items that are out of stock and you want to come up with Table 3 which will ultimately contain all items that are still available. Basically, Table 3 = Table 1 – Table 2? Hope for a reply.
Good question. This is a common need. To achieve the result you’re looking for, add a column to table1 (all items). In this new column, use ISNUMBER(MATCH(A2:A,table2!A:A,0)) to identify whether or not each item in table1 is found in table2 (out of stock items). Once you’ve got that, you can create a new table, table3, and use QUERY to retrieve those rows from table1 where A = FALSE (i.e. items from table1 that are not found in table2). Here’s an example.
In your spreadsheet, you could also just use and avoid adding a new column!
=FILTER(all_items!A2:A, COUNTIF(out_of_stock!A2:A, all_items!A2:A) = 0)
This worked perfectly for me. Thanks Dave!
Nice article … good effort to provide lots of detail. Now if you can do a RIGHT (OUTER) JOIN, I’d really be impressed!
..for me too
Alright, guys. Here’s what I’ve come up with. Google Sheets Formula to Create a Right Outer Join
Excellent! thank you very much Dave, I’m going ot review your new post, it will be really useful.
Thank you, Dave! This was a fantastic tutorial. I learned about everything I read! Couldn’t find what terms to use to search Google documentation to find what I was looking for, and this tutorial pulled it all together.
One thing that still seems like magic to me is how enclosing VLOOKUP — which really wants to just return a single value, even though it deceptively accepts a range (i.e. multiple rows) in its “search key” parameter, making me thing it’ll repeat for all values in that range — how just wrapping it in FILTER results in VLOOKUP _somehow_ now being applied for each result that FILTER returns. There’s a connection there I’m not seeing that’s probably key to understanding a bunch of other stuff with Sheets functions.
When I put VLOOKUP inside an array, inside ARRAYFORMULA, then it applies the VLOOKUP for all rows, and that makes sense to me, given the purpose of ARRAYFORMULA. However, it has that problem you discuss about empty rows and processing for all rows to the extent of the sheet, vs using FILTER that scopes it to only filled rows. Still, this makes more sense to me than FILTER in terms of the expected outcome.
Under the hood, I suspect that FILTER is effectively ARRAYFORMULA with an added “match” component.
ARRAYFORMULA loops through each row or column in a range. When using a non-array function such as VLOOKUP, ARRAYFORMULA iterates through each of the cells in the search_key range, looks for a matching record in the first column of the (search) range, and returns the value from the (return) index.
FILTER likely loops through each row or column in a range and applies nested functions just as ARRAYFORMULA does. However, it then reduces the returned range to only those rows or columns that match a specified set of criteria.
=FILTER(A:A,B:B=TRUE)
In this example, FILTER loops through the range A:A and checks to see if B:B=TRUE. It then returns the resulting range from A:A where B:B=TRUE.
=FILTER(VLOOKUP(A:A,data!A:B,2,TRUE),B:B=TRUE)
This formula loops through A:A and performs a VLOOKUP on each cell in A:A. The VLOOKUP returns the index (2) from the row where the value from first column from data!A:B matches that of the corresponding row in A:A. FILTER then reduces the range of A:A to only those rows where B:B=TRUE.
Makes sense, thank you, Dave!
Question on the speed/resourcing of your use of FILTER vs ARRAYFORMULA.
If I do one of the following:
=ARRAYFORMULA(IF(LEN(A:A),DO_THE_THING,))
OR
= ARRAYFORMULA(IF(ISBLANK(A:A),,DO_THE_THING))
In both instances, the use of the IF should reduce the cost of the formula because it’s checking the content of the cell in question first compared to FILTER running against all cells in the range and returning only the ones that then meet the criteria?