Welcome to part 2 of my Learn SQL series! Learning advanced SQL has been a fun ride so far, and to be honest, I’ve been seeing the benefits of understanding SQL so much more and more as I look into advanced data engineering and analytics.
Today’s topic is about CTEs, or also commonly known as Common Table Expressions.
The reason I want to go into CTEs is quite simple - it’s easy to read and understand. I’m sure, many of you would also somewhat agree!
What is a CTE?
A Common Table Expression is a type of temporary table which exists throughout the length of a long query. You can kind of think of it like a temporary result that you can use within main table queries via SELECT, UPDATE, or DELETE statements.
Why CTEs?
As I mentioned before, the main reason why I like CTEs is that they’re so much easier to parse out by eye compared to other temporary expressions such as derived tables.
Consider the following example.
Given the following table, write a SQL query which finds the largest number that only appears once.
num |
---|
8 |
8 |
3 |
3 |
1 |
4 |
5 |
6 |
For the sample table above, the query should return:
num |
---|
6 |
There are two ways to compute this solution. We’ll go over a derived table solution, and a solution which utilizes CTEs.
Solution
The idea behind the solution is simple. We want to do the following:
- Find the set of numbers where total count is not greater than 1
- Find the max number from the set specified in 1
Using a Derived Table
SELECT
MAX(num) AS num
FROM (
SELECT
num
FROM my_numbers
GROUP BY num
HAVING COUNT(*) = 1
) valid_nums
Using a CTE
WITH ValidNums AS (
SELECT
num
FROM my_numbers
GROUP BY num
HAVING COUNT(*) = 1
)
SELECT
MAX(num) AS num
FROM ValidNums
Summary
Did you notice anything different between the two queries?
To be fair, both queries look relatively the same since this problem doesn’t require complicated table joins or filtering. I, however, think it’s important to see that the logical grouping (at eye level) in the CTE solution parses better than the derived query solution.
It is possible to give longer names to derived queries as well as CTEs. For example, the derived query solution calls our temporary
table: valid_nums
.
This method of naming can cause glance-over parsing difficulty when it involves multiple complex joins. The join names and definitions are usually cluttered on the same lines, which makes it harder to understand.
Kind of like programming best practices, right?
Ready to try it out? I promise you won’t regret it!