Mastering Correlated Subqueries in SQL Made Easy


Correlated Subqueries in SQL might sound a bit complex at first, but trust me, they’re a great tool once you get the hang of them! In this blog, we’ll dive into how these handy queries work and why they can make your database tasks more efficient. Whether you’re just starting out or looking to level up your SQL game, understanding correlated subqueries can bridge gaps in data analysis. Stick around, and let’s explore them together!

What Are Correlated Subqueries in SQL?

A correlated subquery is a type of subquery that depends on the outer query for its values. Unlike a regular subquery (also called a non-correlated subquery), which can be executed independently, a correlated subquery is executed once for every row processed by the outer query.

In simpler terms, the subquery is re-evaluated for each row in the outer query because it refers to a column from the outer query. This creates a connection between the inner and outer queries — hence the term “correlated.”

🔄 Relationship Between Inner and Outer Query:

  • The outer query provides a row of data.
  • The inner (subquery) uses that data to perform its own operation and return a result.
  • That result is used by the outer query to decide whether to include the row.

🧠 Simple Analogy:

Imagine you’re a teacher checking which students scored more than the average score in their respective class.
For each student (outer query), you must calculate the average score of their class (inner query). You can’t do that calculation until you know which class the student belongs to — so the subquery must depend on the outer query.

Syntax of a Correlated Subquery

Here’s the basic syntax:

SELECT column1, column2
FROM outer_table AS ot
WHERE column3 > (
    SELECT AVG(column3)
    FROM inner_table AS it
    WHERE it.related_column = ot.related_column
);

🔍 Explanation of Each Part:

  • outer_table AS ot: The main table being queried.
  • SELECT AVG(column3): The inner query calculates a value (like average) for comparison.
  • WHERE it.related_column = ot.related_column: This is the key — the inner query uses a column (ot.related_column) from the outer query.

This makes the subquery correlated — it must run for each row of the outer query, since it depends on a value from that row.

Simple Example:

Use Case: Find all employees whose salary is higher than the average salary in their department.

SELECT e.employee_name, e.salary, e.department_id
FROM employees e
WHERE e.salary > (
    SELECT AVG(e2.salary)
    FROM employees e2
    WHERE e2.department_id = e.department_id
);

What’s Happening Here:

  • For each employee in the outer query, the subquery:
    • Calculates the average salary for that employee’s department.
    • Then compares the employee’s salary to that average.
  • If their salary is greater than the average, they’re included in the result.

This is a classic example of a correlated subquery in action.

Our AI-powered sql online compiler revolutionises coding by letting users instantly write, run, and test their SQL code. It’s like having a programming partner that provides real-time feedback and suggestions, ensuring you’re learning and improving with every query. Ready to transform your coding experience? Give it a whirl!

Real-Life Examples of Correlated Subqueries

Example 1: Finding Employees Earning More Than the Average Salary in Their Department

SELECT e.employee_name, e.salary, e.department_id
FROM employees e
WHERE e.salary > (
    SELECT AVG(e2.salary)
    FROM employees e2
    WHERE e2.department_id = e.department_id
);

🧠 Step-by-Step Explanation:

  1. The outer query selects employees’ names, salaries, and department IDs.
  2. For each employee, the subquery calculates the average salary of their department.
  3. The query then compares the employee’s salary with that average.
  4. Only employees earning more than their department’s average are returned.

💡 Use case: Great for performance reviews or identifying top earners within teams.

Example 2: Identifying Customers With Orders Above Their Average Order Value

SELECT o.customer_id, o.order_id, o.order_amount
FROM orders o
WHERE o.order_amount > (
    SELECT AVG(o2.order_amount)
    FROM orders o2
    WHERE o2.customer_id = o.customer_id
);

🧠 Step-by-Step Explanation:

  1. The outer query lists orders with their amounts and customer IDs.
  2. For each order, the inner query calculates the average order value for that customer.
  3. If the order is greater than the average for that customer, it’s selected.

💡 Use case: Helpful in marketing to identify high-value purchases or detect buying trends.

Use Cases and When to Use Correlated Subqueries

Ideal Scenarios:

  • Row-by-row analysis across grouped data.
  • Filtering based on related but not directly joinable metrics.
  • Calculations that vary for each row, like comparing against group averages.

Comparison With JOINs:

FeatureCorrelated SubqueriesJOINs
DependencySubquery depends on outer queryTables are joined directly
ReadabilityMore readable for row-by-row logicBetter for combining flat datasets
PerformanceSlower for large datasetsGenerally faster

Performance Considerations:

  • Correlated subqueries re-run the inner query for every row, which can slow down large queries.
  • Use EXISTS or JOINs when performance matters.
  • Indexes on filtering columns can help reduce cost.

Common Mistakes and How to Avoid Them

1: Using Correlated Subqueries Instead of JOINs

  • JOINs are more efficient in many cases.
  • Use correlated subqueries only when the logic truly requires row-level recalculations.

2: Misunderstanding Variable Scope

  • Remember that the inner query can access columns from the outer query — but not the other way around.
  • Mistakes in referencing outer query columns can result in errors or incorrect results.

3: Poor Performance on Large Datasets

  • Avoid correlated subqueries in queries involving millions of rows.
  • Try rewriting using CTEs (Common Table Expressions) or window functions.

Practice Exercises

Exercise 1: Employees With Above-Average Department Salary

Problem:
List employee names who earn more than the average salary of their department.

SELECT e.employee_name
FROM employees e
WHERE e.salary > (
    SELECT AVG(e2.salary)
    FROM employees e2
    WHERE e2.department_id = e.department_id
);

Exercise 2: Orders Higher Than the Customer’s Average Order Value

Problem:
Find order IDs where the order value is above that customer’s average order value.

SELECT o.order_id
FROM orders o
WHERE o.order_amount > (
    SELECT AVG(o2.order_amount)
    FROM orders o2
    WHERE o2.customer_id = o.customer_id
);

Exercise 3: Products Priced Above Their Category Average

Problem:
Show products that are priced higher than the average price in their category.

SELECT p.product_name, p.category_id, p.price
FROM products p
WHERE p.price > (
    SELECT AVG(p2.price)
    FROM products p2
    WHERE p2.category_id = p.category_id
);

📝 Tip: Test each query on a small dataset to understand how the subquery runs for each row.

Conclusion

Correlated Subqueries in SQL boost your data querying skills by allowing dynamic filtering based on related data. They empower you to create complex queries with ease. Dive into them, and you’ll feel a sense of accomplishment. For more programming languages, visit Newtum and explore more.

Edited and Compiled by

This article was compiled and edited by @rasikadeshpande, who has over 4 years of experience in writing. She’s passionate about helping beginners understand technical topics in a more interactive way.

About The Author