Joining Data in SQL Answer Key – Datacamp

Jack16306
57 Min Read

Joining data is an essential skill which enables us to draw information from separate tables together into a single, meaningful set of results. Learn to supercharge your queries using table joins and relational set theory in Joining Data in SQL course on Datacamp.

In this course, you’ll learn how to:

  • Work with more than one table in SQL
  • Use inner joins, outer joins and cross joins
  • Leverage set theory, including unions, intersect, and except clauses
  • Create nested queries

Every step is accompanied by exercises and opportunities to apply the theory and grow your confidence in SQL.

Course Url: https://app.datacamp.com/learn/courses/joining-data-in-sql

1. Introducing Inner Joins

In this chapter, you’ll be introduced to the concept of joining tables and will explore all the ways you can enrich your queries using joins—beginning with inner joins.

1.1. The ins and outs of INNER JOIN

In this closing chapter, you’ll begin by investigating semi-joins and anti-joins. Next, you’ll learn how to use nested queries. Last but not least, you’ll wrap up the course with some challenges!

Your first join

Throughout this course, you’ll be working with the countries database, which contains information about the most populous world cities and countries, and provides country-level economic data, population data, and geographic data. The database also contains information on languages spoken in each country.

You can see the different tables in this database to get a sense of what they contain by clicking on the corresponding tabs. Click through them and familiarize yourself with the fields that seem to be shared across tables before you continue with the course.

In this exercise, you’ll use the cities and countries tables to build your first inner join. You’ll start off by selecting all columns in step 1, performing your join in step 2, and then refining your join to choose specific columns in step 3.

  • Begin by selecting all columns from the cities table, using the SQL shortcut that selects all.
-- Select all columns from cities
SELECT *
FROM cities;
  • Perform an inner join with the cities table on the left and the countries table on the right; do not alias tables here or in the next step.
  • To perform your join, identifying the relevant column names in both tables that provide the country code by inspecting the tabs in the console.
SELECT * 
FROM cities
-- Inner join to countries
INNER JOIN countries
-- Match on country codes
ON cities.country_code=countries.code;
  • Complete the SELECT statement to keep only the name of the city, the name of the country, and the region the country is located in (in the order specified).
  • Alias the name of the city AS city and the name of the country AS country.
-- Select name fields (with alias) and region 
SELECT cities.name AS city ,countries.name AS country,region
FROM cities
INNER JOIN countries
ON cities.country_code = countries.code;

Great work! Writing out full table names when joining tables creates a lot of extra code. In the next exercise, you’ll explore how you can do more aliasing to limit the amount of writing.

Joining with aliased tables

Recall from the video that instead of writing full table names in queries, you can use table aliasing as a shortcut. The alias can be used in other parts of your query, such as the SELECT statement!

You also learned that when you SELECT fields, a field can be ambiguous. For example, imagine two tables, apples and oranges, both containing a column called color. You need to use the syntax apples.color or oranges.color in your SELECT statement to point SQL to the correct table. Without this, you would get the following error:

column reference “color” is ambiguous

In this exercise, you’ll practice joining with aliased tables. You’ll use data from both the countries and economies tables to examine the inflation rate in 2010 and 2015.

When writing joins, many SQL users prefer to write the SELECT statement after writing the join code, in case the SELECT statement requires using table aliases.

  • Start with your inner join in line 5; join the tables countries AS c (left) with economies (right), aliasing economies AS e.
  • Next, use code as your joining field in line 7; do not use the USING command here.
  • Lastly, select the following columns in order in line 2: code from the countries table (aliased as country_code), name, year, and inflation_rate.
-- Select fields with aliases
SELECT c.code AS country_code, name, year, inflation_rate
FROM countries AS c
-- Join to economies (alias e)
INNER JOIN economies AS e
-- Match on code field using table aliases
ON c.code = e.code;

Nicely done! Notice that only the code field is ambiguous, so it requires a table name or alias before it. All the other fields (name, year, and inflation_rate) do not occur in more than one table name, so do not require table names or aliasing in the SELECT statement. Using table aliases takes some getting used to, but it will save you a lot of typing, especially when your query involves joining tables!

USING in action

In the previous exercises, you performed your joins using the ON keyword. Recall that when both the field names being joined on are the same, you can take advantage of the USING clause.

You’ll now explore the languages table from our database. Which languages are official languages, and which ones are unofficial?

You’ll employ USING to simplify your query as you explore this question.

  • Use the country code field to complete the INNER JOIN with USING; do not change any alias names.
SELECT c.name AS country, l.name AS language, official
FROM countries AS c
INNER JOIN languages AS l
-- Match using the code column
USING (code);

Great work! It looks like Afghanistan has multiple official and unofficial languages! A parting word of caution when using USING: columns can sometimes have the same name but actually contain vastly different data. Always remember to check what you are joining on by displaying and viewing your data first!

1.2. Defining relationships

Relationships in our database

Now that you know more about the different types of relationships that can exist between tables, it’s time to examine a few relationships in the countries database!

To answer questions about table relationships, you can explore the tables displayed as tabs in your console.

What best describes the relationship between code in the countries table and country_code in the cities table?

Possible Answers

  • This is a many-to-many relationship.
  • This is a one-to-one relationship.
  • This is a one-to-many relationship.

Which of these options best describes the relationship between the countries table and the languages table?

Possible Answers

  • This is a one-to-many relationship.
  • This is a many-to-many relationship.
  • This is a one-to-one relationship.

That’s right! Recall the example from the video: Belgium has three official languages, French, German, and Dutch. Conversely, languages can be official in many countries: Dutch is an official language of both the Netherlands and Belgium, but not Germany. Because of the many types of relationships tables can have, there are many ways to join data.

Inspecting a relationship

You’ve just identified that the countries table has a many-to-many relationship with the languages table. That is, many languages can be spoken in a country, and a language can be spoken in many countries.

This exercise looks at each of these in turn. First, what is the best way to query all the different languages spoken in a country? And second, how is this different from the best way to query all the countries that speak each language?

Recall that when writing joins, many users prefer to write SQL code out of order by writing the join first (along with any table aliases), and writing the SELECT statement at the end.

  • Start with the join statement in line 6; perform an inner join with the countries table as c on the left with the languages table as l on the right.
  • Make use of the USING keyword to join on code in line 8.
  • Lastly, in line 2, select the country name, aliased as country, and the language name, aliased as language.
-- Select country and language names, aliased
SELECT c.name AS country, l.name AS language
-- From countries (aliased)
FROM countries AS c
-- Join to languages (aliased)
INNER JOIN languages AS l
-- Use code as the joining field with the USING keyword
USING (code);
  • Rearrange the SELECT statement so that the language column appears on the left and the country column on the right.
  • Sort the results by language.
-- Rearrange SELECT statement, keeping aliases
SELECT l.name AS language, c.name AS country
FROM countries AS c
INNER JOIN languages AS l
USING(code)
-- Order the results by language
ORDER BY language

Question

Select the incorrect answer from the following options.

The query you generated in step 1 is provided below. Run this query (or the amendment you made in step 2) in the console to find the answer to the question.

SELECT c.name AS country, l.name AS language

FROM countries AS c

INNER JOIN languages AS l

USING(code)

ORDER BY country;

Possible Answers

  • There are at least three languages spoken in Armenia.
  • Alsatian is spoken in more than one country.
  • Bhojpuri is spoken in two countries.

Correct! Alsatian is only spoken in France. Well done getting through this exercise. When we read SQL results, we expect the most important column to be on the far left, and it’s helpful if results are ordered by relevance to the question at hand. By default, results are ordered by the column from the left table, but you can change this using ORDER BY.

1.3. Multiple joins

Joining multiple tables

You’ve seen that the ability to combine multiple joins using a single query is a powerful feature of SQL.

Suppose you are interested in the relationship between fertility and unemployment rates. Your task in this exercise is to join tables to return the country name, year, fertility rate, and unemployment rate in a single result from the countries, populations and economies tables.

  • Perform an inner join of countries AS c (left) with populations AS p (right), on code.
  • Select name, year and fertility_rate.
-- Select relevant fields
SELECT name,year,fertility_rate
-- Inner join countries and populations, aliased, on code
FROM countries AS c
INNER JOIN populations AS p
ON c.code = p.country_code;
  • Chain another inner join to your query with the economies table AS e, using code.
  • Select name, and using table aliases, select year and unemployment_rate from economies.
-- Select fields
SELECT name, e.year, fertility_rate, unemployment_rate
FROM countries AS c
INNER JOIN populations AS p
ON c.code = p.country_code
-- Join to economies (as e)
INNER JOIN economies AS e
-- Match on country code
ON c.code = e.code;

Good work! You may have noticed an issue with these results, though. Have a look at the results for Albania. The 2010 value for fertility_rate is also paired with the 2015 value for unemployment_rate, whereas we only want data from one year per record. You’ll fix this in the next exercise!

Checking multi-table joins

Have a look at the results for Albania from the previous query below. You can see that the 2015 fertility_rate has been paired with 2010 unemployment_rate, and vice versa.

name year fertility_rate unemployment_rate
Albania 2015 1.663 17.1
Albania 2010 1.663 14
Albania 2015 1.793 17.1
Albania 2010 1.793 14

Instead of four records, the query should return two: one for each year. The last join was performed on c.code = e.code, without also joining on year. Your task in this exercise is to fix your query by explicitly stating that both the country code and year should match!

  • Modify your query so that you are joining to economies on year as well as code.
SELECT name, e.year, fertility_rate, unemployment_rate
FROM countries AS c
INNER JOIN populations AS p
ON c.code = p.country_code
INNER JOIN economies AS e
ON c.code = e.code
-- Add an additional joining condition such that you are also joining on year
    AND p.year = e.year;

Good work! You can check that this fixed the issue by looking at the results for Albania. There are only two lines of Albania results now: one for 2010 and one for 2015.

2. Outer Joins, Cross Joins and Self Joins

After familiarizing yourself with inner joins, you will come to grips with different kinds of outer joins. Next, you will learn about cross joins. Finally, you will learn about situations in which you might join a table with itself.

2.1. LEFT and RIGHT JOINs

Remembering what is LEFT

To become faster at writing queries, it’s helpful to memorize their structure. In this exercise, you will reconstruct the order of the steps of LEFT JOIN from memory!

  • Drag the code blocks provided to re-order them into a syntactically correct SQL query!

Well done! It will be useful to know the core building blocks of a join like the back of your hand: SELECT, FROM, your JOIN keyword and the ON or USING condition!

This is a LEFT JOIN, right?

Nice work getting to grips with the structure of joins! In this exercise, you’ll explore the differences between INNER JOIN and LEFT JOIN. This will help you decide which type of join to use.

As before, you will be using the cities and countries tables.

You’ll begin with an INNER JOIN with the cities table (left) and countries table (right). This helps if you are interested only in records where a country is present in both tables.

You’ll then change to a LEFT JOIN. This helps if you’re interested in returning all countries in the cities table, whether or not they have a match in the countries table.

  • Perform an inner join with cities AS c1 on the left and countries as c2 on the right.
  • Use code as the field to merge your tables on.
SELECT 
    c1.name AS city,
    code,
    c2.name AS country,
    region,
    city_proper_pop
FROM cities AS c1
-- Perform an inner join with cities as c1 and countries as c2 on country code
INNER JOIN countries as c2
ON c1.country_code = c2.code
ORDER BY code DESC;
  • Change the code to perform a LEFT JOIN instead of an INNER JOIN.
  • After executing this query, have a look at how many records the query result contains.
SELECT 
    c1.name AS city, 
    code, 
    c2.name AS country,
    region, 
    city_proper_pop
FROM cities AS c1
-- Join right table (with alias)
LEFT JOIN countries as c2
ON c1.country_code = c2.code
ORDER BY code DESC;

Perfect! Notice that the INNER JOIN resulted in 230 records, whereas the LEFT JOIN returned 236 records. Remember that the LEFT JOIN is a type of outer join: its result is not limited to only those records that have matches for both tables on the joining field.

Building on your LEFT JOIN

You’ll now revisit the use of the AVG() function introduced in a previous course.

Being able to build more than one SQL function into your query will enable you to write compact, supercharged queries.

You will use AVG() in combination with a LEFT JOIN to determine the average gross domestic product (GDP) per capita by region in 2010.

  • Complete the LEFT JOIN with the countries table on the left and the economies table on the right on the code field.
  • Filter the records from the year 2010.
SELECT name, region, gdp_percapita
FROM countries AS c
LEFT JOIN economies AS e
-- Match on code fields
ON c.code = e.code
-- Filter for the year 2010
WHERE year = 2010;
  • To calculate per capita GDP per region, begin by grouping by region.
  • After your GROUP BY, choose region in your SELECT statement, followed by average GDP per capita using the AVG() function, with AS avg_gdp as your alias
-- Select region, and average gdp_percapita as avg_gdp
SELECT region, AVG(gdp_percapita) AS avg_gdp
FROM countries AS c
LEFT JOIN economies AS e
USING(code)
WHERE year = 2010
-- Group by region
GROUP BY region;
  • Order the result set by the average GDP per capita from highest to lowest.
  • Return only the first 10 records in your result.
SELECT region, AVG(gdp_percapita) AS avg_gdp
FROM countries AS c
LEFT JOIN economies AS e
USING(code)
WHERE year = 2010
GROUP BY region
-- Order by descending avg_gdp
ORDER BY avg_gdp DESC
-- Return only first 10 records
LIMIT 10;

Nice work! You successfully executed a LEFT JOIN and applied a GROUP BY to the result of your JOIN. Building up your SQL vocabulary in this way will enable you to answer questions of ever-increasing complexity!

Is this RIGHT?

You learned that right joins are not used as commonly as left joins. A key reason for this is that right joins can always be re-written as left joins, and because joins are typically typed from left to right, joining from the left feels more intuitive when constructing queries.

It can be tricky to wrap one’s head around when left and right joins return equivalent results. You’ll explore this in this exercise!

  • Write a new query using RIGHT JOIN that produces an identical result to the LEFT JOIN provided.
-- Modify this query to use RIGHT JOIN instead of LEFT JOIN
SELECT countries.name AS country, languages.name AS language, percent
FROM languages
RIGHT JOIN countries
USING(code)
ORDER BY language;

Correct: when converting a LEFT JOIN to a RIGHT JOIN, change both the type of join and the order of the tables to get equivalent results. You would get different results if you only changed the table order. The order of fields you are joining ON still does not matter.

  • The order of the tables you join on will matter as you convert the query from LEFT JOIN to RIGHT JOIN!

2.2. FULL JOINs

Comparing joins

In this exercise, you’ll examine how results can differ when performing a full join compared to a left join and inner join by joining the countries and currencies tables. You’ll be focusing on the North American region and records where the name of the country is missing.

You’ll begin with a full join with countries on the left and currencies on the right. Recall the workings of a full join with the diagram below!

You’ll then complete a similar left join and conclude with an inner join, observing the results you see along the way.

  • Perform a full join with countries (left) and currencies (right).
  • Filter for the North America region or NULL country names.
SELECT name AS country, code, region, basic_unit
FROM countries
-- Join to currencies
FULL JOIN currencies
USING (code)
-- Where region is North America or name is null
WHERE region = 'North America' OR region IS NULL
ORDER BY region;
  • Repeat the same query as before, turning your full join into a left join with the currencies table.
  • Have a look at what has changed in the output by comparing it to the full join result
SELECT name AS country, code, region, basic_unit
FROM countries
-- Join to currencies
LEFT JOIN currencies
USING (code)
WHERE region = 'North America' 
    OR name IS NULL
ORDER BY region;
  • Repeat the same query again, this time performing an inner join of countries with currencies.
  • Have a look at what has changed in the output by comparing it to the full join and left join results!
SELECT name AS country, code, region, basic_unit
FROM countries
-- Join to currencies
INNER JOIN currencies
USING (code)
WHERE region = 'North America' 
    OR name IS NULL
ORDER BY region;

Have you kept an eye out for the different numbers of records these queries returned? The FULL JOIN query returned 18 records, the LEFT JOIN returned four records, and the INNER JOIN only returned three records. Does this add more color to the diagrams you have seen for these three types of join?

Chaining FULL JOINs

As you have seen in the previous chapter on INNER JOIN, it is possible to chain joins in SQL, such as when looking to connect data from more than two tables.

Suppose you are doing some research on Melanesia and Micronesia, and are interested in pulling information about languages and currencies into the data we see for these regions in the countries table. Since languages and currencies exist in separate tables, this will require two consecutive full joins involving the countries, languages and currencies tables.

  • Complete the FULL JOIN with countries as c1 on the left and languages as l on the right, using code to perform this join.
  • Next, chain this join with another FULL JOIN, placing currencies on the right, joining on code again.
SELECT 
    c1.name AS country, 
    region, 
    l.name AS language,
    basic_unit, 
    frac_unit
FROM countries as c1 
-- Full join with languages (alias as l)
FULL JOIN languages as l
USING (code)
-- Full join with currencies (alias as c2)
FULL JOIN currencies as c2
USING (code)
WHERE region LIKE 'M%esia';

Well done! The first FULL JOIN in the query pulled countries and languages, and the second FULL JOIN added in currency data for each record in the result of the first FULL JOIN.

2.3. Crossing into CROSS JOIN

Histories and languages

Well done getting to know all about CROSS JOIN! As you have learned, CROSS JOIN can be incredibly helpful when asking questions that involve looking at all possible combinations or pairings between two sets of data.

Imagine you are a researcher interested in the languages spoken in two countries: Pakistan and India. You are interested in asking:

  1. What are the languages presently spoken in the two countries?
  2. Given the shared history between the two countries, what languages could potentially have been spoken in either country over the course of their history?

In this exercise, we will explore how INNER JOIN and CROSS JOIN can help us answer these two questions, respectively.

  • Complete the code to perform an INNER JOIN of countries AS c with languages AS l using the code field to obtain the languages currently spoken in the two countries.
SELECT c.name AS country, l.name AS language
-- Inner join countries as c with languages as l on code
FROM countries AS c
INNER JOIN languages AS l
USING (code)
WHERE c.code IN ('PAK','IND')
    AND l.code in ('PAK','IND');
  • Change your INNER JOIN to a different kind of join to look at possible combinations of languages that could have been spoken in the two countries given their history.
  • Observe the differences in output for both joins.
SELECT c.name AS country, l.name AS language
FROM countries AS c        
-- Perform a cross join to languages (alias as l)
CROSS JOIN languages AS l
WHERE c.code in ('PAK','IND')
    AND l.code in ('PAK','IND');

Nice one! Notice that the INNER JOIN returned 25 records, whereas the CROSS JOIN returned 50 records, as it took all combinations of languages returned by the INNER JOIN for both countries. Notice that this returns duplicate records in cases where both countries speak the same language. We will learn how to deal with duplicates in subsequent lessons.

Choosing your join

Now that you’re fully equipped to use joins, try a challenge problem to test your knowledge!

You will determine the names of the five countries and their respective regions with the lowest life expectancy for the year 2010. Use your knowledge about joins, filtering, sorting and limiting to create this list!

  • Complete the join of countries AS c with populations as p.
  • Filter on the year 2010.
  • Sort your results by life expectancy in ascending order.
  • Limit the result to five countries.
SELECT 
    c.name AS country,
    region,
    life_expectancy AS life_exp
FROM countries AS c
-- Join to populations (alias as p) using an appropriate join
LEFT JOIN populations as p
ON c.code = p.country_code
-- Filter for only results in the year 2010
WHERE year = 2010
-- Sort by life_exp
ORDER BY life_exp
-- Limit to five records
LIMIT 5;

Nice work! Did you notice that more than one type of join can be used to return the five records in our result? All four types of joins we have learned will return the same result.

2.4. Self joins

Comparing a country to itself

Self joins are very useful for comparing data from one part of a table with another part of the same table. Suppose you are interested in finding out how much the populations for each country changed from 2010 to 2015. You can visualize this change by performing a self join.

In this exercise, you’ll work to answer this question by joining the populations table with itself. Recall that, with self joins, tables must be aliased. Use this as an opportunity to practice your aliasing!

Since you’ll be joining the populations table to itself, you can alias populations first as p1 and again as p2. This is good practice whenever you are aliasing tables with the same first letter.

  • Perform an inner join of populations with itself ON country_code, aliased p1 and p2 respectively.
  • Select the country_code from p1 and the size field from both p1 and p2, aliasing p1.size as size2010 and p2.size as size2015 (in that order).
-- Select aliased fields from populations as p1
SELECT p1.country_code,p1.size AS size2010, p2.size AS size2015
-- Join populations as p1 to itself, alias as p2, on country code
FROM populations AS p1
INNER JOIN populations AS p2
ON p1.country_code = p2.country_code;

Since you want to compare records from 2010 and 2015, eliminate unwanted records by extending the WHERE statement to include only records where the p1.year matches p2.year – 5.

SELECT 
    p1.country_code, 
    p1.size AS size2010, 
    p2.size AS size2015
FROM populations AS p1
INNER JOIN populations AS p2
ON p1.country_code = p2.country_code
WHERE p1.year = 2010
-- Filter such that p1.year is always five years before p2.year
    AND p1.year = p2.year - 5;

Nice one! See how it’s possible to eliminate unwanted records using a calculated field, such as the one you’ve subtracted five from? That’s a great trick to know.

All joins on deck

Excellent work! You’ve made it to the end of the chapter. In this exercise, you will test your knowledge on all the joins you’ve learned so far.

For each of the problems presented, think carefully about what types of tables are involved and how each of the joins you have learned relates to NULL values.

  • Drag and drop the situations presented into the appropriate type of join.
All joins

Congratulations! You are well on your way to becoming a joining pro.

3. Set Theory for SQL Joins

In this chapter, you will learn about using set theory operations in SQL, with an introduction to UNION, UNION ALL, INTERSECT, and EXCEPT clauses. You’ll explore the predominant ways in which set theory operations differ from join operations.

3.1. Set theory for SQL Joins

UNION vs. UNION ALL

Nice work learning all about UNION and UNION ALL!

Two tables, languages and currencies, are provided. Run the queries provided in the console and select the correct answer for the multiple-choice questions in this exercise.

Question

What result will the following SQL query produce?

SELECT *

FROM languages

UNION

SELECT *

FROM currencies;

Possible Answers

  • All records from both tables, dropping duplicate records (if any)
  • A SQL error, because languages and currencies do not have the same number of fields
  • A SQL error, because languages and currencies do not have the same number of records

Possible Answers

  • An ordered list of each country code in languages and currencies, including duplicates
  • An ordered list of each unique country code in languages and currencies
  • An unordered list of each country code in languages and currencies, including duplicates
  • An unordered list of each unique country code in languages and currencies

Possible Answers

  • An empty result
  • A stacked list of every curr_id from currencies and every code from languages
  • A SQL error, because code and curr_id are not of the same data type
  • A SQL error, because code and curr_id do not have the same name

Correct! Both queries on the left and right of the set operation must have the same data types. The names of the fields do not need to be the same, as the result will always contain field names from the left query.

Comparing global economies

Are you ready to perform your first set operation?

In this exercise, you have two tables, economies2015 and economies2019, available to you under the tabs in the console. You’ll perform a set operation to stack all records in these two tables on top of each other, excluding duplicates.

When drafting queries containing set operations, it is often helpful to write the queries on either side of the operation first, and then call the set operator. The instructions are ordered accordingly.

  • Begin your query by selecting all fields from economies2015.
  • Create a second query that selects all fields from economies2019.
  • Perform a set operation to combine the two queries you just created, ensuring you do not return duplicates.
-- Select all fields from economies2015
SELECT *
FROM economies2015
-- Set operation
UNION
-- Select all fields from economies2019
SELECT *
FROM economies2019
ORDER BY code, year;

Your first UNION! UNION can be helpful for consolidating data from multiple tables into one result, which as you have seen, can then be ordered in meaningful ways.

Comparing two set operations

You learned in the video exercise that UNION ALL returns duplicates, whereas UNION does not. In this exercise, you will dive deeper into this, looking at cases for when UNION is appropriate compared to UNION ALL.

You will be looking at combinations of country code and year from the economies and populations tables.

  • Perform an appropriate set operation that determines all pairs of country code and year (in that order) from economies and populations, excluding duplicates.
  • Order by country code and year.
-- Query that determines all pairs of code and year from economies and populations, without duplicates
SELECT code, year
FROM economies
UNION 
SELECT country_code, year
FROM populations
ORDER BY code, year;

Amend the query to return all combinations (including duplicates) of country code and year in the economies or the populations tables.

SELECT code, year
FROM economies
-- Set theory clause
UNION ALL
SELECT country_code, year
FROM populations
ORDER BY code, year;

Nicely done! UNION returned 434 records, whereas UNION ALL returned 814. Are you able to spot the duplicates in the UNION ALL?

3.2. At the INTERSECT

INTERSECT

Well done getting through the material on INTERSECT!

Let’s say you are interested in those countries that share names with cities. Use this task as an opportunity to show off your knowledge of set theory in SQL!

  • Return all city names that are also country names.
-- Return all cities with the same name as a country
SELECT name
FROM cities
INTERSECT
SELECT name
FROM countries;

Nice one! It looks as though Singapore is the only country in our database that has a city with the same name!

Review UNION and INTERSECT

Which of the following definitions of set operations is correct?

Possible Answers

  • UNION: returns all records (potentially duplicates) in both tables
  • UNION ALL: returns only unique records
  • INTERSECT: returns only records appearing in both tables
  • None of the above definitions are correct.

Correct! INTERSECT is a robust set operation for finding the set of identical records between two sets of records.

3.3. EXCEPT

You’ve got it, EXCEPT…

Just as you were able to leverage INTERSECT to find the names of cities with the same names as countries, you can also do the reverse, using EXCEPT.

In this exercise, you will find the names of cities that do not have the same names as their countries.

  • Return all cities that do not have the same name as a country.
-- Return all cities that do not have the same name as a country
SELECT name
FROM cities
EXCEPT
SELECT name
FROM countries
ORDER BY name;

EXCEPTional! Note that if countries had been on the left and cities on the right, you would have returned the opposite: all countries that do not have the same name as a city.

Calling all set operators

Congratulations! You’ve now made your way to the challenge problem for this chapter. Test your knowledge of set operators in SQL by classifying the below use cases into the correct buckets.

Think of how the information in each use case could be stored as tables, and recall the Venn diagrams you have learned, shown below!

  • Drag and drop the use cases provided into the correct set operator.

That’s right. Congratulations! Mastering set operations is a great tool in your SQL arsenal. As you’ve seen, set operations can sometimes even help re-write other types of joins in more compact ways.

4. Subqueries

4.1. Subquerying with semi joins and anti joins

Multiple WHERE clauses

You’ve learned about semi joins in the form of nested subqueries within the WHERE clause of the main query. In this exercise, you’ll familiarize yourself with semi join syntax by thinking through and re-ordering the lines of code provided. Note that subqueries are queries in their own right, so they can have a WHERE clause of their own! This is why you see two WHERE statements here.

Your task is to construct a semi join that pulls all records from economies2019 where gross_savings in the economies2015 table were below the 2015 global average. The global average gross_savings in 2015 was 22.5, and is already pre-calculated in the lines of code provided.

  • Re-order the lines of code provided to find all records from economies2019 where gross_savings in economies2015 were below the 2015 global average, using code to filter the economies2019 records.
Joining Data in SQL Answer Key 4

Nicely done. You’ve familiarized yourself with semi join syntax, and seen that the WHERE clause can be used in both the main query and the subquery!

Semi join

Great job getting acquainted with semi joins and anti joins! You are now going to practice using semi joins.

Let’s say you are interested in identifying languages spoken in the Middle East. The languages table contains information about languages and countries, but it does not tell you what region the countries belong to. You can build up a semi join by filtering the countries table by a particular region, and then using this to further filter the languages table.

You’ll build up your semi join as you did in the video exercise, block by block, starting with a selection of countries from the countries table, and then leveraging a WHERE clause to filter the languages table by this selection.

  • Select country code as a single field from the countries table, filtering for countries in the ‘Middle East’ region.
-- Select country code for countries in the Middle East
SELECT code
FROM countries
WHERE region = 'Middle East';
  • Write a second query to SELECT the name of each unique language appearing in the languages table; do not use column aliases here.
  • Order the result set by name in ascending order.
-- Select unique language names
SELECT DISTINCT(name)
FROM languages
-- Order by the name of the language
ORDER BY name;

Create a semi join out of the two queries you’ve written, which filters unique languages returned in the first query for only those languages spoken in the ‘Middle East’

SELECT DISTINCT name
FROM languages
-- Add syntax to use bracketed subquery below as a filter
WHERE code IN
    (SELECT code
    FROM countries
    WHERE region = 'Middle East')
ORDER BY name;

Well done writing your first subquery in the form of a semi join. Think of all the opportunities that open up when queries become building blocks of larger queries!

Diagnosing problems using anti join

Nice work on semi joins! The anti join is a related and powerful joining tool. It can be particularly useful for identifying whether an incorrect number of records appears in a join.

Say we are interested in identifying currencies of Oceanian countries. We have the following INNER JOIN which returns 15 records, and want to check that all Oceanian countries in our countries table are included in this result. If not, we want to return the names of any excluded countries.

SELECT c1.code, name, basic_unit AS currency

FROM countries AS c1

INNER JOIN currencies AS c2

ON c1.code = c2.code

WHERE c1.continent = ‘Oceania’;

An anti join will give us the names of these countries, if any! Your task is to write this anti join.

  • Begin your anti join by returning the code and name (in order, not aliased) for all countries in the continent of Oceania from the countries table.
  • Observe the number of records returned and compare this with the provided INNER JOIN, which returns 15 records.
-- Select code and name of countries from Oceania
SELECT code, name
FROM countries
WHERE continent = 'Oceania';

Complete your anti join by adding an additional filter to return every country code that is not included in the currencies table.

SELECT code, name
FROM countries
WHERE continent = 'Oceania'
-- Filter for countries not included in the bracketed subquery
  AND code NOT IN
    (SELECT code
    FROM currencies);

Nice! Your anti join determined which five countries were not included in the INNER JOIN provided. Did you notice that Tuvalu has two currencies, and therefore shows up twice in the INNER JOIN? This is why the INNER JOIN returned 15 rather than 14 results even though Oceania has 19 countries in our database.

4.2. Subqueries inside WHERE and SELECT

Subquery inside WHERE

The video exercise pointed out that subqueries inside WHERE can either be from the same table or a different table. In this exercise, you will nest a subquery from the populations table inside another query, also from the populations table. Your goal is to figure out which countries had high average life expectancies in 2015.

You can use SQL to do calculations for you. Suppose you only want records from 2015 with life_expectancy above 1.15 * avg_life_expectancy. You could use the following SQL query:

SELECT *

FROM populations

WHERE life_expectancy > 1.15 * avg_life_expectancy

AND year = 2015;

  • Begin by calculating the average life expectancy from the populations table.
  • Filter your data to return records from 2015 only.
-- Select average life_expectancy from the populations table
SELECT AVG(life_expectancy)
FROM populations
-- Filter for the year 2015
WHERE year ='2015';

Filter for only those populations where life_expectancy is 1.15 times higher than average.

SELECT *
FROM populations
-- Filter for only those populations where life expectancy is 1.15 times higher than average
WHERE life_expectancy > 1.15 *
  (SELECT AVG(life_expectancy)
   FROM populations
   WHERE year = 2015) 
     AND year = 2015;

Nice work! You may recognize many of these country codes as being relatively wealthy countries, which makes sense as we might expect life expectancy to be higher in wealthier nations.

WHERE do people live?

In this exercise, you will strengthen your knowledge of subquerying using WHERE. Follow the instructions below to get the urban area population for capital cities only. Explore the tables displayed in the console to help identify columns of interest as you build your query.

  • Return the name, country_code and urbanarea_pop for all capital cities (not aliased).
-- Select relevant fields from cities table
SELECT name, country_code, urbanarea_pop
FROM cities
-- Filter using a subquery on the countries table
WHERE name IN
  (SELECT capital
   FROM countries)
ORDER BY urbanarea_pop DESC;

Alright! You’ve got plenty of practice on subqueries inside WHERE. Let’s move on to subqueries inside the SELECT statement.

Subquery inside SELECT

As explored in the video, there are often multiple ways to produce the same result in SQL. You saw that subqueries can provide an alternative to joins to obtain the same result.

In this exercise, you’ll go further in exploring how some queries can be written using either a join or a subquery.

In Step 1, you’ll begin with a LEFT JOIN combined with a GROUP BY to obtain summarized information from two tables in order to select the nine countries with the most cities appearing in the cities table. In Step 2, you’ll write a query that returns the same result as the join, but leveraging a nested query instead.

  • Write a LEFT JOIN on the countries and cities tables to select names of countries in cities (aliasing country name as country), followed by counts of cities as cities_num.
  • Sort by cities_num (descending) and limit to the first nine records.
-- Find top nine countries with the most cities
SELECT countries.name AS country, COUNT(*) AS cities_num
FROM countries
LEFT JOIN cities
ON countries.code = cities.country_code
GROUP BY country
-- Order by count of cities as cities_num
ORDER BY cities_num DESC, country
LIMIT 9;

Ouput

Write a nested subquery that returns an equivalent result to your LEFT JOIN, finding the counts of cities appearing in the cities table as cities_num.

SELECT countries.name AS country,
-- Subquery that provides the count of cities   
  (SELECT COUNT(*)
   FROM cities
   WHERE countries.code = cities.country_code) AS cities_num
FROM countries
ORDER BY cities_num DESC, country
LIMIT 9;

Excellent job! Notice how the subquery involves only one additional step in your SELECT statement, whereas the JOIN and GROUP BY are a two-step process.

4.3. Subqueries inside FROM

Subquery inside FROM

Subqueries inside FROM can help us select columns from multiple tables in a single query.

Say you are interested in determining the number of languages spoken for each country. You want to present this information alongside each country’s local_name, which is a field only present in the countries table and not in the languages table. You’ll use a subquery inside FROM to bring information from these two tables together!

  • Begin with a query that returns country code from languages, and a count of languages spoken in each country as lang_num.
-- Select code, and language count as lang_num
SELECT code, COUNT(*) AS lang_num
FROM languages
GROUP BY code;
  • Select local_name from countries, with the aliased lang_num from your subquery (which has been nested and aliased for you as sub).
  • Use WHERE to match the code field from countries and sub.
-- Select local_name and lang_num from appropriate tables
SELECT local_name, sub.lang_num
FROM countries,
    (SELECT code, COUNT(*) AS lang_num
     FROM languages
     GROUP BY code) AS sub
-- Where codes match    
WHERE countries.code = sub.code
ORDER BY lang_num DESC;

Subquery challenge

You’re near the finish line! Test your understanding of subquerying with a challenge problem.

Suppose you’re interested in analyzing inflation and unemployment rate for certain countries in 2015. You are not interested in countries with “Republic” or “Monarchy” as their form of government, but are interested in all other forms of government, such as emirate federations, socialist states, and commonwealths.

You will use the field gov_form to filter for these two conditions, which represents a country’s form of government. You can review the different entries for gov_form in the countries table.

  • Select country code, inflation_rate, and unemployment_rate from economies.
  • Filter code for the set of countries which do not contain the words “Republic” or “Monarchy” in their gov_form.
-- Select relevant fields
SELECT code, inflation_rate, unemployment_rate
FROM economies
WHERE year = 2015 
  AND code NOT IN
-- Subquery returning country codes filtered on gov_form
  (SELECT code
  FROM countries
  WHERE gov_form LIKE '%Republic%' OR gov_form LIKE '%Monarchy%')
ORDER BY inflation_rate;

Superb work writing the majority of the query yourself. You found that in 2015, South Sudan (with country code SSD) had inflation above 50%! Did you also notice that this query is an anti join?

Final challenge

You’ve made it to the final challenge problem! Get ready to tackle this step-by-step.

Your task is to determine the top 10 capital cities in Europe and the Americas by city_perc, a metric you’ll calculate. city_perc is a percentage that calculates the “proper” population in a city as a percentage of the total population in the wider metro area, as follows:

city_proper_pop / metroarea_pop * 100

Do not use table aliasing in this exercise.

  • From cities, select the city name, country code, proper population, and metro area population, as well as the field city_perc, which calculates the proper population as a percentage of metro area population for each city (using the formula provided).
  • Filter city name with a subquery that selects capital cities from countries in ‘Europe’ or continents with ‘America’ at the end of their name.
  • Exclude NULL values in metroarea_pop.

Order by city_perc (descending) and return only the first 10 rows.

-- Select fields from cities
SELECT 
  name, 
    country_code, 
    city_proper_pop, 
    metroarea_pop,
    city_proper_pop / metroarea_pop * 100 AS city_perc
FROM cities
-- Use subquery to filter city name
WHERE name IN
  (SELECT capital
   FROM countries
   WHERE (continent = 'Europe'
   OR continent LIKE '%America'))
-- Add filter condition such that metroarea_pop does not have null values
    AND metroarea_pop IS NOT NULL
-- Sort and limit the result
ORDER BY city_perc DESC
LIMIT 10;

You’ve identified that Lima has the highest percentage of people living in the city ‘proper’, relative to the wider metropolitan population! Nicely done getting to the top of the summit.

Share this Article
Leave a comment

Leave a Reply

Your email address will not be published. Required fields are marked *