Data-Driven Decision Making in SQL Answer Key – Datacamp. Learn how to analyze a SQL table and report insights to management.
In this course, you will learn how to use SQL to support decision making. It is based on a case study about an online movie rental company with a database about customer information, movie ratings, background information on actors and more. You will learn to apply SQL queries to study for example customer preferences, customer engagement, and sales development. This course also covers SQL extensions for online analytical processing (OLAP), which makes it easier to obtain key insights from multidimensional aggregated data.
Course Url: https://www.datacamp.com/courses/data-driven-decision-making-in-sql
1. Introduction to business intelligence for a online movie rental database
The first chapter is an introduction to the use case of an online movie rental company, called MovieNow and focuses on using simple SQL queries to extract and aggregated data from its database.
1.1. Introduction to data driven decision making

Exploring the database
Explore the tables and its columns. Which of the following quantities can’t be computed?
Possible Answers
- The number of customers from each country.
- The number of movies with an international award.
- The average rating of a movie.
- The number of movies with the actor Daniel Radcliffe.
Correct answer! There is no information about movie awards in the data.
Exploring the table renting
The table renting includes all records of movie rentals. Each record has a unique ID renting_id. It also contains information about customers (customer_id) and which movies they watched (movie_id). Furthermore, customers can give a rating after watching the movie, and the day the movie was rented is recorded.
- Select all columns from renting.
SELECT * -- Select all
From renting; -- From table renting
Now select only those columns from renting which are needed to calculate the average rating per movie.
SELECT movie_id, -- Select all columns needed to compute the average rating per movie
rating
FROM renting;
output
movie_id | rating |
8 | null |
29 | 10 |
45 | 4 |
Question
In SQL missing values are coded with null. In which column of renting did you notice null values?
- renting_id
- customer_id
- movie_id
- rating
Well observed! The column rating in the renting table has a lot of null values since not every customer gives a rating after watching a movie.
1.2. Filtering and ordering
Working with dates
For the analysis of monthly or annual changes, it is important to select data from specific time periods. You will select records from the table renting of movie rentals. The format of dates is ‘YYYY-MM-DD’.
- Select all movies rented on October 9th, 2018.
SELECT *
FROM renting
WHERE date_renting ='2018-10-09'; -- Movies rented on October 9th, 2018
OUTPUT
renting_id | customer_id | movie_id | rating | date_renting |
1 | 41 | 8 | null | 2018-10-09 |
6 | 50 | 71 | 7 | 2018-10-09 |
- Select all records of movie rentals between beginning of April 2018 till end of August 2018.
SELECT *
FROM renting
WHERE date_renting BETWEEN '2018-04-01' AND '2018-08-31'; -- from beginning April 2018 to end August 2018
OUTPUT
renting_id | customer_id | movie_id | rating | date_renting |
3 | 108 | 45 | 4 | 2018-06-08 |
8 | 73 | 65 | 10 | 2018-06-05 |
12 | 52 | 65 | 10 | 2018-06-29 |
- Put the most recent records of movie rentals on top of the resulting table and order them in decreasing order.
SELECT *
FROM renting
WHERE date_renting BETWEEN '2018-04-01' AND '2018-08-31'
ORDER BY rating DESC; -- Order by recency in decreasing order
OUTPUT
renting_id | customer_id | movie_id | rating | date_renting |
3 | 108 | 45 | 4 | 2018-06-08 |
8 | 73 | 65 | 10 | 2018-06-05 |
12 | 52 | 65 | 10 | 2018-06-29 |
Well done! You know now how to select data from specific time periods and how to order them by date.
Selecting movies
The table movies contains all movies available on the online platform.
- Select all movies which are not dramas.
SELECT *
FROM movies
WHERE genre <> 'Drama'; -- All genres except drama
output
movie_id | title | genre | runtime | year_of_release | renting_price |
1 | One Night at McCool’s | Comedy | 93 | 2001 | 2.09 |
3 | What Women Want | Comedy | 127 | 2001 | 2.59 |
5 | The Fellowship of the Ring | Science Fiction & Fantasy | 178 | 2001 | 2.59 |
- Select the movies ‘Showtime’, ‘Love Actually’ and ‘The Fighter’.
SELECT *
FROM movies
WHERE title IN ('Showtime', 'Love Actually', 'The Fighter'); -- Select all movies with the given titles
OUTPUT
movie_id | title | genre | runtime | year_of_release | renting_price |
11 | Showtime | Comedy | 95 | 2002 | 1.79 |
20 | Love Actually | Comedy | 135 | 2003 | 2.29 |
53 | The Fighter | Other | 116 | 2010 | 2.49 |
- Order the movies by increasing renting price.
SELECT *
FROM movies
ORDER BY renting_price DESC ; -- Order the movies by increasing renting price
movie_id | title | genre | runtime | year_of_release | renting_price |
51 | Morning Glory | Comedy | 107 | 2010 | 2.99 |
9 | 11’09”01 September 11 | Drama | 134 | 2002 | 2.99 |
64 | Cloud Atlas | Drama | 172 | 2012 | 2.89 |
52 | Fair Game | Drama | 108 | 2010 | 2.89 |
Congratulations! You selected subsets from the table movies and ordered its records.
Select from renting
Only some users give a rating after watching a movie. Sometimes it is interesting to explore only those movie rentals where a rating was provided.
- Select from table renting all movie rentals from 2018.
- Filter only those records which have a movie rating.
SELECT *
FROM renting
WHERE date_renting BETWEEN '2018-01-01' AND '2018-12-31' -- Renting in 2018
AND rating IS NOT NULL; -- Rating exists
OUTPUT
renting_id | customer_id | movie_id | rating | date_renting |
3 | 108 | 45 | 4 | 2018-06-08 |
4 | 39 | 66 | 8 | 2018-10-22 |
6 | 50 | 71 | 7 | 2018-10-09 |
Well done! You selected all records from renting which are not null for 2018. Now you are ready for the next step!
1.3. Aggregations – summarizing data
Summarizing customer information
In most business decisions customers are analyzed in groups, such as customers per country or customers per age group
- Count the number of customers born in the 80s.
SELECT COUNT(customer_id) -- Count the total number of customers
FROM customers
WHERE date_of_birth BETWEEN '1980-01-01' AND '1989-12-31'; -- Select customers born between 1980-01-01 and 1989-12-31
OUTPUT
count |
33 |
- Count the number of customers from Germany.
SELECT COUNT(customer_id) -- Count the total number of customers
FROM customers
WHERE country = 'Germany'; -- Select all customers from Germany
output
count |
0 |
- Count the number of countries where MovieNow has customers.
SELECT COUNT(distinct country) -- Count the number of countries
FROM customers;
OUTPUT
count |
11 |
Well done! You aggregated data from the customers table and figured out that MovieNow has customers in 11 countries and that there are no customers from Germany.
Ratings of movie 25
The movie ratings give us insight into the preferences of our customers. Report summary statistics, such as the minimum, maximum, average, and count, of ratings for the movie with ID 25.
- Select all movie rentals of the movie with movie_id 25 from the table renting.
- For those records, calculate the minimum, maximum and average rating and count the number of ratings for this movie.
SELECT MIN(rating) AS min_rating, -- Calculate the minimum rating and use alias min_rating
MAX(rating) AS max_rating, -- Calculate the maximum rating and use alias max_rating
AVG(rating) AS avg_rating, -- Calculate the average rating and use alias avg_rating
COUNT(rating) AS number_ratings -- Count the number of ratings and use alias number_ratings
FROM renting
WHERE movie_id = 25; -- Select all records of the movie with ID 25
OUTPUT
min_rating | max_rating | avg_rating | number_ratings |
5 | 10 | 7.5000000000000000 | 8 |
Congratulations! You summarized the ratings for the movie 25 and used meaningful column names. This movie has 8 ratings with minimum 5, maximum 10 and average 7.5 rating.
Examining annual rentals
You are asked to provide a report about the development of the company. Specifically, your manager is interested in the total number of movie rentals, the total number of ratings and the average rating of all movies since the beginning of 2019.
- First, select all records of movie rentals since January 1st 2019.
SELECT * -- Select all records of movie rentals since January 1st 2019
FROM renting
WHERE date_renting >= '2019-01-01';
OUTPUT
renting_id | customer_id | movie_id | rating | date_renting |
5 | 104 | 15 | 7 | 2019-03-18 |
17 | 22 | 46 | 10 | 2019-02-16 |
18 | 36 | 39 | 10 | 2019-03-20 |
- Now, count the number of movie rentals and calculate the average rating since the beginning of 2019.
SELECT
Count(renting_id), -- Count the total number of rented movies
AVG(rating) -- Add the average rating
FROM renting
WHERE date_renting >= '2019-01-01';
OUTPUT
count | avg |
159 | 7.9462365591397849 |
- Use as alias column names number_renting and average_rating respectively.
SELECT
COUNT(*) AS number_renting, -- Give it the column name number_renting
AVG(rating) AS average_rating -- Give it the column name average_rating
FROM renting
WHERE date_renting >= '2019-01-01';
OUTPUT
number_renting | average_rating |
159 | 7.9462365591397849 |
- Finally, count how many ratings exist since 2019-01-01.
SELECT
COUNT(*) AS number_renting,
AVG(rating) AS average_rating,
COUNT(rating) AS number_ratings -- Add the total number of ratings here.
FROM renting
WHERE date_renting >= '2019-01-01';
output
number_renting | average_rating | number_ratings |
159 | 7.9462365591397849 | 93 |
Congratulations! You managed to extract all KPIs from the renting records by working with aggregations, NULL values and dates.
2. Decision Making with simple SQL queries
More complex queries with GROUP BY, LEFT JOIN and sub-queries are used to gain insight into customer preferences.
2.1. Grouping movies
First account for each country.
Conduct an analysis to see when the first customer accounts were created for each country.
- Create a table with a row for each country and columns for the country name and the date when the first customer account was created.
- Use the alias first_account for the column with the dates.
- Order by date in ascending order.
SELECT country, -- For each country report the earliest date when an account was created
MIN(date_account_start) AS first_account
FROM customers
GROUP BY country
ORDER BY first_account;
OUTPUT
country | first_account |
France | 2017-01-13 |
Hungary | 2017-01-18 |
Belgium | 2017-01-28 |
Well done! You learned that the first customer account was created in France.
Average movie ratings
For each movie the average rating, the number of ratings and the number of views has to be reported. Generate a table with meaningful column names.
- Group the data in the table renting by movie_id and report the ID and the average rating.
SELECT movie_id,
AVG(rating) -- Calculate average rating per movie
FROM renting
GROUP BY movie_id;
output
movie_id | avg |
54 | 8.1666666666666667 |
29 | 8.0000000000000000 |
71 | 8.0000000000000000 |
- Add two columns for the number of ratings and the number of movie rentals to the results table.
- Use alias names avg_rating, number_rating and number_renting for the corresponding columns.
SELECT movie_id,
AVG(rating) AS avg_rating, -- Use as alias avg_rating
COUNT(rating) AS number_rating, -- Add column for number of ratings with alias number_rating
COUNT(renting_id) AS number_renting -- Add column for number of movie rentals with alias number_renting
FROM renting
GROUP BY movie_id;
output
movie_id | avg_rating | number_rating | number_renting |
54 | 8.1666666666666667 | 6 | 9 |
29 | 8.0000000000000000 | 7 | 11 |
71 | 8.0000000000000000 | 4 | 6 |
- Order the rows of the table by the average rating such that it is in decreasing order.
- Observe what happens to NULL values.
SELECT movie_id,
AVG(rating) AS avg_rating,
COUNT(rating) AS number_ratings,
COUNT(*) AS number_renting
FROM renting
GROUP BY movie_id
ORDER BY avg_rating DESC; -- Order by average rating in decreasing order
OUTPUT
movie_id | avg_rating | number_ratings | number_renting |
11 | null | 0 | 6 |
48 | 10.0000000000000000 | 2 | 5 |
5 | 9.7500000000000000 | 4 | 8 |
Question
Which statement is true for the movie with average rating null?
Possible Answers
- The number of ratings is 6.
- The number of movie rentals is zero.
- The average is null because one of the ratings of the movie is null.
- The average is null because all of the ratings of the movie are null.
Correct answer! The average is null only if all values are null.
Average rating per customer
Similar to what you just did, you will now look at the average movie ratings, this time for customers. So you will obtain a table with the average rating given by each customer. Further, you will include the number of ratings and the number of movie rentals per customer. You will report these summary statistics only for customers with more than 7 movie rentals and order them in ascending order by the average rating.
- Group the data in the table renting by customer_id and report the customer_id, the average rating, the number of ratings and the number of movie rentals.
- Select only customers with more than 7 movie rentals.
- Order the resulting table by the average rating in ascending order.
SELECT customer_id, -- Report the customer_id
AVG(rating), -- Report the average rating per customer
COUNT(rating), -- Report the number of ratings per customer
COUNT(*) -- Report the number of movie rentals per customer
FROM renting
GROUP BY customer_id
HAVING COUNT(*) > 7 -- Select only customers with more than 7 movie rentals
ORDER BY AVG(rating); -- Order by the average rating in ascending order
OUTPUT
customer_id | avg | count | count |
104 | 6.2500000000000000 | 4 | 8 |
28 | 6.7142857142857143 | 7 | 11 |
111 | 7.0000000000000000 | 3 | 10 |
Congratulations! You found out that customer number 104 gave the lowest average ratings for 4 movies.
2.2. Joining movie ratings with customer data
Join renting and customers
For many analyses it is necessary to add customer information to the data in the table renting.
- Augment the table renting with all columns from the table customers with a LEFT JOIN.
- Use as alias’ for the tables r and c respectively.
SELECT * -- Join renting with customers
FROM renting AS r
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id ;
Select only records from customers coming from Belgium.
SELECT *
FROM renting AS r
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
WHERE c.country = 'Belgium'; -- Select only records from customers coming from Belgium
Average ratings of customers from Belgium.
SELECT AVG(r.rating) -- Average ratings of customers from Belgium
FROM renting AS r
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
WHERE c.country='Belgium';
output
avg |
8.9000000000000000 |
Congratulations! You calculated the average rating for customers from Belgium, which is 8.9.
Aggregating revenue, rentals and active customers
The management of MovieNow wants to report key performance indicators (KPIs) for the performance of the company in 2018. They are interested in measuring the financial successes as well as user engagement. Important KPIs are, therefore, the profit coming from movie rentals, the number of movie rentals and the number of active customers.
- First, you need to join movies on renting to include the renting_price from the movies table for each renting record.
- Use as alias’ for the tables m and r respectively.
SELECT *
FROM renting AS r
LEFT JOIN movies AS m -- Choose the correct join statment
ON r.movie_id = m.movie_id;
Calculate the revenue coming from movie rentals, the number of movie rentals and the number of customers who rented a movie.
SELECT
SUM(m.renting_price), -- Get the revenue from movie rentals
COUNT(*), -- Count the number of rentals
COUNT(DISTINCT r.customer_id) -- Count the number of customers
FROM renting AS r
LEFT JOIN movies AS m
ON r.movie_id = m.movie_id;
Now, you can report these values for the year 2018. Calculate the revenue in 2018, the number of movie rentals and the number of active customers in 2018. An active customer is a customer who rented at least one movie in 2018.
SELECT
SUM(m.renting_price),
COUNT(*),
COUNT(DISTINCT r.customer_id)
FROM renting AS r
LEFT JOIN movies AS m
ON r.movie_id = m.movie_id
-- Only look at movie rentals in 2018
WHERE date_renting BETWEEN '2018-01-01' AND '2018-12-31' ;
OUTPUT
sum | count | count |
658.02 | 298 | 93 |
Congratulations! You’ve calculated a turnover of 658.02 and found the number of rentals to be 298 and the number of active users to be 93 in 2018.
Movies and actors
You are asked to give an overview of which actors play in which movie
- Create a list of actor names and movie titles in which they act. Make sure that each combination of actor and movie appears only once.
- Use as an alias for the table actsin the two letters ai.
SELECT m.title, -- Create a list of movie titles and actor names
a.name
FROM actsin AS ai
LEFT JOIN movies AS m
ON m.movie_id = ai.movie_id
LEFT JOIN actors AS a
ON a.actor_id = ai.actor_id;
output
title | name |
Candy | Abbie Cornish |
Jack and Jill | Adam Sandler |
Simone | Al Pacino |
Congratulations! You joined three tables and created a list of movie titles and actors in this movies.
2.3. Money spent per customer with sub-queries
Income from movies
How much income did each movie generate? To answer this question subsequent SELECT statements can be used.
SELECT m.title, -- Use a join to get the movie title and price for each movie rental
m.renting_price
FROM renting AS r
LEFT JOIN movies AS m
ON r.movie_id = m.movie_id;
output
title | renting_price |
Waking Up in Reno | 2.59 |
Two for the Money | 2.79 |
Burn After Reading | 2.39 |
- Report the total income for each movie.
- Order the result by decreasing income.
SELECT rm.title, -- Report the income from movie rentals for each movie
SUM(rm.renting_price) AS income_movie
FROM
(SELECT m.title,
m.renting_price
FROM renting AS r
LEFT JOIN movies AS m
ON r.movie_id=m.movie_id) AS rm
GROUP BY rm.title
ORDER BY income_movie DESC; -- Order the result by decreasing income
output
title | income_movie |
Bridget Jones – The Edge of Reason | 37.57 |
Fair Game | 34.68 |
The Kingdom | 31.35 |
Question
Which statement about the movie ‘Django Unchained’ is NOT correct?
Possible Answers
- The income from this movie is 29.59.
- The income from ‘Django Unchained’ is lower than from ‘The Kingdom’.
- The income from ‘Django Unchained’ is higher than from ‘Simone’.
- ‘Django Unchained’ has the 5th highest income.
You chose the right anwer! This statement is NOT correct. ‘Django Unchained’ and ‘Simone’ generated the same income.
Age of actors from the USA
Now you will explore the age of American actors and actresses. Report the date of birth of the oldest and youngest US actor and actress.
- Create a subsequent SELECT statements in the FROM clause to get all information about actors from the USA.
- Give the subsequent SELECT statement the alias a.
- Report for actors from the USA the year of birth of the oldest and the year of birth of the youngest actor and actress.
SELECT a.gender, -- Report for male and female actors from the USA
Min(a.year_of_birth), -- The year of birth of the oldest actor
Max(a.year_of_birth) -- The year of birth of the youngest actor
FROM
(SELECT *
FROM actors
WHERE nationality = 'USA') AS a
GROUP BY a.gender;
output
gender | min | max |
female | 1945 | 1993 |
male | 1930 | 1992 |
Congratulations! You learned that in the MovieNow actors’ record the oldest actor was born in 1930 and the oldest actress in 1945.
2.4. Identify favorite actors of customer groups
Identify favorite movies for a group of customers
Which is the favorite movie on MovieNow? Answer this question for a specific group of customers: for all customers born in the 70s.
- Augment the table renting with customer information and information about the movies.
- For each join use the first letter of the table name as alias.
SELECT *
FROM renting AS r
LEFT JOIN customers AS c -- Add customer information
ON r.customer_id = c.customer_id
LEFT JOIN movies AS m -- Add movie information
ON r.movie_id = m.movie_id;
Select only those records of customers born in the 70s.
SELECT *
FROM renting AS r
LEFT JOIN customers AS c
ON c.customer_id = r.customer_id
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
WHERE c.date_of_birth BETWEEN '1970-01-01' AND '1979-12-31'; -- Select customers born in the 70s
For each movie, report the number of times it was rented, as well as the average rating. Limit your results to customers born in the 1970s.
SELECT m.title,
COUNT(r.customer_id), -- Report number of views per movie
AVG(r.rating) -- Report the average rating per movie
FROM renting AS r
LEFT JOIN customers AS c
ON c.customer_id = r.customer_id
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
WHERE c.date_of_birth BETWEEN '1970-01-01' AND '1979-12-31'
GROUP BY m.title;
- Remove those movies from the table with only one rental.
- Order the result table such that movies with highest rating come first.
SELECT m.title,
COUNT(*),
AVG(r.rating)
FROM renting AS r
LEFT JOIN customers AS c
ON c.customer_id = r.customer_id
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
WHERE c.date_of_birth BETWEEN '1970-01-01' AND '1979-12-31'
GROUP BY m.title
HAVING COUNT(*) > 1 -- Remove movies with only one rental
ORDER BY AVG(r.rating); -- Order with highest rating first
OUTPUT
title | count | avg |
The Recruit | 2 | 5.0000000000000000 |
Something’s Gotta Give | 3 | 6.0000000000000000 |
Then She Found Me | 4 | 6.0000000000000000 |
Congratulations! You just wrote a query with LEFT JOIN, WHERE, GROUP BY, HAVING and ORDER BY. And you found out that ‘The Fellowship of the Ring’ is a movie with one of the best ratings among people born in the 70s.
Identify favorite actors for Spain
You’re now going to explore actor popularity in Spain. Use as alias the first letter of the table, except for the table actsin use ai instead.
Augment the table renting with information about customers and actors.
SELECT *
FROM renting as r
LEFT JOIN customers as c -- Augment table renting with information about customers
ON r.customer_id = c.customer_id
LEFT JOIN actsin as ai -- Augment the table renting with the table actsin
ON r.movie_id = ai.movie_id
LEFT JOIN actors as a -- Augment table renting with information about actors
ON ai.actor_id = a.actor_id;
- Report the number of movie rentals and the average rating for each actor, separately for male and female customers.
- Report only actors with more than 5 movie rentals.
SELECT a.name, c.gender,
COUNT(*) AS number_views,
AVG(r.rating) AS avg_rating
FROM renting as r
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
LEFT JOIN actsin as ai
ON r.movie_id = ai.movie_id
LEFT JOIN actors as a
ON ai.actor_id = a.actor_id
GROUP BY a.name, c.gender -- For each actor, separately for male and female customers
HAVING AVG(r.rating) IS NOT NULL
AND COUNT(*) > 5 -- Report only actors with more than 5 movie rentals
ORDER BY avg_rating DESC, number_views DESC;
Now, report the favorite actors only for customers from Spain.
SELECT a.name, c.gender,
COUNT(*) AS number_views,
AVG(r.rating) AS avg_rating
FROM renting as r
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
LEFT JOIN actsin as ai
ON r.movie_id = ai.movie_id
LEFT JOIN actors as a
ON ai.actor_id = a.actor_id
WHERE c.country = 'Spain' -- Select only customers from Spain
GROUP BY a.name, c.gender
HAVING AVG(r.rating) IS NOT NULL
AND COUNT(*) > 5
ORDER BY avg_rating DESC, number_views DESC;
output
name | gender | number_views | avg_rating |
Catherine Keener | female | 6 | 8.0000000000000000 |
Emma Watson | male | 7 | 7.6000000000000000 |
Daniel Radcliffe | male | 7 | 7.6000000000000000 |
Well done! You found out that in this table Catherine Keener is the favorite actress among female Spain customers and that male customers from Spain like the actors from Harry Potter best: Emma Watson, Daniel Radcliffe and Rupert Grint.
KPIs per country
In chapter 1 you were asked to provide a report about the development of the company. This time you have to prepare a similar report with KPIs for each country separately. Your manager is interested in the total number of movie rentals, the average rating of all movies and the total revenue for each country since the beginning of 2019.
- Augment the table renting with information about customers and movies.
- Use as alias the first latter of the table name.
- Select only records about rentals since beginning of 2019.
SELECT *
FROM renting AS r -- Augment the table renting with information about customers
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
LEFT JOIN movies AS m-- Augment the table renting with information about movies
ON r.movie_id = m.movie_id
WHERE r.date_renting >= '2019-01-01'; -- Select only records about rentals since the beginning of 2019
- Calculate the number of movie rentals.
- Calculate the average rating.
- Calculate the revenue from movie rentals.
- Report these KPIs for each country.
SELECT
c.country, -- For each country report
COUNT(*) AS number_renting, -- The number of movie rentals
AVG(r.rating) AS average_rating, -- The average rating
SUM(m.renting_price) AS revenue -- The revenue from movie rentals
FROM renting AS r
LEFT JOIN customers AS c
ON c.customer_id = r.customer_id
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
WHERE date_renting >= '2019-01-01'
GROUP BY c.country;
output
country | number_renting | average_rating | revenue |
null | 1 | 10.0000000000000000 | 1.79 |
Spain | 26 | 8.0769230769230769 | 57.94 |
Great Britan | 9 | 7.2000000000000000 | 17.91 |
Well done! You calculated a total revenue of 57.94 for Spain, with 26 movie rentals and an average rating of 8.1.
3. Data Driven Decision Making with advanced SQL queries
The concept of nested queries and correlated nested queries is introduced and the functions EXISTS and UNION are used to categorize customers, movies, actors, and more.
3.1. Nested query
Often rented movies
Your manager wants you to make a list of movies excluding those which are hardly ever watched. This list of movies will be used for advertising. List all movies with more than 5 views using a nested query which is a powerful tool to implement selection conditions.
- Select all movie IDs which have more than 5 views.
SELECT movie_id -- Select movie IDs with more than 5 views
FROM renting
GROUP BY movie_id
HAVING COUNT(movie_id) > 5
output
movie_id |
54 |
29 |
71 |
- Select all information about movies with more than 5 views.
SELECT *
FROM movies
GROUP BY movie_id
HAVING movie_id IN -- Select movie IDs from the inner query
(SELECT movie_id
FROM renting
GROUP BY movie_id
HAVING COUNT(*) > 5)
Congratulations! You used a nested query to select all movies with more than 5 views and found out that ‘One Night at McCool’s’ is one of them.
Frequent customers
Report a list of customers who frequently rent movies on MovieNow.
- List all customer information for customers who rented more than 10 movies.
SELECT *
FROM customers
GROUP BY customer_id
HAVING customer_id IN -- Select all customers with more than 10 movie rentals
(SELECT customer_id
FROM renting
GROUP BY customer_id
HAVING COUNT(*) > 10);
Well done! You identified Avelaine Corbeil from France as one of the customers who rented more than 10 movies.
Movies with rating above average
For the advertising campaign your manager also needs a list of popular movies with high ratings. Report a list of movies with rating above average.
- Calculate the average over all ratings.
SELECT AVG( rating) -- Calculate the total average rating
FROM renting
output
avg |
7.9390243902439024 |
- Select movie IDs and calculate the average rating of movies with rating above average.
SELECT movie_id, -- Select movie IDs and calculate the average rating
AVG(rating)
FROM renting
GROUP BY movie_id
HAVING AVG(rating) > -- Of movies with rating above average
(SELECT AVG(rating)
FROM renting);
output
movie_id | avg |
54 | 8.1666666666666667 |
29 | 8.0000000000000000 |
71 | 8.0000000000000000 |
- The advertising team only wants a list of movie titles. Report the movie titles of all movies with average rating higher than the total average
SELECT title -- Report the movie titles of all movies with average rating higher than the total average
FROM movies
WHERE movie_id IN
(SELECT movie_id
FROM renting
GROUP BY movie_id
HAVING AVG(rating) >
(SELECT AVG(rating)
FROM renting));
OUTPUT
title |
What Women Want |
The Fellowship of the Ring |
Harry Potter and the Philosopher’s Stone |
Congratulations! You selected all movies with an average rating higher than the total average and found out that the movie ‘What Women Want’ is one of them.
3.2. Correlated nested queries
Analyzing customer behavior
A new advertising campaign is going to focus on customers who rented fewer than 5 movies. Use a correlated query to extract all customer information for the customers of interest.
- First, count number of movie rentals for customer with customer_id=45. Give the table renting the alias r.
-- Count movie rentals of customer 45
SELECT COUNT(*)
FROM renting AS r
WHERE customer_id=45;
OUTPUT
count |
5 |
- Now select all columns from the customer table where the number of movie rentals is smaller than 5.
-- Select customers with less than 5 movie rentals
SELECT *
FROM customers as c
WHERE 5 >
(SELECT count(*)
FROM renting as r
WHERE r.customer_id = c.customer_id);
Congratulations! You managed to use a correlated query to identify the customers of interest.
Customers who gave low ratings
Identify customers who were not satisfied with movies they watched on MovieNow. Report a list of customers with minimum rating smaller than 4.
- Calculate the minimum rating of customer with ID 7.
-- Calculate the minimum rating of customer with ID 7
SELECT MIN(rating)
FROM renting
WHERE customer_ID=7;
OUTPUT
min |
8 |
- Select all customers with a minimum rating smaller than 4. Use the first letter of the table as an alias.
SELECT *
FROM customers AS c
WHERE 4 > -- Select all customers with a minimum rating smaller than 4
(SELECT MIN(rating)
FROM renting AS r
WHERE r.customer_id = c.customer_id);
output
customer_id | name | country | gender | date_of_birth | date_account_start |
28 | Sidney Généreux | France | male | 1980-12-01 | 2017-02-04 |
41 | Zara Mitchell | Great Britan | female | 1994-07-08 | 2017-06-12 |
86 | Albin Jaworski | Poland | male | 1984-05-01 | 2017-12-15 |
Well done! You found out that Sidney Généreux, Zara Mitchell, Albin Jaworski, and Robin J. Himes rated a movie with less than 4.
Movies and ratings with correlated queries
Report a list of movies that received the most attention on the movie platform, (i.e. report all movies with more than 5 ratings and all movies with an average rating higher than 8).
- Select all movies with more than 5 ratings. Use the first letter of the table as an alias.
SELECT *
FROM movies AS m
WHERE 5 < -- Select all movies with more than 5 ratings
(SELECT COUNT(rating)
FROM renting AS r
WHERE m.movie_id = r.movie_id);
output
movie_id | title | genre | runtime | year_of_release | renting_price |
4 | Training Day | Drama | 122 | 2001 | 1.79 |
10 | Simone | Drama | 117 | 2002 | 2.69 |
12 | The Two Towers | Science Fiction & Fantasy | 179 | 2002 | 2.39 |
- Select all movies with an average rating higher than 8.
SELECT *
FROM movies AS m
WHERE 8 < -- Select all movies with an average rating higher than 8
(SELECT AVG(rating)
FROM renting AS r
WHERE r.movie_id = m.movie_id);
output
movie_id | title | genre | runtime | year_of_release | renting_price |
3 | What Women Want | Comedy | 127 | 2001 | 2.59 |
5 | The Fellowship of the Ring | Science Fiction & Fantasy | 178 | 2001 | 2.59 |
6 | Harry Potter and the Philosopher’s Stone | Science Fiction & Fantasy | 152 | 2001 | 2.69 |
Well done! You use correlated queries to figure out that the comedy ‘What women want’ has an average rating higher than 8. Note, that you didn’t need to use a GROUP BY clause to answer this request.
3.3. Queries with EXISTS
Customers with at least one rating
Having active customers is a key performance indicator for MovieNow. Make a list of customers who gave at least one rating.
- Select all records of movie rentals from customer with ID 115.
-- Select all records of movie rentals from customer with ID 115
SELECT *
FROM renting
WHERE customer_id = 115;
output
renting_id | customer_id | movie_id | rating | date_renting |
245 | 115 | 69 | null | 2019-04-24 |
395 | 115 | 11 | null | 2019-04-07 |
498 | 115 | 42 | null | 2019-03-16 |
- Select all records of movie rentals from the customer with ID 115 and exclude records with null ratings.
SELECT *
FROM renting
WHERE rating IS NOT NULL -- Exclude those with null ratings
AND customer_id = 115;
Select all records of movie rentals from the customer with ID 1, excluding null ratings.
SELECT *
FROM renting
WHERE rating IS NOT NULL -- Exclude null ratings
AND customer_id = 1; -- Select all ratings from customer with ID 1
OUTPUT
renting_id | customer_id | movie_id | rating | date_renting |
421 | 1 | 71 | 10 | 2019-01-21 |
520 | 1 | 39 | 6 | 2018-12-29 |
- Select all customers with at least one rating. Use the first letter of the table as an alias.
SELECT *
FROM customers AS c-- Select all customers with at least one rating
WHERE EXISTS
(SELECT *
FROM renting AS r
WHERE rating IS NOT NULL
AND r.customer_id = c.customer_id);
output
customer_id | name | country | gender | date_of_birth | date_account_start |
2 | Wolfgang Ackermann | Austria | male | 1971-11-17 | 2018-10-15 |
4 | Julia Jung | Austria | female | 1991-01-04 | 2017-11-22 |
7 | Annelous Sneep | Belgium | female | 1993-11-14 | 2018-05-12 |
Congratulations! You found out that customer 115 gave no ratings and customer 1 gave two ratings. Then you selected all customers where the table of ratings is not empty.
Actors in comedies
In order to analyze the diversity of actors in comedies, first, report a list of actors who play in comedies and then, the number of actors for each nationality playing in comedies.
- Select the records of all actors who play in a Comedy. Use the first letter of the table as an alias.
SELECT * -- Select the records of all actors who play in a Comedy
FROM actsin AS ai
LEFT JOIN movies AS m
ON ai.movie_id = m.movie_id
WHERE m.genre = 'Comedy';
Make a table of the records of actors who play in a Comedy and select only the actor with ID 1.
SELECT *
FROM actsin AS ai
LEFT JOIN movies AS m
ON m.movie_id = ai.movie_id
WHERE m.genre = 'Comedy'
AND ai.actor_id = 1; -- Select only the actor with ID 1
Create a list of all actors who play in a Comedy. Use the first letter of the table as an alias.
SELECT *
FROM actors AS a
WHERE EXISTS
(SELECT *
FROM actsin AS ai
LEFT JOIN movies AS m
ON m.movie_id = ai.movie_id
WHERE m.genre = 'Comedy'
AND ai.actor_id = a.actor_id);
output
actor_id | name | year_of_birth | nationality | gender |
2 | Adam Sandler | 1966 | USA | male |
3 | Al Pacino | 1940 | USA | male |
6 | Anjelica Huston | 1951 | USA | female |
- Report the nationality and the number of actors for each nationality.
SELECT a.nationality, COUNT(*) -- Report the nationality and the number of actors for each nationality
FROM actors AS a
WHERE EXISTS
(SELECT ai.actor_id
FROM actsin AS ai
LEFT JOIN movies AS m
ON m.movie_id = ai.movie_id
WHERE m.genre = 'Comedy'
AND ai.actor_id = a.actor_id)
GROUP BY a.nationality;
OUTPUT
nationality | count |
Northern Ireland | 1 |
USA | 22 |
South Africa | 1 |
Congratulations! You found out that there is one actor each coming from South Africa, Canada and Northen Ireland, three actors from Great Britain, and 22 from the USA who played in a Comedy.
3.4. Queries with UNION and INTERSECT

SAME COLUMNS ARE SELECTED IN BOTH TABLES

Young actors not coming from the USA
As you’ve just seen, the operators UNION and INTERSECT are powerful tools when you work with two or more tables. Identify actors who are not from the USA and actors who were born after 1990.
- Report the name, nationality and the year of birth of all actors who are not from the USA.
SELECT name, -- Report the name, nationality and the year of birth
nationality,
year_of_birth
FROM actors
WHERE nationality <> 'USA'; -- Of all actors who are not from the USA
Report the name, nationality and the year of birth of all actors who were born after 1990.
SELECT name,
nationality,
year_of_birth
FROM actors
WHERE year_of_birth > 1990; -- Born after 1990
Select all actors who are not from the USA and all actors who are born after 1990.
SELECT name,
nationality,
year_of_birth
FROM actors
WHERE nationality <> 'USA'
UNION -- Select all actors who are not from the USA and all actors who are born after 1990
SELECT name,
nationality,
year_of_birth
FROM actors
WHERE year_of_birth > 1990;
OUTPUT
name | nationality | year_of_birth |
Geoffrey Rush | Australia | 1951 |
Barry Pepper | Canada | 1970 |
Colin Farrell | Ireland | 1976 |
Select all actors who are not from the USA and who are also born after 1990.
SELECT name,
nationality,
year_of_birth
FROM actors
WHERE nationality <> 'USA'
INTERSECT -- Select all actors who are not from the USA and who are also born after 1990
SELECT name,
nationality,
year_of_birth
FROM actors
WHERE year_of_birth > 1990;
OUTPUT
name | nationality | year_of_birth |
Freddie Highmore | British | 1992 |
Congratulations! You identified Freddie Highmore as the only actor who is not from the USA and born after 1990.
Dramas with high ratings
The advertising team has a new focus. They want to draw the attention of the customers to dramas. Make a list of all movies that are in the drama genre and have an average rating higher than 9.
- Select the IDs of all dramas.
SELECT movie_id -- Select the IDs of all dramas
FROM movies
WHERE genre = 'Drama';
Select the IDs of all movies with average rating higher than 9.
SELECT movie_id -- Select the IDs of all movies with average rating higher than 9
FROM renting
GROUP BY movie_id
HAVING AVG(rating)>9;
Select the IDs of all dramas with average rating higher than 9.
SELECT movie_id
FROM movies
WHERE genre = 'Drama'
INTERSECT -- Select the IDs of all dramas with average rating higher than 9
SELECT movie_id
FROM renting
GROUP BY movie_id
HAVING AVG(rating)>9;
Select all movies of in the drama genre with an average rating higher than 9.
SELECT *
FROM movies
WHERE movie_id IN -- Select all movies of genre drama with average rating higher than 9
(SELECT movie_id
FROM movies
WHERE genre = 'Drama'
INTERSECT
SELECT movie_id
FROM renting
GROUP BY movie_id
HAVING AVG(rating)>9);
OUTPUT
movie_id | title | genre | runtime | year_of_release | renting_price |
42 | No Country for Old Men | Drama | 122 | 2007 | 1.49 |
Well done! You reported to the advertisment team that ‘No Country for Old Men’ is the only drama with average rating higher than 9.
4. Data Driven Decision Making with OLAP SQL queries
The OLAP extensions in SQL are introduced and applied to aggregated data on multiple levels. These extensions are the CUBE, ROLLUP and GROUPING SETS operators.
4.1. OLAP: CUBE operator
Groups of customers
Use the CUBE operator to extract the content of a pivot table from the database. Create a table with the total number of male and female customers from each country.
- Create a table with the total number of customers, of all female and male customers, of the number of customers for each country and the number of men and women from each country.
SELECT gender, -- Extract information of a pivot table of gender and country for the number of customers
country,
COUNT(*)
FROM customers
GROUP BY CUBE (gender, country)
ORDER BY country;
Congratulations! You found out that there are a total of 122 customers and that three out of four customers from Austria are female.
Categories of movies
Give an overview on the movies available on MovieNow. List the number of movies for different genres and release years
- List the number of movies for different genres and the year of release on all aggregation levels by using the CUBE operator.
SELECT genre,
year_of_release,
COUNT(*)
FROM movies
GROUP BY CUBE (genre, year_of_release)
ORDER BY year_of_release;
Question
Which statement is NOT correct about the result table?
Possible Answers
- From all genres (ignoring the year of release) there are most movies in the category Drama.
- In total there are 71 movies available on MovieNow.
- The year of release with most movies is 2014.
- From 2002 there are 2 dramas available on MovieNow.
Well observed! Only one movie from 2014 is available on MovieNow. The highest number of movies is from 2003 with 8 movies.
Analyzing average ratings
Prepare a table for a report about the national preferences of the customers from MovieNow comparing the average rating of movies across countries and genres.
- Augment the records of movie rentals with information about movies and customers, in this order. Use the first letter of the table names as alias.
-- Augment the records of movie rentals with information about movies and customers
SELECT *
FROM renting AS r
LEFT JOIN movies AS m
ON r.movie_id = m.movie_id
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id;
Calculate the average rating for each country.
-- Calculate the average rating for each country
SELECT
c.country,
AVG(r.rating)
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
GROUP BY country;
OUTPUT
country | avg |
null | 8.0000000000000000 |
France | 7.7714285714285714 |
Spain | 7.6415094339622642 |
- Calculate the average rating for all aggregation levels of country and genre.
SELECT
c.country,
m.genre,
AVG(r.rating) AS avg_rating -- Calculate the average rating
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
GROUP BY CUBE (c.country, m.genre); -- For all aggregation levels of country and genre
Question
- What is the average rating over all records, rounded to two digits?
Possible Answers
- 7.94
- null
- 8.80
- 7.86
That’s correct. The average over all records is 7.94.
4.2. ROLLUP






Number of customers
You have to give an overview of the number of customers for a presentation.
- Generate a table with the total number of customers, the number of customers for each country, and the number of female and male customers for each country.
- Order the result by country and gender
-- Count the total number of customers, the number of customers for each country, and the number of female and male customers for each country
SELECT country,
gender,
COUNT(*)
FROM customers
GROUP BY ROLLUP(country, gender)
ORDER BY country, gender; -- Order the result by country and gender
OUTPUT
country | gender | count |
Austria | female | 3 |
Austria | male | 1 |
Austria | null | 4 |
Good job! You aggregated the data on different levels and found out that there are 3 female customers and 1 male customer from Austria, so in total 4 customers from Austria.
Analyzing preferences of genres across countries
You are asked to study the preferences of genres across countries. Are there particular genres which are more popular in specific countries? Evaluate the preferences of customers by averaging their ratings and counting the number of movies rented from each genre.
- Augment the renting records with information about movies and customers.
-- Join the tables
SELECT *
FROM renting AS r
LEFT JOIN movies AS m
ON r.movie_id = m.movie_id
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id;
Calculate the average ratings and the number of ratings for each country and each genre. Include the columns country and genre in the SELECT clause.
SELECT
c.country, -- Select country
m.genre, -- Select genre
AVG(r.rating), -- Average ratings
COUNT(*) -- Count number of movie rentals
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
GROUP BY (c.country, m.genre) -- Aggregate for each country and each genre
ORDER BY c.country, m.genre;
output
country | genre | avg | count |
Austria | Comedy | 8.0000000000000000 | 1 |
Austria | Drama | 6.0000000000000000 | 2 |
Austria | Mystery & Suspense | null | 1 |
- Finally, calculate the average ratings and the number of ratings for each country and genre, as well as an aggregation over all genres for each country and the overall average and total number.
-- Group by each county and genre with OLAP extension
SELECT
c.country,
m.genre,
AVG(r.rating) AS avg_rating,
COUNT(*) AS num_rating
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
GROUP BY ROLLUP (c.country, m.genre)
ORDER BY c.country, m.genre;
output
country | genre | avg_rating | num_rating |
Austria | Comedy | 8.0000000000000000 | 1 |
Austria | Drama | 6.0000000000000000 | 2 |
Austria | Mystery & Suspense | null | 1 |
Very well done! You gathered all the information of interest by using a ROLLUP statement.
4.3. GROUPING SETS


Queries with GROUPING SETS
What question CANNOT be answered by the following query?
SELECT
r.customer_id,
m.genre,
AVG(r.rating),
COUNT(*)
FROM renting AS r
LEFT JOIN movies AS m
ON r.movie_id = m.movie_id
GROUP BY GROUPING SETS ((r.customer_id, m.genre), (r.customer_id), ());
Possible Answers
- How many movies were watched by each customer?
- What is the average rating for each genre?
- What is the average rating of customer 75 for movies of the Comedy genre?
- What is the overall average rating for all movies from all customers?
Congratulations! You found the question that cannot be answered by this query. GROUPING SETS does not include (genre) (i.e. does not include aggregation for each genre).
Exploring nationality and gender of actors
For each movie in the database, the three most important actors are identified and listed in the table actors. This table includes the nationality and gender of the actors. We are interested in how much diversity there is in the nationalities of the actors and how many actors and actresses are in the list.
- Count the number of actors in the table actors from each country, the number of male and female actors and the total number of actors.
SELECT
nationality, -- Select nationality of the actors
gender, -- Select gender of the actors
COUNT(*) -- Count the number of actors
FROM actors
GROUP BY GROUPING SETS ((nationality), (gender), ()); -- Use the correct GROUPING SETS operation
OUTPUT
nationality | gender | count |
null | null | 145 |
Somalia | null | 1 |
null | null | 2 |
Well done! You used GROUPING SETS to learn that 91 out of 145 actors are from the USA and that there are 90 male and 55 female actors.
Exploring rating by country and gender
Now you will investigate the average rating of customers aggregated by country and gender.
- Select the columns country, gender, and rating and use the correct join to combine the table renting with customer.
SELECT
c.country, -- Select country, gender and rating
c.gender,
r.rating
FROM renting AS r
LEFT JOIN customers AS c -- Use the correct join
ON r.customer_id = c.customer_id;
output
country | gender | rating |
Great Britan | female | null |
Belgium | male | 10 |
Spain | male | 4 |
- Use GROUP BY to calculate the average rating over country and gender. Order the table by country and gender.
SELECT
c.country,
c.gender,
AVG(r.rating) -- Calculate average rating
FROM renting AS r
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
GROUP BY c.country, c.gender -- Order and group by country and gender
ORDER BY c.country, c.gender;
output
country | gender | avg |
Austria | female | 7.0000000000000000 |
Austria | male | 6.0000000000000000 |
Belgium | female | 9.1250000000000000 |
- Now, use GROUPING SETS to get the same result, i.e. the average rating over country and gender.
SELECT
c.country,
c.gender,
AVG(r.rating)
FROM renting AS r
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
GROUP BY GROUPING SETS ((country, gender)); -- Group by country and gender with GROUPING SETS
output
country | gender | avg |
Austria | male | 6.0000000000000000 |
France | female | 8.0000000000000000 |
Hungary | female | 7.2857142857142857 |
- Report all information that is included in a pivot table for country and gender in one SQL table.
SELECT
c.country,
c.gender,
AVG(r.rating)
FROM renting AS r
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
-- Report all info from a Pivot table for country and gender
GROUP BY GROUPING SETS ((country, gender), (country),(gender),());
output
country | gender | avg |
null | null | 7.9390243902439024 |
null | null | 8.0000000000000000 |
Austria | female | 7.0000000000000000 |
Excellent! You found out that the overall average of ratings is 7.9.
4.4. Bringing it all together

Customer preference for genres
You just saw that customers have no clear preference for more recent movies over older ones. Now the management considers investing money in movies of the best rated genres.
- Augment the records of movie rentals with information about movies. Use the first letter of the table as alias.
SELECT *
FROM renting AS r
LEFT JOIN movies AS m -- Augment the table with information about movies
ON r.movie_id = m.movie_id;
Select records of movies with at least 4 ratings, starting from 2018-04-01.
SELECT *
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
WHERE r.movie_id IN ( -- Select records of movies with at least 4 ratings
SELECT movie_id
FROM renting
GROUP BY movie_id
HAVING COUNT(rating) >=4)
AND r.date_renting >= '2018-04-01'; -- Select records of movie rentals since 2018-04-01
For each genre, calculate the average rating (use the alias avg_rating), the number of ratings (use the alias n_rating), the number of movie rentals (use the alias n_rentals), and the number of distinct movies (use the alias n_movies).
SELECT m.genre, -- For each genre, calculate:
AVG(r.rating) AS avg_rating, -- The average rating and use the alias avg_rating
COUNT(r.rating) AS n_rating, -- The number of ratings and use the alias n_rating
COUNT(*) AS n_rentals, -- The number of movie rentals and use the alias n_rentals
COUNT(DISTINCT r.movie_id) AS n_movies -- The number of distinct movies and use the alias n_movies
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
WHERE r.movie_id IN (
SELECT movie_id
FROM renting
GROUP BY movie_id
HAVING COUNT(rating) >= 3)
AND r.date_renting >= '2018-01-01'
GROUP BY m.genre;
output
genre | avg_rating | n_rating | n_rentals | n_movies |
Action & Adventure | 8.7142857142857143 | 7 | 9 | 2 |
Animation | 7.8333333333333333 | 6 | 10 | 2 |
Art House & International | 8.5000000000000000 | 4 | 5 | 1 |
- Order the table by decreasing average rating.
SELECT genre,
AVG(rating) AS avg_rating,
COUNT(rating) AS n_rating,
COUNT(*) AS n_rentals,
COUNT(DISTINCT m.movie_id) AS n_movies
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
WHERE r.movie_id IN (
SELECT movie_id
FROM renting
GROUP BY movie_id
HAVING COUNT(rating) >= 3 )
AND r.date_renting >= '2018-01-01'
GROUP BY genre
ORDER BY AVG(rating) DESC; -- Order the table by decreasing average rating
OUTPUT
genre | avg_rating | n_rating | n_rentals | n_movies |
Action & Adventure | 8.7142857142857143 | 7 | 9 | 2 |
Art House & International | 8.5000000000000000 | 4 | 5 | 1 |
Other | 8.4285714285714286 | 7 | 16 | 2 |
Well done! You found out that Action & Adventure has the highest rating.
Customer preference for actors
The last aspect you have to analyze are customer preferences for certain actors.
- Join the tables.
-- Join the tables
SELECT *
FROM renting AS r
LEFT JOIN actsin AS ai
ON r.movie_id = ai.movie_id
LEFT JOIN actors AS a
ON ai.actor_id = a.actor_id;
For each combination of the actors’ nationality and gender, calculate the average rating, the number of ratings, the number of movie rentals, and the number of actors.
SELECT a.nationality,
a.gender,
AVG(r.rating) AS avg_rating, -- The average rating
COUNT(r.rating) AS n_rating, -- The number of ratings
COUNT(*) AS n_rentals, -- The number of movie rentals
COUNT(DISTINCT a.actor_id) AS n_actors -- The number of actors
FROM renting AS r
LEFT JOIN actsin AS ai
ON ai.movie_id = r.movie_id
LEFT JOIN actors AS a
ON ai.actor_id = a.actor_id
WHERE r.movie_id IN (
SELECT movie_id
FROM renting
GROUP BY movie_id
HAVING COUNT(rating) >=4 )
AND r.date_renting >= '2018-04-01'
GROUP BY GROUPING SETS ((a.nationality,a.gender)); -- Report results for each combination of the actors' nationality and gender
output
nationality | gender | avg_rating | n_rating | n_rentals | n_actors |
Argentina | male | 8.5000000000000000 | 4 | 5 | 1 |
Australia | female | 8.6666666666666667 | 3 | 5 | 1 |
Australia | male | 7.4545454545454545 | 11 | 17 | 3 |
- Provide results for all aggregation levels represented in a pivot table.
SELECT a.nationality,
a.gender,
AVG(r.rating) AS avg_rating,
COUNT(r.rating) AS n_rating,
COUNT(*) AS n_rentals,
COUNT(DISTINCT a.actor_id) AS n_actors
FROM renting AS r
LEFT JOIN actsin AS ai
ON ai.movie_id = r.movie_id
LEFT JOIN actors AS a
ON ai.actor_id = a.actor_id
WHERE r.movie_id IN (
SELECT movie_id
FROM renting
GROUP BY movie_id
HAVING COUNT(rating) >= 4)
AND r.date_renting >= '2018-04-01'
GROUP BY CUBE (a.nationality,a.gender ); -- Provide results for all aggregation levels represented in a pivot table
OUTPUT
nationality | gender | avg_rating | n_rating | n_rentals | n_actors |
Argentina | male | 8.5000000000000000 | 4 | 5 | 1 |
Argentina | null | 8.5000000000000000 | 4 | 5 | 1 |
Australia | female | 8.6666666666666667 | 3 | 5 | 1 |
Great job! You sent your report on customer preferences to management. Now they can take an informed, data-driven decision based on your detailed analysis. Congratulations!