Data-Driven Decision Making in SQL Answer Key – Datacamp 2023

Jack16306
65 Min Read

Data-Driven Decision Making in SQL Answer Key – Datacamp. Learn how to analyze a SQL table and report insights to management.

Contents
1. Introduction to business intelligence for a online movie rental database1.1. Introduction to data driven decision makingExploring the databaseExploring the table renting1.2. Filtering and orderingWorking with datesSelecting moviesSelect from renting1.3. Aggregations – summarizing dataSummarizing customer informationRatings of movie 25Examining annual rentals2. Decision Making with simple SQL queries2.1. Grouping moviesFirst account for each country.Average movie ratingsAverage rating per customer2.2. Joining movie ratings with customer dataJoin renting and customersAggregating revenue, rentals and active customersMovies and actors2.3. Money spent per customer with sub-queriesIncome from moviesAge of actors from the USA2.4. Identify favorite actors of customer groupsIdentify favorite movies for a group of customersIdentify favorite actors for SpainKPIs per country3. Data Driven Decision Making with advanced SQL queries3.1. Nested queryOften rented moviesFrequent customersMovies with rating above average3.2. Correlated nested queriesAnalyzing customer behaviorCustomers who gave low ratingsMovies and ratings with correlated queries3.3. Queries with EXISTSCustomers with at least one ratingActors in comedies3.4. Queries with UNION and INTERSECTYoung actors not coming from the USADramas with high ratings4. Data Driven Decision Making with OLAP SQL queries4.1. OLAP: CUBE operatorGroups of customersCategories of moviesAnalyzing average ratings4.2. ROLLUPNumber of customersAnalyzing preferences of genres across countries4.3. GROUPING SETSQueries with GROUPING SETSExploring nationality and gender of actorsExploring rating by country and gender4.4. Bringing it all togetherCustomer preference for genresCustomer preference for actors

In this course, you will learn how to use SQL to support decision making. It is based on a case study about an online movie rental company with a database about customer information, movie ratings, background information on actors and more. You will learn to apply SQL queries to study for example customer preferences, customer engagement, and sales development. This course also covers SQL extensions for online analytical processing (OLAP), which makes it easier to obtain key insights from multidimensional aggregated data.

Course Url: https://www.datacamp.com/courses/data-driven-decision-making-in-sql

1. Introduction to business intelligence for a online movie rental database

The first chapter is an introduction to the use case of an online movie rental company, called MovieNow and focuses on using simple SQL queries to extract and aggregated data from its database.

1.1. Introduction to data driven decision making

Exploring the database

Explore the tables and its columns. Which of the following quantities can’t be computed?

Possible Answers

  • The number of customers from each country.
  • The number of movies with an international award.
  • The average rating of a movie.
  • The number of movies with the actor Daniel Radcliffe.

Correct answer! There is no information about movie awards in the data.

Exploring the table renting

The table renting includes all records of movie rentals. Each record has a unique ID renting_id. It also contains information about customers (customer_id) and which movies they watched (movie_id). Furthermore, customers can give a rating after watching the movie, and the day the movie was rented is recorded.

  • Select all columns from renting.
SELECT *  -- Select all
From renting;        -- From table renting

Now select only those columns from renting which are needed to calculate the average rating per movie.

SELECT movie_id,  -- Select all columns needed to compute the average rating per movie
       rating
FROM renting;

output

movie_id rating
8 null
29 10
45 4

Question

In SQL missing values are coded with null. In which column of renting did you notice null values?

  • renting_id
  • customer_id
  • movie_id
  • rating

Well observed! The column rating in the renting table has a lot of null values since not every customer gives a rating after watching a movie.

1.2. Filtering and ordering

Working with dates

For the analysis of monthly or annual changes, it is important to select data from specific time periods. You will select records from the table renting of movie rentals. The format of dates is ‘YYYY-MM-DD’.

  • Select all movies rented on October 9th, 2018.
SELECT *
FROM renting
WHERE date_renting ='2018-10-09'; -- Movies rented on October 9th, 2018

OUTPUT

renting_id customer_id movie_id rating date_renting
1 41 8 null 2018-10-09
6 50 71 7 2018-10-09
  • Select all records of movie rentals between beginning of April 2018 till end of August 2018.
SELECT *
FROM renting
WHERE date_renting BETWEEN '2018-04-01' AND '2018-08-31'; -- from beginning April 2018 to end August 2018

OUTPUT

renting_id customer_id movie_id rating date_renting
3 108 45 4 2018-06-08
8 73 65 10 2018-06-05
12 52 65 10 2018-06-29
  • Put the most recent records of movie rentals on top of the resulting table and order them in decreasing order.
SELECT *
FROM renting
WHERE date_renting BETWEEN '2018-04-01' AND '2018-08-31'
ORDER BY rating DESC; -- Order by recency in decreasing order

OUTPUT

renting_id customer_id movie_id rating date_renting
3 108 45 4 2018-06-08
8 73 65 10 2018-06-05
12 52 65 10 2018-06-29

Well done! You know now how to select data from specific time periods and how to order them by date.

Selecting movies

The table movies contains all movies available on the online platform.

  • Select all movies which are not dramas.
SELECT *
FROM movies
WHERE genre <> 'Drama'; -- All genres except drama

output

movie_id title genre runtime year_of_release renting_price
1 One Night at McCool’s Comedy 93 2001 2.09
3 What Women Want Comedy 127 2001 2.59
5 The Fellowship of the Ring Science Fiction & Fantasy 178 2001 2.59
  • Select the movies ‘Showtime’, ‘Love Actually’ and ‘The Fighter’.
SELECT *
FROM movies
WHERE title IN ('Showtime', 'Love Actually', 'The Fighter'); -- Select all movies with the given titles

OUTPUT

movie_id title genre runtime year_of_release renting_price
11 Showtime Comedy 95 2002 1.79
20 Love Actually Comedy 135 2003 2.29
53 The Fighter Other 116 2010 2.49
  • Order the movies by increasing renting price.
SELECT *
FROM movies
ORDER BY renting_price DESC ; -- Order the movies by increasing renting price
movie_id title genre runtime year_of_release renting_price
51 Morning Glory Comedy 107 2010 2.99
9 11’09”01 September 11 Drama 134 2002 2.99
64 Cloud Atlas Drama 172 2012 2.89
52 Fair Game Drama 108 2010 2.89

Congratulations! You selected subsets from the table movies and ordered its records.

Select from renting

Only some users give a rating after watching a movie. Sometimes it is interesting to explore only those movie rentals where a rating was provided.

  • Select from table renting all movie rentals from 2018.
  • Filter only those records which have a movie rating.
SELECT *
FROM renting
WHERE date_renting BETWEEN '2018-01-01' AND '2018-12-31' -- Renting in 2018
AND rating IS NOT NULL; -- Rating exists

OUTPUT

renting_id customer_id movie_id rating date_renting
3 108 45 4 2018-06-08
4 39 66 8 2018-10-22
6 50 71 7 2018-10-09

Well done! You selected all records from renting which are not null for 2018. Now you are ready for the next step!

1.3. Aggregations – summarizing data

Summarizing customer information

In most business decisions customers are analyzed in groups, such as customers per country or customers per age group

  • Count the number of customers born in the 80s.
SELECT COUNT(customer_id) -- Count the total number of customers
FROM customers
WHERE date_of_birth BETWEEN '1980-01-01' AND '1989-12-31'; -- Select customers born between 1980-01-01 and 1989-12-31

OUTPUT

count
33
  • Count the number of customers from Germany.
SELECT COUNT(customer_id)   -- Count the total number of customers
FROM customers
WHERE country = 'Germany'; -- Select all customers from Germany

output

count
0
  • Count the number of countries where MovieNow has customers.
SELECT COUNT(distinct country)   -- Count the number of countries
FROM customers;

OUTPUT

count
11

Well done! You aggregated data from the customers table and figured out that MovieNow has customers in 11 countries and that there are no customers from Germany.

Ratings of movie 25

The movie ratings give us insight into the preferences of our customers. Report summary statistics, such as the minimum, maximum, average, and count, of ratings for the movie with ID 25.

  • Select all movie rentals of the movie with movie_id 25 from the table renting.
  • For those records, calculate the minimum, maximum and average rating and count the number of ratings for this movie.
SELECT MIN(rating) AS min_rating, -- Calculate the minimum rating and use alias min_rating
       MAX(rating) AS max_rating, -- Calculate the maximum rating and use alias max_rating
       AVG(rating) AS avg_rating, -- Calculate the average rating and use alias avg_rating
       COUNT(rating) AS number_ratings -- Count the number of ratings and use alias number_ratings
FROM renting
WHERE movie_id = 25; -- Select all records of the movie with ID 25

OUTPUT

min_rating max_rating avg_rating number_ratings
5 10 7.5000000000000000 8

Congratulations! You summarized the ratings for the movie 25 and used meaningful column names. This movie has 8 ratings with minimum 5, maximum 10 and average 7.5 rating.

Examining annual rentals

You are asked to provide a report about the development of the company. Specifically, your manager is interested in the total number of movie rentals, the total number of ratings and the average rating of all movies since the beginning of 2019.

  • First, select all records of movie rentals since January 1st 2019.
SELECT * -- Select all records of movie rentals since January 1st 2019
FROM renting
WHERE date_renting >= '2019-01-01'; 

OUTPUT

renting_id customer_id movie_id rating date_renting
5 104 15 7 2019-03-18
17 22 46 10 2019-02-16
18 36 39 10 2019-03-20
  • Now, count the number of movie rentals and calculate the average rating since the beginning of 2019.
SELECT 
    Count(renting_id), -- Count the total number of rented movies
    AVG(rating) -- Add the average rating
FROM renting
WHERE date_renting >= '2019-01-01';

OUTPUT

count avg
159 7.9462365591397849
  • Use as alias column names number_renting and average_rating respectively.
SELECT 
    COUNT(*) AS number_renting, -- Give it the column name number_renting
    AVG(rating) AS average_rating  -- Give it the column name average_rating
FROM renting
WHERE date_renting >= '2019-01-01';

OUTPUT

number_renting average_rating
159 7.9462365591397849
  • Finally, count how many ratings exist since 2019-01-01.
SELECT 
    COUNT(*) AS number_renting,
    AVG(rating) AS average_rating, 
    COUNT(rating) AS number_ratings -- Add the total number of ratings here.
FROM renting
WHERE date_renting >= '2019-01-01';

output

number_renting average_rating number_ratings
159 7.9462365591397849 93

Congratulations! You managed to extract all KPIs from the renting records by working with aggregations, NULL values and dates.

2. Decision Making with simple SQL queries

More complex queries with GROUP BY, LEFT JOIN and sub-queries are used to gain insight into customer preferences.

2.1. Grouping movies

First account for each country.

Conduct an analysis to see when the first customer accounts were created for each country.

  • Create a table with a row for each country and columns for the country name and the date when the first customer account was created.
  • Use the alias first_account for the column with the dates.
  • Order by date in ascending order.
SELECT country, -- For each country report the earliest date when an account was created
    MIN(date_account_start) AS first_account
FROM customers
GROUP BY country
ORDER BY first_account;

OUTPUT

country first_account
France 2017-01-13
Hungary 2017-01-18
Belgium 2017-01-28

Well done! You learned that the first customer account was created in France.

Average movie ratings

For each movie the average rating, the number of ratings and the number of views has to be reported. Generate a table with meaningful column names.

  • Group the data in the table renting by movie_id and report the ID and the average rating.
SELECT movie_id, 
       AVG(rating)    -- Calculate average rating per movie
FROM renting
GROUP BY movie_id;

output

See also  Intermediate SQL Answer Key – Datacamp
movie_id avg
54 8.1666666666666667
29 8.0000000000000000
71 8.0000000000000000
  • Add two columns for the number of ratings and the number of movie rentals to the results table.
  • Use alias names avg_rating, number_rating and number_renting for the corresponding columns.
SELECT movie_id, 
       AVG(rating) AS avg_rating, -- Use as alias avg_rating
       COUNT(rating) AS number_rating,                -- Add column for number of ratings with alias number_rating
       COUNT(renting_id) AS number_renting            -- Add column for number of movie rentals with alias number_renting
FROM renting
GROUP BY movie_id;

output

movie_id avg_rating number_rating number_renting
54 8.1666666666666667 6 9
29 8.0000000000000000 7 11
71 8.0000000000000000 4 6
  • Order the rows of the table by the average rating such that it is in decreasing order.
  • Observe what happens to NULL values.
SELECT movie_id, 
       AVG(rating) AS avg_rating,
       COUNT(rating) AS number_ratings,
       COUNT(*) AS number_renting
FROM renting
GROUP BY movie_id
ORDER BY avg_rating DESC; -- Order by average rating in decreasing order

OUTPUT

movie_id avg_rating number_ratings number_renting
11 null 0 6
48 10.0000000000000000 2 5
5 9.7500000000000000 4 8

Question

Which statement is true for the movie with average rating null?

Possible Answers

  • The number of ratings is 6.
  • The number of movie rentals is zero.
  • The average is null because one of the ratings of the movie is null.
  • The average is null because all of the ratings of the movie are null.

Correct answer! The average is null only if all values are null.

Average rating per customer

Similar to what you just did, you will now look at the average movie ratings, this time for customers. So you will obtain a table with the average rating given by each customer. Further, you will include the number of ratings and the number of movie rentals per customer. You will report these summary statistics only for customers with more than 7 movie rentals and order them in ascending order by the average rating.

  • Group the data in the table renting by customer_id and report the customer_id, the average rating, the number of ratings and the number of movie rentals.
  • Select only customers with more than 7 movie rentals.
  • Order the resulting table by the average rating in ascending order.
SELECT customer_id, -- Report the customer_id
      AVG(rating),  -- Report the average rating per customer
      COUNT(rating),  -- Report the number of ratings per customer
      COUNT(*)  -- Report the number of movie rentals per customer
FROM renting
GROUP BY customer_id
HAVING COUNT(*) > 7 -- Select only customers with more than 7 movie rentals
ORDER BY AVG(rating); -- Order by the average rating in ascending order

OUTPUT

customer_id avg count count
104 6.2500000000000000 4 8
28 6.7142857142857143 7 11
111 7.0000000000000000 3 10

Congratulations! You found out that customer number 104 gave the lowest average ratings for 4 movies.

2.2. Joining movie ratings with customer data

Join renting and customers

For many analyses it is necessary to add customer information to the data in the table renting.

  • Augment the table renting with all columns from the table customers with a LEFT JOIN.
  • Use as alias’ for the tables r and c respectively.
SELECT * -- Join renting with customers
FROM renting AS r
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id ;

Select only records from customers coming from Belgium.

SELECT *
FROM renting AS r
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
WHERE c.country = 'Belgium'; -- Select only records from customers coming from Belgium

Average ratings of customers from Belgium.

SELECT AVG(r.rating) -- Average ratings of customers from Belgium
FROM renting AS r
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
WHERE c.country='Belgium';

output

avg
8.9000000000000000

Congratulations! You calculated the average rating for customers from Belgium, which is 8.9.

Aggregating revenue, rentals and active customers

The management of MovieNow wants to report key performance indicators (KPIs) for the performance of the company in 2018. They are interested in measuring the financial successes as well as user engagement. Important KPIs are, therefore, the profit coming from movie rentals, the number of movie rentals and the number of active customers.

  • First, you need to join movies on renting to include the renting_price from the movies table for each renting record.
  • Use as alias’ for the tables m and r respectively.
SELECT *
FROM renting AS r
LEFT JOIN movies AS m -- Choose the correct join statment
ON r.movie_id = m.movie_id;

Calculate the revenue coming from movie rentals, the number of movie rentals and the number of customers who rented a movie.

SELECT 
    SUM(m.renting_price), -- Get the revenue from movie rentals
    COUNT(*), -- Count the number of rentals
    COUNT(DISTINCT r.customer_id) -- Count the number of customers
FROM renting AS r
LEFT JOIN movies AS m
ON r.movie_id = m.movie_id;

Now, you can report these values for the year 2018. Calculate the revenue in 2018, the number of movie rentals and the number of active customers in 2018. An active customer is a customer who rented at least one movie in 2018.

SELECT 
    SUM(m.renting_price), 
    COUNT(*), 
    COUNT(DISTINCT r.customer_id)
FROM renting AS r
LEFT JOIN movies AS m
ON r.movie_id = m.movie_id
-- Only look at movie rentals in 2018
WHERE date_renting BETWEEN '2018-01-01' AND '2018-12-31' ;

OUTPUT

sum count count
658.02 298 93

Congratulations! You’ve calculated a turnover of 658.02 and found the number of rentals to be 298 and the number of active users to be 93 in 2018.

Movies and actors

You are asked to give an overview of which actors play in which movie

  • Create a list of actor names and movie titles in which they act. Make sure that each combination of actor and movie appears only once.
  • Use as an alias for the table actsin the two letters ai.
SELECT m.title, -- Create a list of movie titles and actor names
       a.name
FROM actsin AS ai
LEFT JOIN movies AS m
ON m.movie_id = ai.movie_id
LEFT JOIN actors AS a
ON a.actor_id = ai.actor_id;

output

title name
Candy Abbie Cornish
Jack and Jill Adam Sandler
Simone Al Pacino

Congratulations! You joined three tables and created a list of movie titles and actors in this movies.

2.3. Money spent per customer with sub-queries

Income from movies

How much income did each movie generate? To answer this question subsequent SELECT statements can be used.

SELECT m.title, -- Use a join to get the movie title and price for each movie rental
       m.renting_price
FROM renting AS r
LEFT JOIN movies AS m
ON r.movie_id = m.movie_id;

output

title renting_price
Waking Up in Reno 2.59
Two for the Money 2.79
Burn After Reading 2.39
  • Report the total income for each movie.
  • Order the result by decreasing income.
SELECT rm.title, -- Report the income from movie rentals for each movie 
       SUM(rm.renting_price) AS income_movie
FROM
       (SELECT m.title, 
               m.renting_price
       FROM renting AS r
       LEFT JOIN movies AS m
       ON r.movie_id=m.movie_id) AS rm
GROUP BY rm.title
ORDER BY income_movie DESC; -- Order the result by decreasing income

output

title income_movie
Bridget Jones – The Edge of Reason 37.57
Fair Game 34.68
The Kingdom 31.35

Question

Which statement about the movie ‘Django Unchained’ is NOT correct?

Possible Answers

  • The income from this movie is 29.59.
  • The income from ‘Django Unchained’ is lower than from ‘The Kingdom’.
  • The income from ‘Django Unchained’ is higher than from ‘Simone’.
  • ‘Django Unchained’ has the 5th highest income.

You chose the right anwer! This statement is NOT correct. ‘Django Unchained’ and ‘Simone’ generated the same income.

Age of actors from the USA

Now you will explore the age of American actors and actresses. Report the date of birth of the oldest and youngest US actor and actress.

  • Create a subsequent SELECT statements in the FROM clause to get all information about actors from the USA.
  • Give the subsequent SELECT statement the alias a.
  • Report for actors from the USA the year of birth of the oldest and the year of birth of the youngest actor and actress.
SELECT a.gender, -- Report for male and female actors from the USA 
       Min(a.year_of_birth), -- The year of birth of the oldest actor
       Max(a.year_of_birth) -- The year of birth of the youngest actor
FROM
   (SELECT *
   FROM actors
   WHERE nationality = 'USA') AS a
GROUP BY a.gender;

output

gender min max
female 1945 1993
male 1930 1992

Congratulations! You learned that in the MovieNow actors’ record the oldest actor was born in 1930 and the oldest actress in 1945.

2.4. Identify favorite actors of customer groups

Identify favorite movies for a group of customers

Which is the favorite movie on MovieNow? Answer this question for a specific group of customers: for all customers born in the 70s.

  • Augment the table renting with customer information and information about the movies.
  • For each join use the first letter of the table name as alias.
SELECT *
FROM renting AS r
LEFT JOIN customers AS c  -- Add customer information
ON r.customer_id = c.customer_id
LEFT JOIN movies AS m   -- Add movie information
ON r.movie_id = m.movie_id;

Select only those records of customers born in the 70s.

SELECT *
FROM renting AS r
LEFT JOIN customers AS c
ON c.customer_id = r.customer_id
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
WHERE c.date_of_birth BETWEEN '1970-01-01' AND '1979-12-31'; -- Select customers born in the 70s

For each movie, report the number of times it was rented, as well as the average rating. Limit your results to customers born in the 1970s.

SELECT m.title, 
COUNT(r.customer_id), -- Report number of views per movie
AVG(r.rating) -- Report the average rating per movie
FROM renting AS r
LEFT JOIN customers AS c
ON c.customer_id = r.customer_id
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
WHERE c.date_of_birth BETWEEN '1970-01-01' AND '1979-12-31'
GROUP BY m.title;
  • Remove those movies from the table with only one rental.
  • Order the result table such that movies with highest rating come first.
SELECT m.title, 
COUNT(*),
AVG(r.rating)
FROM renting AS r
LEFT JOIN customers AS c
ON c.customer_id = r.customer_id
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
WHERE c.date_of_birth BETWEEN '1970-01-01' AND '1979-12-31'
GROUP BY m.title
HAVING COUNT(*) > 1 -- Remove movies with only one rental
ORDER BY AVG(r.rating); -- Order with highest rating first

OUTPUT

title count avg
The Recruit 2 5.0000000000000000
Something’s Gotta Give 3 6.0000000000000000
Then She Found Me 4 6.0000000000000000

Congratulations! You just wrote a query with LEFT JOIN, WHERE, GROUP BY, HAVING and ORDER BY. And you found out that ‘The Fellowship of the Ring’ is a movie with one of the best ratings among people born in the 70s.

Identify favorite actors for Spain

You’re now going to explore actor popularity in Spain. Use as alias the first letter of the table, except for the table actsin use ai instead.

Augment the table renting with information about customers and actors.

SELECT *
FROM renting as r 
LEFT JOIN customers as c  -- Augment table renting with information about customers 
ON r.customer_id = c.customer_id
LEFT JOIN actsin as ai  -- Augment the table renting with the table actsin
ON r.movie_id = ai.movie_id
LEFT JOIN actors as a  -- Augment table renting with information about actors
ON ai.actor_id = a.actor_id;
  • Report the number of movie rentals and the average rating for each actor, separately for male and female customers.
  • Report only actors with more than 5 movie rentals.
SELECT a.name,  c.gender,
       COUNT(*) AS number_views, 
       AVG(r.rating) AS avg_rating
FROM renting as r
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
LEFT JOIN actsin as ai
ON r.movie_id = ai.movie_id
LEFT JOIN actors as a
ON ai.actor_id = a.actor_id


GROUP BY a.name, c.gender -- For each actor, separately for male and female customers
HAVING AVG(r.rating) IS NOT NULL 
  AND COUNT(*) > 5 -- Report only actors with more than 5 movie rentals
ORDER BY avg_rating DESC, number_views DESC;

Now, report the favorite actors only for customers from Spain.

SELECT a.name,  c.gender,
       COUNT(*) AS number_views, 
       AVG(r.rating) AS avg_rating
FROM renting as r
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
LEFT JOIN actsin as ai
ON r.movie_id = ai.movie_id
LEFT JOIN actors as a
ON ai.actor_id = a.actor_id
WHERE c.country = 'Spain' -- Select only customers from Spain
GROUP BY a.name, c.gender
HAVING AVG(r.rating) IS NOT NULL 
  AND COUNT(*) > 5 
ORDER BY avg_rating DESC, number_views DESC;

output

See also  Exploratory Data Analysis in SQL Answer Key – Datacamp 2023
name gender number_views avg_rating
Catherine Keener female 6 8.0000000000000000
Emma Watson male 7 7.6000000000000000
Daniel Radcliffe male 7 7.6000000000000000

Well done! You found out that in this table Catherine Keener is the favorite actress among female Spain customers and that male customers from Spain like the actors from Harry Potter best: Emma Watson, Daniel Radcliffe and Rupert Grint.

KPIs per country

In chapter 1 you were asked to provide a report about the development of the company. This time you have to prepare a similar report with KPIs for each country separately. Your manager is interested in the total number of movie rentals, the average rating of all movies and the total revenue for each country since the beginning of 2019.

  • Augment the table renting with information about customers and movies.
  • Use as alias the first latter of the table name.
  • Select only records about rentals since beginning of 2019.
SELECT *
FROM renting AS r -- Augment the table renting with information about customers
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
LEFT JOIN movies AS m-- Augment the table renting with information about movies
ON r.movie_id = m.movie_id
WHERE r.date_renting >= '2019-01-01'; -- Select only records about rentals since the beginning of 2019
  • Calculate the number of movie rentals.
  • Calculate the average rating.
  • Calculate the revenue from movie rentals.
  • Report these KPIs for each country.
SELECT 
    c.country,                      -- For each country report
    COUNT(*) AS number_renting,   -- The number of movie rentals
    AVG(r.rating) AS average_rating,  -- The average rating
    SUM(m.renting_price) AS revenue  -- The revenue from movie rentals
FROM renting AS r
LEFT JOIN customers AS c
ON c.customer_id = r.customer_id
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
WHERE date_renting >= '2019-01-01'
GROUP BY c.country;

output

country number_renting average_rating revenue
null 1 10.0000000000000000 1.79
Spain 26 8.0769230769230769 57.94
Great Britan 9 7.2000000000000000 17.91

Well done! You calculated a total revenue of 57.94 for Spain, with 26 movie rentals and an average rating of 8.1.

3. Data Driven Decision Making with advanced SQL queries

The concept of nested queries and correlated nested queries is introduced and the functions EXISTS and UNION are used to categorize customers, movies, actors, and more.

3.1. Nested query

Often rented movies

Your manager wants you to make a list of movies excluding those which are hardly ever watched. This list of movies will be used for advertising. List all movies with more than 5 views using a nested query which is a powerful tool to implement selection conditions.

  • Select all movie IDs which have more than 5 views.
SELECT movie_id -- Select movie IDs with more than 5 views
FROM renting
GROUP BY movie_id
HAVING COUNT(movie_id) > 5

output

movie_id
54
29
71
  • Select all information about movies with more than 5 views.
SELECT *
FROM movies
GROUP BY movie_id
HAVING movie_id IN  -- Select movie IDs from the inner query
    (SELECT movie_id
    FROM renting
    GROUP BY movie_id
    HAVING COUNT(*) > 5)

Congratulations! You used a nested query to select all movies with more than 5 views and found out that ‘One Night at McCool’s’ is one of them.

Frequent customers

Report a list of customers who frequently rent movies on MovieNow.

  • List all customer information for customers who rented more than 10 movies.
SELECT *
FROM customers
GROUP BY customer_id
HAVING customer_id IN          -- Select all customers with more than 10 movie rentals
    (SELECT customer_id
    FROM renting
    GROUP BY customer_id
    HAVING COUNT(*) > 10);

Well done! You identified Avelaine Corbeil from France as one of the customers who rented more than 10 movies.

Movies with rating above average

For the advertising campaign your manager also needs a list of popular movies with high ratings. Report a list of movies with rating above average.

  • Calculate the average over all ratings.
SELECT AVG( rating) -- Calculate the total average rating
FROM renting

output

avg
7.9390243902439024
  • Select movie IDs and calculate the average rating of movies with rating above average.
SELECT movie_id, -- Select movie IDs and calculate the average rating 
       AVG(rating)
FROM renting
GROUP BY movie_id
HAVING AVG(rating) >           -- Of movies with rating above average
    (SELECT AVG(rating)
    FROM renting);

output

movie_id avg
54 8.1666666666666667
29 8.0000000000000000
71 8.0000000000000000
  • The advertising team only wants a list of movie titles. Report the movie titles of all movies with average rating higher than the total average
SELECT title -- Report the movie titles of all movies with average rating higher than the total average
FROM movies
WHERE movie_id IN
    (SELECT movie_id
     FROM renting
     GROUP BY movie_id
     HAVING AVG(rating) > 
        (SELECT AVG(rating)
         FROM renting));

OUTPUT

title
What Women Want
The Fellowship of the Ring
Harry Potter and the Philosopher’s Stone

Congratulations! You selected all movies with an average rating higher than the total average and found out that the movie ‘What Women Want’ is one of them.

3.2. Correlated nested queries

Analyzing customer behavior

A new advertising campaign is going to focus on customers who rented fewer than 5 movies. Use a correlated query to extract all customer information for the customers of interest.

  • First, count number of movie rentals for customer with customer_id=45. Give the table renting the alias r.
-- Count movie rentals of customer 45
SELECT COUNT(*)
FROM renting AS r
WHERE customer_id=45;

OUTPUT

count
5
  • Now select all columns from the customer table where the number of movie rentals is smaller than 5.
-- Select customers with less than 5 movie rentals
SELECT *
FROM customers as c
WHERE 5 > 
    (SELECT count(*)
    FROM renting as r
    WHERE r.customer_id = c.customer_id);

Congratulations! You managed to use a correlated query to identify the customers of interest.

Customers who gave low ratings

Identify customers who were not satisfied with movies they watched on MovieNow. Report a list of customers with minimum rating smaller than 4.

  • Calculate the minimum rating of customer with ID 7.
-- Calculate the minimum rating of customer with ID 7
SELECT MIN(rating)
FROM renting
WHERE customer_ID=7;

OUTPUT

min
8
  • Select all customers with a minimum rating smaller than 4. Use the first letter of the table as an alias.
SELECT *
FROM customers AS c
WHERE 4 > -- Select all customers with a minimum rating smaller than 4 
    (SELECT MIN(rating)
    FROM renting AS r
    WHERE r.customer_id = c.customer_id);

output

customer_id name country gender date_of_birth date_account_start
28 Sidney Généreux France male 1980-12-01 2017-02-04
41 Zara Mitchell Great Britan female 1994-07-08 2017-06-12
86 Albin Jaworski Poland male 1984-05-01 2017-12-15

Well done! You found out that Sidney Généreux, Zara Mitchell, Albin Jaworski, and Robin J. Himes rated a movie with less than 4.

Movies and ratings with correlated queries

Report a list of movies that received the most attention on the movie platform, (i.e. report all movies with more than 5 ratings and all movies with an average rating higher than 8).

  • Select all movies with more than 5 ratings. Use the first letter of the table as an alias.
SELECT *
FROM movies AS m
WHERE 5 < -- Select all movies with more than 5 ratings
    (SELECT COUNT(rating)
    FROM renting AS r
    WHERE m.movie_id = r.movie_id);

output

movie_id title genre runtime year_of_release renting_price
4 Training Day Drama 122 2001 1.79
10 Simone Drama 117 2002 2.69
12 The Two Towers Science Fiction & Fantasy 179 2002 2.39
  • Select all movies with an average rating higher than 8.
SELECT *
FROM movies AS m
WHERE 8 < -- Select all movies with an average rating higher than 8
    (SELECT AVG(rating)
    FROM renting AS r
    WHERE r.movie_id = m.movie_id);

output

movie_id title genre runtime year_of_release renting_price
3 What Women Want Comedy 127 2001 2.59
5 The Fellowship of the Ring Science Fiction & Fantasy 178 2001 2.59
6 Harry Potter and the Philosopher’s Stone Science Fiction & Fantasy 152 2001 2.69

Well done! You use correlated queries to figure out that the comedy ‘What women want’ has an average rating higher than 8. Note, that you didn’t need to use a GROUP BY clause to answer this request.

3.3. Queries with EXISTS

Customers with at least one rating

Having active customers is a key performance indicator for MovieNow. Make a list of customers who gave at least one rating.

  • Select all records of movie rentals from customer with ID 115.
-- Select all records of movie rentals from customer with ID 115
SELECT *
FROM renting
WHERE customer_id = 115;

output

renting_id customer_id movie_id rating date_renting
245 115 69 null 2019-04-24
395 115 11 null 2019-04-07
498 115 42 null 2019-03-16
  • Select all records of movie rentals from the customer with ID 115 and exclude records with null ratings.
SELECT *
FROM renting
WHERE rating IS NOT NULL -- Exclude those with null ratings
AND customer_id = 115;

Select all records of movie rentals from the customer with ID 1, excluding null ratings.

SELECT *
FROM renting
WHERE rating IS NOT NULL -- Exclude null ratings
AND customer_id = 1; -- Select all ratings from customer with ID 1

OUTPUT

renting_id customer_id movie_id rating date_renting
421 1 71 10 2019-01-21
520 1 39 6 2018-12-29
  • Select all customers with at least one rating. Use the first letter of the table as an alias.
SELECT *
FROM customers AS c-- Select all customers with at least one rating
WHERE EXISTS
    (SELECT *
    FROM renting AS r
    WHERE rating IS NOT NULL 
    AND r.customer_id = c.customer_id);

output

customer_id name country gender date_of_birth date_account_start
2 Wolfgang Ackermann Austria male 1971-11-17 2018-10-15
4 Julia Jung Austria female 1991-01-04 2017-11-22
7 Annelous Sneep Belgium female 1993-11-14 2018-05-12

Congratulations! You found out that customer 115 gave no ratings and customer 1 gave two ratings. Then you selected all customers where the table of ratings is not empty.

Actors in comedies

In order to analyze the diversity of actors in comedies, first, report a list of actors who play in comedies and then, the number of actors for each nationality playing in comedies.

  • Select the records of all actors who play in a Comedy. Use the first letter of the table as an alias.
SELECT *  -- Select the records of all actors who play in a Comedy
FROM actsin AS ai
LEFT JOIN movies AS m
ON ai.movie_id = m.movie_id
WHERE m.genre = 'Comedy';

Make a table of the records of actors who play in a Comedy and select only the actor with ID 1.

SELECT *
FROM actsin AS ai
LEFT JOIN movies AS m
ON m.movie_id = ai.movie_id
WHERE m.genre = 'Comedy'
AND ai.actor_id = 1; -- Select only the actor with ID 1

Create a list of all actors who play in a Comedy. Use the first letter of the table as an alias.

SELECT *
FROM actors AS a
WHERE EXISTS
    (SELECT *
     FROM actsin AS ai
     LEFT JOIN movies AS m
     ON m.movie_id = ai.movie_id
     WHERE m.genre = 'Comedy'
     AND ai.actor_id = a.actor_id);

output

actor_id name year_of_birth nationality gender
2 Adam Sandler 1966 USA male
3 Al Pacino 1940 USA male
6 Anjelica Huston 1951 USA female
  • Report the nationality and the number of actors for each nationality.
SELECT a.nationality, COUNT(*) -- Report the nationality and the number of actors for each nationality
FROM actors AS a
WHERE EXISTS
    (SELECT ai.actor_id
     FROM actsin AS ai
     LEFT JOIN movies AS m
     ON m.movie_id = ai.movie_id
     WHERE m.genre = 'Comedy'
     AND ai.actor_id = a.actor_id)
GROUP BY a.nationality;

OUTPUT

nationality count
Northern Ireland 1
USA 22
South Africa 1

Congratulations! You found out that there is one actor each coming from South Africa, Canada and Northen Ireland, three actors from Great Britain, and 22 from the USA who played in a Comedy.

See also  Introduction to SQL Answer Key – Datacamp

3.4. Queries with UNION and INTERSECT

SAME COLUMNS ARE SELECTED IN BOTH TABLES

Young actors not coming from the USA

As you’ve just seen, the operators UNION and INTERSECT are powerful tools when you work with two or more tables. Identify actors who are not from the USA and actors who were born after 1990.

  • Report the name, nationality and the year of birth of all actors who are not from the USA.
SELECT name,  -- Report the name, nationality and the year of birth
       nationality, 
       year_of_birth
FROM actors
WHERE nationality <> 'USA'; -- Of all actors who are not from the USA

Report the name, nationality and the year of birth of all actors who were born after 1990.

SELECT name, 
       nationality, 
       year_of_birth
FROM actors
WHERE year_of_birth > 1990; -- Born after 1990

Select all actors who are not from the USA and all actors who are born after 1990.

SELECT name, 
       nationality, 
       year_of_birth
FROM actors
WHERE nationality <> 'USA'
UNION -- Select all actors who are not from the USA and all actors who are born after 1990
SELECT name, 
       nationality, 
       year_of_birth
FROM actors
WHERE year_of_birth > 1990;

OUTPUT

name nationality year_of_birth
Geoffrey Rush Australia 1951
Barry Pepper Canada 1970
Colin Farrell Ireland 1976

Select all actors who are not from the USA and who are also born after 1990.

SELECT name, 
       nationality, 
       year_of_birth
FROM actors
WHERE nationality <> 'USA'
INTERSECT -- Select all actors who are not from the USA and who are also born after 1990
SELECT name, 
       nationality, 
       year_of_birth
FROM actors
WHERE year_of_birth > 1990;

OUTPUT

name nationality year_of_birth
Freddie Highmore British 1992

Congratulations! You identified Freddie Highmore as the only actor who is not from the USA and born after 1990.

Dramas with high ratings

The advertising team has a new focus. They want to draw the attention of the customers to dramas. Make a list of all movies that are in the drama genre and have an average rating higher than 9.

  • Select the IDs of all dramas.
SELECT movie_id  -- Select the IDs of all dramas
FROM movies
WHERE genre = 'Drama';

Select the IDs of all movies with average rating higher than 9.

SELECT movie_id -- Select the IDs of all movies with average rating higher than 9
FROM renting
GROUP BY movie_id
HAVING AVG(rating)>9;

Select the IDs of all dramas with average rating higher than 9.

SELECT movie_id
FROM movies
WHERE genre = 'Drama'
INTERSECT  -- Select the IDs of all dramas with average rating higher than 9
SELECT movie_id
FROM renting
GROUP BY movie_id
HAVING AVG(rating)>9;

Select all movies of in the drama genre with an average rating higher than 9.

SELECT *
FROM movies
WHERE movie_id IN -- Select all movies of genre drama with average rating higher than 9
   (SELECT movie_id
    FROM movies
    WHERE genre = 'Drama'
    INTERSECT
    SELECT movie_id
    FROM renting
    GROUP BY movie_id
    HAVING AVG(rating)>9);

OUTPUT

movie_id title genre runtime year_of_release renting_price
42 No Country for Old Men Drama 122 2007 1.49

Well done! You reported to the advertisment team that ‘No Country for Old Men’ is the only drama with average rating higher than 9.

4. Data Driven Decision Making with OLAP SQL queries

The OLAP extensions in SQL are introduced and applied to aggregated data on multiple levels. These extensions are the CUBE, ROLLUP and GROUPING SETS operators.

4.1. OLAP: CUBE operator

Groups of customers

Use the CUBE operator to extract the content of a pivot table from the database. Create a table with the total number of male and female customers from each country.

  • Create a table with the total number of customers, of all female and male customers, of the number of customers for each country and the number of men and women from each country.
SELECT gender, -- Extract information of a pivot table of gender and country for the number of customers
       country,
       COUNT(*)
FROM customers
GROUP BY CUBE (gender, country)
ORDER BY country;

Congratulations! You found out that there are a total of 122 customers and that three out of four customers from Austria are female.

Categories of movies

Give an overview on the movies available on MovieNow. List the number of movies for different genres and release years

  • List the number of movies for different genres and the year of release on all aggregation levels by using the CUBE operator.
SELECT genre,
       year_of_release,
       COUNT(*)
FROM movies
GROUP BY CUBE (genre, year_of_release)
ORDER BY year_of_release;

Question

Which statement is NOT correct about the result table?

Possible Answers

  • From all genres (ignoring the year of release) there are most movies in the category Drama.
  • In total there are 71 movies available on MovieNow.
  • The year of release with most movies is 2014.
  • From 2002 there are 2 dramas available on MovieNow.

Well observed! Only one movie from 2014 is available on MovieNow. The highest number of movies is from 2003 with 8 movies.

Analyzing average ratings

Prepare a table for a report about the national preferences of the customers from MovieNow comparing the average rating of movies across countries and genres.

  • Augment the records of movie rentals with information about movies and customers, in this order. Use the first letter of the table names as alias.
-- Augment the records of movie rentals with information about movies and customers
SELECT *
FROM renting AS r
LEFT JOIN movies AS m
ON r.movie_id = m.movie_id
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id;

Calculate the average rating for each country.

-- Calculate the average rating for each country
SELECT 
    c.country,
    AVG(r.rating)
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
GROUP BY country;

OUTPUT

country avg
null 8.0000000000000000
France 7.7714285714285714
Spain 7.6415094339622642
  • Calculate the average rating for all aggregation levels of country and genre.
SELECT 
    c.country, 
    m.genre, 
    AVG(r.rating) AS avg_rating -- Calculate the average rating 
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
GROUP BY CUBE (c.country, m.genre); -- For all aggregation levels of country and genre

Question

  • What is the average rating over all records, rounded to two digits?

Possible Answers

  • 7.94
  • null
  • 8.80
  • 7.86

That’s correct. The average over all records is 7.94.

4.2. ROLLUP

rollup

>

Number of customers

You have to give an overview of the number of customers for a presentation.

  • Generate a table with the total number of customers, the number of customers for each country, and the number of female and male customers for each country.
  • Order the result by country and gender
-- Count the total number of customers, the number of customers for each country, and the number of female and male customers for each country
SELECT country,
       gender,
       COUNT(*)
FROM customers
GROUP BY ROLLUP(country, gender)
ORDER BY country, gender; -- Order the result by country and gender

OUTPUT

country gender count
Austria female 3
Austria male 1
Austria null 4

Good job! You aggregated the data on different levels and found out that there are 3 female customers and 1 male customer from Austria, so in total 4 customers from Austria.

Analyzing preferences of genres across countries

You are asked to study the preferences of genres across countries. Are there particular genres which are more popular in specific countries? Evaluate the preferences of customers by averaging their ratings and counting the number of movies rented from each genre.

  • Augment the renting records with information about movies and customers.
-- Join the tables
SELECT *
FROM renting AS r
LEFT JOIN movies AS m
ON r.movie_id = m.movie_id
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id;

Calculate the average ratings and the number of ratings for each country and each genre. Include the columns country and genre in the SELECT clause.

SELECT 
    c.country, -- Select country
    m.genre, -- Select genre
    AVG(r.rating), -- Average ratings
    COUNT(*)  -- Count number of movie rentals
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
GROUP BY (c.country, m.genre) -- Aggregate for each country and each genre
ORDER BY c.country, m.genre;

output

country genre avg count
Austria Comedy 8.0000000000000000 1
Austria Drama 6.0000000000000000 2
Austria Mystery & Suspense null 1
  • Finally, calculate the average ratings and the number of ratings for each country and genre, as well as an aggregation over all genres for each country and the overall average and total number.
-- Group by each county and genre with OLAP extension
SELECT 
    c.country, 
    m.genre, 
    AVG(r.rating) AS avg_rating, 
    COUNT(*) AS num_rating
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
GROUP BY ROLLUP (c.country, m.genre)
ORDER BY c.country, m.genre;

output

country genre avg_rating num_rating
Austria Comedy 8.0000000000000000 1
Austria Drama 6.0000000000000000 2
Austria Mystery & Suspense null 1

Very well done! You gathered all the information of interest by using a ROLLUP statement.

4.3. GROUPING SETS

Queries with GROUPING SETS

What question CANNOT be answered by the following query?

SELECT 

  r.customer_id, 

  m.genre, 

  AVG(r.rating), 

  COUNT(*)

FROM renting AS r

LEFT JOIN movies AS m

ON r.movie_id = m.movie_id

GROUP BY GROUPING SETS ((r.customer_id, m.genre), (r.customer_id), ());

Possible Answers

  • How many movies were watched by each customer?
  • What is the average rating for each genre?
  • What is the average rating of customer 75 for movies of the Comedy genre?
  • What is the overall average rating for all movies from all customers?

Congratulations! You found the question that cannot be answered by this query. GROUPING SETS does not include (genre) (i.e. does not include aggregation for each genre).

Exploring nationality and gender of actors

For each movie in the database, the three most important actors are identified and listed in the table actors. This table includes the nationality and gender of the actors. We are interested in how much diversity there is in the nationalities of the actors and how many actors and actresses are in the list.

  • Count the number of actors in the table actors from each country, the number of male and female actors and the total number of actors.
SELECT 
    nationality, -- Select nationality of the actors
    gender, -- Select gender of the actors
    COUNT(*) -- Count the number of actors
FROM actors
GROUP BY GROUPING SETS ((nationality), (gender), ()); -- Use the correct GROUPING SETS operation

OUTPUT

nationality gender count
null null 145
Somalia null 1
null null 2

Well done! You used GROUPING SETS to learn that 91 out of 145 actors are from the USA and that there are 90 male and 55 female actors.

Exploring rating by country and gender

Now you will investigate the average rating of customers aggregated by country and gender.

  • Select the columns country, gender, and rating and use the correct join to combine the table renting with customer.
SELECT 
    c.country, -- Select country, gender and rating
    c.gender,
    r.rating
FROM renting AS r
LEFT JOIN customers AS c -- Use the correct join
ON r.customer_id = c.customer_id;

output

country gender rating
Great Britan female null
Belgium male 10
Spain male 4
  • Use GROUP BY to calculate the average rating over country and gender. Order the table by country and gender.
SELECT 
    c.country, 
    c.gender,
    AVG(r.rating) -- Calculate average rating
FROM renting AS r
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
GROUP BY c.country, c.gender -- Order and group by country and gender
ORDER BY c.country, c.gender;

output

country gender avg
Austria female 7.0000000000000000
Austria male 6.0000000000000000
Belgium female 9.1250000000000000
  • Now, use GROUPING SETS to get the same result, i.e. the average rating over country and gender.
SELECT 
    c.country, 
    c.gender,
    AVG(r.rating)
FROM renting AS r
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
GROUP BY GROUPING SETS ((country, gender)); -- Group by country and gender with GROUPING SETS

output

country gender avg
Austria male 6.0000000000000000
France female 8.0000000000000000
Hungary female 7.2857142857142857
  • Report all information that is included in a pivot table for country and gender in one SQL table.
SELECT 
    c.country, 
    c.gender,
    AVG(r.rating)
FROM renting AS r
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
-- Report all info from a Pivot table for country and gender
GROUP BY GROUPING SETS ((country, gender), (country),(gender),());

output

country gender avg
null null 7.9390243902439024
null null 8.0000000000000000
Austria female 7.0000000000000000

Excellent! You found out that the overall average of ratings is 7.9.

4.4. Bringing it all together

Customer preference for genres

You just saw that customers have no clear preference for more recent movies over older ones. Now the management considers investing money in movies of the best rated genres.

  • Augment the records of movie rentals with information about movies. Use the first letter of the table as alias.
SELECT *
FROM renting AS r
LEFT JOIN movies AS m -- Augment the table with information about movies
ON r.movie_id = m.movie_id;

Select records of movies with at least 4 ratings, starting from 2018-04-01.

SELECT *
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
WHERE r.movie_id IN ( -- Select records of movies with at least 4 ratings
    SELECT movie_id
    FROM renting
    GROUP BY movie_id
    HAVING COUNT(rating) >=4)
AND r.date_renting >= '2018-04-01'; -- Select records of movie rentals since 2018-04-01

For each genre, calculate the average rating (use the alias avg_rating), the number of ratings (use the alias n_rating), the number of movie rentals (use the alias n_rentals), and the number of distinct movies (use the alias n_movies).

SELECT m.genre, -- For each genre, calculate:
       AVG(r.rating) AS avg_rating, -- The average rating and use the alias avg_rating
       COUNT(r.rating) AS n_rating, -- The number of ratings and use the alias n_rating
       COUNT(*) AS n_rentals,     -- The number of movie rentals and use the alias n_rentals
       COUNT(DISTINCT r.movie_id) AS n_movies -- The number of distinct movies and use the alias n_movies
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
WHERE r.movie_id IN ( 
    SELECT movie_id
    FROM renting
    GROUP BY movie_id
    HAVING COUNT(rating) >= 3)
AND r.date_renting >= '2018-01-01'
GROUP BY m.genre;

output

genre avg_rating n_rating n_rentals n_movies
Action & Adventure 8.7142857142857143 7 9 2
Animation 7.8333333333333333 6 10 2
Art House & International 8.5000000000000000 4 5 1
  • Order the table by decreasing average rating.
SELECT genre,
       AVG(rating) AS avg_rating,
       COUNT(rating) AS n_rating,
       COUNT(*) AS n_rentals,     
       COUNT(DISTINCT m.movie_id) AS n_movies 
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
WHERE r.movie_id IN ( 
    SELECT movie_id
    FROM renting
    GROUP BY movie_id
    HAVING COUNT(rating) >= 3 )
AND r.date_renting >= '2018-01-01'
GROUP BY genre
ORDER BY AVG(rating) DESC; -- Order the table by decreasing average rating

OUTPUT

genre avg_rating n_rating n_rentals n_movies
Action & Adventure 8.7142857142857143 7 9 2
Art House & International 8.5000000000000000 4 5 1
Other 8.4285714285714286 7 16 2

Well done! You found out that Action & Adventure has the highest rating.

Customer preference for actors

The last aspect you have to analyze are customer preferences for certain actors.

  • Join the tables.
-- Join the tables
SELECT *
FROM renting AS r
LEFT JOIN actsin AS ai
ON r.movie_id = ai.movie_id
LEFT JOIN actors AS a
ON ai.actor_id = a.actor_id;

For each combination of the actors’ nationality and gender, calculate the average rating, the number of ratings, the number of movie rentals, and the number of actors.

SELECT a.nationality,
       a.gender,
       AVG(r.rating) AS avg_rating, -- The average rating
       COUNT(r.rating) AS n_rating, -- The number of ratings
       COUNT(*) AS n_rentals, -- The number of movie rentals
       COUNT(DISTINCT a.actor_id) AS n_actors -- The number of actors
FROM renting AS r
LEFT JOIN actsin AS ai
ON ai.movie_id = r.movie_id
LEFT JOIN actors AS a
ON ai.actor_id = a.actor_id
WHERE r.movie_id IN ( 
    SELECT movie_id
    FROM renting
    GROUP BY movie_id
    HAVING COUNT(rating) >=4 )
AND r.date_renting >= '2018-04-01'
GROUP BY GROUPING SETS ((a.nationality,a.gender)); -- Report results for each combination of the actors' nationality and gender

output

nationality gender avg_rating n_rating n_rentals n_actors
Argentina male 8.5000000000000000 4 5 1
Australia female 8.6666666666666667 3 5 1
Australia male 7.4545454545454545 11 17 3
  • Provide results for all aggregation levels represented in a pivot table.
SELECT a.nationality,
       a.gender,
       AVG(r.rating) AS avg_rating,
       COUNT(r.rating) AS n_rating,
       COUNT(*) AS n_rentals,
       COUNT(DISTINCT a.actor_id) AS n_actors
FROM renting AS r
LEFT JOIN actsin AS ai
ON ai.movie_id = r.movie_id
LEFT JOIN actors AS a
ON ai.actor_id = a.actor_id
WHERE r.movie_id IN ( 
    SELECT movie_id
    FROM renting
    GROUP BY movie_id
    HAVING COUNT(rating) >= 4)
AND r.date_renting >= '2018-04-01'
GROUP BY CUBE (a.nationality,a.gender ); -- Provide results for all aggregation levels represented in a pivot table

OUTPUT

nationality gender avg_rating n_rating n_rentals n_actors
Argentina male 8.5000000000000000 4 5 1
Argentina null 8.5000000000000000 4 5 1
Australia female 8.6666666666666667 3 5 1

Great job! You sent your report on customer preferences to management. Now they can take an informed, data-driven decision based on your detailed analysis. Congratulations!

Share this Article
Leave a comment

Leave a Reply

Your email address will not be published. Required fields are marked *