PostgreSQL Summary Stats and Window Functions Answer Key – Datacamp

Jack16306
40 Min Read

Learn how to create queries for analytics and data engineering with window functions, the SQL secret weapon!

Have you ever wondered how data professionals use SQL to solve real-world business problems, like generating rankings, calculating moving averages and running totals, deduplicating data, or performing time intelligence? If you already know how to select, filter, order, join and group data with SQL, this course is your next step. By the end, you will be writing queries like a pro! You will learn how to create queries for analytics and data engineering with window functions, the SQL secret weapon! Using flights data, you will discover how simple it is to use window functions, and how flexible and efficient they are.

Course Url: https://www.datacamp.com/courses/postgresql-summary-stats-and-window-functions

1. Introduction to window functions

In this chapter, you’ll learn what window functions are, and the two basic window function subclauses, ORDER BY and PARTITION BY.

1.1. Introduction

Window functions vs GROUP BY

Which of the following is FALSE?

  • Unlike GROUP BY results, window functions don’t reduce the number of rows in the output.
  • Window functions can fetch values from other rows into the table, whereas GROUP BY functions cannot.
  • Window functions can open a “window” to another table, whereas GROUP BY functions cannot.
  • Window functions can calculate running totals and moving averages, whereas GROUP BY functions cannot.

Correct! This choice is false because window functions take as input the rows of the table you’re querying.

Numbering rows

The simplest application for window functions is numbering rows. Numbering rows allows you to easily fetch the nth row. For example, it would be very difficult to get the 35th row in any given table if you didn’t have a column with each row’s number.

  • Number each row in the dataset.
SELECT
  *,
  -- Assign numbers to each row
  ROW_NUMBER() OVER() AS Row_N
FROM Summer_Medals
ORDER BY Row_N ASC;

Output

year city sport discipline athlete country gender event medal row_n
1896 Athens Aquatics Swimming HAJOS Alfred HUN Men 100M Freestyle Gold 1
1896 Athens Aquatics Swimming HERSCHMANN Otto AUT Men 100M Freestyle Silver 2
1896 Athens Aquatics Swimming DRIVAS Dimitrios GRE Men 100M Freestyle For Sailors Bronze 3

Good job! You’ve written your first window function.

Numbering Olympic games in ascending order

The Summer Olympics dataset contains the results of the games between 1896 and 2012. The first Summer Olympics were held in 1896, the second in 1900, and so on. What if you want to easily query the table to see in which year the 13th Summer Olympics were held? You’d need to number the rows for that.

  • Assign a number to each year in which Summer Olympic games were held.
SELECT
  Year,

  -- Assign numbers to each year
  ROW_NUMBER() OVER() AS Row_N
FROM (
  SELECT DISTINCT Year
  FROM Summer_Medals
  ORDER BY Year ASC
) AS Years
ORDER BY Year ASC;

Output

year row_n
1896 1
1900 2
1904 3

Great! You’ve seen how row numbering works.

1.2. ORDER BY

Numbering Olympic games in descending order

You’ve already numbered the rows in the Summer Medals dataset. What if you need to reverse the row numbers so that the most recent Olympic games’ rows have a lower number?

  • Assign a number to each year in which Summer Olympic games were held so that rows with the most recent years have lower row numbers
SELECT
  Year,
  -- Assign the lowest numbers to the most recent years
  ROW_NUMBER() OVER (ORDER BY years DESC) AS Row_N
FROM (
  SELECT DISTINCT Year
  FROM Summer_Medals
) AS Years
ORDER BY Year;

Output

year row_n
1896 27
1900 26
1904 25

Great! You’ve written your first ORDER BY subclause and changed the behavior of the ROW_NUMBER() window

Numbering Olympic athletes by medals earned

Row numbering can also be used for ranking. For example, numbering rows and ordering by the count of medals each athlete earned in the OVER clause will assign 1 to the highest-earning medalist, 2 to the second highest-earning medalist, and so on.

  • For each athlete, count the number of medals he or she has earned.
SELECT
  -- Count the number of medals each athlete has earned
  Athlete,
  COUNT(*) AS Medals
FROM Summer_Medals
GROUP BY Athlete
ORDER BY Medals DESC;

Output

athlete medals
PHELPS Michael 22
LATYNINA Larisa 18
ANDRIANOV Nikolay 15

Having wrapped the previous query in the Athlete_Medals CTE, rank each athlete by the number of medals they’ve earned.

WITH Athlete_Medals AS (
  SELECT
    -- Count the number of medals each athlete has earned
    Athlete,
    COUNT(*) AS Medals
  FROM Summer_Medals
  GROUP BY Athlete)


SELECT
  -- Number each athlete by how many medals they've earned
  Athlete,
  ROW_NUMBER() OVER (ORDER BY Medals DESC) AS Row_N
FROM Athlete_Medals
ORDER BY Medals DESC;

Output

athlete row_n
PHELPS Michael 1
LATYNINA Larisa 2
ANDRIANOV Nikolay 3

Great! In chapter 2, you’ll learn more about ranking functions.

Reigning weightlifting champions

A reigning champion is a champion who’s won both the previous and current years’ competitions. To determine if a champion is reigning, the previous and current years’ results need to be in the same row, in two different columns.

  • Return each year’s gold medalists in the Men’s 69KG weightlifting competition.
SELECT
  -- Return each year's champions' countries
  year,
  Country AS champion
FROM Summer_Medals
WHERE
  Discipline = 'Weightlifting' AND
  Event = '69KG' AND
  Gender = 'Men' AND
  Medal = 'Gold';

Output

year champion
2000 BUL
2004 CHN
2008 CHN

Having wrapped the previous query in the Weightlifting_Gold CTE, get the previous year’s champion for each year.

WITH Weightlifting_Gold AS (
  SELECT
    -- Return each year's champions' countries
    Year,
    Country AS champion
  FROM Summer_Medals
  WHERE
    Discipline = 'Weightlifting' AND
    Event = '69KG' AND
    Gender = 'Men' AND
    Medal = 'Gold')


SELECT
  Year, Champion,
  -- Fetch the previous year's champion
  LAG(Champion) OVER
    (ORDER BY champion ASC) AS Last_Champion
FROM Weightlifting_Gold
ORDER BY Year ASC;

Output

year champion last_champion
2000 BUL null
2004 CHN BUL
2008 CHN CHN

Well done! You can now compare the two champion columns to see if a champion is reigning.

1.3. PARTITION BY

Reigning champions by gender

You’ve already fetched the previous year’s champion for one event. However, if you have multiple events, genders, or other metrics as columns, you’ll need to split your table into partitions to avoid having a champion from one event or gender appear as the previous champion of another event or gender.

  • Return the previous champions of each year’s event by gender.
WITH Tennis_Gold AS (
  SELECT DISTINCT
    Gender, Year, Country
  FROM Summer_Medals
  WHERE
    Year >= 2000 AND
    Event = 'Javelin Throw' AND
    Medal = 'Gold')


SELECT
  Gender, Year,
  Country AS Champion,
  -- Fetch the previous year's champion by gender
  LAG(Country) OVER (PARTITION BY Gender
            ORDER BY Year ASC) AS Last_Champion
FROM Tennis_Gold
ORDER BY Gender ASC, Year ASC;

Output

gender year champion last_champion
Men 2000 CZE null
Men 2004 NOR CZE
Men 2008 NOR NOR
Men 2012 TTO NOR

Great! Partitioning correctly split the champions by gender, so that data on champions of one gender does not get mixed into the other gender’s results.

Reigning champions by gender and event

In the previous exercise, you partitioned by gender to ensure that data about one gender doesn’t get mixed into data about the other gender. If you have multiple columns, however, partitioning by only one of them will still mix the results of the other columns.

  • Return the previous champions of each year’s events by gender and event
WITH Athletics_Gold AS (
  SELECT DISTINCT
    Gender, Year, Event, Country
  FROM Summer_Medals
  WHERE
    Year >= 2000 AND
    Discipline = 'Athletics' AND
    Event IN ('100M', '10000M') AND
    Medal = 'Gold')


SELECT
  Gender, Year, Event,
  Country AS Champion,
  -- Fetch the previous year's champion by gender and event
  LAG(Country) OVER (PARTITION BY gender, event
            ORDER BY Year ASC) AS Last_Champion
FROM Athletics_Gold
ORDER BY Event ASC, Gender ASC, Year ASC;

Output

gender year event champion last_champion
Men 2000 10000M ETH null
Men 2004 10000M ETH ETH
Men 2008 10000M ETH ETH
Men 2012 10000M GBR ETH

Good job! You can partition by more than one column in case your groups are spread over several columns.

Row numbers with partitioning

If you run ROW_NUMBER() OVER (PARTITION BY Year ORDER BY Medals DESC) on the following table, what row number would the 2008 Iranian record have?

| Year | Country | Medals |

|——|———|——–|

| 2004 | IRN     | 32     |

| 2004 | LBN     | 17     |

| 2004 | KSA     | 4      |

| 2008 | IRQ     | 29     |

| 2008 | IRN     | 27     |

| 2008 | UAE     | 12     |

Possible Answers

  • 5
  • 1
  • 2

Well done! Since the table is partitioned by year, Iran is the second row in that partition.

2. Fetching, ranking, and paging

In this chapter, you’ll learn three practical applications of window functions: fetching values from different parts of the table, ranking rows according to their values, and binning rows into different tables.

2.1. Fetching

Future gold medalists

Fetching functions allow you to get values from different parts of the table into one row. If you have time-ordered data, you can “peek into the future” with the LEAD fetching function. This is especially useful if you want to compare a current value to a future value.

  • For each year, fetch the current gold medalist and the gold medalist 3 competitions ahead of the current row.
WITH Discus_Medalists AS (
  SELECT DISTINCT
    Year,
    Athlete
  FROM Summer_Medals
  WHERE Medal = 'Gold'
    AND Event = 'Discus Throw'
    AND Gender = 'Women'
    AND Year >= 2000)


SELECT
  -- For each year, fetch the current and future medalists
  Year,
  Athlete,
  LEAD(Athlete, 3) OVER (ORDER BY Year ASC) AS Future_Champion
FROM Discus_Medalists
ORDER BY Year ASC;

Output

year athlete future_champion
2000 ZVEREVA Ellina PERKOVIC Sandra
2004 SADOVA Natalya null
2008 BROWN TRAFTON Stephanie null
2012 PERKOVIC Sandra null

Good job! You fetched future competitions’ results with LEAD().

First athlete by name

It’s often useful to get the first or last value in a dataset to compare all other values to it. With absolute fetching functions like FIRST_VALUE, you can fetch a value at an absolute position in the table, like its beginning or end.

  • Return all athletes and the first athlete ordered by alphabetical order.
WITH All_Male_Medalists AS (
  SELECT DISTINCT
    Athlete
  FROM Summer_Medals
  WHERE Medal = 'Gold'
    AND Gender = 'Men')


SELECT
  -- Fetch all athletes and the first athlete alphabetically
  athlete,
  FIRST_value(athlete) OVER (
    ORDER BY athlete ASC
  ) AS First_Athlete
FROM All_Male_Medalists;

Output

athlete first_athlete
AABYE Edgar AABYE Edgar
AALTONEN Paavo Johannes AABYE Edgar
AAS Thomas Valentin AABYE Edgar
ABALMASAU Aliaksei AABYE Edgar

Great! You can use absolute position fetching functions to fetch values at fixed positions in your table or partition.

Last country by name

Just like you can get the first row’s value in a dataset, you can get the last row’s value. This is often useful when you want to compare the most recent value to previous values.

  • Return the year and the city in which each Olympic games were held.
  • Fetch the last city in which the Olympic games were held.
WITH Hosts AS (
  SELECT DISTINCT Year, City
    FROM Summer_Medals)


SELECT
  Year,
  City,
  -- Get the last city in which the Olympic games were held
  LAST_VALUE(City) OVER (
   ORDER BY Year ASC
   RANGE BETWEEN
     UNBOUNDED PRECEDING AND
     UNBOUNDED FOLLOWING
  ) AS Last_City
FROM Hosts
ORDER BY Year ASC;

Output

year city last_city
1896 Athens London
1900 Paris London
1904 St Louis London
1908 London London

Well done! Now you can get the values of the rows at the beginning and end of your table.

2.2. Ranking

PostgreSQL Summary Stats and Window Functions pic 1
PostgreSQL Summary Stats and Window Functions pic 2
PostgreSQL Summary Stats and Window Functions pic 3
PostgreSQL Summary Stats and Window Functions pic 4

Ranking athletes by medals earned

In chapter 1, you used ROW_NUMBER to rank athletes by awarded medals. However, ROW_NUMBER assigns different numbers to athletes with the same count of awarded medals, so it’s not a useful ranking function; if two athletes earned the same number of medals, they should have the same rank.

  • Rank each athlete by the number of medals they’ve earned — the higher the count, the higher the rank — with identical numbers in case of identical values.
WITH Athlete_Medals AS (
  SELECT
    Athlete,
    COUNT(*) AS Medals
  FROM Summer_Medals
  GROUP BY Athlete)


SELECT
  Athlete,
  Medals,
  -- Rank athletes by the medals they've won
  RANK() OVER (ORDER BY Medals DESC) AS Rank_N
FROM Athlete_Medals
ORDER BY Medals DESC;
athlete medals rank_n
PHELPS Michael 22 1
LATYNINA Larisa 18 2
ANDRIANOV Nikolay 15 3
MANGIAROTTI Edoardo 13 4

Well done! RANK’s output corresponds to the actual Olympics’ ranking system.

Ranking athletes from multiple countries

In the previous exercise, you used RANK to assign rankings to one group of athletes. In real-world data, however, you’ll often find numerous groups within your data. Without partitioning your data, one group’s values will influence the rankings of the others.

Also, while RANK skips numbers in case of identical values, the most natural way to assign rankings is not to skip numbers. If two countries are tied for second place, the country after them is considered to be third by most people.

  • Rank each country’s athletes by the count of medals they’ve earned — the higher the count, the higher the rank — without skipping numbers in case of identical values.
WITH Athlete_Medals AS (
  SELECT
    Country, Athlete, COUNT(*) AS Medals
  FROM Summer_Medals
  WHERE
    Country IN ('JPN', 'KOR')
    AND Year >= 2000
  GROUP BY Country, Athlete
  HAVING COUNT(*) > 1)


SELECT
  Country,
  -- Rank athletes in each country by the medals they've won
  athlete,
  DENSE_RANK() OVER (PARTITION BY Country
                ORDER BY Medals DESC) AS Rank_N
FROM Athlete_Medals
ORDER BY Country ASC, RANK_N ASC;
country athlete rank_n
JPN KITAJIMA Kosuke 1
JPN UCHIMURA Kohei 2
JPN TAKEDA Miho 3
JPN TACHIBANA Miya 3

Good job! DENSE_RANK’s way of ranking is how we’d typically assign ranks in real life.

DENSE_RANK’s output

You have the following table:

| Country | Medals |

|———|——–|

| IRN     | 23     |

| IRQ     | 19     |

| LBN     | 19     |

| SYR     | 19     |

| BHR     | 7      |

| KSA     | 3      |

If you were to use DENSE_RANK to order the Medals column in descending order, what rank would BHR be assigned?

  • 5
  • 3
  • 6

Correct! DENSE_RANK would rank BHR as 3rd.

2.3. Paging

PostgreSQL Summary Stats and Window Functions 2.3

Paging events

There are exactly 666 unique events in the Summer Medals Olympics dataset. If you want to chunk them up to analyze them piece by piece, you’ll need to split the events into groups of approximately equal size.

  • Split the distinct events into exactly 111 groups, ordered by event in alphabetical order.
WITH Events AS (
  SELECT DISTINCT Event
  FROM Summer_Medals)
  
SELECT
  --- Split up the distinct events into 111 unique groups
  event,
  NTILE(111) OVER (ORDER BY event ASC) AS Page
FROM Events
ORDER BY Event ASC;

Output

event page
+ 100KG 1
+ 100KG (Heavyweight) 1
+ 100KG (Super Heavyweight) 1
+ 105KG 1

Good! NTILE() allows you to make the size of the dataset you’re working with more manageable.

Top, middle, and bottom thirds

Splitting your data into thirds or quartiles is often useful to understand how the values in your dataset are spread. Getting summary statistics (averages, sums, standard deviations, etc.) of the top, middle, and bottom thirds can help you determine what distribution your values follow.

  • Split the athletes into top, middle, and bottom thirds based on their count of medals.
WITH Athlete_Medals AS (
  SELECT Athlete, COUNT(*) AS Medals
  FROM Summer_Medals
  GROUP BY Athlete
  HAVING COUNT(*) > 1)
  
SELECT
  Athlete,
  Medals,
  -- Split athletes into thirds by their earned medals
  NTILE(3) OVER (ORDER BY Medals DESC) AS Third
FROM Athlete_Medals
ORDER BY Medals DESC, Athlete ASC;

Output

athlete medals third
PHELPS Michael 22 1
LATYNINA Larisa 18 1
ANDRIANOV Nikolay 15 1
MANGIAROTTI Edoardo 13 1
  • Return the average of each third.
WITH Athlete_Medals AS (
  SELECT Athlete, COUNT(*) AS Medals
  FROM Summer_Medals
  GROUP BY Athlete
  HAVING COUNT(*) > 1),
  
  Thirds AS (
  SELECT
    Athlete,
    Medals,
    NTILE(3) OVER (ORDER BY Medals DESC) AS Third
  FROM Athlete_Medals)
  
SELECT
  -- Get the average medals earned in each third
  Third,
  AVG(Medals) AS Avg_Medals
FROM Thirds
GROUP BY Third
ORDER BY Third ASC;

Output

third avg_medals
1 3.7864464692482916
2 2.0000000000000000
3 2.0000000000000000

Great! Using NTILE() and summary statistic functions, you could see the differences in the top, middle, and bottom thirds.

3. Aggregate window functions and frames

In this chapter, you’ll learn how to use aggregate functions you’re familiar with, like `AVG()` and `SUM()`, as window functions, as well as how to define frames to change a window function’s output.

3.1. Aggregate window functions

Running totals of athlete medals

The running total (or cumulative sum) of a column helps you determine what each row’s contribution is to the total sum.

  • Return the athletes, the number of medals they earned, and the medals running total, ordered by the athletes’ names in alphabetical order.
WITH Athlete_Medals AS (
  SELECT
    Athlete, COUNT(*) AS Medals
  FROM Summer_Medals
  WHERE
    Country = 'USA' AND Medal = 'Gold'
    AND Year >= 2000
  GROUP BY Athlete)


SELECT
  -- Calculate the running total of athlete medals
  athlete,
  Medals,
  SUM(Medals) OVER (ORDER BY Athlete ASC) AS Max_Medals
FROM Athlete_Medals
ORDER BY Athlete ASC;

Output

athlete medals max_medals
ABDUR-RAHIM Shareef 1 1
ABERNATHY Brent 1 2
ADRIAN Nathan 3 5
AHRENS Chris 1 6

Great job! You’ve used your first aggregate window function.

Maximum country medals by year

Getting the maximum of a country’s earned medals so far helps you determine whether a country has broken its medals record by comparing the current year’s earned medals and the maximum so far.

  • Return the year, country, medals, and the maximum medals earned so far for each country, ordered by year in ascending order.
WITH Country_Medals AS (
  SELECT
    Year, Country, COUNT(*) AS Medals
  FROM Summer_Medals
  WHERE
    Country IN ('CHN', 'KOR', 'JPN')
    AND Medal = 'Gold' AND Year >= 2000
  GROUP BY Year, Country)


SELECT
  -- Return the max medals earned so far per country
  Year,
  Country,
  Medals,
  MAX(Medals) OVER (PARTITION BY country
                ORDER BY Year ASC) AS Max_Medals
FROM Country_Medals
ORDER BY Country ASC, Year ASC;

Output

year country medals max_medals
2000 CHN 39 39
2004 CHN 52 52
2008 CHN 74 74
2012 CHN 56 74

Good work! As with other window functions, you can use partioning with aggregate window functions.

Minimum country medals by year

So far, you’ve seen MAX and SUM, aggregate functions normally used with GROUP BY, being used as window functions. You can also use the other aggregate functions, like MIN, as window functions.

  • Return the year, medals earned, and minimum medals earned so far.
WITH France_Medals AS (
  SELECT
    Year, COUNT(*) AS Medals
  FROM Summer_Medals
  WHERE
    Country = 'FRA'
    AND Medal = 'Gold' AND Year >= 2000
  GROUP BY Year)


SELECT
  Year,
  Medals,
  MIN(Medals) OVER (ORDER BY Year ASC) AS Min_Medals
FROM France_Medals
ORDER BY Year ASC;
year medals min_medals
2000 22 22
2004 21 21
2008 25 21
2012 30 21

Well done! All aggregate functions can be used as window functions.

3.2. Frams

Number of rows in a frame

How many rows does the following frame span?

ROWS BETWEEN 3 PRECEDING AND 2 FOLLOWING

  • 5
  • 6
  • 4

Correct! This frame spans the 3 rows preceding the current row, the current row itself, and the two rows after it.

Moving maximum of Scandinavian athletes’ medals

Frames allow you to restrict the rows passed as input to your window function to a sliding window for you to define the start and finish.

Adding a frame to your window function allows you to calculate “moving” metrics, inputs of which slide from row to row.

  • Return the year, medals earned, and the maximum medals earned, comparing only the current year and the next year.
WITH Scandinavian_Medals AS (
  SELECT
    Year, COUNT(*) AS Medals
  FROM Summer_Medals
  WHERE
    Country IN ('DEN', 'NOR', 'FIN', 'SWE', 'ISL')
    AND Medal = 'Gold'
  GROUP BY Year)


SELECT
  -- Select each year's medals
  year,
  Medals,
  -- Get the max of the current and next years'  medals
  MAX(Medals) OVER (ORDER BY year ASC
             ROWS BETWEEN current row
             AND 1 following) AS Max_Medals
FROM Scandinavian_Medals
ORDER BY Year ASC;

Output

year medals max_medals
1896 1 1
1900 1 77
1908 77 141
1912 141 159

Good job! You’ve defined your first frame and restricted the window function’s input rows.

Moving maximum of Chinese athletes’ medals

Frames allow you to “peek” forwards or backward without first using the relative fetching functions, LAG and LEAD, to fetch previous rows’ values into the current row.

  • Return the athletes, medals earned, and the maximum medals earned, comparing only the last two and current athletes, ordering by athletes’ names in alphabetical order.
WITH Chinese_Medals AS (
  SELECT
    Athlete, COUNT(*) AS Medals
  FROM Summer_Medals
  WHERE
    Country = 'CHN' AND Medal = 'Gold'
    AND Year >= 2000
  GROUP BY Athlete)


SELECT
  -- Select the athletes and the medals they've earned
  athlete,
  medals,
  -- Get the max of the last two and current rows' medals 
  MAX(Medals) OVER (ORDER BY athlete ASC
            ROWS BETWEEN 2 preceding
            AND current row) AS Max_Medals
FROM Chinese_Medals
ORDER BY Athlete ASC;
athlete medals max_medals
CAI Yalin 1 1
CAI Yun 1 1
CAO Lei 1 1
CAO Yuan 1 1

Great! You’ve used all three clauses used to define a frame’s start and end.

3.3. Moving averages and totals

rows almost

Moving average’s frame

If you want your moving average to cover the last 3 and current Olympic games, how would you define its frame?

  • ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
  • ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING
  • ROWS BETWEEN 3 PRECEDING AND CURRENT ROW

Correct! This covers the current Olympic games, as well as the three preceding sets of games.

Moving average of Russian medals

Using frames with aggregate window functions allow you to calculate many common metrics, including moving averages and totals. These metrics track the change in performance over time.

  • Calculate the 3-year moving average of medals earned.
WITH Russian_Medals AS (
  SELECT
    Year, COUNT(*) AS Medals
  FROM Summer_Medals
  WHERE
    Country = 'RUS'
    AND Medal = 'Gold'
    AND Year >= 1980
  GROUP BY Year)


SELECT
  Year, Medals,
  --- Calculate the 3-year moving average of medals earned
  AVG(Medals) OVER
    (ORDER BY Year ASC
     ROWS BETWEEN
     2 PRECEDING AND CURRENT ROW) AS Medals_MA
FROM Russian_Medals
ORDER BY Year ASC;

Output

year medals medals_ma
1996 36 36.0000000000000000
2000 66 51.0000000000000000
2004 47 49.6666666666666667
2008 43 52.0000000000000000

Great! You’ve used all three clauses used to define a frame’s start and end and calculate a moving average.

Moving total of countries’ medals

What if your data is split into multiple groups spread over one or more columns in the table? Even with a defined frame, if you can’t somehow separate the groups’ data, one group’s values will affect the average of another group’s values.

  • Calculate the 3-year moving sum of medals earned per country.
WITH Country_Medals AS (
  SELECT
    Year, Country, COUNT(*) AS Medals
  FROM Summer_Medals
  GROUP BY Year, Country)


SELECT
  Year, Country, Medals,
  -- Calculate each country's 3-game moving total
  SUM(Medals) OVER
    (PARTITION BY Country
     ORDER BY Year ASC
     ROWS BETWEEN
     2 PRECEDING AND CURRENT ROW) AS Medals_MA
FROM Country_Medals
ORDER BY Country ASC, Year ASC;
year country medals medals_ma
2008 AFG 1 1
2012 AFG 1 2
1988 AHO 1 1
1984 ALG 2 2

Good job! You can also use partitioning with frames.

4. Beyond window functions

In this last chapter, you’ll learn some techniques and functions that are useful when used together with window functions.

4.1. Pivoting

A basic pivot

You have the following table of Pole Vault gold medalist countries by gender in 2008 and 2012.

| Gender | Year | Country |

|——–|——|———|

| Men    | 2008 | AUS     |

| Men    | 2012 | FRA     |

| Women  | 2008 | RUS     |

| Women  | 2012 | USA     |

Pivot it by Year to get the following reshaped, cleaner table.

| Gender | 2008 | 2012 |

|——–|——|——|

| Men    | AUS  | FRA  |

| Women  | RUS  | USA  |

  • Create the correct extension.
  • Fill in the column names of the pivoted table.
-- Create the correct extention to enable CROSSTAB
CREATE EXTENSION IF NOT EXISTS tablefunc;


SELECT * FROM CROSSTAB($$
  SELECT
    Gender, Year, Country
  FROM Summer_Medals
  WHERE
    Year IN (2008, 2012)
    AND Medal = 'Gold'
    AND Event = 'Pole Vault'
  ORDER By Gender ASC, Year ASC;
-- Fill in the correct column names for the pivoted table
$$) AS ct (Gender VARCHAR,
           Year VARCHAR,
           Country VARCHAR)


ORDER BY Gender ASC;
gender year country
Men AUS FRA
Women RUS USA

Good job! You’ve pivoted a simple table by a column. Try a more complex table in the next exercise.

Pivoting with ranking

You want to produce an easy scannable table of the rankings of the three most populous EU countries by how many gold medals they’ve earned in the 2004 through 2012 Olympic games. The table needs to be in this format:

| Country | 2004 | 2008 | 2012 |

|———|——|——|——|

| FRA     | …  | …  | …  |

| GBR     | …  | …  | …  |

| GER     | …  | …  | …  |

You’ll need to count the gold medals each country has earned, produce the ranks of each country by medals earned, then pivot the table to this shape.

  • Count the gold medals that France (FRA), the UK (GBR), and Germany (GER) have earned per country and year.
-- Count the gold medals per country and year
SELECT
  country,
  year,
  COUNT (*) AS Awards
FROM Summer_Medals
WHERE
  Country IN ('FRA', 'GBR', 'GER')
  AND Year IN (2004, 2008, 2012)
  AND Medal = 'Gold'
GROUP BY country, year
ORDER BY Country ASC, Year ASC

Output

country year awards
FRA 2004 21
FRA 2008 25
FRA 2012 30
GBR 2004 17
  • Select the country and year columns, then rank the three countries by how many gold medals they earned per year.
WITH Country_Awards AS (
  SELECT
    Country,
    Year,
    COUNT(*) AS Awards
  FROM Summer_Medals
  WHERE
    Country IN ('FRA', 'GBR', 'GER')
    AND Year IN (2004, 2008, 2012)
    AND Medal = 'Gold'
  GROUP BY Country, Year)


SELECT
  -- Select Country and Year
  Country,
  Year,
  -- Rank by gold medals earned per year
  RANK() OVER (PARTITION BY Year
     ORDER BY Awards DESC) :: INTEGER AS rank
FROM Country_Awards
ORDER BY Country ASC, Year ASC;

Output

country year rank
FRA 2004 2
FRA 2008 3
FRA 2012 3
GBR 2004 3
  • Pivot the query’s results by Year by filling in the new table’s correct column names.
CREATE EXTENSION IF NOT EXISTS tablefunc;


SELECT * FROM CROSSTAB($$
  WITH Country_Awards AS (
    SELECT
      Country,
      Year,
      COUNT(*) AS Awards
    FROM Summer_Medals
    WHERE
      Country IN ('FRA', 'GBR', 'GER')
      AND Year IN (2004, 2008, 2012)
      AND Medal = 'Gold'
    GROUP BY Country, Year)


  SELECT
    Country,
    Year,
    RANK() OVER
      (PARTITION BY Year
       ORDER BY Awards DESC) :: INTEGER AS rank
  FROM Country_Awards
  ORDER BY Country ASC, Year ASC;
-- Fill in the correct column names for the pivoted table
$$) AS ct (Country VARCHAR,
           "2004" INTEGER,
           "2008" INTEGER,
           "2012" INTEGER)


Order by Country ASC;
country 2004 2008 2012
FRA 2 3 3
GBR 3 2 1
GER 1 1 2

Well done! The pivoted rankings table is very easy to scan.

4.2. ROLLUP and CUBE

Country-level subtotals

You want to look at three Scandinavian countries’ earned gold medals per country and gender in the year 2004. You’re also interested in Country-level subtotals to get the total medals earned for each country, but Gender-level subtotals don’t make much sense in this case, so disregard them.

  • Count the gold medals awarded per country and gender.
  • Generate Country-level gold award counts.
-- Count the gold medals per country and gender
SELECT
  Country,
  Gender,
  COUNT(*) AS Gold_Awards
FROM Summer_Medals
WHERE
  Year = 2004
  AND Medal = 'Gold'
  AND Country IN ('DEN', 'NOR', 'SWE')
-- Generate Country-level subtotals
GROUP BY Country, ROLLUP(Gender)
ORDER BY Country ASC, Gender ASC;
country gender gold_awards
DEN Men 4
DEN Women 15
DEN null 19
NOR Men 3

Great! You`ve correctly used ROLLUP to generate a subgroup’s total.

All group-level subtotals

You want to break down all medals awarded to Russia in the 2012 Olympic games per gender and medal type. Since the medals all belong to one country, Russia, it makes sense to generate all possible subtotals (Gender- and Medal-level subtotals), as well as a grand total.

Generate a breakdown of the medals awarded to Russia per country and medal type, including all group-level subtotals and a grand total.

  • Count the medals awarded per gender and medal type.
  • Generate all possible group-level counts (per gender and medal type subtotals and the grand total).
-- Count the medals per gender and medal type
SELECT
  Gender,
  Medal,
  Count(*) AS Awards
FROM Summer_Medals
WHERE
  Year = 2012
  AND Country = 'RUS'
-- Get all possible group-level subtotals
GROUP BY CUBE(Gender, Medal)
ORDER BY Gender ASC, Medal ASC;

Output

gender medal awards
Men Bronze 34
Men Gold 23
Men Silver 7
Men null 64

Excellent! You now know when to use ROLLUP and when to use CUBE.

4.3. A survey of useful functions

PostgreSQL Summary Stats and Window Functions 4.3
PostgreSQL Summary Stats and Window Functions 4.3 - 2
PostgreSQL Summary Stats and Window Functions 4.3 - 3

Cleaning up results

Returning to the breakdown of Scandinavian awards you previously made, you want to clean up the results by replacing the nulls with meaningful text.

  • Turn the nulls in the Country column to All countries, and the nulls in the Gender column to All genders.
SELECT
  -- Replace the nulls in the columns with meaningful text
  COALESCE(Country, 'All countries') AS Country,
  COALESCE(Gender, 'All genders') AS Gender,
  COUNT(*) AS Awards
FROM Summer_Medals
WHERE
  Year = 2004
  AND Medal = 'Gold'
  AND Country IN ('DEN', 'NOR', 'SWE')
GROUP BY ROLLUP(Country, Gender)
ORDER BY Country ASC, Gender ASC;
country gender awards
All countries All genders 29
DEN All genders 19
DEN Men 4
DEN Women 15

Good job! You’ve made ROLLUP’s output become clean and readable.

Summarizing results

After ranking each country in the 2000 Olympics by gold medals awarded, you want to return the top 3 countries in one row, as a comma-separated string. In other words, turn this:

| Country | Rank |

|———|——|

| USA     | 1    |

| RUS     | 2    |

| AUS     | 3    |

| …     | …  |

into this:

USA, RUS, AUS

  • Rank countries by the medals they’ve been awarded.
WITH Country_Medals AS (
  SELECT
    Country,
    COUNT(*) AS Medals
  FROM Summer_Medals
  WHERE Year = 2000
    AND Medal = 'Gold'
  GROUP BY Country)


  SELECT
    Country,
    -- Rank countries by the medals awarded
    RANK() OVER (ORDER BY Medals DESC) AS Rank
  FROM Country_Medals
  ORDER BY Rank ASC;
country rank
USA 1
RUS 2
AUS 3
CHN 4
  • Return the top 3 countries by medals awarded as one comma-separated string.
WITH Country_Medals AS (
  SELECT
    Country,
    COUNT(*) AS Medals
  FROM Summer_Medals
  WHERE Year = 2000
    AND Medal = 'Gold'
  GROUP BY Country),


  Country_Ranks AS (
  SELECT
    Country,
    RANK() OVER (ORDER BY Medals DESC) AS Rank
  FROM Country_Medals
  ORDER BY Rank ASC)


-- Compress the countries column
SELECT STRING_AGG(Country, ', ')
FROM Country_Ranks
-- Select only the top three ranks
WHERE Rank <= 3;

Output

string_agg
USA, RUS, AUS

Great! You’ve compressed three rows into one, losing no information in the process.

Share this Article
Leave a comment

Leave a Reply

Your email address will not be published. Required fields are marked *