Functions for Manipulating Data in PostgreSQL Answer Key – Datacamp

Jack16306

Functions for Manipulating Data in PostgreSQL – Datacamp. Learn the most important PostgreSQL functions for manipulating, processing, and transforming data.

Contents
1. Overview of Common Data Types
  1.1. Welcome
    Text data types
    Getting information about your database
    Determining data types
  1.2. Date and time data types
    Properties of date and time data types
    Interval data types
  1.3. Working with ARRAYs
    Accessing data in an ARRAY
    Searching an ARRAY with ANY
    Searching an ARRAY with @>
2. Working with DATE/TIME Functions and Operators
  2.1. Overview of basic arithmetic operators
    Adding and subtracting date and time values
    INTERVAL arithmetic
    Calculating the expected return date
  2.2. Functions for retrieving current date/time
    Current timestamp functions
    Working with the current date and time
    Manipulating the current date and time
  2.3. Extracting and transforming date/time data
    Using EXTRACT
    Using DATE_TRUNC
    Putting it all together
3. Parsing and Manipulating Text
  3.1. Reformatting string and character data
    Concatenating strings
    Changing the case of string data
    Replacing string data
  3.2. Parsing string and character data
    Determining the length of strings
    Truncating strings
    Extracting substrings from text data
    Combining functions for string manipulation
  3.3. Truncating and padding string data
    Padding
    The TRIM function
    Putting it all together
4. Full-text Search and PostgreSQL Extensions
  4.1. Overview of Common Data Types
    A review of the LIKE operator
    What is a tsvector?
    Basic full-text search
  4.2. Extending PostgreSQL
    User-defined data types
    Getting info about user-defined data types
    User-defined functions in Sakila
  4.3. Intro to PostgreSQL extensions
    Enabling extensions
    Measuring similarity between two strings
    Levenshtein distance examples
    Putting it all together

This course will give you an understanding of how to use built-in PostgreSQL functions in your SQL queries to manipulate different types of data, including strings, characters, numerics, and date/time values. We’ll travel back to a time when Blockbuster video stores were on every corner and if you wanted to watch a movie, you actually had to leave your house to rent a DVD! You’ll also get an introduction to PostgreSQL’s robust full-text search capabilities, which provide a powerful tool for indexing and matching keywords in a PostgreSQL document. And finally, you’ll learn how to extend these features by using PostgreSQL extensions.

Course URL: https://www.datacamp.com/courses/functions-for-manipulating-data-in-postgresql

1.  Overview of Common Data Types

Learn about the properties and characteristics of common data types including strings, numerics and arrays and how to retrieve information about your database.

1.1. Welcome

Text data types

You learned about some of the common data types that you’ll work with in PostgreSQL, some characteristics of these types, and how to determine the data type of a column in an existing table. Think back to the video and answer the following question:

Which of the following is not a valid text data type in PostgreSQL?

  • TEXT
  • STRING
  • CHAR
  • VARCHAR

Correct! STRING is not a valid PostgreSQL data type.

Getting information about your database

As we saw in the video, PostgreSQL has a system schema called INFORMATION_SCHEMA (referred to in this course as a system database) that allows us to extract information about objects, including tables, in our database.

In this exercise we will look at how to query the tables table of the INFORMATION_SCHEMA database to discover information about tables in the DVD Rentals database including the name, type, schema, and catalog of all tables and views and then how to use the results to get additional information about columns in our tables.

  • Select all columns from the INFORMATION_SCHEMA.TABLES system database. Limit the results to rows with a table_schema of public.
-- Select all columns from the TABLES system database
 SELECT * 
 FROM INFORMATION_SCHEMA.TABLES
 -- Filter by schema
 WHERE table_schema = 'public';

Output

table_catalog table_schema table_name table_type self_referencing_column_name reference_generation user_defined_type_catalog user_defined_type_schema user_defined_type_name is_insertable_into is_typed commit_action
pgdata public actor BASE TABLE null null null null null YES NO null
pgdata public category BASE TABLE null null null null null YES NO null
pgdata public film_actor BASE TABLE null null null null null YES NO null

  • Select all columns from the INFORMATION_SCHEMA.COLUMNS system database. Limit the results to the actor table.

 -- Select all columns from the COLUMNS system database
 SELECT * 
 FROM INFORMATION_SCHEMA.COLUMNS 
 WHERE table_name = 'actor';

Great job! Let’s further explore the INFORMATION_SCHEMA.COLUMNS system database in the next exercise.

Determining data types

The columns table of the INFORMATION_SCHEMA database also allows us to extract information about the data types of columns in a table. We can extract information like the character or string length of a CHAR or VARCHAR column or the precision of a DECIMAL or NUMERIC column.

Using the techniques you learned in the lesson, let’s explore the customer table of our DVD Rental database.

  • Select the column name and data type from the INFORMATION_SCHEMA.COLUMNS system database.
  • Limit results to only include the customer table.

-- Get the column name and data type
SELECT
    column_name, 
    DATA_TYPE
-- From the system database information schema
FROM INFORMATION_SCHEMA.COLUMNS 
-- For the customer table
WHERE table_name = 'customer';

Output

column_name data_type
active integer
store_id smallint
create_date date

Great job! You now have the tool to determine the data types of existing database columns.

1.2. Date and time data types

Properties of date and time data types

Which of the following is NOT correct?

  • TIMESTAMP data types contain both date and time values.
  • DATE data types use a yyyy-mm-dd format.
  • INTERVAL types are representations of periods of time.
  • TIME data types are stored with a timezone by default.

Great job! TIME data types can be stored with a timezone but they are stored without a timezone by default.

Interval data types

INTERVAL data types provide you with a very useful tool for performing arithmetic on date and time data types. For example, let’s say our rental policy requires a DVD to be returned within 3 days. We can calculate the expected_return_date for a given DVD rental by adding an INTERVAL of 3 days to the rental_date from the rental table. We can then compare this result to the actual return_date to determine if the DVD was returned late.

Let’s try this example in the exercise.

  • Select the rental date and return date from the rental table.
  • Add an INTERVAL of 3 days to the rental_date to calculate the expected return date.
SELECT
    -- Select the rental and return dates
    rental_date,
    return_date,
    -- Calculate the expected_return_date
    rental_date + INTERVAL '3 days' AS expected_return_date
FROM rental;

Output

rental_date return_date expected_return_date
2005-05-24 22:53:30 2005-05-26 22:04:30 2005-05-27 22:53:30
2005-05-24 22:54:33 2005-05-28 19:40:33 2005-05-27 22:54:33
2005-05-24 23:03:39 2005-06-01 22:12:39 2005-05-27 23:03:39

Great job! Looking at the result, we can determine that a DVD was returned late if a record has an expected_return_date earlier than the actual return_date.
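
The late-return comparison described above can be sketched directly in SQL. This is a minimal, self-contained example: sample_rental is a hypothetical inline table built from the first two rows of the output above, and returned_late is an illustrative alias rather than part of the course solution.

```sql
-- sample_rental is a hypothetical stand-in for the rental table
WITH sample_rental (rental_date, return_date) AS (
    VALUES
        (TIMESTAMP '2005-05-24 22:53:30', TIMESTAMP '2005-05-26 22:04:30'),
        (TIMESTAMP '2005-05-24 22:54:33', TIMESTAMP '2005-05-28 19:40:33')
)
SELECT
    rental_date,
    return_date,
    rental_date + INTERVAL '3 days' AS expected_return_date,
    -- TRUE when the DVD came back after the 3-day limit
    return_date > rental_date + INTERVAL '3 days' AS returned_late
FROM sample_rental;
```

With these two rows, returned_late should be false for the first rental and true for the second. A rental whose return_date is NULL would yield NULL rather than true or false.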

1.3. Working with ARRAYs

Accessing data in an ARRAY

In our DVD Rentals database, the film table contains an ARRAY for special_features which has a type of TEXT[]. Much like any ARRAY data type in PostgreSQL, a TEXT[] array can store an array of TEXT values. This comes in handy when you want to store things like phone numbers or email addresses as we saw in the lesson.

Let’s take a look at the special_features column and also practice accessing data in the ARRAY.

  • Select the title and special features from the film table and compare the results between the two columns.

-- Select the title and special features column 
SELECT 
  title, 
  special_features 
FROM film;

Select all films that have a special feature Trailers by filtering on the first index of the special_features ARRAY.

-- Select the title and special features column 
SELECT 
  title, 
  special_features 
FROM film
-- Use the array index of the special_features column
WHERE special_features[1] = 'Trailers';

Output

title special_features
BEDAZZLED MARRIED Trailers,Deleted Scenes,Behind the Scenes
BEHAVIOR RUNAWAY Trailers,Deleted Scenes,Behind the Scenes
BLUES INSTINCT Trailers,Deleted Scenes,Behind the Scenes
  • Now let’s select all films that have Deleted Scenes in the second index of the special_features ARRAY.
-- Select the title and special features column 
SELECT 
  title, 
  special_features 
FROM film
-- Use the array index of the special_features column
WHERE special_features[2] = 'Deleted Scenes';

Great job! Understanding how to access ARRAY data types in PostgreSQL is an important skill in your SQL arsenal.
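
One detail worth keeping in mind when indexing: PostgreSQL arrays are 1-based by default, so [1] is the first element. A small sketch using a literal array in place of the special_features column (the aliases are illustrative):

```sql
SELECT
    (ARRAY['Trailers', 'Deleted Scenes', 'Behind the Scenes'])[1] AS first_feature,   -- Trailers
    (ARRAY['Trailers', 'Deleted Scenes', 'Behind the Scenes'])[2] AS second_feature,  -- Deleted Scenes
    -- An index outside the array bounds returns NULL rather than raising an error
    (ARRAY['Trailers', 'Deleted Scenes', 'Behind the Scenes'])[4] AS missing_feature; -- NULL
```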

Searching an ARRAY with ANY

As we saw in the video, PostgreSQL also provides the ability to filter results by searching for values in an ARRAY. The ANY function allows you to search for a value in any index position of an ARRAY. Here’s an example.

WHERE 'search text' = ANY(array_name)

When using the ANY function, the value you are filtering on appears on the left side of the equals sign, with the name of the ARRAY column as the parameter to the ANY function.

  • Match 'Trailers' in any index of the special_features ARRAY regardless of position.

SELECT
  title, 
  special_features 
FROM film 
-- Modify the query to use the ANY function 
WHERE 'Trailers' = ANY (special_features);

Output

title special_features
BEDAZZLED MARRIED Trailers,Deleted Scenes,Behind the Scenes
BEHAVIOR RUNAWAY Trailers,Deleted Scenes,Behind the Scenes
BLUES INSTINCT Trailers,Deleted Scenes,Behind the Scenes

Awesome! The ANY function is a flexible tool that you will use often when searching an ARRAY data type in PostgreSQL.

Searching an ARRAY with @>

The contains operator @> is an alternative to the ANY function and matches data in an ARRAY using the following syntax.

WHERE array_name @> ARRAY['search text']::type[]

So let’s practice using this operator in the exercise.

  • Use the contains operator to match the text Deleted Scenes in the special_features column.
SELECT 
  title, 
  special_features 
FROM film 
-- Filter where special_features contains 'Deleted Scenes'
WHERE special_features @> ARRAY['Deleted Scenes'];

Output

title special_features
BEACH HEARTBREAKERS Deleted Scenes,Behind the Scenes
BEAST HUNCHBACK Deleted Scenes,Behind the Scenes
BEDAZZLED MARRIED Trailers,Deleted Scenes,Behind the Scenes

Great job! Now that you have learned about the properties and characteristics of common PostgreSQL data types, we will begin to dive into built-in functions to manipulate and transform these data types in your queries beginning with date and time functions and operators.
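
To see how ANY and @> relate, here is a sketch on literal arrays rather than the film table. The main practical difference is that ANY tests a single value, while @> tests whether every element of the right-hand array is present, so it can check several values at once. The value 'Commentaries' and the aliases are chosen only for illustration.

```sql
SELECT
    -- Single value, any position
    'Trailers' = ANY (ARRAY['Trailers', 'Deleted Scenes'])                   AS any_match,     -- true
    -- Same check written with the contains operator
    ARRAY['Trailers', 'Deleted Scenes'] @> ARRAY['Trailers']                 AS contains_one,  -- true
    -- @> requires all listed elements to be present
    ARRAY['Trailers', 'Deleted Scenes'] @> ARRAY['Trailers', 'Commentaries'] AS contains_both; -- false
```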

2. Working with DATE/TIME Functions and Operators

Explore how to manipulate and query date and time objects including how to use the current timestamp in your queries, extract subfields from existing date and time fields and what to expect when you perform date and time arithmetic.

2.1. Overview of basic arithmetic operators

Adding and subtracting date and time values

In this exercise, you will calculate the actual number of days rented as well as the true expected_return_date by using the rental_duration column from the film table along with the familiar rental_date from the rental table.

This will require that you dust off the skills you learned from prior courses on how to join two or more tables together. To select columns from both the film and rental tables in a single query, we’ll need to use the inventory table to join these two tables together since there is no explicit relationship between them. Let’s give it a try!

  • Subtract the rental_date from the return_date to calculate the number of days_rented.
SELECT f.title, f.rental_duration,
    -- Calculate the number of days rented
    r.return_date - r.rental_date AS days_rented
FROM film AS f
     INNER JOIN inventory AS i ON f.film_id = i.film_id
     INNER JOIN rental AS r ON i.inventory_id = r.inventory_id
ORDER BY f.title;

Output

title rental_duration days_rented
ACE GOLDFINGER 3 6 days, 19:30:00
ACE GOLDFINGER 3 null
ACE GOLDFINGER 3 8 days, 0:08:00
  • Now use the AGE() function to calculate the days_rented.
SELECT f.title, f.rental_duration,
    -- Calculate the number of days rented
    AGE(r.return_date, r.rental_date) AS days_rented
FROM film AS f
    INNER JOIN inventory AS i ON f.film_id = i.film_id
    INNER JOIN rental AS r ON i.inventory_id = r.inventory_id
ORDER BY f.title;

Output

title rental_duration days_rented
ACE GOLDFINGER 3 6 days, 19:30:00
ACE GOLDFINGER 3 null
ACE GOLDFINGER 3 8 days, 0:08:00
ACE GOLDFINGER 3 1 day, 2:09:00

Great job! Notice that there are some records that have a null value for the days_rented calculation. We’ll dig into why this is accurate and what it means in the next exercise.
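
The two approaches in this exercise return the same values for short rentals, but they are not always interchangeable. As a rough illustration with literal timestamps (not rows from the rental table): subtracting timestamps yields an interval counted in days, while AGE() normalizes the result into years, months, and days.

```sql
SELECT
    TIMESTAMP '2005-08-01' - TIMESTAMP '2005-05-01'     AS by_subtraction, -- 92 days
    AGE(TIMESTAMP '2005-08-01', TIMESTAMP '2005-05-01') AS by_age;         -- 3 mons
```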

INTERVAL arithmetic

If you were running a real DVD Rental store, there would be times when you would need to determine what film titles were currently out for rental with customers. In the previous exercise, we saw that some of the records in the results had a NULL value for the return_date. This is because the rental was still outstanding.

Each film in the film table has an associated rental_duration column, which represents the number of days that a DVD can be rented by a customer before it is considered late. In this example, you will exclude rentals that have a NULL value for the return_date and also convert the rental_duration to an INTERVAL type. Here’s a reminder of one method for performing this conversion.

SELECT INTERVAL '1' day * rental_duration FROM film;

  • Convert rental_duration by multiplying it with a 1 day INTERVAL
  • Subtract the rental_date from the return_date to calculate the number of days_rented.
  • Exclude rentals with a NULL value for return_date.
SELECT
    f.title,
    -- Convert the rental_duration to an interval
    INTERVAL '1' day * f.rental_duration,
    -- Calculate the days rented as we did previously
    r.return_date - r.rental_date AS days_rented
FROM film AS f
    INNER JOIN inventory AS i ON f.film_id = i.film_id
    INNER JOIN rental AS r ON i.inventory_id = r.inventory_id
-- Filter the query to exclude outstanding rentals
WHERE r.return_date IS NOT NULL
ORDER BY f.title;

Output

title ?column? days_rented
ACE GOLDFINGER 3 days, 0:00:00 3 days, 1:12:00
ACE GOLDFINGER 3 days, 0:00:00 8 days, 0:02:00
ACE GOLDFINGER 3 days, 0:00:00 6 days, 19:30:00
ACE GOLDFINGER 3 days, 0:00:00 6 days, 21:32:00

Great job! Now let’s put it all together to calculate the actual expected_return_date in the final exercise.
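
The conversion used above relies on the fact that an INTERVAL can be multiplied by a number. A short sketch with a literal 3 standing in for rental_duration; MAKE_INTERVAL() is shown as an alternative that the course does not use, and its named-argument form assumes a reasonably recent PostgreSQL version.

```sql
SELECT
    INTERVAL '1' day * 3     AS multiplied,     -- 3 days
    3 * INTERVAL '1 day'     AS commuted,       -- 3 days (operand order does not matter)
    MAKE_INTERVAL(days => 3) AS from_function;  -- 3 days
```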

Calculating the expected return date

So now that you’ve practiced how to add and subtract timestamps and perform relative calculations using intervals, let’s use those new skills to calculate the actual expected return date of a specific rental. As you’ve seen in previous exercises, the rental_duration is the number of days allowed for a rental before it’s considered late. To calculate the expected_return_date you will want to use the rental_duration and add it to the rental_date.

  • Convert rental_duration by multiplying it with a 1-day INTERVAL.
  • Add it to the rental date.
SELECT
    f.title,
    r.rental_date,
    f.rental_duration,
    -- Add the rental duration to the rental date
    INTERVAL '1' day * f.rental_duration + r.rental_date AS expected_return_date,
    r.return_date
FROM film AS f
    INNER JOIN inventory AS i ON f.film_id = i.film_id
    INNER JOIN rental AS r ON i.inventory_id = r.inventory_id
ORDER BY f.title;

Output

title rental_date rental_duration expected_return_date return_date
ACE GOLDFINGER 2005-08-17 09:33:02 3 2005-08-20 09:33:02 2005-08-24 05:03:02
ACE GOLDFINGER 2006-02-14 15:16:03 3 2006-02-17 15:16:03 null
ACE GOLDFINGER 2005-07-28 05:04:47 3 2005-07-31 05:04:47 2005-08-05 05:12:47
ACE GOLDFINGER 2005-07-07 19:46:51 3 2005-07-10 19:46:51 2005-07-08 21:55:51

Great job! We can now compare the expected_return_date to the actual return_date to determine if a rental was returned late. In the next video, we’ll learn about how to use the current date and time values in our queries.

2.2. Functions for retrieving current date/time

Current timestamp functions

Use the console to explore the NOW(), CURRENT_TIMESTAMP, CURRENT_DATE and CURRENT_TIME functions and their outputs to determine which of the following is NOT correct?

  • NOW() returns the current date and time as a timestamp with timezone.
  • CURRENT_TIMESTAMP returns the current timestamp without timezone.
  • CURRENT_DATE returns the current date value without a time value.
  • CURRENT_TIME returns the current time value without a date value.

Great job! CURRENT_TIMESTAMP is analogous with NOW() and returns a timestamp with timezone by default.
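
One way to check these return types yourself is pg_typeof(), a built-in PostgreSQL function that reports the data type of any expression. LOCALTIMESTAMP is included for contrast; it is not part of the quiz.

```sql
SELECT
    pg_typeof(NOW())             AS now_type,               -- timestamp with time zone
    pg_typeof(CURRENT_TIMESTAMP) AS current_timestamp_type, -- timestamp with time zone
    pg_typeof(CURRENT_DATE)      AS current_date_type,      -- date
    pg_typeof(CURRENT_TIME)      AS current_time_type,      -- time with time zone
    pg_typeof(LOCALTIMESTAMP)    AS localtimestamp_type;    -- timestamp without time zone
```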

Working with the current date and time

Because the Sakila database is a bit dated and most of the date and time values are from 2005 or 2006, you are going to practice using the current date and time in our queries without using Sakila. You’ll get back into working with this database in the next video and throughout the remainder of the course. For now, let’s practice the techniques you learned about so far in this chapter to work with the current date and time.

As you learned in the video, NOW() and CURRENT_TIMESTAMP can be used interchangeably.

  • Use NOW() to select the current timestamp with timezone.
-- Select the current timestamp
SELECT NOW();

Select the current date without any time value.

-- Select the current date
SELECT current_date;

Now, let’s use the CAST() function to eliminate the timezone from the current timestamp.

--Select the current timestamp without a timezone
SELECT CAST( NOW() AS timestamp );

  • Finally, let’s select the current date. Use CAST() to retrieve the same result from the NOW() function.

SELECT 
    -- Select the current date
    CURRENT_DATE,
    -- CAST the result of the NOW() function to a date
    CAST( NOW() AS date );

Excellent! Understanding how to retrieve and manipulate the current date and time in your queries is a critical SQL skill that you will use often.

Manipulating the current date and time

Most of the time when you work with the current date and time, you will want to transform, manipulate, or perform operations on the value in your queries. In this exercise, you will practice adding an INTERVAL to the current timestamp as well as perform some more advanced calculations.

Let’s practice retrieving the current timestamp. For this exercise, please use CURRENT_TIMESTAMP instead of the NOW() function and if you need to convert a date or time value to a timestamp data type, please use the PostgreSQL specific casting rather than the CAST() function.

  • Select the current timestamp without timezone and alias it as right_now.
--Select the current timestamp without timezone
SELECT CURRENT_TIMESTAMP::timestamp AS right_now;

Now select a timestamp five days from now and alias it as five_days_from_now.

SELECT
    CURRENT_TIMESTAMP::timestamp AS right_now,
    INTERVAL '5 day' + CURRENT_TIMESTAMP AS five_days_from_now;

Output

right_now five_days_from_now
2023-01-17 11:18:49.631709 2023-01-22 11:18:49.631709+01:00
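  • Finally, limit the fractional-seconds precision of both the right_now and five_days_from_now fields. The exercise asks for second-level precision with no fractional digits, which corresponds to a precision argument of 0; the query below passes 2, so its output keeps two fractional digits.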
SELECT
    CURRENT_TIMESTAMP(2)::timestamp AS right_now,
    interval '5 days' + CURRENT_TIMESTAMP(2) AS five_days_from_now;

Output

right_now five_days_from_now
2023-01-17 11:19:39.010000 2023-01-22 11:19:39.010000+01:00

Great job! Understanding how to retrieve and manipulate the current date and time in your queries is a critical SQL skill that you will use often.
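
The optional argument to CURRENT_TIMESTAMP is the number of fractional-second digits to keep, so a value of 0 rounds to whole seconds. A sketch of that variant, with DATE_TRUNC('second', ...) as an alternative that truncates instead of rounding; the right_now_truncated alias is illustrative and not part of the exercise.

```sql
SELECT
    -- Precision 0: rounded to whole seconds
    CURRENT_TIMESTAMP(0)::timestamp                    AS right_now,
    -- Alternative: drop the fraction instead of rounding it
    DATE_TRUNC('second', CURRENT_TIMESTAMP)::timestamp AS right_now_truncated,
    INTERVAL '5 days' + CURRENT_TIMESTAMP(0)           AS five_days_from_now;
```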

2.3. Extracting and transforming date/time data

Using EXTRACT

You can use EXTRACT() and DATE_PART() to easily create new fields in your queries by extracting sub-fields from a source timestamp field.

Now suppose you want to produce a predictive model that will help forecast DVD rental activity by day of the week. You could use the EXTRACT() function with the dow field identifier in our query to create a new field called dayofweek as a sub-field of the rental_date column from the rental table.

You can COUNT() the number of records in the rental table for a given date range and aggregate by the newly created dayofweek column.

  • Get the day of the week from the rental_date column.

SELECT 
  -- Extract day of week from rental_date
  EXTRACT(dow FROM rental_date) AS dayofweek
FROM rental 
LIMIT 100;

Count the total number of rentals by day of the week

-- Extract day of week from rental_date
SELECT 
  EXTRACT(dow FROM rental_date) AS dayofweek, 
  -- Count the number of rentals
  COUNT(*) AS rentals
FROM rental 
GROUP BY 1;

Output

dayofweek rentals
0 2320
6 2311
1 2247
2 2463

Excellent work! Using the EXTRACT() function can help determine hidden insights in your data like what days of the week are the busiest for DVD rentals.
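
When reading these results, note that dow numbers the days from 0 (Sunday) to 6 (Saturday). A sketch on a literal timestamp that appears in the rental output earlier in this guide; TO_CHAR() with the FMDay pattern is an extra, not part of the exercise, that returns the day name.

```sql
SELECT
    EXTRACT(dow FROM TIMESTAMP '2005-05-24 22:53:30')  AS dayofweek,      -- 2
    -- DATE_PART() returns the same value with function-call syntax
    DATE_PART('dow', TIMESTAMP '2005-05-24 22:53:30')  AS dayofweek_alt,  -- 2
    TO_CHAR(TIMESTAMP '2005-05-24 22:53:30', 'FMDay')  AS day_name;       -- Tuesday
```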

Using DATE_TRUNC

The DATE_TRUNC() function will truncate timestamp or interval data types to return a timestamp or interval at a specified precision. The precision values are a subset of the field identifiers that can be used with the EXTRACT() and DATE_PART() functions. DATE_TRUNC() will return an interval or timestamp rather than a number. For example:

SELECT DATE_TRUNC('month', TIMESTAMP '2005-05-21 15:30:30');

Result: 2005-05-01 00:00:00

Now, let’s experiment with different precisions and ultimately modify the queries from the previous exercises to aggregate rental activity.

  • Truncate the rental_date field by year.
-- Truncate rental_date by year
SELECT DATE_TRUNC('year', rental_date) AS rental_year
FROM rental;

Output

rental_year
2005-01-01 00:00:00
2005-01-01 00:00:00
2005-01-01 00:00:00

Now modify the previous query to truncate the rental_date by month.

-- Truncate rental_date by month
SELECT DATE_TRUNC('month', rental_date) AS rental_month
FROM rental;

Output

rental_month
2005-05-01 00:00:00
2005-05-01 00:00:00
  • Let’s see what happens when we truncate by day of the month.
-- Truncate rental_date by day of the month 
SELECT DATE_TRUNC('day', rental_date) AS rental_day 
FROM rental;

Output

rental_day
2005-05-24 00:00:00
2005-05-24 00:00:00
  • Finally, count the total number of rentals by rental_day and alias it as rentals.
SELECT 
  DATE_TRUNC('day', rental_date) AS rental_day,
  -- Count total number of rentals 
  COUNT(*) AS rentals
FROM rental
GROUP BY 1;

Output

rental_day rentals
2005-05-28 00:00:00 196
2005-05-25 00:00:00 137
2005-05-29 00:00:00 154

Awesome job! You can now use DATE_TRUNC() to manipulate timestamp data types and create new fields with different levels of precision.

Putting it all together

Many of the techniques you’ve learned in this course will be useful when building queries to extract data for model training. Now let’s use some date/time functions to extract and manipulate some DVD rentals data from our fictional DVD rental store.

In this exercise, you are going to extract a list of customers and their rental history over 90 days. You will be using the EXTRACT(), DATE_TRUNC(), and AGE() functions that you learned about during this chapter along with some general SQL skills from the prerequisites to extract a data set that could be used to determine what day of the week customers are most likely to rent a DVD and the likelihood that they will return the DVD late.

  • Extract the day of the week from the rental_date column using the alias dayofweek.
  • Use an INTERVAL in the WHERE clause to select records for the 90 day period starting on 5/1/2005.
SELECT 
  -- Extract the day of week date part from the rental_date
  EXTRACT(dow FROM rental_date) AS dayofweek,
  AGE(return_date, rental_date) AS rental_days
FROM rental AS r 
WHERE 
  -- Use an INTERVAL for the upper bound of the rental_date 
  rental_date BETWEEN CAST('2005-05-01' AS DATE)
   AND CAST('2005-05-01' AS DATE) + INTERVAL '90 day';

Output

dayofweek rental_days
2 1 day, 23:11:00
2 3 days, 20:46:00
2 7 days, 23:09:00
  • Finally, use a CASE statement and DATE_TRUNC() to create a new column called past_due, which will be TRUE if the rental_days is greater than the rental_duration; otherwise, it will be FALSE.
SELECT 
  c.first_name || ' ' || c.last_name AS customer_name,
  f.title,
  r.rental_date,
  -- Extract the day of week date part from the rental_date
  EXTRACT(dow FROM r.rental_date) AS dayofweek,
  AGE(r.return_date, r.rental_date) AS rental_days,
  -- Use DATE_TRUNC to get days from the AGE function
  CASE WHEN DATE_TRUNC('day', AGE(r.return_date, r.rental_date)) > 
  -- Convert rental_duration to an INTERVAL of days
    f.rental_duration * INTERVAL '1' day 
  THEN TRUE 
  ELSE FALSE END AS past_due 
FROM 
  film AS f 
  INNER JOIN inventory AS i 
    ON f.film_id = i.film_id 
  INNER JOIN rental AS r 
    ON i.inventory_id = r.inventory_id 
  INNER JOIN customer AS c 
    ON c.customer_id = r.customer_id 
WHERE 
  -- Use an INTERVAL for the upper bound of the rental_date 
  r.rental_date BETWEEN CAST('2005-05-01' AS DATE) 
  AND CAST('2005-05-01' AS DATE) + INTERVAL '90 day';

Output

customer_name title rental_date dayofweek rental_days past_due
CHARLOTTE HUNTER BLANKET BEVERLY 2005-05-24 22:53:30 2 1 day, 23:11:00 false
TOMMY COLLAZO FREAKY POCUS 2005-05-24 22:54:33 2 3 days, 20:46:00 false
MANUEL MURRELL GRADUATE LORD 2005-05-24 23:03:39 2 7 days, 23:09:00 false

Wow! Awesome work! PostgreSQL date/time functions provide powerful tools for manipulating, cleaning and transforming transactional data.
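
The role of DATE_TRUNC() in the past_due test is easier to see on a single value. In this sketch the interval 7 days 23:09:00 comes from the output above, and a rental_duration of 7 is assumed purely for illustration (the real value for that film is not shown here). The past_due_strict alias is also illustrative.

```sql
SELECT
    DATE_TRUNC('day', INTERVAL '7 days 23:09:00')                        AS whole_days,      -- 7 days
    -- Truncated: partial days are ignored, so 7 days is not greater than 7 days
    DATE_TRUNC('day', INTERVAL '7 days 23:09:00') > 7 * INTERVAL '1' day AS past_due,        -- false
    -- Untruncated: the extra 23 hours tip the comparison
    INTERVAL '7 days 23:09:00' > 7 * INTERVAL '1' day                    AS past_due_strict; -- true
```

Truncating to whole days gives customers the benefit of any partial day, which is a policy choice rather than a requirement of the functions involved.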

3. Parsing and Manipulating Text

Learn how to manipulate string and text data by transforming case, parsing and truncating text and extracting substrings from larger strings.

3.1. Reformatting string and character data


Concatenating strings

In this exercise and the ones that follow, we are going to derive new fields from columns within the customer and film tables of the DVD rental database.

We’ll start with the customer table and create a query to return the customers name and email address formatted such that we could use it as a “To” field in an email script or program. This format will look like the following:

Brian Piccolo <[email protected]>

In the first step of the exercise, use the || operator to do the string concatenation and in the second step, use the CONCAT() function.

  • Concatenate the first_name and last_name columns separated by a single space followed by email surrounded by < and >.
-- Concatenate the first_name and last_name and email 
SELECT first_name || ' ' || last_name || ' <' || email || '>' AS full_email 
FROM customer;

Output

full_email
MARY SMITH <[email protected]>
PATRICIA JOHNSON <[email protected]>

Now use the CONCAT() function to do the same operation as the previous step.

-- Concatenate the first_name and last_name and email
SELECT CONCAT(first_name, ' ', last_name,  ' <', email, '>') AS full_email 
FROM customer;

Output

full_email
MARY SMITH <[email protected]>
PATRICIA JOHNSON <[email protected]>
LINDA WILLIAMS <[email protected]>

Great job! String concatenation is an important skill that you will use often.
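
The two approaches are not fully equivalent when a column can be NULL. As a sketch with literals rather than the customer table: the || operator returns NULL if any operand is NULL, while CONCAT() ignores NULL arguments.

```sql
SELECT
    'MARY' || ' ' || NULL     AS with_operator,  -- NULL
    CONCAT('MARY', ' ', NULL) AS with_function;  -- 'MARY ' (the NULL argument is skipped)
```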

Changing the case of string data

Now you are going to use the film and category tables to create a new field called film_category by concatenating the category name with the film’s title. You will also format the result using functions you learned about in the video to transform the case of the fields you are selecting in the query; for example, the INITCAP() function which converts a string to title case.

  • Convert the film category name to uppercase.
  • Convert the first letter of each word in the film’s title to upper case.
  • Concatenate the converted category name and film title separated by a colon.
  • Convert the description column to lowercase.

SELECT 
  -- Concatenate the category name converted to uppercase
  -- with the film title converted to title case
  UPPER(c.name) || ': ' || INITCAP(f.title) AS film_category, 
  -- Convert the description column to lowercase
  LOWER(f.description) AS description
FROM 
  film AS f 
  INNER JOIN film_category AS fc 
    ON f.film_id = fc.film_id 
  INNER JOIN category AS c 
    ON fc.category_id = c.category_id;

Output

film_category description
ACTION: Werewolf Lola a fanciful story of a man and a sumo wrestler who must outrace a student in a monastery
ACTION: Waterfront Deliverance a unbelieveable documentary of a dentist and a technical writer who must build a womanizer in nigeria

Great job! Transforming strings comes in handy when preparing data for analysis.

Replacing string data

Sometimes you will need to make sure that the data you are extracting does not contain any whitespace. There are many different approaches you can take to cleanse and prepare your data for these situations. A common technique is to replace any whitespace with an underscore.

In this example, we are going to practice finding and replacing whitespace characters in the title column of the film table using the REPLACE() function.

  • Replace all whitespace with an underscore.
SELECT 
  -- Replace whitespace in the film title with an underscore
  REPLACE(title, ' ', '_') AS title
FROM film; 

Output

title
BEACH_HEARTBREAKERS
BEAST_HUNCHBACK

Awesome! The REPLACE() function is a powerful tool that can be used in many different scenarios as you are preparing and manipulating your datasets.
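
Note that REPLACE() as used above matches only the literal space character. If titles might contain tabs or runs of spaces, REGEXP_REPLACE() is one option. A sketch on a made-up string with two consecutive spaces, rather than the film table, assuming the default standard_conforming_strings setting:

```sql
SELECT
    -- Each space is replaced individually
    REPLACE('BEACH  HEARTBREAKERS', ' ', '_')               AS replaced,        -- BEACH__HEARTBREAKERS
    -- \s+ matches one or more whitespace characters; 'g' replaces every match
    REGEXP_REPLACE('BEACH  HEARTBREAKERS', '\s+', '_', 'g') AS regex_replaced;  -- BEACH_HEARTBREAKERS
```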

3.2. Parsing string and character data

Determining the length of strings

Determining the number of characters in a string is something that you will use frequently when working with data in a SQL database. Many situations will require you to find the length of a string stored in your database. For example, you may need to limit the number of characters that are displayed in an application or you may need to ensure that a column in your dataset contains values that are all the same length. In this example, we are going to determine the length of the description column in the film table of the DVD Rental database.

  • Select the title and description columns from the film table.
  • Find the number of characters in the description column with the alias desc_len.
SELECT 
  -- Select the title and description columns
  title,
  description,
  -- Determine the length of the description column
  char_length(description) AS desc_len
FROM film;

Output

title description desc_len
BEACH HEARTBREAKERS A Fateful Display of a Womanizer And a Mad Scientist who must Outgun a A Shark in Soviet Georgia 96
BEAST HUNCHBACK A Awe-Inspiring Epistle of a Student And a Squirrel who must Defeat a Boy in Ancient China 90

Great job! As you’ll see in future exercises, the CHAR_LENGTH() function (and its equivalent, LENGTH()) is useful when combined with other string manipulation functions and operators.
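
In PostgreSQL, CHAR_LENGTH() and LENGTH() both count characters for text values, while OCTET_LENGTH() counts bytes. A sketch on a literal string, assuming a UTF-8 database encoding:

```sql
SELECT
    CHAR_LENGTH('café')  AS chars,      -- 4
    LENGTH('café')       AS chars_too,  -- 4
    OCTET_LENGTH('café') AS bytes;      -- 5 in UTF-8, because é takes two bytes
```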

Truncating strings

In the previous exercise, you calculated the length of the description column and noticed that the number of characters varied but most of the results were over 75 characters. There will be many times when you need to truncate a text column to a certain length to meet specific criteria for an application. In this exercise, we will practice getting the first 50 characters of the description column.

  • Select the first 50 characters of the description column with the alias short_desc
SELECT 
  -- Select the first 50 characters of description
  LEFT(description, 50) AS short_desc
FROM 
  film AS f;

Output

short_desc
A Fateful Display of a Womanizer And a Mad Scienti
A Awe-Inspiring Epistle of a Student And a Squirre
A Astounding Character Study of a Madman And a Rob

Excellent work! If you look at the results of the query you’ll notice that there are several results where the last word has been cut off because it hit the 50 character limit. The last exercise in this chapter will demonstrate how you can truncate this text at the last full word that is less than 50 characters.

Extracting substrings from text data

In this exercise, you are going to practice how to extract substrings from text columns. The Sakila database contains the address table, which stores the street address for all the rental store locations. You need a list of all the street names where the stores are located, but the address column also contains the street number. You’ll use several functions that you’ve learned about in the video to manipulate the address column and return only the street name.

  • Extract only the street address without the street number from the address column.
  • Use functions to determine the starting and ending position parameters.
SELECT 
  -- Select only the street name from the address table
  SUBSTRING(address from POSITION(' ' in address)+1 FOR char_length(address))
FROM 
  address;

Output

substring
MySakila Drive
MySQL Boulevard
Workhaven Lane

Nice! The SUBSTRING() function is useful when you need to parse substrings from the middle of text data and as you can see can be powerful when combined with the POSITION() and LENGTH() functions.
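The same technique can be tried on a single literal to see each step. This is a small sketch; the address '47 MySakila Drive' is an illustrative value built from the output above, not a row copied from the table.

```sql
SELECT
  -- Position of the first space, which separates number and street name
  POSITION(' ' IN '47 MySakila Drive') AS first_space,  -- 3
  -- Start one character after the space; without FOR, SUBSTRING returns the rest
  SUBSTRING('47 MySakila Drive' FROM POSITION(' ' IN '47 MySakila Drive') + 1) AS street_name;  -- MySakila Drive
```

The exercise passes FOR char_length(address), which is longer than the remaining text. SUBSTRING simply stops at the end of the string, so both forms return the same result.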

Combining functions for string manipulation

In the next example, we are going to break apart the email column from the customer table into three new derived fields. Parsing a single column into multiple columns can be useful when you need to work with certain subsets of data. Email addresses have embedded information stored in them that can be parsed out to derive additional information about our data. For example, we can use the techniques we learned about in the video to determine how many of our customers use an email from a specific domain.

  • Extract the characters to the left of the @ of the email column in the customer table and alias it as username.

Now use SUBSTRING to extract the characters after the @ of the email column and alias the new derived field as domain.

SELECT
  -- Extract the characters to the left of the '@'
  LEFT(email, POSITION('@' IN email)-1) AS username,
  -- Extract the characters to the right of the '@'
  SUBSTRING(email FROM POSITION('@' IN email)+1 FOR char_length(email)) AS domain
FROM customer;

Output

username domain
MARY.SMITH sakilacustomer.org
PATRICIA.JOHNSON sakilacustomer.org
LINDA.WILLIAMS sakilacustomer.org

Awesome job! In the next video, we will continue to explore some additional functions for manipulating strings and finish the chapter with a comprehensive example to practice many of the functions we learned about.
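As an alternative sketch, PostgreSQL also has a split_part() function that splits a string on a delimiter and returns the requested part. The second query assumes the same customer table used above and counts customers per email domain, which is the kind of question the exercise introduction mentions.

```sql
-- Split a literal email address on '@'
SELECT
  split_part('MARY.SMITH@sakilacustomer.org', '@', 1) AS username,  -- MARY.SMITH
  split_part('MARY.SMITH@sakilacustomer.org', '@', 2) AS domain;    -- sakilacustomer.org

-- Count customers per domain (assumes the customer table from the exercise)
SELECT
  split_part(email, '@', 2) AS domain,
  COUNT(*) AS customers
FROM customer
GROUP BY split_part(email, '@', 2)
ORDER BY customers DESC;
```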

3.3. Truncating and padding string data

Padding

Padding strings is useful in many real-world situations. Earlier in this course, we learned about string concatenation and how to combine the customer’s first and last name separated by a single blank space and also combined the customer’s full name with their email address.

The padding functions that we learned about in the video are an alternative approach to do this task. To use this approach, you will need to combine and nest functions to determine the length of a string to produce the desired result. Remember when calculating the length of a string you often need to adjust the integer returned to get the proper length or position of a string.

Let’s revisit the string concatenation exercise but use padding functions.

  • Add a single space to the end or right of the first_name column using a padding function.
  • Use the || operator to concatenate the padded first_name to the last_name column.
-- Concatenate the first_name and last_name 
SELECT 
    RPAD(first_name, LENGTH(first_name)+1) || last_name AS full_name
FROM customer; 

Output

full_name
MARY SMITH
PATRICIA JOHNSON
  • Now add a single space to the left or beginning of the last_name column using a different padding function than the first step.
  • Use the || operator to concatenate the first_name column to the padded last_name.
-- Concatenate the first_name and last_name 
SELECT 
    first_name || LPAD(last_name, LENGTH(last_name)+1) AS full_name
FROM customer;

Output

full_name
MARY SMITH
PATRICIA JOHNSON
  • Add a single space to the right or end of the first_name column.
  • Add the characters < to the right or end of last_name column.

Finally, add the characters > to the right or end of the email column.

-- Concatenate the first_name, last_name and email
SELECT 
    RPAD(first_name, LENGTH(first_name)+1) 
    || RPAD(last_name, LENGTH(last_name)+2, ' <') 
    || RPAD(email, LENGTH(email)+1, '>') AS full_email
FROM customer;

Output

full_email
MARY SMITH <MARY.SMITH@sakilacustomer.org>
PATRICIA JOHNSON <PATRICIA.JOHNSON@sakilacustomer.org>
LINDA WILLIAMS <LINDA.WILLIAMS@sakilacustomer.org>

Great job! Padding strings with whitespace or another string using LPAD() and RPAD() helps you keep your queries simple and clean.
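One detail worth keeping in mind is that the second argument of LPAD() and RPAD() is the total length of the result, not the number of characters to add. The sketch below uses literals to show this, including what happens when the target length is shorter than the input.

```sql
SELECT
  RPAD('MARY', 5) || 'SMITH'  AS pad_right,   -- 'MARY SMITH'
  'MARY' || LPAD('SMITH', 6)  AS pad_left,    -- 'MARY SMITH'
  RPAD('SMITH', 7, ' <')      AS pad_string,  -- 'SMITH <'
  -- A target length shorter than the input truncates the string
  RPAD('PATRICIA', 4)         AS truncated;   -- 'PATR'
```

This is why the exercises compute the target length as LENGTH(column) plus the number of characters being added.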

The TRIM function

In this exercise, we are going to revisit and combine a couple of exercises from earlier in this chapter. If you recall, you used the LEFT() function to truncate the description column to 50 characters but saw that some words were cut off and/or had trailing whitespace. We can use trimming functions to eliminate the whitespace at the end of the string after it’s been truncated.

  • Convert the film category name to uppercase and use CONCAT() to concatenate it with the title.
  • Truncate the description to the first 50 characters and make sure there is no leading or trailing whitespace after truncating.
-- Concatenate the uppercase category name and film title
SELECT 
  CONCAT(upper(c.name), ': ', f.title) AS film_category, 
  -- Truncate the description and remove leading and trailing whitespace
  TRIM(LEFT(f.description, 50)) AS film_desc
FROM 
  film AS f 
  INNER JOIN film_category AS fc 
    ON f.film_id = fc.film_id 
  INNER JOIN category AS c 
    ON fc.category_id = c.category_id;

Output

film_category film_desc
ACTION: WEREWOLF LOLA A Fanciful Story of a Man And a Sumo Wrestler who
ACTION: WATERFRONT DELIVERANCE A Unbelieveable Documentary of a Dentist And a Tec
ACTION: UPRISING UPTOWN A Fanciful Reflection of a Boy And a Butler who mu

Awesome! This exercise demonstrated how you can combine and nest functions in your queries to accomplish more complex string manipulation tasks.

Putting it all together

In this exercise, we are going to use the film and category tables to create a new field called film_category by concatenating the category name with the film’s title. You will also practice how to truncate text fields like the film table’s description column without cutting off a word.

To accomplish this we will use the REVERSE() function to help determine the position of the last whitespace character in the description before we reach 50 characters. This technique can be used to determine the position of the last character that you want to truncate and ensure that it is less than or equal to 50 characters AND does not cut off a word.

This is an advanced technique but I know you can do it! Let’s dive in.

SELECT 
  UPPER(c.name) || ': ' || f.title AS film_category, 
  -- Truncate the description without cutting off a word
  LEFT(description, 50 - 
    -- Subtract the position of the first whitespace character
    position(
      ' ' IN REVERSE(LEFT(description, 50))
    )
  ) 
FROM 
  film AS f 
  INNER JOIN film_category AS fc 
    ON f.film_id = fc.film_id 
  INNER JOIN category AS c 
    ON fc.category_id = c.category_id;

Output

film_category left
ACTION: WEREWOLF LOLA A Fanciful Story of a Man And a Sumo Wrestler who
ACTION: WATERFRONT DELIVERANCE A Unbelieveable Documentary of a Dentist And a
ACTION: UPRISING UPTOWN A Fanciful Reflection of a Boy And a Butler who
ACTION: TRUMAN CRAZY A Thrilling Epistle of a Moose And a Boy who must

Excellent job! In this chapter you learned a lot of built-in functions.
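To make the REVERSE() technique easier to follow, here is a sketch that shows each intermediate value for one description taken from the earlier output. The CTE name sample is a hypothetical helper used only for illustration.

```sql
WITH sample AS (
  SELECT 'A Awe-Inspiring Epistle of a Student And a Squirrel who must Defeat a Boy in Ancient China'::text AS description
)
SELECT
  -- Step 1: the first 50 characters end in the cut-off word 'Squirre'
  LEFT(description, 50) AS first_50,
  -- Step 2: reversed, the cut-off word comes first: 'erriuqS a dnA ...'
  REVERSE(LEFT(description, 50)) AS reversed,
  -- Step 3: the first space in the reversed text is at position 8
  POSITION(' ' IN REVERSE(LEFT(description, 50))) AS chars_to_drop,
  -- Step 4: 50 - 8 = 42 characters: 'A Awe-Inspiring Epistle of a Student And a'
  LEFT(description, 50 - POSITION(' ' IN REVERSE(LEFT(description, 50)))) AS short_desc
FROM sample;
```

Note that this approach always drops whatever follows the last space in the first 50 characters, even when those characters happen to end on a complete word or the description is shorter than 50 characters.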

4. Full-text Search and PostgreSQL Extensions

An introduction into some more advanced capabilities of PostgreSQL like full-text search and extensions.

4.1. Overview of Common Data Types

Learn about the properties and characteristics of common data types including strings, numerics and arrays and how to retrieve information about your database.

A review of the LIKE operator

The LIKE operator allows us to filter our queries by matching patterns in text data. The % wildcard matches any sequence of zero or more characters, and the _ wildcard matches exactly one character. This is useful when you want to return a result set that matches certain characteristics and can also be very helpful during exploratory data analysis or data cleansing tasks.

Let’s explore how different usage of the % wildcard will return different results by looking at the film table of the Sakila DVD Rental database.

  • Select all columns for all records that begin with the word GOLD.
-- Select all columns
SELECT *
FROM film
-- Select only records that begin with the word 'GOLD'
WHERE title LIKE 'GOLD%';

  • Now select all columns for all records that end with the word GOLD.
SELECT *
FROM film
-- Select only records that end with the word 'GOLD'
WHERE title LIKE '%GOLD';
  • Finally, select all records that contain the word ‘GOLD’.
SELECT *
FROM film
-- Select only records that contain the word 'GOLD'
WHERE title LIKE '%GOLD%';

Excellent job! While the LIKE operator is a simple pattern matching tool in your SQL toolbox, it’s an expensive operation in terms of performance, so let’s practice a much better approach using full-text search.
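The wildcard rules are easy to check on literals. In this short sketch, % matches zero or more characters, _ matches exactly one character, and ILIKE is the case-insensitive PostgreSQL variant of LIKE.

```sql
SELECT
  'GOLDFINGER' LIKE 'GOLD%'  AS starts_with,   -- true
  'GOLD'       LIKE 'GOLD%'  AS zero_chars,    -- true, % also matches nothing
  'GOLDS'      LIKE 'GOLD_'  AS one_char,      -- true, _ matches exactly one character
  'GOLDFINGER' LIKE 'GOLD_'  AS too_long,      -- false
  'Goldfinger' ILIKE 'gold%' AS ignore_case;   -- true
```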

What is a tsvector?

You saw how to convert strings to tsvector and tsquery in the video and, in this exercise, we are going to dive deeper into what these functions actually return after converting a string to a tsvector. In this example, you will convert a text column from the film table to a tsvector and inspect the results. Understanding how full-text search works is the first step in more advanced machine learning and data science concepts like natural language processing.

  • Select the film description and convert it to a tsvector data type.
-- Select the film description as a tsvector
SELECT to_tsvector(description)
FROM film;

Output

to_tsvector
'display':3 'fate':2 'georgia':19 'mad':9 'must':12 'outgun':13 'scientist':10 'shark':16 'soviet':18 'woman':6
'ancient':18 'awe':3 'awe-inspir':2 'boy':16 'china':19 'defeat':14 'epistl':5 'inspir':4 'must':13 'squirrel':11 'student':8
'abandon':19 'astound':2 'charact':3 'fun':20 'hous':21 'mad':15 'madman':7 'meet':13 'must':12 'robot':10 'scientist':16 'studi':4
'berlin':17 'drama':3 'husband':9 'must':11 'outrac':12 'student':6 'sumo':14 'unbeliev':2 'wrestler':15

Excellent job! Now that you’ve seen what a tsvector data type looks like, let’s see how we can use it to perform a basic full-text search.
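To read the tsvector output, it can help to convert a short literal. The sketch below passes the 'english' text search configuration explicitly; the one-argument form used in the exercise relies on the database's default_text_search_config setting, which we assume is English here.

```sql
-- Each lexeme is a normalized word followed by its position in the text.
-- Stop words such as 'a', 'of' and 'and' are dropped, but they still count
-- toward the positions, so 'Fateful' becomes 'fate':2 and 'Display' becomes 'display':3.
SELECT to_tsvector('english', 'A Fateful Display of a Womanizer And a Mad Scientist');
```

The lexemes and positions should line up with the start of the first row in the output above.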

Basic full-text search

Searching text will become something you do repeatedly when building applications or exploring data sets for data science. Full-text search is helpful when performing exploratory data analysis for a natural language processing model or building a search feature into your application.

In this exercise, you will practice searching a text column and match it against a string. The search will return the same result as a query that uses the LIKE operator with the % wildcard at the beginning and end of the string, but will perform much better and provide you with a foundation for more advanced full-text search queries. Let’s dive in.

  • Select the title and description columns from the film table.
  • Perform a full-text search on the title column for the word elf.
-- Select the title and description
SELECT title, description
FROM film
-- Convert the title to a tsvector and match it against the tsquery 
WHERE to_tsvector(title) @@ to_tsquery('elf');

Output

title description
GHOSTBUSTERS ELF A Thoughtful Epistle of a Dog And a Feminist who must Chase a Composer in Berlin
ELF MURDER A Action-Packed Story of a Frisbee And a Woman who must Reach a Girl in An Abandoned Mine Shaft
ENCINO ELF A Astounding Drama of a Feminist And a Teacher who must Confront a Husband in A Baloon

Excellent job! Now let’s learn how we can extend PostgreSQL with user-defined data types, functions and extensions.
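Much of the performance benefit of full-text search comes from indexing the tsvector. The following is a sketch rather than part of the course solution: the index name film_title_fts_idx is hypothetical, and an expression index like this one needs the two-argument form of to_tsvector() with an explicit configuration.

```sql
-- Hypothetical index name; builds a GIN index over the title's tsvector
CREATE INDEX film_title_fts_idx
  ON film
  USING GIN (to_tsvector('english', title));

-- The query has to use the same expression for the index to be considered
SELECT title, description
FROM film
WHERE to_tsvector('english', title) @@ to_tsquery('english', 'elf');
```

On a table as small as film the planner may still prefer a sequential scan, so the difference mainly shows on larger data sets.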

4.2. Extending PostgreSQL

User-defined data types

ENUM or enumerated data types are great options to use in your database when you have a column where you want to store a fixed list of values that rarely change. Examples of when it would be appropriate to use an ENUM include days of the week and states or provinces in a country.

Another example can be the directions on a compass (i.e., north, south, east and west.) In this exercise, you are going to create a new ENUM data type called compass_position.

  • Create a new enumerated data type called compass_position.
  • Use the four positions of a compass as the values.
-- Create an enumerated data type, compass_position
CREATE TYPE compass_position AS ENUM (
    -- Use the four cardinal directions
    'North', 
    'South',
    'East', 
    'West'
);

Verify that the new data type has been created by looking in the pg_type system table.

-- Confirm the new data type is in the pg_type system table
SELECT typname
FROM pg_type
WHERE typname='compass_position';

Excellent job! Now let’s take a closer look at some of the sample user-defined data types that are available in the Sakila DVD Rental database.
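Once the type exists, you can inspect and compare its values. This sketch assumes the compass_position type from the exercise above has already been created.

```sql
-- List all values of the enum in declaration order
SELECT enum_range(NULL::compass_position);  -- {North,South,East,West}

-- Enum values sort by the order they were declared in, not alphabetically
SELECT 'South'::compass_position < 'East'::compass_position AS declared_before;  -- true

-- Labels are case-sensitive: casting 'north' (lowercase) would raise an
-- "invalid input value for enum" error, so it is left commented out
-- SELECT 'north'::compass_position;
```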

Getting info about user-defined data types

The Sakila database has a user-defined enum data type called mpaa_rating. The rating column in the film table is an mpaa_rating type and contains the familiar rating for that film like PG or R. This is a great example of when an enumerated data type comes in handy. Film ratings have a limited number of standard values that rarely change.

When you want to learn about a column or data type in your database the best place to start is the INFORMATION_SCHEMA. You can find information about the rating column that can help you learn about the type of data you can expect to find. For enum data types, you can also find the specific values that are valid for a particular enum by looking in the pg_enum system table. Let’s dive into the exercises and learn more.

  • Select the column_name, data_type, udt_name.
  • Filter for the rating column in the film table.
-- Select the column name, data type and udt name columns
SELECT column_name, data_type, udt_name
FROM INFORMATION_SCHEMA.COLUMNS 
-- Filter by the rating column in the film table
WHERE table_name ='film' AND column_name='rating';

Output

column_name data_type udt_name
rating USER-DEFINED mpaa_rating

Select all columns from the pg_type table where the type name is equal to mpaa_rating.

SELECT *
FROM pg_type 
WHERE typname ='mpaa_rating';

Excellent job! Notice that the mpaa_rating type has a typcategory of E which means it’s an enumerated data type.
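The valid values for an enum live in the pg_enum system table mentioned earlier. A minimal sketch of how you might list them for mpaa_rating, assuming the Sakila database from the exercises:

```sql
-- Join pg_enum to pg_type to look up the labels of one enum type
SELECT e.enumlabel
FROM pg_enum AS e
  INNER JOIN pg_type AS t
    ON t.oid = e.enumtypid
WHERE t.typname = 'mpaa_rating'
-- enumsortorder preserves the order the values were declared in
ORDER BY e.enumsortorder;
```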

User-defined functions in Sakila

If you were running a real-life DVD Rental store, there are many questions that you may need to answer repeatedly like whether a film is in stock at a particular store or the outstanding balance for a particular customer. These types of scenarios are where user-defined functions will come in very handy. The Sakila database has several user-defined functions pre-defined. These functions are available out-of-the-box and can be used in your queries like many of the built-in functions we’ve learned about in this course.

In this exercise, you will build a query step-by-step that can be used to produce a report to determine which film title is currently held by which customer using the inventory_held_by_customer() function.

  • Select the title and inventory_id columns from the film and inventory tables in the database.
-- Select the film title and inventory ids
SELECT 
    f.title, 
    i.inventory_id
FROM film AS f 
    -- Join the film table to the inventory table
    INNER JOIN inventory AS i ON f.film_id=i.film_id 

Output

title inventory_id
ACE GOLDFINGER 9
ACE GOLDFINGER 10
ACE GOLDFINGER 11
  • Use the inventory_held_by_customer() function to determine whether each inventory_id is currently held by a customer and alias the column as held_by_cust.
-- Select the film title and inventory ids
SELECT 
    f.title, 
    i.inventory_id,
    -- Determine whether the inventory is held by a customer
    inventory_held_by_customer (i.inventory_id) AS held_by_cust 
FROM film as f 
    -- Join the film table to the inventory table
    INNER JOIN inventory AS i ON f.film_id=i.film_id 

Output

title inventory_id held_by_cust
ACE GOLDFINGER 9 366
ACE GOLDFINGER 10 null
ACE GOLDFINGER 11 null

Now filter your query to only return records where the inventory_held_by_customer() function returns a non-null value.

-- Select the film title and inventory ids
SELECT 
    f.title, 
    i.inventory_id,
    -- Determine whether the inventory is held by a customer
    inventory_held_by_customer(i.inventory_id) as held_by_cust
FROM film as f 
    INNER JOIN inventory AS i ON f.film_id=i.film_id 
WHERE
    -- Only include results where the held_by_cust is not null
    inventory_held_by_customer(i.inventory_id) is not null

Output

title inventory_id held_by_cust
ACE GOLDFINGER 9 366
AFFAIR PREJUDICE 21 111
AFRICAN EGG 25 590

Excellent job! User-defined types and functions provide you with advanced capabilities for managing and querying your data in PostgreSQL.
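The final query repeats the function call in the WHERE clause because a column alias from the SELECT list cannot be referenced there. One possible alternative, shown here only as a sketch, is to write the call once in a LATERAL subquery and filter on its alias. The alias h is a hypothetical name, and the query assumes the same Sakila tables and function as above.

```sql
SELECT
    f.title,
    i.inventory_id,
    h.held_by_cust
FROM film AS f
    INNER JOIN inventory AS i ON f.film_id = i.film_id
    -- Write the function call once and expose it as a column
    CROSS JOIN LATERAL (
        SELECT inventory_held_by_customer(i.inventory_id) AS held_by_cust
    ) AS h
WHERE
    -- The alias is visible here because it comes from the FROM clause
    h.held_by_cust IS NOT NULL;
```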

4.3. Intro to PostgreSQL extensions

Enabling extensions

Before you can use the capabilities of an extension it must be enabled. As you have previously learned, most PostgreSQL distributions come pre-bundled with many useful extensions to help extend the native features of your database. You will be working with fuzzystrmatch and pg_trgm in upcoming exercises but before you can practice using the capabilities of these extensions you will need to first make sure they are enabled in our database. In this exercise you will enable the pg_trgm extension and confirm that the fuzzystrmatch extension, which was enabled in the video, is still enabled by querying the pg_extension system table.

  • Enable the pg_trgm extension
-- Enable the pg_trgm extension
CREATE EXTENSION IF NOT EXISTS pg_trgm;

Now confirm that both fuzzystrmatch and pg_trgm are enabled by selecting all rows from the appropriate system table

-- Select all rows from the extensions system table
SELECT * 
FROM pg_extension;

Output

oid extname extowner extnamespace extrelocatable extversion extconfig extcondition
13681 plpgsql 10 11 false 1.0 null null
16456 fuzzystrmatch 16384 2200 true 1.1 null null
16467 pg_trgm 16384 2200 true 1.6 null null

Excellent job! You’re now ready to do some advanced full-text search queries using the fuzzystrmatch and pg_trgm extensions.
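If you are not sure whether an extension is available on your server before enabling it, the pg_available_extensions view can help. A small sketch:

```sql
-- installed_version is NULL for extensions that are available but not yet enabled
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name IN ('fuzzystrmatch', 'pg_trgm');
```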

Measuring similarity between two strings

Now that you have enabled the fuzzystrmatch and pg_trgm extensions you can begin to explore their capabilities. First, we will measure the similarity between the title and description from the film table of the Sakila database.

  • Select the film title and description.
  • Calculate the similarity between the title and description.
-- Select the title and description columns
SELECT 
  title, 
  description, 
  -- Calculate the similarity
  similarity(title, description)
FROM 
  film

Output

title description similarity
BEACH HEARTBREAKERS A Fateful Display of a Womanizer And a Mad Scientist who must Outgun a A Shark in Soviet Georgia 0
BEAST HUNCHBACK A Awe-Inspiring Epistle of a Student And a Squirrel who must Defeat a Boy in Ancient China 0.022222223
BEDAZZLED MARRIED A Astounding Character Study of a Madman And a Robot who must Meet a Mad Scientist in An Abandoned Fun House 0.029126214

Excellent job! The similarity column shows that the title and description columns are not very similar, given the low values returned for most of the results. Now let’s take a closer look at how we can use the levenshtein function to account for spelling errors in the search text.
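The similarity() function compares the sets of trigrams (three-character sequences) in two strings and returns a number between 0 and 1. The sketch below assumes pg_trgm is enabled and uses literals, so you can experiment without the film table.

```sql
SELECT
  -- The trigrams that pg_trgm extracts from a string
  show_trgm('GOLD') AS trigrams,
  -- Identical strings share all trigrams
  similarity('JET NEIGHBORS', 'JET NEIGHBORS') AS identical,  -- 1
  -- A near match shares most of its trigrams, so the score is high but below 1
  similarity('JET NEIGHBORS', 'JET NEIGHBOR') AS near_match;
```

A short title shares very few trigrams with a long description, which is why the scores in the exercise output are close to zero.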

Levenshtein distance examples

Now let’s take a closer look at how we can use the levenshtein function to match strings against text data. If you recall, the levenshtein distance represents the number of edits required to convert one string to another string being compared.

In a search application or when performing data analysis on any data that contains manual user input, you will always want to account for typos or incorrect spellings. The levenshtein function provides a great method for performing this task. In this exercise, we will perform a query against the film table using a search string with a misspelling and use the results from levenshtein to determine a match. Let’s check it out.

  • Select the film title and film description.
  • Calculate the levenshtein distance for the film title with the string JET NEIGHBOR.
-- Select the title and description columns
SELECT  
  title, 
  description, 
  -- Calculate the levenshtein distance
  levenshtein(title, 'JET NEIGHBOR') AS distance
FROM 
  film
ORDER BY 3

Output

title description distance
JET NEIGHBORS A Amazing Display of a Lumberjack And a Teacher who must Outrace a Woman in A U-Boat 1
HILLS NEIGHBORS A Epic Display of a Hunter And a Feminist who must Sink a Car in A U-Boat 6
BED HIGHBALL A Astounding Panorama of a Lumberjack And a Dog who must Redeem a Woman in An Abandoned Fun House 7

Excellent job! Because we sorted by the results of the levenshtein function, you can see that the first result is the closest match: only one edit (adding the trailing S) is needed to turn the search string JET NEIGHBOR into the film title JET NEIGHBORS.
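Here is a short sketch of the levenshtein function on literals, followed by one way you might use it as a filter. It assumes fuzzystrmatch is enabled; the cutoff of 2 edits is an arbitrary value chosen for illustration.

```sql
SELECT
  levenshtein('JET NEIGHBOR', 'JET NEIGHBORS') AS one_insert,      -- 1
  levenshtein('kitten', 'sitting')             AS classic_example, -- 3
  -- The comparison is case-sensitive, so every letter counts as an edit here
  levenshtein('jet', 'JET')                    AS case_matters;    -- 3

-- Keep only titles within 2 edits of the misspelled search string
SELECT title
FROM film
WHERE levenshtein(title, 'JET NEIGHBOR') <= 2
ORDER BY levenshtein(title, 'JET NEIGHBOR');
```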

Putting it all together

In this exercise, we are going to use many of the techniques and concepts we learned throughout the course to generate a data set that we could use to predict whether the words and phrases used to describe a film have an impact on the number of rentals.

First, you need to create a tsvector from the description column in the film table. You will match against a tsquery to determine if the phrase “Astounding Drama” leads to more rentals per month. Next, create a new column using the similarity function to rank the film descriptions based on this phrase.

  • Select the title and description for all DVDs from the film table.
  • Perform a full-text search by converting the description to a tsvector and match it to the phrase ‘Astounding & Drama’ using a tsquery in the WHERE clause.

Hint

  • To perform a full-text search, convert the description column to a tsvector using to_tsvector(description).
  • Use the match operator @@ to compare it to the results of the to_tsquery('Astounding & Drama') function.
-- Select the title and description columns
SELECT  
  title, 
  description
FROM 
  film
WHERE 
  -- Match "Astounding Drama" in the description
  to_tsvector(description) @@
  to_tsquery('Astounding & Drama');

Output

title description
COWBOY DOOM A Astounding Drama of a Boy And a Lumberjack who must Fight a Butler in A Baloon
BIKINI BORROWERS A Astounding Drama of a Astronaut And a Cat who must Discover a Woman in The First Manned Space Station
CAMPUS REMEMBER A Astounding Drama of a Crocodile And a Mad Cow who must Build a Robot in A Jet Boat
  • Add a new column that calculates the similarity of the description with the phrase ‘Astounding Drama’.
  • Sort the results by the new similarity column in descending order.
SELECT 
  title, 
  description, 
  -- Calculate the similarity
  similarity(description, 'Astounding Drama')
FROM 
  film 
WHERE 
  to_tsvector(description) @@ 
  to_tsquery('Astounding & Drama') 
ORDER BY 
  similarity(description, 'Astounding Drama') DESC;

Output

title description similarity
COWBOY DOOM A Astounding Drama of a Boy And a Lumberjack who must Fight a Butler in A Baloon 0.24637681
GLASS DYING A Astounding Drama of a Frisbee And a Astronaut who must Fight a Dog in Ancient Japan 0.23943663

Great work! We have just scratched the surface of what you can do with full-text search and natural language processing with PostgreSQL extensions. I encourage you to keep exploring these capabilities.
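One thing to keep in mind is that the & operator only requires both words to appear somewhere in the description, in any order. If you want the words to appear next to each other as a phrase, PostgreSQL 9.6 and later offer the <-> operator and the phraseto_tsquery() function. This is a sketch that goes slightly beyond the exercise and assumes the 'english' configuration.

```sql
SELECT title, description
FROM film
WHERE
  -- 'Astounding' immediately followed by 'Drama'
  to_tsvector('english', description) @@
  to_tsquery('english', 'Astounding <-> Drama');

-- Equivalent phrase search built from plain text
SELECT title, description
FROM film
WHERE
  to_tsvector('english', description) @@
  phraseto_tsquery('english', 'Astounding Drama');
```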
