Much of the world’s raw data—from electronic medical records to customer transaction histories—lives in organized collections of tables called relational databases. Being able to wrangle and extract data from these databases using SQL is an essential skill within the data industry and in increasing demand.
In this two-hour introduction to SQL, you’ll get to know the theory and the practice through bite-sized videos and interactive exercises where you can put your new-found skills to the test.
SQL is an essential language for building and maintaining relational databases, which opens the door to a range of careers in the data industry and beyond. You’ll start this course by covering data organization, tables, and best practices for database construction.
The second half of this course looks at creating SQL queries for selecting data that you need from your database. You’ll have the chance to practice your querying skills before moving on to customizing and saving your results.
PostGreSQL and SQL Server are two of the most popular SQL flavors. You’ll finish off this course by looking at the differences, benefits, and applications of each. By the end of the course you’ll have some hands-on experience in learning SQL and the grounding to start applying it on projects or continue your learning in a more specialized direction.
Course URL: https://www.datacamp.com/courses/introduction-to-sql
3.1. Relational Databases
Before writing any SQL queries, it’s important to understand the underlying data. In this chapter, we’ll discover the role of SQL in creating and querying relational databases. Using a database for a local library, we will explore database and table organization, data types and storage, and best practices for database construction.
3.1.1. Databases
Data organization
If you’d like to use SQL to gain insights from data, understanding the organization of a database is an important first step. Take a look at the database below. Which of the following statements correctly describes its organization?
- This is a table containing three relational databases: employees, job_levels, and departments.
- This is a relational database containing three tables: employees, job_levels, and departments.
- This is a database, but it is not relational, because no relationship exists between job levels and departments.
- This is not a database because there is no SQL code shown.
Well done! This is an example of how data in databases is organized into many tables.
Database advantages
Imagine you are part of a discussion at work about whether or not to create a database. You’ve learned about several advantages of storing data in a database rather than other traditional formats like spreadsheets. See if you can remember what they are!
- Determine whether or not each statement is an advantage of databases.
Well done! There are many advantages of databases, which is why much of the world’s data is stored in them. You’re ready to dive into the details of tables now.
3.1.2. Tables
Picking a unique ID
You’ve learned that a unique identifier is a unique value that identifies a record so that it can be distinguished from other records in the same table.
Let’s take a closer look at the employees table. Which of the fields do you think is best suited to be a unique identifier?
- name
- dept_id
- year_hired
- id
Yes, that’s right! id is unique across all current employees, and we can expect that new employees will receive unique ids as well.
- Move each card to the proper bucket based on whether or not it is a good suggestion for improvement.
Well done on improving that table! Getting naming conventions right makes life much easier when we begin writing SQL queries, so it’s important to set ourselves up for success!
Our very own table
We’ve set up a database inside this course, and the books table is available in the exercise. You’ll use SQL to query this table in the next chapter, but for now, it’s time to explore what data books holds!
Your task is to choose the option below that best describes the information contained in books. There’s no need to do any coding in this exercise; you can answer this question by looking at the books table in the console next to the words “query result.” Because some book titles are long, you may need to scroll to the right in order to see all the information that the books table contains.
- books contains records for id, title, author, year, and genre.
- books contains fields for id, title, author, year, and genre.
- books contains records for title, author, year, and genre. id is a unique identifier but not a record.
- books contains fields for title, author, year, and genre. id is a unique identifier but not a field.
Great work! Each record contains information about a book, organized into id, title, author, year, and genre fields.
3.1.3. Data
At your service
Now that you know more about how data is stored, it’s time to test those skills!
Select the statement about database storage that is false.
- Servers can be used for storing website information as well as databases.
- A server can handle requests from many computers at once.
- Servers are usually personal computers such as laptops.
- Data from a database is physically stored on a server.
That’s right! While it is technically possible for a laptop to be a server, laptops aren’t well suited to the job since they are not powerful enough to handle many requests at once and don’t have as much storage capacity as larger computers.
Finding data types
Imagine that you are starting a new job and have just started getting to know your new employer’s database. You know that it’s important to know the data type—such as VARCHAR, INT, or NUMERIC—corresponding to each field in a table. Where could you find this information?
- You can find this information by looking at each table in the database.
- You can find this information by looking at a diagram of relationships between tables.
- You can find this information by looking at the values in each field for each table.
- You can find this information by looking at a database schema.
Awesome work. Database schemas show data types for each field in all tables, and they also show relationships between tables. Looking at a schema is an excellent way to get to know a new database!
Choice of type
You’ve learned that when a table is created, a data type must be indicated for each field. Choosing the correct data type allows the data to be stored correctly and makes certain operations associated with that data type available. For example, mathematical operations can be performed on NUMERIC and INT data types, but not on VARCHAR data. Thus, it makes sense to store numerical values as NUMERIC or INT so that you can perform math operations on them if needed.
In this exercise, you’ll practice selecting the proper data type for your data!
- Drag the field description to the bucket indicating the best data type to use for that field.
Setting the table in style
Imagine that you are designing a database and the following table has been suggested. Your task is to provide feedback on how this table could be improved. Use the skills you learned in the last video to critique it!
Congratulations! Now that you know what data types to expect in each field, we’re ready to query!
3.2. Querying
Learn your first SQL keywords for selecting relevant data from database tables! After practicing querying skills in a database of books, you’ll customize query results using aliasing and save them as views so they can be shared. Finally, you’ll explore the differences between SQL flavors and databases such as PostgreSQL and SQL Server.
3.2.1. Introducing queries
SQL strengths
Which of the below scenarios describes a situation in which using SQL would be useful?
- All data needed to answer the business question is presented in a spreadsheet, and no complicated relationships exist between different data points.
- Large amounts of data about many different but related areas of a business are housed in a relational database.
- The data needed to answer the business question doesn’t exist yet.
Exactly! When we’ve got this type of vast and interrelated data, SQL is excellent for targeting and returning only the data which is relevant to our business question.
Developing SQL style
Recall from the video that it’s important to pay attention to the formatting of SQL queries in order to make them readable. This is especially helpful as you learn more keywords and your queries get longer.
In this exercise, you’ll review the below query about the patrons table. This code will run properly, but it is messy and hard to read. Your task is to determine which edits are appropriate to improve the query so that it follows best practices for SQL style.
SELECT CARD_NUM, TOTAL_FINE
from patrons
Here’s a reminder about what the patrons table looks like!
Drag each suggestion to the proper zone depending on whether or not it will improve the query’s style.
Great style! Your coworkers will thank you when you share clean and readable queries with them.
Querying the books table
You’re ready to practice writing your first SQL queries using the SELECT and FROM keywords. Recall from the video that SELECT is used to choose the fields that will be included in the result set, while FROM is used to pick the table in which the fields are listed.
Feel free to explore books in the exercise. Let’s zoom in on this table in the database schema to see the fields and data types it contains.
Your task in this exercise is to practice selecting fields from books.
- Use SQL to return a result set of all book titles included in the books table.
-- Return all titles from the books table
SELECT title
FROM books;
- Select both the title and author fields from books.
-- Select title and author from the books table
SELECT title,author,books
FROM books;
- Select all fields from the books table.
-- Select all fields from the books table
SELECT *
FROM books;
Well done! You’ve mastered your first two SQL keywords. The more keywords you know, the more complex SQL queries you’ll be able to write!
3.2.2. Writing queries
Making queries DISTINCT
You’ve learned that the DISTINCT keyword can be used to return unique values in a field. In this exercise, you’ll use this understanding to find out more about the books table!
There are 350 books in the books table, representing all of the books that our local library has available for checkout. But how many different authors are represented in these 350 books? The answer is surely less than 350. For example, J.K. Rowling wrote all seven Harry Potter books, so if our library has all Harry Potter books, seven books will be written by J.K Rowling. There are likely many more repeat authors!
- Write SQL code that returns a result set with just one column listing the unique authors in the books table.
-- Select unique authors from the books table
SELECT DISTINCT author
FROM books;
- Update the code to return the unique author and genre combinations in the books table.
-- Select unique authors and genre combinations from the books table
SELECT DISTINCT author,genre
FROM books;
You’ve passed this exercise with DISTINCTion! Notice that you found 247 unique authors in the books table overall but 249 unique combinations of authors and genres. This means there are one or two authors who have written books in multiple genres!
Aliasing
While the default column names in a SQL result set come from the fields they are created from, you’ve learned that aliasing can be used to rename these result set columns. This can be helpful for clarifying the intent or contents of the column.
Your task in this exercise is to incorporate an alias into one of the SQL queries that you worked with in the previous exercise!
- Add an alias to the SQL query to rename the author column to unique_author in the result set.
-- Alias author so that it becomes unique_author
SELECT DISTINCT author AS unique_author
FROM books;
It’s AS easy AS that! Great work. The alias you just implemented makes it clear that only unique authors are listed in the results and that there are no duplicates. This is clear even to someone who is reading only the result set and does not know the SQL code behind the results.
VIEWing your query
You’ve worked hard to create the below SQL query:
SELECT DISTINCT author AS unique_author
FROM books;
What if you’d like to be able to refer to it later, or allow others to access and use the results? The best way to do this is by creating a view. Recall that a view is a virtual table: it’s very similar to a real table, but rather than the data itself being stored, the query code is stored for later use.
- Add a single line of code that saves the results of the written query as a view called library_authors.
-- Save the results of this query as a view called library_authors
CREATE VIEW library_authors AS
SELECT DISTINCT author AS unique_author
FROM books;
- Check that the view was created by selecting all columns from library_authors.
-- Your code to create the view:
CREATE VIEW library_authors AS
SELECT DISTINCT author AS unique_author
FROM books;
-- Select all columns from library_authors
SELECT *
FROM library_authors;
Amazing! As your SQL queries become long and complex, you’ll want to be able to save your queries for referencing later. Views can also be useful when the information contained in a database table isn’t quite what you need. You can create your own custom view with exactly the information you are looking for, without needing to edit the database itself, which you may not have permission to do. Creating views is a valuable skill to have, and you’ve mastered it!
3.2.3. SQL flavors
Comparing flavors
The video introduced several differences between SQL Server and PostgreSQL. These are just two of many relational database systems available, but the differences are fairly representative of the types of differences you’ll see between other SQL database systems and flavors as well.
- Drag the statement to the database management system that it correctly describes.
Excellent work! Both SQL Server and PostgreSQL are very popular SQL flavors. As your SQL journey continues, you’re sure to see their names pop up!
Limiting results
Let’s take a look at a few of the genres represented in our library’s books.
Recall that limiting results is useful when testing code since result sets can have thousands of results! Queries are often written with a LIMIT of just a few records to test out code before selecting thousands of results from the database.
Let’s practice with LIMIT!
- Using PostgreSQL, select the genre field from the books table; limit the number of results to 10.
-- Select the first 10 genres from books using PostgreSQL
SELECT genre
FROM books
LIMIT 10;
There’s no LIMIT to your SQL skills! Great work. You can see from this exercise how it’s nice to work with small result sets by limiting the number of results.
Translating between flavors
In the previous exercise, you wrote the following code using PostgreSQL:
SELECT genre
FROM books
LIMIT 10;
The database in this course is a PostgreSQL database, so you won’t be able to run SQL Server code in any of the exercises. What if you did want to update the above query to work with SQL Server, though? How would you do that?
- Replace FROM with TABLE
- Replace SELECT, FROM, and LIMIT with the corresponding SQL Server keywords
- Replace LIMIT with TOP
- Replace LIMIT with TOP and remove the ; at the end of the query
Looks like you’ve got the flavor of things! Great work. Only a few keyword differences exist between PostgreSQL and SQL Server—most syntax is the same!