This blog is related to part 2 of 4 (see part 1 — Understanding Relational Database ). We will focus on querying and reading data from an existing database, as part of my learning path in understanding SQL. We will be using the MySQL Workbench server and a sample bank database which will be available on my Github repository.
The main focus will be to formulate queries using
SELECT, FROM, WHERE, GROUP BY, HAVING and
ORDER BY clauses to retrieve data from a single table using MySQL connected to DBeaver.
Introducing the SELECT statement
The command most used in SQL statement is the
SELECT clause, which is used to retrieve data from tables stored in a database. A simple version of the command contains the three basic parts:
WHERE. These parts match the basic questions needed by every query:
SELECT columns -> What output we want to see?
FROM table(s) -> What tables are involved?
WHERE condition -> What are the constraints?
FROM clauses, the
SELECT statement can contain many other clauses. Regardless of which RDBMS we are using, the list below is the big 6 elements of a
SELECT [DISTINCT|ALL] (*, column [AS alias]...)
[GROUP BY group_by_expression]
[ORDER BY columname]
The capitalized words
SELECT, FROM, WHERE and so forth are keywords (words with a predefined meaning). Each keyword is referred to us as a clause:
WHERE clause and so on).
SELECT is the primary clause of the
SELECT statement followed by
FROM clause. We use
SELECT to specify the columns we want to include in our result set. The columns are drawn from the table (or view) we specify in the
FROM clause. (we can also draw them from several tables simultaneously, which is discussed in part 3 and part 4) we can also use aggregate functions, such as Sum(salary), or mathematical expressions, such as Quantity * Price.
In the simplest form, the
SELECT statement contains the syntax for selecting columns, selecting rows, grouping data, joining tables, and performing calculations.
How to use simple
SELECT statement to query data from a single table with
SELECT * FROM customer; -> select all information from customer table (columns and rows)SELECT city FROM customer -> select only one column from customer tableSELECT city, cust_id FROM customer -> select specific columns from customerSELECT DISTINCT CustState FROM customer -> select unique values of custstate from customer, without duplicates
To have the
SELECT statement return all data from a specific table, we type an asterisk (*) . It is a great way to quickly see what data a table contains. This statement retrieves all data stored in the
When specifying more than one column in the
SELECT clause, commas should separate the columns listed. Keep in mind that if we forget a comma, the system interprets the subsequent field name as a column alias and we don’t get the results we intended.
SELECT * FROM table without using a
WHERE clause will return the entire table (all columns and all rows). Be cautious, in a large database we should limit our data.
Now that we know how to select columns from a table, we can look at some other operations such as how to use a column alias, arithmetic operations and eliminate duplicate output.
We’ll begin with relatively easy queries by presenting queries from our bank database to show the basics of creating a query.
Consider a typical banking diagram in managing the institution’s data:
In this case, our database includes 11 related tables, containing information about:
- Customers (Cust_id, Address, etc.)
- Business (employee, account, etc.)
- Product and so on.
Each table contain information organized into tuples (rows or record) and attributes (columns or fields). For example, our account table contains 11 columns and 24 rows:
- Each column contains an attribute related to the bank account:
ACCOUNT_ID, AVAIL_BALANCE, CLOSE_DATE, CUSTOMER_ID etc.
- Each row corresponds to one specific account (example:
ACCOUNT_ID = 2, has an
available balance = 500, owned by
customer_idnumber 1 and this account is a
As we know, from the previous article, normalized tables contain Primary Key (PK) and Foreign Keys (FK). The
ACCOUNT_ID column is known as a primary key, which serves as a unique identifier for each record in the account table.
OPEN_BRANCH_ID that appears in the table corresponds to the foreign key.
Consider the first query using product_type table and pty as alias>
SELECT * FROM product_type; -> select all columnsSELECT pty.* FROM product_type pty -> retrieve all columnsSELECT pty.prodyct_type_cd, pty.name FROM product_type pty -> select two specific columns using pty as alias
The above statements are equivalent.
Pty is used as an alias to product_type table. The output is as follow:
Getting information on the employee table using
Eas an alias for employee:
What is happening is that the alias
Ecan directly follow the table name by writing
employee E in the
FROMclause or can follow the
AS keyword as shown in the following example below:
In this particular example what we are doing is asking the system to do the following: for each employee, retrieve the employee’s first and last name and the first and last name of his/her immediate supervisor.
This question involves only employee table. All of the information about the question we want to see and the constraint is based on columns in the employee table.
In the example, we use a string function
CONCAT to concatenate the two-column name
LAST_NAME in the
SELECT clause. The first concatenation takes
EMPLOYEE_NAME as an alias for the column header, the second concatenation takes
SUPERVISOR_NAMEas an alias.
FROM clause, we can think of
Sas two alias representing different copies of the employee table.
E represents employees in the role of subordinates and
S, represents employees in the role of supervisors.
WHEREclause, we joined the two copies. Notice that there is only one employee relation, and the join condition is meant to join the relation with itself by matching the tuples (rows) that satisfy the join condition
Whenever one or more aliases are given to a relation, we can use these names to represent different references to that same relation.
Note: SQL use operators which are reserved keywords used in the
WHEREclause of a SQL statement to perform arithmetic
(- , + , / , % , *), logical operators
(LIKE, AND, OR, NOT, IN, BETWEEN, ALL, EXISTS etc)and comparison operations
(=, >, <, ≥, <=, <>).
Let’s make more questions involving more combination of keywords:
- Get the names of the employee (s) who were hired in 2002 and their respective supervisor name.
- List employees whose names are Michael or Rick or Jane.
- Get employees whose first name starts with
's’and works in
Dept_ID = 1.
Notice the business questions is a little vague, so we can choose which columns we want to see in the output. To answer these questions we can select columns such as
1. Get the names of employee (s) who was hired in 2002 and they respective supervisor name
The table we want to retrieve information is the employee table. The criteria are defined in
WHERE keyword. The
WHERE clause will select all rows that meet the join condition
E.EMP_ID =S.SUPERIOR_EMP_I and the criteria
YEAR(START_DATE = '2002') .
If we don’t specify
YEAR and we just use
START_DATE = '2002' we get an error
incorrect DATE value: '2002' because actually, the format of the column is
YEAR is used to get only the year without specifying the month or day.
ORDER BYclause sorts the output rows. In this case, we use 1 to indicate that we want to order the output by
START_DATE column since it is in the first position in the select query. The default sorts the output in ascending order
ASC, adding the keyword
DESC results in a descending sort.
2. List employees whos names are Michael or Rick or Jane
IN operator allows us to test multiple possible values options. The syntax is:
WHERE expression IN(value1, value2 ...valueN) .
In the example, we create a condition that checks if a value is included in a list of multiple options. It returns all rows from the employee table where the
FIRST_NAME is either
'Michael', ‘Jane', or ‘Rick'.
IN condition is used to help reduce the need to use multiple
OR conditions in
SELECT statement. It works with values of all data types for example, with character, with numeric values and also using
3. list employees whose first name starts with ‘S’ and works in Dept_ID = 1.
Here we are using the operator
LIKE in the
WHERE clause to find all employee whose
FIRST_NAMEstarts with ‘S’. The operator allows us to perform pattern matching against a string or date format with the use of wildcard characters:
- Percent sign
%matches any sequence of characters;
_represents exactly one single character.
SQL GROUP BY clause
GROUP BY clause is used in a
SELECT statement to divide rows in a table into smaller groups. Then use the group function to return summary information for each group.
SUM()- returns the total sum of a numeric column.
AVG()- returns the average value of a numeric column.
MIN()- returns the smallest value of the selected column.
MAX()- returns the largest value of the selected column.
COUNT()- returns the number of rows that matches a specified criteria.
The figure below illustrates an example of how
GROUP BY works with
COUNT aggregate function.
Now let’s look at how we could use the
GROUB BY clause with these aggregate function.
We might ask what is the total amount of money in an account in proportion to each type of different services
(PRODUCT_CD)and also we want to know how many different accounts exist and the average of the available balance.
Recall that an aggregate function takes in multiple values and reduces them back down to one single value:
In this case,
COUNT function returns the total number of unique values in the account column which is equal to 24. The
SUM function takes in all values without specifying a category and then return back the total sum account of 170,754.46. and the
AVG gives us the average of the available balance. The
ROUND function is used just to returns the values rounded in 2 decimals.
But this doesn't return value per category. To do so, we need to use a
GROUP BY clause:
SELECT statement, the rows are being grouped by
PRODUCT_CD and then the
COUNT, SUM, AVG being calculated for each
PRODUCT_CD separately. If we did not include
GROUP BY clause, i.e, if we omit the clause we would obtain only one row like the table below which is wrong:
We could also use more than one column. Using the following query, we also select
To filter the result by only
PRODUTC_IDwe must use
WHERE before apply
GROUP BY :
So what we’re saying is, select
COUNT, SUM, AVG and filter
PRODUCT_CD where the product is in
'SAV' account, before we actually perform the
GROUP BY on the
OPEN_EMP_ID (these columns are indicated by their position 1 and 2).
ORDER BY clause means that we want to sort the value based on the positions of the columns selected. Keep in mind we could also sort the results by selecting the aggregate function by their positions or by referencing the entire function.
SQL HAVING clause
WHERE keyword cannot be used with aggregate functions. It places conditions on the selected columns. An alternative way is to use
HAVING clause which places conditions on groups created by the
Looking at the above result, now suppose we want to group the types of services
PRODUCT_CD and get only the types that the sum of the account is above 700. The query is as follows:
We have concluded our second part of the article resumed into formulating queries for retrieving data from a single table using MySQL and DBeaver. We have covered an explanation about
GROUP BY and
HAVING clause. We learned about the difference between
HAVING clause with examples.
We covered the fact that the
FROM clauses are the fundamental clauses required to retrieve information from the database and that the remaining clauses —
GROUP BY and
HAVING— are used to conditionally process and filter the information returned by the
In the next blog, the topic will involve
JOIN clause which includes works with more than one table.
Thanks for reading and I hope that you find this blog useful.
- John L. Viescas, Michael J. Hernandez (2014). SQL Queries for Mere Mortals, A Hand on Guide to Data Manipulation in SQL.