SQL Querying on a Bank Database — Part 2
This blog is related to part 2 of 4 (see part 1 — Understanding Relational Database ). We will focus on querying and reading data from an existing database, as part of my learning path in understanding SQL. We will be using the MySQL Workbench server and a sample bank database which will be available on my Github repository.
The main focus will be to formulate queries using SELECT, FROM, WHERE, GROUP BY, HAVING
and ORDER BY
clauses to retrieve data from a single table using MySQL connected to DBeaver.
Introducing the SELECT statement
The command most used in SQL statement is the SELECT
clause, which is used to retrieve data from tables stored in a database. A simple version of the command contains the three basic parts: SELECT
, FROM
and WHERE
. These parts match the basic questions needed by every query:
SELECT columns -> What output we want to see?
FROM table(s) -> What tables are involved?
WHERE condition -> What are the constraints?
Besides the SELECT
and FROM
clauses, the SELECT
statement can contain many other clauses. Regardless of which RDBMS we are using, the list below is the big 6 elements of a SELECT
statement:
SELECT [DISTINCT|ALL] (*, column [AS alias]...)
FROM tablename
[WHERE condition]
[GROUP BY group_by_expression]
[HAVING group_condition]
[ORDER BY columname]
The capitalized words SELECT, FROM, WHERE
and so forth are keywords (words with a predefined meaning). Each keyword is referred to us as a clause:SELECT
clause, FROM
clause, WHERE
clause and so on).
SELECT
is the primary clause of the SELECT
statement followed by FROM
clause. We use SELECT
to specify the columns we want to include in our result set. The columns are drawn from the table (or view) we specify in the FROM
clause. (we can also draw them from several tables simultaneously, which is discussed in part 3 and part 4) we can also use aggregate functions, such as Sum(salary), or mathematical expressions, such as Quantity * Price.
In the simplest form, the SELECT
statement contains the syntax for selecting columns, selecting rows, grouping data, joining tables, and performing calculations.
How to use simple SELECT
statement to query data from a single table with SELECT
clause:
SELECT * FROM customer; -> select all information from customer table (columns and rows)SELECT city FROM customer -> select only one column from customer tableSELECT city, cust_id FROM customer -> select specific columns from customerSELECT DISTINCT CustState FROM customer -> select unique values of custstate from customer, without duplicates
To have the SELECT
statement return all data from a specific table, we type an asterisk (*) . It is a great way to quickly see what data a table contains. This statement retrieves all data stored in the customer
table.
When specifying more than one column in the SELECT
clause, commas should separate the columns listed. Keep in mind that if we forget a comma, the system interprets the subsequent field name as a column alias and we don’t get the results we intended.
Running SELECT * FROM table
without using a WHERE
clause will return the entire table (all columns and all rows). Be cautious, in a large database we should limit our data.
Now that we know how to select columns from a table, we can look at some other operations such as how to use a column alias, arithmetic operations and eliminate duplicate output.
We’ll begin with relatively easy queries by presenting queries from our bank database to show the basics of creating a query.
Consider a typical banking diagram in managing the institution’s data:
In this case, our database includes 11 related tables, containing information about:
- Customers (Cust_id, Address, etc.)
- Business (employee, account, etc.)
- Product and so on.
Each table contain information organized into tuples (rows or record) and attributes (columns or fields). For example, our account table contains 11 columns and 24 rows:
- Each column contains an attribute related to the bank account:
ACCOUNT_ID, AVAIL_BALANCE, CLOSE_DATE, CUSTOMER_ID etc.
- Each row corresponds to one specific account (example:
ACCOUNT_ID = 2
, has anavailable balance = 500
, owned bycustomer_id
number 1 and this account is asaving account(SAV)
.
As we know, from the previous article, normalized tables contain Primary Key (PK) and Foreign Keys (FK). The ACCOUNT_ID
column is known as a primary key, which serves as a unique identifier for each record in the account table. CUST_ID
and OPEN_BRANCH_ID
that appears in the table corresponds to the foreign key.
Consider the first query using product_type table and pty as alias>
SELECT * FROM product_type; -> select all columnsSELECT pty.* FROM product_type pty -> retrieve all columnsSELECT pty.prodyct_type_cd, pty.name FROM product_type pty -> select two specific columns using pty as alias
The above statements are equivalent. Pty
is used as an alias to product_type table. The output is as follow:
Getting information on the employee table using E
as an alias for employee:
What is happening is that the alias E
can directly follow the table name by writing employee E
in the FROM
clause or can follow the AS
keyword as shown in the following example below:
In this particular example what we are doing is asking the system to do the following: for each employee, retrieve the employee’s first and last name and the first and last name of his/her immediate supervisor.
This question involves only employee table. All of the information about the question we want to see and the constraint is based on columns in the employee table.
In the example, we use a string function CONCAT
to concatenate the two-column name FIRST_NAME
and LAST_NAME
in the SELECT
clause. The first concatenation takes EMPLOYEE_NAME
as an alias for the column header, the second concatenation takes SUPERVISOR_NAME
as an alias.
In the FROM
clause, we can think of E
and S
as two alias representing different copies of the employee table. E
represents employees in the role of subordinates and S
, represents employees in the role of supervisors.
In the WHERE
clause, we joined the two copies. Notice that there is only one employee relation, and the join condition is meant to join the relation with itself by matching the tuples (rows) that satisfy the join condition E.EMP_ID =S.SUPERIOR_EMP_ID.
Whenever one or more aliases are given to a relation, we can use these names to represent different references to that same relation.
Note: SQL use operators which are reserved keywords used in the
WHERE
clause of a SQL statement to perform arithmetic(- , + , / , % , *)
, logical operators(LIKE, AND, OR, NOT, IN, BETWEEN, ALL, EXISTS etc)
and comparison operations(=, >, <, ≥, <=, <>)
.
Let’s make more questions involving more combination of keywords:
- Get the names of the employee (s) who were hired in 2002 and their respective supervisor name.
- List employees whose names are Michael or Rick or Jane.
- Get employees whose first name starts with
's’
and works inDept_ID = 1
.
Notice the business questions is a little vague, so we can choose which columns we want to see in the output. To answer these questions we can select columns such as FIRST_NAME
, LAST_NAME
, START_DATE
and DEPT_ID.
1. Get the names of employee (s) who was hired in 2002 and they respective supervisor name
The table we want to retrieve information is the employee table. The criteria are defined in WHERE
keyword. The WHERE
clause will select all rows that meet the join condition E.EMP_ID =S.SUPERIOR_EMP_I
and the criteria YEAR(START_DATE = '2002')
.
If we don’t specify YEAR
and we just use START_DATE = '2002'
we get an error incorrect DATE value: '2002'
because actually, the format of the column is YYYY-mm-dd
. the YEAR
is used to get only the year without specifying the month or day.
The ORDER BY
clause sorts the output rows. In this case, we use 1 to indicate that we want to order the output by START_DATE
column since it is in the first position in the select query. The default sorts the output in ascending order ASC
, adding the keyword DESC
results in a descending sort.
2. List employees whos names are Michael or Rick or Jane
The IN
operator allows us to test multiple possible values options. The syntax is: WHERE expression IN(value1, value2 ...valueN)
.
In the example, we create a condition that checks if a value is included in a list of multiple options. It returns all rows from the employee table where the FIRST_NAME
is either 'Michael', ‘Jane', or ‘Rick'.
The IN
condition is used to help reduce the need to use multiple OR
conditions in SELECT
statement. It works with values of all data types for example, with character, with numeric values and also using NOT
operator.
3. list employees whose first name starts with ‘S’ and works in Dept_ID = 1.
Here we are using the operator LIKE
in the WHERE
clause to find all employee whose FIRST_NAME
starts with ‘S’. The operator allows us to perform pattern matching against a string or date format with the use of wildcard characters:
- Percent sign
%
matches any sequence of characters; - Underscore
_
represents exactly one single character.
SQL GROUP BY clause
The GROUP BY
clause is used in a SELECT
statement to divide rows in a table into smaller groups. Then use the group function to return summary information for each group.
SUM()- returns the total sum of a numeric column.
AVG()- returns the average value of a numeric column.
MIN()- returns the smallest value of the selected column.
MAX()- returns the largest value of the selected column.
COUNT()- returns the number of rows that matches a specified criteria.
The figure below illustrates an example of how GROUP BY
works with SUM
, COUNT
aggregate function.
Now let’s look at how we could use the GROUB BY
clause with these aggregate function.
We might ask what is the total amount of money in an account in proportion to each type of different services (PRODUCT_CD)
and also we want to know how many different accounts exist and the average of the available balance.
Recall that an aggregate function takes in multiple values and reduces them back down to one single value:
In this case, COUNT
function returns the total number of unique values in the account column which is equal to 24. The SUM
function takes in all values without specifying a category and then return back the total sum account of 170,754.46. and the AVG
gives us the average of the available balance. The ROUND
function is used just to returns the values rounded in 2 decimals.
But this doesn't return value per category. To do so, we need to use a GROUP BY
clause:
In the SELECT
statement, the rows are being grouped by PRODUCT_CD
and then the COUNT, SUM, AVG
being calculated for each PRODUCT_CD
separately. If we did not include GROUP BY
clause, i.e, if we omit the clause we would obtain only one row like the table below which is wrong:
We could also use more than one column. Using the following query, we also selectOPEN_EMP_ID
column:
To filter the result by only PRODUTC_ID
we must use WHERE
before apply GROUP BY
:
So what we’re saying is, select PROUCT_CD
, OPEN_EMP_ID
, COUNT, SUM, AVG
and filter PRODUCT_CD
where the product is in 'CHK'
and 'SAV'
account, before we actually perform the GROUP BY
on the PROUCT_CD
and OPEN_EMP_ID
(these columns are indicated by their position 1 and 2).
The ORDER BY
clause means that we want to sort the value based on the positions of the columns selected. Keep in mind we could also sort the results by selecting the aggregate function by their positions or by referencing the entire function.
SQL HAVING clause
The WHERE
keyword cannot be used with aggregate functions. It places conditions on the selected columns. An alternative way is to use HAVING
clause which places conditions on groups created by the GROUP BY
clause.
Looking at the above result, now suppose we want to group the types of services PRODUCT_CD
and get only the types that the sum of the account is above 700. The query is as follows:
Summary
We have concluded our second part of the article resumed into formulating queries for retrieving data from a single table using MySQL and DBeaver. We have covered an explanation about GROUP BY
and HAVING
clause. We learned about the difference between WHERE
and HAVING
clause with examples.
We covered the fact that the SELECT
and FROM
clauses are the fundamental clauses required to retrieve information from the database and that the remaining clauses — WHERE
, GROUP BY
and HAVING
— are used to conditionally process and filter the information returned by the SELECT
clause.
In the next blog, the topic will involve JOIN
clause which includes works with more than one table.
Thanks for reading and I hope that you find this blog useful.
Books References
- John L. Viescas, Michael J. Hernandez (2014). SQL Queries for Mere Mortals, A Hand on Guide to Data Manipulation in SQL.
Link Reference:
Technothenet: https://www.techonthenet.com/mysql/select.php