SQL Querying on a Bank Database — Part 2

Photo by Caspar Camille Rubin on Unsplash

This blog is related to part 2 of 4 (see part 1 — Understanding Relational Database ). We will focus on querying and reading data from an existing database, as part of my learning path in understanding SQL. We will be using the MySQL Workbench server and a sample bank database which will be available on my Github repository.

The main focus will be to formulate queries using SELECT, FROM, WHERE, GROUP BY, HAVING and ORDER BY clauses to retrieve data from a single table using MySQL connected to DBeaver.

Introducing the SELECT statement

The command most used in SQL statement is the SELECT clause, which is used to retrieve data from tables stored in a database. A simple version of the command contains the three basic parts: SELECT, FROM and WHERE. These parts match the basic questions needed by every query:

SELECT columns       -> What output we want to see?
FROM table(s) -> What tables are involved?
WHERE condition -> What are the constraints?

Besides the SELECT and FROM clauses, the SELECT statement can contain many other clauses. Regardless of which RDBMS we are using, the list below is the big 6 elements of a SELECT statement:

SELECT [DISTINCT|ALL] (*, column [AS alias]...)   
FROM
tablename
[WHERE
condition]
[GROUP BY
group_by_expression]
[HAVING
group_condition]
[ORDER BY
columname]

The capitalized words SELECT, FROM, WHERE and so forth are keywords (words with a predefined meaning). Each keyword is referred to us as a clause:SELECT clause, FROM clause, WHERE clause and so on).

SELECT is the primary clause of the SELECT statement followed by FROM clause. We use SELECT to specify the columns we want to include in our result set. The columns are drawn from the table (or view) we specify in the FROM clause. (we can also draw them from several tables simultaneously, which is discussed in part 3 and part 4) we can also use aggregate functions, such as Sum(salary), or mathematical expressions, such as Quantity * Price.

In the simplest form, the SELECT statement contains the syntax for selecting columns, selecting rows, grouping data, joining tables, and performing calculations.

How to use simple SELECT statement to query data from a single table with SELECT clause:

SELECT * FROM customer; -> select all information from customer table (columns and rows)SELECT city FROM customer -> select only one column from customer tableSELECT city, cust_id FROM customer -> select specific columns from customerSELECT DISTINCT CustState FROM customer -> select unique values of custstate from customer, without duplicates

To have the SELECT statement return all data from a specific table, we type an asterisk (*) . It is a great way to quickly see what data a table contains. This statement retrieves all data stored in the customer table.

When specifying more than one column in the SELECT clause, commas should separate the columns listed. Keep in mind that if we forget a comma, the system interprets the subsequent field name as a column alias and we don’t get the results we intended.

Running SELECT * FROM table without using a WHERE clause will return the entire table (all columns and all rows). Be cautious, in a large database we should limit our data.

Now that we know how to select columns from a table, we can look at some other operations such as how to use a column alias, arithmetic operations and eliminate duplicate output.

We’ll begin with relatively easy queries by presenting queries from our bank database to show the basics of creating a query.

Consider a typical banking diagram in managing the institution’s data:

EER Diagram

In this case, our database includes 11 related tables, containing information about:

  • Customers (Cust_id, Address, etc.)
  • Business (employee, account, etc.)
  • Product and so on.

Each table contain information organized into tuples (rows or record) and attributes (columns or fields). For example, our account table contains 11 columns and 24 rows:

Partial result set
  • Each column contains an attribute related to the bank account:ACCOUNT_ID, AVAIL_BALANCE, CLOSE_DATE, CUSTOMER_ID etc.
  • Each row corresponds to one specific account (example: ACCOUNT_ID = 2, has an available balance = 500, owned by customer_id number 1 and this account is a saving account(SAV).

As we know, from the previous article, normalized tables contain Primary Key (PK) and Foreign Keys (FK). The ACCOUNT_ID column is known as a primary key, which serves as a unique identifier for each record in the account table. CUST_ID and OPEN_BRANCH_ID that appears in the table corresponds to the foreign key.

Consider the first query using product_type table and pty as alias>

SELECT * FROM product_type; -> select all columnsSELECT pty.* FROM product_type pty -> retrieve all columnsSELECT pty.prodyct_type_cd, pty.name FROM product_type pty -> select two specific columns using pty as alias

The above statements are equivalent. Pty is used as an alias to product_type table. The output is as follow:

Partial query result

Getting information on the employee table using Eas an alias for employee:

Partial query result

What is happening is that the alias Ecan directly follow the table name by writing employee E in the FROMclause or can follow the AS keyword as shown in the following example below:

Partial query result

In this particular example what we are doing is asking the system to do the following: for each employee, retrieve the employee’s first and last name and the first and last name of his/her immediate supervisor.

This question involves only employee table. All of the information about the question we want to see and the constraint is based on columns in the employee table.

In the example, we use a string function CONCAT to concatenate the two-column name FIRST_NAME and LAST_NAME in the SELECT clause. The first concatenation takes EMPLOYEE_NAME as an alias for the column header, the second concatenation takes SUPERVISOR_NAMEas an alias.

In the FROM clause, we can think of E and Sas two alias representing different copies of the employee table. E represents employees in the role of subordinates and S, represents employees in the role of supervisors.

In the WHEREclause, we joined the two copies. Notice that there is only one employee relation, and the join condition is meant to join the relation with itself by matching the tuples (rows) that satisfy the join condition E.EMP_ID =S.SUPERIOR_EMP_ID.

Whenever one or more aliases are given to a relation, we can use these names to represent different references to that same relation.

Note: SQL use operators which are reserved keywords used in the WHEREclause of a SQL statement to perform arithmetic (- , + , / , % , *) , logical operators (LIKE, AND, OR, NOT, IN, BETWEEN, ALL, EXISTS etc) and comparison operations (=, >, <, ≥, <=, <>).

Let’s make more questions involving more combination of keywords:

  1. Get the names of the employee (s) who were hired in 2002 and their respective supervisor name.
  2. List employees whose names are Michael or Rick or Jane.
  3. Get employees whose first name starts with 's’ and works in Dept_ID = 1.

Notice the business questions is a little vague, so we can choose which columns we want to see in the output. To answer these questions we can select columns such as FIRST_NAME , LAST_NAME , START_DATE and DEPT_ID.

1. Get the names of employee (s) who was hired in 2002 and they respective supervisor name

The table we want to retrieve information is the employee table. The criteria are defined in WHERE keyword. The WHERE clause will select all rows that meet the join condition E.EMP_ID =S.SUPERIOR_EMP_I and the criteria YEAR(START_DATE = '2002') .

If we don’t specify YEAR and we just use START_DATE = '2002' we get an error incorrect DATE value: '2002' because actually, the format of the column is YYYY-mm-dd. the YEAR is used to get only the year without specifying the month or day.

The ORDER BYclause sorts the output rows. In this case, we use 1 to indicate that we want to order the output by START_DATE column since it is in the first position in the select query. The default sorts the output in ascending order ASC, adding the keyword DESC results in a descending sort.

2. List employees whos names are  Michael or Rick or Jane

The IN operator allows us to test multiple possible values options. The syntax is: WHERE expression IN(value1, value2 ...valueN) .

In the example, we create a condition that checks if a value is included in a list of multiple options. It returns all rows from the employee table where the FIRST_NAME is either 'Michael', ‘Jane', or ‘Rick'.

The IN condition is used to help reduce the need to use multiple OR conditions in SELECT statement. It works with values of all data types for example, with character, with numeric values and also using NOT operator.

3.  list employees whose first name starts with ‘S’ and works in Dept_ID = 1.
Query result

Here we are using the operator LIKE in the WHERE clause to find all employee whose FIRST_NAMEstarts with ‘S’. The operator allows us to perform pattern matching against a string or date format with the use of wildcard characters:

  • Percent sign % matches any sequence of characters;
  • Underscore _ represents exactly one single character.

SQL GROUP BY clause

The GROUP BY clause is used in a SELECT statement to divide rows in a table into smaller groups. Then use the group function to return summary information for each group.

SUM()- returns the total sum of a numeric column.
AVG()- returns the average value of a numeric column.
MIN()- returns the smallest value of the selected column.
MAX()- returns the largest value of the selected column.
COUNT()-
returns the number of rows that matches a specified criteria.

The figure below illustrates an example of how GROUP BY works with SUM , COUNT aggregate function.

SUM aggregate function with GROUP BY
COUNT aggregate function with GROUP BY

Now let’s look at how we could use the GROUB BY clause with these aggregate function.

We might ask what is the total amount of money in an account in proportion to each type of different services (PRODUCT_CD)and also we want to know how many different accounts exist and the average of the available balance.

Recall that an aggregate function takes in multiple values and reduces them back down to one single value:

In this case, COUNT function returns the total number of unique values in the account column which is equal to 24. The SUM function takes in all values without specifying a category and then return back the total sum account of 170,754.46. and the AVG gives us the average of the available balance. The ROUND function is used just to returns the values rounded in 2 decimals.

But this doesn't return value per category. To do so, we need to use a GROUP BY clause:

In the SELECT statement, the rows are being grouped by PRODUCT_CD and then the COUNT, SUM, AVG being calculated for each PRODUCT_CD separately. If we did not include GROUP BY clause, i.e, if we omit the clause we would obtain only one row like the table below which is wrong:

We could also use more than one column. Using the following query, we also selectOPEN_EMP_ID column:

Partial result set

To filter the result by only PRODUTC_IDwe must use WHERE before apply GROUP BY :

Partial result set

So what we’re saying is, select PROUCT_CD, OPEN_EMP_ID, COUNT, SUM, AVG and filter PRODUCT_CD where the product is in 'CHK' and 'SAV' account, before we actually perform the GROUP BY on the PROUCT_CD and OPEN_EMP_ID (these columns are indicated by their position 1 and 2).

The ORDER BY clause means that we want to sort the value based on the positions of the columns selected. Keep in mind we could also sort the results by selecting the aggregate function by their positions or by referencing the entire function.

SQL HAVING clause

The WHERE keyword cannot be used with aggregate functions. It places conditions on the selected columns. An alternative way is to use HAVING clause which places conditions on groups created by the GROUP BYclause.

Looking at the above result, now suppose we want to group the types of services PRODUCT_CD and get only the types that the sum of the account is above 700. The query is as follows:

Summary

We have concluded our second part of the article resumed into formulating queries for retrieving data from a single table using MySQL and DBeaver. We have covered an explanation about GROUP BY and HAVING clause. We learned about the difference between WHERE and HAVING clause with examples.

We covered the fact that the SELECTand FROM clauses are the fundamental clauses required to retrieve information from the database and that the remaining clauses — WHERE , GROUP BY and HAVING— are used to conditionally process and filter the information returned by the SELECT clause.

In the next blog, the topic will involve JOIN clause which includes works with more than one table.

Thanks for reading and I hope that you find this blog useful.

Books References

  • John L. Viescas, Michael J. Hernandez (2014). SQL Queries for Mere Mortals, A Hand on Guide to Data Manipulation in SQL.

Link Reference:

Technothenet: https://www.techonthenet.com/mysql/select.php

Tracing my own path to become a Data Scientist in the business field