PostgreSQL – JOIN

What is Joins ?

Joins are basically a technique to link data together link two tables together on a common column. That will allow you to pick and choose different columns from different tables and and link them together properly.

Correlated sub queries :

Remember the inner query is using data from the outer query and linking them together on a common column. So two sources of data are being linked together on a common column.

Joints are basically the similar concept except that you can join two different tables or multiple as many tables as you want across common columns.

if I wanted to see which employee works in which country I could actually write a query that you know (select first_name, country) from (employees,regions)tables the only thing is that you have to form a relationship between these two tables.


SELECT first_name, country
FROM employees, regions
WHERE employees.region_id = regions.region_id

Example 1.a : I’d like you to create a report that gives me the first name of the employee their email address as well as the division in which that employee works.

So I’d like to see employee first name their e-mail address and the division that they work for. And also I do not want to see any no data in the e-mail column.


SELECT first_name, e.department, email, division
FROM employees e,departments d
WHERE e.department = d.department
AND e.email IS NOT NULL

Example 1.b : I also want to see the country that that employee works in :-


SELECT first_name, e.department, email, division, r.country
FROM employees e,departments d, regions r
WHERE e.department = d.department
AND e.email IS NOT NULL
AND e.region_id = r.region_id

Example 2 : I’d like you to write a query that gives me the country as well as the total number of employees employed in that country.


SELECT r.country, count(*)
FROM employees e, regions r
WHERE e.region_id = r.region_id
GROUP BY r.country

A subquery can also be a source of data.


SELECT r.country, count(*)
FROM employees e, (SELECT * FROM regions) r
WHERE e.region_id = r.region_id
GROUP BY r.country

So a subquery can be a source of data just like a table could be a source of data.
Is this a correlated subquery or a non correlated query ?

This is a non correlated sub query because in this query in the sub query portion we are not using information from the outer query. We’re not linking anything inside of this sub query with the data that’s available on the outer query.There’s not a correlated subquery.

Example 1 :


SELECT frist_name, country
FROM employees INNER JOIN regions
ON employees.region_id = regions.region_id

Example 2 :


SELECT first_name, email, division
FROM employees INNER JOIN departments
ON employees.department = departments.department
WHERE email IS NOT NULL

We’re forming a relationship between employees and departments.I can form a relationship with all of this with another table and I can do INNER JOIN and then the new table name which is regions.

So this inner join with the third table is happening with the results of these two tables(employess, departments).

And if I wanted to add another Join I’m doing that down here with regions and I’m specifying that this region ID is going to be joined with the employees table.

so this third join that is happening here of the regions table is joining with the results of this whole thing.

This source of data right in the From clause together right together. This forms a source of data and then I’m adding another source of data joining with this data set.

We should get the country in which that employee works.


SELECT first_name, email, division, country
FROM employees INNER JOIN departments
ON employees.department = departments.department
INNER JOIN regions ON employees.region_id = regions.region_id
WHERE email IS NOT NULL

This is referred to as an inner join and what that means is it’s going to join on the like data meaning that the data that exists in the employees in the department column is being matched with the data that exists in the department column in the department’s table.

So only when they are equal to each other that’s the data that is going to be returned.

But let’s say that the department’s table had departments that were not in the Employees table or vice versa where the employees were you know assigned to departments that did not exist in departments table.

For that we do not use an INNER JOIN. We use a different kind of joint and that is called an OUTER JOIN.


SELECT distinct department FROM employees
-- 27 department

SELECT distinct department FROM departments
-- 24 department

SELECT distinct employees.department, departments.department
FROM employees INNER JOIN departments ON employees.department = departments.department

INNER JOIN used to get the matching values. How many records going to be returned ? Count will not be 27. We need common values between employees and department , because only going to give us the matching data.

Three more departments in the Employees table which means that the Employees table contains departments that do not exist in the departments table. So how could we expose this information. I could do something called a LEFT JOIN.

LEFT JOIN


SELECT distinct employees.department employees_department, 
		departments.department departments_department
FROM employees LEFT JOIN departments ON employees.department = departments.department

--27 Records

If there is no match in the departments table it will be null in the departments_department column.

This basically says that give me all of the departments from the Employees table regardless of whether or not they exist in the department’s table.

We’re going to join on this column the department column in both of the tables but the LEFT JOIN says give me all of the departments that exist in the Employees table regardless of whether or not they exist in the department’s table.

And left basically means give preference to the table on the left.

Obviously employees is on the left and it departments is on the right so that’s what that’s left join means.

There’s another thing called right join.

UNION ALL

Union eliminates duplicates while stacking the data on top of each other whereas UNION ALL does not eliminate duplicates and just takes the data as is from the top 1 and stacks it on top of the second query.


SELECT distinct department
FROM employees
UNION ALL
SELECT department
FROM departments

If we did not use the distinct key word the top query will get 1000 records, so eliminate duplicate records we use distinct here.

Now by the way these queries could be anything they could be sub queries and inside of Sub-Warden sort of sub queries for the top one and the bottom query could be just as complex.

The key thing to remember is that the columns must match.


SELECT distinct department, region_id
FROM employees
UNION ALL
SELECT department
FROM departments

So if you’ve got two columns in the top query Let’s say you want to get the distinct Department and then the region ID. This will not work ok because you cannot stack on top of one another data that is inconsistent in terms of columns.

Here we’ve got two columns in the top corridors two columns and the bottom query there is only one column so we will not know how to stack it.

If I execute this it’s going to give an error OK.

It’s saying each union query must have the same number of columns.

So if I was to add the division column this division column belongs in departments. This will not work because the region IDs numeric but the division is storing data. So if we execute this together it’s now going to say union types integer and character varying data cannot be matched.


SELECT distinct department, region_id
FROM employees
UNION ALL
SELECT department, division
FROM departments

So the data types not only the column count should be the same data types between the columns should also be the same.

ORDER BY should be always at the end of the query.


SELECT distinct department
FROM employees
UNION ALL
SELECT department
FROM departments
ORDER BY derpatment

Again this order by clause is applying to this entire Querrey and we can have other unions we can you know union some other Querrey here select first name as long as the data types match.

We can UNION multiple tables and subqueries.


SELECT  department
FROM employees
UNION ALL
SELECT department
FROM departments
UNION 
SELECT country
FROM regions
order by department

RIGHT JOIN

And the right join basically means give preference to the table on the right. And in that case that’s this table right here. Give me all of the departments in the department table regardless of whether or not they match the Employees table.


SELECT distinct employees.department employees_department, 
		departments.department departments_department
FROM employees RIGHT JOIN departments ON employees.department = departments.department

--24 Records

Excersise 1 : I’d like to see only those departments that exist in the Employees table and that do not exist in the department’s table.


SELECT distinct employees.department employees_department
FROM employees LEFT JOIN departments ON employees.department = departments.department
WHERE departments.department IS NULL

These are known as outer joins left outer join right outer join.

There’s also something called a full outer join and that is and that’s called Outer.I need to use something called FULL OUTER JOIN. Here no WHERE Clause.


SELECT distinct employees.department employees_department, 
		departments.department departments_department
FROM employees FULL OUTER JOIN departments ON employees.department = departments.department

--28 Records

you can see the full outer join expose’s both sides of the scenario.

UNION

There’s other ways to work with multiple tables or multiple sources of data one of them is called Union and there’s a special command called Union and that’s used to stack one set of data on top another.

Let’s say that I wanted to get the departments in the Employees table and the departments in the departments table or all of them and I wanted to stack them up on top of each other.I can use something called the Union.


SELECT department from employees
-- 1000 departments

SELECT distinct department from employees
-- 27 departments

SELECT department from departments
-- 24 departments	  

SELECT department
FROM employees
UNION
SELECT department
FROM departments

--28 departments(27 emploee departments + 1 department from departments table which is not used in employee tables)

select department from employees where department = 'Camping & Fishing'

There are certain departments in the Employees table that are not in the departments table and there’s certain departments in the departments table that do not exist and the employees. So I could actually combine both of these queries by using something called the Union.

And now it’s going to stack up on the top the departments in the Employees table and then it’s going to underneath that it’s going to give me the departments in the department table but it will only show the UNIQUE departments OK.

So using this union basically allows us to stack data up on top of each other.union is going to remove any duplicates.

EXCEPT

There’s something called Except And what this does is the except operator takes the first resultset and removes from it all rows found in the second resultset.

Again the accept operator takes the first resultset and removes from it all of the rows that are found in the second resultset.


SELECT  distinct department
FROM employees
EXCEPT
SELECT department
FROM departments

Excersise 1 : I’d like to see the department and the total number of employees working for that department and towards the end. I’d also like to see the total.


SELECT department,count(*)
FROM employees
GROUP BY department
UNION ALL
SELECT 'TOTAL',count(*)
FROM employees

Union which is used to stack it on top of each other but it EXCLUDES the duplicates.
Union all is used to do the same thing but it does NOT EXCLUDE the duplicate.
EXCEPT which is used to subtract one result from another.

Cartesian product – CROSS JOIN

It’s not an inner join it’s not an outer join it’s a cross join which basically means multiply each record in the first datasource with the with each record of the second data source.


SELECT * 
FROM employees a CROSS JOIN departments b

Excersise 1 : Write a query that returns the first name the department the hire date and the country of the first employee that we hired and that in the company. As well as the last employee that we hired in the company so I want to see two employees and how do we figure out who was hired first and who were who was hired last.


(SELECT e.first_name, e.department, e.hire_date, r.country
FROM employees e INNER JOIN regions r
ON e.region_id = r.region_id
WHERE hire_date = (SELECT MIN(hire_date) FROM employees e2)
LIMIT 1)
UNION
SELECT e.first_name, e.department, e.hire_date, r.country
FROM employees e INNER JOIN regions r
ON e.region_id = r.region_id
WHERE hire_date = (SELECT MAX(hire_date) FROM employees e2)
ORDER BY hire_date
firste_name department hire_date country
Norbie First Aid 2003-01-01 Canada
Barby Clothing 2016-12-26 Canada

Excersise 2 :

Create a report that can show how the spending for salaries of salary budget that we paid to our employees how that has fluctuated for every 90 day period.

Basically we’re going to be computing the sum of salaries for every 90 days starting with the higher date of the first employee all the way down to the last employee that we hired.


SELECT hire_date, salary,
(SELECT SUM(salary) FROM employees e2
	WHERE e2.hire_date BETWEEN e.hire_date - 90 AND e.hire_date) as spending_pattern
FROM employees e
ORDER BY hire_date

Note : In the subquery we compare the e2 values with e1 values. The subquery going to be excuted against every outer query(in the each record of the table).

//Output : Execute this query , the spending_pattern column value will be changing based on the 90 day window. If two dates are same then the spending_pattern value also be same. because it is same date.

VIEW

A view is basically like a virtual table. It’s generated via a sequel query. views are created based on queries.

You can’t insert data into a view or you can’t delete data from a view because they are based on a sequel query and they’re designed to make your life easier.

Every time I need a query that involves pulling these different columns I have to join these tables in the query.sometimes you can forget what the joints are supposed to be.So to make your life easier you can create something called a view that has the query already formulated and you can directly query the view to get the particular columns that you want without having to worry about joins and these views can be based on group by queries all kinds of queries and sequel that you’ve learned up to this point can be used to generate a view.Think of this as an object in the database.


CREATE VIEW v_employee_information as 

SELECT first_name, email, e.department, salary, division, region, country
FROM employees e, departments d, regions r
WHERE e.department = d.department
AND e.region_id = r.region_id

SELECT * FROM v_employee_information

Leave a Reply

Your email address will not be published. Required fields are marked *