%load_ext sql
from sqlalchemy import create_engine
%sql postgresql://postgres:****@localhost/dvdrental
engine = create_engine('postgresql://postgres:****@localhost/dvdrental')
The WHERE
statement is one of the most basic ways to filter data. In the example below, a specific amount of money is queried from the payment table using the WHERE
statement. This filters out any value in the amount column that is less than 11 euros. In other words, what is returned is any row in which the amount is greater than 11 euros.
%%sql
SELECT *
FROM payment
WHERE amount > 11;
* postgresql://postgres:***@localhost/dvdrental 8 rows affected.
payment_id | customer_id | staff_id | rental_id | amount | payment_date |
---|---|---|---|---|---|
20403 | 362 | 1 | 14759 | 11.99 | 2007-03-21 21:57:24.996577 |
22650 | 204 | 2 | 15415 | 11.99 | 2007-03-22 22:17:22.996577 |
23757 | 116 | 2 | 14763 | 11.99 | 2007-03-21 22:02:26.996577 |
24553 | 195 | 2 | 16040 | 11.99 | 2007-03-23 20:47:59.996577 |
24866 | 237 | 2 | 11479 | 11.99 | 2007-03-02 20:46:39.996577 |
28799 | 591 | 2 | 4383 | 11.99 | 2007-04-07 19:14:17.996577 |
28814 | 592 | 1 | 3973 | 11.99 | 2007-04-06 21:26:57.996577 |
29136 | 13 | 2 | 8831 | 11.99 | 2007-04-29 21:06:07.996577 |
The HAVING
statement can only be used in an aggregate function. An aggregate function is when the values of multiple rows are grouped together to form a single summary value. A good example of an aggregate function is the GROUP BY
statement which groups the value of a set of rows based on a specific criterion. The difference between HAVING
and WHERE
is that HAVING
applies to the result of a GROUP BY
operation, WHERE
applies before data are grouped by GROUP BY
.
The example below returns the total amount spent by each customer id:
%%sql
SELECT customer_id, sum(amount)
FROM payment
GROUP BY customer_id
LIMIT 10;
* postgresql://postgres:***@localhost/dvdrental 10 rows affected.
customer_id | sum |
---|---|
1 | 114.70 |
2 | 123.74 |
3 | 130.76 |
4 | 81.78 |
5 | 134.65 |
6 | 84.75 |
7 | 130.72 |
8 | 85.77 |
9 | 78.80 |
10 | 94.76 |
However, a more realistic question might ask which customers spent more than 170 Euros? The HAVING
statement can be used here after the data has been grouped by customer_id, to filter out any rows where the total amount spent is less than 170 euros.
%%sql
SELECT customer_id, sum(amount)
FROM payment
GROUP BY customer_id
HAVING sum(amount) > 170;
* postgresql://postgres:***@localhost/dvdrental 6 rows affected.
customer_id | sum |
---|---|
144 | 189.60 |
526 | 208.58 |
178 | 194.61 |
459 | 183.63 |
137 | 191.62 |
148 | 211.55 |
The main difference is that the HAVING
statement can only be used after an aggregate function, whereas the WHERE
statement can be used on non-aggregated data.
In short, HAVING
is always placed after the WHERE
and GROUP BY
clauses.
An example of this can be seen below:
%%sql
SELECT customer_id, sum(amount)
FROM payment
WHERE customer_id > 400
GROUP BY customer_id
HAVING sum(amount) > 170;
* postgresql://postgres:***@localhost/dvdrental 2 rows affected.
customer_id | sum |
---|---|
526 | 208.58 |
459 | 183.63 |