postgresSQL_Execution

Execution¶

An SQL query is written in a declarative format, but when it comes time to run a query, the order of execution is important to understand. Some queries are more efficient than others, even when the end result is the same. Efficiency becomes very important when working with large datasets.

SQL is executed in a systematic manner. The EXPLAIN statement can be used in PostgreSQL to output how the query will be executed. Instead of returning the rows of data, a Query Plan is returned. Contained within the parentheses are some statistics related to the Query Plan.

Below is an example of the EXPLAIN statement. The first part cost=0.00..253.96, is stating that the computational effort started at 0 and ended at 253.96 computations. This is an internal arbitrary number used by PostgreSQL, and does not refer to real time.

The second part rows=14596 width=26 refers to the estimated amount of rows that will return, and the width, which is an expression of the estimated average width of rows that will output in bytes of data. It is important to note that these are estimations, the actual execution statistics may differ slightly.

By using the EXPLAIN and ANALYSE statements, the output states the time it took to build the execution plan (Planning Time), as well as the actual time it took to execute that plan (Execution Time), 1.777 and 1.311 milliseconds, respectively.

These statistics can be very useful when optimising the execution time when querying very large tables.

QUERY PLAN
Seq Scan on payment (cost=0.00..253.96 rows=14596 width=26) (actual time=0.013..0.917 rows=14596 loops=1)
Planning Time: 1.777 ms
Execution Time: 1.311 ms

The next examples compare the execution time of the WHERE statement versus not using it. In this example using the WHERE statement, the planning time and execution time actually increases:

Using the WHERE statement, the planning time is 0.126 milliseconds, versus 0.056 milliseconds when not using the WHERE statement
Using the WHERE statement, the excution time is 2.094 milliseconds, versus 1.553 milliseconds when not using the WHERE statement

This means that the total estimated time can actually be less when returning more data if there are fewer steps in the execution plan.

In [3]:

%%sql

EXPLAIN ANALYSE
SELECT *
FROM payment
WHERE payment.amount > 10;

 * postgresql://postgres:***@localhost/dvdrental
5 rows affected.

Out[3]:

QUERY PLAN
Seq Scan on payment (cost=0.00..290.45 rows=107 width=26) (actual time=0.091..2.083 rows=107 loops=1)
Filter: (amount > '10'::numeric)
Rows Removed by Filter: 14489
Planning Time: 0.126 ms
Execution Time: 2.094 ms

QUERY PLAN
Seq Scan on payment (cost=0.00..253.96 rows=14596 width=26) (actual time=0.008..1.029 rows=14596 loops=1)
Planning Time: 0.056 ms
Execution Time: 1.553 ms

Scanning¶

Scanning is a linear operation, moving from one row to the next row, while applying an operation. This operation could be a filter, for example the WHERE statement demonstrated above.

Scanning small tables with few rows can be efficient, however for large tables with many rows, scanning becomes inefficient. Scanning all the rows in a table is called a 'full table scan' or a 'sequential scan'.

In future posts I will introduce indexes that can combat the poor inefficiency of full table scans.

full table scan