Execution

An SQL query is written in a declarative format, but when it comes time to run a query, the order of execution is important to understand. Some queries are more efficient than others, even when the end result is the same. Efficiency becomes very important when working with large datasets.

SQL is executed in a systematic manner. The EXPLAIN statement can be used in PostgreSQL to output how the query will be executed. Instead of returning the rows of data, a Query Plan is returned. Contained within the parentheses are some statistics related to the Query Plan.

Below is an example of the EXPLAIN statement. The first part cost=0.00..253.96, is stating that the computational effort started at 0 and ended at 253.96 computations. This is an internal arbitrary number used by PostgreSQL, and does not refer to real time.

The second part rows=14596 width=26 refers to the estimated amount of rows that will return, and the width, which is an expression of the estimated average width of rows that will output in bytes of data. It is important to note that these are estimations, the actual execution statistics may differ slightly.

By using the EXPLAIN and ANALYSE statements, the output states the time it took to build the execution plan (Planning Time), as well as the actual time it took to execute that plan (Execution Time), 1.777 and 1.311 milliseconds, respectively.

These statistics can be very useful when optimising the execution time when querying very large tables.

The next examples compare the execution time of the WHERE statement versus not using it. In this example using the WHERE statement, the planning time and execution time actually increases:

This means that the total estimated time can actually be less when returning more data if there are fewer steps in the execution plan.

Scanning

Scanning is a linear operation, moving from one row to the next row, while applying an operation. This operation could be a filter, for example the WHERE statement demonstrated above.

Scanning small tables with few rows can be efficient, however for large tables with many rows, scanning becomes inefficient. Scanning all the rows in a table is called a 'full table scan' or a 'sequential scan'.

In future posts I will introduce indexes that can combat the poor inefficiency of full table scans.

full table scan