postgresSQL_Selects

What is a Table?¶

A table contains data. A table is used to organise data so that it can be manipulated and analysed in various ways applicable to the questions being asked of the data. A data table can be broken down further to its constituent parts:

rows
columns

Each row in a table represents one entry, in the sketch below, the row entry from left to right is, Jane, Lawyer, England and 27. Each column in a table represents a feature, the below example shows the "Country" feature. It is important to note that each entry in the column is of the same datatype.

datatable

Select¶

The most fundamental keyword in SQL is the SELECT command. This command allows retrieval of selected data from a database. Usually the SELECT command is used in conjunction with the FROM command.

SELECT specifies columns of interest
FROM specifies where these columns are stored

Often SQL scripts will work without terminating a statement with a semicolon, but it is best practice to end a statement with a semi colon (;). It is also good practice to write all commands in uppercase, such as SELECT.

The SELECT can specify what columns should be returned from what tables, or what records should be returned based on the value of certain columns. The following example uses SELECT to select data FROM the payment table, the query reads like plain english.

In [2]:

%%sql

SELECT * FROM payment
LIMIT 5;

 * postgresql://postgres:***@localhost/dvdrental
5 rows affected.

Out[2]:

payment_id	customer_id	staff_id	rental_id	amount	payment_date
17503	341	2	1520	7.99	2007-02-15 22:25:46.996577
17504	341	1	1778	1.99	2007-02-16 17:23:14.996577
17505	341	1	1849	7.99	2007-02-16 22:41:45.996577
17506	341	2	2829	2.99	2007-02-19 19:39:56.996577
17507	341	2	3130	7.99	2007-02-20 17:31:48.996577

The Wildcard¶

The asterisk (*) is a wildcard, and returns all columns from a given table.

In the above query, the SELECT statement uses the wildcard to return all of the columns FROM the table called payment. The LIMIT command at the end of the query restricts the rows affected to 5, in order not to return all of the rows, which could be many thousands.

Above is returned 6 columns of data, where the rows returned have been limited to 5. If the LIMIT command was not used, 14596 rows would be returned.

There are times when retrieving all of the column data in a table is necessary, but in most cases this leads to slow performance. Especially when dealing with very large datasets. For this reason, the wildcard should be used sparingly when writing queries.

Note also the convention to use all lower case for column titles as well as underscores to seperate words.

Explicit Notation¶

Instead of using the wildcard to SELECT all of the columns data, querys can be more specific. Explicit notation lists specific columns of interest, which can increase readability, and make code easier to troubleshoot.

The following query selects only the payment_id and customer_id columns from the payment table, so instead of all 6 columns, what is returned is the specified 2.

payment_id	customer_id
17503	341
17504	341
17505	341
17506	341
17507	341

Primary and Foreign Keys¶

A PRIMARY KEY is a column that contains a unique identifier for the entire row. Its only job is to make sure each row has one piece of information that makes sure the entire row is uniquely identifiable. It does not have any other significance. A PRIMARY KEY cannot contain NULL values.

A FOREIGN KEY is the link between two tables in a database that allows for a relational database. A FOREIGN KEY is a field (or collection of fields) in one table, that refers to the PRIMARY KEY in another table. FOREIGN KEY constraints are useful in that they protect the integrity of the database by not allowing data to exist that does not relate to the PRIMARY KEY constraint.

keys

To demonstrate primary and foreign keys using PostgreSQL, the same as in the sketch above, in the following table called "rental", customer_id is a FOREIGN KEY that refers to the PRIMARY KEY in the "payment" table. customer_id is the relation between the two tables. This relational concept is very important to understand when working with relational databases.

customer_id
87
184
477
273
550