Normalising the data

Normalisation is the division of attributes or table columns into several relations or tables so that no redundancies are included.


In the following example, we normalise the language in which the books were published.

  1. To do this, we first create a new table languages with the columns id and language_code:

    7    """CREATE TABLE languages
    8                  (id INTEGER PRIMARY KEY AUTOINCREMENT,
    9                  language_code VARCHAR(2))"""
  2. Then we create the values de and en in this table:

    13    """INSERT INTO languages (language_code)
    14                  VALUES ('de')"""
    18    """INSERT INTO languages (language_code)
  3. Since SQLite does not support MODIFY COLUMN, we now create a temporary table temp with all columns from books and a column language_code that uses the column id from the languages table as a foreign key:

    23    """CREATE TABLE "temp" (
    24                  "id" INTEGER,
    25                  "title" TEXT,
    26                  "language_code" INTEGER REFERENCES languages(id),
    27                  "language" TEXT,
    28                  "author" TEXT,
    29                  "license" TEXT,
    30                  "release_date" DATE,
    31                  PRIMARY KEY("id" AUTOINCREMENT)
    32                  )"""
  4. Now we transfer the values from the books table to the temp table:

    36    """INSERT INTO temp (title,language,author,license,release_date)
    37                  SELECT title,language,author,license,release_date FROM books"""
  5. Transfer the specification of the language in books as the id of the data records from the languages table to temp.

    41    """UPDATE temp
    42                  SET language_code = 1
    43                  WHERE language = 'de'"""
  6. Now we can delete the languages column in the temp table:

    55cursor.execute("""ALTER TABLE temp DROP COLUMN language""")


    DROP COLUMN can only be used from Python versions from 3.8 that were released after 27 April 2021.

    With older Python versions, another table would have to be created that no longer contains the languages column and then the data records from temp would have to be inserted into this table.

  7. The books table can now also be deleted:

    57cursor.execute("""DROP TABLE books""")
  8. And finally, the temp table can be renamed books:

    59cursor.execute("""ALTER TABLE temp RENAME TO books""")