Insert and Delete Queries

In this document I will describe the interface for performing insert and delete queries with doctable.


import pandas as pd
import numpy as np
import typing

import sys
sys.path.append('..')
import doctable

Define a demonstration schema

The very first step is to define a table schema that will be appropriate for our examples. This table includes the typical id column (the first column, specified by order=0), as well as string, integer, and boolean attributes. The object used to specify the schema is called a container, and I will use that terminology as we go.

@doctable.table_schema
class Record:
    name: str = doctable.Column(column_args=doctable.ColumnArgs(nullable=False, unique=True))
    age: int = doctable.Column()
    is_old: bool = doctable.Column()

    id: int = doctable.Column(
        column_args=doctable.ColumnArgs(
            order = 0, 
            primary_key=True, 
            autoincrement=True
        ),
    )

core = doctable.ConnectCore.open(target=':memory:', dialect='sqlite', echo=True)

with core.begin_ddl() as ddl:
    rtab = ddl.create_table_if_not_exists(container_type=Record)

2023-11-13 14:46:03,418 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 14:46:03,418 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("Record")
2023-11-13 14:46:03,419 INFO sqlalchemy.engine.Engine [raw sql] ()
2023-11-13 14:46:03,420 INFO sqlalchemy.engine.Engine PRAGMA temp.table_info("Record")
2023-11-13 14:46:03,421 INFO sqlalchemy.engine.Engine [raw sql] ()
2023-11-13 14:46:03,422 INFO sqlalchemy.engine.Engine 
CREATE TABLE "Record" (
    id INTEGER, 
    age INTEGER, 
    is_old INTEGER, 
    name VARCHAR NOT NULL, 
    PRIMARY KEY (id), 
    UNIQUE (name)
)


2023-11-13 14:46:03,423 INFO sqlalchemy.engine.Engine [no key 0.00050s] ()
2023-11-13 14:46:03,424 INFO sqlalchemy.engine.Engine COMMIT

Two Interfaces: `ConnectQuery` and `TableQuery`

First, a little about the doctable query interface. There are two interfaces for performing queries: ConnectQuery and TableQuery.

ConnectQuery table-agnostic interface for querying any table in any result format. Create this object using the ConnectCore.query() method.
TableQuery table-specific interface for querying a specific table. Insert and select from container objects used to define the schema. Create this object using the DBTable.query() method.

Inserts via `ConnectQuery`

First I will discuss the ConnectQuery interface, which is created via the ConnectCore.query() method. This object maintains a database connection, and, when used as a context manager, will commit all changes upon exit. It is fine to use the ConnectQuery object without a context manager for queries that do not require commits.

There are two primary methods for insertions via the ConnectQuery interface, which you can see in this table. Both accept a single DBTable object, followed by one or multiple dictionaries of data to insert, depending on the method.

Method	Description
`insert_single()`	Insert a single row into a table.
`insert_multi()`	Insert multiple rows into a table.

with core.query() as q:
    q.insert_single(rtab, {
        'name': 'test_A',
        'age': 10,
        'is_old': False,
    })

    q.insert_multi(rtab, data = [
        {
            'name': 'test_B',
            'age': 10,
            'is_old': False,
        },
        {
            'name': 'test_C',
            'age': 10,
            'is_old': False,
        }
    ])

q.select(rtab.all_cols()).df()

2023-11-13 14:46:03,478 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 14:46:03,480 INFO sqlalchemy.engine.Engine INSERT OR FAIL INTO "Record" (age, is_old, name) VALUES (?, ?, ?)
2023-11-13 14:46:03,482 INFO sqlalchemy.engine.Engine [generated in 0.00420s] (10, False, 'test_A')
2023-11-13 14:46:03,487 INFO sqlalchemy.engine.Engine INSERT OR FAIL INTO "Record" (age, is_old, name) VALUES (?, ?, ?)
2023-11-13 14:46:03,489 INFO sqlalchemy.engine.Engine [generated in 0.00216s] [(10, False, 'test_B'), (10, False, 'test_C')]
2023-11-13 14:46:03,491 INFO sqlalchemy.engine.Engine COMMIT
2023-11-13 14:46:03,494 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 14:46:03,495 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record"
2023-11-13 14:46:03,497 INFO sqlalchemy.engine.Engine [generated in 0.00319s] ()

	id	age	name
0	1	10	test_A
1	2	10	test_B
2	3	10	test_C

Omit Attributes

If some values are not provided, the database will decide which values they take. In this case, the database populates the ID column according to the schema (it acts as the primary key in this case).

core.query().insert_single(rtab, {
    'name': 'test_D',
})
core.query().select(rtab.all_cols()).df()

2023-11-13 14:46:03,544 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 14:46:03,546 INFO sqlalchemy.engine.Engine INSERT OR FAIL INTO "Record" (name) VALUES (?)
2023-11-13 14:46:03,548 INFO sqlalchemy.engine.Engine [generated in 0.00389s] ('test_D',)
2023-11-13 14:46:03,551 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 14:46:03,552 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record"
2023-11-13 14:46:03,554 INFO sqlalchemy.engine.Engine [cached since 0.06015s ago] ()

	id	age	is_old	name
0	1	10.0	0.0	test_A
1	2	10.0	0.0	test_B
2	3	10.0	0.0	test_C
3	4	NaN	NaN	test_D

Note that in our schema we set nullable=False for the name column, so this must be provided in an insert otherwise there will be an error. This typically results in an sqlalchemy.exc.IntegrityError, which you may catch if needed.

import sqlalchemy.exc

try:
    core.query().insert_single(rtab, {
        'is_old': True,
    })
except sqlalchemy.exc.IntegrityError as e:
    print(type(e), e)

2023-11-13 14:46:03,605 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 14:46:03,607 INFO sqlalchemy.engine.Engine INSERT OR FAIL INTO "Record" (is_old) VALUES (?)
2023-11-13 14:46:03,609 INFO sqlalchemy.engine.Engine [generated in 0.00380s] (True,)
<class 'sqlalchemy.exc.IntegrityError'> (sqlite3.IntegrityError) NOT NULL constraint failed: Record.name
[SQL: INSERT OR FAIL INTO "Record" (is_old) VALUES (?)]
[parameters: (True,)]
(Background on this error at: https://sqlalche.me/e/20/gkpj)

`ifnotunique` Parameter

The ifnotunique paramter controls the behavior when a unique constraint is violated.

The default value is FAIL, which will raise an error when a unique constraint is violated - it will raise an sqlalchemy.exc.IntegrityError exception in this case. The other options are IGNORE, meaning inserted rows that violate the constraints should be ignored, and REPLACE, which will replace the existing row with the new row.

In the Record table we have created, there is a unique constraint on name. We will receive an integrity error if we try to insert a duplicate there when using the default ifnotunique='ERROR'.

try:
    core.query().insert_single(rtab, {
        'name': 'test_A',
    })
except sqlalchemy.exc.IntegrityError as e:
    print(type(e), e)

2023-11-13 14:46:03,665 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 14:46:03,668 INFO sqlalchemy.engine.Engine INSERT OR FAIL INTO "Record" (name) VALUES (?)
2023-11-13 14:46:03,669 INFO sqlalchemy.engine.Engine [cached since 0.1247s ago] ('test_A',)
<class 'sqlalchemy.exc.IntegrityError'> (sqlite3.IntegrityError) UNIQUE constraint failed: Record.name
[SQL: INSERT OR FAIL INTO "Record" (name) VALUES (?)]
[parameters: ('test_A',)]
(Background on this error at: https://sqlalche.me/e/20/gkpj)

When using ifnotunique='REPLACE', the insert will replace the existing row with the new row. This is useful when you want to update a row if it already exists, but insert it if it does not.

core.query().insert_single(rtab, {
    'name': 'test_A',
}, ifnotunique='REPLACE')
core.query().select(rtab.all_cols()).df()

2023-11-13 14:46:03,728 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 14:46:03,731 INFO sqlalchemy.engine.Engine INSERT OR REPLACE INTO "Record" (name) VALUES (?)
2023-11-13 14:46:03,733 INFO sqlalchemy.engine.Engine [generated in 0.00453s] ('test_A',)
2023-11-13 14:46:03,738 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 14:46:03,739 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record"
2023-11-13 14:46:03,741 INFO sqlalchemy.engine.Engine [cached since 0.2469s ago] ()

	id	age	is_old	name
0	2	10.0	0.0	test_B
1	3	10.0	0.0	test_C
2	4	NaN	NaN	test_D
3	5	NaN	NaN	test_A

Inserts via `TableQuery`

The TableQuery interface is created via the DBTable.query() method. This object is table-specific, and is used to insert and select from a single table. As such, inserts ONLY accept the Record container objects used to define the schema. Following the ConnectQuery interface, there are two methods for inserting data into a table: .insert_single() and .insert_multi().

rtab.query().insert_single(Record(name='test_E', is_old=False, age=10))
rtab.query().select(rtab.all_cols())

2023-11-13 14:46:03,806 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 14:46:03,809 INFO sqlalchemy.engine.Engine INSERT OR FAIL INTO "Record" (age, is_old, name) VALUES (?, ?, ?)
2023-11-13 14:46:03,811 INFO sqlalchemy.engine.Engine [cached since 0.3334s ago] (10, False, 'test_E')
2023-11-13 14:46:03,814 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 14:46:03,815 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record"
2023-11-13 14:46:03,816 INFO sqlalchemy.engine.Engine [cached since 0.3218s ago] ()





[Record(name='test_B', age=10, is_old=0, id=2),
 Record(name='test_C', age=10, is_old=0, id=3),
 Record(name='test_D', age=None, is_old=None, id=4),
 Record(name='test_A', age=None, is_old=None, id=5),
 Record(name='test_E', age=10, is_old=0, id=6)]

rtab.query().insert_multi([
    Record(name='test_F', is_old=False, age=10),
    Record(name='test_G', is_old=True, age=80),
])
rtab.query().select(rtab.all_cols())

2023-11-13 14:46:03,866 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 14:46:03,869 INFO sqlalchemy.engine.Engine INSERT OR FAIL INTO "Record" (age, is_old, name) VALUES (?, ?, ?)
2023-11-13 14:46:03,870 INFO sqlalchemy.engine.Engine [cached since 0.3834s ago] [(10, False, 'test_F'), (80, True, 'test_G')]
2023-11-13 14:46:03,874 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 14:46:03,875 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record"
2023-11-13 14:46:03,876 INFO sqlalchemy.engine.Engine [cached since 0.3818s ago] ()





[Record(name='test_B', age=10, is_old=0, id=2),
 Record(name='test_C', age=10, is_old=0, id=3),
 Record(name='test_D', age=None, is_old=None, id=4),
 Record(name='test_A', age=None, is_old=None, id=5),
 Record(name='test_E', age=10, is_old=0, id=6),
 Record(name='test_F', age=10, is_old=0, id=7),
 Record(name='test_G', age=80, is_old=1, id=8)]

The `doctable.MISSING` Sentinel

Lets now take a closer look at the container object behavior. Notice that in the schema definition we gave default values of doctable.Column, which we used to specify additional attributes. This automatically sets the default value for the dataclass to be doctable.MISSING, which is special because it will be ignored when inserting - instead, it will let the database decide how to handle it. This is especially useful for columns like id, which are intended to be automatically generated by the database. We can see this when we omit attributes from the object.

test_record = Record(name='test_H')
test_record

Record(name='test_H', age=MISSING, is_old=MISSING, id=MISSING)

Those values will be omitted in the insert, filled in by the db, and be returned upon selection.

rtab.query().insert_single(test_record)
rtab.query().select(rtab.all_cols())

2023-11-13 14:46:03,982 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 14:46:03,985 INFO sqlalchemy.engine.Engine INSERT OR FAIL INTO "Record" (name) VALUES (?)
2023-11-13 14:46:03,987 INFO sqlalchemy.engine.Engine [cached since 0.4426s ago] ('test_H',)
2023-11-13 14:46:03,990 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 14:46:03,991 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record"
2023-11-13 14:46:03,992 INFO sqlalchemy.engine.Engine [cached since 0.4979s ago] ()





[Record(name='test_B', age=10, is_old=0, id=2),
 Record(name='test_C', age=10, is_old=0, id=3),
 Record(name='test_D', age=None, is_old=None, id=4),
 Record(name='test_A', age=None, is_old=None, id=5),
 Record(name='test_E', age=10, is_old=0, id=6),
 Record(name='test_F', age=10, is_old=0, id=7),
 Record(name='test_G', age=80, is_old=1, id=8),
 Record(name='test_H', age=None, is_old=None, id=9)]

If we select a subset of columns, the missing values will refer to doctable.MISSING, even though the attributes will continue to exist.

rtab.query().select(rtab.cols('id', 'name'))

2023-11-13 14:46:04,041 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 14:46:04,044 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".name 
FROM "Record"
2023-11-13 14:46:04,045 INFO sqlalchemy.engine.Engine [generated in 0.00350s] ()





[Record(name='test_A', age=MISSING, is_old=MISSING, id=5),
 Record(name='test_B', age=MISSING, is_old=MISSING, id=2),
 Record(name='test_C', age=MISSING, is_old=MISSING, id=3),
 Record(name='test_D', age=MISSING, is_old=MISSING, id=4),
 Record(name='test_E', age=MISSING, is_old=MISSING, id=6),
 Record(name='test_F', age=MISSING, is_old=MISSING, id=7),
 Record(name='test_G', age=MISSING, is_old=MISSING, id=8),
 Record(name='test_H', age=MISSING, is_old=MISSING, id=9)]

Note that the doctable.MISSING will never be inserted into the databse because it will be ignored.

Deletion Interface

Deleting rows is pretty straightforward when using either the ConnectQuery or TableQuery interfaces. In fact, it is the exact same for both. The only parameters are where and wherestr (where you can add additional conditionals as strings).

print(core.query().select([rtab['id'].count()]).scalar_one())
with rtab.query() as q:
    q.delete(where=rtab['name']=='test_A')
core.query().select([rtab['id'].count()]).scalar_one()

2023-11-13 14:46:31,486 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 14:46:31,489 INFO sqlalchemy.engine.Engine SELECT count("Record".id) AS count_1 
FROM "Record"
2023-11-13 14:46:31,490 INFO sqlalchemy.engine.Engine [cached since 27.39s ago] ()
7
2023-11-13 14:46:31,493 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 14:46:31,495 INFO sqlalchemy.engine.Engine DELETE FROM "Record" WHERE "Record".name = ?
2023-11-13 14:46:31,497 INFO sqlalchemy.engine.Engine [cached since 27.39s ago] ('test_A',)
2023-11-13 14:46:31,498 INFO sqlalchemy.engine.Engine COMMIT
2023-11-13 14:46:31,500 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 14:46:31,500 INFO sqlalchemy.engine.Engine SELECT count("Record".id) AS count_1 
FROM "Record"
2023-11-13 14:46:31,501 INFO sqlalchemy.engine.Engine [cached since 27.4s ago] ()





7

To delete all columns, pass the all=True flag. This prevents the user from accidentally deleting all rows.

with rtab.query() as q:
    q.delete(all=True)
core.query().select([rtab['id'].count()]).scalar_one()

2023-11-13 14:48:08,718 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 14:48:08,720 INFO sqlalchemy.engine.Engine DELETE FROM "Record"
2023-11-13 14:48:08,722 INFO sqlalchemy.engine.Engine [cached since 13.97s ago] ()
2023-11-13 14:48:08,724 INFO sqlalchemy.engine.Engine COMMIT
2023-11-13 14:48:08,726 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 14:48:08,727 INFO sqlalchemy.engine.Engine SELECT count("Record".id) AS count_1 
FROM "Record"
2023-11-13 14:48:08,729 INFO sqlalchemy.engine.Engine [cached since 124.6s ago] ()





0

Insert and Delete Queries

Define a demonstration schema

Two Interfaces: ConnectQuery and TableQuery

Inserts via ConnectQuery