SQLAlchemy ORM(Object-Relational Mapping)利用 relationship() 將多個 Model 關聯在一起,讓我們直接用存取屬性方式,直接就能夠使用被關聯的 model ,使用起來十分方便。

但是 SQLAlchemy ORM 的 relationship() 預設使用一種稱為 lazy loding 的技術,如果不了解其原理的話,很容易會有 n+1 query 的問題。

本文將簡單地談談 SQLAlchemy relationship()

本文環境

  • Python 3.6.5
  • SQLAlchemy 1.3.3

relationship

首先看看以下 2 個 models, 分別是 User & Address, 每個 User 可以擁有多個地址,所以用 addresses = relationship('Address', uselist=True) 將 2 個 models 關聯起來。

# -*- coding: utf-8 -*-
import sqlalchemy.orm

from sqlalchemy import create_engine
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import ForeignKey
from sqlalchemy.orm import relationship
from sqlalchemy.orm import sessionmaker


Base = declarative_base()


class User(Base):
    __tablename__ = 'user'

    id = Column(Integer, primary_key=True)
    name = Column(String)
    username = Column(String)
    addresses = relationship('Address', uselist=True)

    def __init__(self, name, username):
        self.name = name
        self.username = username


class Address(Base):
    __tablename__ = 'user_address'
    id = Column(Integer, primary_key=True)
    address = Column(String, nullable=False)
    user_id = Column(Integer, ForeignKey('user.id'))

    def __init__(self, address):
        self.address = address

接著用以下程式試著看看存取 User.addresses 時 SQLAlchemy 執行什麼 SQL 語句:

engine = create_engine('sqlite:///:memory:', echo=True)
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

user = User('Jeff', 'jeff1111')
session.add(user)
session.commit()

user = session.query(User).first()
print(user.addresses)
print(user.addresses)

其結果是:

2019-06-01 16:48:08,675 INFO sqlalchemy.engine.base.Engine SELECT user.id AS user_id, user.name AS user_name, user.username AS user_username
FROM user
 LIMIT ? OFFSET ?
2019-06-01 16:48:08,676 INFO sqlalchemy.engine.base.Engine (1, 0)
2019-06-01 16:48:08,681 INFO sqlalchemy.engine.base.Engine SELECT user_address.id AS user_address_id, user_address.address AS user_address_address, user_address.user_id AS user_address_user_id
FROM user_address
WHERE ? = user_address.user_id
2019-06-01 16:48:08,682 INFO sqlalchemy.engine.base.Engine (1,)

可以看到存取 User.addresses 時 SQLAlchemy 執行 1 次 SQL SELECT user_address.id AS user_address_id, user_address.address AS user_address_address, user_address.user_id AS user_address_user_id FROM user_address WHERE ? = user_address.user_id 從資料庫中取得 User 的 Addresses ,值得注意的是儘管我們存取 2 次 User.addresses, SQLAlchemy 仍只執行 1 次 Address 查詢。

這就是 relationship 預設的 lazy loading 設定 lazy='select' , 只有存取到相關聯的屬性時,才會執行 1 個 SELECT 載入相關連的 models 。

items should be loaded lazily when the property is first accessed, using a separate SELECT statement, or identity map fetch for simple many-to-one references.

目前 relationship() 有多種設定可供選擇:

  • select
  • immediate
  • joined
  • subquery
  • selectin
  • noload
  • raise
  • raise_on_sql
  • dynamic

避免本文篇幅過長,在此不一一解釋僅解釋 immediate , joinedsubquery 3 種。

immediate

immediate 會直接在 User model 載入時,直接用額外的 SELECT 語句載入相關聯的 model 資料,可以將 User model 的 relationship 改為以下設定:

addresses = relationship('Address', uselist=True, lazy='immediate')

試試查詢 User model 看有沒有任何行為不一樣:

>>> user = session.query(User).first()

從以下 log 可以發現 lazy='immediate' User model 一載入時就執行 SELECT 載入相關的 Address, 即使我們並沒有存取 addresses 屬性,也付出額外 1 個 SELECT 的成本:

2019-06-01 17:20:51,378 INFO sqlalchemy.engine.base.Engine SELECT user.id AS user_id, user.name AS user_name, user.username AS user_username
FROM user
 LIMIT ? OFFSET ?
2019-06-01 17:20:51,378 INFO sqlalchemy.engine.base.Engine (1, 0)
2019-06-01 17:20:51,382 INFO sqlalchemy.engine.base.Engine SELECT user_address.id AS user_address_id, user_address.address AS user_address_address, user_address.user_id AS user_address_user_id
FROM user_address
WHERE ? = user_address.user_id
2019-06-01 17:20:51,383 INFO sqlalchemy.engine.base.Engine (1,)

p.s. 這種立即載入的設定稱為 eagerly loading

joined

joined 同樣會立即載入相關聯的 model, 但是用 JOIN 語句,而不是額外的 SELECT , 同樣將 User.addresses 的relationship() 改為以下設定:

addresses = relationship('Address', uselist=True, lazy='joined')

試看看 joined 的行為:

>>> user = session.query(User).first()

從以下 log 可以發現,額外的 SELECTJOIN 取而代之,從而節省 1 個 SELECT 的成本,此種設定十分適合每次查詢都ㄧ定會用到相關聯 model 的情況:

2019-06-01 17:34:04,257 INFO sqlalchemy.engine.base.Engine SELECT anon_1.user_id AS anon_1_user_id, anon_1.user_name AS anon_1_user_name, anon_1.user_username AS anon_1_user_username, user_address_1.id AS user_address_1_id, user_address_1.address AS user_address_1_address, user_address_1.user_id AS user_address_1_user_id
FROM (SELECT user.id AS user_id, user.name AS user_name, user.username AS user_username
FROM user
 LIMIT ? OFFSET ?) AS anon_1 LEFT OUTER JOIN user_address AS user_address_1 ON anon_1.user_id = user_address_1.user_id
2019-06-01 17:34:04,258 INFO sqlalchemy.engine.base.Engine (1, 0)

subquery

subqueryjoined 不同,而是使用子查詢(subquery)再將結果用 JOIN 的方式結合:

addresses = relationship('Address', uselist=True, lazy='subquery')

試看看 subquery 的行為:

>>> user = session.query(User).first()

以下 log 同樣可以發現子查詢加上 JOIN 的行為:

2019-06-01 17:46:37,352 INFO sqlalchemy.engine.base.Engine SELECT user.id AS user_id, user.name AS user_name, user.username AS user_username
FROM user
 LIMIT ? OFFSET ?
2019-06-01 17:46:37,353 INFO sqlalchemy.engine.base.Engine (1, 0)
2019-06-01 17:46:37,355 INFO sqlalchemy.engine.base.Engine SELECT user_address.id AS user_address_id, user_address.address AS user_address_address, user_address.user_id AS user_address_user_id, anon_1.user_id AS anon_1_user_id
FROM (SELECT user.id AS user_id
FROM user
 LIMIT ? OFFSET ?) AS anon_1 JOIN user_address ON anon_1.user_id = user_address.user_id ORDER BY anon_1.user_id
2019-06-01 17:46:37,355 INFO sqlalchemy.engine.base.Engine (1, 0)

以上就是幾個關於 SQLAlchemy lazy loading 設定的介紹。

Happy Coding!

References

https://docs.sqlalchemy.org/en/13/orm/relationship_api.html#sqlalchemy.orm.relationship

https://docs.sqlalchemy.org/en/13/orm/loading_relationships.html