淺談 SQLAlchemy Relationship Lazy Loading
Posted on Jun 1, 2019 in Python 模組/套件推薦 by Amo Chen ‐ 3 min read
SQLAlchemy ORM(Object-Relational Mapping)利用 relationship() 將多個 Model 關聯在一起,讓我們直接用存取屬性方式,直接就能夠使用被關聯的 model ,使用起來十分方便。
但是 SQLAlchemy ORM 的 relationship()
預設使用一種稱為 lazy loding
的技術,如果不了解其原理的話,很容易會有 n+1 query
的問題。
本文將簡單地談談 SQLAlchemy relationship()
。
本文環境
- Python 3.6.5
- SQLAlchemy 1.3.3
relationship
首先看看以下 2 個 models, 分別是 User & Address, 每個 User 可以擁有多個地址,所以用 addresses = relationship('Address', uselist=True)
將 2 個 models 關聯起來。
# -*- coding: utf-8 -*-
import sqlalchemy.orm
from sqlalchemy import create_engine
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import ForeignKey
from sqlalchemy.orm import relationship
from sqlalchemy.orm import sessionmaker
Base = declarative_base()
class User(Base):
__tablename__ = 'user'
id = Column(Integer, primary_key=True)
name = Column(String)
username = Column(String)
addresses = relationship('Address', uselist=True)
def __init__(self, name, username):
self.name = name
self.username = username
class Address(Base):
__tablename__ = 'user_address'
id = Column(Integer, primary_key=True)
address = Column(String, nullable=False)
user_id = Column(Integer, ForeignKey('user.id'))
def __init__(self, address):
self.address = address
接著用以下程式試著看看存取 User.addresses 時 SQLAlchemy 執行什麼 SQL 語句:
engine = create_engine('sqlite:///:memory:', echo=True)
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
user = User('Jeff', 'jeff1111')
session.add(user)
session.commit()
user = session.query(User).first()
print(user.addresses)
print(user.addresses)
其結果是:
2019-06-01 16:48:08,675 INFO sqlalchemy.engine.base.Engine SELECT user.id AS user_id, user.name AS user_name, user.username AS user_username
FROM user
LIMIT ? OFFSET ?
2019-06-01 16:48:08,676 INFO sqlalchemy.engine.base.Engine (1, 0)
2019-06-01 16:48:08,681 INFO sqlalchemy.engine.base.Engine SELECT user_address.id AS user_address_id, user_address.address AS user_address_address, user_address.user_id AS user_address_user_id
FROM user_address
WHERE ? = user_address.user_id
2019-06-01 16:48:08,682 INFO sqlalchemy.engine.base.Engine (1,)
可以看到存取 User.addresses 時 SQLAlchemy 執行 1 次 SQL SELECT user_address.id AS user_address_id, user_address.address AS user_address_address, user_address.user_id AS user_address_user_id FROM user_address WHERE ? = user_address.user_id
從資料庫中取得 User 的 Addresses ,值得注意的是儘管我們存取 2 次 User.addresses, SQLAlchemy 仍只執行 1 次 Address 查詢。
這就是 relationship 預設的 lazy loading 設定 lazy='select'
, 只有存取到相關聯的屬性時,才會執行 1 個 SELECT 載入相關連的 models 。
items should be loaded lazily when the property is first accessed, using a separate SELECT statement, or identity map fetch for simple many-to-one references.
目前 relationship()
有多種設定可供選擇:
select
immediate
joined
subquery
selectin
noload
raise
raise_on_sql
dynamic
避免本文篇幅過長,在此不一一解釋僅解釋 immediate
, joined
與 subquery
3 種。
immediate
immediate
會直接在 User model 載入時,直接用額外的 SELECT
語句載入相關聯的 model 資料,可以將 User model 的 relationship 改為以下設定:
addresses = relationship('Address', uselist=True, lazy='immediate')
試試查詢 User model 看有沒有任何行為不一樣:
>>> user = session.query(User).first()
從以下 log 可以發現 lazy='immediate'
User model 一載入時就執行 SELECT
載入相關的 Address, 即使我們並沒有存取 addresses 屬性,也付出額外 1 個 SELECT
的成本:
2019-06-01 17:20:51,378 INFO sqlalchemy.engine.base.Engine SELECT user.id AS user_id, user.name AS user_name, user.username AS user_username
FROM user
LIMIT ? OFFSET ?
2019-06-01 17:20:51,378 INFO sqlalchemy.engine.base.Engine (1, 0)
2019-06-01 17:20:51,382 INFO sqlalchemy.engine.base.Engine SELECT user_address.id AS user_address_id, user_address.address AS user_address_address, user_address.user_id AS user_address_user_id
FROM user_address
WHERE ? = user_address.user_id
2019-06-01 17:20:51,383 INFO sqlalchemy.engine.base.Engine (1,)
p.s. 這種立即載入的設定稱為 eagerly loading
joined
joined
同樣會立即載入相關聯的 model, 但是用 JOIN
語句,而不是額外的 SELECT
, 同樣將 User.addresses 的relationship() 改為以下設定:
addresses = relationship('Address', uselist=True, lazy='joined')
試看看 joined
的行為:
>>> user = session.query(User).first()
從以下 log 可以發現,額外的 SELECT
被 JOIN
取而代之,從而節省 1 個 SELECT
的成本,此種設定十分適合每次查詢都ㄧ定會用到相關聯 model 的情況:
2019-06-01 17:34:04,257 INFO sqlalchemy.engine.base.Engine SELECT anon_1.user_id AS anon_1_user_id, anon_1.user_name AS anon_1_user_name, anon_1.user_username AS anon_1_user_username, user_address_1.id AS user_address_1_id, user_address_1.address AS user_address_1_address, user_address_1.user_id AS user_address_1_user_id
FROM (SELECT user.id AS user_id, user.name AS user_name, user.username AS user_username
FROM user
LIMIT ? OFFSET ?) AS anon_1 LEFT OUTER JOIN user_address AS user_address_1 ON anon_1.user_id = user_address_1.user_id
2019-06-01 17:34:04,258 INFO sqlalchemy.engine.base.Engine (1, 0)
subquery
subquery
與 joined
不同,而是使用子查詢(subquery)再將結果用 JOIN
的方式結合:
addresses = relationship('Address', uselist=True, lazy='subquery')
試看看 subquery
的行為:
>>> user = session.query(User).first()
以下 log 同樣可以發現子查詢加上 JOIN
的行為:
2019-06-01 17:46:37,352 INFO sqlalchemy.engine.base.Engine SELECT user.id AS user_id, user.name AS user_name, user.username AS user_username
FROM user
LIMIT ? OFFSET ?
2019-06-01 17:46:37,353 INFO sqlalchemy.engine.base.Engine (1, 0)
2019-06-01 17:46:37,355 INFO sqlalchemy.engine.base.Engine SELECT user_address.id AS user_address_id, user_address.address AS user_address_address, user_address.user_id AS user_address_user_id, anon_1.user_id AS anon_1_user_id
FROM (SELECT user.id AS user_id
FROM user
LIMIT ? OFFSET ?) AS anon_1 JOIN user_address ON anon_1.user_id = user_address.user_id ORDER BY anon_1.user_id
2019-06-01 17:46:37,355 INFO sqlalchemy.engine.base.Engine (1, 0)
以上就是幾個關於 SQLAlchemy lazy loading 設定的介紹。
Happy Coding!
References
https://docs.sqlalchemy.org/en/13/orm/relationship_api.html#sqlalchemy.orm.relationship
https://docs.sqlalchemy.org/en/13/orm/loading_relationships.html