Recommended Python Modules/Packages - itertools & more-itertools

Posted on Aug 9, 2020 in Python Programming - Beginner, Recommended Python Modules/Packages by Amo Chen - 4 min read

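Python ships with many useful built-in modules. Used well, they not only save development time but also make code more concise.

itertools is one of them. It provides many functions for working conveniently with iterables (for example dict, list, tuple, and str), such as cyclic traversal, grouping (group by), and Cartesian products.

If the functions built into itertools are not enough, you can also install more-itertools, which provides many additional functions.

This article introduces several simple, easy-to-use functions from itertools and more-itertools as alternatives to writing them yourself. They save you from reinventing the wheel and make your programs more elegant and concise.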

Environment

Python 3.7
more-itertools 8.4.0

$ pip install more-itertools==8.4.0

cycle

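Sometimes you need to traverse a list in a cycle, as in the following case:

A -> B -> C -> D -> A -> B -> C …

This is where cycle from itertools comes in. The following example walks through A, B, C, D, starts over from A after reaching D, and stops after going around twice: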

from itertools import cycle


count = 1
for x in cycle(['A', 'B', 'C', 'D']):
    print(count, x)
    if count == 8:
        break
    count += 1

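The output is shown below. Note that cycle has no option for setting a stop condition, so you have to control when to stop yourself; otherwise the loop never ends. That is why the example above needs the break statement: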

1 A
2 B
3 C
4 D
5 A
6 B
7 C
8 D
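
As a variation on the manual counter, the endless iterator can also be capped with islice from the same module. This is only a sketch of one possible approach, not something the original example does; the variable name items is chosen here for illustration.

```python
from itertools import cycle, islice

# islice stops the otherwise endless cycle after 8 items (2 full rounds)
items = list(islice(cycle(['A', 'B', 'C', 'D']), 8))

# enumerate replaces the hand-maintained counter
for count, x in enumerate(items, start=1):
    print(count, x)
```

The loop body no longer needs a break, because the iterator itself is finite.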

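To cycle a fixed number of times without managing the counter yourself as in the previous example, you can use ncycles from more-itertools.

For example, the following walks through A, B, C, D exactly 2 times. Unlike cycle, there is no need to control when to stop, so the code is more concise: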

from more_itertools import ncycles


for x in ncycles(['A', 'B', 'C', 'D'], 2):
    print(x)
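
If installing more-itertools is not an option, the same behavior can be approximated with the standard library alone. The sketch below is modeled on the ncycles recipe in the itertools documentation; the name ncycles_sketch is made up here to avoid suggesting it is the library's own code.

```python
from itertools import chain, repeat


def ncycles_sketch(iterable, n):
    """Return the elements of iterable repeated n times (stdlib-only sketch)."""
    # tuple() stores the elements so they can be replayed n times,
    # even if the input is a one-shot iterator
    return chain.from_iterable(repeat(tuple(iterable), n))


result = list(ncycles_sketch(['A', 'B', 'C', 'D'], 2))
print(result)
```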

filterfalse

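filterfalse is well suited to finding the elements of an iterable that do not satisfy a condition. For example, suppose every user should have at least 1 habit; to find the users who have none, use filterfalse: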

from itertools import filterfalse


users = [
    {'user_id': 1, 'habits': ['fishing', 'hiking']},
    {'user_id': 2, 'habits': []},
    {'user_id': 3, 'habits': ['drawing']},
    {'user_id': 4, 'habits': ['swimming']},
]
# Keep only the users whose 'habits' list is empty (falsy)
for x in filterfalse(lambda d: d['habits'], users):
    print(x)

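The lambda d: d['habits'] passed to filterfalse in the example above is called the predicate. filterfalse evaluates the truthiness of the predicate's return value and yields only the elements for which it is false. Here the predicate returns the habits list, and an empty list is falsy, so the elements that come through are the users without habits.

The example above finds the users without habits. Its output is:

{'user_id': 2, 'habits': []}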
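
To make the relationship with the built-in filter concrete, the sketch below runs both over the same data with the same predicate. The helper has_habits and the sample list are illustrative assumptions, not part of the original example.

```python
from itertools import filterfalse

# Hypothetical sample data in the same shape as the example above
users = [
    {'user_id': 1, 'habits': ['fishing', 'hiking']},
    {'user_id': 2, 'habits': []},
    {'user_id': 3, 'habits': ['drawing']},
    {'user_id': 4, 'habits': ['swimming']},
]


def has_habits(user):
    # An empty list is falsy, so bool() gives an explicit True/False
    return bool(user['habits'])


# filter keeps elements where the predicate is true,
# filterfalse keeps the ones where it is false
with_habits = [u['user_id'] for u in filter(has_habits, users)]
without_habits = [u['user_id'] for u in filterfalse(has_habits, users)]

print(with_habits)
print(without_habits)
```

Together the two results cover every element exactly once, which is a quick way to split a list into matching and non-matching parts.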

groupby

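itertools also provides groupby for splitting an iterable into groups. The name suggests it is as convenient as GROUP BY in SQL, but that is not quite the case: groupby only groups consecutive elements that share the same key, so the iterable must first be sorted by the grouping key. For example, the data below is already sorted by group, which is why the example works as expected: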

from itertools import groupby


# Already sorted by 'group', so each group appears as one consecutive run
records = [
    {'id': 1, 'group': 'A'},
    {'id': 4, 'group': 'A'},
    {'id': 5, 'group': 'B'},
    {'id': 2, 'group': 'B'},
    {'id': 3, 'group': 'C'},
    {'id': 6, 'group': 'C'},
]
for group, members in groupby(records, lambda x: x['group']):
    print(group, list(members))

The output is shown below. groupby successfully splits the data into 3 groups, A, B, and C:

A [{'id': 1, 'group': 'A'}, {'id': 4, 'group': 'A'}]
B [{'id': 5, 'group': 'B'}, {'id': 2, 'group': 'B'}]
C [{'id': 3, 'group': 'C'}, {'id': 6, 'group': 'C'}]
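
The sorting requirement is easiest to see with data that is not sorted. The following sketch uses a small made-up list (unsorted_records and by_group are names chosen for illustration) and compares grouping before and after sorting by the same key.

```python
from itertools import groupby


def by_group(record):
    return record['group']


# Hypothetical data where group 'A' is split by a 'B' record
unsorted_records = [
    {'id': 1, 'group': 'A'},
    {'id': 5, 'group': 'B'},
    {'id': 4, 'group': 'A'},
]

# Without sorting, 'A' shows up as two separate groups
fragmented = [key for key, _ in groupby(unsorted_records, by_group)]

# Sorting by the same key first gives one group per key
grouped = {
    key: [r['id'] for r in members]
    for key, members in groupby(sorted(unsorted_records, key=by_group), by_group)
}

print(fragmented)
print(grouped)
```

Using the same key function for sorted and groupby keeps the two steps consistent.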

product

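A perfect use case for product is anything that needs nested loops, such as a 9 * 9 multiplication table. The most intuitive implementation of the 9 * 9 multiplication table uses a double loop: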

for x in range(1, 10):
    for y in range(1, 10):
        print(x, y, x*y)

With product, however, a single loop is enough:

from itertools import product


for x, y in product(range(1, 10), range(1, 10)):
    print(x, y, x*y)

Of course, product can also replace deeper nesting, because it accepts any number of iterables, for example product(iterable1, iterable2, iterable3, ...). So a 2 * 3 * 4 multiplication table can likewise be condensed into a single loop:

from itertools import product


for x, y, z in product(range(1, 3), range(1, 4), range(1, 5)):
    print(x, y, z, x*y*z)
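
When every level iterates over the same range, as in the 9 * 9 table, product also accepts a repeat keyword argument so the iterable only has to be written once. The sketch below checks that this yields the same pairs, in the same order, as the nested loops; the names pairs and nested are illustrative.

```python
from itertools import product

# repeat=2 is equivalent to product(range(1, 10), range(1, 10))
pairs = list(product(range(1, 10), repeat=2))

# The same pairs produced by an explicit double loop
nested = [(x, y) for x in range(1, 10) for y in range(1, 10)]

print(len(pairs))
print(pairs == nested)
```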

flatten

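When working with 2-dimensional arrays, one common task is visiting every element in the array.

For example, to visit every element of the 2-dimensional array below, the intuitive approach is again a 2-level loop:

matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

In practice, though, you can use flatten to flatten all the values of the 2-dimensional array into a single 1-dimensional sequence and iterate over that, which also makes the code look a bit cleaner: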

from more_itertools import flatten

matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

# flatten yields the elements of each inner list in order
for x in flatten(matrix):
    print(x)

Running the flatten example above prints each element in order:

1
2
3
4
5
6
7
8
9
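
For comparison, the standard library's chain.from_iterable gives the same result for a list of lists. The sketch below also shows that this kind of flattening removes only one level of nesting; the variable names and the deeper nested sample are assumptions made for illustration.

```python
from itertools import chain

matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

# Chain the inner lists together into one flat sequence
flat = list(chain.from_iterable(matrix))
print(flat)

# Only one level is removed: the inner [2, 3] stays a list
nested = [[1, [2, 3]], [4]]
one_level = list(chain.from_iterable(nested))
print(one_level)
```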

islice

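Anyone who uses Python knows slicing. For example, given x = [1, 2, 3, 4], x[1:3] returns the 2 elements [2, 3]. islice works much like a slice; the difference is that islice returns a lazy iterator instead of directly returning the sliced result.

The following example uses islice to get the same elements as the slice [2:4] of a list: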

from itertools import islice


numbers = [1, 2, 3, 4, 5, 6]

# islice returns a lazy iterator; elements are produced as the loop consumes them
sliced = islice(numbers, 2, 4)
for i in sliced:
    print(i)

The output is shown below. We get the 2 elements 3 and 4, the same values that slicing the list with [2:4] directly would give.

3
4
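
Where islice really earns its place is with iterables that do not support slicing at all, such as generators. The sketch below uses a made-up infinite generator, squares, to show that islice can still pick out a range of positions.

```python
from itertools import islice


def squares():
    """Hypothetical infinite generator: 1, 4, 9, 16, ..."""
    n = 1
    while True:
        yield n * n
        n += 1


# squares()[2:4] would raise TypeError because generators are not subscriptable;
# islice consumes the generator only as far as it needs to
picked = list(islice(squares(), 2, 4))
print(picked)
```

Because islice stops at the requested end position, the infinite generator is never exhausted.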

grouper

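Next, batch processing, which you are very likely to run into: for example, processing the items of a list in groups of 500. grouper handles this need. The following example takes the data in groups of 2: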

from more_itertools import grouper


for group in grouper([1, 2, 3, 4, 5, 6, 7], 2):
    print(group)

The output is shown below. Note that when fewer than 2 elements remain, grouper pads the group with None by default:

(1, 2)
(3, 4)
(5, 6)
(7, None)

To change the default fill value of None, pass one more argument:

from more_itertools import grouper


for group in grouper([1, 2, 3, 4, 5, 6, 7], 2, fillvalue=-1):
    print(group)

The example above changes the fill value from None to -1. Its output is:

(1, 2)
(3, 4)
(5, 6)
(7, -1)
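
The padding behavior can be reproduced with zip_longest from the standard library, which may help explain where the fill value comes from. This is a sketch under the assumption that a stdlib-only equivalent is acceptable; grouper_sketch is a name invented here and is not the library's implementation.

```python
from itertools import zip_longest


def grouper_sketch(iterable, n, fillvalue=None):
    """Collect items into fixed-length tuples, padding the last one."""
    # n references to the SAME iterator, so zip_longest pulls
    # n consecutive items for each output tuple
    iterators = [iter(iterable)] * n
    return zip_longest(*iterators, fillvalue=fillvalue)


groups = list(grouper_sketch([1, 2, 3, 4, 5, 6, 7], 2, fillvalue=-1))
print(groups)
```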

ichunked

ichunked works similarly to grouper, but it does not pad the last chunk, and each value it yields is an islice iterator (in the version used here). If you do not need padding, ichunked is a good choice:

from more_itertools import ichunked


for chunk in ichunked([1, 2, 3, 4, 5, 6, 7], 2):
    print(chunk, list(chunk))

The output is shown below. Unlike with grouper, [7] is not padded to length 2:

<itertools.islice object at 0x7f818a4ac530> [1, 2]
<itertools.islice object at 0x7f818a4bc110> [3, 4]
<itertools.islice object at 0x7f818a4bc590> [5, 6]
<itertools.islice object at 0x7f818a4bc4d0> [7]
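
For batching without padding using only the standard library, repeatedly taking n items with islice is one common pattern. The sketch below is a simplified stand-in, not how ichunked is implemented: it yields plain lists rather than lazy iterators, and the name chunks_sketch is made up for illustration.

```python
from itertools import islice


def chunks_sketch(iterable, n):
    """Yield lists of up to n items; the last list may be shorter."""
    iterator = iter(iterable)
    while True:
        # Take the next n items (or fewer, if the input is running out)
        chunk = list(islice(iterator, n))
        if not chunk:
            return
        yield chunk


chunks = list(chunks_sketch([1, 2, 3, 4, 5, 6, 7], 2))
print(chunks)
```

Yielding lists keeps each batch easy to inspect, at the cost of holding one full batch in memory at a time.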

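That wraps up this introduction to itertools and more-itertools. The official documentation describes many more functions not covered here; if you are interested, it is well worth reading.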

Happy Coding!

References

https://docs.python.org/3/library/itertools.html

https://more-itertools.readthedocs.io/en/stable/index.html
