Learning Google Protocol Buffers with Python - Part 1

Posted on Oct 27, 2018 in Python Programming - Advanced by Amo Chen ‐ 4 min read


This article is the first part of a tutorial series.

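The article What I Learned from Quip on How to Build a Product on 8 Different Platforms with Only 13 Engineers explains how Quip built a product for 8 different platforms at once with a team of only 13 engineers. It is well worth learning from.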

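A key idea in that article is "build once, use multiple times": avoid building the same component over and over, and reuse components as much as possible. The article also reveals that Quip relies heavily on Google Protocol Buffers. Once a data structure is defined with Google Protocol Buffers, code that reads and writes that same structure can be generated automatically for each language or platform, and the structure can also serve as a data interchange format between platforms. This lowers the cost of duplicated development and improves development efficiency.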

A tool this convenient is too good to pass up, so let's learn Google Protocol Buffers with Python!

Environment

Python 3.6.5
Google Protocol Buffers 3.6.1
macOS 10.13.6

To install protobuf on macOS:

$ brew install protobuf

Google Protocol Buffers in 3 Steps

Using Google Protocol Buffers is actually simple. It takes only 3 steps:

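1. Write a .proto file that defines the data structure you need (the so-called message type)
2. Compile the .proto file with protoc to generate code automatically
3. Start using the code generated by protoc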

Writing the `.proto` File

The .proto syntax currently comes in 2 versions: proto2 and proto3. proto3 supports more programming languages, such as Go, Ruby, Objective-C, PHP, and C#. proto3 also adds a JSON mapping that proto2 lacks, so messages can easily be converted to and from JSON.

For that reason, a .proto file needs to declare which Protocol Buffers syntax version it uses (this article uses proto3):

syntax = "proto3";

Besides the syntax version, you can also declare a package to prevent name clashes between message types that share the same name:

package foo.bar;

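Note that the package declaration is ignored when a .proto file is compiled to Python, because a Python module is identified by its path in the file system, so a different file name or path is enough to avoid a name clash. If all of your applications are built with Python only, you can leave package out. If your applications are built with several programming languages, setting package is still recommended.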

With syntax and package in place, we can define the data structure we need. The Google Protocol Buffers documentation calls such a user-defined data structure a message type.

Every message type begins with the message keyword, followed by the name of the message type and then, inside curly braces, the names and data types of its fields.

For example, here is a message type named User:

message User {
    int32 id = 1;      // user's id
    string name = 2;   /* nickname */
    string email = 3;
}

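The structure above has 3 fields, id, name, and email, with data types int32, string, and string respectively. The number at the end of each field is not a default value but a field number. Every field in a message type must be given a field number. The smallest field number is 1 and the largest is 536,870,911 (2 to the 29th power minus 1; hardly anyone will ever define that many fields). Field numbers 19,000 - 19,999 are reserved by Google Protocol Buffers and cannot be used.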
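The field number rules above can be restated as a small check. This is only a sketch: is_valid_field_number is a hypothetical helper written for this article, not something provided by the protobuf library, and protoc performs its own validation when it compiles a .proto file.

```python
# Hypothetical helper (not part of the protobuf library): it only restates
# the field number rules described in the paragraph above.
MAX_FIELD_NUMBER = 2 ** 29 - 1          # 536,870,911
RESERVED_RANGE = range(19000, 20000)    # 19,000 - 19,999 are reserved


def is_valid_field_number(number: int) -> bool:
    """Return True if `number` may be used as a field number."""
    if not 1 <= number <= MAX_FIELD_NUMBER:
        return False
    return number not in RESERVED_RANGE


# Try a few boundary values.
for candidate in (0, 1, 18999, 19000, 19999, 20000, MAX_FIELD_NUMBER):
    print(candidate, is_valid_field_number(candidate))
```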

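It is worth noting that, when performance matters, the official documentation recommends reserving field numbers 1 - 15 for the most frequently used fields, because a field number in that range, together with the field's wire type, is encoded in only 1 byte.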
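To see why 1 - 15 is the cutoff, it may help to compute the size of a field's tag by hand. In the wire format, the tag is the value (field_number << 3) | wire_type stored as a varint, which carries 7 bits of payload per byte. The functions below are a hand-written sketch for illustration (encode_varint and tag_size are names made up here), not the library's implementation.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protocol Buffers varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F            # lowest 7 bits of what is left
        value >>= 7
        if value:
            out.append(low7 | 0x80)    # high bit set: more bytes follow
        else:
            out.append(low7)           # high bit clear: last byte
            return bytes(out)


def tag_size(field_number: int, wire_type: int = 0) -> int:
    """Number of bytes used by the tag (field number plus wire type)."""
    return len(encode_varint((field_number << 3) | wire_type))


for number in (1, 15, 16, 2047, 2048):
    print(number, tag_size(number))
# 1 and 15 need 1 byte, 16 and 2047 need 2 bytes, 2048 needs 3 bytes
```

With 3 bits taken by the wire type, only 4 bits of the first byte are left for the field number, which is why 15 is the largest field number that fits in a single byte.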

The example above also shows the 2 comment styles supported in .proto files:

// comment
/* comment */

At this point we have a complete .proto file containing one message type (named user.proto in this article). Its full content is:

syntax = "proto3";

message User {
    int32 id = 1;      // user's id
    string name = 2;   /* nickname */
    string email = 3;
}

Compiling the `.proto` File with `protoc`

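Once the .proto file is done, it can be compiled with the protoc command. This article wants Python code as output, so the option to specify is --python_out <destination directory>, which writes the generated code into the given directory: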

$ mkdir protobufs # create a directory for the generated Python code
$ protoc --python_out protobufs user.proto
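If you would rather run the compile step from a Python build script, the two shell commands above could be wrapped roughly as follows. This is a sketch under assumptions: build_protoc_command and compile_proto are hypothetical helper names, protoc is assumed to be installed and on PATH, and the --python_out=<dir> form of the option is used.

```python
import shutil
import subprocess
from pathlib import Path


def build_protoc_command(proto_file: str, out_dir: str) -> list:
    """Build the argument list for compiling one .proto file to Python."""
    return ["protoc", f"--python_out={out_dir}", proto_file]


def compile_proto(proto_file: str, out_dir: str) -> None:
    """Create out_dir if needed, then run protoc (raises if protoc fails)."""
    if shutil.which("protoc") is None:
        raise RuntimeError("protoc was not found on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_protoc_command(proto_file, out_dir), check=True)


# Show the command that compile_proto("user.proto", "protobufs") would run.
print(build_protoc_command("user.proto", "protobufs"))
```

Calling compile_proto("user.proto", "protobufs") from the directory that contains user.proto should then have the same effect as the mkdir and protoc commands above.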

Using the Code Generated by `protoc`

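Once the Python code has been generated, it can be imported and used. protoc names the generated module after the .proto file, so user.proto becomes protobufs/user_pb2.py:

$ python
>>> from protobufs.user_pb2 import User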
>>> u = User()
>>> u.id = 1
>>> u.name = 'John'
>>> u.email = '[email protected]'

Assigning a value of an unsupported type to an attribute raises an error, for example assigning a string to id:

>>> u.id = 'string'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'string' has type str, but expected one of: int, long

Assigning to an attribute that the message type does not define also raises an error:

>>> u.unknown_field = 'test'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: Assignment not allowed (no field "unknown_field" in protocol message object).

Once the values in the message are set, it can be serialized to a binary string (a bytes object in Python 3):

>>> output = u.SerializeToString()
>>> output
b'\x08\x01\x12\x04John\x1a\[email protected]'
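To get a feel for where those bytes come from, the first two fields can be encoded by hand. Each field is written as a tag, (field_number << 3) | wire_type, followed by its value: wire type 0 is a varint and wire type 2 is a length-prefixed byte string. The functions below are a hand-rolled sketch with made-up names that handles only non-negative integers and strings. It is not how the generated code is implemented.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protocol Buffers varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)    # more bytes follow
        else:
            out.append(low7)           # last byte
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Tag with wire type 0 (varint), then the value itself."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Tag with wire type 2, then the byte length, then the UTF-8 bytes."""
    data = text.encode("utf-8")
    return encode_varint((field_number << 3) | 2) + encode_varint(len(data)) + data


# id = 1 (field number 1) and name = 'John' (field number 2)
encoded = encode_int_field(1, 1) + encode_string_field(2, "John")
print(encoded)  # b'\x08\x01\x12\x04John'
```

The result matches the beginning of the output of SerializeToString above. The email field (field number 3) follows the same pattern, starting with the tag byte \x1a.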

One reason Google Protocol Buffers performs well is that it converts data to a compact binary form, which is also why in practice it is often used together with Kafka.

Writing is only useful if you can also read. The method for reading is ParseFromString:

>>> user = User()
>>> user.ParseFromString(output)
24
>>> user.id
1
>>> user.name
'John'
>>> user.email
'[email protected]'

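As the example shows, after instantiating User(), ParseFromString reads the serialized message in and populates the fields of user. The 24 printed after the call is the length in bytes of the serialized message.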
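The generated User class is what turns raw bytes back into named fields. As a rough illustration of what a parser has to do, here is a minimal hand-written decoder (decode_varint and decode_fields are hypothetical names) that understands only wire types 0 and 2. Note that the bytes carry field numbers, not field names, which is why the reader needs code generated from the same .proto file.

```python
def decode_varint(data: bytes, pos: int):
    """Read one varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:            # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def decode_fields(data: bytes) -> dict:
    """Decode varint and length-delimited fields into {field_number: value}."""
    fields = {}
    pos = 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:             # varint, e.g. int32
            value, pos = decode_varint(data, pos)
        elif wire_type == 2:           # length-delimited, e.g. string
            length, pos = decode_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        fields[field_number] = value
    return fields


# The first two fields of the User message: id = 1 and name = 'John'
print(decode_fields(b"\x08\x01\x12\x04John"))  # {1: 1, 2: b'John'}
```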

Summary

That covers the most basic usage of Google Protocol Buffers.

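It looks simple, but the payoff is real: once A has defined the .proto file, B can generate code from the same .proto file, so A and B can each read the data the other publishes with Google Protocol Buffers. This saves the cost of re-implementing the same module across applications and platforms and makes integration efficient.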

The next part will cover more of the important syntax and features of proto3.

References

https://developers.google.com/protocol-buffers/docs/proto3
