Learning Google Protocol Buffers with Python - Part 3

Posted on  Nov 9, 2018  in  Python Programming - Advanced Level  by  Amo Chen  ‐ 5 min read

This article is part of a tutorial series.

In the last article, we introduced some key syntax and features of proto3 with Google Protocol Buffers. This time, we’ll delve into how to write more complex messages, and introduce some convenient data types along with how to use them.

Composite Messages

A message definition often resembles a JSON document in which one object nests inside another: a message can contain other messages. For example, a shopping cart might belong to a user and include various products:

syntax = "proto3";

message User {
	string id = 1;
	string name = 2;
}

message Product {
	string id = 1;
	string name = 2;
	int32 price = 3;
	int32 quantity = 4;
}

message Cart {
	User user = 1;
	repeated Product products = 2;
}

A composite message is built by first defining the smallest messages and then combining them. Once the .proto file is compiled to Python, you can assemble the parts with methods such as MergeFrom, or add (available only on repeated message fields):

>>> from cart_pb2 import User, Product, Cart
>>> user = User()
>>> user.id = "userid"
>>> user.name = "username"
>>>
>>> cart = Cart()
>>> cart.user.MergeFrom(user)
>>> product = cart.products.add()
>>> product.id = "productid"
>>> product.name = "productname"
>>> product.price = 100
>>> product.quantity = 1
>>>
>>> cart.SerializeToString()
b'\n\x12\n\x06userid\x12\x08username\x12\x1c\n\tproductid\x12\x0bproductname\x18d \x01'
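To see how the nesting shows up in the serialized bytes, here is a minimal sketch of a wire-format reader in pure Python. It is not part of the protobuf library: read_varint and parse_fields are hypothetical helpers written for this illustration, and they handle only the two wire types (varint and length-delimited) that appear in the output above.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def parse_fields(buf):
    """Split one serialized message into (field_number, wire_type, value) tuples."""
    fields = []
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:  # varint (int32, bool, enum, ...)
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited (string, bytes, nested message)
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        fields.append((field_number, wire_type, value))
    return fields


data = (b'\n\x12\n\x06userid\x12\x08username'
        b'\x12\x1c\n\tproductid\x12\x0bproductname\x18d \x01')

cart_fields = parse_fields(data)
user_fields = parse_fields(cart_fields[0][2])      # Cart.user (field 1)
product_fields = parse_fields(cart_fields[1][2])   # first Cart.products entry (field 2)

print(user_fields)     # [(1, 2, b'userid'), (2, 2, b'username')]
print(product_fields)  # [(1, 2, b'productid'), (2, 2, b'productname'), (3, 0, 100), (4, 0, 1)]
```

Each nested message is simply a length-delimited field whose payload is itself a serialized message, which is why a Product can be serialized on its own and merged into a Cart later.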

If you want to assemble directly from binary data, you can use MergeFromString:

>>> product_string = product.SerializeToString()
>>> product_2 = cart.products.add()
>>> product_2.MergeFromString(product_string)
>>> print(len(cart.products))
2
>>> cart.SerializeToString()
b'\n\x12\n\x06userid\x12\x08username\x12\x1c\n\tproductid\x12\x0bproductname\x18d \x01\x12\x1c\n\tproductid\x12\x0bproductname\x18d \x01'

That’s how you use composite messages. Pretty straightforward, isn’t it?

Importing .proto Files

Google Protocol Buffers allows splitting messages into different .proto files, which you can import:

syntax = "proto3";

import "user.proto";

message Product {
	string id = 1;
	string name = 2;
	int32 price = 3;
	int32 quantity = 4;
}

message Cart {
	User user = 1;
	repeated Product products = 2;
}

In the example above, you can see User is brought in via import "user.proto";.

Any Type

You might encounter situations where a message field stores various types of data, like log-type data which doesn’t have a fixed format. In such cases, you can try using the expanded type Any provided by proto3.

Using the Any type requires importing via import "google/protobuf/any.proto";.

Below is an attempt to define a message type called Event, in which the field payload is Any and can hold various message types:

syntax = "proto3";

import "google/protobuf/any.proto";

message Event {
	string event = 1;
	google.protobuf.Any payload = 2;
}

Here’s a Python example of how to work with Event:

>>> from event_pb2 import Event
>>> from cart_pb2 import User
>>>
>>> event = Event()
>>> event.event = "viewProductPage"
>>> user = User()
>>> user.id = "userid"
>>> user.name = "username"
>>> event.payload.Pack(user)
>>> event.payload
type_url: "type.googleapis.com/User"
value: "\n\006userid\022\010username"

In the example above, you can see the Any type uses Pack() to put a message into the field. If you try using MergeFrom(), it only works with messages of the Any type; otherwise, you’ll see an error like:

TypeError: Parameter to MergeFrom() must be instance of same class: expected google.protobuf.Any got User.

After packing a message with Pack(), read the Any field back with the Unpack() method:

>>> if event.payload.Is(User.DESCRIPTOR):
...    user2 = User()
...    event.payload.Unpack(user2)
...    print(user2)
...
id: "userid"
name: "username"

This example uses the Is() method to check whether the payload field holds a User message, then creates a new user2 and deserializes the data into it with Unpack(user2).
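The same Pack / Is / Unpack pattern can be tried without compiling any .proto file, because the protobuf Python package ships generated modules for the well-known types. The sketch below assumes the protobuf package is installed. It uses Duration and Timestamp as stand-ins for User, and unpack_known is a hypothetical helper written for this illustration, not a library function.

```python
from google.protobuf import any_pb2, duration_pb2, timestamp_pb2

# Pack a Duration into an Any field; Pack() records the type in type_url.
payload = any_pb2.Any()
payload.Pack(duration_pb2.Duration(seconds=90))
print(payload.type_url)  # type.googleapis.com/google.protobuf.Duration


def unpack_known(any_message):
    """Try each candidate type in turn and return the unpacked message, or None."""
    for message_class in (timestamp_pb2.Timestamp, duration_pb2.Duration):
        if any_message.Is(message_class.DESCRIPTOR):
            target = message_class()
            any_message.Unpack(target)
            return target
    return None


result = unpack_known(payload)
print(type(result).__name__, result.seconds)
```

Checking Is() before calling Unpack() keeps the receiver from deserializing the payload into the wrong message type when several payload types are possible.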

That’s an overview of the Any type.

Other Extended Types

proto3 provides many additional types, such as map fields, Timestamp, and Duration, which make messages more convenient to define and use.

Here is the declaration format for Maps, where you specify the key and value types individually:

map<key_type, value_type> map_field = N;

For example, declare a Map with a string key and User value:

syntax = "proto3";

// User is defined in cart.proto (the source of cart_pb2 used below).
import "cart.proto";

message Mapping {
	map<string, User> id_user_map = 1;
}

Here’s an example of using it after compilation:

>>> from cart_pb2 import User
>>> from mapping_pb2 import Mapping
>>>
>>> mapping = Mapping()
>>>
>>> user1 = mapping.id_user_map.get_or_create("userid_1")
>>> user1.id = 'userid_1'
>>>
>>> user2 = mapping.id_user_map.get_or_create("userid_2")
>>> user2.id = 'userid_2'
>>>
>>> for k in mapping.id_user_map.keys():
...    print(k)
...
userid_1
userid_2

In the example above, map adds entries via get_or_create(key). If the key exists in the map, it retrieves the value; otherwise, it creates a new entry. You can also traverse existing keys with keys().
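Map fields with message values behave much like a Python dict that creates missing entries on access. The following sketch assumes the protobuf package is installed and uses the bundled Struct type, whose fields member is declared as map<string, Value>, as a stand-in for Mapping so that no compilation step is needed. The names stored in it are made up for illustration.

```python
from google.protobuf import struct_pb2

struct = struct_pb2.Struct()

# Indexing a message-valued map with a missing key creates the entry,
# much like collections.defaultdict.
struct.fields["userid_1"].string_value = "Alice"

# get_or_create(key) does the same thing with a more explicit name.
value = struct.fields.get_or_create("userid_2")
value.string_value = "Bob"

# Membership tests and len() work as they do on a dict,
# and a membership test does not create an entry.
has_first = "userid_1" in struct.fields
has_missing = "userid_3" in struct.fields
count = len(struct.fields)

# Iteration order of a map field is not guaranteed, so sort for stable output.
for key in sorted(struct.fields):
    print(key, struct.fields[key].string_value)
```

Because iteration order is unspecified, code that needs a stable order should sort the keys rather than rely on insertion order.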

While the Google Protocol Buffers website doesn’t have many examples for the other data types, you can refer to the .proto files in protocolbuffers/protobuf on GitHub, where the file comments include usage examples.
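As a starting point for Timestamp and Duration, here is a minimal sketch that converts between them and Python's datetime types. It assumes the protobuf package is installed. The variable names and the date are chosen only for illustration.

```python
from datetime import datetime, timedelta

from google.protobuf import duration_pb2, timestamp_pb2

# A naive datetime is interpreted as UTC by FromDatetime().
created_at = timestamp_pb2.Timestamp()
created_at.FromDatetime(datetime(2018, 11, 9, 0, 0, 0))
print(created_at.ToJsonString())  # 2018-11-09T00:00:00Z

# Duration stores a span of time as seconds plus nanoseconds.
timeout = duration_pb2.Duration()
timeout.FromTimedelta(timedelta(minutes=5))
print(timeout.seconds)            # 300
print(timeout.ToJsonString())     # 300s

# Both types convert back to their Python counterparts.
round_trip = created_at.ToDatetime()
as_timedelta = timeout.ToTimedelta()
```

In a .proto file these types are imported in the same way as Any, via google/protobuf/timestamp.proto and google/protobuf/duration.proto.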

MessageToJson: Convert Message to JSON

proto3 also enhances data exchange capabilities (given that many applications still use JSON as a data exchange format). It supports converting messages to JSON strings. For a comparison of various data types and JSON data types, refer to the JSON Mapping table.

Here’s how to convert a message to JSON in Python:

>>> import google.protobuf.json_format
>>> print(google.protobuf.json_format.MessageToJson(mapping))
{
  "idUserMap": {
    "userid_2": {
      "id": "userid_2"
    },
    "userid_1": {
      "id": "userid_1"
    }
  }
}
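Note that the output uses idUserMap rather than id_user_map: by default, field names are converted to lowerCamelCase. The sketch below shows that behavior, how to keep the original names, and the reverse direction with Parse(). It assumes the protobuf package is installed and uses the bundled SourceContext message (a single string field named file_name) purely as a stand-in, so that it runs without compiling a .proto file.

```python
from google.protobuf import json_format, source_context_pb2

context = source_context_pb2.SourceContext(file_name="cart.proto")

# Default: field names are emitted in lowerCamelCase.
as_json = json_format.MessageToJson(context)
print(as_json)

# Keep the names exactly as written in the .proto file.
as_json_snake = json_format.MessageToJson(
    context, preserving_proto_field_name=True)
print(as_json_snake)

# MessageToDict returns a plain Python dict instead of a string.
as_dict = json_format.MessageToDict(context)

# Parse() fills a message instance from a JSON string.
parsed = json_format.Parse(as_json, source_context_pb2.SourceContext())
print(parsed.file_name)  # cart.proto
```

Parse() accepts both the lowerCamelCase and the original field names, so JSON produced with either setting can be read back.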

Conclusion

This concludes our introduction to Google Protocol Buffers. We hope that after reading this series, you now have a basic understanding of Google Protocol Buffers.

References

https://developers.google.com/protocol-buffers/docs/reference/python-generated