Learning Google Protocol Buffers with Python - Part 2

Posted on Nov 3, 2018 in Python Programming - Advanced Level by Amo Chen ‐ 4 min read

This post is part of a tutorial series.

In the previous post, we introduced the basics of using Google Protocol Buffers (proto3) with Python. In this post, we’ll delve further into some key syntax and features of proto3.

No required Keyword in proto3

One crucial thing to know about proto3 is that it doesn’t have the required keyword.

In proto2, the required keyword was used to indicate that a field must be present:

syntax = 'proto2';

message MyMessage {
	required string id = 1;
}

However, proto3 removes the required keyword, so every field in proto3 is optional. A message can therefore still be serialized to binary even when some of its fields have not been set.
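To see why an unset field causes no trouble, here is a minimal hand-rolled sketch of how proto3 encodes a singular string field such as `id = 1`. It is not the protobuf library: `encode_varint` and `encode_string_field` are hypothetical helpers written only for illustration. In proto3, a singular scalar field that holds its default value (the empty string for `string`) is simply not written, so a message with nothing set serializes to zero bytes.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string_field(field_number: int, value: str) -> bytes:
    """Hypothetical helper: encode one singular proto3 string field."""
    if value == "":
        # proto3 does not write a field that holds its default value
        return b""
    data = value.encode("utf-8")
    tag = (field_number << 3) | 2  # wire type 2 = length-delimited
    return encode_varint(tag) + encode_varint(len(data)) + data


print(encode_string_field(1, ""))        # b''
print(encode_string_field(1, "abc123"))  # b'\n\x06abc123'
```

With a compiled proto3 version of MyMessage, `MyMessage().SerializeToString()` likewise returns `b''`, whereas the proto2 version with `required` raises an error when `id` is missing.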

The repeated Keyword

If a field needs to hold multiple values (the equivalent of a Python list or tuple), declare it with the repeated keyword. For example, a user might have several phone numbers, which can be expressed with repeated like this:

syntax = 'proto3';

message User {
	repeated string phonenumbers = 1;
}

After compiling the .proto file into Python code, you can use the extend method to add several phone numbers at once:

>>> from user_pb2 import User
>>> user = User()
>>> user.phonenumbers.extend(['phone1', 'phone2'])
>>> user.SerializeToString()
b'\n\x06phone1\n\x06phone2'

Or use append to add a single phone number:

>>> user.phonenumbers.append('phone3')
>>> user.phonenumbers
['phone1', 'phone2', 'phone3']
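The bytes printed by `SerializeToString()` after the `extend` call can be reproduced by hand. A repeated string field is written as one tag-length-value record per element, in list order, which is also why the order survives a round trip. The sketch below is a hypothetical illustration (`encode_repeated_string` is not a library function), not the protobuf library's own encoder.

```python
from typing import Sequence


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_repeated_string(field_number: int, values: Sequence[str]) -> bytes:
    """Encode a repeated string field: one tag-length-value record per element."""
    tag = encode_varint((field_number << 3) | 2)  # wire type 2 = length-delimited
    out = bytearray()
    for value in values:  # elements are written in the order given
        data = value.encode("utf-8")
        out += tag + encode_varint(len(data)) + data
    return bytes(out)


print(encode_repeated_string(1, ["phone1", "phone2"]))
# b'\n\x06phone1\n\x06phone2'
```

An empty list produces `b''`, matching the fact that a repeated field with zero values is simply absent from the binary.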

Note that a repeated field does not require at least one value; an empty field is equally valid. If your application needs such a constraint, you must check it yourself before assigning values to the message. In addition, the order of the elements you add is preserved in the message.

repeated: This field can appear any number of times (including zero) in a well-formed message. The order of repeated values will be preserved.
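Because the schema cannot express a minimum count for a repeated field, the check has to live in your own code. A minimal sketch, assuming a hypothetical helper named `set_phonenumbers`: it only relies on the field supporting `extend`, so the same function should work with the generated `User` class, while a `SimpleNamespace` stands in for it here so the example runs without compiled code.

```python
from types import SimpleNamespace
from typing import Iterable


def set_phonenumbers(user, numbers: Iterable[str]) -> None:
    """Hypothetical helper: require at least one number before filling the field."""
    numbers = list(numbers)
    if not numbers:
        raise ValueError("a user needs at least one phone number")
    user.phonenumbers.extend(numbers)  # order is preserved


# Stand-in for user_pb2.User() so the sketch runs on its own
user = SimpleNamespace(phonenumbers=[])
set_phonenumbers(user, ["phone1", "phone2"])
print(user.phonenumbers)  # ['phone1', 'phone2']
```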

Reserved Fields

As an application evolves, fields may be added to or removed from its messages. If every .proto file is kept up to date and the compiled code is always regenerated, this is usually harmless. However, if some systems have not been updated yet, or are in the middle of an update, problems arise when a new field reuses the number of a previously deleted one.

For example, UserV2's field 1 is devices (a user might have multiple devices), while UserV1's field 1 is phonenumbers. UserV1 can still parse UserV2's binary data, but the device names end up in the phonenumbers field:

syntax = 'proto3';

message UserV1 {
	repeated string phonenumbers = 1;
}

message UserV2 {
	repeated string devices = 1;
}

>>> import user_pb2
>>> user_v2 = user_pb2.UserV2()
>>> user_v2.devices.extend(['iPhone XS', 'Macbook Pro'])
>>> user_v2_bytes = user_v2.SerializeToString()
>>>
>>> user_v1 = user_pb2.UserV1.FromString(user_v2_bytes)
>>> user_v1.phonenumbers
['iPhone XS', 'Macbook Pro']
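To see what UserV1 actually receives, the hypothetical `decode_fields` below splits raw bytes into (field number, value) pairs. It is a minimal sketch rather than the library's parser and handles only the varint and length-delimited wire types. Run on the bytes UserV2 produces, it finds nothing but field number 1, and no field names at all.

```python
from typing import List, Tuple, Union


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read a base-128 varint starting at pos; return (value, new_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_fields(buf: bytes) -> List[Tuple[int, Union[int, bytes]]]:
    """Split a message into (field_number, raw_value) pairs (wire types 0 and 2 only)."""
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited: string, bytes, nested message
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields.append((field_number, value))
    return fields


# The bytes UserV2 produces for devices=['iPhone XS', 'Macbook Pro']
user_v2_bytes = b"\n\tiPhone XS\n\x0bMacbook Pro"
print(decode_fields(user_v2_bytes))
# [(1, b'iPhone XS'), (1, b'Macbook Pro')]
```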

This happens because the binary format identifies fields only by their numbers, not their names, so the two versions interpret field 1 differently. To avoid this, Google Protocol Buffers lets you reserve field numbers: when you delete a field, mark its number with the reserved keyword, and the compiler will reject any later field that tries to reuse it.

Example:

syntax = 'proto3';

message UserV2 {
	reserved 1;
	repeated string devices = 2;
}

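Besides field numbers, reserved also accepts field names, so the old name cannot be reused either. Numbers and names cannot be mixed in the same reserved statement, which is why they go on separate lines: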

message UserV2 {
	reserved 1;
	reserved "phonenumbers";
	repeated string devices = 2;
}
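protoc enforces these reservations when it compiles the .proto file: a field that reuses a reserved number or name is rejected. The sketch below only mimics that rule in plain Python to make it concrete; `check_reserved` and its inputs are hypothetical, and the real compiler's error messages are worded differently.

```python
from typing import Dict, List, Set


def check_reserved(fields: Dict[str, int], reserved_numbers: Set[int],
                   reserved_names: Set[str]) -> List[str]:
    """Return a problem description for every field that reuses a reservation."""
    problems = []
    for name, number in fields.items():
        if number in reserved_numbers:
            problems.append(f"{name} uses reserved number {number}")
        if name in reserved_names:
            problems.append(f"{name} is a reserved name")
    return problems


reserved_numbers = {1}
reserved_names = {"phonenumbers"}

# UserV2 with only devices = 2: nothing to report
print(check_reserved({"devices": 2}, reserved_numbers, reserved_names))  # []

# Re-adding the old field would be rejected on both counts
print(check_reserved({"phonenumbers": 1}, reserved_numbers, reserved_names))
# ['phonenumbers uses reserved number 1', 'phonenumbers is a reserved name']
```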

For more details, refer to the Google Protocol Buffers documentation.

Enumerations

Google Protocol Buffers also supports enumerations with the enum keyword, for example:

syntax = 'proto3';

message User {
	enum Sex {
		UNKNOWN = 0;
		MALE = 1;
		FEMALE = 2;
	}
	Sex sex = 1;
	string name = 2;
}

>>> import user_pb2
>>> user = user_pb2.User()
>>> user.sex
0

As you can see, the sex field defaults to 0. In proto3, an unset enum field takes the numeric default value 0, so every enum must define a value numbered 0, and that value must be listed first. Otherwise, protoc reports an error like the following:

user.proto: The first enum value must be zero in proto3.
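The zero default also shows up on the wire. The sketch below mirrors the Sex enum with Python's `IntEnum` and hand-encodes the sex field (field 1, varint wire type). It is a hypothetical illustration, not generated code: `encode_sex` and `decode_sex` are made-up helpers, and `decode_sex` assumes the buffer contains nothing but the sex field. Like any proto3 scalar, the field is omitted when it holds 0, and a reader that finds no field 1 falls back to 0, i.e. UNKNOWN.

```python
from enum import IntEnum


class Sex(IntEnum):
    # Mirrors the .proto enum: the zero value comes first and is the default
    UNKNOWN = 0
    MALE = 1
    FEMALE = 2


def encode_sex(sex: Sex) -> bytes:
    """Encode field 1 (sex) as a varint; omit it when it holds the default 0."""
    if sex == Sex.UNKNOWN:
        return b""
    tag = (1 << 3) | 0  # field number 1, wire type 0 = varint
    return bytes([tag, int(sex)])  # tag and value each fit in one byte here


def decode_sex(buf: bytes) -> Sex:
    """Read field 1 if present (assumes buf holds only this field); else UNKNOWN."""
    if len(buf) >= 2 and buf[0] == (1 << 3) | 0:
        return Sex(buf[1])
    return Sex.UNKNOWN


print(encode_sex(Sex.MALE))     # b'\x08\x01'
print(encode_sex(Sex.UNKNOWN))  # b''
print(decode_sex(b"").name)     # UNKNOWN
```

With the generated code, `user_pb2.User.Sex.Name(user.sex)` should give the same `'UNKNOWN'` label for a freshly created User.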

Summary

We have now covered several important proto3 keywords and features. In the next post, we’ll explore how to define more complex .proto files and how to manipulate them with Python.

References

https://developers.google.com/protocol-buffers/docs/proto3