Learning Google Protocol Buffers with Python - Part 2
Posted on Nov 3, 2018 in Python Programming - Advanced Level by Amo Chen ‐ 4 min read
This post is part of a tutorial series:
- Learning Google Protocol Buffers with Python - Part 1
- Learning Google Protocol Buffers with Python - Part 2
- Learning Google Protocol Buffers with Python - Part 3
In the previous post, we introduced the basics of using Google Protocol Buffers (proto3) with Python. In this post, we’ll delve further into some key syntax and features of proto3.
No required Keyword in proto3
One crucial thing to know about proto3 is that it doesn’t have the required keyword.
In proto2, the required keyword was used to indicate that a field must be present:
syntax = 'proto2';
message MyMessage {
    required string id = 1;
}
However, proto3 has removed the required keyword, meaning all fields in proto3 are optional. Therefore, even if some fields in a message are missing, the message can still be serialized into binary.
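Because proto3 no longer enforces presence, any "this field must be set" rule has to live in application code. Here is a minimal sketch of such a check. The names missing_fields and FakeMessage are hypothetical and used only for illustration, and the check assumes string fields, whose unset value reads back as an empty string:

```python
from dataclasses import dataclass


@dataclass
class FakeMessage:
    """Hypothetical stand-in for a generated proto3 message with one string field."""
    id: str = ""  # a proto3 string field reads back as "" when unset


def missing_fields(message, required_names):
    """Return the names in required_names whose value is empty on message."""
    return [name for name in required_names if not getattr(message, name)]


print(missing_fields(FakeMessage(), ["id"]))          # ['id']
print(missing_fields(FakeMessage(id="abc"), ["id"]))  # []
```

Note that this kind of check cannot tell an unset field apart from one explicitly set to an empty string.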
The repeated Keyword
If a field holds multiple values, corresponding to a Python list or tuple, declare it with the repeated keyword. For example, a user might have multiple phone numbers, which can be expressed using repeated like this:
syntax = 'proto3';
message User {
    repeated string phonenumbers = 1;
}
After compiling into Python code, you can use the extend method to add multiple phone numbers at once:
>>> from user_pb2 import User
>>> user = User()
>>> user.phonenumbers.extend(['phone1', 'phone2'])
>>> user.SerializeToString()
b'\n\x06phone1\n\x06phone2'
Or use append to add a single phone number:
>>> user.phonenumbers.append('phone3')
>>> user.phonenumbers
['phone1', 'phone2', 'phone3']
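The bytes returned by SerializeToString() above can be reproduced by hand, which shows how a repeated string field is laid out: each element becomes its own record made of a tag, a length, and the UTF-8 payload. The following is a minimal sketch of that encoding, not the library's implementation; encode_varint and encode_repeated_strings are hypothetical helper names:

```python
def encode_varint(value):
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_repeated_strings(field_number, values):
    """Encode each string as its own (tag, length, payload) record."""
    length_delimited = 2  # wire type used for strings
    tag = encode_varint((field_number << 3) | length_delimited)
    out = b""
    for value in values:
        payload = value.encode("utf-8")
        out += tag + encode_varint(len(payload)) + payload
    return out


print(encode_repeated_strings(1, ["phone1", "phone2"]))
# b'\n\x06phone1\n\x06phone2'
```

The leading b'\n' is the tag byte 0x0a (field number 1, wire type 2), and b'\x06' is the length of each six-character string.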
It’s important to note that repeated doesn’t require at least one value to be present in the field; having zero values is also valid. If you need such a constraint, you must check it yourself before assigning values to the message. Additionally, the order of elements in a list or tuple will be preserved in the message.
repeated: This field can appear any number of times (including zero) in a well-formed message. The order of repeated values will be preserved.
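Since repeated accepts zero values, an "at least one" rule has to be enforced by the caller. One way to do that is sketched below; set_phone_numbers is a hypothetical helper, and a SimpleNamespace stands in for the generated User message so the example runs without compiled code:

```python
from types import SimpleNamespace


def set_phone_numbers(user, numbers):
    """Replace user.phonenumbers, refusing an empty collection."""
    numbers = list(numbers)
    if not numbers:
        raise ValueError("at least one phone number is required")
    del user.phonenumbers[:]           # clear existing values in place
    user.phonenumbers.extend(numbers)  # insertion order is preserved


# Stand-in object; a real call would pass a user_pb2.User() instance.
user = SimpleNamespace(phonenumbers=[])
set_phone_numbers(user, ("phone1", "phone2"))
print(user.phonenumbers)  # ['phone1', 'phone2']
```

The validation runs before the field is touched, so a rejected call leaves the message unchanged.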
Reserved Fields
As an application’s use of messages evolves, fields may be added or removed. If the .proto files are kept updated and the compiled code is always current, there’s usually no issue. However, if some systems fail to update or are mid-update, problems can arise when new fields use the same field numbers as previously deleted ones.
For example, UserV2’s field 1 is devices (one user might have multiple devices), while UserV1’s field 1 is phonenumbers. UserV1 can still read UserV2’s binary data, but the field data won’t be as expected:
syntax = 'proto3';
message UserV1 {
    repeated string phonenumbers = 1;
}
message UserV2 {
    repeated string devices = 1;
}
>>> import user_pb2
>>> user_v2 = user_pb2.UserV2()
>>> user_v2.devices.extend(['iPhone XS', 'Macbook Pro'])
>>> user_v2_bytes = user_v2.SerializeToString()
>>>
>>> user_v1 = user_pb2.UserV1.FromString(user_v2_bytes)
>>> user_v1.phonenumbers
['iPhone XS', 'Macbook Pro']
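The result above follows from the wire format, which stores field numbers but never field names. The pure-Python sketch below decodes the same kind of payload by field number and then applies UserV1's naming. The helpers decode_varint and decode_strings and the V1_NAMES mapping are hypothetical, and the decoder handles only length-delimited string fields:

```python
def decode_varint(data, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def decode_strings(data):
    """Group length-delimited string payloads by field number."""
    fields, pos = {}, 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type != 2:
            raise ValueError("this sketch handles only length-delimited fields")
        length, pos = decode_varint(data, pos)
        text = data[pos:pos + length].decode("utf-8")
        fields.setdefault(field_number, []).append(text)
        pos += length
    return fields


# Bytes for devices = ['iPhone XS', 'Macbook Pro'] written as field 1.
payload = b"\n\x09iPhone XS\n\x0bMacbook Pro"
by_number = decode_strings(payload)

V1_NAMES = {1: "phonenumbers"}  # how UserV1 labels field 1
print({V1_NAMES[number]: values for number, values in by_number.items()})
# {'phonenumbers': ['iPhone XS', 'Macbook Pro']}
```

Nothing in the payload says "devices", so a reader holding the UserV1 schema has no way to notice the mismatch.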
This situation is due to different versions sharing the same field number, leading to misuse. To avoid this, Google Protocol Buffers provides the reserved field number feature. If you delete certain fields, you can reserve those field numbers using the reserved keyword to prevent future misuse.
Example:
syntax = 'proto3';
message UserV2 {
    reserved 1;
    repeated string devices = 2;
}
reserved not only reserves field numbers but also supports field names:
message UserV2 {
    reserved 1;
    reserved "phonenumbers";
    repeated string devices = 2;
}
For more details, refer to the Google Protocol Buffers documentation.
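Conceptually, reserving a number or name means that any later field definition reusing it is rejected at compile time. The sketch below imitates that rule in plain Python purely as an illustration. It is not how the compiler is implemented, and check_new_field, RESERVED_NUMBERS, and RESERVED_NAMES are hypothetical names:

```python
RESERVED_NUMBERS = {1}               # mirrors: reserved 1;
RESERVED_NAMES = {"phonenumbers"}    # mirrors: reserved "phonenumbers";


def check_new_field(name, number):
    """Raise ValueError if a proposed field reuses a reserved number or name."""
    if number in RESERVED_NUMBERS:
        raise ValueError(f"field number {number} is reserved")
    if name in RESERVED_NAMES:
        raise ValueError(f"field name {name!r} is reserved")


check_new_field("devices", 2)  # accepted: neither the name nor the number is reserved

try:
    check_new_field("emails", 1)
except ValueError as error:
    print(error)  # field number 1 is reserved
```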
Enumerations
Google Protocol Buffers also supports enumerations with the enum keyword, for example:
syntax = 'proto3';
message User {
    enum Sex {
        UNKNOWN = 0;
        MALE = 1;
        FEMALE = 2;
    }
    Sex sex = 1;
    string name = 2;
}
>>> import user_pb2
>>> user = user_pb2.User()
>>> user.sex
0
Using enum, you’ll notice the sex field defaults to 0. This is because, in Google Protocol Buffers, the default value of an enum field is the value numbered 0. For this reason, every proto3 enum must define a 0 value, and it must be the first value listed. If not, you’ll encounter an error like the following:
user.proto: The first enum value must be zero in proto3.
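Because an unset enum field reads as 0, the 0 value is best given a neutral meaning such as UNKNOWN. The sketch below mirrors that fallback with Python's IntEnum. The Sex class and read_sex helper are hypothetical stand-ins that do not use the generated code, and decoded_fields is assumed to be a plain dict keyed by field number:

```python
from enum import IntEnum


class Sex(IntEnum):
    """Hypothetical Python mirror of the Sex enum in the .proto file."""
    UNKNOWN = 0
    MALE = 1
    FEMALE = 2


def read_sex(decoded_fields):
    """Return the enum for field 1, falling back to 0 when it is absent."""
    return Sex(decoded_fields.get(1, 0))


print(read_sex({}).name)      # UNKNOWN
print(read_sex({1: 2}).name)  # FEMALE
```

If 0 were mapped to MALE or FEMALE instead, every message that never set the field would silently report that value.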
Summary
We have now covered several important proto3 keywords and features. In the next post, we’ll explore how to define more complex .proto files and how to manipulate them with Python.
References
https://developers.google.com/protocol-buffers/docs/proto3