Python Module - jsonschema Part 2
Posted on Mar 23, 2018 in Python Module/Package Recommendations by Amo Chen ‐ 6 min read
This article is part of a series on the Python module jsonschema.
In the previous article, Python Module - jsonschema Part 1, we introduced six data types defined by JSON Schema and covered some basic validation techniques.
In this post, we will dive deeper into more complex usages of several types, namely number, string, array, and object.
Integer Type
The number type accepts any numeric value, positive or negative, including decimals. If you want JSON Schema to accept only integers, set type to integer. For example, the following code validates a decimal number against the integer type, which fails:
from jsonschema import validate
schema = {'type': 'integer'}
# would raise exception
validate(123.123, schema)
Execution result:
ValidationError: 123.123 is not of type 'integer'
Failed validating 'type' in schema:
    {'type': 'integer'}
On instance:
    123.123
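To see the two types side by side, the sketch below runs a few values through both schemas. The helper name passes is made up for this illustration; it only wraps validate and reports whether a ValidationError was raised.

```python
from jsonschema import validate
from jsonschema.exceptions import ValidationError


def passes(instance, schema):
    """Hypothetical helper: True if instance validates against schema."""
    try:
        validate(instance, schema)
    except ValidationError:
        return False
    return True


integer_schema = {'type': 'integer'}
number_schema = {'type': 'number'}

# Whole numbers satisfy both types.
print(passes(123, integer_schema))
print(passes(123, number_schema))

# A decimal is a number but not an integer.
print(passes(123.123, integer_schema))
print(passes(123.123, number_schema))
```

Catching ValidationError this way is handy when a failed validation should be treated as a normal outcome rather than a crash.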
Restricting Numeric Range
JSON Schema allows you to restrict the range of numbers using the keywords minimum, maximum, exclusiveMinimum, and exclusiveMaximum.
For example, to restrict a number to the range 1 to 10, inclusive:
schema = {
    'type': 'number',
    'minimum': 1,
    'maximum': 10
}
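If you want to exclude 1 and 10, add the keywords exclusiveMinimum and exclusiveMaximum. Note that the boolean form shown here is the JSON Schema Draft 4 syntax, where the two keywords modify minimum and maximum. From Draft 6 onward, they take the numeric bound directly (for example, 'exclusiveMinimum': 1), and a boolean value is no longer a valid schema. In Draft 4:
schema = {
    'type': 'number',
    'minimum': 1,
    'exclusiveMinimum': True,
    'maximum': 10,
    'exclusiveMaximum': True,
}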
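Because the meaning of the exclusive keywords depends on the draft, it can help to pin the validator class explicitly. The following is a minimal sketch that compares the Draft 4 boolean form with the numeric form used from Draft 6 onward; the variable names are illustrative, and choosing Draft4Validator and Draft7Validator is an assumption made here, not something the schemas above require.

```python
from jsonschema import Draft4Validator, Draft7Validator

# Draft 4 style: booleans that modify minimum and maximum.
draft4_schema = {
    'type': 'number',
    'minimum': 1,
    'exclusiveMinimum': True,
    'maximum': 10,
    'exclusiveMaximum': True,
}

# Draft 6 and later: the exclusive keywords carry the bound themselves.
draft7_schema = {
    'type': 'number',
    'exclusiveMinimum': 1,
    'exclusiveMaximum': 10,
}

draft4 = Draft4Validator(draft4_schema)
draft7 = Draft7Validator(draft7_schema)

# Both validators should reject the bounds and accept values in between.
for value in (1, 1.5, 10):
    print(value, draft4.is_valid(value), draft7.is_valid(value))
```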
Limiting String Length
By default, the string type does not have a length limit. However, in real-world scenarios, strings often have length restrictions. For instance, a message board might limit messages to between 1 and 1024 characters. We can impose these restrictions in JSON Schema:
schema = {
    'type': 'string',
    'minLength': 1,
    'maxLength': 1024,
}
The above example uses minLength and maxLength to restrict the minimum and maximum length of the string.
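As a quick check of the limits above, the sketch below tries an empty string, a normal message, and a message that is one character too long. It also shows that length is counted in characters rather than encoded bytes. Using Draft4Validator and its is_valid method is a choice made for this illustration.

```python
from jsonschema import Draft4Validator

schema = {'type': 'string', 'minLength': 1, 'maxLength': 1024}
validator = Draft4Validator(schema)

print(validator.is_valid(''))          # empty message: shorter than minLength
print(validator.is_valid('hello'))     # within the limits
print(validator.is_valid('x' * 1025))  # one character over maxLength

# Length is counted in characters, not bytes: this string has
# two characters even though it takes six bytes in UTF-8.
two_chars = Draft4Validator({'type': 'string', 'maxLength': 2})
print(two_chars.is_valid('你好'))
```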
Regular Expressions for String Validation
It’s common to use regular expressions for string validation, and JSON Schema supports regular expressions too. For example, to restrict a username to only contain letters and numbers:
schema = {
    'type': 'string',
    'pattern': '^[a-zA-Z0-9]+$',
}
The keyword for regular expressions is pattern. More detailed syntax for regular expressions can be found in Regular Expressions.
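One detail worth checking is that pattern is not anchored implicitly: the expression only needs to match somewhere in the string, which is why the username pattern above includes ^ and $. The sketch below contrasts the anchored pattern with an unanchored variant; the validator names are illustrative, and Draft4Validator is an assumption made for the example.

```python
from jsonschema import Draft4Validator

anchored = Draft4Validator({'type': 'string', 'pattern': '^[a-zA-Z0-9]+$'})
unanchored = Draft4Validator({'type': 'string', 'pattern': '[a-zA-Z0-9]+'})

print(anchored.is_valid('amo123'))      # letters and digits only
print(anchored.is_valid('amo chen'))    # the space is rejected
print(unanchored.is_valid('amo chen'))  # accepted: 'amo' matches inside the string
```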
Built-in String Formats
Python’s jsonschema library implements several formats specified in JSON Schema, such as date-time and email. Here’s an example of using the format keyword to validate an email:
import jsonschema
schema = {
    'type': 'string',
    'format': 'email',
}
jsonschema.validate(
    'foo@example.com',
    schema,
    format_checker=jsonschema.FormatChecker()
)
In the above example, format_checker=jsonschema.FormatChecker() is necessary. Without this parameter, the jsonschema library only checks that the data is of the string type. The jsonschema library also provides ways to customize FormatChecker(). Details, along with the list of supported formats, can be found in Validating Formats in the documentation.
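The effect of the format_checker parameter is easiest to see by validating the same value twice. This is a minimal sketch, assuming Draft4Validator for the validator class; the variable names are made up for the illustration.

```python
from jsonschema import Draft4Validator, FormatChecker

schema = {'type': 'string', 'format': 'email'}

without_checker = Draft4Validator(schema)
with_checker = Draft4Validator(schema, format_checker=FormatChecker())

value = 'not-an-email'
print(without_checker.is_valid(value))  # format is ignored, only the type is checked
print(with_checker.is_valid(value))     # format is enforced
```

Keep in mind that the built-in email check is deliberately lenient, so it should not be relied on as a full address validator.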
Limiting Array Length
The array type does not have a default length limit, but in practice, arrays often have length restrictions. For example, a year has only 12 months, so an array of months should hold at most 12 items. We can use the keywords minItems and maxItems to impose minimum and maximum length limits on an array:
schema = {
    'type': 'array',
    'minItems': 1,
    'maxItems': 12,
}
Ensuring Array Elements are Unique
To ensure that elements within an array
are unique, simply add 'uniqueItems': True
.
schema = {
'type': 'array',
'uniqueItems': True
}
Note: An empty array
[]
will also pass validation.
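The length and uniqueness keywords can be combined in one schema. The sketch below is an illustration under that assumption (the combined schema does not appear above) and uses Draft4Validator to check a few arrays.

```python
from jsonschema import Draft4Validator

# Combined schema, assumed for this example: 1 to 12 unique items.
schema = {'type': 'array', 'minItems': 1, 'maxItems': 12, 'uniqueItems': True}
validator = Draft4Validator(schema)

print(validator.is_valid(['Jan', 'Feb']))   # unique and within the limits
print(validator.is_valid(['Jan', 'Jan']))   # duplicate element
print(validator.is_valid([]))               # unique, but below minItems
print(validator.is_valid(list(range(13))))  # above maxItems

# With uniqueItems alone, an empty array passes.
unique_only = Draft4Validator({'type': 'array', 'uniqueItems': True})
print(unique_only.is_valid([]))
```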
Specifying Element Types in an Array
The array type provides an items keyword, allowing you to specify the type of each element. For example, to specify that all elements in an array must be string:
schema = {
    'type': 'array',
    'items': {
        'type': 'string',
    }
}
# would pass
validate(['a', 'b', 'c'], schema)
# would raise exception
validate(['a', 2, 'c'], schema)
Execution result:
ValidationError: 2 is not of type 'string'
Failed validating 'type' in schema['items']:
    {'type': 'string'}
On instance[1]:
    2
In the example, validate(['a', 'b', 'c'], schema) will pass validation, whereas validate(['a', 2, 'c'], schema) will fail because one element, 2, is not of the string type.
If each element in the array has a different type, you can use Tuple validation to specify the type of each element sequentially:
schema = {
    'type': 'array',
    'minItems': 3,
    'maxItems': 3,
    'items': [
        {'type': 'string'},
        {'type': 'number'},
        {'type': 'integer'}
    ]
}
In this example, the array must have a length of 3, with elements of types string, number, and integer, in that order (e.g., ['a', 1.1, 1]), to pass validation. Note that this array form of items belongs to Draft 4 through Draft 2019-09; Draft 2020-12 moved it to the prefixItems keyword.
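The tuple schema can be exercised as follows. Since the array form of items is draft-dependent, this sketch pins Draft4Validator explicitly; that choice is an assumption made for the example.

```python
from jsonschema import Draft4Validator

schema = {
    'type': 'array',
    'minItems': 3,
    'maxItems': 3,
    'items': [
        {'type': 'string'},
        {'type': 'number'},
        {'type': 'integer'},
    ],
}
validator = Draft4Validator(schema)

print(validator.is_valid(['a', 1.1, 1]))    # matches position by position
print(validator.is_valid(['a', 1.1, 1.5]))  # third element is not an integer
print(validator.is_valid([1.1, 'a', 1]))    # first two elements are swapped
print(validator.is_valid(['a', 1.1]))       # too short for minItems
```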
Restricting Object Properties
By default, the object type places no restrictions on properties. This means any object will pass validation, but in practice an object is usually expected to have specific properties. For instance, an object storing user data might be required to have the properties name, gender, and email. Here’s what the JSON Schema would look like:
schema = {
    'type': 'object',
    'properties': {
        'name': {'type': 'string'},
        'gender': {'type': 'string'},
        'email': {'type': 'string'},
    },
    'required': [
        'name',
        'gender',
        'email'
    ]
}
In this example, the properties keyword describes the name, gender, and email properties of the object. Since properties are optional by default, the required keyword is added to make name, gender, and email mandatory. An object missing any of them will fail validation, for instance, an object without an email:
# would pass
validate(
    {
        'name': 'foo',
        'gender': 'male',
        'email': 'foo@example.com'
    },
    schema
)
# would raise exception
validate(
    {
        'name': 'foo',
        'gender': 'male'
    },
    schema
)
Moreover, as long as the object’s properties satisfy the JSON Schema settings, additional properties will also pass validation. For example, the previous object with additional properties beyond name, gender, and email will still pass validation:
# would pass
validate(
    {
        'name': 'foo',
        'gender': 'male',
        'email': 'foo@example.com',
        'account': 'bar',
    },
    schema
)
If you want additional properties in the object to be treated as validation errors, you can add 'additionalProperties': False to disallow extra properties.
The additionalProperties keyword controls how properties not listed under the properties keyword are handled. By default, any additional properties are allowed.
The schema would thus become:
schema = {
    'type': 'object',
    'properties': {
        'name': {'type': 'string'},
        'gender': {'type': 'string'},
        'email': {'type': 'string'},
    },
    'required': [
        'name',
        'gender',
        'email'
    ],
    'additionalProperties': False,
}
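When an object breaks several rules at once, validate only reports one of them. The sketch below uses the validator's iter_errors method to collect every violation for an object that both lacks email and carries an extra account property. The instance and the sorting by keyword name are choices made for this illustration, as is the use of Draft4Validator.

```python
from jsonschema import Draft4Validator

schema = {
    'type': 'object',
    'properties': {
        'name': {'type': 'string'},
        'gender': {'type': 'string'},
        'email': {'type': 'string'},
    },
    'required': ['name', 'gender', 'email'],
    'additionalProperties': False,
}

# Missing 'email' and carrying an unexpected 'account' property.
instance = {'name': 'foo', 'gender': 'male', 'account': 'bar'}

validator = Draft4Validator(schema)

# error.validator holds the name of the keyword that failed.
errors = sorted(validator.iter_errors(instance), key=lambda e: e.validator)
for error in errors:
    print(error.validator, '->', error.message)
```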
Summary
In this post, we’ve explored more complex usages of the number, string, array, and object types. In the next installment, we’ll discuss how to mix multiple types to validate more complicated JSON data structures.
References
https://spacetelescope.github.io/understanding-json-schema/