Python Module - jsonschema Part 2

Posted on Mar 23, 2018 in Python Module/Package Recommendations by Amo Chen - 6 min read

This article is part of a series on the Python module jsonschema.

In the previous article, Python Module - jsonschema Part 1, we introduced six data types defined by JSON Schema and covered some basic validation techniques.

In this post, we will look at more advanced uses of four of those types: number, string, array, and object.

Integer Type

The number type accepts any numeric value: positive or negative, whole or decimal. If you want JSON Schema to accept only integers, set the type to integer. For example, the following code validates a decimal number against the integer type, which fails:

from jsonschema import validate

schema = {'type': 'integer'}

# would raise exception
validate(123.123, schema)

Execution result:

ValidationError: 123.123 is not of type 'integer'

Failed validating 'type' in schema:
    {'type': 'integer'}

On instance:
    123.123
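To see the difference between passing and failing values side by side, here is a minimal sketch. The helper name passes is hypothetical (it is not part of jsonschema); it only wraps validate and turns the ValidationError into a boolean.

```python
from jsonschema import ValidationError, validate

schema = {'type': 'integer'}


def passes(instance, schema):
    """Return True if instance validates against schema, else False.

    Hypothetical helper for illustration; not part of jsonschema.
    """
    try:
        validate(instance, schema)
    except ValidationError:
        return False
    return True


print(passes(123, schema))      # a whole number
print(passes(-7, schema))       # a negative whole number
print(passes(123.123, schema))  # a decimal, rejected by 'integer'
```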

Restricting Numeric Range

JSON Schema allows you to restrict the range of numbers using the keywords minimum, maximum, exclusiveMinimum, and exclusiveMaximum.

For example, to restrict a number to be within 1 and 10 (including 1 and 10):

schema = {
    'type': 'number',
    'minimum': 1,
    'maximum': 10
}
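As a quick check of the inclusive bounds, the sketch below tries a few values on either side of the range. Draft4Validator is used here only to pin the draft explicitly; that choice is an assumption, and minimum and maximum behave the same way in later drafts.

```python
from jsonschema import Draft4Validator

schema = {
    'type': 'number',
    'minimum': 1,
    'maximum': 10,
}
validator = Draft4Validator(schema)

# is_valid returns a boolean instead of raising ValidationError.
for value in (0, 1, 5.5, 10, 10.1):
    print(value, validator.is_valid(value))
```

Only 0 and 10.1 are rejected; the boundary values 1 and 10 are accepted.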

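If you want to exclude 1 and 10, you can add the keywords exclusiveMinimum and exclusiveMaximum. The boolean form shown here is JSON Schema Draft 4 syntax; from Draft 6 onward, these two keywords take the numeric bound directly (for example, 'exclusiveMinimum': 1) and are used instead of minimum and maximum: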

schema = {
    'type': 'number',
    'minimum': 1,
    'exclusiveMinimum': True,
    'maximum': 10,
    'exclusiveMaximum': True,
}
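Because the meaning of exclusiveMinimum and exclusiveMaximum changed between drafts, it can help to pin the validator class explicitly. The sketch below compares the boolean Draft 4 form with the numeric form used from Draft 6 onward. It assumes a jsonschema release that ships Draft7Validator (3.0 or later); the variable names are illustrative.

```python
from jsonschema import Draft4Validator, Draft7Validator

# Draft 4: booleans that modify 'minimum' and 'maximum'.
draft4_schema = {
    'type': 'number',
    'minimum': 1,
    'exclusiveMinimum': True,
    'maximum': 10,
    'exclusiveMaximum': True,
}

# Draft 6 and later: the exclusive bounds are numbers themselves.
draft7_schema = {
    'type': 'number',
    'exclusiveMinimum': 1,
    'exclusiveMaximum': 10,
}

draft4 = Draft4Validator(draft4_schema)
draft7 = Draft7Validator(draft7_schema)

for value in (1, 1.5, 9.99, 10):
    print(value, draft4.is_valid(value), draft7.is_valid(value))
```

Both validators reject 1 and 10 and accept the values strictly between them.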

Limiting String Length

By default, the string type does not have a length limit. However, in real-world scenarios, strings often have length restrictions. For instance, a message board might limit messages to between 1 and 1024 characters. We can impose these restrictions in JSON Schema:

schema = {
    'type': 'string',
    'minLength': 1,
    'maxLength': 1024,
}

The above example uses minLength and maxLength to restrict the minimum and maximum length of the string.
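A short sketch of how these limits behave at the edges, using made-up messages of length 0, 5, 1024, and 1025:

```python
from jsonschema import ValidationError, validate

schema = {
    'type': 'string',
    'minLength': 1,
    'maxLength': 1024,
}

# Sample messages chosen to sit on and around the limits.
messages = ['', 'hello', 'x' * 1024, 'x' * 1025]

results = []
for message in messages:
    try:
        validate(message, schema)
        results.append(True)
    except ValidationError:
        results.append(False)

print(results)
```

The empty string and the 1025-character string are rejected; the other two pass.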

Regular Expressions for String Validation

It’s common to use regular expressions for string validation, and JSON Schema supports regular expressions too. For example, to restrict a username to only contain letters and numbers:

schema = {
    'type': 'string',
    'pattern': '^[a-zA-Z0-9]+$',
}

The keyword for regular expressions is pattern. More detailed syntax for regular expressions can be found in Regular Expressions.
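One detail worth checking in practice is the role of the ^ and $ anchors. The jsonschema library evaluates pattern with a search rather than a full match, so an unanchored pattern accepts any string that merely contains a matching substring. The sketch below compares the two; Draft4Validator is an arbitrary choice to get a boolean result from is_valid.

```python
from jsonschema import Draft4Validator

anchored = Draft4Validator({
    'type': 'string',
    'pattern': '^[a-zA-Z0-9]+$',
})
unanchored = Draft4Validator({
    'type': 'string',
    'pattern': '[a-zA-Z0-9]+',
})

print(anchored.is_valid('amo123'))       # only letters and digits
print(anchored.is_valid('amo chen!'))    # contains a space and '!'
print(unanchored.is_valid('amo chen!'))  # the substring 'amo' matches
```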

Built-in String Formats

Python’s jsonschema library implements several formats specified in JSON Schema, such as date-time and email. Here’s an example of using the format keyword to validate an email:

import jsonschema

schema = {
    'type': 'string',
    'format': 'email',
}

jsonschema.validate(
    'foo@example.com',
    schema,
    format_checker=jsonschema.FormatChecker()
)

In the above example, format_checker=jsonschema.FormatChecker() is necessary. Without this parameter, the jsonschema library ignores the format keyword and only checks that the data is of the string type. The jsonschema library also provides ways to customize FormatChecker(). Details can be found in Validating Formats in the documentation, along with the list of supported formats.
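The effect of the format_checker parameter can be observed with a string that is clearly not an email address. This is a minimal sketch; the value 'not-an-email' and the variable rejected are only for illustration.

```python
import jsonschema

schema = {
    'type': 'string',
    'format': 'email',
}

# Without a format checker, 'format' is not enforced, so this passes.
jsonschema.validate('not-an-email', schema)

# With a format checker, the same value is rejected.
try:
    jsonschema.validate(
        'not-an-email',
        schema,
        format_checker=jsonschema.FormatChecker(),
    )
    rejected = False
except jsonschema.ValidationError as error:
    rejected = True
    print(error.message)
```

Note that the built-in email check is lenient, so it should not be treated as a full address validator.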

Limiting Array Length

The array type does not have a default length limit, but in practice, arrays often have length restrictions. For example, there are only 12 months in a year, so an array of months never needs more than 12 elements. Use the keywords minItems and maxItems to impose minimum and maximum length limits on an array:

schema = {
    'type': 'array',
    'minItems': 1,
    'maxItems': 12,
}
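A small sketch of the month example, assuming months are represented by the numbers 1 to 12 (the representation is an illustration, not something the schema requires):

```python
from jsonschema import Draft4Validator

schema = {
    'type': 'array',
    'minItems': 1,
    'maxItems': 12,
}
validator = Draft4Validator(schema)

months = list(range(1, 13))  # twelve elements

print(validator.is_valid([]))             # too short
print(validator.is_valid(months))         # exactly at the upper limit
print(validator.is_valid(months + [13]))  # one element too many
```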

Ensuring Array Elements are Unique

To ensure that elements within an array are unique, simply add 'uniqueItems': True.

schema = {
    'type': 'array',
    'uniqueItems': True
}

Note: An empty array [] will also pass validation.
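The following sketch checks a few arrays against the uniqueItems schema, including the empty array mentioned in the note:

```python
from jsonschema import Draft4Validator

schema = {
    'type': 'array',
    'uniqueItems': True,
}
validator = Draft4Validator(schema)

print(validator.is_valid([1, 2, 3]))        # all elements differ
print(validator.is_valid([1, 2, 2]))        # 2 appears twice
print(validator.is_valid(['a', 'b', 'a']))  # 'a' appears twice
print(validator.is_valid([]))               # nothing to duplicate
```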

Specifying Element Types in an Array

The array type provides an items keyword, allowing you to specify the type of each element. For example, to specify that all elements in an array must be strings:

schema = {
    'type': 'array',
    'items': {
        'type': 'string',
    }
}

# would pass
validate(['a', 'b', 'c'], schema)

# would raise exception
validate(['a', 2, 'c'], schema)

Execution result:

ValidationError: 2 is not of type 'string'

Failed validating 'type' in schema['items']:
    {'type': 'string'}

On instance[1]:
    2

In the example, validate(['a', 'b', 'c'], schema) will pass validation, whereas validate(['a', 2, 'c'], schema) will fail because one element, 2, is not of the string type.
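validate raises a single exception even when several elements are wrong. If you want every offending element reported, one option is the validator's iter_errors method, sketched below. The sample array and the variable bad_indexes are illustrative, and Draft4Validator is an arbitrary choice of validator class.

```python
from jsonschema import Draft4Validator

schema = {
    'type': 'array',
    'items': {
        'type': 'string',
    }
}
validator = Draft4Validator(schema)

instance = ['a', 2, 'c', None]

# Each error carries a path; for arrays it holds the element index.
errors = sorted(
    validator.iter_errors(instance),
    key=lambda error: list(error.path),
)
for error in errors:
    print(list(error.path), error.message)

bad_indexes = [list(error.path) for error in errors]
```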

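If each element in the array has a different type, you can use tuple validation to specify the type of each element by position. The list form of items shown here belongs to JSON Schema Draft 4 through 2019-09; Draft 2020-12 uses the prefixItems keyword for this instead: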

schema = {
    'type': 'array',
    'minItems': 3,
    'maxItems': 3,
    'items': [
        {'type': 'string'},
        {'type': 'number'},
        {'type': 'integer'}
    ]
}

In this example, the array must have a length of 3, with elements being types string, number, and integer in order (e.g., ['a', 1.1, 1]) to pass validation.
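Here is a sketch that exercises the tuple schema. It is pinned to Draft4Validator on the assumption that the list form of items should be interpreted under an older draft; the sample arrays are made up for illustration.

```python
from jsonschema import Draft4Validator

schema = {
    'type': 'array',
    'minItems': 3,
    'maxItems': 3,
    'items': [
        {'type': 'string'},
        {'type': 'number'},
        {'type': 'integer'}
    ]
}
validator = Draft4Validator(schema)

print(validator.is_valid(['a', 1.1, 1]))    # matches position by position
print(validator.is_valid(['a', 'b', 1]))    # second element is not a number
print(validator.is_valid(['a', 1.1, 1.5]))  # third element is not an integer
print(validator.is_valid(['a', 1.1]))       # fewer than 3 elements
```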

Restricting Object Properties

By default, the object type places no restrictions on properties, so any object passes validation. In practice, though, an object is usually expected to have specific properties. For instance, an object storing user data might be required to have the properties name, gender, and email. Here’s what the JSON Schema would look like:

schema = {
    'type': 'object',
    'properties': {
        'name': {'type': 'string'},
        'gender': {'type': 'string'},
        'email': {'type': 'string'},
    },
    'required': [
        'name',
        'gender',
        'email'
    ]
}

In this example, the properties keyword describes the name, gender, and email properties and the type each must have. Since properties are optional by default, the required keyword is added to make name, gender, and email mandatory. An object missing any of them will fail validation, for instance, if the object is missing an email:

# would pass
validate(
    {
        'name': 'foo',
        'gender': 'male',
        'email': 'foo@example.com'
    },
    schema
)

# would raise exception
validate(
    {
        'name': 'foo',
        'gender': 'male'
    },
    schema
)
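To see which property is reported as missing, you can catch the exception and inspect its message. A minimal sketch, with the variable message introduced only for illustration:

```python
from jsonschema import ValidationError, validate

schema = {
    'type': 'object',
    'properties': {
        'name': {'type': 'string'},
        'gender': {'type': 'string'},
        'email': {'type': 'string'},
    },
    'required': ['name', 'gender', 'email'],
}

try:
    validate({'name': 'foo', 'gender': 'male'}, schema)
    message = None
except ValidationError as error:
    # The message names the missing required property.
    message = error.message

print(message)
```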

Moreover, as long as the object’s properties satisfy the JSON Schema settings, additional properties will also pass validation. For example, the previous object with additional properties beyond name, gender, and email will still pass validation:

# would pass
validate(
    {
        'name': 'foo',
        'gender': 'male',
        'email': 'foo@example.com',
        'account': 'bar',
    },
    schema
)

The additionalProperties keyword controls how properties not listed under the properties keyword are handled. By default, any additional properties are allowed.

If you want additional properties to be treated as validation errors, add 'additionalProperties': False to disallow them.

The schema would thus become:

schema = {
    'type': 'object',
    'properties': {
        'name': {'type': 'string'},
        'gender': {'type': 'string'},
        'email': {'type': 'string'},
    },
    'required': [
        'name',
        'gender',
        'email'
    ],
    'additionalProperties': False,
}
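With this schema, the object that previously passed with an extra account property is now rejected. The sketch below catches the exception and prints its message; the email address is a placeholder and the variable names are illustrative.

```python
from jsonschema import ValidationError, validate

schema = {
    'type': 'object',
    'properties': {
        'name': {'type': 'string'},
        'gender': {'type': 'string'},
        'email': {'type': 'string'},
    },
    'required': ['name', 'gender', 'email'],
    'additionalProperties': False,
}

user = {
    'name': 'foo',
    'gender': 'male',
    'email': 'foo@example.com',  # placeholder address
    'account': 'bar',            # not listed under 'properties'
}

try:
    validate(user, schema)
    message = None
except ValidationError as error:
    # The message mentions the unexpected property name.
    message = error.message

print(message)
```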

Summary

In this post, we’ve explored more advanced uses of the number, string, array, and object types. In the next installment, we’ll discuss how to mix multiple types to validate more complicated JSON data structures.

References

https://spacetelescope.github.io/understanding-json-schema/