Leveraging pydantic to avoid logging PII

PII, short for Personally Identifiable Information, is often tricky to manage. Many countries have legislation around it. How much can you store, how can you store it, and what is your procedure to wipe out all data about a given person? In some regards, the best way to deal with it is to treat it as toxic waste: you want as little of it as possible, and you want guard rails around it so you don’t get poisoned.

In the case of logging, it’s best to just not log it. This way, you will never have to go back and figure out a way to prune your logs of PII. In this post, we will explore how to leverage pydantic models to avoid logging PII by accident. The examples use FastAPI, but the technique does not depend on it. The example server takes some input (including PII), logs it, and returns it.

All examples are available here.

httpie will be used as the HTTP client.

Base Case – Logging PII

Here we have a FastAPI server with a single POST endpoint: /pii.

# Base case, all PII is logged
from fastapi import FastAPI
from pydantic import BaseModel, EmailStr

app = FastAPI()


class PersonalInfo(BaseModel):
    name: str
    email: EmailStr


class ResponseModel(BaseModel):
    status: str
    data: PersonalInfo


@app.post("/pii")
def post_pii(personal_info: PersonalInfo) -> dict:
    print("Received personal information:", personal_info)
    return {"status": "success", "data": personal_info}

If we call the endpoint

http http://127.0.0.1:8000/pii name=Frank-Mich email=fm@mail.com

Response

{
    "data": {
        "email": "fm@mail.com",
        "name": "Frank-Mich"
    },
    "status": "success"
}

Server Logging

Received personal information: name='Frank-Mich' email='fm@mail.com'

So, the client response is just fine, but the server logged PII: both the name and the email are PII. Let’s iterate to fix this.

Leveraging pydantic SecretStr

pydantic has a SecretStr type. It acts mostly as a wrapper around str. Its repr renders as SecretStr('**********'), printing it renders **********, and to get the actual value you need to invoke the get_secret_value method. Let’s use it in our example and see how it goes.

# improvement: by using SecretStr, PII is not logged, but not returned either
from fastapi import FastAPI
from pydantic import (
    BaseModel,
    EmailStr,
    SecretStr,  # new import
)

app = FastAPI()


class PersonalInfo(BaseModel):
    name: SecretStr  # changed to SecretStr
    email: EmailStr


class ResponseModel(BaseModel):
    status: str
    data: PersonalInfo


@app.post("/pii")
def post_pii(personal_info: PersonalInfo) -> dict:
    print("Received personal information:", personal_info)
    print("name:", personal_info.name)  # still renders '**********'
    print(
        "name.get_secret_value():", personal_info.name.get_secret_value()
    )  # renders the actual value
    return {"status": "success", "data": personal_info}

If we call the endpoint

http http://127.0.0.1:8000/pii name=Frank-Mich email=fm@mail.com

Response

{
    "data": {
        "email": "fm@mail.com",
        "name": "**********"
    },
    "status": "success"
}

Server Logging

Received personal information: name=SecretStr('**********') email='fm@mail.com'
name: **********
name.get_secret_value(): Frank-Mich

Great, the name is now hidden from the logs (the email is still a plain EmailStr; we will deal with it later). Not so great: the name is also hidden from the server response. Let’s look into how we can use annotations to fix that and make it (semi) reusable.
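The examples in this post use print, but the same masking applies with the standard logging module. Here is a minimal sketch, without FastAPI, that writes log records to an in-memory buffer and checks that the name never reaches it. The logger name pii_demo and the buffer setup are assumptions made for illustration only.

```python
import io
import logging

from pydantic import BaseModel, SecretStr


class PersonalInfo(BaseModel):
    name: SecretStr


# Send log records to an in-memory buffer so they can be inspected.
buffer = io.StringIO()
logger = logging.getLogger("pii_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(buffer))
logger.propagate = False

info = PersonalInfo(name="Frank-Mich")

# Each of these goes through str()/repr() or the default serializer,
# all of which mask the value of a SecretStr.
logger.info("model: %s", info)
logger.info("field: %s", info.name)
logger.info("f-string: %s", f"{info.name}")
logger.info("model_dump_json: %s", info.model_dump_json())

logged = buffer.getvalue()
print(logged)

# The real value is only available through an explicit call.
print(info.name.get_secret_value())
```

The last logging line also shows why the response was masked: the default JSON serialization of a SecretStr is the masked string.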

Creating a Custom Type With the Annotated Pattern

One way to create custom types in pydantic is to use the annotated type pattern.

# improvement, with the Annotated type PIIStr, we can hide it when logging,
# but still return the value when serializing to json
from typing import Annotated  # new import

from fastapi import FastAPI
from pydantic import (
    BaseModel,
    EmailStr,
    PlainSerializer,  # new import
    SecretStr,
)

app = FastAPI()

# introducing PIIStr
PIIStr = Annotated[
    SecretStr,  # under the hood, that's the real type
    PlainSerializer(  # when serializing ...
        lambda x: x.get_secret_value(),  # ... render the value ...
        return_type=str,
        when_used="json",  # ... but only when serializing to json
    ),
]


class PersonalInfo(BaseModel):
    name: PIIStr  # switch to PIIStr
    email: EmailStr


class ResponseModel(BaseModel):
    status: str
    data: PersonalInfo


@app.post("/pii")
def post_pii(personal_info: PersonalInfo) -> dict:
    print(
        "raw personal information:",
        personal_info,
    )
    print(
        "default (python) model dump of personal information:",
        personal_info.model_dump(),
    )
    print(
        "json model dump of personal information:",
        personal_info.model_dump(mode="json"),
    )
    return {"status": "success", "data": personal_info}

Let’s focus here on the introduction of PIIStr. Under the hood, the type is still SecretStr. All its input validator configurations, like Field(min_length=10), are supported. When manipulating the value, it’s used exactly as a SecretStr.

Only the serialization has a custom behaviour, as defined by the PlainSerializer that is part of the annotation. It is limited to when the serialization target is JSON, as defined by the when_used="json" parameter, meaning that the default (python) serialization will still render SecretStr('**********'). So, to really get the value when serializing, one has to intentionally set the mode to JSON.

For the rest of the code, the only important change is the name field of the PersonalInfo model, which is now typed as PIIStr. There is some extra logging, but that’s only to demonstrate the behaviour of PIIStr typed fields.
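To experiment with PIIStr outside of FastAPI, here is a small standalone sketch. It combines the annotation with a Field constraint and compares the serialization modes. The min_length=3 value is arbitrary and chosen only for this illustration.

```python
from typing import Annotated

from pydantic import (
    BaseModel,
    Field,
    PlainSerializer,
    SecretStr,
    ValidationError,
)

PIIStr = Annotated[
    SecretStr,
    PlainSerializer(
        lambda x: x.get_secret_value(),
        return_type=str,
        when_used="json",
    ),
]


class PersonalInfo(BaseModel):
    # min_length=3 is an arbitrary constraint used for this sketch
    name: PIIStr = Field(min_length=3)


info = PersonalInfo(name="Frank-Mich")

python_dump = info.model_dump()  # default mode: value stays wrapped
json_dump = info.model_dump(mode="json")  # JSON mode: value is revealed
json_text = info.model_dump_json()  # JSON string: value is revealed

print(repr(info))
print(python_dump)
print(json_dump)
print(json_text)

# The length constraint is still enforced on the underlying value.
try:
    PersonalInfo(name="Fm")
    too_short_rejected = False
except ValidationError:
    too_short_rejected = True
print("too short rejected:", too_short_rejected)
```

Only the two JSON-oriented calls reveal the name; the repr and the default model_dump keep it wrapped in a SecretStr.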

Let’s call our latest version of the endpoint.

http http://127.0.0.1:8000/pii name=Frank-Mich email=fm@mail.com

Response

{
    "data": {
        "email": "fm@mail.com",
        "name": "Frank-Mich"
    },
    "status": "success"
}

Server Logging

raw personal information: name=SecretStr('**********') email='fm@mail.com'
default (python) model dump of personal information: {'name': SecretStr('**********'), 'email': 'fm@mail.com'}
json model dump of personal information: {'name': 'Frank-Mich', 'email': 'fm@mail.com'}

Success! On the server side, the first logging line, which just prints the object, hides the PII. The second line, using a default model_dump, also hides the PII. The last one, using a JSON model_dump, renders the value, as expected. (Remember the PlainSerializer parameter when_used="json".)

The response is also what we want it to be. You may have guessed why: under the hood, FastAPI serializes the response in JSON mode.

What About Non-str Types?

We’re not done yet. We also need to mask the email, because it is PII too. We will use a very similar technique, with the added complexity of validating the input against the more specific EmailStr type.

# Improvement: Deal with the EmailStr field
from typing import Annotated

from fastapi import FastAPI
from pydantic import (
    BaseModel,
    BeforeValidator,  # new import
    EmailStr,
    PlainSerializer,
    SecretStr,
)

app = FastAPI()


PIIStr = Annotated[
    SecretStr,
    PlainSerializer(
        lambda x: x.get_secret_value(),
        return_type=str,
        when_used="json",
    ),
]

# Introducing PIIEmailStr
PIIEmailStr = Annotated[
    SecretStr,
    BeforeValidator(lambda x: EmailStr._validate(x)),  # introducing the BeforeValidator
    PlainSerializer(
        lambda x: x.get_secret_value(),
        return_type=str,
        when_used="json",
    ),
]


class PersonalInfo(BaseModel):
    name: PIIStr
    email: PIIEmailStr


class ResponseModel(BaseModel):
    status: str
    data: PersonalInfo


@app.post("/pii")
def post_pii(personal_info: PersonalInfo) -> dict:
    print("Received personal information", personal_info)
    return {"status": "success", "data": personal_info}

This new version of the code introduces the PIIEmailStr type. It’s mostly identical to PIIStr, with the distinction of having an extra annotation parameter: BeforeValidator. Its purpose is to validate the input through EmailStr via its (oddly) private method _validate. Note that this technique only works because nothing, in this use case, makes use of a method specific to EmailStr. Otherwise, another approach would be required.

Let’s call the endpoint to confirm everything works as expected. First, let’s do it with an invalid email to confirm the validator is indeed being applied.

http http://127.0.0.1:8000/pii name=Frank-Mich email=fm

Client Response

{
    "detail": [
        {
            "ctx": {
                "reason": "An email address must have an @-sign."
            },
            "input": "fm",
            "loc": [
                "body",
                "email"
            ],
            "msg": "value is not a valid email address: An email address must have an @-sign.",
            "type": "value_error"
        }
    ]
}

Server Logging

None (aside from FastAPI mentioning a 422 status code)

Great. Let’s now do a proper call.

http http://127.0.0.1:8000/pii name=Frank-Mich email=fm@mail.com

Response

{
    "data": {
        "email": "fm@mail.com",
        "name": "Frank-Mich"
    },
    "status": "success"
}

Server Logging

Received personal information name=SecretStr('**********') email=SecretStr('**********')

Alternatives

The technique presented here is just one of many. As an example, here is the one that inspired me. What I like about it is that there is a single annotation, PiiType, so there is no need for a custom PII annotation per type, like PIIStr and PIIEmailStr in this post.

There are two things I don’t like about its implementation. First, it does not default to safety: if the parameter "hide_pii" is not set, the value is rendered. That is easy enough to fix: we can fiddle with the serializer_pii function to hide the value by default. The second problem seems unfixable: if the model is not serialized, like when using a plain print or log call, the serializer_pii function is never invoked and the value is rendered in plain text. For the solution in the current post, that is not a problem because the underlying type is SecretStr, so whatever the context, it hides its content by default.

Conclusion

I don’t have anything to add. I am curious to hear your solutions if you have better ones. You can share links to your code samples in the comments or on the related pydantic discussion.

Thank you!

Cover Picture: Wise Monkey See No Evil Statuette Resin Black Gold, from Evideco
