Frank-Mich L'Heureux

Postgres – Array Index Equivalence Miss

Posted on 2025/05/232025/05/23 by Frank-Mich L'Heureux

Let’s explain the situation with a concrete example. Given the following table structure:

CREATE TABLE data (
    name TEXT PRIMARY KEY,
    categories TEXT[]
);

The two following queries yield the exact same result.

SELECT COUNT(*) FROM data WHERE 'cafes' = ANY(categories);
SELECT COUNT(*) FROM data WHERE categories && ARRAY['cafes']::TEXT[];

To get there, they use different operators though, and, under the hood, it’s not clear how different it gets. The first one, using the ANY operator, will loop through the categories array, and return true as soon as a match is found. The second one is using “&&”, the overlap operator, and returns true if the arrays have any elements in common. In our case, the right hand side array has a single element, so that query really is equivalent to the first version, using ANY.

The thing is, if you are to index specific categories, the index needs to match exactly what the query will be doing. Following the same order, you would need

CREATE INDEX idx_categories_2_any ON data (name) WHERE 'cafes' = ANY(categories); for the first query.
CREATE INDEX idx_categories_2_intersect ON data (name) WHERE categories && ARRAY['cafes']::TEXT[]; for the second query.

Even though the queries yield identical results, is seems like the engine doesn’t know that they are equivalent. If you only have the “ANY” index, when using the “&&” query, the engine won’t use the index. The opposite is also true: If you only have the “&&” index and use the “ANY” query, the engine also won’t use the index.

I confirmed it’s the case from postgres 15 up to 17. I did not try version 14 or any lower one.

If you’re wondering if one form has better performance than the other, it yet again seems like they are identical. I tested this over production data at Local Logic, on our table for Point of Interests (POIs), which has over 6M rows, with 29 categories. Running EXPLAIN (analyse, buffers) SELECT... on both form of the query yielded the exact same query cost.

What should you do to avoid the trap? Depending on your context, I would first recommend creating both indexes. That way, you won’t create a trap for your team members where it’s very easy to use the incorrect form if you are unaware of that peculiarity. If you are in a setting where the index is super large and there is a good reason not to duplicate it, then you will need to make sure every invocations use the proper form, which may be more involving.

True sqlalchemy unit testing

Posted on 2025/03/272025/03/27 by Frank-Mich L'Heureux

tl;dr; Let’s mock the sqlalchemy calls and look at the generated SQL statements.

I’ve read a couple of articles about testing when sqlalchemy is involved. None of them really did unit testing: they all ended up doing local integration tests by spinning up a DB of some sort. While that can work pretty well in many many use cases (in-memory sqlite!), it can also be a PITA when you need to start a docker container, wait for it to boot, install plugins (moddatetime, pgtrimt, unaccent, …), instantiate the models, insert the data, and finally run the tests. While these integration tests have their value, in such slower setup there are benefits to constrain them to the bare minimum instead of being the default way of testing.

Measuring Sentry Impact on AWS Lambda Cold Start and Time to Handler

Posted on 2024/05/152024/05/15 by Frank-Mich L'Heureux

At Local Logic, we rely heavily on lambda functions. We currently monitor our environment with Sentry via their sdk and know that we could use, as an alternative, the sentry layer. As a good developer, I wondered what were the pros and cons. Obviously, one of them is: which option is the fastest? So, I compared them and will share the results in this post. Without further introduction, let’s dig into the experiment.

It’s a Trap! Postgis Geometry with SRID 4326 is not a Geography

Posted on 2024/04/302024/04/30 by Frank-Mich L'Heureux

Long story short, you create a table with a geometry column that stores points with srid 4326 (read geographic lat/lng, or should I write lng/lat, coordinates). That should be interpreted as geography right? Not at all!

Hybrid Pydantic/FastAPI Model

Posted on 2024/03/252024/03/26 by Frank-Mich L'Heureux

Short story, I have an API that doesn’t run on FastAPI, but I still use FastAPI to generate documentation in OpenAPI format.

So I have models for endpoints and I want them to work with and without FastAPI, because in their pure version FastAPI is not deployed. (To make a long story short, installing FastAPI makes the package a bit heavier to deploy, but especially the cold starts of the AWS lambdas because of the Sentry initialization delay. Sentry detects that FastAPI is present in the environment and automatically adds monitoring, which takes a good 500ms longer than when it’s missing. Measured in December 2023).

Anyway, drawing heavily on a StackOverflow answer, I produced this.

# models.py
from typing import Annotated, Any, cast

from pydantic import BaseModel, Field
from pydantic.fields import FieldInfo

try:
    from fastapi import Query  # type:ignore[import-not-found]

    def FieldQuery(*args: Any, **kwargs: Any) -> FieldInfo:  # noqa:N802
        return cast(FieldInfo, Field(Query(*args, **kwargs)))

except ImportError:

    def FieldQuery(*args: Any, **kwargs: Any) -> FieldInfo:  # noqa:N802
        return cast(FieldInfo, Field(*args, **kwargs))  # type: ignore[pydantic-field]


class QueryParams(BaseModel):
    limit: Annotated[
        int,
        FieldQuery(
            -1,
            description="The limit of values to fetch. -1 means no limit.",
        ),
    ]

And for the routes:

# routes.py
from fastapi import FastAPI

from models import QueryParams

app = FastAPI()

@app.get("/things")
def get_things(
    params: Annotated[QueryParams, Depends()],
) -> dict:
    ...

Is it perfect? With the “type: ignore”, “noqa”, “import try/except”, not at all. It looks like a big hack.

Does it work? Oh yeah!

Note: The “noqa” and “type: ignore” tags are respectively linked to ruff and mypy linters.

My Python Toolbox, 2024 Edition

Posted on 2024/01/172024/01/18 by Frank-Mich L'Heureux

I’ve been wanting to publish my favorite Python tools annually for a long time now. It’s practical for a lot of people, and it allows to see how they evolve over time and the projects I contribute to.

So here’s the first edition of this list.

Recipe: Migrating a WordPress Blog Over AWS Lightsail

Posted on 2024/01/152024/01/15 by Frank-Mich L'Heureux

The blog you are currently on used to be hosted on some Virtual Private Server. My hosting provider decided to quit the business and I had to leave. He kindly suggested that I look at AWS Lightsail as a migration option. I did, and this is where this blog is now hosted.

This blog is not new. It already had content. It also has customisations like plugins and themes. This post will focus on how to export your content and customisations from a former blog and import them on Lightsail. Its accuracy may vary depending on the level of customisation your blog has. Command line (CLI) experience is required.

Printing an Aware Date in Python

Posted on 2023/09/072023/11/08 by Frank-Mich L'Heureux

A handy code sample that I keep forgetting about.

from datetime import datetime, timezone

now = datetime.now(tz=timezone.utc)
print(now.astimezone().isoformat())

FastAPI Stripe Webhook Template

Posted on 2023/08/302024/05/02 by Frank-Mich L'Heureux

A FastAPI equivalent for verify-events-came-from-stripe.

import os

from http import HTTPStatus
from typing import Annotated

import stripe
from fastapi import Depends, FastAPI, Header, HTTPException, Request


app = FastAPI()


async def get_body(request: Request) -> bytes:
    return await request.body()


@app.post("/webhook", status_code=HTTPStatus.NO_CONTENT)
def post_report(
    stripe_signature: Annotated[str, Header(alias="stripe-signature")],
    body: bytes = Depends(get_body),
) -> None:
    endpoint_secret = os.environ["ENDPOINT_SECRET"]

    try:
        # signature validation
        event = stripe.Webhook.construct_event(body, stripe_signature, endpoint_secret)
    except ValueError as e:
        # Invalid payload
        raise HTTPException(status_code=HTTPStatus.BAD_REQUEST) from e
    except stripe.error.SignatureVerificationError as e:
        # Invalid signature
        raise HTTPException(status_code=HTTPStatus.UNPROCESSABLE_ENTITY) from e

    print(event)

    return

A possible catch, one that I ran into, is that I had FastAPI convert the request.body to a dictionary in the function parameters (so body: dict). Then I serialized it to a string for the validation step… and it failed because it was no longer identical to what came in.

Serverless: How to share the ids of an existing API Gateway

Posted on 2023/01/312023/01/31 by Frank-Mich L'Heureux

The serverless documentation indicates how to share the ids of an API Gateway, but the technique described there seems to work only for a new deployment. (In my case, following this technique created a 2nd empty API Gateway whose ids were shared, which was, unsurprisingly, not fulfilling my needs).

Here is how to do it for an existing deployment.