Let’s explain the situation with a concrete example. Given the following table structure:
CREATE TABLE data (
name TEXT PRIMARY KEY,
categories TEXT[]
);
The two following queries yield the exact same result.
SELECT COUNT(*) FROM data WHERE 'cafes' = ANY(categories);
SELECT COUNT(*) FROM data WHERE categories && ARRAY['
cafes
']::TEXT[];
To get there, they use different operators though, and, under the hood, it’s not clear how different it gets. The first one, using the ANY operator, will loop through the categories array, and return true as soon as a match is found. The second one is using “&&”, the overlap operator, and returns true if the arrays have any elements in common. In our case, the right hand side array has a single element, so that query really is equivalent to the first version, using ANY.
The thing is, if you are to index specific categories, the index needs to match exactly what the query will be doing. Following the same order, you would need
CREATE INDEX idx_categories_2_any ON data (name) WHERE 'cafes' = ANY(categories);
for the first query.CREATE INDEX idx_categories_2_intersect ON data (name) WHERE categories && ARRAY['cafes']::TEXT[];
for the second query.
Even though the queries yield identical results, is seems like the engine doesn’t know that they are equivalent. If you only have the “ANY” index, when using the “&&” query, the engine won’t use the index. The opposite is also true: If you only have the “&&” index and use the “ANY” query, the engine also won’t use the index.
I confirmed it’s the case from postgres 15 up to 17. I did not try version 14 or any lower one.
If you’re wondering if one form has better performance than the other, it yet again seems like they are identical. I tested this over production data at Local Logic, on our table for Point of Interests (POIs), which has over 6M rows, with 29 categories. Running EXPLAIN (analyse, buffers) SELECT...
on both form of the query yielded the exact same query cost.
What should you do to avoid the trap? Depending on your context, I would first recommend creating both indexes. That way, you won’t create a trap for your team members where it’s very easy to use the incorrect form if you are unaware of that peculiarity. If you are in a setting where the index is super large and there is a good reason not to duplicate it, then you will need to make sure every invocations use the proper form, which may be more involving.