Measuring Sentry Impact on AWS Lambda Cold Start and Time to Handler

At Local Logic, we rely heavily on Lambda functions. We currently monitor our environment with Sentry via their SDK, and we know that we could use the Sentry layer as an alternative. As a good developer, I wondered what the pros and cons were. Obviously, one of them is: which option is the fastest? So I compared them and will share the results in this post. Without further introduction, let’s dig into the experiment.

Setup

As with all good scientific experiments, I will lay out the setup environment for posterity.

  • Runtime: Python 3.11
  • Memory: 1024 MB
  • Architecture: x86_64

To benchmark, I used a real-life Lambda we have. I reconfigured it for each of the following tests:

  1. No Sentry
  2. sentry-sdk 1.45.0 (The one we’re using)
  3. sentry-sdk 2.1.1 (The one I just realized is out)
  4. Sentry layer arn:aws:lambda:us-east-2:943013980633:layer:SentryPythonServerlessSDK:112

To reproduce the Sentry layer’s behaviour as closely as possible, the Python code to initialize the SDK was:

import os
import sentry_sdk
from sentry_sdk.integrations.aws_lambda import AwsLambdaIntegration

sentry_sdk.init(
    dsn=os.environ["SENTRY_DSN"],
    integrations=[AwsLambdaIntegration(timeout_warning=True)],
    # Environment variables are strings, so the sample rate needs a cast.
    traces_sample_rate=float(os.environ["SENTRY_TRACES_SAMPLE_RATE"]),
    environment=os.environ["STAGE"],
)

To be able to measure, I fiddled with the lambda in the following ways:

  1. Set the timeout to 5 seconds.
  2. On the very first line of the handler function, log, with structured logging, “ready” with the key remaining_time_in_milis set to the value of context.get_remaining_time_in_millis(). This was needed to measure how long it takes for the Sentry layer to actually invoke our code (see the sketch after this list).
  3. On the next line, run sleep(6). This makes the Lambda hang until it times out, which eases testing, as all the calls made within this timeout period will trigger a new Lambda and hence a cold start.

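For reference, here is a minimal sketch of what that instrumented handler could look like. Our real Lambda uses its own structured-logging setup, so the stdlib-logging-with-JSON approach below is only an approximation of the idea.

import json
import logging
import time

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def handler(event, context):
    # First statement: emit a structured "ready" log with the remaining time,
    # so we can later compute how long it took to reach our code after init.
    logger.info(json.dumps({
        "message": "ready",
        "remaining_time_in_milis": context.get_remaining_time_in_millis(),
    }))
    # Sleep past the 5 s timeout so the invocation always times out and the
    # next burst of calls lands on fresh (cold) instances.
    time.sleep(6)
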
To generate results, I ran, in bash, the following tight loop.

for i in {0..20}; do http -A bearer -a $BEARER_TOKEN https://<endpoint that targets the lambda> & done;

The fact that I’m using an HTTP interface to trigger the test here is irrelevant; I could have called the Lambda with direct invocation. What matters is that the loop sends each HTTP call to the background, effectively generating 21 almost simultaneous calls.
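
For completeness, a direct-invocation version of the same burst could look roughly like this boto3 sketch (the function name is a placeholder):

import boto3

client = boto3.client("lambda")

# Fire 21 asynchronous invocations; each call returns as soon as the event is
# queued, so the invocations hit the function almost simultaneously.
for _ in range(21):
    client.invoke(FunctionName="<lambda name>", InvocationType="Event")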

To gather the metrics, I used CloudWatch Logs Insights with the following query.

stats count(@initDuration), min(@initDuration), avg(@initDuration), max(@initDuration), stddev(@initDuration),
      count(remaining_time_in_milis), min(remaining_time_in_milis), avg(remaining_time_in_milis),
      max(remaining_time_in_milis), stddev(remaining_time_in_milis)
by bin(5m)

As you may have gathered from that query, I had to run the tests 5 minutes apart so as not to mix the results. That was easy to do, as setting up the code, the layer, and the dependencies, then deploying between each test, took about that long. As for the counts, since the logs are eventually consistent, I had to wait until they reached 21 to make sure I had complete results.

Results

Sentry setup                 | Avg @initDuration (ms, aka cold start) | Std dev @initDuration (ms) | Avg time to handler (ms, 5 s timeout − remaining_time_in_milis) | Total avg init time (ms)
-----------------------------|----------------------------------------|----------------------------|------------------------------------------------------------------|-------------------------
No Sentry                    | 3101.317                               | 223.769                    | 0.809                                                              | 3102.127
sentry-sdk 1.45.0            | 3485.301                               | 251.720                    | 1.619                                                              | 3486.92
sentry-sdk 2.1.1             | 3377.584                               | 226.086                    | 1.7619                                                             | 3379.346
Sentry Layer Serverless 112  | 963.460                                | 79.456                     | 3858.667                                                           | 4822.127

Results Grid

I omitted the counts, the min/max, and the time-to-handler standard deviation from this table, as it would have been too wide for this post.

Interpretation

Let’s get the most surprising result out of the way: the layer is impressively… disappointing, with a total average init time roughly 1443 ms longer than sentry-sdk 2.1.1. The winner is… there is no winner. Looking purely at the averages, you could think sentry-sdk 2.1.1 is the winner, but taking the standard deviation and the sample size into account, the difference is statistically inconclusive.
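
To make the “inconclusive” part concrete, here is a rough Welch-style check using the summary stats from the table, assuming 21 cold starts per configuration (one per call in the loop):

import math

n = 21  # one cold start per call in the test loop (assumed)
mean_145, sd_145 = 3485.301, 251.720  # sentry-sdk 1.45.0 @initDuration
mean_211, sd_211 = 3377.584, 226.086  # sentry-sdk 2.1.1 @initDuration

# Standard error of the difference between the two means (Welch)
se = math.sqrt(sd_145 ** 2 / n + sd_211 ** 2 / n)
t = (mean_145 - mean_211) / se
print(f"difference = {mean_145 - mean_211:.1f} ms, t = {t:.2f}")
# t comes out around 1.5, well short of what ~40 degrees of freedom would
# require for significance at the usual 5% level, hence “inconclusive”.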

From this table, the need to measure “time to handler” is obvious. I can only suppose that the Sentry layer, instead of loading our app as part of the cold start, does it inside its own handler, and hence it counts towards the regular execution time. That can be a problem, and it’s one I faced while testing, as this extra time inside the regular execution also counts towards the timeout, which is undesirable. When I first started testing, my timeout was set to 3 seconds and the Sentry layer tests failed because they timed out before reaching my own handler.

Using the sentry-sdk and initializing it at the module level / global namespace increases the cold start time, which is expected.

If you are screaming at our cold start times of ~3 seconds, put them into context: they happen for about 0.2% of the queries. Then you may ask: was it worth it to spend so much time on these benchmarks? Yes! For science!

Mentions

Simon Lemieux provided mathematical guidance about the statistical reliability of the ~100 ms “advantage” of sentry-sdk 2.1.1 over 1.45.0.

The cover image is Abstract Clock Vortex HD Wallpaper by Laxmonaut
