
Using CloudWatch to Monitor your AWS Lambda and send Alerts on Errors


CloudWatch is an amazing tool from Amazon for gathering logs and metrics and creating alarms based on them. In this article we'll go through some useful tips for creating alarms and sending notifications when your Lambda functions end with an error or log messages matching a specific pattern.

Note that in all cases, CloudWatch will send the notifications through an SNS topic, and you can create as many subscriptions to that topic as needed to deliver the notifications wherever you need them (SMS, HTTP, email, etc.).
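If you prefer to script the setup, the SNS side can be sketched as below. This is a minimal sketch that is not part of the console walkthrough: it assumes a boto3 SNS client is passed in, and the function name, topic name and email address are hypothetical.

```python
def create_alert_topic(sns_client, topic_name, email):
    """Create (or reuse) an SNS topic and subscribe an email address to it.

    sns_client is assumed to be a boto3 SNS client, e.g. boto3.client("sns").
    CreateTopic is idempotent: calling it with the name of a topic you already
    own returns that topic's ARN instead of creating a duplicate.
    """
    topic_arn = sns_client.create_topic(Name=topic_name)["TopicArn"]
    # Email subscriptions stay "pending confirmation" until the recipient
    # clicks the confirmation link that AWS sends to that address.
    sns_client.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email)
    return topic_arn
```

For example, `create_alert_topic(boto3.client("sns"), "lambda-alerts", "you@example.com")` returns the topic ARN that the alarms below can notify. Other subscription types (SMS, HTTP) use the same `subscribe` call with a different `Protocol`.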

Use CloudWatch to send notifications when a Lambda function ends with an error

I always set up this kind of notification: you expect every Lambda execution to finish successfully, so it's always good to get notified when things go wrong.

To set up an alarm for when your Lambda fails (whether because of an error in the code itself or because your website is down), go to your CloudWatch console:

  • Click "Alarms" at the left, and then Create Alarm.
  • Click "Lambda Metrics".
  • Look for your Lambda name in the listing, and click on the checkbox for the row where the metric name is "Errors". Click "Next".
  • Enter a name and description for this alarm.
  • Set up the alarm to be triggered whenever "Errors" is above 0 for 1 consecutive period.
  • Select "Sum" as the Statistic and 5 minutes (or whatever number of minutes is reasonable for your use case) in the "Period" dropdown.
  • In the "Notification" box, click the "Select a notification list" dropdown and select your SNS topic.
  • Click "Create Alarm".
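The same alarm can also be created from code. The sketch below assumes a boto3 CloudWatch client and an existing SNS topic ARN, and maps each console step onto a `put_metric_alarm` parameter; the function name `create_errors_alarm` and the `<function>-errors` alarm naming scheme are hypothetical choices.

```python
def create_errors_alarm(cloudwatch, function_name, topic_arn, period_seconds=300):
    """Alarm when a Lambda function reports at least one error in a period.

    cloudwatch is assumed to be a boto3 CloudWatch client,
    e.g. boto3.client("cloudwatch").
    """
    params = {
        "AlarmName": f"{function_name}-errors",  # hypothetical naming scheme
        "AlarmDescription": f"Errors reported by the {function_name} Lambda",
        # "Lambda Metrics" in the console: the AWS/Lambda namespace,
        # filtered to one function via the FunctionName dimension.
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",              # total errors in each period
        "Period": period_seconds,        # 300 seconds = 5 minutes
        "EvaluationPeriods": 1,          # 1 consecutive period
        "Threshold": 0.0,
        "ComparisonOperator": "GreaterThanThreshold",  # "Errors" above 0
        "AlarmActions": [topic_arn],     # notify the SNS topic on ALARM
    }
    cloudwatch.put_metric_alarm(**params)
    return params
```

Usage would be something like `create_errors_alarm(boto3.client("cloudwatch"), "my-function", topic_arn)`, where `my-function` stands in for your Lambda's name.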

That's it. It doesn't matter whether your Lambda is invoked by an API Gateway method, a CloudWatch event or a Kinesis stream: you will always know when an execution of your code ends with an error.

A CloudWatch alarm to monitor that a Lambda is executing periodically

This one is a kind of "sanity check". If you've set up your Lambda to be executed periodically by a scheduled CloudWatch Event, it's good to know if that event was (perhaps accidentally) disabled or removed from the Lambda's list of triggers.

If you've set up the errors alarm from the previous section, it's enough to add a notification to that same alarm that is triggered when the alarm enters the "INSUFFICIENT_DATA" state. This state means that CloudWatch doesn't have enough data for the metric in the expected period of time, and for a Lambda that is supposed to run periodically, that means the Lambda isn't being executed at all.
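In code, the extra notification corresponds to the `InsufficientDataActions` parameter of `put_metric_alarm`. This is a minimal sketch assuming a boto3 CloudWatch client; the function and alarm names are hypothetical. Because `put_metric_alarm` completely overwrites the configuration of an existing alarm with the same name, the sketch repeats the full errors-alarm definition rather than only adding the new action.

```python
def create_errors_and_silence_alarm(cloudwatch, function_name, topic_arn,
                                    period_seconds=300):
    """Errors alarm that also notifies when the metric has no data at all."""
    params = dict(
        AlarmName=f"{function_name}-errors",  # hypothetical naming scheme
        Namespace="AWS/Lambda",
        MetricName="Errors",
        Dimensions=[{"Name": "FunctionName", "Value": function_name}],
        Statistic="Sum",
        Period=period_seconds,
        EvaluationPeriods=1,
        Threshold=0.0,
        ComparisonOperator="GreaterThanThreshold",
        # "missing" is the default: if every datapoint in the evaluation
        # range is missing (no invocations), the alarm goes to INSUFFICIENT_DATA.
        TreatMissingData="missing",
        AlarmActions=[topic_arn],             # sent on transition to ALARM
        InsufficientDataActions=[topic_arn],  # sent on transition to INSUFFICIENT_DATA
    )
    cloudwatch.put_metric_alarm(**params)
    return params
```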

If you haven't set up the alarm for the Lambda errors (or you want a separate alarm for this use case anyway), go to your CloudWatch console:

  • Click "Alarms" at the left, and then "Create Alarm".
  • Click "Lambda Metrics".
  • Look for your Lambda name in the listing of metrics, and click on the checkbox for the row where the metric name is "Invocations". Click "Next".
  • Enter a name and description for this alarm.
  • Set up the alarm to be triggered whenever "Invocations" is lower than 3 (or whatever minimum number of runs you expect per period) for 1 consecutive period.
  • Select "Sum" as the Statistic and 5 minutes (or whatever number of minutes is reasonable for your use case) in the "Period" dropdown.
  • In the "Notification" box, click the "Select a notification list" dropdown and select your SNS topic.
  • Click "Create Alarm".
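A scripted version of this alarm might look like the sketch below, assuming a boto3 CloudWatch client; the function and alarm names are hypothetical. One deliberate difference from the console steps: it sets `TreatMissingData` to `"breaching"`. Lambda publishes no `Invocations` datapoints when the function isn't invoked at all, so with the default setting a Lambda that stops running would leave this alarm in INSUFFICIENT_DATA rather than ALARM.

```python
def create_invocations_alarm(cloudwatch, function_name, topic_arn,
                             min_invocations=3, period_seconds=300):
    """Alarm when a scheduled Lambda runs fewer times than expected."""
    params = dict(
        AlarmName=f"{function_name}-too-few-invocations",  # hypothetical name
        Namespace="AWS/Lambda",
        MetricName="Invocations",
        Dimensions=[{"Name": "FunctionName", "Value": function_name}],
        Statistic="Sum",                 # invocations per period
        Period=period_seconds,
        EvaluationPeriods=1,
        Threshold=float(min_invocations),
        ComparisonOperator="LessThanThreshold",  # "Invocations" lower than 3
        # Assumption beyond the console steps: a period with no datapoints
        # (the function never ran) counts as breaching, so it raises ALARM.
        TreatMissingData="breaching",
        AlarmActions=[topic_arn],
    )
    cloudwatch.put_metric_alarm(**params)
    return params
```

Pick `min_invocations` and `period_seconds` together: a function scheduled every minute should run about five times in a 5-minute period, so a threshold of 3 leaves some slack.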

A custom CloudWatch metric to monitor your Lambda logs

This one is really useful, because it lets you watch for specific strings in your Lambda logs and send an alert when they show up. Let's say your Lambda function logs messages prefixed with their level, such as "[INFO]" or "[ERROR]".

You can then send alerts whenever an "[ERROR]" message is logged by creating a metric filter with a pattern on your logs in CloudWatch:

  • Go to your CloudWatch console.
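  • Select the checkbox next to your Lambda's log group (something like /aws/lambda/YOUR_LAMBDA_NAME).
  • Click "Create Metric Filter".
  • In "Filter Pattern", enter something like "[ERROR]", including the double quotes. Terms containing characters other than letters, digits and underscores must be quoted, and an unquoted pattern in square brackets is interpreted as a space-delimited pattern instead.
  • Click "Assign Metric".
  • Enter a name for this metric (you'll use this name later to set up the alarm).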
  • Click "Create filter".
  • Go to the "Alarms" section.
  • Click "Create Alarm".
  • At the bottom of the list of metrics categories, find the "Custom Metrics" dropdown, and select "Log metrics".
  • Find and select your metric name, click "Next".
  • Select a reasonable period of time (the default of 5 minutes is usually fine), then select "Sum" as the Statistic.
  • Set up the alarm to be triggered when the metric is "> 0" for 1 period.
  • In the "Notification" box, select your SNS topic and click "Create Alarm".
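Both halves of this, the metric filter and the alarm on it, can also be scripted. The sketch below assumes boto3 CloudWatch Logs and CloudWatch clients are passed in; the filter name, the metric name `LambdaErrorLogs` and the `LogMetrics` namespace are hypothetical defaults to adapt. It also sets `TreatMissingData` to `"notBreaching"`, which is not part of the console steps above: a metric filter without a default value only publishes datapoints when something matches, so otherwise the alarm would sit in INSUFFICIENT_DATA between errors.

```python
def create_log_pattern_alarm(logs, cloudwatch, function_name, topic_arn,
                             filter_pattern='"[ERROR]"',    # quoted literal term
                             metric_name="LambdaErrorLogs",  # hypothetical
                             namespace="LogMetrics"):        # hypothetical
    log_group = f"/aws/lambda/{function_name}"
    # Every log event matching the pattern adds 1 to the custom metric.
    logs.put_metric_filter(
        logGroupName=log_group,
        filterName=f"{function_name}-error-logs",  # hypothetical name
        filterPattern=filter_pattern,
        metricTransformations=[{
            "metricName": metric_name,
            "metricNamespace": namespace,
            "metricValue": "1",
        }],
    )
    # Alarm when at least one matching line was logged in a 5-minute period.
    cloudwatch.put_metric_alarm(
        AlarmName=f"{function_name}-error-logs",
        Namespace=namespace,
        MetricName=metric_name,
        Statistic="Sum",
        Period=300,
        EvaluationPeriods=1,
        Threshold=0.0,
        ComparisonOperator="GreaterThanThreshold",
        TreatMissingData="notBreaching",  # no matches means "OK", not unknown
        AlarmActions=[topic_arn],
    )
    return log_group
```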

Monitor your JSON logs with CloudWatch

Now, if your Lambdas log in JSON format, with a "level" field in each entry, you can use a pattern like { $.level = "ERROR" } to filter what you need. Much more complex filters are possible as well; the full syntax is described in the CloudWatch Logs filter and pattern syntax documentation.

Conclusion

To sum up: these are the alerts that I set up almost always, every time I want to trust that a Lambda is doing its job as it's supposed to.

You can rest assured that you will know when something goes wrong: when there are too few invocations in a given period of time, when specific log messages are being produced, or when your code ends with an error.

Enjoy!