Observability Made Easy with Synthetic Monitoring

When Christina Yakomin (@SREChristina) started her journey toward synthetic monitoring, she owned a platform for containerized applications and all of the underlying infrastructure, which consisted of application servers, cache servers, and web servers. But she didn’t own the applications that were deployed to that infrastructure.

When she came onto the team, they had robust monitoring in place.

The Problem: Defining Healthy

But in spite of that, it was actually pretty difficult to define what “healthy” should mean. For the platform, they decided to broadly treat any response with an HTTP status code below 500 as healthy, and any response time under three seconds as healthy.
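To make that definition concrete, here is a minimal sketch of the rule as a check on a single response. The `Response` type, the constant names, and `is_healthy` are hypothetical names chosen for illustration; the post does not show how the team actually encoded the rule.

```python
from dataclasses import dataclass

# Thresholds taken from the definition above; the names are illustrative.
UNHEALTHY_STATUS_FLOOR = 500   # status codes below this count as healthy
MAX_HEALTHY_SECONDS = 3.0      # responses faster than this count as healthy


@dataclass
class Response:
    status: int      # HTTP status code
    seconds: float   # time taken to respond


def is_healthy(response: Response) -> bool:
    """Return True when the response meets both the status and latency rule."""
    return (
        response.status < UNHEALTHY_STATUS_FLOOR
        and response.seconds < MAX_HEALTHY_SECONDS
    )
```

Note that under this rule a fast 404 counts as healthy, which seems reasonable for a platform team that doesn't own the applications: a client error says little about the infrastructure itself.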

They defined alerts for all of this, and everything was great… at least until she was on call for the first time. She was awakened by many calls, including false alarms.

So what gave?

Monitoring Percentages

In her case, it was that a small number of apps represented a large portion of the total traffic. So, anything happening to those apps disproportionately skewed the aggregate metrics and sent her a false alarm.
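A small worked example shows how that skew plays out. The app names and request counts below are invented for illustration, not figures from the talk.

```python
# Hypothetical counts for one time window: (total requests, unhealthy responses).
apps = {"big-app": (9000, 900)}
apps.update({f"small-app-{i}": (100, 0) for i in range(10)})


def aggregate_unhealthy_rate(apps):
    """Fraction of all responses, across every app, that were unhealthy."""
    total = sum(requests for requests, _ in apps.values())
    unhealthy = sum(bad for _, bad in apps.values())
    return unhealthy / total


rate = aggregate_unhealthy_rate(apps)
print(f"{rate:.1%} of all responses unhealthy")  # 9.0% of all responses unhealthy
```

Only one of the eleven apps is misbehaving here, yet the platform-wide number looks alarming because that one app carries 90% of the traffic.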

What to do? Well, next up was to monitor the percentage of healthy services.

But the false alarms continued. Why?

Well, because now a small number of bad services were throwing off the platform-wide percentage.
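The same kind of made-up numbers illustrates the opposite failure. In this sketch each service counts equally, so a few tiny broken services move the metric a lot; the 5% per-service threshold and all names are assumptions for the example.

```python
def healthy_service_percentage(apps, max_unhealthy_rate=0.05):
    """Percentage of services whose own unhealthy rate is within the threshold.

    `apps` maps a service name to (total requests, unhealthy responses).
    A service with no traffic is treated as healthy in this sketch.
    """
    healthy = 0
    for requests, bad in apps.values():
        rate = bad / requests if requests else 0.0
        if rate <= max_unhealthy_rate:
            healthy += 1
    return 100.0 * healthy / len(apps)


# Hypothetical platform: 17 busy, healthy services and 3 tiny, broken ones.
apps = {f"steady-app-{i}": (1000, 0) for i in range(17)}
apps.update({f"broken-app-{i}": (10, 10) for i in range(3)})

print(healthy_service_percentage(apps))  # 85.0
```

Here only 30 of 17,030 requests failed, but the metric reports that 15% of services are unhealthy, which could easily page someone over applications the platform team doesn't own.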

Synthetic Monitoring

Next up, they decided to look into synthetic monitoring: artificially generated traffic to an application, mimicking the patterns of a typical user. This has a few advantages:

  1. Traffic is controlled.
  2. It complements real-traffic monitoring.
  3. It mimics real user patterns.
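As a rough illustration of the idea (not the team's implementation, which the post doesn't show), a synthetic probe can be as simple as a scheduled request to a known endpoint with the result checked against the health rule. The function names and the injectable `fetch` parameter are assumptions made so the sketch is easy to test.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """Send a GET request and return the HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # urllib raises on 4xx/5xx; the code is still a valid result


def probe(url, fetch=fetch_status, max_seconds=3.0):
    """Run one synthetic check and report whether it was healthy.

    Healthy means a status below 500 returned in under `max_seconds`.
    """
    start = time.monotonic()
    try:
        status = fetch(url)
    except OSError:
        status = None  # connection refused, DNS failure, timeout, and so on
    elapsed = time.monotonic() - start
    healthy = status is not None and status < 500 and elapsed < max_seconds
    return {"url": url, "status": status, "seconds": elapsed, "healthy": healthy}
```

Because the probe sends the same request on a fixed schedule, its results don't depend on which applications happen to be busy, which is what makes the traffic "controlled".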

Here was their first iteration of synthetic monitoring:


Providing this REST API for the synthetic monitoring allowed her to generate this interesting visualization of traffic through the system.

The thickness of the line represents the traffic volume flowing from point A ...

*** This is a Security Bloggers Network syndicated blog from Sonatype Blog authored by Erik Dietrich. Read the original post at: