What is the difference between metrics, logs, and traces in observability?

Medium | Topic: Observability | June 17, 2026

Observability is the ability to understand the internal state of a system by examining its external outputs. The three pillars of observability are metrics, logs, and traces.

Metrics

Metrics are numerical measurements collected over time. They describe the state or behavior of a system in aggregated form, so they are cheap to store and query but do not say what happened to any individual request. Examples: CPU usage percentage, requests per second, error rate, memory usage, p99 latency.

Metrics are best for dashboards and alerting on system health, detecting anomalies and trends over time, and capacity planning.

Tools: Prometheus, Datadog, CloudWatch, Grafana (visualization).
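To make "aggregated form" concrete, here is a minimal Python sketch of how raw request observations might be reduced to a few of the metrics listed above. The `summarize` function and the sample data are hypothetical and not part of any metrics library; a real system would normally rely on a client library and a backend such as Prometheus rather than hand-rolled aggregation.

```python
import math

def summarize(latencies_ms, error_count, window_seconds):
    """Reduce raw observations from one time window to a few metrics.

    Illustrative only: names and the nearest-rank percentile method
    are assumptions made for this sketch.
    """
    total = len(latencies_ms)
    if total == 0:
        raise ValueError("no observations in window")
    ordered = sorted(latencies_ms)
    # Nearest-rank p99: the smallest sample that at least 99% of
    # samples do not exceed.
    rank = max(1, math.ceil(total * 99 / 100))
    return {
        "requests_per_second": total / window_seconds,
        "error_rate": error_count / total,
        "p99_latency_ms": ordered[rank - 1],
    }

# Hypothetical window: 200 requests in 10 seconds, 3 of them very slow,
# 4 of them failed.
samples = [20.0] * 197 + [900.0] * 3
metrics = summarize(samples, error_count=4, window_seconds=10)
print(metrics)
```

Note that the three output numbers no longer identify which requests were slow or failed. That loss of detail is what makes metrics compact, and it is why logs and traces are needed for the follow-up investigation.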

Logs

Logs are timestamped records of discrete events that occurred in a system. They provide detailed context about what happened and when. Examples: Application error messages, HTTP access logs, audit trails, debug output.

Logs are best for debugging specific errors or incidents, audit trails for compliance, and understanding event sequences.

Tools: ELK Stack (Elasticsearch, Logstash, Kibana), Loki, Splunk, CloudWatch Logs.
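As an illustration of a timestamped record of a discrete event, the sketch below uses Python's standard `logging` module to emit one JSON object per event. The `JsonFormatter` class, the `checkout` logger name, and the `request_id` field are assumptions made for this example; structured JSON output is one common convention, not something the tools above require.

```python
import io
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (hypothetical format)."""

    def format(self, record):
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
            # Keys passed via `extra=` become attributes on the record.
            "request_id": getattr(record, "request_id", None),
        })

stream = io.StringIO()  # stand-in for stdout or a log file
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("checkout")  # hypothetical service name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the example's output in `stream` only

logger.error("payment declined", extra={"request_id": "req-123"})

# Parse the emitted line back, as a log pipeline would.
event = json.loads(stream.getvalue())
print(event["level"], event["message"], event["request_id"])
```

Emitting fields such as `request_id` as separate keys, rather than embedding them in the message text, is what lets a log store filter and group events without fragile text parsing.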

Traces

Traces follow a single request as it flows through distributed services, capturing the path and timing of each operation. A trace consists of spans – individual units of work with start time and duration. Trace IDs link all spans of a single request across services.

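Traces are best for identifying bottlenecks in distributed systems, understanding service dependencies, and debugging latency issues across microservices.

Tools: Jaeger, Zipkin, AWS X-Ray (tracing backends), and OpenTelemetry (vendor-neutral instrumentation that exports trace data to such backends).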
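The span model described above can be sketched with a small Python data structure. This is a simplified illustration, not the data model of any particular tracing tool: the `Span` fields, the `parent_id` link between spans, and the sample values are assumptions for the example, and the self-time calculation assumes a span's children run one after another without overlapping.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Span:
    trace_id: str             # shared by every span of one request
    span_id: str
    parent_id: Optional[str]  # None for the root span
    service: str
    name: str
    start_ms: int
    duration_ms: int

# One hypothetical request flowing through four services.
spans = [
    Span("t1", "a", None, "gateway", "GET /checkout", 0, 480),
    Span("t1", "b", "a", "cart", "load_cart", 10, 40),
    Span("t1", "c", "a", "payments", "charge", 60, 400),
    Span("t1", "d", "c", "payments-db", "insert_charge", 80, 350),
]

def self_time(span, all_spans):
    """Duration of a span minus the durations of its direct children."""
    children = [s for s in all_spans if s.parent_id == span.span_id]
    return span.duration_ms - sum(c.duration_ms for c in children)

# The span with the largest self time is the likeliest bottleneck.
bottleneck = max(spans, key=lambda s: self_time(s, spans))
print(bottleneck.service, self_time(bottleneck, spans))
```

The root span alone shows only that the request took 480 ms. Following the parent links down the tree is what attributes most of that time to a single downstream operation.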

Using All Three Together

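The three signals answer different questions: metrics tell you that something is wrong, logs tell you what happened, and traces tell you where in the request path it happened. When an alert fires on a metric (e.g., high error rate), you look at logs to find the specific error messages, then use traces to see which service call failed and where the latency spike originated. OpenTelemetry is an open, vendor-neutral standard for generating and collecting all three signal types across different languages and platforms.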
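The workflow above can be sketched end to end in Python with in-memory data. Everything here is hypothetical: the threshold, the counters, the log entries, and the per-service self times are made up for illustration, and the sketch assumes that log entries carry a trace ID, which depends on how the application is instrumented.

```python
from collections import Counter

ERROR_RATE_THRESHOLD = 0.05  # hypothetical alert threshold

# 1. Metric: aggregated counters show that something is wrong.
requests_total, errors_total = 400, 32
alert_firing = errors_total / requests_total > ERROR_RATE_THRESHOLD

# 2. Logs: individual error events show what went wrong.
logs = [
    {"level": "INFO", "trace_id": "t1", "message": "request ok"},
    {"level": "ERROR", "trace_id": "t2", "message": "card processor timeout"},
    {"level": "ERROR", "trace_id": "t3", "message": "card processor timeout"},
]
error_logs = [entry for entry in logs if entry["level"] == "ERROR"]
trace_ids = sorted({entry["trace_id"] for entry in error_logs})

# 3. Traces: per-service self time (ms) for each failing request
#    shows where the time went.
self_time_by_trace = {
    "t2": [("gateway", 50), ("payments", 50), ("card-processor", 2000)],
    "t3": [("gateway", 40), ("payments", 60), ("card-processor", 1900)],
}

suspects = Counter()
if alert_firing:
    for trace_id in trace_ids:
        service, _ = max(self_time_by_trace[trace_id], key=lambda s: s[1])
        suspects[service] += 1

print(alert_firing, trace_ids, suspects.most_common(1))
```

The trace ID is the join key in this sketch: without it, the error logs and the spans would have to be matched by timestamp, which is far less reliable under load.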
