“What was the moment this product clicked?” —
A software engineer or site reliability engineer who is on a rotating on-call schedule and whose relationship with PagerDuty is defined by the moments it wakes them up. They've been paged at 3am. They've resolved incidents from their phone in bed. They've also been paged for something that wasn't an incident — a flaky alert, a threshold set too low, a monitoring rule that was never updated after the system changed. Every false positive erodes their trust in the alert and their willingness to respond with full urgency next time. They manage this tension carefully.
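The "threshold set too low" failure mode is easy to picture: an alert rule hard-coded against a baseline that has since moved. Below is a minimal sketch of that kind of stale check; every name and number is hypothetical rather than drawn from any real monitoring setup.

```python
from statistics import quantiles

# Hypothetical stale rule: 800 ms was generous when p95 checkout latency sat
# around 500 ms. Traffic has grown, the normal baseline is now ~750 ms, and
# the rule pages on ordinary evening peaks. Nobody has revisited the number.
P95_LATENCY_THRESHOLD_MS = 800

def should_page(latency_samples_ms: list[float]) -> bool:
    """Page if the p95 of the recent latency samples breaches the threshold."""
    if len(latency_samples_ms) < 20:
        return False  # too little data to judge
    p95 = quantiles(latency_samples_ms, n=20)[-1]  # 95th percentile cut point
    return p95 > P95_LATENCY_THRESHOLD_MS

# An ordinary busy evening now clears the bar with no real incident behind it.
busy_evening_ms = [700 + (i % 30) * 5 for i in range(200)]  # roughly 700-845 ms
print(should_page(busy_evening_ms))  # True: a page at 3am, but not an incident
```

The point of the sketch is only that the constant, not the responder, is what failed.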
What are they trying to do? —
What do they produce? —
It's 2:47am. PagerDuty fires. Payment processing latency is above threshold. They're awake, phone in hand. They open the incident. Linked to a Datadog alert. They open Datadog. The latency spike started 12 minutes ago and is ongoing. They check the deployment log — a deploy happened 40 minutes ago. They roll back. Latency normalizes in 3 minutes. Total time: 19 minutes. They write the incident summary, flag the deploy for post-mortem, and go back to sleep. This is the best version of this scenario. They know this.
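The decisive step in that timeline is correlation: the spike started 12 minutes before the page, a deploy landed 40 minutes before it, so the deploy is the first suspect. Here is a minimal sketch of that check, assuming a hypothetical in-memory deploy log rather than any real PagerDuty or Datadog API.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical deploy log entries; in practice these would come from the
# team's deploy pipeline, not from PagerDuty or Datadog.
DEPLOYS = [
    {"service": "payments", "sha": "a1b2c3d", "at": datetime(2024, 6, 1, 2, 7, tzinfo=timezone.utc)},
    {"service": "search",   "sha": "e4f5a6b", "at": datetime(2024, 6, 1, 1, 10, tzinfo=timezone.utc)},
]

def rollback_candidates(alert_started_at: datetime, service: str,
                        lookback: timedelta = timedelta(hours=1)) -> list[dict]:
    """Deploys to the affected service that landed shortly before the spike."""
    window_start = alert_started_at - lookback
    return [
        d for d in DEPLOYS
        if d["service"] == service and window_start <= d["at"] <= alert_started_at
    ]

# Spike began at 02:35 UTC; the 02:07 payments deploy falls inside the window.
spike_start = datetime(2024, 6, 1, 2, 35, tzinfo=timezone.utc)
for deploy in rollback_candidates(spike_start, "payments"):
    print(f"suspect deploy {deploy['sha']} at {deploy['at']:%H:%M} UTC")
```

Most of the scenario's 19 minutes is this walk from the alert to the recent change; anything that shortens the walk is what this persona values.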
Is on an on-call rotation that cycles every 1–2 weeks. Has the PagerDuty mobile app installed with escalating alert tones. Has been on-call for 1–5 years. Manages their own alert rules — or inherits ones they didn't write. Reviews alert noise monthly — or plans to. Has written at least one runbook. Knows which runbooks are out of date. Has escalated an incident to a senior engineer at least twice. Has been that senior engineer at least once. Has strong opinions about alert thresholds that they will share at any retrospective.
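The monthly alert-noise review mentioned above usually reduces to one question per rule: how often did this page lead to real work? A minimal sketch over a hypothetical export of last month's pages (the data shape is an assumption, not the PagerDuty API schema):

```python
from collections import defaultdict

# Hypothetical export of last month's pages: (alert_rule, was_actionable).
# "Actionable" here means someone had to do something beyond acknowledging.
PAGES = [
    ("payments-p95-latency", True),
    ("payments-p95-latency", False),
    ("disk-usage-batch-host", False),
    ("disk-usage-batch-host", False),
    ("disk-usage-batch-host", False),
    ("checkout-error-rate", True),
]

def noise_report(pages: list[tuple[str, bool]]) -> dict[str, float]:
    """Fraction of pages per alert rule that were not actionable."""
    totals: dict[str, int] = defaultdict(int)
    noisy: dict[str, int] = defaultdict(int)
    for rule, actionable in pages:
        totals[rule] += 1
        if not actionable:
            noisy[rule] += 1
    return {rule: noisy[rule] / totals[rule] for rule in totals}

for rule, ratio in sorted(noise_report(PAGES).items(), key=lambda kv: -kv[1]):
    print(f"{rule}: {ratio:.0%} of pages were noise")
```

Sorting the noisiest rules to the top is what feeds the strong opinions shared at the next retrospective.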
Pairs with `sentry-primary-user` for the error-detection-to-incident-response chain. Contrast with `datadog-primary-user` for the monitoring-as-prevention vs. incident-response-when-it-fails distinction. Use with `gitlab-primary-user` for DevOps teams where the deployment pipeline is the most common incident source.