Application performance monitoring (APM): Smarter observability
Final Design
Before
My role
Designing Smart Alerts for applications and other services.
Time: 4 Sprints
Deliverables
Workshop artifacts
Wireframes
Hi-fi designs
User frustration
“Smart Alerts” may not offer the same capabilities as “Custom Events”
When the business shifted from a monolith to a microservices architecture, it decided to deprecate “Custom Events”: their static configurations and lack of automation made them neither scalable nor efficient in dynamic, containerized environments.
As a result, there was a transition from “Custom Events” to “Smart Alerts”. However, the customer was not happy and expressed concerns that Smart Alerts have limitations and do not offer the same functionality as Custom Events.
Research
Personas & need statements

Need: Automate anomaly detection
“I need an intelligent alerting system that automatically detects anomalies, adjusts thresholds, and minimizes alert fatigue.”
Johnathan
Site Reliability Engineer

Need: Automate alerting
“I need an intelligent, automated alerting system that dynamically adapts to my environment and diagnoses even unknown issues.”
Matt
DevOps
Aligning on the user requirements
We held regular sync-ups to create a UX strategy blueprint, aligning everyone on the project objectives by identifying the goals we wanted to reach and the pain points we wanted to solve.
Pain points
The pain points include restrictive evaluation windows, unclear time window types, an unmet 1-second mean time to detect (MTTD), misaligned DFQ filters, limited metric threshold options, and a cumbersome alert preview experience.
Evaluation windows: The smallest time window I can select in Smart Alerts is still too large, and the largest is still too small.
Time windows: I don't know whether time windows in Custom Events and/or Smart Alerts are sliding windows or tumbling windows.
Scoping: I'm not sure whether the filters I applied using DFQs for deprecated Custom Events match the filters in Smart Alerts 100%.
Setting thresholds: I don't understand why some metric threshold types are not available in some "group by" options (e.g., Adaptive Thresholds are not available for per-endpoint grouping).
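The sliding versus tumbling question raised under "Time windows" matters because the two window types can reach different alerting decisions on the same data. The following is a minimal sketch of the general concept only; it does not describe how Custom Events or Smart Alerts actually evaluate windows, and the function names, sample values, and threshold are hypothetical.

```python
from typing import List


def tumbling_windows(samples: List[float], size: int) -> List[List[float]]:
    """Split samples into back-to-back, non-overlapping windows of `size`."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def sliding_windows(samples: List[float], size: int, step: int = 1) -> List[List[float]]:
    """Return overlapping windows of `size`, advancing `step` samples each time."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, step)]


def breaches(windows: List[List[float]], threshold: float) -> List[bool]:
    """Flag each window whose mean value exceeds `threshold`."""
    return [sum(w) / len(w) > threshold for w in windows]


# Hypothetical latency samples (ms): a short spike that straddles
# the boundary between two tumbling windows of four samples each.
latency_ms = [100, 100, 100, 900, 900, 100, 100, 100]
THRESHOLD_MS = 400  # assumed threshold, for illustration only

tumbling_result = breaches(tumbling_windows(latency_ms, 4), THRESHOLD_MS)
sliding_result = breaches(sliding_windows(latency_ms, 4), THRESHOLD_MS)

print("tumbling:", tumbling_result)
print("sliding: ", sliding_result)
```

In this sketch the spike is split across two tumbling windows, so neither window's mean crosses the threshold, while several sliding windows contain the whole spike and do cross it. This is why users need to know which window type an alert uses before they can predict its behaviour.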

High-level user journey stages (proposed) - creating an AP Smart Alert
This user journey outlines creating an AP Smart Alert, from selecting a blueprint and defining scope to configuring conditions, thresholds, persistence, and finalizing alert channels, properties, and payloads for effective monitoring.

Redesigning page layout
The redesigned interface enhances usability with an always-visible preview chart, optimised viewport usage, and reduced cognitive overload. A structured left-to-right control layout, a "Group by" selector with guidance, and a dedicated "previous step" button improve navigation, clarity, and decision-making.

Redefining the Smart Alerts layout for better usability
Our initial transition from Custom Events to Smart Alerts followed a specific layout. To enhance the user experience, we applied the Zeigarnik Effect and redefined the page structure for better clarity and usability.

Learnings
Strategic Alignment: Regular sync-ups and a unified UX strategy blueprint were essential. Aligning on user requirements helped us clearly identify pain points and set actionable goals for transforming Smart Alerts.
User-Centric Problem Solving: By deeply understanding frustrations, such as restrictive evaluation windows and misaligned filters, we drove design enhancements that addressed specific user needs and improved overall usability.
Iterative Redesign: Applying principles like the Zeigarnik Effect, we continuously refined our page layout and interaction flows. This iterative approach not only improved navigation and clarity but also reinforced our commitment to a seamless user experience.