Adaptive thresholds let users optimize the alerts they receive by dynamically adjusting threshold values for critical monitors using OpManager's machine learning-based predictive algorithms. This eliminates the need to decide thresholds manually and fully automates the process of studying complex datasets and arriving at feasible threshold values for each monitor.
Here's how OpManager's Adaptive Thresholds help simplify the process of determining threshold values:
When Adaptive Thresholds is enabled, OpManager collects "deviation values" from the user to determine how much the polled value can vary before an alert is raised. Three deviation values, one for each severity level (Attention, Trouble, and Critical), are specified either as percentages or as values, and in either increasing or decreasing order.
OpManager requires at least 14 days of performance data to start generating alerts. This might cause a minor delay in raising alerts when the Adaptive Threshold feature is enabled for the first time.
For each hour, OpManager's predictive algorithms provide a forecast value based on previously observed data patterns and behavior, and the deviation values configured by the user are applied to that forecast.
Note that the deviation can be specified either as values or as percentages. For example, consider the following deviation values:
Attention | Trouble | Critical |
---|---|---|
5 | 8 | 15 |
These deviation values are interpreted differently depending on whether they are configured as values or as percentages, as described below.
1. Deviation in terms of value: If the forecast value for the CPU utilization of a device is 34 for the first hour of the day (0:00 - 1:00), the threshold for raising an alert with severity "Attention" is 34 + 5 = 39 (forecast value + Attention deviation). The Trouble and Critical thresholds are calculated the same way every hour. The calculated values for five consecutive hours with different forecast values would be as follows:
Hour of day | Forecast value | Attention value | Trouble value | Critical value |
---|---|---|---|---|
0:00 - 1:00 | 34 | 39 | 42 | 49 |
1:00 - 2:00 | 36 | 41 | 44 | 51 |
2:00 - 3:00 | 44 | 49 | 52 | 59 |
3:00 - 4:00 | 58 | 63 | 66 | 73 |
4:00 - 5:00 | 54 | 59 | 62 | 69 |
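The value-based calculation in the table above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not OpManager's implementation; the names `DEVIATIONS` and `value_thresholds` are hypothetical and chosen for this example.

```python
# Illustrative sketch only; these names are hypothetical, not OpManager APIs.
# Deviation values from the example: Attention 5, Trouble 8, Critical 15.
DEVIATIONS = {"Attention": 5, "Trouble": 8, "Critical": 15}


def value_thresholds(forecast, deviations=DEVIATIONS):
    """Return one threshold per severity: forecast + fixed deviation."""
    return {severity: forecast + dev for severity, dev in deviations.items()}


# Hourly forecast values taken from the example table (0:00 through 5:00).
hourly_forecasts = [34, 36, 44, 58, 54]

for forecast in hourly_forecasts:
    print(forecast, value_thresholds(forecast))
```

Each printed row should match the corresponding row of the table, because every threshold is simply the hourly forecast plus the configured deviation.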
2. Deviation in terms of percentage: If the forecast value for the CPU utilization of a device is 34 for the first hour of the day (0:00 - 1:00), the threshold for raising an alert with severity "Attention" is 34 + (5% of 34) = 35.7, shown as 36 after rounding to the nearest whole number (forecast value + Attention deviation percentage of the forecast value). The Trouble and Critical thresholds are calculated the same way every hour. The calculated values for five consecutive hours with different forecast values would be as follows:
Hour of day | Forecast value | Attention value | Trouble value | Critical value |
---|---|---|---|---|
0:00 - 1:00 | 34 | 36 | 37 | 39 |
1:00 - 2:00 | 36 | 38 | 39 | 41 |
2:00 - 3:00 | 44 | 46 | 48 | 51 |
3:00 - 4:00 | 58 | 61 | 63 | 67 |
4:00 - 5:00 | 54 | 57 | 58 | 62 |
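The percentage-based calculation in the table above can be sketched the same way. This is an illustration of the arithmetic only, not OpManager's implementation; the names `DEVIATION_PERCENTAGES` and `percentage_thresholds` are hypothetical. Rounding to the nearest whole number is an assumption inferred from the table values (for example, 35.7 is shown as 36); the product's actual rounding behavior is not stated here.

```python
# Illustrative sketch only; these names are hypothetical, not OpManager APIs.
# Deviation percentages from the example: Attention 5%, Trouble 8%, Critical 15%.
DEVIATION_PERCENTAGES = {"Attention": 5, "Trouble": 8, "Critical": 15}


def percentage_thresholds(forecast, percentages=DEVIATION_PERCENTAGES):
    """Return one threshold per severity: forecast + (percentage of forecast).

    Assumption: results are rounded to the nearest whole number, which
    matches the values shown in the example table.
    """
    return {
        severity: round(forecast + forecast * pct / 100)
        for severity, pct in percentages.items()
    }


# Hourly forecast values taken from the example table (0:00 through 5:00).
hourly_forecasts = [34, 36, 44, 58, 54]

for forecast in hourly_forecasts:
    print(forecast, percentage_thresholds(forecast))
```

Unlike the value-based mode, the gap between the forecast and each threshold grows with the forecast itself, so higher forecast hours tolerate larger absolute swings before an alert is raised.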
Before enabling the Adaptive Thresholds option, note that:
Adaptive thresholds can be enabled globally across OpManager from Settings -> Monitoring -> Adaptive Threshold. Navigate to this page and turn on the "Enable Adaptive Threshold" option. You can also enable adaptive thresholds individually from the respective performance monitor, perf group, or device template, and define the deviation levels as either values or percentages.
Once it has been enabled, it can be controlled on various levels based on your requirements: