Health Insights Overview
Health Insights is a network health application that performs real-time key performance indicator (KPI) monitoring, analytics, alerting, and troubleshooting.
It builds dynamic detection and analytics modules that let operators monitor network events and raise alerts on them using user-defined logic.
It also provides prebuilt KPIs that are based on model-driven and SNMP-based telemetry. The Health Insights Recommendation Engine uses data mining to analyze your network and then recommends which telemetry paths you should enable and monitor.
Important
If you plan to gather telemetry using KPIs and Health Insights, an extended Cisco Crosswork Data Gateway must be configured.
Note
For the recommendation engine to work in Health Insights, connectivity must be established between Cisco Crosswork Health Insights and the device. Enable the NETCONF protocol in three places: on the device itself, in the device configuration in Crosswork, and in the device's credential profile in Crosswork.
For example, Health Insights can be configured to gather bandwidth usage data for device links. Over time, it establishes a performance baseline for each link. If a link deviates from its baseline, Health Insights detects the deviation and raises an alert, and you can then run a Playbook to reconfigure the network and resolve the issue.
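To make the baseline idea concrete, here is a minimal Python sketch of one common way to flag deviations: learn the mean and standard deviation of a link's historical utilization, then flag new samples that fall more than k standard deviations from that mean. This is a generic illustration only. The documentation does not describe the algorithm Health Insights actually uses, and the function names, sample data, and threshold k = 3 are hypothetical.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Return (mean, stdev) of historical samples; needs at least two samples."""
    return mean(samples), stdev(samples)


def detect_deviations(baseline, new_samples, k=3.0):
    """Return indices of samples more than k standard deviations from the baseline mean.

    Hypothetical rule for illustration; not the Health Insights algorithm.
    """
    mu, sigma = baseline
    return [i for i, x in enumerate(new_samples) if abs(x - mu) > k * sigma]


# Hypothetical link-utilization history (percent) used to learn the baseline.
history = [40, 42, 41, 39, 40, 43, 41, 40]
baseline = build_baseline(history)
print(detect_deviations(baseline, [41, 44, 90, 39]))  # prints [2]
```

Only the 90% sample is far enough from the learned mean (about 40.75%, with a standard deviation of about 1.3) to be flagged; in the scenario above, that is the point at which an alert would be raised and a Playbook could be run.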
The following high-level example gives a basic view of how Health Insights interacts with the other Cisco Crosswork Network Controller components:
- Health Insights detects an anomaly: The optical bit error rate that you are monitoring on each of the links in your network suddenly increases.
- Change Automation Playbooks automate remediation: Switch to the backup link immediately, restore service, open a ticket (manually initiated by the user), and alert the network engineer.
The complexity of the interaction depends on the type of anomaly, how it is detected, and the Playbooks you use to remediate it. You can orchestrate any form of network remediation using Change Automation Playbooks, which helps you close the loop on problem resolution and minimize network downtime.
Health Insights Alert Dashboard
The Health Insights alert dashboard provides device health summary information that is based on real-time network state events. The dashboard displays a network view of KPI sensors that are paired to specific device groups. Health Insights raises customizable events and alerts that are based on user-defined logic.
Note
The alert dashboard displays individual KPI alerts, even though KPIs are enabled on a device through a KPI profile.
To display the Health Insights alert dashboard, select it from the main menu.

Item | Description |
---|---|
1 | Device/KPI Alert Selector: Click to toggle between device alert and KPI alert information. |
2 | Filters: Filters the alert dashboard information by associated tag names. To select a tag, do one of the following: Tag filters you create are not saved. If you open another window and then return to the alert dashboard, you must re-create your tag filters. |
3 | Alerts History: Shows the total number of device alerts or KPI alerts raised during the chosen time period, with detailed timelines showing both individual sets of alerts and the overall alert trend. |
4 | Alerts Trend Line: Shows the overall trend in alerts for the chosen time period. You can use the Alerts Trend Line to select and zoom in on a specific time period within the Alerts History Line, as follows: To restore the full view of the Alerts History Line, click any point outside the light gray shading on the Alerts Trend Line. |
5 | All Impacted Devices/All Impacted KPIs: When selected, this dashlet lists all devices or KPIs affected by alerts. The information for each affected device or KPI includes: |
6 | Top 20 Impacted Devices/Top 20 Impacted KPIs: When selected, this dashlet displays a map of tiles, each representing one of the 20 devices or KPIs with the most alerts during the selected time period. The area each tile occupies in the map corresponds to the number of alerts raised: the more alerts, the larger the tile. Tiles are color coded by Alert Severity. To view more detailed information for a device or KPI, click its name link in the center of the tile. |
7 | Device/KPI Watchlist: When selected, this dashlet lists the devices or KPIs that you added using + Manage Device/KPI Watchlist and that are affected by alerts. The information for each affected device or KPI includes: |
8 | Alerts History Line: Shows alerts as discrete bar indicators whose height represents the total number of alerts at each point in time. To see the total for each type of alert, hover the mouse cursor over the bar indicator. You can also use the Alerts Trend Line to zoom in on particular portions of the alert history. |
9 | Timeframe: Specifies the time period for which the dashboard shows alert information, such as the past hour, past day, or past week. The dashboard shows alert information only, not telemetry data. |
10 | Alert Severity: Maps the bar indicator colors used in the Alerts History dashlet to the corresponding alert severities. To display or hide the alerts for a severity, click its check box. A selected check box indicates that alerts of that severity have been raised and are being displayed. A clear check box indicates that alerts of that severity are either hidden or have not been raised during the displayed time period. |
11 | Auto Refresh: Specifies how often the dashboard is refreshed automatically. |
12 | Refresh Icon: Refreshes the dashboard. |
13 | Page Settings: Sets the default page settings for the current session. You can customize the page display by Alert Type, Timeframe, Auto Refresh, Detail Display, and Alert Severity. You can also change the weighting used to calculate the impact score. |
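Taken together, the Timeframe, Alert Severity, and Top 20 Impacted Devices/KPIs controls amount to filter-then-rank logic. The following Python sketch shows that logic under stated assumptions: the Alert record, its field names, and the severity strings are hypothetical, and the tile area is modeled as each entry's share of the top-20 alert count, which is one plausible reading of "the more alerts, the larger the tile" rather than the product's documented layout.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert record; field names are illustrative only."""
    device: str
    kpi: str
    severity: str
    raised_at: datetime


def top_impacted(alerts, now, timeframe, severities, key="device", n=20):
    """Rank devices (key="device") or KPIs (key="kpi") by alert count.

    Only alerts raised within `timeframe` before `now` and whose severity is
    in `severities` are counted. Returns (name, count, area_share) tuples,
    where area_share is the fraction of the tile map a proportional tile
    would occupy (an assumption, not documented product behavior).
    """
    start = now - timeframe
    counts = Counter(
        getattr(a, key)
        for a in alerts
        if start <= a.raised_at <= now and a.severity in severities
    )
    top = counts.most_common(n)
    total = sum(count for _, count in top)
    return [(name, count, count / total) for name, count in top]


now = datetime(2024, 1, 1, 12, 0)
alerts = [
    Alert("R1", "cpu", "critical", now - timedelta(minutes=10)),
    Alert("R1", "memory", "major", now - timedelta(minutes=20)),
    Alert("R2", "cpu", "critical", now - timedelta(minutes=30)),
    Alert("R3", "cpu", "minor", now - timedelta(minutes=40)),  # severity hidden
    Alert("R4", "cpu", "critical", now - timedelta(hours=3)),  # outside the past hour
]
# Past hour, with only critical and major alerts displayed.
for name, count, share in top_impacted(alerts, now, timedelta(hours=1), {"critical", "major"}):
    print(f"{name}: {count} alert(s), {share:.0%} of the map")
```

Passing `key="kpi"` produces the Top 20 Impacted KPIs view from the same alerts, which mirrors how the Device/KPI Alert Selector switches between the two perspectives.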
Note
The dashboard shows the individual alerts for each KPI. Alerts that result from alert group logic are not shown in the dashboard; those impacted results are available only through the API.
View Alerts for Network Devices
After enabling KPIs on a device, you can view alerts for that device and get data for each performance indicator being monitored.
Note
The KPIs shown in the following steps are examples. There are many more KPIs available in Health Insights. For the complete list, see List of Health Insights KPIs.
Procedure
Step 1
From the main menu, open the Health Insights Alert dashboard.
Step 2
Make sure that the Device Alerts view is displayed (select the Device Alerts toggle, if needed). Then scroll down below the Alerts History panel and click the All Impacted Devices tab. The dashboard displays a list of devices with alerts.
Step 3
Click the Device Name of the device whose details you want to view. Health Insights displays the device's basic Overview information, Alert History, a Topology map, and the list of the device's currently Enabled KPIs. The Topology map is a version of the map that you see when you select Topology from the main menu.
Step 4
Under Enabled KPIs, click the KPI whose details you want to view. A time-series graph of that KPI's data, covering a 72-hour window in hourly slots, is displayed on the right along with a list of alert messages and other information.
Step 5
Click and drag the brush on the timeline to select a time slot of up to 6 hours and view the corresponding Raw or Summary graph data. Hover the mouse cursor over any data point in the graph to view additional pop-up information for that point. A red line or tag marks a point at which the KPI was triggered; this can occur on any subscribed statistic that the KPI monitors. Health Insights collects and identifies the time points and frequency of these triggers, which helps you determine when these events become an operational concern.
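Steps 4 and 5 describe a 72-hour graph window in which the brush can select at most 6 hours. The following Python sketch models that constraint, assuming (hypothetically) that a selection extending outside the window is clipped to it and that a selection longer than 6 hours is truncated at its end. The function and constant names are illustrative, and the dashboard's actual brush behavior may differ.

```python
from datetime import datetime, timedelta

GRAPH_WINDOW = timedelta(hours=72)  # time-series window described in Step 4
MAX_BRUSH = timedelta(hours=6)      # largest brush selection described in Step 5


def clamp_brush(start, end, now):
    """Return a (start, end) selection that fits the brush rules.

    Hypothetical behavior: the selection is clipped to the 72-hour window
    ending at `now`, and a selection longer than 6 hours is truncated at
    its end. Raises ValueError if the clipped selection is empty.
    """
    if end < start:  # allow dragging right to left
        start, end = end, start
    start = max(start, now - GRAPH_WINDOW)
    end = min(end, now)
    if end <= start:
        raise ValueError("selection lies outside the 72-hour window")
    return start, min(end, start + MAX_BRUSH)


now = datetime(2024, 1, 4, 0, 0)
# A 9-hour drag is cut down to the 6-hour maximum.
print(clamp_brush(now - timedelta(hours=10), now - timedelta(hours=1), now))
```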
Telemetry Data Retention
Telemetry data is collected from devices and stored in a time-series database, where it is retained for 72 hours. The Health Insights Alert dashboard uses this data to identify alerts through a process known as stream-based alerting. Any resulting alerts are stored in the same time-series database and retained for 30 days. Messages showing the duration of alerts are displayed in the top-right corner of the Device/KPI view in the Alert dashboard. For more information, see View Alerts for Network Devices. You can also query alerts using REST APIs.
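Because the two retention periods differ, an alert can outlive the telemetry that produced it. The sketch below models the two windows in Python; the helper and constant names are hypothetical, and in the product retention is enforced by the time-series database, not by code like this.

```python
from datetime import datetime, timedelta

TELEMETRY_RETENTION = timedelta(hours=72)  # raw telemetry in the time-series database
ALERT_RETENTION = timedelta(days=30)       # alerts derived from that telemetry


def retained(timestamps, now, retention):
    """Return the timestamps that are still inside the retention window."""
    cutoff = now - retention
    return [t for t in timestamps if t >= cutoff]


now = datetime(2024, 3, 31, 12, 0)
samples = [now - timedelta(hours=h) for h in (1, 48, 73)]
alerts = [now - timedelta(days=d) for d in (4, 29, 31)]
print(len(retained(samples, now, TELEMETRY_RETENTION)))  # prints 2 (the 73-hour-old sample is gone)
print(len(retained(alerts, now, ALERT_RETENTION)))       # prints 2 (the 31-day-old alert is gone)
```

The 4-day-old alert is still retained, but the telemetry samples from that time (96 hours old) have already expired, so for older alerts you can review the alert itself but not the underlying raw data.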