Introduction
This document describes several routing and platform-related issues on IOS & IOS-XE routers, along with the steps to collect the relevant data, debugs, and show command outputs. Providing this information up front in a Technical Assistance Center (TAC) Service Request (SR) helps speed up issue resolution.
Prerequisites
Requirements
Cisco recommends that you have knowledge of these topics:
- Basic understanding of classic Routing Features on IOS & IOS-XE routers
- Command Line Interface (CLI) access to the IOS & IOS-XE routers is also required
Components Used
The information in this document is based on these Platforms:
- ASR1000
- ISR4000
- ISR1000
- CSR1000v
- Classic IOS Platforms (ISR G1/G2)
Base Information Requested
- When did the problem start?
- Note the day and time the problem was first observed/noticed.
- What triggered the problem?
- Document any recent changes made prior to when the problem started.
- Note any specific actions or events that occurred that may have triggered the problem to start.
- What is the frequency of the problem?
- Was this a one-time occurrence?
- If not, how often does the problem happen?
- Does this correspond to any other network events or actions?
- How many users are affected, and what is the business impact?
- Has any troubleshooting already been performed? If so, explain the steps.
- What is the topology of the involved devices?
Router and IOS-XE Architecture
Below are some of the common issues reported on IOS & IOS-XE routers, along with the useful outputs to collect for each in addition to a "Show Tech".
Collecting these outputs ensures that the relevant data is captured while the issue is occurring. This matters most when the problem is intermittent and may have disappeared by the time TAC is engaged.
IOS-XE Unexpected Reload
Problem Report: The device reloads or crashes unexpectedly. In addition to the details from the Base Information Requested section, collect the following:
- Collect the "show tech"
- Check the Bootflash/Harddisk & collect any crash or core files if generated:
Router#show bootflash: | in crash
Router#show bootflash: | in core
- Archive the tracelogs to bootflash and collect the archive. Trace files store tracing data, and archiving them can take a couple of minutes.
Router#request platform software trace rotate all
Router#request platform software trace slot rp active archive target bootflash:
- External Syslog data during the time of the issue.
You can also collect the system-report bundle that is automatically created in bootflash after a crash (16.11.x and later releases). The bundle is a tar file that contains considerably more information, including:
- Tracelogs
- Maroon Stats
- Core/Crash files
- RP/cyan logs
Note: With the serviceability enhancements in 16.11 and later releases, the device automatically collects the "system-report" bundle once a crash occurs.
Router#sh bootflash: | in sys
12 45 Oct 20 2020 05:08:05.0000000000 +00:00 /bootflash/core/system-report_20201020-050805-UTC.tar.gz <<<<
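The file name of the system-report bundle carries the time at which it was generated, which can help correlate the crash with external syslog data. The following is a minimal Python sketch that extracts this timestamp from a "show bootflash:" line. The file name pattern is inferred only from the single example above and may differ on other releases; the function name report_timestamp is hypothetical.

```python
import re
from datetime import datetime, timezone

# Assumption: the pattern is inferred from the one file name shown above
# (system-report_YYYYMMDD-HHMMSS-UTC.tar.gz); other releases may differ.
REPORT_RE = re.compile(r"system-report_(\d{8})-(\d{6})-UTC\.tar\.gz")


def report_timestamp(line):
    """Return the UTC datetime embedded in a system-report file name, or None."""
    match = REPORT_RE.search(line)
    if match is None:
        return None
    stamp = datetime.strptime(match.group(1) + match.group(2), "%Y%m%d%H%M%S")
    return stamp.replace(tzinfo=timezone.utc)


line = ("12 45 Oct 20 2020 05:08:05.0000000000 +00:00 "
        "/bootflash/core/system-report_20201020-050805-UTC.tar.gz")
print(report_timestamp(line))
```

Lines that do not contain a system-report file name return None, so the helper can be run over the full directory listing.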
IOS-XE Boot Failure
Problem Report: A boot failure is observed for one of the components in the device. At times, the RP (Route Processor) gets stuck in a boot loop, which prevents logging in to the device.
- Collect the "show tech" if you are able to log in to the device
- Connect to Console and collect the "Console logs".
- Provide the LED status info.
Note: This is of utmost importance if you are not able to log in to the device
- If you have site access, reseat the module (for example, RP, ESP or SIP/SPA) and collect the Console logs.
- If the boot failure is seen on a module other than the RP, log in to the device, try a soft OIR with the commands below, and collect the "show logging" output
Router#hw-module slot <slot-num> reload
Router#hw-module subslot <slot-num/subslot num> reload
IOS-XE Software Version Selection Assistance
Problem Report: The IOS software on the device needs to be upgraded and you need assistance with selecting a release.
- In this situation, log in to the CCO page and check the Suggested (Star Marked) release for the platform.
- Alternatively, for a release suggestion for IOS-XE Routers, you can use the below link:
- For assistance with an IOS upgrade from a 3.x to a 16.x release, refer to the below link:
IOS-XE Memory Leaks
Problem Report: Memory-related issues are seen on the device. At times, errors related to high memory utilization on specific components are reported. This section lists the most useful commands for troubleshooting memory leaks or monitoring memory on IOS-XE routers.
The general aspects of IOS-XE memory usage have been described in:
With recent serviceability enhancements, the "Show Tech Memory" can be collected on 16.9.x and later releases.
- Collect the "Show tech" if you are able to log in to the device.
- show platform software status control-processor brief
- show platform software process list rp active sort memory
- show platform software process memory rp active all sort
- show platform software process slot rp active monitor cycles 2
- show platform software process list fp active summary
- show platform software process slot fp active monitor cycles 2
- show platform hardware qfp active infrastructure exmem statistics
- show platform hardware qfp active infrastructure exmem statistics user
- show platform hardware qfp active tcam resource-manager usage
- show platform hardware qfp active classification feature tcam-usage
- show platform hardware qfp active classification class-group-manager class-group all
With the serviceability enhancements introduced in 16.2 and later releases, these additional CLI commands can be collected:
- show platform resources
- show memory platform
- show process memory platform sorted
- show process cpu platform monitor cycles 2
- show process memory platform sorted location fp active
Note: 'cycles 2' is used because the first set of values is not accurate
IOS-XE ISSU Upgrade
Problem Report: ISSU (In-Service Software Upgrade) represents a full or partial software upgrade of a system from one version to another with minimal outage on the forwarding plane (minimal packet loss) and no outage on the control plane. This section focuses on helping you with an ISSU upgrade:
Licensing on IOS-XE devices
Problem Report: There is an issue with Licensing on the device. The most common issues include "License not getting installed" and the license file not being seen as Permanent. This section focuses on the bare minimum outputs required to troubleshoot license issues:
- show tech-support Licenses
Note: This command was introduced via serviceability in later releases (ASR1K: 16.9.x, ISR4K: 16.12.x).
- show license all
- show license version
- show license summary
- show license status
- show license usage
- show license udi
Routing Protocol issues
Below are some of the common issues reported on IOS & IOS-XE routers, along with the useful outputs to collect for each in addition to a "Show Tech".
Collecting these outputs ensures that the relevant data is captured while the issue is occurring. This matters most when the problem is intermittent and may have disappeared by the time TAC is engaged.
BGP/EIGRP/OSPF/Static-Routing
Problem Report: Routing protocol troubleshooting depends mostly on the kind of issue being investigated. The focus should be on providing as much data as possible, following the "Base Information Requested" section. Along with that and the "Show tech", collect the protocol-specific outputs below:
BGP
- show tech-support bgp
EIGRP
- show ip eigrp events
- show ip eigrp interfaces
- show ip eigrp neighbors
- show ip eigrp topology
- show ip eigrp traffic
OSPF
- show tech-support ospf
Static Routing
- show tech-support
Note: "show tech bgp" was introduced in some of the later releases. If this command does not run on the IOS release you are running, provide as much information as possible from the "Base Information Requested" section.
You can also follow a flow-based BGP troubleshooting approach for some of the common scenarios by using the below:
EIGRP Neighbor Flap issues:
Problem Report: Neighbor flaps are one of the most common issues seen with EIGRP. An EEM script can be used to collect the outputs and debugs at the exact moment the issue is seen:
NAT/PAT on a Router (Network/Port Address Translation)
On the IOS-XE platform, NAT configurations are received and processed by the IOS NAT subsystem, and are downloaded to the QFP via the Forwarding Manager (FMAN) and Client components. NAT session creation and management, as well as any header and payload translations, are done exclusively on the QFP. Packets are not punted for NAT translation on IOS-XE routers. The QFP also generates state information that is sent back to IOS, such as ipalias, static-route, and wlan session information.
Problem Report: There is a NAT/PAT related issue seen on the device. For example, NAT is not triggering, translations are not seen, or traffic is not passing from inside to outside or vice versa. NAT/PAT issues on IOS-XE can be tricky at times because data packets are forwarded in hardware. This section lists the most useful commands for troubleshooting NAT issues on IOS-XE routers.
- show tech-support nat
Note: This output was introduced via serviceability in the 16.9.x release and is available in later releases.
Platform Independent Show Commands
- show ip nat statistics
- show ip nat translation
Filters can also be applied to "show ip nat translation", as below:
- show ip nat translation udp total
- show ip nat translation inside
- show ip aliases
Platform dependent Show Commands
- show platform hardware qfp active statistics drop | exc _0 <<< Check for any NAT related drops
- show platform hardware qfp active feature nat datapath map
- show platform hardware qfp active feature nat datapath port
- show platform hardware qfp active feature nat datapath pool
- show platform hardware qfp active feature nat datapath stat
- show platform hardware qfp active feature nat datapath base
- show platform hardware qfp active infrastructure exmem statistics user
- show platform hardware qfp active infrastructure exmem stat
- show platform hardware qfp active feature nat datapath gatein
- show platform hardware qfp active feature nat datapath gateout
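The "| exc _0" filter on the drop statistics command above hides lines whose counters are zero, because in IOS regular expressions the underscore matches a delimiter such as a space or the start of the line. The following Python sketch emulates that filter on captured output, for example when reviewing a saved log offline. The sample lines and drop names are hypothetical placeholders, not real router output, and the Python pattern is an approximation of the IOS behaviour.

```python
import re

# Approximation of the IOS pattern "_0": a 0 that follows a delimiter
# (start of line, whitespace, comma, brace or parenthesis).
IOS_UNDERSCORE_ZERO = re.compile(r"(?:^|[\s,{}()])0")


def exclude_zero_counters(lines):
    """Emulate '| exc _0': keep only lines without a delimiter-prefixed 0."""
    return [ln for ln in lines if not IOS_UNDERSCORE_ZERO.search(ln)]


# Hypothetical sample data; the names and values are made up for illustration.
sample = [
    "ExampleDropA        0        0",
    "ExampleDropB       12     1536",
    "ExampleDropC      105    10240",
]

for kept in exclude_zero_counters(sample):
    print(kept)
```

In this sketch only the first sample line is removed: the zeros inside 105 and 10240 are not preceded by a delimiter, so those lines are kept.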
iWAN
iWAN is a complex solution, and troubleshooting it can be even more complex. It involves a number of components, such as DMVPN, IPsec, Transport (MPLS/INET), PfR, and EIGRP SAF, and an iWAN issue can be related to any one or several of them. In an iWAN network, every device plays the role of either "Master Controller" (MC) or "Border Router" (BR), and troubleshooting iWAN issues requires details from both of these routers.
Problem Report: If you face any of the below issues with iWAN, capture the listed commands from the respective devices.
- Site-Prefix/service-routes are not being distributed correctly
- EIGRP SAF peering is not established
- Traffic channels are not being created
- Frequent TCA logs are seen
- Traffic is not flowing from a particular transport
Here is the list of commands that need to be captured.
MC - Master Controller
- show tech-support
- show domain IWAN master discovered-sites
- show domain IWAN master site-capability
- show domain IWAN master status
- show domain IWAN master site-prefix
- show domain IWAN master policy
- show domain IWAN master peering
- show domain IWAN master traffic-classes summary
- show platform hardware qfp active feature pfrv3 datapath global
- show eigrp service-family ipv4 neighbors
- show eigrp service-family ipv4 subscriptions detail
- show eigrp service-family ipv4 topology
- show eigrp service-family ipv4 traffic
- show derived-config | sec router eigrp
BR - Border Router
- show tech-support
- show domain IWAN border site-capability
- show domain IWAN master site-capability
- show domain IWAN border parent-route
- show domain IWAN border channels dscp default
- show domain IWAN border channels
- show domain IWAN border traffic-classes summary
- show domain IWAN master status
- show domain IWAN master policy
- show domain IWAN border peering
- show domain IWAN border status
- show domain IWAN border pmi
- show performance monitor cache monitor
- show platform hardware qfp active feature pfrv3 datapath global
- show eigrp service-family ipv4 neighbors
- show eigrp service-family ipv4 subscriptions detail
- show eigrp service-family ipv4 topology
- show eigrp service-family ipv4 traffic
- show derived-config | sec router eigrp
Miscellaneous Error Logs
Below are some of the common logs reported on IOS & IOS-XE routers, along with the useful outputs to collect for each in addition to a "Show Tech".
Collecting these outputs ensures that the relevant data is captured while the issue is occurring. This matters most when the problem is intermittent and may have disappeared by the time TAC is engaged.
Error related to %FMFP-3-OBJ_DWNLD_TO_DP_STUCK
- Take the object ID from the log message:
Example :
%FMFP-3-OBJ_DWNLD_TO_DP_STUCK: R0/0: fman_fp_image: AOM download of obj[20] type[215] pending-issue Req-create Issued-noneSSLMGR: Secondary Init to Data Plane is stuck for more than 1800 seconds
- The object ID in this example is obj[20], so the value to use is 20
- Capture these commands, replacing the '<object_id>' field with the number taken from the log message:
- show platform software object-manager f0 object <object_id>
- show platform software object-manager f0 object <object_id> parents
- show platform software object-manager f0 object <object_id> children
- show platform software object-manager f0 object <object_id> downlinks
- Capture these commands which don't require an object ID:
- show platform software object-manager f0 statistics
- show platform software object-manager f0 pending-issue-update
- show platform software object-manager f0 pending-ack-update
- show platform software object-manager f0 object-type-count
- show platform software object-manager f0 error-object
- show platform software object-manager f0 resolve-object
- show platform software object-manager f0 stale-object
- show platform software object-manager f0 paused-object-type
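The object ID extraction described above can be sketched in Python as follows. This is a minimal illustration that assumes the log message contains an obj[<id>] field as in the example, and that the forwarding processor slot is f0 as in the commands listed; the function name object_commands is hypothetical.

```python
import re

# Matches the obj[<id>] field seen in the example log message.
OBJ_RE = re.compile(r"obj\[(\d+)\]")

# Suffixes of the four per-object commands listed in this section.
PER_OBJECT_SUFFIXES = ["", " parents", " children", " downlinks"]


def object_commands(log_message, slot="f0"):
    """Build the per-object show commands for the object ID in a log message."""
    match = OBJ_RE.search(log_message)
    if match is None:
        raise ValueError("no obj[<id>] field found in log message")
    object_id = match.group(1)
    base = f"show platform software object-manager {slot} object {object_id}"
    return [base + suffix for suffix in PER_OBJECT_SUFFIXES]


log = ("%FMFP-3-OBJ_DWNLD_TO_DP_STUCK: R0/0: fman_fp_image: AOM download of "
       "obj[20] type[215] pending-issue Req-create")
for command in object_commands(log):
    print(command)
```

The generated lines can be pasted into the router CLI; the commands that do not require an object ID are collected unchanged.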