Troubleshooting Mesh Access Points

Installation and Connections

Procedure


Step 1

Connect the mesh access point that you want to be the RAP to the controller.

Step 2

Deploy the radios (MAP) at the desired locations.

Step 3

On the controller CLI, enter the show mesh ap summary command to see all MAPs and RAPs on the controller.

Figure 1. Show Mesh AP Summary Page
Step 4

On the controller GUI, click Wireless to see the mesh access point (RAP and MAP) summary.

Figure 2. All APs Summary Page
Step 5

Click AP Name to see the details page and then select the Interfaces tab to see the active radio interfaces.

The radio slot in use, radio type, subband in use, and operational status (UP or DOWN) are summarized.

  • All APs supports 2 radio slots: slot 0—2.4 GHz and slot 1—5 GHz.

    If you have more than one controller connected to the same mesh network, then you must specify the name of the primary controller using global configuration for every mesh access point or specify the primary controller on every node, otherwise the least loaded controller is the preferred controller. If the mesh access points were previously connected to a controller, they already have learned a controller’s name.

    After configuring the controller name, the mesh access point reboots.

Step 6

Click Wireless > AP Name to check the mesh access point’s primary controller on the AP details page.


Debug Commands

The following two commands are very helpful to see the messages being exchanged between mesh access points and the controller.


(Cisco Controller) > debug capwap events enable 
(Cisco Controller) > debug disable-all 
  

You can use the debug command to see the flow of packet exchanges that occur between the mesh access point and the controller. The mesh access point initiates the discovery process. An exchange of credentials takes place during the join phase to authenticate that the mesh access point is allowed to join the mesh network.

Upon a successful join completion, the mesh access point sends a CAPWAP configuration request. The controller responds with a configuration response. When a Configure Response is received from the controller, the mesh access point evaluates each configuration element and then implements them.

Remote Debug Commands

You can log on to the mesh access point console for debugging either through a direct connection to the AP console port or through the remote debug feature on the controller.

To invoke remote debug on the controller, enter the following commands:


(Cisco Controller) > debug ap enable ap-name 
(Cisco Controller) > debug ap command command ap-name 

AP Console Access

AP1500s have a console port. A console cable is not shipped with the mesh access point. For the 1550 series access points, console ports are easily accessible and you need not open the access point box.

The AP1500s have console access security embedded in the code to prevent unauthorized access on the console port and provide enhanced security.

The login ID and password for console access are configured from the controller. You can use the following commands to push the username/password combination to the specified mesh access point or all access points:

<Cisco Controller> config ap username cisco password cisco ? 

all											Configures the Username/Password for all connected APs.
<Cisco AP>				Enter the name of the Cisco AP.

<Cisco Controller> config ap username cisco password cisco all 

You must verify whether the username/password pushed from the controller is used as user-id and password on the mesh access point. It is a nonvolatile setting. Once set, a login ID and password are saved in the private configuration of the mesh access point.

Once you have a successful login, the trap is sent to the Cisco Prime Infrastructure. If a user fails to log on three times consecutively, login failure traps are sent to the controller and Cisco Prime Infrastructure.


Caution

A mesh access point must be reset to the factory default settings before moving from one location to another.


Cable Modem Serial Port Access from an AP

Commands can be sent to the cable modem from the privileged mode of the CLI. Use the command to take a text string and send it to the cable modem UART interface. The cable modem interprets the text string as one of its own commands. The cable modem response is captured and displayed on the Cisco IOS console. Up to 9600 characters are displayed from the cable modem. Any text that is greater than 4800 characters is truncated.

The modem commands are only operational on mesh APs that have devices connected to the UART port originally intended for the cable modem. If the commands are used on a mesh AP that does not have a cable modem (or any other device connected to the UART), the commands are accepted, however, but they do not produce any returned output. No errors are explicitly flagged.

Configuration

Enter the following command from the privileged mode of the MAP:


AP#send cmodem timeout-value modem-command 
  

The modem command is any command or text to send to the cable modem. The range of timeout value is 1 to 300 seconds. However, if the captured data equals 9600 characters, any text beyond that is truncated and the response, irrespective of the timeout value and is immediately displayed on the AP console.

Figure 3. Cable Modem Console Access Command
Figure 4. Cable Modem Console Access Command

Caution

The question mark (?) and the exclamation point (!) should not be used in the send cmodem command. These characters have immediate interpreted use in the Cisco IOS CLI. Therefore, they cannot be sent to the modem.


Enabling the Cable Modem Console Port
By default, the Cable Modem console port is disabled. This is to prevent users from accessing the console through their residential cable modem. In the AP1572IC, AP1572EC, and AP1552C model, the cable modem console is connected directly to the access point. The console port is required for signaling between the AP and the cable modem. There are two methods to enable the cable modem console port, either through SNMP or by adding the command to the configuration .cm file on the CMTS.

Note

For the AP1572EC, AP1572IC, AP1552C, and AP1552CU, the cable modem must be enabled.


  • Enable the cable modem console port through SNMP by entering this command to the IP address of the cable modem:
    snmpset –c private IP_ADDRESS cmConsoleMode.0 i N
    
    
    Using the OID, enter this command:
    snmpset –c private IP_ADDRESS
    1.3.6.1.4.1.1429.77.1.4.7.0 i N
    
    
    Where IP_ADDRESS is any IPv4 address and N is an integer, 2 to enable read-write, 1 for read-only, or 0 to disable.
    Example:
    snmpset -c private 209.165.200.224 cmConsoleMode.0 i 2
    
    
  • Enable the cable modem console port through the configuration file. The configuration file (with a .cm extension) is loaded into the cable modem head end. It is pushed to the cable modem as part of the join process. Enter the following line to the cable modem configuration file:
    SA-CM-MIB::cmConsoleMode.0 = INTEGER: readWrite(2)
    
    
    Using the OID, enter this line:
    SA-CM-MIB::cmConsoleMode.0 = INTEGER: readWrite(2)
    
    
Resetting the AP1572xC/AP1552C Through the Cable Modem

An AP can be reset by entering an SNMP command to the Cable Modem, which resides inside the access point. For this feature to work, you must enable the cable modem console port.

Reset the AP by entering this snmpset command:
Snmpset -v2c -c public IP ADDRESS 1.3.6.1.4.1.1429.77.1.3.17.0 i 1

Where the IP ADDRESS is the IPv4 address of the cable modem.

Mesh Access Point CLI Commands

You can enter these commands directly on the mesh access point using the AP console port or you can use the remote debug feature from the controller:









Mesh Access Point Debug Commands

You can enter these commands directly on the mesh access point using the AP console port or you can use the remote debug feature from the controller.

  • debug mesh ethernet bridging —Debugs Ethernet bridging.

  • debug mesh ethernet config —Debugs access and trunk port configuration associated with VLAN tagging.

  • debug mesh ethernet registration —Debugs the VLAN registration protocol. This command is associated with VLAN tagging.

  • debug mesh forwarding table —Debugs the forwarding table containing bridge groups.

  • debugs mesh forwarding packet bridge-group —Debugs the bridge group configuration.

Defining Mesh Access Point Roles

By default, the AP1500s are shipped with a radio role set to MAP. Therefore, you must change the radio role on a mesh access point for it to function as RAP.

You can change this configuration on the mesh access point by statically setting them as rooftop access points or mesh access points with the config ap role {rootAP | mesh AP | default} command:

To change the radio role can also be changed using the GUI, follow these steps:

Procedure


Step 1

Choose Wireless > Access Points > All APs to open the All APs page.

Step 2

Click the name of the mesh access point that you want to change. Click the Mesh tab.

Step 3

From the AP Role drop-down list, choose MeshAP or RootAP to specify this mesh access point as a MAP or RAP, respectively.

Step 4

Click Apply to commit your changes. The mesh access point reboots.

Step 5

Click Save Configuration to save your changes.

Note 

We recommend a Fast Ethernet connection between the MAP and controller when changing from a MAP to RAP. After a RAP-to-MAP conversion, the MAP’s connection to the controller is a wireless backhaul rather than a Fast Ethernet connection. It is the responsibility of the user to ensure that the Fast Ethernet connection of the RAP being converted is disconnected before the MAP starts up so that the MAP can join over the air.


Backhaul Algorithm

A backhaul is used to create only the wireless connection between mesh access points.

The backhaul interface by default is 802.11a. You cannot change the backhaul interface to 802.11b/g.

The "auto" data rate is selected by default for AP1500s.

The backhaul algorithm has been designed to fight against stranded mesh access point conditions. This algorithm also adds a high-level of resiliency for each mesh node.

The algorithm can be summarized as follows:

  • A MAP always sets the Ethernet port as the primary backhaul if it is UP; otherwise, it is the 802.11a radio (this feature gives the network administrator the ability to configure it as a RAP the first time and recover it in-house). For fast convergence of the network, we recommend that you do not connect any Ethernet device to the MAP for its initial joining to the mesh network.

  • A MAP failing to connect to a WLAN controller on an Ethernet port that is UP, sets the 802.11a radio as the primary backhaul. Failing to find a neighbor or failing to connect to a WLAN controller via any neighbor on the 802.11a radio causes the primary backhaul to be UP on the Ethernet port again. A MAP gives preference to the parent which has the same BGN.

  • A MAP connected to a controller over an Ethernet port does not build a mesh topology (unlike a RAP).

  • A RAP always sets the Ethernet port as the primary backhaul.


    Note

    Cisco Wave 2 APs operating as RAPs can fall back on Ethernet sooner than 15 minutes if the RAPs cannot find any valid uplink on the radio in 5 minutes' time. In such a case, the RAPs clear the blocked-listing on the wired port and try to fall back on the wired port.


  • If the Ethernet port on a RAP is DOWN, or a RAP fails to connect to a controller on an Ethernet port that is UP, the 802.11a radio is set as the primary backhaul. Failing to find a neighbor or failing to connect to a controller via any neighbor on the 802.11a radio makes the RAP go to the SCAN state after 15 minutes and starts with the Ethernet port first.

Keeping the roles of mesh nodes distinct using the above algorithm greatly helps to avoid a mesh access point from being in an unknown state and becoming stranded in a live network.

Passive Beaconing (Anti-Stranding)

When enabled, passive beaconing allows a stranded mesh access point to broadcast its debug messages over-the-air using a 802.11b/g radio. A neighboring mesh access point that is listening to the stranded mesh access point and has a connection to a controller, can pass those messages to the controller over CAPWAP. Passive beaconing prevents a mesh access point that has no wired connection from being stranded.

Debug logs can also be sent as distress beacons on a nonbackhaul radio so that a neighboring mesh access point can be dedicated to listen for the beacons.

The following steps are automatically initiated at the controller when a mesh access point loses its connection to the controller:

  • Identifies the MAC address of a stranded mesh access point

  • Finds a nearby neighbor that is CAPWAP connected

  • Sends commands through remote debug

  • Cycles channels to follow the mesh access point

You only have to know the MAC address of the stranded AP to make use of this feature.

A mesh access point is considered stranded if it goes through a lonely timer reboot. When the lonely timer reboot is triggered, the mesh access point, which is now stranded, enables passive beaconing, the anti-stranding feature.

This feature can be divided into three parts:

  • Strand detection by stranded mesh access point

  • Beacons sent out by stranded mesh access point

    • Latch the 802.11b radio to a channel (1,6,11)

    • Enable debugs

    • Broadcast the standard debug messages as distress beacons

    • Send Latest Crash info file

  • Receive beacons (neighboring mesh access point with remote debugging enabled)

Deployed mesh access points constantly look for stranded mesh access points. Periodically, mesh access points send a list of stranded mesh access points and SNR information to the controller. The controller maintains a list of the stranded mesh access points within its network.

When the debug mesh astools troubleshoot mac-addr start command is entered, the controller runs through the list to find the MAC address of the stranded mesh access point.

A message is sent to the best neighbor to start listening to the stranded access point. The listening mesh access point gets the distress beacons from the stranded mesh access point and sends it to the controller.

Once a mesh access point takes the role of a listener, it does not purge the stranded mesh access point from its internal list until it stops listening to the stranded mesh access point. While a stranded mesh access point is being debugged, if a neighbor of that mesh access point reports a better SNR to the controller than the current listener by some percentage, then the listener of the stranded mesh access point is changed to the new listener (with better SNR) immediately.

End-user commands are as follows:

  • config mesh astools [enable | disable] —Enables or disables the astools on the mesh access points. If disabled, APs no longer sends a stranded AP list to the controller.

  • show mesh astools stats —Shows the list of stranded APs and their listeners if they have any.

  • debug mesh astools troubleshoot mac-addr start —Sends a message to the best neighbor of the mac-addr to start listening.

  • debug mesh astools troubleshoot mac-addr stop —Sends a message to the best neighbor of the mac-addr to stop listening.

  • clear mesh stranded [all | mac of b/g radio] —Clears stranded AP entries.

The controller console is swamped with debug messages from stranded APs for 30 minutes.

Dynamic Frequency Selection

This section describes the Dynamic Frequency Selection (DFS) functionality in RAP and MAP.

DFS in RAP

The RAP performs the following steps as a response to radar detection:

  1. The RAP sends a message to the controller that the channel is infected with radar. The channel is marked as infected on the RAP and on the controller.

  2. The RAP blocks the channel for 30 minutes. This 30-minute period is called the nonoccupancy period.

  3. The controller sends a TRAP, which indicates that the radar has been detected on the channel. A TRAP remains until the nonoccupancy period expires.

  4. The RAP has 10 seconds to move away from the channel. This period is called the channel move time, which is defined as the time for the system to clear the channel and is measured from the end of the radar burst to the end of the final transmission on the channel.

  5. The RAP enters the quiet mode. In the quiet mode, the RAP stops data transmissions. Beacons are still generated and probe responses are still delivered. The quiet mode exists until the channel move time is over (10 seconds).

  6. The controller picks up a new random channel and sends the channel information to the RAP.

  7. The RAP receives the new channel information and sends channel change frames (unicast, encrypted) to the MAP, and each MAP sends the same information to its lower children down the sector. Each mesh access point sends the channel change frames once every 100 msecs for a total of five times.

  8. The RAP tunes to the new channel and enters into the silent mode. During the silent mode, only the receiver is ON. The RAP keeps scanning the new channel for any radar presence for 60 seconds. This process is called channel availability check (CAC).

  9. The MAP tunes to the new channel and enters into the silent mode. During the silent mode, only the receiver is ON. The MAP keeps scanning the new channel for any radar presence for 60 seconds.

  10. If radar is not detected, the RAP resumes full functionality on this new channel and the whole sector tunes to this new channel.

DFS in MAP

The MAP performs the following steps as a response to radar detection:

  1. The MAP sends a radar seen indication to the parent and ultimately to the RAP indicating that the channel is infected. The RAP sends this message to the controller. The message appears to be coming from the RAP. The MAP, RAP, and controller mark the channel as infected for 30 minutes.

  2. The MAP blocks the channel for 30 minutes. This 30-minute period is called the nonoccupancy period.

  3. The controller sends a TRAP, which indicates that the radar has been detected on the channel. The TRAP remains until the nonoccupancy period expires.

  4. The MAP has 10 seconds to move away from the channel. This is called the channel move time, which is defined as the time for the system to clear the channel and is measured from the end of the radar burst to the end of the final transmission on the channel.

  5. The MAP enters the quiet mode. In the quiet mode, the MAP stops data transmissions. Beacons are still generated and probe responses are still delivered. The quiet mode exists until the channel move time is over (10 seconds).

  6. The controller picks up a new random channel and sends the channel to the RAP.

  7. The RAP receives the new channel information and sends channel change frames (unicast, encrypted) to a MAP, and each MAP sends the same information to its lower children down the sector. Each mesh access point sends the channel change frames once every 100 msecs for a total of five times.

  8. Each mesh access point tunes to the new channel and enters into the silent mode. During the silent mode, only the receiver is ON. There is no packet transmission. An AP keeps scanning the new channel for any radar presence for 60 seconds. This process is called the channel availability check (CAC). The MAP should not disconnect from the controller. The network should remain stable during this one-minute period.

DFS functionality allows a MAP that detects a radar signal to transmit that up to the RAP, which then acts as if it has experienced radar and moves the sector. This process is called the coordinated channel change. This functionally can be turned on or off on the controller. The coordinated channel change is enabled by default.

To enable DFS, enter the following command:


(Cisco Controller) > config mesh full-sector-dfs enable 
  

To verify that DFS is enabled on the network, enter the following command:


(Cisco Controller) > show network summary 

Note

A MAP that detects radar should send a message to the RAP, unless the parent has a different BGN, in which case it does not send messages for a coordinated sector change. Instead, the MAP reenters the SCAN state and searches on nonradar seen channels for a new parent.



Note

Ensure that none of your mesh access points are using a default BGN.



Note

A repeated radar event on the MAP (radar triggers once, and then almost immediately again), causes the MAP to disconnect.


Preparation in a DFS Environment

This section describes how to prepare in a DFS environment:

  • To verify that your controller is set to the correct country domain, enter the following command:

    
    (Cisco Controller) > show country 
      
    
  • To check the mesh access point country and the channel setting on the controller, enter the following command:

    
    (Cisco Controller)> show ap config 802.11a ap-name 
      
    
  • To identify channels available for mesh, enter the following command:

    
    (Cisco Controller)> show ap config 802.11a ap-name 
      
    

    Look for the allowed channel list.

    
    Allowed Channel List....................... 100,104,108,112,116,120,124, ......................................... 128,132,136,140
      
    
  • To identify channels available for mesh on the AP console (or use remote debug from the controller, enter the following command:

    
    ap1520-rap # show mesh channels 
      
    HW: Dot11Radio1, Channels:
    100, 104, 108, 112, 116, 120, 124, 128, 132, 136, 140
      
    

    An asterisk next to a channel indicates that radar has been seen on the channel.

  • To invoke remote debug, enter the following commands:

    
    (Cisco Controller) > debug ap enable ap-name 
    (Cisco Controller) > debug ap command command ap-name 
      
    
  • Debug commands to see radar detection and past radar detections on the DFS channel are as follows:

    
    show mesh dfs channel channel-number 
    show mesh dfs history 
      
    

    Information similar to the following appears.

    ap1520-rap # show mesh dfs channel 132

    
    Channel 132 is available
    Time elapsed since radar last detected: 0 day(s), 7 hour(s), 6 minute(s), 51 second(s).
      
    

    The RAP should be run through the channels to determine whether there is active radar on each of the channels.

    
    ap1520-rap # show mesh dfs channel 132 
      
    Radar detected on channel 132, channel becomes unusable (Time Elapsed: 0 day(s), 7 hour(s), 7 minute(s), 11 second(s)).
    Channel is set to 100 (Time Elapsed: 0 day(s), 7 hour(s), 7 minute(s), 11 second(s)).
    Radar detected on channel 116, channel becomes unusable (Time Elapsed: 0 day(s), 7 hour(s), 6 minute(s), 42 second(s)).
    Channel is set to 64 (Time Elapsed: 0 day(s), 7 hour(s), 6 minute(s), 42 second(s)).
    Channel 132 becomes usable (Time Elapsed: 0 day(s), 6 hour(s), 37 minute(s), 10 second(s)).
    Channel 116 becomes usable (Time Elapsed: 0 day(s), 6 hour(s), 36 minute(s), 42 second(s)).
    

Monitoring DFS

The DFS history should be run every morning or more frequently to detect the radar. This information does not get erased and is stored on the mesh access point flash. Therefore, you only need to match the times.


ap1520-rap # show controller dot11Radio 1 
  

Information similar to the following appears:


interface Dot11Radio1
Radio Hammer 5, Base Address 001c.0e6c.9c00, BBlock version 0.00, Software version 0.05.30
Serial number: FOC11174XCW
Number of supported simultaneous BSSID on Dot11Radio1: 16
Carrier Set: ETSI (OFDM) (EU) (-E)
Uniform Spreading Required: Yes
Current Frequency: 5540 MHz Channel 108 (DFS enabled)
Allowed Frequencies: *5500(100) *5520(104) *5540(108) *5560(112) *5580(116) *560
0(120) *5620(124) *5640(128) *5660(132) *5680(136) *5700(140)
* = May only be selected by Dynamic Frequency Selection (DFS)
Listen Frequencies: 5180(36) 5200(40) 5220(44) 5240(48) 5260(52) 5280(56) 5300(6
0) 5320(64) 5500(100) 5520(104) 5540(108) 5560(112) 5580(116) 5660(132) 5680(136
) 5700(140) 5745(149) 5765(153) 5785(157) 5805(161) 5825(165) 4950(20) 4955(21)
4960(22) 4965(23) 4970(24) 4975(25) 4980(26)
  

Note

An asterisk indicates that this channel has DFS enabled.


Frequency Planning

Use alternate adjacent channels in adjacent sectors. If you have two RAPs deployed at the same location, you must leave one channel in between.

Weather radars operate within the 5600- to 5650-MHz band, which means that channels 124 and 128 might be affected, but also channels 120 and 132 might suffer from weather radar activity.

If the mesh access point does detect radar, the controller and the mesh access point both will retain the channel as the configured channel. The controller retains it in volatile memory associated with the mesh access point, and the mesh access point has it stored in its flash as configuration. After the 30 minute quiet period, the controller returns the mesh access point to the static value, regardless of whether the mesh access point has been configured with a new channel or not. In order to overcome this, configure the mesh access point with a new channel, and reboot the mesh access point.

Once radar is reliably detected on a channel, that channel, and the two surrounding channels, should be added to the RRM exclusion list, as follows:


(Cisco Controller) > config advanced 802.11a channel delete channel 
  

A mesh access point goes to a new channel that is picked by RRM, and it does not consider excluded channels.

If a radar is detected on channel 124, for instance, channels 120, 124, and 128 should be added to the exclusion list. In addition, do not configure RAP to operate on those channels.

Good Signal-to-Noise Ratios

For European installations, the minimum recommendation is increased to 20 dB of signal-to-noise ratio (SNR). The extra dBs are used to mitigate the effects of radar interference with packet reception, which is not observed in non-DFS environments.

Access Point Placement

Collocated mesh access points should have a minimum of 10 feet (3.048 meters) of vertical separation or 100 (30.48 meters) feet of horizontal separation.

Bridge Group Name Misconfiguration

A mesh access point can be wrongly provisioned with a bridgegroupname and placed in a group other than it was intended. Depending on the network design, this mesh access point might or might not be able to reach out and find its correct sector or tree. If it cannot reach a compatible sector, the mesh access point can become stranded.

To recover a stranded mesh access point, the concept of default bridgegroupname has been introduced in the software. When a mesh access point is unable to connect to any other mesh access point with its configured bridgegroupname, it attempts to connect with the bridgegroupname of default.

The algorithm of detecting this strand condition and recovery is as follows:

  1. Passively scans and finds all neighbor nodes regardless of their bridgegroupname.

  2. The mesh access point attempts to connect to the neighbors heard with my own bridgegroupname using AWPP.

  3. If Step 2 fails, attempts to connect with default bridgegroupname using AWPP.

  4. For each failed attempt in Step 3, it adds the neighbor to an exclusion list and attempts to connect the next best neighbor.

  5. If the AP fails to connect with all neighbors in Step 4, it reboots the mesh access point.

  6. If connected with a default bridgegroupname for 15 minutes, the mesh access point goes into a scan state.

When an mesh access point is able to connect with the default bridgegroupname, the parent node reports the mesh access point as a default child/node/neighbor entry on the controller, so that a network administrator is Cisco Prime Infrastructure. Such a mesh access point behaves as a normal (nonmesh) access point and accepts any client, other mesh nodes as its children, and it passes any data traffic through.


Note

Do not confuse an unassigned BGN (null value) with DEFAULT, which is a mode that the access point uses to connect when it cannot find its own BGN.


To check the current state of a mesh access point’s BGN, enter the following command:


(Cisco Controller)> show mesh path Map3:5f:ff:60 
00:0B:85:5F:FA:60 state UPDATED NEIGH PARENT DEFAULT (106B), snrUp 48, snrDown 48, linkSnr 49
00:0B:85:5F:FB:10 state UPDATED NEIGH PARENT BEACON (86B) snrUp 72, snrDown 63, linkSrn 57
00:0B:85:5F:FA:60 is RAP
  

To check the current state of a mesh access point’s BGN, check the neighbor information for the mesh access point (GUI) as follows:

Choose Wireless > All APs > AP Name > Neighbor info .

Figure 5. Neighbor Information for a Child
Figure 6. Neighbor Information for a Parent

Misconfiguration of the Mesh Access Point IP Address

Although most Layer 3 networks are deployed using DHCP IP address management, some network administrators might prefer the manual IP address management and allocating IP addresses statically to each mesh node. Manual mesh access point IP address management can be a nightmare for large networks, but it might make sense in small to medium size networks (such as 10 to 100 mesh nodes) because the number of mesh nodes are relatively small compared to client hosts.

Statically configuring the IP address on a mesh node has the possibility of putting a MAP on a wrong network, such as a subnet or VLAN. This mistake could prevent a mesh access point from successfully resolving the IP gateway and failing to discover a WLAN controller. In such a scenario, the mesh access point falls back to its DHCP mechanism and automatically attempts to find a DHCP server and obtains an IP address from it. This fallback mechanism prevents a mesh node from being potentially stranded from a wrongly configured static IP address and allows it to obtain a correct address from a DHCP server on the network.

When you are manually allocating IP addresses, we recommend that you make IP addressing changes from the furthest mesh access point child first and then work your way back to the RAP. This recommendation also applies if you relocate equipment. For example, if you uninstall a mesh access point and redeploy it in another physical location of the mesh network that has a different addressed subnet.

Another option is to take a controller in Layer 2 mode with a RAP to the location with the misconfigured MAP. Set the bridge group name on the RAP to match the MAP that needs the configuration change. Add the MAP’s MAC address to the controller. When the misconfigured MAP comes up in the mesh access point summary detail, configure it with an IP address.

Misconfiguration of DHCP

Despite the DHCP fallback mechanism, there is still a possibility that a mesh access point can become stranded, if any of the following conditions exist:

  • There is no DHCP server on the network.

  • There is a DHCP server on the network, but it does not offer an IP address to the AP, or if it gives a wrong IP address to the AP (for example, on a wrong VLAN or subnet).

These conditions can strand a mesh access point that is configured with or without a wrong static IP address or with DHCP. Therefore, you must ensure that when a mesh access point is unable to connect after exhausting all DHCP discovery attempts or DHCP retry counts or IP gateway resolution retry counts, it attempts to find a controller in Layer 2 mode. In other words, a mesh access point attempts to discover a controller in Layer 3 mode first and in this mode, attempts with both static IP (if configured) or DHCP (if possible). The AP then attempts to discover a controller in Layer 2 mode. After finishing a number of Layer 3 and Layer 2 mode attempts, the mesh access point changes its parent node and re-attempts DHCP discovery. Additionally, the software exclusion-lists notes the parent node through which it was unable to obtain the correct IP address.

Identifying the Node Exclusion Algorithm

Depending on the mesh network design, a node might find another node “best” according to its routing metric (even recursively true), yet it is unable to provide the node with a connection to the correct controller or correct network. It is the typical honeypot access point scenario caused by either misplacement, provisioning, design of the network, or by the dynamic nature of an RF environment exhibiting conditions that optimize the AWPP routing metric for a particular link in a persistent or transient manner. Such conditions are generally difficult to recover from in most networks and could blackhole or sinkhole a node completely, taking it out from the network. Possible symptoms include, but are not limited to the following:

  • A node connects to the honeypot but cannot resolve the IP gateway when configured with the static IP address, or cannot obtain the correct IP address from the DHCP server, or cannot connect to a WLAN controller.

  • A node ping-pongs between a few honeypots or circles between many honeypots (in worst-case scenarios).

Cisco mesh software resolves this difficult scenario by using a sophisticated node exclusion-listing algorithm. This node exclusion-listing algorithm uses an exponential backoff and advance technique much like the TCP sliding window or 802.11 MAC.

The basic idea relies on the following five steps:

  1. Honeypot detection—The honeypots are first detected via the following steps:

    A parent node is set by the AWPP module by:

    • A static IP attempt in CAPWAP module.

    • A DHCP attempt in the DHCP module.

    • A CAPWAP attempt to find and connect to a controller fails.

  2. Honeypot conviction—When a honeypot is detected, it is placed in a exclusion-list database with its conviction period to remain on the list. The default is 32 minutes. Other nodes are then attempted as parents in the following order, falling back to the next, upon failing the current mechanism:

    • On the same channel.

    • Across different channels (first with its own bridgegroupname and then with default).

    • Another cycle, by clearing conviction of all current exclusion-list entries.

    • Rebooting the AP.

  3. Nonhoneypot credit—It is often possible that a node is not a really a honeypot, but appears to be due to some transient back-end condition, such as the following:

    • The DHCP server is either not up-and-running yet, has failed temporarily, or requires a reboot.

    • The WLAN controller is either not up-and-running yet, has failed temporarily, or requires a reboot.

    • The Ethernet cable on the RAP was accidentally disconnected.

      Such nonhoneypots must be credited properly from their serving times so that a node can come back to them as soon as possible.

  4. Honeypot expiration—Upon expiration, an exclusion-list node must be removed from the exclusion-list database and return to a normal state for future consideration by AWPP.

  5. Honeypot reporting—Honeypots are reported to the controller via an LWAPP mesh neighbor message to the controller, which shows these on the Bridging Information page. A message is also displayed the first-time an exclusion-listed neighbor is seen. In a subsequent software release, an SNMP trap is generated on the controller for this condition so that Cisco Prime Infrastructure can record the occurrence.

    Figure 7. Excluded Neighbor

Because many nodes might be attempting to join or rejoin the network after an expected or unexpected event, a hold-off time of 16 minutes is implemented, which means that no nodes are exclusion-listed during this period of time after system initialization.

This exponential backoff and advance algorithm is unique and has the following properties:

  • It allows a node to correctly identify the parent nodes whether it is a true honeypot or is just experiencing temporary outage conditions.

  • It credits the good parent nodes according to the time it has enabled a node to stay connected with the network. The crediting requires less and less time to bring the exclusion-list conviction period to be very low for real transient conditions and not so low for transient to moderate outages.

  • It has a built-in hysteresis for encountering the initial condition issue where many nodes try to discover each other only to find that those nodes are not really meant to be in the same network.

  • It has a built-in memory for nodes that can appear as neighbors sporadically so they are not accidentally considered as parents if they were, or are supposed to be, on the exclusion-list database.

The node exclusion-listing algorithm guards the mesh network against serious stranding. It integrates into AWPP in such a way that a node can quickly reconverge and find the correct network.

Throughput Analysis

Throughput depends on packet error rate and hop count.

Capacity and throughput are orthogonal concepts. Throughput is one user's experience at node N and the total area capacity is calculated over the entire sector of N-nodes and is based on the number of ingress and egress RAP, assuming separate noninterfering channels.

For example, 4 RAPs at 10 Mbps each deliver 40 Mbps total capacity. So, one user at 2 hops out, logically under each RAP, could get 5 Mbps each of TPUT, but consume 40 Mbps of the backhaul capacity.

With the Cisco Mesh solution, the per-hop latency is less than 10 msecs, and the typical latency numbers per hop range from 1 to 3 msecs. Overall jitter is also less than 3 msecs.

Throughput depends on the type of traffic being passed through the network: User Datagram Protocol (UDP) or Transmission Control Protocol (TCP). UDP sends a packet over Ethernet with a source and destination address and a UDP protocol header. It does not expect an acknowledgement (ACK). There is no assurance that the packet is delivered at the application layer.

TCP is similar to UDP but it is a reliable packet delivery mechanism. There are packet acknowledgments and a sliding window technique is used to allow the sender to transmit multiple packets before waiting for an ACK. There is a maximum amount of data the client transmits (called a TCP socket buffer window) before it stops sending data. Sequence numbers track packets sent and ensure that they arrive in the correct order. TCP uses cumulative ACKs and the receiver reports how much of the current stream has been received. An ACK might cover any number of packets, up to the TCP window size.

TCP uses slow start and multiplicative decrease to respond to network congestion or packet loss. When a packet is lost, the TCP window is cut in half and the back-off retransmission timer is increased exponentially. Wireless is subject to packet loss due to interference issues and TCP reacts to this packet loss. A slow start recovery algorithm is also used to avoid swamping a connection when recovering from packet loss. The effect of these algorithms in a lossy network environment is to lessen the overall throughput of a traffic stream.

By default, the maximum segment size (MSS) of TCP is 1460 bytes, which results in a 1500-byte IP datagram. TCP fragments any data packet that is larger than 1460 bytes, which can cause at least a 30-percent throughput drop. In addition, the controller encapsulates IP datagrams in the 48-byte CAPWAP tunnel header as shown in CAPWAP Tunneled Packets. Any data packet that is longer than 1394 bytes is also fragmented by the controller, which results in up to a 15-percent throughput decrease.

Figure 8. CAPWAP Tunneled Packets