
Understanding High Availability

High availability is a set of technologies and practices that provide network-wide resilience and increase IP network availability. Network applications must cross different network segments, from the enterprise backbone, enterprise edge, and service provider edge through the service provider core. Every segment must be resilient enough to recover from faults quickly, so that failures are transparent to users and network applications.

Redundancy

A redundant design can use several mechanisms to prevent single points of failure:

Geographic diversity and path diversity are often included.
Dual devices and links are common, as shown in Figure 5-2.
Dual WAN providers are common.
Dual data centers are sometimes used, especially for large companies and large e-commerce sites.
Dual collocation facilities, dual phone central office facilities, and dual power substations can be implemented.

Redundant design must trade off cost versus benefit. It takes time to plan redundancy and verify geographic diversity of service providers. Additional links and equipment cost money to purchase and maintain. These options must be balanced against risks, costs of downtime, and so on.

Technology

Several Cisco routing continuity options, such as Cisco Nonstop Forwarding (NSF) with Stateful Switchover (SSO) and graceful restart, improve availability. Other techniques detect a failure and trigger failover to a redundant device. These techniques include service monitoring with Cisco IOS IP Service Level Agreements (SLA) and Object Tracking.

People

In the Prepare, Plan, Design, Implement, Operate, and Optimize (PPDIOO) methodology, the people component is vitally important, too. Staff work habits and skills can impact high availability. For example, attention to detail enhances high availability, whereas carelessness hurts availability.

Processes

Sound, repeatable processes can lead to high availability. Continual process improvement as part of the PPDIOO methodology plays a role in achieving high availability.

Organizations should build repeatable processes in the following ways:

By documenting change procedures for repeated changes (for example, Cisco IOS Software upgrades)
By documenting failover planning and lab testing procedures
By documenting the network implementation procedure so that the process can be revised and improved the next time components are deployed.

Organizations should use labs appropriately, as follows:

Lab equipment should accurately reflect the production network.
Failover mechanisms should be tested and understood.
New code should be systematically validated before deployment.

Because staff members tend to ignore processes that consume a lot of time or appear to be a waste of time, organizations also need meaningful change controls, such as the following:

Test failover and all changes before deployment.
Plan well, including planning rollbacks in detail.
Conduct a realistic and thorough risk analysis.
Perform regular capacity management audits.
Track and manage Cisco IOS versions.
Track design compliance as recommended practices change.
Develop plans for disaster recovery and continuity of operations.

Tools

Organizations are starting to monitor service and component availability. With proper failover, services should continue operating when single components fail. Without component monitoring, a failure to detect and replace a failed redundant component can lead to an outage when the second component subsequently fails.

Network diagrams help in planning and in fixing outages more quickly.
Documentation explaining how and why the network design evolved helps capture knowledge that can be critical when a different person needs to make design changes.
Key addresses, VLANs, and servers should be documented.
Documentation tying services to applications and virtual and physical servers can be incredibly useful.

High Availability and Failover Times

The overall failover time in the data center is the combination of the convergence times of the Layer 2, Layer 3, and Layer 4 components.

Failover Time of High-Availability Protocols

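Typical failover times differ by protocol and component:

OSPF/EIGRP: subsecond (with tuned timers)
RSTP: about 1 sec
EtherChannel: about 1 sec
HSRP: subsecond (with millisecond timers)
Service modules (FWSM): 3 to 5 sec
Windows TCP/IP stack: 9 sec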

Optimal Redundancy

Optimal Redundancy in Multilayer Network

Avoid Too Much Redundancy

Too Much Redundancy

Cisco NSF with SSO

Cisco NSF with SSO is a supervisor redundancy mechanism in Cisco IOS Software that enables extremely fast supervisor switchover at Layers 2 to 4. SSO enables the standby route processor (RP) to take control of the device after a hardware or software fault on the active RP. SSO synchronizes startup configuration, startup variables, and running configuration.

Cisco NSF enables continued forwarding of data packets along known routes while the routing protocol information is being restored following a switchover. Cisco NSF devices do not experience routing flaps because the interfaces remain up during a switchover and adjacencies do not reset.
After the routing protocols have converged, CEF updates the FIB table and removes stale route entries.
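
The following is a minimal sketch of how NSF with SSO might be enabled, assuming a modular Catalyst switch with two Supervisor Engines and OSPF as the routing protocol. The OSPF process ID is a placeholder, and the exact keywords can vary by platform and software release. Neighboring routers must be NSF-aware for forwarding to continue without an adjacency reset.

```
! Assumption: dual Supervisor Engines are installed; OSPF process 1 is a placeholder.
Switch(config)# redundancy
Switch(config-red)# mode sso
Switch(config-red)# exit
Switch(config)# router ospf 1
Switch(config-router)# nsf
Switch(config-router)# end
! Check the redundancy mode and the state of the standby Supervisor Engine.
Switch# show redundancy states
```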

Distributed VLANs on Access Switches

If the enterprise campus requirements must support VLANs spanning multiple access layer switches, the design model uses a Layer 2 link for interconnecting the distribution switches, as shown in Figure 5-9. This design is more complex than the Layer 3 interconnection of the distribution switches. Uplink failures and recoveries trigger Spanning Tree Protocol (STP) convergence.

Distributed VLANs

You should take the following steps to improve this suboptimal design:

Use Rapid STP (RSTP) as the version of STP.
Provide a Layer 2 trunk between the two distribution switches to avoid unexpected traffic paths and multiple convergence events.
Place the Hot Standby Router Protocol (HSRP) active router and the STP root bridge for each VLAN on the same distribution layer switch if you choose to load balance VLANs across uplinks. Colocating the HSRP active router and the RSTP root keeps traffic from using the interdistribution link for transit.
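
The steps above can be sketched for one distribution switch as follows. This is an illustrative fragment, not a complete design: VLAN 10, the 10.1.10.0/24 addressing, HSRP group 10, and the priority value are all placeholders.

```
! Assumption: VLAN 10, the addresses, and HSRP group 10 are placeholders.
! Run Rapid PVST+ and make this switch the root bridge for VLAN 10.
Switch(config)# spanning-tree mode rapid-pvst
Switch(config)# spanning-tree vlan 10 root primary
! Make the same switch the HSRP active router for VLAN 10.
Switch(config)# interface vlan 10
Switch(config-if)# ip address 10.1.10.2 255.255.255.0
Switch(config-if)# standby 10 ip 10.1.10.1
Switch(config-if)# standby 10 priority 110
Switch(config-if)# standby 10 preempt
```

The peer distribution switch would typically use spanning-tree vlan 10 root secondary and keep the default HSRP priority of 100, so that it takes over both roles if this switch fails.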

Local VLANs on Access Switches

In this time-proven topology, no VLANs span between access layer switches across the distribution switches. A subnet equals a VLAN that, in turn, equals an access switch because each VLAN is restricted to one access switch only. The root for each VLAN is aligned with the active HSRP instance.

Local VLANs

Layer 3 Access to the Distribution Interconnection

As with local VLANs, no VLANs span access layer switches. In this design, the access layer switches are Layer 3 switches, and the links between the access and distribution layers are routed Layer 3 links.

Layer 3 interconnect

StackWise Access Switches

StackWise

Implementing Network Monitoring

In a redundant campus network design, redundancy is managed by monitoring the network through SNMP and syslog (system logging) and by testing connectivity with IP SLA.

Syslog

The Cisco IOS system message logging (syslog) process enables a device to report and save important error and notification messages, either locally or to a remote logging server. Syslog messages can be sent to local console connections, the system buffer, or remote syslog servers, as shown in Figure 5-18. Syslog enables text messages to be sent to a syslog server using UDP port 514.

Cisco IOS reporting options

show logging

Configuring Syslog

To configure a syslog server, use the command logging ip-of-syslog-server.
To configure the severity level of messages sent to the syslog server, use logging trap level.
To configure local logging options, use logging buffered (enter logging buffered ? to list the available options).
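
As a brief illustration of these commands, the fragment below sends messages of severity warnings and more severe to a remote server and keeps a local buffer. The server address 10.1.1.50, the buffer size, and the severity levels are placeholder choices.

```
! Assumption: 10.1.1.50 is a placeholder syslog server address.
Switch(config)# logging 10.1.1.50
! Send severity 4 (warnings) and more severe messages to the server.
Switch(config)# logging trap warnings
! Keep a 16384-byte local buffer of informational and more severe messages.
Switch(config)# logging buffered 16384 informational
Switch(config)# end
Switch# show logging
```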

Simple Network Management Protocol (SNMP)

SNMP contains three elements:

Network management application (SNMP manager)
SNMP Agents
MIB Databases

SNMP Overview

SNMP Versions

SNMPv1 is defined in RFC 1157 and supports five basic SNMP message types:

Get Request: Used to request the value of a specific MIB variable from the agent.
Get Next Request: Used after the initial Get Request to retrieve the next object instance from a table or a list.
Set Request: Used to set a MIB variable on an agent.
Get Response: Used by an agent to respond to a Get Request or Get Next Request from a manager.
Trap: Used by an agent to transmit an unsolicited alarm to the manager. An agent sends a Trap message when a certain condition occurs, such as a change in the state of a device, a device or component failure, or an agent initialization or restart.

SNMPv2

SNMPv2 was introduced with RFC 1441, but members of the Internet Engineering Task Force (IETF) subcommittee could not agree on the security and administrative sections of the SNMPv2 specification. Community-based SNMPv2 (SNMPv2C), defined in RFC 1901, is the most common implementation.
SNMPv2 introduces two new message types:

Get Bulk Request: Reduces repetitive requests and replies and improves performance when you are retrieving large amounts of data (for example, tables).
Inform Request: Alerts an SNMP manager to specific conditions. Unlike SNMP Trap messages, which are unconfirmed, the NMS acknowledges an Inform Request by sending an Inform Response message back to the requesting device.

SNMPv2 also adds new data types, including 64-bit counters.

SNMPv3

SNMPv3 is described in RFCs 3410 through 3415. It adds methods to ensure the secure transmission of critical data between managed devices.
SNMPv3 introduces three levels of security:

noAuthNoPriv: No authentication is required, and no privacy (encryption) is provided.
authNoPriv: Authentication is based on Hash-based Message Authentication Code with Message Digest 5 (HMAC-MD5) or Hash-based Message Authentication Code with Secure Hash Algorithm (HMAC-SHA). No encryption is provided.
authPriv: In addition to authentication, Cipher Block Chaining-Data Encryption Standard (CBC-DES) encryption is used as the privacy protocol.

SNMP Recommendations

Configure SNMP access lists
Configure SNMP Community strings
Configure SNMP trap receiver
Configure SNMPv3 user

Switch(config)# access-list 100 permit ip 10.1.1.0 0.0.0.255 any
Switch(config)# snmp-server community cisco RO 100
Switch(config)# snmp-server community xyz123 RW 100
Switch(config)# snmp-server host 10.1.1.50 cisco
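
The recommendations list an SNMPv3 user, which the community-based commands do not cover. The sketch below shows one way an authPriv user might be defined. The group name, user name, and passwords are placeholders, and the available authentication and privacy keywords differ between software releases (some older releases use des56 instead of des), so check the options with ? on the target device.

```
! Assumption: NMS-GROUP, nms-admin, and both passwords are placeholders.
! Create a group that requires authentication and encryption (authPriv).
Switch(config)# snmp-server group NMS-GROUP v3 priv
! Create a user in that group with HMAC-SHA authentication and DES privacy.
Switch(config)# snmp-server user nms-admin NMS-GROUP v3 auth sha AuthPass123 priv des PrivPass123
! Send SNMPv3 notifications to the management station as that user.
Switch(config)# snmp-server host 10.1.1.50 version 3 priv nms-admin
```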

IP Service Level Agreement

An SLA specifies connectivity and performance agreements for an end-user service from a service provider. The SLA typically outlines the minimum level of service and the expected level of service.

IP SLA Measurements

The IP SLA measurement functionality in Cisco IOS Software enables configuration of a router to send synthetic traffic to a host computer or a router that has been configured to respond.

IP SLA in Cisco IOS

IP SLA can measure the following:

Network latency and response time
Packet loss statistics
Network jitter and voice quality scoring
End-to-end network connectivity

IP SLA Responder Timestamps

IP SLA Timestamps

IP SLA Configuration

Switch(config)# ip sla monitor 11
Switch(config-sla)# type echo protocol ipIcmpEcho 10.1.1.1 source-interface fa0/1
Switch(config-sla)# frequency 10
Switch(config-sla)# exit
Switch(config)# ip sla monitor schedule 11 life forever start-time now
Switch(config)# track 1 ip sla 11 reachability
Switch(config-track)# end
Switch# show ip sla statistics
Round Trip Time (RTT) for Index 1
Latest RTT: NoConnection/Busy/Timeout
Latest operation start time: 11:11:22.533 eastern Thu Jul 9 2010
Latest operation return code: Timeout
Over thresholds occurred: FALSE
Number of successes: 177
Number of failures: 6
Operation time to live: Forever
Operational state of entry: Active
Last time this entry was reset: Never
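
The tracked object defined above (track 1) does nothing until a feature references it. As one possible use, the sketch below ties it to HSRP so that the switch lowers its priority when the IP SLA probe fails. VLAN 10, HSRP group 10, and the decrement value are placeholders, and the peer switch needs preemption enabled for it to take over.

```
! Assumption: VLAN 10, HSRP group 10, and the decrement of 20 are placeholders.
Switch(config)# interface vlan 10
! Lower the HSRP priority by 20 while tracked object 1 is down.
Switch(config-if)# standby 10 track 1 decrement 20
Switch(config-if)# end
! Display the current state of the tracked object.
Switch# show track 1
```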

Implementing Redundant Supervisor Engines in Catalyst Switches

The Supervisor Engine is the most important component in Catalyst modular switches, which are typically found in the campus backbone and building distribution submodules.
Provisioning dual Supervisor Engines within a Catalyst family of switches ensures high availability by providing redundancy without requiring the deployment of an entire separate switch.

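RPR (Route Processor Redundancy) and RPR+ (only on Catalyst 6500): RPR was the first high availability feature in Cisco IOS Software. Failover time: 30 sec to 4 min.
SSO (Stateful Switchover): RPR and RPR+ restore traffic forwarding about a minute after a Supervisor Engine switchover. SSO shortens this by keeping the standby Supervisor Engine synchronized with the active one, including the startup configuration, startup variables, and running configuration.
NSF (Nonstop Forwarding) with SSO: Packets continue to be forwarded along known routes while routing protocol information is restored after a switchover.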

First Hop Redundancy Protocols

CCNP_3_Implementing_High_Availability_in_a_Campus_Environment

Summary
