CCNP SWITCH/Implementing High Availability in a Campus Network

From Teknologisk videncenter
Revision as of 13:44, 31 August 2011 by Rael (talk | contribs) (High Availability and Failover Times)

Understanding High Availability

High availability is a set of technologies and practices that provide network-wide resilience and increase IP network availability. Network applications must cross different network segments, from the Enterprise Backbone, Enterprise Edge, and Service Provider Edge, through the Service Provider Core. Every segment must be resilient enough to recover quickly, so that faults are transparent to users and network applications.

Redundancy

A redundant design can use several mechanisms to prevent single points of failure:

  • Geographic diversity and path diversity are often included.
  • Dual devices and links are common, as shown in Figure 5-2.
  • Dual WAN providers are common.
  • Dual data centers are sometimes used, especially for large companies and large e-commerce sites.
  • Dual collocation facilities, dual phone central office facilities, and dual power substations can be implemented.

Redundant design must trade off cost versus benefit. It takes time to plan redundancy and verify geographic diversity of service providers. Additional links and equipment cost money to purchase and maintain. These options must be balanced against risks, costs of downtime, and so on.
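The cost-versus-benefit trade-off can be made concrete with a small calculation. The sketch below is only an illustration: it assumes the two redundant components fail independently and that either one alone can carry the service, and the function names and the cost figures in the usage note are hypothetical, not taken from the course material.

```python
HOURS_PER_YEAR = 24 * 365


def parallel_availability(single: float, copies: int = 2) -> float:
    """Availability of `copies` redundant components when any one suffices.

    Assumes independent failures, which real designs only approximate.
    """
    return 1.0 - (1.0 - single) ** copies


def expected_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year at the given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


def redundancy_pays_off(single: float,
                        downtime_cost_per_hour: float,
                        extra_cost_per_year: float) -> bool:
    """True if the downtime avoided by a second component is worth more
    than the yearly cost of buying and maintaining it."""
    saved_hours = (expected_downtime_hours(single)
                   - expected_downtime_hours(parallel_availability(single)))
    return saved_hours * downtime_cost_per_hour > extra_cost_per_year
```

For example, a single device at 99 percent availability is down about 87.6 hours per year; a redundant pair under these assumptions reaches 99.99 percent. Whether that is worth paying for depends on the assumed hourly cost of downtime, which is the balance the paragraph above describes.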

Technology

Several Cisco routing continuity options exist, such as Cisco Nonstop Forwarding (NSF) with Stateful Switchover (SSO), and graceful restart capabilities improve availability. Techniques also exist to detect failure and trigger failover to a redundant device. These techniques include service monitoring with Cisco IOS IP Service Level Agreements (SLA) and Object Tracking.

People

In the Prepare, Plan, Design, Implement, Operate, and Optimize (PPDIOO) methodology, the people component is vitally important, too. Staff work habits and skills can impact high availability. For example, attention to detail enhances high availability, whereas carelessness hurts availability.

Processes

Sound, repeatable processes can lead to high availability. Continual process improvement as part of the PPDIOO methodology plays a role in achieving high availability.

Organizations should build repeatable processes in the following ways:

  • By documenting change procedures for repeated changes (for example, Cisco IOS Software upgrades)
  • By documenting failover planning and lab testing procedures
  • By documenting the network implementation procedure so that the process can be revised and improved the next time components are deployed

Organizations should use labs appropriately, as follows:

  • Lab equipment should accurately reflect the production network.
  • Failover mechanisms should be tested and understood.
  • New code should be systematically validated before deployment.

Because staff members tend to ignore processes that consume a lot of time or appear to be a waste of time, organizations also need meaningful change controls, such as the following:

  • Test failover and all changes before deployment.
  • Plan well, including planning rollbacks in detail.
  • Conduct a realistic and thorough risk analysis.
  • Perform regular capacity management audits.
  • Track and manage Cisco IOS versions.
  • Track design compliance as recommended practices change.
  • Develop plans for disaster recovery and continuity of operations.

Tools

Organizations are starting to monitor service and component availability. With proper failover, services should continue operating when single components fail. Without component monitoring, a failure to detect and replace a failed redundant component can lead to an outage when the second component subsequently fails.
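The risk described above, a silent loss of redundancy, can be sketched as a simple audit. This is a minimal model, not a monitoring tool: the `RedundantPair` class and the pair names are hypothetical, and in practice the up/down state would come from whatever monitoring system is in use.

```python
from dataclasses import dataclass


@dataclass
class RedundantPair:
    """Two components backing one service (hypothetical model)."""
    name: str
    primary_up: bool = True
    secondary_up: bool = True

    def service_up(self) -> bool:
        # With proper failover, one surviving component keeps the service running.
        return self.primary_up or self.secondary_up

    def redundancy_lost(self) -> bool:
        # Service still works, but the next failure would cause an outage.
        return self.service_up() and not (self.primary_up and self.secondary_up)


def audit(pairs):
    """Return the names of pairs that are running without a spare."""
    return [p.name for p in pairs if p.redundancy_lost()]
```

A pair reported by `audit` looks healthy to users, which is exactly why it goes unnoticed without component-level monitoring.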

  • Network diagrams help in planning and in fixing outages more quickly.
  • Documentation explaining how and why the network design evolved helps capture knowledge that can be critical when a different person needs to make design changes.
  • Key addresses, VLANs, and servers should be documented.
  • Documentation tying services to applications and virtual and physical servers can be incredibly useful.

High Availability and Failover Times

The overall failover time in the data center is the combination of convergence at the Layer 2, Layer 3, and Layer 4 components.

Failover Time of High-Availability Protocols

Failover times differ by protocol and component:

  • OSPF/EIGRP: subsecond
  • RSTP: 1 second
  • EtherChannel: 1 second
  • HSRP: subsecond
  • Service modules (FWSM): 3 to 5 seconds
  • Windows TCP/IP: 9 seconds
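The figures above can be combined in a short calculation to see which component dominates. The text does not say whether the recoveries overlap or run one after another, so this sketch reports both bounds. Treating "subsecond" as 1.0 second and the service module range as its upper end of 5.0 seconds are assumptions made only for the arithmetic.

```python
# Upper-bound failover times in seconds, taken from the list above.
FAILOVER_SECONDS = {
    "OSPF/EIGRP": 1.0,              # "subsecond": 1.0 used as an upper bound (assumption)
    "RSTP": 1.0,
    "EtherChannel": 1.0,
    "HSRP": 1.0,                    # "subsecond": 1.0 used as an upper bound (assumption)
    "Service modules (FWSM)": 5.0,  # upper end of the 3 to 5 second range
    "Windows TCP/IP": 9.0,
}


def slowest_component(times):
    """Return (name, seconds) of the slowest component.

    This is the overall failover time if all recoveries run in parallel.
    """
    return max(times.items(), key=lambda item: item[1])


def sequential_worst_case(times):
    """Overall failover time if every recovery had to finish before the next began."""
    return sum(times.values())
```

With these numbers the slowest component is the 9-second Windows TCP/IP figure, and the fully sequential worst case is 18 seconds. Either way, tuning the subsecond protocols further changes the overall figure very little.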

Optimal Redundancy

Optimal Redundancy in Multilayer Network

Avoid Too Much Redundancy

Too Much Redundancy

Cisco NSF with SSO

Cisco NSF with SSO is a supervisor redundancy mechanism in Cisco IOS Software that enables extremely fast supervisor switchover at Layers 2 to 4. SSO enables the standby route processor (RP) to take control of the device after a hardware or software fault on the active RP. SSO synchronizes startup configuration, startup variables, and running configuration.

Cisco NSF enables the continued forwarding of data packets along known routes while the routing protocol information is being restored following a switchover. Cisco NSF devices do not experience routing flaps because the interfaces remain up during a switchover and adjacencies do not reset.
After the routing protocols have converged, CEF updates the FIB table and removes stale route entries.
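The sequence described above (keep forwarding on known routes, relearn, then purge stale entries) can be modeled in a few lines. This is a deliberately simplified sketch and not how CEF is implemented: the `Fib` class is hypothetical, and lookups are exact matches on the prefix string rather than longest-prefix matches.

```python
class Fib:
    """Toy forwarding table that mimics stale-entry handling after a switchover."""

    def __init__(self, routes):
        # prefix -> {"next_hop": ..., "stale": ...}
        self.entries = {
            prefix: {"next_hop": next_hop, "stale": False}
            for prefix, next_hop in routes.items()
        }

    def switchover(self):
        """Mark every entry stale but keep it, so forwarding continues."""
        for entry in self.entries.values():
            entry["stale"] = True

    def lookup(self, prefix):
        """Return the next hop for a prefix, stale or not, or None if absent."""
        entry = self.entries.get(prefix)
        return entry["next_hop"] if entry else None

    def relearn(self, prefix, next_hop):
        """Install a route relearned from the routing protocol as fresh."""
        self.entries[prefix] = {"next_hop": next_hop, "stale": False}

    def converged(self):
        """After convergence, drop entries that were never refreshed."""
        self.entries = {
            prefix: entry
            for prefix, entry in self.entries.items()
            if not entry["stale"]
        }
```

Between `switchover()` and `converged()`, every lookup still returns the old next hop, which corresponds to packets being forwarded along known routes while the routing information is restored.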

Distributed VLANs on Access Switches

If the enterprise campus must support VLANs that span multiple access layer switches, the design model uses a Layer 2 link to interconnect the distribution switches, as shown in Figure 5-9. This design is more complex than a Layer 3 interconnection of the distribution switches. The Spanning Tree Protocol (STP) convergence process initiates for uplink failures and recoveries.

Distributed VLANs

You should take the following steps to improve this suboptimal design:

  • Use Rapid STP (RSTP) as the version of STP.
  • Provide a Layer 2 trunk between the two distribution switches to avoid unexpected traffic paths and multiple convergence events.
  • Place the Hot Standby Router Protocol (HSRP) primary (active) router and the STP primary root for each VLAN on the same distribution layer switch if you choose to load balance VLANs across uplinks. Colocating them avoids using the interdistribution link for transit.
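The colocation recommendation in the last step lends itself to an automated check. The sketch below assumes two dictionaries that map each VLAN ID to the distribution switch acting as STP root and as HSRP active router; how that data is collected, and the switch names used in the usage note, are hypothetical.

```python
def misaligned_vlans(stp_root, hsrp_active):
    """Return the sorted VLAN IDs whose STP root and HSRP active router
    sit on different distribution switches (or have no HSRP entry at all)."""
    return sorted(
        vlan for vlan, root in stp_root.items()
        if hsrp_active.get(vlan) != root
    )
```

For instance, with `stp_root = {10: "dist1", 20: "dist2"}` and `hsrp_active = {10: "dist1", 20: "dist1"}`, the function reports VLAN 20, whose routed traffic would have to cross the interdistribution link.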

Local VLANs on Access Switches

In this time-proven topology, no VLANs span between access layer switches across the distribution switches. A subnet equals a VLAN that, in turn, equals an access switch, because each VLAN is restricted to one access switch only. The root for each VLAN is aligned with the active HSRP instance.
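The rule that each VLAN lives on exactly one access switch is easy to verify against an inventory. This is a minimal sketch under the assumption that the VLAN assignments are available as a dictionary of switch name to VLAN list; the switch names in the usage note are made up.

```python
def spanning_vlans(vlans_by_switch):
    """Return {vlan: [switches]} for every VLAN configured on more than
    one access switch, which would violate the local VLAN model."""
    seen = {}
    for switch, vlans in vlans_by_switch.items():
        for vlan in vlans:
            seen.setdefault(vlan, set()).add(switch)
    return {
        vlan: sorted(switches)
        for vlan, switches in seen.items()
        if len(switches) > 1
    }
```

Given `{"acc1": [10, 11], "acc2": [20], "acc3": [20, 30]}`, the result flags VLAN 20 on `acc2` and `acc3`. An empty result means the topology follows the local VLAN model.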

Local VLANs

Layer 3 Access to the Distribution Interconnection

As with local VLANs, no VLANs span access layer switches. In this design the access layer switches can be Layer 3 switches, with Layer 3 links between the access and distribution layers.

Layer 3 interconnect

StackWise Access Switches

StackWise

Summary