I’ve noticed a trend at almost every company I’ve consulted for: most network engineering does not use abstract design, but rather provisions each element concretely, one device at a time. This is not a cost-effective approach, for several reasons.
There is a paradigm associated with designing a network from COTS products that causes network engineering organizations to disregard the conventional engineering process. Consider your automobile. The last time you went to the repair shop, did the mechanic go through the entire car to ensure all the correct parts were installed? Of course not; the VIN told them which build was used, and all cars of that build were identical except for the few options that made them unique. Even the options were identical to those on every other car of the same model. This was not done solely for the benefit of the consumer, but because it is the most cost-effective way to manufacture and maintain the vehicle throughout its lifecycle. This principle can be seen in most industries … except IT. Why is compliance software for network systems so popular and valuable? Because network devices are seldom configured according to a standard. Where they are, to some extent, they are configured using templates that must be applied manually, with the variables entered by hand, so the results still vary. This would be like automotive engineers assembling cars by hand. It isn’t cost-effective, for a number of reasons.
I was demonstrating a proactive change validation process for a large enterprise customer. They provided me the configlets and the change documentation for an upcoming change. I modeled the current network, applied the proposed changes in the simulation, and found several errors that would have left the modified network unable to route traffic. They had used templates to create the configlets for the change, but they used one incorrect template and populated the configlets with several incorrect variables. The change was an upgrade that had already been performed at many locations and was standardized to some extent. Had it been implemented as-is, the implementation engineer would have made whatever modifications were necessary to get the system operational, introducing yet another degree of variance.
This paradigm exists in service operations as well as in design and provisioning. Tier III often changes the configuration of a device to resolve an incident. That would be like an auto repair shop changing your automobile’s design to fix a problem: the car isn’t running correctly, so they install spark plugs that differ from the manufacturer’s specification. A repair shop wouldn’t do that, so why is it commonplace in IT? This is essentially ad-hoc system redesign. If the configuration of a device needs to be changed to resolve a problem, then the system was designed wrong. There are only a few exceptions. For example, if a router at a remote site has a hot-spare interface, it is often configured and disabled; in the event of a failure, the spare interface is enabled, and possibly readdressed, to take the place of the failed one. That isn’t really a redesign; it is an operational procedure used in a failure scenario.
This problem has a snowball effect. Because there are so many variations in the network design, there is no feasible way to test design improvements. If there were standardization, each variation of the standard systems and sub-systems could be tested in the lab/QA environment. But because there is really no standard, and too many exceptions to any that exist, the only way to adequately test anything would be to replicate the entire network in QA. As a result, the rate of unsuccessful changes and unexpected change impacts is extremely high. Management attempts to mitigate this with more rigorous change management, which cannot solve the problem and only adds delay and effort to the change process. In the end it costs the organization in lost productivity from system downtime and in unnecessary labor spent managing changes and resolving the incidents they cause.
The ITIL Service Design process treats a service, such as a network service, much the way the automotive industry does in the previous examples. When the network is treated as a service subject to the same rigorous engineering process, the result is improved efficiency and a high degree of predictability, which reduces service disruptions caused by unexpected problems encountered during changes. This requires considerably more engineering effort during the design and release processes, but the ROI is improved availability and reduced effort during implementation. Implementing the release package becomes a turn-key operation that can be performed by the operations or provisioning team rather than engineering. This paradigm shift often takes an organization time to grasp and function efficiently in, but it improves performance and efficiency and paves the way toward automated provisioning.
To accomplish this, the design must be abstracted in a manner that expresses the level of detail necessary to drive physical assembly and logical provisioning: naming, addressing, routing configuration, policy, management configuration, VLAN assignment, and so on. This is certainly possible, because all of these things follow a system of logic; they are not arbitrarily assigned.
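As a minimal sketch of what this kind of abstraction can look like, the fragment below derives every concrete value for a device (hostname, loopback, VLAN gateways) from just two abstract inputs. The naming scheme, address plan, and VLAN numbers here are illustrative assumptions, not a recommended standard:

```python
import ipaddress

# Hypothetical plan (an assumption for illustration): each site owns
# 10.<site_id>.0.0/16, and VLAN n is numbered into 10.<site_id>.<n>.0/24.
VLAN_PLAN = {"data": 10, "voice": 20}

def derive_site(site_id: int, region: str) -> dict:
    """Derive every concrete value for a site from two abstract inputs."""
    return {
        "hostname": f"{region.lower()}-rtr-{site_id:03d}",
        "loopback": ipaddress.ip_address(f"10.{site_id}.0.1"),
        "vlans": {
            name: ipaddress.ip_network(f"10.{site_id}.{vid}.0/24")
            for name, vid in VLAN_PLAN.items()
        },
    }

def render_config(site: dict) -> str:
    """Render an IOS-style configlet; no variables are entered by hand."""
    lines = [
        f"hostname {site['hostname']}",
        "interface Loopback0",
        f" ip address {site['loopback']} 255.255.255.255",
    ]
    for vid, name in sorted((v, k) for k, v in VLAN_PLAN.items()):
        net = site["vlans"][name]
        gateway = next(net.hosts())  # first usable host is the SVI gateway
        lines += [f"interface Vlan{vid}",
                  f" ip address {gateway} {net.netmask}"]
    return "\n".join(lines)
```

Because every concrete value is computed from the abstract design, two sites built from the same inputs cannot drift apart, and compliance checking reduces to re-running the derivation and diffing it against the device.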
An example of this can be seen in Windows system deployment and management. In the ’90s, if you wanted to install a Windows server, you inserted a disk into the server and went through an installation process. If you were really on your game, you could create an installer init file that answered most of the questions the install utility would ask. Any custom configuration had to be done manually, one machine at a time. The advent of system images and Group Policy provided a means to abstract the system design so that an enterprise can easily provision new systems identically and manage them very efficiently.
While there is no out-of-the-box product that provides a mechanism to abstract the network design the way Windows uses images and GPOs, it is certainly not out of reach. The mechanisms to design networks using abstract constructs can be developed and integrated, and they are worth the effort in large environments.
The larger problem is changing the paradigm. I worked on a project where we developed an Operational Support System (OSS) that provided automated provisioning. The customer entered the service order into a CRM system, which caused the downstream provisioning system to push out all the configuration changes necessary to provision the service on the network devices. The system development took us seven years, but it took just as long to change the organizational mindset to see network design in terms of abstract constructs.
If you’re looking at implementing capacity planning, or hiring someone to do it, there are a few things you should consider.
Capacity planning should be an ongoing part of the lifecycle of any network (or any IT service, for that matter). The network was designed to meet a certain capacity, knowing that demand may grow as the network gets larger and/or supports more users and services. There are several ways to go about this, and the best approach depends on your situation. There should be fairly specific plans for how to measure utilization, forecast, report, make decisions, and increase or decrease capacity. There are also many dimensions to capacity. Link utilization is one obvious capacity limitation, but processor utilization may not be so obvious, and where VPNs are involved there are logical limits to the volume of traffic each device can handle. There are also physical limitations such as port and patch panel connections, power consumption, UPS capacity, and so on. These should all be addressed as an integral part of the network design, and if they have been overlooked, the design needs to be re-evaluated in light of the capacity management program. There are also the programmatic aspects: frequency of evaluation, control gates, decision points, who to involve where, etc. This is all part of the lifecycle.
There are a wide variety of tools available for capacity planning and analysis. Which tools are selected will be determined by the approach you’re taking to manage capacity; how the data is to be manipulated, reported, and consumed; and architectural factors such as hardware capabilities, available data, and other network management systems in use. One simple approach is to measure utilization through SNMP and use linear forecasting to predict future capacity requirements. This is very easy to set up but doesn’t provide the most reliable results. A much better approach is to collect traffic data, overlay it on a dynamic model of the network, then use failure analysis to predict capacity changes resulting from limited failures. This can be combined with linear forecasting; however, failure scenarios will almost always be the determining factor. Many organizations use QoS to prioritize certain classes of traffic over others, which adds yet another dimension to the workflow. There is also traffic engineering design, third-party and carrier capabilities, and the behavior of the services supported by the network. It can become more complicated than it might appear at first glance.
Some understanding of the underlying technologies is necessary to evaluate the data and make recommendations on any changes. If dynamic modeling is used for forecasting, yet another set of skills is required. The tools may produce much of the reporting; however, some analysis will need to be captured in reports evaluated by other parts of the organization, which requires communication and presentation skills.
It’s highly unlikely that the personnel responsible for defining the program, gathering requirements, selecting COTS tools, writing middleware, and implementing all of this will be the same people who use the tools, produce the reports, or even read and evaluate them. The idea of “hiring a capacity management person” to do all this isn’t really feasible. Those with the skills and motivation to define the program and/or design and implement it are unlikely to be interested in operating the system or creating the reports. One approach is to bring in someone with the expertise to define the approach, design and implement the tooling, and then train the personnel who will use it. These engagements are usually relatively short and provide great value.