I’ve noticed a trend at almost every company I’ve consulted for: most network engineering does not use abstract design; instead, each element is provisioned in a concrete, one-off manner. This approach is not cost effective, for several reasons.
There is a paradigm associated with designing a network using COTS products that causes network engineering production centers to disregard the conventional engineering process. Consider your automobile. The last time you went to the repair shop, did the mechanic go through the entire car to ensure all the correct parts were installed? Of course not; the VIN told them which build was used, and all cars of that build were identical except for a few items that made them unique. Even the options were identical to those of the same model with those options. This was not done solely for the benefit of the consumer, but because it is the most cost-effective way to manufacture and maintain the vehicle throughout its lifecycle. This principle can be seen in most industries, except IT. Why is compliance software for network systems so popular and valuable? Because network devices are seldom configured according to a standard. Even where they are to some extent, they are configured using templates that must be applied by hand, with the variables entered manually, so they still vary. This would be like automotive engineers assembling cars by hand. It isn’t cost effective, for a number of reasons.
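To make the templating point concrete, here is a minimal sketch of the approach described above, using Python’s standard `string.Template`. The template text, interface names, and variable names are invented for illustration; the point is that once the variables are supplied programmatically rather than keyed in by hand, every rendered configlet is identical in form.

```python
from string import Template

# A hypothetical uplink configlet template. In practice, templates like
# this are often applied by hand with the variables typed in manually,
# which is where the variance creeps in.
BRANCH_UPLINK = Template(
    "interface $ifname\n"
    " description Uplink to $peer\n"
    " ip address $addr $mask\n"
    " no shutdown\n"
)

def render_uplink(ifname: str, peer: str, addr: str, mask: str) -> str:
    """Render the configlet from supplied variables instead of hand-typing it."""
    return BRANCH_UPLINK.substitute(ifname=ifname, peer=peer, addr=addr, mask=mask)

print(render_uplink("GigabitEthernet0/1", "CORE-RTR-01",
                    "10.20.30.1", "255.255.255.252"))
```

The same rendering step can be fed from an inventory or design database, which is what removes the manual-entry variance the paragraph above describes.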
I was demonstrating a proactive change validation process for a large enterprise customer. They provided me with the configlets and the change documentation for an upcoming change. I modeled the current network, applied the proposed changes in simulation, and found several errors that would have left the modified network unable to route traffic. They had used templates to create the configlets for the change, but one of the templates was the wrong one, and some of the variables were populated incorrectly. The change was an upgrade that had already been performed at many locations and was standardized to some extent. Had it been implemented as-is, the implementation engineer would have made whatever modifications were necessary to get the system operational, introducing yet more variance.
This paradigm exists in service operations as well as in design and provisioning. Tier III often changes the configuration of a device to resolve an incident. This would be like an auto repair shop altering your automobile’s design to fix a problem: the engine isn’t running correctly, so they install spark plugs that differ from the manufacturer’s specification. The repair shop wouldn’t do that. Why is it commonplace in IT? It is, in essence, ad-hoc system redesign. If the configuration of a device needs to be changed to resolve a problem, then the system was designed wrong. There are only a few exceptions. For example, if a router at a remote site has a hot-spare interface, it is often configured and disabled. In the event of a failure, the spare interface is enabled, and possibly readdressed, to take the place of the failed one. That isn’t really a redesign; it is an operational procedure used in a failure scenario.
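Because the spare-interface failover is a defined operational procedure rather than a redesign, it can even be captured as a script. A hedged sketch (the interface names and addresses are made up, and a real procedure would also verify device state before and after):

```python
def failover_configlet(failed_if: str, spare_if: str,
                       addr: str, mask: str) -> str:
    """Generate the configlet that shuts down the failed interface and
    brings up the pre-provisioned spare with the failed one's address."""
    return (
        f"interface {failed_if}\n"
        f" shutdown\n"
        f"interface {spare_if}\n"
        f" ip address {addr} {mask}\n"
        f" no shutdown\n"
    )

# Example: Gi0/0 has failed; move its address to the spare Gi0/2.
print(failover_configlet("GigabitEthernet0/0", "GigabitEthernet0/2",
                         "192.0.2.1", "255.255.255.0"))
```

Encoding the procedure this way keeps the operational response inside the approved design instead of inviting ad-hoc changes.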
This problem has a snowball effect. Because there are so many variations in the network design, there is no feasible way to test design improvements. If there were standardization, each variation of the standard systems and sub-systems could be tested in the lab/QA environment. But because there is no real standard, and too many exceptions to any that exist, the only way to adequately test anything would be to replicate the entire network in QA. As a result, the rate of unsuccessful changes and of unexpected impacts from changes is extremely high. Management attempts to mitigate this through more rigorous change management, which cannot solve the problem and only adds delay and effort to the change process. In the end it costs the organization in lost productivity due to system downtime and in unnecessary labor to manage change and to resolve incidents caused by change.
The ITIL Service Design process treats a service, such as a network service, much the way the automotive industry does in the examples above. When the network is treated as a service subject to the same rigorous engineering process, the result is improved efficiency and a high degree of predictability, which reduces service disruptions caused by unexpected problems encountered during changes. This requires considerably more engineering effort during the design and release processes, but the ROI is improved availability and reduced effort during implementation. Implementing the release package becomes a turn-key operation that can be performed by the operations or provisioning team rather than by engineering. This paradigm shift often takes an organization some time to grasp and to operate within efficiently, but it improves performance and efficiency and paves the way toward automated provisioning.
To accomplish this, the design must be abstracted in such a manner as to express the level of detail necessary to drive physical assembly and logical provisioning: naming, addressing, routing configuration, policy, management configuration, VLAN assignment, and so on. This is most certainly possible, because all of these things follow a system of logic; they are not arbitrarily assigned.
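To illustrate that these values follow a system of logic rather than being arbitrary, here is a small sketch that derives a branch router’s hostname, management address, and VLAN assignments from nothing but a site number. The numbering and addressing scheme is invented for illustration; any real scheme would come from the organization’s own design rules.

```python
import ipaddress

def derive_site(site_num: int) -> dict:
    """Derive provisioning data for a site from its number, using fixed rules."""
    return {
        # Hostname rule: BR + zero-padded site number + device role.
        "hostname": f"BR{site_num:04d}-RTR01",
        # Addressing rule: one /24 of management space per site,
        # carved sequentially out of 10.0.0.0/8; router gets .1.
        "mgmt_addr": str(ipaddress.ip_address("10.0.0.0") + site_num * 256 + 1),
        # VLAN rule: the same fixed plan at every site.
        "vlans": {"data": 100, "voice": 200, "mgmt": 900},
    }

site = derive_site(42)
print(site["hostname"])   # BR0042-RTR01
print(site["mgmt_addr"])  # 10.0.42.1
```

Because every value is a pure function of the site number, two sites can differ only where the rules say they differ, which is exactly the property the abstract design needs.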
An example of this can be seen in Windows system deployment and management. In the ’90s, if you wanted to install a Windows server, you inserted a disk into the server and walked through an installation process. If you were really on your game, you could create an installer answer file that responded to most of the questions the install utility would ask. Any custom configuration had to be done manually, one machine at a time. The advent of system images and Group Policy provided a means to abstract the system design so that an enterprise can easily provision new systems identically and manage them very efficiently.
While there is no out-of-the-box product that provides a mechanism to abstract the network design the way Windows uses images and GPOs, it is certainly not out of reach. The mechanisms to design networks using abstract constructs can be developed or integrated, and in large environments they are worth the effort.
The larger problem is changing the paradigm. I worked on a project where we developed an Operational Support System (OSS) that provided automated provisioning. The customer entered a service order into a CRM system, which caused the downstream provisioning system to push out all the configuration changes necessary to provision the service on the network devices. The system took us seven years to develop, but it took just as long to change the organizational mindset to the point where people could see network design in abstract constructs.
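A toy sketch of the order-to-provisioning flow just described: a service order enters the system, the provisioning layer resolves it against a service catalog into per-device configlets, and a push step (stubbed here as a print) delivers them. The service type, device names, and catalog entry are all invented; a real OSS integration is far more involved.

```python
# Hypothetical service catalog: service type -> configlet template.
CATALOG = {
    "vpn": "ip vrf {customer}\n rd {rd}\n",
}

def provision(order: dict) -> dict:
    """Resolve a service order into per-device configlets."""
    template = CATALOG[order["service"]]
    configlet = template.format(**order["params"])
    return {device: configlet for device in order["devices"]}

# A service order as it might arrive from the CRM system.
order = {
    "service": "vpn",
    "devices": ["PE-01", "PE-02"],
    "params": {"customer": "ACME", "rd": "65000:100"},
}

# Stubbed "push" step: a real system would deliver these to the devices.
for device, cfg in provision(order).items():
    print(device, "<-", cfg.splitlines()[0])
```

The essential point is that no engineer touches the configlets: once the catalog and the rules are right, every provisioned service is identical by construction.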