NERC posted a lessons-learned report on an incident in which a registered Reliability Coordinator (RC) temporarily lost Inter-Control Center Communications Protocol (ICCP) data feeds from its regional neighbors. The loss of connectivity was caused by a malfunction in third-party telecommunications vendor equipment. The data links were intermittently unavailable for nine hours. There was no adverse effect on the Bulk Electric System.
Data communications services were provided by two major telecommunications service providers (referred to in this document as “Telco A” and “Telco B”) at two separate RC data center locations for the purpose of independent redundancy. The incident revealed that, although the two data centers were geographically separate, the Telco A connections from both sites converged several hops into the provider’s network at a northeast regional hub. A hardware failure at that common hub affected the RC connections and many other Telco A customers. At both RC data centers, the Telco A connections were the primary pathway from a network routing perspective.
The RC understood that it had contracted for vendor diversity at both of its geographically separate locations. It had tested that connections would restart over the redundant links if a router or circuit failed locally at an RC data center. It had not tested a more pervasive problem inside one telco provider’s network that did not directly affect the circuits installed at the RC locations, nor had it checked whether the telcos used each other’s facilities at any point along the paths carrying ICCP data.
NERC provided the following recommendations.
- When contracting with multiple vendors for data communications services for the purpose of redundancy, never assume that geographic diversity alone provides that redundancy. Ensure that the contract specifies physical separation of the redundant circuits and independence of their supporting equipment and power for the duration of the service, along with a means of verification. Include language requiring that the separation be preserved if the provider merges with or is sold to another telco.
- Validate the independence by testing with the vendor: attempt to simulate this type of failure to confirm that the redundancy in place covers it.
- Ensure that the data center does not repeatedly and automatically “fail back” to a preferred provider under intermittent conditions. Using a sustained-signal timer or requiring manual intervention to switch back could suffice.
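
The last recommendation can be illustrated with a minimal sketch of a sustained-signal (hold-down) timer. This is not taken from the NERC report or from any vendor's product: the class name `FailbackController`, the link labels, and the 600-second hold-down value are all hypothetical, chosen only to show the idea. The controller moves traffic to the backup link as soon as the preferred link fails a health check, but returns to the preferred link only after it has stayed healthy for the entire hold-down period.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailbackController:
    """Hypothetical link selector that fails back only after sustained health."""

    hold_down_s: float = 600.0  # assumed value; set according to local policy
    active: str = "primary"
    _healthy_since: Optional[float] = None  # start of the current healthy streak

    def update(self, primary_up: bool, now: float) -> str:
        """Record one health sample taken at time `now` (seconds); return the active link."""
        if not primary_up:
            # Any failed sample resets the timer and keeps traffic on the backup.
            self._healthy_since = None
            self.active = "backup"
        elif self.active == "backup":
            if self._healthy_since is None:
                self._healthy_since = now
            # Fail back only once the primary has been up for the full hold-down period.
            if now - self._healthy_since >= self.hold_down_s:
                self.active = "primary"
        return self.active


# Illustrative samples: (time in seconds, primary link healthy?)
samples = [(0, True), (10, False), (20, True), (30, False),
           (40, True), (300, True), (640, True)]
ctl = FailbackController(hold_down_s=600.0)
history = [ctl.update(up, t) for t, up in samples]
print(history)
```

With these samples, the brief recovery at 20 seconds does not trigger a switch back, because the link drops again at 30 seconds and the timer restarts. Requiring manual intervention instead would amount to never failing back automatically and leaving the decision to an operator.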