Broadband Reliability
By Ike Elliott
Leveraging Design, Monitoring and Test Automation to Minimize Disruptions
Reliable Internet access is more important than ever. According to a 2021 Ernst & Young survey, 58% of American broadband users now think that reliability is more important than speed. The COVID-19 pandemic taught us the importance of reliable connectivity as many people began to work from home or attend school remotely, and even as the pandemic wanes, broadband users will not forget the importance of network reliability.
When your Internet goes down, it is more than just a nuisance. Deloitte estimates that “the daily economic impact of a shutdown of the Internet and Internet-based services in a highly connected country is an average of $23.6 million per 10 million population.”
Unfortunately, according to the same Ernst & Young survey, only 31% of households believe that their Internet service provider proactively gives effective tips on how to ensure a reliable connection. Users are begging for help. What can providers do to improve the reliability of their services and, at the same time, improve their reputations?
In recent years, we’ve seen network reliability engineering (NRE) emerge as a specialist role, one that takes a new approach to network operations in order to improve network reliability. NRE applies many of the core principles of DevOps, including:
- placing all operational knowledge and code in a shared repository with strong version control,
- automating everything that makes sense to automate, especially if the automation provides safeguards that are often overlooked,
- monitoring and measuring the entire system, aimed at outcomes rather than network elements,
- designing the system on the assumption that failures will occur, with automated failure recovery where possible; and,
- employing automated testing procedures prior to deploying new software or hardware into the system.
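To make the principle of monitoring outcomes rather than network elements more concrete, here is a minimal Python sketch. It assumes that periodic end-to-end probes (for example, whether a test client behind a subscriber gateway can reach a server on the Internet) are already being collected at a fixed interval. The ProbeResult type and the availability and longest_outage_s functions are hypothetical names for illustration. The point is that both metrics describe what a subscriber experienced, regardless of which element caused the failure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProbeResult:
    """One end-to-end check from the subscriber's point of view (hypothetical type)."""
    timestamp_s: int  # seconds since the start of the measurement window
    success: bool     # did the probe reach its Internet target?


def availability(results: List[ProbeResult]) -> float:
    """Fraction of probes that succeeded, between 0.0 and 1.0."""
    if not results:
        raise ValueError("no probe results")
    return sum(r.success for r in results) / len(results)


def longest_outage_s(results: List[ProbeResult], interval_s: int) -> int:
    """Longest run of consecutive failed probes, converted to seconds."""
    longest = current = 0
    for r in sorted(results, key=lambda p: p.timestamp_s):
        current = 0 if r.success else current + 1
        longest = max(longest, current)
    return longest * interval_s


if __name__ == "__main__":
    interval = 60  # assume one probe per minute
    pattern = [True, True, True, False, False, False, True, True, False, True]
    probes = [ProbeResult(i * interval, ok) for i, ok in enumerate(pattern)]
    print(f"availability: {availability(probes):.1%}")                # availability: 60.0%
    print(f"longest outage: {longest_outage_s(probes, interval)} s")  # longest outage: 180 s
```

A real deployment would aggregate many such probes across subscribers and locations; this sketch only shows the shape of the calculation.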
Some of the biggest gains in network reliability can be achieved through the last two elements on the list, because many providers may not be doing them yet, and both are feasible with a focused effort and the adoption of new technologies.
Designing for failure with automated recovery
Most providers and end users understand that networks will experience failures over time. Whether it is a natural disaster like a hurricane that topples utility poles and cuts communications lines, a fiber cut caused by a backhoe during road construction, a power outage, or the failure of a piece of network equipment, network outages do happen. We know we can’t eliminate service disruptions, but by designing for failure, we can minimize their impact on end users.
One way that providers are designing for failure is by leveraging the fact that many end users subscribe to two Internet access services:
- one from a fixed, often wired, broadband service provider that typically serves a household or business; and,
- the other from a mobile service provider for smartphones or other mobile devices.
As a good example of designing for failure, many businesses use custom gateway hardware at their locations that has both a primary and a secondary network connection, often using the fixed service as the primary connection and the mobile service as the secondary. If the fixed service goes down, the gateway automatically begins to use the mobile service for Internet access; when the fixed service is restored, the gateway automatically switches back. This has been an effective tool for improving reliability, but it comes with a price tag: gateway hardware with this capability costs considerably more than a gateway that supports only a fixed network service. That is why these dual-connection services have not been widely used in the residential broadband market.
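The failover and failback behavior described above can be sketched as a small state machine. This is a minimal Python sketch, not any vendor's implementation. It assumes the gateway runs a periodic health check on the fixed link. The thresholds (fail_threshold, recover_threshold) are illustrative hysteresis values, chosen so that one lost probe does not trigger a failover and a brief recovery does not trigger a premature switch back.

```python
from enum import Enum


class Link(Enum):
    FIXED = "fixed"
    MOBILE = "mobile"


class FailoverController:
    """Picks the active uplink from periodic health checks of the fixed link.

    fail_threshold and recover_threshold are illustrative hysteresis settings,
    not vendor defaults.
    """

    def __init__(self, fail_threshold: int = 3, recover_threshold: int = 5):
        self.fail_threshold = fail_threshold
        self.recover_threshold = recover_threshold
        self.active = Link.FIXED
        self._consecutive_fails = 0
        self._consecutive_oks = 0

    def on_fixed_health_check(self, fixed_ok: bool) -> Link:
        """Record one health-check result for the fixed link; return the active uplink."""
        if fixed_ok:
            self._consecutive_oks += 1
            self._consecutive_fails = 0
        else:
            self._consecutive_fails += 1
            self._consecutive_oks = 0

        if self.active is Link.FIXED and self._consecutive_fails >= self.fail_threshold:
            self.active = Link.MOBILE  # fail over to the secondary (mobile) connection
        elif self.active is Link.MOBILE and self._consecutive_oks >= self.recover_threshold:
            self.active = Link.FIXED   # fixed service looks stable again: switch back
        return self.active


if __name__ == "__main__":
    ctl = FailoverController(fail_threshold=2, recover_threshold=3)
    checks = [True, False, False, True, True, True]
    print([ctl.on_fixed_health_check(ok).value for ok in checks])
    # ['fixed', 'fixed', 'mobile', 'mobile', 'mobile', 'fixed']
```

Requiring several consecutive successes before switching back keeps traffic on the mobile connection a little longer. In exchange, the gateway avoids flapping between links when the fixed service is unstable.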
The situation is changing, though, with the introduction of software-based solutions that can detect a fixed network outage on a fixed network gateway and automatically fail over to a smartphone hotspot in the home. Kyrio’s Adaptive Route Control™ (ARC) Hotspot product is an example of such a service. It works by pairing one or more smartphones with the broadband gateway in the home; when the fixed network goes down, the gateway signals a smartphone to start its hotspot. All traffic received by the gateway is then relayed through the smartphone hotspot to the Internet, and vice versa. When the fixed service is restored, traffic is switched back to the fixed service. Solutions like this have two advantages: they are easy to deploy because they are software-only, and they are inexpensive enough to be deployed to residential broadband subscribers.
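The control flow of a software-only approach like the one described above might look roughly like the following Python sketch. It is not Kyrio's ARC implementation or API. PairedPhone, HotspotFailover, and the send_hotspot_command hook are hypothetical stand-ins for whatever signaling channel a real gateway uses, and relaying traffic through the hotspot is left out entirely. The sketch only shows the gateway trying paired phones in preference order until one brings up its hotspot, then releasing that phone when the fixed service returns.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class PairedPhone:
    name: str
    # Hypothetical control hook: ask the phone to start (True) or stop (False)
    # its hotspot; returns True if the phone acknowledged the request.
    send_hotspot_command: Callable[[bool], bool]


@dataclass
class HotspotFailover:
    phones: List[PairedPhone]                 # paired phones, in order of preference
    active_phone: Optional[PairedPhone] = None

    def on_fixed_outage(self) -> Optional[str]:
        """Ask paired phones, in order, to start a hotspot; return the one that did."""
        if self.active_phone is not None:
            return self.active_phone.name     # already using a hotspot
        for phone in self.phones:
            if phone.send_hotspot_command(True):
                self.active_phone = phone
                return phone.name
        return None                           # no phone answered: still offline

    def on_fixed_restored(self) -> None:
        """Switch back to the fixed link and tell the phone to stop its hotspot."""
        if self.active_phone is not None:
            self.active_phone.send_hotspot_command(False)
            self.active_phone = None


if __name__ == "__main__":
    log = []

    def home_phone(start: bool) -> bool:
        log.append(("phone-b", start))
        return True

    away = PairedPhone("phone-a", lambda start: False)  # e.g. owner is out of the house
    home = PairedPhone("phone-b", home_phone)
    failover = HotspotFailover([away, home])
    print(failover.on_fixed_outage())  # phone-b
    failover.on_fixed_restored()
    print(log)                         # [('phone-b', True), ('phone-b', False)]
```

Trying phones in order matters because a paired phone may be out of the house or switched off when the outage occurs.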
Automated testing
The networks we have built are not the entirety of the system. For example, if you think of the network as the routers, switches, gateways, OLTs, CMTSs, eNodeBs/gNodeBs, mobile cores, and the cables and wires connecting them all, that network will not operate correctly without the ability to provision and configure those network elements. The entire system therefore includes not only the network elements but also the provisioning and configuration elements, the monitoring and measurement elements, and so on.
Many of the outages in these “network systems” are caused by changes we make to them, especially when those changes haven’t been tested in a system context before being deployed in the production network. A change may be a new software load on a network element, or simply a configuration change that someone assumes is harmless but is in fact dangerous.
At this point, you might be thinking, “OK, I can see that we need to test more to achieve this vision of testing changes in a system context, but if we test the whole system, won’t that be super-expensive and time-consuming, and won’t it delay all of the network changes and product rollouts we need to make?” That is where test automation comes in. With an appropriate focus on, and investment in, automated tests that consider the whole system context, what once seemed insurmountable becomes achievable.
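As a rough illustration of what an automated, system-level gate might look like, here is a minimal Python sketch. Everything in it is hypothetical. The lab configuration is reduced to a flat dictionary of settings, and applying a change to a copy of that dictionary stands in for pushing it to lab equipment. The two checks are toy stand-ins for real end-to-end tests. The first check deliberately spans two elements (a CMTS software load and the provisioning system that must support it), because that is the kind of interaction a per-element test can miss.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A system-level test inspects the (candidate) lab configuration and returns True on pass.
SystemTest = Callable[[Dict[str, str]], bool]


@dataclass
class GateResult:
    approved: bool
    failures: List[str]


def gate_change(lab: Dict[str, str], change: Dict[str, str],
                tests: Dict[str, SystemTest]) -> GateResult:
    """Apply a proposed change to a copy of the lab config and run every system test.

    The change is approved for production only if all tests pass.
    """
    candidate = {**lab, **change}  # the lab dict itself is left untouched
    failures = [name for name, test in tests.items() if not test(candidate)]
    return GateResult(approved=not failures, failures=failures)


# Toy stand-ins for real end-to-end tests (hypothetical).
def provisioning_accepts_cmts(cfg: Dict[str, str]) -> bool:
    # Cross-element check: the provisioning system must support the CMTS software load.
    return cfg["cmts_software"] in cfg["provisioning_supported"].split(",")


def dns_configured(cfg: Dict[str, str]) -> bool:
    return cfg.get("dns_server", "") != ""


if __name__ == "__main__":
    lab = {"cmts_software": "9.1", "provisioning_supported": "9.0,9.1",
           "dns_server": "192.0.2.53"}
    tests = {"provisioning_accepts_cmts": provisioning_accepts_cmts,
             "dns_configured": dns_configured}
    # A CMTS upgrade that looks harmless on its own...
    print(gate_change(lab, {"cmts_software": "10.0"}, tests))
    # GateResult(approved=False, failures=['provisioning_accepts_cmts'])
    # ...passes once the provisioning system is updated alongside it.
    print(gate_change(lab, {"cmts_software": "10.0",
                            "provisioning_supported": "9.1,10.0"}, tests))
    # GateResult(approved=True, failures=[])
```

In practice, a gate like this might be triggered from the same shared repository that holds the configurations. That way, every proposed change is tested before it reaches production.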
One way to start down this path is to think of your test infrastructure as a digital twin of your production network. At Kyrio, we have helped many operators build parts of their digital twins in our labs. We leverage shared equipment, since many operators use the same equipment, and reconfigure it, automatically and sometimes manually, to match a particular operator’s “network system.” This often requires connecting the Kyrio lab to the operator’s lab in order to use equipment and software instances there. As you can imagine, this approach relies heavily on the DevOps principle of keeping all network configurations, code, and software version tracking in a shared repository; otherwise, there would be too many variables for a human to track.
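Because the lab has to be reconfigured to mirror a particular operator, the first step is knowing exactly what differs. The following Python sketch assumes each operator’s “network system” is described by a version-controlled manifest that maps element roles to pinned software or configuration versions. The manifest format and the reconfiguration_plan function are hypothetical. The function computes the changes needed for the shared lab to match one operator. It also flags elements the lab lacks, which might instead be reached in the operator’s own lab.

```python
from typing import Dict, List, Tuple

# Hypothetical manifest: element role -> pinned software/config version,
# as it might be stored and versioned in a shared repository.
Manifest = Dict[str, str]
MISSING = "<missing>"


def reconfiguration_plan(lab: Manifest, operator: Manifest) -> List[Tuple[str, str, str]]:
    """Return (element, current, target) for every element the lab must change.

    Elements the operator uses but the lab lacks are reported with current=MISSING,
    a cue that they may have to be reached in the operator's own lab.
    Lab elements the operator does not use are left alone.
    """
    plan = []
    for element, target in sorted(operator.items()):
        current = lab.get(element, MISSING)
        if current != target:
            plan.append((element, current, target))
    return plan


if __name__ == "__main__":
    lab = {"cmts": "cmts-sw 9.1", "olt": "olt-fw 4.2", "wifi_gw": "gw-fw 2.0"}
    operator = {"cmts": "cmts-sw 9.3", "olt": "olt-fw 4.2", "provisioning": "prov 7.0"}
    for step in reconfiguration_plan(lab, operator):
        print(step)
    # ('cmts', 'cmts-sw 9.1', 'cmts-sw 9.3')
    # ('provisioning', '<missing>', 'prov 7.0')
```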
Historically, Kyrio has done this kind of work for DOCSIS, PON, and Wi-Fi networks, but it is now beginning to add mobile networks to its capabilities through its support of CableLabs and the NTIA on the 5G Challenge. As part of that effort, Kyrio became the first O-RAN Alliance Open Testing and Integration Centre in the Americas. As with any other network, the disaggregated network interfaces defined by the O-RAN specification are part of a larger system and should ideally be tested in that broader context, using test automation to scale the testing effort.
Conclusion
Network reliability is becoming a critical differentiator for network operators and a key to attracting and retaining subscribers. Designing for failure and automating the testing of network systems are two key principles that operators are implementing to increase reliability. Kyrio continues to play an important role in helping operators achieve higher reliability and better customer experiences.
Ike Elliott,
President & CEO, Kyrio
Ike Elliott is President & CEO of Kyrio, a growing provider of services for the broadband industry, including software and testing services. Ike joined Kyrio in late 2020 after leading the CableLabs Strategy team for a decade. Over his 37 years in the communications industry, and before joining CableLabs, Ike held senior executive roles in operations, strategy, and technology at Level 3 Communications, Unity Business Networks, and MCI. Ike holds 36 patents and bachelor’s and master’s degrees in computer science.