How to train your network: the role of artificial intelligence in network operations

This article was originally published on ITProPortal on September 13, 2018.

With the help of machine learning and AI, software-defined networks could soon aid businesses with network management.

A network that can fix and optimize itself without human intervention could soon become a reality, but not without some training. With the help of machine learning and artificial intelligence, software-defined networks can learn from operational data to help with network management. Initial applications of AI to WAN operations include security functions such as DDoS attack mitigation and near real-time, automated path selection. Eventually, they may extend to AI-defined network topologies and basic operations essentially running on ‘auto-pilot’.

Enhancing IT operations with artificial intelligence (AI), including configuration management, patching, debugging, and root cause analysis (RCA), is an area of significant promise, enough so that Gartner has defined the emerging market as “AIOps”. These platforms use big data and machine learning to enhance a broad range of IT operations processes, including availability and performance monitoring, event correlation and analysis, IT service management, and automation (Gartner “Market Guide for AIOps platforms,” August 2017).

Gartner estimates that by 2022, 40 percent of all large enterprises will combine big data and machine learning functionality to support and partially replace monitoring, service desk and automation processes and tasks, up from five percent today.

Limits of automation and policy for NetOps

Given the traditional split between APM (application performance management) and NPM (network performance management), even the best network management tools aren’t always going to help trace the root cause of every application and service interruption. An issue can arise from interactions between network and application, or from a router configuration and a service provider issue that together impact application performance.

Network operations personnel might respond to an incident by setting policies in the APM or NPM systems that raise an alert when an unwanted event is about to happen again. The issue with policy-based management is that it is backward-looking: historical data is used to create policies that should prevent something from happening again. Yet policy is prescriptive; it doesn’t deal with unanticipated conditions. Furthermore, changes in business goals again require more human intervention if there isn’t a matching rule or pre-defined action.
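To make the contrast concrete, here is a minimal Python sketch, not taken from any APM or NPM product. It compares a fixed threshold written after a past incident with a simple statistical baseline learned from recent samples. The function names, the latency values, and the z-score limit of 3.0 are all illustrative assumptions.

```python
from statistics import mean, stdev


def static_policy_alert(value, threshold):
    """Fixed rule written after a past incident (hypothetical policy)."""
    return value > threshold


def baseline_alert(history, value, z_limit=3.0):
    """Flag a value that deviates strongly from recent behavior.

    Uses a plain z-score against the recent history; z_limit is an
    assumed tuning parameter, not a recommended setting.
    """
    if len(history) < 2:
        return False  # not enough data to estimate a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > z_limit


# Hypothetical link latency samples in milliseconds.
history = [20, 21, 19, 22, 20, 21, 20, 19]
new_sample = 45

# The old rule only fires above 100 ms, so it misses this new condition.
static_fired = static_policy_alert(new_sample, threshold=100)
# The baseline notices that 45 ms is far outside recent behavior.
baseline_fired = baseline_alert(history, new_sample)

print(static_fired, baseline_fired)
```

In this sketch the fixed rule stays silent because nobody anticipated a 45 ms problem, while the learned baseline flags it. A production system would need a more robust model than a z-score, but the difference in approach is the same.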

On the whole, SD-WAN services represent an improvement over management of MPLS networks. Still, the use of an SD-WAN isn’t without its own challenges. Depending on the number of locations that have to be linked, there can be some complexity in managing virtual network overlays. The use of on-demand cloud services adds another layer of complexity. Without sufficient monitoring tools, problems can escalate and result in downtime. At the same time, adding people means adding cost, and potentially losing some of the cost efficiencies of SD-WAN services.

AI is the way forward for SD-WAN management

What would AIOps bring to SD-WAN management?

Starting with a programmable SD-WAN architecture is an important first step towards a vision of autonomous networking. Programmable in this case means API-driven, but the system also needs to take data from the application performance and security stack, as well as from the network infrastructure, as inputs. Only then can we move from simple alerting to intelligence that enables self-healing, self-management, and self-optimization with minimal human intervention.

Monitoring all elements in the system in real time (or at least near real time) will require storing and analyzing huge amounts of data. On the hardware side, cloud IaaS services have made that possible. Acting on the information will require artificial intelligence in the form of machine learning.

Use Cases for AI in SD-WAN

There are a variety of ways to apply machine learning algorithms to large datasets, from supervised to unsupervised (and points in between). The resulting applications fall in areas such as:

  • Security, where unexpected network traffic patterns and patterns of requests against an application can be detected to prevent DDoS attacks.
  • Enhancing performance of applications over the internet with optimized route selection.
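As a rough illustration of the route-selection use case, the Python sketch below keeps a smoothed view of latency and loss per path and picks the path with the lowest combined score. Everything in it is an assumption made for the example: the path names, the smoothing factor, and the weight that trades loss against latency. A real SD-WAN controller would use its own telemetry and scoring.

```python
from dataclasses import dataclass


@dataclass
class PathStats:
    name: str
    latency_ms: float
    loss_pct: float


def smooth(previous, sample, alpha=0.3):
    """Exponentially weighted moving average; alpha is an assumed factor."""
    return alpha * sample + (1 - alpha) * previous


def update(path, latency_sample, loss_sample):
    """Return new stats for a path after folding in one telemetry sample."""
    return PathStats(
        path.name,
        smooth(path.latency_ms, latency_sample),
        smooth(path.loss_pct, loss_sample),
    )


def path_score(path, loss_weight=50.0):
    """Lower is better. loss_weight is a hypothetical tuning knob."""
    return path.latency_ms + loss_weight * path.loss_pct


def best_path(paths):
    return min(paths, key=path_score)


# Hypothetical paths between two sites.
paths = {
    "mpls": PathStats("mpls", 40.0, 0.0),
    "internet-a": PathStats("internet-a", 25.0, 0.5),
    "internet-b": PathStats("internet-b", 30.0, 0.1),
}
chosen_before = best_path(paths.values())

# A new sample shows internet-b degrading, so the choice is re-evaluated.
paths["internet-b"] = update(paths["internet-b"], latency_sample=90.0, loss_sample=2.0)
chosen_after = best_path(paths.values())

print(chosen_before.name, chosen_after.name)
```

The scoring rule here is hand-written; the machine learning step would be learning the weights, or predicting degradation before it happens, from historical telemetry.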

Looking more closely at security as a use case, how would AI and ML be able to augment the security of SD-WANs? While the majority of enterprises are still trying to secure their networks with on-premises firewalls and DDoS mitigation appliances, they are also facing attacks that are bigger and more sophisticated. According to statistics gathered by Verisign last year:

  • DDoS attacks peaked at over 5Gbps approximately 25% of the time
  • During Q3 2017, 29% of attacks combined five or more different attack types.

Challenge: A multi-vector attack on an enterprise network has affected service availability in Europe.

Response: Application of AIOps to the SD-WAN underlay can automate the response to the attack. Instead of manually re-configuring systems, the network can automatically direct traffic to different traffic scrubbing centers based on real-time telemetry around network and peering point congestion, mitigation capacity, and attack type/source. Because the system can process data from outside sources far faster than human operators can, it can return traffic flows to normal transit routes as soon as the attack subsides, saving money on the cost of attack mitigation. AI and ML in conjunction with a programmable SD-WAN are capable of responding more quickly and in a more granular fashion than is possible with standard policy-based “automatic detection” and mitigation techniques.
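The decision described in this response can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of any vendor's system: the scrubbing center names, capacities, congestion figures, and attack labels are invented, and the selection rule (least congested center that supports the attack type and has enough spare capacity) is only one plausible choice.

```python
from dataclasses import dataclass


@dataclass
class ScrubbingCenter:
    name: str
    spare_capacity_gbps: float
    peering_congestion: float  # 0.0 (idle) to 1.0 (saturated)
    supported_attacks: frozenset


def pick_center(centers, attack_type, attack_gbps):
    """Choose the least congested center able to absorb this attack."""
    candidates = [
        c for c in centers
        if attack_type in c.supported_attacks
        and c.spare_capacity_gbps >= attack_gbps
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda c: c.peering_congestion)


def route_for(centers, attack_active, attack_type=None, attack_gbps=0.0):
    """Return where traffic should go given current telemetry."""
    if not attack_active:
        return "normal-transit"  # attack subsided: stop paying for scrubbing
    center = pick_center(centers, attack_type, attack_gbps)
    # Hypothetical fallback: hand over to a human when nothing fits.
    return center.name if center else "escalate-to-operator"


# Hypothetical European scrubbing centers and telemetry.
both = frozenset({"syn-flood", "udp-amplification"})
centers = [
    ScrubbingCenter("frankfurt", 10.0, 0.8, both),
    ScrubbingCenter("amsterdam", 8.0, 0.3, both),
    ScrubbingCenter("london", 2.0, 0.1, frozenset({"syn-flood"})),
]

during_attack = route_for(centers, True, "udp-amplification", 6.0)
after_attack = route_for(centers, False)

print(during_attack, after_attack)
```

Re-running `route_for` on every telemetry update is what lets the traffic move back to normal transit as soon as the attack flag clears, without waiting for an operator.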

Where does AI in networking go next?

Although the industry is still in the early days of applying machine learning to networking, there are a number of efforts underway to keep an eye on. One is the Telecom Infra Project (TIP), founded by Facebook and telecom firms such as Deutsche Telekom and SK Telecom, which now counts several hundred other companies as members. The TIP recently started collaborating on AI with an eye towards predictive maintenance and dynamic allocation of resources. Important groundwork for the project will include defining common dataset formats that are used to train systems. That work could lead to further sharing of data between network providers and web companies, offering the prospect of significant improvements to security and threat detection for enterprises and consumers.

Further in the future, we might expect to see an AI-designed network topology, combined with SDN control over resources. Networking will have moved from a paradigm of self-contained networks to a network ‘awareness’ overlay which enables coordinated, intelligent actions based on operator intention. Network engineers can put the system on ‘auto-pilot’ for everyday operations, and instead spend time orchestrating resources based on the goals of the business.
