This article was originally published on CloudTweaks on October 15, 2018.

AI to Optimize Network Operations

Increasing numbers of companies have implemented SD-WAN technology, thanks to benefits like higher performance, lower cost, and greater business agility compared to WAN services built on MPLS. But industry experts note that many of these companies face a reality check after the initial honeymoon, when they encounter issues – some new, some familiar – in managing their networks. The good news is that with the help of machine learning and artificial intelligence, software-defined networks can ‘learn’ from operational data to help operations personnel.

According to a survey of network professionals by Dr. Jim Metzler, co-founder and principal analyst with consultancy Ashton, Metzler & Associates, enterprises face challenges both in implementing SD-WAN on their own and in integrating the solution with their existing WAN:

  • 25% of respondents reported challenges in troubleshooting performance/configuration issues;
  • 25% of respondents reported increased complexity of operations; and
  • 27% of respondents reported that setting up and maintaining policies was more difficult than expected.

Part of the complexity stems from the process of technology adoption itself. As the report notes, “implementing an SD-WAN changes how operations are performed and changing how people work is a complex task.”

It is also true that troubleshooting performance and configuration issues doesn’t get easier with SD-WAN. In some cases, as security, optimization, and other functions are integrated into the vendor’s SD-WAN solution, monitoring and troubleshooting can become more complex.

Unlocking data from disparate systems

One issue is that some SD-WAN products restrict access to the data on the platform. This can take different forms, such as limiting the amount of data that can be used to create graphs or limiting what can be exported to spreadsheets. What customers need is the flexibility to query statistics over any time period they deem relevant – say, a six-month window for a particular circuit – and to compare those measurements with results from other circuits. The goal is to let users perform faster root-cause analysis of errors and proactively update their network architecture based on data, not guesses.
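As an illustration of the kind of ad hoc query described above, the following Python sketch filters latency samples to an arbitrary time window, summarizes each circuit, and compares one circuit against its peers. It assumes the measurements have already been exported as simple (circuit, timestamp, latency) records; the `Sample` type, its field names, and the helper functions are hypothetical and not part of any SD-WAN product's API.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean, quantiles


@dataclass(frozen=True)
class Sample:
    """One latency measurement (hypothetical record format)."""
    circuit_id: str
    timestamp: datetime
    latency_ms: float


def window_stats(samples, start, end):
    """Summarize latency per circuit for samples in the half-open window [start, end)."""
    by_circuit = {}
    for s in samples:
        if start <= s.timestamp < end:
            by_circuit.setdefault(s.circuit_id, []).append(s.latency_ms)

    stats = {}
    for cid, values in by_circuit.items():
        # quantiles() needs at least two points; with one sample, use it directly.
        if len(values) >= 2:
            p95 = quantiles(values, n=20, method="inclusive")[-1]
        else:
            p95 = values[0]
        stats[cid] = {"count": len(values), "mean_ms": mean(values), "p95_ms": p95}
    return stats


def compare_to_peers(stats, circuit_id):
    """Ratio of one circuit's mean latency to the average of the other circuits' means.

    Returns None when there are no peer circuits in the window.
    """
    peers = [v["mean_ms"] for cid, v in stats.items() if cid != circuit_id]
    if not peers:
        return None
    return stats[circuit_id]["mean_ms"] / mean(peers)
```

Because the window is just a pair of datetimes, the same call covers a six-month review of one circuit or a one-hour look at an incident, for example `window_stats(samples, datetime(2018, 4, 1), datetime(2018, 10, 1))`.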

Having data locked into your monitoring and analytics systems is a problem for another reason: it creates interoperability issues with other elements of your network, including security platforms such as firewalls. Correlating a firewall configuration issue with network performance problems becomes much harder when monitoring and analytics are split across two separate systems.
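Once firewall and network data live in the same place, even a simple time-based join can surface suspicious changes. The sketch below is one hedged way to do that: for each entry in a firewall change log, it compares mean latency in a window before and after the change and flags changes followed by a sharp increase. The record formats, the 30-minute window, and the 1.5× ratio are illustrative assumptions, and a flagged change indicates correlation, not a proven cause.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import List, Tuple


def flag_suspect_changes(
    changes: List[Tuple[datetime, str]],
    latency: List[Tuple[datetime, float]],
    window: timedelta = timedelta(minutes=30),
    ratio: float = 1.5,
) -> List[Tuple[datetime, str, float, float]]:
    """Flag firewall changes followed by a sharp rise in latency.

    changes: (timestamp, description) entries from a firewall change log (assumed format)
    latency: (timestamp, latency_ms) samples from network monitoring (assumed format)
    Returns (timestamp, description, mean_before, mean_after) for each change
    whose mean latency after the change is at least `ratio` times the mean before.
    """
    suspects = []
    for ts, desc in changes:
        before = [v for t, v in latency if ts - window <= t < ts]
        after = [v for t, v in latency if ts <= t < ts + window]
        if not before or not after:
            continue  # not enough data on one side of the change to compare
        mean_before, mean_after = mean(before), mean(after)
        if mean_before > 0 and mean_after / mean_before >= ratio:
            suspects.append((ts, desc, mean_before, mean_after))
    return suspects
```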

Elements of a robust digital analytics platform

The issue of separate monitoring and analytics systems extends beyond networking. Correlating performance issues with applications is another area of concern. Tools for application performance management (APM) have historically formed a separate category with their own datasets. Again, the challenge is gaining insight into how the network and applications interact at a given point in time.

What enterprises need is a network telemetry and monitoring agent that can scan and report on both network and application performance issues across underlay and overlay networks. That system should also be able to take in data from outside the SD-WAN orchestration platform, drawing on datasets from switches, routers, firewalls, and other network and application management platforms, and feed that telemetry into an analytics and visualization platform that can identify correlations and run tests.

In short, enterprises need to unlock all of the data in all of the systems that touch the network in order to bolster their ability to operate and maintain the network. Ultimately, though, acting on the information can’t be done by humans alone; it will require artificial intelligence in the form of machine learning.

The promise of AIOps

The larger question behind these data and network operations issues is how to move from simple alerting to intelligence that enables self-healing, self-management, and optimization with minimal human intervention.
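To make the contrast between simple alerting and learned behavior concrete, the sketch below compares a fixed-threshold alert with a rolling baseline that flags samples deviating strongly from recent history. This is a deliberately simple statistical stand-in, not machine learning; AIOps platforms build far richer models, but the idea of deriving what "normal" looks like from the data is the same. The window size, the z-score cutoff, and the `min_sigma` floor are assumed values chosen for illustration.

```python
from statistics import mean, pstdev


def static_alerts(values, threshold):
    """Classic alerting: indices of samples above a fixed threshold."""
    return [i for i, v in enumerate(values) if v > threshold]


def baseline_alerts(values, window=12, z=3.0, min_sigma=1.0):
    """Indices of samples more than `z` standard deviations from the mean of
    the preceding `window` samples, so 'normal' is derived from recent history.

    `min_sigma` floors the standard deviation so a perfectly flat history
    does not cause division by zero or alerts on trivial deviations.
    """
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = max(pstdev(history), min_sigma)
        if abs(values[i] - mu) / sigma > z:
            flagged.append(i)
    return flagged
```

On a circuit that normally runs at 10 to 12 ms, a jump to 40 ms stays under a 50 ms static threshold but is flagged by the baseline, while a circuit that normally runs at 60 ms would trip the same static threshold on every sample.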

This is where enhancing IT operations with artificial intelligence (AI) comes into play. The application of AI to IT operations, including configuration management, patching, debugging, and root-cause analysis (RCA), is an area of significant promise – enough so that Gartner has defined the emerging market as “AIOps.” These platforms use big data and machine learning to enhance a broad range of IT operations processes, including availability and performance monitoring, event correlation and analysis, IT service management, and automation (Gartner, “Market Guide for AIOps Platforms,” August 2017).

Gartner estimates that by 2022, 40 percent of all large enterprises will combine big data and machine learning functionality to support and partially replace monitoring, service desk and automation processes and tasks, up from five percent today.

Unlock network data, unlock business value

Starting with a programmable SD-WAN architecture is an important first step towards greater business agility. However, the system also needs to leverage data from the application performance and security stack, as well as from the network infrastructure, so that enterprise customers can get a holistic picture of IT operations.

The combination of an SD-WAN-based network infrastructure, big data, and AI/machine learning processing techniques will help infrastructure and operations teams find performance problems faster. That means the transformative promise of SD-WAN can be fully realized, giving enterprises more time to grow their business.

To learn more about Apcela’s monitoring tools, click here.