Troubleshooting: It’s 2019, are you still blaming the firewall admin?

In 2013, Wired Magazine published an article written by Tufin CTO Reuven Harrison entitled “Can’t Access Your Apps? Don’t Blame the Firewall Admin.” It’s now 2019, but has much changed at your organization? In my conversations with IT professionals, it seems that for the most part we have not moved forward far enough to solve connectivity issues.

Does this troubleshooting scenario still sound all too familiar?

In the earlier article, Reuven gave an example of an outage of a newly deployed application. The problem is first attributed to the firewall team for not opening the right access. But after hours of manual troubleshooting, the actual cause turns out to be the application itself. A lot has changed since then, but much has not. Today, applications are developed and deployed faster than ever, which increases the scale of the troubleshooting problem and makes it harder to identify the source of an outage and fix it quickly.

The repercussions of an application outage range from the more innocuous, like lost productivity, to the more serious, such as reputational damage and the real possibility of revenue implications from a breach.

As the network continues to grow more and more complex, with multiple security and cloud vendors to manage, the process only gets harder. Who on your team has the visibility needed to quickly identify and fix the problem? It is exactly this lack of visibility into the network, combined with the fact that managing security configurations and granting access requests are still done manually, that leaves IT Operations leaders struggling, even in 2019.

The finger pointing problem: Who is responsible? Network, Security or Application teams?

How many times have you heard your network team complain about having to deal with app teams blaming the network every time they have a problem, or vice versa? The blame game often leads to finger pointing and bad blood between your teams, which, as you know, is not good for productivity. With all the solutions we have today, why do we still struggle?

Network complexity continues to grow with heterogeneous, hybrid cloud deployments

There is a laundry list of items adding to network complexity, including mergers and acquisitions, multinational deployments, multiple security vendors, hybrid cloud, heterogeneous platforms and disparate devices. For example, on HelpNetSecurity, Ken Elefant, Managing Director at Sorenson Capital, reported that the organizations he spoke with averaged over 80 security vendors. IDC predicts that by 2020 seventy-five percent of enterprises using public cloud will also use an enterprise private cloud platform. As organizations adopt the cloud, move to containers and microservices, and eventually go serverless, the network will only continue to grow more complex and fragmented, leading to more finger pointing if not managed properly.

With application deployment hitting warp speed, there are more incidents

Last year McAfee reported that the average enterprise has 464 custom applications deployed and that enterprises would develop and deploy an average of 37 new applications in the next 12 months. According to the report, this rapid pace of development represents a 20.5 percent increase in the number of custom applications that are deployed at the average enterprise today. Business is moving fast, and with this rapid rate of growth and change, it becomes even more challenging to find the source and troubleshoot a problem.

It takes too much time and too many resources to find and fix an issue, especially with manual processes

Manual change processes lead to misconfigurations and errors that increase risk and require great effort to troubleshoot. Once a trouble ticket is opened (and let’s face it, it could very well be a firewall issue), identifying which rule on which device is blocking connectivity is time consuming and keeps the team from completing other tasks. Then, when the problem is finally identified, fixing the issue manually takes up even more of the team’s time. In order to lower risk and improve mean time to repair (MTTR), you must find ways to remove manual change processes.

How do you break the cycle?  It’s time to resolve connectivity issues faster by automating controls

A few foundational elements are critical to stopping the blame game and resolving downtime and connectivity issues faster, whether they originate from the firewall or the application. What follows are some recommendations on where to focus your efforts:

Manage your network complexity and increase visibility.

As your networks continue to grow more fragmented, it’s critical to gain central visibility to manage the complexity. To better understand your network, you need to be able to visualize the network topology and application connectivity. This visibility provides the basis for connectivity analysis to discover which device or which rule is blocking connectivity. Once the issue has been identified, your team can jump on fixing the problem faster.
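To make the idea of connectivity analysis concrete, here is a minimal Python sketch, not any vendor's implementation. It assumes each device on the path evaluates an ordered rule list with first-match semantics and an implicit deny at the end. The device names, rule names, and addresses are all hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    name: str
    src: str              # source network in CIDR notation
    dst: str              # destination network in CIDR notation
    port: Optional[int]   # None matches any port
    action: str           # "allow" or "deny"

    def matches(self, src_ip, dst_ip, port):
        return (ip_address(src_ip) in ip_network(self.src)
                and ip_address(dst_ip) in ip_network(self.dst)
                and (self.port is None or self.port == port))


def find_blocker(path, src_ip, dst_ip, port):
    """Return (device, rule name) for the first hop that denies the flow, else None."""
    for device, rules in path:
        # First-match semantics: the first matching rule decides (an assumption).
        decision = next((r for r in rules if r.matches(src_ip, dst_ip, port)), None)
        if decision is None:
            return device, "implicit-deny"
        if decision.action == "deny":
            return device, decision.name
    return None


# Hypothetical two-firewall path between an application subnet and its backend.
path = [
    ("edge-fw", [Rule("allow-web", "10.0.0.0/8", "192.168.10.0/24", 443, "allow")]),
    ("dc-fw", [Rule("block-app-subnet", "10.1.0.0/16", "192.168.10.0/24", None, "deny"),
               Rule("allow-internal", "10.0.0.0/8", "192.168.0.0/16", None, "allow")]),
]

print(find_blocker(path, "10.1.2.3", "192.168.10.5", 443))
```

With a model like this, the question "which rule on which device is blocking connectivity?" becomes a lookup rather than hours of manual log reading. A result of None means no firewall on the path is at fault, so the team can turn to the application itself.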

Check application connectivity against security policy.

It may seem obvious, but one often overlooked way to improve troubleshooting is to decrease your overall number of incidents in the first place. Most of the problems that lead to downtime occur because the application has violated an existing rule or doesn’t have the proper access to the services it needs to run. To avoid downtime and outages, application connectivity changes need to be checked against a security policy before implementation. If you have unified your security policy across your entire environment, you will have the consistency and integrity across the hybrid network needed to ensure that the application will not violate any rules and will be provisioned with the right access privileges.
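As a rough illustration of checking a change before implementation, the Python sketch below compares an application's requested flows against a unified policy. It assumes the policy can be expressed as a zone-to-zone matrix of allowed ports. The zone names, ports, and the check_request helper are hypothetical and not taken from any specific product.

```python
# Hypothetical unified policy: (source zone, destination zone) -> allowed ports.
POLICY = {
    ("web", "app"): {8443},
    ("app", "db"): {5432},
}


def check_request(requests, policy=POLICY):
    """Split requested flows into those the policy permits and those it does not."""
    approved, violations = [], []
    for src_zone, dst_zone, port in requests:
        allowed_ports = policy.get((src_zone, dst_zone), set())
        if port in allowed_ports:
            approved.append((src_zone, dst_zone, port))
        else:
            violations.append((src_zone, dst_zone, port))
    return approved, violations


# An application team asks for two flows before deployment.
requested = [("web", "app", 8443), ("web", "db", 5432)]
approved, violations = check_request(requested)

print("approved:", approved)
print("violations:", violations)
```

Here the direct web-to-database flow is flagged before anything is deployed, so the conversation between teams happens at request time instead of during an outage.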

Automate. Automate. Automate.

Manual change processes lead to mistakes and misconfigurations that can cause application downtime. Automation is essential to realizing increased efficiency and improving the accuracy of your network changes. A more accurate change process means you will lower the overall risk of downtime or outage. In the event of an incident, automation helps reduce MTTR by responding to an issue in less time, and it frees your team to focus on other projects.

The moral of the story: Don’t blame the firewall admin.

The moral of the story is that in order to improve MTTR and meet very aggressive business goals, your teams will need to work together more cohesively. Finding common ground, specifically through an established security policy, puts proper controls in place to find the root cause of the problem before it escalates. Security policy offers all your teams, whether firewall, cloud, security or application developers, the foundation needed to work from one set of rules, reduce friction, and troubleshoot faster.

Ultimately, by unifying your security policy across your entire environment, you ensure new connectivity requests are implemented accurately and securely. Best of all, your teams will have a common language and a single source of truth to work together, not against each other.

It’s time to take a big leap forward out of 2013 and into 2019, where we can finally, proactively identify and fix connectivity issues before a ticket is issued, and stop blaming the firewall admin.

Take the next step:

To learn more about how to manage your network complexity and improve your MTTR through automation, we encourage you to check out the 5 Clear Signs that You Need Automation webinar.

*** This is a Security Bloggers Network syndicated blog from Tufin - Cybersecurity & Agility with Network Security Policy Orchestration authored by Karen Crowley. Read the original post at: https://www.tufin.com/node/2334