Why Do We Still Have Application Outages Caused by Expired Certificates?
Tue, 10/23/2018 – 15:34
I was reminded just the other day of how much pain outages cause everyone. I was out of the office at an event when I got an urgent call from a large financial organization. They are a customer, but they haven’t yet decided to invest in visibility into all certificates across their entire infrastructure. They were panicking because somewhere in their infrastructure an intermediate certificate and a root certificate had expired. This was bad. Really bad. It meant that every certificate that chained up to that root was invalidated, all at once.
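Why does one expired root or intermediate take down every leaf beneath it? During path validation a client checks each certificate in the chain, so the chain as a whole is only usable until the earliest expiration (notAfter) date among its members. The sketch below models just that rule. It assumes a validator that checks every certificate's expiry, root included, as many implementations do; the `Cert` record and both functions are hypothetical simplifications (no X.509 parsing, no notBefore, signature, or revocation checks), and the names and dates are purely illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Sequence


@dataclass
class Cert:
    """Hypothetical, simplified certificate record: a name and its notAfter date."""
    name: str
    not_after: datetime


def chain_expiry(chain: Sequence[Cert]) -> datetime:
    """The chain stops validating when its earliest-expiring certificate does."""
    return min(cert.not_after for cert in chain)


def is_chain_valid(chain: Sequence[Cert], at: datetime) -> bool:
    """Assumes the validator checks every certificate's notAfter, root included."""
    return at <= chain_expiry(chain)


# Illustrative chain: the leaf is good until 2019, but the CAs above it are not.
utc = timezone.utc
chain = [
    Cert("leaf.example.com", datetime(2019, 6, 1, tzinfo=utc)),
    Cert("Example Intermediate CA", datetime(2018, 10, 20, tzinfo=utc)),
    Cert("Example Root CA", datetime(2018, 10, 20, tzinfo=utc)),
]
expired_at = chain_expiry(chain)  # 2018-10-20, not 2019-06-01
```

Every leaf issued under that root shares the same cutoff date, which is why the failures arrive together rather than one at a time.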
After we helped them put out the first fire, I think I was able to piece together what happened. An application went down. When they finally figured out that the cause was an expired root certificate, they responded immediately by setting up a new PKI to replace the certificates that had expired. Using their new self-signed root certificate, they issued an intermediate certificate. Just getting to this point was a lot of work, and their repair job was far from complete. When they called me, they were trying to reissue the leaf certificates and push them out to the application endpoints to restore full functionality.
They weren’t entirely sure of all the locations where the old chain was being used, so they needed help finding everywhere these CA intermediate and root certificates were used throughout their environment. Presumably that would tell them all of the places where they needed to distribute the new chain (this is a common assumption; in reality, it may or may not hold). If they had had a full and complete inventory of certificates across their entire organization, they could have accessed this information in minutes and located the owners and installation locations of all impacted certificates. And if they had added automation to their machine identity protection program, they could have triggered actions to remove, replace, install, and validate all impacted certificates very quickly.
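With a complete inventory, "where is this chain used?" becomes a lookup rather than a search. The following is a minimal sketch under stated assumptions: a hypothetical inventory in which each entry records a deployed certificate's installation location, its owning team, and the names of the CAs above it. `InventoryEntry`, `impacted_by`, and `locations_by_owner` are illustrative names, not Venafi's API. Matching CAs by name is a simplification; a real inventory would match on a fingerprint or key identifier, since unrelated CAs can share a name. And, as noted above, a lookup like this only finds certificates the inventory actually knows about.

```python
from dataclasses import dataclass


@dataclass
class InventoryEntry:
    """Hypothetical inventory record for one deployed certificate."""
    subject: str
    location: str            # where it is installed, e.g. "host:port" or a file path
    owner: str               # team responsible for the endpoint
    chain: tuple[str, ...]   # CA names above the leaf: issuing CA first, root last


def impacted_by(inventory: list[InventoryEntry], ca_name: str) -> list[InventoryEntry]:
    """Every deployed certificate whose chain passes through the given CA."""
    return [entry for entry in inventory if ca_name in entry.chain]


def locations_by_owner(entries: list[InventoryEntry]) -> dict[str, list[str]]:
    """Group affected installation locations by owner, so each team gets one list."""
    grouped: dict[str, list[str]] = {}
    for entry in entries:
        grouped.setdefault(entry.owner, []).append(entry.location)
    return grouped


# Illustrative inventory; hosts, paths, and teams are made up.
old_chain = ("Old Intermediate CA", "Old Root CA")
inventory = [
    InventoryEntry("pay.example.com", "lb-01:443", "payments", old_chain),
    InventoryEntry("api.example.com", "/etc/ssl/api.pem", "platform", old_chain),
    InventoryEntry("pay2.example.com", "lb-02:443", "payments", old_chain),
    InventoryEntry("wiki.example.com", "web-07:443", "it-ops", ("Other CA", "Other Root CA")),
]
to_fix = locations_by_owner(impacted_by(inventory, "Old Root CA"))
```

The grouped result is the worklist for redistribution: each owner gets the exact locations that need the new chain, and unaffected endpoints are left alone.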
One reason this example stood out to me is that I would expect all trusted root and intermediate Certificate Authorities (CAs) within an organization to be on the radar, if not under the purview, of the PKI team or the security team as a whole. That may not have been the case on this occasion. In fact, it seems to me that these root and intermediate certificates may have come from a different team, perhaps another application team.
At Venafi, we solve these kinds of problems all the time. Our platform lets you import any CA certificate and monitor its expiration date as well as signs of misuse. If this organization had taken the time to put all of their CA certificates into our platform, we could have warned them months in advance to bring their security and audit teams together and start planning a key renewal ceremony for their certificate authorities. Instead, they were left scrambling to repair a preventable (and expensive) outage.
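At its core, expiration monitoring compares each CA certificate's notAfter date against a warning window long enough to plan a renewal. The following is a minimal sketch of that idea only, not Venafi's implementation or API; the `CACert` record, the `expiring_within` function, and the 270-day window are hypothetical choices for illustration, and misuse detection is out of scope.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class CACert:
    """Hypothetical record for a CA certificate imported for monitoring."""
    name: str
    not_after: datetime


def expiring_within(certs: list[CACert], now: datetime, window: timedelta) -> list[tuple[str, int]]:
    """Return (name, whole days left) for certs whose notAfter is on or before now + window.

    Already-expired certificates are included with a negative day count.
    Results are sorted soonest first.
    """
    deadline = now + window
    flagged = [(c.name, (c.not_after - now).days) for c in certs if c.not_after <= deadline]
    return sorted(flagged, key=lambda item: item[1])


# Illustrative data: checking on 1 April 2018 with roughly nine months of lead time.
utc = timezone.utc
now = datetime(2018, 4, 1, tzinfo=utc)
cas = [
    CACert("Example Root CA", datetime(2018, 10, 20, tzinfo=utc)),
    CACert("Example Intermediate CA", datetime(2018, 10, 20, tzinfo=utc)),
    CACert("Example Issuing CA 2", datetime(2025, 1, 1, tzinfo=utc)),
]
alerts = expiring_within(cas, now, timedelta(days=270))
```

The long window is deliberate: for CAs, the lead time has to cover planning a key ceremony and redistributing the new chain, so an alert a few days before expiry would come too late.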
How complete is your certificate inventory? Are you sure?
Let’s face it, there’s one machine identity challenge that continues to plague large enterprises—certificate-related outages. They consume an inordinate amount of time and resources to fix, and to make matters worse, they are actually quite difficult to diagnose. When an application goes down, your IT and security response teams may follow several false avenues of investigation before identifying an expired certificate as the culprit. All this adds up to a huge drain on availability, not to mention productivity.
So, what do you do? Budget for fixing outages? Well, I guess that’s practical in a twisted sort of way, but to my way of thinking it’s also sadly defeatist. It makes a whole lot more sense to get comprehensive visibility across your entire inventory of certificates since this gives you the information you need to eliminate certificate-related outages entirely. Don’t believe me? We’ve had large customers go from hundreds of expired certificate outages a year to zero when they stepped up and took charge. It’s not impossible. But it does take persistence and dedication (not to mention the right tools).
This is a Security Bloggers Network syndicated blog authored by kdobieski. Read the original post at: https://www.venafi.com/blog/why-do-we-still-have-application-outages-caused-expired-certificates