Unlike the tools and processes discussed in previous sections, here we focus on containers in production systems. This includes which images are moved into production repositories, selecting and running containers, and the security of underlying host systems.
- The Control Plane: Our first order of business is ensuring the security of the control plane — the platforms for managing host operating systems, the scheduler, the container client, engine(s), the repository, and any additional deployment tools. Again, as we advised for container build environment security, we recommend limiting access to specific administrative accounts: one responsible for operating and orchestrating containers, and another for system administration (including patching and configuration management). We recommend network and physical segregation (on-premise) or logical segregation (for cloud and virtual systems). The good news is that several third-party tools offer full identity and access management, LDAP/AD integration, and token-based SSO (e.g., SAML) across systems.
- Resource Usage Analysis: Many readers are familiar with this concept from a performance standpoint, but it offers insight into basic code security as well. Does the container allow port 22 (i.e., administrative SSH) access? Does the container try to update itself? What external systems and utilities does it depend upon? Every external resource a container uses is a potential attack point, so it's good hygiene to limit these ingress and egress points. To manage the scope of what containers can access, third-party tools can monitor runtime access to environment resources both inside and outside the container. Usage analysis is essentially automated review of resource requirements, which is useful in a number of ways — especially for firms moving from a monolithic architecture to microservices. These tools help developers understand which references they can remove from their code, and help Operations narrow down roles and access privileges.
- Selecting the Right Image: We recommend establishing a trusted image repository and ensuring that your production environment can only pull containers from that trusted source. Ad hoc container management is a good way to facilitate engineers bypassing security controls, so we recommend establishing a central, trusted repository where production images are stored. We also recommend scripting the process to avoid manual intervention and ensure that the latest certified container is always selected. This means checking application signatures in your scripts before putting containers into production, eliminating any manual overhead for verification. Trusted repository and registry services can help by rejecting containers which are not properly signed. Fortunately many options are available, so find one you like. Keep in mind that if you build many containers each day, a manual process will quickly break down. It is okay to have more than one image repository — if you are running across multiple cloud environments, there are advantages to leveraging the native registry in each.
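The scripted check described above can be sketched as a simple deployment gate: accept only images from the trusted registry, pinned by content digest so the bits deployed are exactly the bits that were verified. The registry name and image references below are illustrative assumptions, not real systems.

```python
# Sketch of a deployment gate: allow only trusted-registry, digest-pinned images.
# TRUSTED_REGISTRY and the image strings are hypothetical examples.
import re

TRUSTED_REGISTRY = "registry.example.com"  # hypothetical trusted repository
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")

def image_allowed(image_ref: str) -> bool:
    """Accept only images pulled from the trusted registry and pinned by
    content digest, rejecting mutable tags like ':latest'."""
    return image_ref.startswith(TRUSTED_REGISTRY + "/") and bool(DIGEST_RE.search(image_ref))

print(image_allowed("registry.example.com/app/web@sha256:" + "a" * 64))  # True
print(image_allowed("docker.io/library/nginx:latest"))                   # False
```

In practice the same gate would also call out to your registry's signature verification (e.g., a signed-image check) before admitting the image.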
- Immutable Images: Developers often leave shell access enabled in container images so that, once in production, they can log into the container. The motivation is usually debugging and changing code on the fly, which is bad for consistency and for security. Immutable containers – which do not allow SSH connections – prevent changes at runtime. This forces developers to fix code in the development pipeline, and removes one of the principal attack paths: attackers looking to take over a container, and use it to attack the underlying host or other containers, scan for exactly this kind of access. We strongly suggest using immutable containers that do not offer this sort of 'port 22' access, and making sure containers are changed in the build process, not in production.
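As a minimal sketch, an immutable deployment pairs an image with no shell or SSH daemon with a read-only runtime. The application path and user ID below are illustrative assumptions:

```dockerfile
# Sketch: a minimal base image with no shell or sshd to log into.
# Paths and the UID are illustrative.
FROM gcr.io/distroless/static
COPY ./app /app
USER 10001
ENTRYPOINT ["/app"]
```

Running such a container with a read-only root filesystem (e.g., `docker run --read-only`) further ensures nothing can be modified in place; any change must flow back through the build pipeline.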
- Input Validation: At startup containers accept parameters, configuration files, credentials, JSON, and scripts. In more aggressive scenarios, 'agile' teams shove new code segments into a container as input variables, making existing containers behave in fun new ways. Validate that input data is suitable and satisfies policy, either manually or using a third-party security tool. You must also ensure that each container receives the correct user and group IDs, which map to the assigned view at the host layer. Taken together, this prevents someone from forcing a container to misbehave, and prevents developers from making dumb mistakes.
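A launch-time policy check like the one described above can be sketched as follows. The allowed parameter keys and the UID/GID ranges are illustrative assumptions; a real policy would come from your security team.

```python
# Sketch: validate container startup input against policy before launch.
# ALLOWED_KEYS and the UID/GID ranges are hypothetical policy values.
ALLOWED_KEYS = {"env", "config_url", "log_level"}
UID_RANGE = range(10000, 20000)   # hypothetical range reserved for app containers
GID_RANGE = range(10000, 20000)

def validate_startup(params: dict, uid: int, gid: int) -> list:
    """Return a list of policy violations; an empty list means launch is allowed."""
    errors = [f"unexpected parameter: {k}" for k in params if k not in ALLOWED_KEYS]
    if uid not in UID_RANGE:
        errors.append(f"uid {uid} outside approved range")
    if gid not in GID_RANGE:
        errors.append(f"gid {gid} outside approved range")
    return errors

print(validate_startup({"env": "prod"}, 10001, 10001))  # []
print(validate_startup({"script": "curl"}, 0, 0))       # three violations
```

Rejecting unexpected keys blocks the "shove new code in as input" pattern, while the UID/GID check enforces the host-layer mapping.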
- Container Group Segmentation: One of the principal benefits of container management systems is helping scale tasks across pools of shared servers. Each manager platform offers a modular architecture, with scaling performed on node/minion/slave sub-groups, which in turn host a set of containers. Each node forms its own logical subnet, limiting network addressability between sets of containers. This segregation provides a form of 'blast radius' control, limiting which resources a container can communicate with. It is up to application architects and security teams to leverage this construct to improve security. You can enforce it with network policies in the container manager service, or with network security controls provided by your cloud vendor. Over and above this orchestration manager feature, third-party container security tools – running as an agent in the container or as part of the underlying operating system – can provide a form of logical network segmentation, further limiting network connections between groups of containers. Taken together, this offers fine-grained isolation of containers and container groups from one another.
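In Kubernetes terms, the network policy enforcement described above might look like the following sketch: deny all ingress to a group of containers, then allow traffic only from peers carrying the same trust label. The namespace and label names are illustrative assumptions.

```yaml
# Sketch (Kubernetes NetworkPolicy): block all ingress to the 'payments'
# namespace except from pods in the same trust group. Names are illustrative.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payments-isolate
  namespace: payments
spec:
  podSelector: {}              # applies to every pod in the namespace
  policyTypes: ["Ingress"]
  ingress:
  - from:
    - podSelector:
        matchLabels:
          trust-group: payments   # only same-group pods may connect
```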
- Blast Radius: If you run containers in a cloud environment, one option is to run different container groups under different cloud user accounts. This limits the resources available to any given container. If an account or container set is compromised, the same cloud service restrictions which prevent tenants from interfering with each other limit damage between your accounts and projects. For more information see our reference material on limiting blast radius with user accounts.
Until recently, when someone talked about container security, they were really talking about how to secure the hypervisor and underlying operating systems. Because of that, most articles and presentations you find on container security focus on this single – albeit important – facet. We believe runtime security needs to be more than that, so we break the challenge into three areas: host OS hardening, isolation of namespaces, and segregation of workloads by trust level. The following sections discuss these three areas.
- Host OS/Kernel Hardening: Hardening a host operating system protects it from attacks and misuse. It typically starts with selection of a hardened variant of the operating system you wish to use. But while these versions come with secure variants of both libraries and features, there is still work to be done on baseline configuration and removal of unneeded features. At a minimum you'll want to ensure user authentication and access roles are set, permissions for binary file access are properly restricted, logging of audit data is enabled, and the base OS bundle is fully patched. Also review the patching and configuration of the virtualization libraries (such as libcontainer, libvirt, and LXC) the container engine relies upon, and verify they are fully patched.
- Resource Isolation and Allocation: A critical element of container security is limiting container access to underlying operating system resources, specifically so a container cannot snoop on – or steal – data from other containers. The first step is making sure container privileges are assigned to a role. While the container engine must run as the root user, your containers must not, so set up user roles for your container groups. Next comes the resource isolation model for containers, which is built atop two concepts: cgroups and namespaces. A namespace creates a virtual map of the resources any given task will be provided, mapping specific users and groups to subsets of resources (e.g., networks, files, IPC) within their namespace. We recommend default deny on all inbound requests, opening network channels only for containers which need to communicate. It is also essential that you not mix container and non-container services on the same machine. Create specific user IDs for containers and/or group IDs for different classes of containers, then assign those IDs to containers at runtime. A container is then limited in how much of a resource it is allotted by a control group (cgroup). Cgroups provide a mechanism to partition tasks into hierarchical groups and control how much of any particular resource (e.g., memory or CPU) a task can use. This helps protect one group of containers from being starved of resources by another.
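Orchestrators expose these kernel mechanisms declaratively. As a sketch, a Kubernetes pod spec can pin the container to a non-root UID (a namespace control) and cap its memory and CPU (cgroup controls). The names, UID, and limit values below are illustrative assumptions.

```yaml
# Sketch (Kubernetes pod fragment): non-root user plus cgroup resource caps.
# The image, UID, and limits are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  securityContext:
    runAsUser: 10001          # non-root UID reserved for this container class
    runAsGroup: 10001
    runAsNonRoot: true
  containers:
  - name: app
    image: registry.example.com/app/web:1.0
    resources:
      limits:
        memory: "256Mi"       # cgroup memory cap
        cpu: "500m"           # half a CPU core
```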
- Segregation of Workloads: We discussed resource isolation at the kernel level, but you should also isolate container engine/OS groups and their containers at the network layer. For container isolation we recommend mapping groups of mutually trusted containers to separate machines and/or network security groups. For containers running critical services or management tools, consider running a limited number of containers per VM, grouped by trust level/workload, or grouping them into a dedicated cloud VPC to limit attack surface and minimize an attacker's ability to pivot should a service or container be compromised. As mentioned in the Container Group Segmentation item above, there are third-party and orchestration manager features to aid with segmentation. In extreme cases you can consider one container per VM or physical server for on-premise applications, but this defeats some benefits of running containers atop virtual infrastructure.
Platform security and container isolation are both huge fields of study, and we have only scratched the surface. If you want to learn more, OS platform providers, Docker, and many third-party security vendors offer best practices, research papers, and blogs covering these topics in great detail, often addressing issues with specific operating systems.
Orchestration Manager Security
This research effort is focused on container security, but any discussion of container security now comes from the perspective of securing containers within a specific orchestration management framework. There are many orchestration managers out there: Kubernetes, Mesos, and Swarm, as well as cloud-native container management systems from AWS, Azure, and GCP. Kubernetes is the dominant tool for managing clusters of containers, and with its rapid rise in popularity come many additional concerns for any container security program, both because of the added complexity of the environment and because the default security of Kubernetes is generously described as 'poor'. There are publicly demonstrated attacks which show it is possible to gain root access to nodes, escalate privileges, bypass identity checks, and exfiltrate code, keys, and credentials. Our point here is that many container managers need lots of tuning to be secure.
We have already discussed security aspects like OS hardening, image safety, namespaces, and network isolation to reduce 'blast radius'. We have also discussed hardening container code, using trusted image repositories to keep admins from accidentally running malicious containers, and using immutable container images to disallow direct shell access. Here we cover specific aspects you should consider to secure your orchestration manager.
- Management Plane Security: Cluster management, regardless of whether you're using Swarm, Kubernetes, or Apache Mesos, is handled via command line tools and APIs. For example, the 'etcd' key-value store and the 'kubectl' controller are fundamental components for managing a Kubernetes cluster, and these tools can be misused in a variety of ways by an attacker. In fact the graphical user interfaces on some platforms do not require user authentication, so disabling them is a common best practice. You'll want to limit who has access to administrative features, but because command line tools like these can also be installed by developers or attackers, simple access controls are not enough! Network isolation helps protect the master management server and control where administrative commands can be run. The best approach combines network isolation, the newer IAM services built into the cluster manager (RBAC for Kubernetes), and least privilege for service accounts on nodes.
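As a sketch of the RBAC approach mentioned above, a Kubernetes Role can grant an operator account just enough access to manage deployments without touching secrets or RBAC itself. The namespace, role, and account names are illustrative assumptions.

```yaml
# Sketch (Kubernetes RBAC): least-privilege operator role. Names are illustrative.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: production
  name: deploy-operator
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "patch"]   # no access to secrets or RBAC objects
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: production
  name: deploy-operator-binding
subjects:
- kind: User
  name: ops-admin                            # illustrative operator account
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: deploy-operator
  apiGroup: rbac.authorization.k8s.io
```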
- Segregation of Workloads: We have already reviewed namespaces and network isolation, but there are more basic controls which should be put in place. With both on-premise Kubernetes and cloud container deployments, we commonly find a flat network architecture. We also find that developers and QA personnel have direct access to production accounts and servers. We strongly recommend segregating development and production resources in general, and within production orchestration systems, segregating sensitive workloads onto different nodes or even separate cluster instances. Additionally, setting network security policies or security groups (in AWS parlance) to 'default deny' on inbound connections is a good starting point; then add only the specific exceptions applications need to run. This is the default network policy for most cloud services and is effective at reducing attack surface. Default deny also reduces the likelihood of containers trying to auto-update themselves from external sources, and denies attackers the ability to upload new attack tools should they gain a foothold in your environment.
- Limiting Discovery: Cluster management tools collect lots of metadata on cluster configuration, containers, and nodes within the system. This data is essential for the cluster management server to run, but it is also a map for attackers probing your system. Limiting which services and users can access metadata, and ensuring requesting parties are fully authorized, helps reduce the attack surface. Many platforms offer metadata proxies to help filter and validate requests.
- Upgrade and Patch: The engineering teams behind most container managers have responded well to known security issues, so newer versions of cluster managers tend to be far more secure. With virtualization a key element of any cluster, and with redundancy built into these platforms, you can leverage cluster management features to quickly patch and replace both cluster services and containers.
- Logging: We recommend collecting logs from all containers and nodes. As many attacks focus on privilege escalation and obtaining certificates, we recommend monitoring all identity modification API calls and all failures to detect attacks.
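For Kubernetes, an audit policy fragment like the following sketch captures the identity-related activity described above in full detail while keeping everything else at low volume. The rule set is an illustrative assumption; failed-request detection happens downstream, in whatever analyzes the collected logs.

```yaml
# Sketch (Kubernetes audit policy fragment): full detail on identity changes,
# metadata-level records for everything else. Rules are illustrative.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: RequestResponse        # full detail for identity modification calls
  verbs: ["create", "update", "patch", "delete"]
  resources:
  - group: "rbac.authorization.k8s.io"
    resources: ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]
  - group: ""
    resources: ["serviceaccounts"]
- level: Metadata               # lightweight record of all other requests,
                                # including failures, for downstream analysis
```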
- Test Yourself: There are security checkers and CIS security benchmarks for containers and container orchestration managers, which you can use to gauge how well your baseline security stacks up. These are a good initial step when validating cluster security. Because the default configurations of container managers tend to be insecure, and most admins are not fully aware of all the features any given cluster offers, these checkers are a great way to get up to speed on appropriate security controls.
Keep in mind these are very basic recommendations, and we cannot do this topic justice within the scope of this research paper. That said, we want to raise readers' awareness that there are existing, proven attacks on all the open source container management systems, and there is considerable work required to secure a cluster. Beyond the basics, each container manager has its own set of specific security issues, and its own nuances for protecting the cluster from specific types of attack.
When you start up a container, or an orchestration manager for that matter, it needs permissions to communicate with other containers, nodes, databases, and other network-accessible resources. In highly modular, service-oriented architectures, a container without credentials to connect to APIs, encrypt data, or prove its identity to other services won't get much work done. We don't want engineers hard-coding secrets into containers, nor do we want these secrets sitting in files on the server. But provisioning machine identities is tricky: we need to securely pass sensitive data to ephemeral instances as they start up.
The new class of products which address this issue are called 'Secrets Management' platforms. These products securely store encryption keys, API certificates, identity tokens, SSL certificates, and passwords. Secrets can be shared across groups of trusted services and users, leveraging existing directory services to determine who has access to what. Solutions are readily available: there are commercial tools, and many orchestration managers and container ecosystem providers (e.g., Docker) offer built-in secrets management capabilities.
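From the application side, the pattern looks like this sketch: at startup, resolve each credential from the secrets mount the orchestrator provides, never from a hard-coded value. The mount path and variable name are illustrative assumptions about how the platform injects secrets.

```python
# Sketch: resolve a credential at startup from an orchestrator-provided
# secrets mount, with an environment-variable fallback. The path and
# names are illustrative assumptions.
import os

SECRET_DIR = "/run/secrets"   # common mount point for injected secrets

def get_secret(name: str) -> str:
    """Prefer the tmpfs-mounted secret file; never hard-code the value."""
    path = os.path.join(SECRET_DIR, name)
    try:
        with open(path) as fh:
            return fh.read().strip()
    except FileNotFoundError:
        value = os.environ.get(name.upper())
        if value is None:
            raise RuntimeError(f"secret {name!r} not provisioned")
        return value

os.environ["DB_PASSWORD"] = "example-only"   # stand-in for an injected secret
print(get_secret("db_password"))
```

The key property is that the secret exists only in the running instance's memory or a tmpfs mount, not baked into the image or checked into source control.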
We cannot fully discuss this topic in the scope of this research paper, so if you need more information please see our complete research work on Secrets Management.
In our last post we will discuss logging and monitoring.
This is a Security Bloggers Network syndicated blog post authored by firstname.lastname@example.org (Securosis). Read the original post at: Securosis Blog