EKS vs GKE vs AKS – Evaluating Kubernetes in the Cloud

Now that Kubernetes has won the container orchestration wars, all the major cloud service providers offer managed Kubernetes services for their customers. Managed Kubernetes services provide and administer the Kubernetes control plane, the set of services that would normally run on Kubernetes master nodes in a cluster created directly on virtual or physical machines. While dozens of vendors have received Certified Kubernetes status from the Cloud Native Computing Foundation, which means their Kubernetes offering or implementation conforms to a consistent interface, the details of those offerings can differ, and the variation is especially pronounced among managed Kubernetes services, which often support different features and options for their cluster control planes and nodes.

We took a wide-ranging look at the current features and limitations of the managed Kubernetes services from the three largest cloud service providers: Amazon’s Elastic Kubernetes Service (EKS), Microsoft’s Azure Kubernetes Service (AKS), and Google Kubernetes Engine (GKE). We hope that by presenting this information side-by-side, both current Kubernetes users and prospective adopters can see their options or get an overview of the current state of managed Kubernetes.


General information

Currently supported Kubernetes version(s)
  • EKS: 1.14 (default), 1.13, 1.12
  • AKS: 1.17 (preview), 1.16 (preview), 1.15, 1.14 (default), 1.13
  • GKE: 1.16 (rapid channel – beta), 1.15, 1.14, 1.13 (default)
  • Kubernetes: 1.18 (alpha), 1.17, 1.16, 1.15

# of supported minor version releases
  • EKS: ≥3 + 1 deprecated
  • AKS: 3
  • GKE: 2-3
  • Kubernetes: 3

Original GA release date
  • EKS: June 2018
  • AKS: June 2018
  • GKE: August 2015
  • Kubernetes: July 2015 (Kubernetes 1.0)

CNCF Kubernetes conformance
  • EKS: Yes
  • AKS: Yes
  • GKE: Yes
  • Kubernetes: Yes

Latest CNCF-certified version
  • EKS: 1.14
  • AKS: 1.14
  • GKE: 1.14

Master upgrade process
  • EKS: User-initiated; user must also manually update the system services that run on nodes (e.g., kube-proxy, coredns, AWS VPC CNI)
  • AKS: User-initiated
  • GKE: Automatically upgraded during cluster maintenance window; can be user-initiated

Node upgrade process
  • EKS: Unmanaged node groups: all upgrades user-initiated and user-managed; managed node groups: user-initiated, and EKS will drain and replace nodes
  • AKS: User-initiated; AKS will drain and replace nodes
  • GKE: Automatically upgraded (default; can be turned off) during cluster maintenance window; can be user-initiated; GKE drains and replaces nodes

Container runtime
  • EKS: Docker
  • AKS: Moby (the open-source upstream of Docker)
  • GKE: Docker (default), containerd
  • Kubernetes: Linux: Docker, containerd, cri-o, rktlet, or any runtime that implements the Kubernetes CRI (Container Runtime Interface); Windows: Docker EE-basic 18.09

Master/control plane high availability options
  • EKS: Control plane is deployed across multiple Availability Zones
  • AKS: Azure docs do not state redundancy measures for the control plane
  • GKE: Zonal/multi-zonal clusters: single control plane; regional clusters: control plane replicas in multiple zones
  • Kubernetes: Supported

Master (control plane) SLA
  • EKS: 99.9%
  • AKS: 99.5%
  • GKE: 99.5% (zonal clusters); 99.95% (regional clusters)

SLA financially-backed
  • EKS: Yes
  • AKS: No
  • GKE: No

Pricing
  • EKS: $0.10/hour (USD) per cluster + standard costs of EC2 instances and other resources
  • AKS: Standard costs of node VMs and other resources
  • GKE: Standard costs of GCE machines and other resources

GPU support
  • EKS: Yes (NVIDIA); user must install device plugin in cluster
  • AKS: Yes (NVIDIA); user must install device plugin in cluster
  • GKE: Yes (NVIDIA); user must install device plugin in cluster
  • Kubernetes: Supported with device plugins

Master component log collection
  • EKS: Optional, off by default; sent to AWS CloudWatch
  • AKS: Optional, off by default; sent to Azure Monitor
  • GKE: Optional, on by default; sent to Stackdriver

Container performance metrics
  • EKS: Optional, off by default; sent to AWS CloudWatch Container Insights
  • AKS: Optional, off by default; sent to Azure Monitor (preview)
  • GKE: Optional, on by default; sent to Stackdriver

Node health monitoring
  • EKS: No Kubernetes-aware support; if a node instance fails, the AWS autoscaling group of the node pool will replace it
  • AKS: None
  • GKE: Node auto-repair enabled by default

Comments

Interestingly, all three providers offer a fairly uniform set of supported Kubernetes versions, as well as at least some support for more recent Kubernetes features, such as Windows containers and GPUs.

The most glaring difference concerns the amount of actual management that the various providers do for their managed service. GKE clearly takes the lead here, offering automated upgrades for masters and nodes, in addition to detecting and fixing unhealthy nodes. Upgrades in EKS and AKS require at least some degree of manual work (or customer-programmed automation), especially on EKS. In addition, neither EKS nor AKS offers any specialized node health monitoring or repair. AWS customers can create custom health checks to do some degree of node health monitoring and customer-automated replacement for EKS clusters, but AKS offers no comparable capability for its VMs.
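For EKS users who go the custom-automation route, a rough sketch of the idea follows. It assumes the official Kubernetes Python client with kubeconfig access, boto3 with permission to call autoscaling:SetInstanceHealth, and worker nodes that belong to an Auto Scaling group; it skips draining and the other safeguards a production version would need.

```python
# Rough sketch: flag NotReady EKS nodes so their Auto Scaling group replaces them.
import boto3
from kubernetes import client, config

config.load_kube_config()                      # cluster credentials from kubeconfig
autoscaling = boto3.client("autoscaling")

for node in client.CoreV1Api().list_node().items:
    ready = next((c for c in node.status.conditions if c.type == "Ready"), None)
    if ready is not None and ready.status != "True":
        # providerID has the form aws:///<az>/<instance-id> on EKS nodes
        instance_id = node.spec.provider_id.rsplit("/", 1)[-1]
        autoscaling.set_instance_health(
            InstanceId=instance_id,
            HealthStatus="Unhealthy",          # the ASG health check then replaces it
        )
```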

Service level agreements for the Kubernetes master control plane also have some surprising differences. EKS remains the only one of the three providers to charge for its masters, at $0.10/cluster/hour. That amount will make up a negligible part of the total cost for all but the smallest clusters, but it brings something the other providers do not offer: a financially-backed SLA. While the refunds for SLA penalties almost never compare to the loss of potential productivity or revenue suffered during a provider outage, publishing penalties can bring a greater degree of confidence, real or perceived, in the seriousness of the provider’s commitment to reliability and uptime.

Another interesting data point, or rather the lack of one, concerns the high availability of the AKS master components. The Azure documentation does not state explicitly whether AKS control planes have built-in redundancy. That is a curious omission, because customers with SLAs of their own for applications hosted on AKS generally want and need to confirm that the services and cloud infrastructure on which they rely have been engineered for similar reliability.

While pods and nodes running in a Kubernetes cluster can survive outages of the control plane and its components, even short-lived interruptions can be problematic for some workloads. Depending on the affected control plane components, failed pods may not get rescheduled or clients may not be able to connect to the cluster API to perform queries or to manage resources in the cluster. If the etcd cluster that the control plane uses to store the current state of the cluster and its deployed resources fails, loses quorum (assuming it has been deployed as a highly-available cluster), or experiences severe data corruption or loss, the Kubernetes cluster may become unrecoverable.

Service Limits

Limits are per account (AWS), subscription (AKS), or project (GKE) unless otherwise noted.

Limits for which the customer can request an increase are noted with an asterisk (*).

Max clusters
  • EKS: 100/region*
  • AKS: 100
  • GKE: 50/zone + 50 regional clusters

Max nodes per cluster
  • EKS: Managed node groups: 1000* (formula: max nodes per node group * max node groups per cluster)
  • AKS: 1000; 100 (VM Availability Sets); 400 (kubenet network); 800 (VM Scale Sets)
  • GKE: 5000; 1000 if using the GKE ingress controller
  • Kubernetes (as of v1.17): 5000

Max nodes per node pool/group
  • EKS: Managed node groups: 100*
  • AKS: 100
  • GKE: 1000

Max node pools/groups per cluster
  • EKS: Managed node groups: 10*
  • AKS: 10
  • GKE: Not documented

Max pods per node
  • EKS: Linux: ((# of IPs per ENI – 1) * # of ENIs) + 2 (see Comments below); Windows: # of IPs per ENI – 1
  • AKS: 110 (kubenet network); 30 (Azure CNI, default); 250 (Azure CNI, max, configured at cluster creation time)
  • GKE: 110
  • Kubernetes (as of v1.17): 100 (recommended value, configurable)

Comments

While most of these limits are fairly straightforward, a couple are not.

In AKS, the absolute maximum number of nodes that a cluster can have depends on a few variables, including whether the nodes are in a VM Scale Set or an Availability Set, and whether cluster networking uses kubenet or the Azure CNI. Even then, it is still not clear which number takes absolute precedence for certain configurations.

Meanwhile, in EKS, planning for the maximum number of pods that can be scheduled on a Linux node requires some research and math. EKS clusters use the AWS VPC CNI for cluster networking. This CNI puts the pods directly on the VPC network by using ENIs (Elastic Network Interfaces), virtual network devices that can be attached to EC2 instances. Different EC2 instance types support different numbers of ENIs and different numbers of IP addresses per ENI (one IP is needed per pod).

Therefore, to determine how many pods a particular EC2 instance type can run in an EKS cluster, you would look up the instance type’s ENI and per-ENI IP limits in the AWS documentation and plug them into this formula: ((# of IPs per ENI – 1) * # of ENIs) + 2

A c5.12xlarge EC2 instance, which can support 8 ENIs with 30 IPv4 addresses each, can therefore accommodate up to ((30 – 1) * 8) + 2 = 234 pods. Note that very large nodes with the maximum number of scheduled pods will eat up the /16 IPv4 CIDR block of the cluster’s VPC very quickly.

Pod limits for Windows nodes are easier to compute, but also far more restrictive in EKS. Here, use the formula # of IPs per ENI – 1. The same c5.12xlarge instance that could run as many as 234 pods as a Linux node could only run 29 pods as a Windows node.
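For readers who want to script this check, the short Python sketch below simply restates the two formulas above; the ENI counts are the illustrative c5.12xlarge values, and real values for other instance types come from the AWS ENI limits documentation.

```python
# Minimal sketch of the EKS max-pods arithmetic described above.
def max_pods_linux(enis: int, ips_per_eni: int) -> int:
    """Linux nodes with the AWS VPC CNI: ((IPs per ENI - 1) * ENIs) + 2."""
    return (ips_per_eni - 1) * enis + 2

def max_pods_windows(ips_per_eni: int) -> int:
    """Windows nodes: IPs per ENI - 1, per the formula in the text."""
    return ips_per_eni - 1

# c5.12xlarge example from the text: 8 ENIs, 30 IPv4 addresses per ENI.
print(max_pods_linux(enis=8, ips_per_eni=30))   # 234
print(max_pods_windows(ips_per_eni=30))         # 29
```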

Networking + Security

Network plugin/CNI
  • EKS: AWS VPC CNI
  • AKS: Option between kubenet or Azure CNI
  • GKE: kubenet
  • Kubernetes: kubenet (default; CNIs can be added)

Kubernetes RBAC
  • EKS: Required
  • AKS: Enabled by default; must be enabled at cluster creation time
  • GKE: Enabled by default; can be enabled at any time

Kubernetes Network Policy
  • EKS: Not enabled by default; Calico can be manually installed at any time
  • AKS: Not enabled by default; must be enabled at cluster creation time; kubenet: Calico; Azure CNI: Calico or Azure Policy
  • GKE: Not enabled by default; can be enabled at any time; Calico
  • Kubernetes: Not enabled by default; a CNI implementing the Network Policy API can be installed manually

PodSecurityPolicy support
  • EKS: PSP controller installed in all clusters with permissive default policy (v1.13+)
  • AKS: PSP can be installed at any time (preview)
  • GKE: PSP can be installed at any time
  • Kubernetes: PSP admission controller needs to be enabled as a kube-apiserver flag

Private or public IP address for cluster Kubernetes API
  • EKS: Public by default; private-only address optional
  • AKS: Public by default; private-only address optional (preview)
  • GKE: Public by default; private-only address optional

Public IP addresses for nodes
  • EKS: Optional for unmanaged node groups; required for managed node groups
  • AKS: No
  • GKE: No

Pod-to-pod traffic encrypted by cloud
  • EKS: No
  • AKS: No
  • GKE: No
  • Kubernetes: No

Firewall for cluster Kubernetes API
  • EKS: CIDR whitelist option
  • AKS: CIDR whitelist option
  • GKE: CIDR whitelist option

Read-only root filesystem on node
  • EKS: Not supported
  • AKS: Not supported
  • GKE: Supported

Comments

Networking and security features and configurations of Kubernetes clusters often intertwine closely, so we’re putting them together. That does give this table a lot of information to unpack, though.

Overall, EKS does the best job of making core Kubernetes security controls standard in every cluster. Conversely, AKS makes it harder to manage security by requiring that RBAC and network policies be enabled at cluster creation time.

All three providers now deploy with Kubernetes RBAC enabled by default, a big win in the security column. EKS even makes RBAC mandatory, as it does for Pod Security Policy support, which, while supported by the other two providers, requires opt-in for each cluster.

On the other hand, none of the providers currently enables Network Policy support by default. EKS requires the customer to install and manage upgrades for the Calico CNI themselves. AKS provides two options for Network Policy support, depending on the cluster network type, but only allows enabling support at cluster creation time.
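To make concrete what that opt-in support enables, here is a minimal sketch, using the official Kubernetes Python client against a cluster where a policy engine such as Calico is already installed, of the common first step of denying all ingress traffic to pods in a namespace (the namespace name is a placeholder).

```python
# Minimal sketch: a default-deny-ingress NetworkPolicy for one namespace.
from kubernetes import client, config

config.load_kube_config()

deny_all_ingress = client.V1NetworkPolicy(
    metadata=client.V1ObjectMeta(name="default-deny-ingress"),
    spec=client.V1NetworkPolicySpec(
        pod_selector=client.V1LabelSelector(),  # empty selector matches all pods
        policy_types=["Ingress"],               # no ingress rules listed = deny all ingress
    ),
)

client.NetworkingV1Api().create_namespaced_network_policy(
    namespace="default", body=deny_all_ingress
)
```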

All three clouds now offer a few options for limiting network access to the Kubernetes API endpoint of a cluster. Even with Kubernetes RBAC and a secure authentication method enabled for a cluster, an API server open to the world remains exposed to undiscovered or future vulnerabilities, seen most recently in the now-patched Billion Laughs vulnerability, which created the potential for denial-of-service attacks by unauthenticated users. Applying a CIDR whitelist or giving the API a private, internal IP address rather than a public address also protects against scenarios such as compromised cluster credentials.
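As one sketch of applying those controls on EKS with boto3 (the cluster name and CIDR below are placeholders; AKS and GKE expose their equivalents as authorized IP ranges and master authorized networks through their own APIs and CLIs):

```python
# Minimal sketch: enable the private EKS API endpoint and whitelist a CIDR
# on the public one.
import boto3

eks = boto3.client("eks")
eks.update_cluster_config(
    name="example-cluster",
    resourcesVpcConfig={
        "endpointPrivateAccess": True,            # reachable from inside the VPC
        "endpointPublicAccess": True,             # keep the public endpoint...
        "publicAccessCidrs": ["203.0.113.0/24"],  # ...but only for this CIDR
    },
)
```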

EKS introduced managed node groups at re:Invent last December. While managed node groups remove a fair bit of the previous work required to create and maintain an EKS cluster, they do come with a distinct disadvantage for node network security: all nodes in a managed node group must have a public IP address and must be able to send traffic out of the VPC. Effectively restricting egress traffic from the nodes becomes more difficult. While external access to these public addresses can be protected with proper security group rules and network ACLs, they still pose a serious risk if the customer incorrectly configures or does not restrict the network controls of a cluster’s VPC. This risk can be mitigated somewhat by only placing the nodes on private subnets.
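A minimal sketch of that mitigation with boto3 follows; the cluster name, subnet IDs, and IAM role ARN are placeholders, and the subnets are assumed to be private ones with a NAT route for the required outbound traffic.

```python
# Minimal sketch: create an EKS managed node group on private subnets only.
import boto3

eks = boto3.client("eks")
eks.create_nodegroup(
    clusterName="example-cluster",
    nodegroupName="private-workers",
    subnets=["subnet-0aaa1111bbb2222cc", "subnet-0ddd3333eee4444ff"],  # private subnets
    nodeRole="arn:aws:iam::123456789012:role/exampleEksNodeRole",
    scalingConfig={"minSize": 2, "maxSize": 5, "desiredSize": 3},
    instanceTypes=["m5.large"],
)
```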

Container Image Services

Image repository service
  • EKS: ECR (Elastic Container Registry)
  • AKS: ACR (Azure Container Registry)
  • GKE: GCR (Google Container Registry)

Supported formats

Access security

Storage
  • ECR: S3 buckets
  • ACR: Not documented
  • GCR: GCS buckets

Storage security
  • Not documented (may not be applicable)

Supports image signing
  • ECR: No
  • ACR: Yes
  • GCR: No

Supports immutable image tags
  • ECR: Yes
  • ACR: No
  • GCR: No

Image scanning service
  • ECR: Yes; OS packages only
  • ACR: Yes (preview); not clear if runtime application language libraries are covered
  • GCR: Yes; OS packages only

Registry SLA
  • ECR: 99.9%; financially-backed
  • ACR: 99.9%; financially-backed
  • GCR: None

Comments

All three cloud providers offer similar container image registry services. The addition of ACR’s support for image signing allows users to establish and confirm the authenticity and integrity of their images. ECR’s support for immutable image tags helps its users trust that using the same image:tag will result in deployment of the same build of a container image every time.
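On the ECR side, both immutable tags and the scan-on-push behavior behind the image scanning row above can be switched on per repository; a minimal boto3 sketch (the repository name is a placeholder):

```python
# Minimal sketch: enforce immutable tags and scan-on-push for an ECR repository.
import boto3

ecr = boto3.client("ecr")
ecr.put_image_tag_mutability(
    repositoryName="example-app",
    imageTagMutability="IMMUTABLE",   # pushes can no longer overwrite an existing tag
)
ecr.put_image_scanning_configuration(
    repositoryName="example-app",
    imageScanningConfiguration={"scanOnPush": True},  # OS-package scan on every push
)
```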

All three registry services also allow some degree of access control, but the granularity of that control varies. ECR and ACR both support scoping access controls to the repository level, which GCR does not. In addition, because access control to a GCR registry depends directly on the permissions of the Google Cloud Storage bucket backing that registry, limiting access to the storage bucket by using a service perimeter can break access to GCS buckets in other GCP projects, among other side effects.
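To illustrate the GCR model, granting pull access means granting read access on the registry’s backing bucket, which for gcr.io is conventionally named artifacts.PROJECT-ID.appspot.com. The sketch below uses the google-cloud-storage client; the project ID and member are placeholders.

```python
# Minimal sketch: grant read (pull) access to a GCR registry via its GCS bucket.
from google.cloud import storage

bucket = storage.Client().bucket("artifacts.example-project.appspot.com")
policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append({
    "role": "roles/storage.objectViewer",   # read-only access = image pulls
    "members": {"serviceAccount:puller@example-project.iam.gserviceaccount.com"},
})
bucket.set_iam_policy(policy)
```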

Notes on Data and Sources

The information in this post should be considered a snapshot of these Kubernetes services at the time of publication. Supported Kubernetes versions in particular will change regularly. Features in preview (Azure terminology) or beta (GCP terminology) at the time of writing are marked as such and may change before becoming generally available. AWS has no relevant preview features listed in its EKS documentation at this time.

All data in the tables comes from the official provider online documentation (kubernetes.io in the case of open-source Kubernetes), supplemented in some cases by inspection of running clusters and service API queries. (Cloud Native Computing Foundation conformance data is an exception.) Some of this information, particularly for supported Kubernetes versions, may be specific to regions in the US; availability may vary in other regions. Values for open-source Kubernetes are omitted where they are either specific to a managed service or depend on how and where a self-managed cluster is deployed.

We also do not attempt to compare pricing in most cases. Pricing of resources, even for a single provider, can vary wildly between regions, and even if we came up with a standard sample cluster size and workload, the cost ratios might not hold for a different configuration. In particular, note that some optional features, like logging, private network endpoints for services, and container image scanning, may incur additional costs in some clouds.

We also do not address performance differences between providers. A lot of variables come into play for benchmarking. If you need accurate numbers, running your own tests to compare the multiple compute, storage, and network options of each provider, in addition to testing with your application stack, would provide the most accurate data for your needs.

All attempts have been made to ensure the completeness and accuracy of this information. However, errors or omissions may exist, due to unclear or missing provider documentation, or due to errors on our part.



*** This is a Security Bloggers Network syndicated blog from The Container Security Blog on StackRox authored by The Container Security Blog on StackRox. Read the original post at: https://www.stackrox.com/post/2020/02/eks-vs-gke-vs-aks/