Autoscaling EKS Cluster With Karpenter Using Terraform

· 6 min read
Abdulmalik
DevSecOps

Are you looking for a way to autoscale the nodes in your Kubernetes cluster up and down? Figuring out which autoscaler is best can be hard since there are many options, so which one should you go for?

Well, I would advise going for Karpenter instead of the native Cluster Autoscaler. Both projects are sponsored by the AWS team, but Karpenter is faster when it comes to scaling nodes up and down.

Prerequisites

  • A running EKS cluster provisioned with Terraform
  • A Terraform file named karpenter.tf

Let's jump into it.
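
One note before we start: karpenter.tf below uses the helm_release and kubectl_manifest resources, so your required_providers need to include the Helm provider and a kubectl provider. A minimal sketch (I'm assuming the community gavinbunney/kubectl provider here, and the file name is up to you; adjust if you use a different setup):

versions.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.0"
    }
    helm = {
      source  = "hashicorp/helm"
      version = ">= 2.9"
    }
    # Community provider that supplies the kubectl_manifest resource
    kubectl = {
      source  = "gavinbunney/kubectl"
      version = ">= 1.14"
    }
  }
}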

Provision the Metrics Server on your EKS cluster

Karpenter needs the Metrics Server because it provides accurate metrics about the pods on your nodes. That means you should also set resource requests and limits on your deployments for better performance.
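
For example, each container in your deployments might declare something along these lines (an illustrative fragment; the numbers are arbitrary and should be tuned to your workload):

resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi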

Now let's deploy the Metrics Server with the helm_release resource:

karpenter.tf
resource "helm_release" "metric-server" {
  name            = "metric-server"
  repository      = "https://kubernetes-sigs.github.io/metrics-server/"
  chart           = "metrics-server"
  version         = "3.11.0"
  namespace       = "kube-system"
  cleanup_on_fail = true
  timeout         = 1200

  set {
    name  = "apiService.create"
    value = "true"
  }
}

Provision Karpenter Policy, IRSA, Instance Profile and Karpenter Helm Release

At the end of the day, Karpenter needs access to create and terminate EC2 instances (your nodes) in your EKS cluster on your behalf; this is where IAM Roles for Service Accounts (IRSA) for EKS comes in.

👉 Step 1: Create the Karpenter controller policy

karpenter.tf
resource "aws_iam_policy" "karpenter_controller" {
  name        = "KarpenterController"
  path        = "/"
  description = "Karpenter controller policy for autoscaling"
  policy      = <<EOF
{
  "Statement": [
    {
      "Action": [
        "ec2:CreateLaunchTemplate",
        "ec2:CreateFleet",
        "ec2:RunInstances",
        "ec2:CreateTags",
        "ec2:TerminateInstances",
        "ec2:DeleteLaunchTemplate",
        "ec2:DescribeLaunchTemplates",
        "ec2:DescribeInstances",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSubnets",
        "ec2:DescribeImages",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeInstanceTypeOfferings",
        "ec2:DescribeAvailabilityZones",
        "ec2:DescribeSpotPriceHistory",
        "iam:PassRole",
        "ssm:GetParameter",
        "pricing:GetProducts"
      ],
      "Effect": "Allow",
      "Resource": "*",
      "Sid": "Karpenter"
    },
    {
      "Action": "ec2:TerminateInstances",
      "Condition": {
        "StringLike": {
          "ec2:ResourceTag/Name": "*karpenter*"
        }
      },
      "Effect": "Allow",
      "Resource": "*",
      "Sid": "ConditionalEC2Termination"
    },
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::777XXXX:role/KarpenterNodeRole-${module.eks.cluster_name}",
      "Sid": "PassNodeIAMRole"
    },
    {
      "Effect": "Allow",
      "Action": "eks:DescribeCluster",
      "Resource": "arn:aws:eks:us-east-2:777XXXX:cluster/${module.eks.cluster_name}",
      "Sid": "eksClusterEndpointLookup"
    }
  ],
  "Version": "2012-10-17"
}
EOF
}

Replace 777XXXX with your AWS account ID and us-east-2 with whichever region you are using. The ${module.eks.cluster_name} value just pulls your cluster name from the EKS module; you can replace it with your cluster name directly if you didn't use the EKS module to provision your cluster.
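
If you'd rather not hardcode the account ID and region, you can look both up with data sources and interpolate them into the policy instead (a small sketch; assumes the AWS provider is already configured):

karpenter.tf
# Look up the current account ID and region so the policy
# doesn't hardcode either value.
data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

# Then, inside the policy document above:
# "Resource": "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/KarpenterNodeRole-${module.eks.cluster_name}"
# "Resource": "arn:aws:eks:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:cluster/${module.eks.cluster_name}"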

👉 Step 2: Create the Karpenter EC2 Instance Profile

Now let's create the EC2 instance profile and attach the existing EKS node IAM role to it, using module.eks.eks_managed_node_groups.regular.iam_role_name:

karpenter.tf
resource "AWS_iam_instance_profile" "karpenter" {
name = "KarpenterNodeInstanceProfile"
role = module.eks.eks_managed_node_groups.regular.iam_role_name
}

👉 Step 3: Create the Karpenter IAM Role for Service Accounts (IRSA) on EKS

Here you can just use the IRSA module instead of wiring up the raw resources yourself, which lets you move faster with fewer lines of code.

karpenter.tf
module "karpenter_irsa_role" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "5.32.1"

  role_name = "karpenter_controller"

  # I am attaching the policy created in step 1 here instead of
  # using the attach_karpenter_controller_policy = true argument.
  role_policy_arns = {
    policy = aws_iam_policy.karpenter_controller.arn
  }

  karpenter_controller_cluster_id         = module.eks.cluster_id
  karpenter_controller_node_iam_role_arns = [module.eks.eks_managed_node_groups["regular"].iam_role_arn]

  oidc_providers = {
    main = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["kube-system:karpenter"]
    }
  }
}
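
For a sense of what the module saves you, the "raw way" is roughly an IAM role whose trust policy federates to the cluster's OIDC provider, something like this sketch (assuming the oidc_provider and oidc_provider_arn outputs of the terraform-aws-modules/eks module):

# Roughly what the IRSA module generates under the hood: a trust
# policy that lets the karpenter service account in kube-system
# assume the role through the cluster's OIDC provider.
data "aws_iam_policy_document" "karpenter_assume" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [module.eks.oidc_provider_arn]
    }

    condition {
      test     = "StringEquals"
      variable = "${module.eks.oidc_provider}:sub"
      values   = ["system:serviceaccount:kube-system:karpenter"]
    }
  }
}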

👉 Step 4: Deploy Karpenter on EKS using Helm Release

Now we can deploy Karpenter using the helm_release resource:

karpenter.tf
resource "helm_release" "karpenter" {
  name            = "karpenter"
  chart           = "karpenter"
  repository      = "oci://public.ecr.aws/karpenter"
  version         = "v0.33.0"
  namespace       = "kube-system" # same namespace as the Metrics Server
  cleanup_on_fail = true

  set {
    name  = "serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
    value = module.karpenter_irsa_role.iam_role_arn # the IRSA role ARN created in step 3
  }

  set {
    name  = "replicas"
    value = "1"
  }

  set {
    name  = "settings.clusterName"
    value = module.eks.cluster_name
  }

  set {
    name  = "settings.clusterEndpoint"
    value = module.eks.cluster_endpoint
  }
}

Configure Karpenter Node Autoscaling using NodePools and NodeClasses

Now you are almost done setting up Karpenter on EKS; you just have to configure and deploy NodePools and NodeClasses.

NodePools? This is the configuration where you declare the type of nodes you want Karpenter to create, the type of pods that can run on those nodes, how long empty nodes should live before being terminated, and more; you can read more in the NodePool docs.

With the NodePool below, we are simply telling Karpenter to spin up ON DEMAND nodes from either the t3 or c5 EC2 instance families, but not in the nano, micro, small, or large sizes, so we won't see a t3.large or c5.large, but we can see a t3.medium, a c5.xlarge, and so on.

karpenter.tf
resource "kubectl_manifest" "nodepools" {
  yaml_body = <<-YAML
    apiVersion: karpenter.sh/v1beta1
    kind: NodePool
    metadata:
      name: default
    spec:
      template:
        spec:
          nodeClassRef:
            apiVersion: karpenter.k8s.aws/v1beta1
            kind: EC2NodeClass
            name: default
          requirements:
            - key: karpenter.sh/capacity-type
              operator: In
              values: ["on-demand"]
            - key: karpenter.k8s.aws/instance-family
              operator: In
              values: [t3, c5]
            - key: karpenter.k8s.aws/instance-size
              operator: NotIn
              values: [nano, micro, small, large]
      limits:
        cpu: 100
  YAML
}
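
By the way, the "terminate empty nodes" timing mentioned earlier is controlled by a disruption block you can add under spec in the manifest above (a sketch; the 30-second value is illustrative, check the NodePool docs for defaults):

disruption:
  consolidationPolicy: WhenEmpty
  consolidateAfter: 30s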

And here are NodeClasses. A NodeClass dictates which subnets, security groups, and amiFamily get attached to the nodes the NodePool creates, i.e. whether the nodes land in a private subnet or a public one.

If a private subnet is matched by the subnetSelectorTerms, the nodes will be created in that private subnet, and the same logic applies to the security group selector; head over to the NodeClasses docs to read more about the available options.

karpenter.tf
resource "kubectl_manifest" "karpenter_node_template" {
  yaml_body = <<-YAML
    apiVersion: karpenter.k8s.aws/v1beta1
    kind: EC2NodeClass
    metadata:
      name: default
    spec:
      amiFamily: AL2
      securityGroupSelectorTerms:
        - tags:
            karpenter.sh/discovery: "YOUR-CLUSTER-NAME"
      subnetSelectorTerms:
        - tags:
            kubernetes.io/cluster/YOUR-CLUSTER-NAME: "owned"
      role: "YOUR-EKS-NODE-IAM-ROLE"
  YAML
}
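
One gotcha: the karpenter.sh/discovery tag that the securityGroupSelectorTerms matches on has to actually exist on your node security group. If you provisioned the cluster with the terraform-aws-modules/eks module, you can add it through the module's tag inputs (a sketch; assumes the node_security_group_tags input):

module "eks" {
  # ...your existing cluster configuration...

  # Tag the node security group so the EC2NodeClass
  # securityGroupSelectorTerms above can discover it.
  node_security_group_tags = {
    "karpenter.sh/discovery" = "YOUR-CLUSTER-NAME"
  }
}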

Now that we are done with the code, you can run terraform plan and terraform apply, then create a small deployment and scale it up and down to test your freshly installed Karpenter.
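
A quick way to do that is the pause-container deployment from Karpenter's own getting-started guide (a sketch; the name and numbers are arbitrary). Deploy it at zero replicas, scale it up past your spare capacity, and watch Karpenter create a node; scale back down and watch the node go away:

resource "kubectl_manifest" "inflate_test" {
  yaml_body = <<-YAML
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: inflate
    spec:
      replicas: 0
      selector:
        matchLabels:
          app: inflate
      template:
        metadata:
          labels:
            app: inflate
        spec:
          containers:
            - name: inflate
              # Pause container that does nothing but reserve resources
              image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
              resources:
                requests:
                  cpu: 1
  YAML
}

Then kubectl scale deployment inflate --replicas 5 to trigger a scale-up, and --replicas 0 to watch consolidation kick in.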

The results from my side: Karpenter helped us use our nodes to their maximum level before provisioning another one, whereas before Karpenter we had a lot of wasted resources and space.

[Image] t3.medium node created by Karpenter, utilized well

[Image] t3.large node created manually, with many resources left over after the deployment settled

[Image] Second t3.large node created manually, with many resources left over after the deployment settled

I hope you've learned something useful from this post to take home for your cluster autoscaling and better deployment management with Karpenter.

Till next time ✌️
