How I Bypassed Jenkins Master-Slave Architecture using Spot Instances and Terraform to push Docker images to AWS ECR.

Abhishek Yadav
9 min read · Apr 10, 2023

In this blog post, I share how I bypassed the Jenkins Master-Slave Architecture by using Spot Instances and Terraform to build Docker images and push them to AWS ECR. I explain the benefits of Spot Instances and how to configure Terraform and ECR to get a scalable and cost-effective Jenkins infrastructure.

Jenkins Master-Slave Architecture is a way of configuring a Jenkins server to handle large-scale builds or multiple concurrent jobs. In this architecture, there is one master node that manages the configuration of the entire Jenkins instance and one or more slave nodes that perform the build or run the jobs. The master node is responsible for scheduling and assigning work to the slave nodes, while the slave nodes execute the jobs and report back to the master node.

Running a large number of slave nodes can be expensive, as each node requires dedicated resources such as CPU, memory, and storage. Scaling up the master node to handle more jobs adds further cost. Depending on how the agents are connected, the communication between the master and slave nodes may not be encrypted, which can pose security risks, especially if the Jenkins instance is used to build or deploy sensitive code. Configuring and maintaining a Jenkins Master-Slave Architecture can also be complex, requiring expertise in networking, security, and infrastructure management.

Spot Instances are a purchasing option offered by Amazon Web Services (AWS) that lets users run workloads on unused EC2 capacity at a much lower cost than On-Demand instances, with the trade-off that AWS can reclaim the instances at short notice. Spot Instances provide significant savings for non-critical workloads and can be ideal for batch jobs, testing, and other workloads that can tolerate interruptions and have flexible start and end times. This is why I use Spot Instances to build CPU-heavy Docker images. By adding Spot capacity only while Docker images are being built, I have managed to run Jenkins on a t3.small On-Demand instance with 10 build executors running jobs in parallel.

Prerequisites:
1 - Jenkins server.
2 - AWS CLI installed and configured.
3 - Terraform installed and configured.
4 - Helm or kubectl (optional, only if you need to deploy to Kubernetes).

GitHub repo link: https://github.com/Abhishek-EN/jenkins-spot-integration
Clone this repository to get the necessary Terraform and shell files.

Step 1: We need to create an Amazon Machine Image (AMI) with the required packages. Create a t2.micro Ubuntu EC2 instance and run install.sh on it.

This will install Docker, Git, and the AWS CLI on the EC2 instance. The next step is to configure SSH keys so that Jenkins can SSH into the Spot Instance and the instance can pull the code from your SCM (GitHub, Bitbucket, etc.). You can use an AWS key pair for Jenkins to SSH into the server, but SSH keys on the instance are still needed to pull the code.
Generate the keys with the “ssh-keygen” command, which creates a private and a public key in the .ssh directory.

$ ssh-keygen

After generating the keys, add the public key “id_rsa.pub” to the SSH keys section of your GitHub account so that the code can be pulled over SSH. If you are unsure how to do this, see GitHub's documentation on adding an SSH key to your account. The host of your Git provider (github.com or bitbucket.org) also needs to be in the known_hosts file, otherwise the first connection stops at a prompt asking whether to add it.
You can avoid the prompt by adding the host key up front with this command.

$ ssh-keyscan bitbucket.org >> ~/.ssh/known_hosts

Now try to pull the code from that server; if it works, you are good to go. Create an AMI of the instance and note the AMI ID. Create a security group to attach to the instance, with a single inbound rule that allows SSH only from your Jenkins IP.
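
Before creating the AMI, it may be worth confirming that everything the build will need is really on the instance. The snippet below is a minimal sketch of such a check, not part of install.sh; the function name missing_tools is my own, and the tool list is simply what the build script later calls.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check to run on the instance before creating the AMI.
# Prints every requested command that is NOT on PATH; prints nothing if all exist.
missing_tools() {
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Tools the build script later expects on the Spot Instance.
missing=$(missing_tools docker git aws ssh)
if [ -n "$missing" ]; then
  # Unquoted on purpose so the names are printed on one line.
  echo "Missing tools:" $missing
else
  echo "All required tools are installed."
fi
```

If anything is reported as missing, fix it on the instance first, since every Spot Instance launched from the AMI inherits the same gap.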

Create an IAM role that allows the instance to push images to ECR repos.
Attach the AWS-managed policy “AmazonEC2ContainerRegistryPowerUser” and create the role (note its name, you will need it for the instance profile).
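
If you prefer the CLI over the console, the same role could be created roughly as follows. This is a sketch under assumptions: the role name SPOT_ECR_ROLE is taken from the instance profile ARN used in this post, the instance profile is given the same name, and the run helper and DRY_RUN switch are my own additions so that the commands are only printed by default.

```shell
#!/usr/bin/env bash
# Sketch: create the ECR push role and its instance profile with the AWS CLI.
ROLE_NAME="SPOT_ECR_ROLE"   # assumed name, matches the ARN used in this post
DRY_RUN="${DRY_RUN:-1}"     # 1 = only print the commands, 0 = really run them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Trust policy that lets EC2 instances assume the role.
TRUST_FILE="$(mktemp)"
cat > "$TRUST_FILE" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

run aws iam create-role --role-name "$ROLE_NAME" \
  --assume-role-policy-document "file://$TRUST_FILE"
run aws iam attach-role-policy --role-name "$ROLE_NAME" \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryPowerUser
run aws iam create-instance-profile --instance-profile-name "$ROLE_NAME"
run aws iam add-role-to-instance-profile --instance-profile-name "$ROLE_NAME" \
  --role-name "$ROLE_NAME"
```

Run it once as is to review the printed commands, then again with DRY_RUN=0 to create the resources.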

The next step is to create a launch template with the same AMI ID and security group. An 8 GB EBS volume is enough, because each instance is used for a single build. Choose subnets from which Jenkins can reach the instance (same VPC); they can be public or private. Add a tag to the launch template for the specific app, for example Name = app-name, because the tag is used later to look up the instance's IP (tag values are case-sensitive, so the value must match the filter used in the lookup). Also add the instance profile you created earlier.

"arn:aws:iam::1234567890:instance-profile/SPOT_ECR_ROLE"

You can create the launch template with the CLI. Note that KeyName is the name of the EC2 key pair, not the file name of the .pem file, and that the Name tag value must match the filter used later to look up the instance.

aws ec2 create-launch-template \
  --launch-template-name app-name \
  --launch-template-data '{
    "ImageId": "ami-1234567890",
    "InstanceType": "t3.medium",
    "KeyName": "key",
    "TagSpecifications": [
      {"ResourceType": "instance", "Tags": [{"Key": "Name", "Value": "app-name"}]}
    ],
    "NetworkInterfaces": [
      {
        "DeviceIndex": 0,
        "Groups": ["sg-1234567890"],
        "SubnetId": "subnet-1234567890",
        "AssociatePublicIpAddress": false,
        "DeleteOnTermination": true
      }
    ],
    "IamInstanceProfile": {"Arn": "arn:aws:iam::1234567890:instance-profile/SPOT_ECR_ROLE"}
  }'

Next, we need a role that Spot Fleet uses to launch, tag, and terminate Spot Instances; Terraform passes it to the Spot Fleet request.
When creating the role, select “Use cases for other AWS services: EC2” for the AWS service, and then select “EC2 Spot Fleet Role” from the dropdown.

Then attach the policy AmazonEC2SpotFleetTaggingRole.

Create the role with the name “Spot-fleet-tag-role”.
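
For reference, here is a rough CLI equivalent of those console steps. It is a sketch, not something from the repo: the trust_policy helper is hypothetical, and the script only writes the trust policy file and prints the two commands so you can review them before running anything.

```shell
#!/usr/bin/env bash
# Sketch: CLI equivalent of the console steps for the Spot Fleet role.
ROLE_NAME="Spot-fleet-tag-role"

# Print a trust policy that lets the given AWS service assume the role.
trust_policy() {
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "$1" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

POLICY_FILE="$(mktemp)"
trust_policy "spotfleet.amazonaws.com" > "$POLICY_FILE"

# The commands are only printed here; run them by hand once they look right.
cat <<EOF
aws iam create-role --role-name $ROLE_NAME --assume-role-policy-document file://$POLICY_FILE
aws iam attach-role-policy --role-name $ROLE_NAME --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2SpotFleetTaggingRole
EOF
```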

Now create a folder named “terraform” in the Jenkins home directory, create another directory inside it named after your app (“app-name”), and copy the Terraform code from the GitHub repo linked above into it. I have kept the Terraform code simple, with only a few variables, so that beginners can follow it.

main.tf

The “iam_fleet_role” parameter specifies the ARN of the IAM role that grants Spot Fleet permission to launch, tag, and terminate instances. The “allocation_strategy” parameter sets how the fleet chooses among Spot capacity pools; here it is used to pick the cheapest Spot Instances. The “wait_for_fulfillment” parameter indicates whether Terraform should wait until the Spot Fleet request has been fulfilled before moving on. The “terminate_instances_on_delete” parameter indicates whether the instances launched by this Spot Fleet request should be terminated when the request is deleted.

The “launch_template_config” block specifies the launch template to use when launching instances. The “id” parameter specifies the ID of the launch template, and the “version” parameter specifies the version of the launch template to use.
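
To make the parameters above concrete, the snippet below writes a minimal main.tf into a scratch directory. Treat it as a sketch under assumptions, not the file from the repo: the resource name "build", the target capacity of 1, the lowestPrice strategy, and all var.* names are my own placeholders, and the variables are expected to be declared in variables.tf.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal main.tf (hypothetical content, not the author's file).
# TF_DIR defaults to a scratch directory so nothing existing is overwritten.
TF_DIR="${TF_DIR:-$(mktemp -d)}"

# Quoted heredoc: the Terraform text is written exactly as shown.
cat > "$TF_DIR/main.tf" <<'EOF'
provider "aws" {
  region = var.region
}

resource "aws_spot_fleet_request" "build" {
  iam_fleet_role                = var.iam_fleet_role   # ARN of the Spot Fleet role
  allocation_strategy           = "lowestPrice"        # prefer the cheapest pool
  target_capacity               = 1                    # one instance per build
  wait_for_fulfillment          = true                 # block until it is launched
  terminate_instances_on_delete = "true"               # destroy also kills the instance

  launch_template_config {
    launch_template_specification {
      id      = var.launch_template_id
      version = var.launch_template_version
    }
  }
}
EOF

echo "Wrote $TF_DIR/main.tf"
```

Waiting for fulfillment matters here because the Jenkins job looks up the instance's IP right after terraform apply returns.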

variables.tf
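
A variables.tf for this setup might look roughly like the following. Every variable name and default here is an assumption for illustration (the account ID and role name reuse the placeholders from this post, and lt-1234567890 is a made-up launch template ID), so replace them with your own values.

```shell
#!/usr/bin/env bash
# Sketch: write a small variables.tf (hypothetical names and placeholder values).
TF_DIR="${TF_DIR:-$(mktemp -d)}"

cat > "$TF_DIR/variables.tf" <<'EOF'
variable "region" {
  type    = string
  default = "ap-south-1"
}

variable "iam_fleet_role" {
  description = "ARN of the role created for Spot Fleet"
  type        = string
  default     = "arn:aws:iam::1234567890:role/Spot-fleet-tag-role"
}

variable "launch_template_id" {
  description = "ID of the launch template created for this app (placeholder)"
  type        = string
  default     = "lt-1234567890"
}

variable "launch_template_version" {
  description = "Launch template version to launch from"
  type        = string
  default     = "1"
}
EOF

echo "Wrote $TF_DIR/variables.tf"
```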

Now we need to set up the Jenkins job that uses this config to create Docker images. First, install the “Promoted Builds” plugin in Jenkins.
Go to Manage Jenkins → Plugin Manager → Available, search for “promoted builds”, and install it.

Next, create a freestyle Jenkins job.

Scroll to the bottom, select “add build step”, and then select “execute shell”.

Paste the contents of shell.txt from my git repo.

  1. cd /var/lib/jenkins/terraform/app-name/: This command changes the current working directory to the directory where the Terraform configuration for the application is stored.
  2. terraform init: This command initializes the Terraform configuration and downloads any necessary plugins or modules.
  3. terraform apply --auto-approve: This command applies the Terraform configuration and creates the necessary infrastructure resources in AWS without prompting for confirmation.
  4. sleep 5: This command causes the script to pause for 5 seconds.
  5. whoami: This command prints the name of the current user.
  6. echo "export BUILD_NUMBER=$BUILD_NUMBER" > bashrc: This command writes an export statement for the "BUILD_NUMBER" environment variable to a file named "bashrc", because the new spot instance doesn’t know the job’s build number to use as a tag.
  7. cat bashrc: This command displays the contents of the "bashrc" file.
  8. ip=$(aws --region ap-south-1 ec2 describe-instances --filters "Name=tag:Name,Values=app-name" "Name=instance-state-name,Values=running" --query 'Reservations[*].Instances[*].[PrivateIpAddress]' --output text): This command retrieves the private IP address of an EC2 instance in the "ap-south-1" region that has the tag "Name=app-name" and is currently running. The IP address is stored in the "ip" variable.
  9. echo $ip: This command displays the IP address of the EC2 instance.
  10. sleep 60: This command causes the script to pause for 60 seconds.
  11. scp -i /var/lib/jenkins/keys/key.pem -T -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no bashrc ubuntu@$ip:/tmp/: This command copies the "bashrc" file to the "/tmp/" directory on the EC2 instance using Secure Copy (SCP). The "-i" option specifies the path to the SSH private key (you can use the SSH keys created for git), and the "-T" option disables scp's strict filename checking. The "-o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no" options disable host key checking, which is needed because every new spot instance has a new host key.
  12. rm -rf bashrc: This command deletes the "bashrc" file from the local machine.
  13. sleep 15: This command causes the script to pause for 15 seconds.
  14. ssh -i /var/lib/jenkins/keys/key.pem -T -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no ubuntu@$ip << 'EOF': This command establishes an SSH connection to the EC2 instance using the SSH private key and runs the following commands until "EOF" is encountered:
  15. whoami: This command prints the name of the current user on the EC2 instance.
  16. cd /tmp/: This command changes the current working directory to the "/tmp/" directory on the EC2 instance.
  17. source bashrc: This command sources the "bashrc" file to set the value of the "BUILD_NUMBER" environment variable because the new spot instance doesn’t know the job’s build number to use as a tag.
  18. cd /home/ubuntu/: This command changes the current working directory to the home directory of the "ubuntu" user on the EC2 instance.
  19. docker -v: This command displays the version of Docker installed on the EC2 instance.
  20. aws --version: This command displays the version of the AWS CLI installed on the EC2 instance.
  21. git --version: This command displays the version of Git installed on the EC2 instance.
  22. sudo aws ecr get-login-password --region ap-south-1 | sudo docker login --username AWS --password-stdin 1234567890.dkr.ecr.ap-south-1.amazonaws.com: Uses the AWS CLI to retrieve an authentication token for logging into the Amazon Elastic Container Registry (ECR) and logs in to ECR using Docker.
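  23. git clone -o StrictHostKeyChecking=no --branch master git@bitbucket.org:your-repo/app.git: Clones the master branch of the application's Git repository. Note that git clone's "-o" option sets the name of the remote rather than passing an option to SSH, so the clone relies on the known_hosts entry baked into the AMI to avoid the host key prompt.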
  24. cd app/: Changes the current directory to the root directory of the cloned repository.
  25. sudo docker build --no-cache -t 1234567890.dkr.ecr.ap-south-1.amazonaws.com/ecr-repo-name:$BUILD_NUMBER .: Builds a Docker image for the application code (the Dockerfile should be in the repo) and tags it with the ECR repository URI and the current build number. The image stays on the spot instance until it is pushed.
  26. sudo docker push 1234567890.dkr.ecr.ap-south-1.amazonaws.com/ecr-repo-name:$BUILD_NUMBER: Pushes the Docker image to the ECR repository, using the permissions of the instance profile we associated with the launch template.
  27. exit: Exits the SSH session.
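
The fixed sleep calls in the steps above work, but they either wait too long or not long enough depending on how quickly the instance boots. One possible alternative, which is not what shell.txt does, is to poll until a command succeeds. Below is a minimal sketch of such a helper; the names retry and flaky are my own, and flaky is only a stand-in that succeeds on its third call so the snippet can run anywhere.

```shell
#!/usr/bin/env bash
# retry <attempts> <delay_seconds> <command...>
# Runs the command until it succeeds or the attempts run out.
retry() {
  local attempts="$1" delay="$2" n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Demo with a stand-in command that only succeeds on its third call.
COUNT_FILE="$(mktemp)"
echo 0 > "$COUNT_FILE"
flaky() {
  local calls
  calls=$(( $(cat "$COUNT_FILE") + 1 ))
  echo "$calls" > "$COUNT_FILE"
  [ "$calls" -ge 3 ]
}

retry 5 0 flaky && echo "succeeded after $(cat "$COUNT_FILE") calls"
```

In the job, the same helper could wrap a cheap SSH probe in place of sleep 60, for example: retry 12 5 ssh -i /var/lib/jenkins/keys/key.pem -o ConnectTimeout=5 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no ubuntu@$ip true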

You can also create another execute shell to upgrade helm charts with the updated image tag.

The next step is to create a promotion status that destroys the spot instance after the image is built and pushed. I am using a promotion because, if the build fails, I need a way to terminate the spot instance without logging in to the Jenkins server. When the build fails, the job is promoted automatically and the spot instance is terminated.

Scroll up and select “Promote builds when” to create a promotion status.
Name it “terraform destroy”, set it to visible, and pick an icon of your choice. Also select the options that promote the build once it completes, including when the build fails.

In the same promotion, select “add build step” below, then “execute shell”, and paste the shell script that runs terraform destroy.
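
The promotion's shell step only needs to run terraform destroy in the same directory that the build used. The following is a sketch of what it could look like, assuming the directory from the build script; the destroy_spot function and the DRY_RUN switch are my own additions so the snippet is safe to try outside Jenkins.

```shell
#!/usr/bin/env bash
# Sketch of the promotion's shell step (assumed content, not copied from the repo).
TF_DIR="/var/lib/jenkins/terraform/app-name"
DRY_RUN="${DRY_RUN:-1}"   # set DRY_RUN=0 in the Jenkins step to really destroy

destroy_spot() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: terraform destroy --auto-approve (in $TF_DIR)"
    return 0
  fi
  # Destroying the Spot Fleet request also terminates its instance,
  # provided terminate_instances_on_delete is enabled in main.tf.
  cd "$TF_DIR" && terraform destroy --auto-approve
}

destroy_spot
```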

This destroys the spot instance once the build has finished, whether it succeeded or failed. You can also add Poll SCM to start a build whenever code is pushed. I created this job and config specifically to build Docker images and upgrade Helm charts, but you can also set up rollbacks and many more promotion statuses.

With this config, Jenkins can run on an even smaller instance while the heavy work happens on spot instances, which reduces cost and keeps Jenkins stable. To cover multiple apps, create one launch template and one Terraform folder per app; the Name tag must differ between templates so that each instance's IP can be retrieved with the CLI. I have also written a Jenkins Job DSL that creates similar jobs with a click and automates the creation of launch templates, Terraform code, and Helm charts to make this more seamless (I’ll share this in the next blog post).

The next step is to build these jobs and let Spot instances work hard.

You can clone the code from my GitHub.
I am open to further discussions on automation. You can follow me on LinkedIn and Medium, where I talk about DevOps, learnings, and leadership.

Abhishek Yadav

DevOps Engineer with hands-on experience and skills in deployment and automation of cloud infrastructure. Worked with startups and leading organizations.