AWS Batch Job to run your workload in the backend (either manually or automated)
AWS Batch is used to run your jobs either on demand, on a schedule, or in response to an event.
The most common use case is data processing, such as imports and exports. A typical scenario is reading a file from S3, processing its records, and inserting them back into a database.
Now, you might say that we also have Lambda functions, which could meet these requirements.
Yes, we have Lambda, but there are some key points we should never forget when deciding to use it:
- Execution timeout of the Lambda function. The Lambda function’s hard limit is 15 minutes. If the processing we want to implement will take longer, we should think about another solution.
- Amount of memory that we can allocate. We can use a maximum of 10240 MB. This number may be too small in some scenarios.
Considering the above, if your work might take more than 15 minutes or need more memory than that (and most backend batch jobs do run longer than 15 minutes), you should go with AWS Batch.
Now, a quick overview of the options available under the AWS Batch service in the AWS console:
- Dashboard: monitor all job runs and their statuses.
- Jobs: submit a job to run; you select your job definition and a new job instance starts.
- Job definitions: define your job here, i.e. which Docker image will run and the IAM role required to perform the tasks defined in that image.
- Job queues: set up the queue configuration, such as how multiple jobs are processed according to the priority you assign. While creating a queue, you map it to a compute environment you created, so that queued jobs run in that environment.
- Compute environments: set up the environment that will run your job, such as EC2 instances or Fargate.
Keep in mind that the setup goes bottom-up: first create the "Compute environment", then "Job queues", then "Job definitions", and finally "Jobs".
One other note: with AWS Batch you run your code just like with a Lambda function, but the code runs inside a Docker image. Let's see how in the demo.
Now, it's time for the demo:
In this demo, I created a Python (.py) file that, whenever it runs, simply inserts 5 records into DynamoDB, and we will run this code through a Batch job.
Python code sample:
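The exact script is in the screenshot; a minimal sketch along the same lines (the file name, region, and attribute names are assumptions, and it expects the AWSBatchJobTestTable table used later in this demo) could look like this:

```python
import boto3

# Minimal sketch of the demo script. Assumptions: boto3 is available in the
# container, the region is us-east-1, and the table AWSBatchJobTestTable
# already exists with an "Id" partition key (created later in this post).
def main():
    dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
    table = dynamodb.Table("AWSBatchJobTestTable")

    # Insert 5 sample records into the table.
    for i in range(1, 6):
        table.put_item(Item={"Id": str(i), "Message": f"Record {i} inserted by AWS Batch job"})
        print(f"Inserted record {i}")

if __name__ == "__main__":
    main()
```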
I then created a Dockerfile and provided the command to run this Python file.
Just to note, so that you do not face any Docker command execution errors: both of the above files are in the same folder, named "aws-batch-job-demo1", on my system:
Now, open CMD to build the Docker image and push it to Docker Hub.
Before this, make sure you have Docker installed and running on your system so that you can run Docker commands, and that you have a Docker Hub account with a repository, so that we can push the built image from local to Docker Hub.
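The Dockerfile content is shown as a screenshot; a minimal sketch (the base image, the boto3 install, and the script name insert_records.py are assumptions) would be:

```dockerfile
# Minimal Dockerfile sketch: install boto3 and run the Python script on start.
FROM python:3.9-slim

WORKDIR /app
RUN pip install --no-cache-dir boto3

COPY insert_records.py .

CMD ["python", "insert_records.py"]
```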
You can download Docker Desktop for Windows from here: https://docs.docker.com/docker-for-windows/install/
Docker hub url: https://hub.docker.com/
Use command "cd F:\POCs" to move into docker file location.
Run "docker build -t aws-batch-job-demo1 ." command to build docker file. Here "aws-batch-job-demo1" is my docker image name, I am creating.
Run "docker images" command to see all the build images available on your local docker.
Run "docker login --username <your-docker-hub-username-for-login>" command to login into docker hub, once you enter it will ask you to provide password.
Run "docker tag 1db2ea5632f6 sarajeevraj/aws-batch-job-demo1:latest" command to create tag for your docker image. Here yellow bold highlighted one is the docker image id which you can pick from the "docker images" command output and here "sarajeevraj" is my docker hub repository and "aws-batch-job-demo1:latest" is the docker image with ":latest" is tag name.
Run "docker push sarajeevraj/aws-batch-job-demo1" command to push image to docker hub. And on successful push, your image will be available to your docker hub repository, refer below screenshot:
Now, log in to your AWS console, go to the AWS Batch service, and get started:
Create a "Compute environment"; this is where I set up the instance that will run my job. For this demo I set up an EC2 compute environment with the default VPC and gateway:
While you create this compute environment, it either creates a new IAM role itself, or you can create one yourself and select it here. By default this role is created with all the ECS access permissions, but you will probably need to add other permissions depending on what your job code does; in my case I was writing data to DynamoDB, so the role for this instance also needed the DynamoDB PutItem permission.
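The demo does this through the console, but the same step can also be scripted; a rough boto3 sketch (every name, subnet/security-group ID, and ARN below is a placeholder) would look like:

```python
import boto3

batch = boto3.client("batch", region_name="us-east-1")

# Managed EC2 compute environment; all IDs and ARNs below are placeholders.
batch.create_compute_environment(
    computeEnvironmentName="aws-batch-job-demo1-env",
    type="MANAGED",
    state="ENABLED",
    computeResources={
        "type": "EC2",
        "minvCpus": 0,
        "maxvCpus": 4,
        "instanceTypes": ["optimal"],
        "subnets": ["subnet-xxxxxxxx"],        # subnets of the default VPC
        "securityGroupIds": ["sg-xxxxxxxx"],
        "instanceRole": "arn:aws:iam::123456789012:instance-profile/ecsInstanceRole",
    },
    serviceRole="arn:aws:iam::123456789012:role/AWSBatchServiceRole",
)
```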
Now create "Job queues" and provide the priority of the job and also map the compute instance you created above:
Now create "Job definition" and here mainly you have specifying your docker file and command for your docker file which will run your job start. Here with container image property, I am providing my docker hub image i.e. "my-docker-hub-repository-name/my-docker-image". And with command I am providing the same command which I have with my docker file, so here it is not mandatory but at least you have to provide the language source text i.e. "python".
While you create the job definition, it will ask for an IAM role, and that role must be of the ECS task type. Also make sure it has all the other permissions the job needs to perform its work, for example CloudWatch Logs write permission and DynamoDB PutItem permission; attach all of them via a policy and use that role here.
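For illustration, an inline policy covering those two permissions could be attached to the role like this (the role and policy names are the ones assumed in the earlier sketches):

```python
import json
import boto3

iam = boto3.client("iam")

# Inline policy granting CloudWatch Logs writes and PutItem on the demo table.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
            "Resource": "*",
        },
        {
            "Effect": "Allow",
            "Action": ["dynamodb:PutItem"],
            "Resource": "arn:aws:dynamodb:*:*:table/AWSBatchJobTestTable",
        },
    ],
}

iam.put_role_policy(
    RoleName="aws-batch-job-demo1-role",      # assumed ECS task role name
    PolicyName="aws-batch-job-demo1-policy",
    PolicyDocument=json.dumps(policy),
)
```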
Now create a job; basically, here I am submitting / manually triggering my job to run. While creating it, select the job definition and job queue you created above:
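The same manual trigger can also be done programmatically; a small boto3 sketch (names follow the earlier sketches) looks like:

```python
import boto3

batch = boto3.client("batch", region_name="us-east-1")

# Equivalent of clicking "Submit new job" in the console.
response = batch.submit_job(
    jobName="aws-batch-job-demo1-run",
    jobQueue="aws-batch-job-demo1-queue",
    jobDefinition="aws-batch-job-demo1-def",
)
print("Submitted job:", response["jobId"])
```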
Now you can go to the dashboard and track the progress from there. The count shown for each status is clickable and redirects you to the corresponding page for details; for example, in case of failure, you can click on the failed count and it will take you to the job page, where you can see the status reason along with the CloudWatch log data:
After the job succeeds, you can see the data that was inserted into my DynamoDB table:
Note: I had created my DynamoDB table "AWSBatchJobTestTable" with the Id field in advance, before this job run, so the job just inserted records into the existing table:
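For reference, that one-time table setup could be done with a call like this (on-demand billing and the region are assumptions):

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# One-time setup done before running the Batch job.
dynamodb.create_table(
    TableName="AWSBatchJobTestTable",
    AttributeDefinitions=[{"AttributeName": "Id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "Id", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",
)
```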
Categories/Tags: AWS Batch~AWS Batch Job