- 28th Jan, 2021
- Farhan S.
26th Feb, 2024 | Shakirali V.
In today's fast-paced world, machine learning models need to be deployed quickly, efficiently, and at scale. Serverless architectures offer a compelling solution, enabling you to deploy models without managing servers or infrastructure.
This blog post will guide you through leveraging AWS ECR and SageMaker to achieve serverless deployment for your machine learning models.
Serverless: A paradigm where you pay only for the resources your code consumes, eliminating server management and scaling concerns.
AWS ECR (Elastic Container Registry): A managed container registry for storing and managing Docker images used in your AWS deployments.
AWS SageMaker: A fully managed service for building, training, and deploying machine learning models.
1. Prepare Inference Code
Ensure that your inference code produces the same final results your model was trained to produce.
Consider adopting a standardised format such as JSON for clarity and compatibility across different systems.
If required, manage preprocessing and postprocessing tasks within the inference code to ensure accurate and seamless operation.
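As a rough illustration of the points above, the sketch below chains preprocessing, prediction, and postprocessing and returns JSON. The request schema (a "features" list), the function names, and the stand-in "model" that simply averages its inputs are all assumptions made for this example; a real model would replace predict().

```python
import json


def preprocess(raw_body: str) -> list[float]:
    """Parse the JSON request body and return the feature vector."""
    payload = json.loads(raw_body)
    features = payload["features"]  # hypothetical request schema
    return [float(x) for x in features]


def predict(features: list[float]) -> float:
    """Stand-in for a real model: returns the mean of the features."""
    return sum(features) / len(features)


def postprocess(score: float) -> str:
    """Wrap the raw model output in a JSON response string."""
    return json.dumps({"prediction": round(score, 4)})


def run_inference(raw_body: str) -> str:
    """Full pipeline: preprocess, then predict, then postprocess."""
    return postprocess(predict(preprocess(raw_body)))
```

Keeping the three stages as separate functions makes it easier to test each one and to reuse the same pipeline behind any web framework.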
2. Build Flask APIs
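/ping Endpoint: SageMaker sends GET requests here as a health check; respond with HTTP 200 once the container is ready to serve.
/invocations Endpoint: SageMaker forwards inference requests here as POST requests; run the model and return the prediction in the response.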
3. Combine Code and APIs
Consolidating these components in a single file is simple, but separating the Flask API logic from the inference code makes the project easier to maintain and scale.
Consider separating them into distinct modules for improved organisation. Aim to keep both the Flask app and inference code as streamlined as possible.
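The steps above can be sketched as a small Flask app. SageMaker's container contract expects a GET /ping health check and a POST /invocations route; everything else here (the run_inference placeholder, the "features" request field, the error message) is a hypothetical stand-in. In a real project run_inference would live in its own module and be imported.

```python
import json

from flask import Flask, Response, request

app = Flask(__name__)


def run_inference(payload: dict) -> dict:
    """Placeholder for the real model call; kept apart from the routes."""
    features = payload.get("features", [])  # hypothetical request field
    return {"prediction": sum(features)}


@app.route("/ping", methods=["GET"])
def ping() -> Response:
    # Health check: an HTTP 200 tells SageMaker the container is ready.
    return Response(response="\n", status=200, mimetype="application/json")


@app.route("/invocations", methods=["POST"])
def invocations() -> Response:
    # Inference requests arrive here as POST bodies.
    payload = request.get_json(force=True, silent=True)
    if not isinstance(payload, dict):
        body = json.dumps({"error": "expected a JSON object"})
        return Response(body, status=400, mimetype="application/json")
    result = run_inference(payload)
    return Response(json.dumps(result), status=200, mimetype="application/json")
```

For a local try-out you could start the app with app.run(host="0.0.0.0", port=8080), since SageMaker sends its requests to port 8080 inside the container. A production image would normally put a WSGI server such as gunicorn in front of the app.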
The AWS Command Line Interface (CLI) is a powerful tool for managing AWS resources from the terminal. Follow these steps to set it up:
Note: Run the commands below in a bash terminal.
1. Check if AWS CLI is installed:
aws --version
If the command is not found, the AWS CLI is not installed.
2. Install AWS CLI:
Follow the official AWS CLI installation instructions for your OS.
3. Set up a local user:
4. Configure AWS CLI:
aws configure
Enter the access key, secret access key, and preferred region. Skip the default output format for now.
5. Verify the configuration:
aws configure list
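If you want to script the installation check from step 1, a minimal Python sketch could look like the following. The function name is made up for this example, and it only wraps the same aws --version command.

```python
import shutil
import subprocess
from typing import Optional


def aws_cli_version(executable: str = "aws") -> Optional[str]:
    """Return the output of `aws --version`, or None if the CLI is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some older CLI releases print the version to stderr instead of stdout.
    return (result.stdout or result.stderr).strip()
```

Calling aws_cli_version() returns None when the CLI is missing, which is a convenient signal to stop a deployment script early.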
After successfully building a local Docker image and testing it in a container, the next step is to publish the image from your local Docker repository to Amazon Elastic Container Registry (ECR).
You may also come across ECS in this context, so let me explain how the two differ.
ECS, or Amazon Elastic Container Service, is a fully managed container orchestration service that simplifies the deployment, management, and scaling of containerized applications.
Elastic Container Registry (ECR), by contrast, is the managed registry that stores Docker images; services such as ECS and SageMaker pull their images from it.
In our workflow, ECS is not required: we push the Docker image to ECR with the Docker CLI, authenticated through the AWS CLI, and SageMaker pulls the image from ECR at deployment time.
1. Create a repository in ECR:
aws ecr create-repository --repository-name <repo_name> --region <region_name>
Verify repository creation in the AWS console.
2. Push Docker image to ECR:
aws ecr get-login-password --region <region_name> | docker login -u AWS --password-stdin <repo_uri>
Replace <region_name> and <repo_uri> accordingly.
docker tag <source_image_tag> <target_ecr_repo_uri>
docker push <ecr_repo_uri>
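The placeholders in these commands follow a fixed pattern: an ECR registry host has the form <account_id>.dkr.ecr.<region>.amazonaws.com, and an image URI appends the repository name and tag. The sketch below only assembles the command strings so the pieces are easy to see; the helper names are invented for this example, and the account ID in the usage note is a placeholder.

```python
def ecr_registry(account_id: str, region: str) -> str:
    """Registry host that `docker login` authenticates against."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def ecr_image_uri(account_id: str, region: str, repo_name: str, tag: str = "latest") -> str:
    """Full image URI used for `docker tag` and `docker push`."""
    return f"{ecr_registry(account_id, region)}/{repo_name}:{tag}"


def push_commands(
    local_image: str, account_id: str, region: str, repo_name: str, tag: str = "latest"
) -> list[str]:
    """Return the login, tag, and push commands in the order they should run."""
    registry = ecr_registry(account_id, region)
    target = ecr_image_uri(account_id, region, repo_name, tag)
    return [
        f"aws ecr get-login-password --region {region} | "
        f"docker login -u AWS --password-stdin {registry}",
        f"docker tag {local_image} {target}",
        f"docker push {target}",
    ]
```

For example, push_commands("my-model:latest", "123456789012", "us-east-1", "my-model") yields three commands that tag and push to 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-model:latest.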
Navigate to SageMaker -> Inference -> Models
Creating a SageMaker model registers your Docker image, together with any model artifacts, for deployment on SageMaker.
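The same step can be done programmatically. The sketch below assumes you pass in a boto3 SageMaker client (boto3.client("sagemaker")), the ECR image URI you pushed earlier, and the ARN of an IAM execution role that SageMaker can assume; the function name is hypothetical, while create_model and its parameters come from the boto3 SageMaker client.

```python
from typing import Any


def create_sagemaker_model(client: Any, model_name: str, image_uri: str, role_arn: str) -> dict:
    """Register a container image as a SageMaker model.

    `client` is expected to be a boto3 SageMaker client.
    """
    return client.create_model(
        ModelName=model_name,
        # The container SageMaker will pull from ECR at deployment time.
        PrimaryContainer={"Image": image_uri},
        # Role SageMaker assumes to pull the image and write logs.
        ExecutionRoleArn=role_arn,
    )
```

Passing the client in as an argument keeps the function easy to test and lets the caller decide which region and credentials to use.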
Navigate to SageMaker -> Inference -> Endpoint configuration
Create an endpoint configuration by specifying the parameters that describe how the model is hosted. Here, it is crucial to indicate the desired endpoint type: 'serverless' or 'provisioned'.
For a serverless endpoint, you also need to define the 'Memory size' and 'Max Concurrency', the latter being the maximum number of concurrent invocations for a single endpoint.
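As a sketch of what the console form corresponds to, the function below builds the request for the boto3 SageMaker client's create_endpoint_config call with a ServerlessConfig block. The function name, the variant name "AllTraffic", and the default values are choices made for this example; the memory sizes reflect the 1 GB steps from 1024 MB to 6144 MB that serverless endpoints accept.

```python
ALLOWED_MEMORY_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def serverless_endpoint_config(
    config_name: str, model_name: str, memory_mb: int = 2048, max_concurrency: int = 5
) -> dict:
    """Build keyword arguments for create_endpoint_config (serverless variant)."""
    if memory_mb not in ALLOWED_MEMORY_MB:
        raise ValueError(f"memory_mb must be one of {ALLOWED_MEMORY_MB}")
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",  # example variant name
                "ModelName": model_name,
                "ServerlessConfig": {
                    "MemorySizeInMB": memory_mb,
                    "MaxConcurrency": max_concurrency,
                },
            }
        ],
    }
```

The resulting dictionary can be passed as client.create_endpoint_config(**request), where client is a boto3 SageMaker client. A provisioned configuration would replace ServerlessConfig with an instance type and instance count.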
Navigate to SageMaker -> Inference -> Endpoint
Endpoint configurations in SageMaker allow you to create an endpoint. SageMaker generates an HTTPS URL, enabling you to invoke your endpoint seamlessly through client applications.
Client applications do this with the SageMaker runtime client by making an 'invoke_endpoint' request.
Once the endpoint is successfully deployed, its status changes to 'InService'.
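Creating the endpoint and waiting for that status can also be scripted. The sketch below assumes a boto3 SageMaker client is passed in; the function name, polling interval, and poll limit are arbitrary choices for this example, while create_endpoint, describe_endpoint, and the EndpointStatus field belong to the boto3 SageMaker client.

```python
import time
from typing import Any


def deploy_endpoint(
    client: Any,
    endpoint_name: str,
    config_name: str,
    poll_seconds: float = 30.0,
    max_polls: int = 40,
) -> str:
    """Create an endpoint and poll until it is InService or Failed."""
    client.create_endpoint(EndpointName=endpoint_name, EndpointConfigName=config_name)
    for _ in range(max_polls):
        status = client.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
        if status in ("InService", "Failed"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"{endpoint_name} did not reach a final status in time")
```

The function returns the final status so the caller can decide how to react to a failed deployment.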
The endpoint details page lists the API URL. You can test the URL with Postman or integrate it with your existing services; note that requests to the endpoint must be signed with your AWS credentials (Signature Version 4).
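From Python, the simplest way to call the endpoint is usually the boto3 runtime client, which handles request signing for you. The sketch below assumes a client created with boto3.client("sagemaker-runtime") and a container that accepts and returns JSON; the function name and payload shape are hypothetical.

```python
import json
from typing import Any


def invoke(runtime_client: Any, endpoint_name: str, payload: dict) -> Any:
    """Send a JSON payload to a SageMaker endpoint and decode the JSON reply."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps(payload),
    )
    # The response body is a stream; read it fully before decoding.
    return json.loads(response["Body"].read())
```

A call such as invoke(client, "my-endpoint", {"features": [1, 2, 3]}) reaches the container's /invocations route and returns whatever JSON that route produced.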
Here are some important points to remember to ensure smooth serverless deployment without encountering issues:
In conclusion, the decision between serverless and provisioned services when deploying machine learning models with AWS SageMaker and ECR hinges on various factors, each with its own set of advantages and considerations.
Serverless deployment using AWS SageMaker is an excellent choice for scenarios where flexibility, scalability, and cost efficiency are top priorities. This approach allows for automatic scaling, enabling you to handle varying workloads without manual intervention.
Serverless architectures are particularly beneficial for applications with sporadic or unpredictable usage patterns, as you only pay for the resources consumed during execution.
When rapid development, ease of management, and the ability to offload infrastructure concerns are paramount, serverless is often the preferred option.
It's well-suited for scenarios where the development team wants to focus primarily on the machine learning model's logic and less on infrastructure provisioning and management.
On the other hand, provisioned services are better suited for situations where predictability, fine-tuned control over resources, and specific performance requirements are crucial.
Provisioned services provide the necessary stability and control when dealing with steady, predictable workloads or when low-latency performance is a priority.
If your application demands dedicated resources, and you have specific infrastructure requirements that are known and consistent, provisioned services may be the more appropriate choice.
This approach allows for the customization of instances and infrastructure settings to meet the exact needs of your machine learning workload.
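One way to make this trade-off concrete is a back-of-the-envelope break-even estimate. The sketch below is a simplification that assumes serverless cost scales with compute seconds only and that a provisioned instance has a flat monthly cost; the function names are invented, and any prices you plug in are your own figures, not AWS list prices.

```python
def serverless_monthly_cost(
    requests: int, seconds_per_request: float, price_per_second: float
) -> float:
    """Pay-per-use cost: you are billed only while requests are being served."""
    return requests * seconds_per_request * price_per_second


def break_even_requests(
    instance_monthly_cost: float, seconds_per_request: float, price_per_second: float
) -> float:
    """Monthly request volume at which serverless costs as much as one instance."""
    return instance_monthly_cost / (seconds_per_request * price_per_second)
```

Below the break-even volume, pay-per-use is cheaper under these assumptions; above it, a provisioned instance is, provided a single instance can handle the load.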
In summary, the decision to go serverless or provisioned depends on the specific requirements and priorities of your machine learning deployment.
Serverless is advantageous for its flexibility and cost-effectiveness, while provisioned services offer fine-grained control and stability for workloads with well-defined characteristics.
Careful consideration of these factors will guide you toward the deployment strategy that aligns best with your project's goals.