About The Client
Agentz works in the machine learning (ML) and artificial intelligence (AI) space. It provides trained, AI-powered digital assistants (bots) that work over voice and text.
The Business Challenge
Agentz was already on the cloud, but it had to provision a separate environment for each customer manually. Manually provisioning and maintaining the infrastructure was time-consuming and prone to human error.
Because Agentz works in the AI and ML space, it processes large amounts of data. To replace its expensive existing infrastructure, it wanted cost-effective, fault-tolerant, high-performance compute and high-performing databases. Agentz also wanted to provision its cloud environments on demand through automation.
How CloudHedge Helped
CloudHedge’s team of developers and certified solution architects charted a plan based on Agentz's requirements. The plan for automated cloud infrastructure management was divided into the following steps:
As a first step, we analyzed the customer’s infrastructure and suggested a few changes related to the security and stability of the environment. To scale further, CloudHedge proposed using AWS, so that Agentz would get full value from the money invested in the infrastructure.
In the second step, we wrote automation to provision and delete an environment on demand. The environment included multiple infrastructure components along with several AWS services, such as Amazon Cognito and Amazon MQ. At this stage, running the automation jobs required more than 85 input values from the user.
In the third step, we wrote another layer of automation that accepts a single config file for the environment. The automation validates the file before provisioning anything; if some inputs are missing, it tries to pick the values either from a predefined set or by querying AWS. To save time, it provisions services in parallel wherever possible.
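The validate-then-fill behavior described in this step can be sketched as follows. This is a minimal illustration, not CloudHedge's actual code: the input names, the default values, the `resolve_config` function, and the `lookup` callback (standing in for a query against AWS) are all hypothetical, and trying defaults before the lookup is an assumed order.

```python
from typing import Callable, Dict, Optional

# Hypothetical input names and defaults; the real automation took 85+ inputs.
REQUIRED_KEYS = ["env_name", "region", "vpc_cidr", "db_instance_class"]
DEFAULTS = {"region": "us-east-1", "db_instance_class": "db.t3.medium"}


def resolve_config(
    user_config: Dict[str, str],
    lookup: Optional[Callable[[str], Optional[str]]] = None,
) -> Dict[str, str]:
    """Return a complete config, or raise before anything is provisioned."""
    resolved = dict(user_config)
    missing = []
    for key in REQUIRED_KEYS:
        if resolved.get(key):
            continue  # the user supplied this input
        if key in DEFAULTS:
            resolved[key] = DEFAULTS[key]  # fall back to the predefined set
            continue
        # Assumed stand-in for querying AWS for a value.
        value = lookup(key) if lookup else None
        if value is not None:
            resolved[key] = value
        else:
            missing.append(key)
    if missing:
        raise ValueError("missing inputs: " + ", ".join(missing))
    return resolved


if __name__ == "__main__":
    config = resolve_config(
        {"env_name": "customer-a"},
        lookup=lambda key: "10.0.0.0/16" if key == "vpc_cidr" else None,
    )
    print(config)
```

Failing with a single error that lists every unresolved input, rather than stopping at the first one, lets the user fix the config file in one pass.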
In the fourth step, we added monitoring around the infrastructure, which was also provisioned through automation. To simplify monitoring and provide clear analytics on the infrastructure, we implemented Amazon CloudWatch. The team also created VPN endpoints to reduce the overall cost of the infrastructure.
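As an illustration of monitoring provisioned through automation, the sketch below builds the parameters for one CloudWatch alarm. The dictionary keys are the parameter names of the CloudWatch PutMetricAlarm API; everything else (the `cpu_alarm_params` helper, the choice of a CPU alarm, the threshold, the naming scheme, and sending notifications to an SNS topic) is an assumption made for the example, not a description of the alarms CloudHedge configured.

```python
from typing import Any, Dict


def cpu_alarm_params(
    env_name: str,
    instance_id: str,
    sns_topic_arn: str,
    threshold_percent: float = 80.0,
) -> Dict[str, Any]:
    """Build keyword arguments for a high-CPU alarm on one EC2 instance."""
    return {
        "AlarmName": f"{env_name}-{instance_id}-high-cpu",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,            # seconds per evaluation window
        "EvaluationPeriods": 2,   # alarm after two consecutive breaches
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # notify via SNS (assumed setup)
    }
```

With boto3 installed and credentials configured, these parameters could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`. Keeping the parameter construction separate from the API call makes it easy to generate one alarm per instance in each new environment.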
The team used a combination of tools to automate the infrastructure: Terraform, CloudFormation, Python, and Groovy.
The environment architected by CloudHedge includes the following components and AWS services:
- Networking (VPC, Subnets, Internet gateways, NAT gateways)
- Security groups
- Database (RDS for MySQL)
- Amazon MQ
- Cognito
- ElastiCache
- Elasticsearch
- Kafka server (EC2)
- Spring Boot servers (EC2)
- Model servers (EC2 in autoscaling)
- ECS
- S3 Buckets
- Route53
- CloudWatch and SNS
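Provisioning components like these in parallel wherever possible, as described in the third step, could be sketched with a dependency graph. This is a minimal sketch under stated assumptions: the `DEPENDS_ON` map is a hypothetical subset of the list above with invented dependencies, and `provision` is a placeholder callback for whatever actually creates a component (for example, a Terraform or CloudFormation run). It uses Python's standard `graphlib` module (Python 3.9+).

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Hypothetical dependency map: component -> components it must wait for.
DEPENDS_ON: Dict[str, Set[str]] = {
    "networking": set(),
    "s3_buckets": set(),
    "security_groups": {"networking"},
    "rds_mysql": {"security_groups"},
    "amazon_mq": {"security_groups"},
    "elasticache": {"security_groups"},
    "model_servers": {"rds_mysql", "s3_buckets"},
}


def provision_all(
    depends_on: Dict[str, Set[str]],
    provision: Callable[[str], None],
    max_workers: int = 4,
) -> List[List[str]]:
    """Provision components wave by wave; return the waves that ran."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises CycleError on circular dependencies
    waves: List[List[str]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())
            # Components in one wave do not depend on each other,
            # so they can be provisioned in parallel.
            list(pool.map(provision, ready))  # list() re-raises failures
            sorter.done(*ready)
            waves.append(ready)
    return waves


if __name__ == "__main__":
    for wave in provision_all(DEPENDS_ON, lambda name: None):
        print(wave)
```

Waiting for a whole wave to finish before starting the next is the simplest approach; a finer-grained scheduler could start each component as soon as its own dependencies are done.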
Benefits
- The automation helped Agentz provision infrastructure on demand without the risk of human error, which also saved manpower.
- The automation deployed services in parallel wherever possible, which reduced the time needed to provision infrastructure.
- Amazon CloudWatch monitors the environment efficiently and helps keep the system running error-free.
- Because environments were provisioned through automation on demand, Agentz paid only for actual usage, which brought major cost savings.