How to scale up credit model APIs using AWS

The last post explained how to improve performance for a credit model built in R. Specifically, it showed that running the model in multiple Docker containers cut latency by about two-thirds. This post explores ways to build a scalable and fault-tolerant infrastructure for machine learning model APIs using Amazon Web Services (AWS).

Our last post talked about scaling up APIs for R credit models using Docker. What was special about the example API was that it ran on Plumber, a native R package. That was a huge win because the R model did not have to be translated into another language. Another big advantage was scalability: with Docker, the API handled concurrent requests much faster (the average response time improved from 0.22 seconds to 0.08 seconds).

This post will further scale the R credit model using AWS. AWS has multiple regions where we can host our credit model. Because we are based in Chicago, we will use the Ohio region (us-east-2), which is the closest region to Chicago. As our baseline, we will first measure speed with one server running one container. Then, we will increase the number of containers on that server to five. Lastly, we will increase the number of servers to five, each running five containers (25 Docker containers in total, woo-hoo!).

1. Baseline: One Server, One Docker Container

Let us first create an EC2 instance (a server in the AWS cloud) in the Ohio region.

$ docker-machine create \
     --driver amazonec2 \
     --amazonec2-access-key xxx \
     --amazonec2-secret-key xxx \
     --amazonec2-region us-east-2 \
     --amazonec2-instance-type t2.small \
     PlumberAppServer1

[Figure: docker-machine create]

Then, let us deploy our credit model to the EC2 instance by following steps similar to those in the last post (that is, docker build and docker run).

$ eval "$(docker-machine env PlumberAppServer1)" && docker build -t knowru/plumber_example

[Figure: docker build in a remote host]

$ eval "$(docker-machine env PlumberAppServer1)" && docker run -p 8000:8000 -d knowru/plumber_example

[Figure: docker run in a remote host]

Let us check that it works.

$ curl --data "@data.json"  
# {"default.probability":0.3058}
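For a scripted smoke test, the probability can be pulled out of the response and compared with the expected value. The following is a minimal sketch, not part of the original deployment: the helper name `extract_probability` is hypothetical, and it assumes the response has exactly the flat one-key JSON shape shown above.

```shell
# Hypothetical helper: reads a JSON response on stdin and prints the value
# of "default.probability". Assumes the flat one-key shape shown above;
# a real JSON parser is more robust for anything more complex.
extract_probability() {
    sed -n 's/.*"default\.probability":[[:space:]]*\([0-9.][0-9.]*\).*/\1/p'
}

# Example with the response from the post:
echo '{"default.probability":0.3058}' | extract_probability
```

Piping the output of the curl command above into `extract_probability` would print just the number, which is easier to compare in a script than the raw JSON.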

Nice. Lastly, let us check the performance of our credit model in AWS.

$ siege -H 'Content-Type:application/json' " POST < data.json" -b -c 10 -r 100

We see the following performance.

[Figure: One Server, One Container Performance]

So our baseline average response time is 0.27 seconds.
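Since the same measurement is repeated for every experiment, it can help to extract the average response time from the siege summary automatically. This is a rough sketch: the helper name `avg_response_time` is hypothetical, and it assumes the summary contains a line starting with `Response time:` followed by the value in seconds.

```shell
# Hypothetical helper: reads a siege summary on stdin and prints the number
# found on the "Response time:" line (the average response time in seconds).
avg_response_time() {
    awk -F: '/^Response time/ { gsub(/[^0-9.]/, "", $2); print $2 }'
}

# Example using the baseline figure from this post:
printf 'Response time:\t\t        0.27 secs\n' | avg_response_time
```

Appending `2>&1 | avg_response_time` to the siege command above should print only the average. The redirect is there because siege usually writes its summary to stderr.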

2. Experiment 1: One Server, Five Docker Containers

Let us start 4 more containers, each published on a different host port.

$ eval "$(docker-machine env PlumberAppServer1)" && docker run -p 8001:8000 -d knowru/plumber_example  
$ eval "$(docker-machine env PlumberAppServer1)" && docker run -p 8002:8000 -d knowru/plumber_example   
$ eval "$(docker-machine env PlumberAppServer1)" && docker run -p 8003:8000 -d knowru/plumber_example   
$ eval "$(docker-machine env PlumberAppServer1)" && docker run -p 8004:8000 -d knowru/plumber_example
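The four commands above differ only in the host port, so they can also be generated in a loop. This is a small sketch under that assumption; the function name `print_run_commands` is hypothetical, and it only prints the commands rather than running them.

```shell
# Prints one "docker run" command per extra container, mapping host ports
# FIRST..LAST to port 8000 inside each container. Nothing is executed here.
print_run_commands() {
    port=$1
    last=$2
    while [ "$port" -le "$last" ]; do
        echo "docker run -p ${port}:8000 -d knowru/plumber_example"
        port=$((port + 1))
    done
}

print_run_commands 8001 8004
```

After pointing the Docker client at the server with `eval "$(docker-machine env PlumberAppServer1)"`, piping this output into `sh` would start the same four containers.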

Let us use nginx to load-balance across the five containers (by the way, nginx is pronounced "engine X").

$ eval "$(docker-machine env PlumberAppServer1)" && git clone
$ eval "$(docker-machine env PlumberAppServer1)" && docker run -v /home/ubuntu/plumber_example:/etc/nginx/conf.d:ro -d -p 80:80 nginx
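The nginx configuration mounted from the cloned repository is not shown in this post. As a rough idea of what a configuration for five containers could look like, the sketch below prints one. Everything in it is an assumption: the upstream name `plumber_backend`, the function name `make_upstream_conf`, and the address 172.17.0.1 (the usual Docker bridge gateway on Linux, used here so the nginx container can reach the ports published on the host).

```shell
# Prints a hypothetical nginx config that spreads requests across the
# containers published on host ports FIRST..LAST. nginx uses round-robin
# for an upstream block by default.
# HOST must be an address the nginx container can reach the Docker host on;
# 172.17.0.1 is only an assumption about the default bridge network.
make_upstream_conf() {
    host=$1
    port=$2
    last=$3
    echo "upstream plumber_backend {"
    while [ "$port" -le "$last" ]; do
        echo "    server ${host}:${port};"
        port=$((port + 1))
    done
    echo "}"
    echo "server {"
    echo "    listen 80;"
    echo "    location / {"
    echo "        proxy_pass http://plumber_backend;"
    echo "    }"
    echo "}"
}

make_upstream_conf 172.17.0.1 8000 8004
```

For the stock nginx image to pick such a file up from the mounted /etc/nginx/conf.d directory, it would need a name ending in .conf.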

Now we have 5 Docker containers running in one EC2 instance. Let us check the performance of our current deployment.

$ siege -H 'Content-Type:application/json' " POST < data.json" -b -c 10 -r 100

[Figure: One Server, Five Container Performance]

Yeah! We are seeing a 33.3% improvement (the average response time dropped from 0.27 seconds to 0.18 seconds).

3. Experiment 2: Five Servers, Five Docker Containers

Lastly, let us create 4 more EC2 instances (i.e. servers) that have the same configuration. We can easily create EC2 instances with the same configuration using an AMI (Amazon Machine Image).

[Figures: Create AMI, Steps 1 to 3]

Creating an AMI (please refer to the AWS documentation for details)

Wait a few minutes until the AMI is ready. Then let us spawn 4 more servers. Because these 4 servers are based on the AMI of the server we used above, they will already have our credit model Docker image and containers available.

$ for i in {2..5}; do
    docker-machine create \
         --driver amazonec2 \
         --amazonec2-access-key xxxx \
         --amazonec2-secret-key xxxx \
         --amazonec2-region us-east-2 \
         --amazonec2-instance-type t2.small \
         --amazonec2-ami ami-c20420a7 \
         PlumberAppServer$i
  done
$ for i in {1..5}; do
    # Point the Docker client at each server and start all of its containers
    eval "$(docker-machine env PlumberAppServer$i)" && docker start $(docker ps -a -q)
  done
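Before putting the servers behind a load balancer, it can be handy to list the address of each machine, for example to test them one by one with curl. The sketch below uses the `docker-machine ip` subcommand. The function name `list_server_ips` and the `DOCKER_MACHINE` override variable are hypothetical and are there only so the loop can be tried as a dry run.

```shell
# The docker-machine binary can be overridden for a dry run, e.g.
# DOCKER_MACHINE=echo prints the arguments instead of calling the CLI.
DOCKER_MACHINE=${DOCKER_MACHINE:-docker-machine}

# Prints "name address" for PlumberAppServer1 .. PlumberAppServerN.
list_server_ips() {
    i=1
    while [ "$i" -le "$1" ]; do
        name="PlumberAppServer$i"
        echo "$name $($DOCKER_MACHINE ip "$name")"
        i=$((i + 1))
    done
}
```

Calling `list_server_ips 5` on the machine where the servers were created would print one line per server.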

Now we have 5 servers running the credit model, but they all run independently on different IP addresses. Let us create a load balancer in front of them (in AWS this is called an Elastic Load Balancer, or ELB for short). We then make requests to the load balancer, which distributes each request to one of the five servers.
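To make the idea of distributing requests concrete, here is a toy round-robin sketch. It is only an illustration of one common strategy, not a description of how the ELB is implemented; a real load balancer also takes health checks and other factors into account. The function name `pick_server` is hypothetical.

```shell
# Toy round-robin: maps request number N (0, 1, 2, ...) to one of
# SERVER_COUNT servers, numbered from 1. Purely illustrative.
pick_server() {
    echo $(( $1 % $2 + 1 ))
}

# Show where the first seven requests would go with five servers.
n=0
while [ "$n" -lt 7 ]; do
    echo "request $n -> PlumberAppServer$(pick_server "$n" 5)"
    n=$((n + 1))
done
```

With five servers, the sixth request wraps around to the first server again, so each server sees roughly one-fifth of the traffic.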

[Figure: Creating ELB]

For more information on how to create an ELB, please refer to the AWS documentation. We used the new Application Load Balancer in this experiment.

Great. Now we can make a request to the ELB instead of to an EC2 instance to see if our deployment works correctly.

$ curl --data "@data.json"  
# {"default.probability":0.3058}

We can also check using the AWS console that all servers are healthy (i.e. ready to take requests).

[Figure: Statuses of EC2 instances in the ELB]

All instances are ready to predict default likelihood!

Now let us see how much performance improvement we have gained.

$ siege -H 'Content-Type:application/json' " POST < data.json" -b -c 10 -r 100

[Figure: Five Server, Five Container Performance]

Hooray! Compared with the previous deployment (one server, five containers), the average response time has improved by 16.7% (from 0.18 seconds to 0.15 seconds).

4. Conclusion

Let us tabulate our observations.

  • Number of Concurrent Users: 10
  • Number of Requests Per User: 100
  • Total Number of Requests for Each Experiment: 1000

Number of Servers   Docker Containers Per Server   Total Docker Containers   Average Response Time (sec)
1                   1                              1                         0.27
1                   5                              5                         0.18
5                   5                              25                        0.15
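The percentages quoted in this post follow directly from the average response times in the table. A small sketch of the arithmetic (the function name `pct_improvement` is just for illustration):

```shell
# Prints the percentage drop in average response time from BEFORE to AFTER,
# i.e. (before - after) / before * 100, rounded to one decimal place.
pct_improvement() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

pct_improvement 0.27 0.18   # one server: 1 container -> 5 containers
pct_improvement 0.18 0.15   # 5 containers per server: 1 server -> 5 servers
pct_improvement 0.27 0.15   # baseline -> 25 containers
```

The first two calls reproduce the 33.3% and 16.7% figures above. The third shows that the last deployment is about 44.4% faster than the baseline.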

Effectively, we have 25 Docker containers running the credit model in our last deployment, which is quite amazing considering that R is a single-threaded language. First of all, there was no translation: the R model ran in an R environment. Furthermore, without any additional coding, we enabled the credit model to run in a parallel, scalable fashion. The outcome was a faster and (though not specifically measured here) more fault-tolerant environment for our credit model.

I hope that your journey so far has been a pleasant one. Even though we can build an API ourselves, creating, scaling and maintaining one can still be a resource-intensive and daunting task for data scientists. If you are looking for a more convenient and reliable way, please see the following resources we offer: