
To do or not to do: a dilemma between Cloud and On-Premise in corporations

Cloud computing and storage have become an integral part of modern IT infrastructure. They are highly scalable and you only pay for what you use. So should everyone use the cloud?

A bottom-up perspective from a former IT consultant

Only a few years ago, "cloud" had no meaning other than "a visible mass of condensed water vapor floating in the atmosphere, typically high above the ground."

Nowadays, the word "cloud" opens up a whole new world. Cloud computing and storage have become an integral part of IT architecture. They are highly scalable and you only pay for what you use. In fact, research from firms such as Gartner shows that many businesses have adopted cloud technology at full scale or at least in a hybrid form.

Surprisingly, in my own experience and according to McKinsey studies, it is the large enterprise environment that shows the slowest adoption of cloud. Corporations, especially those with revenues in the billions, have their own unique problems and consideration criteria that individuals and small organizations rarely encounter.

To incorporate cloud computing into an existing architecture, you practically have to move a mountain!

The core value of an enterprise is driven by generating revenue and minimizing cost. However, adopting cloud computing and storage can actually work against that core value.

At some point in the last 20 years, most large corporations adopted the latest on-premise technology of the day. On-premise Oracle, SAP or IBM infrastructure typically costs a couple of million dollars. Enterprises pay for user licenses upfront every year. On top of that, they have to hire internal and external DBAs and implementation consultants to continuously make changes to their systems.

In short, enterprises have invested in infrastructure and recurring licenses, and have trained internal employees (IT managers and internal developers) on their particular systems. It is a big investment of resources - money and time.

Many corporations have grown through M&A, and in IT, M&A means combining infrastructures. Merging two IT infrastructures is a difficult project, and in some cases companies would rather leave the applications separate, knowing that merging is like opening a can of worms.

Enterprises are very conservative about security. Like really.

Enterprises are skeptical about cloud security. It is not about how cloud providers handle authorization and authentication, but rather about the idea that their data is stored somewhere outside the company and managed by someone else.

Does the physical location of your data have any effect on security? The answer is no, unless regulations say otherwise. Whether it sits in an AWS data center or in the server room in the basement of the corporate building, the access controls are what determine security, not the address.

I have seen corporations mandate that employees go through security screening (just like at airports) and cover cameras and mics with stickers that change color if they are removed and reapplied. This simply shows how much importance corporations place on their databases and information. There is the private cloud option, but it costs more.

Looking at the two points above, it seems that corporations might as well not switch to the cloud. Nonetheless, we must not overlook one of the cloud's greatest advantages - one crucial factor that stands out.

At the end of the day, companies want more efficiency: faster database retrieval and server processing times. Databases grow at an exponential rate, which leads to poor performance. To alleviate the problem, many corporate application developers focus on optimizing applications by restructuring or tuning the databases.

But what is the point of tuning a database (for example, increasing the buffer cache hit rate) if the company runs outdated hardware? According to Oracle CEO Mark Hurd, "IT expenses is down while legacy systems age," and current on-premises systems are on average 20 years old.

20 years! That is years behind the latest technology. This number clearly indicates two things.

  • First, enterprises do not want to spend money on IT hardware unless it is absolutely necessary.
  • Second, since the hardware is outdated, applications that use cutting-edge (or even bleeding-edge) features will not be able to perform as designed.

As mentioned above, corporations spend millions of dollars on application maintenance, DBAs, external consultants, and IT managers. If they are going to let their infrastructure age, which will ultimately lead to performance degradation, they might be better off taking advantage of the cloud.

The last three posts explained how to create a credit model, build an API for the model using plumber, and scale it up using AWS and Docker.

These posts demonstrate that machine learning models can easily be delivered as a service in the form of an API. Nonetheless, the approach suggested in the three posts can be daunting for data scientists and companies that are about to start predictive modeling initiatives, for the following reasons:

  • Lack of knowledge: a data scientist might know many things about machine learning and data science but not necessarily possess knowledge of cloud or container technology (DevOps engineers or data engineers are likely more familiar with these subjects)
  • Lack of interest: a data scientist might be interested only in finding insights, making informative graphs and creating predictive models from data, not in turning the models into services or maintaining those services
  • Lack of resources: a data scientist or a company that built APIs for predictive models might find they need people to handle emergencies (two popular examples: instance failures in AWS and difficulty scaling up a relational database) and realize that they do not have enough staff to do so. In the end, a data scientist has to sleep; how can she ensure that her machine learning model as a service stays up and running without issue while she sleeps?

This post shows how to create an API for the credit model used throughout the three blog posts above in just 5 minutes. What is as rewarding as the convenience is that once the API is running, you do not have to worry about the technical details, from maintaining the infrastructure to writing documentation for the API.

5-minute instructions

Step 1. Sign-up or sign-in

Time: 1 minute

If you have not signed up, go to our sign-up page to sign up. The page only asks for a username, your email address and a password. You will receive a verification email at the address you provide.

Or, if you have already signed up, go to our sign-in page to sign in.

Step 2. Prepare a wrapper for the credit model

Time: 2 minutes

Write a small wrapper around the credit model we built and name it knowledge.R. The script should define a function named run. An example file is provided below; this file, along with the GermanCreditDecisionTree.RData file we created in our first blog post, can be downloaded here.

run <- function(data) {
  # Load the saved model; this brings german.credit.decision.tree into scope
  load("GermanCreditDecisionTree.RData")
  
  # Score the incoming record with the decision tree
  model.result <- predict(german.credit.decision.tree, data)
  
  return(
    list(
      # Probability of default for the single input record, rounded to 4 decimal places
      predicted.default.probability=round(model.result[1,2], digits=4)
    )
  )
}
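
Before uploading anything, you can sanity-check the wrapper locally. The snippet below is only an illustrative sketch: it assumes knowledge.R and GermanCreditDecisionTree.RData are in your working directory, that the model is an rpart decision tree (hence library(rpart)), and that the column names and category codes match the German credit data used to train the model in the first post.

library(rpart)           # assumed: the decision tree was built with rpart
source("knowledge.R")    # defines run()

# One hypothetical applicant, using the same fields as the API example later in this post
sample.input <- data.frame(
  Credit.history = "A32",
  Duration.in.month = 24,
  Savings.account.bonds = "A63",
  Status.of.existing.checking.account = "A11"
)

run(sample.input)
# Expected: a list with a single element named predicted.default.probability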

Step 3. Upload the wrapper and saved credit model

Time: 1 minute

Once signed in, click the "Models" button on the left menu.

Click the "Deploy A Model" button on the right bottom.

You will see a form like the one below. Give it a name like "Credit Model API Demo," choose "R 3.2" for the language type and select the two files (knowledge.R and GermanCreditDecisionTree.RData) from your local drive. If you have not downloaded or created these files yet, they can be downloaded here: download credit model example files. If you are not familiar with how the file GermanCreditDecisionTree.RData was generated, refer to the first blog post.

You will see the API being built in the backend. Its status will change from "building" to "docking" and then to "ready." Building is the step in which all relevant files are gathered into a Docker container, docking moves that container onto an AWS instance, and ready means all the previous steps are complete. The whole process shouldn't take more than 30 seconds. Let us click the model's title to take a closer look.

Step 4. Check that it works

Time: 1 minute

First, let us check whether the API is "ready." Again, it should be ready within seconds because this API does not involve any heavy libraries. Typically, downloading packages takes the most time, so if your model requires many heavy libraries, this step can take a while.

Once it is ready, click the "Run" button on the bottom right.

Then, provide the input JSON as below. Where these values come from should be straightforward if you read the first blog post in this series: when we created the credit model, we decided which variables it needs, and here we simply provide values for those variables to the API.

{ "Credit.history": "A32", "Duration.in.month": 24, "Savings.account.bonds": "A63", "Status.of.existing.checking.account": "A11" }

Hooray! It worked and returned a predicted default probability.
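
The response mirrors the list returned by the run function. The exact number depends on your model, but the JSON coming back should look roughly like this (the probability below is just a placeholder, not a real model output):

{ "predicted.default.probability": 0.1234 }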

Let us click the name of the run record to see more.

Once we come back to the model detail page where we made the run request, we can see that the activity graph correctly shows that we made one request in this hour.

Knowru also automatically creates API documentation and a graphical user interface (GUI) for you, and it can validate input to your API before the input even hits the API. Let us save the details for later, because I promised only 5 minutes of your time :).

Now that the API is up and running, you can start calling it from anywhere with an Internet connection, using any programming language that supports the HTTP protocol.
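
For example, here is one way to call the deployed API from R using the httr package. This is only a sketch: the URL below is a placeholder, so substitute the actual endpoint of your deployed model and add any authentication your account requires.

library(httr)

input <- list(
  Credit.history = "A32",
  Duration.in.month = 24,
  Savings.account.bonds = "A63",
  Status.of.existing.checking.account = "A11"
)

# Placeholder URL - replace it with the endpoint of your own deployed model
response <- POST("https://api.example.com/credit-model-api-demo/run/",
                 body = input, encode = "json")

content(response)   # the parsed response should include predicted.default.probability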

Sign up to experience it yourself.

Why Knowru

We hope that you enjoyed reading this post and that the value our platform provides demonstrated itself as you went through the steps in this and the previous blog posts. To reiterate, below is a concise list of the benefits our platform offers you and your organization.

Lower cost

How long did it take you to follow the steps in the first three blog posts and set up an API for a machine learning model? How long did it take this time? Also, if you are responsible for data science initiatives in your organization, how many data engineers do you hire to create services for machine learning models, and how many DevOps engineers to maintain and monitor those services? This platform can greatly improve their efficiency.

Auto-scale

Once models are deployed on our platform, we automatically adjust the number of containers and servers to meet your models' demand. For business customers, we offer DevOps (monitoring and maintenance) services as well. You do not need to worry about a midnight call demanding your attention because of a hardware failure.

Auto API documentation

You do not need to write documentation for your API - we do it for you.

Alerting

You can choose to get an email when there is an error in your API.

Reporting

The activity graph succinctly shows your request volume. We are also adding features to visualize the distributions of input and output variables over time and to set up alarms based on these distributions.

Access Control Management

In our business version, you can choose who can read, execute, edit and delete your API for granular access control.


