Deploy ML/DL Models to Production via Panini

What is Panini?

Panini is a platform that serves ML/DL models at low latency and cuts the time to deploy a model to production from a few days to a few minutes. Once your model is deployed on Panini’s server, you get an API key for running inference against it. Panini’s query engine is written in C++, which keeps inference latency very low, and models are hosted on a Kubernetes cluster, so they can scale across multiple nodes. Panini also takes care of caching and batching inputs during inference.

We currently support Python frameworks, and we plan to expand to other languages in the future. In this post I’ll demo uploading a classic transfer-learning PyTorch CNN that classifies dogs vs. cats. Make sure you have PyTorch 1.0 and Python 3.6 installed. The source code is available from

Traditional Approach

The traditional approach is to use Flask and Gunicorn behind Nginx, which takes a lot of setup time. Most people use Flask to expose ML models to the internet through an API, but Flask was not designed for serving ML models: inference through it is slow, caching and batching require custom code, and scaling across multiple machines adds further complications. Some significant drawbacks of using Flask include:

  1. No built-in caching or batching.
  2. High latency. Predictions sometimes have to be computed beforehand.
  3. Ongoing maintenance of the serving stack.
  4. Hard to scale across multiple machines or clusters.
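To give a feel for the custom code mentioned above, here is a minimal sketch of hand-rolled caching and batching around a prediction function. The class name `CachedBatchPredictor`, the `predict_batch` callable, and the batch size are all made up for illustration; this is not how Panini implements these features internally.

```python
import hashlib
from typing import Callable, Dict, List, Sequence


class CachedBatchPredictor:
    """Hypothetical wrapper: caches results and sends inputs in fixed-size batches."""

    def __init__(self, predict_batch: Callable[[List[bytes]], List[list]],
                 batch_size: int = 8):
        self.predict_batch = predict_batch  # any function mapping a batch to outputs
        self.batch_size = batch_size
        self.cache: Dict[str, list] = {}

    @staticmethod
    def _key(item: bytes) -> str:
        # Hash the raw input so identical requests share one cache entry.
        return hashlib.sha256(item).hexdigest()

    def predict(self, items: Sequence[bytes]) -> List[list]:
        # Collect inputs that are not cached yet, dropping duplicates.
        pending: Dict[str, bytes] = {}
        for item in items:
            key = self._key(item)
            if key not in self.cache and key not in pending:
                pending[key] = item

        # Run the model once per batch instead of once per input.
        keys = list(pending)
        for start in range(0, len(keys), self.batch_size):
            chunk = keys[start:start + self.batch_size]
            outputs = self.predict_batch([pending[k] for k in chunk])
            for key, output in zip(chunk, outputs):
                self.cache[key] = output

        # Answer every request from the cache, in the original order.
        return [self.cache[self._key(item)] for item in items]
```

Even this small version leaves out cache eviction, timeouts, and concurrency, which is the kind of work that piles up when you serve a model yourself.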

To address these issues, we have developed Panini.

Installing Required Packages

Panini requires Python 3.6, so make sure you have an environment with 3.6 installed. Also, if you’re using PyTorch, make sure you have PyTorch 1.0 installed.
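If you want to double-check your environment before going further, a small script along these lines can help. This is only a sketch: the helper names are my own, and it simply compares version numbers against the Python 3.6 and PyTorch 1.0 requirements stated above.

```python
import importlib.util
import sys

REQUIRED_PYTHON = (3, 6)      # from the requirement above
REQUIRED_TORCH = ("1", "0")   # PyTorch 1.0.x


def python_ok(version_info=sys.version_info, required=REQUIRED_PYTHON):
    """True when the interpreter's major.minor version matches the requirement."""
    return tuple(version_info[:2]) == required


def torch_ok(version, required=REQUIRED_TORCH):
    """True when a version string such as '1.0.1' is a 1.0.x release."""
    return tuple(version.split(".")[:2]) == required


def installed_torch_version():
    """Return the installed PyTorch version string, or None if it is missing."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch  # imported lazily so the script also runs without PyTorch
    return torch.__version__


def check_environment():
    """Return a list of human-readable problems (empty when all is well)."""
    problems = []
    if not python_ok():
        problems.append("Python 3.6 is required, found %d.%d" % sys.version_info[:2])
    found = installed_torch_version()
    if found is None:
        problems.append("PyTorch is not installed")
    elif not torch_ok(found):
        problems.append("PyTorch 1.0 is required, found %s" % found)
    return problems
```

Calling `check_environment()` and printing the returned list shows anything that needs fixing.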

Deploying classic Dogs Vs Cats

The full source code can be retrieved from

I’ll be using transfer learning to modify the last layer of DenseNet.

When saving the model, make sure to save only the weights, not the entire model architecture. The recommended approach is to save the weights returned by state_dict(). The training code is pretty straightforward and can be retrieved from the link above, where I’ve also included a Jupyter Notebook and a YouTube tutorial. I’ve already trained a DenseNet model to classify dogs vs. cats, and I’m saving the weights as “last_layers.pth”.
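Saving only the weights could look roughly like the sketch below. The helper names `weights_filename` and `save_weights` are hypothetical; `torch.save` and `state_dict()` are the standard PyTorch calls. The filename check reflects Panini's rule that the weights file must end in .pth.

```python
def weights_filename(name):
    """Return the filename unchanged if it has the .pth extension Panini expects."""
    if not name.endswith(".pth"):
        raise ValueError("weights file must end in .pth, got %r" % name)
    return name


def save_weights(module, filename="last_layers.pth"):
    """Save only the weights (state_dict) of a PyTorch module, not the module itself."""
    import torch  # imported here so the filename helper works without PyTorch

    path = weights_filename(filename)
    torch.save(module.state_dict(), path)
    return path
```

Which module you pass in depends on what you trained. If only the replaced last layers were trained, a call such as `save_weights(model.classifier)` would be enough, but that is an assumption about the training setup rather than something Panini requires.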

Preparing Files for Panini

You need to upload at least three files to Panini:

  • The prediction script that defines load() and predict()
  • requirements.txt
  • Our saved model weights (last_layers.pth)

The most important file is the script that tells Panini how to load your model and make a prediction. If you need additional packages such as numpy and pandas, list them as pip packages in the requirements.txt file and they will be installed. The last file we need is the saved model weights. This file can be named anything, but its extension must be .pth. The script needs to define two functions, load() and predict():

  • load(path): load() is executed once, the first time your model is loaded into Panini’s Kubernetes cluster. It must take an argument called path. Define your model architecture here, load the saved .pth weights into it, and return a reference to the model.
  • predict(model, input_from_client): predict() is executed every time a POST request is sent to your API link. Its first argument, model, is the reference returned by load(). Its second argument, input_from_client, is the data sent by the client in the POST request. Because input_from_client is an array, we use a for loop to access each image. For each image, we apply some pre-processing, convert it into a tensor, and feed it into the model. The model returns log probabilities, which we convert to a list and return to the client. Make sure predict() returns an array, and that the length of the array equals the size of the model’s final layer. We replaced the last layer of our model with nn.Linear(256, 2), so predict() needs to return an array of length two.
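Putting the two functions together, the script could look something like the sketch below. Treat it as an illustration of the contract described above, not as the exact script from this demo. Several details are assumptions on my part: that path points at the uploaded .pth file, that the file holds the weights of the replaced classifier head, the hidden layers of that head (only the final nn.Linear(256, 2) is given above), the ImageNet-style pre-processing, and that predict() returns one two-element list per input image.

```python
import base64
import io


def decode_image_bytes(item):
    """Turn one base64-encoded input (str or bytes) into raw image bytes."""
    return base64.b64decode(item)


def load(path):
    # Heavy imports live inside the functions so the helper above
    # can be used without PyTorch installed.
    import torch
    from torch import nn
    from torchvision import models

    model = models.densenet121()
    # Assumed head: only the final nn.Linear(256, 2) comes from the article.
    model.classifier = nn.Sequential(
        nn.Linear(1024, 256),
        nn.ReLU(),
        nn.Linear(256, 2),
        nn.LogSoftmax(dim=1),
    )
    # Assumption: the .pth file stores the state_dict of the head only.
    state = torch.load(path, map_location="cpu")
    model.classifier.load_state_dict(state)
    model.eval()
    return model


def predict(model, input_from_client):
    import torch
    from PIL import Image
    from torchvision import transforms

    # Assumed pre-processing (standard ImageNet sizes and statistics).
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])

    results = []
    with torch.no_grad():
        for item in input_from_client:
            image = Image.open(io.BytesIO(decode_image_bytes(item))).convert("RGB")
            batch = preprocess(image).unsqueeze(0)   # shape (1, 3, 224, 224)
            log_probs = model(batch)                 # shape (1, 2)
            results.append(log_probs.squeeze(0).tolist())
    return results
```

If your training code built the head differently, the layers in load() must match it exactly, otherwise load_state_dict() will raise an error about missing or unexpected keys.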

Time to Deploy

Create an account on Panini and sign in. You can sign in using a Google or GitHub account.

Panini Dashboard

Once signed in, you’ll be presented with a dashboard and asked to fill out a few pieces of information. First, give your model a name. The name must be fewer than 10 characters and may contain only lowercase letters and digits (a-z, 0–9). For the framework, choose PyTorch. Currently, there are three options to choose from:

  • PyTorch: For deep learning models. Currently, we support PyTorch 1.0.
  • Python Function: Any custom Python function. This includes scikit-learn and traditional machine learning such as SVMs and regression.
  • Fast AI: If you’re using the fastai library.

We also need to specify the type of input our model expects. Since we will be sending a picture of a dog or cat encoded as base64 bytes, let’s choose bytes as the input type. Click Browse and select your three files:

  • The prediction script that defines load() and predict()
  • requirements.txt
  • last_layers.pth

Extra files are optional. Some models require additional files, such as a pickled vocabulary-to-integer mapping for RNNs. In our case, we don’t have any additional files, so we can leave that field blank.

Once the three files are selected, click “Deploy!” and wait for the file upload to reach 100%. Currently, there is a 2GB limit on model size. If your model is larger than 2GB, email us.

Log Output

Once the page shows 100%, you can refresh it to see the latest log output.

The last log output should be “Done deploying your model! You’re good to go!” After this message appears, you should see an API link above the console.

How to infer using our API?

We’re going to encode a picture of an adorable cat as base64 and send it to Panini for prediction.

Now, all we need to do is send a POST request to our API URL. To send an image, we encode it using base64 and wrap it in JSON. The key must be “input”, and the value is the base64-encoded bytes of our image.
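A client for this could be sketched as follows, using only the Python standard library. The URL below is a placeholder rather than a real Panini endpoint, so replace it with the API link shown above your console. The function names and the Content-Type header are my own choices, and the response is returned as raw text because its exact format isn't covered here.

```python
import base64
import json
import urllib.request

# Hypothetical placeholder; use the API link from your Panini dashboard.
API_URL = "https://example.invalid/your-model"


def build_payload(image_bytes):
    """Wrap raw image bytes as JSON: {"input": "<base64 string>"}."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"input": encoded}).encode("utf-8")


def request_prediction(api_url, image_path, timeout=30.0):
    """POST one image file to the model's API URL and return the response text."""
    with open(image_path, "rb") as image_file:
        body = build_payload(image_file.read())
    request = urllib.request.Request(
        api_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With a deployed model, `print(request_prediction(API_URL, "cat.jpg"))` would send the picture and print whatever Panini returns, where "cat.jpg" stands in for your own image file.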

Prediction result from Panini

We can also send the same request with Postman.

In Conclusion

We have successfully deployed our DL model into production using Panini. As traffic to our model increases, Panini will automatically replicate it across several nodes. If we want to incorporate the model into a web application, we can make a POST request from JavaScript or Node.js. We can focus on developing our models and not worry about DevOps. If you’re having trouble uploading your model, send us an email!

YouTube Tutorial:

Source Code:
