Thanks to cheaper and bigger storage, we have more data at our disposal than we had a couple of years back. We do owe our thanks to Big Data, no matter how much hype it has created. However, the real MVP here is faster and better compute, which has made papers from the 1980s and 90s more relevant than ever (LSTMs were actually invented in 1997!). We are finally able to leverage the true power of neural networks and deep learning thanks to better and faster CPUs and GPUs. Whether we like it or not, traditional statistical and machine learning models have severe limitations on problems with high dimensionality, unstructured data, high complexity, and large volumes of data.
Deep learning has really started shining in these areas, and we have slowly started seeing its adoption across the industry in several real-world problems at scale. Renowned AI legend Andrew Ng himself said the very same a couple of years back!
The good part about deep learning is that we have better compute, more data, and a wide variety of easy-to-use open-source frameworks like `pytorch` to choose from when building solutions.
The bad part about deep learning? Setting up your own deep learning environment from scratch can be a huge pain, especially if you can't wait to start coding and implementing your own deep learning models.
Having gone through this painful process several times, and having discovered easy-to-use services along the way, I wrote this guide to help you race through the less desirable aspects of setting up your own deep learning environment, so that you can get to building your deep learning models and solving problems faster. We will be covering the following aspects in this guide:
- Minimal Configuration Cloud-based Deep Learning Environments
- Setting up your own Cloud-based Deep Learning Environment
- Tips for On-premise Setup
Without any further ado, let’s get started!
Minimal Configuration Cloud-based Deep Learning Environments
If you really want to start building deep learning models without investing in dedicated hardware, or you want to skip all those pesky configuration and setup commands, there are a few options just for you: pre-configured cloud-based deep learning environments. There are several cloud-based service providers for deep learning, and the following options enable you to start working right away with minimal setup and configuration. Do note this is by no means a comprehensive list, but rather options I have experimented with or heard about from fellow deep learning practitioners.
We will cover the essentials for each of these providers so you know enough to get started with them. Beyond that, we encourage you to explore them in further detail and choose one based on your preference!
Google Colaboratory

Perhaps one of the best (and still free) options out there from Google, Colaboratory enables you to run interactive Jupyter notebooks in a GPU- or even TPU-backed deep learning environment. Google has been actively using and promoting it in various areas, including its popular Machine Learning Crash Course. In a nutshell, Colaboratory is a free Jupyter notebook environment that requires no setup and enables you to run your deep learning models, even on a GPU, at no charge. More details can be found in this article.
By default you get a CPU-backed deep learning environment with the common libraries pre-installed, and you can verify this using the following code.
While a CPU is fine for relatively simple models with smaller workloads and data, you definitely need to leverage a GPU for more complex problems. Changing the runtime in Google Colab to use a GPU takes mere seconds, as illustrated in the following snapshot.
Google Colab then allocates a new GPU-enabled deep learning backend for you, and you can view the GPU type using the following code.
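One simple way to do this from a Colab cell is to shell out to the `nvidia-smi` utility (the leading `!` runs a shell command inside the notebook):

```shell
# shows the GPU model, driver version and memory from a notebook cell
!nvidia-smi
```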
Looks like you get a Tesla K80 GPU with 12 GB of memory for free! That is what the AWS `p2.xlarge` instance gives you for a whopping $0.90 an hour. Pretty neat! Finally, you can use the following code to confirm that your deep learning libraries are using the GPU.
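Here is one way to do that with TensorFlow (a sketch that degrades gracefully when TensorFlow or a GPU is unavailable):

```python
# Ask TensorFlow for the name of the GPU device it would use;
# an empty string means no GPU is visible to TensorFlow
try:
    import tensorflow as tf
    gpu_name = tf.test.gpu_device_name()
except ImportError:
    gpu_name = ""  # TensorFlow is not installed in this environment

print(gpu_name if gpu_name else "No GPU detected")
```
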
This should get you up and running to try out your own deep learning models on Google Colab. Feel free to leverage my Colab notebook for testing CPU- and GPU-enabled deep learning environments when you use Colab!
Paperspace Gradient°

Gradient° is an offering from Paperspace, a company focused on effortless infrastructure for machine learning and data science. It gives developers a complete suite of tools for exploring data, training deep learning models, and running compute jobs on GPUs. Gradient° includes one-click Jupyter notebooks backed by the full power of the Paperspace GPU cloud. This introductory video talks about it in further detail. The following are some of their basic plans (further details can be found here).
You do need to pay for GPU usage by the hour here, but the rates are pretty competitive with other service providers: a Quadro P4000 costs $0.50 an hour and a Tesla K80 around $0.59 an hour, both still comparatively cheaper than similar options on AWS.
FloydHub Workspace

An interesting offering from FloydHub is FloydHub Workspace, which aims to cut down the hassle of setting up your own deep learning environment by providing a fully configured development environment for deep learning in the cloud. The best part? Though it is not free, you can seamlessly switch from a CPU to a GPU backend and only pay for what you use, per second!
Their rates are also pretty good, considering the cheapest option gets you a dedicated instance with a Tesla K80 (12 GB memory), 61 GB RAM, and 200 GB SSD for 10 hours at $12.
Lambda GPU Cloud

Lambda Labs, or Lambda, is an AI infrastructure company which provides computation to accelerate human progress. They specialize in deep learning workstations and have recently launched the Lambda GPU Cloud, which is still in a closed beta phase. Each Lambda GPU Cloud instance has 4 GPUs and is 2x faster than a p2.8xlarge instance from AWS. They claim that you can simply press a button and get immediate SSH remote access to a 4-GPU instance. Pricing, however, is $0.90 per GPU per hour. You can sign up for the private beta here.
AWS Deep Learning AMIs

Amazon Web Services (AWS) is a subsidiary of Amazon that provides on-demand cloud computing platforms on a paid subscription basis. Recently, they launched Deep Learning AMIs, Amazon Machine Images (AMIs) dedicated to GPU-intensive workloads for building deep learning models. The AWS Deep Learning AMIs provide the necessary infrastructure and pre-configured tools and frameworks to accelerate deep learning in the cloud at scale, and come pre-configured with all the latest and best deep learning frameworks.
You can get the Conda AMI which has separate virtual environments for each deep learning framework or the Base AMI for configuring and using custom builds.
Virtual Environments in the Conda AMIs
Feel free to check out the Deep Learning AMI Guide here and also how to start using the Conda and Base AMIs here. Be a bit wary when you choose your AWS instance, since you are charged by the hour. The cheapest option would be a `p2.xlarge` instance, which gives you a 12 GB GPU for $0.90 an hour.
GCP Deep Learning VM Images

Google Cloud Platform (GCP) provides a suite of cloud computing services, including infrastructure for running deep learning models and workloads. The best part is that it runs on the same infrastructure that Google uses internally for its end-user products. GCP also offers you $300 worth of free credits in your first year if you sign up, which is pretty cool!
Google Cloud Deep Learning VM Images enable developers to instantiate a VM image containing popular deep learning and machine learning frameworks on a Google Compute Engine instance. You can launch Compute Engine instances pre-installed with popular ML frameworks like TensorFlow, PyTorch, or scikit-learn; you can check out further details here. The best part is that you can also add Cloud TPU and GPU support with a single click. The rates are very competitive and much cheaper than AWS: GCP VMs give you access to a 12 GB Tesla K80 GPU at only $0.45 per hour! Check out the pricing slabs here for more information.
These options should give you a good idea of the potential ways to kickstart your deep learning journey with minimal configuration and setup.
Setting up your own Cloud-based Deep Learning Environment
While pre-configured setups on the cloud are great to use, sometimes you want to build your own customized cloud-based or on-premise deep learning environment. In this section, we will look at how you can build a robust deep learning environment in the cloud by leveraging any popular cloud platform service provider. The major steps involved are as follows:
- Choosing a cloud provider
- Creating your virtual server
- Configuring your virtual server
- Setting up your deep learning environment
- Accessing your deep learning environment
- Validating GPU enablement
Let's now take a detailed walk through setting up our cloud-based deep learning environment.
Choosing a cloud provider
There are multiple cloud providers with affordable and competitive rates these days. We have already seen some of them in the previous section. We are looking to leverage Platform as a Service (PaaS) capabilities where we just manage our data, applications, and basic configurations but use GPU computing for deep learning. The following figure shows some popular cloud providers leveraged by deep learning practitioners.
Cloud Providers having Deep Learning Instances
Popular providers include Amazon’s AWS, Microsoft’s Azure, and Google’s Google Cloud Platform (GCP).
Creating your virtual server
The next step after choosing your cloud service provider is to create your VM instance, which will basically be a server hosting your code, data, and configuration settings. The steps for creating a VM depend on your choice of cloud provider. The following step-by-step tutorials give you an in-depth guide to creating and setting up your own instance in AWS and GCP.
- Create and Setup a Cloud Instance with the AWS Deep Learning AMI
- Create and Setup a Cloud Instance with GCP Marketplace
I also cover a step-by-step guide to creating and instantiating your own VM on AWS in Chapter 2 of my book, 'Hands-on Transfer Learning with Python'. The entire codebase is open-sourced, and further details are in the GitHub repository for the book in case you are interested.
Configuring your virtual server
Once your instance is created, you can start it from the cloud provider's platform, typically the EC2 user interface in AWS or the VM Instances page in GCP. You usually need a private key to log in to your server over SSH from a local terminal. AWS lets you set up your own keys during the last step of creating your VM and gives you a downloadable private key. GCP allows you to log in to your instance directly through the GCP Instances page using SSH; if you don't have an SSH key, you can create your own by following this guide.
Remember to save your private SSH key in a safe place, and log in to your server using the following command from the terminal.
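The command looks something like the following; the key file name, username, and IP address here are placeholders (the default username depends on the image, e.g. `ubuntu` for Ubuntu AMIs, `ec2-user` for Amazon Linux):

```shell
# restrict permissions on the key, then connect
chmod 400 my-key.pem
ssh -i my-key.pem ubuntu@<public-ip-of-your-instance>
```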
Congratulations! You are now successfully logged in to your deep learning server. The remaining steps of our deep learning setup assume you are on a Linux server; our distro was Ubuntu 18.10. You are free to choose your own OS based on your preference!
Since we will be using Jupyter notebooks extensively for prototyping and development, it often helps to set up a password for the notebook server, so strangers can't use it even if they somehow get hold of your public IP address. In case you don't want to set up a password, you can skip the password setup steps in this section. The first thing to do is create a new SSL certificate using OpenSSL.
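A self-signed certificate can be created with a single OpenSSL command; the directory and file names here are just examples:

```shell
# create a directory for the certificate, then generate a
# self-signed certificate and key valid for one year
mkdir ssl
cd ssl
openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout cert.key -out cert.pem -batch
```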
In case Python is not installed on the system, we recommend the Anaconda distribution, which has a great package management system and comes with a suite of pre-installed libraries. We recommend following the official guide to install the Anaconda Python distribution.
The next step is to generate a config file for our Jupyter notebook server, in case it is not present. Usually the file is located in your home directory at `~/.jupyter/jupyter_notebook_config.py`, and if it is not present, you can create it with the following command.
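Generating a default config file is a one-liner:

```shell
# writes ~/.jupyter/jupyter_notebook_config.py if it does not exist
jupyter notebook --generate-config
```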
To enable password-based security for the notebooks, we first need to generate a password and its hash. We can leverage the `passwd()` function in `IPython.lib` as follows:
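A sketch of this step is below; it prefers IPython's own `passwd()` but falls back to an equivalent reimplementation of the classic salted-SHA1 scheme, so the snippet also runs where IPython is absent. The passphrase here is only an example:

```python
# Generate a salted hash for the notebook password. IPython's passwd()
# does this for you; the fallback below reimplements the same classic
# salted-SHA1 scheme so the snippet runs even without IPython installed.
try:
    from IPython.lib import passwd
except ImportError:
    import hashlib
    import random

    def passwd(passphrase, algorithm="sha1"):
        salt = "%012x" % random.getrandbits(48)
        h = hashlib.new(algorithm)
        h.update(passphrase.encode("utf-8") + salt.encode("ascii"))
        return f"{algorithm}:{salt}:{h.hexdigest()}"

# 'password' is only an example passphrase -- never use it for real!
pw_hash = passwd("password")
print(pw_hash)
```

Calling `passwd()` with no argument prompts you to type and verify the passphrase interactively instead.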
Once you enter your password and verify it, the function returns its hash (in this case, the password I typed was literally the word password, which is something you should definitely not be using!). Copy and save that hash value, since we will need it soon. Next, fire up your favorite text editor to edit the Jupyter config file, as follows:
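The settings of interest look roughly like this; the certificate paths and the hash value are placeholders to be replaced with your own:

```python
# inside ~/.jupyter/jupyter_notebook_config.py
c.NotebookApp.certfile = '/home/ubuntu/ssl/cert.pem'  # path to your SSL certificate
c.NotebookApp.keyfile = '/home/ubuntu/ssl/cert.key'   # path to your SSL key
c.NotebookApp.ip = '*'                                # serve on all interfaces
c.NotebookApp.open_browser = False                    # no browser on a headless server
c.NotebookApp.password = 'sha1:...'                   # the hash generated earlier
c.NotebookApp.port = 8888
```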
We are now ready to setup our deep learning environment.
Setting up your deep learning environment
We will now start setting up the necessary configurations required by our deep learning environment to start using GPUs. In case CUDA and cuDNN are already configured for your instance, you can skip some of the following steps as needed.
1. Install Graphics Drivers
The first step here is to make sure the graphics drivers are installed for your GPU (going forward, we assume you are using an NVIDIA GPU). The best way to test whether the drivers are installed is to run the `nvidia-smi` command from the terminal. If the command fails to work, we need to install the GPU drivers.
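On Ubuntu, one way to install the drivers is via apt; the driver package version below is only an example, so check the recommended driver for your particular GPU first:

```shell
# install an NVIDIA driver package and reboot (version is an example)
sudo apt-get update
sudo apt-get install -y nvidia-driver-390
sudo reboot

# after rebooting, this should list your GPU:
nvidia-smi
```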
2. Install CUDA
The NVIDIA® CUDA® Toolkit is basically a development environment for creating applications and programs that leverage NVIDIA GPUs to the fullest. GPU-accelerated CUDA libraries enable drop-in acceleration across multiple domains, including linear algebra, image and video processing, deep learning, and graph analytics. Assuming we are using an Ubuntu-based system, you can go to the official NVIDIA CUDA page and download the necessary setup file. At the time of writing this article, CUDA 10 is out but still pretty new, so we will be using the legacy CUDA 9.0 version, which you can obtain from the legacy releases page. If you are on a server, it is better to use the terminal to download the setup file directly and configure CUDA using the following commands.
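The commands look roughly like the following; the installer URL is a placeholder that you should copy from the CUDA legacy releases page for your OS version:

```shell
# download and run the CUDA 9.0 installer (URL is a placeholder)
wget <cuda-9.0-installer-url> -O cuda_9.0_linux.run
sudo sh cuda_9.0_linux.run

# make the CUDA toolchain and libraries visible to your shell
echo 'export PATH=/usr/local/cuda-9.0/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
```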
3. Install cuDNN
The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. It provides highly tuned implementations of standard routines in neural networks, including forward and backward convolution, pooling, normalization, and activation layers. Deep learning practitioners can rely on cuDNN to accelerate widely used deep learning frameworks on GPUs. You can download cuDNN from the official page, but you will need to sign up for an NVIDIA account first! You will then get a download link for cuDNN, which you can use in the terminal to download it directly on the server.
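The steps look roughly like this; the download URL is the one NVIDIA gives you after signing in, and the archive name is an example:

```shell
# download, extract and copy the cuDNN headers and libraries into the
# CUDA installation (URL and archive name are placeholders)
wget <cudnn-download-url> -O cudnn-9.0-linux-x64.tgz
tar -xzvf cudnn-9.0-linux-x64.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
```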
Typically, this takes care of most of the necessary dependencies for our GPU setup.
4. Install Deep Learning Frameworks
Now we need to install and set up our Python deep learning frameworks, in case they are not installed. We typically use `keras` and `tensorflow` a lot, and the following commands help us install them in our environment.
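A minimal install looks like this; pick a `tensorflow-gpu` release that matches the CUDA and cuDNN versions installed above:

```shell
# GPU-enabled TensorFlow, plus Keras on top
pip install tensorflow-gpu
pip install keras
```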
Accessing your deep learning cloud environment
We don't really want to sit and code in the terminal on the server all the time. Since we want to leverage Jupyter notebooks for interactive development, we will access the notebooks on our cloud server from our local system. For this, we first need to start the Jupyter notebook server on our remote instance.
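Starting the server from the remote terminal is a single command; it picks up the SSL certificate, password, and port from the config file edited earlier:

```shell
# start the notebook server on the remote instance
jupyter notebook
```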
Now, if your instance has a public IP assigned and port `8888` is exposed, you can directly type `http://<IP_Address>:8888` in your local browser and start accessing your Jupyter server in your cloud VM!
Another option, especially for AWS instances, is to enable port forwarding on our local machine for accessing the cloud server notebooks from our local browser. This is also known as SSH tunneling.
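An SSH tunnel can be set up with a single command from your local machine, forwarding a local port to the remote notebook server; the key file, username, and IP address are placeholders:

```shell
# forward local port 8890 to port 8888 on the remote server;
# -N runs no remote command and -f puts ssh in the background
ssh -i my-key.pem -N -f -L localhost:8890:localhost:8888 ubuntu@<public-ip-of-your-instance>
```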
In the case of port forwarding, head over to your local browser and navigate to the localhost address, e.g. `https://localhost:8890`, which we are forwarding to the remote notebook server on our virtual server. Make sure that you use `https` in the address, otherwise you'll get an SSL error.
Validating GPU enablement
The final step is to make sure everything is working and that our deep learning frameworks are actually leveraging our GPU (which we usually pay for by the hour!). The following code should help us validate this.
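One way to check is via TensorFlow's device listing (a sketch; it falls back to an empty list when TensorFlow is not installed):

```python
# List every device TensorFlow can see and pick out the GPUs;
# an empty list means no GPU is usable (or TensorFlow is absent)
try:
    from tensorflow.python.client import device_lib
    local_devices = device_lib.list_local_devices()
    gpus = [d.name for d in local_devices if d.device_type == "GPU"]
except ImportError:
    gpus = []

print("GPUs available:", gpus if gpus else "none")
```
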
Looks like our deep learning setup is configured to use GPUs for deep learning and we are ready to go!
Tips for On-premise Setup
Often users or organizations may not want to leverage cloud services, especially if their data is sensitive, so they focus on building an on-premise deep learning environment instead. The major focus here should be to invest in the right hardware and software to enable maximum performance and leverage the right GPU for building deep learning models. With regard to hardware, special emphasis goes to the following:
- Processor: You can invest in an i5 or an i7 Intel CPU, or maybe an Intel Xeon if you are looking to spoil yourself!
- RAM: Invest in at least 32 GB of DDR4 or better RAM for your memory.
- Disk: A 1 TB hard disk is excellent, and you can also invest in a 128 GB or 256 GB SSD at minimum for fast data access!
- GPU: Perhaps the most important component for deep learning. Invest in an NVIDIA GPU, at minimum a GTX 1070 with 8 GB of memory.
Other things you shouldn’t neglect include a motherboard, power supply, robust case, and cooler. Once you get your rig set up, for the software configuration, you can repeat all the steps from the previous section, excluding the cloud setup, and you should be good to go!
The intent of this detailed hands-on guide is to enable developers, engineers, and deep learning practitioners to go from zero to deep learning in minutes. I hope this guide helps you with your own deep learning setup and that you don't end up spending hours breaking your head over countless posts on forums and Stack Overflow to set up your own deep learning environment. Now go out there and start 'deep learning'!