Deep learning is arguably the most popular aspect of AI, especially when it comes to data science (DS) applications. But what exactly are deep learning frameworks, and how are they related to other terms often used in AI and data science?
In this context, “framework” refers to a set of tools and processes for developing a certain system, testing it, and ultimately deploying it. Most AI systems today are created using frameworks. When developers download and install a framework on their computers, it is usually accompanied by a library. This library (or package, as it is often termed in high-level languages) will be compiled in the programming languages the AI framework supports. The library acts as a proxy to the framework, making its various processes available through a series of functions and classes in the programming language used.
This way, you can do everything the framework enables you to do, without leaving the programming environment where you have the rest of your scripts and data. So, for all practical purposes, that library is the framework, even if the framework can manifest in other programming languages too.
For example, a framework supported by both Python and Julia can be accessed through either language, making the language you use a matter of preference. Since enabling a framework to function in an additional language is a challenging task for its creators, the languages compatible with a given framework are often rather limited.
In a nutshell, a system is a standalone program or script designed to accomplish a certain task or set of tasks. In a data science setting, a system often corresponds to a data model. However, systems can include features beyond just models, such as an I/O process or a data transformation process.
The term model involves a mathematical abstraction used to represent a real-world situation in a simpler, more workable manner. Models in DS are optimized through a process called training, and validated through a process called testing, before they are deployed.
Another term that often appears alongside these terms is methodology, which refers to a set of methods and the theory behind those methods, for solving a particular type of problem in a certain field. Different methodologies are often geared towards different applications/objectives.
It’s easy to see why frameworks are celebrities of sorts in the AI world. They help make the modeling aspect of the pipeline faster, and they make the data engineering demanded by deep learning models significantly easier. This makes AI frameworks great for companies that cannot afford a whole team of data scientists, or prefer to empower and develop the data scientists they already have.
These systems are fairly simple, but not quite “plug and play.” In this article we’ll explore the utility behind deep learning models, their key characteristics, how they are used, their main applications, and the methodologies they support.
Deep Learning (DL) is a subset of AI used for predictive analytics through an AI system called an Artificial Neural Network (ANN). Predictive analytics is a group of data science methodologies related to the prediction of certain variables, and includes techniques such as classification and regression. An ANN, in turn, is a clever abstraction of the human brain, at a much smaller scale. ANNs have managed to approximate every function (mapping) that has been tried on them, making them well-suited to virtually any data analytics task. In data science, ANNs are categorized as machine learning methodologies.
The main drawback DL systems have is that they are “black boxes.” It is exceedingly difficult — practically unfeasible — to figure out exactly how their predictions happen, as the data flux in them is extremely complicated.
Deep Learning generally involves large ANNs that are often specialized for specific tasks. Convolutional Neural Networks (CNNs), for instance, are better for processing images, video, and audio data streams. However, all DL systems share a similar structure: elementary modules called neurons organized in layers, with various connections among them. These modules can perform some basic transformations (usually non-linear ones) as data passes through them. Since there is a plethora of potential connections among these neurons, organizing them in a structured way (much like real neurons are organized in networks in brain tissue) yields a more robust and functional form of these modules. This is what an artificial neural network is, in a nutshell.
In general, DL frameworks include tools for building a DL system, methods for testing it, and various other Extract, Transform, and Load (ETL) processes; when taken together, these framework components help you seamlessly integrate DL systems with the rest of your pipeline. We’ll look at this in more detail later in this article.
Although deep learning systems share some similarities with machine learning systems, certain characteristics make them sufficiently distinct. For example, conventional machine learning systems tend to be simpler and have fewer options for training. DL systems are noticeably more sophisticated; they each have a set of training algorithms, along with several parameters regarding the systems’ architecture. This is one of the reasons we consider them a distinct framework in data science.
DL systems also tend to be more autonomous than their machine learning counterparts. To some extent, DL systems can do their own feature engineering. More conventional systems tend to require more fine-tuning of the feature set, and sometimes require dimensionality reduction to provide any decent results.
In addition, the generalization of conventional ML systems generally doesn’t improve as much as that of DL systems when additional data is provided. This is one of the key characteristics that make DL systems the preferable option when big data is involved.
Finally, DL systems take longer to train and require more computational resources than conventional ML systems, due to their more sophisticated functionality. However, because the work of DL systems is easily parallelizable, modern computing architectures and cloud computing benefit DL systems more than any other predictive analytics system.
At their cores, all DL frameworks work similarly, particularly when it comes to the development of DL networks. First, a DL network consists of several neurons organized in layers; many of these are connected to other neurons in other layers. In the simplest DL network, connections take place only between neurons in adjacent layers.
The first layer of the network corresponds to the features of our dataset; the last layer corresponds to its outputs. In the case of classification, each class has its own node, with node values reflecting how confident the system is that a data point belongs to that class. The layers in the middle involve some combination of these features. Since they aren’t visible to the end user of the network, they are described as hidden (see Figure 1).
The connections among the nodes are weighted, indicating the contribution of each node to the nodes it is connected to in the next layer. The weights are initially randomized when the network object is created, but are refined as the ANN is trained.
Moreover, each node contains a mathematical function that creates a transformation of the received signal, before it is passed to the next layer. This is referred to as the transfer function (also known as the activation function). The sigmoid function is the most well-known of these, but others include softmax, tanh, and ReLU. We’ll delve more into these in a moment.
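To make these transfer functions concrete, here is an illustrative pure-Python sketch of the four functions just mentioned. Real DL frameworks provide optimized, vectorized versions of these; the code below only shows the underlying math.

```python
import math

def sigmoid(x):
    """Squashes any real number into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Squashes any real number into the (-1, 1) range."""
    return math.tanh(x)

def relu(x):
    """Passes positive values through; zeroes out negatives."""
    return max(0.0, x)

def softmax(values):
    """Turns a list of raw scores into probabilities that sum to 1."""
    shifted = [v - max(values) for v in values]  # shift for numerical stability
    exps = [math.exp(v) for v in shifted]
    total = sum(exps)
    return [e / total for e in exps]
```

For instance, `sigmoid(0)` returns exactly 0.5, and `softmax` over any list of scores yields values that add up to 1, which is why it is the usual choice for a classification output layer.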
Furthermore, each layer has a bias node, a constant that appears unchanged on each layer. Just like all the other nodes, the bias node has a weight attached to its output; however, it has no transfer function. Its weighted value is simply added to the other nodes it is connected to, much like the constant c added to a regression model in statistics. The presence of such a term balances out any bias the other terms inevitably bring to the model, keeping the model’s overall bias minimal. As the topic of bias is quite complex, we recommend you check out some external resources if you are not familiar with it.
Once the transformed inputs (features) and the biases arrive at the end of the DL network, they are compared with the target variable. The differences that inevitably occur are relayed back to the various nodes of the network, and the weights are changed accordingly. Then the whole process is repeated until the error margin of the outputs is within a certain predefined level, or until the maximum number of iterations is reached. Iterations of this process are often referred to as training epochs, and the whole process is intimately connected to the training algorithm used. In fact, the number of epochs used for training a DL network is often set as a parameter and it plays an important role in the ANN’s performance.
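The train-until-converged loop described above can be sketched in a few lines of plain Python. This is a deliberately minimal, hypothetical example: a single sigmoid neuron trained by gradient descent to learn the logical OR function, stopping when the total squared error drops below a threshold or the maximum number of epochs is reached. Real frameworks randomize the initial weights and use far more sophisticated training algorithms.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy dataset: the logical OR function (two features, binary target)
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

weights, bias = [0.0, 0.0], 0.0  # fixed start here; frameworks randomize these
learning_rate, max_epochs, error_threshold = 0.5, 1000, 0.01

for epoch in range(max_epochs):
    total_error = 0.0
    for features, target in data:
        # Forward pass: weighted sum plus bias, through the transfer function
        y = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
        error = target - y
        total_error += error ** 2
        # Backward pass: nudge the weights to reduce the error
        grad = error * y * (1 - y)  # derivative of squared error w.r.t. the sum
        weights = [w + learning_rate * grad * x
                   for w, x in zip(weights, features)]
        bias += learning_rate * grad
    if total_error < error_threshold:  # stop early once the error is small enough
        break
```

Each pass through the `for epoch` loop is one training epoch; the `error_threshold` and `max_epochs` values play exactly the role of the predefined error margin and maximum iteration count mentioned above.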
All of the data entering a neuron (via connections with neurons of the previous layer, as well as the bias node) is summed, and then the transfer function is applied to the sum, so that the data flow from that node is y = f(Σi wixi + b), where wi is the weight of the connection from node i of the previous layer, xi is that node’s output, and b is the bias term for that layer. Also, f() is the mathematical expression of the transfer function.
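The single-neuron computation y = f(Σi wixi + b) translates directly into code. Below is an illustrative sketch using a sigmoid as the transfer function; the input values, weights, and bias are made up for the example.

```python
import math

def neuron_output(inputs, weights, bias):
    """Single-neuron data flow: weighted sum of the inputs plus the bias,
    passed through a sigmoid transfer function."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-weighted_sum))  # sigmoid transfer function

# Example: a neuron receiving three outputs from the previous layer
y = neuron_output(inputs=[0.5, -1.0, 2.0], weights=[0.4, 0.3, 0.1], bias=0.2)
```

Here the weighted sum is 0.2 − 0.3 + 0.2 + 0.2 = 0.3, and the sigmoid of 0.3 (about 0.574) is what flows on to the next layer.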
This relatively simple process is at the core of every ANN. The process is equivalent to that which takes place in a perceptron system — a rudimentary AI model that emulates the function of a single neuron. Although a perceptron system is never used in practice, it is the most basic element of an ANN, and the first system created using this paradigm.
The function of a single neuron is basically a single, predefined transformation of the data at hand. This can be viewed as a kind of meta-feature of the framework, as it takes a certain input x and after applying a (usually non-linear) function f() to it, x is transformed into something else, which is the neuron’s output y.
While in the majority of cases one single meta-feature would be terrible at predicting the target variable, several of them across several layers can work together quite effectively — no matter how complex the mapping of the original features to the target variable. The downside is that such a system can easily overfit, which is why the training of an ANN doesn’t end until the error is minimal (smaller than a predefined threshold).
This most rudimentary description of a DL network applies to networks of the multi-layer perceptron type. Of course, there are several variants beyond this type. CNNs, for example, contain specialized layers with huge numbers of neurons, while Recurrent Neural Networks (RNNs) have connections that loop back to previous layers. Additionally, some training algorithms involve pruning nodes of the network to ensure that no overfitting takes place.
Once the DL network is trained, it can be used to make predictions about any data similar to the data it was trained on. Furthermore, its generalization capability is quite good, particularly if the data it is trained on is diverse. What’s more, most DL networks are quite robust when it comes to noisy data, which sometimes helps them achieve even better generalization.
When it comes to classification problems, the performance of a DL system is enhanced by the class boundaries it creates. Whereas many conventional ML systems create straightforward boundary landscapes (e.g. rectangles or simple curves), a DL system draws a more sophisticated line around each class (reminiscent of the borders of certain counties in the US). This is because the DL system tries to capture every bit of signal it is given in order to make fewer mistakes when classifying, boosting its raw performance. Of course, this highly complex mapping of the classes makes interpreting the results a very challenging, if not unfeasible, task. More on that later in this article.
Having knowledge of multiple DL frameworks gives you a better understanding of the AI field. You will not be limited by the capabilities of a specific framework. For example, some DL frameworks are geared towards a certain programming language, which may make focusing on just that framework an issue, since languages come and go. After all, things change very rapidly in technology, especially when it comes to software. What better way to shield yourself from any unpleasant developments than to be equipped with a diverse portfolio of DL know-how?
The main frameworks in DL include MXNet, TensorFlow, and Keras. PyTorch and Theano have also played an important role, but currently they are not as powerful or versatile as the aforementioned frameworks. Also, for those keen on the Julia language, there is the Knet framework, which, to the best of our knowledge, is the only deep learning framework written mainly in a high-level language (in this case, Julia). You can learn more about it at its GitHub repository.
MXNet is developed by Apache and it’s Amazon’s favorite framework. Some of Amazon’s researchers have collaborated with researchers from the University of Washington to benchmark it and make it more widely known to the scientific community.
TensorFlow is probably the most well-known DL framework, partly because it has been developed by Google. As such, it is widely used in the industry and there are many courses and books discussing it.
Keras is a high-level framework; it works on top of TensorFlow (as well as other frameworks, like Theano). Its ease of use, without loss of flexibility or power, makes it one of the favorite deep learning libraries today. Any data science enthusiast who wants to dig into the realm of deep learning can start using Keras with reasonably little effort. Moreover, Keras’ seamless integration with TensorFlow, plus the official support it receives from Google, have convinced many that Keras will be one of the long-lasting deep learning frameworks, and that its corresponding library will continue to be maintained.
As a set of techniques, DL is language-agnostic; any computer language can potentially be used to apply its methods and construct its data structures (the DL networks), even if each DL framework focuses on specific languages only. This is because it is more practical to develop frameworks that are compatible with certain languages, and some programming languages, such as Python, are used more than others. The fact that certain languages are more commonly used in data science plays an important role in language selection, too. Besides, DL is more of a data science framework nowadays anyway, so it is marketed mainly to the data science community, as part of Machine Learning (ML). This likely contributes to the confusion about what constitutes ML and AI these days.
In addition, almost all the DL frameworks support C / C++, since they are usually written in C or its object-oriented counterpart. Note that all these languages access the DL frameworks through APIs, which take the form of packages in these languages. Therefore, in order to use a DL framework in your favorite language’s environment, you must become familiar with the corresponding package, its classes, and its various functions.
Deep learning frameworks add value to AI and DS practitioners in various ways. The most important value-adding processes include ETL processes, building data models, and deploying these models. Beyond these main functions, a DL framework may offer other things that a data scientist can leverage to make their work easier. For example, a framework may include some visualization functionality, helping you produce some slick graphics to use in your report or presentation. As such, it’s best to read up on each framework’s documentation, becoming familiar with its capabilities to leverage it for your data science projects.
A DL framework can be helpful in fetching data from various sources, such as databases and files. This is a rather time-consuming process if done manually, so using a framework is very advantageous. The framework will also do some formatting on the data, so that you can start using it in your model without too much data engineering. However, doing some data processing of your own is always useful, particularly if you have some domain knowledge.
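As a minimal illustration of this data-fetching step, the sketch below reads a small CSV source and converts the feature columns to numbers, using only the standard library. The column names and values are made up; a real framework would offer far richer data-loading utilities for databases, images, and other sources.

```python
import csv
import io

# A stand-in for a CSV file on disk (normally you would use open("data.csv"))
raw = io.StringIO("feature1,feature2,label\n1.0,2.0,0\n3.5,4.5,1\n")

reader = csv.DictReader(raw)
# Convert each row into a (features, label) pair ready for modeling
rows = [([float(r["feature1"]), float(r["feature2"])], int(r["label"]))
        for r in reader]
```

After this step, `rows` holds numeric feature vectors paired with their targets, which is roughly the form a DL framework expects before training begins.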
The main function of a DL framework is to enable you to efficiently build data models. The framework facilitates the architecture design part, as well as all the data flow aspects of the ANN, including the training algorithm. In addition, the framework allows you to view the performance of the system as it is being trained, so that you gain insight about how likely it is to overfit.
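Monitoring training performance usually relies on holding out part of the data as a validation set, so that the error can be tracked on data the model never trains on. Here is an illustrative stand-alone sketch of such a split (the function name and parameters are made up; frameworks provide this as a built-in option).

```python
import random

def train_validation_split(dataset, validation_fraction=0.2, seed=42):
    """Shuffle the data and hold out a fraction for validation, so the
    model's error can be monitored on data it is not trained on."""
    rng = random.Random(seed)       # fixed seed for a reproducible split
    shuffled = dataset[:]           # copy, so the original order is preserved
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]

train_set, val_set = train_validation_split(list(range(100)))
```

If the training error keeps dropping while the error on `val_set` starts rising, that divergence is the classic sign of overfitting mentioned above.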
Moreover, the DL framework takes care of all the testing required before the model is applied to data other than the dataset it was trained on (i.e. new data). All this makes building and fine-tuning a DL data model a straightforward and intuitive process, empowering you to make a more informed choice about which model to use for your data science project.
Model deployment is something that DL frameworks can handle, too, making movement through the data science pipeline swifter. This mitigates the risk of errors through this critical process, while also facilitating easy updating of the deployed model. All this enables the data scientist to focus more on the tasks that require more specialized or manual attention. For example, if you (rather than the DL model) worked on the feature engineering, you would have a greater awareness of exactly what is going into the model.
Deep learning is a very broad AI category, encompassing several data science methodologies through its various systems. As we have seen, for example, it can be successfully used in classification, if the output layer of the network is built with as many neurons as there are classes in the dataset. When DL is applied to problems under the regression methodology, things are simpler, as a single neuron in the output layer is enough. Classification and regression together constitute supervised learning, a broad methodology under the predictive analytics umbrella. Reinforcement learning is yet another methodology where DL is used (see Appendix B).
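The difference between the two output-layer designs can be shown with a short sketch. For classification, there is one output node per class, and a softmax turns the nodes' raw values into per-class confidences that sum to 1; for regression, a single node's value is the prediction itself. The raw scores below are made-up illustration values.

```python
import math

def softmax(scores):
    """Convert raw output-node values into per-class confidence scores."""
    exps = [math.exp(s - max(scores)) for s in scores]  # shifted for stability
    total = sum(exps)
    return [e / total for e in exps]

# Classification: one output node per class (three classes here)
class_scores = [2.0, 1.0, 0.1]              # raw values from the output nodes
confidences = softmax(class_scores)         # confidence per class, summing to 1
predicted_class = confidences.index(max(confidences))

# Regression: a single output node, whose value is the prediction itself
regression_output = 3.7
```

Here the first class gets the highest confidence, so it becomes the predicted class; in the regression case no such conversion is needed.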
DL is also used for dimensionality reduction, which (in this case) comprises a set of meta-features that are usually developed by an autoencoder system (see Appendix C for more details on this kind of DL network). This approach to dimensionality reduction is also more efficient than the traditional statistical ones, which are computationally expensive when the number of features is remarkably high. Clustering is another methodology where deep learning can be used, with the proper changes in the ANN’s structure and data flow. Clustering and dimensionality reduction are the most popular unsupervised learning methodologies in data science and provide a lot of value when exploring a dataset. Beyond these data science methodologies involving DL, there are others that are more specialized and require some domain expertise. We’ll talk about some of them more, shortly.
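The autoencoder idea behind this kind of dimensionality reduction can be sketched with fixed, made-up weights: an encoder compresses four input features into two meta-features at a bottleneck layer, and a decoder expands them back. A real autoencoder learns these weights by training the network to reproduce its own input; this sketch only shows the data flow.

```python
def matvec(matrix, vector):
    """Multiply a weight matrix (list of rows) by an input vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

encoder_weights = [[0.5, 0.5, 0.0, 0.0],    # 2 x 4: compresses to the bottleneck
                   [0.0, 0.0, 0.5, 0.5]]
decoder_weights = [[1.0, 0.0], [1.0, 0.0],  # 4 x 2: expands back to 4 values
                   [0.0, 1.0], [0.0, 1.0]]

features = [1.0, 3.0, 2.0, 4.0]
meta_features = matvec(encoder_weights, features)       # the compressed form
reconstruction = matvec(decoder_weights, meta_features) # the decoder's attempt
```

The two values in `meta_features` are the reduced representation; once training makes `reconstruction` close to `features`, the bottleneck has captured most of the information in far fewer dimensions.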
There are many applications of deep learning. Some are more established or general, while others are more specialized or novel. Since DL is still a new tool, its applications in the data science world remain works in progress, so keep an open mind about this matter. After all, the purpose of all AI systems is to be as universally applicable as possible, so the list of applications is only going to grow.
For the time being, DL is used in complex problems where high-accuracy predictions are required. These could be datasets with high dimensionality and/or highly non-linear patterns. In the case of high-dimensional datasets that need to be summarized into a more compact form with fewer dimensions, DL is a highly effective tool for the job. Also, since the very beginning of its creation, DL has been applied to image, sound, and video analytics, with a focus on images. Such data is quite difficult to process otherwise; the tools used before DL could only help so much, and developing those features manually was a very time-consuming process.
Moving on to more niche applications, DL is widely used in various natural language processing (NLP) methods. Where it is important to identify any positive or negative attitudes in the text, we use a methodology called “sentiment analysis,” which offers a fertile ground for many DL systems. There are also DL networks that perform text prediction, which is common in many mobile devices and some text editors. More advanced DL systems manage to link images to captions by mapping these images to words that are relevant and that form sentences. Such advanced applications of DL include chatbots, in which the AI system both creates text and understands the text it is given. Also, applications like text summarization are under the NLP umbrella too and DL contributes to them significantly. Some DL applications are more advanced or domain-specific — so much so that they require a tremendous amount of data and computing power to work. However, as computing becomes more readily available, these are bound to become more accessible in the short term.
DL frameworks make it easy and efficient to employ DL in a data science project. Of course, part of the challenge is deciding which framework to use. Because not all DL frameworks are built equal, there are factors to keep in mind when comparing or evaluating these frameworks.
The number of languages supported by a framework is especially important. Since programming languages are particularly fluid in the data science world, it is best to have your language bases covered by the DL framework you plan to use. What’s more, multi-language support in a DL framework enables the formation of a more diverse data science team, with each member bringing different programming expertise.
You must also consider the raw performance of the DL systems developed by the framework in question. Although most of these systems use the same low-level language on the back end, not all of them are fast. There may also be other overhead costs involved. As such, it’s best to do your due diligence before investing your time in a DL framework — particularly if your decision affects other people in your organization.
Furthermore, consider the ETL processes supporting a DL framework. Not all frameworks are good at ETL, which is both inevitable and time-consuming in a data science pipeline. Again, any inefficiencies of a DL framework in this aspect are not going to be advertised; you must do some research to uncover them yourself.
Finally, the user community and documentation around a DL framework are important things, too. Naturally, the documentation of the framework is going to be helpful, though in some cases it may leave much to be desired. If there is a healthy community of users for the DL framework you are considering, things are bound to be easier when learning its more esoteric aspects — as well as when you need to troubleshoot issues that may arise.
Interpretability is the capability of a model to be understood in terms of its functionality and its results. Although interpretability is often a given with conventional data science systems, it is a pain point of every DL system. This is because every DL model is a “black box,” offering little to no explanation for why it yields the results it does. Unlike the framework itself, whose various modules and their functionality are clear, the models developed by these frameworks are convoluted graphs, with no comprehensive explanation of how the inputs you feed them turn into the outputs they yield.
Although obtaining an accurate result through such a method may be enticing, it is quite hard to defend, especially when the results are controversial or carry a demographic bias. The reason for a demographic bias has to do with the data, by the way, so no number of bias nodes in the DL networks can fix that, since a DL network’s predictions can only be as good as the data used to train it. Also, the fact that we have no idea how the predictions correspond to the inputs allows biased predictions to slip through unnoticed.
However, this lack of interpretability may be resolved in the future. Doing so may require a whole new approach, but if there is one thing the progress of AI systems has demonstrated over the years, it is that innovations are still possible and that new model architectures are still being discovered. Perhaps one of the newer DL systems will have interpretability as one of its key characteristics.
Maintenance is essential to every data science model. This entails updating or even upgrading a model in production as new data becomes available. Alternatively, the assumptions of the problem may change; when this happens, model maintenance is also needed. In a DL setting, model maintenance usually involves retraining the DL network. If the retrained model doesn’t perform well enough, more significant changes may be considered, such as changing the architecture or the training parameters. Whatever the case, this whole process is largely straightforward and not too time-consuming.
How often model maintenance is required depends on the dataset and the problem in general. Whatever the case, it is good to keep the previous model available when making major changes, in case the new model has unforeseen issues. Also, the model maintenance process can be automated to some extent, at least the offline part, where the model is retrained as new data is integrated with the original dataset.
When to use DL over conventional data science systems
Deciding when to use a DL system instead of a conventional method is an important task. It is easy to be enticed by the new and exciting features of DL, and to use it for all kinds of data science problems. However, not all problems require DL. Sometimes, the extra performance of DL is not worth the extra resources required. In cases where conventional data science systems fail, or don’t offer any advantage (like interpretability), DL systems may be preferable. Complex problems with lots of variables and cases with non-linear relationships between the features and the target variables are great matches for a DL framework.
If there is an abundance of data, and the main objective is good raw performance in the model, a DL system is typically preferable. This is particularly true if computational resources are not a concern, since a DL system requires quite a lot of them, especially during its training phase. Whatever the case, it’s good to consider alternatives before setting off to build a DL model. While these models are incredibly versatile and powerful, sometimes simpler systems are good enough.