Democratising Data

Introduction

The fourth industrial revolution. The cognitive age. Data is the new gold. Beyond the hype, there are exciting developments taking place in the world of data and analytics.

Decoded have worked with some of the world’s largest organisations to support them on their digital transformation journeys and advise them on how they can effectively use data to set themselves apart from the competition.

This white paper will help business leaders separate the signal from the noise when it comes to data science. It will also provide some useful pointers on where to start (or continue) the journey of using data effectively within your organisation.

Over the coming weeks we will be releasing bite-size instalments from the white paper. The series will culminate in an event, hosted by us, that brings together like-minded people who care about where the industry is going and about delivering best practice.

Email us to join the guestlist. The event date and location will be announced shortly.

Definitions

The first stumbling block for anybody wishing to explore the world of data is understanding the definitions. They can be complex, confusing, ambiguous and nebulous, and it does not help that some of the most important terms have no agreed meaning even within the data community. Below are some of the most common terms, with a plain-language explanation of each.

To begin with, here are the three terms most commonly used to describe emerging-tech data projects:

Artificial Intelligence (AI)

The starting place for most conversations around data. The term was coined in the mid-1950s for the idea of creating machines that could mimic ‘human-like intelligence’, with functions such as learning, reasoning and perceiving. However, the term is not that useful. It no longer gives an accurate picture of what machines are doing, and attempts to humanise what is essentially advanced statistics cause confusion: machines can’t “see” and they don’t “learn”. What they do is complete tasks by following a set of stipulated rules that solve problems (algorithms). From a human point of view we recognise this as intelligence, and therefore give it the label artificial intelligence.

Artificial Intelligence can be thought of in two broad categories. Artificial General Intelligence means fully functioning, multi-purpose robots (think C-3PO or the Terminator). Artificial Narrow Intelligence is an AI designed to handle one specific task, like spam filtering or image recognition. General AI doesn’t exist. Everything we are talking about is Narrow AI, and this includes Machine Learning and Deep Learning.

Machine Learning

A field of AI that uses statistical techniques to give computer systems the ability to progressively improve their performance on a specific task (or “learn”) from data, rather than being explicitly programmed. The “learning” comes from a computer system finding its own “best way” to achieve a desired outcome: we give it inputs and suggest a goal for it to achieve, and it “learns” the best way to configure its algorithm to reach that goal. Machine Learning can be used to classify, or group, items in large data sets. For example, it can work out whether an email should go to your inbox or your spam folder.
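To make the idea of a system finding its own “best way” concrete, here is a minimal Python sketch of one common technique: linear regression trained by gradient descent. The data points, learning rate and step count are made up for illustration. The program is never told the rule behind the data. It is only given inputs, the answers it should produce, and the goal of making its errors as small as possible. It then adjusts its two settings (`w` and `b`) step by step until it gets there.

```python
# Hypothetical inputs and the answers we want the system to reproduce.
# (They happen to follow y = 2x + 1, but the program is never told that.)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

# The "configuration" the system will learn: a slope w and an offset b.
w, b = 0.0, 0.0
learning_rate = 0.01  # illustrative choice

for step in range(5000):
    # How far off is each prediction right now?
    errors = [(w * x + b) - y for x, y in zip(xs, ys)]
    # Gradients of the mean squared error with respect to w and b:
    # they say which direction to nudge each setting to shrink the error.
    grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = 2 * sum(errors) / len(xs)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"learned w={w:.2f}, b={b:.2f}")  # learned w=2.00, b=1.00
```

The goal here is “predictions as close as possible to the known answers”. Changing the goal (for example, to a spam/not-spam decision) changes what the system ends up learning, not the basic loop.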

Deep Learning

Deep Learning is basically Machine Learning on steroids. It uses significantly more complex algorithms and techniques built around Neural Networks, which are loosely modelled on the biological brain, to increase an algorithm’s likelihood of achieving a nominated goal. The word “deep” refers to the many layers stacked inside the algorithm’s model, often tens or hundreds of them. The connections between those layers start out with random statistical weightings. Then, as the algorithm processes training data, it automatically adjusts those weightings until it arrives at a set of weights that gives the result closest to the goal. Because of the massive number of calculations involved, Deep Learning needs high-end machines and significantly more training data to deliver accurate results. And because Deep Learning models are so complex, it is becoming increasingly difficult for humans to understand exactly how they are classifying data.
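The sketch below shows that core mechanic on a very small scale. It is a neural network whose connection weights start out random and are repeatedly nudged, using backpropagation, to reduce its error. The network learns XOR (output 1 when exactly one input is 1), a task that a single-layer model cannot solve. To keep it short it has just one hidden layer of four units, whereas real deep learning models stack many layers and run on specialised libraries and hardware. The layer size, learning rate, random seed and number of training passes are arbitrary choices for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# XOR: the output is 1 only when exactly one input is 1.
DATA = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
HIDDEN = 4            # units in the single hidden layer (illustrative choice)
LEARNING_RATE = 0.5
EPOCHS = 5000

rng = random.Random(1)
# Every connection and bias starts with a random weight.
w_hidden = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(HIDDEN)]
b_hidden = [rng.uniform(-1, 1) for _ in range(HIDDEN)]
w_out = [rng.uniform(-1, 1) for _ in range(HIDDEN)]
b_out = rng.uniform(-1, 1)

def forward(x):
    """Pass an input through the hidden layer, then the output unit."""
    h = [sigmoid(w_hidden[j][0] * x[0] + w_hidden[j][1] * x[1] + b_hidden[j])
         for j in range(HIDDEN)]
    y = sigmoid(sum(w_out[j] * h[j] for j in range(HIDDEN)) + b_out)
    return h, y

def mean_squared_error():
    return sum((forward(x)[1] - target) ** 2 for x, target in DATA) / len(DATA)

initial_loss = mean_squared_error()
for _ in range(EPOCHS):
    for x, target in DATA:
        h, y = forward(x)
        # Backpropagation: how much did each unit contribute to the error?
        d_out = (y - target) * y * (1 - y)
        d_hidden = [d_out * w_out[j] * h[j] * (1 - h[j]) for j in range(HIDDEN)]
        # Nudge every weight a little in the direction that reduces the error.
        for j in range(HIDDEN):
            w_out[j] -= LEARNING_RATE * d_out * h[j]
            w_hidden[j][0] -= LEARNING_RATE * d_hidden[j] * x[0]
            w_hidden[j][1] -= LEARNING_RATE * d_hidden[j] * x[1]
            b_hidden[j] -= LEARNING_RATE * d_hidden[j]
        b_out -= LEARNING_RATE * d_out
final_loss = mean_squared_error()

print(f"error before training: {initial_loss:.3f}, after: {final_loss:.3f}")
for x, target in DATA:
    print(x, "target", target, "prediction", round(forward(x)[1], 2))
```

Even in this toy, the final weights are just a list of numbers with no obvious meaning. That is a small-scale version of the interpretability problem described above.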

Next up are the two types of data that algorithms process:

Structured Data

Pieces of information in a format that can be easily searched, processed and analysed. Examples would include your electricity bill, a log of when telephone calls are made, a list of registered voters and their political preferences. Essentially anything that could go into a table or spreadsheet.

Unstructured Data

Data types that are not easily searched and analysed. This could include images posted on social media, a voice recording of a telephone call, or the text found within a book.
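Here is a short Python sketch of the difference, using hypothetical data. The structured call log can be queried directly by field name. The free-text transcript has to be processed first, here with a simple regular expression, before any facts can be pulled out of it. That second step is brittle: the pattern would miss “tomorrow” or “Weds”. This fragility is a large part of why unstructured data is harder to work with.

```python
import csv
import io
import re

# Structured: a hypothetical telephone call log. Every record has the same
# named fields, so questions become simple lookups and filters.
CALL_LOG = """caller,day,minutes
Alice,Mon,12
Bob,Tue,3
Alice,Wed,7
"""
rows = list(csv.DictReader(io.StringIO(CALL_LOG)))
alice_minutes = sum(int(r["minutes"]) for r in rows if r["caller"] == "Alice")
print(alice_minutes)  # 19

# Unstructured: free text, for example a hypothetical call transcript.
# There are no fields, so even a simple question needs extra processing.
TRANSCRIPT = ("Hi, it's Alice. Can you call me back on Wednesday? "
              "I talked to Bob for ages on Tuesday.")
days_mentioned = re.findall(r"\b(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day\b", TRANSCRIPT)
print(days_mentioned)  # ['Wednesday', 'Tuesday']
```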

And finally, we have three ways that machines can learn the models that power Artificial Intelligence:

Supervised learning

This starts with someone manually labelling the data the computer is processing. Imagine someone going down a credit card bill and categorising items as “groceries”, “entertainment” or “utilities”. Clicking “this is spam” on an email is another example. This labelled set is called training data: once a human has labelled a set of data, the computer turns it into a model it can use to recognise future pieces of data.

Algorithms create models by analysing categorised, or labelled, training data, and then use those models to make predictions about unlabelled data. Supervised learning techniques include Decision Trees and Neural Networks. For example, a model trained on images labelled “cat” or “banana” would be able to predict the likelihood that an unlabelled image is either a “cat” or a “banana”.
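Here is a minimal Python sketch of supervised learning using the simplest possible Decision Tree, a “stump” that asks a single question. The emails and their two features are hypothetical: how often the word “free” appears, and how many links there are. The human-supplied labels let the algorithm score every candidate rule and keep the one that best matches them.

```python
# Hypothetical labelled training data: (features, label).
# Features describe an email: [count of the word "free", number of links]
training_data = [
    ([0, 1], "inbox"),
    ([1, 0], "inbox"),
    ([0, 2], "inbox"),
    ([4, 6], "spam"),
    ([3, 5], "spam"),
    ([5, 2], "spam"),
]

def train_stump(data):
    """Learn a one-question decision tree (a 'stump') from labelled examples.

    Tries every feature and every threshold seen in the data, and keeps the
    rule "feature >= threshold means spam" that gets the most examples right.
    """
    best = None
    for feature in range(len(data[0][0])):
        for threshold in sorted({x[feature] for x, _ in data}):
            correct = sum(
                (x[feature] >= threshold) == (label == "spam")
                for x, label in data
            )
            if best is None or correct > best[0]:
                best = (correct, feature, threshold)
    _, feature, threshold = best
    return feature, threshold

def predict(model, features):
    """Apply the learned rule to a new, unlabelled email."""
    feature, threshold = model
    return "spam" if features[feature] >= threshold else "inbox"

model = train_stump(training_data)
print(model)                    # (0, 3): "free" appears 3 or more times
print(predict(model, [6, 4]))   # spam
print(predict(model, [0, 0]))   # inbox
```

A full Decision Tree repeats this same “pick the best question” step inside each branch, building up a tree of questions rather than just one.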

Unsupervised learning

This is when an algorithm looks over an unlabelled data set and determines whether there are any logical clusters, or similarities. This is incredibly useful for uncovering patterns in data that might be invisible to humans. Spotify is a brilliant example of this. Previously, radio programmers would have used “age”, “frequency of play” and “geography” as reasons to schedule a song on a playlist. Spotify, on the other hand, can use an ever-increasing number of variables to target hyper-niche groups, creating playlists that might appear insane to traditional programmers but are incredibly relevant to their users. While this is fun for Spotify, the same approach can be incredibly powerful for insurance companies and retailers looking to offer tailored products to their customers.
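The sketch below is an illustration only, not a description of how Spotify actually works. It applies k-means, a widely used clustering algorithm, to hypothetical listening data. No labels are supplied. The algorithm alternates between two steps: assigning each listener to the nearest cluster centre, and moving each centre to the middle of its cluster. The groups emerge from the data itself. Starting from the first `k` points is a simplifying assumption; real implementations usually choose starting centres at random.

```python
import math

# Hypothetical listening data: (hours of jazz, hours of metal) per week.
# There are no labels: nobody has told the algorithm who likes what.
listeners = [(9, 0), (0, 8), (8, 1), (1, 9), (10, 2), (2, 7)]

def kmeans(points, k, iterations=10):
    """Group points into k clusters using the k-means algorithm."""
    # Start from the first k points as cluster centres (simple and deterministic).
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centre to the average of its cluster's points.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = [sum(values) / len(cluster) for values in zip(*cluster)]
    return centroids, clusters

centroids, clusters = kmeans(listeners, k=2)
for centre, members in zip(centroids, clusters):
    print(centre, members)
# [9.0, 1.0] [(9, 0), (8, 1), (10, 2)]
# [1.0, 8.0] [(0, 8), (1, 9), (2, 7)]
```

With only two variables the groups are easy to see by eye. The value of clustering comes when there are dozens or hundreds of variables and no human could spot the groupings unaided.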

Reinforcement learning

This is where things get exciting. In the shortest sentence possible: reinforcement learning is the science of making the best possible decisions. In practice, this means that an algorithm tries actions and then improves them based on whether they took it closer to its goal or further away. Imagine a toddler learning to walk, with the goal of picking up a toy from across the room. To cross the room it will fall over many times, but each time it will factor in feedback from those falls. Were its steps too big? Was its balance off? Is it walking on carpet or on a polished floor? This is reinforcement learning.
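The toddler example can be sketched with tabular Q-learning, one well-known reinforcement learning algorithm. In this hypothetical setup the room is a row of five squares with the toy on the last one, and the only feedback is a reward of 1 for reaching the toy. The learning rate (`alpha`), discount (`gamma`), exploration rate (`epsilon`), step cap and episode count are illustrative choices. Nobody tells the agent which way to go. It tries actions, some of them at random, and updates its own estimate of how good each action is in each position.

```python
import random

N_STATES = 5          # positions 0..4 across the room; the toy is at position 4
GOAL = N_STATES - 1
ACTIONS = ("left", "right")

def step(state, action):
    """Hypothetical environment: move one square; reward 1 only on reaching the toy."""
    if action == "right":
        next_state = min(state + 1, GOAL)
    else:
        next_state = max(state - 1, 0)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def choose_action(q, state, epsilon, rng):
    # Explore: occasionally try a random action.
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    # Exploit: pick the action with the highest estimated value (ties broken randomly).
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])

def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = choose_action(q, state, epsilon, rng)
            next_state, reward = step(state, action)
            # Q-learning update: move the estimate toward the reward received
            # plus the discounted value of the best action from the next position.
            target = reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
            if state == GOAL:
                break
    return q

q = train()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print(policy)  # ['right', 'right', 'right', 'right']
```

The agent is never shown the right answer. It works out that “right” is best at every position purely from the delayed reward at the end.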

Reinforcement learning is exciting because it allows algorithms to solve problems in unknown environments. It would be impossible to program an autonomous vehicle with every feature of every road or city it could drive on. It is, however, possible to program a vehicle to stay in a straight line relative to the obstacles around it, factoring in elements such as speed or humidity. It is also possible to ask a vehicle to achieve the goal of getting from one point to another using as little fuel as possible.

Reinforcement learning differs from other types of Machine Learning because no one supervises the algorithm’s individual actions: it has to decide by itself whether an action is taking it closer to its goal.

Next week we will look at why data is such a hot topic right now.

Watch this space…

Email us to join the event guestlist. The event will be held in Sydney in February, date TBC.

read original article at https://blog.decoded.com/democratising-data-9842ff94aa11?source=rss——artificial_intelligence-5