Computer Vision

Copyright 2019 Martian Technologies, Co.

Brian S. Haney


Generally, Artificial Intelligence (“AI”) refers to a machine with the ability to replicate cognitive activities associated with human thought. The goal for many AI researchers is whole brain emulation, which describes machine intelligence copying the computational structure of the human brain. Computer vision, the study of visual data, is one important aspect of achieving this goal. This article will focus on describing the role of convolutional neural networks in computer vision.

Convolutional Neural Networks (CNNs) achieve state-of-the-art performance in computer vision tasks. Generally, deep learning allows machines to learn with architectures inspired by the biological neocortex. Unsurprisingly, CNNs are a deep learning mechanism modeled upon the biological visual cortex. The biological visual cortex is composed of receptive fields made up of cells that are sensitive to small sub-regions of the visual field. In an artificial visual cortex, the response of a neuron to a stimulus in its receptive field is modeled with a mathematical convolution operation. Convolution is a mathematical operation on two matrices: an input matrix and a kernel, or filter. A kernel is a small square matrix that is applied to each element of the input matrix.

Generally, a neural network is a function, or transformation of information, operating on input data and allowing the abstraction of meaning from the corresponding output. In a CNN, each kernel is convolved across an input matrix and the resulting output is called a feature map. The full output of a layer is obtained by stacking all of its feature maps, which adds a depth dimension. In contrast to fully connected deep neural networks (DNNs), the units in a CNN are not connected to every input. Instead, a window is defined over a smaller input space and the units are connected to a small subset of the inputs. In other words, the kernel is centered over a subset of the input matrix, multiplied element-wise with it, and summed for the purpose of feature abstraction. The flexibility of CNNs also allows them to be continually improved with novel architecture designs.
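To make the kernel and feature map idea concrete, here is a minimal sketch in plain Python with no external libraries. It is not part of the two examples that follow. The function name convolve2d, the 4 x 4 input, and the two 2 x 2 kernels are assumptions made for illustration. As in most deep learning libraries, the kernel is slid across the input without being flipped, with no padding and a stride of 1.

```python
def convolve2d(image, kernel):
    """Slide a square kernel over image (no padding, stride 1); return the feature map."""
    k = len(kernel)
    out_rows = len(image) - k + 1
    out_cols = len(image[0]) - k + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # multiply the kernel element-wise with the window under it, then sum
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map

# hypothetical 4 x 4 input with a vertical edge between columns 1 and 2
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
vertical_edge = [[-1, 1], [-1, 1]]
horizontal_edge = [[-1, -1], [1, 1]]

# one feature map per kernel, stacked to form the layer output
layer_output = [convolve2d(image, kern) for kern in (vertical_edge, horizontal_edge)]
print(layer_output[0])  # responds where the vertical edge sits
print(layer_output[1])  # all zeros, since the image has no horizontal edge
```

Each output value depends only on the 2 x 2 window beneath the kernel, which is the local connectivity described above, and the same kernel weights are reused at every position.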


The most important part of any neural network is the data. In the two examples in this article, the dataset is the popular MNIST Dataset. The MNIST Database of Handwritten Digits includes a training set of 60,000 examples, and a test set of 10,000 examples.

Random samples from MNIST Dataset.

The digits have been size-normalized and centered in a fixed-size image. This dataset is often used for training and introductions to CNNs.


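The following examples classify the MNIST digits with small neural networks. Both are built from fully connected layers that act on the flattened 28 x 28 image, rather than convolving a kernel across it. The first example uses the Python library Numpy. The second example uses Keras, the TensorFlow API.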

Example 1

The script begins by importing the required packages. Numpy is a package for scientific computing in Python and the main package in this example. The script also imports scipy.special and matplotlib.pyplot.

#Import packages
import numpy
#Import scipy.special for the sigmoid function
import scipy.special
#Import library for plotting arrays
import matplotlib.pyplot

Next, the script defines a neural network class. The __init__ function initializes the neural network by defining the input, hidden, and output nodes. The ‘self’ parameter assigns the function to the object, an instance of the neuralNetwork class.

#Neural network class definition
class neuralNetwork:
    #initialize the neural network
    def __init__(self, inputnodes, hiddennodes, outputnodes, learningrate):
        #set number of nodes in each input, hidden, output layer
        self.inodes = inputnodes
        self.hnodes = hiddennodes
        self.onodes = outputnodes

        #link weight matrices, wih and who
        #weights inside the arrays are w_i_j, where link is from node i to node j in the next layer
        #w11 w21 -> w12 w22 etc
        self.wih = numpy.random.normal(0.0, pow(self.inodes, -0.5), (self.hnodes, self.inodes))
        self.who = numpy.random.normal(0.0, pow(self.hnodes, -0.5), (self.onodes, self.hnodes))

        #learning rate
        self.lr = learningrate

        #activation function is the sigmoid function
        self.activation_function = lambda x: scipy.special.expit(x)
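The weight initialization draws each weight from a normal distribution centered on zero whose standard deviation is pow(n, -0.5), that is, 1 / sqrt(n), where n is the number of nodes feeding the layer. The sketch below reproduces that idea with Python's standard random module so the scale can be inspected without Numpy. The helper name init_weights, the fixed seed, and the 784 and 100 node counts are assumptions for illustration.

```python
import math
import random

def init_weights(rows, cols, fan_in, seed=0):
    """Return a rows x cols matrix sampled from a normal with std 1/sqrt(fan_in)."""
    rng = random.Random(seed)   # fixed seed only so the sketch is repeatable
    std = pow(fan_in, -0.5)     # same as 1 / sqrt(fan_in)
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]

# hypothetical sizes: 784 input nodes feeding 100 hidden nodes
wih = init_weights(100, 784, fan_in=784)

flat = [w for row in wih for w in row]
mean = sum(flat) / len(flat)
spread = math.sqrt(sum((w - mean) ** 2 for w in flat) / len(flat))
print(len(wih), len(wih[0]))   # matrix shape: hidden nodes x input nodes
print(pow(784, -0.5), spread)  # target and measured standard deviation
```

Scaling by the number of incoming links keeps the summed signal into each node from growing with layer width, which helps keep the sigmoid out of its flat, saturated regions at the start of training.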


Third, a function is defined for training the neural network. The ‘train’ function takes three parameters: ‘self’, ‘inputs_list’, and ‘targets_list’. The inputs flow forward through the network, and the targets hold the desired answer. The function computes the hidden and output layer signals, the error at each layer, and the weight updates.

    #Train neural network
    def train(self, inputs_list, targets_list):
        #Convert inputs list to 2d array
        inputs = numpy.array(inputs_list, ndmin=2).T
        targets = numpy.array(targets_list, ndmin=2).T

        #calculate signals into hidden layer
        hidden_inputs = numpy.dot(self.wih, inputs)
        #calculate the signals emerging from hidden layer
        hidden_outputs = self.activation_function(hidden_inputs)

        #calculate signals into final output layer
        final_inputs = numpy.dot(self.who, hidden_outputs)
        #calculate signals emerging from final output layer
        final_outputs = self.activation_function(final_inputs)

        #output layer error is the (target-actual)
        output_errors = targets - final_outputs
        #hidden layer error is the output_errors, split by weights, recombined at hidden nodes
        hidden_errors = numpy.dot(self.who.T, output_errors)

        #update the weights for the links between the hidden and output layers
        self.who += self.lr * numpy.dot((output_errors * final_outputs * (1.0 - final_outputs)), numpy.transpose(hidden_outputs))

        #update the weights for the links between the input and hidden layers
        self.wih += self.lr * numpy.dot((hidden_errors * hidden_outputs * (1.0 - hidden_outputs)), numpy.transpose(inputs))
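Written as equations, the steps named in the comments above take roughly the following form. This is a sketch that assumes the sigmoid activation defined earlier and plain gradient descent. The symbols are notation introduced here rather than the article's: x is the input column vector, t the target vector, o the outputs of a layer, e the error at a layer, \alpha the learning rate, and \odot element-wise multiplication.

```latex
\begin{aligned}
o_{h} &= \sigma(W_{ih}\, x), & o_{o} &= \sigma(W_{ho}\, o_{h}) \\
e_{o} &= t - o_{o}, & e_{h} &= W_{ho}^{\top}\, e_{o} \\
W_{ho} &\leftarrow W_{ho} + \alpha\, \bigl(e_{o} \odot o_{o} \odot (1 - o_{o})\bigr)\, o_{h}^{\top} \\
W_{ih} &\leftarrow W_{ih} + \alpha\, \bigl(e_{h} \odot o_{h} \odot (1 - o_{h})\bigr)\, x^{\top}
\end{aligned}
```

The factor o \odot (1 - o) is the derivative of the sigmoid, \sigma'(z) = \sigma(z)(1 - \sigma(z)), which is why each update multiplies a layer's error by that layer's outputs.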


Then, a function is defined to query the network. It calculates the network’s signals for a given input without updating the weights.

    #query the neural network
    def query(self, inputs_list):
        #convert inputs list to 2d array
        inputs = numpy.array(inputs_list, ndmin=2).T

        #calculate signals into hidden layer
        hidden_inputs = numpy.dot(self.wih, inputs)
        #calculate signals emerging from hidden layer
        hidden_outputs = self.activation_function(hidden_inputs)

        #calculate signals into final output layer
        final_inputs = numpy.dot(self.who, hidden_outputs)
        #calculate the signals emerging from final output layer
        final_outputs = self.activation_function(final_inputs)

        return final_outputs
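The forward pass used by the query step can be traced by hand on a tiny network. The sketch below is written in plain Python for that purpose. The 3-2-2 layer sizes, the weight values, and the helper names sigmoid and layer are assumptions for illustration, not values from the trained model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(weights, inputs):
    """Weighted sum into each node, followed by the sigmoid activation."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def query(wih, who, inputs):
    hidden_outputs = layer(wih, inputs)   # signals emerging from the hidden layer
    return layer(who, hidden_outputs)     # signals emerging from the output layer

# hypothetical 3-2-2 network: each row holds the weights into one node
wih = [[0.5, -0.5, 0.0],
       [0.0, 0.0, 0.0]]
who = [[1.0, 1.0],
       [0.0, 0.0]]

outputs = query(wih, who, [1.0, 1.0, 0.0])
print(outputs)
```

Both hidden nodes receive a weighted sum of zero and therefore output 0.5. The first output node then receives 0.5 + 0.5 = 1.0, giving sigmoid(1.0), about 0.731, and the second receives zero and outputs 0.5.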

Fifth, the network’s nodes are defined along with the learning rate. Here, there are 784 input nodes because the MNIST dataset stores values as a 28 x 28 pixel array. And, there are 10 output nodes because there are 10 labels for each of the numbers in the dataset, 0–9.

#number of input, hidden and output nodes
input_nodes = 784
hidden_nodes = 100
output_nodes = 10
#learning rate is 0.3
learning_rate = 0.3

Then, an instance of the network class is created and the training data is loaded. Here, the training data was downloaded from the MNIST website and stored locally in a .csv file.

#create instance of neural network
n = neuralNetwork(input_nodes, hidden_nodes, output_nodes, learning_rate)
#load the MNIST training data csvfile into a list
training_data_file = open("mnist_train.csv", 'r')
training_data_list = training_data_file.readlines()
training_data_file.close()

Next, the neural network is trained through an iterative process. The split method divides each record by commas.

#train neural network
#go through all records in the training data set
for record in training_data_list:
    #split the record by the ',' commas
    all_values = record.split(',')
    #scale and shift the inputs
    inputs = (numpy.asarray(all_values[1:], dtype=float) / 255.0 * 0.99) + 0.01
    #create the target output values (all 0.01, except the desired label which is 0.99)
    targets = numpy.zeros(output_nodes) + 0.01
    #all_values[0] is the target label for this record
    targets[int(all_values[0])] = 0.99
    n.train(inputs, targets)
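The preprocessing inside the training loop can be checked on a single record. The sketch below, in plain Python, assumes each CSV line has the form label,pixel,pixel,... with pixel values from 0 to 255. The helper name prepare_record and the shortened three-pixel record are hypothetical; a real MNIST record has 784 pixel values.

```python
def prepare_record(record, output_nodes=10):
    """Turn one 'label,pixel,pixel,...' CSV line into (inputs, targets)."""
    all_values = record.strip().split(',')
    label = int(all_values[0])
    # map pixel values 0..255 onto 0.01..1.00 so that no input is exactly zero
    inputs = [int(v) / 255.0 * 0.99 + 0.01 for v in all_values[1:]]
    # 0.01 everywhere except 0.99 at the index of the correct digit
    targets = [0.01] * output_nodes
    targets[label] = 0.99
    return inputs, targets

# hypothetical shortened record: label 3 followed by three pixels instead of 784
inputs, targets = prepare_record("3,0,255,51\n")
print(inputs)
print(targets)
```

Inputs are kept above zero because the weight update is proportional to the input signal, so a zero input would leave its weights unchanged. Targets of 0.01 and 0.99 are used because the sigmoid never outputs exactly 0 or 1.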

Then, the test data is loaded.

#load the mnist test data csv file into a list
test_data_file = open("mnist_test.csv", 'r')
test_data_list = test_data_file.readlines()
test_data_file.close()
#get the first test record
all_values = test_data_list[0].split(',')
#reshape the 784 pixel values into a 28 x 28 image array
image_array = numpy.asarray(all_values[1:], dtype=float).reshape((28,28))

The network is then tested and a scorecard displays the results. The scorecard identifies the correct label and the network’s answer.

#test the neural network
#scorecard for how well the network performs, initially empty
scorecard = []
#go through all the records in the test data set
for record in test_data_list:
    #split the record by the ',' commas
    all_values = record.split(',')
    #correct answer is first value
    correct_label = int(all_values[0])
    print(correct_label, "correct label")
    #scale and shift inputs
    inputs = (numpy.asarray(all_values[1:], dtype=float) / 255.0 * 0.99) + 0.01
    #query the network
    outputs = n.query(inputs)
    #the index of the highest value corresponds to the label
    label = numpy.argmax(outputs)
    print(label, "network's answer")
    #append correct or incorrect to list
    if (label == correct_label):
        #network's answer matches correct answer, add 1 to scorecard
        scorecard.append(1)
    else:
        #network's answer doesn't match correct answer, add 0 to scorecard
        scorecard.append(0)

#calculate the performance score, the fraction of correct answers
scorecard_array = numpy.asarray(scorecard)
print("performance =", scorecard_array.sum() / scorecard_array.size)
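The scorecard logic reduces to two steps: take the index of the largest output as the predicted label, then average the matches. A small plain-Python sketch with made-up outputs shows the calculation. The helper names argmax and accuracy and the three-class outputs are assumptions for illustration; the network above has ten outputs.

```python
def argmax(values):
    """Index of the largest value, which the network treats as its answer."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(all_outputs, labels):
    # 1 where the predicted label matches the correct label, 0 otherwise
    scorecard = [1 if argmax(o) == y else 0 for o, y in zip(all_outputs, labels)]
    return sum(scorecard) / len(scorecard)

# hypothetical outputs for four records and three classes
all_outputs = [
    [0.1, 0.8, 0.1],
    [0.7, 0.2, 0.1],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
]
labels = [1, 0, 2, 1]

score = accuracy(all_outputs, labels)
print("performance =", score)
```

Three of the four predictions match their labels, so the score is 0.75.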

The network’s output looks something like this:

Numpy CNN Performance

Here, the network written in Numpy predicted the correct number with roughly 94% accuracy.

Example 2

Keras is Google’s high-level TensorFlow API for building and training deep learning models. The API is built on top of TensorFlow and is accessible through the new TensorFlow 2.0 Alpha, which was released in early 2019. TensorFlow 2.0 Alpha includes the Keras API through tf.keras module.

First, the packages are imported and the data is defined.

from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
mnist = tf.keras.datasets.mnist
(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

Second, the model is defined. The layers of the neural network allow for the information to flow from input to output. The activation functions used are relu and softmax. ReLU is an acronym for Rectified Linear Units and is a popular activation function in neural networks. Softmax is an activation function that transforms its input into a probability distribution.
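For readers who want to see what the two activation functions compute, here is a minimal plain-Python sketch. It is independent of TensorFlow, and the function names and sample values are chosen only for illustration.

```python
import math

def relu(values):
    """Rectified linear unit: negative values become zero, others pass through."""
    return [max(0.0, v) for v in values]

def softmax(values):
    """Turn arbitrary scores into probabilities that sum to 1."""
    shifted = [v - max(values) for v in values]  # subtract the max for numerical stability
    exps = [math.exp(v) for v in shifted]
    total = sum(exps)
    return [e / total for e in exps]

hidden = relu([-2.0, 0.0, 3.0])
probs = softmax([1.0, 2.0, 3.0])
print(hidden)
print(probs, sum(probs))
```

In the model, relu is applied to the 512 hidden units, and softmax is applied to the 10 output units so that each output can be read as the probability of one digit.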

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(512, activation=tf.nn.relu),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
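The size of this model can be worked out by hand. A dense layer has one weight per input-unit pair plus one bias per unit, and the Flatten layer only reshapes the 28 x 28 image into 784 values without adding parameters. The short sketch below does the arithmetic in plain Python; the helper name dense_params is an assumption for illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

flatten_outputs = 28 * 28                          # Flatten has no trainable parameters
hidden_params = dense_params(flatten_outputs, 512)
output_params = dense_params(512, 10)
total_params = hidden_params + output_params
print(hidden_params, output_params, total_params)
```

Almost all of the parameters sit in the first dense layer, because every one of the 784 pixels is connected to every one of the 512 hidden units.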

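Third, the model is compiled with the Adam optimizer.

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

Lastly, the model is trained and evaluated.

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)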

The output of the Keras model will look something like this:

Keras CNN Performance

The Keras model predicted the correct number with just over 98% accuracy.


In sum, CNNs are a mechanism allowing computers to understand and assess visual data, and the assessment of visual data is a critical aspect of AI systems. This article illustrated how to develop a neural network image classifier with Numpy and Keras. While the Keras code is much more concise and accurate, the Numpy code is more detailed. The complete code for both models can be found on my GitHub.


[1] Brian S. Haney, The Perils & Promises of Artificial General Intelligence, 45 J. Legis. __ (2019) (Forthcoming).

[2] Nick Bostrom, Superintelligence: Paths, Dangers, Strategies (Oxford University Press 2017).

[3] Justin Johnson, Lecture 1|Introduction to Convolutional Neural Networks for Visual Recognition, Stanford School of Engineering (2017).

[4] Damien Matti, Combining LiDAR Space Clustering and Convolutional Neural Networks for Pedestrian Detection (2017)

[5] Serena Yeung, et al., End-to-end Learning of Action Detection from Frame Glimpses in Videos, Stanford University (2015).

[6] Manon Legrand, Deep Reinforcement Learning for Autonomous Vehicle Control among Human Drivers, Universite Libre de Bruxelles (2017).

[7] Ethem Alpaydin, Machine Learning (The MIT Press, 2016).

[8] Yann LeCun, et al., The MNIST Database of Handwritten Digits,

[9] Brian S. Haney, CNN, GitHub (2018)

[10] Daniel Maturana, Sebastian Scherer, 3D Convolutional Neural Networks for Landing Zone Detection from LiDar (2015)

[11] Damien Matti, Combining LiDAR Space Clustering and Convolutional Neural Networks for Pedestrian Detection (2017)

[12] Tariq Rashid, Build Your Own Neural Network (2018).
