
Perceptron stochastic gradient descent algorithm in Python, implemented from scratch. Why is the training degrading over time?


intandem
(@intandem)
New Member
Joined: 3 months ago
Posts: 1
27/09/2018 5:36 am  

Hi,

I am implementing my own perceptron algorithm in Python without using NumPy or scikit-learn yet. I wanted to get the basics right before proceeding to machine-learning-specific modules.

I wrote the code as given below:

1. Used the Iris data set to classify based on sepal length and petal length.
2. Updated the weights after processing each training example.
3. Learning rate and number of iterations for training are provided to the algorithm by the caller.

Issues:

My training error gets worse instead of improving over time. Can someone please explain what I am doing incorrectly?

This is my error per epoch; as you can see, the error is actually increasing. 😥 😥

{ 0: 0.01646885885483229, 1: 0.017375368112097056, 2: 0.018105024923841584, 3: 0.01869233173693685, 4: 0.019165059856726563, 5: 0.01954556263697238, 6: 0.019851832477317588, 7: 0.02009835160930562, 8: 0.02029677690109266, 9: 0.020456491062436744 }

 

 
import pandas as panda
import matplotlib.pyplot as plot
import random

remote_location = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'


class Perceptron(object):

    def __init__(self, epochs, learning_rate, weight_range=None):
        self.epochs = epochs
        self.learning_rate = learning_rate
        self.weight_range = weight_range if weight_range else [-1, 1]
        self.weights = []
        self._x_training_set = None
        self._y_training_set = None
        self.number_of_training_set = 0

    def setup(self):
        self.number_of_training_set = self.setup_training_set()
        self.initialize_weights(len(self._x_training_set[0]) + 1)

    def setup_training_set(self):
        """Download training set data from UCI ML Repository - Iris DataSet."""
        data = panda.read_csv(remote_location)
        self._x_training_set = list(data.iloc[0:, [0, 2]].values)
        self._y_training_set = [0 if i.lower() != 'iris-setosa' else 1
                                for i in data.iloc[0:, 4].values]
        return len(self._x_training_set)

    def initialize_weights(self, number_of_weights):
        random_weights = [random.uniform(self.weight_range[0], self.weight_range[1])
                          for i in range(number_of_weights)]
        self.weights.append(-1)  # setting up bias unit
        self.weights.extend(random_weights)

    def draw_initial_plot(self, _x_data, _y_data, _x_label, _y_label):
        plot.xlabel(_x_label)
        plot.ylabel(_y_label)
        plot.scatter(_x_data, _y_data)
        plot.show()

    def learn(self):
        self.setup()
        epoch_data = {}
        error = 0
        for epoch in range(self.epochs):
            for i in range(self.number_of_training_set):
                _x = self._x_training_set[i]
                _desired = self._y_training_set[i]
                _weight = self.weights
                guess = _weight[0]  # setting up the bias unit
                for j in range(len(_x)):
                    guess += _weight[j + 1] * _x[j]

                error = _desired - guess
                # i am going to reset all the weights
                if error != 0:
                    # resetting the bias unit
                    self.weights[0] = error * self.learning_rate
                    for j in range(len(_x)):
                        self.weights[j + 1] = self.weights[j + 1] + error * self.learning_rate * _x[j]
            # saving error at the end of the training set
            epoch_data[epoch] = error

        # print(epoch_data)
        self.draw_initial_plot(list(epoch_data.keys()), list(epoch_data.values()),
                               'Epochs', 'Error')


def runMyCode():
    learning_rate = 0.01
    epochs = 15
    random_generator_start = -1
    random_generator_end = 1
    perceptron = Perceptron(epochs, learning_rate,
                            [random_generator_start, random_generator_end])
    perceptron.learn()


runMyCode()

JoOlley
(@joolley)
New Member
Joined: 3 months ago
Posts: 1
28/09/2018 3:46 pm  

Hi intandem,

If you haven't enrolled in a neural networks class yet, it's probably worth doing so, so you can learn more about how they work. There are quite a few mistakes in your code. The most obvious ones that will be increasing the error each epoch are:

1. You need to update the weights and bias with the learning rate times the negative of the gradient (if you think of trying to find the minimum of a parabola, you want to decrease x when its gradient is positive and increase it when the gradient is negative).
2. You need to update the bias the way you update the weights, instead of overwriting it on every step.
3. It's not usually a good idea to use just the difference between the target and predicted values as your loss function, as positive and negative losses will cancel out. The squared error is often used when predicting real values, as you are doing.
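For illustration only, here is a minimal sketch of what the body of your epoch loop might look like with those fixes applied. It assumes a squared-error loss and reuses the variable names from your post; treat it as a sketch, not a drop-in replacement:

squared_error = 0
for i in range(self.number_of_training_set):
    _x = self._x_training_set[i]
    _desired = self._y_training_set[i]

    guess = self.weights[0]  # bias unit
    for j in range(len(_x)):
        guess += self.weights[j + 1] * _x[j]

    error = _desired - guess
    squared_error += error ** 2  # accumulate the loss, don't overwrite it

    # the gradient of 0.5 * error**2 w.r.t. the bias is -error, so stepping
    # against the gradient means adding learning_rate * error to the bias
    self.weights[0] += self.learning_rate * error
    for j in range(len(_x)):
        self.weights[j + 1] += self.learning_rate * error * _x[j]

epoch_data[epoch] = squared_error / self.number_of_training_set

Plotting the per-epoch mean squared error this way should show the loss going down rather than up.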

Jo


dan
(@dan)
New Member
Joined: 3 months ago
Posts: 3
26/10/2018 4:54 pm  

If I have understood Geoffrey Hinton correctly, one regret he had was coining the term "multi-layer perceptron", as it is a misnomer. The old perceptron updated its weights in an entirely different, simpler, and less useful way than today's neural networks do, including the ones consisting of layers of RBMs, which use back-propagation based on gradient descent.
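For contrast, here is a hedged sketch of that old-style perceptron rule: a hard-thresholded prediction followed by a fixed-size correction, with no gradient anywhere. The function name and arguments are illustrative, not from any library:

def perceptron_step(weights, bias, x, target, learning_rate):
    # classic Rosenblatt update: threshold the output, then nudge the
    # weights by a fixed-size correction in the direction of the error
    activation = bias + sum(w * xi for w, xi in zip(weights, x))
    prediction = 1 if activation > 0 else 0
    error = target - prediction  # always -1, 0, or +1
    new_weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
    new_bias = bias + learning_rate * error
    return new_weights, new_bias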

Andrew Ng has some excellent courses on Coursera.  There's a basic intro, and then there are five more courses going toward a specialization.  I highly recommend all six courses, but what you are trying to do would be covered extremely well in the first course.  The first course is based on MATLAB or Octave.  The five-course series is based on Python and Jupyter notebooks, which you might want to set up with Anaconda.

Geoffrey Hinton had another excellent, rigorous course on Coursera, but it is being discontinued as it is regarded as outdated.  I had hoped I could talk him and Coursera into keeping the course active, as it really is excellent, and I feel it provides another perspective that reinforces what might be learned in Andrew's courses.  Both professors are amazing.

There are other, easier courses available, though most are less rigorous.  Some really don't focus on rigor or on building the internal structure of things; they focus on out-of-the-box solutions like Keras, TensorFlow, and such.  And that may be perfectly OK for some people, or it may be a decent way to get running fast.  Fast.ai is one of those, and it is taught by someone who has won Kaggle competitions.  I believe their philosophy is that you don't have to go through the rigorous mathematics before getting started, and that it might be better to get a more useful and encouraging start by starting simple.  Many instructors on Udemy also take this approach.  But Fast.ai also offers their own libraries that they keep up to date with the latest best practices, implemented, tested, and made available to the public.

Gradient descent is not one of the easiest things to learn.  The calculus itself is pretty simple: derivatives, partials, the chain rule.  But then you're applying it to matrices or vectors and to various activation functions for back-propagation.  And your algorithm can take forever if you fail to vectorize your calculations and hand them to a GPU.
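To make the vectorization point concrete, here is a small illustrative sketch (it uses NumPy, which the original poster is avoiding for now; the data is random and only serves to show that the two forms agree):

import numpy as np

# Illustrative only: the same gradient step written as an explicit loop and
# as a single vectorized expression. X is (n_samples, n_features), y is (n_samples,).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = rng.normal(size=100)
w = np.zeros(2)
lr = 0.01

# loop version: accumulate the gradient one sample at a time
grad = np.zeros(2)
for xi, yi in zip(X, y):
    grad += (xi @ w - yi) * xi
w_loop = w - lr * grad / len(X)

# vectorized version: the whole batch in one matrix expression
w_vec = w - lr * (X.T @ (X @ w - y)) / len(X)

assert np.allclose(w_loop, w_vec)

On real workloads the vectorized form is what lets a GPU (or even just BLAS on a CPU) do the work in bulk.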

But once you work through Andrew's first course, you should be a great deal stronger in your understanding of how things work.

This may sound, pardon the expression, a little "anal", but it really pays to study and work through every quiz and project to the point of retaking each one until you can predictably score 100%.  This is not about being an unreasonable perfectionist or stroking your ego.  Each time a point is missed, some knowledge is missed.  It really pays to put in the extra effort and time.

Sorry for such a long answer, but I hope it helps.  For your question, it would help to look around on the web to get a really good grasp of the difference between how perceptrons and normal "fully-connected layers" work.  "Fully connected" is sort of a misnomer, too.  It's more like a restricted Boltzmann machine (RBM), since you're not connecting all possible pairs of nodes in the network; rather, you connect all the nodes of the input layer to a hidden layer, all the nodes of one hidden layer to the next, and all the nodes of the last hidden layer to the output layer.

Another thing to work on understanding is why there is a non-linear activation function between the layers; that is, why you need a sigmoid or a ReLU or a SELU (see the sketch below).  There is so much to learn, really: normalization, regularization, softmax, one-hot encoding.  Then, when you have become comfortable with that, there are convolutional nets and recurrent nets and LSTMs and Inception nets and what's that skip net thing... residual nets, and YOLOv2.
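A quick way to convince yourself why the non-linearity is needed: without it, two stacked linear layers collapse into a single linear map, so depth buys you nothing.  A tiny sketch with made-up matrices:

import numpy as np

# Two "layers" with no activation between them are one linear map:
# W2 @ (W1 @ x) == (W2 @ W1) @ x for every input x.
rng = np.random.default_rng(1)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
x = rng.normal(size=3)

assert np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x)

# With a non-linearity (here ReLU) in between, the composition is no longer
# expressible as a single matrix, so depth adds representational power.
relu = lambda v: np.maximum(v, 0)
nonlinear_output = W2 @ relu(W1 @ x)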

And that abundance is a good thing.  There are also GANs to learn, and reinforcement learning, and PCA.  And Geoffrey Hinton's folks are working on bringing capsule nets into the mainstream, too.  The sky is the limit.  And what that means is not that there is a massive, intractable burden meant to blow people's minds and make the field seem not worth pursuing.  What it means is that there is a whole lot of really cool stuff you can do that will empower you to do amazing things.  And you don't have to stop enjoying learning this stuff.  The more you learn, the more you can enjoy it.

 

Proving an old dog can easily learn new tricks!

