MAKE YOUR OWN NEURAL NETWORK

THURSDAY, 5 MARCH 2020

GRADIENT DESCENT UNSTABLE FOR GANS?

When training neural networks we use GRADIENT DESCENT to find a path down a loss function, to find the combination of learnable parameters that minimise the error. This is a very well researched area and today's techniques are very sophisticated, the Adam optimiser being a good example.
The dynamics of a GAN are different to a simple neural network. The generator and discriminator networks are trying to achieve opposing objectives. There are parallels between a GAN and adversarial games where one player is trying to maximise an objective while the other is trying to minimise it, each undoing the benefit of the opponent's previous move.
Is the gradient descent method of finding the correct, or even good enough, combination of learnable parameters suitable for such adversarial games? This might seem like an unnecessary question, but the answer is rather interesting.

SIMPLE ADVERSARIAL EXAMPLE

The following is a very simple objective function, F = X·Y. One player has control over the values of X and is trying to maximise the objective F. A second player has control over Y and is trying to minimise the objective F. Let's visualise this function to get a feel for it. The following picture shows a surface plot of F = X·Y from three slightly different angles.
We can see that the surface of F = X·Y is a SADDLE. That means, along one direction the values rise then fall, but in another direction, the values fall then rise. The following picture shows the same function from above, using colours to indicate the values of F. Also marked are the directions of increasing gradient. If we used our intuition to find a good solution to this adversarial game, we would probably say the best answer is the middle of that saddle at (X,Y) = (0,0). At this point, if one player sets X = 0, the second player can't affect the value of F no matter what value of Y is chosen. The same applies if Y = 0, no value of X can change the value of F. The actual value of F at this point is also the best compromise. Elsewhere there are as many higher values of F as there are lower, so it seems like a good compromise. You can explore the surface interactively yourself using the MATH3D.ORG website:
* https://www.math3d.org/wz85eIlP
* https://www.math3d.org/x6xNjkaR

Let's now move away from intuition and work out the answer by simulating both players using gradient descent, each trying to find a good solution for themselves. You'll remember from _Make Your Own Neural Network_ that parameters are adjusted by a small amount that depends on the gradient of the objective function.
The UPDATE RULES are x ← x + lr·(∂F/∂x) and y ← y − lr·(∂F/∂y). The reason we have different signs in these UPDATE RULES is that Y is trying to minimise F by moving down the gradient, but X is trying to maximise F by moving up the gradient. That lr is the usual learning rate.
Because we know F = X·Y we can write those update rules with the gradients worked out: x ← x + lr·y and y ← y − lr·x. We can write some code to pick starting values for X and Y, and then repeatedly apply these update rules to get successive X and Y values. The following shows how X and Y evolve as training progresses. We can see that the values of X and Y don't converge, but oscillate with ever greater amplitude. Trying different starting values leads to the same behaviour. Reducing the learning rate merely delays the inevitable DIVERGENCE. This is bad. It shows that gradient descent can't find a good solution to this simple adversarial game, and even worse, the method leads to disastrous divergence. The following picture shows X and Y plotted together. We can see the values orbit around the ideal point (0,0) but run away from it. It can be shown mathematically (see below) that the best case scenario is that (X,Y) orbits in a fixed circle around (0,0) without getting closer to it, but this is only when the update step is infinitesimally small. As soon as we have a finite step size, as we do when we approximate that continuous process in discrete steps, the orbit diverges.
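Here is a minimal sketch of that simulation; the starting point and learning rate are illustrative assumptions, and the full version is in the notebook linked below.

# minimal sketch of the adversarial gradient descent game on F = x*y
lr = 0.01           # learning rate (an assumed value for illustration)
x, y = 0.5, 0.5     # arbitrary starting point

history = []
for step in range(2000):
    grad_x = y      # dF/dx for F = x*y
    grad_y = x      # dF/dy for F = x*y
    x = x + lr * grad_x    # the x player ascends F
    y = y - lr * grad_y    # the y player descends F
    history.append((x, y))

# the distance of (x, y) from (0, 0) grows steadily - the orbit diverges
print(history[0], history[-1])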
You can explore the code which plays this adversarial game using gradient descent here:

* https://github.com/makeyourownneuralnetwork/gan/blob/master/Appendix_D_convergence.ipynb

GRADIENT DESCENT ISN'T IDEAL FOR ADVERSARIAL GAMES

We've shown that gradient descent fails to find a solution to an adversarial game with a very simple objective function. In fact, it doesn't just fail to find a solution, it catastrophically diverges. In contrast, gradient descent used in the normal way to minimise a function is guaranteed to find a minimum, even if it isn't the global minimum.
Does this mean GAN training will fail in general? No. Realistic GANs with meaningful data will have much more complex loss functions, and that can reduce the chances of runaway divergence. That's why GAN training throughout this book has worked fairly well. But this analysis does indicate why training GANs is hard, and can become chaotic. Orbiting around a good solution might also explain why some GANs seem to progress onto different modes of single-mode collapse with extended training rather than improving the quality of images themselves.
Fundamentally, gradient descent is the wrong approach for GANs, even if it works well enough in many cases. Finding optimisation techniques designed for adversarial dynamics like those in GANs is currently an open research question, with some researchers already publishing encouraging results.

WHY A CIRCULAR ORBIT?

Above we stated that (X,Y) orbits as a circle when two players each use gradient descent to optimise F = X·Y in opposite directions. Here we'll do the maths to show why it is a circle. Let's look at the update rules again. If we want to know how X and Y evolve over time T, we can write the updates as differential equations: dx/dt = +lr·y and dy/dt = −lr·x. If we take the second derivatives with respect to T, we get the following: d²x/dt² = −lr²·x, and similarly d²y/dt² = −lr²·y.
You may remember from school algebra that expressions of the form d²x/dt² = −a²·x have solutions of the form x = sin(at) or x = cos(at). To satisfy the first derivatives above, we need X and Y to be the combination x = sin(lr·t) and y = cos(lr·t). These describe (X,Y) moving around a unit circle with angular speed lr.
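The same derivation, restated compactly in LaTeX since the original equations were shown as images:

\begin{align}
\frac{dx}{dt} &= +\,lr\,\frac{\partial F}{\partial x} = +\,lr\,y, &
\frac{dy}{dt} &= -\,lr\,\frac{\partial F}{\partial y} = -\,lr\,x \\
\frac{d^2x}{dt^2} &= lr\,\frac{dy}{dt} = -\,lr^2\,x, &
\frac{d^2y}{dt^2} &= -\,lr\,\frac{dx}{dt} = -\,lr^2\,y \\
x(t) &= \sin(lr\,t), & y(t) &= \cos(lr\,t)
\end{align}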
Posted by MYO NeuralNet at 14:57
MONDAY, 17 FEBRUARY 2020

CALCULATING THE OUTPUT SIZE OF CONVOLUTIONS AND TRANSPOSE CONVOLUTIONS
CONVOLUTION is common in neural networks which work with images, either as classifiers or as generators. When designing such convolutional neural networks, the shape of data emerging from each convolution layer needs to be worked out.
Here we'll see how this can be done step-by-step with configurations of convolution that we're likely to see working with images. In particular, TRANSPOSED CONVOLUTIONS are thought of as difficult to grasp. Here we'll show that they're not difficult at all by working through some examples which all follow a very simple recipe.

EXAMPLE 1: CONVOLUTION WITH STRIDE 1, NO PADDING

In this first simple example we apply a 2 BY 2 kernel to an input of size 6 BY 6, with stride 1. The picture shows how the kernel moves along the image in steps of size 1. The areas covered by the kernel do overlap but this is not a problem. Across the top of the image, the kernel can take 5 positions, which is why the output is 5 wide. Down the image, the kernel can also take 5 positions, which is why the output is a 5 BY 5 square. Easy! The PyTorch function for this convolution is: nn.Conv2d(in_channels, out_channels, kernel_size=2, stride=1)

EXAMPLE 2: CONVOLUTION WITH STRIDE 2, NO PADDING

This second example is the same as the previous one, but we now have a stride of 2.
We can see the kernel moves along the image in steps of size 2. This time the areas covered by the kernel don't overlap. In fact, because the kernel size is the same as the stride, the image is covered without overlaps or gaps. The kernel can take 3 positions across and down the image, so the output is 3 BY 3. The PyTorch function for this convolution is: nn.Conv2d(in_channels, out_channels, kernel_size=2, stride=2)

EXAMPLE 3: CONVOLUTION WITH STRIDE 2, WITH PADDING

This third example is the same as the previous one, but this time we use a padding of 1. By setting padding to 1, we extend all the image edges by 1 pixel, with values set to 0. That means the image width has grown by 2. We apply the kernel to this extended image. The picture shows the kernel can take 4 positions across the image. This is why the output is 4 BY 4.

The PyTorch function for this convolution is: nn.Conv2d(in_channels, out_channels, kernel_size=2, stride=2, padding=1)
EXAMPLE 4: CONVOLUTION WITH COVERAGE GAPS

This example illustrates the case where the chosen kernel size and stride mean it doesn't reach the end of the image. Here, the 2 BY 2 kernel moves with a step size of 2 over the 5 BY 5 image. The last column of the image is not covered by the kernel. The easiest thing to do is to just ignore the uncovered column, and this is in fact the approach taken by many implementations, including PyTorch. That's why the output is 2 BY 2. For medium to large images, the loss of information from the very edge of the image is rarely a problem as the meaningful content is usually in the middle of the image. Even if it wasn't, the fraction of information lost is very small. If we really wanted to avoid any information being lost, we'd adjust some of the options. We could add a padding to ensure no part of the input image was missed, or we could adjust the kernel and stride sizes so they match the image size.

EXAMPLE 5: TRANSPOSE CONVOLUTION WITH STRIDE 2, NO PADDING

The transpose convolution is commonly used to expand a tensor to a larger tensor. This is the opposite of a normal convolution which is used to reduce a tensor to a smaller tensor. In this example we use a 2 BY 2 kernel again, set to stride 2, applied to a 3 BY 3 input. The process for transposed convolution has a few extra steps but is not complicated.
First we create an intermediate grid which has the original input's cells spaced apart with a step size set to the stride. In the picture above, we can see the pink cells spaced apart with a step size of 2. The new in-between cells have value 0. Next we extend the edges of the intermediate image with additional cells with value 0. We add the maximum amount of these so that a kernel in the top left covers one of the original cells. This is shown in the picture at the top left of the intermediate grid. If we added another ring of cells, the kernel would no longer cover the original pink cell.
Finally, the kernel is moved across this intermediate grid in step sizes of 1. This step size is always 1. The stride option is used to set how far apart the original cells are in the intermediate grid. Unlike normal convolution, here the stride is not used to decide how the kernel moves.
The kernel moving across this 7 BY 7 intermediate grid gives us an output of 6 BY 6.

Notice how this transformation of a 3 BY 3 input to a 6 BY 6 output is the opposite of EXAMPLE 2, which transformed an input of size 6 BY 6 to an output of size 3 BY 3, using the same kernel size and stride options.
The PyTorch function for this transpose convolution is: nn.ConvTranspose2d(in_channels, out_channels, kernel_size=2, stride=2)

EXAMPLE 6: TRANSPOSE CONVOLUTION WITH STRIDE 1, NO PADDING

In the previous example we used a stride of 2 because it is easier to see how it is used in the process. In this example we use a stride of 1. The process is exactly the same. Because the stride is 1, the original cells are spaced apart without a gap in the intermediate grid. We then grow the intermediate grid with the maximum number of additional outer rings so that a kernel in the top left can still cover one of the original cells. We then move the kernel with step size 1 over this intermediate 7 BY 7 grid to give an output of size 6 BY 6. You'll notice this is the opposite transformation to EXAMPLE 1. The PyTorch function for this transpose convolution is: nn.ConvTranspose2d(in_channels, out_channels, kernel_size=2, stride=1)

EXAMPLE 7: TRANSPOSE CONVOLUTION WITH STRIDE 2, WITH PADDING

In this transpose convolution example we introduce padding. Unlike the normal convolution where padding is used to expand the image, here it is used to reduce it. We have a 2 BY 2 kernel with stride set to 2, and an input of size 3 BY 3, and we have set padding to 1. We create the intermediate grid just as we did in EXAMPLE 5. The original cells are spaced 2 apart, and the grid is expanded so that the kernel can cover one of the original values. The padding is set to 1, so we remove 1 ring from around the grid. This leaves the grid at size 5 BY 5. Applying the kernel to this grid gives us an output of size 4 BY 4. The PyTorch function for this transpose convolution is: nn.ConvTranspose2d(in_channels, out_channels, kernel_size=2, stride=2, padding=1)
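To sanity-check these shapes, here is a small sketch using PyTorch directly; the single input and output channels are arbitrary choices for illustration:

import torch
import torch.nn as nn

# Example 2: 6x6 input, 2x2 kernel, stride 2 -> expect a 3x3 output
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=2, stride=2)
x = torch.zeros(1, 1, 6, 6)             # (batch, channels, height, width)
print(conv(x).shape)                     # torch.Size([1, 1, 3, 3])

# Example 5: 3x3 input, 2x2 kernel, stride 2 -> expect a 6x6 output
tconv = nn.ConvTranspose2d(in_channels=1, out_channels=1, kernel_size=2, stride=2)
y = torch.zeros(1, 1, 3, 3)
print(tconv(y).shape)                    # torch.Size([1, 1, 6, 6])

# Example 7: 3x3 input, 2x2 kernel, stride 2, padding 1 -> expect a 4x4 output
tconv_p = nn.ConvTranspose2d(in_channels=1, out_channels=1, kernel_size=2, stride=2, padding=1)
print(tconv_p(y).shape)                  # torch.Size([1, 1, 4, 4])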
CALCULATING OUTPUT SIZES

Assuming we're working with square shaped input, with equal width and height, the formula for calculating the output size for a convolution is:

output size = floor( (input size + 2 × padding − (kernel size − 1) − 1) / stride + 1 )

The L-shaped brackets take the mathematical floor of the value inside them. That means the largest integer below or equal to the given value. For example, the floor of 2.3 is 2. If we use this formula for EXAMPLE 3, we have INPUT SIZE = 6, PADDING = 1, KERNEL SIZE = 2, STRIDE = 2. The calculation inside the floor brackets is (6 + 2 - 1 - 1) / 2 + 1, which is 4. The floor of 4 remains 4, which is the size of the output.
Again, assuming square shaped tensors, the formula for a transposed convolution is:

output size = (input size − 1) × stride − 2 × padding + (kernel size − 1) + 1

Let's try this with EXAMPLE 7, where the INPUT SIZE = 3, STRIDE = 2, PADDING = 1, KERNEL SIZE = 2. The calculation is then simply 2*2 - 2 + 1 + 1 = 4, so the output is of size 4. On the PyTorch reference pages you can read about more general formulae, which can work with rectangular tensors and also additional configuration options we've not needed here.

* nn.Conv2d https://pytorch.org/docs/stable/nn.html#conv2d
* nn.ConvTranspose2d https://pytorch.org/docs/stable/nn.html#convtranspose2d
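The two formulas can also be written as small Python helpers and checked against the examples above; the function names are just for illustration:

import math

def conv_output_size(input_size, kernel_size, stride, padding=0):
    # output size for a square convolution
    return math.floor((input_size + 2 * padding - (kernel_size - 1) - 1) / stride + 1)

def transpose_conv_output_size(input_size, kernel_size, stride, padding=0):
    # output size for a square transposed convolution
    return (input_size - 1) * stride - 2 * padding + (kernel_size - 1) + 1

print(conv_output_size(6, 2, 1))                         # Example 1 -> 5
print(conv_output_size(6, 2, 2))                         # Example 2 -> 3
print(conv_output_size(6, 2, 2, padding=1))              # Example 3 -> 4
print(transpose_conv_output_size(3, 2, 2))               # Example 5 -> 6
print(transpose_conv_output_size(3, 2, 2, padding=1))    # Example 7 -> 4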
MORE READING

* Convolutional neural networks: https://en.wikipedia.org/wiki/Convolutional_neural_network
* Convolutions in image classification and generation: http://makeyourownalgorithmicart.blogspot.com/2019/06/generative-adversarial-networks-part-iv.html

Posted by MYO NeuralNet at 09:28
WEDNESDAY, 12 SEPTEMBER 2018

APPLICATION OF NEURAL NETWORKS - SATELLITE MEASUREMENT OF WATER WAVES

It's always great to see interesting uses of machine learning methods - and especially satisfying to see someone inspired by my book to apply the methods.
I was privileged to have an initial discussion with Dennis when he was planning on applying neural networks to the task of classifying water waveforms measured by radar from a satellite orbiting the Earth. He went on to succeed and presented his work at a well respected conference.

You can see his presentation slides here:

ALTIMETRY
Satellite radar is used to measure the altitude (height) of surface features - which can be both land and water. The signal needs to be interpreted so that:

* we can establish if the surface is land or water
* and if water, calculate the height of the water waves from the non-trivial signal pattern

LAND OR WATER?
A neural network was trained to determine whether the signal was from land or water.

As you can see from the slide above, the signal signature is very different.

A neural network was very successful in detecting water. Detecting land was a little more challenging but this initial work showed great promise.
WATER WAVE HEIGHT
The next step is to calculate the height of the water waves. In-situ measurements were used as reference data to train a different neural network.

Part of the challenge for a neural network is that there are several peaks that can be detected during a measurement, and we want the highest peak of a wave. Tracking a peak as it moves allows us to have a higher level of confidence in labelling it a water wave peak.

RESULTS

The results are promising with some areas identified for further work.

The following shows how good the calculated water wave heights are based on automatic analysis by neural networks. The first area for improvement is detecting land where the accuracy rate is lower than it is for water. The second area for further work is to resolve the "delay" visible in the calculated heights. This is not a major issue in this application as the height and shape are more important than the horizontal displacement / phase. The following shows more challenging wave forms. A good next challenge is to automate the detection of the correct peak, and neural network architectures that take into account a sequence of data - such as RECURRENT NEURAL NETWORKS - can help in these scenarios.
Posted by MYO NeuralNet at 04:51
TUESDAY, 22 MAY 2018

IMAGEIO.IMREAD() REPLACES SCIPY.MISC.IMREAD()

Some of the code we wrote reads data from image files using a helper function scipy.misc.imread().
However, recently, users were notified that this function is deprecated:
We're encouraged to use the imageio.imread() function instead.
FROM IMREAD() TO IMREAD()

The change is very easy. We first change the import statements which include the helper library. From this:

import scipy.misc

To this:

import imageio

We then change the actual function which reads image data from files. From this form:

img_array = scipy.misc.imread(image_file_name, flatten=True)

To this form:

img_array = imageio.imread(image_file_name, as_gray=True)

Easy!

We can see the new function is used in a very similar way. We still provide the name of the image file we want to read into an array of data.
Previously we used flatten=True to convert the image pixels into a greyscale value, instead of having separate numbers for the red, green, blue and maybe alpha channels. We now use as_gray=True which does the same thing. I thought we might have to mess about with inverting number ranges from 0-255 to 255-0 but it seems we don't need to.
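As a minimal sketch of the updated image-loading step, roughly following the pattern of the book's MNIST code; the file name and the 784-value reshape are illustrative assumptions:

import imageio
import numpy

# read the image and reduce it to a single greyscale channel
img_array = imageio.imread('my_own_image.png', as_gray=True)

# flatten a 28x28 image to 784 values, invert, and rescale to the 0.01 to 1.00 range
img_data = 255.0 - img_array.reshape(784)
img_data = (img_data / 255.0 * 0.99) + 0.01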
GITHUB CODE UPDATED

The notebooks which use imread() have been updated on the main github repository.
This does mean the code is slightly different to that described in the book, but the change should be easy to understand until a new version of the book is released.

Posted by MYO NeuralNet at 08:10
WEDNESDAY, 16 MAY 2018

ONLINE INTERACTIVE COURSE BY EDUCATIVE.IO

I've been really impressed with educative.io who took the content for Make Your Own Neural Network and developed a beautifully designed interactive online course. The course breaks the content down into digestible bite-size chunks, and the interactivity is really helpful to the process of learning through hands-on experimentation and play. Have a go!
Posted by MYO NeuralNet at 07:43
WEDNESDAY, 7 FEBRUARY 2018

SAVING AND LOADING NEURAL NETWORKS

A very common question I get is how to SAVE a neural network, and LOAD it again later.

WHY SAVE AND LOAD?
There are two key scenarios when being able to save and load a neural network is useful.
* During a long training period it is sometimes useful to STOP AND CONTINUE at a later time. This might be because you're using a laptop which can't remain on all the time. It could be because you want to stop the training and test how well the neural network performs. Being able to resume training at a different time is really helpful.

* It is useful to SHARE your trained neural network with others. Being able to save it, and for someone else to load it, is necessary for this to work.
WHAT DO WE SAVE?
In a neural network the things that are doing the learning are the link weights. In our Python code, these are represented by matrices like wih and who. The wih matrix contains the weights for the links between the input and hidden layer, and the who matrix contains the weights for the links between the hidden and output layer. If we save these matrices to a file, we can load them again later. That way we don't need to restart the training from the beginning.

SAVING NUMPY ARRAYS
The matrices wih and who are numpy arrays. Luckily the numpy library provides convenience functions for saving and loading them. The function to save a numpy array is numpy.save(filename, array). This will store array in filename. If we wanted to add a method to our neuralNetwork class, we could do it simply like this:

# save neural network weights
def save(self):
    numpy.save('saved_wih.npy', self.wih)
    numpy.save('saved_who.npy', self.who)
    pass
This will save the wih matrix as a file saved_wih.npy, and the who matrix as a file saved_who.npy. If we want to stop the training we can issue n.save() in a notebook cell. We can then close down the notebook or even shut down the computer if we need to.

LOADING NUMPY ARRAYS

To load a numpy array we use array = numpy.load(filename). If we want to add a method to our neuralNetwork class, we should use the filenames we used to save the data.

# load neural network weights
def load(self):
    self.wih = numpy.load('saved_wih.npy')
    self.who = numpy.load('saved_who.npy')
    pass
If we come back to our training, we need to run the notebook up to the point just before training. That means running the Python code that sets up the neural network class, and sets the various parameters like the number of input nodes, the data source filenames, etc. We can then issue n.load() in a notebook cell to load the previously saved neural network weights back into the neural network object n.

GOTCHAS
We've kept the approach simple here, in line with our approach to learning about and coding simple neural networks. That means there are some things our very simple network saving and loading code doesn't do.
Our simple code only saves and loads the two wih and who weight matrices. It doesn't do anything else. It doesn't check that the loaded data matches the desired size of neural network. We need to make sure that if we load a saved neural network, we continue to use it with the same parameters. For example, we can't train a network, pause, and continue with different settings for the number of nodes in each layer. A rough sketch of such a check is shown below.
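As a rough sketch of what such a check could look like, assuming the class stores its layer sizes as self.inodes, self.hnodes and self.onodes as in the book's code:

# load neural network weights, checking they match the current network's shape
def load(self):
    wih = numpy.load('saved_wih.npy')
    who = numpy.load('saved_who.npy')

    # wih should be (hidden x input) and who should be (output x hidden)
    if wih.shape != (self.hnodes, self.inodes) or who.shape != (self.onodes, self.hnodes):
        raise ValueError("saved weights don't match this network's layer sizes")

    self.wih = wih
    self.who = who
    pass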
If we want to share our neural network, the people we share it with need to also be running the same Python code. The data we're passing them isn't rich enough to be independent of any particular neural network code. Efforts to develop such an open inter-operable data standard have started, for example the Open Neural Network Exchange Format.

HDF5 FOR VERY LARGE DATA

In some cases, with very large networks, the amount of data to be saved and loaded can be quite big. In my own experience from around 2016, the normal saving of numpy arrays in this way didn't always work. I then fell back to a slightly more involved method to save and load data using the very mature HDF5 data format, popular in science and engineering.
The Anaconda Python distribution allows you to install the h5py package, which gives Python the ability to work with HDF5 data.
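As a minimal sketch of saving and loading the two weight matrices with h5py; the file name, dataset names and matrix sizes are just illustrative assumptions:

import h5py
import numpy

wih = numpy.random.rand(200, 784)    # example weight matrices (illustrative sizes)
who = numpy.random.rand(10, 200)

# save both weight matrices into one HDF5 file
with h5py.File('saved_network.h5', 'w') as f:
    f.create_dataset('wih', data=wih)
    f.create_dataset('who', data=who)

# load them back later
with h5py.File('saved_network.h5', 'r') as f:
    wih_loaded = f['wih'][:]
    who_loaded = f['who'][:]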
HDF5 data stores do more than simple data saving and loading. They have the idea of a group or folder which can contain several data sets, such as numpy arrays. The data stores also keep account of data set names, and don't just blindly save data. For very large data sets, the data can be traversed and segmented on-disk without having to load it all into memory before subsets are taken. You can explore more here: http://docs.h5py.org/en/latest/quick.html#quick

Posted by MYO NeuralNet at 13:31
TUESDAY, 23 MAY 2017

LEARNING MNIST WITH GPU ACCELERATION - A STEP BY STEP PYTORCH TUTORIAL
I'm often asked why I don't talk about neural network frameworks like Tensorflow, Caffe, or Theano.
REASONS FOR NOT USING FRAMEWORKS

I avoided these frameworks because the main thing I wanted to do was to learn how neural networks actually work. That includes learning about the core concepts and the maths too. By creating our own neural networks code, from scratch, we can really start to understand them, and the issues that emerge when trying to apply them to real problems. We don't get that learning and experience if we only learned how to use someone else's library.

REASONS FOR USING FRAMEWORKS - GPU ACCELERATION

But there are some good reasons for using such frameworks, after you've learned about how neural networks actually work. One reason is that you want to take advantage of the special hardware in some computers, called a GPU, to accelerate the core calculations done by a neural network. The GPU - graphics processing unit - was traditionally used to accelerate calculations to support rich and intricate graphics, but recently that same special hardware has been used to accelerate machine learning. The normal brain of a computer, the CPU, is good at doing all kinds of tasks. But if your tasks are matrix multiplications, and lots of them in parallel, for example, then a GPU can do that kind of work much faster. That's because they have lots and lots of computing cores, and very fast access to locally stored data. Nvidia has a page explaining the advantage, with a fun video too - link. But remember, GPUs are not good for general purpose work, they're just really fast at a few specific kinds of jobs. The following illustrates a key difference between general purpose CPUs and GPUs with many, more task-specific, compute cores: GPUs have hundreds of cores, compared to a CPU's 2, 4 or maybe 8. Writing code to directly take advantage of GPUs is not fun, currently. In fact, it is extremely complex and painful. And very very unlike the joy of easy coding with Python. This is where the neural network frameworks can help - they allow you to imagine a much simpler world - and write code in that world, which is then translated into the complex, detailed, and low-level nuts-n-bolts code that the GPUs need. There are quite a few neural network frameworks out there .. but comparing them can be confusing. There are a few good comparisons and discussions on the web like this one - link.
PYTORCH
I'm going to use PYTORCH for three main reasons:
* It's largely vendor independent. Tensorflow has a lot of momentum and interest, but is very much a Google product.
* It's designed to be Python - not an ugly and ill-fitting Python wrap around something that really isn't Python. Debugging is also massively easier if what you're debugging is Python itself.
* It's simple and light - preferring simplicity in design, working naturally with things like the ubiquitous numpy arrays, and avoiding hiding too much stuff as magic, something I really don't like.

Some more discussion of PyTorch can be found here - link.
WORKING WITH PYTORCH

To use PyTorch, we have to understand how it wants to be worked with. This will be a little different to the normal Python and numpy world we're used to.
The main ideas are:
* build up your network architecture using the building blocks provided by PyTorch - these are things like layers of nodes and activation functions.
* you let PyTorch automatically work out how to back propagate the error - it can do this for any of the building blocks it provides, which is really convenient.
* we train the network in the normal way, and measure accuracy as usual, but PyTorch provides functions for doing this.
* to make use of the GPU, we configure a setting and push the neural network weight matrices to the GPU, and work on them there.

We shouldn't try to replicate what we did with our pure Python (and numpy) neural network code - we should work with PyTorch in the way it was designed to be used. A key part of this is auto differentiation. Let's look at that next.

AUTO DIFFERENTIATION

A powerful and central part of PyTorch is the ability to create neural networks, chaining together different elements - like activation functions, convolutions, and error functions - and for PyTorch to work out the error gradients for the various parameters we want to improve.
That's quite cool if it works! Let's see it working. Imagine a simple parameter y which depends on another input variable x. Imagine that y = x² + 5x + 2.
Let's encode this in PyTorch:

import torch
from torch.autograd import Variable

x = Variable(torch.Tensor([2.0]), requires_grad=True)
y = (x**2) + (5*x) + 2

Let's look at that more slowly. First we import torch, and also the VARIABLE from torch.autograd, the auto differentiation library. Variable is important because we need to wrap normal Python variables with it, so that PyTorch can do the differentiation. It can't do it with normal Python variables like a = 10, or b = 5*a. VARIABLES include links to where the variables came from - so that if one depends on another, PyTorch can do the correct differentiation. We then create x as a VARIABLE. You can see that it is a simple tensor of trivial size, just a single number, 2.0. We also signal that it requires a gradient to be calculated. A TENSOR? Think of it as just a fancy name for multi-dimensional matrices. A 2-dimensional tensor is a matrix that we're all familiar with, like numpy arrays. A 1-dimensional tensor is like a list. A 0-dimensional one is just a single number. When we create a torch.Tensor([2.0]) we're just creating a single number. We then create the next VARIABLE called y. That looks like a normal Python variable by the way we've created it .. but it isn't, because it is made from x, which is a PyTorch VARIABLE. Remember, the magic that VARIABLE brings is that when we define y in terms of x, the definition of y remembers this, so we can do proper differentiation on it with respect to x. So let's do the differentiation!

y.backward()
That's it. That's all that is required to ask PyTorch to use what it knows about y and all the VARIABLEs it depends on to work out how to differentiate it.

Let's see if it did it correctly. Remember that x = 2, so we're asking for dy/dx at x = 2, which is 2x + 5 = 9. This is how we ask for that to be done:

x.grad

Let's see how all that works out: It works! You can also see how y is shown as type VARIABLE, not just x.
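Putting those fragments together, a minimal runnable version; this assumes an older PyTorch release where Variable was still needed, as in the original post - in current PyTorch you would set requires_grad on the tensor directly:

import torch
from torch.autograd import Variable

# x is a single number, 2.0, and we ask PyTorch to track gradients for it
x = Variable(torch.Tensor([2.0]), requires_grad=True)

# y is defined in terms of x, so PyTorch remembers how to differentiate it
y = (x**2) + (5*x) + 2

# work out dy/dx automatically
y.backward()

# dy/dx = 2x + 5, which at x = 2 is 9
print(x.grad)    # tensor([9.])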
So that's cool. And that's how we define our neural network, using elements that PyTorch provides us, so it can automatically work out error gradients.
LET'S DESCRIBE OUR SIMPLE NEURAL NETWORK

Let's look at some super-simple skeleton code which is a common starting point for many, if not all, PyTorch neural networks.

import torch
import torch.nn as nn

class NeuralNetwork(torch.nn.Module):

    def __init__(self):
        ....
        pass

    def forward(self, inputs):
        ....
        return outputs

net = NeuralNetwork()

INHERITANCE
The neural network class is derived from torch.nn.Module which brings with it the machinery of a neural network including the training and querying functions - see here for the documentation.

There is a tiny bit of boilerplate code we have to add to our initialisation function __init__() .. and that's calling the initialisation of the class it was derived from. That should be the __init__() belonging to torch.nn.Module. The clean way to do this is to use super():

    def __init__(self):
        # call the base class's initialisation too
        super().__init__()
        pass
We're not finished yet. When we create an object from the NeuralNetwork class, we need to tell it at that time what shape it will be. We're sticking with a simple 3-layer design .. so we need to specify how many nodes there are at the input, hidden and output layers. Just like our pure Python example, we pass this information to the __init__() function. We might as well create these layers during the initialisation. Our __init__() now looks like this:

    def __init__(self, inodes, hnodes, onodes):
        # call the base class's initialisation too
        super().__init__()

        # define the layers and their sizes, turn off bias
        self.linear_ih = nn.Linear(inodes, hnodes, bias=False)
        self.linear_ho = nn.Linear(hnodes, onodes, bias=False)

        # define activation function
        self.activation = nn.Sigmoid()
        pass
The nn.Linear() module is the thing that creates the relationship between one layer and another and combines the network signals in a linear way .. which is what we did in our pure Python code. Because this is PyTorch, that nn.Linear() creates a parameter that can be adjusted .. the link weights that we're familiar with. You can read more about nn.Linear() here.
We also create the activation function we want to use, in this case the logistic sigmoid function. Note, we're using the one provided by torch.nn, not making our own. Note that we're not using these PyTorch elements yet, we're just defining them because we have the information about the number of input, hidden and output nodes.

FORWARD

We have to over-ride the forward() function in our neural network class. Remember that backward() is provided automatically, but can only work if PyTorch knows how we've designed our neural network - how many layers, what those layers are doing with activation functions, what the error function is, etc. So let's create a simple forward() function which is the description of the network architecture. Our example will be really simple, just like the one we created with pure Python to learn the MNIST dataset.

    def forward(self, inputs_list):
        # convert list to Variable
        inputs = Variable(inputs_list)

        # combine input layer signals into hidden layer
        hidden_inputs = self.linear_ih(inputs)
        # apply sigmoid activation function
        hidden_outputs = self.activation(hidden_inputs)

        # combine hidden layer signals into output layer
        final_inputs = self.linear_ho(hidden_outputs)
        # apply sigmoid activation function
        final_outputs = self.activation(final_inputs)

        return final_outputs

You can see the first thing we do is convert the list of numbers, a Python list, into a PyTorch VARIABLE. We must do this, otherwise PyTorch won't be able to calculate the error gradient later. The next section is very familiar, the combination of signals at each node, in each layer, followed immediately by the activation function. Here we're using the nn.Linear() elements we defined above, and the activation function we defined earlier, using the torch.nn.Sigmoid() provided by PyTorch.

ERROR FUNCTION
Now that we've defined the network, we need to define the error function. This is an important bit of information because it defines how we judge the correctness of the neural network, and wrong-ness is used to update the internal parameters during training. There are many error functions that people use, some better for some kinds of problems than others. We'll use the really simple one we developed for the pure Python network, the squared error function. It looks like the following:

error_function = torch.nn.MSELoss(size_average=False)

We've set the size_average parameter to False to avoid the error function dividing by the size of the target and desired vectors.

OPTIMISER
OPTIMISER

We're almost there. We've just defined the error function, which means we know how far wrong the neural network is during training. We also know that PyTorch can calculate the error gradients for each parameter.
When we created our simple neural network, we didn't think too much about different ways of improving the parameters based on the error function and error gradients. We simply descended down the gradients a small step at a time, and that is simple and powerful. Actually there are many refined and sophisticated approaches to this step. Some are designed to avoid getting stuck in false minima, others to converge as quickly as possible, and so on. We'll stick to the simple approach we took, and the closest in the PyTorch toolset is stochastic gradient descent:

optimiser = torch.optim.SGD(net.parameters(), lr=0.1)

We feed this optimiser the adjustable parameters of our neural network, and we also specify the familiar learning rate as lr.
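To connect this back to the pure Python update rule, here is a rough sketch of what optimiser.step() amounts to for plain SGD with lr=0.1 - an illustration only, not PyTorch's actual implementation:

# after loss.backward() has filled in the gradients, plain SGD
# nudges each parameter a small step down its gradient
for param in net.parameters():
    param.data -= 0.1 * param.grad.data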
FINALLY, DOING THE UPDATE

Finally, we can talk about doing the update - that is, updating the neural network parameters in response to the error seen with each training example. Here's how we do that for each training example:

* calculate the output for a training data example
* use the error function to calculate the difference (the loss, as people call it)
* zero the gradients of the optimiser, which might be hanging around from a previous iteration
* perform automatic differentiation to calculate new gradients
* use the optimiser to update the parameters based on these new gradients
In code this will look like:

for inputs, target in training_set:
    output = net(inputs)

    # compute and print loss
    loss = error_function(output, target)
    print(loss.data)

    # zero gradients, perform a backward pass, and update the weights
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()

It is a common error not to zero the gradients during each iteration, so keep an eye out for that. I'm not really sure why the default is not to clear them ...
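To see why the zeroing matters, here is a tiny self-contained illustration (not from the original post) of gradients accumulating across backward passes when they aren't cleared:

import torch
from torch.autograd import Variable

w = Variable(torch.FloatTensor([2.0]), requires_grad=True)

# first backward pass: the gradient of 3*w with respect to w is 3
(3 * w).sum().backward()
print(w.grad.data)    # 3

# a second backward pass without zeroing adds to the old gradient: 6
(3 * w).sum().backward()
print(w.grad.data)    # 6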
THE FINAL CODE

Now that we have all the elements developed and understood, we can rewrite the pure Python neural network we developed in the course of Make Your Own Neural Network and throughout this blog. You can find the code as a notebook on GitHub:

* https://github.com/ ... /pytorch_neural_network_mnist_data.ipynb

The only unusual thing I had to work out was that during the evaluation of performance, we keep a scorecard list, and append a 1 to it if the network's answer matches the known correct answer from the test data set. This comparison needs the actual number to be extracted from the PyTorch tensor via numpy, as follows. We couldn't just say label == correct_label.

if (label.data == correct_label):

The results seem to match our pure Python code for performance - no major difference, and we expected that because we've tried to architect the network to be the same.

PERFORMANCE COMPARISON ON A LAPTOP

Let's compare performance between our simple pure Python (with numpy) code and the PyTorch version. As a reminder, here are the details of the architecture and data:

* MNIST training data with 60,000 examples of 28x28 images
* neural network with 3 layers: 784 nodes in the input layer, 200 in the hidden layer, 10 in the output layer
* learning rate of 0.1
* stochastic gradient descent with mean squared error
* 5 training epochs (that is, repeat the training data 5 times)
* no batching of training data

The timing was done with the following Python notebook magic command in the cell that contains only the code to train the network. The options ensure only one run of the code, and the -c option ensures unix user time is used to account for other tasks taking CPU time on the same machine.
%%timeit -n1 -r1 -c
The results from doing this twice on a MacBook Pro 13 (early 2015), which has no GPU for accelerating the tensor calculations, are:

* home-made simple pure Python - 440 seconds, 458 seconds
* simple PyTorch version - 841 seconds, 834 seconds

Amazing! Our own home-made code is about 1.9 times faster .. roughly twice as fast!
GPU ACCELERATED PERFORMANCE

One of the key reasons we chose to invest time learning a framework like PyTorch is that it makes it easy to take advantage of GPU acceleration. So let's try it. I don't have a laptop with a CUDA GPU so I fired up a Google Cloud Compute Instance. The specs for mine are:
* n1-highmem-2 (2 vCPUs, 13 GB memory)
* Intel Sandy Bridge
* 1 x NVIDIA Tesla K80 GPU

So we can compare GPU results with CPU results, I ran the above code, but this time not as a notebook but as a command line script, using the unix time command. This will give us the time to complete the whole program, including the training and testing stages. The results are:

real   8m14.387s
user   7m31.223s
sys    8m39.810s
The interpretation of these numbers needs some sophistication, especially if our code has multiple threads, so we'll just stick to the simple real wall-clock time of 8m14s, or 494 seconds. Now we need to change the code to run on the GPU. First check that CUDA - NVIDIA's GPU acceleration framework - is available to Python and PyTorch:

python
Python 3.6.0 |Anaconda custom (64-bit)| (default, Dec 23 2016, 12:22:00)
on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
So CUDA is available. This gave a False on my own home laptop. The overall approach to shifting work from the CPU to the GPU is to shift the tensors there. Here is the current (but immature) PyTorch guidance on working with the GPU. To create a Tensor on a GPU we use torch.cuda:

>>> x = torch.cuda.FloatTensor([1, 2])
>>> x

 1
 2
[torch.cuda.FloatTensor of size 2 (GPU 0)]

You can see that this new tensor x is created on the GPU; it is shown as GPU 0, as there can be more than one. If we perform a calculation on x, it is actually carried out on the same GPU 0, and if the results are assigned to a new variable, they are also stored on the same GPU.

>>> y = x**x
>>> y

 1
 4
[torch.cuda.FloatTensor of size 2 (GPU 0)]
This may not seem like much but it is incredibly powerful - yet easy to use, as you've just seen. The changes to the code are minimal (the sketch below pulls them together):

* we move the neural network class to the GPU once we've created it, using net.cuda()
* the inputs are converted from a list to a PyTorch Tensor, and we now use the CUDA variant: inputs = Variable(torch.cuda.FloatTensor(inputs_list).view(1, self.inodes))
* similarly the target outputs are also converted using this variant: target_variable = Variable(torch.cuda.FloatTensor(targets_list).view(1, self.onodes), requires_grad=False)

That's it! Not too difficult at all .. actually that took a day to work out because the PyTorch documentation isn't yet that accessible to beginners.
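Pulled together, the GPU-enabled preparation looks roughly like this - a sketch assuming the NeuralNetwork class and layer sizes from earlier in this post, with illustrative stand-in data rather than a verbatim extract from the notebook:

import torch
from torch.autograd import Variable

# build the network and move its parameters (the link weights) to the GPU
net = NeuralNetwork(784, 200, 10)
net.cuda()

# illustrative stand-ins for one training record's scaled pixels and target
inputs_list = [0.01] * 784
targets_list = [0.01] * 10

# create the tensors directly on the GPU using the cuda variant of FloatTensor
inputs = Variable(torch.cuda.FloatTensor(inputs_list).view(1, 784))
target_variable = Variable(torch.cuda.FloatTensor(targets_list).view(1, 10), requires_grad=False)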
The results from the GPU enabled version of the code are:

real   6m6.328s
user   5m57.443s
sys    0m13.488s
That is faster at 366 seconds - about 25% faster. We're seeing some encouraging results. Let's do more runs, just to be scientific, and collate the results:

CPU        GPU
494        366
483        372
451        355

average:
476.0      364.3
The GPU based network is consistently faster by about 25%. Perhaps we expected the code to be much, much faster? Well, for such a small network the overheads erode the benefits. The GPU approach really shines for much larger networks and data. Let's do a better experiment and compare the PyTorch code in CPU and GPU mode, varying the number of hidden layer nodes. Here are the results:
nodes   CPU (seconds)   GPU (seconds)
200     463             362
1000    803             356
2000    1174            366
5000    3390            518
Visualising this ...

We can now see the benefit of PyTorch using the GPU. As the scale of the network grows (hidden layer nodes here), the time it takes the GPU to complete training rises very slowly, compared to the CPU doing the same work, which rises quickly.

One more tweak .. the contributors at GitHub suggested setting an environment variable to control how many CPU threads the task uses. See here. In my Google GPU instance I'll set this to OMP_NUM_THREADS=2 (a minimal example of doing this is sketched below). The resulting duration is 361 seconds .. so not much improved. We didn't see an improvement when we tried it on the CPU only code, earlier. I did see that fewer threads were being used, by watching the top utility, but at these scales I didn't see a difference.
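For completeness, here is one way to set that environment variable from Python itself - a minimal sketch assuming the value of 2 used above; exporting it in the shell before launching the script works just as well:

import os

# limit the number of CPU threads used by the OpenMP-backed maths libraries;
# set it before importing torch so the setting is picked up
os.environ['OMP_NUM_THREADS'] = '2'

import torch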