Neural network only converges when data cloud is close to 0












2












$begingroup$


I am new to tensorflow and am learning the basics at the moment so please bear with me.



My problem concerns strange non-convergent behaviour of neural networks when presented with the supposedly simple task of finding a regression function for a small training set consisting only of m = 100 data points {(x_1, y_1), (x_2, y_2),...,(x_100, y_100)}, where x_i and y_i are real numbers.



I first constructed a function that automatically generates a computational graph corresponding to a classical fully connected feedforward neural network:



import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import math

def neural_network_constructor(arch_list = [1,3,3,1],
act_func = tf.nn.sigmoid,
w_initializer = tf.contrib.layers.xavier_initializer(),
b_initializer = tf.zeros_initializer(),
loss_function = tf.losses.mean_squared_error,
training_method = tf.train.GradientDescentOptimizer(0.5)):

n_input = arch_list[0]
n_output = arch_list[-1]

X = tf.placeholder(dtype = tf.float32, shape = [None, n_input])

layer = tf.contrib.layers.fully_connected(
inputs = X,
num_outputs = arch_list[1],
activation_fn = act_func,
weights_initializer = w_initializer,
biases_initializer = b_initializer)

for N in arch_list[2:-1]:
layer = tf.contrib.layers.fully_connected(
inputs = layer,
num_outputs = N,
activation_fn = act_func,
weights_initializer = w_initializer,
biases_initializer = b_initializer)

Phi = tf.contrib.layers.fully_connected(
inputs = layer,
num_outputs = n_output,
activation_fn = tf.identity,
weights_initializer = w_initializer,
biases_initializer = b_initializer)


Y = tf.placeholder(tf.float32, [None, n_output])

loss = loss_function(Y, Phi)
train_step = training_method.minimize(loss)

return [X, Phi, Y, train_step]


With the above default values for the arguments, this function would construct a computational graph corresponding to a neural network with 1 input neuron, 2 hidden layers with 3 neurons each and 1 output neuron. The activation function is per default the sigmoid function. X corresponds to the input tensor, Y to the labels of the training data and Phi to the feedforward output of the neural network. The operation train_step performs one gradient-descent step when executed in the session environment.



So far, so good. If I now test a particular neural network (constructed with this function and the exact default values for the arguments given above) by making it learn a simple regression function for artificial data extracted from a sinewave, strange things happen:



before training



enter image description here



Before training, the network seems to be a flat line. After 100.000 training iterations, it manages to partially learn the function, but only the part which is closer to 0. After this, it becomes flat again. Further training does not decrease the loss function anymore.



This get even stranger, when I take the exact same data set, but shift all x-values by adding 500:



enter image description hereenter image description here



Here, the network completely refuses to learn. I cannot understand why this is happening. I have tried changing the architecture of the network and its learning rate, but have observed similar effects: the closer the x-values of the data cloud are to the origin, the easier the network can learn. After a certain distance to the origin, learning stops completely. Changing the activation function from sigmoid to ReLu has only made things worse; here, the network tends to just converge to the average, no matter what position the data cloud is in.



Is there something wrong with my implementation of the neural-network-constructor? Or does this have something do do with initialization values? I have tried to get a deeper understanding of this problem now for quite a while and would greatly appreciate some advice. What could be the cause of this? All thoughts on why this behaviour is occurring are very much welcome!



Thanks,
Joker










share|improve this question









$endgroup$




bumped to the homepage by Community yesterday


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.











  • 1




    $begingroup$
    The network isn't learning..... Lower your learning rate .5 is too high and correspondingly increase the epochs and report back what you found... Happy learning
    $endgroup$
    – Aditya
    May 11 '18 at 1:41










  • $begingroup$
    Hey Aditya, thank you for your comment. I have tried different learning rates (5,0.5,0.05,0.005,...) and have gone up to several houndred thousand iterations but results remain unchanged.
    $endgroup$
    – Joker123
    May 11 '18 at 10:51
















2












$begingroup$


I am new to tensorflow and am learning the basics at the moment so please bear with me.



My problem concerns strange non-convergent behaviour of neural networks when presented with the supposedly simple task of finding a regression function for a small training set consisting only of m = 100 data points {(x_1, y_1), (x_2, y_2),...,(x_100, y_100)}, where x_i and y_i are real numbers.



I first constructed a function that automatically generates a computational graph corresponding to a classical fully connected feedforward neural network:



import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import math

def neural_network_constructor(arch_list = [1,3,3,1],
act_func = tf.nn.sigmoid,
w_initializer = tf.contrib.layers.xavier_initializer(),
b_initializer = tf.zeros_initializer(),
loss_function = tf.losses.mean_squared_error,
training_method = tf.train.GradientDescentOptimizer(0.5)):

n_input = arch_list[0]
n_output = arch_list[-1]

X = tf.placeholder(dtype = tf.float32, shape = [None, n_input])

layer = tf.contrib.layers.fully_connected(
inputs = X,
num_outputs = arch_list[1],
activation_fn = act_func,
weights_initializer = w_initializer,
biases_initializer = b_initializer)

for N in arch_list[2:-1]:
layer = tf.contrib.layers.fully_connected(
inputs = layer,
num_outputs = N,
activation_fn = act_func,
weights_initializer = w_initializer,
biases_initializer = b_initializer)

Phi = tf.contrib.layers.fully_connected(
inputs = layer,
num_outputs = n_output,
activation_fn = tf.identity,
weights_initializer = w_initializer,
biases_initializer = b_initializer)


Y = tf.placeholder(tf.float32, [None, n_output])

loss = loss_function(Y, Phi)
train_step = training_method.minimize(loss)

return [X, Phi, Y, train_step]


With the above default values for the arguments, this function would construct a computational graph corresponding to a neural network with 1 input neuron, 2 hidden layers with 3 neurons each and 1 output neuron. The activation function is per default the sigmoid function. X corresponds to the input tensor, Y to the labels of the training data and Phi to the feedforward output of the neural network. The operation train_step performs one gradient-descent step when executed in the session environment.



So far, so good. If I now test a particular neural network (constructed with this function and the exact default values for the arguments given above) by making it learn a simple regression function for artificial data extracted from a sinewave, strange things happen:



before training



enter image description here



Before training, the network seems to be a flat line. After 100.000 training iterations, it manages to partially learn the function, but only the part which is closer to 0. After this, it becomes flat again. Further training does not decrease the loss function anymore.



This get even stranger, when I take the exact same data set, but shift all x-values by adding 500:



enter image description hereenter image description here



Here, the network completely refuses to learn. I cannot understand why this is happening. I have tried changing the architecture of the network and its learning rate, but have observed similar effects: the closer the x-values of the data cloud are to the origin, the easier the network can learn. After a certain distance to the origin, learning stops completely. Changing the activation function from sigmoid to ReLu has only made things worse; here, the network tends to just converge to the average, no matter what position the data cloud is in.



Is there something wrong with my implementation of the neural-network-constructor? Or does this have something do do with initialization values? I have tried to get a deeper understanding of this problem now for quite a while and would greatly appreciate some advice. What could be the cause of this? All thoughts on why this behaviour is occurring are very much welcome!



Thanks,
Joker










share|improve this question









$endgroup$




bumped to the homepage by Community yesterday


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.











  • 1




    $begingroup$
    The network isn't learning..... Lower your learning rate .5 is too high and correspondingly increase the epochs and report back what you found... Happy learning
    $endgroup$
    – Aditya
    May 11 '18 at 1:41










  • $begingroup$
    Hey Aditya, thank you for your comment. I have tried different learning rates (5,0.5,0.05,0.005,...) and have gone up to several houndred thousand iterations but results remain unchanged.
    $endgroup$
    – Joker123
    May 11 '18 at 10:51














2












2








2





$begingroup$


I am new to tensorflow and am learning the basics at the moment so please bear with me.



My problem concerns strange non-convergent behaviour of neural networks when presented with the supposedly simple task of finding a regression function for a small training set consisting only of m = 100 data points {(x_1, y_1), (x_2, y_2),...,(x_100, y_100)}, where x_i and y_i are real numbers.



I first constructed a function that automatically generates a computational graph corresponding to a classical fully connected feedforward neural network:



import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import math

def neural_network_constructor(arch_list = [1,3,3,1],
act_func = tf.nn.sigmoid,
w_initializer = tf.contrib.layers.xavier_initializer(),
b_initializer = tf.zeros_initializer(),
loss_function = tf.losses.mean_squared_error,
training_method = tf.train.GradientDescentOptimizer(0.5)):

n_input = arch_list[0]
n_output = arch_list[-1]

X = tf.placeholder(dtype = tf.float32, shape = [None, n_input])

layer = tf.contrib.layers.fully_connected(
inputs = X,
num_outputs = arch_list[1],
activation_fn = act_func,
weights_initializer = w_initializer,
biases_initializer = b_initializer)

for N in arch_list[2:-1]:
layer = tf.contrib.layers.fully_connected(
inputs = layer,
num_outputs = N,
activation_fn = act_func,
weights_initializer = w_initializer,
biases_initializer = b_initializer)

Phi = tf.contrib.layers.fully_connected(
inputs = layer,
num_outputs = n_output,
activation_fn = tf.identity,
weights_initializer = w_initializer,
biases_initializer = b_initializer)


Y = tf.placeholder(tf.float32, [None, n_output])

loss = loss_function(Y, Phi)
train_step = training_method.minimize(loss)

return [X, Phi, Y, train_step]


With the above default values for the arguments, this function would construct a computational graph corresponding to a neural network with 1 input neuron, 2 hidden layers with 3 neurons each and 1 output neuron. The activation function is per default the sigmoid function. X corresponds to the input tensor, Y to the labels of the training data and Phi to the feedforward output of the neural network. The operation train_step performs one gradient-descent step when executed in the session environment.



So far, so good. If I now test a particular neural network (constructed with this function and the exact default values for the arguments given above) by making it learn a simple regression function for artificial data extracted from a sinewave, strange things happen:



before training



enter image description here



Before training, the network seems to be a flat line. After 100.000 training iterations, it manages to partially learn the function, but only the part which is closer to 0. After this, it becomes flat again. Further training does not decrease the loss function anymore.



This get even stranger, when I take the exact same data set, but shift all x-values by adding 500:



enter image description hereenter image description here



Here, the network completely refuses to learn. I cannot understand why this is happening. I have tried changing the architecture of the network and its learning rate, but have observed similar effects: the closer the x-values of the data cloud are to the origin, the easier the network can learn. After a certain distance to the origin, learning stops completely. Changing the activation function from sigmoid to ReLu has only made things worse; here, the network tends to just converge to the average, no matter what position the data cloud is in.



Is there something wrong with my implementation of the neural-network-constructor? Or does this have something do do with initialization values? I have tried to get a deeper understanding of this problem now for quite a while and would greatly appreciate some advice. What could be the cause of this? All thoughts on why this behaviour is occurring are very much welcome!



Thanks,
Joker










share|improve this question









$endgroup$




I am new to tensorflow and am learning the basics at the moment so please bear with me.



My problem concerns strange non-convergent behaviour of neural networks when presented with the supposedly simple task of finding a regression function for a small training set consisting only of m = 100 data points {(x_1, y_1), (x_2, y_2),...,(x_100, y_100)}, where x_i and y_i are real numbers.



I first constructed a function that automatically generates a computational graph corresponding to a classical fully connected feedforward neural network:



import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import math

def neural_network_constructor(arch_list = [1,3,3,1],
act_func = tf.nn.sigmoid,
w_initializer = tf.contrib.layers.xavier_initializer(),
b_initializer = tf.zeros_initializer(),
loss_function = tf.losses.mean_squared_error,
training_method = tf.train.GradientDescentOptimizer(0.5)):

n_input = arch_list[0]
n_output = arch_list[-1]

X = tf.placeholder(dtype = tf.float32, shape = [None, n_input])

layer = tf.contrib.layers.fully_connected(
inputs = X,
num_outputs = arch_list[1],
activation_fn = act_func,
weights_initializer = w_initializer,
biases_initializer = b_initializer)

for N in arch_list[2:-1]:
layer = tf.contrib.layers.fully_connected(
inputs = layer,
num_outputs = N,
activation_fn = act_func,
weights_initializer = w_initializer,
biases_initializer = b_initializer)

Phi = tf.contrib.layers.fully_connected(
inputs = layer,
num_outputs = n_output,
activation_fn = tf.identity,
weights_initializer = w_initializer,
biases_initializer = b_initializer)


Y = tf.placeholder(tf.float32, [None, n_output])

loss = loss_function(Y, Phi)
train_step = training_method.minimize(loss)

return [X, Phi, Y, train_step]


With the above default values for the arguments, this function would construct a computational graph corresponding to a neural network with 1 input neuron, 2 hidden layers with 3 neurons each and 1 output neuron. The activation function is per default the sigmoid function. X corresponds to the input tensor, Y to the labels of the training data and Phi to the feedforward output of the neural network. The operation train_step performs one gradient-descent step when executed in the session environment.



So far, so good. If I now test a particular neural network (constructed with this function and the exact default values for the arguments given above) by making it learn a simple regression function for artificial data extracted from a sinewave, strange things happen:



before training



enter image description here



Before training, the network seems to be a flat line. After 100.000 training iterations, it manages to partially learn the function, but only the part which is closer to 0. After this, it becomes flat again. Further training does not decrease the loss function anymore.



This get even stranger, when I take the exact same data set, but shift all x-values by adding 500:



enter image description hereenter image description here



Here, the network completely refuses to learn. I cannot understand why this is happening. I have tried changing the architecture of the network and its learning rate, but have observed similar effects: the closer the x-values of the data cloud are to the origin, the easier the network can learn. After a certain distance to the origin, learning stops completely. Changing the activation function from sigmoid to ReLu has only made things worse; here, the network tends to just converge to the average, no matter what position the data cloud is in.



Is there something wrong with my implementation of the neural-network-constructor? Or does this have something do do with initialization values? I have tried to get a deeper understanding of this problem now for quite a while and would greatly appreciate some advice. What could be the cause of this? All thoughts on why this behaviour is occurring are very much welcome!



Thanks,
Joker







machine-learning neural-network regression






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked May 10 '18 at 21:41









Joker123Joker123

1111




1111





bumped to the homepage by Community yesterday


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.







bumped to the homepage by Community yesterday


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.










  • 1




    $begingroup$
    The network isn't learning..... Lower your learning rate .5 is too high and correspondingly increase the epochs and report back what you found... Happy learning
    $endgroup$
    – Aditya
    May 11 '18 at 1:41










  • $begingroup$
    Hey Aditya, thank you for your comment. I have tried different learning rates (5,0.5,0.05,0.005,...) and have gone up to several houndred thousand iterations but results remain unchanged.
    $endgroup$
    – Joker123
    May 11 '18 at 10:51














  • 1




    $begingroup$
    The network isn't learning..... Lower your learning rate .5 is too high and correspondingly increase the epochs and report back what you found... Happy learning
    $endgroup$
    – Aditya
    May 11 '18 at 1:41










  • $begingroup$
    Hey Aditya, thank you for your comment. I have tried different learning rates (5,0.5,0.05,0.005,...) and have gone up to several houndred thousand iterations but results remain unchanged.
    $endgroup$
    – Joker123
    May 11 '18 at 10:51








1




1




$begingroup$
The network isn't learning..... Lower your learning rate .5 is too high and correspondingly increase the epochs and report back what you found... Happy learning
$endgroup$
– Aditya
May 11 '18 at 1:41




$begingroup$
The network isn't learning..... Lower your learning rate .5 is too high and correspondingly increase the epochs and report back what you found... Happy learning
$endgroup$
– Aditya
May 11 '18 at 1:41












$begingroup$
Hey Aditya, thank you for your comment. I have tried different learning rates (5,0.5,0.05,0.005,...) and have gone up to several houndred thousand iterations but results remain unchanged.
$endgroup$
– Joker123
May 11 '18 at 10:51




$begingroup$
Hey Aditya, thank you for your comment. I have tried different learning rates (5,0.5,0.05,0.005,...) and have gone up to several houndred thousand iterations but results remain unchanged.
$endgroup$
– Joker123
May 11 '18 at 10:51










1 Answer
1






active

oldest

votes


















0












$begingroup$

the input is simply not enough to correctly predict the output, the model can not learn the output conditional distribution P(y|x).



Either you have to add more features to the Naive NN model, e.g concat the previous x to the current x to predict the current y or use RNN like models to model the problem as

p(y_t|x_t,x_t-1,x_t-2,....x_t0)



see Time Series Prediction with LSTM






share|improve this answer









$endgroup$














    Your Answer





    StackExchange.ifUsing("editor", function () {
    return StackExchange.using("mathjaxEditing", function () {
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
    });
    });
    }, "mathjax-editing");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "557"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f31499%2fneural-network-only-converges-when-data-cloud-is-close-to-0%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    0












    $begingroup$

    the input is simply not enough to correctly predict the output, the model can not learn the output conditional distribution P(y|x).



    Either you have to add more features to the Naive NN model, e.g concat the previous x to the current x to predict the current y or use RNN like models to model the problem as

    p(y_t|x_t,x_t-1,x_t-2,....x_t0)



    see Time Series Prediction with LSTM






    share|improve this answer









    $endgroup$


















      0












      $begingroup$

      the input is simply not enough to correctly predict the output, the model can not learn the output conditional distribution P(y|x).



      Either you have to add more features to the Naive NN model, e.g concat the previous x to the current x to predict the current y or use RNN like models to model the problem as

      p(y_t|x_t,x_t-1,x_t-2,....x_t0)



      see Time Series Prediction with LSTM






      share|improve this answer









      $endgroup$
















        0












        0








        0





        $begingroup$

        the input is simply not enough to correctly predict the output, the model can not learn the output conditional distribution P(y|x).



        Either you have to add more features to the Naive NN model, e.g concat the previous x to the current x to predict the current y or use RNN like models to model the problem as

        p(y_t|x_t,x_t-1,x_t-2,....x_t0)



        see Time Series Prediction with LSTM






        share|improve this answer









        $endgroup$



        the input is simply not enough to correctly predict the output, the model can not learn the output conditional distribution P(y|x).



        Either you have to add more features to the Naive NN model, e.g concat the previous x to the current x to predict the current y or use RNN like models to model the problem as

        p(y_t|x_t,x_t-1,x_t-2,....x_t0)



        see Time Series Prediction with LSTM







        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered May 11 '18 at 12:09









        Fadi BakouraFadi Bakoura

        653212




        653212






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Data Science Stack Exchange!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            Use MathJax to format equations. MathJax reference.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f31499%2fneural-network-only-converges-when-data-cloud-is-close-to-0%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            How to label and detect the document text images

            Tabula Rosettana

            Aureus (color)