How to decide neural network architecture?

How do we decide how many hidden layers to use, and how many nodes to put in each, when building a neural network architecture?

I understand that the input and output layers depend on the training set, but how do we decide on the hidden layers and the overall architecture in general?

asked Jul 6 '17 at 19:05 by user7677413

Typically we experiment, using our intuition; consider it a hyperparameter. There are ways of learning the architecture, but I don't know how practical they are: blog.acolyer.org/2017/05/10/…
– Emre
Jul 6 '17 at 19:12

I looked for a duplicate of this, because I am sure it has cropped up many times before on this site. However, I could not find a pure version that wasn't attached to some dataset or problem. Maybe this could be the generic question we point others to? Sadly there isn't a great "how to" answer to be had in general, but it's a common question when faced with so much choice.
– Neil Slater
Jul 6 '17 at 19:23

datascience.stackexchange.com/questions/22199/…
– KHAN irfan
Aug 23 '17 at 14:24

This is a very interesting question, and researchers have started working on it: what would be the optimal architecture for dataset A and for dataset B? Please read the paper below, which tries to answer your question. Welcome to the world of Neural Architecture Search (NAS). arxiv.org/abs/1611.01578
– iDeepVision
yesterday

machine-learning neural-network


1 Answer

Sadly there is no generic way to determine a priori the best number of neurons and number of layers for a neural network, given just a problem description. There isn't even much guidance to be had determining good values to try as a starting point.

The most common approach seems to be to start with a rough guess based on prior experience about networks used on similar problems. This could be your own experience, or second/third-hand experience you have picked up from a training course, blog or research paper. Then try some variations, and check the performance carefully before picking a best one.

The size and depth of a neural network also interact with other hyperparameters, so changing something elsewhere can shift where the best values lie. It is therefore not possible to isolate a "best" size and depth for a network and then continue to tune other parameters in isolation. For instance, a very deep network may work efficiently with the ReLU activation function but not so well with sigmoid; if you found the best size and shape of network first and then ran an experiment varying the activation function, you might come to the wrong conclusion about what works best.
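One way to avoid tuning size and activation in isolation is to enumerate the combinations and compare them jointly. The following is only a minimal sketch: the function names, the grid values and the `validation_score` callable are assumptions made for illustration, not part of any particular library.

```python
from itertools import product


def candidate_configs(depths, widths, activations):
    """Yield every (depth, width, activation) combination as a dict."""
    for depth, width, activation in product(depths, widths, activations):
        yield {"depth": depth, "width": width, "activation": activation}


def pick_best(configs, validation_score):
    """Return the config with the highest validation score.

    `validation_score` is a hypothetical callable supplied by the caller:
    it should train a network with the given config and return a score
    measured on held-out data (higher is better).
    """
    return max(configs, key=validation_score)


# Example grid: 3 depths x 2 widths x 2 activations = 12 joint candidates.
grid = list(candidate_configs([2, 8, 32], [32, 128], ["relu", "sigmoid"]))
```

Because every combination is scored, a deep network is tried with both ReLU and sigmoid rather than only with whichever activation happened to be fixed earlier. The cost is that the grid grows multiplicatively, so in practice the ranges usually have to be kept small.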

You may sometimes read about "rules of thumb" that researchers use when starting a neural network design from scratch. These things might work for your problems or not, but they at least have the advantage of making a start on the problem. The variations I have seen are:

Create a network with hidden layers of a similar order of size to the input, and all the same size, on the grounds that there is no particular reason to vary the size (unless you are creating an autoencoder perhaps).

Start simple and build up complexity to see what improves a simple network.

Try varying depths of network if you expect the output to be explained well by the input data, but with a complex relationship (as opposed to just inherently noisy).

Try adding some dropout; it's the closest thing neural networks have to magic fairy dust that makes everything better (caveat: adding dropout may improve generalisation, but may also increase required layer sizes and training times).

If you read these or anything like them in any text, then take them with a pinch of salt. However, at worst they help you get past the blank page effect, and write some kind of network, and get you to start the testing and refinement process.
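As a rough illustration of the first two rules of thumb above (hidden layers the same size as the input, then starting simple and adding depth), here is a small sketch. The helper names and the 20-input, 3-output problem are made up for the example, and it only computes layer widths and parameter counts for a fully connected network; it does not train anything.

```python
def layer_sizes(n_inputs, n_outputs, n_hidden_layers):
    """Layer widths for a network whose hidden layers all match the input size."""
    return [n_inputs] + [n_inputs] * n_hidden_layers + [n_outputs]


def parameter_count(sizes):
    """Weights plus biases of a fully connected network with these layer widths."""
    return sum((fan_in + 1) * fan_out for fan_in, fan_out in zip(sizes, sizes[1:]))


# Start simple and add one hidden layer at a time
# (20 inputs and 3 outputs are made-up numbers).
for depth in range(1, 4):
    sizes = layer_sizes(20, 3, depth)
    print(depth, sizes, parameter_count(sizes))
```

Watching how the parameter count grows with depth gives a feel for how much capacity each step adds, which can help decide when to stop adding layers and test instead.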

As an aside, try not to get too lost in tuning a neural network when some other approach might be better and save you lots of time. Do consider and use other machine learning and data science approaches. Explore the data, maybe make some plots. Try some simple linear approaches first to get benchmarks to beat: linear regression, logistic regression or softmax regression, depending on your problem. Consider using a different ML algorithm from NNs; decision-tree-based approaches such as XGBoost can be faster and more effective than deep learning on many problems.
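To make the "benchmark to beat" idea concrete, here is a minimal sketch of a one-feature linear regression baseline in plain Python. The function names are illustrative assumptions, and a real comparison would measure the error on held-out data rather than on the data used for fitting.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return mean_y - slope * mean_x, slope


def mean_squared_error(xs, ys, intercept, slope):
    """Average squared prediction error of the fitted line."""
    return sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

If a neural network cannot clearly beat the error of a baseline like this on validation data, that is a hint that the extra tuning effort may not be paying off.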

answered Jul 7 '17 at 6:33, edited Jul 7 '17 at 8:36, by Neil Slater

It's a great explanation. Thanks. I also wonder whether there is a good way to decide which ML approach to use. You mentioned that there might be a better way than a neural network, but how do we determine that easily?
– user7677413
Jul 7 '17 at 7:05

@user7677413: The same thing applies. You have to try and see, although experience may give you a guide on familiar problems.
– Neil Slater
Jul 7 '17 at 7:07

So is machine learning basically guessing with intuition and experience, rather than a theoretical approach?
– user7677413
Jul 7 '17 at 7:12

@user7677413: Well, not as a whole. There is plenty of theory to describe how the models work (or what their limits are), and theory may extend to idealised descriptions of data sets. But choosing between all the possible constructs you could use when faced with a real-world problem description is usually an empirical science. You can theory-craft about what would work before you start, and people do, but the empirical side is far more common and effective in general.
– Neil Slater
Jul 7 '17 at 7:19

When is a neural network necessary, then?
– user7677413
Jul 7 '17 at 7:25

