
I was reading DeepMind's paper on I2As and noticed that the sizes of the hidden layers in their model were all powers of 2, such as 32, 64, and 256. I have found the same thing in other papers.

Is there any performance reason for it? Maybe related to data structure alignment?

More concretely, I would like to know whether I should use these "special" sizes when training my own models.

asked Sep 8, 2017 at 15:20

1 Answer


While you can only be 100% certain by asking the authors, most of them use these values simply because you have to choose some value, and the specific value doesn't matter too much; only the order of magnitude does. Taking a power of 2 is just a natural choice.

You can also take a setup that uses a power of two and reduce that number by one. The computation time should be roughly the same, probably a bit lower. If it is noticeably higher, there might be a real performance benefit to the authors' choice. A sketch of that experiment is given below.
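A minimal sketch of that experiment, assuming PyTorch (the paper does not prescribe a framework); the helper name `time_hidden_size`, the input dimension, batch size, and iteration count are arbitrary illustrative choices, not values from the paper:

```python
import time
import torch

def time_hidden_size(hidden, in_dim=512, batch=1024, iters=200):
    """Time `iters` forward passes through a single Linear layer of width `hidden`."""
    layer = torch.nn.Linear(in_dim, hidden)
    x = torch.randn(batch, in_dim)
    with torch.no_grad():
        for _ in range(10):          # warm-up so one-time setup cost doesn't skew timing
            layer(x)
        start = time.perf_counter()
        for _ in range(iters):
            layer(x)
        return time.perf_counter() - start

for hidden in (256, 255):            # a power of two vs. one element fewer
    print(f"hidden={hidden}: {time_hidden_size(hidden):.3f} s")
```

If the power-of-two size is consistently and noticeably faster on your hardware, that is evidence for a real alignment or tiling benefit; otherwise the choice is essentially cosmetic.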

See also

answered Sep 10, 2017 at 16:28
  • I also thought the specific value wasn't important, but this predilection for powers of 2 seemed strange =). +1 for the experiment suggestion and the links. Commented Sep 13, 2017 at 11:27
  • Please also note that some people say they do it for cache alignment. Then you should ask them whether they have actually confirmed experimentally that it makes a difference. Sometimes people have heard something and either don't know enough to do it right, or just assume that it worked as expected because nothing went horribly wrong. Commented Sep 13, 2017 at 11:55
