A few weeks ago I posted a tweet on “the most common neural net mistakes”, listing a few common gotchas related to training neural nets. The tweet got quite a bit more engagement than I anticipated (including a webinar :)). Clearly, a lot of people have personally encountered the large gap between “here is how a convolutional layer works” and “our convnet achieves state of the art results”. So I thought it could be fun to brush off my dusty blog to expand my tweet to the long form that this topic deserves. However, instead of going into an enumeration of more common errors or fleshing them out, I wanted to dig a bit deeper and talk about how one can avoid making these errors altogether (or fix them very fast). The trick to doing so is to follow a certain process, which as far as I can tell is not very often documented. Let’s start with two important observations that motivate it.

1) Neural net training is a leaky abstraction

It is allegedly easy to get started with training neural nets. Numerous libraries and frameworks take pride in displaying 30-line miracle snippets that solve your data problems, giving the (false) impression that this stuff is plug and play. That’s cool! A courageous developer has taken the burden of understanding query strings, urls, GET/POST requests, HTTP connections, and so on from you and largely hidden the complexity behind a few lines of code. This is what we are familiar with and expect. Unfortunately, neural nets are nothing like that. They are not “off-the-shelf” technology the second you deviate slightly from training an ImageNet classifier. I’ve tried to make this point in my post “Yes you should understand backprop” by picking on backpropagation and calling it a “leaky abstraction”, but the situation is unfortunately much more dire. Backprop + SGD does not magically make your network work. Batch norm does not magically make it converge faster. RNNs don’t magically let you “plug in” text. And just because you can formulate your problem as RL doesn’t mean you should. If you insist on using the technology without understanding how it works you are likely to fail. Which brings me to…

2) Neural net training fails silently

When you break or misconfigure code you will often get some kind of an exception. You plugged in an integer where something expected a string. The number of elements in the two lists isn’t equal. In addition, it’s often possible to create unit tests for a certain functionality. This is just a start when it comes to training neural nets. Everything could be correct syntactically, but the whole thing isn’t arranged properly, and it’s really hard to tell. The “possible error surface” is large, logical (as opposed to syntactic), and very tricky to unit test.
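To make concrete what the “few lines of code” in section 1 actually hide, here is a minimal, standard-library-only sketch of the plumbing behind a typical HTTP one-liner: query-string encoding, URL assembly, and parsing on the receiving end. The endpoint is made up for illustration.

```python
# A sketch of the plumbing a "miracle snippet" HTTP client hides:
# query-string encoding, URL assembly, and server-side parsing.
# (example.com is a hypothetical endpoint; stdlib-only so it runs offline.)
from urllib.parse import urlencode, urlsplit, parse_qs

params = {"q": "neural nets", "page": 2}

# Client side: encode the parameters and assemble the URL by hand.
url = "https://example.com/search?" + urlencode(params)
# url == 'https://example.com/search?q=neural+nets&page=2'

# Server side: split the URL back apart and decode the query string.
query = parse_qs(urlsplit(url).query)
# query == {'q': ['neural nets'], 'page': ['2']}
```

A library like `requests` wraps all of this (plus connection handling, redirects, and error codes) behind a single call, which is exactly the abstraction the text argues neural net frameworks cannot honestly offer.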
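As a concrete sketch of section 2’s “fails silently” point (my own toy example, not from the original tweet): every line below is syntactically valid and raises no exception, yet the computed loss is wrong because a shape mismatch is silently absorbed by NumPy broadcasting.

```python
import numpy as np

# Targets have shape (N,); predictions carry a stray trailing axis, (N, 1).
N = 4
targets = np.arange(N, dtype=float)        # shape (N,)
predictions = targets.reshape(N, 1) + 0.1  # shape (N, 1) -- note the extra axis

# The subtraction silently broadcasts (N, 1) - (N,) into an (N, N) matrix
# instead of the intended (N,) residual vector. No error is raised.
residual = predictions - targets
loss = (residual ** 2).mean()              # a perfectly plausible number, just wrong

assert residual.shape == (N, N)            # not (N,) as intended!

# The fix: make the shapes agree before subtracting.
correct = predictions.ravel() - targets    # shape (N,), every entry 0.1
correct_loss = (correct ** 2).mean()       # ~0.01, the intended value
```

Nothing in the buggy path looks broken at a glance; the only symptom is a loss value that is quietly off, which is exactly the kind of logical (as opposed to syntactic) error that is so hard to unit test.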