# Optimization

For basic background, see Deep Learning.

No.1 Overfit your training data -- your model has the flexibility to get there.

## Regularization

• Parameter Regularization
• $$\ell_1$$ forces some variables to be exactly zero, inducing sparsity.
• Maximum a posteriori (MAP) estimation
• Structural Regularization
• Start with a simpler model and gradually increase complexity.
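
The $$\ell_1$$ sparsity effect above can be sketched with the soft-thresholding (proximal) update used by lasso-style solvers; small weights are set exactly to zero. Function names here are illustrative, not from any particular library.

```python
import numpy as np

def l1_penalized_loss(w, X, y, lam):
    """Squared error plus an l1 penalty; lam trades fit for sparsity."""
    residual = X @ w - y
    return 0.5 * np.mean(residual ** 2) + lam * np.sum(np.abs(w))

def soft_threshold(w, t):
    """Proximal operator of t * ||w||_1: shrinks every weight toward
    zero by t, and sets weights with |w| <= t exactly to zero --
    this is where the sparsity comes from."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

w = np.array([0.3, -0.05, 1.0])
w_sparse = soft_threshold(w, 0.1)   # middle weight becomes exactly 0
```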

• Dropout: multiply the output of a hidden layer with a mask of 0s and 1s
• At test time: multiply the weights by $$1-p_i$$ (the keep probability), so expected activations match training.
• Prevents co-adaptation of units and approximates learning an ensemble.
• Other variations
• Gaussian dropout: multiply activations by Gaussian noise with mean 1 instead of a 0/1 mask
• Swapout: randomly lets individual units skip the layer (identity connection), generalizing dropout and stochastic depth (?)
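
A minimal sketch of the dropout scheme described above: a random 0/1 mask during training, and scaling by $$1-p$$ at test time (assuming $$p$$ is the drop probability).

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(h, p, train=True):
    """Dropout on a hidden-layer output h, with drop probability p.

    Training: multiply activations by a random mask of 0s and 1s.
    Test: scale by the keep probability 1 - p instead, so the
    expected activation matches what the next layer saw in training.
    """
    if train:
        mask = (rng.random(h.shape) >= p).astype(h.dtype)
        return h * mask
    return h * (1.0 - p)

h = np.ones((4, 3))
train_out = dropout_forward(h, p=0.5, train=True)   # entries are 0 or 1
test_out = dropout_forward(h, p=0.5, train=False)   # all entries are 0.5
```

Modern implementations usually use "inverted" dropout (divide by $$1-p$$ during training instead), which leaves the test-time forward pass unchanged.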

## Multimodal optimization

• Biggest challenge
• Pretraining