From @dfalbel...
We might want to implement a learning rate decay technique; otherwise it's too hard to tune the learning rate correctly.
Maybe something like what sklearn does (a rough sketch of both schemes follows the list):
- ‘invscaling’ gradually decreases the learning rate at each time step ‘t’ using an inverse scaling exponent of ‘power_t’: effective_learning_rate = learning_rate_init / pow(t, power_t)
- ‘adaptive’ keeps the learning rate constant at ‘learning_rate_init’ as long as training loss keeps decreasing. Each time two consecutive epochs fail to decrease training loss by at least ‘tol’, or fail to increase the validation score by at least ‘tol’ if ‘early_stopping’ is on, the current learning rate is divided by 5.
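A minimal sketch of what both schemes could look like, following the sklearn descriptions above. All names here (invscaling_lr, AdaptiveLR, lr_init, tol) are illustrative, not an existing API:

```python
def invscaling_lr(lr_init: float, t: int, power_t: float = 0.5) -> float:
    """Inverse-scaling decay: lr_init / t**power_t, per the sklearn docs."""
    return lr_init / pow(t, power_t)


class AdaptiveLR:
    """Divide the learning rate by 5 whenever the monitored loss fails to
    improve by at least `tol` for two consecutive epochs (sketch only)."""

    def __init__(self, lr_init: float, tol: float = 1e-4) -> None:
        self.lr = lr_init
        self.tol = tol
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, loss: float) -> float:
        # Count an epoch as "bad" if it did not beat the best loss by `tol`.
        if loss < self.best_loss - self.tol:
            self.best_loss = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= 2:
                self.lr /= 5
                self.bad_epochs = 0
        return self.lr
```

With early stopping enabled, `step` would be fed the validation score (negated, so lower is better) instead of the training loss, matching the sklearn behavior quoted above.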