Disable tree reuse in training. #536
It was spinning on resign. If the 450th ply was a win/loss/draw, it wasn't recognized, and it still tried to play a move from that position.
Tree reuse doesn't seem to give a significant performance win, and it has detrimental effects on the effectiveness of noise.
Note that this PR is diff-based against PR #526, since it's easier to make this change after that is fixed.
I'm in favor of this, especially since it's practically free performance-wise. I'd like to see some opinions from @glinscott @Error323.
Some numbers to aid consideration.
  int play_one_game(BoardHistory& bh) {
-   auto search = std::make_unique<UCTSearch>(bh.shallow_clone());
    for (int game_ply = 0; game_ply < 450; ++game_ply) {
+     auto search = std::make_unique<UCTSearch>(bh.shallow_clone());
How about a comment saying we are doing this on purpose to avoid tree reuse? Otherwise it might look like a bug.
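A minimal sketch of what such a comment could look like, and of why the placement of the search object matters. `ToySearch` and both helper functions are hypothetical stand-ins, not the project's `UCTSearch`; the toy also keeps every visit across plies, whereas real tree reuse would keep only the subtree under the move that was played.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-in for the real search class. It only records how many
// visits its tree holds, so state carried between plies is observable.
struct ToySearch {
    int visits = 0;
    void think(int playouts) { visits += playouts; }
};

// Tree reuse: one search object lives for the whole game, so every ply after
// the first starts with visits already in the tree.
std::vector<int> visits_at_start_with_reuse(int plies, int playouts) {
    std::vector<int> carried;
    auto search = std::make_unique<ToySearch>();
    for (int ply = 0; ply < plies; ++ply) {
        carried.push_back(search->visits);
        search->think(playouts);
    }
    return carried;
}

// No tree reuse: every ply starts from an empty tree.
std::vector<int> visits_at_start_without_reuse(int plies, int playouts) {
    std::vector<int> carried;
    for (int ply = 0; ply < plies; ++ply) {
        // Intentionally constructed inside the loop: a fresh search per ply
        // disables tree reuse. This is not a bug.
        auto search = std::make_unique<ToySearch>();
        carried.push_back(search->visits);
        search->think(playouts);
    }
    return carried;
}
```

With three plies of 100 playouts each, the first helper reports 0, 100, 200 visits at the start of each ply and the second reports 0, 0, 0.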
I've started gathering some data. "Moves" is actually ply. baseline and baseline2 are independent searches with full tree reuse; each figure is the policy visit delta relative to a main full-tree-reuse search that is in charge of choosing the moves under standard training conditions. I put in two baselines to get a feel for the error bar in the data. Disabling tree reuse gives ~10% more noise effect overall, and dual_tree is similar. I had another run about this deep which I accidentally closed; disabling tree reuse was a bit higher in that one, closer to 15%, and dual tree was a bit closer to the middle. I'll leave it running overnight to see if it moves a bit more. 10-15% may not sound large, but I'll call out the specific lines I've selected above. They are taken from the late game, and obviously there are some forced moves. Noise has zero effect on a forced move, but in the baseline it then also has zero effect on the next move, whereas dual tree and no tree reuse both get nice noise effects there. So while it may not be a lot more noise overall, it is a huge increase in noise at specific moves, which I think are the moves most at risk of policy overfit.
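The comment does not spell out how the policy visit delta is computed. One plausible reading, sketched here purely as an assumption, is the sum of absolute differences between the per-move visit counts of two searches at the root. `visit_delta` is a hypothetical helper, not a function from the codebase.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Assumed metric: sum over root moves of |visits_a[i] - visits_b[i]|.
// Both vectors are expected to list the same legal moves in the same order.
long visit_delta(const std::vector<int>& visits_a,
                 const std::vector<int>& visits_b) {
    long delta = 0;
    for (std::size_t i = 0; i < visits_a.size() && i < visits_b.size(); ++i) {
        delta += std::labs(static_cast<long>(visits_a[i]) -
                           static_cast<long>(visits_b[i]));
    }
    return delta;
}
```

Under this reading a forced move always contributes zero: with a single legal move both searches put every visit on it, so `visit_delta({800}, {800})` is 0, while two searches that split 800 visits as 500/300 and 600/200 differ by 200.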
Overnight numbers: Not really much change. ~10% more for dual tree, and ~12% for no tree reuse. Delta moves: 36909 baseline: 5351446, baseline2: 5328474, dual_tree: 5854904, nohistory: 6069182
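As a rough cross-check, the quoted totals can be turned into percentages. This sketch assumes each total is compared against the mean of the two baselines; the rounded figures in the comment may have been taken from a different snapshot or computed against a different reference. `percent_over` and `baseline_mean` are hypothetical helpers.

```cpp
// Totals quoted in the overnight comment.
constexpr double kBaseline = 5351446.0;
constexpr double kBaseline2 = 5328474.0;
constexpr double kDualTree = 5854904.0;
constexpr double kNoHistory = 6069182.0;

// Percentage by which `value` exceeds `reference`.
double percent_over(double value, double reference) {
    return (value - reference) / reference * 100.0;
}

// Assumed reference point: the mean of the two independent baselines.
double baseline_mean() {
    return (kBaseline + kBaseline2) / 2.0;
}
```

Under these assumptions `percent_over(kDualTree, baseline_mean())` comes out a little under 10% and `percent_over(kNoHistory, baseline_mean())` a little under 14%, while the two baselines differ from each other by less than half a percent.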
I realized there was an issue with my methodology. Rather than comparing against the move chooser, I added a new search tree with no noise to compare against. I think it gives a more realistic value magnitude than calculating the diff against something with noise in it. I also included the move_chooser next to baseline to see how much effect being the move chooser has on the result. Early numbers: results are pretty similar, but the relative magnitudes of the new scenarios are a bit higher than before.
One last result set. (I'm going to switch to testing something else now.) ~16% for no tree reuse and ~12% for dual tree.
This is an alternative to #528, since I didn't find any significant performance difference and this is even more clearly a win for noise effectiveness.