Q&A and Results Showcase
In static mode, all objects are fixed. The state is represented as a flattened 64-dimensional array.
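The 64-dimensional state can be sketched as follows. This is a minimal illustration assuming the standard 4×4 Gridworld with one binary plane per object type (Player, Goal, Pit, Wall); the object positions here are arbitrary examples, not the actual board layout:

```python
import numpy as np

# A 4x4 grid with 4 object planes gives a 4x4x4 tensor;
# flattening it yields the 64-dimensional input the DQN consumes.
state = np.zeros((4, 4, 4))
state[0, 0, 0] = 1  # Player plane (illustrative position)
state[1, 3, 3] = 1  # Goal plane (illustrative position)

flat = state.reshape(1, 64)  # the shape fed to the network
print(flat.shape)  # (1, 64)
```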
Naive DQN Loss
Experience Replay DQN Loss
Double DQN Loss
Dueling DQN Loss
We converted the model to PyTorch Lightning (LitDQN class) for the hardest random environment. To stabilize training in this chaotic mode, we integrated:
- nn.SmoothL1Loss() to prevent exploding gradients when rewards fluctuate wildly.
- StepLR to gradually decay the learning rate, helping the model fine-tune its policy as it converges.
- gradient_clip_val=1.0 in the Trainer to keep updates within a stable bound.

The PyTorch Lightning model successfully encapsulated the training loop. With these stabilizers, the model avoided catastrophic forgetting in the random environment, where all objects (Player, Goal, Pit, Wall) spawn arbitrarily, and converged successfully across epochs.
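The three stabilizers can be sketched in plain PyTorch so their mechanics are visible. The network sizes and hyperparameters below are illustrative, not the actual LitDQN values; in the Lightning version these map to the loss computed in training_step, the scheduler returned from configure_optimizers, and gradient_clip_val=1.0 passed to the Trainer:

```python
import torch
import torch.nn as nn

# Illustrative network: 64-dim flattened state in, 4 action values out.
net = nn.Sequential(nn.Linear(64, 150), nn.ReLU(), nn.Linear(150, 4))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
# StepLR decays the learning rate by `gamma` every `step_size` steps.
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=100, gamma=0.5)
# SmoothL1 (Huber) loss: quadratic near zero, linear in the tails,
# so a single wild reward cannot produce an exploding gradient.
loss_fn = nn.SmoothL1Loss()

state = torch.randn(32, 64)   # batch of flattened states (random stand-in)
target = torch.randn(32, 4)   # hypothetical TD targets

loss = loss_fn(net(state), target)
opt.zero_grad()
loss.backward()
# Same effect as gradient_clip_val=1.0 in the Lightning Trainer:
torch.nn.utils.clip_grad_norm_(net.parameters(), 1.0)
opt.step()
sched.step()
```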
Rainbow DQN combines six improvements to the original DQN, making it the culmination of the DQN family: Double Q-learning, Prioritized Experience Replay (PER), Dueling networks, Multi-step learning, Distributional RL (C51), and Noisy Nets.
To stabilize training in the challenging random mode, we integrated these six techniques in hw3_4_rainbow_dqn.py:
- NoisyLinear replaces the standard fully connected layers.
- Dueling is combined with the Categorical (C51) distribution.
- The training loop uses an array-based PER implementation, and both the network and the PER priorities are updated by computing a cross-entropy loss.

Running python hw3_4_rainbow_dqn.py, the loss drops sharply from an initial 3.58 and stabilizes at around 0.005.
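As a sketch of one of these components, here is a minimal factorized-Gaussian NoisyLinear layer in the style of Fortunato et al. (2017). The class shape and the sigma0 initialization are illustrative assumptions, not the exact implementation in hw3_4_rainbow_dqn.py:

```python
import math
import torch
import torch.nn as nn

class NoisyLinear(nn.Module):
    """Factorized-Gaussian noisy layer: each weight is mu + sigma * eps,
    so exploration is learned instead of using an epsilon-greedy schedule."""

    def __init__(self, in_f, out_f, sigma0=0.5):
        super().__init__()
        self.in_f, self.out_f = in_f, out_f
        self.w_mu = nn.Parameter(torch.empty(out_f, in_f))
        self.w_sigma = nn.Parameter(torch.empty(out_f, in_f))
        self.b_mu = nn.Parameter(torch.empty(out_f))
        self.b_sigma = nn.Parameter(torch.empty(out_f))
        bound = 1 / math.sqrt(in_f)
        self.w_mu.data.uniform_(-bound, bound)
        self.b_mu.data.uniform_(-bound, bound)
        self.w_sigma.data.fill_(sigma0 / math.sqrt(in_f))
        self.b_sigma.data.fill_(sigma0 / math.sqrt(in_f))

    @staticmethod
    def _f(x):
        # Noise-scaling function f(x) = sign(x) * sqrt(|x|)
        return x.sign() * x.abs().sqrt()

    def forward(self, x):
        # Factorized noise: one vector per input, one per output,
        # combined via outer product instead of a full noise matrix.
        eps_in = self._f(torch.randn(self.in_f))
        eps_out = self._f(torch.randn(self.out_f))
        w = self.w_mu + self.w_sigma * torch.outer(eps_out, eps_in)
        b = self.b_mu + self.b_sigma * eps_out
        return nn.functional.linear(x, w, b)

layer = NoisyLinear(64, 4)
q = layer(torch.randn(1, 64))  # noisy Q-values for one state
```

Because the noise is resampled each forward pass, repeated calls on the same state give different Q-values during training, which is what drives exploration.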
Rainbow DQN Loss