Optimization of Hyperparameters and Evaluation Metrics in config.h #79
Description
The current hyperparameter configuration in config.h yields sub-optimal training throughput and a high-variance loss estimate during evaluation. Specifically, the evaluation iteration count (EVAL_ITERS), the evaluation interval (EVAL_INTERVAL), and the dropout rate (DROPOUT) can be tuned to improve convergence stability and reduce computational overhead in the native C++ training loop.
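To illustrate the variance concern, here is a minimal sketch of averaging the evaluation loss over several mini-batches. The helper `estimate_loss` and its `batch_loss` callback are hypothetical names introduced for illustration and are not taken from the repository; the callback stands in for one forward pass on a freshly sampled evaluation batch.

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper (not from the repository): returns the mean loss over
// `eval_iters` mini-batches. `batch_loss` is an assumed callback that runs one
// forward pass on a newly sampled evaluation batch and returns its loss.
inline float estimate_loss(int eval_iters,
                           const std::function<float()>& batch_loss) {
    assert(eval_iters > 0);  // averaging over zero batches is undefined
    double sum = 0.0;        // accumulate in double to limit rounding error
    for (int i = 0; i < eval_iters; ++i) {
        sum += batch_loss();
    }
    return static_cast<float>(sum / eval_iters);
}
```

With a single batch the reported value is just that batch's loss. If the batches are sampled independently, the standard error of the mean shrinks roughly as 1/sqrt(n), so averaging 100 batches should give an estimate about ten times less noisy than one batch.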
Over-Regularisation (DROPOUT = 0.2f)
For a compact, character-level architecture (
static const int BATCH_SIZE = 16;          // Increased from 4 to stabilize gradients and utilize vectorization
static const int BLOCK_SIZE = 64;          // Context length
static const int MAX_ITERS = 5000;         // Reduced from 10000 due to larger batch size tokens-per-iteration
static const int EVAL_INTERVAL = 250;      // Increased from 20 to decrease context-switching overhead

// Learning Rate Schedule
static const float LEARNING_RATE = 5e-4f;  // Adjusted marginally upward to scale with higher batch size

// Statistical Stability
static const int EVAL_ITERS = 100;         // Increased from 1 to yield an accurate, low-variance mean loss

// Architectural Regularisation
static const int N_EMBD = 128;
static const int N_HEAD = 4;
static const int N_LAYER = 4;
static const float DROPOUT = 0.05f;        // Reduced from 0.2f to accelerate early-stage convergence
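The trade-offs in the values above can be checked with some compile-time arithmetic. The sketch below rests on three assumptions that the issue does not confirm: BLOCK_SIZE was also 64 before the change, one evaluation of EVAL_ITERS batches runs every EVAL_INTERVAL iterations, and only a single data split is counted. The helpers `training_tokens` and `eval_batches` are hypothetical and not part of config.h.

```cpp
// Hypothetical helper: total tokens seen during training
// (batch size x context length x iterations).
constexpr long long training_tokens(int batch_size, int block_size, int max_iters) {
    return static_cast<long long>(batch_size) * block_size * max_iters;
}

// Hypothetical helper: evaluation batches per run for one data split,
// assuming eval_iters batches are evaluated every eval_interval iterations.
constexpr long long eval_batches(int max_iters, int eval_interval, int eval_iters) {
    return static_cast<long long>(max_iters / eval_interval) * eval_iters;
}

// Previous values, taken from the "from N" comments (BLOCK_SIZE assumed unchanged).
static_assert(training_tokens(4, 64, 10000) == 2560000LL, "old token budget");
static_assert(eval_batches(10000, 20, 1) == 500LL, "old evaluation batches");

// Proposed values.
static_assert(training_tokens(16, 64, 5000) == 5120000LL, "new token budget");
static_assert(eval_batches(5000, 250, 100) == 2000LL, "new evaluation batches");
```

Under these assumptions the proposed settings process twice as many training tokens in half as many iterations. Evaluation runs far less often (20 times instead of 500), but each run is larger, so the total number of evaluation batches rises from 500 to 2000. Whether this lowers overall overhead is worth confirming against measured wall-clock time.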