
Optimization of Hyperparameters and Evaluation Metrics in config.h #79

Status: Open · Labels: enhancement (New feature or request)

Description

The current hyperparameter configuration in `config.h` yields sub-optimal training throughput and a high-variance loss estimate during evaluation. In particular, the number of evaluation iterations, the evaluation interval, and the dropout rate can be tuned to improve convergence stability and reduce computational overhead in the native C++ training loop.
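To illustrate why the evaluation iteration count matters, the sketch below averages the loss over `eval_iters` evaluation batches and reports the standard error of that mean, which shrinks in proportion to $1/\sqrt{N}$ for $N$ = `EVAL_ITERS`. `estimate_loss`, `LossEstimate`, and the `batch_loss` callback are hypothetical names, not part of the project; `batch_loss` stands in for sampling one batch and running a forward pass in evaluation mode.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <vector>

// Hypothetical result type: mean loss over the sampled batches and the
// standard error of that mean (sample standard deviation / sqrt(n)).
struct LossEstimate {
    float mean;
    float std_error;
};

// Hypothetical helper: `batch_loss` stands in for "sample one evaluation batch
// and run a forward pass without dropout"; it is not a function from the project.
LossEstimate estimate_loss(const std::function<float()>& batch_loss, int eval_iters) {
    assert(eval_iters > 0);
    std::vector<double> losses;
    losses.reserve(eval_iters);

    double sum = 0.0;
    for (int i = 0; i < eval_iters; ++i) {
        const double loss = batch_loss();
        losses.push_back(loss);
        sum += loss;
    }
    const double mean = sum / eval_iters;

    // A single batch says nothing about spread, so report 0 instead of dividing by zero.
    double std_error = 0.0;
    if (eval_iters > 1) {
        double sq = 0.0;
        for (double loss : losses) sq += (loss - mean) * (loss - mean);
        const double variance = sq / (eval_iters - 1);  // unbiased sample variance
        std_error = std::sqrt(variance / eval_iters);
    }
    return {static_cast<float>(mean), static_cast<float>(std_error)};
}
```

For example, with a callback that alternates per-batch losses of 2.0 and 2.4, `estimate_loss(cb, 1)` reports 2.0, while `estimate_loss(cb, 100)` reports a mean of 2.2 with a standard error of about 0.02.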

**Over-Regularisation (`DROPOUT = 0.2f`)**

For a compact, character-level architecture ($N_{\text{embd}} = 128$, $N_{\text{layer}} = 4$, $N_{\text{head}} = 4$) with fewer than 1 million parameters, a 20% dropout rate is excessively aggressive. Regularisation this strong risks underfitting the structure of the training corpus and delays cross-entropy minimization.
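The sub-1M figure can be checked with a rough parameter count. The sketch assumes a standard GPT-style decoder block (fused QKV projection, attention output projection, 4× MLP, two LayerNorms), biases on every linear layer, learned positional embeddings, an untied output head, and a hypothetical 65-character vocabulary. None of these details are taken from the project's source, so adjust them to match the actual model.

```cpp
#include <cstdint>

// Assumptions (not read from the project): biases on every Linear layer,
// 4x MLP expansion, two LayerNorms per block plus a final LayerNorm,
// learned positional embeddings, and an untied output projection.
constexpr std::int64_t linear_params(std::int64_t in, std::int64_t out) {
    return in * out + out;  // weight matrix + bias vector
}

constexpr std::int64_t layernorm_params(std::int64_t c) {
    return 2 * c;  // gain + bias
}

constexpr std::int64_t gpt_param_count(std::int64_t vocab, std::int64_t block_size,
                                       std::int64_t n_embd, std::int64_t n_layer) {
    // n_head is not a parameter here: heads split n_embd rather than adding weights.
    const std::int64_t per_block =
        layernorm_params(n_embd) +
        linear_params(n_embd, 3 * n_embd) +   // fused Q, K, V projection
        linear_params(n_embd, n_embd) +       // attention output projection
        layernorm_params(n_embd) +
        linear_params(n_embd, 4 * n_embd) +   // MLP expansion
        linear_params(4 * n_embd, n_embd);    // MLP contraction
    return vocab * n_embd +                   // token embedding table
           block_size * n_embd +              // positional embedding table
           n_layer * per_block +
           layernorm_params(n_embd) +         // final LayerNorm
           linear_params(n_embd, vocab);      // language-model head
}

// Hypothetical vocabulary size; replace with the real character set size.
constexpr std::int64_t kAssumedVocab = 65;
static_assert(gpt_param_count(kAssumedVocab, 64, 128, 4) < 1000000,
              "configuration should stay under 1M parameters");
```

Under these assumptions the count is 818,241 parameters, about 97% of which sit in the four transformer blocks.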

Proposed values for `config.h`:

```cpp
// Batch and Sequence Shape
static const int BATCH_SIZE = 16;      // Increased from 4 to stabilize gradients and utilize vectorization
static const int BLOCK_SIZE = 64;      // Context length (tokens per sequence)
static const int MAX_ITERS = 5000;     // Reduced from 10000: each iteration now processes 4x as many tokens
static const int EVAL_INTERVAL = 250;  // Increased from 20 so training pauses for evaluation less often
// Learning Rate Schedule
static const float LEARNING_RATE = 5e-4f;  // Adjusted marginally upward to accompany the larger batch size
// Statistical Stability
static const int EVAL_ITERS = 100;     // Increased from 1 so the reported loss is a low-variance mean over 100 batches
// Architecture and Regularisation
static const int N_EMBD = 128;
static const int N_HEAD = 4;
static const int N_LAYER = 4;
static const float DROPOUT = 0.05f;    // Reduced from 0.2f to accelerate early-stage convergence
```
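The interaction between `BATCH_SIZE`, `MAX_ITERS`, `EVAL_INTERVAL`, and `EVAL_ITERS` can be made explicit with some arithmetic. The sketch assumes a nanoGPT-style loop that evaluates whenever `iter % EVAL_INTERVAL == 0` and on the final iteration, running `EVAL_ITERS` forward-only batches on each of the train and validation splits. `compute_budget` and `Budget` are hypothetical names, and the previous `BLOCK_SIZE` is assumed to have been 64.

```cpp
#include <cstdint>

// Hypothetical budget calculator. The loop shape it models (evaluate when
// iter % eval_interval == 0 or on the last iteration, on both train and
// validation splits) is an assumption and may differ from the project's loop.
struct Budget {
    std::int64_t train_tokens;  // tokens consumed by optimisation steps
    std::int64_t eval_points;   // how many times the loss is estimated
    std::int64_t eval_batches;  // forward-only batches spent on evaluation
};

constexpr Budget compute_budget(std::int64_t batch_size, std::int64_t block_size,
                                std::int64_t max_iters, std::int64_t eval_interval,
                                std::int64_t eval_iters, std::int64_t n_splits = 2) {
    const std::int64_t tokens_per_iter = batch_size * block_size;
    // Iterations 0, eval_interval, 2 * eval_interval, ... strictly below max_iters.
    std::int64_t points = (max_iters + eval_interval - 1) / eval_interval;
    // One extra evaluation on the last iteration unless it already falls on the interval.
    if ((max_iters - 1) % eval_interval != 0) points += 1;
    return {tokens_per_iter * max_iters, points, points * eval_iters * n_splits};
}

// Proposed values from config.h.
constexpr Budget kProposed = compute_budget(16, 64, 5000, 250, 100);
// Previous values quoted in the config comments (BLOCK_SIZE assumed unchanged at 64).
constexpr Budget kPrevious = compute_budget(4, 64, 10000, 20, 1);
```

Under these assumptions the proposed values consume 5,120,000 training tokens (previously 2,560,000) across 21 evaluation points totalling 4,200 forward-only batches, compared with 501 evaluation points and 1,002 batches previously.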
