
Bug report: incorrect parameter passed in the Crossformer implementation #765

Open
@qingbyin

Description

Hi authors,

In the code below, `factor` is mistakenly passed in by the upstream caller as `configs.factor` (the attention scale factor). These are not the same parameter: here `factor` should be the number of routers, which defaults to 10 in the original paper, but because `configs.factor` is always passed in, it ends up defaulting to 1.

```python
self.router = nn.Parameter(torch.randn(seg_num, factor, d_model))
```

A separate question: why does the attention scale factor `configs.factor` default to 1 (many models even set it to 3), rather than the `1/sqrt(d_model)` used in the original Transformer?
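A minimal sketch of the fix being requested, using hypothetical names (`n_routers`, `TwoStageAttentionLayer` as written here are illustrative, not the repository's actual signatures): the attention scale factor and the router count should be carried as two distinct config fields, so the router parameter's second dimension is no longer silently tied to `configs.factor`.

```python
# Hypothetical sketch: separate the two parameters the issue says are
# conflated. `factor` is the attention scale/sampling factor (default 1
# in this repo); the router count defaults to 10 in the Crossformer paper.

class Configs:
    def __init__(self, factor=1, n_routers=10):
        self.factor = factor        # attention scale factor
        self.n_routers = n_routers  # router count for two-stage attention

class TwoStageAttentionLayer:
    def __init__(self, configs, seg_num, d_model):
        # Bug described in the issue: the caller passes configs.factor
        # here, so the router count silently becomes 1. Passing a
        # dedicated field restores the paper's intended shape.
        self.router_shape = (seg_num, configs.n_routers, d_model)

cfg = Configs()
layer = TwoStageAttentionLayer(cfg, seg_num=6, d_model=64)
print(layer.router_shape)  # (6, 10, 64)
```

With this separation, `self.router = nn.Parameter(torch.randn(seg_num, configs.n_routers, d_model))` would get the paper's default of 10 routers regardless of the attention scale setting.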
