fix graph_net/tools/generate_subgraph_dataset #603
Thanks for your contribution!
This script configures the dimension-transformation passes that each sample needs. Those passes have already been written into each sample's graph_net.json, so the script does not need to run here.
This function runs three Python scripts. If "those passes have already been written into each sample's graph_net.json", the first two can be skipped, but the third one generates the subspace of samples with different dimensions; see #566 (comment).
We need to pin down exactly what our requirement is here.
I'll leave it in for now: so far only small10 has been updated and the others have not. I'll revisit this later.
Deleted; I understand now.
This run configuration is not reused across multiple steps, so it can be written directly into the corresponding command.
Modified.
I think it can stay; it keeps the format consistent.
I've kept it for now. If it really needs to be inlined into the command, I'll change it later.
Create subdirectories under the DECOMPOSE_WORKSPACE directory, one per step, to hold the artifacts each step produces.
Modified.
Done.
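A minimal sketch of the per-step layout requested above, in Python. Only DECOMPOSE_WORKSPACE and range_decompose come from this thread; the helper name, the numbered directory naming scheme, the other step name, and the temporary-directory fallback are assumptions made for illustration.

```python
import os
import tempfile
from pathlib import Path


def step_output_dir(workspace: Path, step_index: int, step_name: str) -> Path:
    """Create and return the output directory for one pipeline step."""
    # Hypothetical naming scheme: <workspace>/<two-digit index>_<step name>.
    out_dir = workspace / f"{step_index:02d}_{step_name}"
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir


# Fall back to a throwaway directory when DECOMPOSE_WORKSPACE is unset (demo only).
workspace = Path(os.environ.get("DECOMPOSE_WORKSPACE") or tempfile.mkdtemp())
for index, name in enumerate(["dimension_generalization", "range_decompose"], start=1):
    print(step_output_dir(workspace, index, name))
```

Keeping every artifact under one root makes a run easy to clean up or resume, since each step's inputs and outputs sit at predictable paths.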
What does this step do? From the code, it updates the input constraint file. Does it need to be integrated?
Keeping the feature for now; I'll consider later whether to delete it.
Deleted.
This handles the decomposed subgraphs and should not be modified.
OK.
Done.
A dimension-generalization step was added upstream. Dimension generalization may modify the contents of model.py, because some dimension values are hard-coded there as immediates. Shouldn't range_decompose therefore take the dimension-generalized samples as its input?
I have confirmed that running it this way is fine in theory.
Got it.
These 3 steps accomplish one thing: the first 2 collect information, and only the last one actually applies the sample transformation. They can be merged into 1 step. Also make sure that every generated JSON file, sample, and so on is placed under the directory specified by DECOMPOSE_WORKSPACE.
OK.
Integrated.
Honglei-Qiu
commented
Jan 26, 2026
function generate_unittests() {
    echo ">>> [15] Generate unittests for subgraph samples under ${DEDUPLICATED_FUSIBLE_SUBGRAPH_DIR}."
    echo ">>>"
    python3 -m graph_net.model_path_handler \
        --model-path-list ${deduplicated_fusible_subgraphs_list} \
        --handler-config=$(base64 -w 0 <<EOF
{
  "handler_path": "$GRAPH_NET_ROOT/graph_net/sample_pass/agent_unittest_generator.py",
  "handler_class_name": "AgentUnittestGeneratorPass",
  "handler_config": {
    "framework": "torch",
    "model_path_prefix": "${DEDUPLICATED_FUSIBLE_SUBGRAPH_DIR}",
    "output_dir": "$UNITTESTS_OUTPUT_DIR",
    "device": "cuda",
    "generate_main": false,
    "try_run": true,
    "resume": ${RESUME},
    "data_input_predicator_filepath": "$GRAPH_NET_ROOT/graph_net/torch/constraint_util.py",
    "data_input_predicator_class_name": "RenamedDataInputPredicator"
  }
}
EOF
)
}
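This function is not fully integrated yet. I see two options:
1. Write new code that copies model.py and the other files from deduplicated_fusible_subgraphs into workspace_subgraph_input_shapes_naive_rewriter/0, because that directory contains only the dimension-generalized weight_meta.py.
2. Modify subgraph_input_shapes_naive_rewriter so that it also performs the copy.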
Honglei-Qiu
commented
Jan 26, 2026
I lean toward the second option.
Honglei-Qiu
commented
Jan 26, 2026
> I lean toward the second option.
The second option is now implemented.
Xreki
commented
Jan 26, 2026
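> the directory contains only the dimension-generalized weight_meta.py
If subgraph dimension generalization is driven by the information from whole-graph dimension generalization, doesn't each subgraph's model.py also need to be extracted from the generalized whole graph? Dimension generalization may modify the whole graph's model.py.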
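> If subgraph dimension generalization is driven by the information from whole-graph dimension generalization, doesn't each subgraph's model.py also need to be extracted from the generalized whole graph? Dimension generalization may modify the whole graph's model.py.
It is copied directly from the fusible subgraphs. The purpose is to make the generate_unittests function easy to run.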
Honglei-Qiu
commented
Jan 26, 2026
The rewrite_device step was added for gen_fusible_subgraphs, so the input of rewrite_device should be changed to the GraphNet whole-graph samples.
Modified.
Python already provides a library for copying files.
Using shutil.
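As an illustration of the shutil-based copy discussed here, the sketch below fills a target sample directory with the files it is missing while leaving an existing weight_meta.py untouched. The function name and the directory names in the demo are hypothetical stand-ins, not names from the PR.

```python
import shutil
import tempfile
from pathlib import Path


def copy_missing_files(src_dir: Path, dst_dir: Path) -> list[str]:
    """Copy files from src_dir into dst_dir without overwriting existing ones."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src_file in sorted(src_dir.iterdir()):
        dst_file = dst_dir / src_file.name
        if src_file.is_file() and not dst_file.exists():
            shutil.copy2(src_file, dst_file)  # copy2 also preserves file metadata
            copied.append(src_file.name)
    return copied


# Demo with throwaway directories standing in for the real sample folders.
root = Path(tempfile.mkdtemp())
src, dst = root / "fusible_subgraph", root / "rewritten_subgraph"
src.mkdir()
dst.mkdir()
(src / "model.py").write_text("# original subgraph\n")
(src / "weight_meta.py").write_text("# original shapes\n")
(dst / "weight_meta.py").write_text("# generalized shapes\n")
print(copy_missing_files(src, dst))  # ['model.py']
```

Skipping files that already exist is a deliberate choice in this sketch: it keeps the generalized weight_meta.py from being clobbered by the original one.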
Xreki
commented
Jan 26, 2026
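> It is copied directly from the fusible subgraphs. The purpose is to make the generate_unittests function easy to run.
You misunderstood me. After whole-graph dimension generalization, the model.py contents of the 9 samples may differ. If all 9 subgraph samples are copied directly from the fusible subgraphs, they will all be identical, so it is possible that the 9 whole graphs run successfully while these 9 subgraphs do not.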
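> > It is copied directly from the fusible subgraphs. The purpose is to make the generate_unittests function easy to run.
> You misunderstood me. After whole-graph dimension generalization, the model.py contents of the 9 samples may differ. If all 9 subgraph samples are copied directly from the fusible subgraphs, they will all be identical, so it is possible that the 9 whole graphs run successfully while these 9 subgraphs do not.
Then how should I make use of the dimension-generalized weight_meta.py? Now I'm confused.
[Screenshot, 2026-01-26 14:26:34]
Why would they differ? I ran a diff and it produced no output.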
TelGome
commented
Jan 26, 2026
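> > It is copied directly from the fusible subgraphs. The purpose is to make the generate_unittests function easy to run.
> You misunderstood me. After whole-graph dimension generalization, the model.py contents of the 9 samples may differ. If all 9 subgraph samples are copied directly from the fusible subgraphs, they will all be identical, so it is possible that the 9 whole graphs run successfully while these 9 subgraphs do not.
After dimension generalization, model.py should be the same across samples; what differs is the input. Suppose there is initially one whole graph with a fixed input, and only one of its dimensions can be generalized, with an original value of 128. Given
(sympy.Symbol("S0"),): [
    [128],
    [192],
    [224],
    [256],
    [336],
    [384],
    [448],
    [512],
    [640],
],
these 9 values are used as inputs, which yields 9 graphs with different inputs that all derive from the same original graph.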
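The table above can be read as "one row per generated sample". A small sketch of that reading, assuming the table maps a tuple of symbols to a list of value rows; plain strings stand in for the sympy symbols so the snippet has no dependencies, and the function name is hypothetical.

```python
# Tuple of symbol names -> list of value rows, mirroring the structure quoted above.
symbolic_dims = {
    ("S0",): [[128], [192], [224], [256], [336], [384], [448], [512], [640]],
}


def enumerate_bindings(table):
    """Yield one {symbol: value} binding per row of the table."""
    for symbols, rows in table.items():
        for row in rows:
            yield dict(zip(symbols, row))


bindings = list(enumerate_bindings(symbolic_dims))
print(len(bindings))  # 9
print(bindings[0])    # {'S0': 128}
```

Each binding corresponds to one generalized sample, which is where the count of 9 samples per graph comes from.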
Xreki
commented
Jan 26, 2026
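> After dimension generalization, model.py should be the same across samples
The dimension-related passes work precisely by modifying model.py, so once a pass has been applied successfully there is always some difference. Suppose only batch_size is generalized. Some operators take the batch_size dimension as an argument, for example view(x, batch_size, 1024, 196), but model.py hard-codes it as an immediate: view(x, 1, 1024, 196) when batch_size=1 and view(x, 2, 1024, 196) when batch_size=2.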
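To make the hard-coded-immediate problem concrete, here is a dependency-free sketch that checks only the element-count requirement of a view (a real tensor view has further stride constraints). The input shape (batch_size, 1024, 14, 14) is an assumption chosen so that 14 × 14 = 196; the helper names are hypothetical.

```python
from math import prod


def can_view(input_shape, target_shape):
    """A view is only possible if both shapes hold the same number of elements."""
    return prod(input_shape) == prod(target_shape)


def symbolic_view_shape(batch_size):
    """Target shape when the batch dimension is taken from the input at run time."""
    return (batch_size, 1024, 196)


hard_coded_shape = (1, 1024, 196)  # immediate baked in when batch_size was 1

for batch_size in (1, 2):
    input_shape = (batch_size, 1024, 14, 14)
    print(
        batch_size,
        can_view(input_shape, hard_coded_shape),
        can_view(input_shape, symbolic_view_shape(batch_size)),
    )
# 1 True True
# 2 False True
```

The hard-coded shape stops fitting as soon as the batch size changes, which is why a generalization pass has to rewrite such immediates in model.py.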
Honglei-Qiu
commented
Jan 26, 2026
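> > After dimension generalization, model.py should be the same across samples
> The dimension-related passes work precisely by modifying model.py, so once a pass has been applied successfully there is always some difference. Suppose only batch_size is generalized. Some operators take the batch_size dimension as an argument, for example view(x, batch_size, 1024, 196), but model.py hard-codes it as an immediate: view(x, 1, 1024, 196) when batch_size=1 and view(x, 2, 1024, 196) when batch_size=2.
The thing is, when I diff the model.py files after the run, they show up as identical, with no differences at all. Puzzled 🤔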
TelGome
commented
Jan 26, 2026
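> > After dimension generalization, model.py should be the same across samples
> The dimension-related passes work precisely by modifying model.py, so once a pass has been applied successfully there is always some difference. Suppose only batch_size is generalized. Some operators take the batch_size dimension as an argument, for example view(x, batch_size, 1024, 196), but model.py hard-codes it as an immediate: view(x, 1, 1024, 196) when batch_size=1 and view(x, 2, 1024, 196) when batch_size=2.
A change like this should show up in input_tensor_constraints.py, whereas the model.py in the 9 graphs generated by apply_dim_gen_passes.sh does not change.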
Xreki
commented
Jan 26, 2026
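Let's look at a sample. In the image below, the left side is the original GraphNet sample GraphNet/samples/timm/coatnet_rmlp_3_rw_224/model.py, and the right side is the sample produced by dimension generalization, coatnet_rmlp_3_rw_224/coatnet_rmlp_3_rw_224 __S0_8/model.py:
In size = conv2d_44.size(0), dimension 0 should be the generalized dimension, which takes a different value in each of the 9 samples at run time. The model.py source of these 9 samples is identical, since all of them are this symbolized version, but it differs from the original graph in GraphNet.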
TelGome
commented
Jan 26, 2026
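> Let's look at a sample. In the image below, the left side is the original GraphNet sample GraphNet/samples/timm/coatnet_rmlp_3_rw_224/model.py, and the right side is the sample produced by dimension generalization, coatnet_rmlp_3_rw_224/coatnet_rmlp_3_rw_224 __S0_8/model.py:
> In size = conv2d_44.size(0), dimension 0 should be the generalized dimension, which takes a different value in each of the 9 samples at run time. The model.py source of these 9 samples is identical, since all of them are this symbolized version, but it differs from the original graph in GraphNet.
OK, I see it now; it does change. The models I looked at earlier probably had no modifications to model.py, so I wrongly assumed it never changes 🙈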
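> Let's look at a sample. In the image below, the left side is the original GraphNet sample GraphNet/samples/timm/coatnet_rmlp_3_rw_224/model.py, and the right side is the sample produced by dimension generalization, coatnet_rmlp_3_rw_224/coatnet_rmlp_3_rw_224 __S0_8/model.py:
> In size = conv2d_44.size(0), dimension 0 should be the generalized dimension, which takes a different value in each of the 9 samples at run time. The model.py source of these 9 samples is identical, since all of them are this symbolized version, but it differs from the original graph in GraphNet.
Now that we know model.py changes, my tests should not take model.py from the already-split fusible subgraphs. So where should it come from?
Do we really have to rerun fusible-subgraph generation on all of the dimension-generalized whole graphs?
We need to align on the workflow.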
Xreki
commented
Jan 26, 2026
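> Now that we know model.py changes, my tests should not take model.py from the already-split fusible subgraphs. So where should it come from?
> Do we really have to rerun fusible-subgraph generation on all of the dimension-generalized whole graphs?
> We need to align on the workflow.
I think we should take each subgraph's subgraph_range from the fusible subgraphs and then use SubgraphGenerator to split the generalized whole graph. That said, you can solve the current problem first.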
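A toy sketch of the idea of reusing a known range to cut the generalized graph. It is not the SubgraphGenerator API: the list-of-strings graph representation, the half-open (start, end) meaning of subgraph_range, and the helper name are all assumptions, and the sketch only holds if the original and generalized graphs have the same node order and count.

```python
def slice_by_range(nodes, subgraph_range):
    """Return the nodes covered by a half-open [start, end) index range."""
    start, end = subgraph_range
    if not 0 <= start <= end <= len(nodes):
        raise ValueError(f"range {subgraph_range} does not fit {len(nodes)} nodes")
    return nodes[start:end]


# Toy node lists; real samples are model.py files, not lists of strings.
original_graph = [
    "conv = conv2d(x, w)",
    "flat = conv.view(1, 1024, 196)",
    "out = relu(flat)",
]
generalized_graph = [
    "conv = conv2d(x, w)",
    "flat = conv.view(s0, 1024, 196)",
    "out = relu(flat)",
]

subgraph_range = (1, 3)  # assumed to be recorded when the fusible subgraph was cut
print(slice_by_range(generalized_graph, subgraph_range))
```

Because the range is looked up rather than recomputed, the expensive search for split points runs once on the original graph, and each generalized variant only needs a cheap slice.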
Honglei-Qiu
commented
Jan 26, 2026
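> > Now that we know model.py changes, my tests should not take model.py from the already-split fusible subgraphs. So where should it come from?
> > Do we really have to rerun fusible-subgraph generation on all of the dimension-generalized whole graphs?
> > We need to align on the workflow.
> I think we should take each subgraph's subgraph_range from the fusible subgraphs and then use SubgraphGenerator to split the generalized whole graph. That said, you can solve the current problem first.
The concern at the time was that cumsum_num_kernels_generator is rather slow. If the split points are already known, this is indeed a workable approach.
Let's align on the unit tests later. I'll update the current code first and check whether any other parts still have problems.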
lixinqi
commented
Jan 27, 2026
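> > Now that we know model.py changes, my tests should not take model.py from the already-split fusible subgraphs. So where should it come from?
> > Do we really have to rerun fusible-subgraph generation on all of the dimension-generalized whole graphs?
> > We need to align on the workflow.
> I think we should take each subgraph's subgraph_range from the fusible subgraphs and then use SubgraphGenerator to split the generalized whole graph. That said, you can solve the current problem first.
The "generalized whole graph" doesn't contain size nodes, does it?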
Honglei-Qiu
commented
Jan 27, 2026
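> > Now that we know model.py changes, my tests should not take model.py from the already-split fusible subgraphs. So where should it come from?
> > Do we really have to rerun fusible-subgraph generation on all of the dimension-generalized whole graphs?
> > We need to align on the workflow.
> I think we should take each subgraph's subgraph_range from the fusible subgraphs and then use SubgraphGenerator to split the generalized whole graph. That said, you can solve the current problem first.
Modified following this approach.
From small10, dimension generalization produced 9 whole graphs, a generalization rate of 90%. Each whole graph then yields 386 generalized subgraphs, for a total of 3437 generalized subgraphs (reported as 386*9, although 386 × 9 = 3474).
[Screenshot, 2026-01-27 16:24:20]
Illustration of the differences between subgraphs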
Xreki
commented
Jan 27, 2026
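Besides the success rate of generating generalized samples, we also need to look at the success rate of executing them. The run_model or validate tool can be used to check whether they run.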
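One way to tally the execution metric is sketched below. The command-line interface of run_model and validate is not shown in this thread, so the sketch takes the per-sample check as a callable; the function names, the fake runner, and the sample names are placeholders.

```python
from typing import Callable, Iterable


def execution_success_rate(
    sample_dirs: Iterable[str], run_sample: Callable[[str], bool]
) -> float:
    """Fraction of samples for which run_sample reports a successful execution."""
    outcomes = [bool(run_sample(sample_dir)) for sample_dir in sample_dirs]
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def fake_runner(sample_dir: str) -> bool:
    """Stand-in for a real check such as invoking run_model on the sample."""
    return not sample_dir.endswith("_broken")


samples = ["subgraph_0", "subgraph_1", "subgraph_2_broken", "subgraph_3"]
rate = execution_success_rate(samples, fake_runner)
print(f"{rate:.0%} of samples executed")  # 75% of samples executed
```

Reporting this figure next to the generation success rate separates samples that were merely produced from samples that actually run.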
Honglei-Qiu
commented
Jan 27, 2026
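> Besides the success rate of generating generalized samples, we also need to look at the success rate of executing them. The run_model or validate tool can be used to check whether they run.
Evaluated with run_model.
I tested the generalized subgraphs; the results are shown in the image.
Is this confirmed as the final approach? If so, I need to submit a new sh file.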
PR Category
other
Description
Dimension generalization integration