xiongying/Halide

AlignLoads.cpp 6.74 KB
Volodymyr Kysenko committed on 2022-06-15 02:08 +08:00: Rewrite strided loads of 4 in AlignLoads (#6806)
#include <algorithm>

#include "AlignLoads.h"
#include "Bounds.h"
#include "HexagonAlignment.h"
#include "IRMutator.h"
#include "IROperator.h"
#include "ModulusRemainder.h"
#include "Scope.h"
#include "Simplify.h"

using std::vector;

namespace Halide {
namespace Internal {

namespace {

// This mutator attempts to rewrite unaligned or strided loads to
// sequences of aligned loads by loading aligned vectors that cover
// the original unaligned load, and then slicing or shuffling the
// intended vector out of the aligned vector.
class AlignLoads : public IRMutator {
public:
    AlignLoads(int alignment, int min_bytes)
        : alignment_analyzer(alignment), required_alignment(alignment), min_bytes_to_align(min_bytes) {
    }

private:
    HexagonAlignmentAnalyzer alignment_analyzer;

    // Loads and stores should ideally be aligned to the vector width in bytes.
    int required_alignment;

    // Minimum size of load to align.
    int min_bytes_to_align;

    using IRMutator::visit;

    // Rewrite a load to have a new index, updating the type if necessary.
    Expr make_load(const Load *load, const Expr &index, ModulusRemainder alignment) {
        internal_assert(is_const_one(load->predicate)) << "Load should not be predicated.\n";
        return mutate(Load::make(load->type.with_lanes(index.type().lanes()), load->name,
                                 index, load->image, load->param,
                                 const_true(index.type().lanes()),
                                 alignment));
    }

    Expr visit(const Load *op) override {
        if (!is_const_one(op->predicate)) {
            // TODO(psuriana): Do nothing to predicated loads for now.
            return IRMutator::visit(op);
        }

        if (!op->type.is_vector()) {
            // Nothing to do for scalar loads.
            return IRMutator::visit(op);
        }

        if (op->image.defined()) {
            // We can't reason about the alignment of external images.
            return IRMutator::visit(op);
        }

        if (required_alignment % op->type.bytes() != 0) {
            // The element size must evenly divide the required alignment.
            return IRMutator::visit(op);
        }

        if (op->type.bytes() * op->type.lanes() <= min_bytes_to_align) {
            // These can probably be treated as scalars instead.
            return IRMutator::visit(op);
        }

        Expr index = mutate(op->index);
        const Ramp *ramp = index.as<Ramp>();
        const int64_t *const_stride = ramp ? as_const_int(ramp->stride) : nullptr;
        if (!ramp || !const_stride) {
            // We can't handle indirect loads, or loads with
            // non-constant strides.
            return IRMutator::visit(op);
        }
        if (!(*const_stride == 1 || *const_stride == 2 || *const_stride == 3 || *const_stride == 4)) {
            // Handle ramps with stride 1, 2, 3 or 4 only.
            return IRMutator::visit(op);
        }

        int64_t aligned_offset = 0;
        bool is_aligned =
            alignment_analyzer.is_aligned(op, &aligned_offset);
        // We know the alignment_analyzer has been able to reason about alignment
        // if the following is true.
        bool known_alignment = is_aligned || (!is_aligned && aligned_offset != 0);
        int lanes = ramp->lanes;
        int native_lanes = required_alignment / op->type.bytes();
        int stride = static_cast<int>(*const_stride);
        if (stride != 1) {
            internal_assert(stride >= 0);

            // If we know the offset of this strided load is smaller
            // than the stride, we can just make the load aligned now
            // without requiring more vectors from the dense
            // load. This makes loads like f(2*x + 1) into an aligned
            // load of double length, with a single shuffle.
            int shift = known_alignment && aligned_offset < stride ? aligned_offset : 0;

            // Load a dense vector covering all of the addresses in the load.
            Expr dense_base = simplify(ramp->base - shift);
            ModulusRemainder alignment = op->alignment - shift;
            Expr dense_index = Ramp::make(dense_base, 1, lanes * stride);
            Expr dense = make_load(op, dense_index, alignment);

            // Shuffle the dense load.
            return Shuffle::make_slice(dense, shift, stride, lanes);
        }

        // We now have a dense vector load to deal with.
        internal_assert(stride == 1);
        if (lanes < native_lanes) {
            // This load is smaller than a native vector. Load a
            // native vector.
            Expr ramp_base = ramp->base;
            ModulusRemainder alignment = op->alignment;
            int slice_offset = 0;

            // If the load is smaller than a native vector, fits entirely inside one,
            // and its offset is known, we can simply offset the native load and slice.
            if (!is_aligned && aligned_offset != 0 && Int(32).can_represent(aligned_offset) && (aligned_offset + lanes <= native_lanes)) {
                ramp_base = simplify(ramp_base - (int)aligned_offset);
                alignment = alignment - aligned_offset;
                slice_offset = aligned_offset;
            }

            Expr native_load = make_load(op, Ramp::make(ramp_base, 1, native_lanes), alignment);

            // Slice the native load.
            return Shuffle::make_slice(native_load, slice_offset, 1, lanes);
        }

        if (lanes > native_lanes) {
            // This load is larger than a native vector. Load native
            // vectors, and concatenate the results.
            vector<Expr> slices;
            for (int i = 0; i < lanes; i += native_lanes) {
                int slice_lanes = std::min(native_lanes, lanes - i);
                Expr slice_base = simplify(ramp->base + i);
                ModulusRemainder alignment = op->alignment + i;
                slices.push_back(make_load(op, Ramp::make(slice_base, 1, slice_lanes), alignment));
            }
            return Shuffle::make_concat(slices);
        }

        if (!is_aligned && aligned_offset != 0 && Int(32).can_represent(aligned_offset)) {
            // We know the offset of this load from an aligned
            // address. Rewrite this as an aligned load of two
            // native vectors, followed by a shuffle.
            Expr aligned_base = simplify(ramp->base - (int)aligned_offset);
            ModulusRemainder alignment = op->alignment - (int)aligned_offset;
            Expr aligned_load = make_load(op, Ramp::make(aligned_base, 1, lanes * 2), alignment);
            return Shuffle::make_slice(aligned_load, (int)aligned_offset, 1, lanes);
        }

        return IRMutator::visit(op);
    }
};

}  // namespace

Stmt align_loads(const Stmt &s, int alignment, int min_bytes_to_align) {
    return AlignLoads(alignment, min_bytes_to_align).mutate(s);
}

}  // namespace Internal
} // namespace Halide

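About

Halide is a programming language created by researchers at MIT's Computer Science and Artificial Intelligence Laboratory, designed specifically to simplify image processing. The source code is hosted on GitHub; binaries currently support only Mac OS X and Ubuntu 12.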