Ying 🐇

Working on systems where algorithms, compilers, and hardware are designed as one.

Research Interests

I'm broadly interested in deep learning systems, compilers, and hardware-aware programming abstractions. The two questions I've been thinking about recently are:

What happens when the algorithm, software, and hardware layers are designed together, rather than stacked on top of each other?
What does it take for large-scale human–LLM collaboration — across people and across agents — to sustain delivery inside a real software stack over the long run?

Outside of research, I enjoy writing programs and building software systems — I just like making things.

🚀 Current Projects

Both projects are joint work with friends at @tile-ai.

🧠 TileRT — a take on algorithm · software · hardware co-design.

An ongoing effort that grew out of our earlier research — exploring what the layers between algorithms and hardware should look like when they are designed together, rather than stacked on top of each other.

🤖 TileOPs — operator library development in the agent era.

An exploration of how far LLM agents can go in autonomously developing an operator library — from writing kernels to testing and iterating on them — with quality good enough to actually ship.

🧩 Past & Ongoing Work

🚀 TileFusion — an experimental C++ macro kernel template library that raises the abstraction level of CUDA C for tile processing, so algorithm developers can innovate on hardware-aware LLM kernels without drowning in low-level details.
🧩 FractalTensor — a programming framework built around FractalTensor: nested, statically-shaped tensor lists with functional array operators (map / reduce / scan). DSL + IR work inspired by polyhedral loop analysis. [paper]
🔍 VPTQ — an extreme low-bit quantization algorithm and inference library for LLMs, led by my friend @YangWang92; I contribute on the systems side.
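
For readers unfamiliar with the map / reduce / scan vocabulary in the FractalTensor entry above, here is a minimal sketch in plain Python. It does not use the FractalTensor API: `batch`, `add`, and `scale` are hypothetical names made up for illustration, and each "tensor" is just a fixed-length list of floats inside a nested list.

```python
from functools import reduce
from itertools import accumulate

# Hypothetical stand-in for a nested list of statically-shaped tensors:
# the outer list holds sequences of different lengths, and every innermost
# "tensor" has the same fixed shape (2 elements).
batch = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0], [9.0, 10.0]],
]


def add(a, b):
    """Elementwise sum of two same-shaped tensors."""
    return [x + y for x, y in zip(a, b)]


def scale(t, k):
    """Multiply every element of a tensor by k."""
    return [k * x for x in t]


# map: apply a function to every tensor, preserving the nesting.
doubled = [[scale(t, 2.0) for t in seq] for seq in batch]

# reduce: fold each inner sequence down to a single tensor.
totals = [reduce(add, seq) for seq in batch]

# scan: like reduce, but keep every intermediate result (a running sum).
prefix = [list(accumulate(seq, add)) for seq in batch]
```

Because the innermost shapes are fixed while the list lengths vary, each operator can be described independently of how long any particular sequence is.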

✍️ Writing & Elsewhere

I keep a blog where I jot down ideas that catch my attention in daily work; updates are infrequent and unhurried. @haruhi55 is also me in disguise. 🐵✨

📫 Contact

lcy.seso@gmail.com · caoyingseso@126.com

Feel free to reach out — happy to talk about deep learning systems, compilers, hardware co-design, or LLM-driven engineering.

Pinned

tile-ai/TileRT Public

Tile-Based Runtime for Ultra-Low-Latency LLM Inference

Python 1.4k 82

tile-ai/TileOPs Public

High-performance LLM operator library built on TileLang.

Python 145 40

microsoft/TileFusion Public

TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.

Cuda 111 6

microsoft/FractalTensor Public

FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of lists of statically-shaped tensors, referred to as a Fractal...

Python 32 7

microsoft/VPTQ Public

VPTQ, A Flexible and Extreme low-bit quantization algorithm

Python 680 52

LearningNotes Public

Ying's notes

TeX 8


Cao Ying lcy-seso
