@lcy-seso
MSR Asia, systems research group. Previously worked at Baidu IDL (Institute of Deep Learning) and contributed as a member of the Paddle team.

lcy-seso/README.md

Ying

Working on systems where algorithms, compilers, and hardware are designed as one.


Research Interests

I'm broadly interested in deep learning systems, compilers, and hardware-aware programming abstractions. The two questions I've been thinking about recently are:

  • What happens when the algorithm, software, and hardware layers are designed together, rather than stacked on top of each other?
  • What does it take for large-scale human–LLM collaboration, across people and across agents, to sustain delivery inside a real software stack over the long run?

Outside of research, I enjoy writing programs and building software systems. I just like making things.


🚀 Current Projects

Both projects are joint work with friends at @tile-ai.

🧠 TileRT β€” a take on algorithm Β· software Β· hardware co-design.

An ongoing effort that grew out of our earlier research β€” exploring what the layers between algorithms and hardware should look like when they are designed together, rather than stacked on top of each other.

πŸ€– TileOPs β€” operator library development in the agent era.

An exploration of how far LLM agents can go in autonomously developing an operator library β€” from writing kernels to testing and iterating on them β€” with quality good enough to actually ship.


🧩 Past & Ongoing Work

  • πŸš€ TileFusion β€” an experimental C++ macro kernel template library that raises the abstraction level of CUDA C for tile processing, so algorithm developers can innovate on hardware-aware LLM kernels without drowning in low-level details.
  • 🧩 FractalTensor β€” a programming framework built around FractalTensor: nested, statically-shaped tensor lists with functional array operators (map / reduce / scan). DSL + IR work inspired by polyhedral loop analysis. [paper]
  • πŸ” VPTQ β€” an extreme low-bit quantization algorithm and inference library for LLMs, led by my friend @YangWang92; I contribute on the systems side.

✍️ Writing & Elsewhere

I keep a blog where I jot down ideas that catch my attention in daily work; updates are infrequent but unhurried. @haruhi55 is also me in disguise. 🐵✨

📫 Contact

lcy.seso@gmail.com · caoyingseso@126.com

Feel free to reach out. I'm happy to talk about deep learning systems, compilers, hardware co-design, or LLM-driven engineering.

Pinned

  1. tile-ai/TileRT Public

    Tile-Based Runtime for Ultra-Low-Latency LLM Inference

    Python 1.4k 82

  2. tile-ai/TileOPs Public

    High-performance LLM operator library built on TileLang.

    Python 145 40

  3. microsoft/TileFusion Public

    TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.

    Cuda 111 6

  4. microsoft/FractalTensor Public

    FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of lists of statically-shaped tensors, referred to as a Fractal...

    Python 32 7

  5. microsoft/VPTQ Public

    VPTQ, A Flexible and Extreme low-bit quantization algorithm

    Python 680 52

  6. LearningNotes Public

    Ying's notes

    TeX 8
