waterdrop26651 (waterdrop)

Representations flow. Interactions give them shape.

About

I am Zihao Wang, a student researcher in artificial intelligence.

B.Eng. in Artificial Intelligence, Wuhan University (2022.09 - 2026.06)
M.Phil. in Artificial Intelligence (Expected), The Chinese University of Hong Kong, Shenzhen (2026.09 - 2028.06)

My current interests center on Speech Language Models (SLMs), Representation Learning, Real-Time Interaction, and Agent Memory.

Personal Statement

I first approached foundation models through LLM safety and multimodal safety, but safety has never been the edge of my curiosity. It is more like a probe: a way to see how data enters a model, becomes representation, and begins to flow through hidden space.

What draws me is this flow. Representations can be fragile; they reveal cracks under attack, alignment, or cross-modal transfer. Yet precisely because of that fragility, they can also be read, steered, and gently intervened on. In language models, I saw safety as a trace left by internal structure. In vision-language models, I began to ask how different modalities meet within the same hidden current.

Interaction is the other current I keep following. Today, even the strongest agents often meet us through text: a prompt, a response, another prompt, another response. This runtime loop is powerful, but thin. Speech makes the loop denser: interruption, overlap, pause, repair, rhythm, and presence. It turns interaction from exchanging messages with a fixed machine into sharing time with a system that must listen, wait, adjust, and respond.

But interaction is not only between humans and models. It can also happen between modalities, between agents, between a model and its external memory, and perhaps one day within the model itself. Most models are frozen after training; what changes during use is usually only the context around them. I am curious about whether interaction can become more than context: whether it can reshape memory, update behavior, or create new internal conditions for reasoning.

Across these paths, I keep watching the same undercurrent: how representations flow, how modalities meet, and whether a fixed model can still learn to remember, respond, and change through interaction.

Research Vectors

Representation: Hidden-space flow, interpretability, and steering.
Modality: Language, vision-language, speech, and whatever comes next.
Real-Time Interaction: Interruption, latency, turn-taking, and agents participating in time.
External Memory: Context, tools, archives, and Memento-like traces for fixed models.

Projects

Memento Skill
Controlled external memory for agents: separate facts from beliefs, keep only high-gradient evidence hot, and recall archives only with a trigger.
`npx skills add waterdrop26651/Memento-skill`

Do not let every note become a tattoo.
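
The three rules in the Memento Skill description (separate facts from beliefs, keep only high-gradient evidence hot, recall archives only with a trigger) can be illustrated with a small sketch. This is not the Memento-skill implementation. `Note`, `MemoryStore`, `hot_threshold`, and the `score` field are hypothetical names, and reading "high-gradient" as a single numeric score is an assumption made only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Note:
    text: str
    kind: str                      # "fact" or "belief" (hypothetical labels)
    score: float                   # assumed stand-in for how "high-gradient" the evidence is
    tags: frozenset = frozenset()  # keywords that a recall trigger can match


@dataclass
class MemoryStore:
    hot_threshold: float = 0.5     # hypothetical cutoff between hot and archived notes
    hot: List[Note] = field(default_factory=list)
    archive: List[Note] = field(default_factory=list)

    def add(self, note: Note) -> None:
        # Only notes at or above the threshold stay hot; the rest go to the archive.
        target = self.hot if note.score >= self.hot_threshold else self.archive
        target.append(note)

    def facts(self) -> List[Note]:
        # Facts and beliefs are kept apart so a belief is never read back as a fact.
        return [n for n in self.hot if n.kind == "fact"]

    def beliefs(self) -> List[Note]:
        return [n for n in self.hot if n.kind == "belief"]

    def recall(self, trigger: Optional[str] = None) -> List[Note]:
        # Archived notes surface only when an explicit trigger matches one of their tags.
        if trigger is None:
            return []
        return [n for n in self.archive if trigger in n.tags]
```

In this sketch, calling `recall()` with no trigger returns nothing, so archived notes never leak into the working context by default.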

Spider Memory
Graph-based associative memory for agent conversations, with weighted recall, reflection, and cold-layer archival.

MMSteer
Multimodal model safety research for Qwen2.5-VL-class systems, centered on safety representations and preventative steering.

Ouroboros Transformer
A cyclic Transformer architecture experiment where independent blocks form a ring and information flows through repeated loops.
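
The ring idea behind Ouroboros Transformer, independent blocks visited in a cycle for several loops, can be sketched as plain control flow. This is only a toy, not the repository's code: `make_toy_block` and `run_ring` are hypothetical names, and a simple residual `tanh` update stands in for a real transformer block.

```python
import math
from typing import Callable, List

Vector = List[float]
Block = Callable[[Vector], Vector]


def make_toy_block(scale: float) -> Block:
    # Hypothetical stand-in for a transformer block: residual update x + tanh(scale * x).
    def block(x: Vector) -> Vector:
        return [v + math.tanh(scale * v) for v in x]
    return block


def run_ring(blocks: List[Block], x: Vector, num_loops: int, start: int = 0) -> Vector:
    # Visit the blocks in ring order, wrapping around with modular indexing,
    # until the state has passed through every block num_loops times.
    n = len(blocks)
    for step in range(num_loops * n):
        x = blocks[(start + step) % n](x)
    return x


ring = [make_toy_block(s) for s in (0.1, 0.2, 0.3)]
state = run_ring(ring, [1.0, -0.5], num_loops=2)
```

Because the blocks form a ring, the `start` argument only chooses where the cycle is entered. The order of the blocks around the ring stays the same.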

Pinned

Memento-skill Public

Controlled recall for fragmented experiments, evolving hypotheses, and high-information next steps.

Python

MMSteer Public

Python

ouroboros_transformer Public

Ouroboros Transformer: A cyclic neural architecture where independent transformer blocks form a ring, with information flowing through the loop multiple times. Inspired by the ancient symbol of a s...

Upload-Intelligence Public

Become part of the thought collective, become an upload intelligence.
