waterdrop26651/README.md

Representations flow. Interactions give them shape.





About

I am Zihao Wang, a student researcher in artificial intelligence.

  • B.Eng. in Artificial Intelligence, Wuhan University (2022.09 - 2026.06)
  • M.Phil. in Artificial Intelligence (Expected), The Chinese University of Hong Kong, Shenzhen (2026.09 - 2028.06)

My current interests center on Speech Language Models (SLMs), Representation Learning, Real-Time Interaction, and Agent Memory.


Personal Statement

I first approached foundation models through LLM safety and multimodal safety, but safety has never been the edge of my curiosity. It is more like a probe: a way to see how data enters a model, becomes representation, and begins to flow through hidden space.

What draws me is this flow. Representations can be fragile; they reveal cracks under attack, alignment, or cross-modal transfer. Yet precisely because of that fragility, they can also be read, steered, and gently intervened on. In language models, I saw safety as a trace left by internal structure. In vision-language models, I began to ask how different modalities meet within the same hidden current.

Interaction is the other current I keep following. Today, even the strongest agents often meet us through text: a prompt, a response, another prompt, another response. This runtime loop is powerful, but thin. Speech makes the loop denser: interruption, overlap, pause, repair, rhythm, and presence. It turns interaction from exchanging messages with a fixed machine into sharing time with a system that must listen, wait, adjust, and respond.

But interaction is not only between humans and models. It can also happen between modalities, between agents, between a model and its external memory, and perhaps one day within the model itself. Most models are frozen after training; what changes during use is usually only the context around them. I am curious about whether interaction can become more than context: whether it can reshape memory, update behavior, or create new internal conditions for reasoning.

Across these paths, I keep watching the same undercurrent: how representations flow, how modalities meet, and whether a fixed model can still learn to remember, respond, and change through interaction.


Research Vectors

  • Representation: Hidden-space flow, interpretability, and steering.
  • Modality: Language, vision-language, speech, and whatever comes next.
  • Real-Time Interaction: Interruption, latency, turn-taking, and agents participating in time.
  • External Memory: Context, tools, archives, and Memento-like traces for fixed models.

Projects

Memento Skill
Controlled external memory for agents: separate facts from beliefs, keep only high-gradient evidence hot, and recall archives only with a trigger.
npx skills add waterdrop26651/Memento-skill

Do not let every note become a tattoo.
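
The three ideas in the Memento Skill description (facts kept apart from beliefs, only strong evidence kept hot, archives recalled only on a trigger) can be pictured with a small sketch. This is not the skill's actual implementation: the names `Note`, `MemoryStore`, `hot_threshold`, and the numeric `score` standing in for "high-gradient evidence" are all assumptions made for illustration, and the substring match is only a placeholder for whatever trigger mechanism the skill uses.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Note:
    text: str
    kind: str     # "fact" or "belief" (hypothetical labels)
    score: float  # hypothetical stand-in for how informative the evidence is


@dataclass
class MemoryStore:
    hot_threshold: float = 0.5  # assumed cutoff, not taken from the project
    hot: List[Note] = field(default_factory=list)
    archive: List[Note] = field(default_factory=list)

    def add(self, note: Note) -> None:
        # Only notes at or above the threshold stay hot; the rest are archived.
        target = self.hot if note.score >= self.hot_threshold else self.archive
        target.append(note)

    def facts(self) -> List[Note]:
        return [n for n in self.hot if n.kind == "fact"]

    def beliefs(self) -> List[Note]:
        return [n for n in self.hot if n.kind == "belief"]

    def recall(self, trigger: Optional[str] = None) -> List[Note]:
        # Without a trigger, the archive is never consulted.
        if not trigger:
            return list(self.hot)
        matched = [n for n in self.archive if trigger.lower() in n.text.lower()]
        return list(self.hot) + matched


store = MemoryStore()
store.add(Note("Run 12 diverged at lr=3e-4", "fact", 0.9))
store.add(Note("Warmup is probably too short", "belief", 0.7))
store.add(Note("Run 3 used an older tokenizer", "fact", 0.2))

print([n.text for n in store.recall()])
print([n.text for n in store.recall("tokenizer")])
```

In this sketch the low-scoring tokenizer note stays out of ordinary recall and only resurfaces when a query names it, which is one way to read "do not let every note become a tattoo".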


  • Spider Memory: Graph-based associative memory for agent conversations, with weighted recall, reflection, and cold-layer archival.
  • MMSteer: Multimodal model safety research for Qwen2.5-VL-class systems, centered on safety representations and preventative steering.
  • Ouroboros Transformer: A cyclic Transformer architecture experiment where independent blocks form a ring and information flows through repeated loops.
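
The ring idea behind the Ouroboros Transformer can be shown with a toy control-flow sketch. It assumes nothing about the real architecture beyond the description above: `make_block` and `run_ring` are hypothetical names, and each "block" is a simple residual shift standing in for a full transformer block with its own parameters.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Block = Callable[[Vector], Vector]


def make_block(scale: float, shift: float) -> Block:
    """Return a toy block: an elementwise affine map plus a residual connection."""
    def block(x: Vector) -> Vector:
        return [v + (scale * v + shift) for v in x]
    return block


def run_ring(blocks: Sequence[Block], x: Vector, loops: int) -> Vector:
    """Pass x around the ring `loops` times, visiting every block once per loop."""
    for _ in range(loops):
        for block in blocks:
            x = block(x)
    return x


# Two independent blocks; with scale 0.0 each block just adds its shift.
ring = [make_block(0.0, 1.0), make_block(0.0, 2.0)]
print(run_ring(ring, [0.0, 10.0], loops=3))  # [9.0, 19.0]
```

Each loop reuses the same blocks, so effective depth grows with the number of loops while the parameter count stays fixed.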

Pinned

  1. Memento-skill (Public)

    Controlled recall for fragmented experiments, evolving hypotheses, and high-information next steps.

    Python

  2. MMSteer (Public)

    Python

  3. ouroboros_transformer (Public)

    Ouroboros Transformer: A cyclic neural architecture where independent transformer blocks form a ring, with information flowing through the loop multiple times. Inspired by the ancient symbol of a s...

  4. Upload-Intelligence (Public)

    Become part of the thought collective, become an upload intelligence.
