
👋 Hi! I am Siteng Huang (黄思腾 in Chinese). I work at Alibaba DAMO Academy in Hangzhou as an Algorithm Expert. I received my Ph.D. degree from Zhejiang University in June 2024, through a joint program with Westlake University, where I was affiliated with the Machine Intelligence Laboratory (MiLAB) and advised by Prof. Donglin Wang. During my Ph.D. study, I also spent a wonderful internship at TongYi Lab, Alibaba Group. Before that, I received my B.Eng. degree from the School of Computer Science, Wuhan University in June 2019.

🔬 My research centers on the perception, understanding, reasoning, and generation of multimodal data (including images, videos, language, dynamics, etc.) from both the internet and the physical world. I also focus on efficient AI (in terms of data, time, parameters, memory, etc.) for multimodal applications. I have published 20+ papers on these topics at top-tier international AI conferences and in journals. Recently, I have devoted myself to the development of multimodal generative, embodied, and unified foundation models.

🌟 I am honored to have supervised several self-motivated visiting students and research assistants in their research and publications. If you are seeking any form of academic cooperation, please feel free to email me at siteng.huang[AT]gmail.com (replace [AT] with @). Additionally, I maintain close cooperation with MiLAB at Westlake University. This top-tier robot learning lab is actively looking for visiting students and RAs (please refer to Recruitment). In particular, if you would like to cooperate with me there, please also send me a copy of your CV when you send it to the lab.

You are welcome to browse my full publication list on my personal homepage.

Twitter | GitHub

📢 News

  • 2025-10-14 [Preprint] We released RoboSimGS, a novel Real2Sim2Real framework that converts multi-view real-world images into scalable, high-fidelity, and physically interactive simulation environments for robotic manipulation! An overview video can be found on the Project page!
  • 2025-09-19 [NeurIPS'25] SSR got accepted for NeurIPS 2025! The work transforms raw depth data into structured, interpretable textual CoT, enhancing the spatial reasoning capabilities of MLLMs. See the Project page and GitHub!
  • 2025-09-12 [Preprint] We released VLA-Adapter, which reduces reliance on large-scale VLMs and extensive pre-training by using a lightweight Policy module with Bridge Attention, achieving state-of-the-art performance and fast inference with minimal computational resources! Checkpoints are now available! See the Project page for more details. It was ranked #1 Paper of the Day on Hugging Face Papers!
  • 2025-08-13 [Preprint] We released AffordDex, a universal grasping policy for dexterous hands with an inherent understanding of both motion priors and object affordances! Grasping videos can be found on the Project page!
  • 2025-08-08 [DAMO RynnBot] We open-sourced RynnEC, a video MLLM for embodied cognition tasks; RynnVLA-001, a VLA model built on a pretrained video generation model; and RynnRCP, a complete set of robot service protocols and frameworks! 2025-08-11: We released the technical blog for RynnVLA-001! 2025-09-19: We released the technical report for RynnVLA-001!
  • 2025-08-02 [CoRL'25] Long-VLA, a novel framework designed to enhance VLA models for challenging long-horizon robotic manipulation tasks, got accepted for CoRL 2025!
  • 2025-07-24 [DAMO RynnBot] We released RynnBot PlayGround Beta, a platform that provides data management, SOTA VLA models, model training and validation, cloud-edge collaborative deployment, and more! Stay tuned for our continued progress!
  • 2025-06-27 [Preprint] We released WorldVLA, an autoregressive action world model that unifies action and image understanding and generation! Code is now available!
  • 2025-06-26 [ICCV'25] CARP, Coarse-to-fine AutoRegressive Prediction for visuomotor policy learning, got accepted for ICCV 2025! The approach produces highly accurate and smooth robot actions, achieving up to a 10% improvement in success rates and 10x faster inference compared to state-of-the-art policies. The paper, code, and cool videos can be found on the Project page!
  • 2025-05-22 [Preprint] We released VARD, a novel RL fine-tuning method for diffusion-based generative models, applied to both protein structure and text-to-image synthesis; it enhances sample quality with improved efficiency, effective mitigation of reward hacking, and broad applicability.
  • 2025-05-07 [Preprint] We released OpenHelix, a low-cost open-source dual-system VLA with systematic empirical evaluations of the core design elements. Code and a list of papers are now available!
  • 2025-03-31 [Preprint] We released Unicorn to explore the question: can high-quality multimodal training data be synthesized purely from text?
  • 2025-03-28 [Survey Preprint] We released Exploring the Evolution of Physics Cognition in Video Generation: A Survey, which dives deep into the development of physics cognition in video generation, from basic perception to active cognition! A list of papers is now available!
  • 2025-03-11 [TCSVT'25] M2IST, a novel Multi-Modal Interactive Side-Tuning method that effectively addresses the challenges of insufficient multi-modal interaction and high GPU memory consumption, got accepted for IEEE Transactions on Circuits and Systems for Video Technology! Code is now available!
  • 2025-02-24 [Preprint] We released Humanoid-VLA, a novel framework that integrates language understanding, egocentric scene perception, and motion control, enabling universal humanoid control!
  • 2025-01-28 [ICRA'25] QUART-Online, a novel latency-free quadruped MLLM that achieves real-time inference while boosting the success rate across various tasks by 65%, got accepted for ICRA 2025! See the Project page.
  • 2025-01-23 [ICLR'25] ToCa, a token-wise feature caching method that achieves a 2x acceleration for PixArt-α, OpenSora, and DiT while maintaining nearly lossless generation quality, got accepted for ICLR 2025! Code is now available! (A minimal illustration of the caching idea appears after this list.)
  • 2025-01-10 [Preprint] We released GlobalCom2, a "global-to-local" approach for training-free acceleration of high-resolution MLLMs with the AnyRes strategy. Code is now available!
  • 2024-12-13 [Preprint] We released Score and Distribution Matching Policy, which transforms diffusion-based policies into single-step generators for lightweight deployment, fast inference, and precise action generation. The Project page is now available.
  • 2024-12-10 [AAAI'25] Cobra, the first Mamba-based MLLM with efficient inference, got accepted for AAAI 2025! See the Project page.
  • 2024-11-27 [Preprint] We released a new work on token reduction for MLLM inference acceleration; it proposes a unified paradigm that demystifies popular methods and guides future designs, and further offers FiCoCo, a suite of methods grounded in the paradigm. The Project page and code are now available!
  • 2024-09-09 [New Start] Joined Alibaba DAMO Academy as an Algorithm Expert!
  • 2024-06-04 [Graduation] I successfully defended my dissertation. Many thanks to my Ph.D. committee (Prof. Xiaogang Jin, Prof. Mai Xu, Prof. Changxin Gao, Prof. Fajie Yuan, Prof. Peidong Liu, Prof. Xiaofei Li) and my advisor!
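
To make the token-wise feature caching idea above concrete, here is a minimal, hypothetical NumPy sketch. It is not the actual ToCa implementation; the drift-based selection rule, `block_fn`, and `refresh_ratio` are illustrative assumptions. The premise is that across adjacent diffusion steps most token features change little, so a block's cached outputs can be reused for the stable tokens while only the most-changed ones are recomputed.

```python
# Illustrative sketch of token-wise feature caching across diffusion steps.
# NOT the actual ToCa method: the selection rule and names are assumptions.
import numpy as np

def cached_block_forward(block_fn, tokens, cache, refresh_ratio=0.3):
    """Recompute features only for the most-changed tokens; reuse the rest.

    block_fn      -- expensive token-wise computation (e.g. an MLP sub-block)
    tokens        -- (num_tokens, dim) array of current token inputs
    cache         -- dict persisting 'inputs'/'outputs' across steps
    refresh_ratio -- fraction of tokens recomputed at this step
    """
    if "outputs" not in cache:                # first step: compute everything
        cache["inputs"] = tokens.copy()
        cache["outputs"] = block_fn(tokens)
        return cache["outputs"].copy()

    # Score each token by how far its input drifted from the cached input.
    drift = np.linalg.norm(tokens - cache["inputs"], axis=-1)
    k = max(1, int(refresh_ratio * tokens.shape[0]))
    refresh = np.argsort(-drift)[:k]          # top-k most-changed tokens

    out = cache["outputs"].copy()             # start from cached features
    out[refresh] = block_fn(tokens[refresh])  # recompute only those tokens
    cache["inputs"][refresh] = tokens[refresh]
    cache["outputs"][refresh] = out[refresh]
    return out
```

Since `block_fn` runs on only a fraction of the tokens at most steps, the saved compute scales with `1 - refresh_ratio`; how tokens are scored and how often the cache is fully refreshed are exactly the design choices such caching methods study.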

Pinned

  1. bighuang624.github.io

    Academic homepage

    SCSS · 15 stars · 19 forks

  2. VoP

    [CVPR 2023] VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval

    38 stars · 3 forks

  3. DSANet

    Code for the CIKM 2019 paper "DSANet: Dual Self-Attention Network for Multivariate Time Series Forecasting".

    Python · 261 stars · 58 forks

  4. AGAM

    Code for the AAAI 2021 paper "Attributes-Guided and Pure-Visual Attention Alignment for Few-Shot Recognition".

    Python · 10 stars · 6 forks

  5. Troika

    [CVPR 2024] Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning

    Python · 25 stars · 3 forks

  6. AI-research-tools

    🔨 Handy research tools for AI

    2.9k stars · 371 forks
