[Hiring] LLM Agent Training Algorithm Engineer/Expert | Alibaba Taobao & Tmall Group

About the Team

We are the Industrial Chain Agentic AI Team at 1688, the B2B platform of Alibaba's Taobao & Tmall Group. Drawing on China's largest B2B product inventory and supply chain ecosystem, along with massive e-commerce data and tools, we aim to reshape B2B e-commerce with cutting-edge Agentic AI and Reinforcement Learning (RL). Our team has deep expertise in foundation model research and development, with proven experience training 100B+ parameter models from scratch and deploying them in downstream scenarios. We have won over 10 championships on global NLP/AI leaderboards and published 50+ papers in top-tier venues such as Nature sub-journals, ACL, and ICLR. Recently, we launched "Aoxia" (AlphaShop), a shopping agent for cross-border scenarios, and, together with Qwen (Tongyi), released EcomBench, the world's first agentic benchmark for real-world e-commerce. We offer a flat management structure, a strong engineering culture, and multiple open positions. Join us to grow in both technology and business!

Responsibilities

  1. Agent Fundamentals Optimization: Build and optimize the essential components of Agentic AI for the e-commerce vertical, including environment/tool construction, high-quality synthetic data generation, and Reward Modeling.
  2. Post-Training Algorithm Optimization: Optimize post-training algorithms (e.g., GRPO, PPO, SearchR1) to improve model capabilities in Tool-use, Planning, Deep Research, and report generation in complex environments.
  3. Full-Stack Model Iteration: Participate in end-to-end optimization of agentic capabilities for 100B-scale models, covering CPT (Continued Pre-training), SFT, Post-train, and Multi-agent RL. Reproduce state-of-the-art research and propose novel algorithms.

Qualifications

  1. Deep understanding of LLMs, RL, and Agents; familiarity with common Alignment algorithms (e.g., DPO, PPO, GRPO, DAPO).
  2. Familiarity with frontier Agentic RL algorithms and frameworks, hands-on experience developing and training models in real projects, and a keen eye for cutting-edge research.
  3. Passion for AI, strong self-motivation, and excellent skills in solving complex problems.
  4. Preferred Qualifications: Awards in coding competitions, publications at top-tier conferences, or contributions to well-known open-source libraries.
Location: Hangzhou, China
Email: liangding.liam@gmail.com / zuorui.dl@alibaba-inc.com
WeChat: Liam_DL1469

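[Hiring] Alibaba Taobao & Tmall Group | LLM Agent Training Algorithm Engineer/Expert

Team Introduction

We are the Industrial Chain Agentic AI Team of 1688, the B2B platform of Alibaba's Taobao & Tmall Group. Backed by the country's largest B2B product inventory and supply chain system, and building on massive e-commerce data and tools, we are committed to reshaping the B2B e-commerce industrial chain with cutting-edge Agentic AI and RL technologies. Our team members have strong capabilities in building and researching foundation models, with successful experience training 100B-parameter base models from scratch and deploying them in downstream scenarios. The team has won 10+ championships on global NLP/AI leaderboards and published 50+ papers in top journals and conferences such as Nature sub-journals, ACL, and ICLR. Recently, we launched Aoxia (AlphaShop), a Shopping Agent for cross-border scenarios, and, jointly with Tongyi, released EcomBench, the world's first Agentic Benchmark for real-world e-commerce scenarios. The team has a strong technical culture, a flat management structure, and ample headcount. We look forward to growing with you in both technology and business!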

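Job Description

  1. Agent infrastructure optimization: Build and optimize the key elements of LLM-based Agentic AI in the e-commerce vertical, covering environment and tool construction (Environment/Tools), high-quality data synthesis, and Reward Modeling;
  2. Post-training algorithm breakthroughs: Optimize Post-training algorithms (such as GRPO/PPO/SearchR1) to improve the model's capabilities in tool use (Tool-use), planning (Plan), deep reasoning (Deep Research), and report generation in complex environments;
  3. Full-pipeline model iteration: Participate in end-to-end optimization of Agentic capabilities for 100B-scale models, including CPT (Continued Pre-training), SFT, Post-train, and Multi-agent RL; reproduce frontier work from the field and explore and propose new algorithms.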

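Job Requirements

  1. In-depth understanding of the LLM/RL/Agent fields and familiarity with common Alignment algorithms (such as DPO/PPO/GRPO/DAPO);
  2. Familiarity with frontier Agentic RL algorithms and frameworks, experience developing and training in real projects, sharp technical vision, and sensitivity to cutting-edge research;
  3. Passion for AI technology, strong self-motivation (Self-driven), and the ability to solve complex problems;
  4. Bonus: Candidates with awards in programming competitions, a record of publications at top conferences, or contributions to well-known open-source libraries are preferred.
Location: Hangzhou
Application email: liangding.liam@gmail.com / zuorui.dl@alibaba-inc.com
WeChat: Liam_DL1469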
