Xin Cai TotalVariation
Stars
A curated collection of fun and creative examples generated with Nano Banana & Nano Banana Pro🍌, Gemini-2.5-flash-image based model. We also release Nano-consistent-150K openly to support the commu...
Awesome curated collection of images and prompts generated by gemini-2.5-flash-image (aka Nano Banana) state-of-the-art image generation and editing model. Explore AI generated visuals created with...
LaTeX.css is a CSS library that makes your website look like a LaTeX document
Collection of extracted System Prompts from popular chatbots like ChatGPT, Claude & Gemini
🌈 A collection of Beamer-style slide templates, provided in both PowerPoint and Keynote formats.
A collection of awesome LaTeX Thesis/Dissertation templates and beyond! (A collection of templates in LaTeX / Word / Typst / Markdown formats for theses, presentations, reports, project proposals, résumés, books, and more)
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek, Qwen, Llama, Gemma, TTS 2x faster with 70% less VRAM.
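《开源大模型食用指南》 (a guide to using open-source large models): tutorials tailored for beginners in China on quickly fine-tuning (full-parameter/LoRA) and deploying Chinese and international open-source large language models (LLMs) and multimodal large language models (MLLMs) in a Linux environment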
Witness the aha moment of VLM with less than $3.
Fully open reproduction of DeepSeek-R1
A curated list of papers for generalist agents
tracking papers, datasets, and models of "large language model (LLM) for time series"
Official repository for our work on micro-budget training of large-scale diffusion models.
A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).
Minimalistic 4D-parallelism distributed training framework for educational purposes
Official PyTorch Implementation of Learning Affordance Grounding from Exocentric Images, CVPR 2022
[ICCV 2023] Understanding 3D Object Interaction from a Single Image
Code for "AffordanceLLM: Grounding Affordance from Vision Language Models"
MOKA: Open-World Robotic Manipulation through Mark-based Visual Prompting (RSS 2024)
A suite of image and video neural tokenizers
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
Official Repo For OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]
An Invitation to 3D Vision: A Tutorial for Everyone
Awesome-LLM-3D: a curated list of Multi-modal Large Language Model in 3D world Resources
Softcopy of Engineering Books. If you want a book{s} to be taken down, please contact me.