Software Engineer × AI/ML Developer × Performance Architect
I'm a software engineer who transforms complex challenges into elegant solutions that scale. From optimizing CUDA kernels for 1.46x speedups to building real-time platforms with sub-500ms latency, I thrive at the intersection of technical excellence and business impact.
My approach is simple: measure twice, optimize once, ship constantly. Whether it's achieving 94% accuracy in production ML systems or rendering 1M+ points at 858 FPS, I believe in pushing the boundaries of what's possible while keeping the user experience at the center.
Currently seeking opportunities to tackle meaningful challenges at companies building the future.
94% accuracy | 120ms latency | Production RAG
Built a production RAG system with fine-tuned Llama-3.1-8B that matches GPT-4 quality at a fraction of the cost. Implemented custom attention caching that reduced latency by 73%, enabling real-time responses.
Technical Deep Dive
- Architecture: Hierarchical vector indexing with FAISS
- Innovation: Custom KV-cache optimization for transformers
- Stack: PyTorch, LangChain, FastAPI, PostgreSQL
- Deployment: Kubernetes with horizontal autoscaling
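
As a rough illustration of the hierarchical indexing idea listed above, here is a minimal coarse-to-fine search sketch in plain Python. It is a stand-in for the concept, not FAISS's API and not the project's actual code: the `TwoLevelIndex` class, the fixed centroids, and the `nprobe` parameter name are all assumptions made for this example.

```python
def l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class TwoLevelIndex:
    """Toy coarse-to-fine index (hypothetical): vectors live in the bucket
    of their nearest centroid, and a query scans only the closest buckets."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.buckets = [[] for _ in centroids]  # each holds (doc_id, vector)

    def _nearest_centroids(self, vec, n):
        order = sorted(
            range(len(self.centroids)),
            key=lambda i: l2(vec, self.centroids[i]),
        )
        return order[:n]

    def add(self, doc_id, vec):
        bucket = self._nearest_centroids(vec, 1)[0]
        self.buckets[bucket].append((doc_id, vec))

    def search(self, query, k=3, nprobe=1):
        # Coarse step: pick the nprobe closest buckets.
        candidates = []
        for b in self._nearest_centroids(query, nprobe):
            candidates.extend(self.buckets[b])
        # Fine step: exact ranking inside the selected buckets only.
        candidates.sort(key=lambda item: l2(query, item[1]))
        return [doc_id for doc_id, _ in candidates[:k]]


if __name__ == "__main__":
    index = TwoLevelIndex(centroids=[(0.0, 0.0), (10.0, 10.0)])
    index.add("a", (0.1, 0.2))
    index.add("b", (9.8, 10.1))
    index.add("c", (0.5, 0.4))
    print(index.search((0.0, 0.1), k=2))
```

Raising `nprobe` trades speed for recall: more buckets are scanned, so fewer true neighbors are missed.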
<500ms sync | WebSocket protocol | 85% bandwidth optimized
Created a video watch party platform that keeps playback synchronized across distributed clients. Engineered a binary WebSocket protocol with delta compression, achieving sub-500ms sync latency for real-time co-viewing.
Technical Deep Dive
- Protocol: Custom binary format over WebSocket
- Scaling: Redis pub/sub for horizontal distribution
- Stack: React, NestJS, Socket.IO, Redis
- Security: JWT with room-based permissions
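
To make the binary format and delta compression above more concrete, here is a small sketch using Python's `struct` module. The field names (`position_ms`, `playing`, `rate_percent`), their widths, and the one-byte bitmask layout are assumptions chosen for illustration, not the platform's real wire format.

```python
import struct

# Hypothetical playback state fields and their little-endian encodings.
FIELDS = ("position_ms", "playing", "rate_percent")
FORMATS = {"position_ms": "<I", "playing": "<B", "rate_percent": "<H"}


def encode_delta(prev, curr):
    """Pack only the fields that changed: a 1-byte bitmask, then the values."""
    mask = 0
    payload = b""
    for bit, name in enumerate(FIELDS):
        if prev.get(name) != curr[name]:
            mask |= 1 << bit
            payload += struct.pack(FORMATS[name], curr[name])
    return struct.pack("<B", mask) + payload


def decode_delta(prev, frame):
    """Apply a delta frame on top of the previously known state."""
    state = dict(prev)
    mask = frame[0]
    offset = 1
    for bit, name in enumerate(FIELDS):
        if mask & (1 << bit):
            fmt = FORMATS[name]
            (state[name],) = struct.unpack_from(fmt, frame, offset)
            offset += struct.calcsize(fmt)
    return state


if __name__ == "__main__":
    prev = {"position_ms": 1000, "playing": 1, "rate_percent": 100}
    curr = {"position_ms": 1500, "playing": 1, "rate_percent": 100}
    frame = encode_delta(prev, curr)
    print(len(frame), decode_delta(prev, frame) == curr)
```

In this toy layout a position-only update takes 5 bytes, while a full snapshot takes 8. Real savings depend on how often each field changes.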
1.46x speedup | 95.3% bandwidth utilization | Kernel fusion
Developed fused CUDA kernels for transformer models that reach 95.3% of theoretical memory bandwidth. Fusing LayerNorm with the activation into a single kernel yields a 1.46x inference speedup for large language models.
Technical Deep Dive
- Technique: Kernel fusion for LayerNorm + Activation
- Memory: Coalesced access patterns, shared memory
- Stack: CUDA C++, PyTorch extensions, nvprof
- Impact: 46% inference speedup for LLMs
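
The fusion idea above can be illustrated with a CPU reference model in Python. This is not the CUDA kernel itself: it only shows that the fused path computes the same result while skipping the intermediate buffer that an unfused pipeline writes out and reads back. GELU is assumed as the activation, since the list above does not name one, and both function names are hypothetical.

```python
import math


def gelu(x):
    """Exact GELU (assumed activation for this sketch)."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def layernorm_then_gelu_unfused(row, gamma, beta, eps=1e-5):
    """Two passes: LayerNorm materializes a buffer, then GELU re-reads it."""
    n = len(row)
    mean = sum(row) / n
    var = sum((x - mean) ** 2 for x in row) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    # Intermediate buffer: on a GPU this would be a round trip to global memory.
    normalized = [
        (x - mean) * inv_std * g + b for x, g, b in zip(row, gamma, beta)
    ]
    return [gelu(y) for y in normalized]


def layernorm_gelu_fused(row, gamma, beta, eps=1e-5):
    """One pass over the output: normalize and activate per element."""
    n = len(row)
    mean = sum(row) / n
    var = sum((x - mean) ** 2 for x in row) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    out = [0.0] * n
    for i in range(n):
        # y never leaves the loop body, mirroring a value kept in a register.
        y = (row[i] - mean) * inv_std * gamma[i] + beta[i]
        out[i] = gelu(y)
    return out


if __name__ == "__main__":
    row = [1.0, 2.0, 3.0, 4.0]
    ones, zeros = [1.0] * 4, [0.0] * 4
    print(layernorm_gelu_fused(row, ones, zeros))
```

Because both operations are memory-bound, removing the intermediate write and read is where a fused kernel would be expected to gain most of its speedup.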
858 FPS | 1M+ points | 7.2x faster
Built a 3D point cloud viewer that outperforms industry standards by 7.2x. Implemented custom spatial indexing and SIMD optimizations to achieve real-time rendering of massive datasets.
Technical Deep Dive
- Algorithm: Custom octree with frustum culling
- Rendering: Instanced drawing with GPU batching
- Stack: C++17, OpenGL 4.5, GLM, ImGui
- Optimization: SIMD intrinsics for transforms
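
A minimal sketch of the octree and frustum culling approach listed above, written in Python for readability rather than in the project's C++. The `OctreeNode` class, the capacity of 8, and the plane representation `((nx, ny, nz), d)` with "inside" meaning `n·p + d >= 0` are assumptions for this example.

```python
class OctreeNode:
    """Cubic node centered at `center` with half-extent `half` (hypothetical)."""

    def __init__(self, center, half, capacity=8):
        self.center, self.half, self.capacity = center, half, capacity
        self.points = []
        self.children = None

    def insert(self, p):
        if self.children is not None:
            self._child_for(p).insert(p)
            return
        self.points.append(p)
        # Split when over capacity; the size floor stops endless splitting
        # when many identical points are inserted.
        if len(self.points) > self.capacity and self.half > 1e-6:
            self._split()

    def _child_for(self, p):
        idx = sum(1 << a for a in range(3) if p[a] >= self.center[a])
        return self.children[idx]

    def _split(self):
        h = self.half / 2.0
        self.children = [
            OctreeNode(
                tuple(self.center[a] + (h if (i >> a) & 1 else -h) for a in range(3)),
                h,
                self.capacity,
            )
            for i in range(8)
        ]
        pts, self.points = self.points, []
        for p in pts:
            self._child_for(p).insert(p)


def box_outside(planes, center, half):
    """True if the node's box lies fully on the negative side of any plane."""
    for (nx, ny, nz), d in planes:
        reach = half * (abs(nx) + abs(ny) + abs(nz))  # farthest corner along n
        if nx * center[0] + ny * center[1] + nz * center[2] + d + reach < 0:
            return True
    return False


def visible_points(node, planes):
    """Collect points inside all planes, skipping whole culled subtrees."""
    if box_outside(planes, node.center, node.half):
        return []
    if node.children is None:
        return [
            p for p in node.points
            if all(n[0] * p[0] + n[1] * p[1] + n[2] * p[2] + d >= 0 for n, d in planes)
        ]
    out = []
    for child in node.children:
        out.extend(visible_points(child, planes))
    return out
```

A real view frustum would supply six planes. The culling test is the same for any number of planes, and the per-point check at the leaves handles boxes that straddle a plane.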
Python | TypeScript | C++ | React | PyTorch | Docker | Kubernetes | Systems
📚 View Complete Tech Stack
Core Languages
- Expert: Python, TypeScript, C++, JavaScript
- Proficient: CUDA, SQL, Bash

AI/ML Stack
- Frameworks: PyTorch, Transformers, LangChain, scikit-learn
- Techniques: Fine-tuning, RAG, Embeddings, Vector Search
- Production: ONNX, TensorRT, Model Quantization, Batching

Backend Engineering
- Python: FastAPI, Django, Flask, Celery
- Node.js: NestJS, Express, Socket.IO, Bull
- APIs: REST, GraphQL, gRPC, WebSockets

Frontend Development
- Core: React, Next.js, Redux, TypeScript
- UI: Tailwind CSS, Material-UI, Framer Motion
- Advanced: Three.js, D3.js, WebRTC, Canvas API

Data & Infrastructure
- Databases: PostgreSQL, MongoDB, Redis, Elasticsearch
- Vector DBs: Pinecone, FAISS, Chroma, Qdrant
- Message Queues: RabbitMQ, Kafka, Redis Pub/Sub

DevOps & Cloud
- Containers: Docker, Docker Compose, BuildKit
- Orchestration: Kubernetes, Helm, ArgoCD
- CI/CD: GitHub Actions, GitLab CI, Jenkins
- Cloud: AWS (EC2, S3, Lambda), GCP, Vercel

Performance & Systems
- GPU: CUDA, cuDNN, Thrust, OptiX
- CPU: SIMD, OpenMP, Threading, Profiling
- Graphics: OpenGL, Vulkan, Shaders
| Capability | Evidence |
|---|---|
| 🏗️ Full Product Ownership | Shipped end-to-end solutions from concept to production |
| ⚡ Performance Excellence | 1.46x-7.2x improvements across different domains |
| 📊 Production Experience | Deployed scalable systems with real-world usage |
| 🎯 Technical Precision | 94% ML accuracy, 95.3% GPU memory bandwidth utilization |
| 🚀 Rapid Execution | From idea to MVP in days, not months |
I'm excited about joining teams that are:
- Building products that matter - Real problems, real impact, real users
- Pushing technical boundaries - Where "impossible" is just another challenge
- Moving fast with purpose - Velocity with vision, not just for speed's sake
- Creating the future - Not just following trends, but setting them
I'm always excited to discuss challenging problems and explore how I can contribute to your team's success.
Whether you're building the next breakthrough in AI, scaling systems to billions, or creating products that change lives - let's talk.
Status: Actively seeking new opportunities | Availability: Immediate | Location: Flexible/Remote