AI Software Development Engineer
Advanced Micro Devices
- Shanghai
- Permanent
- Full-time
- End-to-End Optimization: Build and optimize end-to-end distributed inference (e.g., P/D disaggregation and Large-EP) and RL solutions on mainstream frameworks such as vLLM and SGLang.
- Collaborate with GPU Library Teams: Work closely with internal teams to analyze and improve training and inference performance on AMD GPUs.
- Collaborate with Open-Source Maintainers: Engage with framework maintainers to ensure code changes are aligned with requirements and integrated upstream.
- Work in Distributed Computing Environments: Optimize deep learning performance on both scale-up (multi-GPU) and scale-out (multi-node) systems.
- Utilize Cutting-Edge Compiler Tech: Leverage advanced compiler technologies to improve deep learning performance.
- Optimize Deep Learning Pipeline: Enhance the full pipeline, including integrating graph compilers.
- Software Engineering Best Practices: Apply sound engineering principles to ensure robust, maintainable solutions.
- GPU Kernel Development & Optimization: Deep experience in designing and optimizing GPU kernels for deep learning on AMD GPUs using HIP, CUDA, and assembly (ASM). Strong knowledge of AMD architectures (GCN, RDNA) and low-level programming to maximize performance for AI operations, leveraging tools such as Composable Kernel (CK), CUTLASS, and Triton for multi-GPU and multi-platform performance.
- Deep Learning Integration: Strong experience integrating optimized GPU kernels into machine learning and LLM frameworks (e.g., vLLM, SGLang, TensorFlow, PyTorch) to accelerate model training and inference, with a focus on scaling and throughput.
- End-to-End Solution Optimization: Understand the latest market trends in LLMs and multimodal models; solid hands-on E2E performance-tuning experience with distributed inference (e.g., P/D disaggregation and Large-EP) and RL. Experience with Text-to-Video or Image-to-Video is a plus.
- Software Engineering: Skilled in Python and C++, with experience in debugging, performance tuning, and test design to ensure high-quality, maintainable software solutions.
- High-Performance Computing: Extensive experience running large-scale workloads on heterogeneous computing clusters, optimizing for efficiency and scalability.
- Compiler Optimization: Solid understanding of compiler theory and tools like LLVM and ROCm for kernel and system performance optimization.
- Master’s or PhD in Computer Science, Computer Engineering, Electrical Engineering, or related fields.
- 5+ years of professional experience in technical software development, with a focus on GPU optimization, performance engineering, and framework development.