Responsibilities:
- Lead or contribute to the end-to-end training, fine-tuning, and quantization of LLM/LVM/LMM models, with an emphasis on low-bit quantization.
- Design and implement scalable, robust systems and engineering pipelines for model training, evaluation, and quantization (PTQ and QAT), and support customers' on-device deployment.
- Research and develop algorithms for VLM, VLA, and other multimodal models; diffusion-based methods for image and text generation; and efficient computation (e.g., MoE, LoRA).
- Research efficient inference algorithms and advanced quantization, e.g., batching, KV caching, efficient attention, long context, speculative decoding, GPTQ, SpinQuant, and automatic mixed precision.
- Apply these solutions to systems innovations that advance model efficiency on device as well as in the cloud.
- Research and integrate state-of-the-art algorithms in generative AI, quantization techniques, knowledge distillation, model compression, and efficient inference (see the distillation sketch below).
- Build, maintain, and automate test suites and profiling/debugging tools to validate and benchmark model performance and deployment effectiveness.
- Document methodologies and results, and present key findings to stakeholders.

Qualifications:
- Solid programming skills in Python, with proficiency in PyTorch.
- Demonstrated experience in both PTQ (Post-Training Quantization) and QAT (Quantization-Aware Training) for deep neural networks, especially under low-bit (≤8-bit) regimes (see the PTQ/QAT sketch below).
- Experience with multimodal inference and training, such as image generation, 3D, video generation, editing, ViT, and other models.
- Hands-on experience with training or quantization pipelines such as LLaMA-Factory or AIMET.
- Experience in LoRA adapter tuning, speculative decoding, and model compression (see the LoRA and speculative-decoding sketches below).
- Experience developing or optimizing memory-efficient, high-speed inference engines such as vLLM and SGLang.
- Knowledge of state-of-the-art PTQ and QAT algorithms.
- Knowledge of reinforcement learning (RL).
- Knowledge of on-device learning, federated learning, or continual learning.
- Experience using AI coding assistants such as Claude Code, Codex, or Cursor is a plus.

Minimum Qualifications:
- Bachelor's degree in Computer Science, Engineering, Information Systems, or a related field and 2+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience; OR
- Master's degree in Computer Science, Engineering, Information Systems, or a related field and 1+ year of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience; OR
- PhD in Computer Science, Engineering, Information Systems, or a related field.
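
For the PTQ/QAT item, here is a minimal, illustrative PyTorch sketch, not any team's actual pipeline: a symmetric per-channel weight quantizer (PTQ) and a fake-quantization module with a straight-through estimator (QAT). The names quantize_weight_per_channel and FakeQuant are hypothetical.

```python
# Minimal PTQ/QAT sketch (illustrative only; all names are hypothetical).
import torch
import torch.nn as nn

def quantize_weight_per_channel(w: torch.Tensor, n_bits: int = 8):
    """PTQ: symmetric per-output-channel weight quantization."""
    qmax = 2 ** (n_bits - 1) - 1                      # e.g. 127 for int8
    scale = w.abs().amax(dim=1, keepdim=True) / qmax  # one scale per row
    scale = scale.clamp(min=1e-8)
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale

class FakeQuant(nn.Module):
    """QAT: fake-quantize tensors in the forward pass while letting
    gradients flow through unchanged (straight-through estimator)."""
    def __init__(self, n_bits: int = 4):
        super().__init__()
        self.qmax = 2 ** (n_bits - 1) - 1

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scale = x.abs().amax().clamp(min=1e-8) / self.qmax
        x_q = torch.clamp(torch.round(x / scale), -self.qmax - 1, self.qmax) * scale
        # STE: forward uses the quantized value, backward sees identity.
        return x + (x_q - x).detach()

# Usage: PTQ a linear layer's weight; fake-quantize a tensor during QAT.
layer = nn.Linear(16, 8)
q_w, scale = quantize_weight_per_channel(layer.weight.data)
x = torch.randn(2, 16, requires_grad=True)
FakeQuant(n_bits=4)(x).sum().backward()   # gradient flows via the STE
```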
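
For the LoRA adapter-tuning item, the sketch below shows the low-rank adapter idea in plain PyTorch, assuming a frozen base nn.Linear. The class name LoRALinear and the alpha/rank scaling are illustrative conventions, not a specific library's API.

```python
# Minimal LoRA sketch (illustrative): frozen base layer + low-rank update.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # freeze the pretrained weights
            p.requires_grad = False
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank           # common LoRA scaling convention

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = W x + scaling * (B A) x; only A and B receive gradients.
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling

# Usage: swap in the wrapped layer and train only the adapter parameters.
layer = LoRALinear(nn.Linear(64, 64), rank=8)
out = layer(torch.randn(4, 64))
```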
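
For speculative decoding, the sketch below shows only the standard accept/reject step that keeps the output distribution equal to the target model's: accept a drafted token with probability min(1, p_target/p_draft), otherwise resample from the normalized residual. Real engines apply this across a whole block of drafted tokens; the function name here is hypothetical and the distributions are toy inputs.

```python
# Speculative-decoding accept/reject rule on toy distributions (illustrative).
import torch

def accept_or_resample(p_target: torch.Tensor, p_draft: torch.Tensor,
                       drafted_token: int) -> int:
    """Accept the draft token with prob min(1, p_target/p_draft); on
    rejection, resample from the normalized residual max(p_target - p_draft, 0)."""
    ratio = p_target[drafted_token] / p_draft[drafted_token]
    if torch.rand(()) < ratio:
        return drafted_token                   # accepted
    residual = (p_target - p_draft).clamp(min=0)
    residual = residual / residual.sum()
    return int(torch.multinomial(residual, 1))  # rejected: resample

# Usage with toy next-token distributions over a 4-token vocabulary.
p_target = torch.tensor([0.5, 0.2, 0.2, 0.1])
p_draft = torch.tensor([0.25, 0.25, 0.25, 0.25])
tok = accept_or_resample(p_target, p_draft, drafted_token=1)
```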
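
For knowledge distillation, a common formulation blends a temperature-scaled KL term between teacher and student logits with the hard-label cross-entropy. The sketch below assumes that formulation; the function name and the default T/alpha values are illustrative.

```python
# Minimal knowledge-distillation loss sketch (illustrative).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      T: float = 2.0, alpha: float = 0.5):
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)  # rescale soft-target term
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Usage with random logits for a 10-class toy problem.
s, t = torch.randn(8, 10), torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(s, t, labels)
```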