AI Frameworks Software Engineer – Model Compression Algorithm
Intel
- Shanghai
- Permanent
- Full-time
- Develop the Intel Neural Compressor product and related tools (auto-round), and optimize them for Intel AI platforms, including CPUs, GPUs, and AI accelerators
- Research and implement quantization and compression techniques for large language models (LLMs) and text-to-image/video generation models
- Track and explore cutting-edge directions in efficient model deployment and inference/fine-tuning acceleration
- Master’s or PhD degree in computer science or a related field
- Solid understanding of deep learning, deep learning frameworks, and large language model (LLM) fundamentals
- Familiarity with model compression techniques such as quantization and pruning
- Proficiency in Python/C++ or other programming languages commonly used for deep learning development
- Strong teamwork and collaboration skills
- Good oral and written English skills
- Strong self-motivation and problem-solving skills
- Passion for technological innovation and practical engineering, with a drive for continuous exploration and improvement
- Experience in model fine-tuning, inference optimization, or related tool development is a plus