Senior Computer Vision Engineer
Caper
- Changning, Shanghai
- Permanent
- Full-time

Responsibilities:
- Design & build a high‑throughput inference engine that can orchestrate multiple vision models (detection, tracking, recognition) on edge hardware.
- Accelerate model inference using TensorRT, ONNX Runtime, TVM, custom CUDA kernels, and hardware‑specific APIs (GPU, NPU, DSP).
- Create CI/CD deployment pipelines (containerization, model versioning, zero‑downtime roll‑outs) that move models from training to on‑cart runtime.
- Implement monitoring, profiling & alerting for latency, throughput, and resource use; iterate to meet strict real‑time SLAs.
- Integrate inference services with Caper’s broader infra (message buses, telemetry, OTA update system, configuration management).
- Collaborate with cross‑functional teams (product, data, hardware, operations) to ensure the on‑board AI stack is robust, scalable, and can run A/B or store‑level experiments with minimal friction.
- Partner with research engineers to translate prototype CV models into production‑ready, inference‑efficient versions.
- Document architecture, standards & best‑practice guidelines; mentor junior engineers on infra‑focused development.
- Stay current on emerging edge‑AI frameworks, model compression, and multi‑model scheduling algorithms; evaluate and adopt them when beneficial.

Qualifications:
- MS or Ph.D. in Computer Science, Electrical Engineering, or a related field (AI/Computer Vision focus a plus).
- 5+ years of professional experience building and maintaining AI inference infrastructure for edge or embedded systems.
- Strong software‑engineering background with expertise in C++ and Python, solid understanding of design patterns, CI/CD, and debugging.
- Proven experience with model deployment frameworks (TensorRT, ONNX Runtime, TVM, OpenVINO, Triton Inference Server, etc.) and hardware acceleration (CUDA, cuDNN, Vulkan, OpenCL, ASIC/NPU APIs).
- Hands‑on experience creating inference engines that orchestrate multiple models simultaneously while meeting real‑time latency budgets.
- Demonstrated ability to integrate AI components into larger distributed systems (REST/gRPC services, message queues, edge‑cloud orchestration).
- Excellent communication skills; comfortable presenting technical solutions to cross‑functional stakeholders.
- Excellent English documentation skills – must be able to write clear, comprehensive technical documents for a global audience.
- Familiarity with cloud platforms (AWS, GCP, Azure) and MLOps tools (Kubeflow, MLflow, SageMaker, etc.).
- A portfolio of computer‑vision projects (object detection, tracking, pose estimation) that showcases end‑to‑end pipelines from data collection to deployment.
- Knowledge of model training workflows, dataset management, and transfer learning – useful for close collaboration with research teams.
- Publications or open‑source contributions in efficient inference, model compression, or edge AI.
- Proficiency with GPU profiling tools (Nsight, PerfKit) and performance‑tuning techniques (kernel fusion, quantization, pruning).
- Good English communication skills – ability to present ideas clearly and collaborate effectively with globally distributed teams.