
AI Ops Engineer
- Guangzhou, Guangdong
- Permanent
- Full-time
- MLOps: Implementing and optimizing MLOps practices to streamline the machine learning lifecycle, from development to deployment and monitoring.
- LLMOps: Managing operations for large language models, including development, deployment and monitoring.
- CI/CD/CT/CM: Developing and maintaining continuous integration, deployment, testing, and monitoring pipelines.
- Standardization: Establishing and enforcing best practices for machine learning model development, testing, and deployment.
- Infrastructure Setup: Overseeing the setup and management of cloud and on-prem platforms to support AI workloads.
- Domain Design and Catalog Implementation: Designing and maintaining domain models and data catalogs.
- Model Registry and Cataloging Layer: Developing and maintaining systems for registering and cataloging AI models.
- RAI Automation: Implementing and automating Responsible AI (RAI) practices for ethical AI development.
- Governance Dashboards: Building and maintaining dashboards for governance and compliance monitoring.
- Governance Framework and Workflows: Developing and enforcing governance frameworks and workflows.
- Lineage Tracking and Auditing: Implementing and maintaining tools for tracking data and model lineage and auditing.
- Risk and Control Library: Working with the OTCR and PCA teams on AI controls aligned with RAI principles.
- Collaboration: Working closely with data scientists, software engineers, and platform engineers to deliver high-quality AI solutions.
- Innovation: Staying abreast of the latest advancements in AI and machine learning, and applying this knowledge to improve our platform and processes.
- Mentorship: Providing technical guidance and mentorship to junior engineers and team members.
- Hands-on experience with AI platform development and monitoring
- Hands-on experience with ML frameworks, best practices, and ops implementation
- Hands-on experience with GPU computing and optimization for AI workloads
- Hands-on experience with the design and implementation of sophisticated data catalogues
- Proficiency in programming languages such as Python, Java, or C++
- TTO Management Team
- T&A Management Team
- CDO
- Software Engineering Platform Teams
- Business, Regional & Country CIOs & COOs
- Operational, Technology & Cyber Risk
- Chief Data Office
- Group and Country Compliance & Regulators
- Group Internal Audit
- Group Operational Risk
- Experience: 3-5 years of experience in AI and machine learning engineering, with a proven track record of delivering complex AI projects.
- Technical Expertise: Strong proficiency in machine learning frameworks (e.g., TensorFlow, PyTorch), MLOps tools (e.g., MLflow, Kubeflow), cloud platforms (e.g., AWS, Azure), and infrastructure as code.
- GPU Expertise: Hands-on experience with GPU computing and optimization for AI workloads.
- Programming Skills: Proficiency in programming languages such as Python, Java, or C++.
- Data Engineering: Strong skills in data engineering, including data integration, ETL processes, and working with large datasets.
- Problem-Solving: Excellent analytical and problem-solving skills, with the ability to think critically and creatively.
- Communication: Strong interpersonal and communication skills, with the ability to work effectively in a collaborative team environment.
- Do the right thing and are assertive, challenge one another, and live with integrity, while putting the client at the heart of what we do
- Never settle, continuously striving to improve and innovate, keeping things simple and learning from doing well, and not so well
- Are better together, we can be ourselves, be inclusive, see more good in others, and work collectively to build for the long term
- Core bank funding for retirement savings, medical and life insurance, with flexible and voluntary benefits available in some locations.
- Time off, including annual leave, parental/maternity leave (20 weeks), sabbatical (12 months maximum), and volunteering leave (3 days), along with minimum global standards for annual leave and public holidays, which combine to a minimum of 30 days.
- Flexible working options based around home and office locations, with flexible working patterns.
- Proactive wellbeing support through Unmind, a market-leading digital wellbeing platform, development courses for resilience and other human skills, a global Employee Assistance Programme, sick leave, mental health first-aiders, and a range of self-help toolkits.
- A continuous learning culture to support your growth, with opportunities to reskill and upskill and access to physical, virtual and digital learning.
- Being part of an inclusive and values-driven organisation, one that embraces and celebrates our unique diversity across our teams, business functions, and geographies, where everyone feels respected and can realise their full potential.