AI Ops Engineer
Standard Chartered
- Guangzhou, Guangdong
- Permanent
- Full-time
- Design and deliver the hybrid AI platform.
- Maintain scalable, efficient and reliable data and AI pipelines.
- Maintain MLOps and LLMOps pipelines across Databricks and on-prem clusters.
- Develop and implement AI and ML models to enhance operations and automate routine tasks.
- Monitor and analyze system performance, identifying and resolving issues proactively.
- Create and maintain documentation for AIOps processes and solutions.
- Conduct root cause analysis and implement corrective actions to prevent future incidents.
- Design, implement and maintain a monitoring strategy for the AI platform across central observability.
- Ensure AI platform delivery and services meet security and risk requirements and operational SLAs.
- Collaborate with data scientists, data engineers, and the central data governance team to align with a standardized data strategy.
- Collaborate with other teams to integrate AIOps solutions with existing services (ServiceNow, ELK, etc.).
- Collaborate with technology and architecture teams to align with best practices and standards.
- Perform code reviews to reduce tech debt and avoid spaghetti code.
- Stay up to date with industry trends and research in the field of artificial intelligence.
- Develop and maintain strong relationships with internal and external stakeholders.
- Display exemplary conduct and live by the Group's Values and Code of Conduct.
- Take personal responsibility for embedding the highest standards of ethics, including regulatory and business conduct, across Standard Chartered Bank. This includes understanding and ensuring compliance with, in letter and spirit, all applicable laws, regulations, guidelines and the Group Code of Conduct.
- Effectively and collaboratively identify, escalate, mitigate and resolve risk, conduct and compliance matters.
- TTO Management Team
- T&A Management Team
- CDO
- Software Engineering Platform Teams
- Business, Regional & Country CIOs & COOs
- Operational, Technology & Cyber Risk
- Chief Data Office
- Group and Country Compliance & Regulators
- Group Internal Audit
- Group Operational Risk
- Experience: 4-6 years of experience in engineering or operations disciplines of AIOps and MLOps, with a proven track record of delivering complex AI projects.
- Technical Expertise: Strong proficiency in data platforms (e.g., Databricks), CI/CD (e.g., Azure DevOps), machine learning frameworks (e.g., TensorFlow, PyTorch), MLOps tools (e.g., MLflow, Kubeflow), cloud platforms (e.g., Azure, AWS) and infrastructure as code (e.g., Terraform).
- GPU Expertise: Hands-on experience with GPU computing and optimization for AI workloads.
- Programming Skills: Proficiency in programming languages such as Python, Java, or C++.
- Data Engineering: Exposure to data engineering, integration, ETL and large datasets.
- Problem-Solving: Excellent analytical and problem-solving skills, with the ability to think critically and creatively.
- Communication: Strong interpersonal and communication skills, with the ability to work effectively in a collaborative team environment.
- Hands-on experience with AI platform development
- Hands-on experience with ML frameworks, best practices and ops implementation
- Hands-on experience with GPU computing and optimization for AI workloads
- Experience with MLOps and LLMOps pipelines in Databricks and on-prem clusters
- Experience with design and implementation of robust data integration pipelines
- Programming languages such as Python, Java, or C++.
- Do the right thing and are assertive, challenge one another, and live with integrity, while putting the client at the heart of what we do
- Never settle, continuously striving to improve and innovate, keeping things simple and learning from doing well, and not so well
- Are better together, we can be ourselves, be inclusive, see more good in others, and work collectively to build for the long term
- Core bank funding for retirement savings, medical and life insurance, with flexible and voluntary benefits available in some locations.
- Time off including annual leave, parental/maternity leave (20 weeks), sabbatical (12 months maximum) and volunteering leave (3 days), along with minimum global standards for annual leave and public holidays, which combine to a minimum of 30 days.
- Flexible working options based around home and office locations, with flexible working patterns.
- Proactive wellbeing support through Unmind, a market-leading digital wellbeing platform, development courses for resilience and other human skills, a global Employee Assistance Programme, sick leave, mental health first-aiders and a range of self-help toolkits.
- A continuous learning culture to support your growth, with opportunities to reskill and upskill and access to physical, virtual and digital learning.
- Being part of an inclusive and values driven organisation, one that embraces and celebrates our unique diversity, across our teams, business functions and geographies - everyone feels respected and can realise their full potential.