Data Analytics & Management, Officer
State Street
- Hangzhou, Zhejiang
- Permanent
- Full-time
Who we are looking for
- Strong background in data warehousing, ETL/data pipelines, and cloud-based data warehouse solutions.
- Hands-on experience delivering end-to-end data pipeline solutions encompassing architecture design, implementation, testing, deployment, monitoring, and operational support.
What you will be responsible for
- Build and operate scalable batch and streaming pipelines on the AWS Cloud Platform, using the Snowflake and/or Databricks stack, Spark, or Informatica ETL (where reused), to deliver analytics-ready datasets.
- Design, implement, and optimize Iceberg-based tables, partitions, and metadata structures for consistent and performant analytical access (illustrated in the first sketch after this list).
- Partner with domain SMEs to define data semantics (Account, Product, Holdings, Transactions, Benchmarks, etc.), stewardship expectations, and consumption contracts (data exchange interfaces and consumption APIs).
- Implement automated data validation, quality rules, reconciliation checks, and lineage capture, drawing on MDM, Data Quality frameworks, and data management tools, to ensure governed, reliable, and trusted analytical data across enterprise platforms (see the data-quality sketch after this list).
- Build and publish curated semantic-layer data models (serving models, marts) and expose them via governed BI endpoints and/or consumption APIs, ensuring consistent metrics and business definitions.
- Define and manage data product interfaces for consumption (schemas, SLAs, documentation, versioning, and backward compatibility) to support stable API and BI integrations.
- Work with Platform Engineering to standardize deployment via CI/CD, productionize jobs, and adopt platform guardrails and observability patterns.
- Collaborate with architects and application teams to define data strategies and deliver logical/physical data models aligned to analytical workloads.
- Optimize pipeline and query performance through appropriate database partitioning, archiving, and purging patterns.
- Participate in on-call/major incident management, perform backfills where required, and support production stability for owned data products.
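
To give a concrete flavor of the Iceberg work above, here is a minimal PySpark sketch; the catalog name (`demo`), warehouse path, table schema, and maintenance calls are illustrative assumptions, not the team's actual configuration.

```python
# Minimal sketch: defining and maintaining an Apache Iceberg table from
# PySpark. All names and paths below are illustrative assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-sketch")
    # Requires the Iceberg Spark runtime on the classpath, e.g.
    # --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:<version>
    .config(
        "spark.sql.extensions",
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    )
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "/tmp/iceberg-warehouse")
    .getOrCreate()
)

spark.sql("CREATE NAMESPACE IF NOT EXISTS demo.analytics")

# Hidden partitioning: Iceberg derives the month partition from
# trade_date, so queries filtering on trade_date prune files without
# consumers knowing the physical layout.
spark.sql("""
    CREATE TABLE IF NOT EXISTS demo.analytics.transactions (
        account_id STRING,
        product_id STRING,
        trade_date DATE,
        amount     DECIMAL(18, 2)
    )
    USING iceberg
    PARTITIONED BY (months(trade_date))
""")

# Routine metadata maintenance: compact small files and expire old
# snapshots so scans stay fast and metadata stays lean.
spark.sql("CALL demo.system.rewrite_data_files('analytics.transactions')")
spark.sql("CALL demo.system.expire_snapshots('analytics.transactions')")
```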
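
The data-quality responsibility above can similarly be sketched as a small set of fail-fast validation rules; the dataset path, column names, control total, and failure behavior are all assumptions for illustration.

```python
# Minimal sketch: automated validation, uniqueness, and reconciliation
# checks over a curated dataset. Names and thresholds are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-sketch").getOrCreate()

df = spark.read.parquet("/data/curated/holdings")  # hypothetical dataset

# Control figure that would normally come from the source system's
# end-of-day report; hard-coded here purely for illustration.
expected_market_value = 1_000_000.00

actual_market_value = (
    df.agg(F.sum(F.col("market_value").cast("double"))).first()[0] or 0.0
)

checks = {
    # Rule 1: the business key may never be null.
    "account_id_not_null":
        df.filter(F.col("account_id").isNull()).count() == 0,
    # Rule 2: exactly one row per (account_id, position_date) grain.
    "grain_is_unique":
        df.groupBy("account_id", "position_date").count()
          .filter(F.col("count") > 1).count() == 0,
    # Rule 3: totals reconcile to the source control figure.
    "market_value_reconciles":
        abs(actual_market_value - expected_market_value) < 0.01,
}

failed = [name for name, passed in checks.items() if not passed]
if failed:
    # A production framework would quarantine the batch, alert, and
    # record lineage/metrics rather than simply raising.
    raise ValueError(f"Data quality checks failed: {failed}")
```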
What we value
- Strong data engineering programming skills in Python, Java, and SQL.
- Strong Spark programming skills, with hands-on knowledge of optimization and debugging.
- Hands-on experience with Snowflake and Databricks as data platforms.
- Working knowledge of open table formats such as Apache Iceberg, catalogs (Polaris, Horizon, or Unity), or metadata frameworks.
- Experience building production-grade services in cloud environments; AWS and/or Azure preferred.
- Experience with structured and unstructured ingestion, schema evolution, and data modeling.
- Strong debugging and performance-tuning skills for data pipelines.
- Experience designing curated analytical/semantic data models for consumption (BI/metrics layers and/or serving models), including governance, documentation, and change management.
- Working knowledge of building data products for consumption via APIs and BI endpoints, including interface contracts, performance considerations, and access controls (a hypothetical contract sketch follows this list).
- Experience with Master Data Management (MDM), Data Quality frameworks, and other data management tools and technologies to support data consistency, governance, and reliability across enterprise platforms.
- Prior experience using GitHub Copilot, Claude Code Assistant, or Genie Code Assistant is preferred.
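
As a sketch of what a data-product interface contract of the kind mentioned above could look like, the following is a hypothetical, minimal definition; every field, the semantic-versioning rule, and the compatibility check are assumptions, not a standard format.

```python
# Hypothetical sketch of a versioned data-product contract with a
# backward-compatibility check. All fields are illustrative.
from dataclasses import dataclass

@dataclass(frozen=True)
class DataProductContract:
    name: str
    version: str               # semantic version; breaking changes bump MAJOR
    schema: dict               # column name -> logical type
    freshness_sla_hours: int   # maximum staleness consumers must tolerate
    owner: str

holdings_v1 = DataProductContract(
    name="curated.holdings",
    version="1.2.0",
    schema={
        "account_id": "string",
        "position_date": "date",
        "market_value": "decimal(18,2)",
    },
    freshness_sla_hours=24,
    owner="data-products@example.com",  # illustrative owner
)

def is_backward_compatible(old: DataProductContract,
                           new: DataProductContract) -> bool:
    """Consumer-safe evolution: a new version may add columns but must
    keep every existing column with its type unchanged."""
    return all(new.schema.get(col) == typ for col, typ in old.schema.items())
```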