Lead Data Engineer- Python / Spark / Data Lake
Join us as we embark on a journey of collaboration and innovation, where your unique skills and talents will be valued and celebrated. Together we will create a brighter future and make a meaningful difference.
As a Lead Data Engineer- Python / Spark / Data Lake at JPMorgan Chase within the Consumer & Community Bank- Connected Commerce Technology, you play a crucial role in an agile team dedicated to improving, developing, and providing data collection, storage, access, and analytics solutions that are secure, stable, and scalable. As a key technical contributor, you are tasked with maintaining essential data pipelines and architectures across diverse technical domains within various business functions, all in support of the firm's business goals.
Job responsibilities
Generates data models for the team using firmwide tooling, linear algebra, statistics, and geometric algorithms
Delivers data collection, storage, access, and analytics data platform solutions in a secure, stable, and scalable way
Implements database back-up, recovery, and archiving strategy
Evaluates and reports on access control processes to determine effectiveness of data asset security with minimal supervision
Adds to team culture of diversity, opportunity, inclusion, and respect
Develops data strategy and enterprise data models for applications
Manages data infrastructure, including the design, construction, installation, and maintenance of large-scale processing systems and infrastructure
Drives data quality and ensures data accessibility to analysts and data scientists
Ensures compliance with data governance requirements and aligns data engineering practices with business goals
Required qualifications, capabilities, and skills
Formal training or certification on data engineering concepts and 5+ years applied experience
Experience with both relational and NoSQL databases
Experience and proficiency across the data lifecycle
Experience with database back-up, recovery, and archiving strategy
Proficient knowledge of linear algebra, statistics, and geometric algorithms
Advanced proficiency in at least one programming language such as Python, Java, or Scala
Advanced proficiency in at least one cluster computing framework such as Spark, Flink, or Storm
Advanced proficiency in at least one cloud data lakehouse platform such as AWS data lake services, Databricks, or Hadoop; at least one relational data store such as Postgres, Oracle, or similar; and at least one NoSQL data store such as Cassandra, Dynamo, MongoDB, or similar
Advanced proficiency in at least one scheduling/orchestration tool such as Airflow, AWS Step Functions, or similar
Proficiency in Unix scripting and data structures
Proficiency in data serialization formats such as JSON, Avro, Protobuf, or similar, and big-data storage formats such as Parquet, Iceberg, or similar
Proficiency in data processing methodologies such as batch, micro-batching, and streaming
Proficiency in one or more data modelling techniques such as Dimensional, Data Vault, Kimball, or Inmon
Proficiency in Agile methodology, including developing PI plans and roadmaps, TDD or BDD, and CI/CD tools
Able to coach team members on continuous improvement of the product and mentor them on optimal design and development practices
Preferred qualifications, capabilities, and skills
Proficiency in IaC tooling such as Terraform or AWS CloudFormation
Proficiency in cloud-based data pipeline technologies such as Fivetran, dbt, or Prophecy.io
Proficiency in the Snowflake data platform
Experience with budgeting, resource allocation, and vendor relationship management
- Location: New York, NY, United States
- Category: Computer and Mathematical Occupations