Lead Data Engineer- Python / Spark / Data Lake
Join us as we embark on a journey of collaboration and innovation, where your unique skills and talents will be valued and celebrated. Together we will create a brighter future and make a meaningful difference.
As a Lead Data Engineer- Python / Spark / Data Lake at JPMorgan Chase within the Consumer & Community Bank- Connected Commerce Technology, you play a crucial role in an agile team dedicated to improving, developing, and providing data collection, storage, access, and analytics solutions that are secure, stable, and scalable. As a key technical contributor, you are tasked with maintaining essential data pipelines and architectures across diverse technical domains within various business functions, all in support of the firm's business goals.
Job responsibilities
Generates data models for the team using firmwide tooling, linear algebra, statistics, and geometric algorithms
Delivers data collection, storage, access, and analytics data platform solutions in a secure, stable, and scalable way
Implements database back-up, recovery, and archiving strategy
Evaluates and reports on access control processes to determine effectiveness of data asset security with minimal supervision
Adds to team culture of diversity, opportunity, inclusion, and respect
Develops data strategy and enterprise data models for applications
Manages data infrastructure, including the design, construction, installation, and maintenance of large-scale processing systems and infrastructure
Drives data quality and ensures data accessibility to analysts and data scientists
Ensures compliance with data governance requirements and aligns data engineering practices with business goals
Required qualifications, capabilities, and skills
Formal training or certification on data engineering concepts and 5+ years applied experience
Experience with both relational and NoSQL databases
Experience and proficiency across the data lifecycle
Experience with database back-up, recovery, and archiving strategy
Proficient knowledge of linear algebra, statistics, and geometric algorithms
Advanced proficiency in at least one programming language such as Python, Java, or Scala
Advanced proficiency in at least one cluster computing framework such as Spark, Flink, or Storm
Advanced proficiency in at least one cloud data lakehouse platform such as AWS data lake services, Databricks, or Hadoop; at least one relational data store such as Postgres, Oracle, or similar; and at least one NoSQL data store such as Cassandra, Dynamo, MongoDB, or similar
Advanced proficiency in at least one scheduling/orchestration tool such as Airflow, AWS Step Functions, or similar
Proficiency in Unix scripting and data structures
Proficiency in data serialization formats such as JSON, Avro, Protobuf, or similar, and big-data storage formats such as Parquet, Iceberg, or similar
Proficiency in data processing methodologies such as batch, micro-batching, and streaming
Proficiency in one or more data modelling techniques such as Dimensional, Data Vault, Kimball, or Inmon
Proficiency in Agile methodology, including developing PI plans and roadmaps, TDD or BDD, and CI/CD tools
Able to coach team members on continuous improvement of the product and mentor them on optimal design and development practices
Preferred qualifications, capabilities, and skills
Proficiency in IaC tooling such as Terraform or AWS CloudFormation
Proficiency in cloud-based data pipeline technologies such as Fivetran, dbt, or Prophecy.io
Proficiency in the Snowflake data platform
Experience with budgeting, resource allocation, and vendor relationship management
- Location: New York, NY, United States
- Category: Computer and Mathematical Occupations