Job Description
Data Pipeline Development: Design and develop scalable data pipelines using Apache Spark, ensuring efficient data processing and transformation (see the PySpark sketch after this list).
ETL Processes: Create and manage ETL (Extract, Transform, Load) workflows to integrate data from multiple sources into our data systems.
Data Integration: Use Talend for data integration tasks, including data cleansing, transformation, and loading.
Workflow Automation: Implement and manage data workflows using Apache Airflow to automate and optimize data processing tasks (see the Airflow DAG sketch after this list).
Programming: Write Python scripts for data manipulation, transformation, and analysis.
Database Management: Develop and optimize SQL queries for efficient data retrieval and analysis in relational databases.
Collaboration: Work closely with data scientists, analysts, and other stakeholders to understand data needs and deliver effective solutions.
Data Quality: Ensure data accuracy, consistency, and quality through rigorous validation and monitoring processes.
Documentation: Create and maintain detailed documentation for data processes, pipelines, and integration workflows.
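To give a concrete sense of the Data Pipeline Development and ETL work described above, below is a minimal PySpark sketch. The source path, column names, and cleansing rules are illustrative assumptions, not a prescribed implementation.

```python
# Minimal PySpark ETL sketch; paths, schema, and rules are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Extract: read raw order events from a (hypothetical) landing zone.
raw = spark.read.json("s3://example-bucket/landing/orders/")

# Transform: deduplicate, cleanse, and aggregate before loading.
orders = (
    raw.dropDuplicates(["order_id"])
       .filter(F.col("amount").isNotNull())
       .withColumn("order_date", F.to_date("created_at"))
       .groupBy("order_date", "region")
       .agg(F.sum("amount").alias("daily_revenue"))
)

# Load: write the curated table to the lake, partitioned by date.
orders.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-bucket/curated/daily_revenue/"
)

spark.stop()
```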
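Likewise, the Workflow Automation responsibility typically means wiring steps like the above into a scheduled Airflow DAG. The sketch below is only an illustration: the DAG id, schedule, and placeholder callables are assumptions, and exact parameter names vary by Airflow version.

```python
# Illustrative Airflow DAG; task names, schedule, and callables are assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    # Pull data from source systems (placeholder).
    pass

def transform():
    # Cleanse and reshape the extracted data (placeholder).
    pass

def load():
    # Load the curated data into the warehouse (placeholder).
    pass

with DAG(
    dag_id="daily_orders_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Enforce extract -> transform -> load ordering.
    extract_task >> transform_task >> load_task
```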
Required Skills and Qualifications:
Proficiency in Apache Spark for large-scale data processing and analytics.
Experience with ETL processes and tools, including Talend.
Familiarity with Apache Airflow for managing data workflows and automation.
Strong programming skills in Python for data engineering tasks.
Advanced SQL skills for querying and managing relational databases (illustrated in the sketch below).
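As a small illustration of the SQL work this role involves, the snippet below runs an aggregation query against an in-memory SQLite database. The table, columns, and sample rows are stand-ins for a production warehouse.

```python
# Hedged SQL example using SQLite as a stand-in for a relational warehouse.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL, order_date TEXT);
    INSERT INTO orders VALUES
        (1, 'EMEA', 120.0, '2024-01-01'),
        (2, 'EMEA',  80.0, '2024-01-01'),
        (3, 'APAC', 200.0, '2024-01-02');
    """
)

# Aggregate revenue per region and day, the kind of query the role would write and tune.
query = """
    SELECT region, order_date, SUM(amount) AS revenue, COUNT(*) AS order_count
    FROM orders
    GROUP BY region, order_date
    ORDER BY revenue DESC;
"""
for row in conn.execute(query):
    print(row)

conn.close()
```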