Lead Data Engineer

Kachiguda, Hyderabad

3 years 10 months

Azure Data Factory, Azure Data Lake, Azure IoT Hub, Azure Databricks, Apache Spark, PySpark, Python, SQL, Data Ingestion, ETL Testing, Data Quality, Data Lineage, CI/CD, Version Control, Infrastructure as Code, Data Pipelines, Spark Structured Streaming, Kafka

Job Description:

Purpose:

We are seeking a hands-on Data Engineer with a strong focus on data ingestion to support the delivery of high-quality, reliable, and scalable data pipelines across our Data & AI ecosystem. This role is essential in enabling downstream analytics, machine learning, and business intelligence solutions by ensuring robust and automated data acquisition from various internal and external sources.

________________________________________

Key Responsibilities:

• Design, build, and maintain scalable and reusable data ingestion pipelines to onboard structured and semi-structured data from APIs, flat files, databases, and external systems.

• Work with Azure-native services (e.g., Data Factory, Azure Data Lake, Event Hubs) and tools like Databricks or Apache Spark for data ingestion and transformation.

• Develop and manage metadata-driven ingestion frameworks to support dynamic and automated onboarding of new sources.

• Collaborate closely with source system owners, analysts, and data stewards to define data ingestion specifications and implement monitoring/alerting on ingestion jobs.

• Ensure data quality, lineage, and governance principles are embedded into ingestion processes.

• Optimize ingestion processes for performance, reliability, and cloud cost efficiency.

• Support batch and real-time ingestion needs, including streaming data pipelines where applicable.

________________________________________

Key Qualifications:

• 3+ years of hands-on experience in data engineering; a specific focus on data ingestion or integration is a bonus.

• Hands-on experience with Azure Data Services (e.g., ADF, Databricks, Synapse, ADLS) or equivalent cloud-native tools.

• Experience with Python (PySpark) for data processing tasks; SQL knowledge is a bonus.

• Experience with ETL frameworks, orchestration tools, and API-based data ingestion.

• Familiarity with data quality and validation strategies, including schema enforcement and error handling.

• Good understanding of CI/CD practices, version control (e.g., Git), and infrastructure as code (e.g., Terraform).

• Bonus: Experience with streaming ingestion (e.g., Kafka, Event Hubs, Spark Structured Streaming).