About the Role:
We are seeking a highly skilled Senior Data Engineer with 5-6 years of experience to join our dynamic team. The ideal candidate will have a strong background in data engineering, with expertise in data warehouse architecture, data modeling, ETL processes, and building both batch and streaming pipelines, as well as advanced proficiency in Spark, Databricks, Kafka, Python, SQL, and Change Data Capture (CDC) methodologies.
Key responsibilities:
- Design, develop, and maintain robust data warehouse solutions to support the organization's analytical and reporting needs.
- Implement efficient data modeling techniques to optimize performance and scalability of data systems.
- Build and manage data lakehouse infrastructure, ensuring reliability, availability, and security of data assets.
- Develop and maintain ETL pipelines to ingest, transform, and load data from various sources into the data warehouse and data lakehouse.
- Utilize Spark and Databricks to process large-scale datasets efficiently and in real time.
- Use Kafka to build real-time streaming pipelines, ensuring data consistency and reliability.
- Design and develop batch pipelines for scheduled data processing tasks.
- Collaborate with cross-functional teams to gather requirements, understand data needs, and deliver effective data solutions.
- Perform data analysis and troubleshooting to identify and resolve data quality issues and performance bottlenecks.
- Stay updated with the latest technologies and industry trends in data engineering and contribute to continuous improvement initiatives.
Preferred qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field.
- 5-6 years of hands-on experience in data engineering roles, with a focus on building and maintaining data warehouse solutions.
- Strong proficiency in data modeling concepts and experience with relational and dimensional data models.
- Extensive experience with ETL processes and tools for data ingestion, transformation, and loading.
- In-depth knowledge of Spark and Databricks for large-scale data processing and analytics.
- Proficiency in Kafka for building real-time streaming pipelines and event-driven architectures.
- Solid programming skills in Python for data manipulation, scripting, and automation tasks.
- Expertise in SQL for querying and analyzing data from relational databases and data warehouses.
- Experience with Change Data Capture (CDC) techniques and tools for capturing and propagating data changes.
- Excellent problem-solving skills and the ability to troubleshoot complex data engineering issues.
- Strong communication and collaboration skills, with the ability to work effectively in a team environment.