**About the Job:**
Join zorba ai in Pune, Maharashtra, as a Senior PySpark Data Engineer! You will collaborate with engineering, data science, and product teams to design, build, and optimize scalable data pipelines for both batch and streaming systems. Your work will be crucial in delivering high-performance data products that drive analytics, machine learning, personalization, and real-time business operations. This role involves modernizing data platforms, enhancing reliability, and maintaining rigorous data quality standards.
**Key Responsibilities:**
- Design, develop, and maintain scalable data pipelines for ingestion, transformation, and integration.
- Build and optimize batch data processing workflows using PySpark and SQL.
- Support and enhance real-time/streaming pipelines (e.g., Kafka).
- Improve pipeline performance, scalability, and cost efficiency.
- Implement automated data quality checks and validation frameworks.
- Create and review architectural designs aligned with engineering standards.
- Collaborate with stakeholders to deliver production-ready solutions.
- Monitor, troubleshoot, and resolve data pipeline issues.
**Required Skills:**
- **Data Processing:** PySpark, SQL, Spark architecture, performance tuning.
- **Cloud Platforms:** Databricks, Microsoft Azure.
- **Version Control:** Git, GitHub.
- **Collaboration Tools:** JIRA, Confluence.
- **Programming:** Python (preferred).
- **Streaming:** Kafka or similar (nice to have).
**Preferred Qualifications:**
- Strong understanding of distributed systems and modern data architecture.
- Experience with data modeling and scalable data design.
- Ability to write clean, maintainable, and testable code.
- Hands-on experience with data quality frameworks and testing.
- Proven ability to troubleshoot complex data issues.
- Experience in Agile/Scrum environments.
- Strong communication skills.