Data engineering is the foundation for building analytics and data science applications in the new big data world. It requires combining multiple big data technologies to construct data pipelines and networks that stream, process, and store data. This course focuses on building full-fledged solutions that combine Apache Spark with other big data tools to create end-to-end data pipelines. Instructor Kumaran Ponnambalam begins by defining data engineering, its functions, and its concepts. Next, Kumaran goes over how Spark capabilities such as parallel processing, execution plans, state management options, and machine learning work with extract, transform, load (ETL). He introduces you to batch processing use cases and processes, as well as real-time processing pipelines. After walking you through several useful best practices, Kumaran concludes with an end-to-end exercise project.