PySpark for Data Engineers Full Course 2026 | Basics to Advanced
YouTube transcript, YouTube translate
In this master class, I'm not just going to explain how to type in PySpark code and get output from it. We're going from scratch, from the core basics, all the way to how Spark actually processes data behind the scenes. These concepts are exactly what a real enterprise-grade data engineer has to know. All the resources I use in this master class, the files, the code, everything, will be given to you from the GitHub link down below in the description. Without further ado, let's begin.

Before we step into the first topic, let me tell you how I've structured this master class. It's organized into multiple phases, and each phase has multiple topics compressed into it. For every topic, I'll explain the concept first, show you the conceptual diagram if there is one for that topic, and also demonstrate the lab if there is one. That's how this master class goes, so you get the practical experience.

What I did here is this: for each of the tuples in the list, the first value in the tuple is assigned to the id column of the DataFrame, the second value to the name column, and the third value to the signup_date column. I basically want to create a list of customers, with a unique ID for every customer and a random signup date for them.

Now I'm going to write this DataFrame out as a CSV file and store it in an external location, where I'll then write a read expression using Spark to load the data back in. Don't worry about what I'm doing in this expression right now. I'm just creating the CSV file, because we have a separate topic where we talk about the Spark DataFrameWriter API. For now we'll skip those details; all we're doing is converting that DataFrame into a CSV.