Course Description

Welcome to the Building Big Data Pipelines with PySpark, MongoDB, and Bokeh course. In this course we will build an intelligent data pipeline using big data technologies such as Apache Spark and MongoDB.

We will be building an ETLP pipeline, where ETLP stands for Extract, Transform, Load, and Predict. These are the stages our data has to pass through on its way to becoming useful. Once the data has gone through this pipeline, we will be able to use it to build reports and dashboards for data analysis.

The data pipeline we build will comprise data processing with PySpark, predictive modelling with Spark's MLlib machine learning library, and data analysis with MongoDB and Bokeh.
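
To illustrate the ETLP idea, here is a minimal, library-free Python sketch of the four stages chained together. The sample records and the trivial threshold "model" are hypothetical stand-ins; in the course itself, Extract/Transform/Load are done with PySpark and MongoDB, and Predict with Spark MLlib.

```python
# A toy ETLP pipeline: each stage is a plain function, chained in order.
# This only shows the stage structure, not the real PySpark implementation.

def extract():
    # Stand-in for reading raw earthquake-style records from a file or API.
    return [{"place": "A", "mag": "5.1"}, {"place": "B", "mag": None}]

def transform(rows):
    # Clean: drop incomplete rows and cast magnitudes to float.
    return [{**r, "mag": float(r["mag"])} for r in rows if r["mag"] is not None]

def load(rows, store):
    # Stand-in for writing the cleaned rows to a MongoDB collection.
    store.extend(rows)
    return store

def predict(rows):
    # Trivial stand-in "model": flag quakes above a magnitude threshold.
    return [{**r, "significant": r["mag"] >= 5.0} for r in rows]

store = []
results = predict(load(transform(extract()), store))
print(results)  # [{'place': 'A', 'mag': 5.1, 'significant': True}]
```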

Course curriculum

  1. Section 1
    • Introduction to the course

  2. Section 2
    • Python Installation

    • Python Libraries Installation

    • Apache Spark Installation

    • Java Installation

    • Testing Spark

    • MongoDB Installation

    • NoSQL Booster Installation

  3. Section 3
    • Integration of PySpark with Jupyter Notebook

    • Data Extraction

    • Data Transformation

    • Data Loading in MongoDB

  4. Section 4
    • Data Pre-Processing and Preparation

    • Building the Machine Learning Model

    • Prediction Dataset Creation

  5. Section 5
    • Preparation Steps

    • Map Plot Creation

    • Bar Chart

    • Maximum and Average Magnitude Plot

    • Grid Plot for Web Browser Visualization

  6. Section 6
    • Visual Studio Code Installation

    • Building the Spark ETL Pipeline Script

    • Building the Spark ML Pipeline Script

    • Dashboard Server Configuration

  7. Section 7
    • PySpark Data Pipeline

    • PySpark Quakes
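
As a taste of the integration step in Section 3 above, one common way to launch PySpark inside Jupyter Notebook is to point the Spark driver at Jupyter via environment variables. This is a hedged sketch of that configuration; the exact values assume a typical local install and may differ on your machine.

```python
import os

# Tell PySpark to use Jupyter Notebook as its driver front end, so that
# launching `pyspark` opens a notebook with the `spark` session preconfigured.
os.environ["PYSPARK_DRIVER_PYTHON"] = "jupyter"
os.environ["PYSPARK_DRIVER_PYTHON_OPTS"] = "notebook"
```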

Pricing - Lifetime Access

What will you learn?

  • How to create data processing pipelines using PySpark.

  • Machine learning with geospatial data using the Spark MLlib library.

  • Data analysis using PySpark, MongoDB, and Bokeh inside a Jupyter notebook.

  • How to manipulate, clean, and transform data using PySpark DataFrames.

  • Basic geo mapping.

  • How to create dashboards.

  • How to create a lightweight server to serve Bokeh dashboards.
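
For example, the "Maximum and Average Magnitude Plot" from the curriculum boils down to a group-and-aggregate step before any Bokeh call. Below is a minimal pure-Python sketch of that aggregation; the field names and sample values are hypothetical, and in the course this computation is done with PySpark and rendered with Bokeh.

```python
from collections import defaultdict

def magnitude_stats(quakes):
    """Per-year maximum and average magnitude, ready to feed a bar chart."""
    by_year = defaultdict(list)
    for q in quakes:
        by_year[q["year"]].append(q["mag"])
    return {
        year: {"max": max(mags), "avg": sum(mags) / len(mags)}
        for year, mags in by_year.items()
    }

sample = [
    {"year": 2016, "mag": 5.0},
    {"year": 2016, "mag": 7.0},
    {"year": 2017, "mag": 6.0},
]
print(magnitude_stats(sample))
# {2016: {'max': 7.0, 'avg': 6.0}, 2017: {'max': 6.0, 'avg': 6.0}}
```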

GEO Premium

Access our ENTIRE content instantly with a subscription

Student profile

  • Undergraduate students

  • Master's students and PhD candidates

  • Researchers and Academics

  • Professionals and Companies

Some more information

  • Certificates of Completion

    After you successfully finish the course, you can claim your Certificate of Completion at NO extra cost! You can add it to your CV, LinkedIn profile, etc.

  • Available at any time! Study whenever suits you best

    We know how hard it is to acquire new skills. All our courses are self-paced.

  • Online and always accessible

    Even after you finish the course and receive your certificate, you will still have access to the course contents! Every time an instructor makes an update, you will be notified and able to watch it for FREE.

About your Instructor

Data Engineer and business intelligence consultant with a BSc in Computer Science and around 5 years of experience in IT. I have been involved in multiple projects ranging from business intelligence and software engineering to IoT and big data analytics. My expertise is in building data processing pipelines in the Hadoop and cloud ecosystems, as well as software development. My career started as an embedded software engineer writing firmware for integrated microchips; I then moved on to become an ERDAS APOLLO developer at Geo Data Design, a Hexagon Geospatial partner. I am now a consultant at one of the top business intelligence consultancies, helping clients build data warehouses, data lakes, cloud data processing pipelines, and machine learning pipelines. The technologies I use to meet client requirements range from Hadoop, Amazon S3, Python, Django, and Apache Spark to MSBI, Microsoft Azure, SQL Server Data Tools, Talend, and Elastic MapReduce.

Edwin Bomela

Data Engineer and business intelligence consultant