Course Description

Welcome to the Big Data Analytics with PySpark + Power BI + MongoDB course. In this course we will build a big data analytics pipeline using technologies such as PySpark, Spark MLlib, Power BI and MongoDB.

We will work with earthquake data, which we will transform into summary tables. We will then use these tables to train predictive models that forecast future earthquakes, and finally analyze the results by building reports and dashboards in Power BI Desktop.
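
As a taste of the pipeline, here is a minimal PySpark sketch of building such a summary table. It is an illustration only: the file name and the Date/Magnitude column names are assumptions, and the actual dataset and transformations are developed step by step in the course.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("EarthquakeETL").getOrCreate()

    # Hypothetical input file; the course extracts the real earthquake dataset.
    df = spark.read.csv("earthquakes.csv", header=True, inferSchema=True)

    # Roll the raw events up into a per-year summary table.
    summary = (
        df.withColumn("Year", F.year(F.to_date("Date", "MM/dd/yyyy")))
          .groupBy("Year")
          .agg(F.max("Magnitude").alias("Max_Magnitude"),
               F.avg("Magnitude").alias("Avg_Magnitude"))
    )
    summary.show(5)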

Power BI Desktop is a powerful data visualization tool that lets you build advanced queries, models and reports. With Power BI Desktop, you can connect to multiple data sources and combine them into a data model. This data model lets you build visuals and dashboards that you can share as reports with other people in your organization.

MongoDB is a document-oriented NoSQL database used for high-volume data storage. It stores data in flexible, JSON-like records called documents rather than in row/column tables. The document model maps naturally to the objects in your application code, making the data easy to work with.
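
For example, here is a minimal PyMongo sketch showing how a document is just a JSON-like record, a plain dict in Python. The connection string, database, collection and field names are all hypothetical.

    from pymongo import MongoClient

    # Connect to a local MongoDB instance (hypothetical connection string).
    client = MongoClient("mongodb://localhost:27017/")
    collection = client["quakes_db"]["quakes"]

    # A document is a JSON-like record -- in Python, a plain dict.
    collection.insert_one({
        "place": "Southern California",
        "magnitude": 4.2,
        "coordinates": {"latitude": 34.05, "longitude": -118.25},
    })

    # Query it back in the same shape your application code uses.
    print(collection.find_one({"magnitude": {"$gte": 4.0}}))

In the course, records like these are written from PySpark and later read into Power BI through the MongoDB ODBC driver.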

Course curriculum

  Section 01
  Section 02
  Section 03
    • Integrating PySpark with Jupyter Notebook

    • Extracting the Data Used in this Course

    • Transforming the Data

    • Loading the Data in MongoDB

  Section 04
    • Data Pre-Processing and Preparation

    • Building the Machine Learning Model

    • Creating the Prediction Dataset

  Section 05
    • Installing Visual Studio Code

    • Building the ETL Pipeline Script

    • Building the ML Pipeline Script

  Section 06
    • Installing Power BI Desktop

    • Installing MongoDB ODBC Drivers

    • Creating System DSN for MongoDB

    • Loading Data into Power BI

    • Visualizing an Earthquake Prediction Map

    • Creating Table Plots

    • Plotting Maximum and Average Magnitude Values

    • Creating Bar Chart of Earthquake Occurrence

    • Creating Doughnut Charts

  Section 07
    • PySpark Data Pipeline

    • PySpark Power BI

    • Commands

Pricing - Lifetime Access

What will you learn?

  • How to create big data processing pipelines using PySpark.

  • Machine learning with geospatial data using the Spark MLlib library (see the sketch after this list).

  • Data analysis using PySpark, MongoDB and Power BI.

  • How to manipulate, clean and transform data using PySpark dataframes.

  • How to create geo maps using ArcGIS Maps for Power BI.

  • How to create dashboards in Power BI.
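
To give a flavour of the MLlib part, here is a minimal sketch of training a regression model on geospatial features. The column names are assumptions and the random forest is used purely for illustration; the model actually built in the course may differ.

    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import RandomForestRegressor

    spark = SparkSession.builder.appName("EarthquakeML").getOrCreate()
    df = spark.read.csv("earthquakes.csv", header=True, inferSchema=True)

    # Combine the geospatial columns into a single feature vector.
    assembler = VectorAssembler(inputCols=["Latitude", "Longitude", "Depth"],
                                outputCol="features")
    features_df = assembler.transform(df)

    # Train a model that predicts magnitude from location and depth.
    rf = RandomForestRegressor(featuresCol="features", labelCol="Magnitude")
    model = rf.fit(features_df)
    model.transform(features_df).select("Latitude", "Longitude", "prediction").show(5)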

GEO Premium

Access our ENTIRE content library instantly with a subscription

Student profile

  • Data Scientists at any level

  • GIS Developers at any level

  • Machine Learning engineers at any level

  • Undergraduate students

  • Master's students and PhD candidates

  • Researchers and Academics

  • Professionals and Companies

Some more information

  • Certificates of Completion

    After you successfully finish the course, you can claim your Certificate of Completion at NO extra cost! You can add it to your CV, LinkedIn profile, etc.

  • Available at any time! Study whenever suits you best

    We know how hard it is to acquire new skills. All our courses are self-paced.

  • Online and always accessible

    Even after you finish the course and receive your certificate, you will still have access to the course contents! Every time an instructor makes an update, you will be notified and able to watch it for FREE.

About your Instructor

Data Engineer and business intelligence consultant with a BSc in Computer Science and around 5 years of experience in IT. I have been involved in multiple projects ranging from business intelligence and software engineering to IoT and big data analytics. My expertise is in building data processing pipelines in the Hadoop and cloud ecosystems, and in software development. My career started as an embedded software engineer writing firmware for integrated microchips; I then moved on to become an ERDAS APOLLO developer at Geo Data Design, a Hexagon Geospatial partner. I am now a consultant at one of the top business intelligence consultancies, helping clients build data warehouses, data lakes, cloud data processing pipelines and machine learning pipelines. The technologies I use to meet client requirements include Hadoop, Amazon S3, Python, Django, Apache Spark, MSBI, Microsoft Azure, SQL Server Data Tools, Talend and Elastic MapReduce.

Edwin Bomela

Data Engineer and business intelligence consultant