Course Description

Welcome to the Big Data Analytics with PySpark + Tableau Desktop + MongoDB course. In this course we will build a big data analytics solution using PySpark for ETL, MLlib for machine learning, and Tableau for data visualization and dashboards.

We will work with earthquake data, which we will transform into summary tables. We will use these tables to train predictive models and predict future earthquakes, and then analyze the data by building reports and dashboards in Tableau Desktop.
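To make the idea of a summary table concrete, here is a minimal sketch in plain Python. The course itself does this step with PySpark; the standard library is used here only so the example runs anywhere. The sample rows, the field names (`year`, `magnitude`), the choice to group by year, and the helper `summarize_by_year` are all hypothetical and not taken from the course material.

```python
from collections import defaultdict

# Hypothetical sample rows; the real course dataset and its columns may differ.
records = [
    {"year": 2015, "magnitude": 5.5},
    {"year": 2015, "magnitude": 6.5},
    {"year": 2016, "magnitude": 7.0},
]


def summarize_by_year(rows):
    """Group rows by year and compute count, maximum and average magnitude."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["year"]].append(row["magnitude"])
    return {
        year: {
            "count": len(mags),
            "max_magnitude": max(mags),
            "avg_magnitude": sum(mags) / len(mags),
        }
        for year, mags in sorted(grouped.items())
    }


summary = summarize_by_year(records)
for year, stats in summary.items():
    print(year, stats)
```

A table of this shape (one row per group, with a count, a maximum and an average) is the kind of input that a bar chart of occurrences or a plot of maximum and average magnitude can be built from.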

Tableau Desktop is a powerful data visualization tool used for big data analysis and visualization. It supports data blending, real-time analysis, and collaboration on data. No programming is needed for Tableau Desktop, which makes it an easy yet powerful tool for creating dashboards, apps, and reports.

MongoDB is a document-oriented NoSQL database used for high-volume data storage. It stores data in a JSON-like format called documents and does not use row/column tables. The document model maps to the objects in your application code, making the data easy to work with.
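As a rough illustration of how a document maps to an object in application code, the sketch below represents one earthquake record as a Python dictionary and round-trips it through JSON text. It uses only the standard library and does not connect to MongoDB; the field names and values are hypothetical and not taken from the course dataset.

```python
import json

# Hypothetical earthquake record; field names and values are illustrative only.
quake = {
    "place": "Example Region",
    "magnitude": 5.8,
    "location": {"latitude": 38.3, "longitude": 142.4},  # nested sub-document
    "tags": ["example", "earthquake"],                    # array field
}

# Serialize the object to JSON text and read it back.
document = json.dumps(quake)
restored = json.loads(document)

print(restored["location"]["latitude"])
```

Nested fields and arrays stay inside the single record, so no separate row/column tables or joins are needed to rebuild the object.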

Course curriculum

  1. 01
  2. 02
    • Python Installation

    • Apache Spark Installation

    • Java Installation

    • Testing Spark

    • MongoDB Installation

    • NoSQL Booster Installation

  3. 03
    • Integration of PySpark with Jupyter Notebook

    • Dataset Extraction

    • Dataset Transformation and Cleaning

    • Data Loading in MongoDB

  4. 04
    • Data Pre-Processing and Preparation

    • Building the Machine Learning Model

    • Prediction Output

  5. 05
    • Visual Studio Code Installation

    • Building the ETL Pipeline Script

    • Building the ML Pipeline Script

  6. 06
    • Tableau Desktop Trial Installation

    • Mongo ODBC Drivers Installation

    • System DSN Creation for MongoDB

    • Data Source Import into Tableau

    • Earthquake Prediction Map Visualization

    • Bar Chart Visualization of Earthquake Occurrence

    • Plot of Maximum and Average Magnitude Values

    • Table Visualization of Earthquake Type

    • Dashboard Analytics Creation

  7. 07
    • Commands

    • PySpark Data Pipeline

    • PySpark Tableau

Pricing - Lifetime Access

What will you learn?

  • How to create data processing pipelines using PySpark.

  • Machine learning with geospatial data using the Spark MLlib library.

  • Data analysis using PySpark, MongoDB and Tableau.

  • How to manipulate, clean and transform data using PySpark dataframes.

  • How to create Geo Maps in Tableau Desktop.

  • How to create dashboards in Tableau Desktop.

GEO Premium

Access our ENTIRE content instantly with a subscription

Student profile?

  • Undergraduate students

  • Master's students and PhD candidates

  • Researchers and Academics

  • Professionals and Companies

  • Data Engineers at any level

  • Python Developers at any level

  • GIS Developers at any level

Some more information

  • Certificates of Completion

    After you successfully finish the course, you can claim your Certificate of Completion at no extra cost! You can add it to your CV, LinkedIn profile, etc.

  • Available at any time! Study at your best time

    We know how hard it is to acquire new skills. All our courses are self-paced.

  • Online and always accessible

    Even after you finish the course and get your certificate, you will still have access to the course contents! Every time an instructor makes an update, you will be notified and be able to watch it for free.

About your Instructor

Data Engineer and business intelligence consultant with a BSc in computer science and around 5 years of experience in IT. I have been involved in multiple projects spanning business intelligence, software engineering, IoT, and big data analytics. My expertise is in building data processing pipelines in the Hadoop and cloud ecosystems and in software development. My career started as an embedded software engineer writing firmware for integrated microchips; I then moved on to work as an ERDAS APOLLO developer at Geo Data Design, a Hexagon Geospatial partner. I am now a consultant at one of the top business intelligence consultancies, helping clients build data warehouses, data lakes, cloud data processing pipelines, and machine learning pipelines. The technologies I use to meet client requirements include Hadoop, Amazon S3, Python, Django, Apache Spark, MSBI, Microsoft Azure, SQL Server Data Tools, Talend, and Elastic MapReduce.

Edwin Bomela

Data Engineer and business intelligence consultant