Building Big Data Pipelines with PySpark & MongoDB & Bokeh
Build an intelligent data pipeline using the big data technologies Apache Spark and MongoDB.
Introduction to the course
Python Installation
Python Libraries Installation
Apache Spark Installation
Java Installation
Testing Spark
MongoDB Installation
NoSQL Booster Installation
Integration of PySpark with Jupyter Notebook
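One common way to wire PySpark into Jupyter is through environment variables; a sketch, where the `SPARK_HOME` path is an assumption to adjust for your install:

```shell
# Point at the unpacked Spark distribution (path is an assumption).
export SPARK_HOME=/opt/spark
export PATH="$SPARK_HOME/bin:$PATH"

# Tell pyspark to launch Jupyter Notebook as its driver front end.
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"

pyspark   # opens Jupyter; the shell pre-creates `spark` and `sc`

# Alternative: `pip install findspark`, then call findspark.init()
# at the top of a notebook before importing pyspark.
```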
Data Extraction
Data Transformation
Data Loading in MongoDB
Data Pre-Processing and Preparation
Building the Machine Learning Model
Prediction Dataset Creation
Preparation Steps
Map Plot Creation
Bar Chart
Maximum and Average Magnitude Plot
Grid Plot for Web Browser Visualization
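The magnitude plots and grid layout above can be sketched with Bokeh; a minimal sketch with made-up yearly summary values (the real numbers would come from the aggregated quake data):

```python
from bokeh.layouts import gridplot
from bokeh.plotting import figure

# Hypothetical per-year summary values for illustration only.
years = [2015, 2016, 2017]
max_mag = [7.9, 7.8, 8.1]
avg_mag = [5.6, 5.5, 5.7]

# Line plot of the maximum magnitude per year.
p1 = figure(title="Maximum magnitude per year", width=400, height=300)
p1.line(years, max_mag, line_width=2)

# Bar chart of the average magnitude per year.
p2 = figure(title="Average magnitude per year", width=400, height=300)
p2.vbar(x=years, top=avg_mag, width=0.6)

# Arrange both figures side by side for browser display.
grid = gridplot([[p1, p2]])
```

Passing `grid` to `bokeh.plotting.show` (or `bokeh.io.save`) renders the combined layout as a single HTML page in the browser.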
Visual Studio Code Installation
Building the Spark ETL Pipeline Script
Building the Spark ML Pipeline Script
Dashboard Server Configuration
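Serving the dashboard typically uses Bokeh's built-in server; a sketch, where the script name is an assumption:

```shell
# Serve the dashboard script locally and open it in the browser.
bokeh serve --show dashboard.py

# To reach the dashboard from other machines, the websocket origin
# must be allowed explicitly (host and port are assumptions):
bokeh serve dashboard.py --port 5006 --allow-websocket-origin=localhost:5006
```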
What you will learn:
How to create data processing pipelines using PySpark.
Machine learning with geospatial data using the Spark MLlib library.
Data analysis using PySpark, MongoDB and Bokeh inside a Jupyter notebook.
How to manipulate, clean and transform data using PySpark dataframes.
Basic geo mapping.
How to create dashboards.
How to create a lightweight server to serve Bokeh dashboards.
Who this course is for:
Undergraduate students
Master's students and PhD candidates
Researchers and Academics
Professionals and Companies