Started as a research project at the University of California, Berkeley in 2009, Apache Spark is now one of the most widely used analytics engines. No wonder: it can process data on an enormous scale, supports multiple programming languages (Java, Scala, Python, R, and SQL), and runs on its own or in the cloud, as well as on platforms such as Hadoop or Kubernetes.
In this Apache Spark tutorial, I will introduce you to one of the most notable use cases of Apache Spark: machine learning. In less than two hours, we will go through every step of a machine learning project that will provide us with an accurate telecom customer churn prediction in the end. This is going to be a fully hands-on experience, so roll up your sleeves and prepare to give it your best!
Before you learn Apache Spark, you need to know that it comes with a few built-in libraries. One of them is MLlib. Put simply, it brings machine learning capabilities to Spark Core and (as you will see in this Apache Spark tutorial) does so at breathtaking speed. Because it can handle very large amounts of data, Apache Spark is well suited to machine learning: training algorithms on more data often leads to more accurate results.
Apache Spark machine learning is also a skill highly sought after by employers and headhunters: more and more companies are interested in applying machine learning solutions to business analytics, security, or customer service. Hence, this practical Apache Spark tutorial can become your first step towards a lucrative career!
I am a firm believer that the best way to learn is by doing. That’s why I haven’t included any purely theoretical lectures in this Apache Spark tutorial: you will learn everything along the way and be able to put it into practice straight away. Seeing how each feature works will help you learn Apache Spark machine learning thoroughly.
I will also provide some materials in ZIP archives. Make sure to download them at the beginning of the course, as you will not be able to continue with the project without them.
Apart from Spark itself, I will also introduce you to Databricks, a platform that simplifies handling and organizing data for Spark. It was founded by the same team that originally created Spark. In this course, I will explain how to create an account on Databricks and use its Notebook feature for writing and organizing your code.
After you finish my Apache Spark tutorial, you will have a fully functioning telecom customer churn prediction project. Take the course now and gain a much stronger grasp of machine learning and data analytics in just a few hours!
I am a Solution Architect with 12+ years of experience in the banking, telecommunications, and financial services industries, across a diverse range of roles in credit card, payments, data warehouse, and data center programmes.
As a Big Data and Cloud Architect, I work as part of the Big Data team to provide software solutions. My responsibilities include:
- Support all Hadoop-related issues
- Benchmark existing systems, analyse their challenges and bottlenecks, and propose the right solutions to eliminate them using various Big Data technologies
- Analyse and define the pros and cons of various technologies and platforms
- Define use cases, solutions and recommendations
- Define Big Data strategy
- Perform detailed analysis of business problems and technical environments