Processing Big Data with Hadoop in Azure HDInsight

Learn how to use Hadoop technologies in Microsoft Azure HDInsight to process big data in this five-week, hands-on course.

About This Course

More and more organizations are taking on the challenge of analyzing big data. This course teaches you how to use the Hadoop technologies in Microsoft Azure HDInsight to build batch processing solutions that cleanse and reshape data for analysis. In this five-week course, you’ll learn how to use technologies like Hive, Pig, Oozie, and Sqoop with Hadoop in HDInsight, and how to work with HDInsight clusters from Windows, Linux, and Mac OS X client computers.

NOTE: To complete the hands-on elements in this course, you will require an Azure subscription and a Windows, Linux, or Mac OS X client computer. You can sign up for a free Azure trial subscription (a valid credit card is required for verification, but you will not be charged for Azure services). Note that the free trial is not available in all regions. It is possible to complete the course and earn a certificate without completing the hands-on practices.

What you'll learn

In this course, you’ll learn how to:

  • Provision an HDInsight cluster.
  • Connect to an HDInsight cluster, upload data, and run MapReduce jobs.
  • Use Hive to store and process data.
  • Use custom Python user-defined functions from Hive and Pig.
  • Define and run workflows for data processing using Oozie.
  • Transfer data between HDInsight and databases using Sqoop.

Course Syllabus

Module 1: Getting Started with HDInsight
The course begins with an introduction to big data concepts and Hadoop, before examining Microsoft Azure HDInsight and the Hadoop distribution it provides. You’ll learn how to provision an HDInsight cluster, upload data to the cluster, and run MapReduce jobs that process the data.
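
As a rough illustration of the MapReduce model mentioned above, the following sketch counts words in plain Python. It is not the course's own sample and does not run on a cluster. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the grouping step only imitates what the Hadoop framework does between the map and reduce stages.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for each word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, imitating the framework's shuffle/sort stage."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all counts for one word into a total."""
    return key, sum(values)


def word_count(lines):
    """Run the three stages in-process over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

On a real cluster the map and reduce steps would run in parallel across many nodes. This single-process version only shows how the data flows between them.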

Module 2: Processing Big Data with Hive
The second week of the course is all about Hive. You’ll learn how to create Hive tables and use HiveQL to query them, before exploring some advanced Hive techniques like partitioning and indexing.

Module 3: Going Beyond Hive
In the third week of the course, you’ll learn how to use Pig to process big data, and how to extend the capabilities of Pig and Hive by using user-defined functions implemented in Python.
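
One common way to call Python from Hive is the streaming pattern: Hive writes each row to the script's standard input as a tab-separated line and reads tab-separated lines back from standard output. The sketch below assumes a hypothetical two-column input (a name and a numeric reading). The column layout and the `transform_rows` function are illustrative assumptions, not taken from the course materials.

```python
def transform_rows(lines):
    """Cleanse tab-separated rows: normalize the name, format the reading.

    Assumes each input line holds two hypothetical columns, name and value.
    Rows that are malformed or non-numeric are skipped.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # wrong column count: skip the row
        name, raw_value = fields
        try:
            value = float(raw_value)
        except ValueError:
            continue  # reading is not numeric: skip the row
        # Emit one tab-separated output row per valid input row.
        yield f"{name.strip().lower()}\t{value:.2f}"
```

Used as a streaming script, the file would end by iterating over `transform_rows(sys.stdin)` and printing each result. Keeping the logic in a generator also lets it be tested with an ordinary list of strings.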

Module 4: Building a Big Data Workflow
Week four builds on the data processing techniques covered in previous weeks, and teaches you how to build an end-to-end big data processing workflow using Oozie and Sqoop.

Final Exam
The fifth week of the course is given over to the final exam. You must achieve a score of 50% or higher to pass this course and earn a certificate.

Meet the instructor

Graeme Malcolm

Graeme has been a trainer, consultant, and author for longer than he cares to remember, specializing in SQL Server and the Microsoft data platform. He is a Microsoft Certified Solutions Expert for the SQL Server Data Platform and Business Intelligence. After years of working with Microsoft as a partner and vendor, he now works in the Microsoft Learning Experiences team as a senior content developer, where he plans and creates content for developers and data professionals who want to get the best out of Microsoft technologies.

Estimated Effort: 15-20 hours in total