This is the third course in the Data Curation Professional, SAS Academy for Data Science program. The program is required to earn your SAS data science certification. Designed for SAS data scientists, this program covers data curation techniques in SAS, including big data preparation with Hadoop. In this course, you learn about the Hadoop environment, Apache Hive, and Apache Pig, as well as various SAS methods for interacting with Hadoop.
Learn How To
Process and prepare structured and unstructured big data for analysis.
Organize data into a variety of storage formats for the Hadoop Distributed File System (HDFS).
Use Hive and Pig to query and process data in Hadoop.
Write SAS code to integrate with Hive and Pig.
Leverage the SAS DS2 procedure to process data in Hadoop.
Work with Hadoop data using the point-and-click interface of SAS Data Loader for Hadoop.
Who Should Attend
SAS data scientists
Prerequisites
Before attending this course, you should have:
Experience with SAS programming basics and data manipulation techniques.
Familiarity with SQL processing.
You can gain this experience by completing the SAS Programming 1: Essentials, SAS Programming 2: Data Manipulation Techniques, and SAS SQL 1: Essentials courses.
SAS Products Covered
Base SAS
SAS Decision Manager
SAS Data Connector to Hadoop
SAS Data Loader for Hadoop
Course Outline
Understanding the Hadoop Ecosystem
Key concepts about Hadoop.
Working with Hadoop Data Using Hive and HiveQL
Working with HDFS data.
Apache Hive overview.
Data Definition Language.
Hive SerDes and storage formats.
Data manipulation language.
Working with Hadoop Data Using Pig and Pig Latin
Apache Pig overview.
Anatomy of a Pig script.
Basic Pig programming.
Pig programming using functions.
Accessing HDFS and Invoking Hadoop Applications from SAS
SAS and Hadoop.
The HADOOP FILENAME statement and the HADOOP procedure.
Using the SQL Pass-Through Facility
SQL pass-through methods and syntax.
Investigating Hive metadata.
Creating SQL procedure pass-through queries.
Creating and loading Hive tables with SQL pass-through EXECUTE statements.
Handling Hive STRING data types.
Using the SAS/ACCESS LIBNAME Engine
The LIBNAME method: Hive processing and SAS processing.
Limiting rows and columns from Hive tables.
Creating views.
Combining tables.
Creating Hive tables.
Sorting and reporting on Hive tables.
SAS DS2 and Hadoop
Introduction to DS2.
Basic DS2 syntax.
Similarities to the DATA step.
Converting DATA steps to DS2 DATA programs.
DATA program structuring.
Data types.
Automatic data type conversion.
Expressions, selected functions, and methods.
User-defined packages and predefined packages.
Threads.
SAS In-Database Code Accelerator.
Working with SAS Data Loader for Hadoop
Acquiring and discovering data.
Transforming and transposing data.
Cleansing data.
Integrating data.
Delivering data.
Managing directives.
The hands-on lab is preconfigured to support this course and will not support hands-on practice for all your enrolled courses.
Hands-On Lab Reservation System
When you are planning your study time, keep in mind that the virtual lab takes 45 to 60 minutes to start.