This course shows business analysts and data scientists how to profile, integrate, cleanse, and move big data in a Hadoop environment without writing code, using an intuitive web-based interface.
Learn How To
move data in and out of Hadoop
interrogate and profile data for quality issues
transform, transpose, and join data so that it is fit for purpose
cleanse and integrate data so that it is suitable for analysis and reporting
perform master data management activities of record clustering and survivorship in Hadoop
load data into the SAS In-Memory Analytics Server for analytics and exploration
execute custom SAS and HiveQL code inside the Hadoop cluster
chain custom-built data management flows into reusable jobs.
Who Should Attend
Business users who interact with data, perform data discovery, query data, and ensure that data is in the proper place and format for other users
Data analysts, data scientists, and statisticians who review results of data discovery activities, create new tables, create new data elements, change the format or structure of data tables to view them in a variety of ways, manipulate and score data elements, and load data for use by other users
Data management specialists who apply enterprise standards to the data, ensure data quality throughout the enterprise, move data into and out of the Hadoop cluster, and optimize code running in the Hadoop cluster
Prerequisites
There are currently no prerequisites for this course.
SAS Products Covered
SAS Data Loader for Hadoop
Course Outline
Introduction to Big Data and Hadoop
big data and Hadoop
Hadoop ecosystem
SAS Data Loader Overview
SAS Data Loader capabilities and architecture
SAS Data Loader directives and tasks
steps common to most directives
preparing data for analysis and reporting
course overview and logistics
Acquiring and Discovering Data
introduction to acquiring and discovering data
copying a table into Hadoop
importing a delimited file into Hadoop
profiling data for inconsistencies
querying data for relevant columns and rows
Transforming and Transposing Data
introduction to transforming and transposing data
transforming data to be fit-for-purpose
transposing data for use in analysis and reporting
Cleansing Data
introduction to cleansing data
parsing data into meaningful subsets
standardizing data into consistent formats
using match codes to determine data similarity
using names to identify gender
analyzing data for data types
applying casing for data consistency
extracting data into useful tokens
analyzing data for inconsistent patterns
Integrating Data
introduction to integrating data
joining data in Hadoop
sorting and de-duplicating data
clustering and surviving data to determine a best record
matching and merging data into a single table
deleting rows in Hadoop tables
running user-written programs inside Hadoop
Delivering Data
introduction to delivering data from Hadoop
loading data to the SAS LASR Analytic Server for analysis and reporting
copying Hadoop data to SAS and relational database tables
Managing and Integrating Directives
introduction to managing and integrating directives
creating data flows by chaining directives
integrating directives into SAS platform applications
running directives as batch jobs
Additional Topics
SAS and Hadoop data processing
SAS DS2 programs
debugging Hadoop jobs
The hands-on lab is preconfigured to support this course and will not support hands-on practice for all your enrolled courses.
Hands-On Lab Reservation System
When you are planning your study time, keep in mind that the virtual lab takes 30-45 minutes to start.
The e-Learning and classroom materials for this course are identical. When teaching an instructor-led course, you may direct students to use either the e-Learning course for activity, demo, and practice steps or the Practices and Demonstrations documents in the Course Materials tab.