This course shows business analysts and data scientists how to profile, integrate, cleanse, and move big data in a Hadoop environment without writing code, using the intuitive web-based interface of SAS Data Loader for Hadoop.
Learn How To
- move data in and out of Hadoop
- interrogate and profile data for quality issues
- transform, transpose, and join data that is fit-for-purpose
- cleanse and integrate data suitable for analysis and reporting
- perform master data management activities of record clustering and survivorship in Hadoop
- load data into the SAS In-Memory Analytics Server for analytics and exploration
- execute custom SAS and HiveQL code inside the Hadoop cluster
- chain custom-built data management flows into re-useable jobs.
Who Should Attend
- Business users who interact with data, perform data discovery, query data, and ensure that data is in the proper place and format for other users
- Data analysts, data scientists, and statisticians who review results of data discovery activities, create new tables, create new data elements, change the format/structure of data tables to view them in a variety of ways, manipulate and score data elements, and load data for use by other users
- Data management specialists who apply enterprise standards to the data, ensure data quality throughout the enterprise, move data into and out of the Hadoop cluster, and optimize code running in the Hadoop cluster
Prerequisites
There are currently no prerequisites for this course.
SAS Products Covered
SAS Data Loader for Hadoop
Course Outline
Introduction to Big Data and Hadoop
- big data and Hadoop
- Hadoop ecosystem
SAS Data Loader Overview
- SAS Data Loader capabilities and architecture
- SAS Data Loader directives and tasks
- steps common to most directives
- preparing data for analysis and reporting
- course overview and logistics
Acquiring and Discovering Data
- introduction to acquiring and discovering data
- copying a table into Hadoop
- importing a delimited file into Hadoop
- profiling data for inconsistencies
- querying data for relevant columns and rows
Transforming and Transposing Data
- introduction to transforming and transposing data
- transforming data to be fit-for-purpose
- transposing data for use in analysis and reporting
Cleansing Data
- introduction to cleansing data
- parsing data into meaningful subsets
- standardizing data into consistent formats
- using match codes to determine data similarity
- using names to identify gender
- analyzing data for data types
- applying casing for data consistency
- extracting data in useful tokens
- analyzing data for inconsistent patterns
Integrating Data
- introduction to integrating data
- joining data in Hadoop
- sorting and de-duplicating data
- clustering and surviving data to determine a best record
- matching and merging data into a single table
- deleting rows in Hadoop tables
- running user-written programs inside Hadoop
Delivering Data
- introduction to delivering data from Hadoop
- loading data to the SAS LASR Analytic Server for analysis and reporting
- copying Hadoop data to SAS and relational database tables
Managing and Integrating Directives
- introduction to managing and integrating directives
- creating data flows by chaining directives
- integrating directives into SAS platform applications
- running directives as batch jobs
Additional Topics
- SAS and Hadoop data processing
- SAS DS2 programs
- debugging Hadoop jobs