Working with SAS® Data Loader for Hadoop
DIDL4H : DL31HD
This course teaches business analysts and data scientists to profile, integrate, cleanse, and move big data in a Hadoop environment without writing code, using the intuitive web-based interface of SAS Data Loader for Hadoop.
Learn How To
- move data in and out of Hadoop
- interrogate and profile data for quality issues
- transform, transpose, and join data so that it is fit for purpose
- cleanse and integrate data so that it is suitable for analysis and reporting
- perform master data management activities of record clustering and survivorship in Hadoop
- load data into the SAS LASR Analytic Server for analytics and exploration
- execute custom SAS and HiveQL code inside the Hadoop cluster
- chain custom-built data management flows into reusable jobs.
Who Should Attend
- business users who interact with data, perform data discovery, query data, and ensure that data is in the proper place and format for other users
- data analysts, data scientists, and statisticians who review results of data discovery activities, create new tables, create new data elements, change the format/structure of data tables to view them in a variety of ways, manipulate and score data elements, and load data for use by other users
- data management specialists who apply enterprise standards to the data, ensure data quality throughout the enterprise, move data into and out of the Hadoop cluster, and optimize code running in the Hadoop cluster.
Prerequisites
There are currently no prerequisites for this course.
SAS Products Covered
SAS Data Loader for Hadoop
Course Outline
Introduction to Big Data and Hadoop
- big data and Hadoop
- Hadoop ecosystem
- SAS Data Loader capabilities and architecture
- SAS Data Loader directives and tasks
- steps common to most directives
- preparing data for analysis and reporting
- course overview and logistics
- introduction to acquiring and discovering data
- copying a table into Hadoop
- importing a delimited file into Hadoop
- profiling data for inconsistencies
- querying data for relevant columns and rows
- introduction to transforming and transposing data
- transforming data to be fit-for-purpose
- transposing data for use in analysis and reporting
- introduction to cleansing data
- parsing data into meaningful subsets
- standardizing data into consistent formats
- using match codes to determine data similarity
- using names to identify gender
- analyzing data for data types
- applying casing for data consistency
- extracting data in useful tokens
- analyzing data for inconsistent patterns
- introduction to integrating data
- joining data in Hadoop
- sorting and de-duplicating data
- clustering and surviving data to determine a best record
- matching and merging data into a single table
- deleting rows in Hadoop tables
- running user-written programs inside Hadoop
- introduction to delivering data from Hadoop
- loading data to the SAS LASR Analytic Server for analysis and reporting
- copying Hadoop data to SAS and relational database tables
- introduction to managing and integrating directives
- creating data flows by chaining directives
- integrating directives into SAS platform applications
- running directives as batch jobs
- SAS and Hadoop data processing
- SAS DS2 programs
- debugging Hadoop jobs
Live Class Schedule
Duration: 14 hours
This course isn't publicly scheduled, but private training and mentoring may be available. Contact us to explore options.
Private Training
Get training tailored specifically for your team, led by expert SAS instructors. Choose from virtual sessions, or training at your location (or ours). Perfect for teams seeking a customized curriculum and plenty of interaction with a SAS specialist. We'll schedule it at a time that works for you.
Mentoring Services
Take your training to the next level with personalized mentoring. While private training offers structured coursework, mentoring provides hands-on, real-time support from a subject matter expert. As you work with your own data, you'll receive expert guidance to help you uncover insights, unlock the full potential of your data, and make faster progress. Perfect for those looking to apply what they’ve learned and see quicker results.