Hadoop Data Management with Hive, Pig, and SAS®
DIHPS : DIHDM
In this course, you use processing methods to prepare structured and unstructured big data for analysis. You learn to organize this data into a variety of Hadoop Distributed File System (HDFS) storage formats for processing efficiency using Apache Hive and Apache Pig. You also learn about SAS software technologies that integrate with Hive and Pig, and how to leverage these open-source capabilities by programming with Base SAS and SAS/ACCESS Interface to Hadoop.
Learn How To
- Move data in and out of the Hadoop Distributed File System (HDFS).
- Create processing-efficient Hadoop data storage formats.
- Use Hive to design a data warehouse in Hadoop.
- Perform data analysis using Hive Query Language (HiveQL).
- Join data sources using HiveQL.
- Perform extract, load, and transform (ELT) operations.
- Create and access processing-efficient Hadoop storage formats using Hive table definitions.
- Perform analysis on unstructured data using Apache Pig.
- Join massive data sets using Pig.
- Use user-defined functions (UDFs).
- Analyze big data using Pig.
- Use SAS programming to submit Hive and Pig programs that execute in Hadoop and store results in Hadoop or return results to SAS.
Who Should Attend
Data scientists, programmers, database administrators, application developers, and ETL developers who are looking for an in-depth technical overview of data management in the Hadoop ecosystem.
Prerequisites
A basic understanding of, and some experience with, UNIX and SQL are preferred. For advanced topics such as user-defined functions, prior programming experience is necessary.
SAS Products Covered
SAS Data Connector to Hadoop; Base SAS
Course Outline
Hadoop Essentials
- Hadoop architecture.
- Hadoop ecosystem.
- Apache Hive overview.
- Data definition language.
- Hive SerDes and storage formats.
- Data manipulation language.
- Apache Pig overview.
- Anatomy of a Pig script.
- Basic Pig programming.
- Pig programming using functions.
- Using Base SAS tools.
- Using SAS/ACCESS methods.
Live Class Schedule
Duration: 14 hours
Step into our live classes and experience a dynamic learning environment where you can ask questions, share ideas, and connect with your instructor and classmates. With on-demand lab hours, you can explore the material at your own pace. Our globally acclaimed instructors will motivate you to think bigger, so you can take what you've learned and achieve your biggest goals.
This course isn't publicly scheduled, but private training and mentoring may be available. Contact us to explore options.
Private Training
Get training tailored specifically for your team, led by expert SAS instructors. Choose virtual sessions or training at your location (or ours). Perfect for teams seeking a customized curriculum and plenty of interaction with a SAS specialist. We'll schedule it at a time that works for you.
Mentoring Services
Take your training to the next level with personalized mentoring. While private training offers structured coursework, mentoring provides hands-on, real-time support from a subject matter expert. As you work with your own data, you'll receive expert guidance to help you uncover insights, unlock the full potential of your data, and make faster progress. Perfect for those looking to apply what they’ve learned and see quicker results.