Feature Engineering and Data Preparation for Analytics
DMDP : DMDP42
This course introduces programming techniques for crafting and engineering meaningful inputs (features) that improve predictive modeling performance. It also provides strategies for spotting and avoiding common pitfalls that compromise the integrity of the data used to build a predictive model. The course relies heavily on SAS programming techniques to accomplish these objectives.
Learn How To
- Extract data from a relational data table structure.
- Define population qualifications and create a target sample.
- Use feature engineering techniques to transform transactional data into meaningful inputs for a predictive model.
- Transform low-, mid-, and high-cardinality categorical input variables into meaningful predictive modeling inputs.
- Use ZIP codes and latitude/longitude points to calculate great-circle distance, driving distance, and estimated driving time.
- Use Bayes' theorem to estimate meaningful predictive modeling inputs.
- Impute missing observations.
- Partition the target sample into training and validation data sets for honest assessment of the predictive model.
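As an illustration of the great-circle distance objective above, here is a minimal sketch using the haversine formula. The course itself works in SAS; this example is written in Python for illustration only, and the function name `great_circle_km` and the Earth-radius constant are assumptions, not course material. It assumes ZIP codes have already been mapped to latitude/longitude points, and it does not cover driving distance or driving time, which need road-network data.

```python
import math

# Mean Earth radius in kilometers (an assumed constant; other values are in use).
EARTH_RADIUS_KM = 6371.0088


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)

    # Haversine of the central angle between the two points.
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)

    # Clamp to guard against tiny floating-point overshoot above 1.
    a = min(1.0, max(0.0, a))
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

A typical use would be to compute the distance from each customer's ZIP-code centroid to the nearest store and feed that distance to the model as a numeric input.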
Who Should Attend
Analysts, data scientists, and IT professionals looking to craft better inputs to improve predictive modeling performance
Prerequisites
This course assumes some experience in both predictive modeling and SAS programming. Before attending this course, you should have:
- Exposure to DATA step programming equivalent to the SAS Programming 1: Essentials course.
- Exposure to programming in SQL or the SQL procedure.
- Exposure to querying data in PROC SQL and building and deploying a predictive model.
- Familiarity with the analytical process of building predictive models and scoring new data.
SAS Products Covered
Base SAS; SAS/STAT
Course Outline
Extracting Relevant Data
- Data difficulties.
- Assessing available data.
- Accessing available data.
- Drawing a representative target sample.
- Drawing an uncontaminated input sample.
- Advantages and disadvantages of transaction data.
- Common transaction structures.
- Defining the time horizon.
- Fixed and variable time horizon methods.
- Implementing common transaction transformations.
- Definitions and difficulties of nonnumeric data.
- Miscoding and multicoding detection.
- Controlling degrees of freedom.
- Geocoding.
- Exploring input variable distributions.
- Detecting data anomalies.
- Creating custom exploratory tools for candidate input variables.
- Missing value imputation.
- Data partitioning.
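The last two outline topics, missing value imputation and data partitioning, interact: for honest assessment, the fill value should be learned from the training rows only and then applied to the validation rows. The sketch below shows one way this might look. It is written in Python rather than SAS, uses a simple random split and median imputation as assumed techniques (the outline does not say which methods the course teaches), and every name in it (`partition`, `impute_median`, `valid_fraction`, the `income` field) is hypothetical.

```python
import random
import statistics


def partition(rows, valid_fraction=0.3, seed=42):
    """Randomly split rows into (training, validation) lists."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = rows[:]             # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_valid = round(len(shuffled) * valid_fraction)
    return shuffled[n_valid:], shuffled[:n_valid]


def impute_median(train, valid, key):
    """Fill missing values of `key` using the training median only."""
    observed = [r[key] for r in train if r[key] is not None]
    fill = statistics.median(observed)

    def fix(rows):
        # Keep a 0/1 indicator so the model can still see that a value was missing.
        return [
            {**r,
             key: fill if r[key] is None else r[key],
             key + "_missing": int(r[key] is None)}
            for r in rows
        ]

    return fix(train), fix(valid), fill


# Usage with made-up values: None marks a missing observation.
rows = [{"income": v} for v in [10, 20, None, 40, 50, None, 70, 80, 90, 100]]
train, valid = partition(rows)
train_filled, valid_filled, fill_value = impute_median(train, valid, "income")
```

Partitioning before imputing keeps information from the validation rows out of the fill value, which is the point of holding those rows back.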
Live Class Schedule
Duration: 21 hours
Step into our live classes and experience a dynamic learning environment where you can ask questions, share ideas, and connect with your instructor and classmates. With on-demand lab hours, you can explore the material at your own pace. Our globally acclaimed instructors will motivate you to think bigger, so you can take what you've learned and achieve your biggest goals.
This course isn't publicly scheduled, but private training and mentoring may be available. Contact us to explore options.
Private Training
Get training tailored specifically for your team, led by expert SAS instructors. Choose from virtual sessions, or training at your location (or ours). Perfect for teams seeking a customized curriculum and plenty of interaction with a SAS specialist. We'll schedule it at a time that works for you.
Mentoring Services
Take your training to the next level with personalized mentoring. While private training offers structured coursework, mentoring provides hands-on, real-time support from a subject matter expert. As you work with your own data, you'll receive expert guidance to help you uncover insights, unlock the full potential of your data, and make faster progress. Perfect for those looking to apply what they’ve learned and see quicker results.