Lesson 01

Forecasting Using Model Studio in SAS® Viya®
Lesson 01, Section 1 Practice the Demo: Create a Forecasting Project and Load the Data

Note: This practice provides the steps required to perform the tasks presented in the preceding demo video.

In this practice, you create a new project in Model Studio. The baseline sales forecasts project is used throughout the course.

  1. Sign in to SAS Viya. From SAS Drive, click the Show list of applications menu button in the upper left corner and select Build Models. This takes you to Model Studio.

    Model Studio is an integrated visual environment that provides a suite of analytic tools to facilitate end-to-end data mining, text, and forecast analysis. The tools in Model Studio are designed to take advantage of SAS Viya programming and cloud processing environments to deliver and distribute the results of the analysis, such as champion models, score code, and results.

  2. In Model Studio, click New Project. The New Project dialog box is displayed.

    Note: If this is your first session, there will be no existing projects unless projects were set up for you. (If projects already exist, the New Project button is available in the upper right corner.)

  3. Name your project baseline sales forecasts.

    Note: Giving your project a relevant name and adding a reasonably detailed description of the project are considered forecasting best practices.

  4. For Type, select Forecasting.

    There are three types: Data Mining and Machine Learning, Forecasting, and Text Analytics. This course will only deal with Forecasting.

  5. For Data Source, click Browse to select the modeling data source.

    The Browse Data dialog box is displayed. A list of data sets is displayed in the left-side Available tab. These are data sets that are available in CAS and ready for use in a Model Studio project.

    Important: You cannot import data in SAS Viya for Learners. The table is already loaded and you can skip to step 10 and continue from there.

  6. Click the Import tab, and then select Local file.

  7. Navigate to D:\Workshop\Winsas\FVVF, and select the lookingglass_forecast.sas7bdat table.

  8. Select Open.

  9. Select Import Item.

    Note: If there is a note that the table already exists, you can select the radio button for Replace file to overwrite it.

  10. Click the Available tab to view in-memory tables that are available for model building. Click on the LOOKINGGLASS_FORECAST in-memory table. The details list the column names and characteristics.

  11. Click the Profile tab and select Run Profile to produce summary statistics and other details about the columns in the data.

  12. Click OK and then Save to create the new project. The project now appears.

  13. Ensure that the project's Data tab is selected in order to assign variable roles.

    A Note on Variable Assignment:
    • Individual variables can be selected for role assignment by either clicking the variable name or by selecting the corresponding check box.
    • Individual variables are deselected after their role is assigned by either clearing their check box or selecting another variable's name.
    • More than one variable can be selected at the same time using the check boxes.
    • Because selecting a variable using the check box does not deselect others, it is easy for new users to inadvertently re-assign variable roles. Taking a few minutes to get comfortable with the variable selection functionality is considered a best practice for using the software.

  14. The Txn_Month variable is assigned to the role of Time for the project. Select Txn_Month in the middle variables list panel. In the right properties panel, you will see a table of attributes for that variable. Its natural interval has been detected as monthly.

    Note: Other time intervals are available by selecting the down arrow next to Month. The time interval combined with the Multiplier and Shift options indicates that the desired interval of the time series data is one month and that the 12-month annual cycle starts in January. These options can be changed to modify the time index if it is appropriate for your data.

  15. Sale is the target for the analysis. Click sale in the middle variables list panel. In the right properties panel, select Dependent.

    The options indicate that a monthly Sale time series will be created by summing sales within each month. Accumulation is the process of creating time series from transactional data (a short sketch after this step illustrates the idea).

    Note: Missing interpretation options enable the user to interpret, or impute, values for embedded missing values in the series. By default, embedded missing values have no value assigned to them.
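
    The following minimal sketch (Python with pandas, outside the software, using made-up transaction values) illustrates the idea of accumulation; Model Studio performs this step for you when it builds the project data.

      import pandas as pd

      # Hypothetical transactional data: one row per sales transaction
      txns = pd.DataFrame({
          "Txn_Month": pd.to_datetime(["2016-01-05", "2016-01-20", "2016-02-11"]),
          "sale": [10, 15, 8],
      })

      # Accumulation: sum the transactions within each monthly interval
      monthly_series = (txns.set_index("Txn_Month")["sale"]
                            .resample("MS")   # month-start time index
                            .sum())
      # 2016-01-01    25
      # 2016-02-01     8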

  16. Deselect sale. Assign productline and productname, in that order, to the BY Variable role. Change hierarchical reconciliation to the middle, or productline, level of the hierarchy defined by the assigned BY variables.

    The assigned BY variables define a three-level modeling hierarchy with total monthly sales at the top, productline in the middle, and productline-productname pairs at the bottom (the sketch after this step illustrates the aggregation).

    Note: The order in which the BY variables are assigned defines the modeling hierarchy, and the order can be changed using the arrows to the right of the selected BY variables.
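
    To make the three-level hierarchy concrete, here is a minimal sketch (Python with pandas, outside the software, with made-up values) of how the middle and top levels are formed by aggregating the base-level series:

      import pandas as pd

      # Hypothetical accumulated monthly sales at the base (productline x productname) level
      base = pd.DataFrame({
          "Txn_Month":   pd.to_datetime(["2016-01-01"] * 4),
          "productline": ["Line01", "Line01", "Line02", "Line02"],
          "productname": ["Product 01", "Product 02", "Product 03", "Product 04"],
          "sale": [10, 20, 30, 40],
      })

      # Middle level: aggregate product names up to their product line
      middle = base.groupby(["productline", "Txn_Month"])["sale"].sum()

      # Top level: total monthly sales across all product lines
      top = base.groupby("Txn_Month")["sale"].sum()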

  17. Additional variables will be assigned to roles. Price, discount, and cost can be useful as explanatory variables in subsequent analyses. Select these three variables (the order does not matter), change their role to Independent, and change Usage in system-generated models to Maybe.

    Note: For each of these variables, accumulation is accomplished by averaging observed values in each month.

    Note: By setting Usage in system-generated models to Maybe, you define these three variables as candidate explanatory variables for each series. If the model for a given series accommodates explanatory variables, the non-collinear combination of these three variables that results in the best overall fit is selected.

Forecasting Using Model Studio in SAS® Viya®
Lesson 01, Section 2 Practice the Demo: Load an Attributes Table to Subset the Time Series

Note: This practice provides the steps required to perform the tasks presented in the preceding demo video.

A unique and useful feature in SAS Visual Forecasting is the ability to visualize the modeling data and operate on generated forecasts outside the hierarchy defined by the project's BY variables. The hierarchical arrangement of the modeling data for this project is defined by product characteristics. However, it is routinely useful to be able to explore and operate on forecasts across facets of the data such as customer demographics or geographic regions.

In this practice, you incorporate the LG_ATTRIBUTES table into the baseline sales forecasts project and then use the variables in the table to expand the ways that the modeling data can be visualized.

  1. In SAS Drive, click the tab for Build Models and double-click the baseline sales forecasts project. This is the project that you created in the last practice.

  2. Change the data source type from Time Series to Attributes by navigating to the data sources panel, selecting the New data sources menu and then selecting Attributes. Note: A default attributes table is created when the BY variables are assigned in the project. The BY variables that define the modeling hierarchy are primary attributes for the project.

    Important: The next two steps are for importing the table. You cannot import data in SAS Viya for Learners, so the table is already loaded for you. On the Available tab, select LG_ATTRIBUTES and click OK. Then, go to step 5.

  3. Click on the Import tab to import a new data source.

  4. Select Local File and navigate to D:\Workshop\Winsas\FVVF. Select lg_attributes.sas7bdat to load the table into memory. Click on the radio button to Replace file and then click the button to Import Item. Then click OK.

  5. The in-memory table, LG_ATTRIBUTES, is now the attributes table for the project. This table contains two new attributes: a geographic indicator, Cust_Region, and a margin flag, margin_cat. The margin flag categorizes the profitability of product names as LOW, MED, or HIG (high).

  6. Switch to the Pipelines tab by selecting it.

    This first pipeline includes a Data node, Auto-forecasting, Model Comparison, and Output.

  7. Right-click the Data node and select Run.

    Note: Pipelines are structured analytic flows and are described in detail later in the course.

  8. After the Data node runs (you will see a green circle with a check mark inside), right-click the green checkmark and select Time series viewer.

    The envelope plot shows the aggregated data at the top level of the hierarchy (918 of 918 series). The colored bands illustrate one and two standard deviations around the aggregated series. The available attribute variables are listed in the left filters panel.
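
    As a rough illustration only (not the viewer's exact computation), the bands can be thought of as the cross-series spread around the aggregate at each time point. A minimal Python sketch with made-up data, assuming a mean aggregation:

      import numpy as np
      import pandas as pd

      # Hypothetical wide table: rows are months, columns are individual series
      rng = np.random.default_rng(0)
      panel = pd.DataFrame(rng.normal(100, 15, size=(48, 5)),
                           index=pd.date_range("2013-01-01", periods=48, freq="MS"))

      center = panel.mean(axis=1)   # aggregated series plotted as the center line
      spread = panel.std(axis=1)    # cross-series standard deviation at each month

      inner_band = (center - spread, center + spread)           # one standard deviation
      outer_band = (center - 2 * spread, center + 2 * spread)   # two standard deviations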

  9. You can explore time series in the middle level of the hierarchy by expanding the productline attribute (by default, it should already be expanded). To visualize demand for the Line07 product line series, select Line07 under the productline attribute.

    The plot changes on the fly to show an aggregation of the four product names contained in Line07: Product 21, Product 22, Product 23, and Product 24. Notice that the Envelope Plot changes because it is now relevant for only the four product names in Line07.

  10. Expanding the Cust_Region attribute and selecting Greater Texas plots the one product name that flows through both Line07 and the Greater Texas region.

  11. You can select Reset to remove the filters that you created based on attributes and return to the 918 series displayed.

Forecasting Using Model Studio in SAS® Viya®
Lesson 01, Section 2 Practice: Creating a Project and a Visualization

Answer the following questions based on the baseline sales forecast project and the LG_ATTRIBUTES table.

  1. How many high-margin (HIG) product name series are in the modeling data?

    Solution:

    • Select Reset to remove any previously applied filters.
    • Expand the margin_cat attribute.
    • Select the HIG category.

    There are 109 product name series that are high margin.

  2. How many high-margin product name series are there in the South customer region?

    Solution:

    • Select the South customer region.

    There are 18 high-margin series in the South customer region.

  3. Characterize the combined, high-margin product name sales variation in the South and Mid Atlantic customer regions from the start of 2014 until the end of the data.

    Solution:

    • Add Mid Atlantic to the Cust_Region filter. There are 40 high-margin product name series in the South and Mid Atlantic customer regions.
    • Click the Show button in the top right corner of the plot, and then select Show overview axis to help focus on the time period of interest.
    • Move the left overview axis tab over to JAN2014 to focus on the time period of interest.

    Beginning in JAN2014, average sales for high-margin product names in these regions are between 400 and 500 units, and they appear to be trending up slightly. Placing your mouse pointer on the three highest peaks in demand reveals that these spikes correspond to the following intervals: NOV2014, JAN2015, and APR2016. Because sales peaks in the previous history do not correspond to these months, these peaks are probably an artifact of promotional activity, and not seasonal variation in the data.

Lesson 02

Forecasting Using Model Studio in SAS® Viya®
Lesson 02, Section 3 Practice the Demo: Perform Basic Forecasting with a Pipeline

Note: This practice provides the steps required to perform the tasks presented in the preceding demo video.

In this practice, you perform basic forecasting with a pipeline.

  1. Starting from SAS Drive, select Build Models and open the baseline sales forecasts project created previously.

  2. Navigate to the Pipelines tab.

    The Auto-Forecasting template is the default pipeline template for Visual Forecasting. It consists of the essential steps in a forecasting analysis:
    • accumulates the data into time series
    • automatically identifies, estimates, and selects forecast models for the time series
    • assesses forecasting results
    • publishes results for use outside the pipeline

    Note: If the modeling data are hierarchically arranged, the identification, estimation, and selection steps in the default forecasting pipeline are done on series in the base level of the hierarchy.

  3. Select Run Pipeline in the upper right corner of the workspace.

    Note: If you run into problems with this step, make sure that the modeling and attribute tables are loaded in memory. If the server containing in-memory versions of the modeling and attributes table has been shut down since you last opened the project, tables need to be reloaded.

Auto-forecasting Node Results

  1. Right-click on the Auto-forecasting node and select Results.

    Because the Auto-forecasting node is designed to be run with minimal input from the analyst, relatively few options are surfaced for this node. The Auto-forecasting node automatically identifies, estimates, and generates forecasts for the 918 series in the base or product name level of the modeling hierarchy. Most of the forecast models selected for these series are in the ARIMAX family.

    For each series, two families of time series models are considered by default: ARIMAX (ARIMA with exogenous variables) and ESM (exponential smoothing models). The champion model for each series is chosen based on root mean square error. Other selection statistics are available in the Model selection criterion option.

    The MAPE Distribution histogram is located in the upper left-hand corner. The distribution of Mean Absolute Percent Error (MAPE) for forecasts in the product name level of the hierarchy can be used to compare the accuracy of different forecast models. Each of the bars represents the proportion of the series that have a specific range of MAPE values. In general, smaller values of MAPE imply greater accuracy; a short sketch of the MAPE computation follows this results summary. MAPE is an alternative selection criterion supported in the software.

    The Model Type chart, located in the lower left, summarizes systematic variation found in the identification process. Approximately 72% of the forecast models selected at least one of the candidate input variables, about 34% of the series have a seasonal pattern, and about 30% selected a Trend Model.

    The Model Family histogram is located in the upper right. Among these 918 series, approximately 72% were best modeled using an ARIMA model. The rest, 27.23%, were modeled using an exponential smoothing model.

    The Execution Summary, located in the lower right, provides information about results that are potentially problematic, anomalous, or both.
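
    Following up on the MAPE note above, here is a minimal sketch of the MAPE computation for a single series, with made-up numbers (Python; the node computes this statistic for you):

      import numpy as np

      actual   = np.array([100.0, 120.0, 90.0, 110.0])   # hypothetical observed sales
      forecast = np.array([ 95.0, 130.0, 92.0, 105.0])   # hypothetical model forecasts

      mape = np.mean(np.abs((actual - forecast) / actual)) * 100
      # About 5 here; smaller MAPE values imply greater accuracy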

  2. Click the Output Data tab above the MAPE Distribution plot.

    Several output tables are created. You can view them by clicking on them.

    Note: In order to view the OUTFOR data source, you also need to click the View Output Data button. This file is large, containing forecasted values for every indexed time interval in the forecast range.

  3. Click the OUTMODELINFO data source to open it.

    For each series, the selected model is named and attributes of the model are displayed.

  4. Close the Results window.

  5. Right-click and open the results of the Model Comparison node.

    The Champion Model is the Auto-forecasting model, which is the only one included in the pipeline. WMAE and WMAPE are weighted sums of the MAE and MAPE values across all series. WMAPE and WMAE represent average performance of all the models in a modeling node.

    Note: For the WMAPE and WMAE, the final computation is based on weighted measurements from each time series, where more weight is given to time series with a higher average of the dependent variable.
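
    One plausible form of that weighting, sketched with hypothetical numbers (the software's exact computation may differ), is a weighted average of per-series MAPE values with weights proportional to each series' average level:

      import numpy as np

      # Hypothetical per-series MAPE values and average sales levels for three series
      mape      = np.array([5.0, 10.0, 20.0])
      avg_sales = np.array([1000.0, 100.0, 10.0])

      weights = avg_sales / avg_sales.sum()    # higher-volume series get more weight
      wmape = np.sum(weights * mape)
      # About 5.6 here, dominated by the high-volume series with MAPE 5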

Forecasting Using Model Studio in SAS® Viya®
Lesson 02, Section 4 Practice the Demo: Honest Assessment

Note: This practice provides the steps required to perform the tasks presented in the preceding demo video.

In this practice, you select models using honest assessment.

  1. If it is not still open, reopen the project from the previous demonstration.

  2. Click the Options ellipsis on the Pipeline 1 tab and select Duplicate.

  3. Rename the new pipeline by clicking on its Options and selecting Rename.

  4. Rename this pipeline Honest Assessment Auto.

  5. Click the Auto-Forecasting node to make it active.

  6. On the right, in the node options area, expand both the Model Generation and Model Selection menus.

  7. Under Model Generation, check the box to include UCM models.

    Note: UCM models can lead to excessive run time. They are usually reserved for special case or high-value series.

  8. Under Model Selection, change Number of data points used in the holdout sample from 0 (the default) to 12 (a full seasonal cycle's worth of the monthly data), and then press Enter or click outside the box.

    When you do this, another box will appear asking you for a percentage of data points to use in the holdout sample. If you also put a value here, the holdout sample will be the smaller of the two.

  9. Enter 25 for Percentage of data points used in the holdout sample.

    Note: The actual size of the holdout sample is the smaller of the number of data points selected and the percentage of data points. This value can vary from series to series in a project.
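
    A small numeric sketch of how the two holdout options combine (all values hypothetical):

      # A hypothetical series with 60 monthly observations
      n_obs = 60
      holdout_count = 12   # Number of data points used in the holdout sample
      holdout_pct   = 25   # Percentage of data points used in the holdout sample

      holdout_size = min(holdout_count, int(n_obs * holdout_pct / 100))
      # min(12, 15) = 12, so the last 12 points of this series are held out;
      # a shorter series of 40 points would instead hold out min(12, 10) = 10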

  10. Leave the model selection criterion as MAPE and run this pipeline.

  11. Right-click and open the results of the Auto-forecasting node.

    The ESM model is selected for nearly half of the series. The UCM model accounts for another quarter. Remember that these models were selected on the basis of MAPE on the holdout sample of 12 time points, rather than the fit sample, which was the basis for assessment in the previous pipeline.

  12. Close the Results window.

  13. Right-click and open the results of the Model Comparison node.

    WMAE and WMAPE are slightly higher for the honest assessment pipeline than for Pipeline 1. That is to be expected because the data used to assess the models were not the same as the data used to generate the models.

Forecasting Using Model Studio in SAS® Viya®
Lesson 02, Section 5 Practice the Demo: Exploring More Pipeline Templates

Note: This practice provides the steps required to perform the tasks presented in the preceding demo video.

In this practice, you explore pipeline templates other than the default Auto-forecasting template. This practice proceeds from the end of the previous one.

  1. From the Pipelines tab in the baseline sales forecasts project, click on the plus sign (+) to add a new pipeline.

  2. Name the new pipeline Naïve model forecasts. The model chosen for all of our time series will be a seasonal random walk model.

  3. Select Browse from the Template drop-down menu. The templates described earlier can be accessed here.

  4. Select the Naïve Forecasting template and click OK.

  5. Click Save in the New pipeline window.

    A new pipeline is created based on the Naïve modeling node. The other nodes in the pipeline were described previously.

  6. Select the Naïve model node and look at the options on the right.

    The options on the Naïve modeling node (in the Node options menu on the right) indicate that a Seasonal random walk model will be fitted to each series (a small sketch after this step shows the idea behind this model).

    Note: These models can be useful for providing benchmark measures of forecasting accuracy.
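
    A minimal sketch of the seasonal random walk idea, using made-up data (the node fits this model for you): the forecast for each future month repeats the value observed in the same month of the most recent seasonal cycle.

      import pandas as pd

      # Hypothetical monthly history (36 months)
      y = pd.Series(range(36),
                    index=pd.date_range("2014-01-01", periods=36, freq="MS"))

      season = 12    # one seasonal cycle of monthly data
      horizon = 12

      # Seasonal random walk: repeat the last observed seasonal cycle
      future_index = pd.date_range(y.index[-1] + pd.offsets.MonthBegin(1),
                                   periods=horizon, freq="MS")
      forecast = pd.Series(y.iloc[-season:].to_numpy(), index=future_index)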

  7. Run the Naïve model forecasts pipeline by clicking on Run Pipeline.

  8. Right-click the Model Comparison node and select Results.

    Holdout samples are not available as options within the Naïve Model forecasting node, so WMAE and WMAPE are based on the entire sample of each series.

  9. Close the results.

Forecasting Using Model Studio in SAS® Viya®
Lesson 02, Section 5 Practice: Working with Pipelines

Answer the following questions based on the results from the previous practice.

  1. Based on the results of the Naïve model forecasting node, describe the distribution of the generated MAPE statistics for the forecasts generated by the naive forecasting models.

    Solution:

    • Expand the MAPE Distribution pane.
    • Move the cursor over the bar (or bars) with the greatest Percent values.

    The MAPE distribution for the naïve models is close to being multi-modal. Its peak has a MAPE value just above 8.9.

  2. Use the modeling results generated in Pipeline 1 to answer the following question: Does the choice of naïve forecasting model seem appropriate?

    Solution:

    The Model Type chart shows that only about 34% of the models accommodate a seasonal pattern. The seasonal random walk might not be the best choice.

  3. Change the naïve model type to Moving average and rerun the pipeline. Describe any changes to the distribution of MAPE values that result.

    Note: The results in the rest of this lesson are based on the choice of a Seasonal random walk for the Naïve model type.

    Solution:

    • Change Naïve model type to Moving average in the Naïve forecasting node options.
    • Rerun the pipeline. Open the results of the Naïve model forecasting node.

    The distribution of MAPE values has changed. The primary difference is some relatively large generated MAPE values.

Forecasting Using Model Studio in SAS® Viya®
Lesson 02, Section 5 Practice the Demo: Pipeline Comparison

Note: This practice provides the steps required to perform the tasks presented in the preceding demo video.

In this practice, you compare pipelines and use accuracy and fit statistics to determine a champion model. This practice proceeds from the end of the previous one.

  1. Click the Pipeline Comparison tab.

    The results indicate that Pipeline 1 is selected as the champion. This is because the forecasts generated by the models in Pipeline 1 have the lowest aggregated root mean square error among the pipelines in the comparison. Relative values of other statistics of fit are also shown.

    Note: The declaration of a champion pipeline is important for subsequent steps in the forecasting workflow. The forecast table that can be exported from the project is based on the models in the champion pipeline. Also, any overrides that are set will be implemented on the champion model forecasts. Overrides are described in detail later.

    Recall that Pipeline 1 did not use a holdout sample and, therefore, the models were not selected using honest assessment. Because the purpose of these models is forecasting, Pipeline 1 should be excluded from consideration as a champion model. Based on WMAPE, the Honest Assessment Auto pipeline would beat the Naïve Model Forecasts pipeline. Manually select that pipeline as the champion pipeline.

  2. Right-click the Honest Assessment Auto pipeline and select Set as champion from the drop-down menu. The champion pipeline has changed.

  3. To compare summary results and diagnostics across pipelines, select check boxes next to the Honest Assessment Auto and Naïve Model Forecasts pipelines and then click Compare.

    You can now compare the MAPE distributions and Execution Summary results across all selected pipelines in one window.

  4. Click Close to exit the compare window and keep Pipeline 1 as the champion model.

    Note: The pipeline selection criterion can be changed, and the automated choice of the champion pipeline can be overridden. For example, to manually change the champion pipeline, clear the box for Pipeline 1, click the Project pipeline menu icon (the three vertical dots in the upper right), and select Set as champion. Selecting a new champion recreates the data used by the project and replaces the current champion.

Forecasting Using Model Studio in SAS® Viya®
Lesson 02, Section 5 Practice: Comparing Pipelines

Answer the following question based on the pipeline comparison results from the previous practice.

What is the mode of the MAPE distribution for each of the two pipelines?

Solution:

The mode for Pipeline 1 is about 5.28. The mode for the Naïve Model Forecasts pipeline is about 11.56.

Lesson 03

Forecasting Using Model Studio in SAS® Viya®
Lesson 03, Section 1 Practice the Demo: Generate Hierarchical Forecasts with the Default Settings

Note: This practice provides the steps required to perform the tasks presented in the preceding demo video.

In this practice, you use the BY variables, productline and productname, to perform hierarchical forecasting using the Hierarchical Forecasting node.

  1. Starting from SAS Drive, create the LG hierarchy forecast project. This project will use the same in-memory tables and variable metadata as the baseline sales forecasts project created earlier. For convenience, a summary of the project creation steps is below.
    • From the Show list of applications menu, select Build Models.
    • On the right side of Model Studio, select New Project.
    • Name the project LG hierarchy forecast, set the type to Forecasting, and provide a reasonably detailed description.
    • Navigate to the LOOKINGGLASS_FORECAST table on the Available tab and click OK.
    • Save the new project.
    • Assign variable roles. Txn_Month is the time variable, the dependent variable is sale, and the BY variables are productline and productname. Set Reconciliation level to productline.
    • Select and edit price, discount, and cost. Set the role of these variables to Independent and change Usage in system-generated models to Maybe.

  2. Navigate to the Pipelines tab, and select Run Pipeline. This pipeline is identical to the ones used previously, but running it now allows you to compare this pipeline to the Hierarchical pipeline that is used later.

  3. Click the plus (+) to add a new pipeline.

  4. Name the new pipeline Pipeline 2.

  5. Select Hierarchical Forecasting for the template and click Save.

  6. In contrast to the Auto-forecasting node, the Hierarchical Forecasting node allows extensive customization. Select the Hierarchical Forecasting node and scrutinize the options on the right side of the screen.

  7. Click on Run Pipeline.

  8. When it finishes running, right-click the Hierarchical Forecasting node and select Results.

  9. Results are given on both the productline (middle) and productname (base) levels of the hierarchy. Model Type and Model Family results are added to the previously introduced diagnostics.

    Note: Recall that the modeling hierarchy was set when the productline and productname variables were assigned as BY variables in the project.
  10. The Weighted MAPE over all of the series is 3.40 at the productline level of the hierarchy, and 5.76 at the productname level. Within each level, you can see the MAPE distribution across the individual series. Looking at productline, notice that the MAPE values are concentrated between 3 and 4.

    The Model Family information indicates that this is still a selection among relatively simple models. The best models selected were mostly ESM models, about 54%, and ARIMA models were best for about 45.5% of the series.

    Looking at the Model Types, notice that about 45.5 percent of the models used the independent, or input, variables. Among those models, nearly 68% have seasonal components, and about 38% have a trend. The same information is available for the productname level of the hierarchy.

  11. Close the results.

Forecasting Using Model Studio in SAS® Viya®
Lesson 03, Section 1 Practice: Hierarchical Forecasting

Answer the following questions based on the results from the previous practice.

  1. Compare the MAPE distributions at the productline and productname levels in the Hierarchical Forecasting results. At what level of the hierarchy is the best fit to the data obtained?

    Solution:

    The aggregated MAPE and MAPE distribution results indicate that the productline level of the hierarchy has the best fit to the data of the two levels shown.

  2. Is the result that you found above ubiquitous in forecasting? Provide some insight into why this result occurred.

    Solution:

    Yes, this result is fairly common. Series at the base level of the hierarchy are typically sparse and noisy. As the data in the middle and upper levels of the hierarchy are created through the process of aggregation, series become less sparse, and systematic relationships between the target and input variables tend to become easier to detect.

Forecasting Using Model Studio in SAS® Viya®
Lesson 03, Section 2 Practice the Demo: Add Combined Models to the Hierarchical Forecasting Pipeline

Note: This practice provides the steps required to perform the tasks presented in the preceding demo video.

The previous practice generated forecasts for all series in the three-level hierarchy under the default settings for the Hierarchical Forecasting node. In this practice, you try to improve the fit of the forecasts by adding combined models to the pipeline.

For each series, the combined model combines the generated forecasts from default families of models considered for that series to produce a new forecast. The default combination method is a simple average of forecasts.
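
A minimal sketch of that default combination, with made-up forecasts (the node generates and combines the forecasts for you):

    import numpy as np

    # Hypothetical forecasts for the same series and horizon from two model families
    arimax_forecast = np.array([105.0, 112.0, 98.0])
    esm_forecast    = np.array([101.0, 118.0, 104.0])

    # Default combination: a simple (unweighted) average of the candidate forecasts
    combined_forecast = (arimax_forecast + esm_forecast) / 2
    # array([103., 115., 101.])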

This practice proceeds from Pipeline 2, created in the previous practice.

  1. Expand the Nodes menu on the left side of the workspace.

  2. Click and then drag and drop a Hierarchical Forecasting node on top of the Data node.

  3. Right-click and rename the new Hierarchical Forecasting node to Hierarchical Forecasting with combined models. Click OK.

  4. Select the Hierarchical Forecasting with combined models node, and expand the Model Generation options on the Node Options panel on the right.

    Notice that the default options for Model Generation include ARIMA and ARIMAX models, and ESM models. The toggles for UCM models and external models are off, so those models are not included.

  5. Scroll down to the Include combined models option and slide the toggle to on.

    With combined models, you can average the results from all of the ARIMA and ESM models. The combined models often forecast better than the individual ARIMA and ESM models.

    Keep the default combination method, a simple average of all the models, and keep all the other statistics and options at their defaults.

  6. Select Run Pipeline to run the updated components.

  7. Right-click the Hierarchical Forecasting with combined models node and open Results. The Model Family results show that the majority of forecast models selected for the series in the base and middle levels of the hierarchy are generated by Combined (comb) forecasts.

    The aggregated, or weighted, MAPE measures have improved, relative to the forecasts generated under the default settings, for both levels of the hierarchy. The Weighted MAPE is 3.21 for the productline, and 5.08 for productname.

    The combined model was chosen as the best model for about 63% of the series. ARIMA models account for only about 21% of the series, and exponential smoothing models for about 16%.

  8. Close the results.

  9. Right-click on the Model Comparison node and select Results. The Hierarchical Forecasting with combined models node is the champion for the pipeline. Its Weighted MAPE is 5.08, compared with the Weighted MAPE of 5.76 for the default Hierarchical Forecasting node, and smaller values are better.

  10. You can compare results at the base level of the hierarchy across the two pipelines in the diagnostics. Close the results.


Forecasting Using Model Studio in SAS® Viya®
Lesson 03, Section 2 Practice the Demo: Select Models Based on Forecast Accuracy

Note: This practice provides the steps required to perform the tasks presented in the preceding demo video.

One potential issue with the selection of the Hierarchical Forecasting with combined models node as the champion in the previous practice is that the selection criterion reflects how well the models fit the series in the training data.

In this practice, you split each time series in the data into two parts: training and validation. The Champion modeling node is selected based on aggregated, out-of-sample performance, or accuracy. This practice proceeds from Pipeline 2, created in the previous practice.
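
A minimal sketch of the idea, with made-up data: candidate models are fit on the training portion and scored on the held-out portion. Here a seasonal random walk stands in as the candidate model purely for illustration.

    import numpy as np
    import pandas as pd

    # Hypothetical monthly series of 60 points
    rng = np.random.default_rng(1)
    y = pd.Series(100 + rng.normal(0, 10, 60),
                  index=pd.date_range("2012-01-01", periods=60, freq="MS"))

    # Hold out the last 12 points as the validation range
    train, holdout = y.iloc[:-12], y.iloc[-12:]

    # Stand-in candidate model: seasonal random walk built from the training range only
    forecast = train.iloc[-12:].to_numpy()

    # Out-of-sample accuracy (RMSE) on the holdout range; smaller is better
    rmse = np.sqrt(np.mean((holdout.to_numpy() - forecast) ** 2))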

  1. Select the default Hierarchical Forecasting node, expand the right Node Options panel, and expand the Model Selection options.

  2. Change the Model selection criterion to RMSE (Root Mean Squared Error).

  3. Change the number of data points used in the holdout sample to 12. The number of data points is set to 12 because this is monthly data, and the holdout sample should include at least one seasonal cycle of data.

  4. Change the percentage of data points used in the holdout sample to 25. The holdout sample typically includes a maximum of about 25% of the data points.

    Note: When you choose both criteria for a number of data points and percentage of data points, the smaller number of observations generated by either of these restrictions is used as the holdout sample size for each series.

  5. Select the Hierarchical Forecasting with combined models node, expand the right Node Options panel, and expand the Model Selection options.

  6. Change the Model selection criterion to RMSE (Root Mean Squared Error).

  7. Change the number of data points used in the holdout sample to 12 and the percentage of data points used in the holdout sample to 25.

  8. Rerun the pipeline by clicking on Run Pipeline.

  9. Right-click on the Hierarchical Forecasting with combined models node and click on Results. The Model Family and Model Type results are similar, but the MAPE distributions and aggregated MAPE values have changed over the base and middle levels of the hierarchy. The Weighted MAPE on the hierarchy level productline is now 3.49, and Weighted MAPE for productname is 5.77.

    The distributions of the models that are selected are slightly different. The combined models within productline are not necessarily the majority of the models that were selected. These diagnostics are now based on residuals generated over the holdout sample region for each series. That is, they are accuracy statistics. In general, the MAPE values tend to be a bit larger when working on a holdout sample.

  10. Close the results.

  11. Right-click on the Model Comparison node and select Results. Although the choice of the champion pipeline has not changed, this result is more relevant. The pipeline with the models that extrapolate best onto data that they have not seen before is chosen as the champion.

Forecasting Using Model Studio in SAS® Viya®
Lesson 03, Section 2 Practice the Demo: Share a Custom Pipeline via the Exchange

Note: This practice provides the steps required to perform the tasks presented in the preceding demo video.

In this practice, you use the pipeline developed in the previous practice as a custom template for other forecasting projects. The Exchange provides a repository for collecting and sharing project objects with others. This practice proceeds from Pipeline 2, created in the previous practice.

  1. From Pipeline 2 in the LG hierarchy forecast project, click on Options. Select Save to The Exchange.

  2. Name the pipeline LG Hierarchical Forecasting with Combined Models. Add a description and click Save.

    Note: Providing a representative name and a detailed description is always useful.

  3. On the left side of the window, click the icon for The Exchange.

  4. Under Templates on the left panel, expand Pipelines and select Forecasting. The custom pipeline that was saved from the LG hierarchy forecast project is now available to others.


Forecasting Using Model Studio in SAS® Viya®
Lesson 03, Section 2 Practice: Using a Custom Pipeline

  1. Create a new project, TEST custom pipelines, based on the LOOKINGGLASS_FORECAST table. Generate forecasts using the custom LG Hierarchical Forecasting with Combined Models pipeline that was saved previously.

    Solution:

    1. Starting from the main page in Model Studio, create a new project, and name it TEST custom pipelines. This project uses the same in-memory tables and variable metadata as the project created earlier.

    2. Set the type to Forecasting and provide a reasonably detailed description.

    3. Navigate to the LOOKINGGLASS_FORECAST table on the Available tab and click OK.

      Note: If the LOOKINGGLASS_FORECAST table is not on the Available tab, you need to load it into memory following steps shown previously in the course.

    4. Assign variable roles. Txn_Month is the time variable, the dependent variable is sale, and the BY variables are productline and productname. Set the hierarchical reconciliation level to productline.

    5. Select and edit price, discount, and cost. Set the role of these variables to Independent, and change Usage in system-generated models to Maybe.

    6. Navigate to the Pipelines tab and select New Pipeline. Name the pipeline and browse the available templates.

    7. Select the saved pipeline. Click OK.

    8. Click Save, and then run the new pipeline to reproduce the results generated in the previous practice.

Lesson 04

Forecasting Using Model Studio in SAS® Viya®
Lesson 04, Section 1 Practice the Demo: Add the Attributes Table to a Project

Note: This practice provides the steps required to perform the tasks presented in the preceding demo video.

Attributes are useful for visualizing the data outside the dimensions defined by the modeling hierarchy. This practice proceeds from the LG hierarchy forecast project created in the previous lesson.

  1. Start from SAS Drive, and open the LG hierarchy forecast by double-clicking it.

  2. Click the Data tab. Click the New data source menu button and select Attributes.

  3. Select LG_Attributes from the list of available data sources. Click OK. The LG_Attributes table is now the attributes table for the project.

  4. Click the Pipelines tab, and open Pipeline 2. Rerun the pipeline.

  5. Open Pipeline 1 and perform the modifications that you made to Pipeline 2 earlier.
    • Click on Auto-forecasting and expand the Node options.
    • Expand Model Selection.
    • Change the model selection criterion to RMSE(Root Mean Square Error).
    • Change the number of data points used in the holdout sample to 12.
    • Change the percentage of data points used in the holdout sample to 25.
    • Rerun the pipeline.

  6. Select Pipeline Comparison. Pipeline 2 is the champion pipeline. Forecasts shown on the Overrides tab are generated by the champion node from the champion pipeline in a project.

  7. Click the Overrides tab. The plot shows an aggregation of the 918 series in the base level of the hierarchy.

Forecasting Using Model Studio in SAS® Viya®
Lesson 04, Section 1 Practice the Demo: Apply Overrides to Generated Forecasts

Note: This practice provides the steps required to perform the tasks presented in the preceding demo video.

Attributes can also be helpful in post-modeling tasks such as applying overrides. This practice proceeds from the previous practice.

Applying Overrides to the Generated Forecasts

The Overrides functionality basically works in two steps: creation and implementation.
In the following steps, two overrides are created:

  • Forecasts in the South customer region will be reduced by 20% for the first three months of 2017 to accommodate a pending strike among delivery drivers.
  • High-margin products in the Greater Texas region will be increased by 15% in response to pending promotional activity that will occur in July of 2017.

These overrides will be implemented, and an impact analysis of their effects on the model's forecasts will be reviewed.
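
Before stepping through the clicks, here is a rough numeric sketch of what a locked -20% aggregate override means (hypothetical values; the software may distribute the adjustment across the individual series differently than the simple proportional scaling shown here):

    import numpy as np

    # Hypothetical final forecasts for three productname series in the override group
    series_forecasts = np.array([120.0, 80.0, 50.0])
    aggregate = series_forecasts.sum()          # 250.0

    override_total = aggregate * (1 - 0.20)     # locked aggregate target: 200.0

    # One simple way to honor the lock: scale every series by the same factor
    adjusted = series_forecasts * override_total / aggregate
    # array([96., 64., 40.]) -- these still sum to the locked aggregate of 200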

  1. To implement the first listed override, expand the Cust_Region attribute and select the South region. The plot changes on the fly to show an aggregation of the 197 productname series in the South region.

  2. Right-click the Override cell under 01/01/2017 and select Override Calculator.

  3. Using the plus (+) button, add 02/01/2017 and 03/01/2017 to apply the override to the first three months of 2017. Click OK.

  4. Click Filter and name the item Override.

  5. Because the goal is to reduce forecasts in the South region by 20% during the time range specified above, select Adjust based on an existing forecast value and then select Final Forecast.

    Note: In this case, final forecasts are statistical forecasts that have been adjusted for reconciliation.

  6. Set Aggregate final forecast lock to on.

    Note: Here, the forecast lock is a restriction on the aggregated final forecast of all productname series in the South region. Forecasts for individual series in the override group are free to vary, but they must sum to the override values.

  7. Set Adjustment to -20%. Click OK.

  8. Click OK.

  9. The overrides are currently pending. Right-click on any of the three override cells and select Submit All. The second override is a 15% increase for high-margin forecasts in the Greater Texas region in JUL2017.

  10. Select Reset all from the attributes menu on the left, and then select the Greater Texas region and the high (HIG) margin category. The plot changes to show forecasts and actual values of the 16 high-margin series that flow through the Greater Texas region.

  11. Select the Override cell under 07/01/2017, and right-click it to access the Override Calculator.

  12. Change the Adjustment value to +15%. Click Filter and name this override OverrideTXHIG.

  13. Click OK.

  14. A message box might appear, warning about pending overrides. If it does appear, select Submit All. If not, you have created a pending override. Right-click the cell with the pending override value, and select Submit All. The final forecast and the forecast plot now reflect the JUL2017 promotion override.

  15. Click the Override Management tab. The newly created override is added to the list. Overrides can also be modified from here. The Override Calculator and the Delete overrides button are on the top right of this page.

Forecasting Using Model Studio in SAS® Viya®
Lesson 04, Section 1 Practice the Demo: Resolve Override Conflicts

Override conflicts occur when two or more overrides create a forecasting outcome that is infeasible for one or more time intervals. Conflicts arise with locked forecast overrides. In this practice, you will add another override to forecasts for series in the South region for the first month of 2017 to illustrate a conflict.

Assume that you have information that LOW margin series in the South region are somehow exempt from the pending strike for the first month of 2017, and that these products are also going on promotion in this month. The net effect of these two phenomena is hypothesized to be an increase of 60%.

  1. Back on the Overrides tab, click Reset all, and then select the South customer region and the LOW margin category. The plot changes to show the 141 time series in this cross section of the data.

  2. Right-click the 01/01/2017 Override cell, and access the Override Calculator.

  3. Set the adjustment to the final forecast to +60%, and lock the aggregate final forecast for this subset of series. Click Filter and name the item OverrideSouthLOW.

  4. Click OK.

  5. Right-click any Override cell, and select Submit all.

  6. The two locked overrides submitted for JAN2017 on the cross section of the South region and LOW margin category have created an infeasible final forecast outcome. The two options for resolution are listed below. If the Conflicts Detected box does not appear, go back and make sure that you locked both of the previous overrides.

  7. Select Resolve Automatically.

    Note: Selecting Resolve Manually takes you back to the Override Calculator to implement a conflict solution. Selecting Resolve Automatically calls an optimization algorithm to find a feasible solution for the conflict that is as close to the desired override restrictions as possible.

  8. Right-click the 01/01/2017 Override cell, and select Impact Analysis.

    The impact analysis for Group 3 (the 141 LOW margin series in the South region for JAN2017) shows that the final forecast is a compromise between the Previous Final Forecast (first override) and the second override, applied above. The Delta shows the net effect of the two overrides.

  9. Select Filter3 (or whichever filter is associated with Group 3) to see the plot of the final forecasts for these 141 series.

Forecasting Using Model Studio in SAS® Viya®
Lesson 04, Section 1 Practice the Demo: Export Forecasts

Note: This practice provides the steps required to perform the tasks presented in the preceding demo video.

Following the override process, the final forecasts from the champion pipeline are consistent with business knowledge and are ready to be disseminated. Making the forecasts available to other team members and project stakeholders is straightforward. This practice proceeds from the end of the previous one.

  1. From the Overrides tab in the LG hierarchy forecast project, click More (the "snowman") and select Export all data.

  2. Select the Public directory or another directory to which you have Write access. Keep the default name for the exported table, LG Hierarchy Forecast_OUTFOR. Click Export.

    Note: The Promote table option is selected. This means that the table is accessible by other team members and in other tools, such as SAS Visual Analytics.

  3. Navigate to Explore and Visualize Data.

  4. This functional area provides access to SAS Visual Analytics. Select Data. The exported data are loaded in memory and are available.

  5. Navigate to Data Sources and the public folder. Notice that there is also an alternative version of the table in SASHDAT format.

  6. Select LG HIERARCHY FORECAST_OUTFOR from the Available tab and click OK.

  7. Click and drag the Prediction Errors variable into the workspace. The default chart option for this variable displays a histogram of the forecast errors.

Forecasting Using Model Studio in SAS® Viya®
Lesson 04, Section 1 Practice: Using the Filter Option to Create a Histogram

Perform the following task based on the results from the previous practice.

  1. Use the Filter option in SAS Visual Analytics to create a histogram of prediction errors for productline, Line01 in the LG HIERARCHY FORECAST_OUTFOR in-memory table.

    Note: It is to your advantage to take notice of extreme departures from normality.

    Solution:

    • From the histogram shown in the previous demonstration, select Filters from the menu on the right side of the workspace.
    • Select Name of product line as the variable and then select Line01. Do this by first deselecting all and then reselecting Line01 only.


    The histogram changes on the fly to show the selected subset of prediction errors.