Lesson 01


Forecasting Using Model Studio in SAS® Viya®
Lesson 01, Section 1 Practice the Demo: Create a Forecasting Project and Load the Data


Note: This practice provides the steps required to perform the tasks presented in the preceding demo video.

In this practice, you create a new project in Model Studio. The baseline sales forecasts project is used throughout the course.

  1. Sign in to SAS Viya. From SAS Drive, click the Show list of applications menu button in the upper left corner and select Build Models. This takes you to Model Studio.

    Model Studio is an integrated visual environment that provides a suite of analytic tools to facilitate end-to-end data mining, text, and forecast analysis. The tools in Model Studio are designed to take advantage of SAS Viya programming and cloud processing environments to deliver and distribute the results of the analysis, such as champion models, score code, and results.

  2. In Model Studio, click New Project. The New Project dialog box is displayed.

    Note: If this is your first session, there will be no existing projects unless projects were set up for you. (If projects already exist, the New Project button is available in the upper right corner.)

  3. Name your project baseline sales forecasts.

    Note: Naming your project something relevant and adding a reasonably detailed description of the project is considered a forecasting best practice.

  4. For Type, select Forecasting.

    There are three types: Data Mining and Machine Learning, Forecasting, and Text Analytics. This course will only deal with Forecasting.

  5. For Data Source, click Browse to select the modeling data source.

    The Browse Data dialog box is displayed. A list of data sets is displayed in the left-side Available tab. These are data sets that are available in CAS and ready for use in a Model Studio project.

    Important: You cannot import data in SAS Viya for Learners. The table is already loaded and you can skip to step 10 and continue from there.

  6. Click the Import tab, and then select Local file.

  7. Navigate to D:\Workshop\Winsas\FVVF, and select the lookingglass_forecast.sas7bdat table.

  8. Select Open.

  9. Select Import Item.

    Note: If there is a note that the table already exists, you can select the radio button for Replace file to overwrite it.

  10. Click the Available tab to view in-memory tables that are available for model building. Click on the LOOKINGGLASS_FORECAST in-memory table. The details list the column names and characteristics.

  11. Click the Profile tab and select Run Profile to produce summary statistics and other details about the columns in the data.

  12. Click OK and then Save to create the new project. The project now appears.

  13. Ensure that the project's Data tab is selected in order to assign variable roles.

    A Note on Variable Assignment:
    • Individual variables can be selected for role assignment by either clicking the variable name or by selecting the corresponding check box.
    • Individual variables are deselected after their role is assigned by either clearing their check box or selecting another variable's name.
    • More than one variable can be selected at the same time using the check boxes.
    • Because selecting a variable using the check box does not deselect others, it is easy for new users to inadvertently re-assign variable roles. Taking a few minutes to get comfortable with the variable selection functionality is considered a best practice for using the software.

  14. The Txn_Month variable is assigned to the role of Time for the project. Select Txn_Month in the middle variables list panel. In the right properties panel, you will see a table of attributes for that variable. Its natural interval has been detected as monthly.

    Note: Other time intervals are available by selecting the down arrow next to Month. The time interval combined with the Multiplier and Shift options indicates that the desired interval of the time series data is one month and that the 12-month annual cycle starts in January. These options can be changed to modify the time index if it is appropriate for your data.

  15. Sale is the target for the analysis. Click sale in the middle variables list panel. In the right property panel, select Dependent.

    The options indicate that a monthly Sale time series will be created for each series by summing the sales transactions in each month. Accumulation is the process of creating time series from transactional data (a small sketch of this step follows the note below).

    Note: Missing interpretation options enable the user to interpret, or impute, values for embedded missing values in the series. By default, embedded missing values have no value assigned to them.
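    Note: As a rough sketch of what accumulation does (Model Studio performs this step internally; the pandas code below is only an illustration, and the transaction values are made up), the transactional records for each productline-productname pair are rolled up to one row per month, with the dependent variable summed and candidate inputs such as price averaged:

      import pandas as pd

      # Made-up transactional records for one series (Line01, Product01).
      txns = pd.DataFrame({
          "productline": ["Line01", "Line01", "Line01"],
          "productname": ["Product01", "Product01", "Product01"],
          "Txn_Month": pd.to_datetime(["2015-01-10", "2015-01-25", "2015-02-03"]),
          "sale": [120.0, 80.0, 95.0],
          "price": [10.0, 9.5, 10.5],
      })

      # Accumulate to one row per series per month:
      # sale is summed, price is averaged.
      series = (
          txns.groupby(["productline", "productname",
                        pd.Grouper(key="Txn_Month", freq="MS")])
              .agg(sale=("sale", "sum"), price=("price", "mean"))
              .reset_index()
      )
      print(series)  # January: sale=200.0, price=9.75; February: sale=95.0, price=10.5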

  16. Deselect sale. Assign productline and productname, in that order, to the BY Variable role. Change hierarchical reconciliation to the middle, or productline, level of the hierarchy defined by the assigned BY variables.

    The assigned BY variables define a three-level modeling hierarchy with total monthly sales at the top, productline in the middle, and productline-productname pairs at the bottom (sketched in the note after this step).

    Note: The order in which the BY variables are assigned defines the modeling hierarchy, and the order can be changed using the arrows to the right of the selected BY variables.
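    Note: In general terms, the hierarchy defined by these BY variables can be sketched as follows (the line and product names are taken from the course data; the counts are illustrative):

      Top:     total monthly sales (all lines combined)
      Middle:  productline                      for example, Line01, ..., Line07, ...
      Base:    productline-productname pairs    for example, (Line07, Product 21), ..., (Line07, Product 24)

    Setting the reconciliation level to productline means that, after models are fit, the forecasts at the other levels are adjusted so that the hierarchy is internally consistent with the productline-level forecasts (for example, the base-level forecasts within a line are made to sum to that line's forecast). This is a simplified description of reconciliation.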

  17. Additional variables will be assigned to roles. Price, discount, and cost can be useful as explanatory variables in subsequent analyses. Select these three variables (the order does not matter), change their roles to Independent, and change Usage in system-generated models to Maybe.

    Note: For each of these variables, accumulation is accomplished by averaging observed values in each month.

    Note: By setting Usage in system-generated models to Maybe, you define these three variables as candidate explanatory variables for each series. If the model for a given series accommodates explanatory variables, the non-collinear combination of these three variables that results in the best overall fit is selected.


Forecasting Using Model Studio in SAS® Viya®
Lesson 01, Section 2 Demo: Loading an Attributes Table to Subset the Time Series

Note: The text below is an exact transcription of what is spoken in the demo video. The related practice that follows this demo provides the steps for performing the demo tasks.

In the last demonstration, I created a project and added data. The only attributes I defined were the BY variables. Now I'd like to add other attributes to help me subset my time series analyses. These attributes are in a separate attributes data file. I'll show you how to load that data file and add the new attributes to the project. If you're not already in SAS Drive, get back to SAS Drive first, and then select the tab for Build Models, and open the baseline sales forecasts project by just double-clicking it.

Now, I need to change the data source type from time series to attributes. So under Data sources, I'll open the New data source menu, and I'm going to select Attributes. The attributes data set is not yet here in memory. So once again, I need to import it. So I'll go to the Import tab as I did before.

Once again, go to my local files and select lg_attributes and Open. I'll need to import this item. If you'd like, you can click on the button for Replace file. And once that's successful, you can click on the OK button. Now I can see four attributes. The first two attributes are the BY variables that I selected earlier, productline and productname. Now I also have Customer_Region and margin_cat, or margin category.

The margin flag categorizes the profitability of product names as low, medium, or high. Now, I'm going to move from the Data tab to the Pipelines tab by selecting it. Now I can see my first pipeline. That includes a Data node, Auto-forecasting, Model Comparison, and Output.

For right now, all I need to do is run the Data node. In order to do that, I can click on the three dots to the right of the Data node and then select Run. Once the Data node is finished, you'll see a green circle with a check mark inside. I'm going to right-click on that green check mark and select Time Series Viewer.

What you see here is called an Envelope Plot. It shows the aggregated data at the top level of the hierarchy. And notice that there are 918 series, and we are displaying 918 of the 918. The colored bands illustrate one and two standard deviations around the aggregated series.

The available attribute variables are listed on the left side of the window: productline, productname, Customer_Region, and that margin category variable. We're going to explore the time series in the middle level of the hierarchy, which is the product line level, by expanding the productline attribute. By default, the productline attribute should already be expanded, and I'm going to select Line07.

So the plot changes on the fly to show you the aggregation of the four product names contained in Line07. Those are Product 21, Product 22, Product 23, and Product 24. Notice the Envelope Plot itself changes, because now it's only relevant for those four product names.

When I expand the Customer_Region attribute, I'll note that there are two customer regions in Line07: Pacific and Greater Texas, with three series in Pacific and one in Greater Texas. I'm going to select the Greater Texas region, so that is represented by exactly one series, and that is Product 24. And now you can see the time series for Product 24, which is Line07 and Greater Texas. If I want to go back to the beginning, I can click on this Reset option. And now I can see all 918 series once again.


Forecasting Using Model Studio in SAS® Viya®
Lesson 01, Section 2 Practice the Demo: Load an Attributes Table to Subset the Time Series


Note: This practice provides the steps required to perform the tasks presented in the preceding demo video.

A unique and useful feature in SAS Visual Forecasting is the ability to visualize the modeling data and operate on generated forecasts outside the hierarchy defined by the project's BY variables. The hierarchical arrangement of the modeling data for this project is defined by product characteristics. However, it is routinely useful to be able to explore and operate on forecasts across facets of the data such as customer demographics or geographic regions.

In this practice, you incorporate the LG_ATTRIBUTES table into the baseline sales forecasts project and then use the variables in the table to expand the ways that the modeling data can be visualized.

  1. In SAS Drive, click the tab for Build Models and double-click the baseline sales forecasts project. This is the project that you created in the last practice.

  2. Change the data source type from Time Series to Attributes by navigating to the data sources panel, selecting the New data sources menu and then selecting Attributes. Note: A default attributes table is created when the BY variables are assigned in the project. The BY variables that define the modeling hierarchy are primary attributes for the project.

    Important: The next 2 steps are for importing the table. You cannot import data in SAS Viya for Learners, so the table is already loaded for you. On the Available tab, select LG_ATTRIBUTES and click OK. Then, go to step 5.

  3. Click on the Import tab to import a new data source.

  4. Select Local File and navigate to D:\Workshop\Winsas\FVVF. Select lg_attributes.sas7bdat to load the table into memory. Click on the radio button to Replace file and then click the button to Import Item. Then click OK.

  5. The in-memory table, LG_ATTRIBUTES, is now the attributes table for the project. This table contains two new attributes: a geographic indicator, Cust_Region, and a margin flag, margin_cat. The margin flag categorizes the profitability of product names as LOW, MED, or HIG (high).

  6. Switch to the Pipelines tab by selecting it.

    This first pipeline includes a Data node, Auto-forecasting, Model Comparison, and Output.

  7. Right-click and run the Data node.

    Note: Pipelines are structured analytic flows and are described in detail later in the course.

  8. After the Data node runs (you will see a green circle with a check mark inside), right-click the green checkmark and select Time series viewer.

    The envelope plot shows the aggregated data at the top level of the hierarchy (918 of 918 series). The colored bands illustrate one and two standard deviations around the aggregated series. The available attribute variables are listed in the left filters panel.

  9. You can explore time series in the middle level of the hierarchy by expanding the product line attribute. By default, the product line attribute should already be expanded. Visualize demand for the product line series, Line07. Under the productline attribute, select Line07.

    The plot changes on the fly to show an aggregation of the four product names contained in Line07: Product 21, Product 22, Product 23, and Product 24. Notice that the Envelope Plot changes because it is now relevant for only the four product names in Line07.

  10. Expanding the Cust_Region attribute and selecting Greater Texas plots the one product name that flows through both Line07 and the Greater Texas region.

  11. You can select Reset to remove the filters that you created based on attributes and return to displaying all 918 series.

Lesson 02


Forecasting Using Model Studio in SAS® Viya®
Lesson 02, Section 3 Demo: Performing Basic Forecasting with a Pipeline

I'm going to start back up by opening the baseline sales forecasts project that we created before. Now I'll select the Pipelines tab. If you remember, the Data node was already run, so we see that green circle with a check mark inside. I'm going to run the rest of the pipeline by just clicking on the Run Pipeline button.

Then we see the empty circles start up. And by the time we see all the circles filled in green with a white check mark, the pipeline will be completed. Now that it's done, let's take a look at the Auto Forecasting node. So I'm going to right-click on the Auto Forecasting node, and in the drop-down menu, I'll click on Results.

We see a lot of graphics here, primarily. The first graphic we see is the MAPE distribution. There are 918 series. For each of the series, there's a separate model that's generated. The best model is selected for each one of the 918 series.

The MAPE value for each one of the champion models for each of the series is recorded. And we can see that the majority of the MAPE values are somewhere around 5. If I just mouse over that biggest bar, we see that about 50% of all of the MAPE values are in that range, which centers around 5.24.

If we go to the next graphic to the right, we see the model families. We only allowed for ARIMA models and exponential smoothing models. And you'll note that of all the 918 series, about 75% of the models that were selected were ARIMA models, and the rest, 25% were ESM models.

Another way of summarizing all of the modeling types is by looking at the model inputs: whether inputs were present, and whether there were seasonal or trend components in the models. So these can be overlapping. So I can see that about 74% had inputs present. Now, inputs were only permitted for the ARIMA models, so this is nearly all of the ARIMA models that had inputs present.

Now any of the models could have had seasonal components, and we can see about 33% had seasonal components. And also about 29% had trend components in the models. So as I said before, these can be overlapping categories, and, therefore, the percentages sum to over 100.

And then on the right side, we see the Execution Summary. We can see there were 918 series. There weren't any series that failed for forecasting. And then there were only a few, a handful, well, 6 series with forecasts equal to zero, meaning in the forecast range, the forecasts were all zero.

Then there is a lot of summary information about the number of series that had flat forecasts. A flat forecast means just that: in the forecast range, the forecast values were constant. Right. So there's just a lot of information here that you might or might not be interested in reporting or taking note of.

The next thing I can do is look at the Output Data tab. There are a lot of output data sets that are generated whenever you run a node, whenever you're running a pipeline. And some of the information is in a Forecast data set. Then there are a couple of model information data sets, forecasting statistics, and then a couple of other different output tables.

If I want to look at the Forecast Output table, I need to click on the button for View Output Data. That doesn't show up automatically. The reason for that is because this table is generally pretty big. So unless you really need to take a look at it, you probably don't want to click on that button. But we can take a look at that here, just to get an idea of what is in this Forecast data set.

You'll notice that for every one of these lines, we have a unique product line and product name combination. So that's a unique series. It's identified by its product line and product name. All right.

And notice that we have multiple lines of data in this data set, each for a different month. So if I scroll down, I can see that the Time ID, the months, go from 2012, 2014, 2015. And once we get to 2017, we see the actual values are missing. What that means is this is the forecast horizon.

The forecast horizon, of course, doesn't have any actual values, but it can have predicted values and so on. So that's information you might want to obtain from that forecast table. The other information you get, interesting information might come from model information.

So if I click on OUTMODELINFO, I can see the table that is produced here. And once again, we see information for every product line, product name combination. In other words, every series. And in this particular data set, we see the model -- the name of the model, or the label for the model -- which is the type of model that was chosen as the champion model for that particular series.

So for Line01, Product01, that particular series, it was an ARIMA model with regression parameters. All right. So it is under the ARIMA family. There were no dependent variables here. And we can get information about whether there are seasonal components, whether there are trend components, whether there are inputs present, and so on.

So that's information for the Champion model. If I want to see what the competitors were, we can click on the OUTSELECT Data Source. And now you'll see there are three lines for each one of the series. So Line01, Product01 is three different lines. And you can see which of the models were under consideration by looking at the Model column.

And then the next column, Selected Status, shows that the selected model for this particular series, as we'd seen before, was the ARIMA model with regression parameters. OK. And you can see why that happened by scrolling farther to the right; each one of these has fit statistics and accuracy statistics calculated.

So if we look at the Mean Absolute Percent Error column, for those first three rows, you can see that the middle value, 3.73, was the smallest of all three. And for MAPE, being absolute percent error, smaller is better, and that is why that particular model won the competition for that particular series. Now we can close out the Results, and we'll move along to the next demonstration.
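
The selection logic described above -- for each series, compute fit and accuracy statistics for every candidate model and keep the one with the smallest MAPE -- can be sketched in a few lines. (This is only an illustration; the table and column names below are stand-ins, not the actual OUTSELECT columns.)

  import pandas as pd

  # Stand-in for an OUTSELECT-style table: three candidate models for one series.
  candidates = pd.DataFrame({
      "productline": ["Line01", "Line01", "Line01"],
      "productname": ["Product01", "Product01", "Product01"],
      "model": ["ESM", "ARIMA", "ARIMA with regression parameters"],
      "mape": [4.10, 5.02, 3.73],
  })

  # For each series, keep the candidate with the smallest MAPE (smaller is better).
  champion = candidates.loc[
      candidates.groupby(["productline", "productname"])["mape"].idxmin()
  ]
  print(champion)  # the 3.73 row wins for this series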


Forecasting Using Model Studio in SAS® Viya®
Lesson 02, Section 3 Practice the Demo: Perform Basic Forecasting with a Pipeline


Note: This practice provides the steps required to perform the tasks presented in the preceding demo video.

In this practice, you perform basic forecasting with a pipeline.

  1. Starting from SAS Drive, select Build Models and open the baseline sales forecasts project created previously.

  2. Navigate to the Pipelines tab.

    The Auto-Forecasting template is the default pipeline template for Visual Forecasting. It consists of the essential steps in a forecasting analysis:
    • accumulates the data into time series
    • automatically identifies, estimates, and selects forecast models for the time series
    • assesses forecasting results
    • publishes results for use outside the pipeline

    Note: If the modeling data are hierarchically arranged, the identification, estimation, and selection steps in the default forecasting pipeline are done on series in the base level of the hierarchy.

  3. Select Run Pipeline in the upper right corner of the workspace.

    Note: If you run into problems with this step, make sure that the modeling and attribute tables are loaded in memory. If the server containing in-memory versions of the modeling and attributes table has been shut down since you last opened the project, tables need to be reloaded.

Auto-forecasting Node Results

  1. Right-click on the Auto-forecasting node and select Results.

    Because the Auto-forecasting node is designed to be run with minimal input from the analyst, relatively few options are surfaced for this node. The Auto-forecasting node automatically identifies, estimates, and generates forecasts for the 918 series in the base or product name level of the modeling hierarchy. Most of the forecast models selected for these series are in the ARIMAX family.

    For each series, two families of time series models are considered by default: ARIMAX (ARIMA with exogenous variables) and ESM (exponential smoothing models). The champion model for each series is chosen based on root mean square error. Other selection statistics are available in the Model selection criterion option.

    The MAPE Distribution histogram is located in the upper left-hand corner. The distribution of Mean Absolute Percent Error (MAPE) for forecasts in the product name level of the hierarchy can be used to compare the accuracy of different forecast models. Each of the bars represents the proportion of the series that have a specific range of MAPE values. In general, smaller values of MAPE imply greater accuracy. MAPE is an alternative selection criterion supported in the software.
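    For reference, the MAPE of a single series is, stated here as a sketch of the usual definition (the software's exact handling of details such as zero actual values may differ),

      \mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right|

    where y_t is the actual value, \hat{y}_t is the predicted value, and n is the number of time points used for assessment.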

    The Model Type chart, located in the lower left, summarizes systematic variation found in the identification process. Approximately 72% of the forecast models selected at least one of the candidate input variables, about 34% of the series have a seasonal pattern, and about 30% selected a Trend Model.

    The Model Family histogram is located in the upper right. Among these 918 series, approximately 72% were modeled best using an ARIMA model. The rest, 27.23%, were modeled using an Exponential Smoothing Model.

    The Execution Summary, located in the lower right, provides information about results that are potentially problematic, anomalous, or both.

  2. Click the Output Data tab above the MAPE Distribution plot.

    Several output tables are created. You can view them by clicking on them.

    Note: In order to view the OUTFOR data source, you also need to click the View Output Data button. This file is large, containing forecasted values for every indexed time interval in the forecast range.

  3. Click the OUTMODELINFO data source to open it.

    For each series, the selected model is named and attributes of the model are displayed.

  4. Close the Results window.

  5. Right-click and open the results of the Model Comparison node.

    The Champion Model is the Auto-forecasting model, which is the only one included in the pipeline. WMAE and WMAPE are weighted averages of the MAE and MAPE values across all series. WMAPE and WMAE represent the average performance of all the models in a modeling node.

    Note: For the WMAPE and WMAE, the final computation is based on weighted measurements from each time series, where more weight is given to time series with a higher average of the dependent variable.
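    A sketch of that weighting, assuming weights proportional to each series' average level of the dependent variable (the software's exact formula is not spelled out in the course), is

      \mathrm{WMAPE} = \frac{\sum_{i} w_i \, \mathrm{MAPE}_i}{\sum_{i} w_i}, \qquad w_i \propto \bar{y}_i

    where \mathrm{MAPE}_i is the MAPE of series i and \bar{y}_i is the average of its dependent variable. WMAE is formed in the same way from the per-series MAE values.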


Forecasting Using Model Studio in SAS® Viya®
Lesson 02, Section 4 Demo: Honest Assessment

In this demonstration, we select models using Honest Assessment. So with the project still open, and with the Pipelines tab open, we see that we've got one pipeline, and that's Pipeline 1. So at this point, I'm going to duplicate this pipeline. It's the easiest thing to do in order to create a new pipeline with just a few modifications.

What I can do is then click on the ellipsis within Pipeline 1, and then select Duplicate. Now I'd also like to rename this pipeline. So I'm going to go to the ellipsis here for Options, and Rename. And I will rename this pipeline Honest Assessment Auto, for Auto Forecasting with Honest Assessment. And then click OK.

Now I'm going to select Auto Forecasting, make it active. And on the right side, I can see some of those options that are available to me. I'm not going to be editing any code here. So I'm not going to click that button. But I will expand Model Generation and Model Selection.

Under Model Generation, you can see by default, the Auto Forecasting Pipeline includes just exponential smoothing models and ARIMAX models. Other models that are available to us are IDM, or intermittent demand models, and UCM, or unobserved components models.

We won't need to use the IDM models. Those are really only useful when we have a series where many of the time intervals have no data -- maybe a relatively rare event, such as sales of something that doesn't occur every month. That's not what we have here in these data. So I'm going to leave that box unchecked.

But I will check the box for Include UCM models. So that means that, in addition to checking the ESM models and the ARIMAX models, SAS will also check to see whether a UCM model might be the best model for each one of those 918 series. Now the Honest Assessment part comes under Model Selection. So under Model Selection, you can see the number of data points used in the holdout sample.

We have monthly data. So I'm going to use at least one seasonal cycle's worth of data, which is 12 time points. Now once I hit my Enter button, you'll see that there's another box that pops up. And that box says percentage of data points used in the holdout sample.

Well, just in case the series is relatively small, we don't want to take too many data points away from generating the model. So I'm going to limit the percentage of the data points to 25. In other words, if there is a series where 12 data points is more than 25% of the series, I'm going to ask that the holdout sample be limited to 25%. All right.

So MAPE is still going to be the Model Selection criterion. Now let me just run this pipeline. And as usual, we're going to have to wait for each one of these nodes to run. So first thing, we're going to be rerunning the data node.

You'll notice that the Auto Forecasting node is going to take a bit longer than it took before. And that's not necessarily because of model selection being based on the holdout sample; it is because we've included the UCM models. UCM models cannot be run as efficiently as either ARIMA models or exponential smoothing models. So things will take a little bit longer, but not too much.

As we did before, we're going to take a look at the Auto Forecasting node first. So I'm going to right-click on that Auto Forecasting node, and click on Results. And you'll see immediately there is a big difference between the MAPE distribution using this pipeline and the MAPE distribution from Auto Forecasting with the default options before. So we have just one big bar representing 95% of the data that has a MAPE with a mean value of 5.6.

You can see that now that we've allowed for UCM models, the proportion of the 918 series for which a UCM model was actually found to be the best was about 25%; exponential smoothing models are now 46%; and then ARIMA models are the remainder, about 29%. Now these models were selected based on MAPE performance on the holdout sample.

And then for completeness, we can take a look at the proportion of the series that had inputs present, the ones that had at least a seasonal component, and the ones that had a trend component. Let me close this summary, and now let me open up Model Comparison.

So looking at the results after right-clicking on the Model Comparison node, we can see the weighted MAPE is 6.5108, which is slightly higher than the weighted MAPE we saw for the Auto Forecasting pipeline before, which was a little over 5 and 1/2. One thing that you should be aware of: the weighted MAPE is not based on a weighted average of the MAPE for the 918 series using the holdout sample. This is weighted MAPE calculated on the entire sample.

So we selected a model for each one of the 918 series, using MAPE on the holdout sample as the selection criterion. But when we assessed the entire pipeline, the MAPE was recalculated on the entire series. Sometimes this can be a little bit misleading or confusing. The consequence of this fact is that you're really not able to compare pipelines, or compare nodes, on their performance on the holdout sample.

So we can select individual models for each individual series using performance on the holdout sample. But when we summarize how well a particular pipeline is doing, we cannot get a result that's based on performance in the holdout sample. Let me just close these results. And we'll be ready for the next demonstration.


Forecasting Using Model Studio in SAS® Viya®
Lesson 02, Section 4 Practice the Demo: Honest Assessment


Note: This practice provides the steps required to perform the tasks presented in the preceding demo video.

In this practice, you select models using honest assessment.

  1. If it is not still open, reopen the project from the previous demonstration.

  2. Click the Options ellipsis on the Pipeline 1 tab and select Duplicate.

  3. Rename the new pipeline by clicking on its Options and selecting Rename.

  4. Rename this pipeline Honest Assessment Auto.

  5. Click the Auto-Forecasting node to make it active.

  6. On the right, in the node options area, expand both the Model Generation and Model Selection menus.

  7. Under Model Generation, check the box to include UCM models.

    Note: UCM models can lead to excessive run time. They are usually reserved for special case or high-value series.
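    Note: As context only (the course does not go into the form of these models), a UCM decomposes a series into interpretable, unobserved components. A basic structural sketch is

      y_t = \mu_t + \gamma_t + \varepsilon_t

    where \mu_t is a level component that can trend over time, \gamma_t is a seasonal component, and \varepsilon_t is the irregular term. Each component follows its own evolution equation that must be estimated, which helps explain the longer run times mentioned above.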

  8. Under Model Selection, change Number of data points used in the holdout sample from 0 (the default) to 12 (a full seasonal cycle's worth of the monthly data) and then either press Enter or click outside of the box.

    When you do this, another box will appear asking you for a percentage of data points to use in the holdout sample. If you also put a value here, the holdout sample will be the smaller of the two.

  9. Enter 25 for Percentage of data points used in the holdout sample.

    Note: The actual size of the holdout sample is the smaller of the number of data points selected and the percentage of data points. This value can vary from series to series in a project.
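    As a worked illustration of that rule (the series lengths here are made up): with the settings above, the holdout size for a series of length n is min(12, floor(0.25 × n)). For a series with 60 observations, 25% is 15 points, so the holdout is min(12, 15) = 12. For a shorter series with 40 observations, 25% is 10 points, so the holdout is min(12, 10) = 10.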

  10. Leave the model selection criterion as MAPE and run this pipeline.

  11. Right-click and open the results of the Auto-forecasting node.

    The ESM model is selected for nearly half of the series. The UCM model accounts for another quarter. Remember that these models were selected on the basis of MAPE on the holdout sample of 12 time points, rather than the fit sample, which was the basis for assessment in the previous pipeline.

  12. Close the Results window.

  13. Right-click and open the results of the Model Comparison node.

    WMAE and WMAPE are slightly higher for the honest assessment pipeline than for Pipeline 1. That is to be expected: the models in this pipeline were selected based on holdout-sample accuracy, but the reported WMAE and WMAPE are still computed on the entire sample, so a pipeline whose models were chosen to optimize fit on the full sample tends to look slightly better on these statistics.


Forecasting Using Model Studio in SAS® Viya®
Lesson 02, Section 5 Demo: Exploring More Pipeline Templates

In this demonstration, we explore another pipeline template. So we're going to proceed from the previous demonstration. We already have two pipelines. The Auto-forecasting pipeline is Pipeline 1. And another auto-forecasting pipeline is Honest Assessment Auto. But, remember, Honest Assessment Auto used a holdout sample for assessing the best model within each of the series. The next pipeline I'm going to build is going to be using a different template. So let me build that next pipeline by clicking on the Plus button. And now the first thing I want to do is name this pipeline. And I'll name this one Naive Model Forecasts.

Now the template that I choose can be selected from the dropdown menu, but the only templates that will be available to you are Auto-forecasting, which is the default, and any other template that you've already used. So for me, I've already used Naive Forecasting in my previous work. For you, there probably will not be Naive Forecasting.

So if you are not able to select it from the dropdown menu, then you can always click on the Browse button. Now once you're looking at all the templates, there are a lot of them here, I'm going to scroll down to the Naive Forecasting template. Now notice there's a difference between Naive Forecasting, and above, Naive Model Average Forecasting.

I'm going to use Naive Forecasting by highlighting it, and clicking on OK. Right, now everything's ready. So I'm going to save my new pipeline. Now let me check the options in the Naive Model node. So let me just click on the Naive Model, and make it active. And you can see, there's always going to be, or nearly always going to be, an option for editing the code. We're not going to be doing that in this course.

There are other options. There aren't that many options, but I can select my Naive Model type from among three. The default is the Seasonal random walk model, but I can also choose a Moving Average or a simple Random walk model. There's also a check box for a Drift option with that Random Walk. Right. So we'll just keep those defaults. And then I'm going to run this pipeline.

Now if you'll notice, there were no options for a holdout sample, so we wouldn't be able to compare this pipeline to a pipeline that was using a holdout sample. And that's definitely a limitation that we have here. But we are able to compare the results of this pipeline with another pipeline that used the entire sample in order to select the best model for each series.

So now that the pipeline has run successfully, let's just take a look at the Model Comparison node. So I'll right-click on that, and look at the results. And now we can see the weighted MAPE value of 9.0117. Remember, for MAPE and weighted MAPE, smaller values are better. The MAPE values for the previous pipelines were 5.5 and 6.5, somewhere around there. So 9.0117 is clearly inferior to the others. We'll be talking a little bit later, in the next demo, about how to directly compare these pipelines. But for now, I'm just going to close the results and get ready for that next demonstration.


Forecasting Using Model Studio in SAS® Viya®
Lesson 02, Section 5 Practice the Demo: Exploring More Pipeline Templates


Note: This practice provides the steps required to perform the tasks presented in the preceding demo video.

In this practice, you explore pipeline templates other than the default Auto-forecasting template. This practice proceeds from the end of the previous one.

  1. From the Pipelines tab in the baseline sales forecasts project, click on the plus sign (+) to add a new pipeline.

  2. Name the new pipeline Naïve model forecasts. The model chosen for all of our time series will be a seasonal random walk model.

  3. Select Browse from the Template drop-down menu. The templates described earlier can be accessed here.

  4. Select the Naïve Forecasting template and click OK.

  5. Click Save in the New pipeline window.

    A new pipeline is created based on the Naïve modeling node. The other nodes that are added were described previously.

  6. Select the Naïve model node and look at the options on the right.

    The options on the Naïve modeling node (in the Node options menu on the right) indicate that a Seasonal random walk model will be fitted to each series.

    Note: These models can be useful for providing benchmark measures of forecasting accuracy.
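    Note: As a sketch of what the three naive model types forecast for a monthly series (seasonal cycle s = 12), writing y_t for observed values and \hat{y}_{t+h} for the forecast h steps ahead:

      Seasonal random walk:    \hat{y}_{t+h} = y_{t+h-s}  (repeat the value from the same month one cycle earlier, for h up to one cycle)
      Random walk:             \hat{y}_{t+h} = y_t  (repeat the last observed value)
      Random walk with drift:  \hat{y}_{t+h} = y_t + h \hat{d}, where \hat{d} is the average historical one-step change
      Moving average:          \hat{y}_{t+h} = the mean of the last k observed values, for some window length k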

  7. Run the Naïve model forecasts pipeline by clicking on Run Pipeline.

  8. Right-click the Model Comparison node and select Results.

    Holdout samples are not an option within the Naïve Model forecasting node, so WMAE and WMAPE are based on the entire sample of each series.

  9. Close the results.


Forecasting Using Model Studio in SAS® Viya®
Lesson 02, Section 5 Demo: Pipeline Comparison

In this demonstration, we learn how to compare pipelines. So with a project open, we have now three pipelines. Pipeline 1, Honest Assessment Auto, and Naive Model Forecasts. If I click on the tab for Pipeline Comparison, we automatically get a comparison of all three of those pipelines. In addition, we also see which is the Champion pipeline. And that Champion pipeline is selected by default using weighted MAPE. So if we look at the Champion column, there is a star where the Champion is. And if you look at the actual values of weighted MAPE, you can see why that Pipeline 1 was the champion. It had a weighted MAPE of 5.58, compared to Honest Assessment 6.51, and Naive Model Forecasts 9.01.

Now the Champion pipeline is going to be used for anything that you do from here on in. So if you want to score any more data, if you want to implement this model in the future, the Champion pipeline is the only one that will be used. If you have decided that there is another pipeline that you'd like to use, you can override the default, which is to select based on that weighted MAPE. And there are a couple of ways of doing that.

So for instance, let's say that we prefer the Honest Assessment model. As I mentioned before, the Honest Assessment pipeline can't really be directly compared to Pipeline 1, because the Honest Assessment pipeline used a holdout sample for selecting the individual models within each of the 918 series. But the weighted MAPE, unfortunately, did not use those holdout MAPE values for the 918 series. It went back, and the weighted MAPE was calculated on the entire series. So it's not really showing its true power. So let's say that I wanted to use that Honest Assessment pipeline anyway. What I could do is right-click on Honest Assessment, and I could set that as champion. So that's one way of doing it.

Another way is to uncheck the check box next to the champion pipeline, and then check the check box next to Honest Assessment. Then, if you go all the way to the upper right-hand corner, you can click on the ellipsis there, and you can set that as champion. Right. So there is going to be a little bit of a note here: making this the new champion will recreate the data used in the project's overrides. So if there were any overrides, which we haven't done yet, this is something that you would need to consider. So I'm going to cancel this right now. So I'm not going to change the pipeline that is set as champion.

One other thing that you can do while you're here in the Pipeline Comparison tab is directly compare a subset of these pipelines. So let's say that I really wanted to directly compare just Pipeline 1 and the Naive Model Forecasts. If I click on more than one of these pipelines, you'll see that the Compare button becomes active in the upper right-hand corner. That Compare button will allow you to compare just the pipelines that you selected. So when I click on Compare, in addition to showing the weighted MAPE, which we could have seen just by using that table, we also see the MAPE distributions, once again, for each one of those pipelines. So now you can close out, and you're ready for the next demonstration.


Forecasting Using Model Studio in SAS® Viya®
Lesson 02, Section 5 Practice the Demo: Pipeline Comparison


Note: This practice provides the steps required to perform the tasks presented in the preceding demo video.

In this practice, you compare pipelines and use accuracy and fit statistics to determine a champion model. This practice proceeds from the end of the previous one.

  1. Click the Pipeline Comparison tab.

    The results indicate that Pipeline 1 is selected as the champion. This is because the forecasts generated by the models in Pipeline 1 have the lowest weighted MAPE (the default pipeline selection criterion) among the pipelines in the comparison. Relative values of other statistics of fit are also shown.

    Note: The declaration of a champion pipeline is important for subsequent steps in the forecasting workflow. The forecast table that can be exported from the project is based on the models in the champion pipeline. Also, any overrides that are set will be implemented on the champion model forecasts. Overrides are described in detail later.

    Recall that Pipeline 1 did not use a holdout sample and, therefore, the models were not selected using honest assessment. Because the purpose of these models is forecasting, Pipeline 1 should be excluded from consideration as a champion model. Based on WMAPE, the Honest Assessment Auto pipeline would beat the Naïve Model Forecasts pipeline. Manually select that pipeline as the champion pipeline.

  2. Right-click the Honest Assessment Auto pipeline and select Set as champion from the drop-down menu. The pipeline has changed.

  3. To compare summary results and diagnostics across pipelines, select check boxes next to the Honest Assessment Auto and Naïve Model Forecasts pipelines and then click Compare.

    You can now compare the MAPE distributions and Execution Summary results across all selected pipelines in one window.

  4. Click Close to exit the compare window without changing the champion pipeline.

    Note: The pipeline selection criterion can be changed, and the automated choice of the champion pipeline can be overridden. For example, to manually change the champion pipeline, clear the check box for Pipeline 1, select the check box for the pipeline that you want, click the Project pipeline menu icon (the three vertical dots in the upper right), and select Set as champion. Selecting a new champion recreates the data used in any overrides in the project.

Lesson 03


Forecasting Using Model Studio in SAS® Viya®
Lesson 03, Section 1 Demo: Generating Hierarchical Forecasts with the Default Settings

Note: The text below is an exact transcription of what is spoken in the demo video. The related practice that follows this demo provides the steps for performing the demo tasks.

Now it's time to use the BY variables that we selected earlier in the course to perform Hierarchical Forecasting using the Hierarchical Forecasting node. For this demonstration, I'm going to start a new project. So as you recall from the SAS Drive, I go to New, Model Studio project. This time I will name the new project LG Hierarchy Forecast. The name isn't really that important, just something that you can remember.

The type will remain Forecasting. And this time I can start out by using the Auto-forecasting template, or I can choose the Hierarchical Forecasting template. So I will just keep the Auto-forecasting template for right now, and then add a new template as we move along. The data, once again, are the same data that we were using in the previous demonstrations in the course.

So let me browse back to the LOOKINGGLASS_FORECAST data and select LOOKINGGLASS_FORECAST. Click OK, and then save the new project. Once again, we have to assign our variables and their roles. The Txn_month variable is our time variable once again. And I'm going to choose to use Sale as my dependent variable, so change the role once again to Dependent for Sale.

Deselect Sale, and then choose my two BY variables, which are productline first and productname second. Change the role to BY variable. And the reconciliation level will be changed to productline. The reconciliation level is going to be important as we move through the hierarchical demonstrations. Now deselect productline and productname. My three independent variables are cost, discount, and price, so I'm going to change their roles to Independent.

And then for Usage in system-generated models, I'll change that from Yes to Maybe. Now that I have my data set up, I can move to my pipelines. If I click on my Pipelines tab, I'll see my first pipeline. If you want, you can run that pipeline. It's identical to the ones we used earlier. But the next thing that I want to do, after this is finally completed, is add a new pipeline, and that will be a hierarchical pipeline.

The only reason that I'm running this pipeline right now is because I want to compare this pipeline to the Hierarchical pipeline that we'll be using. So in order for us to compare pipelines, we need at least two pipelines. All right, so now let me add my second pipeline using the plus button for add a new pipeline. I'm going to name this new pipeline simply Pipeline 2.

The template that I'd like to use is not the Auto-forecasting template, but Hierarchical Forecasting, and then I'll save this. In contrast to the Auto-forecasting node, the Hierarchical Forecasting node allows extensive customization, and you'll see that as we run through the demonstrations in this section of the course. So if I just click on Hierarchical Forecasting, you'll see the options are many if you look to the right side of the screen.

So I'll just use the default options first, and I'll click on Run Pipeline to run the pipeline. Now that the pipeline has completed running, I'm going to take a look at results by right-clicking on Hierarchical Forecasting and then selecting Results from the drop-down menu. So now, since this is a hierarchical model, we see the results both on the productline hierarchy level and also on the productname hierarchy level.

You can see the average Weighted MAPE over all of the series is 3.40 when we weight it on the productline hierarchy level, and the Weighted MAPE is 5.76 on the hierarchy level productname. Within each of those, we can see the MAPE Distribution of all the series that we've used. When we're looking at productline, we see that the MAPE values are bunched up pretty close to 3, somewhere between 3 and 4.

If we look at the Model Family information, this is still a selection among simple models. The best models selected here were mostly ESM models; about 54% were ESM, or exponential smoothing, models. And the ARIMA models were the best models for about 45 and 1/2 percent of the series. When we look at the series as far as Model Types go, remember those input variables, the independent variables: about 45 and 1/2 percent of the models used those inputs.

And among those models, nearly 68% had seasonal components, and about 38% had a trend. And we can look at the same information on the hierarchy level productname. Now close the results, and we'll move on to the next demonstration.


Forecasting Using Model Studio in SAS® Viya®
Lesson 03, Section 1 Practice the Demo: Generate Hierarchical Forecasts with the Default Settings


Note: This practice provides the steps required to perform the tasks presented in the preceding demo video.

In this practice, you use the BY variables, productline and productname, to perform hierarchical forecasting using the Hierarchical Forecasting node.

  1. Starting from SAS Drive, create the LG hierarchy forecast project. This project will use the same in-memory tables and variable metadata as the baseline sales forecasts project created earlier. For convenience, a summary of the project creation steps is below.
    • From the Show list of applications menu, select Build Models.
    • On the right side of Model Studio, select New Project.
    • Name the project LG hierarchy forecast, set the type to Forecasting, and provide a reasonably detailed description.
    • Navigate to the LOOKINGGLASS_FORECAST table on the Available tab and click OK.
    • Save the new project.
    • Assign variable roles. Txn_Month is the time variable, the dependent variable is sale, and the BY variables are productline and productname. Set Reconciliation level to productline.
    • Select and edit price, discount, and cost. Set the role of these variables to Independent and change Usage in system-generated models to Maybe.

  2. Navigate to the Pipelines tab, and select Run Pipeline. This pipeline is identical to the ones used previously, but running it now allows you to compare this pipeline to the Hierarchical pipeline that is used later.

  3. Click the plus (+) to add a new pipeline.

  4. Name the new pipeline Pipeline 2.

  5. Select Hierarchical Forecasting for the template and click Save.

  6. In contrast to the Auto-forecasting node, the Hierarchical Forecasting node allows extensive customization. Select the Hierarchical Forecasting node and scrutinize the options on the right side of the screen.

  7. Click on Run Pipeline.

  8. When it finishes running, right-click the Hierarchical Forecasting node and select Results.

  9. Results are given on both the productline (middle) and productname (base) levels of the hierarchy. Model Type and Model Family results are added to the previously introduced diagnostics.

    Note: Recall that the modeling hierarchy was set when the productline and productname variables were assigned as BY variables in the project.
  10. The average Weighted MAPE over all of the series is 3.40 on the productline hierarchy level, and the Weighted MAPE is 5.76 on the hierarchy level productname. Within each of those, you can see the MAPE Distribution of all the possible series used. Looking at productline, notice that the MAPE values are bunched up between 3 and 4.

    The Model Family information indicates that this is still a selection among simple models. The best models selected were mostly ESM models, about 54%. The ARIMA models were the best models for about 45.5 percent of the series.

    Looking at the Model Types, notice that about 45.5 percent of the models used the independent, or input, variables. Among those models, nearly 68% had seasonal components, and about 38% had a trend. You can look at the same information on the hierarchy level productname.

  11. Close the results.


Forecasting Using Model Studio in SAS® Viya®
Lesson 03, Section 2 Demo: Adding Combined Models to the Hierarchical Forecasting Pipeline

Note: The text below is an exact transcription of what is spoken in the demo video. The related practice that follows this demo provides the steps for performing the demo tasks.

The previous demonstration generated forecasts for all series in the three-level hierarchy under the default settings for the Hierarchical Forecasting node. Now we try to improve the fit of the forecasts by adding combined models to the pipeline. So starting off where we left off at the last demonstration, remember that I have the LG Hierarchy Forecast project with two pipelines, Pipeline 1 and Pipeline 2.

Pipeline 2 included the Hierarchical Forecasting node. So I'm going to add another Hierarchical Forecasting node. I will click on the Expand button for the Nodes menu, and I will add another Hierarchical Forecasting node by just dragging Hierarchical Forecasting onto the Data node.

So now I have two Hierarchical Forecasting nodes. You can tell that one has been run, because there is a green check mark next to it. And the other one has not been run. And you'll notice that the Model Comparison and Output nodes are now no longer showing green check marks, because we have not compared all the models here.

So I'm going to rename the second Hierarchical Forecasting node to Hierarchical Forecasting with combined models. So I'll right-click, select Rename, and change the name to Hierarchical Forecasting with combined models. Now, if you don't already have the options expanded, remember to click on the Node options button.

And then, once that's done, you can go down to Model Generation and expand that section. Now you can see that the default options for Model Generation include ARIMA and ARIMAX models, and ESM models. The sliders for UCM and external models are not slid to the right, so those models are not included.

So the only models that have been possible so far have been ARIMA models, ARIMAX models, as well as ESM models. If you'd like to, you can include combined models by just sliding the slider to the right for include combined models. So I'll do that now.

And with combined models, we can average the results from all of the ARIMA and ESM models. And then very often, those combined models can perform better in forecasting than the ARIMA models and the ESM models individually.

We'll keep the default method for combination, which is an average -- just a straight average of all the models. And we'll keep all the other statistics and options as they are by default. And I can rerun this pipeline.

Now that the pipeline has run successfully, I'm going to take a look at my new Hierarchical Forecasting node. Let me now retract, or hide, that panel with the options for the forecasting node to give myself a little bit more space. Right-click on the second Hierarchical Forecasting node. And then, take a look at the results.

Then you'll notice that we have distributions, both for the hierarchy level productline and for the hierarchy level productname. You'll see that Weighted MAPE is 3.21 for productline, and 5.08 for productname. But hopefully, you've noticed here that the model families now include the combined models.

The combined model was chosen as the best model for an awful lot of our series -- about 63%. And now ARIMA only accounts for about 21% of the series, and exponential smoothing about 16%. And similarly, we can look at those results across productname.

Now let me close the results. And now, finally, let's take a look at the Model Comparison. I'd like to compare the Hierarchical Forecasting basic model with a Hierarchical Forecasting model that included the combined models.

So if I right-click on Model Comparison and look at the Results, I see that the champion was selected as Hierarchical Forecasting with combined models, with a Weighted MAPE of 5.08. Remember, smaller is better, and 5.08 is somewhat smaller than the basic Hierarchical Forecasting model's 5.76. And we can take a look at the distributions for each one of those models -- the champion model first, and then the model that did not become champion.


Forecasting Using Model Studio in SAS® Viya®
Lesson 03, Section 2 Practice the Demo: Add Combined Models to the Hierarchical Forecasting Pipeline


Note: This practice provides the steps required to perform the tasks presented in the preceding demo video.

The previous practice generated forecasts for all series in the three-level hierarchy under the default settings for the Hierarchical Forecasting node. In this practice, you try to improve the fit of the forecasts by adding combined models to the pipeline.

For each series, the combined model combines the generated forecasts from default families of models considered for that series to produce a new forecast. The default combination method is a simple average of forecasts.
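A sketch of that default combination for a single series at time t, writing \hat{y}_t^{(1)}, \ldots, \hat{y}_t^{(m)} for the forecasts from the m models being combined, is

  \hat{y}_t^{\mathrm{comb}} = \frac{1}{m} \sum_{j=1}^{m} \hat{y}_t^{(j)}

Averaging tends to cancel out the idiosyncratic errors of the individual models, which is why a combined forecast often outperforms each of its components.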

This practice proceeds from Pipeline 2, created in the previous practice.

  1. Expand the Nodes menu on the left side of the workspace.

  2. Click and then drag and drop a Hierarchical Forecasting node on top of the Data node.

  3. Right-click and rename the new Hierarchical Forecasting node to Hierarchical Forecasting with combined models. Click OK.

  4. Select the Hierarchical Forecasting with combined models node, and expand the Model Generation options on the Node Options panel on the right.

    Notice that the default options for Model Generation include ARIMA and ARIMAX models, and ESM models. The sliders for UCM and external models are not slid to the right, so those models are not included.

  5. Scroll down to the Include combined models option and slide the toggle to on.

    With combined models, you can average the results from all of the ARIMA and ESM models. Often, the combined models can perform better in forecasting than the ARIMA models and the ESM models individually.

    Keep the default methods for combination, a straight average of all the models. Keep all the other statistics and options as they are by default.

  6. Select Run Pipeline to run the updated components.

  7. Right-click the Hierarchical Forecasting with combined models node and open Results. The Model Family results show that the majority of forecast models selected for the series in the base and middle levels of the hierarchy are generated by Combined (comb) forecasts.

    The aggregated, or weighted, MAPE measures have improved, relative to the forecasts generated under the default settings, for both levels of the hierarchy. The Weighted MAPE is 3.21 for productline and 5.08 for productname. (One way such a weighted MAPE can be computed is sketched after these steps.)

    The combined model was chosen as the best model for about 63% of the series. ARIMA accounts for only about 21% of the series, and exponential smoothing for about 16%.

  8. Close the results.

  9. Right-click on the Model Comparison node and select Results. The Hierarchical Forecasting with combined models node is the champion for the pipeline. The Weighted MAPE for the Hierarchical Forecasting with combined models node is 5.08, compared with 5.76 for the Hierarchical Forecasting node, and smaller values are better.

  10. You can compare results at the base level of the hierarchy across the two pipelines in the diagnostics. Close the results.
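
The Weighted MAPE values reported in step 7 aggregate percentage errors across many series. A minimal sketch of one common volume-weighted convention is shown below, using made-up actual and forecast values; the exact weighting that Model Studio applies may differ in detail.

    # Illustrative only: a volume-weighted MAPE aggregated across two series.
    actuals = {"series_A": [200.0, 220.0], "series_B": [50.0, 40.0]}
    forecasts = {"series_A": [190.0, 231.0], "series_B": [60.0, 36.0]}

    abs_error_sum = 0.0
    abs_actual_sum = 0.0
    for name in actuals:
        for y, f in zip(actuals[name], forecasts[name]):
            abs_error_sum += abs(y - f)
            abs_actual_sum += abs(y)

    weighted_mape = 100.0 * abs_error_sum / abs_actual_sum
    print(round(weighted_mape, 2))  # about 6.86 with these made-up values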



Forecasting Using Model Studio in SAS® Viya®
Lesson 03, Section 2 Demo: Selecting Models Based on Forecast Accuracy

Note: The text below is an exact transcription of what is spoken in the demo video. The related practice that follows this demo provides the steps for performing the demo tasks.

In this demonstration, we split each time series and the data into two parts-- training and validation. The champion modeling node will be selected based on aggregated, out-of-sample performance, or accuracy. The demonstration proceeds from Pipeline 2, which was created in the previous demonstration.

Let's make sure we're in Pipeline 2, and then go to Hierarchical Forecasting-- the first node that we were working with earlier. So make sure you click on forecasting, expand the area for the Node options. And under Model Selection-- make sure that's expanded-- find the place where you see Model selection criterion, MAPE.

You can change the model selection criteria to anything that you'd like. We are going to change our model selection criterion to Root Mean Squared Error, or RMSE. And for the purposes of having a hold-out sample, we're going to choose to use the criteria of either 12 data points-- and the reason I choose 12 is because we have monthly data. And a holdout sample should have at least one seasonal cycle's worth of data-- so 12 would be a seasonal cycle with monthly data, 12 months in a year.

And typically a maximum of about 25% data points. So, I'm going to change this value from 0 to 25. When we choose both criteria for a number of data points and percentage of data points, whichever turns out to be the smaller value is the one that will be used. And I will make those same adjustments for the other node-- the other hierarchical node.

So once again, when I expand Model Selection, change MAPE to Root Mean Squared Error. So even though MAPE will still be the statistic that will be displayed in the results, the best model is going to be selected using the criterion of Root Mean Squared Error. And remember, I'm going to change the number of data points from 0 to 12, and then change the percentage of data points to 25%.

Now I'm going to rerun the pipeline. First let me collapse the pane, or the options, then click on Run Pipeline. Now that we've rerun the model, let me, once again, right-click on Hierarchical Forecasting and look at the results.

Now, it isn't so obvious that there has been a big change since we've made some of those adjustments. The Weighted MAPE on the hierarchy level productline is now 3.49, and Weighted MAPE for productname is 5.77. The distributions of the models that are selected are somewhat different. So the combined models within productline are not necessarily the majority of the models that were selected.

The difference is primarily due to the fact that we have now selected our models based on performance on the holdout sample-- not on the entire sample, but on the holdout sample only. And in general, when you work on a holdout sample, the MAPE values tend to be a little bit larger. So it's not a surprise that these results are somewhat different from the results that we'd used before we had a holdout sample.

Now, if I close out these results and I want to check once again to see, when I compare the models, which one of these Hierarchical Nodes was champion, let me right-click on Model Comparison and look at the results. Once again, Hierarchical Forecasting with combined models is the champion.


Forecasting Using Model Studio in SAS® Viya®
Lesson 03, Section 2 Practice the Demo: Select Models Based on Forecast Accuracy

TopicTitle

Note: This practice provides the steps required to perform the tasks presented in the preceding demo video.

One potential issue with the selection of the Hierarchical Forecasting with combined models node as the champion in the previous practice is that the selection criterion reflects how well the models fit the series in the training data.

In this practice, you split each time series in the data into two parts: training and validation. The Champion modeling node is selected based on aggregated, out-of-sample performance, or accuracy. This practice proceeds from Pipeline 2, created in the previous practice.

  1. Select the default Hierarchical Forecasting node, expand the right Node Options panel, and expand the Model Selection options.

  2. Change the Model selection criterion to RMSE (Root Mean Squared Error).

  3. Change the number of data points used in the holdout sample to 12. The number of data points is set to 12 because this is monthly data, and the holdout sample should include at least one seasonal cycle of data.

  4. Change the percentage of data points used in the holdout sample to 25. The holdout sample typically includes a maximum of about 25% of the data points.

    Note: When you choose both criteria for a number of data points and percentage of data points, the smaller number of observations generated by either of these restrictions is used as the holdout sample size for each series. (See the sketch after these steps.)

  5. Select the Hierarchical Forecasting with combined models node, expand the right Node Options panel, and expand the Model Selection options.

  6. Change the Model selection criterion to RMSE (Root Mean Squared Error).

  7. Change the number of data points used in the holdout sample to 12 and the percentage of data points used in the holdout sample to 25.

  8. Rerun the pipeline by clicking on Run Pipeline.

  9. Right-click on the Hierarchical Forecasting with combined models node and click on Results. The Model Family and Model Type results are similar, but the MAPE distributions and aggregated MAPE values have changed over the base and middle levels of the hierarchy. The Weighted MAPE on the hierarchy level productline is now 3.49, and Weighted MAPE for productname is 5.77.

    The distributions of the models that are selected are slightly different. The combined models within productline are not necessarily the majority of the models that were selected. These diagnostics are now based on residuals generated over the holdout sample region for each series. That is, they are accuracy statistics. In general, the MAPE values tend to be a bit larger when working on a holdout sample.

  10. Close the results.

  11. Right-click on the Model Comparison node and select Results. Although the choice of the champion pipeline has not changed, this result is more relevant. The pipeline with the models that extrapolate best onto data that they have not seen before is chosen as the champion.
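
As mentioned in the note under step 4, when both a number of data points and a percentage are specified for the holdout sample, the smaller resulting count is used for each series. A minimal sketch of that rule, with hypothetical series lengths, is shown below.

    # Illustrative only: holdout size when both a count (12) and a
    # percentage (25%) are specified; the smaller result is used per series.
    max_points = 12
    max_percent = 25

    for n_obs in (36, 60, 120):  # hypothetical series lengths
        holdout = min(max_points, n_obs * max_percent // 100)
        print(n_obs, "observations ->", holdout, "holdout points")
    # 36 -> 9, 60 -> 12, 120 -> 12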


Forecasting Using Model Studio in SAS® Viya®
Lesson 03, Section 2 Demo: Sharing a Custom Pipeline via the Exchange

Note: The text below is an exact transcription of what is spoken in the demo video. The related practice that follows this demo provides the steps for performing the demo tasks.

You might want to use a previously developed pipeline as a custom template for other forecasting projects. The Exchange provides a repository for collecting and sharing project objects with others. And that's what we'll do in this demonstration.

Starting from Pipeline 2, I'll go to my Options. And among my options, I have an option to save to the Exchange, so I'll select that. It's good to have a descriptive name as well as a description itself. Let me at least put in a descriptive name. I will call it LG Hierarchical Forecasting with Combined Models and save it. This is great.

To find the template, I need to go over here to the button for the Exchange. And then you'll notice that we have the Pipeline selected here, and it will be Forecasting. So among the Forecasting templates, in addition to the default Hierarchical Forecasting template, we now have an LG Hierarchical Forecasting template with my description, "This is great." So this is going to be available to other people. And they will know that this is a great forecasting template.


Forecasting Using Model Studio in SAS® Viya®
Lesson 03, Section 2 Practice the Demo: Share a Custom Pipeline via the Exchange

TopicTitle

Note: This practice provides the steps required to perform the tasks presented in the preceding demo video.

In this practice, you use the pipeline developed in the previous practice as a custom template for other forecasting projects. The Exchange provides a repository for collecting and sharing project objects with others. This practice proceeds from Pipeline 2, created in the previous practice.

  1. From Pipeline 2 in the LG hierarchy forecast project, click on Options. Select Save to The Exchange.

  2. Name the pipeline LG Hierarchical Forecasting with Combined Models. Add a description and click Save.

    Note: Providing a representative name and a detailed description is always useful.

  3. On the left side of the window, click the icon for The Exchange.

  4. Under Templates on the left panel, expand Pipelines and select Forecasting. The custom pipeline that was saved from the LG hierarchy forecast project is now available to others.


Lesson 04


Forecasting Using Model Studio in SAS® Viya®
Lesson 04, Section 1 Demo: Adding the Attributes Table to a Project

Note: The text below is an exact transcription of what is spoken in the demo video. The related practice that follows this demo provides the steps for performing the demo tasks.

You added attributes using an attributes table earlier in the course. Attributes can aid in applying overrides to forecasts after modeling is completed. So let's start from the SAS drive, if you're not already in this forecasting project, and move to LG Hierarchy Forecasts.

Once we're in LG Hierarchy Forecasts, click on the Data tab. And then, open up a new source using the New data source menu. And then choose Attributes. Now remember that we've already uploaded the LG_ATTRIBUTES table, so just select that table among the available tables, and then click OK.

Now notice that there is a list of attributes. The attributes are productline and productname, which you should recognize as the BY variables from before. BY variables can be thought of as special cases of attributes. You can think of an attribute just as a way of being able to drill down or describe or to summarize these time series that are being modeled and forecasted.

So productline, there are multiple product lines, multiple product names. And from the LG_ATTRIBUTES table, we added the Cust_Region and margin_cat, which was LOW, MED, and HIG. And so, those four variables together can be used to allow us to drill down into the time series of interest.

In particular for this demonstration, we can apply overrides just to the time series that we need overrides to be applied to. Now, we haven't done any forecasting yet. We have two pipelines. So let's recall what the pipelines are. There is Pipeline 1 and Pipeline 2.

And at this point, you'll notice that Pipeline 2 and Pipeline 1 are not run. So we need to rerun the pipelines. So first thing I want to do is run Pipeline 2. And of course, I also want to run Pipeline 1. But be careful. Before I run Pipeline 1, remember that we made some changes to Pipeline 2.

Specifically, we made some changes by using a holdout sample, and we calculated the accuracy statistics based on the holdout sample. I mentioned earlier that when you do accuracy statistics on a holdout sample, the accuracy statistics don't look quite so good.

So in order for me to be able to compare these two pipelines, I really need to do the same type of modifications to this pipeline as I did to Pipeline 2. So once again, I'm going to go into Auto-forecasting, and expand my Node options.

And under Model Selection, I changed MAPE to Root Mean Squared Error, and my holdout sample, my number of data points used in the holdout sample was 12. And remember, I chose that because there are 12 months in a seasonal cycle. And if we have any seasonality in the data, we should have a holdout sample with at least one seasonal cycle's worth of data.

And generally speaking, we shouldn't have more than about 25% of our time series saved as a holdout sample. So 12 and 25. And now, I'll run the pipeline. So now I have two pipelines that have been run. Pipeline 1, we actually could name the Auto-forecasting pipeline. And Pipeline 2 is the Hierarchical Forecasting pipeline.

So in order for me to get forecasts, the forecasts will be based on the champion model from among all pipelines. So what we need to do is if we have more than one pipeline as we do here, we need to do a pipeline comparison. And based on that pipeline comparison, we can make forecasts.

So let me go to the Pipeline Comparison tab. And you'll notice here, that Pipeline 2 is the Champion. So Pipeline 2 has the smaller values of the main statistics. Now we're going to be applying overrides to our forecasts.

So in order to look at what our forecasts are, let me click on the Overrides tab. On the Overrides tab, the first thing that you're going to see is you're going to see a series. And notice here, it's only one series. But that series is a Time Series Aggregation. There are 918 time series.

Now, you'll notice that the Attributes are to the left. We can filter based on those attributes. The attribute of productline is open already. There are five different lines, Line 02, 03, 04, 07, and 08. Productnames, we can expand those and see the five different products. Customer Region, there are five regions: South, Great Lakes, Pacific, Mid Atlantic, and Greater Texas. And of course, we have our three categories for margin_cat, LOW, MED, and HIG. You can notice here under the Forecast Overrides table, we have statistical forecasts based on our models, based on our champion model, for the months January 2017 through December of 2017.


Forecasting Using Model Studio in SAS® Viya®
Lesson 04, Section 1 Practice the Demo: Add the Attributes Table to a Project

TopicTitle

Note: This practice provides the steps required to perform the tasks presented in the preceding demo video.

Attributes are useful for visualizing the data outside the dimensions defined by the modeling hierarchy. This practice proceeds from the LG hierarchy forecast project created in the previous lesson.

  1. Start from SAS Drive, and open the LG hierarchy forecast by double-clicking it.

  2. Click the Data tab. Click the New data source menu button and select Attributes.

  3. Select LG_Attributes from the list of available data sources. Click OK. The LG_Attributes table is now the attributes table for the project.

  4. Click the Pipelines tab, and open Pipeline 2. Rerun the pipeline.

  5. Open Pipeline 1 and perform the modifications that you made to Pipeline 2 earlier.
    • Click on Auto-forecasting and expand the Node options.
    • Expand Model Selection.
    • Change the model selection criterion to RMSE (Root Mean Squared Error).
    • Change the number of data points used in the holdout sample to 12.
    • Change the percentage of data points used in the holdout sample to 25.
    • Rerun the pipeline.

  6. Select Pipeline Comparison. Pipeline 2 is the champion pipeline. Forecasts shown on the Overrides tab are generated by the champion node from the champion pipeline in a project.

  7. Click the Overrides tab. The plot shows an aggregation of the 918 series in the base level of the hierarchy.


Forecasting Using Model Studio in SAS® Viya®
Lesson 04, Section 1 Demo: Applying Overrides to Generated Forecasts

Note: The text below is an exact transcription of what is spoken in the demo video. The related practice that follows this demo provides the steps for performing the demo tasks.

Now, I want to move back and find my January 1st, 2017, and perform my first override. So what I want to do here in my overrides is I want to-- the forecast in the South customer region, I want to reduce by about 20% for the first three months of 2017, because I happen to know that there is a pending strike among delivery drivers.

So my forecast should be reduced based on what my estimate is of their reduction due to that strike. And that's just specifically in the South. Another modification or override that I'd like to perform is that in Greater Texas, I want to increase the forecast values by 15%, because I know there is a pending promotional activity that will occur in July of 2017.

So let me first work on that first override, of the 20% reduction based on the strike in the South. So the attributes are going to be useful for me. So I am going to scroll down to my attributes, and then click on the South region only. And you'll notice the number is 197.

So only 197 of the 918 series are of interest to me here. So when I scroll down to my overrides, the first thing I want to do is I want to start calculating my overrides. So if I right-click within the overrides box of January 2017, I see a pop-up that says Override calculator. And I click on the Override calculator.

Now, there's a note that says a name is required for the filter. So I could at this point just name that filter. And I'll just name that for right now, Override. I can call it South Override, and that would be a little bit more descriptive.

Now what about the properties of this override? Well, for right now, I see that the selected time periods were just the box I had started the Override calculator from. That's January of 2017. And so, if I want to add more time periods, I click on the plus box right next to the selected time period, and I can add February by clicking on February. And then control clicking on March.

And then, there is a button in the middle for Add. And I click on that Plus button. And now you'll see the selected items are January, February, and March of that year, and click OK. Now, the adjustment is going to be based on an existing forecast value.

So remember, I want to subtract 20% based on what I think is going to happen due to the strike. So I click on Adjust based on existing forecast, and I will stay on my Final Forecast. The adjustment needs to change from plus to minus.

And I'd like 20%. We've already got 20% here. We could have used units if we wanted. But I'm going to keep that at percent. The next thing that I'll do is I'm going to use my slider here for Aggregate final forecast lock. So I'll slide that to the right, and click OK.

So now you'll see the override values. They have not been applied yet. They're still waiting. That's why we have the clocks here. They're still waiting for my instruction to actually apply them. But you can see the values of the overrides for each of those months.

If I right-click on any of them, then I can submit them, and they will be applied to all of those statistical forecasts. And now, we see the override values are now blanked out, because we've already applied the overrides. The final forecast is now modified, it's no longer an override.

The final forecasts are now the values that you saw previously. The next thing we need to do is we need to make those modifications based on the sale that is going to be occurring in the Greater Texas region in July of 2017. So let me uncheck South, and then click on Greater Texas. And in particular, this is going to be an event just for the margin_cat HIG, or high. So I click HIG as well.

So essentially, I'm going to drill down into those 16 series. So this override will just be for those 16 series. So remember, it's going to be for July of 2017, and the adjustment value is going to be plus 15%. So I need to move over to July of 2017.

Once again, right-click in the box. Click on Override calculator. And we know that we want to aggregate our final forecast lock, so we can slide that over. We don't need to add any time periods. Our adjustment is based on the existing forecast value. The adjustment will be plus 15%.

And then finally, we need to name this. So we will name this something a little bit more descriptive. Override TX, for Texas, HIG, for high. So OverrideTXHIG, and then click OK. Remember, we have to submit this pending override. It's only pending until we submit it.

So we'll right-click on that value, and click on Submit all. And now you can see the override in the forecast. You can see it graphically as this big lump. That triangle, the forecast value is going to jump up for that one month of July, and then go back down to whatever the statistical forecast values were in August.

Now let me move over to the Override Management tab. And now, you can see all of the filters that have been applied. So we have a filter, Override TX, that has already been applied. So anything that we have already submitted should be available to us in this table.

You could also delete overrides from here. The Override calculator and Delete overrides buttons are on the top right part of this screen. So what we would need to do is select one, and then we can either go back into the Override calculator or just delete it.


Forecasting Using Model Studio in SAS® Viya®
Lesson 04, Section 1 Practice the Demo: Apply Overrides to Generated Forecasts

TopicTitle

Note: This practice provides the steps required to perform the tasks presented in the preceding demo video.

Attributes can also be helpful in post-modeling tasks such as applying overrides. This practice proceeds from the previous practice.

Applying Overrides to the Generated Forecasts

The Overrides functionality basically works in two steps: creation and implementation.
In the following steps, two overrides are created:

  • Forecasts in the South customer region will be reduced by 20% for the first three months of 2017 to accommodate a pending strike among delivery drivers.
  • Forecasts for high-margin products in the Greater Texas region will be increased by 15% in response to pending promotional activity that will occur in July of 2017.

These overrides will be implemented, and an impact analysis of their effects on the model's forecasts will be reviewed.
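
Before walking through the steps, it can help to see the arithmetic behind a percentage override. The minimal sketch below applies the two adjustments listed above to hypothetical aggregate forecast values; the actual values come from the champion pipeline.

    # Illustrative only: what a percentage override does to a forecast value.
    south_jan_forecast = 1000.0     # hypothetical aggregate forecast, South, JAN2017
    texas_hig_jul_forecast = 400.0  # hypothetical aggregate forecast, Greater Texas HIG, JUL2017

    south_override = south_jan_forecast * (1 - 0.20)      # 20% reduction -> 800.0
    texas_override = texas_hig_jul_forecast * (1 + 0.15)  # 15% increase  -> 460.0
    print(south_override, texas_override)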

  1. To implement the first listed override, expand the Cust_Region attribute and select the South region. The plot changes on the fly to show an aggregation of the 197 productname series in the South region.

  2. Right-click the Override cell under 01/01/2017 and select Override Calculator.

  3. Add 02/01/2017 and 03/01/2017 to apply the override to the first three months of 2017 using the plus (+) button. Click OK.

  4. Click Filter and name the item Override.

  5. Because the goal is to reduce forecasts in the South region by 20% during the time range specified above, select Adjust based on an existing forecast value and then select Final Forecast.

    Note: In this case, final forecasts are statistical forecasts that have been adjusted for reconciliation.

  6. Set Aggregate final forecast lock to on.

    Note: Here, the forecast lock is a restriction on the aggregated final forecast of all productname series in the South region. Forecasts for individual series in the override group are free to vary, but they must sum to the override values. (A sketch after these steps illustrates one feasible allocation.)

  7. Set Adjustment to -20%. Click OK.

  8. Click OK.

  9. The overrides are currently pending. Right-click on any of the three override cells and select Submit All. The second override is a 15% increase for high-margin forecasts in the Greater Texas region in JUL2017.

  10. Select Reset all from the attributes menu on the left, and then select the Greater Texas region and the high (HIG) margin category. The plot changes to show forecasts and actual values of the 16 high-margin series that flow through the Greater Texas region.

  11. Select the Override cell under 07/01/2017, and right-click it to access the Override Calculator.

  12. Change the Adjustment value to +15%. Click Filter and name this override OverrideTXHIG.

  13. Click OK.

  14. A message box might appear, warning about pending overrides. If it does appear, select Submit All. If not, you have created a pending override. Right-click the cell with the pending override value, and select Submit All. The final forecast and the forecast plot now reflect the JUL2017 promotion override.

  15. Click the Override Management tab. The newly created override is added to the list. Overrides can also be modified from here. The Override Calculator and the Delete overrides button are on the top right of this page.
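
The note under step 6 describes the aggregate final forecast lock: individual series can vary, but they must sum to the override value. The minimal sketch below shows one feasible allocation, proportional scaling, using made-up series forecasts; Model Studio's reconciliation may distribute the override differently.

    # Illustrative only: distributing an aggregate override across series.
    # With the aggregate lock on, the individual forecasts must sum to the
    # locked total; proportional scaling is one feasible way to get there.
    series_forecasts = {"prod_1": 500.0, "prod_2": 300.0, "prod_3": 200.0}  # hypothetical
    original_total = sum(series_forecasts.values())   # 1000.0
    locked_total = original_total * (1 - 0.20)         # 20% reduction -> 800.0

    factor = locked_total / original_total
    adjusted = {name: value * factor for name, value in series_forecasts.items()}
    print(adjusted)                # each series scaled by 0.8
    print(sum(adjusted.values()))  # 800.0, matching the locked total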


Forecasting Using Model Studio in SAS® Viya®
Lesson 04, Section 1 Demo: Resolving Override Conflicts

So what happens when there are two or more overrides that create a scenario that's infeasible for one or more time intervals? So conflicts arise all the time with locked forecast overrides. To illustrate a conflict, we'll add another override to forecast for the series in the South region for the first month of 2017.

Starting from where we left off in the last part of the demonstration, back on the Overrides tab, let's reset all of our attributes, clicking on the Reset button. Now I'm going to select South customer and low margin category. So there will be an overlap with a previous override.

So South we'd already looked at before, but this time I'm looking at South specifically only in the Low Margin category. And those are 141 series, and there's a lot of overlap with the first override. Now, if you recall, we already had some overrides for the final forecast for these series when we were looking at the overrides for the South, and you can see them in the final forecast. Notice how they're highlighted.

So let's say that I want to apply another override here, just specifically for January of 2017. So I want to adjust the final forecast to plus 60%. So we have a minus 20%, and now we're going to be adding a plus 60%. So that is problematic. Which comes first?

So I'm going to right-click on the Override Calculator-- as we had before. We'll call this one OverrideSouthLOW. And under Properties, under Adjust based on existing forecast value, I'm going to add 60% to the final forecast, and click on my final forecast lock to slide that over to the right. Now I'll click OK.

And you'll see that the override is sitting there waiting for us to submit it. And when I right-click on this override and select Submit All, I get a message that there are conflicts detected. So there are some conflicts that it already determined.

If you like, you can let the system resolve the conflicts for us automatically. We could do that manually if we'd like, but I'm going to allow the system to do it. So click on the box for Resolve Automatically.

Now let me slide down, once again, to the Override cell-- January 2017. I'll right-click on that box for override, and click on Impact Analysis. So you'll see that the impact analysis for the 141 series for the OverrideSouthLOW shows that the final forecast is a compromise between the first final forecast override and the second one applied. So notice that this transaction is referred to as Group 3. So I'll just select Group 3 and go back to my Overrides, and now you can see the final effect of the forecast graphically.


Forecasting Using Model Studio in SAS® Viya®
Lesson 04, Section 1 Practice the Demo: Resolve Override Conflicts

TopicTitle

Override conflicts occur when two or more overrides create a forecasting outcome that is infeasible for one or more time intervals. Conflicts arise with locked forecast overrides. In this practice, you will add another override to forecasts for series in the South region for the first month of 2017 to illustrate a conflict.

Assume that you have information that LOW margin series in the South region are somehow exempt from the pending strike for the first month of 2017, and that these products are also going on promotion in this month. The net effect of these two phenomena is hypothesized to be an increase of 60%.

  1. Back on the Overrides tab, click Reset all, and then select the South customer region and the LOW margin category. The plot changes to show the 141 time series in this cross section of the data.

  2. Right-click the 01/01/2017 Override cell, and access the Override Calculator.

  3. Set the adjustment to the final forecast to +60%, and lock the aggregate final forecast for this subset of series. Click Filter and name the item OverrideSouthLOW.

  4. Click OK.

  5. Right-click any Override cell, and select Submit all.

  6. The two locked overrides submitted for JAN2017, one on the South cross section and one on the South and LOW margin cross section, have created an infeasible final forecast outcome. The two options for resolution are described below. If the Conflicts Detected box does not appear, go back and make sure that you locked both of the previous overrides.

  7. Select Resolve Automatically.

    Note: Selecting Resolve Manually takes you back to the Override Calculator to implement a conflict solution. Selecting Resolve Automatically calls an optimization algorithm to find a feasible solution for the conflict that is as close to the desired override restrictions as possible. (A toy illustration of such a compromise appears after these steps.)

  8. Right-click the 01/01/2017 Override cell, and select Impact Analysis.

    The impact analysis for Group 3 (the 141 LOW margin series in the South region for JAN2017) shows that the final forecast is a compromise between the Previous Final Forecast (first override) and the second override, applied above. The Delta shows the net effect of the two overrides.

  9. Select Filter3 (or whichever filter is associated with Group 3) to see the plot of the final forecasts for these 141 series.
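
As a toy illustration of the kind of compromise mentioned in the note under step 7, the sketch below treats the two conflicting locked targets for the overlapping JAN2017 cell as a least-squares problem. The numbers are hypothetical, and the optimization Model Studio actually runs may weight the overrides differently or honor additional constraints.

    # Illustrative only: a least-squares compromise between two conflicting
    # locked targets for the same aggregate forecast cell (JAN2017).
    base_forecast = 1000.0                # hypothetical aggregate for the overlapping series
    target_first = base_forecast * 0.80   # earlier -20% locked override -> 800.0
    target_second = base_forecast * 1.60  # later +60% locked override -> 1600.0

    # Minimizing (x - target_first)**2 + (x - target_second)**2 gives the midpoint.
    compromise = (target_first + target_second) / 2
    print(compromise)  # 1200.0, between the two requested targets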


Forecasting Using Model Studio in SAS® Viya®
Lesson 04, Section 1 Demo: Exporting Forecasts

Note: The text below is an exact transcription of what is spoken in the demo video. The related practice that follows this demo provides the steps for performing the demo tasks.

Following the override process, the final forecasts from the champion pipeline are ready to be disseminated. You can easily make forecasts available to others by exporting them. Starting from the Overrides tab in the LG Hierarchy Forecast project, click on More, which is what we refer to as the snowman, up in the upper right.

And there is only one option here. That's to Export All Data. Now, I need to select a path to send the data to. So I'm going to select the Public directory. And I'll keep the name LG Hierarchy Forecast_OUTFOR. And then, I click on the button for export.

Now, where did the data go? So let me return to the SAS Drive. And then, go down to Explore and Visualize Data. So this area provides access to SAS Visual Analytics. So what I'd like to do is select Data, because that is what I just saved. And let me click on the tab for Data Sources, under cas-shared-default. So I can find my file structure and go back down into my Public drive, my Public PATH.

And in my Public folder, I can see the data sets that I've been using so far, but I also see the LG HIERARCHY FORECAST_OUTFOR table. Notice that there's also another version of that, the sashdat form of the data. So I have two versions of the data. So I'm going to select the first one and then click OK.

You notice here the area at the right, the workspace, it says to drag data items or objects here. So let me find the prediction errors. And I might be interested in the prediction errors. So I'm going to drag my prediction errors out into my workspace. And you'll see the prediction errors are symmetrically distributed around the value 0.


Forecasting Using Model Studio in SAS® Viya®
Lesson 04, Section 1 Practice the Demo: Export Forecasts

TopicTitle

Note: This practice provides the steps required to perform the tasks presented in the preceding demo video.

Following the override process, the final forecasts from the champion pipeline are consistent with business knowledge and are ready to be disseminated. Making the forecasts available to other team members and project stakeholders is straightforward. This practice proceeds from the end of the previous one.

  1. From the Overrides tab in the LG hierarchy forecast project, click More (the "snowman") and select Export all data.

  2. Select the Public directory or another directory to which you have Write access. Keep the default name for the exported table, LG Hierarchy Forecast_OUTFOR. Click Export.

    Note: The Promote table option is selected. This means that the table is accessible by other team members and in other tools, such as SAS Visual Analytics.

  3. Navigate to Explore and Visualize Data.

  4. This functional area provides access to SAS Visual Analytics. Select Data. The exported data are loaded in memory and are available.

  5. Navigate to Data Sources and the public folder. Notice that there is also an alternative version of the table in SASHDAT format.

  6. Select LG HIERARCHY FORECAST_OUTFOR from the Available tab and click OK.

  7. Click and drag the Prediction Errors variable into the workspace. The default chart option for this variable displays a histogram of the forecast errors, which should be roughly centered at zero. (A sketch after this step shows one way to perform a similar check outside the interface.)
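
For a comparable check outside the interface, the minimal sketch below plots a histogram of prediction errors from an exported copy of the forecasts. The CSV file name and the Actual and Predict column names are assumptions; the table exported in this practice is a CAS table, and its column names may differ.

    # Illustrative only: histogram of prediction errors from an exported file.
    # The file name and the "Actual"/"Predict" column names are assumptions.
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("lg_hierarchy_forecast_outfor.csv")
    errors = (df["Actual"] - df["Predict"]).dropna()  # prediction errors
    errors.plot(kind="hist", bins=50)                  # roughly centered at 0 if unbiased
    plt.xlabel("Prediction error")
    plt.show()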