Machine Learning Using SAS® Viya®
Lesson 01, Section 1 Practice the Demo: Create a Project and Select Data
In this practice, you create a new project in Model Studio. This is the project that is used throughout the course. During project creation, you select the commsdata data set (which has already been loaded into memory in SAS Viya for Learners) and define a target variable.
Note: This is the task shown in the previous demonstration video. However, keep in mind that SAS Visual Data Mining and Machine Learning uses distributed processing, so the values in results will vary slightly across runs.
- Access SAS Viya for Learners from the block on the right side of the main course page.
- In the upper left corner of SAS Drive, click the Applications menu and select Build Models. Note: From SAS Drive, you can access SAS Viya products, such as Model Studio.
The Projects page in Model Studio appears. From the Projects page, you can view existing projects, create new projects, and access the Exchange. (The Exchange is a place where you can save your pipelines and find pipeline templates created by other users, as well as best-practice pipelines that are provided by SAS. You learn more about the Exchange later in this course.)
- Click New Project to open the New Project window.
- In the Name field, enter Demo as the name of the project. In the Type field, leave the default value, Data Mining and Machine Learning.
Note: Model Studio projects can be one of three types, depending on the SAS licensing for your site: Data Mining and Machine Learning projects, Forecasting projects, and Text Analytics projects.
- Under Data, click Browse.
The Choose Data window appears with the Available tab selected by default. The data that you need for your project, the commsdata table, has already been loaded into memory in SAS Viya for Learners and is listed on the Available tab.
Note: In SAS Viya, you can import a local file into memory (as shown in a later video in this lesson). However, this functionality is not available in SAS Viya for Learners.
- Select the commsdata table on the Available tab. If multiple versions of the table are listed, select the one with the most recent date. Note: The date is displayed under the table name.
- Click OK.
- In the New Project window, notice that the name of the selected data set now appears in the Data field.
- The Description field is optional. Leave it blank.
- To look at some of the advanced project settings, click Advanced.
The New Project Settings window appears. On the left, four groups of project settings are listed: Advisor Options (selected by default), Partition Data, Event-Based Sampling, and Node Configuration. You cannot see the Advisor Options for a given project after you create it (that is, after you save it), so let's look at those options now. You learn about the other advanced project settings later.
- On the right, view the following options in the Advisor Options group:
- Maximum class level specifies the threshold for rejecting categorical variables. If a categorical input has more levels than the specified maximum number, it is rejected.
- Interval cutoff determines whether a numeric input is designated as interval or nominal. If a numeric input has more levels than the interval cutoff value, it is declared interval. Otherwise, it is declared nominal.
- Maximum percent missing specifies the threshold for rejecting inputs with missing values. If an input has a higher percentage of missing values than the specified maximum percent, it is rejected. By default, this option is on. (That is, Apply the "maximum percent missing" limit is selected).
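For reference, you can compute the same statistics that these rules check (number of levels, percent missing) outside Model Studio with the CARDINALITY procedure in SAS Viya. The following is a minimal sketch, assuming a caslib named mycas; the output column names (_VARNAME_, _CARDINALITY_, _NMISS_, _NOBS_) are assumptions and might differ by release:

```sas
/* Summarize the number of levels and missing counts for every variable. */
proc cardinality data=mycas.commsdata outcard=mycas.card;
run;

/* Compare each variable's missingness against the advisor threshold. */
data card_check;
   set mycas.card;
   pct_missing = _nmiss_ / _nobs_;   /* assumed column names */
run;

proc print data=card_check;
   var _varname_ _cardinality_ pct_missing;
run;
```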
- Without changing the Advisor Options settings, click Cancel to return to the New Project window.
Note: After you save a project, the Advisor Log and other logs are available from the Settings menu. You see this menu in a later practice.
- Click Save to save the Demo project.
Note: After you create a new project, Model Studio opens the project. The Data tab is selected by default. The other three tabs are Pipelines, Pipeline Comparison, and Insights.
- At the top of the window, a warning message might appear, indicating that you must assign a target variable. (When a project is created, you must assign a target variable in order to run a pipeline.) In the variables table, do the following:
- Select the check box next to the variable name churn.
- In the right pane, click the Role menu and make sure that Target is selected. Note: After a target is defined, the warning message at the top of the page disappears.
- If the target is binary or nominal, you can also view or change the event of interest. In the right panel, click Specify the Target Event Level. In the Specify the Target Event Level window, click the Target event level menu. Notice that the menu provides the frequency count for each level. For the Demo project, the churn rate is about 12%. By default, Model Studio sorts the levels in ascending alphanumeric order and selects the last level as the event. For your target, the selected level is 1, so you don't need to change it.
- Click Cancel to return to the Demo project.
Note: You cannot modify the names or labels of your variables in Model Studio.
Machine Learning Using SAS® Viya®
Lesson 01, Section 2 Practice the Demo: Modify the Data Partition
In this practice, you change the metadata for multiple variables, modify the default data partition settings, run the partitioning, and then look at the partitioning log.
Note: This is the task shown in the previous demonstration video. However, keep in mind that SAS Visual Data Mining and Machine Learning uses distributed processing, so the values in results will vary slightly across runs.
- If you closed the Demo project, reopen it. Make sure that the Data tab is selected.
- Make sure that the check box for the churn variable is cleared.
Note: It is important to understand how to select and deselect variables on the Data tab. Otherwise, you might inadvertently reassign variable roles or change metadata. For details, see the variable selection tips before this practice.
- Reject 11 variables so that they will not be used in modeling, as follows:
- On the Data tab, select the check boxes for the following variables:
- city
- city_lat
- city_long
- data_usage_amt
- mou_onnet_6m_normal
- mou_roam_6m_normal
- region_lat
- region_long
- state_lat
- state_long
- tweedie_adjusted
- In the right pane, for Role, make sure that Rejected is selected.
Note: Variable metadata includes the role and measurement level of the variable. Common variable roles are Input, Target, Rejected, Text, and ID. Common variable measurement levels are Interval, Binary, Nominal, and Ordinal.
- Click Settings in the upper right corner of the window, and select Project settings from the menu.
Note: If you want to see or modify the partition settings before creating a project, you can do so in the user settings. There, the Partition tab enables you to specify the partitioning method as well as the associated percentages. Settings at this level are global and are applied to any new project that you create.
The Project Settings window appears with Partition Data selected on the left by default.
Note: You can edit the data partitioning settings only if no pipelines in the project have been run. After the first pipeline has been run, the partition tables are created for the project, and the partition settings cannot be changed. Remember that, as shown in the last demonstration, you can also access the Partition Data options while the project is being created, under the Advanced settings.
- Notice that the Create partition variable check box is selected, which indicates that partitioning is done by default. The default partitioning method is Stratify.
- By default, Model Studio does a 60-30-10 allocation to training, validation, and test. For the Demo project, make the following changes:
- Change the Training percentage to 70.
- Leave the Validation percentage set to 30.
- Change the Test percentage to 0. Note: You will not use a test data set for this project.
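For reference, the same stratified 70/30/0 split can be produced in code with the PARTITION procedure. This is a minimal sketch, assuming a caslib named mycas and an arbitrary seed; it is not the code that Model Studio generates:

```sas
/* Stratified partitioning: 70% training, 30% validation, no test set.
   The BY statement stratifies on the target so that each partition keeps
   roughly the same churn rate. The PARTIND option adds a partition
   indicator column (_PartInd_) instead of subsetting the data. */
proc partition data=mycas.commsdata partind samppct=70 samppct2=30 seed=802;
   by churn;
   output out=mycas.commsdata_part copyvars=(_ALL_);
run;
```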
- On the left, select Event-Based Sampling to look at those settings. By default, event-based sampling is turned off. (That is, the Enable event-based sampling check box is not selected.) When event-based sampling is turned on, the desired proportion of event and non-event cases can be set after the sampling is done. In this case, the default proportion for both events and non-events after sampling is 50% each. The sum of the proportions must be 100%.
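To make the arithmetic concrete (an illustration, not output from the software): the Demo data's churn rate is about 12%. With 50/50 proportions, event-based sampling would typically keep all of the roughly 0.12N event rows and sample an equal number of nonevent rows, so the sampled table would contain about 24% of the original rows, with events making up half of them.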
For the Demo project, keep the Event-Based Sampling options at their default settings. Note: After a pipeline has been run in the project, the Event-Based Sampling settings cannot be changed. Remember that, as shown in the last demonstration, you can also access the Event-Based Sampling options while the project is being created, under the Advanced settings.
- On the left, select Node Configuration. The Prepend Python configuration code setting is useful when you use the Open Source Code node with the language set to Python.
- To explore this setting, do the following:
- Select the Prepend Python configuration code check box. A code editor appears. In the editor, you could add Python code to prepend to code that you specified in the Open Source code node. You learn more about the Open Source Code node later in the course.
- Clear the Prepend Python configuration code check box because you will not use Python code at this point.
- On the left, select Rules to look at those settings. The Rules options can be used to change the selection statistic and partitioned data set that determine the champion model during model comparison. Statistics can be selected for class and interval targets.
For the Demo project, keep the Rules options at their default settings.
- Click Save to save the new partition settings and return to the Demo project page.
- Click the Pipelines tab. In the Demo project, there is currently a single pipeline named Pipeline 1.
On the Pipelines tab, you can create, modify, and run pipelines. Each pipeline has a unique name and an optional description. In the Demo project, Pipeline 1 currently contains only a Data node.
- To create the partition indicator, you can run the Data node. Right-click the Data node and select Run.
After the node runs, a green check mark in the node indicates that it ran without errors and the data have been partitioned.
Note: After you run the Data node, you cannot change the partitioning, event-based sampling, project metadata, project properties, or the target variable. However, you can change variable metadata with the Manage Variables node or through the Data tab.
- To look at the log file that was generated during partitioning, click Settings in the upper right corner, and select Project logs from the menu.
- From the Available Logs window, select Log for Project Partitioning, and then click Open. The log that was created during partitioning appears. You can scroll through the log if you want. Note: It is also possible to download a log file by clicking the Download log link at the bottom of the log.
- To return to the pipeline, close the Partition Log window, and then close the Available Logs window.
Machine Learning Using SAS® Viya®
Lesson 01, Section 2 Practice the Demo: Build a Pipeline from a Basic Template
In this practice, you create a pipeline from a basic template in the Demo project. You use this pipeline to do imputation and build a baseline regression model that you compare with machine learning models in a later demonstration. You run the pipeline and look at the results.
Note: This is the task shown in the previous demonstration video. However, keep in mind that SAS Visual Data Mining and Machine Learning uses distributed processing, so the values in results will vary slightly across runs.
- Make sure that the Demo project is open and the Pipelines tab is selected. Note: Remember that Pipeline 1, which has a single Data node, was created automatically with the project. You'll reserve this pipeline for exploring the data, which you do in a later demonstration.
- Click the plus sign next to the Pipeline 1 tab.
The New Pipeline window appears.
- Under Select a pipeline template, click Browse to access the Browse Templates window. This window displays a list of pre-built pipeline templates, which are available for both class (categorical) and interval targets. These templates are available at basic, intermediate, and advanced levels. The Browse Templates window also displays any pipeline templates that users have created and saved to the Exchange.
- Select Basic template for class target, and click OK.
- In the New Pipeline window, in the Name field, enter Starter Template as the pipeline name.
Note: Specifying a pipeline description is optional.
- Notice the Automatically generate a pipeline option. This option is an alternative to using one of the pre-populated pipeline templates. When you select the Automatically generate a pipeline option, Model Studio uses automated machine learning to dynamically build a pipeline that is based on your data. This option is disabled if the target variable has not been set or if the project data advisor has not finished running. You do not use this option in this course.
- Click OK.
A Starter Template pipeline tab appears on the Pipelines tab for the Demo project. The basic template for class target is a simple linear flow that includes the following nodes: the Data node, one node for data preparation (Imputation), one model node (Logistic Regression), and the Model Comparison node. Even when a pipeline has only one model, a Model Comparison node is included by default.
- To run the entire pipeline, click Run pipeline in the upper right corner of the canvas. After the pipeline runs, green check marks in the nodes indicate that the pipeline has run successfully.
Note: While the pipeline is running, notice that the Run Pipeline button changes to Stop Pipeline. To interrupt a running pipeline, you can click this button.
- Right-click the Logistic Regression node and select Results. The Results window appears and contains two tabs: Node and Assessment. The Node tab, which is selected by default, displays the results from the Logistic Regression node.
Note: Alternatively, you can open the node results by clicking More (the three vertical dots) on the right side of the node and selecting Results.
- Explore the results. A subset of the items on this tab is listed below:
- t-Values by Parameter plot
- Parameter Estimates table
- Selection Summary table
- Output
- Click the Assessment tab to see the assessment results from the Logistic Regression node. Explore the results. A subset of the items on this tab is listed below:
- Lift Reports plots
- ROC Reports plots
- Fit Statistics table
- To close the Results window and return to the pipeline, click Close in the upper right corner.
- To open the results of the Model Comparison node, right-click the Model Comparison node and select Results.
At the top of the results window is the Model Comparison table. This pipeline contains only one model, so the Model Comparison table currently displays information about only that one model.
- In the upper right corner of the Model Comparison table, click Maximize View to maximize the table. The fit statistic that is used to select a champion model is displayed first. The default fit statistic for selecting a champion model with a class target is KS (Kolmogorov-Smirnov).
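For reference, the KS statistic is the maximum vertical separation between the cumulative distributions of the predicted event probabilities for events and nonevents. For a probability cutoff $c$:

$$\mathrm{KS} = \max_{c}\,\bigl|\,F_{\text{event}}(c) - F_{\text{nonevent}}(c)\,\bigr|$$

Larger values indicate better separation between the two classes.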
- Close the Model Comparison table.
- Close the Model Comparison Results window and return to the pipeline.
Note: For future reference, after you return to the pipeline, you can change the selection statistic at the pipeline level or at the project level. To change the selection statistic for all pipelines within a project, change the class selection statistic on the project's Settings menu (shown in an earlier demonstration). However, for the Demo project, continue to use the default selection statistic, KS.
Machine Learning Using SAS® Viya®
Lesson 02, Section 1 Practice the Demo: Explore the Data
In this practice, you explore the source data (commsdata) using the Data Exploration node in Model Studio. Here you select a subset of variables to provide a representative snapshot of the data. Variables can be selected to show the most important inputs or to indicate suspicious variables (that is, variables with anomalous statistics).
Note: This is the task shown in the previous demonstration video. However, keep in mind that SAS Visual Data Mining and Machine Learning uses distributed processing, so the values in results will vary slightly across runs.
- In the Demo project, click the Pipelines tab. Make sure that Pipeline 1 is selected.
- Right-click the Data node and select Add child node > Miscellaneous > Data Exploration from the pop-up menu. Model Studio automatically adds a Data Exploration node to the pipeline and connects it to the Data node.
Note: Alternatively, you can select a node from one of the sections in the Nodes pane on the left and drag it onto an existing node in the pipeline. The new node is added to the canvas below the existing node and automatically connected to that node.
- Keep the default settings for the Data Exploration node. Notice that Variable selection criterion is set to Importance. In this demo, you want to see the most important inputs, so you keep this setting.
Note: The variable selection criterion specifies whether to display the most important inputs or suspicious variables. By default, a maximum of 50 of the most important variables are selected. To see the most suspicious variables, you would change the setting to Screening. Then you can control the selection of suspicious variables by specifying screening criteria, such as cutoff for flagging variables with a high percentage of missing values, high-cardinality class variables, class variables with dominant levels, class variables with rare modes, skewed interval variables, peaky interval variables, and interval variables with thick tails.
- Right-click the Data Exploration node and select Run from the pop-up menu.
- When the pipeline finishes running, right-click the Data Exploration node and select Results from the pop-up menu.
- Maximize the Important Inputs bar chart and examine the relative importance of the ranked variables. Note: This bar chart is available only if Variable selection criterion is set to Importance.
Note: The relative variable importance metric is based on a decision tree and is a number between 0 and 1. (You learn more about decision trees later in this course.)
- Minimize the Important Inputs bar chart.
- Maximize the Interval Variable Moments table.
This table displays the interval variables with their associated statistics, which include Minimum, Maximum, Mean, Standard Deviation, Skewness, Kurtosis, Relative Variability, and the Mean plus or minus 2 Standard Deviations. Note that some of the input variables have negative minimum values. You handle these negative values in an upcoming practice.
- Close the Interval Variable Moments table.
- Maximize the Interval Variable Summaries scatter plot. This is a scatter plot of skewness against kurtosis for all the interval input variables. Notice that a few input variables in the upper right corner are suspect based on high kurtosis and high skewness values. You can place your cursor on these dots to see the associated variable names.
- Click the View chart menu in the upper left corner of the window and select Relative Variability. Examine the bar chart of the relative variability for each interval variable.
Note: Relative variability is useful for comparing variables with similar scales, such as several income variables. Relative variability is the coefficient of variation, which is the standard deviation divided by the mean (CV = σ / μ), a measure of dispersion relative to the mean.
- Close the Interval Variable Summaries scatter plot.
- Scroll down in the Data Exploration Results window and maximize the Missing Values bar chart, which shows the variables that have missing values. Notice that some of the variables have a higher percentage of missingness than others.
- Close the Missing Values bar chart.
- Click Close to close the results.
- Double-click the Pipeline 1 tab and change its name by entering Data Exploration.
Note: Another way to rename a pipeline is to click the options menu for the tab (the three dots) and select Rename.
Machine Learning Using SAS® Viya®
Lesson 02, Section 1 Practice the Demo: Replace Incorrect Values Starting on the Data Tab
In this practice, you replace incorrect values starting on the Data tab. This method replaces values in all pipelines in the project. Note: Later, you learn about using the Manage Variables node with the Replacement node to replace values in a single pipeline.
In an earlier practice, you explored some of the interval input variables and saw that some have negative minimum values. Based on business knowledge, you will replace these negative values with zeros for a subset of the interval input variables. To start, you sort the variables to find the subset that you want to work with.
Note: This is the task shown in the previous demonstration video. However, keep in mind that SAS Visual Data Mining and Machine Learning uses distributed processing, so the values in results will vary slightly across runs.
- In the Demo project, click the Data tab.
- Right-click the Role column and select Sort > Sort (ascending). All the input variables are now grouped together after the ID variable and before the Rejected variables.
- To group the input variables that have negative values, you will add a second sort to the current sort on Role. Scroll to the right, right-click the Minimum column, and select Sort > Add to sort (ascending). Variables with negative minimum values are now grouped together.
Note: Add to sort means that the initial sorting done on Role still holds. So the sort on minimum values takes place within each sorted Role group.
- Rearrange columns so that the Minimum column is next to the Variable Name column, as follows:
- Click the Options icon in the upper right corner of the data table, and then select Manage columns. The Manage Columns window appears.
- In the Displayed Columns list, select Minimum. By clicking the up arrow multiple times, move the Minimum column immediately below the Variable Name column.
- Click OK. The Manage Columns window closes.
- On the Data tab, scroll all the way to the left so that you can see the Variable Name column and the Minimum column.
- Select the following 22 interval input variables:
Note: In the practice environment, the variables might be listed in a different order than shown here. To make sure that any previously selected variables are no longer selected, select the first variable's name rather than its check box.
- tot_mb_data_roam_curr
- seconds_of_data_norm
- lifetime_value
- bill_data_usg_m03
- bill_data_usg_m06
- voice_tot_bill_mou_curr
- tot_mb_data_curr
- mb_data_usg_roamm01 through mb_data_usg_roamm03
- mb_data_usg_m01 through mb_data_usg_m03
- calls_total
- call_in_pk
- calls_out_pk
- call_in_offpk
- calls_out_offpk
- mb_data_ndist_mo6m
- data_device_age
- mou_onnet_pct_MOM
- mou_total_pct_MOM
- In the right pane, enter 0.0 in the Lower Limit field. This specifies the lower limit to be used in the Filtering and Replacement nodes with the Metadata limits method. The Filtering and Replacement nodes use this lower limit to filter out or replace negative values of the selected variables, respectively.
Note: This is customer billing data, and negative values often imply that a credit was applied to the customer's account. So it is realistic for these columns to contain negative numbers. However, in telecom data, it is general practice to convert negative values to zeros. Note that you did not edit any variable values. Instead, you only set a metadata property that can be invoked using the Replacement node.
- Click the Pipelines tab.
- Select the Starter Template pipeline. Notice that, because of the change in metadata, the green check marks in the nodes in the pipeline have changed to gray circles. This indicates that the nodes need to be rerun to reflect the change.
- Add a Replacement node to the pipeline.
Note: The Replacement node can be used to replace outliers and unknown class levels with specified values. It is in this node that you invoke the metadata property of the lower limit that you set earlier.
Note: The following steps show the drag-and-drop method of adding the node. If you prefer, you can use the alternate method of adding a node that was shown in earlier practices.
- Expand the Nodes pane on the left side of the canvas.
- Expand Data Mining Preprocessing.
- Click the Replacement node and drag it between the Data node and the Imputation node.
- Hide the Nodes pane.
- In the properties panel for the Replacement node, specify the following settings in the Interval Variables section:
- Set Default limits method to Metadata limits.
- Change Alternate limits method to (none).
- Leave Replacement value at the default value, Computed limits.
- Run the Replacement node and view the results.
- In the results of the Replacement node, maximize the Interval Variables table. This table shows which variables now have a lower limit of 0.
The original variables will now be rejected. The new versions of the variables, which have REP_ prepended to the name, are now the valid input variables.
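For reference, the effect is equivalent to clamping each selected variable at its metadata lower limit and writing the result to a new REP_ variable. Here is a minimal sketch of the idea in a DATA step, using one of the selected variables as an illustration (the node's generated score code is more elaborate):

```sas
data comms_replaced;
   set commsdata;
   /* Values below the metadata lower limit (0) are replaced with the limit.
      The original variable is rejected; the REP_ version is used downstream. */
   REP_lifetime_value = max(lifetime_value, 0);
run;
```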
- Close the Interval Variables table.
- Close the results window of the Replacement node.
- To update the remainder of the pipeline, click the Run Pipeline button.
- When the run is complete, right-click the Model Comparison node and select Results.
- Maximize the Model Comparison table and view the performance results for the Logistic Regression model.
- Exit the maximized view of the Model Comparison table.
- Select Close to return to the pipeline.
Note: There is one more variable with a negative minimum value. Leave this variable unselected for the Demo project.
Note: Alternatively, you can assign metadata properties by using the Manage Variables node. You can use the Manage Variables node with the Replacement node to replace values in a single pipeline. In the Nodes pane, the Manage Variables node is in the Data Mining Preprocessing section. However, you do not use the Manage Variables node in the Demo project.
Machine Learning Using SAS® Viya®
Lesson 02, Section 2 Practice the Demo: Add Text Mining Features
In this practice, you create new features using the Text Mining node. You use the text variable verbatims, which is one of five text variables in the commsdata data source. Rejecting the other four text variables (Call_center, issue_level1, issue_level2, and resolution) requires a metadata change on the Data tab. You must make sure their role is set to Rejected.
Note: This is the task shown in the previous demonstration video. However, keep in mind that SAS Visual Data Mining and Machine Learning uses distributed processing, so the values in results will vary slightly across runs.
- In the Demo project, click the Data tab.
- Make sure that any previously selected variables are deselected.
- To sort by variable name, right-click the Variable Name column and select Sort > Sort (ascending).
- Select the variables Call_center, issue_level1, issue_level2, and resolution.
- In the right pane, make sure that the role is set to Rejected. Rejecting these other text variables ensures that only the verbatims variable is used as an input for the Text Mining node.
- To return to the Starter Template pipeline, click the Pipelines tab and select Starter Template.
- Add a Text Mining node (which is in the Data Mining Preprocessing group) between the Imputation node and the Logistic Regression node. Note: Keep the default settings of the Text Mining node.
- Run the Text Mining node.
- When the run is finished, open the results of the Text Mining node. Many windows are available, including the Kept Terms table (which shows the terms used in the text analysis) and the Dropped Terms table (which shows the terms ignored in the text analysis).
Note: In the tables, the plus sign next to a word indicates stemming. For example, +service represents service, services, serviced, and so on.
- Maximize the Topics table. This table shows topics that the Text Mining node created based on groups of terms that occur together in several documents. Each term-document pair is assigned a score for every topic. Thresholds are then used to determine whether the association is strong enough for that document or term to belong to the topic. Terms and documents can belong to multiple topics. Fifteen topics were discovered, so fifteen new columns of inputs are created. The output columns contain SVD (singular value decomposition) scores that can be used as inputs for the downstream nodes.
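Loosely, these scores come from a truncated singular value decomposition of the weighted term-by-document matrix $A$:

$$A \approx U_k \Sigma_k V_k^{\mathsf{T}}, \qquad k = 15$$

where each document's row of $V_k \Sigma_k$ supplies its 15 topic (SVD) scores, which become the new interval inputs.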
- Close the Topics table.
- Click the Output Data tab, and then click View Output Data.
- In the Sample Data window, click View Output Data. Note: In the Sample Data window, you can choose to create a sample of the data to view. However, you do not do this for the Demo project.
- Scroll to the right to see the column headings that begin with Score for. These columns are for new variables based on the topics created by the Text Mining node. For each topic, the SVD coefficients (or scores) are shown for each observation in the data set. Notice that the coefficients have an interval measurement level. The Text Mining node converts textual data into numeric variables, specifically interval variables. These columns will be passed along to subsequent nodes.
Note: If you want to rearrange or hide columns, you can use the Manage Columns button.
- Close the Results window.
- Another way to see the 15 new interval input columns that were added to the data is to use the Manage Variables node. To add a Manage Variables node to the pipeline after the Text Mining node, right-click the Text Mining node and select Add child node > Data Mining Preprocessing > Manage Variables.
- When the Run Node message window appears, click Close. Notice that Model Studio splits the pipeline path after the Text Mining node.
- Run the Manage Variables node. When the Run Node message window appears again, click Close.
- When the node finishes running, open the results.
- Maximize the Output window to see the new columns (COL1 through COL15), which represent the dimensions of the SVD calculations based on the 15 topics discovered by the Text Mining node. These new columns serve as new interval inputs for subsequent models.
- Close the Output window and close the results.
- To run the entire pipeline, click the Run pipeline button.
- To assess the performance of the model, open the results of the Model Comparison node. Expand the Model Comparison table. Note: Adding text features does not necessarily improve the model.
- Close the Model Comparison table. Close the results of the Model Comparison node.
- To see whether any of the new variables entered the final model, open the results of the Logistic Regression node.
- Maximize the Output window.
- Scroll down to the Selection Summary table. Notice that one of the columns created by the Text Mining node entered the model during the stepwise selection process.
- Close the Output window and the results.
Machine Learning Using SAS® Viya®
Lesson 02, Section 2 Activity: Check Your Variable Roles
In this activity, you follow best practices to make sure that the roles are set correctly for all variables in the commsdata table. Note: If your variables are not set to the specified roles, you might get unexpected results in later practices.
- In the Demo project, make sure the Data tab is selected.
- To sort the listed variables by role in descending order, click the Role column.
- Make sure the variable roles are set as specified below. Note: As in previous practices, you can select and set the role for multiple variables at once.
Role | Variable Name | Label
---|---|---
ID | Customer_ID | Primary Key
Target | churn | Churn
Text | verbatims | Survey Verbatim
Rejected (categorical input) | call_category_2 | Call Center Category 2
Rejected (categorical input) | call_center | Last Call Center Used
Rejected (categorical input) | city | Account City
Rejected (categorical input) | issue_level1 | Call Center Issue Level 1
Rejected (categorical input) | issue_level2 | Call Center Issue Level 2
Rejected (categorical input) | resolution | Final Resolution
Rejected (categorical input) | state | Account State
Rejected (categorical input) | zipcode_primary | Account Code
Rejected (interval input) | city_lat | Account City Latitude
Rejected (interval input) | city_long | Account City Longitude
Rejected (interval input) | data_usage_amt | Data Usage Amount
Rejected (interval input) | mou_onnet_6m_normal | 6M Avg Minutes on Network Normally Distributed
Rejected (interval input) | mou_roam_6m_normal | 6M Avg Minutes Roaming Normally Distributed
Rejected (interval input) | mou_roam_pct_MOM | Minutes Roaming Pct Change Month over Month
Rejected (interval input) | region_lat | Account Region Latitude
Rejected (interval input) | region_long | Account Region Longitude
Rejected (interval input) | state_lat | Account State Latitude
Rejected (interval input) | state_long | Account State Longitude
Rejected (interval input) | tweedie_adjusted | Data Usage Amt Tweedie Distributed
Rejected (interval input) | zip_lat | Account ZIP Code Latitude
Rejected (interval input) | zip_long | Account ZIP Code Longitude
Input | all remaining variables |
- Return to the Pipelines tab.
Machine Learning Using SAS® Viya®
Lesson 02, Section 3 Practice the Demo: Transform Inputs
In this practice, you use the Transformations node to apply a numerical transformation to input variables. In an earlier practice, you explored interval inputs and saw that a few had a high measure of skewness. Here, you revisit the results of that data exploration.
Note: This is the task shown in the previous demonstration video. However, keep in mind that SAS Visual Data Mining and Machine Learning uses distributed processing, so the values in results will vary slightly across runs.
- In the Demo project, click the Data Exploration pipeline tab.
The pipeline requires a rerun because metadata properties have been defined.
- Right-click the Data Exploration node and select Run to run the node.
- Right-click the Data Exploration node and select Results.
- Expand the Interval Variable Moments table. Notice that five variables have a high degree of skewness and their names begin with MB_Data_Usg.
- Close the Interval Variable Moments table.
- Expand the Important Inputs plot. Notice that the same MB_Data_Usg variables have also been selected as important variables. Behind the scenes, Importance is defined by a decision tree using the TREESPLIT procedure.
- Close the Important Inputs plot.
- Close the Results window.
Now you are ready to define transformation rules in the metadata and apply the changes to the data. First, you change metadata on the Data tab to specify what you want to do with the variables. Note: The Manage Variables node is an alternative means of defining metadata transformations, but it is not used in this practice.
- Click the Data tab.
- It might be helpful to sort by variable name. Make sure that all variables are deselected. Then right-click the Variable Name column and select Sort > Sort (ascending).
- Scroll down until you see six variables whose names begin with (uppercase) MB_Data_Usg. Although only five of these were identified as important in the results that you just saw, there's a good chance that the other one is also skewed. It's a good idea to transform all six of them.
- To make sure that no other variables are selected, click the name of the first of the six MB_Data_Usg variables. Then select the check boxes for the other five variables. Note: Select only those variables whose names begin with uppercase MB.
- In the Multiple Variables window on the right, under Transform, select Log.
- To verify that the transformation rule has been applied to these variables, scroll right to display the Transform column. Notice that Log is displayed for each of the selected variables.
Note: Remember that setting transformation rules doesn't perform the transformation. It only defines the metadata property. You must use the Transformations node to apply the transformation.
- To return to the Starter Template pipeline, click Pipelines, and then click the Starter Template tab.
- Add a Transformations node between the Replacement node and the Imputation node. Leave the Transformations node options at their default settings.
Note: Although the Default interval inputs method property indicates (none), the metadata rules that you assigned to the variables on the Data tab override this default setting. - Right-click the Transformations node and select Run.
- When the run is finished, right-click the node and select Results.
- Expand the Transformed Variables Summary table. This table displays information about the transformed variables, including how they were transformed, the corresponding input variable, the formula applied, the variable level, the type, and the variable label.
Notice that new variables have been created with the prefix LOG_ at the beginning of the original variable names. The original versions of these variables are now rejected.
Note: In the Formula column, notice that the formula for the Log transformations includes an offset of 1 to avoid the case of Log(0); that is, each new variable is computed as LOG_x = log(x + 1).
- Close the Transformed Variables Summary window.
- Close the results.
- Run the entire pipeline to assess the performance of the logistic regression model.
- Open the results of the Model Comparison node and maximize the Model Comparison table. Here, you can assess the performance of the logistic regression model.
- Close the Model Comparison table and close the results.
Machine Learning Using SAS® Viya®
Lesson 02, Section 4 Practice the Demo: Select Features
In this practice, you use the Variable Selection node to reduce the number of inputs for modeling.
Note: This is the task shown in the previous demonstration. However, keep in mind that SAS Visual Data Mining and Machine Learning uses distributed processing, so the values in results will vary slightly across runs.
- In the Demo project, click the Starter Template pipeline.
- Add a Variable Selection node (in the Data Mining Preprocessing node group) between the Text Mining node and the Logistic Regression node.
- With the Variable Selection node selected, review the settings in the node properties panel on the right. In the properties, varying combinations of criteria can be used to select inputs. Notice the following default settings, which you will use:
- Combination Criterion is set to Selected by at least 1. This means that any input selected by at least one of the selection criteria chosen is passed on to subsequent nodes as an input.
- The Fast Supervised Selection method is selected by default.
- The Create Validation from Training property is also selected by default, but its button is initially disabled.
- In the properties panel, turn on the Unsupervised Selection and Linear Regression Selection methods by clicking the button slider next to each property name. When a property is turned on, additional options appear. You can hide the new options by selecting the down arrow next to the property name.
Keep the default settings for all the new options that appear for the Unsupervised Selection and Linear Regression Selection methods.
- Notice that the Create Validation from Training property was initially selected by default, but the slider button did not become active until you selected a supervised method above. This property specifies whether a validation sample should be created from the incoming training data. It is recommended to create this validation set even if the data have already been partitioned so that only the training partition is used for variable selection and the original validation partition can be used for modeling.
- Run the Variable Selection node.
- Right-click the Variable Selection node and select Results.
- Expand the Variable Selection table. This table contains the output role for the input variables after they have gone through the node. These variables have a blank cell in the Reason column, indicating that they have been selected and are passed on from the node.
- Scroll down in the Variable Selection table. For the variables that have been rejected by the node, the Reason column displays the reason for rejection.
Remember that sequential selection (the default) is performed, and any variable rejected by this unsupervised method is not used by the subsequent supervised methods. Variables rejected by the supervised methods show the combination criterion (at least one, in this case) in the Reason column. If you want to see whether they were selected or rejected by each method, look at the Variable Selection Combination Summary.
- Close the Variable Selection table.
- Expand the Variable Selection Combination Summary table.
For each variable, this table includes the result (Input or Rejected) for each method that was used, the total count of each result, and the final output role (Input or Rejected). For example, for the variable AVG_DAYS_SUSP, the Input column has a count of 2, and the Rejected column has a count of 0. This means that this variable was selected by two of the input criteria: Fast Selection and Linear Regression. The variable BILLING_CYCLE has 0 in the Input column, and 2 in the Rejected column. It was rejected by two criteria: Fast and Linear Regression. The variable with the label Days of Open Work Orders has a count of 1 in the Input column, and 1 in the Rejected column. This means that this input was rejected by the Fast criterion, but it was selected by the Linear Regression criterion. The property Combination criterion is set to Selected by at least 1, so this variable is selected as an input because it was selected by at least one of the properties.
- Close the Variable Selection Combination Summary table.
- Close the results.
- Click the Run Pipeline button to rerun the pipeline.
- Right-click the Model Comparison node and select Results.
Note: As an alternative, you could view the results for the Logistic Regression model through the Logistic Regression node.
- Expand the Model Comparison table and view the statistics for the performance of the Logistic Regression model.
- Close the Model Comparison table and close the results.
Machine Learning Using SAS® Viya®
Lesson 02, Section 4 Practice the Demo: Save a Pipeline to the Exchange
In this practice, you save the Starter Template pipeline to the Exchange. You use this pipeline later in the course. (Remember that the Exchange is a place where users can save pipelines and find pipeline templates created by other users, as well as best-practice pipelines that are provided by SAS. However, in this course, you do not use pipelines created by other users.)
Note: This is the task shown in the previous demonstration video. However, keep in mind that SAS Visual Data Mining and Machine Learning uses distributed processing, so the values in results will vary slightly across runs.
- In the Demo project, next to the Starter Template tab, click the Options menu and select Save to The Exchange.
- Change the name of the pipeline to CPML demo pipeline. For the description, enter Logistic regression pipeline. Click Save.
- To go to the Exchange, click its button in the left panel.
- In the left pane of the Exchange, expand Pipelines and select Data Mining and Machine Learning. The CPML demo pipeline that you just saved appears in the list of pipeline templates.
Note: In SAS Viya for Learners, in the Exchange, you will likely see other users' pipelines. If there are multiple CPML demo pipelines, make sure you select the one that you created. Check the Owner column for your email address and the Last Modified column for the date and time that you created your pipeline.
- To exit the Exchange and return to the Demo project in Model Studio, click the Projects button in the upper left corner.
Machine Learning Using SAS® Viya®
Lesson 02, Section 5 Best Practices for Common Data Preparation Challenges
Data preprocessing (or data preparation) covers a range of processes that are different for raw, structured, and unstructured data (from one or multiple sources). Data preprocessing focuses on improving the quality and completeness of the data, standardizing how it is defined and structured, collecting and consolidating it, and transforming the data to make it useful, particularly for machine learning analysis. The selection and type of preparation processes, as well as the order in which you perform these processes, can differ depending on your purpose, your data expertise, how you plan to interact with the data, and what type of questions you want to answer.

The table below summarizes some challenges that you might encounter in preparing your data. It also includes suggestions for how to handle each challenge by using the Data Mining Preprocessing pipeline nodes in Model Studio.
Data Problem | Common Challenges | Suggested Best Practice
---|---|---
Data collection | |
"Untidy" data | |
Outliers | |
Sparse target variables | |
Variables of disparate magnitudes | |
High-cardinality variables | |
Missing data | |
Strong multicollinearity | |
Note: Some of these challenges can also be handled later, in the modeling stage, such as using tree-based methods for handling missing data automatically.
Machine Learning Using SAS® Viya®
Lesson 03, Section 1 Practice the Demo: Build a Decision Tree Model Using the Default Settings
In this practice, you build a decision tree model, using the default settings, in the Demo project. You build the model in a new pipeline based on a template from the Exchange.
Note: This is the task shown in the previous demonstration video. However, keep in mind that SAS Visual Data Mining and Machine Learning uses distributed processing, so the values in results will vary slightly across runs.
- In the Demo project, click the Pipelines tab.
- To add a new pipeline, click the plus sign (+) next to the Starter Template tab.
- In the New Pipeline window, enter Tree Based in the Name field.
Note: Entering a description for the pipeline in the New Pipeline window is optional.
- Under Select a pipeline template, click the down arrow to browse templates.
- In the Browse Templates window, select CPML demo pipeline.
Click OK.
- In the New Pipeline window, click OK.
- Notice that the new Tree Based pipeline is a copy of the Starter Template pipeline, but no nodes have been run.
- Add a Decision Tree node (from the Supervised Learning group) after the Variable Selection node.
Keep all properties for the Decision Tree node at their default settings.
- Right-click the Decision Tree node and select Run.
- Right-click the Decision Tree node and select Results.
In the results of the Decision Tree node (the Node tab) are several charts and plots to help you evaluate the model's performance.
Explore the windows and plots that are described below:
Note: Remember that your results might vary from the results in the demonstration video, which are described below.
The first plot is the Tree Diagram, which presents the final tree structure for this particular model, such as the depth of the tree and all end leaves. If you place your cursor on a leaf, a tooltip appears, giving you information about that particular leaf, such as the number of observations, the percentage of these that are event cases, and the percentage of nonevent cases. To see a splitting rule, you can place your cursor on a branch. This information is helpful in interpreting the tree.
The Pruning Error plot is based on the misclassification rate because the target is binary. The plot shows the change in misclassification rate on the training and validation data as the tree grows (that is, as more leaves are added to the tree). The blue line represents the training data, and the orange line represents the validation data. In this plot, for the training data, does the misclassification rate consistently decrease? If so, it improves as the size of the tree grows. For the validation data, you probably see that the misclassification rate mostly decreases as the size of the tree grows, but there are a few places where it increases. The selected subtree contains 51 leaves after complexity is optimized. Starting at this tree and for the next few trees, notice that the misclassification rate actually increases, which means that it is getting worse.
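For reference, the misclassification rate plotted here is

$$\text{misclassification rate} = \frac{\text{number of incorrectly classified cases}}{\text{total number of cases}}$$

computed separately on each partition.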
The Variable Importance table shows the final variables selected by the decision tree and their relative importance. The most important input variable has a relative importance of 1, and all others are measured relative to it. In this case, notice that the decision tree selected ever_days_over_plan as the most important variable. The Importance Standard Deviation column shows the dispersion of the importance taken over several partially independent trees. So, for a single tree, this column has all zero values. For forest and gradient boosting models, the numbers would be nonzero.
Farther down in the results are several code windows, one for each type of code that Model Studio generates. Supervised Learning nodes can generate as many as three types of score code (node score code, path EP score code, and DS2 package score code) as well as training code. You learn more about score code later in the course.
The Output window shows that the TREESPLIT procedure is the underlying procedure for the Decision Tree node. It also shows the final decision tree model parameters, the Variable Importance table, and the pruning iterations.
- Click the Assessment tab. Explore the windows and plots that are described below:
In the Lift Reports window, the Cumulative Lift plot is shown by default. You can interpret the plot as a comparison between the performance of the model at certain depths of the data (ranked by the posterior probability of the event) and a random model. Ideally, you want to see a lift greater than 1, which means that your model is outperforming a random model. Lift and cumulative lift are discussed in more detail later. Notice the information on the right that helps you interpret the plot.
Because the data set has a binary target, the ROC Reports plot is also available. A ROC chart appears by default. The ROC chart plots sensitivity against 1 minus specificity for varying cutoff values. Sensitivity is defined as the true positive rate, and 1 minus specificity is defined as the false positive rate. Again, the information on the right helps you interpret the plot. You learn more about the ROC chart, along with sensitivity and specificity, later in the course.
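For reference, with $TP$, $FP$, $TN$, and $FN$ denoting the counts of true positives, false positives, true negatives, and false negatives at a given cutoff:

$$\text{cumulative lift at depth } d = \frac{\text{event rate in the top } d\% \text{ of ranked cases}}{\text{overall event rate}}, \qquad \text{sensitivity} = \frac{TP}{TP+FN}, \qquad 1-\text{specificity} = \frac{FP}{FP+TN}$$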
- The Fit Statistics table shows the performance of the final model on the training and validation data sets. A useful fit statistic to consider is average squared error. Take note of the average squared error on validation data.
Note: As you move forward with modifying your models in this course, you might want to write down the values of the fit statistics that you use to assess performance. This enables you to see whether your model is improving.
- Notice that the fourth window on the Assessment tab shows the Event Classification chart.
- Close the results.
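For reference, the Decision Tree node's default settings correspond loosely to a TREESPLIT procedure call like the one below. This is a sketch under assumptions (the caslib name, partitioned table name, and input variables are illustrative), not the code that Model Studio generates:

```sas
proc treesplit data=mycas.commsdata_part
               maxbranch=2 maxdepth=10 minleafsize=5 numbin=50;
   class churn;                                      /* class target (class inputs would be listed here too) */
   model churn = avg_days_susp rep_lifetime_value;   /* illustrative inputs */
   grow igr;                                         /* information gain ratio, the node default */
   prune costcomplexity;                             /* the node's default subtree method */
run;
```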
Machine Learning Using SAS® Viya®
Lesson 03, Section 2 Practice the Demo: Modify the Structure Parameters
In this practice, you modify the tree structure parameters of the Decision Tree node that you added earlier in the Tree Based pipeline.
Note: This is the task shown in the previous demonstration video. However, keep in mind that SAS Visual Data Mining and Machine Learning uses distributed processing, so the values in results will vary slightly across runs.
- In the Demo project, in the Tree Based pipeline, select the Decision Tree node.
- In the properties panel for the Decision Tree node, expand Splitting Options. Make the following changes:
Note: The property Maximum number of branches specifies the maximum number of branches that a splitting rule produces. Use the default number of splits, which is 2.
- Increase Maximum depth from 10 to 14. This allows a larger tree to be grown, which could lead to overfitting.
- Increase Minimum leaf size from 5 to 15. This change could help prevent overfitting.
- Increase Number of interval bins to 100.
- Right-click the Decision Tree node and select Run.
- Right-click the Decision Tree node and select Results.
- To look at performance of the decision tree, click the Assessment tab.
- In the Fit Statistics table, note the average squared error for the decision tree model on the VALIDATE partition. Is this fit statistic value slightly smaller than for the previous model? If so, this indicates that this model is performing better than the first model using the default settings. Keep in mind that modifying a model does not always result in better performance.
Note: To assess performance, you could also look at the Lift chart or the ROC chart.
- Close the results.
Machine Learning Using SAS® Viya®
Lesson 03, Section 3 Practice the Demo: Modify the Recursive Partitioning Parameters
In this practice, you change more settings of the Decision Tree node in the Tree Based pipeline. You modify the recursive partitioning parameters and compare this model performance to the models built earlier.
Note: This is the task shown in the previous demonstration video. However, keep in mind that SAS Visual Data Mining and Machine Learning uses distributed processing, so the values in results will vary slightly across runs.
- In the Demo project, in the Tree Based pipeline, make sure that the Decision Tree node is selected.
- In the properties panel for the Decision Tree node, under Grow Criterion, change Class target criterion from Information gain ratio to Gini.
- Right-click the Decision Tree node and select Run.
- Right-click the Decision Tree node and select Results.
- Click the Assessment tab.
In the Fit Statistics table, take note of the average squared error for the Decision Tree model on the VALIDATE partition. If there is a decrease in average squared error, this indicates an improved fit based on changing the recursive partitioning parameters.
- Close the results.
Machine Learning Using SAS® Viya®
Lesson 03, Section 4 Practice the Demo: Modify the Pruning Parameters
In this practice, you continue to modify the settings of the Decision Tree node in the Tree Based pipeline. You modify the pruning parameters and compare this model performance to the models built earlier.
Note: This is the task shown in the previous demonstration video. However, keep in mind that SAS Visual Data Mining and Machine Learning uses distributed processing, so the values in results will vary slightly across runs.
- In the Demo project, in the Tree Based pipeline, make sure that the Decision Tree node is selected.
- In the properties panel, scroll down and expand Pruning Options.
- Change Subtree method from Cost complexity to Reduced error.
- Right-click the Decision Tree node and select Run.
- Right-click the Decision Tree node and select Results.
- Click the Assessment tab and expand the Fit Statistics table. Is the average squared error for this decision tree model the same as before on the VALIDATE partition? Remember that changing properties does not guarantee improvement in model performance.
- Close the Fit Statistics table.
- Close the Results window.
- Click the Run pipeline button.
- Right-click the Model Comparison node and select Results.
The Model Comparison table shows which model is currently the champion from the Tree Based pipeline. This is based on the default fit statistic, KS. If you compare the average squared error values for the two models, you likely see a smaller average squared error for the decision tree than for the logistic regression model, which indicates that the decision tree performs better on that statistic.
- Close the Results window.
Machine Learning Using SAS® Viya®
Lesson 03, Section 5 Practice the Demo: Build a Gradient Boosting Model
In this practice, you add a Gradient Boosting node to the Tree Based pipeline. You first run the gradient boosting model with default settings. You then change some settings and compare the model to the other models in the pipeline.
Note: This is the task shown in the previous demonstration video. However, keep in mind that SAS Visual Data Mining and Machine Learning uses distributed processing, so the values in results will vary slightly across runs.
- In the Demo project, make sure that the Tree Based pipeline is selected.
- Add a Gradient Boosting node (from the Supervised Learning group) after the Variable Selection node.
- Keep all properties for the Gradient Boosting node at their defaults.
- Run the Gradient Boosting node.
- Open the results for the Gradient Boosting node.
- Maximize the Error plot. This plot shows the model's performance, based on average squared error, as the number of trees increases.
In this case, average squared error decreases (improves) on both the training and validation data sets as trees are added. The results also include a table of variable importance, several windows associated with scoring, and the Output window.
- Close the Error plot.
- Notice the Variable Importance plot and score code windows.
- The Output window indicates that the underlying procedure for the Gradient Boosting node is PROC GRADBOOST.
- Click the Assessment tab.
- Maximize the Fit Statistics table and note the average squared error on the VALIDATE partition.
- Close the Fit Statistics table.
- Close the Results window.
- With the Gradient Boosting node selected, make the following changes to the node properties (a code sketch of the corresponding procedure options appears at the end of this practice):
- Reduce Number of trees from 100 to 50 in the properties panel.
- Under Tree-splitting Options, increase Maximum depth from 4 to 8. To change the value of Maximum depth, you can either move the slider or manually enter a value in the box.
- Increase Minimum leaf size from 5 to 15.
- Increase Number of interval bins from 50 to 100.
- Run the Gradient Boosting node.
- Open the results for the Gradient Boosting node.
- Click the Assessment tab and scroll down to the Fit Statistics table. Note the average squared error for this gradient boosting model on the VALIDATE partition. Is the value of this fit statistic slightly better than for the first gradient boosting model, which was based on the default settings?
- Close the Results window.
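The Output window confirmed that the node runs PROC GRADBOOST. Expressed as direct procedure options, the modified settings look roughly like the sketch below. This is a hedged illustration: the input lists are placeholders, and the node's actual generated code also handles partitioning and metadata that are omitted here.

```sas
proc gradboost data=mycas.commsdata
      ntrees=50            /* Number of trees: 100 -> 50         */
      maxdepth=8           /* Maximum depth: 4 -> 8              */
      minleafsize=15       /* Minimum leaf size: 5 -> 15         */
      numbin=100           /* Number of interval bins: 50 -> 100 */
      seed=12345;          /* fixed seed for repeatable sampling */
   target churn / level=nominal;
   input ever_days_over_plan / level=interval;          /* illustrative inputs */
   input handset_age_grp delinq_indicator / level=nominal;
run;
```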
Machine Learning Using SAS® Viya®
Lesson 03, Section 5 Practice the Demo: Build a Forest Model
In this practice, you add a Forest node to the Tree Based pipeline. You first build a forest model using the default settings. You then change some of the settings and compare the model to the other models in the pipeline.
Note: This is the task shown in the previous demonstration video. However, keep in mind that SAS Visual Data Mining and Machine Learning uses distributed processing, so the values in results will vary slightly across runs.
- In the Demo project, in the Tree Based pipeline, add a Forest node (from the Supervised Learning group) after the Variable Selection node.
- Keep all properties for the Forest node at their default settings.
- Right-click the Forest node and select Run.
- Right-click the Forest node and select Results. Note: Many of the items in the Results window for the forest model are similar to items that you saw in the results for the gradient boosting model in a previous practice.
The Error plot shows the performance of the model as the number of trees increases. This plot contains three lines that show performance on the training data, the validation data, and the out-of-bag sample, respectively. You see a table of variable importance and the same code output windows as you saw for gradient boosting.
The Output window shows that the underlying procedure is the FOREST procedure.
- Click the Assessment tab.
The Fit Statistics table shows the average squared error on the VALIDATE partition.
- Close the Results window.
- Make sure that the Forest node is selected. In the node properties panel on the right, make the following changes (a code sketch of the corresponding procedure options appears at the end of this practice):
- Reduce Number of trees from 100 to 50.
- Under Tree-splitting Options, change Class Target Criterion from Information gain ratio to Entropy.
- Decrease Maximum depth from 20 to 12.
- Increase Minimum leaf count from 5 to 15.
- Increase Number of interval bins to 100.
- The default number of inputs to consider per split is the square root of the total number of inputs. Clear the check box for this option and set Number of inputs to consider per split to 7, about half the number of inputs that come from the Variable Selection node.
- Run the Forest node.
- Open the results for the Forest node.
- Click the Assessment tab and scroll down to the Fit Statistics table. Take note of the average squared error for this forest model on the VALIDATE partition. Did this fit statistic decrease a small amount? If so, this model is a little bit better than the first model, which used the default settings.
- Close the Results window.
- To see how the forest model compares to the other models in the pipeline, click the Run pipeline button.
- Right-click the Model Comparison node and select Results. How does the performance of the forest model compare to the other models in the pipeline?
- Close the Results window.
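As the Output window showed, the Forest node runs the FOREST procedure. The numeric settings from this practice map roughly to the options below; the class target criterion is a node-level setting and is not shown, and the input names are illustrative assumptions.

```sas
proc forest data=mycas.commsdata
      ntrees=50            /* Number of trees: 100 -> 50         */
      maxdepth=12          /* Maximum depth: 20 -> 12            */
      minleafsize=15       /* Minimum leaf count: 5 -> 15        */
      numbin=100           /* Number of interval bins: 50 -> 100 */
      vars_to_try=7        /* inputs to consider per split       */
      seed=12345;          /* fixed seed for repeatable bagging  */
   target churn / level=nominal;
   input ever_days_over_plan / level=interval;          /* illustrative inputs */
   input handset_age_grp delinq_indicator / level=nominal;
run;
```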
Machine Learning Using SAS® Viya®
Lesson 04, Section 1 Practice the Demo: Build a Neural Network Using the Default Settings
In this practice, you create a new pipeline in the Demo project, using the CPML demo pipeline. You build a neural network model using the default settings.
Note: This is the task shown in the previous demonstration video. However, keep in mind that SAS Visual Data Mining and Machine Learning uses distributed processing, so the values in results will vary slightly across runs.
- In the Demo project, click the plus sign next to the Tree Based pipeline tab to add a new pipeline.
- In the New Pipeline window, enter information about the new pipeline:
- Enter Neural Network in the Name field.
- Under Select a pipeline template, click the down arrow to browse templates. Select the CPML demo pipeline. The CPML demo pipeline appears in this menu because you used it in a previous practice.
- Click Save.
- In the Neural Network pipeline, add a Neural Network node (from the Supervised Learning group) after the Variable Selection node.
- Select the Neural Network node to activate its properties panel. Keep all properties for the Neural Network node at their defaults.
- Right-click the Neural Network node and select Run.
- Right-click the Neural Network node and select Results.
Explore the following charts and plots, which help you evaluate the model's performance:
The Network Diagram presents the final neural network structure for this model, including the hidden layer and the hidden units.
The Iteration plot shows the model's performance, based on validation error, as iterations are added during training.
As usual, you see the score code windows.
The Output window shows the results from the NNET procedure: the final neural network model parameters, the iteration history, and the optimization process.
- Click the Assessment tab, and explore the results. Note the following:
In the Lift Reports window, the Cumulative Lift plot shows the model's performance by percentile of the population. This plot is especially useful when you select a model to target a particular portion of the customer base.
For a binary target, you also have the ROC curve in the ROC Reports window. The ROC curve shows the model's performance considering the true positive rate and the false positive rate.
The Fit Statistics table shows the model's performance based on various assessment measures, such as average squared error. Note the average squared error on validation data.
- Close the Results window.
Machine Learning Using SAS® Viya®
Lesson 04, Section 2 Practice the Demo: Modify the Neural Network Architecture
In this practice, you modify the network architecture parameters of the neural network model with the intent to improve performance.
Note: This is the task shown in the previous demonstration video. However, keep in mind that SAS Visual Data Mining and Machine Learning uses distributed processing, so the values in results will vary slightly across runs.
- In the Demo project, select the Neural Network node.
- In the properties panel for the node, make the following changes:
- Change Input standardization from Midrange to Z score.
- Expand the Hidden Layer options. Clear the check box for Use the same number of neurons in hidden layers.
- Under Custom Hidden Layer Options, enter 26 for Hidden layer 1: number of neurons. This is about twice as many as the number of inputs coming from the Variable Selection node.
Note: Under Target Layer Options, notice the Direct connections property. In the future, if you want to create a skip layer perceptron, select this check box.
- Right-click the Neural Network node and select Run.
- Right-click the Neural Network node and select Results.
- Click the Assessment tab.
In the Fit Statistics table, take note of the average squared error for this neural network model on the VALIDATE partition. Is this fit statistic value better than for the first model (which used the default settings)?
- Close the Results window.
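For reference, the two input standardization methods in this practice differ in how they center and scale each interval input. Midrange standardization rescales to the interval [-1, 1] using the midpoint and half-range; z-score standardization centers on the mean and scales by the standard deviation:

$$x' = \frac{x - \tfrac{1}{2}\bigl(\max(x) + \min(x)\bigr)}{\tfrac{1}{2}\bigl(\max(x) - \min(x)\bigr)} \quad\text{(midrange)}, \qquad z = \frac{x - \bar{x}}{s} \quad\text{(z score)}$$

Because midrange scaling depends only on the extremes, it is more sensitive to outliers than the z score.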
Machine Learning Using SAS® Viya®
Lesson 04, Section 3 Practice the Demo: Modify the Learning and Optimization Parameters
In this practice, you modify the learning and optimization parameters of the neural network model in the Neural Network pipeline, and compare the model performance to the performance of the logistic regression model already in the pipeline.
Note: This is the task shown in the previous demonstration video. However, keep in mind that SAS Visual Data Mining and Machine Learning uses distributed processing, so the values in results will vary slightly across runs.
- In the Demo project, in the Neural Network pipeline, make sure that the Neural Network node is selected.
- In the properties panel for the node, make the following changes (a sketch of the corresponding procedure options appears at the end of this practice):
- Under Common Optimization Options, increase L1 weight decay from 0 to 0.01.
Note: The options Maximum iterations and Maximum time control early stopping. For this model, do not change these options.
- Decrease L2 weight decay from 0.1 to 0.0001.
- Right-click the Neural Network node and select Run.
- Right-click the Neural Network node and select Results.
- Click the Assessment tab and scroll down to the Fit Statistics table. Take note of the average squared error for this neural network model on the VALIDATE partition.
- Close the Results window.
- To identify the champion model in this pipeline, do the following:
- Click the Run pipeline button to run the entire pipeline.
- Right-click the Model Comparison node and select Results.
The neural network model is the champion model of the pipeline, based on the default statistic, KS.
- Close the Results window.
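The L1 and L2 weight decay options add penalty terms to the training error that shrink the network weights and discourage overfitting:

$$E_{\text{reg}}(\mathbf{w}) = E(\mathbf{w}) + \lambda_1 \sum_i \lvert w_i \rvert + \lambda_2 \sum_i w_i^2$$

In terms of the underlying NNET procedure, the architecture and regularization settings from the last two practices might look roughly like the sketch below. The input lists and the exact spellings of the OPTIMIZATION statement options are assumptions, not the node's generated code.

```sas
proc nnet data=mycas.commsdata;
   target churn / level=nominal;
   input ever_days_over_plan / level=interval;   /* illustrative inputs          */
   input handset_age_grp / level=nominal;
   architecture mlp;                             /* multilayer perceptron        */
   hidden 26;                                    /* one hidden layer, 26 neurons */
   optimization regl1=0.01 regl2=0.0001;         /* L1 and L2 weight decay       */
   train outmodel=mycas.nnet_model seed=12345;   /* hypothetical output table    */
run;
```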
Machine Learning Using SAS® Viya®
Lesson 05, Section 1 Practice the Demo: Build a Support Vector Machine Using the Default Settings
In this practice, you create a new pipeline based on the CPML demo pipeline, and add a Support Vector Machine (SVM) node to it. You build the support vector machine model using the default settings.
Note: This is the task shown in the previous demonstration video. However, keep in mind that SAS Visual Data Mining and Machine Learning uses distributed processing, so the values in results will vary slightly across runs.
- In the Demo project, click the plus sign next to the Neural Network tab to add a new pipeline.
- In the New Pipeline window, enter the following information:
- In the Name field, enter Support Vector Machine.
- Under Select a pipeline template, select CPML demo pipeline.
- Click Save.
- Add a Support Vector Machine (SVM) node (from the Supervised Learning group) under the Variable Selection node.
- Select the SVM node.
- In the properties panel, keep all properties for the SVM node at their defaults.
- Run the SVM node.
- Open the results for the SVM node. Explore the following charts and plots, which help you evaluate the model's performance:
- The Fit Statistics table presents several assessment measures that indicate the performance of the support vector machine model.
- The Training Results table shows the parameters for the final support vector machine model, such as the number of support vectors and the bias, which is the offset that defines the support vector machine.
- As with previous models, you see score code windows.
- The Output window shows the final support vector machine model parameters, the training results, the iteration history, the misclassification matrix, the fit statistics, the predicted probability variables, and the underlying procedure (the SVMACHINE procedure).
- Click the Assessment tab. As usual, you see the lift reports, the ROC reports, the Event Classification plot, and the Fit Statistics table. In the Fit Statistics table, take note of the average squared error on the VALIDATE partition.
- Close the Results window.
Machine Learning Using SAS® Viya®
Lesson 05, Section 2 Practice the Demo: Modify the Methods of Solution Parameters
In this practice, you modify one of the key methods of solution parameters for the support vector machine model in an attempt to improve its performance.
Note: This is the task shown in the previous demonstration video. However, keep in mind that SAS Visual Data Mining and Machine Learning uses distributed processing, so the values in results will vary slightly across runs.
- In the Demo project, in the Support Vector Machine pipeline, make sure that the SVM node is selected.
- In the properties panel, change Penalty from 1 to 0.1. The Penalty value balances model complexity and training error: a larger Penalty value penalizes misclassified training observations more heavily, producing a more complex model that fits the training data more closely, at the risk of overfitting. The objective function at the end of this practice makes this trade-off explicit.
- Run the SVM node.
- Right-click the SVM node and select Results.
- Click the Assessment tab.
- The Fit Statistics table shows the average squared error on validation data. Has the value increased or decreased?
- Close the Results window.
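To see why the Penalty property trades off complexity against training error, recall the soft-margin objective that a support vector machine minimizes, where C is the Penalty value and the slack variables ξ_i measure violations of the margin:

$$\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;\; \frac{1}{2}\lVert \mathbf{w} \rVert^2 + C \sum_{i=1}^{n} \xi_i \qquad \text{subject to} \quad y_i\bigl(\mathbf{w}^{\top}\mathbf{x}_i + b\bigr) \ge 1 - \xi_i, \quad \xi_i \ge 0$$

Decreasing the penalty to 0.1, as in this practice, tolerates more margin violations and yields a simpler decision boundary.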
Machine Learning Using SAS® Viya®
Lesson 05, Section 3 Practice the Demo: Increase the Flexibility of the Support Vector Machine
In this practice, you attempt to improve the performance of the support vector machine by modifying three options: the kernel function, the tolerance, and maximum iterations. You then compare the model performance to the logistic regression model in the pipeline.
Note: This is the task shown in the previous demonstration video. However, keep in mind that SAS Visual Data Mining and Machine Learning uses distributed processing, so the values in results will vary slightly across runs.
- In the Demo project, in the Support Vector Machine pipeline, make sure that the SVM node is selected.
- In the properties pane for the node, make the following changes (a sketch of the corresponding procedure options appears at the end of this practice):
- Change Kernel from Linear to Polynomial. Leave Polynomial degree as 2.
Note: In Model Studio, only degrees of 2 and 3 are available.
- Increase Tolerance from 0.000001 to 0.6.
Note: The Tolerance value balances the number of support vectors and model accuracy. A Tolerance value that is too large creates too few support vectors. A value that is too small overfits the training data.
- Decrease Maximum iterations from 25 to 10.
- Run the SVM node.
- Open the results for the support vector machine model.
- Click the Assessment tab. Scroll down to the Fit Statistics table and take note of the average squared error on validation data. Is this fit statistic better than for the previous model?
- Close the Results window.
- To determine the champion model from this pipeline, run the Model Comparison node by clicking the Run Pipeline button.
- Look at the results of the Model Comparison node. Based on the KS statistic (the default), which model is the champion from this pipeline?
- Close the Results window.
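The three changes in this practice correspond to options of the SVMACHINE procedure, which the Output window identified earlier as the node's underlying procedure. A minimal, hedged sketch with illustrative input names:

```sas
proc svmachine data=mycas.commsdata
      c=0.1                /* Penalty: 1 -> 0.1            */
      tolerance=0.6        /* Tolerance: 0.000001 -> 0.6   */
      maxiter=10;          /* Maximum iterations: 25 -> 10 */
   target churn;
   input ever_days_over_plan / level=interval;   /* illustrative inputs         */
   input handset_age_grp / level=nominal;
   kernel polynom / deg=2;                       /* polynomial kernel, degree 2 */
run;
```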
Machine Learning Using SAS® Viya®
Lesson 05, Section 3 Practice the Demo: Add Model Interpretability
In this practice, you use the Model Interpretability feature to provide some explanation about the support vector machine model.
Note: This is the task shown in the previous demonstration video. However, keep in mind that SAS Visual Data Mining and Machine Learning uses distributed processing, so the values in results will vary slightly across runs.
- In the Demo project, in the Support Vector Machine pipeline, select the SVM node.
- In the properties pane for the SVM node, make the following changes:
- Under Post-training Properties, expand Model Interpretability.
- Expand Global Interpretability and select both Variable importance and PD plots.
- Expand Local Interpretability and select the check boxes for ICE plots, LIME, and Kernel SHAP.
- Under Maximum number of Kernel SHAP variables, move the slider to change the number to 10. This means that 10 inputs are displayed in the chart, ordered by importance according to the absolute Kernel SHAP values.
- Notice that Specify instances to explain is set to Random. This setting provides explanations for five randomly selected observations. Although you will not change this setting now, note that it is possible to select five observations from the data instead.
- Run the SVM node.
- Open the results for the SVM node.
- Notice that there is a new tab in addition to the Node and Assessment tabs: Model Interpretability. Click the Model Interpretability tab.
- Expand the Surrogate Model Variable Importance table. The most important inputs are listed in descending order of their importance. What appears to be the most important predictor? Relative importance is based on simple decision trees that are only one level deep. You will see that inputs to the PD and ICE plots are the top predictors from this table.
- Expand the PD plot. This plot shows the marginal effect of a feature (in this case, ever_days_over_plan) on the predicted outcome of the model that you just fit.
The prediction function is fixed at a few values of the chosen feature and averaged over the other features.
A PD plot can show whether the relationship between the target and the feature is linear, monotonic, or more complex.
On the right side of the plot, notice that a concise description of this PD plot appears. Many plots on the Model Interpretability tab provide this type of description.
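Formally, the partial dependence of the fitted model on a chosen feature x_S is the model's prediction averaged over the observed values of the remaining features x_C across the n observations:

$$\hat{f}_S(x_S) = \frac{1}{n} \sum_{i=1}^{n} \hat{f}\bigl(x_S,\, x_C^{(i)}\bigr)$$

Each point on the PD plot is therefore an average prediction with the chosen feature held fixed at one value.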
- To look at the relationship between the model's predictions and a different variable, use the View chart menu. By default, the menu lists the five most important inputs in the model, as determined by one-level decision trees fit to the model's predicted values.
Select the categorical variable handset_age_grp. This PD plot indicates that the highest probability of churn is associated with the middle age group (that is, the middle level of the variable): handsets between 24 and 48 months old. The newest handsets (those less than 24 months old) have the next highest probability of churn. And the oldest handsets have the lowest probability of churn. This makes sense from a business standpoint. A new device has a lower probability of churn because the customer hasn't had time to test it out yet. At the other end, if a customer has had a handset for more than four years, they probably like it.
- Close the PD plot.
- Expand the PD and ICE Overlay plot. This plot overlays the partial dependence results and the individual conditional expectation results. It contains six lines: five ICE lines (one per selected observation) and one PD line.
ICE plots can help reveal interesting subgroups and interactions between model variables, relationships that are averaged out in PD plots. For a chosen feature, an ICE plot shows how changes in the feature relate to changes in the prediction for individual observations. This ICE plot shows one line for each of the five randomly chosen observations, as specified in the node properties that you saw earlier.
This ICE plot shows churn probability by ever_days_over_plan. Each line represents the conditional expectation for one customer instance. The plot indicates that for all five instances, the probability of churn increases consistently as ever_days_over_plan increases, holding the other features constant.
When evaluating an ICE plot of an interval input, the most useful feature to observe is intersecting slopes. Intersecting slopes indicate that there is an interaction between the plot variable and one or more complementary variables. ever_days_over_plan does not show any interactions.
- Look at the ICE plot for a different variable. From the View chart menu, select handset_age_grp.
When evaluating an ICE plot of a categorical input, it is useful to look among individual observations for different relationships between the groups (or levels) of the categorical variable and the target. Significant differences in these relationships indicate group effects. Five individuals are represented in this plot, with the average predicted probability of churn calculated separately for each individual, across all levels of handset_age_grp. For this variable, the trend of observing the lowest probability in the oldest handset age group holds true for all five individuals.
- Close the PD and ICE Overlay plot.
- Notice the LIME and SHAPLEY plots. These plots are created by explaining individual predictions. In a given feature space, Shapley values help you determine where you are, how you got there, and how influential each variable is at that location. This is in contrast to LIME values, which help you determine how changes in a variable's value affect the model's prediction.
- Expand the LIME Explanations plot.
This LIME plot displays the regression coefficients for the inputs selected by a local surrogate linear regression model. The surrogate model fits the predicted probability of the event (1) for the target churn for each of the five randomly chosen observations. In the chart, the inputs are ordered by significance, with the most significant input for the local regression model appearing at the bottom.
The LASSO technique is used to select the most significant effects from the set of inputs that was used to train the model. A positive estimate indicates that the observed value of the input increases the predicted probability of the event. For example, in the demo video, the value of 0 for delinq_indicator decreases the predicted probability of the event (1) for the target churn by 0.1516 compared to the individual having a different value for delinq_indicator. Note: When you perform this demo, your results might differ.
- Close the LIME Explanations plot.
- Expand the Kernel SHAP Values plot.
Unlike LIME coefficients, SHAPLEY values do not come from a local regression model. For each individual observation, an input's Shapley value is the contribution of the observed value of the input to the predicted probability of the event (1) for the target churn. The Shapley values of all inputs sum to the predicted value of that local instance. The inputs are displayed in the chart, ordered by importance according to the absolute Kernel SHAP values, with the most significant input appearing at the bottom of the chart.
Kernel SHAP estimates the Shapley values as the coefficients of a weighted least squares regression. Note that each nominal input is binary encoded based on whether it matches the individual observation. Interval inputs are binary encoded based on their proximity to the individual observation, with a value of 1 if the observation is close to the local instance. To reduce the bias that collinearity introduces in regression, Shapley values are averaged across all permutations of the features joining the model, so they account for variable interactions.
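A useful property when reading this chart is local accuracy: for each explained observation x, the baseline (average) prediction plus the Shapley values φ_j of the p inputs reproduces that observation's prediction:

$$\hat{f}(x) = \phi_0 + \sum_{j=1}^{p} \phi_j, \qquad \phi_0 = \mathbb{E}\bigl[\hat{f}(X)\bigr]$$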
- Close the Kernel SHAP Values plot.
- Close the results.
- The Model Comparison node needs to be run again because you turned on Model Interpretability in the SVM node above it. Run the entire pipeline and view the results of the model comparison. In the demo video, the SVM model is the champion of this pipeline based on KS. Note: When you perform these steps, a different model might be the champion.
- Close the results.
Machine Learning Using SAS® Viya®
Lesson 06, Section 1 Practice the Demo: Compare Models within a Pipeline
In this practice, you run and interpret the Model Comparison node in the Tree Based pipeline. You compare the models' performances based on different fit statistics.
Note: This is the task shown in the previous demonstration video. However, keep in mind that SAS Visual Data Mining and Machine Learning uses distributed processing, so the values in results will vary slightly across runs.
- In the Demo project, select the Tree Based pipeline. Note: You are using this pipeline because it has more models built than the other pipelines.
- You use the default assessment measure for binary targets, which is specified in the Project Settings window under the Rules properties. To view these settings, do the following:
- Click the Settings button in the upper right corner of the project window, and select Project settings.
- In the left pane of the Project Settings window, select Rules. The default statistic for class selection is the Kolmogorov-Smirnov (KS) statistic.
- Click Cancel.
- In the Tree Based pipeline, select the Model Comparison node.
Note: You can also change the fit statistics in the properties for this node.
- To make sure that you're looking at the most recent results from the Model Comparison node, right-click the Model Comparison node and select Run.
- Right-click the Model Comparison node and select Results.
The Model Comparison table shows the champion model based on the default statistic (in this case, KS).
- Scroll down to see the Properties table. The Properties table shows the criteria used to evaluate the models and select the champion.
- Click the Assessment tab and expand the Lift Reports plot.
The lift report shows results based on the response percentage. Using the menu in the upper left corner, you can also choose to see the model's performance based on the captured response percentage, cumulative captured response percentage, cumulative response percentage, cumulative lift, gain, and lift.
- Close the Lift Reports plot.
- Expand the ROC Reports plot.
The ROC Reports plot is based on Accuracy, by default. Using the menu in the upper right corner, you can also see the models' performances based on the F1 Score and ROC.
- Close the ROC Reports plot.
- Expand the Fit Statistics table.
The Fit Statistics table shows how each model in the pipeline performs on the data partitions defined in the project settings (train, validate, and test) for a series of fit statistics, such as Area Under ROC, Average Square Error, Gini Coefficient, and KS, among others.
- Close the Fit Statistics table.
- Close the Results window.
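For reference, the KS statistic that serves as the default class selection criterion is the maximum vertical separation between the cumulative distributions of the predicted probabilities for events and nonevents, taken over all cutoffs c. Larger values indicate better separation:

$$KS = \max_{c} \left| F_{\text{event}}(c) - F_{\text{nonevent}}(c) \right|$$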
Machine Learning Using SAS® Viya®
Lesson 06, Section 1 Practice the Demo: Compare Models across Pipelines
In this practice, you run the pipeline comparison. Pipeline comparison enables you to compare the best models from each pipeline created. It also enables you to register the overall champion model and use it in other tools.
Note: This is the task shown in the previous demonstration video. However, keep in mind that SAS Visual Data Mining and Machine Learning uses distributed processing, so the values in results will vary slightly across runs.
- In the Demo project, click the Pipeline Comparison tab.
At the top, you see the champion model from each pipeline as well as the model deemed the overall champion in the pipeline comparison, the champion of champions. The overall champion is selected by default and is indicated by a star in the Champion column.
In addition, several charts and tables summarize the performance of the overall champion model (the selected model), show the Variable Importance list of the model, provide training and score codes, and show other outcomes from the selected best model. The default assessment measure for pipeline comparison is Kolmogorov-Smirnov (KS).
All the results shown are for the overall champion model only. You might want to perform a model comparison of all the models shown.
- Select the check boxes next to all the models shown at the top of the Results page. You can also select the check box next to the word Champion at the top of the table.
- When multiple models are selected, the Compare button in the upper right corner is activated. Click Compare.
The Compare results enable you to compare assessment statistics and graphics across the models currently selected on the Pipeline Comparison tab.
- Close the Compare results window.
- To add a challenger model (a model that was not automatically selected) to the pipeline comparison, perform the following steps:
- Return to the pipeline that contains the desired model (here, the Tree Based pipeline).
- Right-click the node for a model other than the pipeline champion and select Add challenger model from the pop-up menu.
- Click the Pipeline Comparison tab. The selected model now appears in the Pipeline Comparison table at the top, in the Challenger column.
- To prepare to register the overall champion model in a later practice, clear the check boxes for all other models in the table at the top of the Pipeline Comparison tab.
Machine Learning Using SAS® Viya®
Lesson 06, Section 1 Practice the Demo: Review a Project Summary Report on the Insights Tab
In this practice, you review a project summary report for the Demo project on the Insights tab.
Note: This is the task shown in the previous demonstration video. However, keep in mind that SAS Visual Data Mining and Machine Learning uses distributed processing, so the values in results will vary slightly across runs.
- In the Demo project, click the Insights tab.
The Insights tab contains summary information in the form of a report for the project, the champion model, and any challenger models. For the purposes of the Insights tab, a champion model is the overall project champion model, and a challenger model is one that is a pipeline champion, but not the overall project champion.
At the top of the report is a summary of the project and a list of any project notes. Summary information about the project includes the target variable, the champion model, the event rate, and the number of pipelines in the project.
- Maximize the plot for Most Common Variables Selected Across All Models. This plot summarizes common variables used in the project by displaying the number of pipeline champion models that the variables appear in. Only variables that appear in models used in the pipeline comparison are displayed.
The plot shows that many variables were used by all models in the pipeline comparison. These variables are listed at the top of the plot. Variables not used in all models are listed at the bottom of the plot.
- Close the Most Common Variables Selected Across All Models plot.
- Maximize the Assessment for All Models plot. This plot summarizes model performance for the champion model across each pipeline and the overall project champion. The orange star next to the model indicates that it is the project champion.
In the demo video, the champion is the forest. Take note of the KS value for the model that is selected as the champion when you practice these steps.
- Close the Assessment for All Models plot.
- Maximize the Most Important Variables for Champion Model plot. This plot shows the most important variables, as determined by the relative importance calculated using the actual overall champion model.
- Close the Most Important Variables for Champion Model plot.
- At the bottom of the results, notice the Cumulative Lift for Champion Model plot. This plot displays the cumulative lift for the overall project champion model for both the training and validation partitions.
- To prepare for model deployment, return to the pipeline comparison results by clicking the Pipeline Comparison tab.
Machine Learning Using SAS® Viya®
Lesson 06, Section 1 Practice the Demo: Register the Champion Model
In this practice, you register the champion model in the Demo project. Registering the model makes it available to other SAS applications.
Note: This is the task shown in the previous demonstration video. However, keep in mind that SAS Visual Data Mining and Machine Learning uses distributed processing, so the values in results will vary slightly across runs.
- In the Demo project, on the Pipeline Comparison tab, make sure that only the champion model is selected.
- On the right side of the window, click the Project pipeline menu (the three vertical dots). Note that the Manage Models option is not available.
- Select Register models to open the Register Models window. Wait until you see the following indications that the registration process is finished:
- The spinning circle next to Registering in the Status column indicates that the selected model is actively being registered.
- The Register Models window is updated to indicate that the registration process has successfully completed.
- Close the Register Models window.
- In the table at the top of the Pipeline Comparison tab, notice the new Registered column. This column indicates that the champion model was registered.
Note: After the model is registered, you can view and use it in SAS Model Manager. In SAS Model Manager, you can export the score code in different formats, deploy the model, and manage its performance over time. You see this in a later practice.
Machine Learning Using SAS® Viya®
Lesson 06, Section 1 Practice the Demo: Explore the Settings for Model Selection
In this practice, you explore some of the settings for model selection that you can change if you don't want to use the default values for model comparison. Note: It is helpful to know about these settings for future projects, but you use the default settings for the course project.
Note: This is the task shown in the previous demonstration video. However, keep in mind that SAS Visual Data Mining and Machine Learning uses distributed processing, so the values in results will vary slightly across runs.
- In the Demo project, click the Tree Based pipeline.
- To explore the settings for comparing models within a single pipeline, perform the following steps:
- Select the Model Comparison node.
- In the properties panel for the node, notice that the three properties (Class selection statistic, Interval selection statistic, and Selection partition) are all set to the default value, Use rule from project settings. Remember that earlier, when you set up the course project, you modified the data partition in the project settings. If you change the model selection settings here instead of in the project settings, those settings apply only to the current pipeline.
- For a class or interval target, you can select a different measure. Click the Class selection statistic drop-down and select Average squared error. The green check mark on the Model Comparison node disappears, indicating that you need to run the node again to take advantage of the new setting.
- Notice that two properties at the bottom of the panel are currently inactive: Selection depth and ROC-based cutoff.
When you select a response-based measure, such as Cumulative lift, you can also specify a selection depth other than the default. When you select an ROC-based measure, such as ROC separation, you can specify a cutoff other than the default.
- Change the Class selection statistic property so that it is back to the default setting, Use rule from project settings.
- Click the Selection partition drop-down. The available options are Test, Train, and Validate. Leave the default value, Use rule from project settings.
- To explore the settings for comparing models across pipelines, perform the following steps:
- Click the Settings icon in the upper right corner, and then select Project settings.
- Select Rules in the left pane of the Edit Project Settings window. On the right, the first three Model Comparison properties are the same properties that we saw in the properties pane for the Model Comparison node: Class selection statistic, Interval selection statistic, and Selection partition.
- Close the Project Settings window.
Machine Learning Using SAS® Viya®
Lesson 06, Section 2 Practice the Demo: View the Score Code and Run a Scoring Test
In this practice, you access SAS Model Manager from Model Studio and run a scoring test using the champion model that you registered in an earlier practice. Before you deploy a model, it is often important to run a scoring test in a nonproduction environment to make sure that the score code runs without errors.
Note: This is the task shown in the previous demonstration video. However, keep in mind that SAS Visual Data Mining and Machine Learning uses distributed processing, so the values in results will vary slightly across runs.
- In the Demo project, click the Pipeline Comparison tab.
- On the right, click the Project pipeline menu (three vertical dots). Notice that the Manage Models option is now available because at least one model has been registered. Select Manage Models from the menu.
By default, when Model Manager opens, you see a list of files that contain various types of code for training and scoring the registered model.
- In the left pane of Model Manager, click the Projects icon (the second icon from the top) to display the Projects list. The Demo project appears in this list. The SAS Model Manager project named Demo is based on the Model Studio project of the same name.
Note: In SAS Viya for Learners, you can see projects created by other users, so you might see multiple Demo projects listed. Make sure to select the Demo project that has your email address specified in the Modified by field.
- Click the name of the Demo project to open it. The Models tab, which is selected by default, lists the model that you registered earlier in Model Studio.
Note: The demo video corresponding to this practice was updated more recently than the earlier demo video in which the model was registered. Notice that the two demo videos show a different registered model. When you perform these practices, it doesn't matter which model is selected as the champion and registered. The steps are the same.
- Notice the tabs at the top of the page. These tabs are used during the entire model management process, which goes beyond model deployment. In this practice, you focus on the tabs that are used for the scoring test.
- To open the registered model, click its name. (Do not click the selection check box next to the model name.) Notice the tabs near the top of the page. The Files tab is selected by default. On the left is the same list of files related to scoring this model that you saw earlier.
- To see the score code that Model Studio generated for this model and the data preparation nodes in the pipeline, select dmcas_epscorecode.sas in the left panel. The score code appears on the right. If you want, scroll down through the code. The score code varies by model. You do not need to be able to understand the code in order to run a scoring test.
Note: After you test your score code, a likely next step is to export the code from the Files tab. Then you can put the model into production by deploying it in a variety of environments.
- In the upper right corner of the score code window, click Close.
The Models page appears, which lists all models registered across all projects. Here, you have only one project (the Demo project) and one registered model.
- On the left, click the Projects icon to return to the Demo project. It's time to create and run a scoring test on the selected model.
- Click the name of the model (not the check box).
- Click the Scoring tab.
- On the Tests tab, click New Test.
- In the New Test window, enter the following information:
- In the Name box, enter CPML_Champion.
- In the Description box, enter Demo project champion. (Entering a description is optional.)
- Below Model, click Choose Model. In the Choose a Model window, select the champion model. The Choose a Model window closes and you return to the New Test window.
- To select the data source for the test, perform the following steps:
Note: The demo video shows how to import the data set as a local file, which you cannot do in SAS Viya for Learners. The following steps show you how to select the data set, which has already been loaded into memory in SAS Viya for Learners.
- Select the score_commsdata table on the Available tab and click OK.
- Notice that the name of the data set now appears in the New Test window.
- Click Save.
- Back on the Scoring tab for the Demo project, select the check box next to the name of the scoring test that you just created.
- In the upper right corner of the table, click Run.
When the run is finished, the Status column has a green check mark and a table icon appears in the Results column. This indicates that the test ran successfully.
- To open the test results, click the table icon in the Results column.
- In the left pane, under Test Results, click Output. By default, the score data table shows new variables created during data preparation; the new variables created during the scoring process, which contain the predictions; and all the original variables.
- Scroll to the right until you see the Predicted: churn = 1 column. The predicted values of churn are used to make business decisions.
Note: If you want, you can reduce the number of columns or rearrange columns in the output table. To do this, click the Options icon in the upper right corner and select Manage columns.
- Close the Output Table window. From here, you can use the Applications menu to return to either SAS Drive or Model Studio.
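For models whose score code is packaged as an analytic store, an alternative to the interactive scoring test is batch scoring with the ASTORE procedure. The sketch below is an assumption-laden illustration: the analytic store table name mycas.champion_astore is hypothetical, and the exact artifact depends on which model was registered.

```sas
/* Hedged sketch: batch scoring with a model's analytic store.     */
/* Table names are illustrative assumptions.                       */
proc astore;
   score data=mycas.score_commsdata     /* table to score             */
         rstore=mycas.champion_astore   /* the model's analytic store */
         out=mycas.scored_commsdata     /* output with predictions    */
         copyvars=(churn);              /* carry the target through   */
run;
```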