Lesson 01

Deep Learning Using SAS® Software
Lesson 01, Section 2 Practice: Modify the Architecture and Training of a Deep Learning Neural Network

In this practice you modify the architecture and training of a deep learning neural network.

Note: To see the full solution code for this practice, open the SAS program DLUS01S01.sas in the course data folder in SAS Studio.

  1. If you did not run the data setup program DLUS01D01.sas (used in an earlier demonstration) in the current SAS Studio session, run it before you proceed with this practice.

  2. Open the program named DLUS01E01.sas.

    This program provides a model template that is similar to the second deep learning model trained in the last demonstration. The following steps request that you expand the number of hidden layers by two, adding one layer immediately after the input layer and one layer immediately before the output layer. This is shown in the diagram below:
    [Diagram: a deep learning model with an input layer, nine hidden layers (labels indicate adding the first and ninth), and a single output layer]

  3. Replace the hyperbolic tangent (TANH) activation functions in each hidden layer with ELU.
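    For one hidden layer, the change amounts to swapping the act value. A minimal sketch is shown below; the layer name, neuron count, and source layer are placeholders that you should match to your template:

    ```sas
    /* before: act='TANH'   after: act='ELU' -- repeat for each hidden layer */
    AddLayer / model='BatchDLNN' name='HLayer3'
       layer={type='FULLCONNECT' n=30 act='ELU' init='xavier'}
       srcLayers={'HLayer2'};
    ```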

  4. Add another hidden layer immediately after the input layer. Connect this hidden layer to both the input layer and the next hidden layer. Construct the hidden layer with the following characteristics:
    • has 40 hidden neurons
    • uses the exponential linear (ELU) activation function
    • uses a dropout rate of 0.05 (5%)
    • uses Xavier initialization for the hidden weights
    • normalizes the output of the layer using batch normalization. That is, batch normalization is applied to the layer's output before the nonlinear (ELU) transformation.

    Solution:

    /* FIRST HIDDEN LAYER */
    AddLayer / model='BatchDLNN' name='HLayer1'
      layer={type='FULLCONNECT' n=40 act='ELU' init='xavier'
      dropout=.05} srcLayers={'data'};
    
    /* SECOND HIDDEN LAYER */
    AddLayer / model='BatchDLNN' name='HLayer2'
      layer={type='FULLCONNECT' n=30 act='identity' init='xavier'
      includeBias=False} srcLayers={'HLayer1'};
    AddLayer / model='BatchDLNN' name='BatchLayer2'
         layer={type='BATCHNORM' act='ELU'} srcLayers={'HLayer2'};

  5. Add another hidden layer just before the output layer. Connect this hidden layer to both the previous hidden layer and the output layer. Construct the hidden layer with the following characteristics:
    • has 40 hidden neurons
    • uses the exponential linear (ELU) activation function
    • uses a dropout rate of 0.05 (5%)
    • uses Xavier initialization for the hidden weights
    • normalizes the output of the layer using batch normalization. That is, batch normalization is applied to the layer's output before the nonlinear (ELU) transformation.

    Solution:

    /* NINTH HIDDEN LAYER */
    AddLayer / model='BatchDLNN' name='HLayer9'
      layer={type='FULLCONNECT' n=40 act='identity' init='xavier'
      includeBias=False dropout=.05} srcLayers={'BatchLayer8'};     
    AddLayer / model='BatchDLNN' name='BatchLayer9'
         layer={type='BATCHNORM' act='ELU'} srcLayers={'HLayer9'};
    

  6. The training optimization algorithm uses a STEP learning rate policy. The step size is set to 10 by default. Change the step size to 15. Because the STEP policy reduces the learning rate every stepSize epochs, the larger step size holds each learning rate longer, making the crawl of the error space more aggressive.
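    The change itself is a single optimizer setting. A minimal sketch of the relevant fragment is shown below; all other optimizer parameters come from the program template and are omitted here:

    ```sas
    /* STEP policy: the learning rate is reduced every stepSize epochs */
    optimizer={algorithm={learningRatePolicy='STEP', stepSize=15 /* default: 10 */}};
    ```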

  7. Run the program.

  8. What is the total number of model parameters?

    Solution:

    There are 7,032 parameters in the model.

  9. What is the approximate validation misclassification rate at epoch 47?

    Solution:

    The misclassification rate at epoch 47 is 25.7%.

Deep Learning Using SAS® Software
Lesson 01, Section 3 Activity: Building Anomaly Detection Classifiers Using Autoencoders

In this activity, we use the SAS Deep Learning tools to build two autoencoders: one to generate encodings for the normal (non-anomalous) data and one to generate encodings for the anomalous data. Once we have calculated encodings for both the normal and the anomalous data, we use these encodings as inputs to a classifier trained to identify anomalies.

To follow along with this activity, please open DLUS01D03.sas and run the code, exploring the results for each block of code.

We start by loading the radar data, ionosphere.csv, into memory for analysis using SAS Viya. The ionosphere data contains electromagnetic signals from radar passing through the ionosphere. The variables var0000 through var0031 represent the real and imaginary components of electromagnetic signals from 16 high-frequency antennas. The variable class indicates whether the radar return signals show evidence of structure in the ionosphere. The class variable is equal to 0 for "good" radar signals that show evidence of structure, and it is equal to 1 for "bad" radar signals that pass straight through the ionosphere. These "bad" radar signals are the anomalies that we want to detect. In addition to profiling the data to understand how many anomalies we have, we also split the data set into a training sample (80% of the data) and a validation sample (20% of the data).


/*connect to CAS*/
cas;
caslib _all_ assign;

/*load radar data into memory*/
proc casutil;
    load file="/workshop/winsas/LWDLUS/Data/ionosphere.csv"
    importoptions=(filetype="csv" getnames="true")
    casout="ionosphere"
    replace;
quit;

/*explore and prepare the data briefly*/
proc cas;
    table.columnInfo / table="ionosphere";
    /*We have 32 radar signal variables (var0000-var0031) as inputs*/

    /*split the data into a larger training sample and a smaller validation sample*/
    loadactionset "sampling";
    sampling.stratified / 
        table={name="ionosphere", groupby={"class"}}
        samppct=20
        partind="true"
        seed=919
        output={casout={name="ionosphere", replace="true"},
                copyVars="ALL"};
    /*_partInd_=0 is training data, _partInd_=1 is validation data*/

    loadactionset "freqTab";
    freqTab.freqTab / 
        table="ionosphere"
        tabulate={"class", "_partInd_",
                  {vars={"class", "_partInd_"}}};
    /*class=0 is normal radar data, class=1 is anomalous radar data*/
    /*we have 225 normal signals and 126 anomalous signals*/
quit;

Next, we build an autoencoder model. We will use the same model structure for training the normal encoder and the anomaly encoder, so the table ionosphere_encoder will be used twice in the training process.


/*build a simple autoencoder model to train both the normal and the anomaly encoders*/
proc cas;
    loadactionset "deepLearn";
    deepLearn.buildModel / 
        model={name="ionosphere_encoder", replace="true"}
        type="DNN";

    deepLearn.addLayer result=r/ 
        model="ionosphere_encoder"
        layer={type="INPUT", std="std", dropout=.10}
        replace="true"
        name="data";

    deepLearn.addLayer result=r/ 
        model="ionosphere_encoder"
        layer={type="FC", n=20, dropout=.10}
        replace="true"
        srcLayers="data"
        name="encoding";

    /*the bottleneck layer is where we find the encodings we use for anomaly detection*/
    deepLearn.addLayer result=r/ 
        model="ionosphere_encoder"
        layer={type="FC", n=10}
        replace="true"
        srcLayers="encoding"
        name="bottleneck";

    deepLearn.addLayer result=r/ 
        model="ionosphere_encoder"
        layer={type="FC", n=20, dropout=.10}
        replace="true"
        srcLayers="bottleneck"
        name="decoding";

    deepLearn.addLayer result=r/ 
        model="ionosphere_encoder"
        layer={type="OUTPUT"}
        replace="true"
        srcLayers="decoding"
        name="output";
quit; 

To train the models, we provide a list of input variables (var0000-var0031) to the dlTrain CAS action. Note that because we are training an autoencoder with dlTrain, we don't specify a target parameter.

When we score the training data using the autoencoders, we don't want the output from the model (which is just a prediction of the input values). Instead, we want the encodings from the "bottleneck" layer, which represent compressed information about the inputs. To get them, we use the layerOut parameter of the dlScore CAS action to write the hidden layer activations from the model to an in-memory CAS table. By default this table includes all the layers in the neural network, so we use the layers parameter to output only the "bottleneck" layer.


/*now we train the two autoencoder models, one for the normal radar data and one for the anomalies*/
proc cas;
    /*grab a list of input variables*/
    table.columnInfo result=r / table="ionosphere";
    inputs = r['columnInfo'][,"Column"];
    inputs = inputs[2:33];

    /*start by training the encoder for the normal radar data*/
    deepLearn.dlTrain / 
        table={name="ionosphere", where="_partInd_=0 and class=0"}
        inputs=inputs,
        modelTable="ionosphere_encoder",
        modelWeights={name="normal_encoder_weights", replace="true"}
        optimizer={miniBatchSize=50, maxEpochs=300,
                   algorithm={method="adam", learningRate=0.1, gamma=0.9,
                              learningRatePolicy="step", stepSize=30},
                   regL1=0.0001, regL2=0,
                   seed=919};

    /*next train the encoder for the anomalous radar data*/
    deepLearn.dlTrain / 
        table={name="ionosphere", where="_partInd_=0 and class=1"}
        inputs=inputs,
        modelTable="ionosphere_encoder"
        modelWeights={name="anomaly_encoder_weights", replace="true"}
        optimizer={miniBatchSize=50, maxEpochs=300,
                   algorithm={method="adam", learningRate=0.1, gamma=0.9,
                              learningRatePolicy="step", stepSize=30}, 
                   regL1=0.0001, regL2=0,
                   seed=919};

    /*score the training data using the normal encoder*/ 
    deepLearn.dlScore / 
        table={name="ionosphere", where="_partInd_=0"}
        model="ionosphere_encoder"
        initWeights="normal_encoder_weights"
        copyVars={"id","class"}
        layerOut={name="train_normal_encoding", replace="true"}
        layers={"bottleneck"};

    /*score the training data using the anomaly encoder*/
    deepLearn.dlScore / 
        table={name="ionosphere", where="_partInd_=0"}
        model="ionosphere_encoder"
        initWeights="anomaly_encoder_weights"
        copyVars={"id","class"}
        layerOut={name="train_anomaly_encoding", replace="true"}
        layers={"bottleneck"};
quit;

Now that we have the normal and anomaly encodings for the training data, we can train a classifier to identify anomalies. First, we must merge the two encoding data sets, and to do this correctly, we must rename some of the encoding variables.


/*now we train a classifier to separate the anomalies from the normal data, using the derived encodings as inputs*/
/*start by merging the two different encoding tables*/
/*first we have to rename the duplicated _LayerAct_2_0_0_N_ variables*/
%macro rename_vars(type=,data=);
data casuser.&data._&type._encoding;
    set casuser.&data._&type._encoding;
    %do i=0 %to 9;
        rename _LayerAct_2_0_0_&i._ = &type._encoding_&i.;
    %end;
run;
%mend rename_vars;

%rename_vars(type=normal,data=train);
%rename_vars(type=anomaly,data=train);

/*now we merge the two encoding tables to prepare our input data for the classifier*/
data casuser.train_combined_encodings;
    merge 
      casuser.train_normal_encoding
      casuser.train_anomaly_encoding;
    by id;
run;

We can use these encodings as inputs for our preferred classifier algorithm. In this example, we train a simple neural network using the NNET procedure, but we can use any machine learning approach for the classifier now that we have the encodings.


/*train the classifier using the prepared encodings, let's use a simple neural network*/
proc nnet data=casuser.train_combined_encodings standardize=std;
    target class / level=nominal;
    input normal_encoding_0 anomaly_encoding_0
          normal_encoding_1 anomaly_encoding_1
          normal_encoding_2 anomaly_encoding_2
          normal_encoding_3 anomaly_encoding_3
          normal_encoding_4 anomaly_encoding_4
          normal_encoding_5 anomaly_encoding_5
          normal_encoding_6 anomaly_encoding_6
          normal_encoding_7 anomaly_encoding_7
          normal_encoding_8 anomaly_encoding_8
          normal_encoding_9 anomaly_encoding_9 / level=interval;
    hidden 31;
    train outmodel=casuser.anomaly_classifier;
    score out=casuser.train_scored copyvars=(id class);
    optimization algorithm=SGD regL1=0.0009999 regL2=0
                 learningrate=0.1
                 annealingrate=0.00001778
                 seed=919 maxiter=200;
run;

To evaluate model performance on validation data, we want to score the validation sample using the trained classifier, but to do this, we must generate the normal and anomaly encodings for the validation sample.


/*in order to evaluate model performance on validation data, we must prepare the encodings again*/
proc cas;
    /*score the validation data using the normal encoder*/
    deepLearn.dlScore / 
        table={name="ionosphere", where="_partInd_=1"}
        model="ionosphere_encoder"
        initWeights="normal_encoder_weights"
        copyVars={"id","class"}
        layerOut={name="valid_normal_encoding", replace="true"}
        layers={"bottleneck"};

    /*score the validation data using the anomaly encoder*/
    deepLearn.dlScore / 
        table={name="ionosphere", where="_partInd_=1"}
        model="ionosphere_encoder"
        initWeights="anomaly_encoder_weights"
        copyVars={"id","class"}
        layerOut={name="valid_anomaly_encoding", replace="true"}
        layers={"bottleneck"};   
quit;

/*merge validation encodings*/
%rename_vars(type=normal,data=valid);
%rename_vars(type=anomaly,data=valid); 
data casuser.valid_combined_encodings;
    merge 
      casuser.valid_normal_encoding
      casuser.valid_anomaly_encoding;
    by id;
run;

/*score validation data using the classifier*/
proc nnet data=casuser.valid_combined_encodings inmodel=casuser.anomaly_classifier;
    score out=casuser.valid_scored copyvars=(id class);
run;


This example uses a very small data set, so results based on such a small validation sample are somewhat unreliable. The approach illustrated in this activity works well with larger data sets, although training the normal autoencoder (assuming we have more normal data than anomalous data) would take considerably longer. In general, this approach works well even when you don't have a lot of data representing anomalies, because we use both the normal encodings and the anomaly encodings for training the classifier.

Question: After training the three different models in the activity, you collect some new data where you expect to find anomalies. Which model or models should you use to score the new input data set?


Lesson 02

Deep Learning Using SAS® Software
Lesson 02, Section 3 Practice: Build and Train a Convolutional Neural Network to Score New Data

In this practice, you build and train a convolutional neural network to score new data. Note that your results might vary.

Note: To see the full solution code for this practice, open the SAS program DLUS02S01.sas in the course data folder in SAS Studio.

  1. If you did not run the data setup program DLUS02D01.sas (used in an earlier demonstration) in the current SAS Studio session, run it before you proceed with this practice.

  2. Open the program named DLUS02E01.sas.

    This program provides a template for you to use. The following steps request that you add layers to build out a convolutional neural network. The name of the model is MYCNN.

  3. Add a convolutional layer just after the input layer. The convolutional layer should have the following attributes:
    • 32 filters
    • Width of 3
    • Height of 3
    • Stride of 2

    Solution:

    addLayer / model='MYCNN' name='ConVLayer1' 
          layer={type='CONVO' nFilters=32  width=3 height=3 stride=2} srcLayers={'data'};
    		

  4. Add two pooling layers and connect the convolutional layer to each of the two new pooling layers. Ensure that each pooling layer has the following attributes:
    • Width of 2
    • Height of 2
    • Stride of 2


  5. Set one of the pooling layers to perform a maximum summary, and one to perform an average summary.

    Solution:

    addLayer / model='MYCNN' name='PoolLayer1max' 
          layer={type='POOL'  width=2 height=2 stride=2 pool='max'} srcLayers={'ConVLayer1'};
       addLayer / model='MYCNN' name='PoolLayer1Avg' 
          layer={type='POOL'  width=2 height=2 stride=2 pool='Average'} srcLayers={'ConVLayer1'};
    

  6. Add a convolutional layer after the pooling layers. Note: A concatenation layer should be used between the pooling layers and convolutional layer.
    Structure the convolutional layer to have the following attributes:
    • 128 filters
    • Width of 3
    • Height of 3
    • Stride of 1
    • Xavier weight initialization

    Solution:

    /* Add a concatenation layer */
       addLayer/model='MYCNN' name='concatlayer1' 
          layer={type='concat'} srcLayers={'PoolLayer1max','PoolLayer1Avg'};
    
       /* Add a convolutional layer */
       addLayer / model='MYCNN' name='ConVLayer2' 
          layer={type='CONVO' nFilters=128  width=3 height=3 stride=1 init='xavier'} srcLayers={'concatlayer1'};
    

  7. Add a fully connected layer and connect the convolutional layer to the fully connected layer. Connect the fully connected layer to the output layer provided in the program template. The fully connected layer should have the following attributes:
    • 20 neurons
    • Use a Xavier weight initialization
    • Batch normalization
    • Exponential linear activation transformation

    Note: You need to use two addLayer actions to complete this task.

    Solution:

    addLayer / model='MYCNN' name='FCLayer1' 
          layer={type='FULLCONNECT' n=20 act='identity' init='xavier' includeBias=False /*dropout=.5*/} 
          srcLayers={'ConVLayer2'} /*srcLayers={'BatchLayer_1'}*/;
       addLayer / model='MYCNN' name='BatchLayer' layer={type='BATCHNORM' act='ELU'} srcLayers={'FCLayer1'};

  8. Run the program and view the results.

    Note: On rare occasions, an ODS warning like the following might occur when the entire program is run: Output 'OptIterHistory' was not created. Make sure that the output object name, label, or path is spelled correctly. If you experience this error, run each PROC CAS statement independently (sequentially).

  9. Answer the following questions:
    • How many model parameters does your model have?
    • Which epoch does your model perform best on the validation data with respect to validation misclassification rate (validation error)?
    • What is your model's best performance on the validation misclassification rate (validation error)?
    • What is your model's best performance on the training misclassification rate (fit error)?

    It appears that the model is overfitting. Perhaps adding or increasing regularizations can improve validation performance.

    Solution:

    • How many model parameters does your model have? 238,842
    • Which epoch does your model perform best on the validation data with respect to validation misclassification rate (validation error)? 9
    • What is your model's best performance on the validation misclassification rate (validation error)? 47.7% (or could be 47.35%)
    • What is your model's best performance on the training misclassification rate (fit error)? 0%

  10. Modify the model by adding a dropout rate of 50% to the fully connected layer.

    Solution:

    addLayer / model='MYCNN' name='FCLayer1'
          layer={type='FULLCONNECT' n=20 act='identity' init='xavier'
                 includeBias=False dropout=.5}
          srcLayers={'ConVLayer2'};
    addLayer / model='MYCNN' name='BatchLayer'
          layer={type='BATCHNORM' act='ELU'} srcLayers={'FCLayer1'};
    

  11. Rerun the model.

  12. Answer the following questions:
    • What is your model's best performance on the validation misclassification rate (validation error)?
    • What is your model's best performance on the training misclassification rate (fit error)?

    There still seems to be a large divergence between training and validation data. Let's continue to add regularizations to improve validation performance.

    Solution:

    • What is your model's best performance on the validation misclassification rate (validation error)? 46.55%
    • What is your model's best performance on the training misclassification rate (fit error)? 5.6%

  13. Apply batch normalization to the convolution layer containing 128 filters. (Remember to set the activation function to identity and remove the bias from the convolution layer.)

    Solution:

    addLayer / model='MYCNN' name='ConVLayer2' layer={type='CONVO' 
               nFilters=128  width=3 height=3 stride=1 act='identity'
               init='xavier' includeBias=False}
               srcLayers={'concatlayer1'};
     
    addLayer / model='MYCNN' name='BatchLayer_1'
               layer={type='BATCHNORM' act='ELU'}
               srcLayers={'ConVLayer2'};		
    
    addLayer / model='MYCNN' name='FCLayer1'
               layer={type='FULLCONNECT' n=20 act='identity'
               init='xavier' includeBias=False dropout=.5} 
               srcLayers={'BatchLayer_1'};	
    

  14. Rerun the model.

  15. Answer the following questions:
    • What is your model's best performance on the validation misclassification rate (validation error)?
    • What is your model's best performance on the training misclassification rate (fit error)?

    The model's performance on the validation data is improving! Let's continue to add regularizations to improve validation performance.

    Solution:

    • What is your model's best performance on the validation misclassification rate (validation error)? 43.8%
    • What is your model's best performance on the training misclassification rate (fit error)? 15.9%

  16. Add a dropout rate of 10% to the convolution layer containing 128 filters.

    Solution:

    addLayer / model='MYCNN' name='ConVLayer2' layer={type='CONVO'
               nFilters=128  width=3 height=3 stride=1 act='identity'
               init='xavier' includeBias=False dropout=.1}
               srcLayers={'concatlayer1'};

  17. Rerun the model.

  18. Answer the following questions:
    • What is your model's best performance on the validation misclassification rate (validation error)?
    • What is your model's best performance on the training misclassification rate (fit error)?

    Solution:

    • What is your model's best performance on the validation misclassification rate (validation error)? 43.3%
    • What is your model's best performance on the training misclassification rate (fit error)? 21.85%



Lesson 03

Deep Learning Using SAS® Software
Lesson 03, Section 3 Practice: Predicting Movie Profitability

The data set MOVIE_CLEAN contains descriptions of popular movies. Each movie's overview is paired with a profit indicator that shows whether the movie made more money than its budget. The goal of this analysis is to build a model that predicts whether a movie is profitable based solely on its description.

The data set MOVIE_CLEAN was cleaned from its original version. For the text in each overview, stop words and non-letters were removed, words were stemmed, and all tokens were changed to lowercase. The variables in this data set are listed below:

MOVIE_CLEAN: Variables

Name      Model Role  Measurement Level  Description
PROFIT    Target      Binary             1 = movie made more money than the budget; 0 = otherwise
TITLE     Nominal     Text               Movie title
OVERVIEW  Input       Text               Description of the movie

The data set MOVIE_EMBED contains the Global Vectors for Word Representation (GloVe) for each term in the MOVIE_CLEAN data set. The GloVe embeddings were created from word-word co-occurrence statistics of the MOVIE_CLEAN corpus using an unsupervised learning algorithm. The 100-dimensional vectors capture linear substructure of the word vector space. The variables in MOVIE_EMBED are listed below:

MOVIE_EMBED: Variables

Name        Model Role  Measurement Level  Description
VOCAB_TERM  Input       Nominal            Individual terms from the cleaned corpus
X1 - X100   Input       Interval           Word representations in 100 dimensions

Note: To see the full solution code for this practice, open the SAS program DLUS03S01.sas in the course data folder in SAS Studio.

  1. Print the first few observations of MOVIE_CLEAN and MOVIE_EMBED to view the data sets.

    Solution:

    proc print data=mycas.movie_clean (obs=5);
    run;
    
    proc print data=mycas.movie_embed (obs=5);
    run;

  2. Use the FREQ procedure to view the number of movies that earned a profit.

    Solution:

    proc freq data=mycas.movie_clean;
    	tables profit;
    run;
    823 movies earned a profit.

  3. Find and print the titles of movies whose overview mentions Denzel Washington.

    Solution:

    data denzel (drop=newvar);
    	set mycas.movie_clean;
    	newvar = find(overview,'denzel','i');
    	if newvar > 0;
    run;
    
    proc print data=denzel;
    	var title;
    run;
    Three movies have overviews that mention Denzel Washington.

  4. Partition the data into 70% training, 15% validation, and 15% for testing by adding a partition indicator to the CAS table.

    Solution:

    proc partition data=mycas.movie_clean
    	samppct=70 samppct2=15 seed=802 partind;
    	output out=mycas.movie_clean;
    run;

  5. Use the shuffle action from the table action set to randomize the observations and avoid a potential ordering bias in the deep learning model.

    Solution:

    proc cas;
    	table.shuffle / 
    	table = 'movie_clean'
    	casout = {name='movie_clean', replace=True};
    quit;

  6. Use the deepLearn action set to build a gated recurrent unit neural network with one input layer, two GRU hidden layers, and an output layer.
    1. Use the buildModel action to initialize the RNN and then add an input layer.
    2. Connect the input layer to a GRU hidden layer with 15 neurons, set the activation function to auto, set initialization to Xavier, and set the output type to same length.
    3. Connect this hidden layer to another GRU hidden layer with the same arguments, except set the output type to encoding.
    4. Connect the second hidden layer to the output layer and set the error function to auto.
    5. To make sure the model structure is correct, specify the modelInfo action and view the model information.

    Solution:

    proc cas;
    	loadactionset "deeplearn";
    quit;
    
    proc cas;
    	deepLearn.buildModel /
        model = {name='gru', replace=True}
        type = 'RNN';
    
    	deepLearn.addLayer /
        model = 'gru'
        layer = {type='input'}
        replace=True
        name = 'data';
    
    	deepLearn.addLayer /
        model = 'gru'
        layer = {type='recurrent', n=15, act='auto', init='xavier', 
                 rnnType='gru', outputType='samelength'}
        srcLayers = 'data'
        replace=True
        name = 'rnn1';
    
    	deepLearn.addLayer /
        model = 'gru'
        layer = {type='recurrent', n=15, act='auto', init='xavier', 
                 rnnType='gru', outputType='encoding'}
        srcLayers = 'rnn1'
        replace=True
        name = 'rnn2';
    
    	deepLearn.addLayer /
        model = 'gru'
        layer = {type='output', act='auto', init='xavier', 
                 error='auto'}
        srcLayers = 'rnn2'
        replace=True
        name = 'output';
    
    	deepLearn.modelInfo /
        model='gru';
    quit;

  7. Use the dlTrain action to train the GRU model using the profit variable as the target and the overview variable as the input. Train the model using the Adam optimization algorithm and a learning rate of 0.05. Use mini-batch sizes of 50 and train for 30 epochs. Be sure to save the weights so that you can score the test data after the model is built.

    Solution:

    proc cas;
    	deepLearn.dlTrain /
        table    = {name = 'movie_clean', where = '_PartInd_ = 1'}
        validTable = {name = 'movie_clean', where = '_PartInd_ = 2'}
        target = 'profit'
        inputs = 'overview'
        texts = 'overview'
        textParms = {initInputEmbeddings={name='movie_embed'}}
        nominals = 'profit'
        seed = '649'
        modelTable = 'gru'
        modelWeights = {name='gru_trained_weights', replace=True}
        optimizer = {miniBatchSize=50, maxEpochs=30, 
               algorithm={method='adam', beta1=0.9, beta2=0.999,
               learningRate=0.05, clipGradMax=100, clipGradMin=-100}};
    quit;

  8. Score the test data and view the misclassification error.

    Solution:

    proc cas;
        deepLearn.dlScore / 
        table    = {name = 'movie_clean', where = '_PartInd_ = 0'}
        model = 'gru'
        initWeights = 'gru_trained_weights'
        copyVars = 'profit'
        textParms = {initInputEmbeddings={name='movie_embed'}}
        casout = {name='gru_scored', replace=True};
    quit;
    The misclassification error is approximately 46.4%.

  9. In the optimization history of the dlTrain action, notice that the model overfit on the training data, resulting in a comparatively large validation error. Regularize the previous GRU model by building the model again but include a dropout of 0.40 in each GRU hidden layer. Train the new model with the same arguments for the dlTrain action and view the changes in the optimization history.

    Solution:

    proc cas;
    	deepLearn.buildModel /
        model = {name='gru', replace=True}
        type = 'RNN';
    
    	deepLearn.addLayer /
        model = 'gru'
        layer = {type='input'}
        replace = True
        name = 'data';
    
    	deepLearn.addLayer /
        model = 'gru'
        layer = {type='recurrent', n=15, act='auto', init='xavier', 
                 rnnType='gru', outputType='samelength', dropout=.40}
        srcLayers = 'data'
        replace=True
        name = 'rnn1';
    
    	deepLearn.addLayer /
        model = 'gru'
        layer = {type='recurrent', n=15, act='auto', init='xavier', 
                 rnnType='gru', outputType='encoding', dropout=.40}
        srcLayers = 'rnn1'
        replace = True
        name = 'rnn2';
    
    	deepLearn.addLayer /
        model = 'gru'
        layer = {type='output', act='auto', init='xavier', 
                 error='auto'}
        srcLayers = 'rnn2'
        replace = True
        name = 'output';
    
    	deepLearn.modelInfo /
        model='gru';
    quit;
    
    proc cas;
    	deepLearn.dlTrain /
        table    = {name = 'movie_clean', where = '_PartInd_ = 1'}
        validTable = {name = 'movie_clean', where = '_PartInd_ = 2'}
        target = 'profit'
        inputs = 'overview'
        texts = 'overview'
        textParms = {initInputEmbeddings={name='movie_embed'}}
        nominals = 'profit'
        seed = '649'
        modelTable = 'gru'
        modelWeights = {name='gru_trained_weights', replace=True}
        optimizer = {miniBatchSize=50, maxEpochs=30, 
              algorithm={method='adam', beta1=0.9, beta2=0.999,
              learningRate=0.05, clipGradMax=100, clipGradMin=-100}};
    quit;

  10. Score the test data using the GRU model with regularization.

    Solution:

    proc cas;
    	deepLearn.dlScore / 
        table    = {name = 'movie_clean', where = '_PartInd_ = 0'}
        model = 'gru'
        initWeights = 'gru_trained_weights'
        copyVars = 'profit'
        textParms = {initInputEmbeddings={name='movie_embed'}}
        casout = {name='gru_scored', replace=True};
    quit;
    The misclassification error is approximately 43.0.

Lesson 04

Deep Learning Using SAS® Software
Lesson 04, Section 1 Practice: Tuning the Movie Profitability Model

In this practice, you tune the movie profitability model that you created in the previous practice. You use the data sets MOVIE_CLEAN and MOVIE_EMBED again.

Note: To see the full solution code for this practice, open the SAS program DLUS04S01.sas in the course data folder in SAS Studio.

  1. If you did not complete the previous practice in the current SAS Studio session, run the SAS program DLUS03S01.sas to load the data sets before you proceed with this practice.

  2. Use the dlTune action to tune the movie profitability model from the previous practice. Tune the gamma, learning rate, and dropout hyperparameters. Use bounds of (0.3, 0.7), (0.0001, 0.01), and (0.1, 0.9), respectively. In the optimizer argument, set maxEpochs to 10, numTrials to 25, tuneIter to 10, and tuneRetention to 0.5. Or, if you prefer, tune the model as you see fit. Try to find a set of hyperparameters that results in a more predictive model on new data.
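dlTune evaluates many candidate hyperparameter settings over successive tuning iterations, keeping the best-performing fraction of trials (tuneRetention) at each iteration. The plain-Python sketch below illustrates this style of search; the uniform sampling scheme and the quadratic stand-in for validation error are invented for illustration and are not SAS's actual tuning algorithm:

```python
import random

rng = random.Random(649)  # seed mirrors the practice

# Hyperparameter bounds from the practice instructions.
bounds = {'gamma': (0.3, 0.7),
          'learningRate': (0.0001, 0.01),
          'dropout': (0.1, 0.9)}

def sample_trial(rng, bounds):
    """Draw one candidate setting uniformly within each bound."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}

def mock_valid_error(params):
    """Invented stand-in objective: pretend the optimum sits at the
    midpoint of each range (purely illustrative)."""
    return sum((params[k] - (lo + hi) / 2) ** 2
               for k, (lo, hi) in bounds.items())

num_trials, tune_iter, retention = 25, 10, 0.5
trials = [sample_trial(rng, bounds) for _ in range(num_trials)]
for _ in range(tune_iter):
    trials.sort(key=mock_valid_error)
    survivors = trials[:max(1, int(num_trials * retention))]
    # Refill the pool with fresh random candidates each iteration.
    trials = survivors + [sample_trial(rng, bounds)
                          for _ in range(num_trials - len(survivors))]

best = min(trials, key=mock_valid_error)  # analogous to the Best Parameters row
```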

    Solution:

    proc cas;
       deepLearn.dlTune /
       modelTable = 'gru'
       modelWeights = {name='gru_trained_weights', replace=True}
       table    = {name = 'movie_clean', where = '_PartInd_ = 1'}
       validTable = {name = 'movie_clean', where = '_PartInd_ = 2'}
       target = 'profit'
       inputs = 'overview'
       texts = 'overview'
       textParms = {initInputEmbeddings={name='movie_embed'}}
       nominals = 'profit'
       seed = '649'
       optimizer = {miniBatchSize=50, maxEpochs=10, numTrials=25, 
                    tuneIter=10, tuneRetention=0.5, 
          algorithm={method='adam', beta1=0.9, beta2=0.999, 
                     clipGradMax=100, clipGradMin=-100,
                     gamma={lowerBound=0.3, upperBound=0.7},
                     learningRate={lowerBound=0.0001, upperBound=0.01}},
          dropout={lowerBound=0.1, upperBound=0.9}};
    quit;
    In the results, the first row of the Best Parameters table shows the hyperparameters that produce the best validation error.

  3. Use the tuned model to score the test data with the dlScore action. Does the tuned model outperform the previous model?

    Solution:

    proc cas;
       deepLearn.dlScore / 
       table    = {name = 'movie_clean', where = '_PartInd_ = 0'}
       model = 'gru'
       initWeights = 'gru_trained_weights'
       textParms = {initInputEmbeddings={name='movie_embed'}}
       copyVars = 'profit'
       casout = {name='gru_scored', replace=True};
    quit;
    The tuned model outperforms the previous model.

Lesson 05

Deep Learning Using SAS® Software
Lesson 05, Section 1 Practice: Constructing a Sparse Denoising Convolutional Autoencoder

In this practice, you construct and train a sparse denoising convolutional autoencoder. To do this, you use two new layers that have not been explicitly introduced in this course: the TRANSPOSE CONVOLUTION layer and the SEGMENTATION layer.

The TRANSPOSE CONVOLUTION layer is used to upsample information, effectively reversing the downsampling that occurs when the stride is greater than 1. The formula for the output feature map size is as follows:

o = (i - 1)s - 2p + f + op

where

  • o is the output feature map size
  • i is the input feature map size
  • s equals the stride value
  • p is the padding value
  • f is the size of the filter, and
  • op equals the output padding value.

For example, transforming an input feature map of size 8x8, cross-correlated with a 5x5 filter, into a 16x16 feature map requires the following:

stride (s) = 2, padding (p) = 2, output padding (op) = 1

A program accomplishing the above would resemble the following:

AddLayer / model='My_model' name='TConvo' layer={type='TRANSCONVO'
           nFilters=4  width=5 height=5 stride=2 padding=2
           outputpadding=1} srcLayers={'previouslayer'};
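The upsampling formula is easy to check in a few lines of Python (an illustrative helper, not part of the SAS program):

```python
def transconvo_out(i, f, s, p, op):
    """Transpose convolution output size: o = (i - 1)*s - 2*p + f + op."""
    return (i - 1) * s - 2 * p + f + op

# The example from the text: an 8x8 input with a 5x5 filter, stride 2,
# padding 2, and output padding 1 upsamples to 16x16.
print(transconvo_out(8, 5, 2, 2, 1))   # -> 16
```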

The SEGMENTATION layer computes the associated loss error for either classification or regression using each tensor element from the input layer. The width, height, and depth of the segmentation layer are identical to those of the model's input layer. For example, if the input layer contains 32x32 color images, then the source layer feeding into the segmentation layer must contain a 32x32x3 tensor.
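The per-element loss can be sketched in plain Python (illustrative only, not the SAS implementation): for a regression target, one squared-error term is accumulated for every element of the tensor, so the loss is defined over a tensor with the input layer's width, height, and depth. The 0.5/0.0 values below are made up.

```python
# Shapes chosen to match the text's example: 32x32 color images.
width, height, depth = 32, 32, 3
pred   = [[[0.5] * depth for _ in range(width)] for _ in range(height)]
target = [[[0.0] * depth for _ in range(width)] for _ in range(height)]

# One squared-error term per tensor element, then averaged over all
# width * height * depth elements.
sse = sum((pred[r][c][d] - target[r][c][d]) ** 2
          for r in range(height) for c in range(width) for d in range(depth))
mse = sse / (width * height * depth)
print(mse)   # -> 0.25
```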

Note: To see the full solution code for this practice, open the SAS program DLUS05S01a.sas in the course data folder in SAS Studio.

  1. Open the program named DLUS05E01a.sas.

  2. Construct a deep learning model shell called My_Sparse_DA and specify the type as CNN.

    Solution:

    buildModel / modelTable={name='My_Sparse_DA', replace=1} type = 'CNN';

  3. Add an input layer with the following attributes to the model My_Sparse_DA:
    1. Name the layer data.
    2. The number of channels should equal 1.
    3. The width should equal 32.
    4. The height should equal 32.
    5. Apply a dropout rate of 40%.
    6. Add an offset of 92.7.

    Solution:

    addLayer / model='My_Sparse_DA' name='data' layer={type='input' 
    nchannels=1 width=32 height=32 dropout=.4 offsets={92.739742}};

  4. Add a convolution layer with the following attributes to the model My_Sparse_DA:
    1. Name the layer ConVLayer1.
    2. This layer should contain sixteen (16) filters.
    3. The filters used in this layer should be 5x5.
    4. Apply a stride value of 2.
    5. The activation function should be exponential linear unit (ELU).
    6. The weight initialization method should be MSRA2.
    7. The source layer should be the input layer, titled data.

    Solution:

    addLayer / model='My_Sparse_DA' name='ConVLayer1' 
    layer={type='CONVO' nFilters=16  width=5 height=5 stride=2 
    act='ELU' init='MSRA2'} srcLayers={'data'};

  5. Add a second convolution layer with the following attributes to the model My_Sparse_DA:
    1. Name the layer ConVLayer2.
    2. This layer should contain eight (8) filters.
    3. The filters used in this layer should be 5x5.
    4. Apply a stride value of 2.
    5. The activation function should be Identity.
    6. The hidden bias should be removed from this layer.
    7. The weight initialization method should be MSRA2.
    8. The source layer to this convolution layer should be the first convolution layer, titled ConVLayer1.
    9. Apply batch normalization to the output of ConVLayer2 and use the exponential linear unit (ELU) as the activation function in the batch normalization layer. Name the batch normalization layer BNConvo2.

    Solution:

    addLayer / model='My_Sparse_DA' name='ConVLayer2' 
    layer={type='CONVO' nFilters=8  width=5 height=5 stride=2 
    act='Identity' init='MSRA2' includeBias=FALSE} 
    srcLayers={'ConVLayer1'};
    addLayer / model='My_Sparse_DA' name='BNConvo2' 
    layer={type='BATCHNORM' act='ELU'} srcLayers={'ConVLayer2'};

  6. Add a third convolution layer with the following attributes to the model My_Sparse_DA:
    1. Name the layer ConVLayermiddle.
    2. This layer should contain one (1) filter.
    3. The filter used in this layer should be a 5x5.
    4. Apply a stride value of 1.
    5. The activation function should be Identity.
    6. The hidden bias should be removed from this layer.
    7. The weight initialization method should be MSRA2.
    8. The source layer to this convolution layer should be the output of the second convolution layer's batch normalization, titled BNConvo2.
    9. Apply batch normalization to the output of ConVLayermiddle and use the exponential linear unit (ELU) as the activation function in the batch normalization layer. Name the batch normalization layer BNConvoMiddle.

    Solution:

    addLayer / model='My_Sparse_DA' name='ConVLayermiddle' 
    layer={type='CONVO' nFilters=1  width=5 height=5 stride=1 
    act='Identity' init='MSRA2' includeBias=FALSE} 
    srcLayers={'BNConvo2'};
    addLayer / model='My_Sparse_DA' name='BNConvoMiddle' 
    layer={type='BATCHNORM' act='ELU'} srcLayers={'ConVLayermiddle'};
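Assuming the default symmetric padding of floor(f/2) for these odd filter sizes (an assumption; check the padding= option if in doubt), the standard convolution output size o = floor((i + 2p - f)/s) + 1 lets you trace the encoder's spatial dimensions, which helps with the questions in the following steps. A quick Python check:

```python
def convo_out(i, f, s, p=None):
    """Standard convolution output size: o = (i + 2p - f) // s + 1.
    Padding defaults to f // 2, our assumed 'same'-style default."""
    if p is None:
        p = f // 2
    return (i + 2 * p - f) // s + 1

size = 32                      # 32x32 input images
size = convo_out(size, 5, 2)   # ConVLayer1, stride 2 -> 16
size = convo_out(size, 5, 2)   # ConVLayer2, stride 2 -> 8
size = convo_out(size, 5, 1)   # ConVLayermiddle, stride 1 -> 8
print(size)   # -> 8
```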

  7. Next, add a transpose convolution layer with the following attributes to the model My_Sparse_DA:
    1. Name the layer TConvo4.
    2. Set the type value to TRANSCONVO.
    3. This layer should contain eight (8) filters.
    4. The filter used in this layer should be a 5x5.
    5. Apply a stride value of 2.
    6. Apply a padding value of 2.
    7. Apply an output padding value of 1.
    8. The activation function should be Identity.
    9. The hidden bias should be removed from this layer.
    10. The weight initialization method should be MSRA2.
    11. The source layer to this transpose convolution layer should be the output of the third convolution layer's batch normalization, titled BNConvoMiddle.
    12. Apply batch normalization to the output of TConvo4 and use the exponential linear unit (ELU) as the activation function in the batch normalization layer. Name the batch normalization layer BNConvo4.

    What is the size of the feature maps created by this transpose convolution layer?

    Solution:

    addLayer / model='My_Sparse_DA' name='TConvo4' 
    layer={type='TRANSCONVO' nFilters=8  width=5 height=5 stride=2 
    padding=2 outputpadding=1 act='Identity' includeBias=FALSE 
    init='MSRA2'} srcLayers={'BNConvoMiddle'};
    addLayer / model='My_Sparse_DA' name='BNConvo4' 
    layer={type='BATCHNORM' act='ELU'} srcLayers={'TConvo4'};

    What is the size of the feature maps created by this transpose convolution layer? 16x16

  8. Add another transpose convolution layer with the following attributes to the model My_Sparse_DA:
    1. Name the layer TConvo5.
    2. Set the type value to TRANSCONVO.
    3. This layer should contain sixteen (16) filters.
    4. The filter used in this layer should be a 5x5.
    5. Apply a stride value of 2.
    6. Apply a padding value of 2.
    7. Apply an output padding value of 1.
    8. The activation function should be Identity.
    9. The hidden bias should be removed from this layer.
    10. The weight initialization method should be MSRA2.
    11. The source layer to this transpose convolution layer should be the output of the previous transpose convolution layer's batch normalization, titled BNConvo4.
    12. Apply batch normalization to the output of TConvo5 and use the exponential linear unit (ELU) as the activation function in the batch normalization layer. Name the batch normalization layer BNConvo5.

    What is the size of the feature maps created by this transpose convolution layer?

    Solution:

    addLayer / model='My_Sparse_DA' name='TConvo5' 
    layer={type='TRANSCONVO' nFilters=16  width=5 height=5 stride=2 
    padding=2 outputpadding=1 act='Identity' includeBias=FALSE 
    init='MSRA2'} srcLayers={'BNConvo4'};
    addLayer / model='My_Sparse_DA' name='BNConvo5' 
    layer={type='BATCHNORM' act='ELU'} srcLayers={'TConvo5'};

    What is the size of the feature maps created by this transpose convolution layer? 32x32

  9. Add a final convolution layer. This final convolution layer is used to modify the depth of the information to match that of the input layer. Ensure that this convolution layer has the following attributes and is added to the model My_Sparse_DA:
    1. Name the layer Convo6.
    2. This layer should contain one (1) filter.
    3. The filter used in this layer should be a 3x3.
    4. Apply a stride value of 1.
    5. The activation function should be Identity.
    6. The hidden bias should be removed from this layer.
    7. The weight initialization method should be MSRA2.
    8. The source layer to this convolution layer should be the output of the previous transpose convolution layer's batch normalization, titled BNConvo5.
    9. Apply batch normalization to the output of Convo6 and use the exponential linear unit (ELU) as the activation function in the batch normalization layer. Name the batch normalization layer BNConvo6.

    What is the width, height, and depth of this final convolutional layer's output?

    Solution:

    addLayer / model='My_Sparse_DA' name='Convo6' layer={type='CONVO' 
    nFilters=1  width=3 height=3 stride=1 act='Identity' 
    includeBias=FALSE init='MSRA2'} srcLayers={'BNConvo5'};
    addLayer / model='My_Sparse_DA' name='BNConvo6' 
    layer={type='BATCHNORM' act='ELU'} srcLayers={'Convo6'};

    What is the width, height, and depth of this final convolutional layer's output? 32 x 32 x 1

  10. Add a segmentation layer with the following attributes to the model My_Sparse_DA:
    1. Set the name of the layer to seglayer.
    2. Set the type of the layer to SEGMENTATION.
    3. Set the activation function to Identity.
    4. Set the number of channels (nChannels=) to 1.
    5. Set the source layer as the last batch normalization layer, titled BNConvo6.

    Solution:

    addLayer / model='My_Sparse_DA' name='seglayer' 
    layer={type='segmentation' act='Identity' nChannels=1} 
    srcLayers={'BNConvo6'};	

  11. Now you can train the sparse denoising convolutional autoencoder.
    Add dataSpecs to dlTrain with the following attributes:
    1. Define the input layer information with the following:
      1. Data = {'_image_'}
      2. Layer=data
      3. Type=Image
    2. Define the output layer information with the following:
      1. Datalayer=data
      2. Layer=seglayer
      3. Type=Image

    Solution:

    dataSpecs={
        {data={'_image_'},
         layer='data',
         type='Image'},
        {dataLayer='data',
         layer='seglayer',
         type='Image'}}
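Note what these dataSpecs say: the same image serves as both the network input and the scoring target, and the 40% dropout applied in the input layer is what corrupts the input, making this a denoising autoencoder. A conceptual plain-Python sketch (the vector, seed, and rate handling are illustrative only):

```python
import random

rng = random.Random(1)                   # arbitrary seed for the sketch
clean = [0.9, 0.2, 0.7, 0.4, 0.6]        # a made-up "image" as a flat vector

# The 40% input-layer dropout corrupts the input during training...
noisy_input = [x if rng.random() < 0.6 else 0.0 for x in clean]
# ...while the segmentation-layer target remains the clean image.
target = clean
# Training then pushes the network to map noisy_input back to target.
```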

  12. Add an L1 regularization value of .00001.

    Solution:

    optimizer={minibatchsize=80, 
      algorithm={method='ADAM', beta1=.9, beta2=.999, learningRate=.02},
      regL1=0.00001,
      maxEpochs=60}

  13. Add an L2 regularization value of .00002.

    Solution:

    optimizer={minibatchsize=80, 
      algorithm={method='ADAM', beta1=.9, beta2=.999, learningRate=.02},
      regL1=0.00001,
      regL2=0.00002,
      maxEpochs=60}
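For reference, regL1 adds a penalty proportional to the sum of absolute weight values (driving weights toward exact zeros, hence the "sparse" in sparse autoencoder), while regL2 adds a penalty proportional to the sum of squared weights, shrinking them smoothly. The arithmetic, shown in illustrative Python with made-up weights:

```python
reg_l1, reg_l2 = 0.00001, 0.00002        # the values from steps 12 and 13
weights = [0.8, -0.3, 0.0, 1.2, -0.5]    # made-up weights for illustration

l1_penalty = reg_l1 * sum(abs(w) for w in weights)   # encourages sparsity
l2_penalty = reg_l2 * sum(w * w for w in weights)    # shrinks weights smoothly
# total training loss = reconstruction error + l1_penalty + l2_penalty
```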

  14. Run the program and view the results. What are the values of the training error rate and the validation error rate?

    Solution:

    The Optimization History table shows that the training error rate is approximately 54.5 and the validation error rate is approximately 58.1.

  15. Open the program named DLUS05E01b.sas and examine it.

    This program uses the trained sparse denoising autoencoder to score (run inference on) the grayscale version of the CIFAR-10 data. The program saves the feature maps of the middle encoding layer and then prints 60 of the images for viewing.

  16. Run the program and view the images in the results.