Lesson 01

Deep Learning Using SAS® Software
Lesson 01, Section 2 Practice: Modify the Architecture and Training of a Deep Learning Neural Network

In this practice, you modify the architecture and training of a deep learning neural network.

Note: To see the full solution code for this practice, open the SAS program DLUS01S01.sas in the course data folder in SAS Studio.

  1. If you did not run the data setup program DLUS01D01.sas (used in an earlier demonstration) in the current SAS Studio session, run it before you proceed with this practice.

  2. Open the program named DLUS01E01.sas.

    This program provides a model template that is similar to the second deep learning model trained in the previous demonstration. The following steps ask you to increase the number of hidden layers by two: add one layer immediately after the input layer and one layer immediately before the output layer, as shown in the diagram below.
    [Diagram: a deep learning model with an input layer, nine hidden layers (labels indicate adding the first and ninth), and a single output layer]

  3. Replace the hyperbolic tangent (TANH) activation function in each hidden layer with the exponential linear unit (ELU), as in the sketch below.
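
    The following is a minimal sketch of the change, assuming a hidden-layer definition like the ones in the template (the layer name, neuron count, and initialization shown here are illustrative):

    /* Before: hyperbolic tangent activation */
    AddLayer / model='BatchDLNN' name='HLayer2'
      layer={type='FULLCONNECT' n=30 act='TANH' init='xavier'}
      srcLayers={'HLayer1'};

    /* After: only the ACT= value changes */
    AddLayer / model='BatchDLNN' name='HLayer2'
      layer={type='FULLCONNECT' n=30 act='ELU' init='xavier'}
      srcLayers={'HLayer1'};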

  4. Add another hidden layer immediately after the input layer. Connect this hidden layer to both the input layer and the next hidden layer. Construct this hidden layer such that the layer contains the following characteristics:
    • has 40 hidden neurons
    • uses the exponential linear unit (ELU) activation function
    • includes a dropout rate of 0.05 (5%)
    • uses Xavier initialization for the hidden weights
    • normalizes the output of the layer using batch normalization. That is, apply batch normalization before the nonlinear (ELU) transformation is applied to the layer's output.

    Solution:

    /* FIRST HIDDEN LAYER */
    AddLayer / model='BatchDLNN' name='HLayer1'
      layer={type='FULLCONNECT' n=40 act='ELU' init='xavier'
      dropout=.05} srcLayers={'data'};

    /* SECOND HIDDEN LAYER */
    AddLayer / model='BatchDLNN' name='HLayer2'
      layer={type='FULLCONNECT' n=30 act='identity' init='xavier'
      includeBias=False} srcLayers={'HLayer1'};
    AddLayer / model='BatchDLNN' name='BatchLayer2'
      layer={type='BATCHNORM' act='ELU'} srcLayers={'HLayer2'};

  5. Add another hidden layer just before the output layer. Connect this hidden layer to both the previous hidden layer and the output layer. Construct this hidden layer such that the layer contains the following characteristics:
    • has 40 hidden neurons
    • uses the exponential linear unit (ELU) activation function
    • includes a dropout rate of 0.05 (5%)
    • uses Xavier initialization for the hidden weights
    • normalizes the output of the layer using batch normalization. That is, apply batch normalization before the nonlinear (ELU) transformation is applied to the layer's output.

    Solution:

    /* NINTH HIDDEN LAYER */
    AddLayer / model='BatchDLNN' name='HLayer9'
      layer={type='FULLCONNECT' n=40 act='identity' init='xavier'
      includeBias=False dropout=.05} srcLayers={'BatchLayer8'};
    AddLayer / model='BatchDLNN' name='BatchLayer9'
      layer={type='BATCHNORM' act='ELU'} srcLayers={'HLayer9'};
    

  6. The training optimization algorithm uses a STEP learning rate policy. The step size is set to 10 by default; change it to 15. A larger step size holds the learning rate higher for more epochs, which makes the traversal of the error space more aggressive. See the sketch below.
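
    A sketch of the relevant part of the dlTrain optimizer specification; only STEPSIZE= changes. The learning rate and gamma values shown here are placeholders, not necessarily the template's actual settings:

    /* Inside the dlTrain optimizer= specification (illustrative values) */
    optimizer={miniBatchSize=60,
               algorithm={method='MOMENTUM', learningRate=0.01,
                          learningRatePolicy='STEP',
                          stepSize=15,   /* changed from the default of 10 */
                          gamma=0.5},
               maxEpochs=60};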

  7. Run the program.

  8. What is the total number of model parameters?

    Solution:

    There are 7,032 parameters in the model.

  9. What is the approximate validation misclassification rate at epoch 47?

    Solution:

    The misclassification rate at epoch 47 is 25.7%.

Lesson 02

Deep Learning Using SAS® Software
Lesson 02, Section 3 Practice: Build and Train a Convolutional Neural Network to Score New Data

In this practice, you build and train a convolutional neural network to score new data. Note that your results might vary.

Note: To see the full solution code for this practice, open the SAS program DLUS02S01.sas in the course data folder in SAS Studio.

  1. If you did not run the data setup program DLUS02D01.sas (used in an earlier demonstration) in the current SAS Studio session, run it before you proceed with this practice.

  2. Open the program named DLUS02E01.sas.

    This program provides a template for you to use. The following steps ask you to add layers to build out a convolutional neural network. The name of the model is MYCNN.

  3. Add a convolutional layer just after the input layer, as in the sketch following this list. The convolutional layer should have the following attributes:
    • 32 filters
    • Width of 3
    • Height of 3
    • Stride of 2
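
    A possible sketch, assuming the template names its input layer 'data'; the convolution layer name follows the ConVLayer naming that appears in the solution code but is itself an assumption:

    /* First convolution layer: 32 3x3 filters, stride of 2 */
    AddLayer / model='MYCNN' name='ConVLayer1'
       layer={type='CONVO' nFilters=32 width=3 height=3 stride=2}
       srcLayers={'data'};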


  4. Add two pooling layers and connect the convolutional layer to each of the two new pooling layers. Ensure that each pooling layer has the following attributes:
    • Width of 2
    • Height of 2
    • Stride of 1


  5. Set one of the pooling layers to perform a maximum summary and the other to perform an average summary, as in the sketch below.
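
    A hedged sketch of steps 4 and 5 together; the pooling layer names are assumptions, and the summary type is set with the POOL= option ('max' for the maximum summary, 'mean' for the average summary):

    /* Two pooling layers over the same convolution output */
    AddLayer / model='MYCNN' name='PoolLayer1'
       layer={type='POOL' width=2 height=2 stride=1 pool='max'}
       srcLayers={'ConVLayer1'};
    AddLayer / model='MYCNN' name='PoolLayer2'
       layer={type='POOL' width=2 height=2 stride=1 pool='mean'}
       srcLayers={'ConVLayer1'};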

  6. Add a convolutional layer after the pooling layers. Note: A concatenation layer should be used between the pooling layers and the convolutional layer.
    Structure the convolutional layer to have the following attributes (a sketch follows the list):
    • 128 filters
    • Width of 3
    • Height of 3
    • Stride of 1
    • Use a Xavier weight initialization
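
    A possible sketch. The names concatlayer1 and ConVLayer2 match the later solution code; the pooling layer names are the assumed ones from the previous sketch:

    /* Concatenate the two pooled feature maps, then convolve */
    AddLayer / model='MYCNN' name='concatlayer1'
       layer={type='CONCAT'} srcLayers={'PoolLayer1','PoolLayer2'};
    AddLayer / model='MYCNN' name='ConVLayer2'
       layer={type='CONVO' nFilters=128 width=3 height=3 stride=1
              init='xavier'} srcLayers={'concatlayer1'};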


  7. Add a fully connected layer and connect the convolutional layer to the fully connected layer. Connect the fully connected layer to the output layer provided in the program template. The fully connected layer should have the following attributes:
    • 20 neurons
    • Use a Xavier weight initialization
    • Apply batch normalization
    • Apply an exponential linear activation transformation

    Note: You need to use two addLayer actions to complete this task, as in the sketch below.
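
    A sketch grounded in the step 10 solution (which is this same pair of layers plus dropout): give the fully connected layer an identity activation and no bias, and then add a BATCHNORM layer that applies the ELU activation:

    AddLayer / model='MYCNN' name='FCLayer1'
       layer={type='FULLCONNECT' n=20 act='identity' init='xavier'
              includeBias=False} srcLayers={'ConVLayer2'};
    AddLayer / model='MYCNN' name='BatchLayer'
       layer={type='BATCHNORM' act='ELU'} srcLayers={'FCLayer1'};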

  8. Run the program and view the results.

    Note: On rare occasions, an ODS warning like the following might occur when the entire program is run: "Output 'OptIterHistory' was not created. Make sure that the output object name, label, or path is spelled correctly." If you encounter this warning, run each PROC CAS statement independently (sequentially).

  9. Answer the following questions:
    • How many model parameters does your model have?
    • Which epoch does your model perform best on the validation data with respect to validation misclassification rate (validation error)?
    • What is your model's best performance on the validation misclassification rate (validation error)?
    • What is your model's best performance on the training misclassification rate (fit error)?

    It appears that the model is overfitting. Perhaps adding or increasing regularization can improve validation performance.

    Solution:

    • How many model parameters does your model have? 238,842
    • Which epoch does your model perform best on the validation data with respect to validation misclassification rate (validation error)? 9
    • What is your model's best performance on the validation misclassification rate (validation error)? 47.7% (or could be 47.35%)
    • What is your model's best performance on the training misclassification rate (fit error)? 0%

  10. Modify the model by adding a dropout rate of 50% to the fully connected layer.

    Solution:

    AddLayer / model='MYCNN' name='FCLayer1'
       layer={type='FULLCONNECT' n=20 act='identity' init='xavier'
              includeBias=False dropout=.5} srcLayers={'ConVLayer2'};
    AddLayer / model='MYCNN' name='BatchLayer'
       layer={type='BATCHNORM' act='ELU'} srcLayers={'FCLayer1'};
    

  11. Rerun the model.

  12. Answer the following questions:
    • What is your model's best performance on the validation misclassification rate (validation error)?
    • What is your model's best performance on the training misclassification rate (fit error)?

    There still seems to be a large divergence between performance on the training and validation data. Let's continue to add regularization to improve validation performance.

    Solution:

    • What is your model's best performance on the validation misclassification rate (validation error)? 46.55%
    • What is your model's best performance on the training misclassification rate (fit error)? 5.6%

  13. Apply batch normalization to the convolution layer containing 128 filters. (Remember to set the activation function to identity and remove the bias from the convolution layer.)

    Solution:

    AddLayer / model='MYCNN' name='ConVLayer2' layer={type='CONVO' 
               nFilters=128  width=3 height=3 stride=1 act='identity'
               init='xavier' includeBias=False}
               srcLayers={'concatlayer1'};
     
    AddLayer / model='MYCNN' name='BatchLayer_1'
               layer={type='BATCHNORM' act='ELU'}
               srcLayers={'ConVLayer2'};		
    
    AddLayer / model='MYCNN' name='FCLayer1'
               layer={type='FULLCONNECT' n=20 act='identity'
               init='xavier' includeBias=False dropout=.5} 
               srcLayers={'BatchLayer_1'};	
    

  14. Rerun the model.

  15. Answer the following questions:
    • What is your model's best performance on the validation misclassification rate (validation error)?
    • What is your model's best performance on the training misclassification rate (fit error)?

    The model's performance on the validation data is improving! Let's continue to add regularizations to improve validation performance.

    Solution:

    • What is your model's best performance on the validation misclassification rate (validation error)? 43.8%
    • What is your model's best performance on the training misclassification rate (fit error)? 15.9%

  16. Add a dropout rate of 10% to the convolution layer containing 128 filters.

    Solution:

    AddLayer / model='MYCNN' name='ConVLayer2' layer={type='CONVO'
               nFilters=128  width=3 height=3 stride=1 act='identity'
               init='xavier' includeBias=False dropout=.1}
               srcLayers={'concatlayer1'};

  17. Rerun the model.

  18. Answer the following questions:
    • What is your model's best performance on the validation misclassification rate (validation error)?
    • What is your model's best performance on the training misclassification rate (fit error)?

    Solution:

    • What is your model's best performance on the validation misclassification rate (validation error)? 43.3%
    • What is your model's best performance on the training misclassification rate (fit error)? 21.85%

Lesson 03

Deep Learning Using SAS® Software
Lesson 03, Section 3 Practice: Predicting Movie Profitability

The data set MOVIE_CLEAN contains descriptions of popular movies. Each overview is paired with a profit indicator that shows whether the movie made more money than its budget. The goal of this analysis is to build a model that predicts whether a movie is profitable based solely on its description.

The data set MOVIE_CLEAN was cleaned from its original version: for the text in each overview, stop words and non-letters were removed, words were stemmed, and all tokens were changed to lowercase. The variables in this data set are listed below:

MOVIE_CLEAN: Variables

Name      Model Role  Measurement Level  Description
PROFIT    Target      Binary             1 = movie made more money than the budget; 0 = otherwise
TITLE     Nominal     Text               Movie title
OVERVIEW  Input       Text               Description of the movie

The data set MOVIE_EMBED contains the Global Vectors for Word Representation (GloVe) embeddings for each term in the MOVIE_CLEAN data set. The embeddings were created from word-word co-occurrence statistics of the MOVIE_CLEAN corpus using an unsupervised learning algorithm. The 100-dimensional vectors capture linear substructure of the word vector space. The variables in MOVIE_EMBED are listed below:

MOVIE_EMBED: Variables

Name        Model Role  Measurement Level  Description
VOCAB_TERM  Input       Nominal            Individual terms from the cleaned corpus
X1 - X100   Input       Interval           Word representations in 100 dimensions

Note: To see the full solution code for this practice, open the SAS program DLUS03S01.sas in the course data folder in SAS Studio.

  1. Print the first few observations of MOVIE_CLEAN and MOVIE_EMBED to view the data sets.

    Solution:

    proc print data=mycas.movie_clean (obs=5);
    run;
    
    proc print data=mycas.movie_embed (obs=5);
    run;

  2. Use the FREQ procedure to view the number of movies that earned a profit.

    Solution:

    proc freq data=mycas.movie_clean;
    	tables profit;
    run;
    823 movies earned a profit.

  3. Find and print the titles of movies whose overview mentions Denzel Washington.

    Solution:

    data denzel (drop=newvar);
    	set mycas.movie_clean;
    	newvar = find(overview,'denzel','i');
    	if newvar > 0;
    run;
    
    proc print data=denzel;
    	var title;
    run;
    Three movies have overviews that mention Denzel Washington.

  4. Partition the data into 70% training, 15% validation, and 15% for testing by adding a partition indicator to the CAS table.

    Solution:

    proc partition data=mycas.movie_clean
    	samppct=70 samppct2=15 seed=802 partind;
    	output out=mycas.movie_clean;
    run;

  5. Use the shuffle action from the table action set to randomize the observations and avoid a potential ordering bias in the deep learning model.

    Solution:

    proc cas;
    	table.shuffle / 
    	table = 'movie_clean'
    	casout = {name='movie_clean', replace=True};
    quit;

  6. Use the deepLearn action set to build a gated recurrent unit (GRU) neural network with one input layer, two GRU hidden layers, and an output layer.
    1. Use the buildModel action to initialize the RNN and then add an input layer.
    2. Connect the input layer to a GRU hidden layer with 15 neurons, set the activation function to auto, set initialization to Xavier, and set the output type to same length.
    3. Connect this hidden layer to another GRU hidden layer with the same arguments, except set the output type to encoding.
    4. Connect the second hidden layer to the output layer and set the error function to auto.
    5. To make sure the model structure is correct, specify the modelInfo action and view the model information.

    Solution:

    proc cas;
    	loadactionset "deeplearn";
    quit;
    
    proc cas;
    	deepLearn.buildModel /
        model = {name='gru', replace=True}
        type = 'RNN';
    
    	deepLearn.addLayer /
        model = 'gru'
        layer = {type='input'}
        replace=True
        name = 'data';
    
    	deepLearn.addLayer /
        model = 'gru'
        layer = {type='recurrent', n=15, act='auto', init='xavier', 
                 rnnType='gru', outputType='samelength'}
        srcLayers = 'data'
        replace=True
        name = 'rnn1';
    
    	deepLearn.addLayer /
        model = 'gru'
        layer = {type='recurrent', n=15, act='auto', init='xavier', 
                 rnnType='gru', outputType='encoding'}
        srcLayers = 'rnn1'
        replace=True
        name = 'rnn2';
    
    	deepLearn.addLayer /
        model = 'gru'
        layer = {type='output', act='auto', init='xavier', 
                 error='auto'}
        srcLayers = 'rnn2'
        replace=True
        name = 'output';
    
    	deepLearn.modelInfo /
        model='gru';
    quit;

  7. Use the dlTrain action to train the GRU model, using the profit variable as the target and the overview variable as the input. Train the model using the Adam optimization algorithm and a learning rate of 0.05. Use a mini-batch size of 50 and train for 30 epochs. Be sure to save the weights so that you can score the test data after the model is built.

    Solution:

    proc cas;
    	deepLearn.dlTrain /
        table    = {name = 'movie_clean', where = '_PartInd_ = 1'}
        validTable = {name = 'movie_clean', where = '_PartInd_ = 2'}
        target = 'profit'
        inputs = 'overview'
        texts = 'overview'
        textParms = {initInputEmbeddings={name='movie_embed'}}
        nominals = 'profit'
        seed = '649'
        modelTable = 'gru'
        modelWeights = {name='gru_trained_weights', replace=True}
        optimizer = {miniBatchSize=50, maxEpochs=30, 
               algorithm={method='adam', beta1=0.9, beta2=0.999,
               learningRate=0.05, clipGradMax=100, clipGradMin=-100}};
    quit;

  8. Score the test data and view the misclassification error.

    Solution:

    proc cas;
        deepLearn.dlScore / 
        table    = {name = 'movie_clean', where = '_PartInd_ = 0'}
        model = 'gru'
        initWeights = 'gru_trained_weights'
        copyVars = 'profit'
        textParms = {initInputEmbeddings={name='movie_embed'}}
        casout = {name='gru_scored', replace=True};
    quit;
    The misclassification error is approximately 46.4%.

  9. In the optimization history of the dlTrain action, notice that the model overfit the training data, resulting in a comparatively large validation error. Regularize the previous GRU model by building the model again, but include a dropout of 0.40 in each GRU hidden layer. Train the new model with the same arguments for the dlTrain action and view the changes in the optimization history.

    Solution:

    proc cas;
    	deepLearn.buildModel /
        model = {name='gru', replace=True}
        type = 'RNN';
    
    	deepLearn.addLayer /
        model = 'gru'
        layer = {type='input'}
        replace = True
        name = 'data';
    
    	deepLearn.addLayer /
        model = 'gru'
        layer = {type='recurrent', n=15, act='auto', init='xavier', 
                 rnnType='gru', outputType='samelength', dropout=.40}
        srcLayers = 'data'
        replace=True
        name = 'rnn1';
    
    	deepLearn.addLayer /
        model = 'gru'
        layer = {type='recurrent', n=15, act='auto', init='xavier', 
                 rnnType='gru', outputType='encoding', dropout=.40}
        srcLayers = 'rnn1'
        replace = True
        name = 'rnn2';
    
    	deepLearn.addLayer /
        model = 'gru'
        layer = {type='output', act='auto', init='xavier', 
                 error='auto'}
        srcLayers = 'rnn2'
        replace = True
        name = 'output';
    
    	deepLearn.modelInfo /
        model='gru';
    quit;
    
    proc cas;
    	deepLearn.dlTrain /
        table    = {name = 'movie_clean', where = '_PartInd_ = 1'}
        validTable = {name = 'movie_clean', where = '_PartInd_ = 2'}
        target = 'profit'
        inputs = 'overview'
        texts = 'overview'
        textParms = {initInputEmbeddings={name='movie_embed'}}
        nominals = 'profit'
        seed = '649'
        modelTable = 'gru'
        modelWeights = {name='gru_trained_weights', replace=True}
        optimizer = {miniBatchSize=50, maxEpochs=30, 
              algorithm={method='adam', beta1=0.9, beta2=0.999,
              learningRate=0.05, clipGradMax=100, clipGradMin=-100}};
    quit;

  10. Score the test data using the GRU model with regularization.

    Solution:

    proc cas;
    	deepLearn.dlScore / 
        table    = {name = 'movie_clean', where = '_PartInd_ = 0'}
        model = 'gru'
        initWeights = 'gru_trained_weights'
        copyVars = 'profit'
        textParms = {initInputEmbeddings={name='movie_embed'}}
        casout = {name='gru_scored', replace=True};
    quit;
    The misclassification error is approximately 43.0%.

Lesson 04

Deep Learning Using SAS® Software
Lesson 04, Section 1 Practice: Tuning the Movie Profitability Model

In this practice, you tune the movie profitability model that you created in the previous practice. You use the data sets MOVIE_CLEAN and MOVIE_EMBED again.

Note: To see the full solution code for this practice, open the SAS program DLUS04S01.sas in the course data folder in SAS Studio.

  1. If you did not complete the previous practice in the current SAS Studio session, run the SAS program DLUS03S01.sas to load the data sets before you proceed with this practice.

  2. Use the dlTune action to tune the movie profitability model from the previous practice. Tune the gamma, learning rate, and dropout hyperparameters, using bounds of (0.3, 0.7), (0.0001, 0.01), and (0.1, 0.9), respectively. In the optimizer argument, set maxEpochs to 10, numTrials to 25, tuneIter to 10, and tuneRetention to 0.5. Or, if you prefer, tune the model as you see fit, and try to find a set of hyperparameters that results in a more predictive model on new data.

    Solution:

    proc cas;
       deepLearn.dlTune /
       modelTable = 'gru'
       modelWeights = {name='gru_trained_weights', replace=True}
       table    = {name = 'movie_clean', where = '_PartInd_ = 1'}
       validTable = {name = 'movie_clean', where = '_PartInd_ = 2'}
       target = 'profit'
       inputs = 'overview'
       texts = 'overview'
       textParms = {initInputEmbeddings={name='movie_embed'}}
       nominals = 'profit'
       seed = '649'
       optimizer = {miniBatchSize=50, maxEpochs=10, numTrials=25,
                    tuneIter=10, tuneRetention=0.5,
                    algorithm={method='adam', beta1=0.9, beta2=0.999,
                               clipGradMax=100, clipGradMin=-100,
                               gamma={lowerBound=0.3 upperBound=0.7},
                               learningRate={lowerBound=0.0001 upperBound=0.01}},
                    dropout={lowerBound=0.1 upperBound=0.9}};
    quit;
    In the results, the first row of the Best Parameters table shows the hyperparameters that produce the best validation error.

  3. Use the tuned model to score the test data with the dlScore action. Does the tuned model outperform the previous model?

    Solution:

    proc cas;
       deepLearn.dlScore / 
       table    = {name = 'movie_clean', where = '_PartInd_ = 0'}
       model = 'gru'
       initWeights = 'gru_trained_weights'
       textParms = {initInputEmbeddings={name='movie_embed'}}
       copyVars = 'profit'
       casout = {name='gru_scored', replace=True};
    quit;
    The tuned model outperforms the previous model.

Lesson 05

Deep Learning Using SAS® Software
Lesson 05, Section 1 Practice: Constructing and Training a Sparse Denoising Convolutional Autoencoder

In this practice, you construct and train a sparse denoising convolutional autoencoder. To do this, you use two new layers that have not been explicitly introduced in this course: the TRANSPOSE CONVOLUTION layer and the SEGMENTATION layer. Note: These two layers are discussed in detail in the Advanced Topics in Computer Vision Using SAS Software course.

The TRANSPOSE CONVOLUTION layer is used to upsample information, effectively reversing the downsampling that arises when the stride is greater than one. The formula for the upsampled output size is as follows:

o = (i - 1)s - 2p + f + op

where

  • o is the output feature map size
  • i is the input feature map size
  • s equals the stride value
  • p is the padding value
  • f is the size of the filter, and
  • op equals the output padding value.

For example, transforming an 8x8 input feature map, cross-correlated with a 5x5 filter, into a 16x16 feature map requires the following:

stride (s) = 2, padding (p) = 2, output padding (op) = 1

Check: o = (8 - 1)(2) - 2(2) + 5 + 1 = 16.

A program accomplishing this would resemble the following:

AddLayer / model='My_model' name='TConvo' layer={type='TRANSCONVO'
           nFilters=4  width=5 height=5 stride=2 padding=2
           outputpadding=1} srcLayers={'previouslayer'};

The SEGMENTATION layer computes the associated loss error for either classification or regression using each tensor element from the input layer. The width, height, and depth of the segmentation layer are identical to those of the model's input layer. For example, if the input layer contains 32x32 color images, then the source layer feeding into the segmentation layer must contain a 32x32x3 tensor.
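
For instance, a segmentation layer for a model whose input layer holds 32x32 grayscale images (one channel) might be added as follows. This is a sketch with placeholder model and source-layer names, in the style of the transpose convolution example above; step 10 of this practice builds exactly such a layer:

AddLayer / model='My_model' name='seglayer'
           layer={type='SEGMENTATION' act='identity' nChannels=1}
           srcLayers={'previouslayer'};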

Note: To see the full solution code for this practice, open the SAS program DLUS05S01a.sas in the course data folder in SAS Studio.

  1. Open the program named DLUS05E01a.sas.

  2. Construct a deep learning model shell called My_Sparse_DA and specify the type as CNN.

    Solution:

    BuildModel / modeltable={name='My_Sparse_DA', replace=1} type = 'CNN';

  3. Add an input layer with the following attributes to the model, My_Sparse_DA:
    1. Name the layer data.
    2. Number of channels should equal 1.
    3. The width should equal 32.
    4. The height should equal 32.
    5. Apply a dropout rate of 40%.
    6. Add an offset of 92.7.

    Solution:

    AddLayer / model='My_Sparse_DA' name='data' layer={type='input' 
    nchannels=1 width=32 height=32 dropout=.4 offsets={92.739742}};

  4. Add a convolution layer with the following attributes to the model, My_Sparse_DA:
    1. Name the layer ConVLayer1.
    2. This layer should contain sixteen (16) filters.
    3. The filters used in this layer should be a 5x5.
    4. Apply a stride value of 2.
    5. The activation function should be exponential linear unit (ELU).
    6. The weight initialization method should be MSRA2.
    7. The source layer should be the input layer, titled data.

    Solution:

    AddLayer / model='My_Sparse_DA' name='ConVLayer1' 
    layer={type='CONVO' nFilters=16  width=5 height=5 stride=2 
    act='ELU' init='MSRA2' } srcLayers={'data'};	

  5. Add a second convolution layer with the following attributes to the model, My_Sparse_DA:
    1. Name the layer ConVLayer2.
    2. This layer should contain eight (8) filters.
    3. The filters used in this layer should be a 5x5.
    4. Apply a stride value of 2.
    5. The activation function should be identity.
    6. The hidden bias should be removed from this layer.
    7. The weight initialization method should be MSRA2.
    8. The source layer to this convolution layer should be the first convolution layer, titled ConVLayer1.
    9. Apply batch normalization to the output of ConVLayer2 and use the exponential linear unit (ELU) as the activation function in the batch normalization layer. Name the batch normalization layer BNConvo2.

    Solution:

    AddLayer / model='My_Sparse_DA' name='ConVLayer2' 
    layer={type='CONVO' nFilters=8  width=5 height=5 stride=2 
    act='Identity' init='MSRA2' includeBias=FALSE} 
    srcLayers={'ConVLayer1'};
    AddLayer / model='My_Sparse_DA' name='BNConvo2' 
    layer={type='BATCHNORM' act='ELU'} srcLayers={'ConVLayer2'};

  6. Add a third convolution layer with the following attributes to the model, My_Sparse_DA:
    1. Name the layer ConVLayermiddle.
    2. This layer should contain one (1) filter.
    3. The filter used in this layer should be a 5x5.
    4. Apply a stride value of 1.
    5. The activation function should be identity.
    6. The hidden bias should be removed from this layer.
    7. The weight initialization method should be MSRA2.
    8. The source layer to this convolution layer should be the output of the second convolution layer's batch normalization, titled BNConvo2.
    9. Apply batch normalization to the output of ConVLayermiddle and use the exponential linear unit (ELU) as the activation function in the batch normalization layer. Name the batch normalization layer BNConvoMiddle.

    Solution:

    AddLayer / model='My_Sparse_DA' name='ConVLayermiddle' 
    layer={type='CONVO' nFilters=1  width=5 height=5 stride=1 
    act='Identity' init='MSRA2' includeBias=FALSE} 
    srcLayers={'BNConvo2'};
    AddLayer / model='My_Sparse_DA' name='BNConvoMiddle' 
    layer={type='BATCHNORM' act='ELU'} srcLayers={'ConVLayermiddle'};

  7. Next, add a transpose convolution layer with the following attributes to the model, My_Sparse_DA:
    1. Name the layer TConvo4.
    2. Set the type value to TRANSCONVO.
    3. This layer should contain eight (8) filters.
    4. The filter used in this layer should be a 5x5.
    5. Apply a stride value of 2.
    6. Apply a padding value of 2.
    7. Apply an output padding value of 1.
    8. The activation function should be identity.
    9. The hidden bias should be removed from this layer.
    10. The weight initialization method should be MSRA2.
    11. The source layer to this transpose convolution layer should be the output of the third convolution layer's batch normalization, titled BNConvoMiddle.
    12. Apply batch normalization to the output of TConvo4 and use the exponential linear unit (ELU) as the activation function in the batch normalization layer. Name the batch normalization layer BNConvo4.

    What is the size of the feature maps created by this transpose convolution layer?

    Solution:

    AddLayer / model='My_Sparse_DA' name='TConvo4' 
    layer={type='TRANSCONVO' nFilters=8  width=5 height=5 stride=2 
    padding=2 outputpadding=1 act='Identity' includeBias=FALSE 
    init='MSRA2'} srcLayers={'BNConvoMiddle'};
    AddLayer / model='My_Sparse_DA' name='BNConvo4' 
    layer={type='BATCHNORM' act='ELU'} srcLayers={'TConvo4'};

    What is the size of the feature maps created by this transpose convolution layer? 16 x 16, because the layer's 8x8 input gives o = (8 - 1)(2) - 2(2) + 5 + 1 = 16.

  8. Add another transpose convolution layer with the following attributes to the model, My_Sparse_DA:
    1. Name the layer TConvo5.
    2. Set the type value to TRANSCONVO.
    3. This layer should contain sixteen (16) filters.
    4. The filter used in this layer should be a 5x5.
    5. Apply a stride value of 2.
    6. Apply a padding value of 2.
    7. Apply an output padding value of 1.
    8. The activation function should be identity.
    9. The hidden bias should be removed from this layer.
    10. The weight initialization method should be MSRA2.
    11. The source layer to this transpose convolution layer should be the output of the previous transpose convolution layer's batch normalization, titled BNConvo4.
    12. Apply batch normalization to the output of TConvo5 and use the exponential linear unit (ELU) as the activation function in the batch normalization layer. Name the batch normalization layer BNConvo5.

    What is the size of the feature maps created by this transpose convolution layer?

    Solution:

    AddLayer / model='My_Sparse_DA' name='TConvo5' 
    layer={type='TRANSCONVO' nFilters=16  width=5 height=5 stride=2 
    padding=2 outputpadding=1 act='Identity' includeBias=FALSE 
    init='MSRA2'} srcLayers={'BNConvo4'};
    AddLayer / model='My_Sparse_DA' name='BNConvo5' 
    layer={type='BATCHNORM' act='ELU'} srcLayers={'TConvo5'};

    What is the size of the feature maps created by this transpose convolution layer? 32 x 32, because the layer's 16x16 input gives o = (16 - 1)(2) - 2(2) + 5 + 1 = 32.

  9. Add a final convolution layer. This final convolution layer is used to modify the depth of the information to match that of the input layer. Ensure that this convolution layer has the following attributes, and is added to the model, My_Sparse_DA:
    1. Name the layer Convo6.
    2. This layer should contain one (1) filter.
    3. The filter used in this layer should be a 3x3.
    4. Apply a stride value of 1.
    5. The activation function should be identity.
    6. The hidden bias should be removed from this layer.
    7. The weight initialization method should be MSRA2.
    8. The source layer to this convolution layer should be the output of the previous transpose convolution layer's batch normalization, titled BNConvo5.
    9. Apply batch normalization to the output of Convo6 and use the exponential linear unit (ELU) as the activation function in the batch normalization layer. Name the batch normalization layer BNConvo6.

    What is the width, height, and depth of this final convolutional layer's output?

    Solution:

    AddLayer / model='My_Sparse_DA' name='Convo6' layer={type='CONVO' 
    nFilters=1  width=3 height=3 stride=1 act='Identity' 
    includeBias=FALSE init='MSRA2'} srcLayers={'BNConvo5'};
    AddLayer / model='My_Sparse_DA' name='BNConvo6' 
    layer={type='BATCHNORM' act='ELU'} srcLayers={'Convo6'};

    What is the width, height, and depth of this final convolutional layer's output? 32 x 32 x 1

  10. Add a segmentation layer with the following attributes to the model, My_Sparse_DA:
    1. Set the name of the layer to seglayer.
    2. Set the type of the layer to SEGMENTATION.
    3. Set the activation function to identity.
    4. Set the number of channels (NCHANNELS=) to 1.
    5. Set the source layer as the last batch normalization layer, titled BNConvo6.

    Solution:

    AddLayer / model='My_Sparse_DA' name='seglayer' 
    layer={type='segmentation' act='Identity' nChannels=1} 
    srcLayers={'BNConvo6'};	

  11. Now you can train the sparse denoising convolutional autoencoder.
    Add a dataSpecs parameter to the dlTrain action with the following attributes:
    1. Define the input layer information with the following:
      1. Data = {'_image_'}
      2. Layer=data
      3. Type=Image
    2. Define the output layer information with the following:
      1. Datalayer=data
      2. Layer=seglayer
      3. Type=Image

    Solution:

    dataSpecs={
       {data={'_image_'}, layer='data', type='Image'},
       {dataLayer='data', layer='seglayer', type='Image'}
    }

  12. Add an L1 regularization value of .00001.

    Solution:

    optimizer={minibatchsize=80,
       algorithm={method='ADAM', beta1=.9, beta2=.999, learningrate=.02},
       regL1=0.00001,
       maxepochs=60}

  13. Add an L2 regularization value of .00002.

    Solution:

    optimizer={minibatchsize=80,
       algorithm={method='ADAM', beta1=.9, beta2=.999, learningrate=.02},
       regL1=0.00001,
       regL2=0.00002,
       maxepochs=60}

  14. Run the program and view the results. What are the values of the training error rate and the validation error rate?

    Solution:

    The Optimization History table shows that the training error rate is approximately 54.5 and the validation error rate is approximately 58.1.

  15. Open the program named DLUS05E01b.sas and examine it.

    This program uses the trained sparse denoising autoencoder to score (run inference on) the grayscale version of the CIFAR-10 data. The program saves the feature maps of the middle layer of encoders, and then prints sixty of the images for viewing.

  16. Run the program and view the images in the results.