SAS logo

Activities and Practices (with Solutions) for High-Performance Data Processing with CASL in SAS® Viya®


Course Code: ECASL35, prepared date: June 20, 2023
Copyright © SAS Institute Inc., Cary, NC, USA. All rights reserved.

 

Activity: Set Up Your Course Data (REQUIRED)

After you sign in to SAS Viya for Learners (the practice environment), perform the following steps to find the course files and set up the course data:

Open SAS Studio and locate the course folder.

  1. When you sign in to SAS Viya for Learners, you are in SAS Drive. In the upper left corner of the window, select the Show list of applications button, and then select Develop SAS Code. This opens SAS Studio, the application you use for this course.
  2. On the left side of SAS Studio, click the Explorer icon to access the Explorer.
  3. Expand pdcesx# > Home > Courses > CASL01. The CASL01 folder contains all the files that you need to work in the practice environment. Note: Your sever name begins with pdcesx, but the numbers are unique to your environment. In future instructions, the server name is represented as <server>.

Set up an autoexec file that runs automatically every time you start SAS Studio.

  1. In the CASL folder, double-click startup.sas to open it. This code sets up the data and structure for this course.
  2. Use Ctrl-A to select all the code, and then Ctrl-C to copy the code.
  3. In the menu at the top, select Options > Autoexec file, and then paste the code into the Autoexec file window.
  4. Select Run to run the startup code, then click Save. Note: The first time the autoexec code executes it might take about a minute. Subsequent runs will be faster.
  5. Close the startup.sas tab.

Expand each subfolder in your course folder so you are familiar with the location of your course files

  • The activities subfolder contains the SAS program files that you use for the activities.
  • The data subfolder contains the data files.
  • The demos subfolder contains the SAS program files that are shown in the demo videos. You can use these programs to perform the demos.
  • The examples subfolder contains the program code shown in the conceptual videos.
  • The practices subfolder contains the SAS program files that you use for the practices. Note: The practices folder contains starter code for some practices and solution code for all practices. Starter code filenames include the letter p and the solution code filenames include the letter s. Example: cs06p01.sas is the starter code for practice 1 in lesson 6, and s06s01.sas is the corresponding solution code.

Lesson 01

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 01, Section 1 Activity: View Available Tables

  1. Sign in to SAS Viya for Learners. SAS Drive opens. Click the Show list of applications menu in the top left of the browser and select Develop SAS Code. SAS Studio opens.
  1. Select Explorer on the SAS Studio navigation pane.
  1. Expand pdcesx#### > Home > Courses > CASL01.
  1. Open program cs01a01.sas from the activities folder.

  1. Run the program. Which caslib contains the CARS and HEART tables?

Lesson 02

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 02, Section 1 Activity: Use the CAS API

Open cs02a01.sas from the activities folder and perform the following tasks:

  1. The program includes a PROC CASUTIL step, PROC CONTENTS and a PROC MEANS step. The CASUTIL step loads a table into memory, the CONTENTS step views columns in the in-memory table, and the MEANS step analyzes in-memory data in CAS.
  1. At the bottom of the program, enter the following syntax to list the previous five CAS actions that were executed in the CAS server using the SAS API to CAS.

    cas conn listhistory 5;
  1. Run the program and examine the notes in the log. Observe that each note begins with the word action. What was the last action listed in the log?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 02, Section 1 Activity: Find the EXISTS Built-In Function

  1. In a browser, go to https://support.sas.com/documentation and click SAS Programming: SAS Viya.
  1. In the left navigation pane, select Cloud Analytic Services, and then select CASL Programming.
  1. In the CASL Reference section select the CASL Built-In Functions section.
  1. Find and select the EXISTS built-in function. What does the EXISTS function do?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 02, Section 2 Activity: Find the loadTable CAS Action

  1. In a browser, go to https://support.sas.com/documentation. Click SAS Programming: SAS Viya.
  1. In the Quick Links box, select Actions, Procedures, and Language Elements.
  1. In the CAS Actions box, select Action Sets by Name.
  1. Search Action Sets by Name for the Table action set and select Table.

  2. Find the loadTable action and select it. What does the loadTable action do?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 02, Section 3 Activity: Create and Print a CASL Variable

Open cs02a04.sas from the activities folder and perform the following tasks to create a numeric variable:

  1. At the end of the CAS procedure, create a new variable named z that uses the sum function to add the x and y variables.

  1. Run the program. Notice in the log that you do not see the total value of the z variable.
  1. Add a PRINT statement to print the value of the z variable.

  1. Run the program. What is the value of z?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 02, Section 3 Activity: Use Variables in CAS Actions

Open cs02a05.sas from the activities folder and perform the following tasks:

  1. Review the program. Notice the cars and heart tables are each used in three CAS actions.
  1. Run the program. The first five rows, table details, and column information are returned for the cars and heart tables.
  1. Create three new variables: tbl1 with the string value cars, tbl2 with the string value heart, and n with the numeric value 5.

  1. Replace each instance of "cars" with the variable tbl1, "heart" with tbl2, and 5 with n.

  1. Run the program. Are the results the same as the Step 2 results?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 02, Section 4 Activity: Create and Access an Array

Open cs02a06.sas from the activities folder and perform the following tasks:

  1. View the array being created in the CAS procedure. Notice that the array is enclosed in brackets. Run the procedure and review the error.

  1. Replace the brackets with braces. Run the procedure. Verify that the array was created.

  1. Add a PRINT statement to print the columns variable. Run the procedure. View the log to see the values of the columns variable.

  1. Modify the PRINT statement to print only the second and third values of the array. Run the CAS procedure. Which values appear in the log?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 02, Section 4 Activity: Iterate Over an Array

Open cs02a07.sas from the activities folder and perform the following tasks:

  1. Review the code. Notice that the array variable named libs is created and assigned three string elements referencing caslib names: casuser, formats, and samples.
  1. Complete the loop to explore each caslib.

  1. Run the program and view the Results tab. Which caslib contains the CARS and HEART tables?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 02, Section 4 Activity:Find the UNIQUE Built-In Function

  1. In a browser, go to https://support.sas.com/documentation and click SAS Programming: SAS Viya.
  1. In the left navigation pane, select Cloud Analytic Services, and then select CASL Programming.
  1. In the CASL Reference section, select the CASL Built-In Functions section.
  1. Find the Function Categories section and select it. This section lists all available functions for each data type.
  1. In the Array category, find the UNIQUE function and select it. What does the UNIQUE function do?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 02, Section 5 Activity: Create and Describe a Dictionary

Open cs02a09.sas from the activities folder and perform the following tasks:

  1. In the CAS procedure, create a dictionary named me with the following key value pairs: Name="Your First Name", JobTitle="Your Job Title", Siblings=Number of siblings. Run the step and verify that the procedure ran successfully.

  1. Add a PRINT statement to print the keys and values to the log. Run the step and verify that the dictionary is correct.

  1. Add a DESCRIBE statement below the PRINT statement to describe the dictionary me.

  1. Run the step and view the log. What data type value does the Siblings key hold?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 02, Section 5 Activity: Access a Dictionary Value

Open cs02a10.sas from the activities folder and perform the following tasks:

  1. Review the code. A dictionary with three keys, name, caslib and where is stored in the tbl variable. Run the step and view the log to verify that the procedure ran successfully.

  1. Below the PRINT statement, add a DESCRIBE statement to view the data structure of the dictionary. Run the step and view the log to verify that the dictionary contains three entries.

  1. Add a PRINT statement to print only the value associated with the where key. Use brackets or the dot operator. View the log. What value is returned from last PRINT statement?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 02, Section 5 Activity: Use a Dictionary in a CAS Action

Open cs02a11.sas from the activities folder and perform the following tasks to add a key-value pair to a dictionary:

  1. The program creates a dictionary that is stored in the tbl variable. Run the program and view the log to verify that the tbl dictionary has two keys and the table has 428 rows.
  1. Below the tbl assignment statement add the following new key-value pair to the dictionary using either method:

    tbl["where"]="MSRP < 20000";
    or
    tbl.where="MSRP < 20000";

  1. Run the program and view the log. How many rows are in the table?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 02, Section 5 Activity: Store the Results of a CAS Action

Open cs02a12.sas from the activities folder and perform the following tasks to work with a CAS action result dictionary:

  1. Review the program. The result of the serverStatus action is stored in variable r and the DESCRIBE statement views the data type of r.
  1. Run the program and view the log. Notice the serverStatus action returns a dictionary with three entries, an About dictionary, a server table, and a nodestatus table.

  1. Below the PRINT statement create a new variable named tbl that holds the value of the server key and describe the new tbl variable.

  1. Run the program and view the log. What data type does the server key hold?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 02, Section 5 Activity: Iterate Over a Dictionary

Open cs02a13.sas from the activities folder and perform the following tasks to loop over a dictionary:

  1. Review the program. A dictionary named phones is created.
  1. Below the assignment statement, complete the DO OVER statement to iterate over the dictionary phones using k and v as the variables.

  1. In the first PRINT statement, add the k variable in place of the key comment, and the v variable in place of the value comment.

  1. Run the program and review the results. What is the result of the last PRINT statement?

Lesson 03

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 03, Section 1 Activity: Identify the Active Caslib

Open cs03a01.sas from the activities folder and perform the following tasks to identify the active caslib:

  1. Use the getSessOpt action from the sessionProp action set to identify the active caslib. Use the name parameter.

  1. Run the CAS procedure. What is the value of the caslib key?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 03, Section 2 Activity: Explore Available Caslibs

Open cs03a02.sas from the activities folder and perform the following tasks to explore caslibs:

  1. Run the table.caslibInfo action to view all available caslibs. Confirm that a results table was shown with a list of caslib names.
  1. Add the table.fileInfo action to explore all server-side files in the cs caslib.

  1. Add the table.tableInfo action to explore all available in-memory tables in the cs caslib.

  1. Run the program and view the log. How many tables are loaded in-memory?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 03, Section 3 Activity: Load a Server-Side File into Memory

Open cs03a03.sas from the activities folder and perform the following tasks to load a SASHDAT server-side file into memory:

  1. Run the fileInfo action to view all data source files in the cs caslib. Confirm that the orders_clean.sashdat file is available.
  1. Review the loadTable action. Notice that the casOut parameter doesn't include the name subparameter to name the new in-memory table.

  1. In the loadTable action, specify the orders_clean.sashdat file in the path subparameter, and the lib variable in the caslib subparameter. Run the action.

  1. View the results. What is the name of the new in-memory table?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 03, Section 3 Activity: Load a Client-Side File into Memory

Open cs03a04.sas from the activities folder and perform the following tasks to load a SAS table into memory:

  1. In the LOAD statement, specify the sashelp.class table in the DATA= option.
  1. Add the CASOUT= option and name the new in-memory table class.
  1. Add the OUTCASLIB= option and place the new in-memory table in the casuser caslib.
  1. Add the REPLACE option to replace the table if it already exists.
  1. Run the CASUTIL procedure. Confirm that the table was loaded successfully in the casuser caslib.
  1. Run the tableInfo action. How many rows are in the class table?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 03, Section 3 Activity: Promote an In-Memory Table

Open cs03a05.sas from the activities folder and perform the following tasks to promote an in-memory table:

  1. In PROC CASUTIL, the LOAD statement loads sashelp.baseball to an in-memory table named BASEBALL.
  1. Add the PROMOTE option at the end of the LOAD statement.

    ...outcaslib="casuser" replace promote;
  1. Run the procedure. Notice that the program did not run successfully.

  1. Remove the REPLACE option. Rerun the program. Confirm that it ran successfully.

  1. Run the tableInfo action. How many rows are in BASEBALL?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 03, Section 3 Activity: Save an In-Memory Table

Open cs03a06.sas from the activities folder and perform the following tasks to save an in-memory table as a SASHDAT file:

  1. Run the tableInfo action and confirm that the BASEBALL table is loaded in-memory. If it's not, uncomment and run the CASUTIL procedure to load the table into memory.
  1. In the CAS procedure, complete the save action to save the BASEBALL in-memory table to the cs caslib, with the name baseball.sashdat. Run the CAS procedure and confirm that it ran successfully.
  1. View the results. What is the size of the baseball.sashdat file?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 03, Section 3 Activity: Drop an In-Memory Table

Open cs03a07.sas from the activities folder and perform the following tasks to drop an in-memory table:

  1. Run the tableInfo action and confirm that the BASEBALL table is loaded in-memory. If it's not, uncomment and run the CASUTIL procedure to load the table into memory.
  1. In the CAS procedure, complete the dropTable action to drop the BASEBALL in-memory table from the casuser caslib.

  1. Run the procedure and view the results. Is the BASEBALL table available in the casuser caslib?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 03, Section 3

Practice: Load a Server-Side File into Memory

Use the table.loadTable action to load the orders_clean.sashdat file from the cs caslib into memory. Load only the specified rows and columns.

Reminder: If you have not setup the Autoexec file in SAS Studio, open and submit startup.sas first.

  1. Begin by creating a new SAS program. Use the loadTable action to load the server-side file into memory. Follow these requirements:
    • Load the orders_clean.sashdat file in the cs caslib.
    • Load only the rows where Customer_Group is equal to Orion Club Gold members.
    • Load only the columns Customer_Group, Customer_Type, Postal_Code, RetailPrice and Country.
    • Name the new CAS table gold_members and place it in the cs caslib.

    Hint: Use the where and vars parameters.

    Solution:

    proc cas;
        lib="cs";
    
    * Load the table into memory *;
        table.loadTable / 
            path="orders_clean.sashdat", caslib=lib,
            where="Customer_Group='Orion Club Gold members'",
            vars={"Customer_Group","Customer_Type","Postal_Code",
                  "RetailPrice","Country"},
            casOut={name="gold_members",
                    caslib=lib, 
                    replace=TRUE
            };
    quit;

  2. Run the program. View the log to verify that the file was loaded successfully.


  3. Add the table.tableInfo action to view information about the cs caslib.

    Solution:

    proc cas;
        lib="cs";
    
    * Load the table into memory *;
        table.loadTable / 
            path="orders_clean.sashdat", caslib=lib,
            where="Customer_Group='Orion Club Gold members'",
            vars={"Customer_Group","Customer_Type","Postal_Code",
                  "RetailPrice","Country"},
            casOut={name="gold_members",
                    caslib=lib, 
                    replace=TRUE
            };
    
    * View the in-memory table *;
        table.tableInfo / caslib=lib; 
    quit;

  4. Run the program and review the results from the tableInfo action. How many rows are in the gold_members table?

    Solution:

    483,438 rows

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 03, Section 3

Practice: Load a Client-Side File into Memory

Use the table.upload action to load the air.txt file from the practices folder into memory. Use the importOptions parameter to modify import options.

Reminder: If you have not setup the Autoexec file in SAS Studio, open and submit startup.sas first.

  1. Begin by creating a new SAS program. Use the upload action to load the client-side file into memory. Follow these requirements:
    • Load the air.txt file using the path "&path/practices".
    • Name the new CAS table air_upload and place it in the cs caslib.
    • Use importOptions to:
      1. Load the data using the correct vertical bar delimiter.
      2. Name the columns Date and Air, respectively.
      3. Don't modify the data type of the Date column during import. Let SAS import the column as a varchar by default.

    Solution:

    proc cas;
        lib="cs";
        colNames={"Date","Air"};
    
    * Load a client-side file into memory *;
        table.upload / 
            path="&path/practices/air.txt"
            casOut={name="air_upload",caslib=lib,replace=TRUE},
            importOptions={fileType="DELIMITED",
                           delimiter="|",
                           getNames=FALSE,
                           vars=colNames
            };
    quit;

  2. Run the program. View the log to verify that the file was loaded successfully.


  3. Add the table.fetch action to preview the first 20 rows of the new table.

    Solution:

    proc cas;
        lib="cs";
        colNames={"Date","Air"};
    
    * Load a client-side file into memory *;
        table.upload / 
            path="&path/practices/air.txt"
            casOut={name="air_upload",caslib=lib,replace=TRUE},
            importOptions={fileType="DELIMITED",
                           delimiter="|",
                           getNames=FALSE,
                           vars=colNames
            };
    
    * Preview the new in-memory table *;
        table.fetch / table={name="air_upload",caslib=lib};
    quit;

  4. Run the program and review the results from the fetch action. What is the Date value of the first row?

    Solution:

    JAN49

Lesson 04

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 04, Section 1 Activity: View Column Attributes

Open cs04a01.sas from the activities folder and perform the following tasks:

  1. In the PROC CAS step, use the table.columnInfo CAS action to display column information for the heart table in the casuser caslib.

  1. Submit the program and review the results. What is the label for the column named MRW?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 04, Section 1 Activity: Display the First Five Rows of a Table

Open cs04a02.sas from the activities folder and perform the following tasks:

  1. In the PROC CAS step, use the table.fetch CAS action to display only the first five rows of data from the heart table in the casuser caslib where Status='Alive'.

  1. Submit the program and review the results. What is the weight of the patient in the last row of the result?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 04, Section 1 Activity: View the Results of a CAS Action

Open program cs04a03.sas from the activities folder and perform the following tasks:

  1. In the PROC CAS step, the simple.summary CAS action produces a result dictionary named rs.
  1. Add a DESCRIBE statement to describe the contents of rs.

  1. Submit the program. How many items are stored in the dictionary rs? What is the data type of the item named Summary?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 04, Section 1 Activity: Filter a Result Table

Open program cs04a04.sas from the activities folder and perform the following tasks:

  1. In the PRINT statement, add a WHERE operator to fTab to select only rows where Frequency > 2500.

  1. Which analysis variables have a frequency greater than 2500?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 04, Section 1 Activity: Create a Computed Column in a Result Table

Open program cs04a05.sas from the activities folder and perform the following tasks:

  1. In line 16, modify the code so that the SUM function adds up all values in the Frequency column.

  1. Submit the program. What is the Population Percentage for the BP_Status value Optimal?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 04, Section 1 Activity: Save a Result Table as a SAS Data Set

Open program cs04a06.sas from the activities folder and perform the following tasks:

  1. Just before the PROC CAS QUIT statement, add a SAVERESULT statement to save fTab as a SAS data set named work.freq.

  1. Submit the program. What is the last value in the Obs column of the report generated by PROC PRINT?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 04, Section 1

Level 1 Practice: Explore a CAS Table Using Actions

Use a variety of actions to explore a CAS table.

Reminder: If you have not setup the Autoexec file in SAS Studio, open and submit startup.sas first.

  1. Create a new SAS program.
    • Use the loadTable action to load the sales.sas7bdat server-side file from the cs caslib into memory.
    • Name the new CAS table sales.
    • Load the new table into the cs caslib.

    Solution:

    * Load the sales.sas7bdat server-side file into memory *;
    proc cas;
        tbl={name="sales.sas7bdat",caslib="cs"};
        outTbl={name="sales",caslib="cs"};
    
        table.loadTable /
            path=tbl.name, caslib=tbl.caslib,
            casOut=outTbl || {replace=TRUE};
    quit;

  2. Run the program and verify that the table loaded successfully.


  3. Write code to explore the new sales CAS table:
    • Use table.fetch to retrieve five rows of data.
    • Use table.columnInfo to explore column names and data types.
    • Use simple.numRows to determine the number of rows in the table.
    • Use simple.freq to analyze the Job Title column.

    Solution:

    * Load the sales.sas7bdat server-side file into memory *;
    proc cas;
        tbl={name="sales.sas7bdat",caslib="cs"};
        outTbl={name="sales",caslib="cs"};
    
        table.loadTable /
            path=tbl.name, caslib=tbl.caslib,
            casOut=outTbl || {replace=TRUE};
    
    * Explore the new table *;
        tbl={name="sales", caslib="cs"};
        table.fetch / table=tbl to=5;
        table.columnInfo / table=tbl;
        simple.numRows / table=tbl;
        simple.freq / 
            table=tbl, 
            input="Job Title";
    quit;

  4. Run the program. Review the log and results, and answer the following questions:
    1. How many columns are in the sales table?
    2. How many rows are in the sales table?
    3. How many distinct Job Titles are in the sales table?

    Solution:

    1. 8 columns
    2. 63 rows
    3. 5 distinct Job Titles

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 04, Section 1

Level 2 Practice: Save CASL Results as a CSV File

Use the table.columnInfo action and the SAVERESULTS statement to create a CSV file data dictionary for the cars in-memory table in the casuser caslib.

Reminder: If you have not setup the Autoexec file in SAS Studio, open and submit startup.sas first.

  1. Create a new SAS program.
    • Use the columnInfo action to obtain column information about the cars in-memory table and store the results in a variable named ci.
    • Using the results from the columnInfo action, complete the following on the table:
      • Keep only the Column, Type, and Format columns.
      • Add two computed columns to the table with the given values:
              TableName = "cars"
              Caslib="casuser"
    • Save the table variable as a CSV file named cars_data_dictionary.csv in the output folder using the path "&path/output".

    Solution:

    proc cas;
        cars={name="cars", caslib="casuser"};
        table.columnInfo result=ci / table=cars;
        carsDataDict=ci.columnInfo [,{"Column","Type","Format"}]
                         .compute("TableName","Cars")
                         .compute("Caslib","casuser"); 
        saveresult carsDataDict
            csv="&path/output/cars_data_dictionary.csv";
    quit;

  2. Run the program and review the results.


  3. How many rows are in the cars_data_dictionary.csv file?

    Solution:

    16 rows

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 04, Section 2 Activity: Print Rows of a Result Table

Open program cs04a07.sas from the activities folder and perform the following tasks:

  1. Submit the PROC CAS step and review the Results tab. Note that no column names ending in _Status are listed.
  1. Modify the PRINT statement so that only column names that end in _Status and only Column and NDistinct are included in the report.

  1. Submit the program and review the Results tab.
    How many column names end in _Status?
    How many distinct values are there for Smoking_Status?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 04, Section 2

Level 1 Practice: Identify Categorical Columns for Analysis

Use the simple.distinct action to identify low-cardinality columns for use as categorical variables in subsequent analyses.

Reminder: If you have not setup the Autoexec file in SAS Studio, open and submit startup.sas first.

  1. Open program cs04p03.sas from the practices folder.
    • Complete the simple.distinct code to produce statistics for heart table columns in the casuser caslib.
    • Store the result from simple.distinct in a variable named rd.

    Solution:

    proc cas;
        heartTbl = {name="heart", caslib="casuser"};
        simple.distinct result=rd/
            table=heartTbl
        ;
    	 print rd.Distinct;
    quit;

  2. Run the entire program and review the results. How many columns have more than 100 missing values?

    Solution:

    5 columns

  3. Modify the PRINT statement to print only results with nDistinct values less than 20 and nMiss values less than 100.

    Solution:

    /* Identify categorical variable columns (distinct count<20 and fewer than 100 missing values); */
    proc cas;
        heartTbl = {name="heart", caslib="casuser"};
        simple.distinct result=rd/
            table=heartTbl
        ;
        print rd.Distinct.where(nDistinct<20 and nMiss<100);
    quit;

  4. Run the program again and review the results.


  5. How many analysis variables meet our selection criteria?

    Solution:

    6 variables

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 04, Section 2

Level 2 Practice: Identify Continuous Numeric Columns for Analysis

Use the simple.distinct and simple.numrows actions to identify columns containing continuous numerical values for subsequent analyses.

Reminder: If you have not setup the Autoexec file in SAS Studio, open and submit startup.sas first.

  1. Open program cs04p04.sas from the practices folder and review the code.

    Solution:

    /* Identify Continuous Numerical Variables (numeric, distinct count >= 20% of total rows); */
    %include "&path/data/load_orders_raw.sas";
    proc cas;
       ordersTbl = {name="orders_raw", caslib="cs"};
       table.columnInfo result=ri/
          table=ordersTbl
       ;
    /*    describe ri; */
    
    /* c. Add a selection operator to store Column values as an array in varList*/
       varList=ri.ColumnInfo.where(/* Exclude character variables*/
                                      !(Type like '%char%') 
                                   /* Exclude variable names not useful for analysis */
                                  and !(Column like '%ID')
                                  and !(lowcase(Column) like '%date')
                                  and !(lowcase(Column) like 'xy%'))
                                  
       ;
        print varList;
    
    /*  d. Use varList to select inputs for simple.distinct  */
       simple.distinct result=rd/ 
    
    
       ;
       print rd.distinct;
    
    /*  e. Use simple.numRows to find the number of rows in the table */
       simple.numRows result=nr / 
    
       ;
       print ;
    quit;

  2. In section c, on line 23, add a selection operator so that all values of Column are stored in varList as an array.

    Solution:

    varList=ri.ColumnInfo.where(/* Exclude character variables*/
                                   !(Type like '%char%') 
            /* Exclude variable names not useful for analysis */
                               and !(Column like '%ID')
                               and !(lowcase(Column) like '%date')
                               and !(lowcase(Column) like 'xy%'))
                               [,"Column"];
    

  3. Highlight and run lines 1 through 26 of the program. How many column names are in the varList array?

    Solution:

    5 columns: {Quantity,RetailPrice,Cost,Discount,OrderType}

  4. In section d, complete the simple.distinct CAS action code to count distinct values in the orders table only for the columns listed in varList.

    Solution:

    simple.distinct result=rd/ 
           table=ordersTbl,
          inputs=varlist
    ;

  5. Highlight and run lines 1 through 33 of the program. How many distinct values are there for OrderType?

    Solution:

    4 distinct values

  6. In section e, complete the simple.numRows CAS action code to count the number of rows in the orders table, then use the result to print only those rows from the simple.distinct result (rd) where the number of distinct values is greater than or equal to 20% of the total row count. The column names selected can be used as analysis variables in subsequent programs.

    Solution:

    simple.numRows result=nr / 
       table=ordersTbl;
    print rd.distinct.where(NDistinct>=nr.numrows*.2);

  7. Run the entire program and review the results. What are the column names for the analysis variables?

    Solution:

    Cost and RetailPrice

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 04, Section 3 Activity: Remove Duplicate Rows of a Table

Open program cs04a08.sas from the activities folder and perform the following tasks:

  1. In the PROC CAS step, complete the assignment statement for the variable sourceTbl so that the deduplicate action will read from the heart_raw table in the cs caslib grouping values by the column named ID.

  1. Highlight and submit the code from line one through the QUIT statement.
  1. Review the results. How many duplicate rows were in the heart_raw table?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 04, Section 3 Activity: Specify the Length of a Computed Column

Open program cs04a09.sas from the activities folder and perform the following tasks:

  1. Highlight from line 1 through the QUIT statement and submit the selected code.
  1. Review the results. Note that one of the cohort values is truncated.
  1. Add the appropriate statement to the SOURCE code block to set the length of cohort to varchar(9).

  1. Submit the modified program. What are the three values of cohort?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 04, Section 3

Level 1 Practice: Document Invalid Data

Use the table.partition action to read the heart_raw table from the cs caslib and create a new table named heart_badFlags in the casuser caslib. The flag variable badDPB should contain the value 1 to flag a row with out-of-range Diastolic and Systolic column values.

Reminder: If you have not setup the Autoexec file in SAS Studio, open and submit startup.sas first.

  1. Open program cs04p05.sas from the practices folder. Between the SOURCE and ENDSOURCE statements, add statements to create and populate a new variable named badDPB. If the Diastolic value is between 20 and 200, set the value of badDBP to 0, otherwise set it to 1.

    Solution:

    source flagCode;
        if 20 <= Diastolic <= 200  then badDBP=0;
        else badDBP=1;
    endsource;

  2. Complete the table.partition action code so that the SOURCE block is executed and only the AgeAtDeath, Smoking_Status, Diastolic,and badDBP columns are included in the output table.

    Solution:

    table.partition /
        table=inTbl||{vars={"AgeAtDeath", "Smoking_Status",
                            "Diastolic", "badDBP"},
              computedVarsProgram=flagCode}
       ,casout=outTbl||{replace=TRUE}
    ;

  3. Complete the simple.freq action code so that only rows with a badDBP value of 1 and the ageAtDeath value is not missing. Produce frequency counts for only the Diastolic column.

    Solution:

    simple.freq /
        table=outTbl||{where='badDBP=1 and ageAtDeath is not missing'},
        inputs={"Diastolic"}
    ;

  4. Run your program and review the results.

    Solution:

    %include "&path/data/load_heart_raw.sas";
    proc cas;
    inTbl= {name="heart_raw", caslib="cs"};
    outTbl={name="heart_badFlags", caslib="casuser"};
    
    source flagCode;
        if 20 <= Diastolic <= 200  then badDBP=0;
        else badDBP=1;
    endsource;
    
    table.partition /
        table=inTbl||{vars={"AgeAtDeath", "Smoking_Status",
                            "Diastolic", "badDBP"},
              computedVarsProgram=flagCode}
       ,casout=outTbl||{replace=TRUE}
    ;
    simple.freq /
        table=outTbl||{where='badDBP=1 and ageAtDeath is not missing'},
        inputs={"Diastolic"}
    ;
    quit;

  5. How many distinct bad values were identified?

    Solution:

    There was 1 bad result.

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 04, Section 3

Level 2 Practice: Identify and Isolate Duplicates

Use the deduplication.deduplicate action to identify rows in the orders_raw table in the cs caslib. The combination of Customer_ID, Order_ID and Product_ID should provide a unique identifier for every row. Write the unique rows to the orders_clean table and the rejected rows to the orders_dup table in the cs caslib, overwriting the tables if they already exist. Print a list of any duplicate identifiers.

Reminder: If you have not setup the Autoexec file in SAS Studio, open and submit startup.sas first.

  1. Open program cs04p06.sas from the practices folder. In the Variable Definition section, provide values for the id, clean and dups variables.
    • For ID, use the columns Customer_ID, Order_ID, Product_ID
    • Name the clean table orders_clean in the cs caslib. Replace the table if it exists.
    • Name the duplicate table orders_dups in the cs caslib. Replace the table if it exists.
    • Delete the replace subparameter from the tbl variable.
    • Add the groupBy key to the tbl dictionary using the id variable as the value.

    Solution:

    	
    	/* Variable definition */
        id={"Customer_ID","Order_ID","Product_ID"};
        clean={caslib="cs",name="orders_clean",replace=TRUE};
        dups={caslib="cs",name="orders_dups",replace=TRUE};
        delete tbl.replace;
        tbl.groupBy=id;

  2. Add code in the Display Duplicate IDs section to use table.fetch to display the Customer_ID, Order_ID and Product_ID values for any duplicate rows identified.

    Solution:

    /* Display Duplicate IDs (use table.fetch)*/
        delete dups.replace;
        table.fetch /
           table=dups,
           fetchVars=id
        ;

  3. Prior to the fetch action, delete the replace key from the dups variable. Run your program and review the results.

    Solution:

    /* Identify and isolate duplicates in cs.orders_raw */
    proc cas;
        tbl= {name="orders_raw", caslib="cs",replace=TRUE};
        table.loadTable result=rl/
            caslib="cs",
            path="orders_raw.sashdat"
            casout=tbl;
    
    /* Variable definition */
        id={"Customer_ID","Order_ID","Product_ID"};
        clean={caslib="cs",name="orders_clean",replace=TRUE};
        dups={caslib="cs",name="orders_dups",replace=TRUE};
        delete tbl.replace;
        tbl.groupBy=id;
    
    /* Deduplication */
        deduplication.deduplicate / 
            table=tbl,
            casOut=clean,
            duplicateOut=dups,
            noDuplicateKeys=true;
    
    /* Display Duplicate IDs (use table.fetch)*/
        delete dups.replace;
        table.fetch /
           table=dups,
           fetchVars=id
        ;
    quit;

  4. How many duplicate IDs were found?

    Solution:

    There were 2 duplicate IDs.

Lesson 05

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 05, Section 1 Activity: Find the SCAN Common Function

  1. In a browser, go to https://support.sas.com/documentation and click SAS Programming: SAS Viya.
  1. On the left navigation pane select Cloud Analytic Services, and then select CASL Programming.
  1. In the CASL Reference section, select Common Functions.
  1. Scroll through the common functions and notice the number of functions available.
  1. Find and select the SCAN Function. What does it do?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 05, Section 2 Activity: Load a Client-Side File into Memory

Open cs05a02.sas from the activities folder and perform the following tasks to load a client-side file into memory:

  1. In the Navigation pane, navigate to CASL01 > data > my_data. Find the cars_raw.sas7bdat file.
  1. In the CAS procedure, complete the path parameter with the path and file name of the cars_raw.sas7bdat file.

    path="&path/data/my_data/cars_raw.sas7bdat"
  1. Review the casOut parameter. Notice the new in-memory table will be named cars_raw and placed in the casuser caslib.
  1. Run the program and view the results. How many rows are in the cars_raw in-memory table?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 05, Section 2 Activity: Update a Column in Place

Open cs05a03.sas from the activities folder and perform the following tasks to update a column in place:

  1. Run the first CAS procedure to load the heart_raw.sashdat server-side file into memory. View the results and confirm the table was loaded with 5,212 rows.
  1. View the Cause of Death column. Note that many rows contain missing values.
  1. In the second CAS procedure, review the assignment statement for the heart_raw variable. Notice the where key expression.

  1. Complete the set parameter to update the missing rows in the DeathCause column to the value Unknown Cause.

  1. Run the program and check the log. How many rows were updated?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 05, Section 2 Activity: Format a Computed Column

Open cs05a04.sas from the activities folder and perform the following tasks to load a client-side file into memory:

  1. Submit the first CAS procedure to load the cars_raw.sas7bdat file into memory. Confirm that the table has been loaded with 442 rows.
  1. View the second CAS procedure. Notice that a new calculated column is being created called InvoiceEuro. The new column converts the Invoice dollar amount to Euros. Submit the procedure. View the results and notice a new column has been created without a format.
  1. In the second CAS procedure, add the format=eurox14.2 key value pair in the computedVars subparameter to format the new column.
  1. Submit the program and view the results. What symbol is used in front of the numeric values?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 05, Section 2 Activity: Convert a Column's Data Type

Open cs05a05.sas from the activities folder and perform the following tasks to convert a column data type:

  1. Submit the DATA step to create a CAS table with two rows and one column.
  1. Submit the first CAS procedure to preview the table and view the column information. Notice the date_char is of char data type.

  1. View the second CAS procedure. It creates a new table named converted with a new column named date_num. Complete the computedVarsProgram sub parameter by entering the following:

    "date_num=inputn(date_char,'anydtdte.');"
  1. Submit the program. In the results, notice that date_char was converted to a SAS numeric date. What is the value of date_num in the first row?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 05, Section 2

Level 1 Practice: Update a Table in Place

Use the table.update action to consolidate the Region values Eastern Europe and Western Europe into the value Europe in the shoes CAS table.

Reminder: If you have not setup the Autoexec file in SAS Studio, open and submit startup.sas first.

  1. Open cs05p01.sas from the practices folder and execute the CASUTIL procedure to load the sashelp.shoes SAS data set into memory. Verify that the file loaded successfully.

    Solution:

    * Load the sashelp.shoes table into memory *;
    proc casutil;
    	load data=sashelp.shoes 
             outcaslib="casuser" 
             casout="shoes" 
             replace;
    run;

  2. Execute the CAS procedure to preview the shoes in-memory table. Notice the distinct Region values Eastern Europe and Western Europe.

    Solution:

    * Preview the shoes table *;
    proc cas;
    	shoes={name="shoes", caslib="casuser"};
    
    	table.fetch / 
    		table=shoes,
    		index=FALSE,
    		to=10;
    	simple.freq /
    		table=shoes,
    		inputs={"Region"};
    quit;

  3. Use the update action to replace the values Eastern Europe and Western Europe with the value Europe in the shoes table. Note: There are a variety of solutions available using the update action.

    Solution:

    *******************************;
    * SOLUTION 1                  *;
    *******************************;
    proc cas;
    	shoes={name="shoes", caslib="casuser"};
    
    	source combine;
    		ifc(find(Region,"Europe","i")>0,"Europe",Region);
    	endsource;
    
    	table.update /
    		table=shoes,
    		set={
    		   {var="Region", value=combine}
    		};
    quit;
    
    
    *******************************;
    * SOLUTION 2                  *;
    *******************************;
    proc cas;
    	shoes={name="shoes", 
              caslib="casuser",
    	       where="find(Region,'Europe','i')>0"
         };
    
    	table.update /
    		table=shoes,
    		set={
    		   {var="Region", value="'Europe'"}
    		};
    quit;	

  4. After the update action, add the freq action to view the frequency values of the updated Region column.

    Solution:

    *******************************;
    * SOLUTION 1                  *;
    *******************************;
    proc cas;
    	shoes={name="shoes", caslib="casuser"};
    
    	source combine;
    		ifc(find(Region,"Europe","i")>0,"Europe",Region);
    	endsource;
    
    	table.update /
    		table=shoes,
    		set={
    		   {var="Region", value=combine}
    		};
    
    	simple.freq /
    		table=shoes,
    		inputs={"Region"};
    quit;
    
    
    *******************************;
    * SOLUTION 2                  *;
    *******************************;
    proc cas;
    	shoes={name="shoes", 
                caslib="casuser",
    	       where="find(Region,'Europe','i')>0"
         };
    
    	table.update /
    		table=shoes,
    		set={
    		   {var="Region", value="'Europe'"}
    		};
    
    	simple.freq /
    		table=shoes,
    		inputs={"Region"};
    quit;	

  5. Run the program and review the results of the freq action.


  6. What is the total frequency of Europe?

    Solution:

    93

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 05, Section 2

Level 1 Practice: Create a New CAS Table

Use the table.partition action to create a new CAS table that creates two new columns: one to determine the days to deliver a product, and another that converts the numeric OrderType value to its associated code. Then write only the specified rows and columns.

Reminder: If you have not setup the Autoexec file in SAS Studio, open and submit startup.sas first.

  1. Open cs05p02.sas from the practices folder. Execute the CAS procedure to create a new table named orders in the cs caslib from the orders_clean.sashdat file in the cs caslib, and then preview the new table.

    Solution:

    * Load the orders_clean.sashdat file into memory *;
    proc cas;
        orders={name="orders", caslib="cs"};
    
        table.loadtable / 
            path="orders_clean.sashdat", caslib="cs",
            casOut=orders || {replace=TRUE};
    
        table.columnInfo /
            table=orders;
    
        table.fetch /
            table=orders,
            index=FALSE,
            to=10;
    quit;

  2. Create a new SAS program that uses the partition action to create a new table using the orders table in the cs caslib. Use the following requirements.

    1. Add a source block named createColumns to create two new columns, DaysToDeliver and Type using the following statements:

      DaysToDeliver = Delivery_Date - Order_Date;

      length Type $10;
      if OrderType=1 then Type='Retail';
          else if OrderType=2 then Type='Catalog';
          else if OrderType=3 then Type='Internet';
          else Type='Unknown';
      

    2. Write only the following columns to the new table: Continent, DaysToDeliver and Type.

    3. Write only the rows where DaysToDeliver is greater than 3 and Type equals Internet.

    4. Name the new table orders_delivery and place it in the cs caslib.

    5. After the partition action, add the tableInfo action on the cs caslib.

    Solution:

    proc cas;
        source createColumns;
            DaysToDeliver=Delivery_Date - Order_Date;
            length Type $10;
            if OrderType=1 then Type='Retail';
                else if OrderType=2 then Type='Catalog';
                else if OrderType=3 then Type='Internet';
                else Type='Unknown';
        endsource;
    
        table.partition /
            table={name="orders",
                   caslib="cs",
                   computedVarsProgram=createColumns,
                   vars={"Continent","DaysToDeliver", "Type"},
                   where="DaysToDeliver > 3 and Type='Internet'"
            },
            casOut={name="orders_delivery", 
                    caslib="cs", 
                    replace=TRUE
            };
    
        table.tableInfo / caslib="cs";
    quit;

  3. Execute the program and review the results of the tableInfo action.


  4. How many rows and columns are in the orders_delivery table?

    Solution:

    27,818 rows and 3 columns

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 05, Section 3 Activity: Execute DATA Step Code with an Action

Open cs05a06.sas from the activities folder and perform the following tasks to run DATA step code using the runCode action:

  1. Review the DATA step. Notice that the assignment statement creates a new column named First_Letter that uses the FIRST function to create a column with the first letter of each Make.
  1. Submit the program and view the error in the log.

  1. Replace the FIRST function with the following SUBSTR function to obtain the first letter of each Make:

    First_Letter=substr(Make,1,1);
  1. Submit the program and view the results. What is the value of First_Letter in the first row of the new table?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 05, Section 3 Activity: Add a Label to a Column

Open cs05a07.sas from the activities folder and perform the following tasks to add a label to a column in an in-memory table:

  1. Submit the first CAS procedure to load and view the column information of the heart_raw table. Notice that there are no labels associated with the Diastolic column.
  1. In the second CAS procedure, complete the empty dictionary in the columns parameter by adding the following:

    {name="Diastolic", label="Diastolic Pressure"}
  1. Submit the program and view the log. What is the label of the Diastolic column?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 05, Section 3

Level 1 Practice: Impute Missing Values

Use the dataPreprocess.impute action to replace missing values in columns with the MEDIAN values.

Reminder: If you have not setup the Autoexec file in SAS Studio, open and submit startup.sas first.

  1. Open cs05p03.sas from the practices folder. Run the program to load the cars_raw.sas7bdat table into memory and then preview the new table. Notice that the Cylinders and EngineSize columns contain missing values.

    Solution:

    * Load the cars_raw.sas7bdat file into memory *;
    proc cas;
    	cars={name="cars_raw", caslib="casuser"};
    
    	table.upload / 
    		path="&path/data/my_data/cars_raw.sas7bdat"
    		casOut=cars || {replace=TRUE};
    
    	table.columnInfo /
    		table=cars;
    
    	simple.distinct / table=cars;
    quit;

  2. Use the impute action to complete the following:
    • Replace the missing values of the Cylinders and EngineSize using the MEDIAN method.
    • Create a new table named cars_imputed in the casuser caslib.
    • Copy all of the columns from the input table to the output table.

    Solution:

    proc cas;
    	inTbl={name="cars_raw", caslib="casuser"};
    	outTbl={name="cars_imputed", caslib="casuser"};
    
    	datapreprocess.impute /
    		table=inTbl,
    		casOut=outTbl || {replace=TRUE},
    		inputs={"EngineSize","Cylinders"},
    		methodInterval="MEDIAN",
    		copyAllVars=TRUE;
    quit;

  3. Run the program and review the results of the impute action.


  4. What are the median values of EngineSize and Cylinders?

    How many columns are in the cars_imputed output table?

    Solution:

    EngineSize median value is 3, Cylinders median value is 6.
    There are 14 columns in the output table.

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 05, Section 3

Level 2 Practice: Transpose a Table

Use the transpose.transpose action to restructure a table so that you can use the new table to create a visualization.

Reminder: If you have not setup the Autoexec file in SAS Studio, open and submit startup.sas first.

  1. Open cs05p04.sas from the practices folder. Execute the DATA step to create a new table named hr_emps in the casuser caslib. View the new output table. Notice the table structure is in wide format with six rows and seven columns. Note: Do not execute the SGPLOT procedure at the bottom of the program.

    Solution:

    * Create the new CAS table *;
    data casuser.hr_emps;
        call streaminit(999);
        do Department="Human Resources","IT","R & D","Marketing","Analysts","Sales";
            Year0=round(rand('uniform',10,15),1);
            Year1=round(rand('uniform',15,20),1);
            Year2=round(rand('uniform',20,25),1);
            Year3=round(rand('uniform',25,30),1);
            Year4=round(rand('uniform',30,35),1);
            Year5=round(rand('uniform',35,40),1);
            output;
        end;
    run;

  2. Write a new CAS procedure above the SGPLOT procedure that completes the following:
    • Use the transpose action to create a new table in a narrow structure. Follow these requirements:
      • Use the table parameter to group the table by Department and create a new column to use as the ID. Name the new column ID and give it the value TotalEmps.
      • Transpose the Year0Year5 columns.
      • Name the transposed columns Year.
      • Name the new table hr_narrow in the casuser caslib.
      • In the same CAS procedure, add the table.alterTable action to modify the label of the Year column to Hire Year.
      • In the same CAS procedure, execute the fetch and columnInfo actions on the newly created table.


    Solution:

    proc cas;
    	tbl={name="hr_emps", caslib="casuser"};
    	outTbl={name="hr_narrow", caslib="casuser"};
    	
    	transpose.transpose /
    		table=tbl || {groupBy="Department", 
                          computedVarsProgram="ID='TotalEmps'"},
    		id="ID",
    		transpose={"Year0","Year1","Year2","Year3","Year4","Year5"},
    		name="Year",
    		casOut=outTbl || {replace=TRUE};
    
    	table.alterTable /
    		name=outTbl.name,
    		caslib=outTbl.caslib,
    		columns={
    			{name="Year", label="Hire Year"}
    		};
        table.fetch / table=outTbl;
        table.columnInfo / table=outTbl;
    quit;

  3. Run the CAS procedure to create the hr_narrow table. View the log to verify that the table was created successfully.


  4. Execute the SGPLOT procedure to visualize the newly created table.

    Solution:

    * Data Visualization *;
    proc sgplot data=casuser.hr_narrow
                noborder;
    	vbar Department / 
                group=Year 
                groupdisplay=cluster
                response=TotalEmps 
                stat=sum;
    	yaxis label="Total Employees";
    	keylegend / position=top;
    run;

  5. Using the results from the SGPLOT procedure, which Department has the most employees in Year5?

    Solution:

    Analysts

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 05, Section 4 Activity: Execute an SQL Query with an Action

Open cs05a08.sas from the activities folder and perform the following tasks to execute an SQL query:

  1. Run the CAS procedure to create a new table named heart_alive using an SQL query. Notice that an error was returned.

  1. Modify the quotation marks around Alive from single quotation marks to double quotation marks. Rerun the CAS procedure. How many rows are in the new heart_alive table?

Lesson 06

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 06, Section 2 Activity: Find the summary Action

  1. In a browser, go to https://support.sas.com/documentation and click SAS Programming: SAS Viya.
  1. In the Quick Links section, select Actions, Procedures, and Language Elements. Then select Action Sets by Name.
  1. Find and select the Simple Analytics action set. Notice all the available actions for performing basic analytics.
  1. Select Details to view detail information about the action set.
  1. Select Summary. What does the Summary action do?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 06, Section 2 Activity: Execute the summary Action

Open cs06a02.sas from the activities folder and perform the following tasks to generate summary statistics on selected columns:

  1. Run the first CAS procedure and view the first five rows and column information of the heart table. View the Age at Start, Height, and Weight columns in the results.
  1. In the second CAS procedure, specify the following column names as an array of columns for analysis: AgeAtStart, Height, and Weight. Run the program. Notice that all summary statistics are returned for each column.

  1. Add the subSet parameter to return only the MEAN and STD summary statistics. Run the program.
  1. What is the mean value of Height?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 06, Section 2 Activity: Execute the aggregate Action

Open cs06a03.sas from the activities folder and perform the following tasks to perform aggregation on selected columns:

  1. Run the first CAS procedure and view the first twenty rows of the heart table. Find and examine the Cholesterol column.
  1. Run the second CAS procedure to find the Median value of the Cholesterol column using the aggregate action. View the results. Notice the median Cholesterol is 223.
  1. In the varSpecs parameter, add the following code to find the MIN, MEAN and MAX of the Cholesterol column:

    {name="Cholesterol", subSet={"MIN","MEAN","MAX"}}
    
  1. What is the maximum value of Cholesterol?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 06, Section 2 Activity: Execute the freqTab Action

Open cs06a04.sas from the activities folder and perform the following tasks to generate summary statistics on selected columns:

  1. Run the first CAS procedure and view the first five rows and column information of the heart table. View the categorical columns Cholesterol Status, Blood Pressure Stats, and Smoking Status in both result tables.
  1. In the second CAS procedure, notice that the tabulate parameter in the freqTab action specifies three columns in the vars subparameter. Run the procedure. Notice that three two-way frequency reports were created. BP_Status by Smoking_Status for each distinct Chol_Status: Borderline, Desirable, and High.
  1. In the freqTab action, add the tabDisplay parameter with the value LIST. Run the procedure and view the results. How many frequency tables were created?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 06, Section 2

Level 1 Practice: Generate Descriptive Statistics Using Summary

Use the simple.summary action to analyze Cost, RetailPrice and a newly computed column TotalProfit for the orders in-memory table in the cs caslib by Customer_Group.

Reminder: If you have not setup the Autoexec file in SAS Studio, open and submit startup.sas first.

  1. Open cs06p01.sas from the practices folder. Run the program to load and preview the orders_clean.sashdat table.

    Solution:

    * Load the orders table *;
    proc cas;
        lib="cs";
        table.loadTable / 
            path="orders_clean.sashdat", caslib=lib,
            casOut={name="orders",caslib=lib, replace=TRUE};
    quit;
    
    
    * Preview the orders table *;
    proc cas;
        orders={name="orders",caslib="cs"};
        table.columnInfo / table=orders;
        table.fetch / table=orders;
    quit;

  2. Use the summary action to analyze the orders table by completing the following:
    • Group the results by Customer_Group.
    • Create a new computed column named TotalProfit that uses the following formula:
            TotalProfit=RetailPrice-(Cost*Quantity);
    • Use the following columns in the summary action: TotalProfit, RetailPrice, and Quantity.
    • Use the following summary statistics: MIN, MAX, MEAN and SUM.
    • Create a new CAS table named Group_Analysis in the cs caslib from the summary results using the casOut parameter.
    • Use the fetch action to preview the new table.

    Solution:

    * Analyze the orders table *;
    proc cas;
    * Set input and output table information *;
        inputTbl={name="orders",
                  caslib="cs", 
                  groupBy="Customer_Group ",
                  computedVarsProgram="TotalProfit=RetailPrice-(Cost*Quantity);"
        };
    
        outTbl={name="Group_Analysis", caslib="cs"};
    
    * Add a computed column and summarize the table *;
        simple.summary / 
           table=inputTbl,
           inputs={"TotalProfit","RetailPrice","Quantity"},
           subSet={"MIN","MAX","MEAN","SUM"},
           casOut=outTbl || {replace=TRUE};
    
    * View the newly created table *;
        table.fetch / table=outTbl;
    quit;

  3. Run the program and review the results of the fetch action.


  4. What is the total _Sum_ of the TotalProfit column where customer_group equals Orion Club members.

    Solution:

    3,384,750.12

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 06, Section 2

Level 1 Practice: Generate a Crosstabulation Report

Use the simple.crossTab action to create a crosstabulation report to see the total sum of RetailPrice of OrderType by Customer_Group for the orders table in the cs caslib.

Reminder: If you have not setup the Autoexec file in SAS Studio, open and submit startup.sas first.

  1. Open cs06p02.sas from the practices folder. Run the program to load and preview the orders_clean.sashdat file and view the log to verify that the file loaded successfully.

    Solution:

    * Load the orders table *;
    proc cas;
        lib="cs";
        table.loadTable / 
            path="orders_clean.sashdat", caslib=lib,
            casOut={name="orders",caslib=lib, replace=TRUE};
    quit;
    
    
    * Preview the orders table *;
    proc cas;
        orders={name="orders",caslib="cs"};
        table.columnInfo / table=orders;
        table.fetch / table=orders;
    quit;

  2. Create a new SAS program that uses the crosstab action to analyze the orders table by completing the following:
    • Specify OrderType column as the row.
    • Specify Customer_Group as the column.
    • Use the RetailPrice column as the weight.
    • Use SUM as the aggregator.

    Solution:

    * Create the crosstab *;
    proc cas;
        orders={name="orders", caslib="cs"};
        simple.crossTab /
            table=orders,
            row="OrderType",
            col="Customer_Group",
            weight="RetailPrice",
            aggregator="SUM";
    quit;

  3. Run the program and review the results of the crosstab action.


  4. What is the sum of RetailPrice for OrderType 3 for the Orion Club Gold members?

    Solution:

    3167251.66

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 06, Section 2

Level 2 Practice: Generate Multiple Two-Way Frequency Reports

Use the freqTab.freqTab action to create two separate two-way frequency reports. Create one for OrderType by Country, and the other for OrderType by Customer_Group. Use the orders table in the cs caslib.

Reminder: If you have not setup the Autoexec file in SAS Studio, open and submit startup.sas first.

  1. Open cs06p03.sas from the practices folder. Execute the program to load and preview the orders_clean.sashdat table.

    Solution:

    * Load the orders table *;
    proc cas;
        lib="cs";
        table.loadTable / 
            path="orders_clean.sashdat", caslib=lib,
            casOut={name="orders",caslib=lib, replace=TRUE};
    quit;
    
    
    * Preview the orders table *;
    proc cas;
        orders={name="orders",caslib="cs"};
        table.columnInfo / table=orders;
        table.fetch / table=orders;
    quit;

  2. Use the freqTab action to create multiple two-way frequency reports by completing the following:
    • Use the tabulate parameter to specify a two-way frequency between OrderType and Country, and OrderType and Customer_Group. HINT: Use the cross sub parameter.
    • Order the results by descending frequency count.
    • Set the tabDisplay parameter to TRUE.

    Solution:

    ******************;
    * SOLUTION 1     *;
    ******************;
    proc cas;
        orders={name="orders",caslib="cs"};
        freqTab.freqTab /
            table=orders,
            tabulate={
                   {vars={"OrderType"},cross={"Country", "Customer_Group"}}
            },
            order="FREQ",
            tabDisplay="LIST";
    quit;
    
    
    ******************;
    * SOLUTION 2     *;
    ******************;
    proc cas;
        orders={name="orders",caslib="cs"};
        freqTab.freqTab /
            table=orders,
            tabulate={
                   {vars={"OrderType","Country"}},
                   {vars={"OrderType","Customer_Group"}}
            },
            order="FREQ",
            tabDisplay="LIST";
    quit;

  3. Run the program and review the results of the freqTab action.


  4. What is the frequency of OrderType 1 and the Country Denmark?

    What is the frequency of OrderType 3 and Customer_Group Internet/Catalog Customers?

    Solution:

    What is the frequency of OrderType 1 and the Country Denmark? 14133

    What is the frequency of OrderType 3 and Customer_Group Internet/Catalog Customers? 17473

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 06, Section 2

Level 2 Practice: Generating Descriptive Statistics Using Aggregate

Use the aggregation.aggregate action to find the Median, MIN and MAX of the newly computed column TotalProfit for each OrderType. Use the orders table in the cs caslib.

Reminder: If you have not setup the Autoexec file in SAS Studio, open and submit startup.sas first.

  1. Open cs06p04.sas from the practices folder. Run the program to load and preview the orders_clean.sashdat table.

    Solution:

    * Load the orders table *;
    proc cas;
        lib="cs";
        table.loadTable / 
            path="orders_clean.sashdat", caslib=lib,
            casOut={name="orders",caslib=lib, replace=TRUE};
    quit;
    
    
    * Preview the orders table *;
    proc cas;
        orders={name="orders",caslib="cs"};
        table.columnInfo / table=orders;
        table.fetch / table=orders;
    quit;

  2. Use the aggregate action to analyze the orders table by completing the following:
    • Group the results by OrderType.
    • Create a new computed column named TotalProfit that uses the following formula:
          TotalProfit=RetailPrice-(Cost*Quantity);
    • In the varSpecs parameter, find the Median, MIN and MAX of the TotalProfit column.
    • Create a new CAS table named TotalProfit_Summary in the cs caslib from the aggregate results using the casOut parameter.
    • Preview the new table using the fetch action.

    Solution:

    proc cas;
    * Input table *;
        inTbl={name="orders",
               caslib="cs",
               groupBy="OrderType",
               computedVarsProgram="TotalProfit=RetailPrice-
                                    (Cost*Quantity);"
        };
    
    * Output table *;
        outTbl={name="TotalProfit_Summary", caslib="cs"};
    
    * Use aggregate to analyze the orders table *;
        aggregation.aggregate /
            table=inTbl,
            varSpecs={ 
                {name="TotalProfit", agg="MEDIAN"},
                {name="TotalProfit", subSet={"MIN","MAX"}}
            },
            casOut=outTbl || {replace=TRUE};
    
    * Preview the new table *;
        table.fetch / table=outTbl;
    quit;

  3. Run the program and review the results of the fetch action.


  4. What is the median value (the _TotalProfit_Q2_ column) of OrderType 3?

    Solution:

    What is the median value (the _TotalProfit_Q2_ column) of OrderType 3? 4.6

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 06, Section 3 Activity: Create a Graph from a CAS Table

Open cs06a05.sas from the activities folder and perform the following tasks to create a bar chart from a CAS table:

  1. Run the DATA step to create a table named products with two columns and 10 million rows. Confirm that the table was created successfully.

  1. Run the CAS procedure to preview the new table. Notice that the table has 10 million rows and contains two columns.
  1. Run the ODS GRAPHICS statements and SGPLOT procedure to create a bar chart using the products CAS table. What error did you receive in the log?

  1. Add the option maxobs=10000000 at the end of the first ODS Graphics statement after the forward slash. Run the ODS GRAPHICS statements and SGPLOT procedure again. What error did you receive in the log?

High-Performance Data Processing with CASL in SAS® Viya®
Lesson 06, Section 3

Level 1 Practice: Create a Bar Chart from a Summarized CAS Table

Save the two-way frequency results table from the freqTab.freqTab action as a SAS data set. Then create a vertical bar chart showing the frequency percent of OrderType for each Customer_Group.

Reminder: If you have not setup the Autoexec file in SAS Studio, open and submit startup.sas first.

  1. Open cs06p05.sas from the practices folder. Run the first CAS procedure to load the orders_clean.sashdat table.

    Solution:

    * Load the orders table *;
    proc cas;
        lib="cs";
        table.loadTable / 
            path="orders_clean.sashdat", caslib=lib,
            casOut={name="orders",caslib=lib, replace=TRUE};
    quit;

  2. Execute the second CAS procedure to execute the freqTab action. View the log and results table.

    Solution:

    * Create the two-way frequency report *;
    proc cas;
        orders={name="orders",caslib="cs"};
    
        freqTab.freqTab result=ft /
            table=orders,
            tabulate={
                   {vars={"OrderType","Customer_Group"}}
            },
            order="FREQ",
            tabDisplay="LIST";
    
        describe ft;
        print ft;
    quit;

  3. In the same CAS procedure, complete the following:
    • Store the table from the Table1.List key in a variable. Keep only the columns OrderType, Customer_Group and Percent.
    • Save the result table variable with the specified columns as a SAS data set named work.freqType.

    Solution:

    * Create the two-way frequency report *;
    proc cas;
        orders={name="orders",caslib="cs"};
    
        freqTab.freqTab result=ft /
            table=orders,
            tabulate={
                   {vars={"OrderType","Customer_Group"}}
            },
            order="FREQ",
            tabDisplay="LIST";
    
        describe ft;
        print ft;
    
    * Store the two way frequency result table in a variable *;
        ftTbl=ft["Table1.List"];
        ftTbl=ftTbl[,{"OrderType","Customer_Group", "Percent"}];
        print ftTbl;
    
    * Save the table variable as a SAS data set *;
        saveresult ftTbl dataOut=work.freqType;
    quit;

  4. Run the CAS procedure and view the log to verify it ran successfully.


  5. After the SAS data set work.freqType has been created, run the SGPLOT procedure to create the vertical bar chart.

    Solution:

    * Create a bar chart *;
    proc sgplot data=work.freqType
                noborder;
        vbar OrderType / 
             group=Customer_Group
             groupdisplay=cluster
             response=Percent
             nooutline;
        keylegend / noborder;
    quit;

  6. Using the results of the SGPLOT procedure, which Customer_Group has the largest frequency percent in OrderType 1?

    Solution:

    Orion Club Gold members