SAS logo

Activities and Practices (with Solutions) for SAS® Programming 2: Data Manipulation Techniques


Course Code: EPG2V2, prepared date: September 13, 2023
Copyright © SAS Institute Inc., Cary, NC, USA. All rights reserved.

 

Setting Up Your Practice Files (REQUIRED)

To complete the activities, demos, and practices in this course, you must access SAS software and set up your practice files.

Using the Hands-On Lab

The Hands-On Lab already has the course files loaded for you. All you have to do is launch the lab and start using SAS Studio.

Instructions for using the Hands-On Lab

Using SAS OnDemand for Academics

Setup instructions for SAS OnDemand for Academics

Using your own SAS software

Start your SAS software. Click Open next to the SAS software interface you have, and follow the instructions to set up your data.

Setup Instructions for SAS Studio

Setup Instructions for SAS Enterprise Guide

Setup Instructions for SAS Windowing Environment


Lesson 01

SAS® Programming 2: Data Manipulation Techniques
Lesson 01, Section 1 Activity

Open p201a03.sas from the activities folder and perform the following tasks:

  1. In this code, the statement to drop EndDate from the output table comes before it is used to calculate values for StormLength. Run the program. Examine the log, results, and output table. Is EndDate is in the output table? Are values generated for StormLength?

  1. Move the DROP statement to the end of the DATA step, just before the RUN statement. Run the program and examine the log, PROC CONTENTS report, and output table. Did the results change?

  1. View the PROC CONTENTS report and notice that the Ocean column has a length of 8. Move the LENGTH statement between the DROP and RUN statements. Run the program and examine the log, PROC CONTENTS report, and output table. Did the results change?

SAS® Programming 2: Data Manipulation Techniques
Lesson 01, Section 1 Activity

Open p201a04.sas from the activities folder and perform the following tasks:

  1. Examine the PUTLOG statements that are in the DATA step.
  1. Add two PUTLOG statements before the RUN statement to print "PDV Before RUN Statement" and write all columns in the PDV to the log. Run the program.

  1. View the log. What is the value of StormLength at the end of the second iteration of the DATA step?

  1. Type NOTE: (use uppercase and include the colon) inside the quotation marks at the beginning of the first PUTLOG statement. Run the program. What changes in the log?
  2. putlog  "NOTE: PDV After SET Statement";
    

SAS® Programming 2: Data Manipulation Techniques
Lesson 01, Section 1 Practice


The following practices offer two different ways to examine DATA step execution. You only have to do one. Open the practice for the software that you are using for this course.

SAS Enterprise Guide: Using the DATA Step Debugger to Examine Execution Steps

SAS Studio (or any SAS software): Using PUTLOG Statements to Examine Execution Steps

SAS® Programming 2: Data Manipulation Techniques
Lesson 01, Section 1

SAS Enterprise Guide: Using the DATA Step Debugger to Examine Execution Steps

Examine the National Park data that is used in most practices. Use the DATA step debugger to follow the steps of execution in a DATA step that reads the np_final table.

Note: This practice must be performed in SAS Enterprise Guide to use the interactive DATA step debugger. If you did not do the first activities in Enterprise Guide, first open and run the libname.sas program.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. In Enterprise Guide, use the Servers list to expand Servers > Local > Libraries > PG2. Double-click np_final to open the table. The table includes one row per US national park. Note that the first row in the table is Cape Krusenstern National Monument.


  2. Become familiar with the following columns in the np_final table:
      •  Region (Alaska, Intermountain, Midwest, National Capital, Northeast, Pacific West, and Southeast)
      •  Type (Monument, Park, Preserve, River, Seashore)
      •  ParkName (full name of national park)
      •  DayVisits (number of daily visitors in 2017)
      •  Campers (number of campers in 2017)
      •  OtherLodging (number of people in other lodging, including cabins and hotels, in 2017)
      •  Acres (total park size in acres)


  3. Open p201p01.sas in the practices folder of the course files. Click the Toggle DATA Step Debugger toolbar button to enable debugging in the program. Click the Debugger icon next to the DATA statement. The DATA Step Debugger window appears.


  4. How many variables are in the PDV? What are the initial values?

    Solution:

    Ten variables are included in the PDV, and all have been initialized to missing. Character variables are blank, and numeric variables are periods. _ERROR_ is 0, and _N_ is 1.

  5. Click the Step execution to next line toolbar button to execute the highlighted SET statement. Recall that the first row of the np_final table is Cape Krusenstern National Monument. Why was the first row not read into the PDV in the first iteration of the DATA step?

    Solution:

    Cape Krusenstern National Monument is not read into the PDV because it does not meet the WHERE statement condition. Kenai Fjords National Park is the first row where Type="PARK".

  6. Click Step execution to next line to step through the remaining statements in the DATA step. Which statements are executable? Which statements are compile-time only?

    Solution:

    Executable: SET, Type= assignment statement, and AvgMonthlyVisitors= assignment statement

    Compile-time only: WHERE, FORMAT, and KEEP

  7. Exit the debugger and run the program to view the output table.

    Note: The DATA step debugger is available by default in other programs. To suppress the debugger icon in the editor, click the Toggle DATA Step Debugger toolbar button.

SAS® Programming 2: Data Manipulation Techniques
Lesson 01, Section 1

SAS Studio: Using PUTLOG Statements to Examine Execution Steps

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Open p201p02.sas in the practices folder of the course files. Examine the program and answer the following questions:

    • Which statements are compile-time only?
    • What length will be assigned to the variable Size?

    Solution:

    The KEEP, WHERE, and FORMAT statements are compile-time only. Size will have a length of 5. The first time Size occurs in the DATA step, it is assigned a value of Small, which is five characters.

  2. Submit the program and examine the output data. What is the value of Size in the third row? Is the value truncated?

    Solution:

    In the third row, the value of Size is Mediu. The value is truncated.

  3. Modify the program.

    • Resolve the truncation of Size.
    • Read the first five rows from the input table.
    • Add PUTLOG statements to provide the following information in the log:
      • Immediately after the SET statement, write START DATA STEP ITERATION to the log as a color-coded note.
      • After the Type= assignment statement, write the value of Type to the log.
      • At the end of the DATA step, write the contents of the PDV to the log.
    • Submit the program and read the log to examine the messages written during execution.

    Solution:

    data np_parks;
        set pg2.np_final (obs=5);
        putlog "NOTE: START DATA STEP ITERATION"; 
        keep Region ParkName AvgMonthlyVisitors Acres Size;
        length Size $ 6;
        where Type="PARK";
        format AvgMonthlyVisitors Acres comma10.;
        Type=propcase(Type);
        putlog Type=;
        AvgMonthlyVisitors=sum(DayVisits,Campers,OtherLodging)/12;
        if Acres<1000 then Size="Small";
        else if Acres<100000 then Size="Medium";
        else Size="Large";
        putlog _all_;
    run;

SAS® Programming 2: Data Manipulation Techniques
Lesson 01, Section 2 Activity

SAS Studio

  1. Open p201a05b.sas from the activities folder.
  2. Run the program. Observe the values of Year and ProjectedSales written in the log.
  3. How many rows are in the input and output tables?

Enterprise Guide

  1. Open p201a05a.sas from the activities folder.
  2. Use the DATA step debugger to step through one iteration of the DATA step. Observe the values of Year and ProjectedSales as they are updated.
  3. Close the debugger and run the program. Examine the log and output data. How many rows are in the input and output tables?


SAS® Programming 2: Data Manipulation Techniques
Lesson 01, Section 2 Activity

Open p201a06.sas from the activities folder and perform the following tasks:
Note: If this program is not in your activities folder, copy and paste this code into the code window.

  1. Add an explicit OUTPUT statement after each ProjectedSales assignment statement. Run the program. How many rows are in the output table?

  1. Comment the final OUTPUT statement and run the program again. Are rows where Year=3 written to the new table?

SAS® Programming 2: Data Manipulation Techniques
Lesson 01, Section 2

Level 1 Practice: Conditionally Creating Multiple Output Tables

The pg2.np_yearlytraffic table contains annual traffic counts at locations in national parks. Parks are classified as one of five types: National Monument, National Park, National Preserve, National River, and National Seashore.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Open the p201p03.sas program from the practices folder.

    • Modify the DATA step to create three tables: monument, park, and other.
    • Use the value of ParkType as indicated above to determine which table the row is output to.
    • Drop ParkType from the monument and park tables.
    • Drop Region from all three tables.
    • Submit the program and verify the output. The note in the SAS log indicates how many rows are in each table.

    Solution:

    data monument(drop=ParkType) park(drop=ParkType) other; 
        set pg2.np_yearlytraffic;
        if ParkType = 'National Monument' then output monument;
        else if ParkType = 'National Park' then output park;
        else output other;
        drop Region;
    run;

  2. How many rows are in the park table?

    Solution:

    The park table has 246 rows.

  3. How many columns are in the monument table?

    Solution:

    The monument table has three columns.

SAS® Programming 2: Data Manipulation Techniques
Lesson 01, Section 2

Level 2 Practice: Conditionally Creating Columns and Output Tables

The pg2.np_2017 table contains monthly public use figures for national parks.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Create a new program.

    • Write a DATA step that creates temporary SAS tables named camping and lodging and reads the pg2.np_2017 table.
    • Compute a new column, CampTotal, that is the sum of CampingOther, CampingTent, CampingRV, and CampingBackcountry.
    • Format CampTotal so that values are displayed with commas.
    • The camping table has the following specifications:
      • includes rows if CampTotal is greater than zero
      • contains the ParkName, Month, DayVisits, and CampTotal columns
    • The lodging table has the following specifications:
      • includes rows where LodgingOther is greater than zero
      • contains only the ParkName, Month, DayVisits, and LodgingOther columns
    • Submit the program and verify the output. The notes in the SAS log indicate how many rows are in each table.

    Solution:

    data camping(keep=ParkName Month DayVisits CampTotal)
        lodging(keep=ParkName Month DayVisits LodgingOther);
        set pg2.np_2017;
        CampTotal=sum(of Camping:);
        if CampTotal > 0 then output camping;
        if LodgingOther > 0 then output lodging;
        format CampTotal comma15.;
    run; 

  2. How many rows are in the camping table?

    Solution:

    The camping table has 1374 rows.

  3. How many rows are in the lodging table?

    Solution:

    The lodging table has 383 rows.

SAS® Programming 2: Data Manipulation Techniques
Lesson 01, Section 2

Challenge Practice: Processing Statements Conditionally with SELECT-WHEN Groups

SELECT and WHEN statements can be used in a DATA step as an alternative to IF-THEN statements to process code conditionally.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Open p201p05.sas from the practices folder. The program contains the solution programs for the Level 1 practice.

    • Use SAS Help or online documentation to read about using SELECT and WHEN statements in the DATA step.
    • Modify the program to use SELECT groups and WHEN statements.
    • Submit the program and examine the output data.

    Solution:

    data monument(drop=ParkType) park(drop=ParkType) other;
        set pg2.np_yearlytraffic;
        select (ParkType);
            when ('National Monument') output monument;
            when ('National Park') output park;
            otherwise output other;
        end;
        drop Region;
    run;

  2. How many rows are in the park table?

    Solution:

    The park table has 246 rows, just as it did before.

  3. How many columns are in the monument table?

    Solution:

    The monument table has three columns, just as it did before.

Lesson 02

SAS® Programming 2: Data Manipulation Techniques
Lesson 02, Section 1 Activity

Open p202a01.sas from the activities folder and perform the following tasks:

  1. Modify the program to retain TotalRain and set the initial value to 0.

  1. Run the program and examine the results. Why are all values for TotalRain missing after row 4?

  1. Change the assignment statement to use the SUM function instead of the plus symbol. Run the program again. Why are the results different?

SAS® Programming 2: Data Manipulation Techniques
Lesson 02, Section 1

Level 1 Practice: Producing a Running Total

The pg2.np_yearlytraffic table contains annual traffic counts at locations in national parks.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Open the pg2.np_yearlytraffic table. Notice that the Count column records the number of cars that have passed through a particular location.


  2. Open p202p01.sas from the practices folder.

    • Modify the DATA step to create a column, totTraffic, that is the running total of Count.
    • Keep the ParkName, Location, Count, and totTraffic columns in the output table.
    • Format totTraffic so that values are displayed with commas.
    • Submit the program and examine the output data.

    Solution:

    data totalTraffic;
        set pg2.np_yearlytraffic;
        retain totTraffic 0;
        totTraffic=totTraffic+Count;
        keep ParkName Location Count totTraffic; 
        format totTraffic comma12.;
    run;
    
    /*OR*/
    
    data totalTraffic;
        set pg2.np_yearlytraffic;
        totTraffic+Count;
        keep ParkName Location Count totTraffic; 
        format totTraffic comma12.;
    run;

  3. What is the value of totTraffic in row 6?

    Solution:

    1,957,186

SAS® Programming 2: Data Manipulation Techniques
Lesson 02, Section 1

Level 2 Practice: Producing Multiple Totals

The pg2.np_yearlytraffic table contains annual traffic counts at locations in national parks. Parks are classified as one of five types: National Monument, National Park, National Preserve, National River, and National Seashore.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Create a table, parkTypeTraffic, from the pg2.np_yearlytraffic table. Use the following specifications:

    • Read only the rows from the input table where ParkType is National Monument or National Park.
    • Create two new columns named MonumentTraffic and ParkTraffic. The value of each column should be increased by the value of Count for that park type.
    • Format the new columns so that values are displayed with commas.

    Solution:

    data work.parktypetraffic;
        set pg2.np_yearlyTraffic;
        where ParkType in ("National Monument", "National Park");
        if ParkType = 'National Monument' then MonumentTraffic+Count;
        else ParkTraffic+Count;
        format MonumentTraffic ParkTraffic comma15.;
    run;

  2. Create a listing report of parkTypeTraffic.

    • Use Accumulating Traffic Totals for Park Types as the report title.
    • Display the columns in this order: ParkType, ParkName, Location, Count, MonumentTraffic, and ParkTraffic.
    • Submit the program and view the results.

    Solution:

    title 'Accumulating Traffic Totals for Park Types';
    proc print data=work.parktypetraffic;
        var ParkType ParkName Location Count MonumentTraffic 
            ParkTraffic;
    run;
    title;

  3. Which row has the first nonzero value for MonumentTraffic?

    Solution:

    The first nonzero row for MonumentTraffic is row 12, with a value of 8,181.

  4. What is the value of ParkTraffic in row 10?

    Solution:

    1,646,974

SAS® Programming 2: Data Manipulation Techniques
Lesson 02, Section 1

Challenge Practice: Determining Maximum Amounts

The RETAIN statement can be used for purposes other than accumulating columns. Use the pg2.np_monthlytraffic table, which contains monthly traffic counts at locations in national parks. Create new columns that sequentially store the maximum value to date for Count, as well as the corresponding values for Month and Location.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Create a table, cuyahoga_maxtraffic, from the pg2.np_monthlytraffic table. Use the following specifications:

    • Include only rows where ParkName is equal to Cuyahoga Valley NP.
    • Create three columns: TrafficMax, MonthMax, and LocationMax. Initialize TrafficMax to 0.
    • If the current traffic count is greater than the value in TrafficMax, then:
      • set the value of TrafficMax equal to Count
      • set the value of MonthMax equal to Month, and
      • set the value of LocationMax equal to Location
    • Format the Count and TrafficMax columns so that values are displayed with commas.
    • Keep only the Location, Month, Count, TrafficMax, MonthMax, and LocationMax columns in the output table.
    • Submit the program and examine the ouput data.

    Solution:

    data cuyahoga_maxtraffic;
        set pg2.np_monthlyTraffic;
        where ParkName = 'Cuyahoga Valley NP';
        retain TrafficMax 0 MonthMax LocationMax;
        if Count>TrafficMax then do;
            TrafficMax=Count;
            MonthMax=Month;
            LocationMax=Location;
        end;
        format Count TrafficMax comma15.;
        keep Location Month Count TrafficMax MonthMax LocationMax;
    run;

  2. What is the value of TrafficMax in row 4, and why?

    Solution:

    TrafficMax retains the value of 1,447 from the previous row, because the 772 value of Count in row 4 is not greater than the current value of TrafficMax.

SAS® Programming 2: Data Manipulation Techniques
Lesson 02, Section 2 Activity

Open p202a03.sas from the activities folder and perform the following tasks:

  1. Modify the PROC SORT step to sort the rows within each value of Basin by MaxWindMPH. Highlight the PROC SORT step and run the selected code. Which row within each value of Basin represents the storm with the highest wind?

  1. Add the following WHERE statement immediately after the BY statement in the DATA step. The intent is to include only the last row within each value of Basin. Does the program run successfully?

  2. where last.Basin=1;
    
    

SAS® Programming 2: Data Manipulation Techniques
Lesson 02, Section 2 Activity

Open p202a04.sas from the activities folder and perform the following tasks:
Note: If this program is not in your activities folder, copy and paste this code into the code window.

  1. Change the WHERE statement to a subsetting IF statement and submit the program. How many rows are included in the output table?

  1. Move the subsetting IF statement just before the RUN statement and submit the program. How many rows are included in the output table?

  1. Consider the sequence of the statements in the execution phase. Where is the optimal placement of the subsetting IF statement?

SAS® Programming 2: Data Manipulation Techniques
Lesson 02, Section 2 Activity

Open p202a05.sas from the activities folder and perform the following tasks:

  1. Run the program. How many rows are in the output table?

  1. Add a subsetting IF statement to output only the final day of each month. Run the program.

  1. How many rows are in the output table?

SAS® Programming 2: Data Manipulation Techniques
Lesson 02, Section 2

Level 1 Practice: Generating an Accumulating Column within Groups

The pg2.np_yearlytraffic table contains annual traffic counts at locations in national parks. Park names are grouped into park types.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Open p202p04.sas from the practices folder. Complete the PROC SORT step to sort the pg2.np_yearlytraffic table by ParkType and ParkName.

    Solution:

    proc sort data=pg2.np_yearlyTraffic   
              out=sortedTraffic(keep=ParkType ParkName Location Count);
        by ParkType ParkName;
    run;

  2. Modify the DATA step as follows:

    • Read the sorted table created in PROC SORT.
    • Add a BY statement to group the data by ParkType.
    • Create a column, TypeCount, that is the running total of Count within each value of ParkType.
    • Format TypeCount so that values are displayed with commas.
    • Keep only the ParkType and TypeCount columns.
    • Submit the program and confirm that TypeCount is reset at the beginning of each ParkType group.

    Solution:

    data TypeTraffic;
        set work.sortedTraffic;
        by ParkType;
        if first.ParkType=1 then TypeCount=0;
        TypeCount+Count;
        format typeCount comma12.;
        keep ParkType TypeCount;
    run;

  3. Examine theTypeTraffic table and answer the following questions:

    • How many rows and columns are in the table?
    • What are the values of ParkType and TypeCount in row 85?


    Solution:

    • 478 rows, 2 columns
    • In row 85, ParkType is National Park and TypeCount is 377,759.


  4. Modify the program to write only the last row for each value of ParkType to the output table. Submit the program and examine the output data.

    Solution:

    data TypeTraffic;
        set work.sortedTraffic;
        by ParkType;
        if first.ParkType=1 then TypeCount=0;
        TypeCount+Count;
        if last.ParkType=1;
        format typeCount comma12.;
        keep ParkType TypeCount;
    run;

  5. Examine theTypeTraffic table and answer the following questions:

    • How many rows and columns are in the table?
    • What are the values of ParkType and TypeCount in row 5?


    Solution:

    • Five rows, two columns
    • In row 5, ParkType is National Seashore and TypeCount is 6,622,359.


SAS® Programming 2: Data Manipulation Techniques
Lesson 02, Section 2

Level 2 Practice: Generating an Accumulating Column within Multiple Groups

The sashelp.shoes table contains sales information for various products in each region and subsidiary. Numbers for sales and returns are recorded for each row. Create a summary table that includes the sum of Profit for each region and product.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Create a new program.

    • Create a sorted copy of sashelp.shoes that is ordered by Region and Product.
    • Use the DATA step to read the sorted table and create a new table named profitsummary.
    • Create a column named Profit that is the difference between Sales and Returns.
    • Create an accumulating column named TotalProfit that is a running total of Profit within each value of Region and Product.
    • Reset TotalProfit for each new combination of Region and Product.
    • Submit the program and verify that TotalProfit is accurate.

    Solution:

    proc sort data=sashelp.shoes out=sort_shoes;
        by Region Product;
    run;
    
    data profitsummary;
        set sort_shoes;
        by Region Product;
        Profit=Sales-Returns;
        if first.Product then TotalProfit=0;
        TotalProfit+Profit;
        format TotalProfit dollar12.;
    run;

  2. How many rows and columns are in the profitsummary table?

    Solution:

    The profitsummary table has 395 rows and nine columns.


  3. Modify the DATA step.

    • Include only the last row for each Region and Product combination.
    • Keep Region, Product, and TotalProfit.
    • Format TotalProfit as a currency value.
    • Submit the program and examine the output data.

    Solution:

    data profitsummary;
        set sort_shoes;
        by Region Product;
        Profit=Sales-Returns;
        if first.Product then TotalProfit=0;
        TotalProfit+Profit;
        if last.Product=1;
        keep Region Product TotalProfit;
        format TotalProfit dollar12.;
    run;

  4. How many rows and columns are in the profitsummary table?

    Solution:

    The profitsummary table has 80 rows and three columns.


SAS® Programming 2: Data Manipulation Techniques
Lesson 02, Section 2

Challenge Practice: Creating Multiple Output Tables Based on Group Values

The pg2.np_acres table contains acreage amounts for national parks. The park state is also provided. However, some parks span multiple states and therefore have multiple rows of data.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Create a new program.

    • Write a DATA step that creates two tables, singlestate and multistate, from the pg2.np_acres table.
      • The singlestate table includes the rows with unique park names.
      • The multistate table includes the rows with park names that appear in multiple states.
    • The parks should be grouped within their associated regions.
    • When sorting the data, keep only the Region, ParkName, State, and GrossAcres columns.
    • Submit the program and examine the output data.

    Solution:

    proc sort data=pg2.np_acres 
              out=sortedAcres(keep=Region ParkName State GrossAcres);
        by Region ParkName;
    run;
    	
    data multiState singleState;
        set sortedAcres;
        by Region ParkName;
        if First.ParkName=1 and Last.ParkName=1 
            then output singleState;
        else output multiState;
        format GrossAcres comma15.;
    run;

  2. How many rows are in the singlestate table?

    Solution:

    The singlestate table has 367 rows.


  3. In the multistate table, in which two states does the Hovenweep NM value appear for ParkName?

    Solution:

    The park appears in both CO and UT.


Lesson 03

SAS® Programming 2: Data Manipulation Techniques
Lesson 03, Section 1 Activity

Open p203a01.sas from the activities folder and perform the following tasks:

  1. Run the program. Why does the DATA step fail?

  1. Correct the error by overwriting the value of the column Name in uppercase.

  1. Examine the expressions for Mean1, Mean2, and Mean3. Each one is a method for specifying a list of columns as arguments in a function. Run the program and verify that the values in these three columns are the same.
  1. In the expression for Mean2, delete the keyword OF and run the program. What do the values in Mean2 represent?

SAS® Programming 2: Data Manipulation Techniques
Lesson 03, Section 1 Activity

Open p203a02.sas from the activities folder and perform the following tasks:

  1. Examine the program and notice all quiz scores for two students are changed to missing values. Highlight the first DATA step and submit the selected code.

  2. Open SAS Help at http://support.sas.com/documentation.

  3. Click the Programming: SAS 9.4 and Viya link. ​In the Syntax – Quick Links section, click CALL Routines. Use the documentation to read about the CALL MISSING routine.

  4. Simplify the second DATA step by using CALL MISSING to assign missing values for the two students' quiz scores. Run the step.


SAS® Programming 2: Data Manipulation Techniques
Lesson 03, Section 2 Activity

Open p203a03.sas from the activities folder and perform the following tasks:

  1. Notice that the expressions for WindAvg1 and WindAvg2 are the same. Run the program and examine the output table.

  2. Modify the WindAvg1 expression to use the ROUND function to round values to the nearest tenth (.1).

  1. Add a FORMAT statement to format WindAvg2 with the 5.1 format. Run the program. What is the difference between using a function and a format?

SAS® Programming 2: Data Manipulation Techniques
Lesson 03, Section 2 Activity

Open p203a04.sas from the activities folder and perform the following tasks:

  1. Notice that the INTCK function does not include the optional method argument, so the default discrete method is used to calculate the number of weekly boundaries (ending each Saturday) between StartDate and EndDate.

  2. Run the program and examine rows 8 and 9. Both storms were two days, but why are the values assigned to Weeks different?

  1. Add 'c' as the fourth argument in the INTCK function to use the continuous method. Run the program. Are the values for Weeks in rows 8 and 9 different?

SAS® Programming 2: Data Manipulation Techniques
Lesson 03, Section 2

Level 1 Practice: Using the LARGEST and ROUND Functions

The pg2.np_lodging table contains statistics for lodging from 2010 through 2017. Each column name starts with CL followed by the year. (For example, CL2010 contains the number of nights stayed in 2010 for that park.)

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Open the p203p01.sas program from the practices folder. Submit the PROC PRINT step. Examine the column names and the 10 rows printed from the np_lodging table.


  2. Modify the program.

    • Use the LARGEST function to create three new columns (Stay1, Stay2, and Stay3) whose values are the first, second, and third highest number of nights stayed from 2010 through 2017. Note: Use column list abbreviations to avoid typing each column name.
    • Use the MEAN function to create a column named StayAvg that is the average number of nights stayed for the years 2010 through 2017.
    • Use the ROUND function to round values to the nearest integer.
    • Add a subsetting IF statement to output only rows with StayAvg greater than zero.
    • Submit the DATA step and examine the output data.

    Solution:

    data stays;
        set pg2.np_lodging;
        Stay1=largest(1, of CL:);
        Stay2=largest(2, of CL:);
        Stay3=largest(3, of CL:);
        StayAvg=round(mean(of CL:));
        if StayAvg > 0;
        format Stay: comma11.;
        keep Park Stay:;
    run;

  3. How many rows are in the stays table?

    Solution:

    The stays table has 44 rows.

  4. What is the value of StayAvg in row 10?

    Solution:

    35,551

SAS® Programming 2: Data Manipulation Techniques
Lesson 03, Section 2

Level 2 Practice: Working with Date/Time Values

The pg2.np_hourlyrain table contains hourly rain amounts for the Panther Junction, TX, station located in Big Bend National Park. The DateTime column contains date/time values.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Open the p203p02.sas program from the practices folder. Submit the program and notice that each row includes a datetime value and rain amount. The MonthlyRainTotal column represents a cumulative total of Rain for each value of Month.


  2. Modify the program.

    • Uncomment the subsetting IF statement to continue processing a row only if it is the last row within each month.
    • After the subsetting IF statement, create the following new columns:
      • Date – the date portion of the DateTime column
      • MonthEnd – the last day of the month
    • Format Date and MonthEnd as a date value.
    • Keep only the StationName, MonthlyRainTotal, Date, and MonthEnd columns.
    • Submit the program and examine the output data.

    Solution:

    data rainsummary;
        set pg2.np_hourlyrain;
        by Month;
        if first.Month=1 then MonthlyRainTotal=0;
        MonthlyRainTotal+Rain;
        if last.Month=1;
        Date=datepart(DateTime);
        MonthEnd=intnx('month',Date,0,'end');
        format Date MonthEnd date9.;
        keep StationName MonthlyRainTotal Date MonthEnd;
    run;

  3. What are the values of Date and MonthEnd in row 1?

    Solution:

    Date has a value of 24JAN2017 and MonthEnd is 31JAN2017.

SAS® Programming 2: Data Manipulation Techniques
Lesson 03, Section 2

Challenge Practice: Creating Projected Date Values

The pg2.np_weather table contains weather-related statistics for locations in four national parks. Determine the number of weeks between the first and last snowfall in each park for the 2015-2016 winter season.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Open the p203p03.sas program from the practices folder. The program contains a PROC SORT step that creates the winter2015_2016 table. This table contains rows with dates with some snowfall between October 1, 2015, and June 1, 2016, sorted by Code and Date. Only the Name, Code, Date, and Snow columns are kept.


  2. Modify the DATA step to create the snowforecast table based on the following specifications:

    • Process the data in groups by Code.
    • For the first row within each Code group, create a new column named FirstSnow that is the date of the first snowfall for that code.
    • For the last row within each Code group, do the following:
      • Create a new column named LastSnow that is the date of the last snowfall for that code.
      • Create a new column named WinterLengthWeeks that counts the number of full weeks between the FirstSnow and LastSnow dates.
      • Create a new column named ProjectedFirstSnow that is the same day of the first snowfall for the next year.
      • Output the row to the new table.
    • Be sure to retain the values of FirstSnow in the PDV so that they will be included with the rows that are in the output table.
    • Apply the DATE7. format to the FirstSnow, LastSnow, and ProjectedFirstSnow columns, and drop the Date and Snow columns.
    • Submit the program and examine the output data.

    Solution:

    proc sort data=pg2.np_weather(keep=Name Code Date Snow)
              out=winter2015_2016;
        where date between '01Oct15'd and '01Jun16'd and Snow > 0;
        by Code Date;
    run;
    
    data snowforecast;
        set winter2015_2016;
        retain FirstSnow;
        by Code;
        if first.Code then FirstSnow=Date;
        if last.Code then do;
            LastSnow=Date;
            WinterLengthWeeks=intck('week', FirstSnow, LastSnow, 'c');
            ProjectedFirstSnow=intnx('year', FirstSnow, 1, 'same');
            output;
        end;
        format FirstSnow LastSnow ProjectedFirstSnow date7.;
        drop Snow Date;	
    run;

  3. What is the value of WinterLengthWeeks in Moose, WY?

    Solution:

    22

SAS® Programming 2: Data Manipulation Techniques
Lesson 03, Section 3 Activity

Open p203a06.sas from the activities folder and perform the following tasks:

  1. Complete the NewLocation assignment statement to use the COMPBL function to read Location and convert each occurrence of two or more consecutive blanks into a single blank.

  1. Complete the NewStation assignment to use the COMPRESS function with Station as the only argument. Run the program. Which characters are removed in the NewStation column?

  1. Add a second argument in the COMPRESS function to specify the characters to remove. All characters should be enclosed in a single set of quotation marks. Run the program.

SAS® Programming 2: Data Manipulation Techniques
Lesson 03, Section 3 Activity

Open p203a07.sas from the activities folder and perform the following tasks:

  1. Notice that the code contains a subsetting IF statement that writes rows to output only if Prefecture is Tokyo. Run the program. How many rows are in the output table?

  1. Either use the DATA step debugger in Enterprise Guide or uncomment the PUTLOG statement to view the values of Prefecture as the step executes. Why is the subsetting IF condition always false?

  1. Modify the program to correct the logic error. Run the program and confirm that four rows are returned.

SAS® Programming 2: Data Manipulation Techniques
Lesson 03, Section 3 Activity

Open p203a08.sas from the activities folder and perform the following tasks:

  1. Notice that the assignment statement for CategoryLoc uses the FIND function to search for category within each value of the Summary column. Run the program.
  1. Examine the PROC PRINT report. Why is CategoryLoc equal to 0 in row 1? Why is CategoryLoc equal to 0 in row 15?

  1. Modify the FIND function to make the search case insensitive. Uncomment the IF-THEN statement to create a new column named Category. Run the program and examine the results. How are values for Category generated?

SAS® Programming 2: Data Manipulation Techniques
Lesson 03, Section 3 Activity

Open p203a09.sas from the activities folder and perform the following tasks:

  1. Examine the assignment statements that use the CAT and CATS functions to create StormID1 and StormID2. Run the program. How do the two columns differ?

  1. Add an assignment statement to create StormID3 that uses the CATX function to concatenate Name, Season, and Day with a hyphen inserted between each value. Run the program.

  1. Modify the StormID2 assignment statement to insert a hyphen only between Name and Season.

SAS® Programming 2: Data Manipulation Techniques
Lesson 03, Section 3

Level 1 Practice: Using the SCAN and PROPCASE Functions

The pg2.np_monthlytraffic table contains monthly traffic statistics for national parks. However, the data has some inconsistencies. There is no column containing park type, and the gate location does not use proper case.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Open the p203p04.sas program from the practices folder. Submit the program and examine the data. Notice that ParkName includes a code at the end of each value that represents the park type. Also notice that some of the values for Location are in uppercase.


  2. Modify the program.

    • Add a LENGTH statement to create a new five-character column named Type.
    • Add an assignment statement that uses the SCAN function to extract the last word from the ParkName column and assigns the resulting value to Type.
    • Add an assignment statement to use the UPCASE and COMPRESS functions to change the case of Region and remove any blanks.
    • Add an assignment statement to use the PROPCASE function to change the case of Location.
    • Submit the program and examine the output data.

    Solution:

    data clean_traffic;
        set pg2.np_monthlytraffic;
        drop Year;
        length Type $ 5;
        Type=scan(ParkName, -1);
        Region=upcase(compress(Region));
        Location=propcase(Location);
    run;

  3. How many rows and columns are in the clean_traffic table?

    Solution:

    The clean_traffic table has 5140 rows and seven columns.


  4. What is the value of Type in row 100?

    Solution:

    NM


SAS® Programming 2: Data Manipulation Techniques
Lesson 03, Section 3

Level 2 Practice: Searching for Character Strings

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Open the p203p05.sas program from the practices folder. Notice that the DATA step creates a table named parks and reads only those rows where ParkName ends with NP.


  2. Modify the DATA step to create or modify the following columns:

    • Use the SUBSTR function to create a new column named Park that reads each ParkName value and excludes the NP code at the end of the string.
      • Note: Use the FIND function to identify the position number of the NP string. That value can be used as the third argument of the SUBSTR function to specify how many characters to read.
    • Convert the Location column to proper case. Use the COMPBL function to remove any extra blanks between words.
    • Use the TRANWRD function to create a new column named Gate that reads Location and converts the string Traffic Count At to a blank.
    • Create a new column named GateCode that concatenates ParkCode and Gate together with a single hyphen between the strings.
    • Submit the program and examine the output data.

    Solution:

    data parks;
        set pg2.np_monthlytraffic;
        where ParkName like '%NP';
        Park=substr(ParkName, 1, find(ParkName,'NP')-2);
        Location=compbl(propcase(Location));
        Gate=tranwrd(Location, 'Traffic Count At ', ' ');
        GateCode=catx('-', ParkCode, Gate);
    run;
    
    proc print data=parks;
        var Park GateCode Month Count;
    run;

  3. What is the value of Park and GateCode in the first row of the PROC PRINT results?

    Solution:

    The value of Park is Acadia and the value of GateCode is ACAD-Sand Beach in the first row of the PROC PRINT results.


SAS® Programming 2: Data Manipulation Techniques
Lesson 03, Section 3

Challenge Practice: Determining the Maximum Length of a Column

The pg2.np_unstructured_codes table contains a single column whose contents include location codes and names. Create a table that efficiently stores the location code and location name.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Open the p203p06.sas program from the practices folder. Submit the program.

    • Examine the output report. Notice that the Column1 column contains raw data with values separated by various symbols. The SCAN function is used to extract the ParkCode and ParkName values.
    • Examine the PROC CONTENTS report. Notice that ParkCode and ParkName have a length of 200, which is the same as Column1.

      Note: When the SCAN function creates a new column, the new column will have the same length as the column listed as the first argument.


  2. Modify the program.

    • The ParkCode column should include only the first four characters in the string.
    • Add a LENGTH statement to define the length of ParkCode as 4.
    • The length for the ParkName column can be optimized by determining the longest string and setting an appropriate length. Create a new column named NameLength that uses the LENGTH function to return the position of the last non-blank character for each value of ParkName.
    • Use a RETAIN statement to create a new column named MaxLength that has an initial value of zero.
    • Use an assignment statement and the MAX function to set the value of MaxLength to the current value of either NameLength or MaxLength, whichever is larger.
    • Use the END= option in the SET statement to create a temporary variable in the PDV named LastRow. LastRow will be zero for all rows until the last row of the table, when it will be 1. Add an IF-THEN statement to write the value of MaxLength to the log if the value of LastRow is 1.
      data parklookup;
          set pg2.np_unstructured_codes end=LastRow;
          ...
          if LastRow=1 then putlog MaxLength=;
      run;
      Note: The statements added to determine the maximum length can be deleted or commented.

    Solution:

    data parklookup;
        set pg2.np_unstructured_codes end=lastrow;
        length ParkCode $ 4;
        ParkCode=scan(Column1, 2, '{}:,"()-');
        ParkName=scan(Column1, 4, '{}:,"()');
        retain MaxLength 0;
        NameLength=length(ParkName);
        MaxLength=max(NameLength,MaxLength);
        if lastrow=1 then putlog MaxLength=;
    run;
    
    proc print data=parklookup(obs=10);
    run;
    
    proc contents data=parklookup;
    run;

  3. Submit the DATA step. Examine the output data to confirm that the MaxLength column sequentially stores the maximum value for NameLength. View the log to determine the last value of MaxLength.


  4. What is the final value of MaxLength?

    Solution:

    The final value of MaxLength is 83.


  5. Modify the LENGTH statement to set the length of ParkName to the maximum length. Submit the program and confirm in the PROC CONTENTS report that the lengths of the new columns are optimized.

    Note: The statements added to determine the maximum length can be deleted or commented.

    Solution:

    data parklookup;
        set pg2.np_unstructured_codes end=lastrow;
        length ParkCode $ 4 ParkName $ 83;
        ParkCode=scan(Column1, 2, '{}:,"()-');
        ParkName=scan(Column1, 4, '{}:,"()');
    /*     retain MaxLength 0; */
    /*     NameLength=length(ParkName); */
    /*     MaxLength=max(NameLength,MaxLength); */
    /*     if lastrow=1 then putlog MaxLength=; */
    run;
    
    proc print data=parklookup(obs=10);
    run;
    
    proc contents data=parklookup;
    run;

  6. Examine the PROC CONTENTS ouptut. What is the length of each character column?

    Solution:

    • Column1 has a length of 200.
    • ParkCode has a length of 4.
    • ParkName has a length of 83.


SAS® Programming 2: Data Manipulation Techniques
Lesson 03, Section 4 Activity

Open p203a10.sas from the activities folder and perform the following tasks:

  1. Highlight the PROC CONTENTS step and run the selected code. What is the column type of High, Low, and Volume?

  1. Highlight the DATA and PROC PRINT steps and run the selected code. Notice that although High is a character column, the Range column is accurately calculated.
  1. Open the log. Read the note printed immediately after the DATA step.

  1. Uncomment the DailyVol assignment statement and run the program. Is DailyVol created successfully?

SAS® Programming 2: Data Manipulation Techniques
Lesson 03, Section 4 Activity

Open p203a11.sas from the activities folder and perform the following tasks:

  1. Run the program. In the output table, verify that Date2 is created as numeric. Notice that the table contains a character column named Volume.
  1. Add an assignment statement to create a column named Volume2. Use the INPUT function to read Volume using the COMMA12. informat. Run the program and verify that Volume2 is created as a numeric column.

  1. In the assignment statement, change Volume2 to Volume so that you update the value of the existing column.

  1. Run the program and notice that Volume is still character. Why is the assignment statement not changing the column type?

SAS® Programming 2: Data Manipulation Techniques
Lesson 03, Section 4 Activity

Open p203a13.sas from the activities folder and perform the following tasks:

  1. Add to the RENAME= option to rename the input column Date as CharDate.

  2. Add an assignment statement to create a numeric column Date from the character column CharDate. The values of CharDate are stored as 01JAN2018.

  3. Modify the DROP statement to eliminate all columns that begin with Char from the output table.

  4. Run the program and verify that Volume and Date are numeric columns.


Lesson 04

SAS® Programming 2: Data Manipulation Techniques
Lesson 04, Section 1 Activity

Open p204a01.sas from the activities folder and perform the following tasks:

  1. Add a FORMAT statement in the DATA step to format the following values:

    • Date: display three-letter month and four-digit year values
    • Volume: add commas
    • CloseOpenDiff and HighLowDiff: add dollar signs and include two decimal places.

  1. Run the program and verify the formatted values in the PROC PRINT output.
  1. Add a FORMAT statement in the PROC MEANS step to format the values of Date to show only a four-digit year. Run the PROC MEANS step again.

  1. What is the advantage of adding a FORMAT statement to the DATA step versus the PROC step?

SAS® Programming 2: Data Manipulation Techniques
Lesson 04, Section 1 Activity

Open p204a02.sas from the activities folder and perform the following tasks:

  1. In the PROC FORMAT step, modify the second VALUE statement to create a format named HRANGE that has the following criteria:
    • A range of 50 – 57 has a formatted value of Below Average.
    • A range of 58 – 60 has a formatted value of Average.
    • A range of 61 – 70 has a formatted value of Above Average.

  1. In the PROC PRINT step, modify the FORMAT statement to format Height with the HRANGE format.

  1. Run the program and verify the formatted values in the PRINT output.
  1. Why is the Height value for the first row not formatted?

SAS® Programming 2: Data Manipulation Techniques
Lesson 04, Section 1 Activity

Open p204a03.sas from the activities folder and perform the following tasks:

  1. Review the PROC FORMAT step that creates the $REGION format that assigns basin codes into groups. Highlight the step and run the selected code.
  1. Notice the DATA step includes IF-THEN/ELSE statements to create a new column named BasinGroup.
  1. Delete the IF-THEN/ELSE statements and replace them with an assignment statement to create the BasinGroup column. Use the PUT function with Basin as the first argument and $REGION. as the second argument.

  1. Highlight the DATA and PROC MEANS steps and run the selected code. How many BasinGroup values are in the summary report?

SAS® Programming 2: Data Manipulation Techniques
Lesson 04, Section 1

Level 1 Practice: Creating Custom Formats Based on Single Values

The pg2.np_summary table contains public use statistics from the National Park Service. The values of the Reg column represent park region as a code. Create a format that, when applied, displays full descriptive values for the regions with high frequency.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Open p204p01.sas from the practices folder. Submit the PROC FREQ step and review the output. Notice that regional codes are used, not descriptive values.


  2. Modify the program.

    • Add a VALUE statement to the PROC FORMAT step to create a format named $HIGHREG that defines the descriptive values shown below.

      Code Value
      IM Intermountain
      PW Pacific West
      SE Southeast
      other codes All Other Regions

    • Add a FORMAT statement to the PROC FREQ step so that the $HIGHREG format is applied to the Reg column.
    • Submit the program and examine the output. Verify that the descriptive values for the Reg column are displayed.

    Solution:

    proc format;
        value $highreg 'IM'='Intermountain'
                       'PW'='Pacific West'
                       'SE'='Southeast'
                       other='All Other Regions';
    run;
    
    title 'High Frequency Regions';
    proc freq data=pg2.np_summary order=freq;
        tables Reg;
        label Reg='Region';
        format Reg $highreg.;
    run;
    title;

SAS® Programming 2: Data Manipulation Techniques
Lesson 04, Section 1

Level 2 Practice: Creating Custom Formats Based on a Range of Values

The pg2.np_acres table contains acreage amounts for national parks. Create a format that, when applied, groups acreage amounts into identified categories.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Open p204p02.sas from the practices folder. Modify the program.

    • Before the DATA step, add a PROC FORMAT step to create a format named PSIZE that categorizes parks based on the gross acres. Use the ranges and values as identified below.

      Range Value
      Less than 10,000 acres Small
      10,000 through less than 500,000 acres Average
      500,000 and more acres Large

    • In the DATA step, add an assignment statement to create a new column named ParkSize. Use the PUT function to create the new column based on the formatted values of GrossAcres.
    • Submit the program and examine the output data.

    Solution:

    proc format;
        value psize low-<10000='Small'
                    10000-<500000='Average'
                    500000-high='Large';
    run;
    
    data np_parksize;
        set pg2.np_acres;
        ParkSize=put(GrossAcres,psize.);
        format GrossAcres comma16.;
    run;

  2. What is the value of ParkSize in row 1?

    Solution:

    Small

SAS® Programming 2: Data Manipulation Techniques
Lesson 04, Section 1

Challenge Practice: Creating Custom Formats Based on Nesting Formats

The pg2.np_weather table contains weather-related statistics for four national park locations. Create a format that, when applied, groups dates into identified categories.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Access the Base SAS® 9.4 Procedures Guide. Find the PROC FORMAT section and the VALUE statement page. Scroll to the bottom of the page to look at examples where existing SAS formats are used for labels in a custom format.


  2. Open p204p03.sas from the practices folder. Modify the program.

    • Add a PROC FORMAT step to create a format named DECADE that categorizes dates as identified below.
      • Dates from January 1, 2000 – December 31, 2009 are displayed with the value 2000-2009.
      • Dates from January 1, 2010 – December 31, 2017 are displayed with the value 2010-2017.
      • Dates from January 1, 2018 – March 31, 2018 are displayed with the value 1st Quarter 2018.
      • Dates from April 1, 2018, and beyond display the actual date value using the MMDDYY10. format.
    • Modify the PROC MEANS step so that the DECADE format is applied to the Date column.
    • Submit the program and examine the output. Verify that the descriptive values for the Date column are displayed.

    Solution:

    proc format;	
        value decade '01Jan2000'd-'31Dec2009'd = '2000-2009'
                     '01Jan2010'd-'31Dec2017'd = '2010-2017'
                     '01Jan2018'd-'31Mar2018'd = '1st Quarter 2018'
                     '01Apr2018'd-high = [mmddyy10.];
    run;
    
    title1 'Precipitation and Snowfall';
    title2 'Note: Amounts shown in inches';
    proc means data=pg2.np_weather maxdec=2 sum mean nonobs;
        where Prcp > 0 or Snow > 0;
        var Prcp Snow;
        class Date Name;
        format Date decade.;
    run;
    title;

  3. Examine the PROC MEANS output. What was the sum and mean Snowfall in Moose, WY, from 2000-2009?

    Solution:

    Sum: 1487.80
    Mean: 1.24


SAS® Programming 2: Data Manipulation Techniques
Lesson 04, Section 2 Activity

Open p204a04.sas from the activities folder and perform the following tasks:

  1. Run the program to create the $SBFMT and CATFMT formats. View the log to confirm both were output.
  1. Uncomment the PROC FORMAT step at the end of the program. Highlight the step and run the selected code. A report for all formats in the WORK library is generated.
  1. Add the following statement in the last PROC FORMAT step to limit the report to selected formats. Run the step.

  2. select $sbfmt catfmt; 
  1. What is the default length for the $SBFMT format?

SAS® Programming 2: Data Manipulation Techniques
Lesson 04, Section 2 Activity

Open p204a05.sas from the activities folder and perform the following tasks:

  1. In the PROC FORMAT statement, add the LIBRARY= option to save the formats to the pg2.formats catalog.

  1. Run the PROC FORMAT step and verify in the log that the two formats were created in a permanent location.
  1. Before the PROC PRINT step, add an OPTIONS statement so that SAS can find the two permanent formats.

  1. Run the OPTIONS statement and the PROC PRINT step. Are the Registration and Height values formatted?

SAS® Programming 2: Data Manipulation Techniques
Lesson 04, Section 2

Level 1 Practice: Creating a Custom Format from a Table

The pg2.np_monthlytraffic table contains monthly traffic counts at locations in national parks. Create a format that categorizes park codes into their type (for example, National Park, National Seashore, and so on). The pg2.np_codelookup table contains park codes and the associated park types.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Open p204p04.sas from the practices folder. Submit the PROC MEANS step and review the output. Notice that the traffic statistics are listed by a four-letter park code.


  2. Open the pg2.np_codelookup table. Notice that ParkCode contains the four-letter park code and Type contains the type of park.


  3. Modify the program.

    • In the DATA step, do the following:
      • Add a RENAME= data set option to the SET statement to rename the ParkCode column to Start and the Type column to Label.
      • Add a RETAIN statement before the SET statement to create the FmtName column with a value of $TypeFmt (without a period at the end).
    • In the PROC FORMAT statement, add a CNTLIN= option to build a format from the type_lookup table.
    • In the PROC MEANS step, add a FORMAT statement so that the $TypeFmt format is applied to the ParkCode column.
    • Submit the program and examine the results. Verify that the data is grouped by park types.

    Solution:

    data type_lookup;
        retain FmtName '$TypeFmt';
        set pg2.np_codeLookup(rename=(ParkCode=Start Type=Label));
        keep Start Label FmtName;
    run;
    
    proc format cntlin=type_lookup;
    run;
    
    title 'Traffic Statistics';
    proc means data=pg2.np_monthlyTraffic maxdec=0 mean sum nonobs;
        var Count;
        class ParkCode Month;
        label ParkCode='Name';
        format ParkCode $TypeFmt.;
    run;
    title;

SAS® Programming 2: Data Manipulation Techniques
Lesson 04, Section 2

Level 2 Practice: Creating a Custom Format from a Table

The pg2.np_species table provides a detailed species list for selected national parks. Create a format that categorizes park codes into regions (for example, Northeast or Intermountain). Use the pg2.np_codelookup table to create a custom format.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Open p204p05.sas from the practices folder. Modify the first DATA step to create the np_lookup table that will be used to build a custom format.

    • Add a RETAIN statement to create the FmtName column with a value of $RegLbl.
    • Add a RENAME= data set option to the SET statement to rename the ParkCode column to Start.
    • Add conditional statements to create the Label column.
      • The Label column is equal to the Region column unless the region is missing. In that case, the Label column is equal to a value of Unknown.
    • Add a KEEP statement to include the Start, Label, and FmtName columns.
    • Submit the first DATA step and examine the output data.

    Solution:

    data np_lookup;
        retain FmtName '$RegLbl';
        set pg2.np_codeLookup(rename=(ParkCode=Start));
        if Region ne ' ' then Label=Region;
        else Label='Unknown';
        keep Start Label FmtName;
    run;

  2. How many columns are in the np_lookup table? What is the value of FmtName and Label in the first row?

    Solution:

    There are three columns in the np_lookup table. FmtName is $regLbl and Label is Southeast in the first row.


  3. Modify the PROC FORMAT step to read in the np_lookup table.

    Solution:

    proc format cntlin=np_lookup;
    run;

  4. Modify the second DATA step.

    • Create a new column named Region. Use the PUT function to create the new column based on using the $RegLbl format on the ParkCode column.
    • Submit this DATA step and the PROC FREQ step and view the output.

    Solution:

    data np_endanger;
        set pg2.np_species;
        where Conservation_Status='Endangered';
        Region=put(ParkCode,$RegLbl.);
    run;

  5. Which region has the highest number of endangered species?

    Solution:

    Pacific West


SAS® Programming 2: Data Manipulation Techniques
Lesson 04, Section 2

Challenge Practice: Updating a Custom Format by Using the CNTLOUT= Option

The pg2.np_summary table contains public use statistics from the National Park Service. The values of the Type column represent the park type as a code. A format is applied to display descriptive values for the park types.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Open p204p06.sas from the practices folder. Submit the program and review the results. Notice that some of the park types are still displayed as codes because the custom format does not include a label for those values.


  2. Write a PROC FORMAT step.

    • Use the CNTLOUT= option to create a table named typfmtout from the existing $TypCode format.
    • Submit the step and view the output table.

    The typfmtout table contains several extra columns, but the critical columns for this practice are FmtName, Start, and Label. Notice that the values for FmtName do not include the $ as a prefix.

    Solution:

    proc format cntlout=typfmtout;
        select $TypCode;
    run;

  3. Open the pg2.np_newcodes table. Notice that it contains the format name, the Type values, and the labels in the FmtName, Start, and Label columns.


  4. Write a DATA step.

    • Create a table named typfmt_update by concatenating the output table from PROC FORMAT and the pg2.np_newcodes table.
    • Change the values of FmtName to $TypCode.
    • Keep only the FmtName, Start, and Label columns.
    • Submit this step.

    Solution:

    data typfmt_update;
        set typfmtout pg2.np_newcodes;
        keep FmtName Start Label;
        FmtName='$TypCode';
    run;

  5. Write a PROC FORMAT step.

    • Re-create the $TypCode format using the CNTLIN= option to read the new table that contains the updated format values.
    • Submit this step.

    Solution:

    proc format cntlin=typfmt_update;
    run;

  6. Submit the PROC FREQ step again and verify that all Type codes are displayed with labels.


  7. What is the frequency value of National Preserve?

    Solution:

    The frequency value of National Preserve is 7.

Lesson 05

SAS® Programming 2: Data Manipulation Techniques
Lesson 05, Section 1 Activity

Open p205a01.sas from the activities folder and perform the following tasks:

  1. Notice that the SET statement concatenates the sashelp.class and pg2.class_new2 tables. Highlight the DATA step and run the selected code. What differences do you observe between the first 19 rows and the last 3 rows?

  1. Use the RENAME= data set option to change Student to Name in the pg2.class_new2 table. Highlight the DATA step and run the selected code. What warning is issued in the log?

  1. Highlight the two PROC CONTENTS steps and run the selected code. What is the length of Name in sashelp.class and Student in pg2.class_new2?

SAS® Programming 2: Data Manipulation Techniques
Lesson 05, Section 1

Level 1 Practice: Concatenating Like-Structured Tables

Create a table that contains monthly public use statistics for 2015 and 2016 from the National Park Service.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Open p205p01.sas from the practices folder. Modify the program.

    • Complete the SET statement to concatenate the pg2.np_2015 and pg2.np_2016 tables to create a new table, np_combine.
    • Use a WHERE statement to include only rows where Month is 6, 7, or 8.
    • Create a new column named CampTotal that is the sum of CampingOther, CampingTent, CampingRV, and CampingBackcountry.
      • Use a column list to specify that all columns beginning with Camping be included as arguments in the SUM function.
      • Format the new column with commas.
    • Write a PROC SORT step to order the np_combine table by ParkCode.
    • Submit the program and examine the output data.

    Solution:

    data work.np_combine;
        set pg2.np_2015 pg2.np_2016;
        CampTotal=sum(of Camping:);
        where Month in(6, 7, 8);
        format CampTotal comma15.;
        drop Camping:;
    run;

  2. How many rows are in the np_combine table?

    Solution:

    The np_combine table has 2208 rows.

  3. Which month in 2016 had the highest number of campers (CampTotal) for ParkCode ACAD (Acadia National Park)?

    Solution:

    Month 8 (August) had the highest number of campers at 52,947.

SAS® Programming 2: Data Manipulation Techniques
Lesson 05, Section 1

Level 2 Practice: Concatenating Unlike-Structured Tables

Create a table that contains monthly public use statistics for 2014, 2015, and 2016 from the National Park Service.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Complete the Level 1 practice or submit the following code:
    data work.np_combine;
        set pg2.np_2015 pg2.np_2016;
        CampTotal=sum(of Camping:);
        where Month in(6, 7, 8);
        format CampTotal comma15.;
        drop Camping:;
    run;

  2. Open the pg2.np_2014 table and compare the column names with the np_combine table. Which column or columns in np_2014 must be renamed to match columns in np_combine?

    Solution:

    The Park and Type columns must be renamed.

  3. Modify the program.

    • Concatenate the pg2.np_2014, pg2.np_2015, and pg2.np_2016 tables.
    • Rename the columns as necessary to align the columns with similar values.
    • In addition to filtering rows by Month, also include only rows where ParkType is National Park.
    • Arrange the newly created table in ascending order by ParkType, ParkCode, Year, and Month.
    • Submit the program and examine the output data.

    Solution:

    data work.np_combine;
        set pg2.np_2014(rename=(Park=ParkCode Type=ParkType))
            pg2.np_2015 
            pg2.np_2016;
        CampTotal=sum(of Camping:);
        where Month in(6, 7, 8) and ParkType="National Park";
        format CampTotal comma15.;
        drop Camping:;
    run;
    
    
    proc sort data=np_combine;
        by ParkType ParkCode Year Month;
    run;

  4. How many rows are in the np_combine table, and what is the value of ParkCode in row 1?

    Solution:

    The np_combine table has 531 rows, and the ParkCode value is ACAD in row 1.

SAS® Programming 2: Data Manipulation Techniques
Lesson 05, Section 2 Activity

Open p205a02.sas from the activities folder and perform the following tasks:

  1. Highlight the two PROC SORT steps and run the selected code. How many rows per Name are in the teachers_sort and test2_sort tables?

  1. Complete the DATA step to merge the sorted tables by Name. Run the DATA step and examine the log and results. How many rows are in the output table?

SAS® Programming 2: Data Manipulation Techniques
Lesson 05, Section 3 Activity

Open p205a04.sas from the activities folder and perform the following tasks:

  1. Modify the final DATA step to create an additional table named storm_other that includes all nonmatching rows.
  1. Drop the Cost column from the storm_other table only.
  1. How many rows are in the storm_other table?

SAS® Programming 2: Data Manipulation Techniques
Lesson 05, Section 3

Level 1 Practice: Performing a One-to-Many Merge

The pg2.np_2016traffic table contains monthly traffic statistics from the National Park Service for parks. Create a table that contains the monthly traffic statistics from the pg2.np_2016traffic table and adds a column for the park name. Park name values can be found in the matching pg2.np_codelookup table.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Open p205p03.sas from the practices folder. Submit the two PROC SORT steps. Determine the name of the common column in the sorted tables.

    Solution:

    The common column is ParkCode.

  2. Modify the program.

    • Modify the second PROC SORT step to use the RENAME= option after the pg2.np_2016traffic table to rename Code to ParkCode. Note: You could also rename the column in the DATA step after the table in the MERGE statement.
    • Modify the BY statement to sort by the new column name.
    • Write a DATA step to merge the sorted tables by the common column to create a new table, work.trafficstats.
    • Drop the Name_Code column from the output table.
    • Submit the program and examine the work.trafficstats table.

    Solution:

    proc sort data=pg2.np_codelookup out=work.codesort;
        by ParkCode;
    run;
    
    proc sort data=pg2.np_2016traffic(rename=(Code=ParkCode)) 
              out=work.traf2016Sort;
        by ParkCode month;
    run;
    
    data work.trafficstats;
        merge work.traf2016Sort 
              work.codesort;
        by ParkCode;
        drop Name_Code;
    run;

  3. How many rows are in the trafficstats table, and what is the value of ParkCode in row 1?

    Solution:

    The trafficstats table has 2980 rows, and the ParkCode value is ABLI in row 1.

SAS® Programming 2: Data Manipulation Techniques
Lesson 05, Section 3

Level 2 Practice: Writing Matches and Nonmatches to Separate Tables

The pg2.np_2016 table contains monthly public use statistics from the National Park Service for parks by ParkCode. The pg2.np_codelookup table contains the full name for each ParkCode value. Create a table, parkStats, that contains all park codes found in the np_2016 table. Create a second table, parkOther, that contains ParkCode values in the np_codelookup table, but not in the np_2016 table.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Determine the name of the common column in the pg2.np_codelookup and pg2.np_2016 tables.

    Solution:

    The common column is ParkCode.

  2. Create a new program.

    • Ensure that the data in both tables is sorted by the matching column.
    • Using a DATA step, merge the pg2.np_codelookup and pg2.np_2016 tables to create two new tables:
      • The work.parkStats table should contain only ParkCode values that are in the np_2016 table.
      • Keep only the ParkCode, ParkName, Year, Month, and DayVisits columns.
      • The work.parkOther table should contain all other rows.
      • Keep only the ParkCode and ParkName columns.
    • Submit the program and examine the output data.

    Solution:

    proc sort data=pg2.np_CodeLookup
              out=work.sortedCodes;
        by ParkCode;
    run;
    
    proc sort data=pg2.np_2016
              out=work.sorted_code_2016;
        by ParkCode;
    run;
    
    data work.parkStats(keep=ParkCode ParkName Year Month DayVisits)
         work.parkOther(keep=ParkCode ParkName);
        merge work.sorted_code_2016(in=inStats) work.sortedCodes;
        by ParkCode;
        if inStats=1 then output work.parkStats;
        else output work.parkOther;
    run;

  3. How many rows are in the parkStats table?

    Solution:

    The parkStats table has 4416 rows.

  4. How many rows are in the parkOther table?

    Solution:

    Tthe parkOther table has 347 rows.

SAS® Programming 2: Data Manipulation Techniques
Lesson 05, Section 3

Challenge Practice: Combining Multiple Tables with Different Matching Columns

Merge the pg2.np_codelookup, pg2.np_final, and pg2.np_species tables to create a table that contains information about the common birds found at locations that have more than 5,000,000 visitors a year.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Open p205p05.sas from the practices folder. The first three steps sort and merge the pg2.np_codelookup and pg2.np_final tables. Submit the first two PROC SORT steps and the DATA step and examine the highuse table.


  2. How many rows and columns are in the highuse table? What are the column names?

    Solution:

    There are 713 rows and two columns in the highuse table. The column names are ParkName and ParkCode.


  3. Add a subsetting IF statement in the DATA step to output only the rows in which DayVisits is greater than or equal to 5,000,000. Submit the DATA step. Why must you use IF instead of a WHERE statement? How many rows are in the highuse table now?

    Solution:

    proc sort data=pg2.np_CodeLookup out=sortnames(keep=ParkName ParkCode);
        by ParkName;
    run;
    
    proc sort data=pg2.np_final out=sortfinal;
        by ParkName;
    run;
    
    data highuse(keep=ParkCode ParkName);
        merge sortfinal sortnames;
        by ParkName;
        if DayVisits ge 5000000;
    run;
    You must use a subsetting IF statement because the DayVisits column is in only one of the tables in the MERGE statement. Now the highuse table has three rows.


  4. Submit the final PROC SORT step to sort and subset the pg2.np_species table. Compare the columns in the output birds table with the highuse table to determine the matching column. Which column is in both tables?

    Solution:

    The ParkCode column is in both tables.


  5. Add a PROC SORT step to sort the highuse table by the matching column in the birds table. What is the value of ParkCode in the first row of the highuse table?

    Solution:

    proc sort data=highuse;
        by ParkCode;
    run;
    ParkCode is GRCA in the first row.


  6. Add a DATA step to merge the highuse and birds tables and create a table named birds_largepark. Include in the output table only ParkCode values that are in the highuse table.

    Solution:

    data work.birds_largepark;
        merge birds highuse(in=inPark);
        by ParkCode;
        if inPark=1;
    run;

  7. How many rows and columns are in the birds_largepark table?

    Solution:

    There are 46 rows and five columns in the birds_largepark table.


Lesson 06

SAS® Programming 2: Data Manipulation Techniques
Lesson 06, Section 1 Activity

The DATA step below calculates the amount in savings when 200 dollars is added each month. Notice that this code does not read any data, so there is only one iteration of the DATA step, but the DO loop with the OUTPUT statement creates 12 rows, corresponding to 12 months of savings. At month 12, the savings is 2,400 dollars.

data YearlySavings;
    Amount=200;
    do Month=1 to 12;
       Savings+Amount;
       output;
    end;
    format Savings 12.2;
run;

Let's add interest to these amounts. Open p206a01.sas from the activities folder and perform the following tasks:

  1. In the DATA step, add the following sum statement after the Savings sum statement to add 2% interest compounded monthly:

  2. Savings+(Savings*0.02/12);
  1. Run the program. How much is in savings at month 12?

  1. Delete the OUTPUT statement and run the program again.
  1. How many rows are created?

  1. What is the value of Month?

  1. What is the value of Savings?

SAS® Programming 2: Data Manipulation Techniques
Lesson 06, Section 1

Level 1 Practice: Using Nested Iterative DO Loops (DATA Step with No SET Statement)

Determine the value of a retirement account after six years based on an annual investment of $10,000 and a constant annual interest rate of 7.5%.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Open p206p01.sas from the practices folder. Modify the program.

    • Add an iterative DO loop around the sum statement for Invest.
      • Add a DO statement that creates the column Year with values ranging from 1 to 6.
      • Add an OUTPUT statement to show the value of the retirement account for each year.
      • Add an END statement.
    • Submit the program and examine the results.

    Solution:

    data retirement;
        do Year = 1 to 6;
           Invest+10000;
           output;
        end;
    run;
    
    title1 'Retirement Account Balance per Year';
    proc print data=retirement noobs;
        format Invest dollar12.2;
    run;
    title;

  2. What is the Invest value for Year 6?

    Solution:

    $60,000.00


  3. Add an inner iterative DO loop between the sum statement and the OUTPUT statement to include the accrued quarterly compounded interest based on an annual interest rate of 7.5%.

    • Add a DO statement that creates the column Quarter with values ranging from 1 to 4.
    • Add a sum statement to add the accrued interest to the Invest value.
      Invest+(Invest*(.075/4));
    • Add an END statement.
    • Submit the program and examine the results.

    Solution:

    data retirement;
        do Year = 1 to 6;
           Invest+10000;
           do Quarter = 1 to 4;
              Invest+(Invest*(.075/4));
           end;
           output;
        end;
    run;
    
    title1 'Retirement Account Balance per Year';
    proc print data=retirement noobs;
        format Invest dollar12.2;
    run;
    title;

  4. What is the Invest value for Year 6? Why is the Quarter value 5 in every row?

    Solution:

    $78,449.27

    The stop value for the inner DO loop is 4, so the loop will terminate when Quarter is equal to 5. The OUTPUT statement executes after the inner loop stops, so the value of Quarter is 5 in every output row.


  5. Drop the Quarter column. Submit the program and review the results. Did the Invest value for Year 6 change?

    Solution:

    data retirement;
        do Year = 1 to 6;
           Invest+10000;
           do Quarter = 1 to 4;
              Invest+(Invest*(.075/4));
           end;
           output;
        end;
        drop Quarter;
    run;
    
    title1 'Retirement Account Balance per Year';
    proc print data=retirement noobs;
        format Invest dollar12.2;
    run;
    title;
    No, the value of Invest did not change when Quarter was dropped.


SAS® Programming 2: Data Manipulation Techniques
Lesson 06, Section 1

Level 2 Practice: Using an Iterative DO Loop (DATA Step with a SET Statement)

The pg2.np_summary table contains public use statistics from the National Park Service. The Pacific West region is anticipating the number of recreational day visitors to increase yearly by 5% for national monuments and 8% for national parks. Show the forecasted number of recreational day visitors for each park for the next five years.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Open p206p02.sas from the practices folder. Submit the program and examine the results. Notice that the initial program is showing the forecasted value for the next year. The next year is based on adding one year to the year value of today's date.


  2. Modify the program.

    • Add an iterative DO loop around the conditional IF-THEN statements.
      • The DO loop needs to iterate five times.
      • In the DO statement, a new column named Year needs to be created that starts at the value of NextYear and stops at the value of NextYear plus 4.
      • A row needs to be created for each year.
    • Modify the KEEP statement to keep the column Year instead of NextYear.
    • Submit the program and examine the results.

    Solution:

    data ForecastDayVisits;  
        set pg2.np_summary;
        where Reg='PW' and Type in ('NM','NP');
        ForecastDV=DayVisits;
        NextYear=year(today())+1;
        do Year = NextYear to NextYear+4;
           if Type='NM' then ForecastDV=ForecastDV*1.05;
           if Type='NP' then ForecastDV=ForecastDV*1.08;
           output;
        end;
        format ForecastDV comma12.;
        label ForecastDV='Forecasted Recreational Day Visitors';
        keep ParkName DayVisits ForecastDV Year;
    run;
    
    proc sort data=ForecastDayVisits;
        by ParkName;
    run;
    
    title 'Forecast of Recreational Day Visitors for Pacific West';
    proc print data=ForecastDayVisits label;
    run;
    title;

  3. How many rows are in the ForecastDayVisits table?

    Solution:

    The ForecastDayVisits table has 105 rows.

  4. (Optional) Modify the OUTPUT statement to be a conditional statement that outputs only on the fifth iteration. Submit the program and review the results.

    Solution:

    data ForecastDayVisits;  
        set pg2.np_summary;
        where Reg='PW' and Type in ('NM','NP');
        ForecastDV=DayVisits;
        NextYear=year(today())+1;
        do Year = NextYear to NextYear+4;
           if Type='NM' then ForecastDV=ForecastDV*1.05;
           if Type='NP' then ForecastDV=ForecastDV*1.08;
           if Year=NextYear+4 then output;
        end;
        format ForecastDV comma12.;
        label ForecastDV='Forecasted Recreational Day Visitors';
        keep ParkName DayVisits ForecastDV Year;
    run;
    
    proc sort data=ForecastDayVisits;
        by ParkName;
    run;
    
    title 'Forecast of Recreational Day Visitors for Pacific West';
    proc print data=ForecastDayVisits label;
    run;
    title; 

  5. How many rows are in the output table?

    Solution:

    There are 21 rows in the output table.

SAS® Programming 2: Data Manipulation Techniques
Lesson 06, Section 1

Challenge Practice: Using an Iterative DO Loop with a List of Values

The sashelp.cars table contains information about cars including Make, Model, MPG_City, and MPG_Highway. Forecast each car’s projected fuel efficiency over the next five years, assuming a three percent increase per year.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Open p206p03.sas from the practices folder. Modify the program. Add a DO loop to the DATA step to produce the following results. The MPG value is increasing by three percent per year.

    Projected Fuel Efficiency with 3% Annual Increase
    Obs Make Model Year MPG
    1
    Acura
    MDX
    1
    20.6
    2
    Acura
    MDX
    2
    21.2
    3
    Acura
    MDX
    3
    21.9
    4
    Acura
    MDX
    4
    22.5
    5
    Acura
    MDX
    5
    23.2
    6
    Acura
    RSX Type S 2dr
    1
    28.3
    7
    Acura
    RSX Type S 2dr
    2
    29.2


    Solution:

    data IncMPG;
        set sashelp.cars;
        MPG=mean(MPG_City, MPG_Highway);
        do Year=1 to 5;
            MPG=MPG*1.03;
            output;
        end;
    run;
    
    title 'Projected Fuel Efficiency with 3% Annual Increase';
    proc print data=IncMPG;
        var Make Model Year MPG;
    	 format MPG 4.1;
    run;
    title;
    

  2. Submit the program and verify the results.


  3. Modify the DO statement to produce the following results. The DO statement will now be based on a list of values instead of a value that is incremented.

    Projected Fuel Efficiency with 3% Annual Increase
    Obs Make Model Year MPG
    1
    Acura
    MDX
    Year 1
    20.6
    2
    Acura
    MDX
    Year 2
    21.2
    3
    Acura
    MDX
    Year 3
    21.9
    4
    Acura
    MDX
    Year 4
    22.5
    5
    Acura
    MDX
    Year 5
    23.2
    6
    Acura
    RSX Type S 2dr
    Year 1
    28.3
    7
    Acura
    RSX Type S 2dr
    Year 2
    29.2


    Solution:

    data IncMPG;
        set sashelp.cars;
        MPG=mean(MPG_City, MPG_Highway);
        do Year='Year 1', 'Year 2', 'Year 3', 'Year 4', 'Year 5';
            MPG=MPG*1.03;
            output;
        end;
    run;
    
    title 'Projected Fuel Efficiency with 3% Annual Increase';
    proc print data=IncMPG;
        var Make Model Year MPG;
        format MPG 4.1;
    run;
    title;
    

  4. Submit the program and verify the results.


SAS® Programming 2: Data Manipulation Techniques
Lesson 06, Section 2 Activity

Open p206a02.sas from the activities folder and perform the following tasks:

  1. Run the program and view the Savings3K table.
  1. How many months until James exceeds 3000 in savings? 

  1. How much savings does James have at that month? 

  1. Change the DO UNTIL statement to a DO WHILE statement and modify the expression to produce the same results.

  1. Run the program and view the Savings3K table.
  1. Are the results for James identical with the DO WHILE as compared to the DO UNTIL? 

SAS® Programming 2: Data Manipulation Techniques
Lesson 06, Section 2

Level 1 Practice: Using a Conditional DO Loop

The pg2.np_summary table contains public use statistics from the National Park Service. The Northeast region has seen an increase in visitors at its national monuments that previously experienced low visitation. Determine the number of years it will take for the number of visitors to exceed 100,000, assuming an annual 6% increase.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Open p206p04.sas from the practices folder. Submit the program and examine the results. Notice that the first two monuments are not near 100,000 visitors, but the third monument is near 100,000 after one year with a 6% increase.


  2. Modify the program.

    • Add a conditional DO loop around the assignment statement where IncrDayVisits is being increased by 6%.
      • Add a DO UNTIL statement that executes until the value of IncrDayVisits exceeds 100,000.
      • Add an OUTPUT statement to show the increased values per each iteration.
      • Add an END statement.
    • Submit the program and examine the results.

    Solution:

    data IncreaseDayVisits;  
        set pg2.np_summary;
        where Reg='NE' and DayVisits<100000;
        IncrDayVisits=DayVisits;
        do until (IncrDayVisits>100000);
           IncrDayVisits=IncrDayVisits*1.06;
           output;
        end;
        format IncrDayVisits comma12.;
        keep ParkName DayVisits IncrDayVisits;
    run;
    
    proc sort data=IncreaseDayVisits;
        by ParkName;
    run;
    
    title1 'Years Until Northeast National Monuments Exceed 100,000 Visitors';
    title2 'Based on Annual Increase of 6%';
    proc print data=IncreaseDayVisits label;
        label DayVisits='Current Day Visitors'
              IncrDayVisits='Increased Day Visitors';
    run;
    title;

  3. How many rows are in the IncreaseDayVisits table?

    Solution:

    The IncreaseDayVisits table has 41 rows.

  4. Modify the program.

    • Within the DO loop, add a sum statement to add 1 to the value of Year.
      Year+1;
    • Before the DO loop, add an assignment to set the Year to 0.
    • Add Year to the KEEP statement.
    • Submit the program and examine the results.

    Solution:

    data IncreaseDayVisits;  
        set pg2.np_summary;
        where Reg='NE' and DayVisits<100000;
        IncrDayVisits=DayVisits;
        Year=0;
        do until (IncrDayVisits>100000);
           Year+1;
           IncrDayVisits=IncrDayVisits*1.06;
           output;
        end;
        format IncrDayVisits comma12.;
        keep ParkName DayVisits IncrDayVisits Year;
    run;
    
    proc sort data=IncreaseDayVisits;
        by ParkName;
    run;
    
    title1 'Years Until Northeast National Monuments Exceed 100,000 Visitors';
    title2 'Based on Annual Increase of 6%';
    proc print data=IncreaseDayVisits label;
        label DayVisits='Current Day Visitors'
              IncrDayVisits='Increased Day Visitors';
    run;
    title;

  5. How many years did it take until the number of visitors exceeded 100,000 for each national monument?

    Solution:

    African Burial Ground National Monument: 14 years
    Booker T. Washington National Monument: 25 years
    Fort Stanwix National Monument: 2 years

  6. Remove the OUTPUT statement. Submit the program and view the results. The number for Year should match the numbers that you specified above.


  7. (Optional) Modify the DO UNTIL statement to be a DO WHILE statement that produces the same results. Submit the program and verify the results.

    Solution:

    data IncreaseDayVisits;  
        set pg2.np_summary;
        where Reg='NE' and DayVisits<100000;
        IncrDayVisits=DayVisits;
        Year=0;
        do while (IncrDayVisits<=100000);
           Year+1;
           IncrDayVisits=IncrDayVisits*1.06;
        end;
        format IncrDayVisits comma12.;
        keep ParkName DayVisits IncrDayVisits Year;
    run;
    
    proc sort data=IncreaseDayVisits;
        by ParkName;
    run;
    
    title1 'Years Until Northeast National Monuments Exceed 100,000 Visitors';
    title2 'Based on Annual Increase of 6%';
    proc print data=IncreaseDayVisits label;
        label DayVisits='Current Day Visitors'
              IncrDayVisits='Increased Day Visitors';
    run;
    title; 

SAS® Programming 2: Data Manipulation Techniques
Lesson 06, Section 2

Level 2 Practice: Using an Iterative and Conditional DO Loop

The pg2.eu_sports table contains European Union trade amounts for sport products. Belgium wants to see their exports exceed their imports for golf and racket products. They expect to annually increase exports by 7% and want to achieve their goal within 10 years.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Open p206p05.sas from the practices folder. Submit the program and examine the results. Notice that the golf export number is farther from the golf import number as compared to the racket export and import numbers.


  2. Modify the program.

    • Add a conditional DO loop around the assignment statement for Amt_Export.
      • Use a DO WHILE statement that executes while the export value is less than or equal to the import value.
      • Create a Year column that increments by a value of 1.
      • Create a row of output for each year.
    • Submit the program and examine the results.

    Solution:

    data IncrExports;
        set pg2.eu_sports;
        where Year=2015 and Country='Belgium' 
              and Sport_Product in ('GOLF','RACKET');
        do while (Amt_Export<=Amt_Import);
           Year+1;
           Amt_Export=Amt_Export*1.07;
           output;
        end;
        format Amt_Import Amt_Export comma12.;
    run; 
    
    title 'Belgium Golf and Racket Products - 7% Increase in Exports'; 
    proc print data=IncrExports;
        var Sport_Product Year Amt_Import Amt_Export;
    run;
    title;

  3. How many years did it take until the exports exceeded the imports, and what is the final Year value for each sport product?

    Solution:

    GOLF, 14 years, 2029
    RACKET, 4 years, 2019

  4. Modify the program.

    • Modify the DO statement to include an iterative portion before the conditional portion. The iterative portion needs to be based on Year values of 2016 to 2025 (10 years).
    • Within the DO loop, delete any statements related to the incrementing of Year.
    • Submit the program and review the results. The results show 14 data rows.

    Solution:

    data IncrExports;
        set pg2.eu_sports;
        where Year=2015 and Country='Belgium' 
              and Sport_Product in ('GOLF','RACKET');
        do Year=2016 to 2025 while (Amt_Export<=Amt_Import);
           Amt_Export=Amt_Export*1.07;
           output;
        end;
        format Amt_Import Amt_Export comma12.;
    run; 
    
    title 'Belgium Golf and Racket Products - 7% Increase in Exports'; 
    proc print data=IncrExports;
        var Sport_Product Year Amt_Import Amt_Export;
    run;
    title;

  5. Complete this table based on your last modification:

    Sport_Product Number of Years Final Year Do Exports
    exceed Imports?
    GOLF      
    RACKET      

    Solution:

    Sport_Product Number of Years Final Year Do Exports
    exceed Imports?
    GOLF
    10
    2025
    No
    RACKET
    4
    2019
    Yes

  6. Delete the OUTPUT statement. Submit the program and examine the results. Do these Year values equal the final Year values before deleting this statement? Why or why not?

    Solution:

    No, the Year values do not equal the final Year values before deleting the OUTPUT statement. Output happens after the DO loop due to the implicit OUTPUT. The Year column is incremented at the bottom of the DO loop before checking the DO WHILE condition at the top of the loop.

  7. (Optional) Include a conditional OUTPUT statement within the DO loop that will show the two rows of output with the Year values equal to the final Year values before deleting the OUTPUT statement. Submit the program and verify the results.

    Solution:

    data IncrExports;
        set pg2.eu_sports;
        where Year=2015 and Country='Belgium' 
              and Sport_Product in ('GOLF','RACKET');
        do Year=2016 to 2025 while (Amt_Export<=Amt_Import);
           Amt_Export=Amt_Export*1.07;
           if Year=2025 or Amt_Export>Amt_Import then output;
        end;
        format Amt_Import Amt_Export comma12.;
    run; 
    
    title 'Belgium Golf and Racket Products - 7% Increase in Exports'; 
    proc print data=IncrExports;
        var Sport_Product Year Amt_Import Amt_Export;
    run;
    title; 

SAS® Programming 2: Data Manipulation Techniques
Lesson 06, Section 2

Challenge Practice: Controlling Execution of DO Loop Statements with CONTINUE and LEAVE

The pg2.storm_summary table contains information about storms, including storm name, basin, maximum wind speed, and the start and end dates. You want to calculate the duration of each storm in days and count the number of working days lost in 2015.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Open p206p06.sas from the practices folder. Submit the program and examine the results. Note that the values for Duration and LostWork2015 are incorrect.


  2. Modify the DATA step program to correctly calculate duration and the number of lost work days in 2015 for each storm.

    • When calculating Duration, include both the start and end dates in the number of days.
    • Use a DO loop and accumulating variable LostWork2015 to calculate the number of work days lost. Within the DO loop, do the following:
      • Test to see whether ThisDay is in the year 2015.
        • If not, exit the DO loop because there will be no further work days that occur in 2015 for the given storm. Review the SAS documentation for the LEAVE statement.
      • If the current day of the week is Sunday or Saturday, skip the remaining statements in the DO loop and go to the next iteration. Review the SAS documentation for the CONTINUE statement.
      • Otherwise, increment LostWork2015 by 1.
    • Submit the program and examine the results. The table work.storm_workdays should have 95 rows and seven columns.

    Solution:

    data storm_workdays;
        set pg2.storm_summary;
        where year(StartDate)=2015 and name is not missing;
        Duration=EndDate-StartDate+1;
        LostWork2015=0;
        do ThisDay = StartDate to EndDate;
           /* if the current day is not in 2015, exit the DO loop */
           if year(ThisDay) ne 2015 then leave;
           /* if the current day is not a work day, skip the rest 
              of the statements in the loop, and loop again*/
           if weekday(ThisDay) in (1,7) then continue;
           LostWork2015+1;      
        end;
        keep Name Basin MaxWindMPH StartDate EndDate 
             Duration LostWork2015;
    run;
    
    title1 'Work Days Lost in 2015 due to Storms';
    title2 '(where started in 2015 and ended in 2016)';
    proc print data=storm_workdays;
        where year(StartDate) ne year(EndDate);
    run;
    title;

Lesson 07

SAS® Programming 2: Data Manipulation Techniques
Lesson 07, Section 1 Activity

Open p207a02.sas from the activities folder and perform the following tasks:

  1. Examine the DATA step code and run the program. Uncomment the RETAIN statement and run the program again. Why is the RETAIN statement necessary?

  1. Add a subsetting IF statement to include only the last row per student in the output table. Run the program.

  1. What must be true of the input table for the DATA step to work?

SAS® Programming 2: Data Manipulation Techniques
Lesson 07, Section 1

Level 1 Practice: Restructuring a Table Using the DATA Step: Wide to Narrow

The pg2.np_2017camping table contains public use statistics for camping in 2017 from the National Park Service. To enable statistics to be calculated for all camping locations, restructure the table as a narrow table.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Open p207p01.sas from the practices folder. Submit the PROC PRINT step and examine the results. Note that the Tent, RV, and Backcountry columns contain visitor counts.


  2. To convert this wide table to a narrow table, modify the DATA step.

    • The DATA step must create a new column named CampType with the values Tent, RV, and Backcountry, and another new column named CampCount with the numeric counts.
    • The DATA step includes statements to output a row for CampType='Tent'.
      • Modify the DATA step to output additional rows for RV and Backcountry.
    • Add a LENGTH statement to ensure that the values of the CampType column are not truncated.
    • Submit the DATA step and examine the output data. Confirm that each ParkName value has three rows corresponding to the Tent, RV, and Backcountry visitor counts.

    Solution:

    data work.camping_narrow(drop=Tent RV Backcountry);
        length CampType $11;
        set pg2.np_2017Camping;
        format CampCount comma12.;
        CampType='Tent';
        CampCount=Tent;
        output;
        CampType='RV';
        CampCount=RV;
        output;
        CampType='Backcountry';
        CampCount=Backcountry;
        output;
    run;

  3. How many rows and columns are in the camping_narrow table?

    Solution:

    The camping_narrow table has 381 rows and three columns.

SAS® Programming 2: Data Manipulation Techniques
Lesson 07, Section 1

Level 2 Practice: Restructuring a Table Using the DATA Step: Narrow to Wide

The pg2.np_2016camping table contains public use statistics for camping in 2016 from the National Park Service. To enable statistics to be calculated for individual camping locations, restructure the table as a wide table.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Examine the pg2.np_2016camping table to determine the three unique values of the CampType column.

    Solution:

    The three values of CampType are Tent, RV, and Backcountry.

  2. Create a program.

    • Write a DATA step to read pg2.np_2016camping and create camping_wide.
    • Use IF-THEN/ELSE statements to assign CampCount to the Tent, RV, and Backcountry columns based on the value of CampType.
    • Use the RETAIN statement to hold the values of ParkName, Tent, RV, and Backcountry in the PDV each time that the PDV reinitializes.
    • Use the BY statement to group the data by ParkName.
    • Add a subsetting IF statement to output the last row for each value of ParkName.
    • Keep the ParkName, Tent, RV, and Backcountry columns.
    • Format Tent, RV, and Backcountry with commas.
    • Submit the program and confirm that a column exists for each unique camping location (Tent, RV, and Backcountry).

    Solution:

    data work.camping_wide;
        set pg2.np_2016Camping;
        by ParkName;
        keep ParkName Tent RV Backcountry;
        format Tent RV Backcountry comma12.;
        retain ParkName Tent RV Backcountry;
        if CampType='Tent' then Tent=CampCount;
        else if CampType='RV' then RV=CampCount;
        else if CampType='Backcountry' then Backcountry=CampCount;
        if last.ParkName;
    run;

  3. How many rows are in the camping_wide table?

    Solution:

    The camping_wide table has 126 rows.

SAS® Programming 2: Data Manipulation Techniques
Lesson 07, Section 1

Challenge Practice: Using Arrays to Restructure a Table

The pg2.np_lodging_array table contains statistics for stays at lodging facilities in 2015, 2016, and 2017. Create a table that contains three rows for each park, corresponding to the lodge counts for years 2015, 2016, and 2017.

Note: An array enables you to perform the same action on a group of similar columns. In this example, Lodge2015, Lodge2016, and Lodge2017 are all numeric columns that represent the same measure for different years. Using an array with a DO loop can simplify repetitive code. Access SAS Help for more information about arrays.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Examine the np_lodging_array table. In addition to ParkName, notice that there are three columns containing visitor lodging counts. Lodge2015, Lodge2016, and Lodge2017 contain counts for visitors staying at lodges.


  2. Open the p207p03.sas program from the practices folder. Submit the program and confirm that the output table stacks the values of the Lodge columns.


  3. Modify the DATA step to use an array to simplify the repetitive processing.

    • Delete all statements between the FORMAT and RUN statements.
    • Add the following ARRAY statement after the FORMAT statement to define an array named Lodge that includes the columns Lodge2015, Lodge2016, and Lodge2017.
      array Lodge[2015:2017] Lodge2015-Lodge2017;
    • Add a DO loop with an index variable, Year, that loops three times for the values 2015 to 2017.
    • Inside the DO loop, perform the following actions:
      • Create a column named Stays that will be equal to the value of each column in the Lodge array. Note: The array name can be used in combination with the DO loop index variable to represent each column in the array. For example, Lodge[Year] is replaced by Lodge[2015] the first time through the DO loop. Lodge[2015] represents the first column in the lodge array, which is Lodge2015.
      • Output the row to the new table.
    • Submit the program and verify that the table includes three rows for each value of ParkName.

    Solution:

    data np_lodge_stack;
        set pg2.np_lodging_array;
        keep ParkName Year Stays;
        format Stays comma12.;
        array Lodge[2015:2017] Lodge2015-Lodge2017;
        do Year=2015 to 2017;
           Stays=Lodge[Year];
           output;
        end;
    run;

  4. How many rows are in the np_lodge_stack table?

    Solution:

    The np_lodge_stack table has 147 rows.

SAS® Programming 2: Data Manipulation Techniques
Lesson 07, Section 2 Activity

Open p207a03.sas from the activities folder and perform the following tasks:

  1. Highlight the PROC PRINT step and run the selection. Note how many rows are in the sashelp.class table.
  1. Highlight the PROC TRANSPOSE step and run the selection. Answer the following questions:

    1. Which columns from the input table are transposed into rows?
    2. What does each column in the output table represent?
    3. What is the name of the output table?

  1. Add the OUT= option on the PROC TRANSPOSE statement to create an output table named class_t.

  1. Add the following ID statement and run the step. What changes in the results?

  2. id Name;

  1. Add the following VAR statement and run the step. What changes in the results?

  2. var Height Weight;

SAS® Programming 2: Data Manipulation Techniques
Lesson 07, Section 2 Activity

Open p207a05.sas from the activities folder and perform the following tasks:

  1. Run the program. Notice that by default PROC TRANSPOSE transposes all the numeric columns, Wind1-Wind4.
  1. Add a VAR statement in PROC TRANSPOSE to transpose only the Wind1 and Wind2 columns. Run the program.

  1. What are the names of the columns that contain the column names and values that have been transposed?

SAS® Programming 2: Data Manipulation Techniques
Lesson 07, Section 2

Level 1 Practice: Restructuring a Table Using PROC TRANSPOSE: Wide to Narrow

The pg2.np_2017camping table contains public use statistics for camping in 2017 from the National Park Service. Convert the data from a wide table to a narrow table.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Open p207p04.sas from the practices folder. Submit the PROC PRINT step to display the first five rows of pg2.np_2017camping. Notice that the table contains three columns (Tent, RV, and Backcountry) with visitor counts for each value of ParkName. In addition, notice that the table is sorted by ParkName.


  2. Modify the PROC TRANSPOSE step.

    • Add the OUT= option to create a table named work.camping2017_t.
    • Add the BY statement to group the data by ParkName. This creates one row in the output table for each unique value of ParkName.
    • Add the VAR statement to transpose the Tent and RV columns.
    • Submit the PROC TRANSPOSE step and examine the output data.

    Solution:

    proc transpose data=pg2.np_2017camping
                   out=work.camping2017_t;
        by ParkName;
        var Tent RV;
    run;

  3. How many rows and columns are in the camping2017_t table? What are the column names?

    Solution:

    The camping2017_t table has 254 rows and three columns. The column names are ParkName, _NAME_, and COL1.


  4. Modify the program.

    • Use the NAME= option to specify Location as the name for the column that contains the names of the columns from the input table.
    • Use the RENAME= data set option after the output table to rename COL1 as Count.
    • Submit the PROC TRANSPOSE step and verify the results.

    Solution:

    proc transpose data=pg2.np_2017camping
                   out=work.camping2017_t (rename=(COL1=Count)) name=Location;
        by ParkName;
        var Tent RV;
    run;

  5. What are the column names in the camping2017_t table?

    Solution:

    The column names are ParkName, Location, and Count.


SAS® Programming 2: Data Manipulation Techniques
Lesson 07, Section 2

Level 2 Practice: Restructuring a Table Using PROC TRANSPOSE: Narrow to Wide

The pg2.np_2016camping table contains public use statistics for camping in 2016 from the National Park Service. Convert the data from a narrow to a wide table.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Examine the np_2016camping table. Notice that the table contains one row for each location type (Tent, RV, and Backcountry) by ParkName. In addition, notice that the table is sorted alphabetically by ParkName.


  2. Create a new program.

    • Write a PROC TRANSPOSE step to create a wide table named work.camping2016_t.
    • Include only the ParkName column and individual columns for the values of CampType.
    • Submit the program and examine the output data.

    Solution:

    proc transpose data=pg2.np_2016camping 
                   out=work.camping2016_transposed(drop=_name_);
        by ParkName;
        id CampType;
        var CampCount;
    run;

  3. How many rows and columns are in the camping2016_t table?

    Solution:

    The camping2016_t table has 126 rows and four columns.

SAS® Programming 2: Data Manipulation Techniques
Lesson 07, Section 2

Challenge Practice: Naming Transposed Columns when the ID Column Has Duplicate Values

The pg2.weather_highlow table contains weather data for four locations. The high and low temperatures are recorded for the months of June, July, and August.

Reminder: If you restarted your SAS session, submit your libname.sas program or run your Autoexec process flow to access the practice data.

  1. Open p207p06.sas from the practices folder. Submit the program and examine the output table. Notice that table contains two rows for each value of Location and Month. The first row represents the high temperature and the second row is the low temperature.


  2. Create a new program.

    • Write a PROC TRANSPOSE step to create a table, work.lows, that contains the low temperatures for each reporting location.
    • Use the LET option to transpose only the last row for each BY group. Note: The LET option transposes only the last row for each BY group. Be sure that your data is sorted in the order that you require. For more information about the LET option, view SAS Help.
    • Use the values of Month as the names for the transposed columns.
    • Submit the program and examine the output data.

    Solution:

    proc transpose data=sort_highlow out=lows let;
        by location;
        id Month;
    run;

  3. Confirm that three columns (Jun, Jul, and Aug) exist for each value of Location. The values of the month columns should be the low temperatures.

    Note:   Warning messages will still appear in the log indicating that the Month values are duplicated within each value of Location.