SAS® SQL 1: Essentials
Lesson 01, Section 2 Demo: Exploring the Customer Table
In this demonstration, we explore the customer table.
Reminder: If you restarted your SAS session, you must recreate the SQ library so you can access your practice files. In SAS Studio, open and submit the libname.sas program in the ESQ1M6 folder. In Enterprise Guide, run the Autoexec process flow.
- Open the program s101d01.sas from the demos folder and find the Demo section. In the first SQL step, add a DESCRIBE TABLE statement to see column attributes of the sq.customer table. Highlight the step and run the selected code.
describe table sq.customer;
-
Review the log.
The log shows all the columns in the customer table, the column type, and then if there's a label or a format. Notice that FirstName, MiddleName, LastName are characters and they all contain a label. The DOB column is a numeric column with a label, date of birth. The actual column name is DOB. But again, there is a label.
- In the second SQL step, add a SELECT statement and select all the columns from the sq.customer table using an asterisk. Add the OBS=10 data set option to the table to limit the report to 10 rows. Run the code.
proc sql; select * from sq.customer(obs=10); quit;
-
Review the output.
Let's focus on the Date of Birth column. The column's name is actually DOB, not Date of Birth. So by default, PROC SQL uses associated labels with the columns. If there is no label, it uses the column name. We can also see the first 10 rows in every column in this table. But what if I don't want to see every column?
- Modify the SELECT statement to select the columns FirstName, LastName and DOB. Run the code and examine the log and results.
select FirstName, LastName, DOB
-
Review the output.
- Modify the SELECT statement to select the columns CustomerID, LastName, UserID, and DOB. Run the code and examine the log and results.
select CustomerID, LastName, UserID, DOB;
-
Review the output.
Again, we can see the four columns, the CustomerID, UserID, LastName, and Date of Birth. Again PROC SQL does use labels if they are with the column.
SAS® SQL 1: Essentials
Lesson 02, Section 1 Demo: Creating Simple Reports
In this demonstration, we use PROC SQL to create some simple reports.
Reminder: If you restarted your SAS session, you must recreate the SQ library so you can access your practice files. In SAS Studio, open and submit the libname.sas program in the ESQ1M6 folder. In Enterprise Guide, run the Autoexec process flow.
- Open the s102d01.sas program in the demos folder and find the Demo section. Move to Report 1. Complete the WHERE clause to filter for a missing BankID value and a value of CreditScore greater than 700. Complete the ORDER BY clause to arrange rows by descending Income. Highlight the step and run the selected code.
proc sql; select FirstName, LastName, State, Income, UserID from sq.customer (obs=100) where BankID is null and CreditScore > 700 order by Income desc; quit;
-
Review the output.
- Now we need to find the top 10 customers and clean up the report with formats and labels. Add the column modifiers FORMAT=DOLLAR16. to the Income column and LABEL='Email' to the UserID column. Remove the OBS= data set option and add the OUTOBS=10 option in the PROC SQL statement.
-
Review the output.
Here we can see our five columns. We can see the Income column is in descending order, and we only have the top 10 customers. We can see the format. And we can also see the Email column with the new label.
- Move to Report 2. We want to find all customers born prior to December 31, 1940 who are employed. Complete the WHERE clause to filter DOB prior to 31DEC1940 and where Employed equals Y. Use the AND operator. Complete the ORDER BY clause to arrange rows by descending DOB. Run the query and view the results.
proc sql; select CustomerID, State, Zip, DOB, UserID, HomePhone, CellPhone from sq.customer(obs=100) where DOB < '31DEC1940'd and Employed='Y' order by DOB desc; quit;
-
Review the output.
The results show the Date of Birth column, UserID, HomePhone, and CellPhone. However, what date is -7,016? Let's apply a format to the Date of Birth column, and apply a format to the ZipCode column. If you scroll down in the report, you can see one of the rows where the Customer ID is 1937622797. This customer has a ZipCode of 7097. United States zip codes are five digits, so we need a leading zero here.
- Add the column modifiers FORMAT=DATE9. to the DOB column and FORMAT=z5. to the Zip column. Remove the OBS= data set option and highlight and run the query.
-
Review the output.
SAS® SQL 1: Essentials
Lesson 02, Section 1 Demo: Assigning Values Conditionally
We're going to use a CASE expression to create columns conditionally.
Reminder: If you restarted your SAS session, you must recreate the SQ library so you can access your practice files. In SAS Studio, open and submit the libname.sas program in the ESQ1M6 folder. In Enterprise Guide, run the Autoexec process flow.
- Open the s102d02.sas program in the demos folder and find the Demo section. The first query uses a simple case expression. We want to find the credit score and assign it a value. If you have a CreditScore greater than 750, you're excellent, 700 to 749, you're good, and so on. Complete the WHEN and ELSE expressions in the simple CASE expression. Highlight and run the query. Examine the log and results.
proc sql; select FirstName, LastName, State, CreditScore, case when CreditScore >=750 then 'Excellent' when CreditScore >=700 then 'Good' when CreditScore >=650 then 'Fair' when CreditScore >=550 then 'Poor' when CreditScore >=0 then 'Bad' end as CreditCategory from sq.customer(obs=1000); quit;
-
Review the output.
Look at the CreditScore column where there's a missing value. What is the CreditCategory column? It is a missing value or a blank. We want a value for CreditCategory even when there's a missing CreditScore value.
- Add the ELSE "Unknown" expression to change remaining CreditScore values to Unknown. Highlight and run the query. Examine the log and results.
-
Review the output.
We can see now if the CreditScore column is missing, then the Category is unknown.
- Add a WHERE clause to filter the table for customers with Excellent credit. Remove the OBS=1000 data set option. Highlight and run the query.
proc sql; select FirstName, LastName, State, CreditScore, case when CreditScore >=750 then 'Excellent' when CreditScore >=700 then 'Good' when CreditScore >=650 then 'Fair' when CreditScore >=550 then 'Poor' when CreditScore >=0 then 'Bad' else 'Unknown' end as CreditCategory from sq.customer where calculated CreditCategory = 'Excellent'; quit;
-
Review the output.
- Next, let's use the case operand form. Complete the WHEN and ELSE expressions in the simple CASE-Operand expression. Highlight and run the query. Examine the log and results.
proc sql; select FirstName, LastName, State, CreditScore, Married, case Married when "M" then "Married" when "S" then "Single" when "D" then "Divorced" when "W" then "Widowed" else "Unknown" end as MarriedCategory from sq.customer(obs=1000); quit;
-
Review the output.
Case expressions are a great way to create a new column based on specific values.
SAS® SQL 1: Essentials
Lesson 02, Section 2 Demo: Using Summary Functions to Analyze a Table
We're going to use summary functions to analyze a table.
Reminder: If you restarted your SAS session, you must recreate the SQ library so you can access your practice files. In SAS Studio, open and submit the libname.sas program in the ESQ1M6 folder. In Enterprise Guide, run the Autoexec process flow.
- Open the s102d03.sas program in the demos folder and find the Demo section. Highlight and run the query in the explore the statepopulation table section. Examine the log and results.
proc sql inobs=10; describe table sq.statepopulation; select Region, Division, Name, PopEstimate1, PopEstimate2, PopEstimate3 from sq.statepopulation; quit;
-
Review the output.
In this table, we can see we have a couple of columns, a Region and the Division. We have the state name, and then we have the population estimates of that state. Using this information, we want to summarize down a column. And we want to find the count, the mean, the standard deviation, the min and the max of the PopEstimate1 column.
- Move to Method 1 - Down a Column. We are using the COUNT function to count the PopEstimate1, and we're going to call that TotalStates. And we're going to use the MEAN function to get the mean value of PopEstimate1 and call that mean. And we'll format using the COMMA16. format. Run the query and examine the results.
proc sql; select count(PopEstimate1) as TotalStates, mean(PopEstimate1) as Mean format=comma16. from sq.statepopulation; quit;
-
Review the output.
We can see when specifying one argument in a summary function, we get one row as output. We can see TotalStates is 52. This includes DC and Puerto Rico, and then we have the mean of about 6.2 million.
- In the SELECT clause, add three columns to find the standard deviation, minimum, and maximum of PopEstimate1 using the STD, MIN, and MAX functions. Use the COMMA16. format for all new columns. Highlight and run the query. Examine the log and results.
proc sql; select count(PopEstimate1) as TotalStates, mean(PopEstimate1) as Mean format=comma16., std(PopEstimate1) as StdDev format=comma16., min(PopEstimate1) as Min format=comma16., max(PopEstimate1) as Max format=comma16. from sq.statepopulation; quit;
-
Review the output.
Now we can see all these descriptive statistics.
- Move to SAS Method – PROC MEANS below the query. SAS has procedures to do similar summarization. Highlight and run the MEANS procedure. Examine the log and results.
proc means data=sq.statepopulation maxdec=0; var PopEstimate1; run;
-
Review the output.
- Move to Method Two - Across a Column. We're selecting the Name, the PopEstimate1 through PopEstimate3 columns. We're formatting those accordingly. And then we're using the AVG, MIN and MAX functions. Highlight and run the query. Examine the log and results.
proc sql; select Name, PopEstimate1 format=comma16., PopEstimate2 format=comma16., PopEstimate3 format=comma16., avg(PopEstimate1, PopEstimate2, PopEstimate3) as Mean format=comma16., min(PopEstimate1, PopEstimate2, PopEstimate3) as Min format=comma16., max(PopEstimate1, PopEstimate2, PopEstimate3) as Max format=comma16. from sq.statepopulation; quit;
-
Review the log.
There's an error: the function AVG could not be located. Why didn't this work? The AVG function takes only one argument, because it's ANSI standard. So if we want to summarize the average across the rows, we have to use the SAS MEAN function.
- Replace the AVG function with the MEAN function. Highlight and run the query. Examine the log and results.
proc sql; select Name, PopEstimate1 format=comma16., PopEstimate2 format=comma16., PopEstimate3 format=comma16., mean(PopEstimate1, PopEstimate2, PopEstimate3) as Mean format=comma16., min(PopEstimate1, PopEstimate2, PopEstimate3) as Min format=comma16., max(PopEstimate1, PopEstimate2, PopEstimate3) as Max format=comma16. from sq.statepopulation; quit;
-
Review the output.
When we specify multiple arguments in the summary function, SAS summarizes across. We can see we have the MIN, MEAN, MAX of each state's population estimate.
- In the MAX function, change the arguments to PopEstimate1-PopEstimate3. If this were a DATA step, this would definitely work. Run the query and examine the results.
proc sql; select Name, PopEstimate1 format=comma16., PopEstimate2 format=comma16., PopEstimate3 format=comma16., mean(PopEstimate1, PopEstimate2, PopEstimate3) as Mean format=comma16., min(PopEstimate1, PopEstimate2, PopEstimate3) as Min format=comma16. , max(of PopEstimate1-PopEstimate3) as Max format=comma16. from sq.statepopulation; quit;
-
Review the log.
That shortcut does not work in PROC SQL. You have to type each column name.
SAS® SQL 1: Essentials
Lesson 02, Section 2 Demo: Analyzing Groups of Data
We're going to use the GROUP BY clause to group data and produce summary statistics for each group.
Reminder: If you restarted your SAS session, you must recreate the SQ library so you can access your practice files. In SAS Studio, open and submit the libname.sas program in the ESQ1M6 folder. In Enterprise Guide, run the Autoexec process flow.
- Open the s102d04.sas program in the demos folder and find the Demo section. Notice that the query creates a report of the State column in the customer table and limits the output to 1,000 rows. Highlight and run the query. Examine the log and results.
proc sql; select State from sq.customer(obs=1000) group by State; quit;
-
Review the log.
When you use a GROUP BY clause without an aggregate function, PROC SQL treats the GROUP BY clause as if it were an ORDER BY clause, displaying a corresponding message in the log.
- Modify the query to count the total number of customers in each state by using the COUNT function. Name the column TotalCustomers and format it with a COMMA7. format. Highlight and run the query.
proc sql; select State, count(*) as TotalCustomers format=comma7. from sq.customer(obs=1000) group by state; quit;
-
Review the output.
This looks better. We have a count. Remember, we're limiting to the first 1,000 rows.
- Add an ORDER BY clause to the query to sort the report by descending TotalCustomers. Remove the OBS=1000 data set option and run the final query. Examine the log and results.
proc sql; select State, count(*) as TotalCustomers format=comma7. from sq.customer group by state order by TotalCustomers desc; quit;
-
Review the output.
And we can see we have each state with a total amount of customers.
- Replace State in the SELECT and GROUP BY clauses with BankID. This should give us the total number of customers for each bank. Highlight and run the query. Examine the log and the results.
proc sql; select BankID, count(*) as TotalCustomers format=comma7. from sq.customer group by BankID order by TotalCustomers desc; quit;
-
Review the output.
Looking at our results, we can see the Bank 101010 has the most customers. We can also see a GROUP BY clause does include missing values. So we actually have about 4,900 customers that don't have a bank ID.
- Add the Employed column after BankID in the SELECT and GROUP BY clauses to count the number of customers by bank ID and if they're employed. Highlight and run the query. Examine the log and the results.
proc sql; select BankID, Employed, count(*) as TotalCustomers format=comma7. from sq.customer group by BankID, Employed order by TotalCustomers desc; quit;
-
Review the output.
In our results, we can now see we have distinct values for BankID, Employed, and the TotalCustomers for each value.
- Add a WHERE clause to filter for TotalCustomers greater than 10,000. Highlight and run the query. Examine the log and the results.
proc sql; select BankID, Employed, count(*) as TotalCustomers format=comma7. from sq.customer where calculated TotalCustomers > 10000 group by BankID, Employed order by TotalCustomers desc; quit;
-
Review the log.
There's an error: Summary functions are restricted to the SELECT and HAVING clauses only. So our SUMMARY function must be filtered using the HAVING clause, not the WHERE clause. Note: Because the WHERE clause is evaluated before a row is available for processing and determines which individual rows are available for grouping, you cannot use a WHERE clause to subset grouped rows by referring to the calculated summary column.
- Remove the WHERE clause and insert a HAVING clause below the GROUP BY clause. Although it's not always necessary to have the CALCULATED keyword in the GROUP BY or HAVING clause, it's good practice to use it to guarantee the results of the query. Highlight and run the query. Examine the log and the results.
proc sql; select BankID, Employed, count(*) as TotalCustomers format=comma7. from sq.customer group by BankID, Employed having calculated TotalCustomers > 10000 order by TotalCustomers desc; quit;
-
Review the output.
We now see all the distinct values of BankID and Employed where TotalCustomers is greater than 10,000.
SAS® SQL 1: Essentials
Lesson 02, Section 2 Demo: Summarizing Data Using a Boolean Expression
In this demonstration, we use functions to summarize the number of customers under 25 and over 64 for each state.
Reminder: If you restarted your SAS session, you must recreate the SQ library so you can access your practice files. In SAS Studio, open and submit the libname.sas program in the ESQ1M6 folder. In Enterprise Guide, run the Autoexec process flow.
- Open the s102d05.sas program in the demos folder and find the Demo section. This query creates a table called CustomerCount from a query. We're selecting the State column, and then we're creating a new column called Age using the YRDIF function.
proc sql inobs=1000; create table CustomerCount as select State, yrdif(DOB,"01JAN2020"d,'age') as Age from sq.customer; quit;
-
Review the table.
- We want to summarize the data by any customers under 25 and over 64. Add the less than operator after the YRDIF function test if the row is under 25 years old. Rename the column Under25. Run the query and examine the results.
proc sql inobs=1000; create table CustomerCount as select State, yrdif(DOB,'01JAN2020'd,'age')<25 as Under25 from sq.customer; quit;
The table now includes the State and Under25 columns. The first row is for the state B and it has an Under25 value of 0. - Copy the expression. Replace the < comparison operator with the > comparison operator. Change the value from 25 to 64 and the name from Under25 to Over64. Run the query and examine the results.
proc sql inobs=1000; create table CustomerCount as select State, yrdif(DOB,'01JAN2020'd,'age')<25 as Under25, yrdif(DOB,'01JAN2020'd,'age')>64 as Over64 from sq.customer; quit;
- Summarize the data by wrapping each new column with the SUM function to add all the values of 1 to count the number of customers. Add a GROUP BY clause with the State column. Remove the INOBS= option. Run the query and examine the results.
proc sql; create table CustomerCount as select State, sum(yrdif(DOB,'01JAN2020'd,'age')<25) as Under25, sum(yrdif(DOB,'01JAN2020'd,'age')>64) as Over64 from sq.customer group by State; quit;
-
Review the table.
The new table, CustomerCount, with the State, in this example, Arkansas, is displayed. 60 customers are under 25, 57 are over 64. And we can see that throughout for each state, and DC and Puerto Rico.
The table now includes the State, Under25, and Over54 columns. The first row is for the state WI and it has an Under25 value of 0 and an Over64 value of 0.
SAS® SQL 1: Essentials
Lesson 02, Section 4 Demo: Using DICTIONARY Tables
We're going to be using dictionary tables to query dictionary information.
Reminder: If you restarted your SAS session, you must recreate the SQ library so you can access your practice files. In SAS Studio, open and submit the libname.sas program in the ESQ1M6 folder. In Enterprise Guide, run the Autoexec process flow.
- Open the s102d06.sas program in the demos folder and find the Demo section.
In the Explore dictionary.tables section, we're going to use a DESCRIBE TABLE statement on the dictionary.tables and then we're going to select all from dictionary.tables. Highlight and run the procedure. Examine the log and the results.
proc sql inobs=100; describe table dictionary.tables; select * from dictionary.tables; quit;
-
Review the log.
The DESCRIBE TABLE statement shows us the column names, the column types, and if a label is associated. Every column here has a label, so it's good to know what the actual column name is.
-
Review the output.
We can see the Library Name, WORK, SQ, etc. These are all in uppercase. We can see the Member Name, which is the table name, again all uppercase; the Member Type; and then a variety of other information.
- Add a WHERE clause to subset the Libname column for libraries named SQ and remove the INOBS= option. Remove the DESCRIBE TABLE statement. Highlight and run the procedure. Examine the log and the results.
proc sql; select * from dictionary.tables where Libname = 'SQ'; quit;
We can now see every table in the SQ library. We should have about 27 tables and all the information about them. - The SAS equivalent of dictionary.tables is using sashelp.vtable. We're going to use the WHERE statement, where libname = "SQ". Run the procedure and view the results.
proc print data=sashelp.vtable; where Libname = "SQ"; run;
-
Review the output.
We get the same information as above. By default, it doesn't use the labels, so we have the actual column names. It's good to know both dictionary.tables and sashelpviews, depending on where you're using them.
- Move to the Explore dictionaary.columns section. Here we're going to explore the dictionary.columns. I want to use the DESCRIBE TABLE statement on dictionary.columns and then I'm going to select everything from dictionary.columns where libname = "SQ". Highlight and run the procedure. Examine the log and the results.
proc sql; describe table dictionary.columns; select * from dictionary.columns where Libname = "SQ"; quit;
-
Review the log.
Some of the column names are similar, but there are different columns in each of these tables. I'm going to focus on the libname and the memname.
-
Review the output.
We can see the Library Name is only the SQ library, and the Member Name is the table name. Now, if we look over at the column name, we can see every column in each table in the SQ library. We can see the Column Type, the Length, the Position, and a variety of other information about each column in these tables. This is a great way to compare columns in different tables if they're supposed to be the same column type, length, and format.
- Let's look at the SAS equivalent of dictionary.columns, and that is sashelp.vcolumn. We're limiting it to the first 100 rows. Run the procedure and view the results.
proc print data=sashelp.vcolumn(obs=100); where Libname = "SQ"; run;
-
Review the output.
Again, we can see the same information. The column names here do not have labels by default when using the print procedure.
- Move to the Explore dictionary.libnames section. Highlight and run the procedure. Examine the log and the results.
proc sql; describe table dictionary.libnames; select * from dictionary.libnames; quit;
-
Review the output.
We can see we have the Library Name, the Engine used, where that library is located, and again, different information for each library.
- Add a WHERE clause to subset the Libname column for the SQ library.
Highlight and run the procedure. Examine the log and the results.
where Libname="SQ";
-
Review the output.
- To see each distinct library you're connected to, remove the WHERE clause and modify the SELECT clause by replacing the asterisk and adding distinct libname. Highlight and run the procedure. Examine the log and the results.
proc sql; describe table dictionary.libnames; select distinct libname from dictionary.libnames; quit;
-
Review the output.
Your list might differ slightly, but this is a nice way to check every library.
- Highlight and run the SAS equivalent of dictionary.libnames. Examine the log and the results.
proc print data=sashelp.vlibnam; where Libname = "SQ"; run;
-
Review the output.
We can see all the libraries and the same information. Again, here we have our SQ library and the same information we saw earlier.
SAS® SQL 1: Essentials
Lesson 03, Section 2 Demo: Performing an Inner Join with PROC SQL
In this demonstration, we use PROC SQL to perform an inner join between two tables.
Reminder: If you restarted your SAS session, you must recreate the SQ library so you can access your practice files. In SAS Studio, open and submit the libname.sas program in the ESQ1M6 folder. In Enterprise Guide, run the Autoexec process flow.
- Open the program s103d01.sas from the demos folder and find the Demo section. Run the queries in the Explore the Tables section to compare the columns of the sq.smallcustomer and sq.smalltransaction tables. Both tables contain an AccountID that we can use to join the tables.
proc sql; select * from sq.smallcustomer; select * from sq.smalltransaction; quit;
-
Review the output.
- Perform the INNER JOIN section and add sq.smallcustomer and sq.smalltransaction to the FROM clause to perform an inner join on AccountID. Qualify AccountID columns as table-name.col-name in the ON expression only. Highlight and run the query.
proc sql; select FirstName, LastName, State, Income, DateTime, MerchantID, Amount from sq.smallcustomer inner join sq.smalltransaction on smallcustomer.AccountID = smalltransaction.AccountID; quit;
-
Review the output.
We can see that Gary has one transaction, the income of Gary, the datetime of the transaction, the merchant from, and the amount. We can see Sergio has two transactions and so on.
- Add the AccountID column to the query after Amount. Highlight and run the query. Examine the log.
proc sql; select FirstName, LastName, State, Income, DateTime, MerchantID, Amount, AccountID from sq.smallcustomer inner join sq.smalltransaction on smallcustomer.AccountID = smalltransaction.AccountID; quit;
-
Review the log.
Why does the program fail? There is an ambiguous reference. The column AccountID is in more than one table. SQL doesn't know which account ID you would like. So let's qualify this name in the SELECT clause.
- Modify the query to qualify the AccountID column in the SELECT clause. When using an inner join, you can select either AccountID because you are only using matches. We're going to select the smallcustomer.AccountID. Highlight the step and run the selected code.
proc sql; select FirstName, LastName, State, Income, DateTime, MerchantID, Amount, smallcustomer.AccountID from sq.smallcustomer inner join sq.smalltransaction on smallcustomer.AccountID = smalltransaction.AccountID; quit;
-
Review the output.
And now we can see we have AccountID as the last column.
- Modify the query to include a WHERE clause to subset for customers who have a State value of NY (New York) and an ORDER BY clause that sorts by descending Amount.
proc sql; select FirstName, LastName, State, Income, DateTime, MerchantID, Amount, smallcustomer.AccountID from sq.smallcustomer inner join sq.smalltransaction on smallcustomer.AccountID = smalltransaction.AccountID where State = "NY" order by Amount desc; quit;
-
Review the output.
In this report, we can see we only have rows where state is NY, and we can see the Amount is in descending order.
SAS® SQL 1: Essentials
Lesson 03, Section 2 Demo: Performing an Inner Join with Four Tables
We're going to use PROC SQL to perform an inner join with four tables.
Reminder: If you restarted your SAS session, you must recreate the SQ library so you can access your practice files. In SAS Studio, open and submit the libname.sas program in the ESQ1M6 folder. In Enterprise Guide, run the Autoexec process flow.
- Open the program s103d02.sas from the demos folder and find the Demo section. Run the queries in the Explore the Tables section to compare the columns of the sq.smallcustomer and sq.smalltransaction tables.
proc sql inobs=5; title "Table: SMALLCUSTOMER"; select * from sq.smallcustomer; title "Table: SMALLTRANSACTION"; select * from sq.smalltransaction; title "Table: MERCHANT"; select * from sq.merchant; title "Table: BANK"; select * from sq.bank; title; quit;
-
Review the output.
Both tables contain an AccountID that we can use to join the tables. The SMALLCUSTOMER table has customer information. The SMALLTRANSACTION table has transactions from customers. We've been joining this by AccountID. What we want to do next is use the MERCHANT table to bring in the Merchant Name, and then the BANK table to bring in the Bank Name.
- Find the Joining Data from More Than Two Tables section. Highlight and run the query to join sq.smallcustomer with sq.transaction. Examine the results.
proc sql; select FirstName, LastName, c.State, Income, DateTime, MerchantID, Amount, c.AccountID, c.BankID from sq.smallcustomer as c inner join sq.smalltransaction as t on c.AccountID = t.AccountID; quit;
-
Review the output.
In our results, we can see the MerchantID, but what merchant is that? We don't memorize IDs based on the merchant name, so we want to bring in the merchant name here. And in BankID, we want to bring in the bank name. So let's do that by performing a four-table INNER JOIN.
- Add a second inner join and join the MerchantID column from the sq.merchant table with the MerchantID column of the previous join. Replace MerchantID in the SELECT clause with MerchantName. Highlight and run the query. Examine the results.
proc sql; select FirstName, LastName, c.State, Income, DateTime, MerchantName, Amount, c.AccountID, c.BankID from sq.smallcustomer as c inner join sq.smalltransaction as t on c.AccountID = t.AccountID inner join sq.merchant as m on t.MerchantID = m.MerchantID; quit;
-
Review the output.
We can see here, instead of MerchantID, we brought in the Merchant Name from the merchant table. This gives us a little bit more information. We want to do the same thing with BankID.
- Add a third inner join and join the BankID column from the sq.bank table with the BankID column of the previous join. Replace BankID in the SELECT clause with the bank name. Highlight and run the query. Examine the results.
proc sql; select FirstName, LastName, c.State, Income, DateTime, MerchantName, Amount, c.AccountID, b.Name from sq.smallcustomer as c inner join sq.smalltransaction as t on c.AccountID = t.AccountID inner join sq.merchant as m on t.MerchantID = m.MerchantID inner join sq.bank as b on c.BankID = b.BankID; quit;
-
Review the output.
Now, in the last column, we can see we brought in the Bank Name from the bank table. This report has a little bit more information for us, and shows you that you can perform joins with multiple tables.
SAS® SQL 1: Essentials
Lesson 03, Section 3 Demo: Performing a Full Join with Proc SQL
We're going to use PROC SQL to perform an inner join with four tables.
Reminder: If you restarted your SAS session, you must recreate the SQ library so you can access your practice files. In SAS Studio, open and submit the libname.sas program in the ESQ1M6 folder. In Enterprise Guide, run the Autoexec process flow.
- Open the program s103d03.sas from the demos folder and find the Demo section. Let's look at this query. We're selecting the FirstName, LastName, Income, and we're selecting both Account IDs, one from smallcustomer and one from smalltransaction. We're selecting the DateTime, MerchantID, Amount. We're full joining smallcustomer and smalltransaction on AccountID. Highlight and run the query. Examine the results.
proc sql; select FirstName, LastName, Income, c.AccountID, t.AccountID, DateTime, MerchantID, Amount from sq.smallcustomer as c full join sq.smalltransaction as t on c.AccountID = t.AccountID; quit;
-
Review the output.
Notice the AccountID columns. Are they exactly the same? You can see some differences. Some have both missing. Some have both values. Sometimes one on the left is missing. Sometimes one on the right is missing. That's because we're selecting AccountID in both tables. Our goal is to overlay these columns.
- Modify the SELECT clause and remove the t.AccountID column. Highlight and run the query. Examine the results.
proc sql; select FirstName, LastName, Income, c.AccountID, DateTime, MerchantID, Amount from sq.smallcustomer as c full join sq.smalltransaction as t on c.AccountID = t.AccountID; quit;
-
Review the output.
How many missing AccountID values do you see? There are five.
- Modify the SELECT clause and replace c.AccountID with t.AccountID. Highlight and run the query. Examine the results.
proc sql; select FirstName, LastName, Income, t.AccountID, DateTime, MerchantID, Amount from sq.smallcustomer as c full join sq.smalltransaction as t on c.AccountID = t.AccountID; quit;
-
Review the output.
Now there are three missing values for AccountID. So depending on which AccountID you select, you will have different values. What we can do is overlay these using the COALESCE function.
- Modify the SELECT clause, use the COALESCE function, and add c.AccountID and t.AccountID as arguments. Add the alias AccountID and the FORMAT=10. column modifier to the newly created column.
proc sql; select FirstName, LastName, Income, coalesce(c.AccountID,t.AccountID) as AccountID format=10., DateTime, MerchantID, Amount from sq.smallcustomer as c full join sq.smalltransaction as t on c.AccountID = t.AccountID; quit;
-
Review the output.
Now let's look at AccountID. We can see we only have one missing value for AccountID. So we overlaid those columns, and we took the first non-missing value.
With this report, we've completed a full join. I want to investigate it a little bit further. Let's look at that first row. We don't have an AccountID or a FirstName, LastName, or Income value. We don't know which customer bought that, so maybe that person paid in cash. We can see the remaining rows have an AccountID. So ones with customer, we could easily match. But ones without a customer Name and Income, for some reason, we don't have that value. That's something we could investigate further.
The last thing I want to see is the last two rows. We have the customer Ada and Samantha. They don't have a DateTime, MerchantID, or Amount. Those customers have not purchased anything. So they're in our report, but they have not had any purchases.
SAS® SQL 1: Essentials
Lesson 03, Section 4 Demo: Performing a Reflexive Join
We're going to use PROC SQL to perform a reflexive join or self join.
Reminder: If you restarted your SAS session, you must recreate the SQ library so you can access your practice files. In SAS Studio, open and submit the libname.sas program in the ESQ1M6 folder. In Enterprise Guide, run the Autoexec process flow.
- Open the program s103d04.sas from the demos folder and find the Demo section. Let's look at this query. We're selecting the EmployeeID, EmployeeName, StartDate, and ManagerID from the employee table. Highlight and run the query. Examine the results.
proc sql; select e.EmployeeID, e.EmployeeName, e.StartDate format=date9., e.ManagerID from sq.employee as e; quit;
-
Review the output.
We can see the EmployeeID on the left, the EmployeeName, the StartDate, and each employee has a manager ID. We want to find out what each employee's manager's name is. We can use this table twice.
- Modify the query to create a reflexive join. In the FROM clause, add an inner join followed by the sq.employee table again. Add the alias m to the second sq.employee table. Add the ON clause and set e.ManagerID equal to m.EmployeeID. So that is, the employee table's ManagerID equals e.EmployeeID. The employee table we're referencing as manager-- we want to match that table with the EmployeeID in the employee table.
proc sql; select e.EmployeeID, e.EmployeeName, e.StartDate format=date9., e.ManagerID from sq.employee as e inner join sq.employee as m on e.ManagerID = m.EmployeeID; quit;
- Add the EmployeeName column in the SELECT clause. Qualify the new EmployeeName column with the table alias m.
proc sql; select e.EmployeeID, e.EmployeeName, e.StartDate format=date9., e.ManagerID, m.EmployeeName from sq.employee as e inner join sq.employee as m on e.ManagerID = m.EmployeeID; quit;
- Add the column alias ManagerName to the m.EmployeeName column and an ORDER BY clause to sort by ManagerName. Highlight and run the query. Examine the results.
proc sql; select e.EmployeeID, e.EmployeeName, e.StartDate format=date9., e.ManagerID, m.EmployeeName as ManagerName from sq.employee as e inner join sq.employee as m on e.ManagerID = m.EmployeeID order by ManagerName; quit;
-
Review the output.
So we can see the EmployeeID, EmployeeName, the StartDate, the ManagerID, and then we can see the ManagerName. We performed a self join on the same table to solve this problem.
SAS® SQL 1: Essentials
Lesson 04, Section 1 Demo: Using a Subquery That Returns a Single Value
We're going to use a subquery that returns a single value.
Reminder: If you restarted your SAS session, you must recreate the SQ library so you can access your practice files. In SAS Studio, open and submit the libname.sas program in the ESQ1M6 folder. In Enterprise Guide, run the Autoexec process flow.
- Open the program s104d01.sas from the demos folder and find the Demo section. Run the query in the Explore the statepopulation table section. View the results.
proc sql; select Name, PopEstimate1 from sq.statepopulation; quit;
-
Review the output.
We can see we're only focusing on the Name column, so the state abbreviation, and the PopEstimate1, so next year's estimated population. We want to find all states that have a population estimate 1 greater than the total average.
- Run the query in the Future Subquery section to find the average of PopEstimate1 in the sq.statepopulation table.
proc sql; select avg(PopEstimate1) from sq.statepopulation; quit;
- Complete the outer query to find all states that have a higher PopEstimate1 value than the result of the previous query's value of 6278420. Sort the results by descending PopEstimate1. Highlight and run the query. Examine the results.
proc sql; select Name, PopEstimate1 from sq.statepopulation where PopEstimate1 > 6278420 order by PopEstimate1 desc; quit;
-
Review the output.
The report shows those states. Now, what if the state population estimate 1 changes? Those change all the time. I would have to rerun that first query, retype the value, and then run the second query. Here's where we can use a subquery.
- Remove the value 6278420 in the WHERE clause, replace it with the query that found the average of PopEstimate1, and enclose the inner query in parentheses. Note: Remember to remove the semicolon from the subquery when copying and pasting. When this query runs, the subquery will evaluate first, give us that value of 6.2 or so million, and then evaluate the outer query. Highlight and run the query. Examine the results.
proc sql; select Name, PopEstimate1 from sq.statepopulation where PopEstimate1 > (select avg(PopEstimate1) from sq.statepopulation) order by PopEstimate1 desc; quit;
-
Review the output.
Again, we get the same results. Now, one thing to consider is that a subquery doesn't have to return a value from the same table.
- Replace the subquery with a new query. Find the average of Population_2010 from the sashelp.us_data table. Note: The table in a subquery can be a different table than the outer query. This will return an average of the population estimate of 2010 from the sashelp.us_data, which is not the sq.statepopulation table.Highlight and run the query. Examine the results.
proc sql; select Name, PopEstimate1 from sq.statepopulation where PopEstimate1 > (select avg(Population_2010) from sashelp.us_data) order by PopEstimate1 desc; quit;
-
Review the output.
And now I have my results using that 2010 estimate.
The value returned is 6278420. We're going to use this value in our main query.
SAS® SQL 1: Essentials
Lesson 04, Section 1 Demo: Using a Subquery That Returns Multiple Values
We're going to use a subquery that returns multiple values in a single column.
Reminder: If you restarted your SAS session, you must recreate the SQ library so you can access your practice files. In SAS Studio, open and submit the libname.sas program in the ESQ1M6 folder. In Enterprise Guide, run the Autoexec process flow.
- Open the program s104d02.sas from the demos folder and find the Demo section. Examine the Future Subquery section and run the query to find the states that reside in Division 3. View the results.
proc sql; select Division, Name from sq.statepopulation where Division = '3'; quit;
-
Review the output.
We can see we have all Division 3 states: IL, IN, MI, OH, WI.
- Using the results from the Future Subquery section, modify the outer query by entering the State abbreviations ("IL", "IN","MI","OH","WI") in the WHERE clause. Run the query and view the results.
proc sql; create table Division3 as select * from sq.customer where State in ("IL", "IN","MI","OH","WI"); quit;
-
Review the log.
Since we created the table, we now have the Output Data tab, and we have 16,022 total rows and 22 columns. The log also shows this.
- To view the data in a report (only the first 50 rows), submit the following code.
proc print data=division3 (obs=50); run;
-
Review the output.
While this is great, let's use a subquery instead.
- Replace the values in parentheses with the query from step 1 by copying and pasting the query inside the parentheses. The query is now the subquery. Be sure to remove the semicolon from inside the parentheses. Highlight and run the query. Note: This program will return a syntax error.
proc sql; create table division3 as select * from sq.customer where State in (select Division, Name from sq.statepopulation where Division = '3'); quit;
-
Review the log.
Remember, a subquery cannot return more than one column. We're actually returning two, Division and Name. We don't need Division, because that's always 3.
- Remove the Division column from the subquery. Highlight and run the query.
proc sql; create table division3 as select * from sq.customer where State in (select Name from sq.statepopulation where Division = '3'); quit;
-
Review the log.
Again, this table returns 16,022 rows and 22 columns. This worked perfectly. Well, what if we want to change to Division 6?
- Change the Division value from 3 to 6 in the subquery. Replace the 3 at the end of the new table name in the CREATE TABLE statement to a 6. Highlight and run the query.
proc sql; create table division6 as select * from sq.customer where State in (select Name from sq.statepopulation where Division = '6'); quit;
-
Review the log.
- To view the data in a report (only the first 50 rows), submit the following code.
proc print data=division6 (obs=50); run;
-
Review the output.
And now we have every customer in Division 6, which is about 4,900 customers there. Using a subquery is a great way to make this more efficient.
SAS® SQL 1: Essentials
Lesson 04, Section 2 Demo: Using an In-Line View
We're going to use an inline view to create a virtual table.
Reminder: If you restarted your SAS session, you must recreate the SQ library so you can access your practice files. In SAS Studio, open and submit the libname.sas program in the ESQ1M6 folder. In Enterprise Guide, run the Autoexec process flow.
- Open the program s104d03.sas from the demos folder and find the Demo section. Run the first query to explore the customer and statepopulation tables.
proc sql inobs=10; select FirstName, MiddleName, LastName, State from sq.customer; select Name, EstimateBase from sq.statepopulation; quit;
-
Review the output.
We can see our customers, and we want to count the total number of customers in each state. And then we're going to join that with the statepopulation table name, and then we're going to divide the number of customers by the estimated base. That will give us a percentage of customers in each state.
- Move to the next section, Temporary Table Solution. The first solution creates a temporary table called totalcustomer that counts the number of customers in each state. Highlight and run the query. Examine the log and results.
proc sql; create table totalcustomer as select State,count(*) as TotalCustomer from sq.customer group by State order by TotalCustomer desc; quit;
-
Review the output.
Since we created the table, we now have the Output Data tab. We have the state and the total number of customers in that state.
- Run the second query to join the totalcustomer and sq.statepopulation tables and calculate the new column PctCustomer that calculates the percentage of customers in each state based on the current year's estimated population. We're selecting the State, TotalCustomer from the totalcustomer table, and then we're going to select the estimated base from the statepopulation table. We have our aliases here, c for totalcustomer, s for statepopulation. And then in our ON clause, we're using c.State = s.Name. We are ordering by PctCustomer. Highlight and run the query. Examine the log and results.
proc sql; select c.State, c.TotalCustomer, s.EstimateBase, c.TotalCustomer/s.EstimateBase as PctCustomer format=percent7.3 from totalcustomer as c inner join sq.statepopulation as s on c.State = s.Name order by PctCustomer; quit;
-
Review the output.
We've performed our join, and these are in ascending order. So it looks like VT, or Vermont, has the least amount of customers by percentage. If we scroll down, we can see the most customers. And we have DC, the most customers by percentage at 0.066. This solution is great, and it works. We can use a virtual table to make this a little bit better.
- Copy only the query that is used to create the totalcustomer table. Move to the Using an In-Line View section of the program. Paste the query (not including the CREATE TABLE statement) in the FROM clause to create an in-line view. Be sure to remove the semicolon. Highlight and run the query. View the syntax error in the log.
proc sql; select c.State, c.TotalCustomer, s.EstimateBase, c.TotalCustomer/s.EstimateBase as PctCustomer format=percent7.3 from (select State,count(*) as TotalCustomer from sq.customer group by State order by TotalCustomer desc) as c inner join sq.statepopulation as s on c.State = s.Name order by PctCustomer; quit;
-
Review the log.
There's one clause in the inline view that we cannot use. Remember, inline views can not contain an ORDER BY clause.
- Remove the ORDER BY clause in the subquery. Run the query and view the results.
proc sql; select c.State, c.TotalCustomer, s.EstimateBase, c.TotalCustomer/s.EstimateBase as PctCustomer format=percent7.3 from (select State,count(*) as TotalCustomer from sq.customer group by State) as c inner join sq.statepopulation as s on c.State = s.Name order by PctCustomer; quit;
-
Review the output.
When we use the inline view, the inline view will run at that exact moment. So as the sq.customer table grows, the inline view values will change. If we use the previous solution, we will always have to create the totalcustomer table and then use that table in an inner join.