SAS Viya Administration Operations
Lesson 01, Section 0 Exercise: Validate Environment
Validate the Workshop Environment
In this exercise you will validate the status of the SAS Viya platform deployment you will use for the remainder of the workshop. In addition to verifying the status and health of the SAS Viya components, you will also validate functional aspects of both SAS Viya and the third-party applications you will work with later.
List existing namespaces
From the collection’s Windows machine, open MobaXterm and initiate a new session to sasnode01, the Linux machine you will use to access the cluster.
List the namespaces in the Kubernetes cluster.
kubectl get ns
Efficiency tip…
Many of the commands you see throughout this workshop are long and somewhat challenging to type accurately. While you are certainly free to type the commands for yourself, we encourage you to copy the commands you see in the instructions and paste them into your MobaXterm session.
To copy: Select the Copy to clipboard icon, which copies all of the code from the code portlet to your clipboard. You do not have to highlight the text yourself for this to work.
To paste: In MobaXterm, right-click to paste the selected text.
The key namespaces to verify are:
- gelcorp
- v4mlog
- v4mmon
There will be additional namespaces listed, but if you see these three, you can move on.
To simplify all subsequent kubectl commands, set the default namespace to gelcorp. This will effectively add --namespace gelcorp to every kubectl command.

gel_setCurrentNamespace gelcorp

You should see:

THE DEFAULT KUBERNETES DEPLOYMENT NAMESPACE IS: gelcorp
If you want to change the current namespace, please run this command: gel_setCurrentNamespace [yourNamespace]

and

Context "gelcluster" modified.
Check status of SAS Viya pods
Let’s verify the status of your SAS Viya pods. Maximize your MobaXterm window and run
kubectl get pods -o wide
You should see that all of the pods have a status of Running or Completed. The output will also show you how the Viya pods are distributed across the nodes in your cluster.
A status of Running does not necessarily signal that the pod is ‘open for business’, so now let’s try reaching an endpoint on each pod.
Run the gel_ReadyViya4 script to make sure all Viya pods are reporting in as Ready.

gel_ReadyViya4 -n gelcorp

You should see a message similar to the following:

NOTE: POD labeled sas-readiness in namespace gelcorp is sas-readiness-6475759dc9-jld24
NOTE: Viya namespace gelcorp is running Stable 2024.03 : 20240425.1714076884655
NOTE: All checks passed. Marking as ready. The first recorded failure was 8m4s ago.
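If you prefer a plain kubectl check in addition to the script, you can wait on the readiness pod directly. This is a minimal sketch; it assumes the readiness pod carries an app=sas-readiness label, so verify the label with kubectl get pods --show-labels if the selector returns nothing.

# Wait up to 5 minutes for the readiness pod to report Ready (label assumed; adjust if it differs)
kubectl -n gelcorp wait pod -l app=sas-readiness --for=condition=Ready --timeout=300s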
Display cadence information of your Viya deployment.
kubectl get configmaps -o yaml | grep CADENCE | head -8
Except for SAS_CADENCE_RELEASE, your values should match those below.
SAS_BASE_CADENCE_NAME: stable
SAS_BASE_CADENCE_VERSION: "2024.03"
SAS_CADENCE_DISPLAY_NAME: Long-Term Support 2024.03
SAS_CADENCE_DISPLAY_SHORT_NAME: Long-Term Support
SAS_CADENCE_DISPLAY_VERSION: "2024.03"
SAS_CADENCE_NAME: lts
SAS_CADENCE_RELEASE: "20240517.1700181118101"
SAS_CADENCE_VERSION: "2024.03"
Validate Applications
During the deployment, a file was written to the cloud-user’s home directory with application URLs. List that file.
cat ~/urls.md
Your listing should look like this, with different hostname components embedded in the URLs. The URLs below will not work! Simply highlight one of the URLs in your MobaXterm window (which automatically copies it), then open a browser on the client machine and paste the copied text into the address bar.
# List of URLs for your environment
* [Airflow]( http://airflow.*hostname*.race.sas.com )
* [Alert Manager]( http://alertmanager.*hostname*.race.sas.com/ )
* [Grafana (u=admin p=lnxsas)]( http://grafana.*hostname*.race.sas.com/ )
* [OpenSearch Dashboards (u=logadm p=lnxsas, u=admin p=lnxsas)]( http://osd.*hostname*.race.sas.com/ )
* [Prometheus]( http://prometheus.*hostname*.race.sas.com/ )
* [SAS Drive (gelcorp) ]( https://gelcorp.*hostname*.race.sas.com/SASDrive/ )
* [SAS Environment Manager (gelcorp) ]( https://gelcorp.*hostname*.race.sas.com/SASEnvironmentManager/ )
* [SAS Studio (gelcorp) ]( https://gelcorp.*hostname*.race.sas.com/SASStudio/ )
* [SAS Visual Analytics (gelcorp) ]( https://gelcorp.*hostname*.race.sas.com/SASVisualAnalytics/ )
Verify that you can log in to SAS Environment Manager.
User: geladm Password: lnxsas
- Examine the users and groups
- Do you see any pages that are new for Viya?
- Are there pages missing that you are used to seeing in SAS Environment Manager?
Verify that you can log in to Grafana.
User: admin Password: lnxsas
- See if you can display the SAS CAS Overview dashboard
There will be much more on Grafana later in the workshop.
Verify that you can log in to OpenSearch Dashboards.
User: logadm Password: lnxsas
- Locate the Dashboard page
- Open the Log Message Volumes with Levels dashboard
There will be more coverage of OpenSearch Dashboards later in the workshop.
Let the workshop leader know if you have trouble verifying access to any of the applications listed in urls.md.
This completes the exercise.
SAS Viya Administration Operations
Lesson 02, Section 1 Exercise: Working with Labels
Working with Labels
Introduction
In this exercise you will experiment with selecting Kubernetes resources using labels. You will experiment with pod listings because the number of pods presents an abundance of options for learning about how to reference labels. You will learn how to
- Discover labels associated to pods
- Filter pod listings for pods that match a given label selector
- Combine multiple selector queries.
As you work through the following steps you can refer to this table of operators for building selector queries.
Operator | Meaning | Example | Example meaning |
---|---|---|---|
= | equal to | ‘env=prod’ | The env key has a value of prod |
!= | not equal to | ‘env!=qa’ | The env key has a value that is not qa |
in | occurs in a list | ‘env in (prod,test)’ | The env key value is either prod or test |
notin | does not occur in the listed values | ‘env notin (prod,dev)’ | The env key value is not prod or dev; or the env key is unassigned |
exists (key only) | the referenced key exists | ‘env’ | The env key exists but the value is not tested |
!exists (key only) | the referenced key is not assigned | ‘!env’ | The env key does not exist for an object |
, | logical AND joining multiple expressions | ‘env=prod, tier=compute’ | The env key value is prod AND the tier key value is compute |
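To make the operators concrete, here is an illustrative query that combines several of them. The env, tier, and canary keys are the hypothetical keys from the table above, not labels that exist in this deployment, so expect an empty result if you run it as-is.

# Hypothetical labels: env is prod or test, tier is compute, and no canary key is assigned
kubectl get pods --selector 'env in (prod,test), tier=compute, !canary'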
Set the default namespace
Set the default namespace to gelcorp so you can omit -n gelcorp from any kubectl command intended for the gelcorp namespace.

gel_setCurrentNamespace gelcorp
Discovering labels
Before you can reference labels you probably need a way to find out which labels have been assigned.
The easiest way to show all pod labels is to employ the --show-labels option with a get pods command. As a warning, the following command generates a very wide pod listing, so you may want to maximize your MobaXterm window before running it to minimize line wrapping.

kubectl get pods --show-labels
You can also use the describe command to view the labels for any single object. For the following example, replace <paste-a-pod-name-here-from-your-console> with the name of any pod of interest from the output of your previous command.

kubectl describe pod <paste-a-pod-name-here-from-your-console>
Useful labels for pod selection
Now let’s look at a few labels that are quite useful for identifying key groups of pods in Viya.
NOTE: you can use either -l or --selector to specify a label query.
If you took time to scan all of the labels from the --show-labels output, you may have noticed that almost every pod in SAS Viya has the sas.com/deployment=sas-viya label.

kubectl get pods --selector 'sas.com/deployment=sas-viya'
That listing is pretty long, so let’s use the wc command to count how many pods have the sas.com/deployment=sas-viya label. The --no-headers option prevents the column headings from polluting our count.

kubectl get pods --selector 'sas.com/deployment=sas-viya' --no-headers | wc -l
How does that compare to our total number of pods?
kubectl get pods --no-headers | wc -l
So now let’s use a label trick to discover which pods do not have the sas.com/deployment=sas-viya label. In this example, the selector '!sas.com/deployment' selects pods that do not have a label key of sas.com/deployment assigned.

kubectl get pods --selector '!sas.com/deployment'
The workload.sas.com/class key is another useful label for looking at pods according to the type of workload they represent. This label key is important to the workload placement strategy for Viya, which directs certain types of workload to specific nodes in the cluster. Expected values of the key are stateful, stateless, compute, connect, and cas.

For example, it is often useful to be able to identify the pods for all of the stateful services (Consul, Postgres, RabbitMQ, Redis, OpenDistro Elastic, and Workload Orchestrator).
kubectl get pods --selector 'workload.sas.com/class=stateful'
Now take a look at the pods for stateless services.

kubectl get pods --selector 'workload.sas.com/class=stateless'
What are the results when you look for pods with the label workload.sas.com/class=cas?

kubectl get pods --selector 'workload.sas.com/class=cas'
You should see no pods returned for the class=cas query. While this may seem odd, we do not use a workload placement strategy for CAS on our RACE collections, to minimize the resources we require. Because we allow CAS pods to be scheduled on any node, the CAS pods themselves do not get the workload.sas.com/class label. This is not a good practice in a real-world deployment.
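If you want to see how the class labels relate to the nodes themselves, you can list each node with the value of its workload.sas.com/class label; the column is simply empty for nodes that do not carry the label.

# Show each node with the value of its workload.sas.com/class label, if any
kubectl get nodes -L workload.sas.com/class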
Your turn
Using what you have learned so far, try to write a pod selector query that will return the requested list of pods. There are multiple ways of solving these queries, but an example solution is provided for each one if you get stuck.
List the pods related to Postgres but not those of the Postgres server itself. You may see none, or a different number, of the sas-crunchy-platform-postgres-repo1 jobs, depending on how long your reservation has been running.
NAME                                                       READY   STATUS      RESTARTS   AGE
sas-crunchy-platform-postgres-backup-gzj6-xwhvh            0/1     Completed   0          4d21h
sas-crunchy-platform-postgres-repo-host-0                  2/2     Running     0          4d21h
sas-crunchy-platform-postgres-repo1-full-28057320-sjmzp    0/1     Completed   0          3d12h
sas-crunchy-platform-postgres-repo1-incr-28058760-5jwcx    0/1     Completed   0          2d12h
sas-crunchy-platform-postgres-repo1-incr-28060200-7cmjf    0/1     Completed   0          36h
sas-crunchy-platform-postgres-repo1-incr-28061640-nbdqn    0/1     Completed   0          12h
One possible solution:
kubectl get pods --selector 'postgres-operator.crunchydata.com/cluster=sas-crunchy-platform-postgres, postgres-operator.crunchydata.com/role notin (replica,master)'
List all of the CAS server pods that are managed by the CAS Operator.
NAME                                READY   STATUS    RESTARTS   AGE
sas-cas-server-default-controller   3/3     Running   0          4d21h
One possible solution:
kubectl get pods --selector app.kubernetes.io/managed-by=sas-cas-operator
List all of the pods that have a job-name key assigned (your output will differ from the example).

NAME                                                       READY   STATUS      RESTARTS   AGE
sas-backup-purge-job-28055535-2xcsl                        0/2     Completed   0          4d17h
sas-backup-purge-job-28056975-tg4q6                        0/2     Completed   0          3d17h
sas-backup-purge-job-28058415-q5dzw                        0/2     Completed   0          2d17h
sas-backup-purge-job-28059855-v5kf9                        0/2     Completed   0          41h
sas-backup-purge-job-28061295-qflmk                        0/2     Completed   0          17h
sas-create-openssl-ingress-certificate-6b4qm               0/1     Completed   0          4d21h
sas-crunchy-platform-postgres-backup-gzj6-xwhvh            0/1     Completed   0          4d21h
sas-crunchy-platform-postgres-repo1-full-28057320-sjmzp    0/1     Completed   0          3d12h
sas-crunchy-platform-postgres-repo1-incr-28058760-5jwcx    0/1     Completed   0          2d12h
sas-crunchy-platform-postgres-repo1-incr-28060200-7cmjf    0/1     Completed   0          36h
sas-crunchy-platform-postgres-repo1-incr-28061640-nbdqn    0/1     Completed   0          12h
sas-import-data-loader-28062120-v4pt4                      0/1     Completed   0          4h9m
sas-import-data-loader-28062240-gmsl4                      0/1     Completed   0          129m
sas-import-data-loader-28062360-zt5sk                      0/1     Completed   0          9m11s
sas-pyconfig-cjinitial-t7zhw                               0/1     Completed   0          4d21h
sas-scheduled-backup-job-28057020-gndz9                    0/2     Completed   0          3d17h
sas-update-checker-28055349-x6zsp                          0/1     Completed   0          4d21h
One possible solution:
kubectl get pods --selector job-name
You have completed this exercise!
SAS Viya Administration Operations
Lesson 02, Section 2 Exercise: Kustomize Basics
In this hands-on you will use kustomize to make changes to your Viya Deployment.
- Preliminary Tasks
- Use Kustomize to create Persistent Volume Claim resource
- Use Kustomize to create mount of PVC to CAS Deployment
- Build and Apply the manifests
- Test the changes were made in the cluster
- Review
In this hands-on we will demonstrate the use of Kustomize to:
- Create a new K8S resource, a Persistent Volume Claim
- Update an existing K8S resource to use the Persistent Volume Claim
We will use a Kubernetes Persistent Volume claim to make data available to CAS. Our PVC is on NFS but that detail is abstracted from the user. In the cloud, the PVC would most likely be a different type of storage. Don’t worry if you don’t understand all of the Kubernetes concepts yet. This section is to help you get oriented to kustomize and kubectl.
NOTE: in the hands-on we use yq to update the YAML files on the command line. This is less error-prone than editing the files in an editor. In each case where we use the yq command, we also show you the change you could make interactively in the editor. Please use the yq approach in class to ensure your success.
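To see what such a yq edit actually does before touching the real project file, you can try it on a scratch copy first. This is an illustration only: site-config/example.yaml is a made-up file name, not an overlay used in this course.

# Illustration only: append a placeholder entry to the resources list of a scratch copy
cp ~/project/deploy/gelcorp/kustomization.yaml /tmp/kustomization-demo.yaml
yq4 eval -i '.resources += ["site-config/example.yaml"]' /tmp/kustomization-demo.yaml
grep -n "example.yaml" /tmp/kustomization-demo.yaml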
Preliminary Tasks
Set the current namespace.
gel_setCurrentNamespace gelcorp
Review the current contents of the kustomization.yaml file. Notice the different sections for:
- resources
- configurations
- transformers
- patches
- generators
- etc.

cd ~/project/deploy/
cat ~/project/deploy/${current_namespace}/kustomization.yaml
Keep a copy of the original manifest and kustomization.yaml files. We will use these copies to track the changes your kustomization processing makes to these two files.

cp -p ~/project/deploy/${current_namespace}/site.yaml /tmp/${current_namespace}/manifest_02-021.yaml
cp -p ~/project/deploy/${current_namespace}/kustomization.yaml /tmp/${current_namespace}/kustomization_02-021.yaml
cp -p ~/project/deploy/${current_namespace}/kustomization.yaml /tmp/${current_namespace}/kustomization.yaml.orig
Use Kustomize to create Persistent Volume Claim resource
Create the resource definition. Create a YAML file that contains a complete Persistent Volume Claim definition. We will use Kustomize to add the PVC to the generated manifest. By convention the file is created in the site-config sub-directory of the project directory.

tee ~/project/deploy/${current_namespace}/site-config/gelcontent_pvc.yaml > /dev/null <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gelcontent-data
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 2Gi
EOF
Reference the resource definition in kustomization.yaml. Modify ~/project/deploy/gelcorp/kustomization.yaml to add a reference to the file site-config/gelcontent_pvc.yaml. The file contains a complete Kubernetes resource, so the reference is added to the resources section. In class we use yq to automate the update of the kustomization.yaml.

Run this command to update your kustomization.yaml file using the yq tool:

[[ $(grep -c "site-config/gelcontent_pvc.yaml" ~/project/deploy/${current_namespace}/kustomization.yaml) == 0 ]] && \
yq4 eval -i '.resources += ["site-config/gelcontent_pvc.yaml"]' ~/project/deploy/${current_namespace}/kustomization.yaml
Alternatively, you could manually edit the resources section to add the line - site-config/gelcontent_pvc.yaml.

[...]
resources:
[... previous resource items ...]
- site-config/gelcontent_pvc.yaml
[...]
At this point we could perform the build and apply, but let’s make another change before we do that.
Use Kustomize to create mount of PVC to CAS Deployment
Now we will use a JSON patch to patch the CAS Deployment Kubernetes resource. The patch will:
- be added to the patches section of kustomization.yaml
- target all resources of type CASDeployment
- add the claimName at /spec/controllerTemplate/spec/volumes/-
- add the mount path at /spec/controllerTemplate/spec/containers/0/volumeMounts/-
Create a JSON patch file that updates the CASDeployment.

tee ~/project/deploy/${current_namespace}/site-config/cas-gelcontent-mount-pvc.yaml > /dev/null << EOF
- op: add
  path: /spec/controllerTemplate/spec/volumes/-
  value:
    name: sas-viya-gelcontent-pvc-volume
    persistentVolumeClaim:
      claimName: gelcontent-data
- op: add
  path: /spec/controllerTemplate/spec/containers/0/volumeMounts/-
  value:
    name: sas-viya-gelcontent-pvc-volume
    mountPath: /mnt/gelcontent
EOF
Reference the patch in kustomization.yaml. Modify the ~/project/deploy/gelcorp/kustomization.yaml to reference the patch.

Run this command to update kustomization.yaml using the yq tool:

[[ $(grep -c "site-config/cas-gelcontent-mount-pvc.yaml" ~/project/deploy/${current_namespace}/kustomization.yaml) == 0 ]] && \
yq4 eval -i '.patches += { "path": "site-config/cas-gelcontent-mount-pvc.yaml", "target": {"group": "viya.sas.com", "kind": "CASDeployment", "name": ".*", "version": "v1alpha1"} }' ~/project/deploy/${current_namespace}/kustomization.yaml
Alternatively, you can manually edit the patches section to add the lines below that patch the CAS server deployment.

[...]
patches:
[... previous patches items ...]
- path: site-config/cas-gelcontent-mount-pvc.yaml
  target:
    group: viya.sas.com
    kind: CASDeployment
    # The following name specification will target all CAS servers. To target specific
    # CAS servers, comment out the following line then uncomment and edit one of the lines
    # targeting specific CAS servers.
    name: .*
    # Uncomment this to apply to one particular named CAS server:
    #name: {{ NAME-OF-SERVER }}
    # Uncomment this to apply to the default CAS server:
    #labelSelector: "sas.com/cas-server-default"
    version: v1alpha1
[...]
What have we done so far? Two files (cas-gelcontent-mount-pvc.yaml and gelcontent_pvc.yaml) were created in the site-config sub-directory of the project, and the kustomization.yaml was updated to reference those files. Run the following command to view the changes made to kustomization.yaml. The changes are in green in the right column.

icdiff /tmp/${current_namespace}/kustomization_02-021.yaml ~/project/deploy/${current_namespace}/kustomization.yaml
Build and Apply the manifests
The sas-orchestration deploy command uses the sas-orchestration docker container to build and apply the Kubernetes manifests. The deploy command performs the following steps within the container:
- pulls the deployment assets
- builds the manifest (with kustomize)
- applies the manifests with kubectl apply commands and runs any necessary life-cycle operations
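If you would like to inspect the generated manifest without applying anything, you could build it locally first. This is an optional sketch, not part of the exercise, and it assumes a kustomize binary compatible with the project is available on sasnode01.

# Optional preview: build the manifest locally and look for the new PVC
cd ~/project/deploy/${current_namespace}
kustomize build . > /tmp/${current_namespace}/preview-site.yaml
grep -n "gelcontent-data" /tmp/${current_namespace}/preview-site.yaml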
Review the .gelcorp_vars file in which we store the parameters needed by the orchestration deploy command.
cat ~/project/deploy/.${current_namespace}_vars
The output will show what cadence and release we are using.
#deployment parameters
_viyaMirrorReg=crcache-race-sas-cary.unx.sas.com
_order=9CV11D
_cadenceName=stable
_cadenceVersion=2023.04
_cadenceRelease=latest
Run the sas-orchestration deploy command. Follow the output in the terminal; this command will take a few minutes to complete.
cd ~/project/deploy
rm -rf /tmp/${current_namespace}/deploy_work/*
source ~/project/deploy/.${current_namespace}_vars
docker run --rm \
  -v ${PWD}/license:/license \
  -v ${PWD}/${current_namespace}:/${current_namespace} \
  -v ${HOME}/.kube/config_portable:/kube/config \
  -v /tmp/${current_namespace}/deploy_work:/work \
  -e KUBECONFIG=/kube/config \
  --user $(id -u):$(id -g) \
  sas-orchestration \
  deploy --namespace ${current_namespace} \
  --deployment-data /license/SASViyaV4_${_order}_certs.zip \
  --license /license/SASViyaV4_${_order}_license.jwt \
  --user-content /${current_namespace} \
  --cadence-name ${_cadenceName} \
  --cadence-version ${_cadenceVersion} \
  --image-registry ${_viyaMirrorReg}
When the deploy command completes successfully, the final message should say The deploy command completed successfully, as shown in the log snippet below.
The deploy command started
Generating deployment artifacts
Generating deployment artifacts complete
Generating kustomizations
Generating kustomizations complete
Generating manifests
Applying manifests
> start_leading gelcorp
[...more...]
> kubectl delete --namespace gelcorp --wait --timeout 7200s --ignore-not-found configmap sas-deploy-lifecycle-operation-variables
configmap "sas-deploy-lifecycle-operation-variables" deleted
> stop_leading gelcorp
Applying manifests complete
The deploy command completed successfully
If the sas-orchestration deploy command fails, check out the steps in 99_Additional_Topics/03_Troubleshoot_SAS_Orchestration_Deploy to help you troubleshoot any problems.
Test the changes were made in the cluster
It may take some time for the PVC to be bound. Run the following command to wait for that to happen.
while [[ $(kubectl get pvc gelcontent-data -o 'jsonpath={..status.phase}') != "Bound" ]]; do echo "waiting for PVC status" && sleep 1; done
Confirm that the PVC was created in the namespace.
kubectl get pvc gelcontent-data
Expected output:
NAME              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
gelcontent-data   Bound    pvc-8dccb0ed-cfdc-4a26-afd0-76aad37bb7ff   2Gi        RWX            nfs-client     13m
Delete the CAS pod so the PVC change can be picked up when CAS restarts.
kubectl delete pod --selector='app.kubernetes.io/managed-by=sas-cas-operator'
Confirm that the PVC is mounted into the pods at the location
/mnt/gelcontent
.kubectl describe pod -l casoperator.sas.com/node-type=controller | grep -A 3 sas-viya-gelcontent-pvc-volume
You should see in the output:
/mnt/gelcontent from sas-viya-gelcontent-pvc-volume (rw)
/opt/sas/viya/config/etc/SASSecurityCertificateFramework/cacerts from security (rw,path="cacerts")
/opt/sas/viya/config/etc/SASSecurityCertificateFramework/private from security (rw,path="private")
/opt/sas/viya/home/commonfiles from commonfilesvols (ro)
--
sas-viya-gelcontent-pvc-volume:
  Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
  ClaimName:  gelcontent-data
  ReadOnly:   false
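For an additional spot check, you could exec into the CAS controller container and confirm that the mount is visible there. This sketch reuses the controller pod label and the sas-cas-server container name that appear later in this workshop.

# Assumes the default CAS controller pod label and the sas-cas-server container name
kubectl exec -it $(kubectl get pod -l casoperator.sas.com/node-type=controller -o jsonpath='{.items[0].metadata.name}') \
  -c sas-cas-server -- df -h /mnt/gelcontent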
To see the change made to the Viya manifest file run the following command.
icdiff /tmp/${current_namespace}/manifest_02-021.yaml /tmp/${current_namespace}/deploy_work/deploy/manifest.yaml
NOTE: when using the sas-orchestration deploy command, the manifest is built inside the docker container in a work directory. In order to access the manifest, we have mounted that work directory to a path on the local file system.
Review
In the practice exercise you:
- Created two new overlays (a PVC definition and a patch that updates the CAS deployment to use the PVC)
- Edited the kustomization.yaml file to reference the two new overlays.
- Used the docker run command to run the sas-orchestration deploy command
- The deploy command built and applied the manifests to make the changes in the Viya environment.
SAS Viya Administration Operations
Lesson 03, Section 0 Exercise: Configure Identities
In this hands-on you will manage SAS Viya access to POSIX attributes.
- Review Current Identities Configuration
- Update Identities Configuration to return UID from LDAP
- Review
Review Current Identities Configuration
Setup Environment.
gel_setCurrentNamespace gelcorp
export SAS_CLI_PROFILE=${current_namespace}
export SSL_CERT_FILE=~/.certs/${current_namespace}_trustedcerts.pem
export REQUESTS_CA_BUNDLE=${SSL_CERT_FILE}
/opt/pyviyatools/loginviauthinfo.py
In the class environment the identity provider can return POSIX attributes. However, the default settings for Viya do not return the user ID (UID) POSIX attribute. In this step we view the current settings and see that identifier.generateUids is true, meaning that the identities service will generate UIDs using a hashing algorithm.
sas-viya configuration configurations show --id $(sas-viya configuration configurations list --definition-name sas.identities | jq -r '.items[0]["id"]') | grep identifier
Expected output:
identifier.disableGids : false
identifier.disallowedUids : 1001
identifier.generateGids : false
identifier.generateUids : true
As a result of these settings, UID and GID are generated and cannot be overridden. You can check the current POSIX attributes returned to Viya from identities using the show-user command of the identities plugin of the sas-viya CLI. The --show-advanced option shows the advanced attributes, including UID and GID. The UID and GID numbers have been generated by the identities service.
sas-viya --output text identities show-user --id Ahmed --show-advanced
Expected output:
Id                  Ahmed
Name                Ahmed
Title               Platform Administrator
EmailAddresses      [map[value:Ahmed@gelenable.sas.com]]
PhoneNumbers
Addresses
State               active
ProviderId          ldap
CreationTimeStamp   2021-11-02T15:15:25.000Z
ModifiedTimeStamp   2022-11-21T15:05:37.000Z
Uid                 7.3671692e+08
Gid                 7.3671692e+08
SecondaryGids       [2003 3000 3006 3007]
NOTE: If the UID is generated for a user, their primary GID is always set to the same value. Because generateGids=false, the secondary GIDs are being loaded from LDAP.
You can also use the pyviyatools getposixidentity.py to view the UID and GID (and secondary GIDs).
/opt/pyviyatools/getposixidentity.py -u Ahmed -o simplejson
Expected output:
{
  "username": "Ahmed",
  "version": 1,
  "gid": 736716920,
  "secondaryGids": [
    2003,
    3006,
    3007
  ],
  "id": 736716920,
  "uid": 736716920
}
getposixidentity.py can return the UID, GID, and secondary GIDs for all users. NOTE: this can take a minute or two to complete.
/opt/pyviyatools/getposixidentity.py -o csv
Expected output:
id ,uid ,gid ,secgid ,name
"sasldap","520876611","520876611","[1003]","SAS LDAP Service Account"
"sas","1999796463","1999796463","[2002, 2003, 1001, 3001, 1002, 3003, 3004]","SAS System Account"
"cas","1847089512","1847089512","[1001, 3001, 1002, 3003, 3004]","CAS System Account"
"sasadm","1681596511","1681596511","[2002, 2003, 3001, 3003, 3004, 3006, 3007]","SAS Administrator"
"sastest1","1396998156","1396998156","[2003]","SAS Test User 1"
"sastest2","1754724830","1754724830","[2003]","SAS Test User 2"
"geladm","1794942382","1794942382","[2002, 2003, 3001, 3003, 3004, 3006, 3007]","geladm"
"Douglas","1721890207","1721890207","[2003, 3003, 3007]","Douglas"
"Delilah","408403145","408403145","[2003, 3001, 3007]","Delilah"
"Alex","1571476065","1571476065","[2003, 3005]","Alex"
"Amanda","1917500395","1917500395","[2003, 3006, 3007]","Amanda"
"Ahmed","736716920","736716920","[2003, 3006, 3007]","Ahmed"
"Fay","217599329","217599329","[2003, 3004]","Fay"
"Fernanda","2063022101","2063022101","[2003, 3004]","Fernanda"
"Fiona","820179103","820179103","[2003, 3004]","Fiona"
"Frank","451444854","451444854","[2003, 3004]","Frank"
"Fred","340329804","340329804","[2003, 3004]","Fred"
"Hamish","127901971","127901971","[2003, 3001]","Hamish"
"Hazel","1123162857","1123162857","[2003, 3001]","Hazel"
"Heather","1263590372","1263590372","[2003, 3001]","Heather"
"Helena","1414144599","1414144599","[2003, 3001, 3002]","Helena"
"Henrik","274336581","274336581","[2003, 3001]","Henrik"
"Hugh","176265822","176265822","[2003, 3001]","Hugh"
"Santiago","792606553","792606553","[2003, 3003]","Santiago"
"Sarah","1664936282","1664936282","[2003, 3003]","Sarah"
"Sasha","243405028","243405028","[2003, 3003]","Sasha"
"Sean","1900111114","1900111114","[2003, 3003]","Sean"
"Sebastian","1640393434","1640393434","[2003, 3003]","Sebastian"
"Shannon","1234018889","1234018889","[2003, 3003]","Shannon"
"Sheldon","1621874476","1621874476","[2003, 3003]","Sheldon"
"Sophia","1098749434","1098749434","[2003, 3002, 3003]","Sophia"
"hrservice","1170696439","1170696439","[2003]","hrservice"
"salesservice","1688566405","1688566405","[2003]","salesservice"
"financeservice","1148835472","1148835472","[2003]","financeservice"
Update Identities Configuration to return UID from LDAP
In our environment we want to return the POSIX attributes from our LDAP identity provider so that our SAS compute engines can access content secured to users and groups on shared storage.
On sasnode01, where the LDAP server is deployed, we can see the POSIX attributes for Ahmed.
id Ahmed
Expected output:
uid=4005(Ahmed) gid=2003(sasusers) groups=2003(sasusers),3006(GELCorpSystemAdmins),3007(powerusers)
To return the POSIX attributes from LDAP, we will set the identifier.generateUids property to false and then refresh the identities cache. We will use the configuration CLI to achieve this; we could also perform these steps in SAS Environment Manager.
MEDIATYPE=$(sas-viya configuration configurations download -d sas.identities | jq -r '.items[]["metadata"]["mediaType"] ' )
echo ${MEDIATYPE}
tee /tmp/update_identities.json > /dev/null << EOF
{
  "name": "identities configurations",
  "items": [
    {
      "metadata": {
        "isDefault": false,
        "mediaType": "${MEDIATYPE}"
      },
      "identifier.generateUids": false
    }
  ]
}
EOF
sas-viya configuration configurations update --file /tmp/update_identities.json
sas-viya --output text identities refresh-cache
sleep 20
Expected output:
The cache is refreshing.
state refreshing
Now that the UID is not generated, let’s check what is returned for Ahmed. Notice that we are now getting the attributes from the LDAP identity provider.
Tip: If Ahmed still has the same large values for Uid and Gid that he had before you updated the configuration, it is likely the cache has not refreshed yet. Wait a short time (no more than 30 seconds), and try again.
sas-viya --output text identities show-user --id Ahmed --show-advanced
Expected output:
Id                  Ahmed
Name                Ahmed
Title               Platform Administrator
EmailAddresses      [map[value:Ahmed@gelcorp.com]]
PhoneNumbers
Addresses           [map[country: locality:Cary postalCode: region:]]
State               active
ProviderId          ldap
CreationTimeStamp   2023-05-04T08:29:03.000Z
ModifiedTimeStamp   2023-05-04T08:29:03.000Z
Uid                 4005
Gid                 2003
In this step, review the UID, GID, and secondary GIDs for all users. NOTE: this can take a minute or two to complete.
/opt/pyviyatools/getposixidentity.py -o csv
Expected output:
id ,uid ,gid ,secgid ,name
"sasldap","1003","1003","['']","SAS LDAP Service Account"
"sas","1001","1001","[2002, 2003, 3001, 1002, 3003, 3004]","SAS System Account"
"cas","1002","1001","[3001, 1002, 3003, 3004]","CAS System Account"
"sasadm","2002","2002","[2003, 3001, 3003, 3004, 3006, 3007]","SAS Administrator"
"sastest1","2003","2003","['']","SAS Test User 1"
"sastest2","2004","2003","['']","SAS Test User 2"
"geladm","4000","2002","[2003, 3001, 3003, 3004, 3006, 3007]","geladm"
"Douglas","4001","2003","[3003, 3007]","Douglas"
"Delilah","4002","2003","[3001, 3007]","Delilah"
"Alex","4003","2003","[3005]","Alex"
"Amanda","4004","2003","[3006, 3007]","Amanda"
"Ahmed","4005","2003","[3006, 3007]","Ahmed"
"Fay","4006","2003","[3004]","Fay"
"Fernanda","4007","2003","[3004]","Fernanda"
"Fiona","4008","2003","[3004]","Fiona"
"Frank","4009","2003","[3004]","Frank"
"Fred","4010","2003","[3004]","Fred"
"Hamish","4011","2003","[3001]","Hamish"
"Hazel","4012","2003","[3001]","Hazel"
"Heather","4013","2003","[3001]","Heather"
"Helena","4014","2003","[3001, 3002]","Helena"
"Henrik","4015","2003","[3001]","Henrik"
"Hugh","4016","2003","[3001]","Hugh"
"Santiago","4017","2003","[3003]","Santiago"
"Sarah","4018","2003","[3003]","Sarah"
"Sasha","4019","2003","[3003]","Sasha"
"Sean","4020","2003","[3003]","Sean"
"Sebastian","4021","2003","[3003]","Sebastian"
"Shannon","4022","2003","[3003]","Shannon"
"Sheldon","4023","2003","[3003]","Sheldon"
"Sophia","4024","2003","[3002, 3003]","Sophia"
"hrservice","3001","2003","['']","hrservice"
"salesservice","3002","2003","['']","salesservice"
"financeservice","3003","2003","['']","financeservice"
Review
The POSIX attributes are now returned from the LDAP identity provider. This will facilitate securing and accessing files on the shared NFS server that uses the same LDAP identity provider.
SAS Viya Administration Operations
Lesson 03, Section 1 Exercise: Configure Persistent Storage
The Viya environment has an NFS server running on sasnode01. We can use this NFS server to mount directories and files from the host to pods in the Kubernetes cluster. This is useful for accessing data or code from this permanent location. The files and folders on the NFS server are secured, so we will also make sure that the CAS and SAS Programming Run-Time servers work with the secured NFS mount.
In this hands-on you will mount a drive from an NFS file server into the CAS and Programming Run-time pods.
- Set the namespace and authenticate
- Use an NFS volume to make data available to the CAS deployment
- Use an NFS volume to make data available to the Programming Run-Time Servers (programming run-time)
- Build and Apply with sas-orchestration deploy
- Validation
- Review
Set the namespace and authenticate
gel_setCurrentNamespace gelcorp
/opt/pyviyatools/loginviauthinfo.py
Use an NFS volume to make data available to the CAS deployment
Mount NFS Share to CAS Deployment
Create an overlay for CAS to add the volume from the NFS server and the volume mount point inside the CAS container. The overlay targets the single CASDeployment that is available in the namespace.
- volumeMounts : mountPath is the location inside the container
- volumes : path is the location outside the container
cd ~/project/deploy/
_deploymentNodeFQDN=$(hostname -f)
tee ~/project/deploy/${current_namespace}/site-config/cas-add-nfs-mount.yaml > /dev/null << EOF
# cas-add-nfs-mount.yaml
# Add additional mount
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: cas-add-mount
patch: |-
  - op: add
    path: /spec/controllerTemplate/spec/volumes/-
    value:
      name: sas-viya-gelcorp-volume
      nfs:
        path: /shared/gelcontent
        server: ${_deploymentNodeFQDN}
  - op: add
    path: /spec/controllerTemplate/spec/containers/0/volumeMounts/-
    value:
      name: sas-viya-gelcorp-volume
      mountPath: /gelcontent
target:
  group: viya.sas.com
  kind: CASDeployment
  # The following name specification will target all CAS servers. To target specific
  # CAS servers, comment out the following line then uncomment and edit one of the lines
  # targeting specific CAS servers.
  name: .*
  # Uncomment this to apply to one particular named CAS server:
  #name: {{ NAME-OF-SERVER }}
  # Uncomment this to apply to the default CAS server:
  #labelSelector: "sas.com/cas-server-default"
  version: v1alpha1
EOF
Modify ~/project/deploy/${current_namespace}/kustomization.yaml to reference the cas server overlay.
In the transformers section add the line - site-config/cas-add-nfs-mount.yaml.

Run this command to update kustomization.yaml using the yq tool:

[[ $(grep -c "site-config/cas-add-nfs-mount.yaml" ~/project/deploy/${current_namespace}/kustomization.yaml) == 0 ]] && \
yq4 eval -i '.transformers += ["site-config/cas-add-nfs-mount.yaml"]' ~/project/deploy/${current_namespace}/kustomization.yaml
Alternatively, you can manually edit the transformers section to add the lines below:

[...]
transformers:
[... previous transformers items ...]
- site-config/cas-add-nfs-mount.yaml
[...]
Add path to CAS allowlist
By default, the paths that CAS can access are restricted to /cas/data/caslibs. In this step we will add the NFS-mounted path so that users can create caslibs that point to that path. This could also be done as a CAS super-user in SAS Environment Manager.
tee ~/project/deploy/${current_namespace}/site-config/cas-add-allowlist-paths.yaml > /dev/null << EOF
---
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: cas-add-allowlist-paths
patch: |-
  - op: add
    path: /spec/appendCASAllowlistPaths
    value:
      - /cas/data/caslibs
      - /gelcontent
      - /mnt/gelcontent/
target:
  group: viya.sas.com
  kind: CASDeployment
  # The following name specification will target all CAS servers. To target specific
  # CAS servers, comment out the following line then uncomment and edit one of the lines
  # targeting specific CAS servers.
  name: .*
  # Uncomment this to apply to one particular named CAS server:
  #name: {{ NAME-OF-SERVER }}
  # Uncomment this to apply to the default CAS server:
  #labelSelector: "sas.com/cas-server-default"
  version: v1alpha1
EOF
Modify ~/project/deploy/${current_namespace}/kustomization.yaml to reference the cas allowlist overlay. In the transformers section add the line - site-config/cas-add-allowlist-paths.yaml.

Run this command to update kustomization.yaml using the yq tool:

[[ $(grep -c "site-config/cas-add-allowlist-paths.yaml" ~/project/deploy/${current_namespace}/kustomization.yaml) == 0 ]] && \
yq4 eval -i '.transformers += ["site-config/cas-add-allowlist-paths.yaml"]' ~/project/deploy/${current_namespace}/kustomization.yaml
Alternatively, you can manually edit the transformers section to add the lines below:

[...]
transformers:
[... previous transformers items ...]
- site-config/cas-add-allowlist-paths.yaml
[...]
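Once the build and apply later in this exercise has run, you could confirm that the extra paths landed on the CAS custom resource. This is a hedged check: it assumes the default CASDeployment is named default (as it is elsewhere in this workshop) and that the field name matches the patch above.

# Run after the deploy step: show the appendCASAllowlistPaths field on the default CASDeployment
kubectl get casdeployment default -o jsonpath='{.spec.appendCASAllowlistPaths}'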
Use an NFS volume to make data available to the Programming Run-Time Servers (programming run-time)
Mount NFS share to all Programming Run-Time
In Viya, programming run-time sessions are started by the launcher. The Launcher Service looks for a Kubernetes PodTemplate that contains information that is used to construct a Kubernetes job request. The PodTemplate information is used to generate the container that is launched as a pod. The container in the pod performs the SAS processing.
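Before creating the patch, you can list the launcher PodTemplates to see what it will target; the label selector below is the same one used in the transformer that follows.

# List the PodTemplates that the launcher uses for SAS Programming Run-Time sessions
kubectl get podtemplates -l sas.com/template-intent=sas-launcher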
Create a new .yaml file for the changes that need to be applied to the Kubernetes manifest to add the volume and volume mount. Save the file in the ${current_namespace} project.
_deploymentNodeFQDN=$(hostname -f)
tee ~/project/deploy/${current_namespace}/site-config/compute-server-add-nfs-mount.yaml > /dev/null << EOF
---
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: compute-server-add-nfs-mount
patch: |-
  - op: add
    path: /template/spec/volumes/-
    value:
      name: sas-viya-gelcorp-volume
      nfs:
        path: /shared/gelcontent
        server: ${_deploymentNodeFQDN}
  - op: add
    path: /template/spec/containers/0/volumeMounts/-
    value:
      name: sas-viya-gelcorp-volume
      mountPath: /gelcontent
target:
  kind: PodTemplate
  version: v1
  labelSelector: sas.com/template-intent=sas-launcher
EOF
Modify the ~/project/deploy/${current_namespace}/kustomization.yaml to reference the compute server overlay. Run this command to update kustomization.yaml using the yq tool:

[[ $(grep -c "site-config/compute-server-add-nfs-mount.yaml" ~/project/deploy/${current_namespace}/kustomization.yaml) == 0 ]] && \
yq4 eval -i '.transformers += ["site-config/compute-server-add-nfs-mount.yaml"]' ~/project/deploy/${current_namespace}/kustomization.yaml
Alternatively, you can manually edit the transformers section to add the lines below:

[...]
transformers:
[... previous transformers items ...]
- site-config/compute-server-add-nfs-mount.yaml
[...]
Update the Allowlist for SAS Programming Run-Time
Starting with Stable 2020.1.4, lockdown is set by default on Programming Run-Time servers. Update the allowlist for the Compute Server.
tee /tmp/compute-autoexec.json > /dev/null << EOF
{
  "items": [
    {
      "version": 1,
      "metadata": {
        "isDefault": false,
        "services": [ "compute" ],
        "mediaType": "application/vnd.sas.configuration.config.sas.compute.server+json;version=1"
      },
      "name": "autoexec_code",
      "contents": "/*Allow List*/ \n lockdown path='/gelcontent'; \n lockdown path='/mnt/gelcontent'; \n "
    }
  ],
  "version": 2
}
EOF
gel_sas_viya configuration configurations update --file /tmp/compute-autoexec.json
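To confirm that the autoexec content was stored, you could display the compute server configuration afterwards. This mirrors the configurations commands used elsewhere in the course and assumes the definition name sas.compute.server matches the mediaType in the JSON above; if more than one instance is returned, pick the item whose name is autoexec_code.

# Show the first compute server configuration instance (adjust the index if several instances exist)
gel_sas_viya configuration configurations show --id $(gel_sas_viya configuration configurations list --definition-name sas.compute.server | jq -r '.items[0]["id"]')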
Build and Apply with sas-orchestration deploy
Keep a copy of the current manifest file. We will use this copy to track the changes your kustomization processing makes to this file.
cp -p /tmp/${current_namespace}/deploy_work/deploy/manifest.yaml /tmp/${current_namespace}/manifest_03-021.yaml
Run the sas-orchestration deploy command.
cd ~/project/deploy
rm -rf /tmp/${current_namespace}/deploy_work/*
source ~/project/deploy/.${current_namespace}_vars
docker run --rm \
  -v ${PWD}/license:/license \
  -v ${PWD}/${current_namespace}:/${current_namespace} \
  -v ${HOME}/.kube/config_portable:/kube/config \
  -v /tmp/${current_namespace}/deploy_work:/work \
  -e KUBECONFIG=/kube/config \
  --user $(id -u):$(id -g) \
  sas-orchestration \
  deploy --namespace ${current_namespace} \
  --deployment-data /license/SASViyaV4_${_order}_certs.zip \
  --license /license/SASViyaV4_${_order}_license.jwt \
  --user-content /${current_namespace} \
  --cadence-name ${_cadenceName} \
  --cadence-version ${_cadenceVersion} \
  --image-registry ${_viyaMirrorReg}
When the deploy command completes successfully, the final message should say The deploy command completed successfully, as shown in the log snippet below.
The deploy command started
Generating deployment artifacts
Generating deployment artifacts complete
Generating kustomizations
Generating kustomizations complete
Generating manifests
Applying manifests
> start_leading gelcorp
[...more...]
> kubectl delete --namespace gelcorp --wait --timeout 7200s --ignore-not-found configmap sas-deploy-lifecycle-operation-variables
configmap "sas-deploy-lifecycle-operation-variables" deleted
> stop_leading gelcorp
Applying manifests complete
The deploy command completed successfully
If the sas-orchestration deploy command fails, check out the steps in 99_Additional_Topics/03_Troubleshoot_SAS_Orchestration_Deploy to help you troubleshoot any problems.
Run the following command to view the changes in the manifest. The changes are in green in the right column.
icdiff /tmp/${current_namespace}/manifest_03-021.yaml /tmp/${current_namespace}/deploy_work/deploy/manifest.yaml
Validation
In this section we will validate that the changes were successfully implemented in the Viya deployment.
Validate that the NFS directories were mounted into the CAS pods
To pick up the CAS-related changes, we need to restart the CAS server. Delete the existing CAS pods, and the CAS Operator will automatically start a new instance. In this case we use a selector to target all pods managed by the CAS operator.
kubectl delete pod --selector='app.kubernetes.io/managed-by=sas-cas-operator'
sleep 20
kubectl wait pods -l casoperator.sas.com/node-type=controller --for condition=ready --timeout 360s
Check that /shared/gelcontent is mounted into the pods at the location /gelcontent.
kubectl describe pod -l casoperator.sas.com/node-type=controller | grep -A 3 sas-viya-gelcorp-volume
You should see in the output:
/gelcontent from sas-viya-gelcorp-volume (rw)
/opt/sas/viya/home/share/refdata/qkb from sas-quality-knowledge-base-volume (rw)
/rdutil from sas-rdutil-dir (rw)
/sasviyabackup from backup (rw)
--
sas-viya-gelcorp-volume:
  Type:    NFS (an NFS mount that lasts the lifetime of a pod)
  Server:  pdcesx21071.race.sas.com
  Path:    /shared/gelcontent
Exec into the CAS controller pod and check that files on the NFS share can be accessed.
kubectl exec -it $(kubectl get pod -l casoperator.sas.com/node-type=controller --output=jsonpath={.items..metadata.name}) -c sas-cas-server -- ls -li /gelcontent/gelcorp
You should see in the output:
total 0
461525615 drwxrws--- 8 sas 3004 86 Jan 10 2020 finance
432176384 drwxrws--- 8 sas 3001 86 Mar 22 2020 hr
490905538 drwxrwxr-x 2 sas 2003 58 Mar 22 2020 inventory
381853994 drwxrws--- 8 sas 3003 86 Mar 22 2020 sales
411319639 drwxrwsrwx 6 sas 2003 57 May 27 2021 shared
Let’s see which user is running the CAS container and if we can read the data in the sales area.
kubectl exec -it $(kubectl get pod -l casoperator.sas.com/node-type=controller --output=jsonpath={.items..metadata.name}) -c sas-cas-server -- bash -c "id && cat /gelcontent/gelcorp/sales/data/test.csv"
It looks like we are the user sas (UID 1001), and it also appears that this user cannot read the data. We will address this problem in the next hands-on.
uid=1001(sas) gid=1001(sas) groups=1001(sas)
cat: /gelcontent/gelcorp/sales/data/test.csv: Permission denied
command terminated with exit code 1
Validate that the NFS directories were mounted into the programming run-time Pods
Run the command below to generate a link for SAS Studio. Click on the link in the terminal window.
gellow_urls | grep "SAS Studio"
Log on as Henrik:lnxsas and select New Program. Cut and paste the following code into the SAS Studio editor. The code will access the data from the NFS mount within the compute pod. Henrik is a member of the HR group, so he should be able to access the data. Save and submit the code and check the result; the libname should be allocated and the data printed.
NOTE: you may have to wait 10 to 20 seconds for the SAS Studio compute context to initialize. If you get an error, try reselecting the SAS Studio compute context.
/* HR data mounted from /shared/gelcontent/gelcorp/hr/data */
libname hrdata "/gelcontent/gelcorp/hr/data";
proc print data=hrdata.performance_lookup;
run;
Extra credit: try to read the data from the Sales area at /gelcontent/gelcorp/sales/data.
Get the pod launched to run Henrik’s SAS Studio SAS session.
kubectl get pods -l launcher.sas.com/requested-by-client=sas.studio,launcher.sas.com/username=Henrik --field-selector status.phase=Running --sort-by=.metadata.creationTimestamp
View the log of the pod that was launched for user Henrik.
kubectl logs $(kubectl get pods -l launcher.sas.com/requested-by-client=sas.studio,launcher.sas.com/username=Henrik --field-selector status.phase=Running --sort-by=.metadata.creationTimestamp --output=jsonpath={.items..metadata.name}) | grep HRDATA | gel_log
You should see the libname being accessed in the log.
INFO 2021-05-11 10:30:21.733 +0000 [compsrv] - NOTE: Libref HRDATA was successfully assigned as follows:
INFO 2021-05-11 10:30:21.762 +0000 [compsrv] - NOTE: There were 4 observations read from the data set HRDATA.PERFORMANCE_LOOKUP.
INFO 2021-05-11 10:30:23.991 +0000 [compsrv] - Request [00000076] >> GET /compute/sessions/aa210bd4-874c-4839-9615-f50eb9dc1b71-ses0000/data/HRDATA
Review
In this practice exercise you:
- made an NFS share available at a mount point in the CAS and Programming Run-Time servers
- updated the allowlists for the servers so that Viya can access the file-system location
- validated that the mounted directories are available and accessible.
SAS Viya Administration Operations
Lesson 03, Section 2 Exercise: Permissions and Home Directories
In this hands-on you will ensure that permissions are respected and configure SAS Studio to access user home-directories that are mounted from an NFS Server.
- Set the namespace and authenticate
- Update Identities Configuration
- Update the CAS Configuration
- Make User Home-Directories Available
- Build and Apply with sas-orchestration deploy
- Validate
- Review
Set the namespace and authenticate
gel_setCurrentNamespace gelcorp
/opt/pyviyatools/loginviauthinfo.py
Update Identities Configuration
View the users’ home directories that are located on the NFS server at /shared/gelcontent/home.

ls -al /shared/gelcontent/home
Partial output:
drwxr-xr-x 36 root           root     4096 Sep 21 17:38 .
drwxrwxrwx  6 sas            sasusers   61 Sep 21 17:43 ..
drwx------  3 Ahmed          sasusers   78 Sep 21 17:38 Ahmed
drwx------  3 Alex           sasusers   78 Sep 21 17:38 Alex
drwx------  3 Amanda         sasusers   78 Sep 21 17:38 Amanda
drwx------  3 cas            sas        78 Sep 21 17:38 cas
drwx------  3 Delilah        sasusers   78 Sep 21 17:38 Delilah
drwx------  3 Douglas        sasusers   78 Sep 21 17:38 Douglas
drwx------  3 Fay            sasusers   78 Sep 21 17:38 Fay
drwx------  3 Fernanda       sasusers   78 Sep 21 17:38 Fernanda
drwx------  3 financeservice sasusers   78 Sep 21 17:38 financeservice
drwx------  3 Fiona          sasusers   78 Sep 21 17:38 Fiona
drwx------  3 Frank          sasusers   78 Sep 21 17:38 Frank
...
The attribute identifier.homeDirectoryPrefix must be set on the identities service to the home-directory root location at /shared/gelcontent/home. Once it is set, the software builds the home directory by concatenating identifier.homeDirectoryPrefix with the username, and accesses the NFS server specified in the launcher.sas.com/nfs-server annotation on the compute job context. The configuration updates in this hands-on are performed using the configurations plugin of the sas-viya CLI. These updates could also be completed in the Configuration area of SAS Environment Manager. View the current identities configuration.

gel_sas_viya configuration configurations show --id $(gel_sas_viya configuration configurations list --definition-name sas.identities | jq -r '.items[0]["id"]')
Expected output:
id : e26c9e3d-9037-4860-bc56-2c01720d8e37
metadata.isDefault : false
metadata.mediaType : application/vnd.sas.configuration.config.sas.identities+json;version=5
metadata.services : [identities]
cache.cacheRefreshInterval : 12h
cache.enabled : true
cache.providerPageLimit : 1000
defaultProvider : local
endpoints.secured.groups : false
endpoints.secured.members : false
endpoints.secured.memberships : false
endpoints.secured.users : false
identifier.disableGids : false
identifier.disallowedUids : 1001
identifier.generateGids : false
identifier.generateUids : false
Retrieve the mediaType (it can change across releases), then update the configuration property config/identities/sas.identities/identifier.homeDirectoryPrefix, setting its value to /shared/gelcontent/home.
MEDIATYPE=$(/opt/sas/viya/home/bin/sas-viya configuration configurations download -d sas.identities | jq -r '.items[]["metadata"]["mediaType"] ' )
echo ${MEDIATYPE}
tee /tmp/update_identities.json > /dev/null << EOF
{
  "items": [
    {
      "version": 1,
      "metadata": {
        "isDefault": false,
        "services": [ "identities" ],
        "mediaType": "${MEDIATYPE}"
      },
      "identifier.homeDirectoryPrefix": "/shared/gelcontent/home",
      "defaultProvider": "local"
    }
  ]
}
EOF
gel_sas_viya configuration configurations update --file /tmp/update_identities.json
This configuration change requires a restart of the identities service. Restart identities and wait for the pod to be ready before continuing (Typically takes around 2 minutes).
kubectl delete pods -l app=sas-identities
kubectl wait pods -l app=sas-identities --for condition=ready --timeout 180s
View the updated identities configuration
gel_sas_viya configuration configurations show --id $(gel_sas_viya configuration configurations list --definition-name sas.identities | jq -r '.items[0]["id"]')
Expected output:
id : e26c9e3d-9037-4860-bc56-2c01720d8e37
metadata.isDefault : false
metadata.mediaType : application/vnd.sas.configuration.config.sas.identities+json;version=5
metadata.services : [identities]
cache.cacheRefreshInterval : 12h
cache.enabled : true
cache.providerPageLimit : 1000
defaultProvider : local
endpoints.secured.groups : false
endpoints.secured.members : false
endpoints.secured.memberships : false
endpoints.secured.users : false
identifier.disableGids : false
identifier.disallowedUids : 1001
identifier.generateGids : false
identifier.generateUids : false
identifier.homeDirectoryPrefix : /shared/gelcontent/home
Update the CAS Configuration
Change the CAS Account to make Secondary Groups available
Currently the CAS pod is running as the service account with UID 1001 and GID 1001. This does not provide the necessary permissions to access key content on the NFS share. In this step we will add supplemental groups for the user who runs CAS so that the CAS server can read the data from the NFS share.
Update the user that CAS runs as.
cd ~/project/deploy/
tee ~/project/deploy/${current_namespace}/site-config/cas-modify-user.yaml > /dev/null << EOF
---
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: cas-modify-user
patch: |-
  - op: replace
    path: /spec/controllerTemplate/spec/securityContext/supplementalGroups
    value: [2003,3000,3001,3002,3003,3004,3005,3006,3007]
target:
  group: viya.sas.com
  kind: CASDeployment
  # The following name specification will target all CAS servers. To target specific
  # CAS servers, comment out the following line then uncomment and edit one of the lines
  # targeting specific CAS servers.
  name: .*
  # Uncomment this to apply to one particular named CAS server:
  #name: {{ NAME-OF-SERVER }}
  # Uncomment this to apply to the default CAS server:
  #labelSelector: "sas.com/cas-server-default"
  version: v1alpha1
EOF
Modify ~/project/deploy/${current_namespace}/kustomization.yaml to reference the cas server overlay. In the transformers section add the line - site-config/cas-modify-user.yaml
Run this command to update kustomization.yaml using the yq tool:

[[ $(grep -c "site-config/cas-modify-user.yaml" ~/project/deploy/${current_namespace}/kustomization.yaml) == 0 ]] && \
yq4 eval -i '.transformers += ["site-config/cas-modify-user.yaml"]' ~/project/deploy/${current_namespace}/kustomization.yaml
Alternatively, you can manually edit the transformers section to add the lines below:

[...]
transformers:
[... previous transformers items ...]
- site-config/cas-modify-user.yaml
[...]
Add NFS mount for home directories
Add a mount for the home-directories. The mount point inside the container must match the identifier.homeDirectoryPrefix which is /shared/gelcontent/home.
- volumeMounts : mountPath is the location inside the container
- volumes : path is the location outside the container
cd ~/project/deploy/
_deploymentNodeFQDN=$(hostname -f)
tee ~/project/deploy/${current_namespace}/site-config/cas-add-nfs-homedir-mount.yaml > /dev/null << EOF
# cas-add-nfs-homedir-mount.yaml
# Add additional mount
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: cas-add-mount-nfs-homedir
patch: |-
  - op: add
    path: /spec/controllerTemplate/spec/volumes/-
    value:
      name: sas-viya-gelcorp-homedir
      nfs:
        path: /shared/gelcontent/home
        server: ${_deploymentNodeFQDN}
  - op: add
    path: /spec/controllerTemplate/spec/containers/0/volumeMounts/-
    value:
      name: sas-viya-gelcorp-homedir
      mountPath: /shared/gelcontent/home
target:
  group: viya.sas.com
  kind: CASDeployment
  # The following name specification will target all CAS servers. To target specific
  # CAS servers, comment out the following line then uncomment and edit one of the lines
  # targeting specific CAS servers.
  name: .*
  # Uncomment this to apply to one particular named CAS server:
  #name: {{ NAME-OF-SERVER }}
  # Uncomment this to apply to the default CAS server:
  #labelSelector: "sas.com/cas-server-default"
  version: v1alpha1
EOF
Modify ~/project/deploy/${current_namespace}/kustomization.yaml to reference the cas server overlay.
In the transformers section add the line - site-config/cas-add-nfs-homedir-mount.yaml.
Run this command to update kustomization.yaml using the yq tool:

[[ $(grep -c "site-config/cas-add-nfs-homedir-mount.yaml" ~/project/deploy/${current_namespace}/kustomization.yaml) == 0 ]] && \
yq4 eval -i '.transformers += ["site-config/cas-add-nfs-homedir-mount.yaml"]' ~/project/deploy/${current_namespace}/kustomization.yaml
Alternatively, you can manually edit the transformers section to add the lines below:

[...]
transformers:
[... previous transformers items ...]
- site-config/cas-add-nfs-homedir-mount.yaml
[...]
Enabling Host Launched CAS Sessions
As an alternative, or in addition, to modifying the account that CAS runs under, you can use the CASHostAccountRequired custom group. Members of this group will run the CAS process as their own account. There is also a CAS environment variable, CASALLHOSTACCOUNTS, which forces all CAS sessions to run as the host account (with the exception of session zero).
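The exercise does not set CASALLHOSTACCOUNTS, but if you wanted to experiment with it, one approach would be a PatchTransformer that adds the environment variable to the CAS controller container, in the same style as the other CAS overlays in this course. Treat this as a sketch only: the variable name comes from the paragraph above, but the placement and the value your release expects should be confirmed against the SAS documentation and the sas-bases examples before you use it.

# Sketch only (not used in this exercise): add the CASALLHOSTACCOUNTS environment variable to the CAS controller
tee ~/project/deploy/${current_namespace}/site-config/cas-allhostaccounts.yaml > /dev/null << EOF
---
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: cas-allhostaccounts
patch: |-
  - op: add
    path: /spec/controllerTemplate/spec/containers/0/env/-
    value:
      name: CASALLHOSTACCOUNTS
      value: "true"   # confirm the expected value in the SAS documentation
target:
  group: viya.sas.com
  kind: CASDeployment
  name: .*
  version: v1alpha1
EOF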
As an additional security measure, to enable host-launched CAS sessions you must include cas-enable-host.yaml in your kustomization.yaml. It must appear before sas-bases/overlays/required/transformers.yaml.
Copy the example file.
cp -p ~/project/deploy/${current_namespace}/sas-bases/examples/cas/configure/cas-enable-host.yaml ~/project/deploy/${current_namespace}/site-config
Add the overlay for site-config/cas-enable-host.yaml (it needs to be placed before sas-bases/overlays/required/transformers.yaml).

Run this command to update your kustomization.yaml file using the sed tool:

sed -i '/sas-bases\/overlays\/required\/transformers.yaml/i \ \ \- site-config\/cas-enable-host.yaml' ~/project/deploy/${current_namespace}/kustomization.yaml
Alternatively, you could manually edit the transformers section to add the line - site-config/cas-enable-host.yaml.

[...]
transformers:
[... previous transformers items ...]
- site-config/cas-enable-host.yaml
- sas-bases/overlays/required/transformers.yaml
[...]
Create the CASHostAccountRequired group.
gel_sas_viya --output text identities create-group --id CASHostAccountRequired --name "CASHostAccountRequired" --description "Run CAS as users account"
Expected output:
Id            CASHostAccountRequired
Name          CASHostAccountRequired
Description   Run
State         active
The group was created successfully.
Add users to the CASHostAccountRequired group. These users will launch their CAS sessions under their own user identity.
sas-viya --output text identities add-member --group-id CASHostAccountRequired --user-member-id Henrik
sas-viya --output text identities add-member --group-id CASHostAccountRequired --user-member-id Douglas
sas-viya --output text identities add-member --group-id CASHostAccountRequired --user-member-id Delilah
Expected output:
Henrik has been added to group CASHostAccountRequired Douglas has been added to group CASHostAccountRequired Delilah has been added to group CASHostAccountRequired
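If you want to confirm the memberships before moving on, the identities plug-in can list the members of a group. A quick check (assuming the list-members subcommand is available in your CLI release):

# List the members of the CASHostAccountRequired custom group
gel_sas_viya --output text identities list-members --group-id CASHostAccountRequired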
This process only works with a newly created permstore, so stop CAS and delete the existing PVCs. After the PVCs are deleted, new persistent volumes are created when CAS is restarted. (Ideally, make these changes as part of the initial deployment so that this step is not needed.)
kubectl delete casdeployment default kubectl delete pvc -l 'sas.com/backup-role=provider'
You should see:
persistentvolumeclaim "cas-default-data" deleted persistentvolumeclaim "cas-default-permstore" deleted
Make User Home-Directories Available
In the first step we updated identities to set the attribute
identifier.homeDirectoryPrefix
. In this step we will make
the home-directories available.
Add an annotation to the Launcher pod template to tell it that user home directories will be on NFS.
In order for the home directories to be accessed inside the launched container, users must specify the NFS server via pod template annotation.
Setting
launcher.sas.com/nfs-server: NFS_SERVER_LOCATION
in the pod template annotation uses the NFS mount when launching containers. If the annotation is not set, the Launcher uses hostPath by default and assumes that the user directories are available locally on the Kubernetes nodes._deploymentNodeFQDN=$(hostname -f) echo ${_deploymentNodeFQDN} tee ~/project/deploy/${current_namespace}/site-config/compute-server-annotate-podtempate.yaml > /dev/null << EOF - op: add path: "/metadata/annotations/launcher.sas.com~1nfs-server" value: ${_deploymentNodeFQDN} EOF
Modify the ~/project/deploy/${current_namespace}/kustomization.yaml to reference the compute server overlay.
In the patches section add the lines below that patch the compute server deployment
Run this command to update
kustomization.yaml
using the yq tool:[[ $(grep -c "site-config/compute-server-annotate-podtempate.yaml" ~/project/deploy/${current_namespace}/kustomization.yaml) == 0 ]] && \ yq4 eval -i '.patches += { "path": "site-config/compute-server-annotate-podtempate.yaml", "target": {"name": "sas-compute-job-config", "version": "v1", "kind": "PodTemplate"} }' ~/project/deploy/${current_namespace}/kustomization.yaml
Alternatively, you can manually edit the patches section to add the lines below
[...] patches: [... previous patches items ...] - path: site-config/compute-server-annotate-podtempate.yaml target: name: sas-compute-job-config version: v1 kind: PodTemplate [...]
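Nothing changes in the cluster until the build and apply step later in this exercise. After that step completes, a quick way to confirm the annotation landed on the compute pod template (shown only as an optional check) is:

# After the sas-orchestration deploy step, the NFS server annotation should be present
kubectl get podtemplate sas-compute-job-config -o yaml | grep "launcher.sas.com/nfs-server"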
Configure SAS Studio
SAS Studio cannot, by default, access the file system. It is preferred that content such as SAS code is stored in folders. However, it is possible to configure SAS Studio to access the file system and to make users' home directories available from an NFS mount. In these steps the configuration plug-in of the sas-viya CLI is used. This could also have been completed interactively in SAS Environment Manager.
Update the SAS Studio configuration to set
config/SASStudio/sas.studio/
:- “showServerFiles”: true
- “serverDisplayName” : “NFS gelcontent”
- “fileNavigationRoot”: “USER”
View the configuration before the change.
gel_sas_viya configuration configurations show --id $(gel_sas_viya configuration configurations list --definition-name sas.studio | jq -r '.items[0]["id"]')
Expected output:
id : 98eeed20-a5f8-4b25-a771-c09ed7c781a9 metadata.isDefault : true metadata.mediaType : application/vnd.sas.configuration.config.sas.studio+json;version=21 metadata.services : [SASStudio studio] abandonedSessionTimeout : 5 allowCopyPasteData : true allowDownload : true allowExport : true allowGit : true allowGitKerberosAuthentication : false allowGitPassword : ******** allowGitSSHPassword : ******** allowGitSSLCertFilepath : false allowPrintData : true allowUpload : true defaultTextEncoding : UTF-8 enableAllFeatureFlags : false enableAutoCompleteLibraries : true enableAutoCompleteTables : true fileNavigationRoot : USER flowColumnLimit : 10000 longPollingHoldTimeSeconds : 30 maxGitFileSize : 3e+07 maxUploadSize : 1.048576e+08 showServerFiles : false validMemName : EXTEND validVarName : ANY
Retrieve the mediaType (it can change across releases) and update the configuration.
MEDIATYPE=$(/opt/sas/viya/home/bin/sas-viya configuration configurations download -d sas.studio | jq -r '.items[]["metadata"]["mediaType"] ' ) echo ${MEDIATYPE} tee /tmp/update_studio.json > /dev/null << EOF { "name": "configurations", "items": [ { "metadata": { "isDefault": false, "mediaType": "${MEDIATYPE}" }, "serverDisplayName": "NFS gelcontent", "showServerFiles": true, "fileNavigationRoot": "USER" } ] } EOF gel_sas_viya configuration configurations update --file /tmp/update_studio.json
View the configuration after the change.
gel_sas_viya configuration configurations show --id $(gel_sas_viya configuration configurations list --definition-name sas.studio | jq -r '.items[0]["id"]')
Expected output:
id : 98eeed20-a5f8-4b25-a771-c09ed7c781a9 metadata.isDefault : false metadata.mediaType : application/vnd.sas.configuration.config.sas.studio+json;version=21 metadata.services : [SASStudio studio] abandonedSessionTimeout : 5 allowCopyPasteData : true allowDownload : true allowExport : true allowGit : true allowGitKerberosAuthentication : false allowGitPassword : ******** allowGitSSHPassword : ******** allowGitSSLCertFilepath : false allowPrintData : true allowUpload : true defaultTextEncoding : UTF-8 enableAllFeatureFlags : false enableAutoCompleteLibraries : true enableAutoCompleteTables : true fileNavigationRoot : USER flowColumnLimit : 10000 longPollingHoldTimeSeconds : 30 maxGitFileSize : 3e+07 maxUploadSize : 1.048576e+08 serverDisplayName : NFS gelcontent showServerFiles : true validMemName : EXTEND validVarName : ANY
Build and Apply with sas-orchestration deploy
Keep a copy of the current manifest file. We will use this copy to track the changes your kustomization processing makes to this file.
cp -p /tmp/${current_namespace}/deploy_work/deploy/manifest.yaml /tmp/${current_namespace}/manifest_03-031.yaml
Run the sas-orchestration deploy command.
cd ~/project/deploy rm -rf /tmp/${current_namespace}/deploy_work/* source ~/project/deploy/.${current_namespace}_vars docker run --rm \ -v ${PWD}/license:/license \ -v ${PWD}/${current_namespace}:/${current_namespace} \ -v ${HOME}/.kube/config_portable:/kube/config \ -v /tmp/${current_namespace}/deploy_work:/work \ -e KUBECONFIG=/kube/config \ --user $(id -u):$(id -g) \ \ sas-orchestration \ deploy --namespace ${current_namespace} \ --deployment-data /license/SASViyaV4_${_order}_certs.zip \ --license /license/SASViyaV4_${_order}_license.jwt \ --user-content /${current_namespace} \ --cadence-name ${_cadenceName} \ --cadence-version ${_cadenceVersion} \ --image-registry ${_viyaMirrorReg}
When the deploy command completes successfully, the final message should say The deploy command completed successfully, as shown in the log snippet below.
The deploy command started Generating deployment artifacts Generating deployment artifacts complete Generating kustomizations Generating kustomizations complete Generating manifests Applying manifests > start_leading gelcorp [...more...] > kubectl delete --namespace gelcorp --wait --timeout 7200s --ignore-not-found configmap sas-deploy-lifecycle-operation-variables configmap "sas-deploy-lifecycle-operation-variables" deleted > stop_leading gelcorp Applying manifests complete The deploy command completed successfully
If the sas-orchestration deploy command fails, check out the steps in 99_Additional_Topics/03_Troubleshoot_SAS_Orchestration_Deploy to help you troubleshoot the problem.
Run the following command to view the changes in the manifest. The changes are in green in the right column.
icdiff /tmp/${current_namespace}/manifest_03-031.yaml /tmp/${current_namespace}/deploy_work/deploy/manifest.yaml
Validate
Validate the Change of the CAS User
Check which account the CAS server is running under and verify that the primary and secondary group memberships are established so that we can read the data. (You may need to wait a few seconds for CAS to restart.)
sleep 30 kubectl wait pods -l "casoperator.sas.com/node-type==controller" --for condition=ready --timeout 620s kubectl exec -it \ $(kubectl get pod -l casoperator.sas.com/node-type=controller --output=jsonpath={.items..metadata.name}) \ -c sas-cas-server \ -- bash -c "id && ls -al /gelcontent/gelcorp/sales/data/ && head -n 4 /gelcontent/gelcorp/sales/data/test.csv"
It looks like we are the user sas (uid 1001) and all of our group and secondary group memberships are established. As a result, we can read the file because we are a member of the Sales group.
uid=1001(sas) gid=1001(sas) groups=1001(sas),2003,3000,3001,3002,3003,3004,3005,3006,3007 total 55468 drwxrws--- 2 sas 3003 50 Sep 7 2017 . drwxrws--- 8 sas 3003 86 Mar 22 2020 .. -rwxrwx--- 1 2002 3003 54198272 Sep 7 2017 salesmaster.sas7bdat -rwxrwx--- 1 2002 3003 2598077 Feb 14 2018 test.csv Store,Dept,Date,IsHoliday 1,1,2012-11-02,FALSE 1,1,2012-11-09,FALSE 1,1,2012-11-16,FALSE
Validate the Change to Host Launched CAS Sessions
Logon to SAS Environment Manager as Henrik and select the Data tab.
gellow_urls | grep "SAS Environment Manager"
Check to see if the CAS session is running as Henrik (4015)
kubectl exec -it $(kubectl get pod -l casoperator.sas.com/node-type=controller --output=jsonpath={.items..metadata.name}) -c sas-cas-server -- bash -c "ps -ef | grep 4015"
You should see a CAS session running as user 4015
4015 12081 2360 0 18:23 ? 00:00:00 /opt/sas/viya/home/SASFoundation/utilities/bin/cas session 141 -role controller -id 0 -keyfile - -controlpid 53873 -port 5570 -cfgpath /cas/config sas 12225 0 0 18:25 pts/0 00:00:00 bash -c ps -ef | grep 4015 sas 12232 12225 0 18:25 pts/0 00:00:00 grep 4015
Validate the Home Directories Are Available in Compute and CAS
Validate that the user's home directory is available. First, create a SAS program and put it on the NFS server.
sudo tee /shared/gelcontent/home/Henrik/gel_launcher_details.sas > /dev/null << EOF data _null_; /* list the attributes of this launcher session */ %put NOTE: I am &_CLIENTUSERNAME; %put NOTE: My home directory is &_USERHOME; %put NOTE: My Launcher POD IS &SYSHOSTNAME; run; /* is my CASUSER directory mounted from NFS */ cas mysess; proc cas; builtins.userinfo; table.caslibinfo / caslib='CASUSER' verbose=true; run; quit; cas mysess terminate; EOF sudo chown Henrik:sasusers /shared/gelcontent/home/Henrik/gel_launcher_details.sas sudo chmod 700 /shared/gelcontent/home/Henrik/gel_launcher_details.sas
Stay logged on as Henrik and select Develop Code and Flows to switch to SAS Studio.
You will have to allow some time for the compute context to initialize. Select Explorer and note that:
- the root of the file-system explorer is named “NFS gelcontent” (NOTE: in the latest release the node may still say SAS Server)
- the user's home directory from the NFS server is available
Open the Program at
NFS gelcontent > Home > gel_launcher_details.sas
and run the code. The program will output the name of the pod for this SAS session, the user name, and the home directory. In the log:
79 80 data _null_; 81 82 /* list the attributes of this launcher session */ 83 84 %put NOTE: I am &_CLIENTUSERNAME; NOTE: I am Henrik 85 %put NOTE: My home directory is &_USERHOME; NOTE: My home directory is /shared/gelcontent/home/Henrik 86 %put NOTE: My Launcher POD IS &SYSHOSTNAME; NOTE: My Launcher POD IS sas-compute-server-bcbe621a-22dd-47c6-b434-70f3601d98c2-4kznd 87 run; NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds 88 89 /* is my CASUSER directory mounted from NFS */ 90 91 cas mysess; NOTE: The session MYSESS connected successfully to Cloud Analytic Services sas-cas-server-default-client using port 5570. The UUID is 9f201ebb-dd02-ef4d-a9c6-5016cfb2e4fb. The user is Henrik and the active caslib is CASUSER(Henrik). NOTE: The SAS option SESSREF was updated with the value MYSESS. NOTE: The SAS macro _SESSREF_ was updated with the value MYSESS. NOTE: The session is using 0 workers. 92 proc cas; 93 builtins.userinfo; 94 table.caslibinfo / caslib='CASUSER' verbose=true; 95 run; NOTE: Active Session now MYSESS. {userInfo={userId=Henrik,providedName=Henrik,uniqueId=Henrik,groups={sasusers,CASHostAccountRequired,HR,2003,3001},providerName= OAuth,anonymous=FALSE,hostAccount=TRUE,guest=FALSE}} 96 quit; NOTE: The PROCEDURE CAS printed page 1. NOTE: PROCEDURE CAS used (Total process time): real time 0.04 seconds cpu time 0.05 seconds 97 cas mysess terminate; NOTE: Deletion of the session MYSESS was successful. NOTE: The default CAS session MYSESS identified by SAS option SESSREF= was terminated. Use the OPTIONS statement to set the SESSREF= option to an active session. NOTE: Request to TERMINATE completed for session MYSESS. 98 99 100 101 /* region: Generated postamble */
In the Result output we can see that Henrik’s CASUSER directory is also on the Shared File-system.
Make a change to the program and Save As to NFS gelcontent > Files > /gelcorp/home/Henrik/gel_launcher_details_v2.sas
Test that the program is persisted on the NFS server and available in Henrik’s home directory outside the pod. You should see the SAS program that you saved.
sudo ls -ali /shared/gelcontent/home/Henrik
total 20 461775230 drwxr-xr-x 3 Henrik sasusers 145 Sep 24 19:04 . 204782145 drwxrwxrwx 4 sas sasusers 33 Sep 24 17:12 .. 461775231 -rw-r--r-- 1 Henrik sasusers 18 Sep 24 16:27 .bash_logout 461775244 -rw-r--r-- 1 Henrik sasusers 193 Sep 24 16:27 .bash_profile 461775245 -rw-r--r-- 1 Henrik sasusers 231 Sep 24 16:27 .bashrc 461775242 -rw-r--r-- 1 Henrik sasusers 227 Sep 24 18:56 gel_launcher_details.sas 461775228 -rwxr-xr-x 1 Henrik sasusers 227 Sep 24 19:04 gel_launcher_details_v2.sas 487108963 drwxr-xr-x 4 Henrik sasusers 39 Sep 24 16:27 .mozilla
Exec into the running launcher pod and see the UID and GID of the user inside the pod and the files.
id Henrik kubectl exec -it $(kubectl get pod -l launcher.sas.com/requested-by-client=sas.studio,launcher.sas.com/username=Henrik --output=jsonpath={.items..metadata.name}) -- bash -c "id && ls -al /gelcontent/home/Henrik"
uid=4015 gid=2003 groups=2003,3001 total 24 drwx------ 3 4015 2003 145 May 4 18:31 . drwxr-xr-x 35 root root 4096 May 4 08:52 .. -rw------- 1 4015 2003 18 May 4 08:52 .bash_logout -rw------- 1 4015 2003 193 May 4 08:52 .bash_profile -rw------- 1 4015 2003 231 May 4 08:52 .bashrc -rwx------ 1 4015 2003 367 May 4 18:25 gel_launcher_details.sas -rwxr-xr-x 1 4015 2003 379 May 4 18:31 gel_launcher_details_v2.sas drwx------ 4 4015 2003 39 May 4 08:52 .mozilla
Review
In this practice exercise you:
- Updated the identities configuration to set identifier.homeDirectoryPrefix to the home-directory root location at /shared/gelcontent/home.
- Updated the CAS configuration to change the CAS user, mount home directories, and enable host-launched CAS sessions for a subset of users.
- Ensured that home directories are available to the SAS Programming Run-time
- Configured SAS Studio to access the file-system and the home-directories.
- Validated all the changes.
SAS Viya Administration Operations
Lesson 03, Section 2 Exercise: Preserve Data Permissions
In our introduction to kustomize we created a PVC that is mounted to CAS. In this section we will use a Kubernetes job to copy data to the PVC; using a job lets us ensure that the permissions are preserved. The job is passed:
- the source location
- the target claim
- the target sub-directory
Set namespace and authenticate
In a MobaXterm session on sasnode01, set the current namespace to the gelcorp deployment.
gel_setCurrentNamespace gelcorp /opt/pyviyatools/loginviauthinfo.py
Copy Data to the PVC
Create a ConfigMap with the parameters for the copy job.
cd ~/project/deploy/ # source directory all files will be copied _mysource=/shared/gelcontent/gelcorp # target persistent volume claim _targetclaim=gelcontent-data #target directory _targetdir=/gelcorp tee ~/project/deploy/${current_namespace}/site-config/gel-sas-copy-data-configmap.yaml > /dev/null << EOF --- apiVersion: v1 data: _SOURCEDIR: ${_mysource} _TARGETCLAIM: ${_targetclaim} _TARGETDIR: ${_targetdir} kind: ConfigMap metadata: annotations: {} name: gel-copy-data-parameters namespace: ${current_namespace} EOF
Create the Job to perform the copy
tee ~/project/deploy/${current_namespace}/site-config/gel-sas-copy-data.yaml > /dev/null << EOF apiVersion: batch/v1 kind: Job metadata: name: gel-sas-copy-data labels: app.kubernetes.io/name: gel-sas-copy-data spec: template: spec: containers: - name: copydata image: registry.access.redhat.com/ubi7/ubi:latest envFrom: - configMapRef: name: gel-copy-data-parameters command: ["/bin/sh","-c"] args: - echo Starting copy from \$(_SOURCEDIR) to PVC \$(_TARGETCLAIM) and directory \$(_TARGETDIR) ; mkdir -p /target_location\$(_TARGETDIR); chmod 770 /target_location\$(_TARGETDIR); cp -pr /source_location/* /target_location\$(_TARGETDIR); ls -al /target_location\$(_TARGETDIR); echo Completed; volumeMounts: - name: viya-source-location mountPath: /source_location - name: viya-target-location mountPath: /target_location securityContext: fsGroup: 1001 runAsGroup: 1001 runAsUser: 1001 supplementalGroups: - 2003 - 3000 - 3001 - 3002 - 3003 - 3004 - 3005 - 3006 - 3007 volumes: - name: viya-source-location nfs: server: sasnode01 path: "${_mysource}" - name: viya-target-location persistentVolumeClaim: claimName: ${_targetclaim} restartPolicy: Never EOF
Run the Job to copy the data to the PVC
cd ~/project/deploy/ kubectl apply -f ~/project/deploy/${current_namespace}/site-config/gel-sas-copy-data-configmap.yaml kubectl apply -f ~/project/deploy/${current_namespace}/site-config/gel-sas-copy-data.yaml
Check that the job completed
kubectl get job gel-sas-copy-data
Expected output:
NAME COMPLETIONS DURATION AGE gel-sas-copy-data 0/1 44s 44s
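Rather than re-running kubectl get job until COMPLETIONS shows 1/1, you can block until the job finishes. A minimal alternative, assuming ten minutes is long enough for the copy:

# Wait (up to 10 minutes) for the copy job to reach the Complete condition
kubectl wait --for=condition=complete job/gel-sas-copy-data --timeout=600s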
Check if the data has been copied by viewing the log.
kubectl logs -l job-name=gel-sas-copy-data
Expected output:
Starting copy from /shared/gelcontent/gelcorp to PVC gelcontent-data and directory /gelcorp total 0 drwxrwx--- 7 1001 1001 75 May 4 18:47 . drwxrwxrwx 3 root root 21 May 4 18:47 .. drwxrws--- 8 1001 3004 86 Jan 10 2020 finance drwxrws--- 8 1001 3001 86 Mar 22 2020 hr drwxrwxr-x 2 1001 2003 58 Mar 22 2020 inventory drwxrws--- 8 1001 3003 86 Mar 22 2020 sales drwxrwsrwx 6 1001 2003 57 May 27 2021 shared Completed
Check that the Data is Available on the PVC
Check that the data is available in the PVC and that we can read it.
kubectl exec -it $(kubectl get pod -l casoperator.sas.com/node-type=controller --output=jsonpath={.items..metadata.name}) -c sas-cas-server -- sh -c "id && ls -al /mnt/gelcontent/gelcorp/sales/data/ && head -n 4 /mnt/gelcontent/gelcorp/sales/data/test.csv"
Expected output:
uid=1001(sas) gid=1001(sas) groups=1001(sas),2003,3000,3001,3002,3003,3004,3005,3006,3007 total 55468 drwxrws--- 2 sas 3003 50 Sep 7 2017 . drwxrws--- 8 sas 3003 86 Mar 22 2020 .. -rwxrwx--- 1 sas 3003 54198272 Sep 7 2017 salesmaster.sas7bdat -rwxrwx--- 1 sas 3003 2598077 Feb 14 2018 test.csv Store,Dept,Date,IsHoliday 1,1,2012-11-02,FALSE 1,1,2012-11-09,FALSE 1,1,2012-11-16,FALSE
Delete the job
kubectl delete job gel-sas-copy-data
SAS Viya Administration Operations
Lesson 03, Section 3 Exercise: Load Content with Automation
In this exercise we will pull the SAS-provided sas-viya-cli Docker image and use it in an Apache Airflow flow to initialize a Viya environment.
- Set the namespace and authenticate
- Pull and use the SAS Provided sas-viya cli image
- Use pre-built administration CLI images
- Use Apache Airflow to orchestrate processing using the sas-viya CLI container
- Run the flow and review the results
- Validate
- Review
Set the namespace and authenticate
In a MobaXterm session on sasnode01, set the current namespace to the gelcorp deployment.
gel_setCurrentNamespace gelcorp
source ~/project/deploy/.${current_namespace}_vars
export SAS_CLI_PROFILE=${current_namespace}
export SSL_CERT_FILE=~/.certs/${current_namespace}_trustedcerts.pem
export REQUESTS_CA_BUNDLE=${SSL_CERT_FILE}
/opt/pyviyatools/loginviauthinfo.py
Pull and use the SAS Provided sas-viya cli image
The sas-viya CLI container image is available with a SAS Viya license. To pull the image you need the certificates that are included with your order. In this step we will use Mirror Manager (mirrormgr) to return the image name of the sas-viya CLI image in our order.
source /opt/gellow_work/vars/vars.txt climage=$(mirrormgr list remote docker tags --deployment-data /home/cloud-user/project/deploy/license/SASViyaV4_${GELLOW_ORDER}_certs.zip --cadence ${GELLOW_CADENCE_NAME}-${GELLOW_CADENCE_VERSION} | grep sas-viya-cli:latest) echo Order Number is ${GELLOW_ORDER} and latest image is ${climage}
Expected output:
Order Number is 9CYNLY and latest image is cr.sas.com/viya-4-x64_oci_linux_2-docker/sas-viya-cli:1.1.0-20240319.1710834807264
Use mirror manager to retrieve the logon credentials and logon to the docker registry.
logincmd=$(mirrormgr list remote docker login --deployment-data /home/cloud-user/project/deploy/license/SASViyaV4_${GELLOW_ORDER}_certs.zip) echo $logincmd eval $logincmd
Expected output:
docker login -u 9CYNLY -p '!|gd^X3Vq0fVJbiL1h9N1JVJ#0mqf986' cr.sas.com WARNING! Using --password via the CLI is insecure. Use --password-stdin. WARNING! Your password will be stored unencrypted in /home/cloud-user/.docker/config.json. Configure a credential helper to remove this warning. See https://docs.docker.com/engine/reference/commandline/login/#credentials-store
Login Succeeded
Get CLI image tags.
climage=$(mirrormgr list remote docker tags --deployment-data /home/cloud-user/project/deploy/license/SASViyaV4_${GELLOW_ORDER}_certs.zip --cadence ${GELLOW_CADENCE_NAME}-${GELLOW_CADENCE_VERSION} | grep sas-viya-cli:latest) echo ${climage}
Pull the image and tag it as sas-viya-cli:v1.
docker pull ${climage} docker tag ${climage} sas-viya-cli:v1
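Because a later step overrides the container entrypoint, it can be useful to see what the image's default entrypoint and command are. One way to check, assuming the image was tagged as above:

# Show the default entrypoint and command baked into the CLI image
docker image inspect sas-viya-cli:v1 --format 'Entrypoint: {{.Config.Entrypoint}}  Cmd: {{.Config.Cmd}}'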
Use
docker container run
to test the image. Initially, let's just view the CLI help.
docker container run -it sas-viya-cli:v1 --help
We will need to provide the SAS Viya certificates to the container. In this step download the certificate file.
kubectl cp $(kubectl get pod | grep "sas-logon-app" | head -1 | awk -F" " '{print $1}'):security/trustedcerts.pem /tmp/trustedcerts.pem
To authenticate with userid and password set
VIYA_USER
,VIYA_PASSWORD
andSAS_SERVICES_EDNPOINT
environment variables. Use the docker run command to authenticate as sasadm and runsas-viya identities whoami
.NOTE: in this syntax you must use the export command to set the environment variables.
export VIYA_USER=sasadm export VIYA_PASSWORD=lnxsas export SAS_SERVICES_ENDPOINT=https://${current_namespace}.$(hostname -f) docker run -it -e SAS_SERVICES_ENDPOINT -v /tmp:/security -e VIYA_USER -e VIYA_PASSWORD sas-viya-cli:v1 --output text identities whoami
Expected output:
https://gelcorp.pdcesx02038.race.sas.com Login succeeded. Token saved. Id sasadm Name SAS Administrator Title EmailAddresses [map[value:sasadm@gelcorp.com]] PhoneNumbers Addresses [map[country: locality:Cary postalCode: region:]] State active ProviderId ldap CreationTimeStamp ModifiedTimeStamp
You can also use the locally available profile and credentials files by mounting them into the container. This has the benefit of not authenticating on every call. You have to do a few things differently:
- override the entrypoint so that the default authentication is not used
- specify the user
- mount in the cli configuration and credentials file
- specify the CLI profile to use as an environment variable
docker container run -it \ --entrypoint "/bin/bash" \ --user 1000:1000 \ -v /tmp:/tmp \ -v ${SSL_CERT_FILE}:/cli-home/.certs/`basename ${SSL_CERT_FILE}` \ -v ~/.sas/config.json:/cli-home/.sas/config.json \ -v ~/.sas/credentials.json:/cli-home/.sas/credentials.json \ -e SSL_CERT_FILE=/cli-home/.certs/`basename ${SSL_CERT_FILE}` \ -e REQUESTS_CA_BUNDLE=/cli-home/.certs/`basename ${SSL_CERT_FILE}` \ -e SAS_CLI_PROFILE=$SAS_CLI_PROFILE \ sas-viya-cli:v1 -c "./sas-viya --output text identities whoami"
Expected output:
Id geladm Name geladm Title Platform Administrator EmailAddresses [map[value:geladm@gelcorp.com]] PhoneNumbers Addresses [map[country: locality:Cary postalCode: region:]] State active ProviderId ldap CreationTimeStamp ModifiedTimeStamp
Use pre-built administration CLI images
You can build your own Docker image based on the SAS-provided sas-viya CLI image. This allows you to add additional tools to the CLI image. For the rest of the class we will use an image built from the SAS-provided Viya CLI image. The image has been pre-built and stored in the gelharbor Docker registry. The container images are automatically rebuilt weekly using a Jenkins process. The Dockerfiles and scripts are stored in GitLab.
Pull the latest build of the sas-viya cli container and test by displaying the version number.
docker pull gelharbor.race.sas.com/admin-toolkit/sas-viya-cli:latest docker tag gelharbor.race.sas.com/admin-toolkit/sas-viya-cli:latest sas-viya-cli:latest docker container run -it sas-viya-cli:latest ./sas-viya --version
Expected output:
latest: Pulling from admin-toolkit/sas-viya-cli Digest: sha256:02650157e29f0950b62b9508c6b9e4bc105a7213f46c53bd2427d0c348ddab9b Status: Image is up to date for gelharbor.race.sas.com/admin-toolkit/sas-viya-cli:latest gelharbor.race.sas.com/admin-toolkit/sas-viya-cli:latest sas-viya version 1.22.3
Run ad-hoc CLI processing using the container. In this example we will use the container to run a CLI command. The CLI configuration and credentials file are mounted into the container.
docker container run -it \ -v ${SSL_CERT_FILE}:/cli-home/.certs/`basename ${SSL_CERT_FILE}` \ -v ~/.sas/config.json:/cli-home/.sas/config.json \ -v ~/.sas/credentials.json:/cli-home/.sas/credentials.json \ -e SSL_CERT_FILE=/cli-home/.certs/`basename ${SSL_CERT_FILE}` \ -e REQUESTS_CA_BUNDLE=/cli-home/.certs/`basename ${SSL_CERT_FILE}` \ -e SAS_CLI_PROFILE=${current_namespace} \ sas-viya-cli:latest sas-viya --output text identities whoami
Expected output:
Id geladm Name geladm Title Platform Administrator EmailAddresses [map[value:geladm@gelcorp.com]] PhoneNumbers Addresses [map[country: locality:Cary postalCode: region:]] State active ProviderId ldap CreationTimeStamp ModifiedTimeStamp
Using the containerized CLI requires a lot more typing. In the class environment we have defined two functions that allow us to run ad hoc commands and scripts more easily. Review the functions.
cat ~/geladmin_common_functions.shinc | grep gel_sas_viya -A 24
gel_sas_viya () { # if env var not set set it to Default SAS_CLI_PROFILE=${SAS_CLI_PROFILE:=Default} # run the sas-admin cli in a container docker container run -it \ -v /tmp:/tmp \ -v ${SSL_CERT_FILE}:/cli-home/.certs/`basename ${SSL_CERT_FILE}` \ -v ~/.sas/config.json:/cli-home/.sas/config.json \ -v ~/.sas/credentials.json:/cli-home/.sas/credentials.json \ -e SSL_CERT_FILE=/cli-home/.certs/`basename ${SSL_CERT_FILE}` \ -e REQUESTS_CA_BUNDLE=/cli-home/.certs/`basename ${SSL_CERT_FILE}` \ -e SAS_CLI_PROFILE=$SAS_CLI_PROFILE gelharbor.race.sas.com/admin-toolkit/sas-viya-cli sas-viya $@ } gel_sas_viya_batch () { if [ $# -eq 0 ]; then echo "ERROR: pass the function the full path to a script" return fi # if env var not set set it to Default SAS_CLI_PROFILE=${SAS_CLI_PROFILE:=Default} # run the sas-admin cli in a container docker container run -it \ -v /tmp:/tmp \ -v /shared/gelcontent:/gelcontent \ -v ${SSL_CERT_FILE}:/cli-home/.certs/`basename ${SSL_CERT_FILE}` \ -v ~/.sas/config.json:/cli-home/.sas/config.json \ -v ~/.sas/credentials.json:/cli-home/.sas/credentials.json \ -e SSL_CERT_FILE=/cli-home/.certs/`basename ${SSL_CERT_FILE}` \ -e REQUESTS_CA_BUNDLE=/cli-home/.certs/`basename ${SSL_CERT_FILE}` \ -e SAS_CLI_PROFILE=$SAS_CLI_PROFILE gelharbor.race.sas.com/admin-toolkit/sas-viya-cli sh $@ }
Here we can run the same CLI command using the gel_sas_viya function. Now the command to use the containerized CLI is basically the same as using the downloaded CLI.
gel_sas_viya --output text identities whoami
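The gel_sas_viya_batch function works the same way for whole scripts: it mounts /shared/gelcontent into the container as /gelcontent and runs the script you pass to it with sh. A hypothetical example (the script path is illustrative, not one created in this exercise):

# Run a script of sas-viya commands inside the containerized CLI.
# Any script under /shared/gelcontent is visible as /gelcontent inside the container;
# the path below is hypothetical.
gel_sas_viya_batch /gelcontent/my-admin-scripts/list-folders.sh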
Use Apache Airflow to orchestrate processing using the sas-viya CLI container
The flow will execute scripts that run a series of sas-viya commands to perform specific administration tasks. The scripts are stored on an NFS server and will be mounted into the pods in the flow.
The following scripts will be executed
- Setup identities : 01-setup-identities.sh
- Create a preliminary folder structure : 02-create-folders.sh
- Apply an authorization schema to the folder structure: 03-setup-authorization.sh
- Create some caslibs for data access: 04-setup-caslibs.sh
- Load Data: 05-setup-loaddata.sh
- Apply CAS authorization: 06-setup-casauth.sh
- Load content from Viya packages: 07-load-content.sh
- Validate the success of the process: 08-validate.sh
Copy the scripts and configuration files
Copy the scripts and files to the shared storage which will be mounted into the pods.
cp -pr ~/PSGEL260-sas-viya-4.0.1-administration/files/gelcorp_initenv /shared/gelcontent/ chmod -R 755 /shared/gelcontent/gelcorp_initenv
Copy the SAS Viya CLI configuration and credential files to the project directory. The sas-viya CLI uses a profile to store the connection information for the Viya environment and a credentials file to store the access token used to access the environment, and it needs to be able to reference the certificates for the Viya environment. In this step these files will be copied to our project directory and then we will generate ConfigMaps that include their content. Ultimately the ConfigMaps will be mounted into the sas-viya CLI container so that it can access Viya.
/opt/pyviyatools/loginviauthinfo.py mkdir -p ~/project/admincli/${current_namespace} cp -p ~/.sas/config.json ~/project/admincli/${current_namespace}/ cp -p ~/.sas/credentials.json ~/project/admincli/${current_namespace}/ cp -p ~/.certs/${current_namespace}_trustedcerts.pem ~/project/admincli/${current_namespace}/trustedcerts.pem
Create configmaps for CLI config files in the airflow namespace.
tee ~/project/admincli/${current_namespace}/kustomization.yaml > /dev/null << EOF --- generatorOptions: disableNameSuffixHash: true configMapGenerator: - name: cli-config files: - config.json - name: cli-token files: - credentials.json - name: cert-file files: - trustedcerts.pem EOF cd ~/project/admincli/${current_namespace} kustomize build -o ~/project/admincli/${current_namespace}/configmaps.yaml kubectl -n airflow apply --server-side=true -f ~/project/admincli/${current_namespace}/configmaps.yaml
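Before moving on, you can confirm that the three ConfigMaps were created in the airflow namespace (a quick sanity check):

# The CLI profile, token, and certificate ConfigMaps should all exist in the airflow namespace
kubectl -n airflow get configmaps cli-config cli-token cert-file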
Create the Python script that defines the flow
The workflow is defined as a Python script. In this step we will review the script and copy it to the Airflow DAGs directory.
Notice the following in the flow definition:
- the dag item defines each task, the order of execution, and the dependencies for the tasks
- each task runs a script that is mounted into the POD from the NFS server
- the container image used is gelharbor.race.sas.com/admin-toolkit/sas-viya-cli:latest
- the credentials, certificates, and CLI profile are mounted into the pod from ConfigMaps.
Copy the python file that defines the flow to the airflow DAG directory and review the content.
cp /home/cloud-user/PSGEL260-sas-viya-4.0.1-administration/files/dags/001-load-content.py /shared/gelcontent/airflow/dags/001-load-content.py cat /shared/gelcontent/airflow/dags/001-load-content.py
Run the flow and review the results
In a MobaXterm session on sasnode01, generate the Airflow URL and logon using
admin:admin
.gellow_urls | grep Airflow
In the DAG’s tab notice we have a flow
01-load-content-flow
. The flow has been loaded to Airflow because the software is configured to register flows from any python scripts copied to those directory. Open01-load-content-flow
. Review the flow diagram.Select
Graph
Select the
Run
icon and selectTrigger DAG
. The flow should run and if it is succesful all the nodes should turn green.Click on
task-04-setup-caslibs
and then selectLogs
to view the log from the step.We can also view the log of each task using kubectl.
kubectl -n airflow logs -l task_id=task-04-setup-caslibs --tail 50
Expected output:
The requested caslib "hrdl" has been added successfully. Caslib Properties Name hrdl Server cas-shared-default Description gelcontent hrdl Source Type PATH Path /gelcontent/gelcorp/hr/data/ Scope global Caslib Attributes active true personal false subDirs false The requested caslib "Financial Data" has been added successfully. Caslib Properties Name Financial Data Server cas-shared-default Description gelcontent finance Source Type PATH Path /gelcontent/gelcorp/finance/data/ Scope global Caslib Attributes active true personal false subDirs false
If we look at the pods in the airflow namespace, we will see there is a pod with the status Completed for each node in the flow.
kubectl get pods -n airflow | grep task
Expected output:
task-01-setup-identities-ed8va44f 0/1 Completed 0 22m task-02-setup-folders-2jmd0504 0/1 Completed 0 22m task-03-setup-authorization-7262tzvh 0/1 Completed 0 22m task-04-setup-caslibs-krhgav20 0/1 Completed 0 22m task-05-setup-loaddata-7wp8rnjw 0/1 Completed 0 22m task-06-setup-casauth-37cs2f7n 0/1 Completed 0 21m task-07-load-content-rl3i11i4 0/1 Completed 0 21m task-08-validate-dgaurcjg 0/1 Completed 0 21m
Clean up the pods. Because we set
is_delete_operator_pod=False
the pods remain even after their tasks are complete. We did this so that we would have pods to inspect and logs to view. Now we can clean up the pods.kubectl -n airflow delete pods -l dag_id=01-load-content-flow
Expected output:
pod "task-01-setup-identities-81poa3v2" deleted pod "task-02-setup-folders-kuc8pm4f" deleted pod "task-03-setup-authorization-eiboo19i" deleted pod "task-04-setup-caslibs-9plc51t4" deleted pod "task-05-setup-loaddata-c0ijir2u" deleted pod "task-06-setup-casauth-81l4fovm" deleted pod "task-07-load-content-hmjars3f" deleted pod "task-08-validate-c3lk6ayj" deleted
Validate
The validation is run in the last step of the flow.
View the validation report. In the MobaXterm sasnode01 SFTP tab, navigate to /shared/gelcontent/gelcorp_initenv/.
Select the html file that starts with report-, right-click, select Open with and open the report with Google Chrome.
Review the report to check that the content folders were created, that the CAS server is running, and that the new caslibs are available.
We could also use the CLI to validate; for example, list the folders.
gel_sas_viya --output text folders list-members --path /gelcontent --recursive --tree
|—— gelcontent | |—— GELCorp | | |—— Finance | | | |—— Reports | | | | |—— RevenueTrend (report) | | | | |—— FinanceOverTime (report) | | | | |—— Profit Pie Chart (jobDefinition) | | | | |—— Profit Bar Chart (jobDefinition) | | | | |—— Map of Profit by State (jobDefinition) | | | | |—— LossMakingProductRank (report) | | | |—— Data | | | | |—— FinanceLASRAppendTables1 (dataPlan) | | | | |—— Source Data | | |—— Shared | | | |—— Reports | | | | |—— GELCORP Shared HR Summary Report (report) | | |—— HR | | | |—— Code | | | | |—— HRAnalysysProject | | | | | |—— 4_LoadDataInCAS.sas (file) | | | | | |—— 2_CreateDataInSAS.sas (file) | | | | | |—— 1_CreateFormatsInSAS.sas (file) | | | | | |—— 3_LoadFormatsInSAS.sas (file) | | | |—— Work in Progress | | | |—— Data Plans | | | |—— WorkinProgress | | | |—— Reports | | | | |—— Employee measure histograms (report) | | | | |—— Employee Attrition Overview (report) | | | | |—— Employee attrition factors heatmap (report) | | | | |—— Employee attrition factors correlation (report) | | | |—— Analyses | | | | |—— Cluster Analysis for employees who left (report) | | | | |—— EmployeeSurveyDecisionTree (report) | | | | |—— Regression Analysis of Employee Attrition (report) | | |—— Sales | | | |—— Data Plans | | | |—— Work in Progress | | | |—— WorkinProgress | | | |—— Reports | | | | |—— Sales Forecast (report) | | | | |—— Sales Correlation (report) | | | | |—— Sales Overview (report) | | | |—— Analyses | | | | |—— TemperaturevSales (report) | | | | |—— Sales Regression Analysis (report)
Log on to SAS Drive as geladm : lnxsas and view the reports. Run the command below to generate a link for SAS Drive, then click on the link in the terminal window.
gellow_urls | grep "SAS Drive"
Navigate to SAS Content > gelcontent > GELCorp > HR > Reports
Open the Employee attrition factors correlation report
Review
In this practice exercise you:
- pulled the SAS-provided sas-viya-cli Docker image
- authenticated and used the CLI in the container
- created an Apache Airflow process to initialize the environment for users. Each step of the flow runs in the sas-viya CLI container and executes a script to perform its task.
- validated that the flow executed successfully.
SAS Viya Administration Operations
Lesson 04, Section 1 Exercise: Configure Backup and Restore
Review Backup Settings and Change the Retain Policy
Set current namespace and authenticate.
gel_setCurrentNamespace gelcorp
/opt/pyviyatools/loginviauthinfo.py
Change the Retain Policy of the SAS Backup Persistent Volumes
The Viya documentation recommends setting the ReclaimPolicy for the backup PVs to Retain. In this section we will make that change. With the “Retain” policy, if the PersistentVolumeClaim is deleted, the corresponding PersistentVolume will not be deleted, allowing data to be manually recovered.
Viya requires that at least one ReadWriteMany (RWX) StorageClass is defined and set as the default. In the command below we view the default storage classes in our cluster. Notice the RECLAIMPOLICY is Delete. This means that any PVs created from the storage class will inherit this policy.
kubectl get storageclass
Expected output:
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE nfs-client (default) cluster.local/nfs-nfs-subdir-external-provisioner Delete Immediate true 9h
View the Backup Persistent volume claims and their volumes.
kubectl get pvc -l 'sas.com/backup-role=storage'
Expected output:
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE sas-cas-backup-data Bound pvc-4ea3e4e3-4efc-4e62-b5c1-a959759e4a79 8Gi RWX nfs-client 5d18h sas-common-backup-data Bound pvc-5ba4898e-0309-46dc-977b-13ba5d3e5b07 25Gi RWX nfs-client 5d18h
Store the two backup volume names in environment variables.
casbackvolname=$(kubectl get pvc sas-cas-backup-data -o jsonpath='{.spec.volumeName}') commonbackvolname=$( kubectl get pvc sas-common-backup-data -o jsonpath='{.spec.volumeName}') echo CAS Backup PV: ${casbackvolname} AND Common Backup PV: ${commonbackvolname}
Expected output:
CAS Backup PV: pvc-4f82795c-bb7d-4fd7-80f1-50785369259f AND Common Backup PV: pvc-5ff882a4-fb4d-4ae4-8b56-65b442966805
Describe the sas-cas-backup-data volume, notice the
Reclaim Policy
, inherited from the storage class isDelete
. For the backup data it would be more appropriate to have a Reclaim policy ofRetain
. With the Retain policy, if a user deletes a PersistentVolumeClaim, the corresponding PersistentVolume will not be deleted, allowing data to be manually recovered.kubectl describe pv ${casbackvolname}
Expected output:
log Name: pvc-4ea3e4e3-4efc-4e62-b5c1-a959759e4a79 Labels: <none> Annotations: pv.kubernetes.io/provisioned-by: cluster.local/nfs-nfs-subdir-external-provisioner Finalizers: [kubernetes.io/pv-protection] StorageClass: nfs-client Status: Bound Claim: from35/sas-cas-backup-data Reclaim Policy: Delete Access Modes: RWX VolumeMode: Filesystem Capacity: 8Gi Node Affinity: <none> Message: Source: Type: NFS (an NFS mount that lasts the lifetime of a pod) Server: intnode01 Path: /srv/nfs/kubedata/from35-sas-cas-backup-data-pvc-4ea3e4e3-4efc-4e62-b5c1-a959759e4a79 ReadOnly: false Events: <none>
The two persistent volumes were dynamically provisioned. In order to update the ReclaimPolicy we must patch both Backup volumes, setting
spec.persistentVolumeReclaimPolicy
toRetain
.kubectl patch pv ${casbackvolname} -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}' kubectl patch pv ${commonbackvolname} -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
Expected output:
log persistentvolume/pvc-30887465-dc86-4fd9-b6f2-eaaba3a7d91a patched persistentvolume/pvc-cc6ea395-991e-4bb7-9cb0-7cb623900643 patched
Check that the reclaim policy has been updated. Notice we now have two persistent volumes with a reclaimPolicy of
Retain
.kubectl get pv | grep Retain | grep backup
Expected output:
pvc-0a852e85-48cc-437b-95f7-92543c9b0e21 25Gi RWX Retain Bound gelcorp/sas-common-backup-data nfs-client 20h pvc-a548e136-9209-430e-b627-6f129308b0d0 8Gi RWX Retain Bound gelcorp/sas-cas-backup-data nfs-client 20h
Changing the reclaim policy to
RETAIN
is a best practice for the Viya backup volumes. It means that, in the event of a problem in the namespace, the backup would be preserved and the data could be used in a restore.
NOTE: because Kubernetes will no longer automatically clean up these volumes, the Viya administrator should make sure any data no longer needed on the volumes is deleted.
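When a retained backup volume is eventually released (for example, if its claim is deleted during a redeployment), the PV stays in a Released state and its data remains on the NFS share. A hedged sketch of how an administrator might review and, once the data has been archived elsewhere, remove such volumes:

# List persistent volumes that are no longer bound to a claim
kubectl get pv | grep Released

# After confirming the backup data is archived or no longer needed, delete the PV object.
# (The underlying NFS directory may still need to be cleaned up manually.)
# kubectl delete pv <released-pv-name>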
Review the Current Backup Settings
List the backup and restore CronJobs. Notice that the two that are not suspended (SUSPEND is False) are the scheduled backup and purge CronJobs.
kubectl get cronjobs | grep -E "backup|restore"
Expected output:
sas-backup-purge-job 15 0 1/1 * ? False 0 12h 15h sas-backup-pv-copy-cleanup-job * * 30 2 * True 0 <none> 15h sas-restore-job * * 30 2 * True 0 <none> 15h sas-scheduled-backup-all-sources 0 1 * * 6 True 0 <none> 15h sas-scheduled-backup-incr-job 0 6 * * 1-6 True 0 <none> 15h sas-scheduled-backup-job 0 1 * * 0 False 0 <none> 15h
Review the settings in the backup configMap. The configMap holds parameters for the regularly scheduled backup job and any ad hoc jobs created from it. First, get the name of the sas-backup-job-parameters configMap.
BACKUP_CM=$(kubectl describe cronjob sas-scheduled-backup-job | grep -i sas-backup-job-parameters | awk '{print $1}'|head -n 1) echo ${BACKUP_CM}
Expected output like:
sas-backup-job-parameters-tbd6g9ttmh
Describe the configMap.
kubectl describe cm ${BACKUP_CM}
Expected output:
Name: sas-backup-job-parameters-d24749d25f Namespace: gelcorp Labels: sas.com/admin=cluster-local sas.com/deployment=sas-viya Annotations:
Data ==== SG_GO_MODULES_ENABLED: ---- true SG_PROJECT: ---- backup CNTR_REPO_PREFIX: ---- convoy INCLUDE_POSTGRES: ---- true JOB_TIME_OUT: ---- 1200 SAS_BACKUP_JOB_DU_NAME: ---- sas-backup-job SAS_LOG_LEVEL: ---- DEBUG SAS_SERVICE_NAME: ---- sas-backup-job SG_GO_MULTI_MODULES: ---- true FILE_SYSTEM_BACKUP_FORMAT: ---- tar RETENTION_PERIOD: ---- 2 SAS_CONTEXT_PATH: ---- backup SAS_DU_NAME: ---- backup
BinaryData ====
Events:
List the currently scheduled backups. As you can see from the schedule, by default, a backup is scheduled weekly on Sunday at 1:00 am UTC and the all-sources backup is suspended.
kubectl get cronjobs -l "sas.com/backup-job-type=scheduled-backup"
Expected output:
NAME SCHEDULE SUSPEND ACTIVE LAST SCHEDULE AGE sas-scheduled-backup-all-sources 0 1 * * 6 True 0 <none> 15h sas-scheduled-backup-job 0 1 * * 0 False 0 <none> 13h
Purging is performed through a CronJob that executes daily at 12:15 a.m. Get the details of the current backup purge job.
kubectl get cronjobs -l "sas.com/backup-job-type=purge-backup"
Expected output
NAME SCHEDULE SUSPEND ACTIVE LAST SCHEDULE AGE sas-backup-purge-job 15 0 1/1 * ? False 0 13h 15h
OPTIONAL: Change the Backup Settings
In this OPTIONAL section we will update the backup configuration. We will change the default
- backup retention period
- backup schedule
We will make the changes in the manifests and then build and apply to make the changes in the cluster.
Backup Job Parameters
To change the backup job parameters, update the sas-backup-job-parameters configMap. In this step we change the backup retention period and the level for log messages.
Run this command to insert three constants into the configMapGenerator section of kustomization.yaml.
[[ $(grep -c "name: sas-backup-job-parameters" ~/project/deploy/${current_namespace}/kustomization.yaml) == 0 ]] && \ yq4 eval -i '.configMapGenerator += { "name": "sas-backup-job-parameters", "behavior": "merge", "literals": ["RETENTION_PERIOD=5","SAS_LOG_LEVEL=INFO"] }' ~/project/deploy/${current_namespace}/kustomization.yaml
Alternatively, you could have manually edited kustomization.yaml to include the constants.
[...] configMapGenerator: [... previous resource items ...] - name: sas-backup-job-parameters behavior: merge literals: - RETENTION_PERIOD=5 - SAS_LOG_LEVEL=INFO [...]
Backup Schedule
Create a patch transformer that will update the schedule of the default backup cronJob.
tee ~/project/deploy/${current_namespace}/site-config/change-default-backup-schedule.yaml > /dev/null << EOF --- apiVersion: builtin kind: PatchTransformer metadata: name: sas-scheduled-backup-job-change-default-backup-transformer patch: |- - op: replace path: /spec/schedule value: '0 3 * * 6' target: name: sas-scheduled-backup-job kind: CronJob version: v1 EOF
Modify ~/project/deploy/${current_namespace}/kustomization.yaml to reference the patch transformer overlay.
In the transformers section add the line - site-config/change-default-backup-schedule.yaml
Run this command to update
kustomization.yaml
using the yq tool:[[ $(grep -c "site-config/change-default-backup-schedule.yaml" ~/project/deploy/${current_namespace}/kustomization.yaml) == 0 ]] && \ yq4 eval -i '.transformers += ["site-config/change-default-backup-schedule.yaml"]' ~/project/deploy/${current_namespace}/kustomization.yaml
Alternatively, you can manually edit the transformers section to add the lines below
[...] transformers: [... previous transformers items ...] - site-config/change-default-backup-schedule.yaml [...]
Build and Apply with sas-orchestration deploy
Keep a copy of the current manifest file. We will use this copy to track the changes your kustomization processing makes to this file.
cp -p /tmp/${current_namespace}/deploy_work/deploy/manifest.yaml /tmp/${current_namespace}/manifest_03-051.yaml
Run the sas-orchestration deploy command.
cd ~/project/deploy rm -rf /tmp/${current_namespace}/deploy_work/* source ~/project/deploy/.${current_namespace}_vars docker run --rm \ -v ${PWD}/license:/license \ -v ${PWD}/${current_namespace}:/${current_namespace} \ -v ${HOME}/.kube/config_portable:/kube/config \ -v /tmp/${current_namespace}/deploy_work:/work \ -e KUBECONFIG=/kube/config \ --user $(id -u):$(id -g) \ \ sas-orchestration \ deploy --namespace ${current_namespace} \ --deployment-data /license/SASViyaV4_${_order}_certs.zip \ --license /license/SASViyaV4_${_order}_license.jwt \ --user-content /${current_namespace} \ --cadence-name ${_cadenceName} \ --cadence-version ${_cadenceVersion} \ --image-registry ${_viyaMirrorReg}
When the deploy command completes successfully, the final message should say The deploy command completed successfully, as shown in the log snippet below.
The deploy command started Generating deployment artifacts Generating deployment artifacts complete Generating kustomizations Generating kustomizations complete Generating manifests Applying manifests > start_leading gelcorp [...more...] > kubectl delete --namespace gelcorp --wait --timeout 7200s --ignore-not-found configmap sas-deploy-lifecycle-operation-variables configmap "sas-deploy-lifecycle-operation-variables" deleted > stop_leading gelcorp Applying manifests complete The deploy command completed successfully
If the sas-orchestration deploy command fails, check out the steps in 99_Additional_Topics/03_Troubleshoot_SAS_Orchestration_Deploy to help you troubleshoot the problem.
Run the following command to view the changes in the manifest. The changes are in green in the right column.
icdiff /tmp/${current_namespace}/manifest_03-051.yaml /tmp/${current_namespace}/deploy_work/deploy/manifest.yaml
Check the default backup schedule has been updated.
kubectl get cronjobs -l "sas.com/backup-job-type=scheduled-backup"
Expected output:
NAME SCHEDULE SUSPEND ACTIVE LAST SCHEDULE AGE sas-scheduled-backup-job 0 3 * * 6 False 0 <none> 13h
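You can also confirm that the merged literals are present in the regenerated sas-backup-job-parameters ConfigMap. The generated name carries a hash suffix, so look it up first (a simple check, assuming the ConfigMap name still contains the sas-backup-job-parameters prefix):

# Locate the regenerated ConfigMap and confirm the merged backup parameters
BACKUP_CM=$(kubectl get cm -o name | grep sas-backup-job-parameters | head -n 1)
kubectl get ${BACKUP_CM} -o yaml | grep -E "RETENTION_PERIOD|SAS_LOG_LEVEL"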
Review
- Set the retain policy on the Viya backup PV’s to protect the backup package from accidental deletion
- Backup and restore are implemented using Kubernetes cronJobs
- You can change the default backup settings using the provided overlays
SAS Viya Administration Operations
Lesson 04, Section 2 Exercise: Perform a Backup
In this hands-on you will perform an ad hoc backup of the Viya deployment and copy the resulting backup package outside of the cluster.
- Run the Ad-hoc Backup
- Review the Ad-hoc Backup Results
- Copy the Backup Package outside the cluster
- Review
Run the Ad-hoc Backup
To run an ad hoc backup, create a backup job from the default SAS Viya scheduled backup CronJob.
Create the ad hoc backup job from the scheduled backup.
cd ~/project/deploy/${current_namespace} kubectl create job --from=cronjob/sas-scheduled-backup-job sas-scheduled-backup-job-adhoc-001
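Job names must be unique, so if you later run additional ad hoc backups you will need a different name each time. The rest of this exercise assumes the fixed name sas-scheduled-backup-job-adhoc-001, so treat the following only as a pattern for future runs:

# Pattern for future ad hoc backups: a timestamp suffix keeps job names unique.
# (Not needed for this exercise, which uses the fixed name sas-scheduled-backup-job-adhoc-001.)
# kubectl create job --from=cronjob/sas-scheduled-backup-job sas-adhoc-backup-$(date +%Y%m%d%H%M%S)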
It will take a moment for the job to be created and start running. Check if the job has started.
kubectl get jobs sas-scheduled-backup-job-adhoc-001
When the job starts, you can view the job progress in the log. This command will display the backup log as the job runs. You can press CTRL-C to stop viewing the log.
kubectl logs -f job/sas-scheduled-backup-job-adhoc-001 -c sas-backup-job | gel_log
Expected output: ```log
INFO 2023-08-24 14:55:58.735 +0000 [sas-backupjob] - Received a response from the backup agent for the “cas” data source and the “backup” job. INFO 2023-08-24 14:55:58.736 +0000 [sas-backupjob] - The “backup” job for the “cas” data source finished with the status “Completed”. INFO 2023-08-24 14:55:58.736 +0000 [sas-backupjob] - Received a response from the backup agent for the “cas” data source and the “backup” job. INFO 2023-08-24 14:55:58.736 +0000 [sas-backupjob] - The “backup” job for the “cas” data source finished with the status “Completed”. INFO 2023-08-24 14:55:58.736 +0000 [sas-backupjob] - Received a response from the backup agent for the “configurations” data source and the “backup” job. INFO 2023-08-24 14:55:58.736 +0000 [sas-backupjob] - The “backup” job for the “configurations” data source finished with the status “Completed”. INFO 2023-08-24 14:56:58.736 +0000 [sas-backupjob] - Received a response from the backup agent for the “postgres” data source and the “backup” job. INFO 2023-08-24 14:56:58.736 +0000 [sas-backupjob] - The “backup” job for the “postgres” data source finished with the status “Completed”. INFO 2023-08-24 14:57:58.737 +0000 [sas-backupjob] - Received a response from the backup agent for the “fileSystem” data source and the “backup” job. INFO 2023-08-24 14:57:58.737 +0000 [sas-backupjob] - The “backup” job for the “fileSystem” data source finished with the status “Completed”. INFO 2023-08-24 14:57:58.737 +0000 [sas-backupjob] - Received a response from the backup agent for the “fileSystem” data source and the “backup” job. INFO 2023-08-24 14:57:58.737 +0000 [sas-backupjob] - The “backup” job for the “fileSystem” data source finished with the status “Completed”. INFO 2023-08-24 14:57:58.740 +0000 [sas-backupjob] - Created the backup status file: /sasviyabackup/2023-08-24T14_54_48_648_0700/status.json INFO 2023-08-24 14:57:58.742 +0000 [sas-backupjob] - Added a backup job entry to the status file: 2023-08-24T14_54_48_648_0700 INFO 2023-08-24 14:57:58.765 +0000 [sas-backupjob] - Updating the Kubernetes job sas-scheduled-backup-job-adhoc-001 with given label. INFO 2023-08-24 14:57:58.777 +0000 [sas-backupjob] - Updated the Kubernetes job the sas-scheduled-backup-job-adhoc-001. INFO 2023-08-24 14:57:58.777 +0000 [sas-backupjob] - backupjob-log-icu.backup.backup.status.info.log [jobStatus:Completed] ```
You can check the status of the job as it runs. The job usually takes 5 minutes or so to complete. Please wait for the job to complete before moving on.
kubectl get jobs -l "sas.com/backup-job-type=scheduled-backup" -L "sas.com/backup-job-type,sas.com/sas-backup-job-status"
Reissue the command until you see a result that indicates the backup job has completed (you may have more than one job).
NAME COMPLETIONS DURATION AGE BACKUP-JOB-TYPE SAS-BACKUP-JOB-STATUS sas-scheduled-backup-job-adhoc-001 0/1 5m32s 5m32s scheduled-backup Completed
Review the Ad-hoc Backup Results
To view the results of your backup, you will need the storage location and the unique ID of the backup package (the BackupID).
NOTE: the BackupID uniquely identifies a backup package and is formed from the date and timestamp when the backup was created.
The code below captures the backupid and the location of the persistent volumes. Then within the backup package we can review the content of the status.json file. The file contains the detailed status of each backup task.
Check the fields:
- Status
- TaskStatus for each task
- within each task the size of backup for each source.
bckpvname=sas-common-backup-data bckvolname=$( kubectl get pvc $bckpvname -o jsonpath='{.spec.volumeName}') echo Backup Persistent Volume is: $bckvolname caspvname=sas-cas-backup-data casvolname=$( kubectl get pvc $caspvname -o jsonpath='{.spec.volumeName}') echo CAS Backup Persistent Volume is: $casvolname backupid=$(yq4 eval '(.metadata.labels."sas.com/sas-backup-id")' <(kubectl get job sas-scheduled-backup-job-adhoc-001 -o yaml)) echo Backup Id is: $backupid cat /srv/nfs/kubedata/${current_namespace}-${bckpvname}-${bckvolname}/${backupid}/status.json
Expected output:
json { "BackupID": "2020-10-16T17_23_26_626_0700", "Status": "Completed", "StartTime": "2020-10-16T17:23:26.777020011Z", "EndTime": "2020-10-16T17:25:46.806417294Z", "Tasks": [ { "TaskID": "5218cc6d-3a9a-4a76-8d9d-9d6ad89df33d", "TaskStatus": "Completed", "SourceType": "configurations", "DataSource": "sas-adhoc-backup-9hjtd4sc", "ResponseItems": [ { "ServiceID": "configurations", "Status": "Completed", "StatusCode": 0, "Size": "425K", "StartTime": "2020-10-16T17:23:36.825629789Z", "EndTime": "2020-10-16T17:23:37.150478583Z" }, { "ServiceID": "definitions", "Status": "Completed", "StatusCode": 0, "Size": "425K", "StartTime": "2020-10-16T17:23:37.156930002Z", "EndTime": "2020-10-16T17:23:37.326226212Z" }, { "ServiceID": "consulProperties", "Status": "Completed", "StatusCode": 0, "Size": "687", "StartTime": "2020-10-16T17:23:37.332567231Z", "EndTime": "2020-10-16T17:23:37.361899002Z" } ] }, { "TaskID": "608cde22-0ad0-455e-be46-22eb71a38faf", "TaskStatus": "Completed", "SourceType": "postgres", "DataSource": "sas-adhoc-backup-9hjtd4sc", "ResponseItems": [ { "ServiceID": "sas-crunchy-data-postgres", "Status": "Completed", "StatusCode": 0, "Size": "263M", "StartTime": "2020-10-16T17:23:37.523873943Z", "EndTime": "2020-10-16T17:25:37.831081135Z" } ] }, { "TaskID": "a055a85a-048a-4546-a842-b358c9df1925", "TaskStatus": "Completed", "SourceType": "cas", "DataSource": "cas-shared-default", "ResponseItems": [ { "ServiceID": "cas-shared-default", "Status": "Completed", "StatusCode": 0, "Size": "89K", "StartTime": "2020-10-16T17:23:37.018146359Z", "EndTime": "2020-10-16T17:23:37.327337438Z" } ] }, { "TaskID": "ae9296d8-c89b-45b1-bc0b-16ede4fc0a4e", "TaskStatus": "Completed", "SourceType": "fileSystem", "DataSource": "cas-shared-default", "ResponseItems": [ { "ServiceID": "cas-shared-default", "Status": "Completed", "StatusCode": 0, "Size": "1.2G", "StartTime": "2020-10-16T17:23:36.840136553Z", "EndTime": "2020-10-16T17:23:56.451535167Z" } ] } ], "Version": "1.0" }
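Since status.json is verbose, a quick way to summarize it is to extract just the per-source status and size with jq, reusing the variables captured above (a sketch; adjust the path if your NFS layout differs):

# One line per backed-up source: source type, service, status, and size
jq -r '.Tasks[] | .SourceType as $t | .ResponseItems[] | "\($t)\t\(.ServiceID)\t\(.Status)\t\(.Size)"' \
  /srv/nfs/kubedata/${current_namespace}-${bckpvname}-${bckvolname}/${backupid}/status.json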
Knowing the volume names from above, you can view the actual backup data and the cas backup data.
ls -al /srv/nfs/kubedata/${current_namespace}-${bckpvname}-${bckvolname}/${backupid}/__default__ ls -al /srv/nfs/kubedata/${current_namespace}-${caspvname}-${casvolname}/${backupid}/__default__
Expected output:
log [cloud-user@rext03-0057 from35]$ ls -al /srv/nfs/kubedata/${current_namespace}-${bckpvname}-${volname}/${backupid}/__default__ total 0 drwxr-xr-x 4 sas sas 36 Oct 19 13:13 . drwxr-xr-x 3 sas sas 44 Oct 19 13:15 .. drwxr-xr-x 2 sas sas 101 Oct 19 13:13 consul drwxr-xr-x 2 sas sas 93 Oct 19 13:15 postgres [cloud-user@rext03-0057 from35]$ ls -al /srv/nfs/kubedata/${current_namespace}-${caspvname}-${casvolname}/${backupid} /__default__ total 0 drwxr-xr-x 4 sas sas 35 Oct 19 13:13 . drwxr-xr-x 3 sas sas 25 Oct 19 13:13 .. drwxr-xr-x 3 sas sas 51 Oct 19 13:13 cas drwxr-xr-x 3 sas sas 51 Oct 19 13:14 fileSystem
Copy the Backup Package outside the cluster
Use the sas-backup-pv-copy-cleanup script
A script is provided to help in managing the backup package. The script starts a Kubernetes job whose pod mounts the two backup PVCs (sas-common-backup-data and sas-cas-backup-data). The pod can be used to copy to and from the backup PVCs. In this section we will copy from the backup PVCs to the local file system. Keep in mind the same technique can be used, if needed, to copy a package into a namespace prior to a restore.
Run the script to start the Job. The required parameters are namespace, operation, and cas server. To manage the backup package, the operation is copy.
chmod 755 "/home/cloud-user/project/deploy/gelcorp/sas-bases/examples/restore/scripts/sas-backup-pv-copy-cleanup.sh"
"/home/cloud-user/project/deploy/gelcorp/sas-bases/examples/restore/scripts/sas-backup-pv-copy-cleanup.sh" gelcorp copy default
Expected output:
log The sas-backup-pv-copy-default-default-aa4e2b8 keeps on running until user terminates it. The copy pods are created, and they are in a running state. To check the status of copy pods, run the following command. kubectl -n gelcorp get pods -l sas.com/backup-job-type=sas-backup-pv-copy-cleanup | grep aa4e2b8
Use the command below to get the name of the POD started by the script. This command will work when only one Job is running.
NOTE: You also can use the command in the previous output to get the POD in the job.
BACKUPCOPYPOD=$(kubectl -n gelcorp get pods -l sas.com/backup-job-type=sas-backup-pv-copy-cleanup --field-selector status.phase=Running --output=jsonpath={.items..metadata.name})
echo The backup copy pod name is $BACKUPCOPYPOD
Expected output:
log The backup copy pod name is sas-backup-pv-copy-default-default-020f428-57vlt
View the CAS and Common backup mount locations inside the sas-backup-pv-copy POD. The directories are named for the BackupID of the backup package. One of the directories should match the BackupID of the ad-hoc backup.
kubectl exec -it ${BACKUPCOPYPOD} -- ls -al /sasviyabackup
kubectl exec -it ${BACKUPCOPYPOD} -- ls -al /cas
Expected output:
log [cloud-user@pdcesx02193 ~]$ kubectl exec -it ${BACKUPCOPYPOD} -- ls -al /sasviyabackup Defaulted container "sas-backup-pv-copy-cleanup-job" out of: sas-backup-pv-copy-cleanup-job, sas-certframe (init) total 0 drwxrwxrwx 3 root root 42 Nov 26 01:02 . drwxr-xr-x 1 root root 76 Nov 27 17:52 .. drwxr-xr-x 3 sas sas 44 Nov 26 01:07 2023-11-26T01_02_07_607_0700 [cloud-user@pdcesx02193 ~]$ kubectl exec -it ${BACKUPCOPYPOD} -- ls -al /cas Defaulted container "sas-backup-pv-copy-cleanup-job" out of: sas-backup-pv-copy-cleanup-job, sas-certframe (init) total 0 drwxrwxrwx 3 root root 42 Nov 26 01:02 . drwxr-xr-x 1 root root 76 Nov 27 17:52 .. drwxr-xr-x 3 sas sas 25 Nov 26 01:02 2023-11-26T01_02_07_607_0700
Copy Backup Package
In this step we create directories on sasnode01 to which we will copy the backup package.
mkdir -p /tmp/sas-common-backup-data
mkdir -p /tmp/sas-cas-backup-data
Get the BACKUP ID of the last completed full backup. The backup ID is stored in a file in the sas-common-backup-data PV; we can access the file through the copy/cleanup POD.
BACKUPCOPYPOD=$(kubectl -n gelcorp get pods -l sas.com/backup-job-type=sas-backup-pv-copy-cleanup --field-selector status.phase=Running --output=jsonpath={.items..metadata.name})
LASTBACKUPFILE=$(kubectl exec ${BACKUPCOPYPOD} -c sas-backup-pv-copy-cleanup-job -- find /sasviyabackup/LastCompletedFullBackups -type f -name '*')
backupid=$(kubectl exec ${BACKUPCOPYPOD} -c sas-backup-pv-copy-cleanup-job -- cat ${LASTBACKUPFILE} | jq -r .backupID )
echo Last Completed Backup File=${LASTBACKUPFILE} BACKUPID=${backupid}
Expected output:
log Last Completed Backup File=/sasviyabackup/LastCompletedFullBackups/3fa746b9f419285865e9cef3bc4843dc6a57abb2 BACKUPID=20240904-143543F
Copy the SAS Viya Backup to a location outside the cluster
kubectl cp ${BACKUPCOPYPOD}:/sasviyabackup/${backupid} /tmp/sas-common-backup-data/${backupid}
kubectl cp ${BACKUPCOPYPOD}:/cas/${backupid} /tmp/sas-cas-backup-data/${backupid}
Expected output:
Defaulted container "sas-backup-pv-copy-cleanup-job" out of: sas-backup-pv-copy-cleanup-job, sas-certframe (init) tar: Removing leading `/' from member names [cloud-user@rext03-0006 gelcorp]$ kubectl cp ${BACKUPCOPYPOD}:/cas/${backupid} /tmp/sas-cas-backup-data/${backupid} Defaulted container "sas-backup-pv-copy-cleanup-job" out of: sas-backup-pv-copy-cleanup-job, sas-certframe (init) tar: Removing leading `/' from member names
Check the local file-system to see if the package has been copied.
tree -L 4 "/tmp/sas-common-backup-data/"
tree -L 4 "/tmp/sas-cas-backup-data/"
Expected output: ```log /tmp/sas-common-backup-data/ └── 20240731-143921F ├── default │ ├── consul │ │ ├── configuration.dmp │ │ ├── definition.dmp │ │ ├── genericProperties.dmp │ │ └── status.json │ └── postgres │ ├── SharedServices_pg_dump.dmp │ ├── SharedServices_pg_dump.log │ └── status.json └── status.json
5 directories, 7 files [cloud-user@pdcesx03198 gelcorp]$ tree -L 4 “/tmp/sas-cas-backup-data/” /tmp/sas-cas-backup-data/ └── 20240731-143921F └── default ├── cas │ ├── cas-shared-default │ └── status.json └── fileSystem ├── cas-shared-default └── status.json
6 directories, 2 files ```
When you are done, you can delete the job; if you leave the job running, the POD remains available for future use. In our environment we will leave the job running so that we can use the pod in the future to access the backup packages. (If you ever do want to clean it up, see the sketch below.)
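For reference only, a hedged sketch of that cleanup, assuming the copy job carries the same sas.com/backup-job-type label as its pods (we are not running this in our environment):

# delete the copy/cleanup job by label; its pod is removed with it
kubectl -n gelcorp delete job -l sas.com/backup-job-type=sas-backup-pv-copy-cleanup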
Review
The backup package can be copied using kubectl and the provided POD. If we wanted to restore this backup in a different Viya environment, we could use the copy POD in that cluster and the reverse kubectl cp commands to copy the package into its backup PVs, as sketched below.
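For illustration, a hedged sketch of that reverse copy; it assumes a copy POD is already running in the target namespace and that BACKUPCOPYPOD and backupid are set there just as they were above:

# push the package from the local file system back into the backup PVCs via the copy POD
kubectl cp /tmp/sas-common-backup-data/${backupid} ${BACKUPCOPYPOD}:/sasviyabackup/${backupid}
kubectl cp /tmp/sas-cas-backup-data/${backupid} ${BACKUPCOPYPOD}:/cas/${backupid}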
You have completed a backup of your Viya environment and have copied the backup package outside of the cluster.
SAS Viya Administration Operations
Lesson 04, Section 3 Exercise: Restore a Backup
Restore a backup
In this hands-on you will restore a Viya Backup.
Select a Backup Package
In a MobaXterm session on sasnode01, set the current namespace to the target deployment, and identify the sas-viya CLI profile to use.
gel_setCurrentNamespace gelcorp
/opt/pyviyatools/loginviauthinfo.py
Determine the Backup Package to Restore. To get the BackupID of the ad-hoc backup run in the previous exercise, we can retrieve it from a label set on the completed backup job.
backupid=$(yq4 eval '(.metadata.labels."sas.com/sas-backup-id")' <(kubectl get job sas-scheduled-backup-job-adhoc-001 -o yaml))
echo ${backupid}
If you do not know the name of the job that ran, or it has been deleted, you can get the backup ID of the last successful backup from a file in the sas-common-backup-data PVC. The file can be accessed from the cleanup/copy POD.
BACKUPCOPYPOD=$(kubectl get pods -l sas.com/backup-job-type=sas-backup-pv-copy-cleanup --field-selector status.phase=Running --output=jsonpath={.items..metadata.name})
LASTBACKUPFILE=$(kubectl exec ${BACKUPCOPYPOD} -c sas-backup-pv-copy-cleanup-job -- find /sasviyabackup/LastCompletedFullBackups -type f -name '*')
backupid=$(kubectl exec ${BACKUPCOPYPOD} -c sas-backup-pv-copy-cleanup-job -- cat ${LASTBACKUPFILE} | jq -r .backupID )
echo Last Completed Backup File=${LASTBACKUPFILE} BACKUPID=${backupid}
Expected output:
Last Completed Backup File=/sasviyabackup/LastCompletedFullBackups/3fa746b9f419285865e9cef3bc4843dc6a57abb2 BACKUPID=20240904-143543F
Remove Content
In this section, we will delete a folder and a caslib to simulate data loss. We will then create a basic inventory of the user content and the caslibs in the Viya environment. After we restore the backup, we can check that the deleted folder and caslib were restored from the backup.
Delete folder.
gel_sas_viya -y folders delete --path /gelcontent/GELCorp/hr/reports --recursive
Expected output
The folder was deleted successfully.
Delete CASlib
gel_sas_viya -y cas caslibs delete --server cas-shared-default --caslib hrdl --su
Expected output
The caslib "hrdl" has been deleted from server "cas-shared-default".
Create basic inventory of user folder content. The inventory is piped to a csv file for use in the comparison after the restore.
/opt/pyviyatools/listcontent.py -f /gelcontent -o csv > /tmp/contentbefore-restore.csv
Create a basic inventory of caslibs. The inventory is piped to a csv file for use in the comparison after the restore.
/opt/pyviyatools/listcaslibs.py > /tmp/casbefore-restore.csv
Restore the Backup
In this section we will restore the backup. The restore process happens in three steps.
- Step 1: Update the restore configMap
- Step 2: Run the restore job, which restores the SAS Configuration Server and SAS Infrastructure Data Server and stops the CAS server(s)
- Step 3: Clear the CAS PVCs and restart CAS in RESTORE mode
Step 1: Update the restore configMap
Identify the sas-restore-job-parameters configMap that needs to be modified. This step returns the config map for the restore job.
restore_config_map=$(kubectl describe cronjob sas-restore-job | grep -i sas-restore-job-parameters | awk '{print $1}'|head -n 1)
echo The current restore Config Map is: $restore_config_map
Expected output:
The current restore Config Map is: sas-restore-job-parameters-hgd4ftbmmm
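As an optional, hedged alternative to grepping the describe output, you could read the configmap name straight from the cron job spec, assuming the parameters configmap is attached through envFrom (which is how it shows up under "Environment Variables from" in the describe output):

kubectl get cronjob sas-restore-job -o jsonpath='{.spec.jobTemplate.spec.template.spec.containers[*].envFrom[*].configMapRef.name}' | tr ' ' '\n' | grep sas-restore-job-parameters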
Edit the configmap to set the restore parameters. Set SAS_BACKUP_ID to the backup ID of the package to restore, SAS_DEPLOYMENT_START_MODE to RESTORE, and SAS_LOG_LEVEL to DEBUG.
kubectl patch cm $restore_config_map --type json -p '[ {"op": "replace", "path": "/data/SAS_BACKUP_ID", "value":"'${backupid}'"}, {"op": "replace", "path": "/data/SAS_DEPLOYMENT_START_MODE", "value":"RESTORE" }, {"op": "replace", "path": "/data/SAS_LOG_LEVEL", "value":"DEBUG" }]'
Expected output:
configmap/sas-restore-job-parameters-hgd4ftbmmm patched
View the updated config map. Make sure that the SAS_BACKUP_ID and SAS_DEPLOYMENT_START_MODE parameters are correctly set.
kubectl describe cm $restore_config_map
Expected output
Name: sas-restore-job-parameters-hgd4ftbmmm Namespace: gelcorp Labels: app.kubernetes.io/name=sas-restore-job sas.com/admin=cluster-local sas.com/deployment=sas-viya Annotations: <none> Data ==== SAS_SERVICE_NAME: ---- sas-restore-job SG_PROJECT: ---- backup OAUTH2_CLIENT_ACCESSTOKENVALIDITY: ---- 72000 SAS_BACKUP_ID: ---- 2023-10-24T08_11_39_639_0700 SAS_CONTEXT_PATH: ---- restore SAS_DEPLOYMENT_START_MODE: ---- RESTORE SAS_RESTORE_JOB_DU_NAME: ---- sas-restore-job BinaryData ==== Events: <none>
Step 2: Run the restore job to restore the SAS Infrastructure Data Server and Configuration Server
Start the Restore Job from the Restore cronJob. This process will restore the SAS Infrastructure Data Server and the SAS Configuration Server. In addition, it will stop the CAS server to prepare for the CAS restore.
kubectl create job --from=cronjob/sas-restore-job sas-restore-job
Expected output:
job.batch/sas-restore-job created
Check that the restore job is running.
kubectl get jobs -l sas.com/backup-job-type=restore -L sas.com/sas-backup-id,sas.com/backup-job-type,sas.com/sas-restore-status
Expected output:
NAME COMPLETIONS DURATION AGE SAS-BACKUP-ID BACKUP-JOB-TYPE SAS-RESTORE-STATUS sas-restore-job 0/1 2m55s 2m55s 2021-01-11T15_43_38_638_0700 restore Running
View the log of the restore job as it runs. You will get the command prompt back when the restore job completes.
kubectl logs -l job-name=sas-restore-job -f -c sas-restore-job | gel_log
Check for specific messages in the log of the restore job to check the status.
kubectl logs -l "job-name=sas-restore-job" -c sas-restore-job --tail 1000 | gel_log | grep "restore job completed successfully" -B 3 -A 1
Expected output:
INFO 2023-10-24 08:48:27.196 +0000 [sas-restorejob] - Successfully completed post-restore operations to enable CAS restore. INFO 2023-10-24 08:48:27.208 +0000 [sas-restorejob] - Updating the Kubernetes job sas-restore-job with given label. INFO 2023-10-24 08:48:27.220 +0000 [sas-restorejob] - Updated the Kubernetes job the sas-restore-job. INFO 2023-10-24 08:48:27.220 +0000 [sas-restorejob] - The restore job completed successfully. INFO 2023-10-24 08:48:27.234 +0000 [sas-restorejob] - Updating the Kubernetes job sas-restore-job with given label.
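If you prefer not to watch the log, a hedged alternative is to block until the job reports the standard Kubernetes complete condition (the timeout value here is arbitrary):

# returns once the restore job finishes, or errors out after the timeout
kubectl wait --for=condition=complete job/sas-restore-job --timeout=30m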
Step 3: Restore the CAS Server
The process will start the CAS server, and the backed-up data and configuration will be restored during startup.
The restore job should have stopped the CAS server. The CAS server must be stopped in order to perform the CAS restore, so let's check that CAS is not running.
kubectl get pods --selector="app.kubernetes.io/managed-by==sas-cas-operator"
Expected output:
No resources found in target namespace.
The process uses two provided CAS scripts.
- sas-backup-pv-copy-cleanup.sh deletes the existing data from the CAS PVs.
- scale-up-cas.sh starts the CAS server(s) in RESTORE mode.
Make the provided CAS scripts executable.
chmod +x ~/project/deploy/${current_namespace}/sas-bases/examples/restore/scripts/*.sh
The CAS file system restore requires a clean volume. Run the sas-backup-pv-copy-cleanup script to clean up the CAS PVs. This step deletes the existing data in the CAS permstore and CAS data PVCs. The parameters to pass, in order, are namespace, operation, and a comma-delimited list of CAS servers.
~/project/deploy/${current_namespace}/sas-bases/examples/restore/scripts/sas-backup-pv-copy-cleanup.sh gelcorp remove "default"
Expected output:
The cleanup pods are created, and they are in a running state. Ensure that all pods are completed. To check the status of the cleanup pods, run the following command. kubectl -n gelcorp get pods -l sas.com/backup-job-type=sas-backup-pv-copy-cleanup | grep d2c055a
In the output from the previous step, the last line is a kubectl command that displays the status of the cleanup. Copy and run the kubectl command from the output of the previous step to check whether the cleanup POD is in a completed state. Expected output:
sas-backup-pv-cleanup-default-default-d2c055a-n98wk 0/1 Completed 0 3m45s
Use the provided script to start up the CAS server to start the CAS restore.
~/project/deploy/${current_namespace}/sas-bases/examples/restore/scripts/scale-up-cas.sh gelcorp "default"
Expected output:
casdeployment.viya.sas.com/default patched
Check the results. First, make sure the CAS server is up.
kubectl wait --for=condition=ready --timeout=600s pod -l "app.kubernetes.io/instance=default"
If you see messages like this, reissue the same command until the prompt does not immediately return control to you. This simply means that the CAS pods are not yet running. It can take 2-3 minutes before the CAS pods are able to respond.
error: no matching resources found
Eventually, you should see confirmation that the CAS pods are up.
pod/sas-cas-server-default-controller condition met
Check the log to see if the CAS server performed the restore. The logs should show the start of the restore process that restores the backup content to the target CAS persistent volumes.
kubectl logs sas-cas-server-default-controller -c sas-cas-server | gel_log | grep -A 10 "RESTORE"
Expected output:
Mon Jan 18 03:07:34 UTC 2021 - INFO: SAS_DEPLOYMENT_START_MODE is set to RESTORE, Initiating restore process Mon Jan 18 03:07:34 UTC 2021 - ------------------------------------------------- Mon Jan 18 03:07:34 UTC 2021 - INFO: Evaluating backup content for restore Mon Jan 18 03:07:34 UTC 2021 - INFO: Listing cas data volume contents at: /cas/data Mon Jan 18 03:07:34 UTC 2021 - INFO: Initiated restoring files Mon Jan 18 03:07:34 UTC 2021 - INFO: copying volume data '/sasviyabackup/2021-01-18T01_48_36_636_0700/__default__/fileSystem/cas-shared-default/cas-default-data-volume/apps' -> '/cas/data/apps' '/sasviyabackup/2021-01-18T01_48_36_636_0700/__default__/fileSystem/cas-shared-default/cas-default-data-volume/apps/projects' -> '/cas/data/apps/projects' '/sasviyabackup/2021-01-18T01_48_36_636_0700/__default__/fileSystem/cas-shared-default/cas-default-data-volume/apps/sashealth' -> '/cas/data/apps/sashealth' '/sasviyabackup/2021-01-18T01_48_36_636_0700/__default__/fileSystem/cas-shared-default/cas-default-data-volume/caslibs' -> '/cas/data/caslibs' '/sasviyabackup/2021-01-18T01_48_36_636_0700/__default__/fileSystem/cas-shared-default/cas-default-data-volume/caslibs/modelMonitorLibrary' -> '/cas/data/caslibs/modelMonitorLibrary'
Check to see that the permstore was restored.
kubectl logs sas-cas-server-default-controller -c sas-cas-server -n ${current_namespace} | grep -B 1 -A 5 "Restoring CAS permstore"
Expected output:
[cloud-user@pdcesx02092 gelcorp]$ kubectl logs sas-cas-server-default-controller -c sas-cas-server -n ${current_namespace} | grep -B 1 -A 5 "Restoring CAS permstore" {"version": 1, "timeStamp": "2022-11-18T21:40:35.588564+00:00", "level": "info", "source": "cas-shared-default", "message": "SAS_DEPLOYMENT_START_MODE is set to RESTORE, Initiating restore process.", "properties": {"pod": "sas-cas-server-default-controller", "caller": "restore_cas.sh:221"}} {"version": 1, "timeStamp": "2022-11-18T21:40:35.669603+00:00", "level": "info", "source": "cas-shared-default", "message": "Restoring CAS permstore volume contents", "properties": {"pod": "sas-cas-server-default-controller", "caller": "restore_cas.sh:137"}} {"version": 1, "timeStamp": "2022-11-18T21:40:35.746658+00:00", "level": "info", "source": "cas-shared-default", "message": "Target CAS permstore volume contents at /cas/permstore (Should be empty): \n", "properties": {"pod": "sas-cas-server-default-controller", "caller": "restore_cas.sh:152"}} {"version": 1, "timeStamp": "2022-11-18T21:40:35.856896+00:00", "level": "info", "source": "cas-shared-default", "message": "changed ownership of '/cas/permstore/primaryctrl' from root:root to 1001:1001", "properties": {"pod": "sas-cas-server-default-controller", "caller": "restore_cas.sh:174"}} {"version": 1, "timeStamp": "2022-11-18T21:40:35.933131+00:00", "level": "info", "source": "cas-shared-default", "message": "Copying backup permstore volume contents from source /sasviyabackup/2022-11-18T20_05_28_628_0700/__default__/cas/cas-shared-default to target /cas/permstore/primaryctrl", "properties": {"pod": "sas-cas-server-default-controller", "caller": "restore_cas.sh:177"}} {"version": 1, "timeStamp": "2022-11-18T21:40:36.444409+00:00", "level": "info", "source": "cas-shared-default", "message": "sending incremental file list", "properties": {"pod": "sas-cas-server-default-controller", "caller": "restore_cas.sh:196"}} {"version": 1, "timeStamp": "2022-11-18T21:40:36.528909+00:00", "level": "info", "source": "cas-shared-default", "message": "06499622-f1d7-7646-a32c-68c301a729a4.admitm", "properties": {"pod": "sas-cas-server-default-controller", "caller": "restore_cas.sh:196"}}
Reset the SAS restore job configMap parameters and check that the command worked.
kubectl patch cm $restore_config_map --type json -p '[{ "op": "remove", "path": "/data/SAS_BACKUP_ID" },{"op": "remove", "path": "/data/SAS_DEPLOYMENT_START_MODE"}]'
kubectl describe cm $restore_config_map
Expected output:
configmap/sas-restore-job-parameters-hgd4ftbmmm patched Name: sas-restore-job-parameters-hgd4ftbmmm Namespace: gelcorp Labels: app.kubernetes.io/name=sas-restore-job sas.com/admin=cluster-local sas.com/deployment=sas-viya Annotations: <none> Data ==== OAUTH2_CLIENT_ACCESSTOKENVALIDITY: ---- 72000 SAS_CONTEXT_PATH: ---- restore SAS_RESTORE_JOB_DU_NAME: ---- sas-restore-job SAS_SERVICE_NAME: ---- sas-restore-job SG_PROJECT: ---- backup BinaryData ==== Events: <none>
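As an optional, hedged spot check, you can print just the remaining data keys and confirm that SAS_BACKUP_ID and SAS_DEPLOYMENT_START_MODE no longer appear:

# prints the configmap's data map; the removed keys should be absent
kubectl get cm $restore_config_map -o jsonpath='{.data}'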
Validate
In this section we will check that our content has been restored from the backup.
After the restore, create a basic inventory of the user folder content.
/opt/pyviyatools/listcontent.py -f /gelcontent -o csv > /tmp/contentafter-restore.csv
After the restore, create a basic inventory of caslibs.
/opt/pyviyatools/listcaslibs.py > /tmp/casafter-restore.csv
Compare the two inventory files for CAS. File 1 was created before the restore and file 2 after the restore. Notice the CAS library hrdl has been restored from the backup.
/opt/pyviyatools/comparecontent.py --file1 /tmp/casbefore-restore.csv --file2 /tmp/casafter-restore.csv
Expected output:
NOTE: Compare the content of file1=/tmp/casbefore-restore.csv and file2=/tmp/casafter-restore.csv NOTE: SUMMARY NOTE: there is nothing in file2 that is not in file1. NOTE: DETAILS NOTE: The content listed below is in file2 but not in file1: server,caslib cas-shared-default,hrdl
Compare the two inventory files of the folder content. File 1 was created before the restore and file 2 after the restore. Notice that the folder /gelcontent/GELCorp/HR/Reports and its content have been restored from the backup.
/opt/pyviyatools/comparecontent.py --file1 /tmp/contentbefore-restore.csv --file2 /tmp/contentafter-restore.csv
Expected output:
NOTE: Compare the content of file1=/tmp/contentbefore-restore.csv and file2=/tmp/contentafter-restore.csv NOTE: SUMMARY NOTE: there is nothing in file2 that is not in file1. NOTE: DETAILS NOTE: The content listed below is in file2 but not in file1: id ,pathtoitem ,name ,contentType ,createdBy ,creationTimeStamp ,modifiedBy ,modifiedTimeStamp ,uri "2d22881b-7529-4e13-b349-48f223e32861","/gelcontent/GELCorp/HR/Reports/","Employee attrition factors heatmap","report","sasadm","2017-10-25T07:34:25.488Z","sasadm","2024-09-05T02:12:20.089Z","/reports/reports/3a95ba0d-d2bd-4897-b379-3f7e97a55e83" "e43c09a3-9793-4df0-b6e5-e9adcc4abc6d","/gelcontent/GELCorp/HR/Reports/","Employee Attrition Overview","report","sasadm","2017-10-25T08:13:57.1Z","sasadm","2024-09-05T02:12:20.087Z","/reports/reports/61e4e9e1-0b8a-4d28-89f5-3d98cf90cdbd" "b1f234fa-a77b-4876-b249-337c6f944c6e","/gelcontent/GELCorp/HR/Reports/","Employee attrition factors correlation","report","sasadm","2017-10-25T07:29:24.103Z","sasadm","2024-09-05T02:12:20.087Z","/reports/reports/c809ca34-ab79-47f8-ba25-b24cf3ae0740" "5b905498-8a27-4ecd-8065-2aaf8c001138","/gelcontent/GELCorp/HR/Reports/","Employee measure histograms","report","sasadm","2017-10-25T07:38:27.981Z","sasadm","2024-09-05T02:12:20.024Z","/reports/reports/c8869178-3495-44f8-bc48-c68c8129835c" "03bbb4c9-dc48-40dc-bc16-6f9bfb530098","/gelcontent/GELCorp/HR/","Reports","folder","geladm","2024-09-05T02:11:11.334785Z","geladm","2024-09-05T02:11:11.334786Z","/folders/folders/b7b4b719-7a76-4e91-bb41-0252b55abe48"
Review
In this hands-on you selected a completed SAS Viya backup and restored it to Viya.
SAS Viya Administration Operations
Lesson 05, Section 0 Exercise: Configure a Reusable Compute Context
Create a reusable compute context with pool of compute servers
In this exercise, we create a reusable compute context for HR members to use, which runs as user hrservice, another member of HR.
In this hands-on exercise
- Optional: Try existing SAS Studio compute context as Henrik
- Store credentials for hrservice in Compute Service
- Create compute context to run as hrservice
- Test new compute context runs as hrservice
- Make servers that run with the new compute context reusable
- Show that servers run with new compute context are reusable
- Configure a pool of available servers
- Create a reusable compute context with a pool of compute servers with a script
Optional: Try existing SAS Studio compute context as Henrik
This step is optional.
Note: You may have already done something like this task in an earlier hands-on exercise. Feel free to skip this task and proceed to the next one if you like.
Run the following command in MobaXterm:
id Henrik
Expected results:
uid=4015(Henrik) gid=2003(sasusers) groups=2003(sasusers),3001(HR)
From this you can see that Henrik’s uid number is 4015.
Open SAS Studio and log in as Henrik:lnxsas.
Tip: To generate the URL if you need it:
gellow_urls | grep "SAS Studio"
Make sure your compute session is running under the ‘SAS Studio compute context’: change the compute context if necessary. Wait for the session to start, if it has not already started.
From the menu choose New > SAS Program, or click the ‘Program in SAS’ button in the Start Page to open a new SAS program pane.
Check which user your compute session is running as. In SAS Studio’s SAS Program tab, paste and run the following code:
%put NOTE: I am &_CLIENTUSERNAME;
%put NOTE: My UID is &SYSUSERID;
%put NOTE: My home directory is &_USERHOME;
%put NOTE: My Compute POD IS &SYSHOSTNAME;
Expected results - both the automatic macro variables _CLIENTUSERNAME and SYSUSERID return a value of Henrik:
1 /* region: Generated preamble */ 79 80 %put NOTE: I am &_CLIENTUSERNAME; NOTE: I am Henrik 81 %put NOTE: My UID is &SYSUSERID; NOTE: My UID is Henrik 82 %put NOTE: My home directory is &_USERHOME; NOTE: My home directory is /shared/gelcontent/home/Henrik 83 %put NOTE: My Compute POD IS &SYSHOSTNAME; NOTE: My Compute POD IS sas-compute-server-4009df27-36f8-4656-9d19-6b2be004c8c2-34 84 85 /* region: Generated postamble */ 96
Back in MobaXterm, connected to sasnode01 as cloud-user, exec into the running launcher pod and see the UID and GID of the user inside the pod and the files.
kubectl exec -it $(kubectl get pod -l launcher.sas.com/requested-by-client=sas.studio,launcher.sas.com/username=Henrik --output=jsonpath={.items..metadata.name}) -- bash -c "id && ls -al /gelcontent/home/Henrik"
Expected output - notice that the uid from the id command is also 4015, i.e. Henrik, that all the files are owned by user 4015, and that the home directory contains a file you saved there as Henrik in an earlier exercise:
uid=4015 gid=2003 groups=2003,3000,3001 total 20 drwx------ 4 4015 2003 125 Sep 23 16:11 . drwxr-xr-x 35 root root 4096 Sep 17 10:32 .. -rw------- 1 4015 2003 18 Sep 17 10:32 .bash_logout -rw------- 1 4015 2003 193 Sep 17 10:32 .bash_profile -rw------- 1 4015 2003 231 Sep 17 10:32 .bashrc drwx------ 2 4015 2003 6 Sep 23 16:11 casuser -rwx------ 1 4015 2003 367 Sep 17 11:22 gel_launcher_details.sas drwx------ 4 4015 2003 39 Sep 17 10:32 .mozilla
This gives us a baseline to compare with later on: we are definitely running this SAS Programming Run-Time session as Henrik.
Store credentials for hrservice in Compute Service
This will be the first step in the process of letting Henrik run a SAS Compute context with shared credentials for another account, hrservice.
Use the gel_sas_viya script to run the sas-viya CLI’s compute plugin to list existing shared credentials - we don’t expect there to be any yet:
gel_sas_viya compute credentials list
Expected output:
There are no shared service account credentials.
Use gel_sas_viya to create (i.e. store) a shared credential for user hrservice:lnxsas.
Note: Here we are passing the credentials directly in the script. To be more secure, you could store them in a protected file readable only to the user who runs the script (e.g. with permissions of 0600). Or, if you are creating the credentials as a one-off task, you could omit the -u (--user) and/or -p (--password) parameters from the command and be prompted for them interactively.
gel_sas_viya compute credentials create -u hrservice -p lnxsas -d "Shared service account called hrservice"
Expected output:
2024/09/24 15:49:34 The shared service account credential for hrservice was created successfully.
Then check that the credentials have been created and stored:
gel_sas_viya compute credentials list
Expected output:
Shared Service Account Credentials: 1. hrservice - compute-password - Shared
Create compute context to run as hrservice
To configure SAS Programming Run-Time servers as reusable, they must first be configured to run under a shared account, like the one for which we just saved credentials.
Tip: If you used Chrome as your main browser for SAS Studio so far, we suggest you use Firefox for this task, or the other way around. This allows you to be logged in as a user (Henrik, Ahmed etc.) in one browser, and as geladm in the other, without having to log out and log in again so often.
In a different browser to the one you used to open SAS Studio as Henrik earlier, open SAS Environment Manager and log in as geladm:lnxsas.
Tip: To generate the URL if you need it:
gellow_urls | grep "SAS Environment Manager"
As always, opt in to the SASAdministrators assumable group, and if prompted click ‘Skip setup’ and ‘Let’s go’.
In SAS Environment Manager, as geladm, open the Contexts page.
Select the Compute contexts view.
Right-click the SAS Studio Compute context and choose ‘Copy’ from the popup menu:
In the New Compute Context dialog, set the properties of the new context to the following values. Add an attribute for runServerAs=hrservice:
Property
Value
Name:
SAS Studio compute context as hrservice
Description:
A compute context for SAS Studio which allows members of HR to run code as hrservice.
Launcher context:
SAS Studio launcher context
Identity type:
Identities
Groups:
HR
Attributes:
runServerAs
hrservice
Resources:
shrfmt
Shared formats - Base SAS I/O Engine(Present only if you created it in an earlier exercise)
Advanced:
SAS options:
(none)
Autoexec content:
(none)
This is what the Basic tab of the dialog should look like:
Click Save, and after a moment, the new compute context should be created:
Test new compute context runs as hrservice
Back in your main browser (e.g. Chrome), in SAS Studio, still logged in as Henrik, click the server context button in the top right-hand corner of SAS Studio to view the list of available contexts.
Scroll down if necessary to see your new context “SAS Studio compute context as hrservice”.
Choose SAS Studio compute context as hrservice. If prompted, click ‘Change’.
After a moment a compute session under the new compute context will start.
In a SAS Program tab, paste and run the same code you ran earlier:
%put NOTE: I am &_CLIENTUSERNAME;
%put NOTE: My UID is &SYSUSERID;
%put NOTE: My home directory is &_USERHOME;
%put NOTE: My Compute POD IS &SYSHOSTNAME;
Expected results:
1 /* region: Generated preamble */ 79 80 %put NOTE: I am &_CLIENTUSERNAME; NOTE: I am Henrik 81 %put NOTE: My UID is &SYSUSERID; NOTE: My UID is hrservice 82 %put NOTE: My home directory is &_USERHOME; NOTE: My home directory is /shared/gelcontent/home/hrservice 83 %put NOTE: My Compute POD IS &SYSHOSTNAME; NOTE: My Compute POD IS sas-compute-server-3f376ec7-504f-451e-bc6d-1b8e70d4afd3-37 84 85 /* region: Generated postamble */ 96
Your _CLIENTUSERNAME is Henrik, but now your SYSUSERID is hrservice, and ‘your’ home directory is /shared/gelcontent/home/hrservice instead of /shared/gelcontent/home/Henrik.
For added confirmation, exec into the running launcher pod and see the UID and GID of the user inside the pod and the files.
Note: If you are paying very close attention, you may see that the inner kubectl command here is looking for slightly different labels on the sas-launcher pod than we looked for earlier. That’s because the labels it has are different, now that it is running as hrservice instead of the user who logged in. Earlier, we looked for Henrik’s launcher pod with:
- -l launcher.sas.com/requested-by-client=sas.studio,launcher.sas.com/username=Henrik
Now we are looking for a pod with these labels:
- -l launcher.sas.com/requested-by-client=sas.compute,launcher.sas.com/username=hrservice
Also, notice that we are listing the contents of hrservice’s home directory, instead of Henrik’s. You can see that the files in that home directory are different to those we saw earlier.
kubectl exec -it $(kubectl get pod -l launcher.sas.com/requested-by-client=sas.compute,launcher.sas.com/username=hrservice --output=jsonpath={.items..metadata.name}) -c "sas-programming-environment" -- bash -c "id && ls -al /shared/gelcontent/home/hrservice"
Expected output:
uid=3001 gid=2003 groups=2003 total 12 drwx------ 3 3001 2003 78 Sep 17 10:32 . drwxr-xr-x 3 root root 23 Sep 25 12:43 .. -rw------- 1 3001 2003 18 Sep 17 10:32 .bash_logout -rw------- 1 3001 2003 193 Sep 17 10:32 .bash_profile -rw------- 1 3001 2003 231 Sep 17 10:32 .bashrc drwx------ 4 3001 2003 39 Sep 17 10:32 .mozilla
You can see that this time, our uid is hrservice’s uid, 3001, instead of Henrik’s uid, 4015.
Run an id command to verify that UID 3001 belongs to hrservice.
id 3001
You should see that UID 3001 is hrservice.
uid=3001(hrservice) gid=2003(sasusers) groups=2003(sasusers)
This demonstrates two things.
First, that Henrik has started a compute session in SAS Studio which runs as hrservice.
Second, you can tell who Henrik is running his session as by inspecting the value of both &SYSUSERID and &_USERHOME, which indicates that the home directory is set to /shared/gelcontent/home/hrservice.
And one more time to make the dual identity absolutely clear (Henrik running as hrservice), click on the user menu in the very top right of the application window. We are still logged in to SAS Studio as Henrik:
Make servers that run with the new compute context reusable
In your alternate browser (e.g. Firefox if you were mainly using Chrome), open SAS Environment Manager and sign in as geladm:lnxsas, if you aren’t already signed in.
Return to the Contexts page, and the Compute contexts view.
Edit your “SAS Studio compute context as hrservice” compute context. Add a new attribute, as follows:
reuseServerProcesses=true
Here are all the properties, with the new attribute in bold:
Property
Value
Name:
SAS Studio compute context as hrservice
Description:
A compute context for SAS Studio which allows members of HR to run code as hrservice.
Launcher context:
SAS Studio launcher context
Identity type:
Identities
Groups:
HR
Attributes:
runServerAs
hrservice
reuseServerProcesses
true
Advanced:
SAS options:
(none)
Autoexec content:
(none)
The modified compute context should look like this:
Note: The attributes may be listed in the reverse order in your environment when you first add a new attribute. The order of the attributes is not important.
Save your change.
The documentation describes some other properties which you can also set, if you don’t like the default values:
| Attribute | Default | Notes |
| --- | --- | --- |
| serverInactiveTimeout | 600 | Determines the time the server can remain idle before it is terminated. A server is considered to be idle if there is no active session in the server. The default value is 600 seconds (10 minutes). |
| serverReuseLimit | | Determines the number of times a server can be reused before it is terminated. If this attribute is not set, there is no limit on how many times the server can be reused. |
Show that servers run with new compute context are reusable
In your main browser (e.g. Chrome), sign out of SAS Studio. Click ‘Discard and Exit’ if prompted.
Sign in again as Henrik:lnxsas.
Tip: Really do sign out, and sign back in again. Do not just click the browser’s refresh button, and do not just choose Options > Reset SAS session.
It appears that just clicking the browser refresh button, or resetting the session in SAS Studio, normally results in you getting a compute session in a new pod, instead of in the same pod as before.
I think this is because the old compute session does not have enough time to end when you refresh the browser page or reset the session. When your refreshed SAS Studio requests a compute session, the old compute session is either still running or still terminating. So SAS Launcher has to start your new session in a new pod. If you sign out of SAS Studio, I think enough time elapses for your old SAS session to end, leaving the existing pod available to be re-used, and you normally get a session running in the same pod again.
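If you are curious, a hedged way to see this from MobaXterm is to list the hrservice compute pods immediately after a browser refresh; you will often see the old pod still present (sometimes Terminating) alongside the new one:

kubectl get pod -l launcher.sas.com/requested-by-client=sas.compute,launcher.sas.com/username=hrservice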
Ensure that the current compute context is still “SAS Studio compute context as hrservice”, and open a new program window again. Run the usual code:
%put NOTE: I am &_CLIENTUSERNAME;
%put NOTE: My UID is &SYSUSERID;
%put NOTE: My home directory is &_USERHOME;
%put NOTE: My Compute POD IS &SYSHOSTNAME;
Expected results:
1 /* region: Generated preamble */ 79 80 %put NOTE: I am &_CLIENTUSERNAME; NOTE: I am Henrik 81 %put NOTE: My UID is &SYSUSERID; NOTE: My UID is hrservice 82 %put NOTE: My home directory is &_USERHOME; NOTE: My home directory is /shared/gelcontent/home/hrservice 83 %put NOTE: My Compute POD IS &SYSHOSTNAME; NOTE: My Compute POD IS sas-compute-server-470fe399-46f2-463d-ae50-90686543615d-41 84 85 /* region: Generated postamble */ 96
Make a note of the Compute pod name as you did before - for example, copy it from SAS Studio and paste it into a text editor like Notepad++.
Sign out of SAS Studio (‘Discard and Exit’ if prompted) and sign in yet again as Henrik:lnxsas.
Once again, check that the current compute context is still “SAS Studio compute context as hrservice”, and open a new program window again.
Note: This time, your compute session may start a little more quickly than it did before!
Run the usual code:
%put NOTE: I am &_CLIENTUSERNAME;
%put NOTE: My UID is &SYSUSERID;
%put NOTE: My home directory is &_USERHOME;
%put NOTE: My Compute POD IS &SYSHOSTNAME;
Expected results:
1 /* region: Generated preamble */ 79 80 %put NOTE: I am &_CLIENTUSERNAME; NOTE: I am Henrik 81 %put NOTE: My UID is &SYSUSERID; NOTE: My UID is hrservice 82 %put NOTE: My home directory is &_USERHOME; NOTE: My home directory is /shared/gelcontent/home/hrservice 83 %put NOTE: My Compute POD IS &SYSHOSTNAME; NOTE: My Compute POD IS sas-compute-server-470fe399-46f2-463d-ae50-90686543615d-41 84 85 /* region: Generated postamble */ 96
Make a note of the Compute pod name as you did before - for example, copy it from SAS Studio and paste it into a text editor like Notepad++.
Note: Notice that the name of the pod running this SAS Studio session is the same as the pod in the previous SAS Studio session.
This shows that the compute server was reused.
Configure a pool of available servers
In your main browser, sign out of SAS Studio. ‘Discard and Exit’ if prompted.
IMPORTANT: To create a pool of available compute servers, the servers must be reusable, and must run under a service account. See the preceding tasks in this exercise, above.
In your alternate browser (e.g. Firefox if you were mainly using Chrome, or the other way around), open SAS Environment Manager and sign in as geladm:lnxsas, if you aren’t already signed in.
Return to the Contexts page, and the Compute contexts view.
Edit your “SAS Studio compute context as hrservice” compute context. Add a new attribute, as follows:
serverMinAvailable=1
Here are all the properties, with the new attribute in bold:
Property
Value
Name:
SAS Studio compute context as hrservice
Description:
A compute context for SAS Studio which allows members of HR to run code as hrservice.
Launcher context:
SAS Studio launcher context
Identity type:
Identities
Groups:
HR
Attributes:
runServerAs
hrservice
reuseServerProcesses
true
serverMinAvailable
1
Advanced:
SAS options:
(none)
Autoexec content:
(none)
The modified compute context should look like this:
Save your change.
Run this command in MobaXterm, to find compute pods which were launched by the SAS Compute service:
kubectl get pod -l launcher.sas.com/requested-by-client=sas.compute,launcher.sas.com/username=hrservice
Expected output - the number of compute pods you see may vary depending on when you run the command in relation to setting serverMinAvailable to 1:
NAME READY STATUS RESTARTS AGE sas-compute-server-ef843aee-51e0-4965-86cd-fba68f36dfa6-42 2/2 Running 0 50s
In your main browser, sign in to SAS Studio again as Henrik:lnxsas.
Once again, check that the current compute context is still “SAS Studio compute context as hrservice”.
Q: What do you notice about how long it took for your compute server in the “SAS Studio compute context as hrservice” context to be available?
A: It should have been quicker than before - perhaps around 4 or 5 seconds. You connected to a pre-started compute server from the ‘pool’ (of 1, in this case!) of available compute servers that you requested be created for this compute context.
Open a new program window again. Run the usual code:
%put NOTE: I am &_CLIENTUSERNAME;
%put NOTE: My UID is &SYSUSERID;
%put NOTE: My home directory is &_USERHOME;
%put NOTE: My Compute POD IS &SYSHOSTNAME;
Expected results - we are running in a pod that was already ‘pre-started’ and available:
1 /* region: Generated preamble */ 79 80 %put NOTE: I am &_CLIENTUSERNAME; NOTE: I am Henrik 81 %put NOTE: My UID is &SYSUSERID; NOTE: My UID is hrservice 82 %put NOTE: My home directory is &_USERHOME; NOTE: My home directory is /shared/gelcontent/home/hrservice 83 %put NOTE: My Compute POD IS &SYSHOSTNAME; NOTE: My Compute POD IS sas-compute-server-ef843aee-51e0-4965-86cd-fba68f36dfa6-42 84 85 /* region: Generated postamble */ 96
Note: Notice that the compute pod in this SAS Studio session is the same one that was pre-started.
Run this command again in MobaXterm, to find compute pods which were launched by the SAS Compute service:
kubectl get pod -l launcher.sas.com/requested-by-client=sas.compute,launcher.sas.com/username=hrservice
Expected output - the pod that was running before, plus one new pod which started when we signed in to SAS Studio as Henrik and took the existing pre-started compute pod:
NAME READY STATUS RESTARTS AGE sas-compute-server-988182b9-e8aa-48e3-9ea3-ab18726f104a-43 2/2 Running 0 49s sas-compute-server-ef843aee-51e0-4965-86cd-fba68f36dfa6-42 2/2 Running 0 4m50s
Q: What does this show?
A: Notice that one of the compute sessions started a few minutes ago, and the other when you signed in to SAS Studio more recently. This shows that when you modified the ‘SAS Studio compute context as hrservice’ compute context to set serverMinAvailable = 1, the SAS Launcher service started a new compute server under that context - a ‘pool’ of 1 compute server, under the username hrservice. Then, when you signed in to SAS Studio as Henrik, you connected to an (or rather, the only) available compute server, which means you got a compute session more quickly. As soon as you were connected to it, it was no longer ‘available’, it was in use. So the SAS Launcher service started another SAS Compute server under the same context, to be ‘available’ ready and waiting. When a user takes an available server from the pool, another server is started in its place, so that there is always the requested number of unused, ready and waiting servers available.
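One optional way to watch the pool being replenished is to leave a watch running in MobaXterm while you sign in and out of SAS Studio; the --watch flag streams pod changes as servers are taken from and added back to the pool:

# press Ctrl+C to stop watching
kubectl get pod -l launcher.sas.com/requested-by-client=sas.compute,launcher.sas.com/username=hrservice --watch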
OPTIONAL: In your main browser, use a stopwatch or timer to see how long it takes to switch:
- From “SAS Studio compute context as hrservice” to “SAS Studio compute context” (which does NOT have a pool of available servers)
- From “SAS Studio compute context” to “SAS Studio compute context as hrservice” (which you just configured to maintain a pool of 1 available server(s))
Here is a sample of times we measured, all in seconds, over three context switches in each of the directions above, all in the same SAS Studio session as Henrik:
| Attempt | “compute context as hrservice” to “compute context” | “compute context” to “compute context as hrservice” |
| --- | --- | --- |
| 1 | 28.7 | 5.9 |
| 2 | 29.4 | 5.9 |
| 3 | 29.2 | 5.1 |
| Average | 29.1 | 5.7 |

As you can see from these results, when you configure a SAS compute context to 1. run as a shared account, 2. be reusable, and 3. maintain a pool of available compute servers, the fact that a pool of available compute servers is maintained significantly reduces the time to get a compute session when one is needed.
Create a reusable compute context with a pool of compute servers with a script
Run this all at once, in MobaXterm connected to sasnode01 as cloud-user, to create another reusable compute context with one pre-started compute server called “SAS Studio compute context as hrservice too”, in a single easy-to-script step.
tee /home/cloud-user/prestarted_reusable_hrservice_cc.json > /dev/null << EOF
{
  "name": "SAS Studio compute context as hrservice too",
  "description": "Another compute context for SAS Studio which allows members of HR to run code as hrservice.",
  "attributes": {
    "runServerAs": "hrservice",
    "reuseServerProcesses": "true",
    "serverMinAvailable": "1"
  },
  "launchContext": {
    "contextName": "SAS Studio launcher context"
  },
  "launchType": "service",
  "authorizedGroups": [
    "HR"
  ]
}
EOF
sas-viya compute contexts create -r -d @/home/cloud-user/prestarted_reusable_hrservice_cc.json
sleep 30
Run this command again in MobaXterm, to find compute pods which were launched by the SAS Compute service:
kubectl get pod -l launcher.sas.com/requested-by-client=sas.compute,launcher.sas.com/username=hrservice
Expected output - there is one new pod which started when we created the new compute context from the command line, with one pre-started compute server:
NAME READY STATUS RESTARTS AGE sas-compute-server-0fa95c99-57d9-441e-a4b2-6edd5e011aa8-44 2/2 Running 0 8m13s sas-compute-server-c71f3a6e-10b9-40ab-ab22-2f5b8c604253-45 2/2 Running 0 36s sas-compute-server-ef843aee-51e0-4965-86cd-fba68f36dfa6-42 2/2 Running 0 20m
Obviously you can modify the attributes and other properties in the JSON file to suit your requirements, subject to the resources available in your SAS Viya environment.
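If you later want to confirm the context you just created from the command line, a hedged sketch (assuming your version of the sas-viya compute plug-in supports a contexts list subcommand; check sas-viya compute contexts --help for the exact syntax) might look like:

# list compute contexts and look for the one created by the script (hypothetical usage)
sas-viya compute contexts list | grep "hrservice too"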
SAS Viya Administration Operations
Lesson 05, Section 1 Exercise: Configure Python Integration
In this hands-on you will complete the configuration necessary to integrate Python with SAS Viya.
- Setup
- Steps completed so far
- Mount the sas-pyconfig PVC
- Connect Python command
- Adjust LOCKDOWN to allow Python
- Configure watchdog
- Configure CAS for external languages
- Review kustomization changes
- Apply changes
- Validate Python integration
Setup
In a MobaXterm session on sasnode01, set the current namespace to the gelcorp deployment.
gel_setCurrentNamespace gelcorp
Keep a copy of the current manifest and kustomization.yaml files. We will use these copies to track the changes your kustomization processing makes to these two files.
cp -p /tmp/${current_namespace}/deploy_work/deploy/manifest.yaml /tmp/${current_namespace}/manifest_03-036.yaml
cp -p ~/project/deploy/${current_namespace}/kustomization.yaml /tmp/${current_namespace}/kustomization_03-036.yaml
Steps completed so far
As part of the workshop deployment, two of the steps normally included in this process have been completed for you. In the interest of time, we have already
- Configured the ASTORES PVC that is required by MAS
- Installed Python and R using the SAS Configurator for Open Source.
Mount the sas-pyconfig PVC
Because Python was installed using the SAS Configurator for Open Source, Python is located on the sas-pyconfig PVC. You now need to mount the sas-pyconfig PVC to the MAS, CAS, and launcher-based pods so they can access Python.
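Optionally, before wiring up the mount, confirm that the PVC created by the SAS Configurator for Open Source exists and is Bound:

kubectl get pvc sas-pyconfig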
Create a new directory in $deploy/site-config for the customizations
mkdir -p ~/project/deploy/${current_namespace}/site-config/sas-open-source-config/python
Copy $deploy/sas-bases/examples/sas-open-source-config/python/python-transformer.yaml to the site-config directory.
export deploy=~/project/deploy/${current_namespace}
cd ${deploy}/site-config/sas-open-source-config/python
cp ${deploy}/sas-bases/examples/sas-open-source-config/python/python-transformer.yaml .
chmod ug+w ./python-transformer.yaml
Use sed to customize the python-transformer.yaml template.
- Replace {{ VOLUME-ATTRIBUTES }} with persistentVolumeClaim: {claimName: sas-pyconfig} for all python-volume definitions.
- Replace the default /python mount paths with /opt/sas/viya/home/sas-pyconfig, which is required when using the SAS Configurator for Open Source.
- Replace the {{ PYTHON-EXE-DIR }} and {{ PYTHON-EXECUTABLE }} for the Java policy allow list.
cd ${deploy}/site-config/sas-open-source-config/python
sed -i "s/{{ VOLUME-ATTRIBUTES }}/persistentVolumeClaim: {claimName: sas-pyconfig}/g" python-transformer.yaml
sed -i "s/\/python/\/opt\/sas\/viya\/home\/sas-pyconfig/g" python-transformer.yaml
sed -i "s/{{ PYTHON-EXE-DIR }}/default_py\/bin/g" python-transformer.yaml
sed -i "s/{{ PYTHON-EXECUTABLE }}/python3/g" python-transformer.yaml
Examine the differences to verify your changes on the right with the original template values on the left.
icdiff -W ${deploy}/sas-bases/examples/sas-open-source-config/python/python-transformer.yaml ./python-transformer.yaml
Modify ~/project/deploy/gelcorp/kustomization.yaml to reference site-config/sas-open-source-config/python/python-transformer.yaml. The python-transformer.yaml needs to be referenced before sas-bases/overlays/required/transformers.yaml.
[[ $(grep -c "site-config/sas-open-source-config/python/python-transformer.yaml" ~/project/deploy/${current_namespace}/kustomization.yaml) == 0 ]] && \
sed -i '/sas-bases\/overlays\/required\/transformers.yaml/i \ \ \- site-config\/sas-open-source-config\/python\/python-transformer.yaml' ~/project/deploy/${current_namespace}/kustomization.yaml
Alternatively, you could have manually edited the transformers section to add the reference as shown below.
transformers: ... - site-config/sas-open-source-config/python/python-transformer.yaml - sas-bases/overlays/required/transformers.yaml ...
Verify that python-transformer.yaml was added before the required transformers.yaml. You should see it listed in green in the right column.
icdiff -W /tmp/gelcorp/kustomization_03-036.yaml ${deploy}/kustomization.yaml
Connect Python command
With Python mounted to our SAS Viya pods, the next step is to provide MAS and compute pods with the fully qualified commands to Python and to additional configuration elements.
Create ~/project/deploy/gelcorp/site-config/sas-open-source-config/python/kustomization.yaml to define environment variables pointing to the Python interpreter.
- MAS_PYPATH which is used by SAS Micro Analytic Service
- PROC_PYPATH which is used by PROC PYTHON in compute servers
- DM_PYPATH which is used by the Open Source Code node in SAS Visual Data Mining and Machine Learning
- SAS_EXTLANG_SETTINGS which controls access to Python from CAS (more on this later)
- SAS_EXT_LLP_PYTHON which is used when the base distribution or packages for open-source software require additional run-time libraries that are not part of the shipped container image, similar to the LD_LIBRARY_PATH concept.
tee ${deploy}/site-config/sas-open-source-config/python/kustomization.yaml > /dev/null << EOF
configMapGenerator:
- name: sas-open-source-config-python
  literals:
  - MAS_PYPATH=/opt/sas/viya/home/sas-pyconfig/default_py/bin/python3
  - MAS_M2PATH=/opt/sas/viya/home/SASFoundation/misc/embscoreeng/mas2py.py
  - PROC_PYPATH=/opt/sas/viya/home/sas-pyconfig/default_py/bin/python3
  - PROC_M2PATH=/opt/sas/viya/home/SASFoundation/misc/tk
  - DM_PYPATH=/opt/sas/viya/home/sas-pyconfig/default_py/bin/python3
  - SAS_EXTLANG_SETTINGS=/opt/sas/viya/home/sas-pyconfig/extlang.xml
  - SAS_EXT_LLP_PYTHON=/opt/sas/viya/home/sas-pyconfig/lib/python3.9/lib-dynload
- name: sas-open-source-config-python-mas
  literals:
  - MAS_PYPORT= 31100
EOF
Modify ~/project/deploy/gelcorp/kustomization.yaml to add a reference to site-config/sas-open-source-config/python in the resources field.
[[ $(grep -xc "site-config/sas-open-source-config/python" ~/project/deploy/${current_namespace}/kustomization.yaml) == 0 ]] && \
yq4 eval -i '.resources += ["site-config/sas-open-source-config/python"]' ~/project/deploy/${current_namespace}/kustomization.yaml
Alternatively, you could have manually edited the resources section to add the reference as shown below.
resources: ... - site-config/sas-open-source-config/python
Verify that site-config/sas-open-source-config/python was added to the resources field. You should see it listed in green in the right column.
icdiff -W /tmp/gelcorp/kustomization_03-036.yaml ${deploy}/kustomization.yaml
Adjust LOCKDOWN to allow Python
For security reasons, SAS Viya compute servers are configured in LOCKDOWN mode which prohibits users from invoking external processes. The next step enables communication between Python and SAS Viya compute servers in LOCKDOWN.
Copy sas-bases/examples/sas-programming-environment/lockdown/enable-lockdown-access-methods.yaml to site-config/sas-programming-environment/lockdown/enable-lockdown-access-methods.yaml.
mkdir -p ${deploy}/site-config/sas-programming-environment/lockdown
cp ${deploy}/sas-bases/examples/sas-programming-environment/lockdown/enable* "$_"
chmod 644 ${deploy}/site-config/sas-programming-environment/lockdown/*.yaml
The following code edits site-config/sas-programming-environment/lockdown/enable-lockdown-access-methods.yaml to enable the python, python_embed, and socket access methods. The socket method is required for the Python Code Editor.
sed -i "s/{{ ACCESS-METHOD-LIST }}/python python_embed socket/g" $deploy/site-config/sas-programming-environment/lockdown/enable-lockdown-access-methods.yaml
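If you want to double-check the substitution before building, an optional check in the same icdiff style used elsewhere in this exercise is to compare the edited file with the shipped example:

icdiff -W ${deploy}/sas-bases/examples/sas-programming-environment/lockdown/enable-lockdown-access-methods.yaml ${deploy}/site-config/sas-programming-environment/lockdown/enable-lockdown-access-methods.yaml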
Modify ~/project/deploy/gelcorp/kustomization.yaml to add a reference to site-config/sas-programming-environment/lockdown/enable-lockdown-access-methods.yaml in the transformers field.
[[ $(grep -c "site-config/sas-programming-environment/lockdown/enable-lockdown-access-methods.yaml" ~/project/deploy/${current_namespace}/kustomization.yaml) == 0 ]] && \
yq4 eval -i '.transformers += ["site-config/sas-programming-environment/lockdown/enable-lockdown-access-methods.yaml"]' ~/project/deploy/${current_namespace}/kustomization.yaml
Alternatively, you could have manually edited the transformers section to add the reference as shown below.
transformers: ... - site-config/sas-programming-environment/lockdown/enable-lockdown-access-methods.yaml
Verify that enable-lockdown-access-methods.yaml was added to the transformers field. You should see it listed in green in the right column.
icdiff -W /tmp/gelcorp/kustomization_03-036.yaml ${deploy}/kustomization.yaml
Configure watchdog
While compute server sessions are locked down by default, Python processes are not. Fortunately, the SAS Compute Server provides the ability to execute SAS Watchdog, which monitors the spawned Python processes to ensure that they comply with the terms of LOCKDOWN system options.
SAS Watchdog emulates the restrictions imposed by LOCKDOWN by restricting access only to files that exist in folders that are allowed by LOCKDOWN.
To enable watchdog, simply add a reference to sas-bases/overlays/sas-programming-environment/watchdog to the transformers field of your base kustomization.yaml before the required transformers.yaml.
[[ $(grep -c "sas-bases/overlays/sas-programming-environment/watchdog" ~/project/deploy/${current_namespace}/kustomization.yaml) == 0 ]] && \
sed -i '/sas-bases\/overlays\/required\/transformers.yaml/i \ \ \- sas-bases\/overlays\/sas-programming-environment\/watchdog' ~/project/deploy/${current_namespace}/kustomization.yaml
Alternatively, you could have manually edited the transformers section to add the reference as shown below.
transformers: ... - sas-bases/overlays/sas-programming-environment/watchdog - sas-bases/overlays/required/transformers.yaml ...
Verify that watchdog was added before the required transformers.yaml. You should see it listed in green in the right column.
icdiff -W /tmp/gelcorp/kustomization_03-036.yaml ${deploy}/kustomization.yaml
Configure CAS for external languages
There are three additional steps to configure CAS for external language integration. Two of the three steps were done in an earlier exercise but we will include them here in case you did not complete that work.
The first step is to configure CAS for host access which enables CAS to do host identity session launching. You did this step in an earlier exercise but you can perform the following steps, understanding that any errors you see are likely due to the transformer already having been included.
- Copy $deploy/sas-bases/examples/cas/configure/cas-enable-host.yaml to $deploy/site-config. If you see a Permission denied error, it can be ignored; it means that the file already exists from an earlier exercise.
cp -p ${deploy}/sas-bases/examples/cas/configure/cas-enable-host.yaml ~/project/deploy/${current_namespace}/site-config
- Add a reference to it in your base kustomization.yaml file’s transformers field before the required transformers.yaml.
[[ $(grep -c "site-config/cas-enable-host.yaml" ~/project/deploy/${current_namespace}/kustomization.yaml) == 0 ]] && \ sed -i '/sas-bases\/overlays\/required\/transformers.yaml/i \ \ \- site-config\/cas-enable-host.yaml' ~/project/deploy/${current_namespace}/kustomization.yaml
Alternatively, you could have manually edited the transformers section to add the reference as shown below.
transformers: ... - site-config/cas-enable-host.yaml - sas-bases/overlays/required/transformers.yaml ...
- Verify that cas-enable-host.yaml was added before the required transformers.yaml. You should see it listed in the right column. If you added it just now, it will be displayed in green. If you added it in an earlier exercise, it will appear in white.
icdiff -W /tmp/gelcorp/kustomization_03-036.yaml ~/project/deploy/gelcorp/kustomization.yaml
The second step is to configure users who need host identity sessions. This was done earlier in exercise 03_031_Respecting_Permissions_and_Home_Directories so if you completed that work you can skip ahead to step #3.
- Otherwise, create the CASHostAccountRequired group.
gel_sas_viya --output text identities create-group --id CASHostAccountRequired --name "CASHostAccountRequired" --description "Run CAS as users account"
Id CASHostAccountRequired Name CASHostAccountRequired Description Run State active The group was created successfully.
If you see the following instead, you have already created the CASHostAccountRequired group and can ignore the error.
The following errors have occurred: The identity "CASHostAccountRequired" already exists.
- Add some users to the CASHostAccountRequired group. These users will launch their CAS sessions under their own user identity.
gel_sas_viya --output text identities add-member --group-id CASHostAccountRequired --user-member-id Henrik gel_sas_viya --output text identities add-member --group-id CASHostAccountRequired --user-member-id geladm gel_sas_viya --output text identities add-member --group-id CASHostAccountRequired --user-member-id Delilah
Henrik has been added to group CASHostAccountRequired geladm has been added to group CASHostAccountRequired Delilah has been added to group CASHostAccountRequired
The third step is to create an XML file that allows specified users to access external languages from CAS. Any referenced users must be in the CASHostAccountRequired group. Earlier, you initialized the SAS_EXTLANG_SETTINGS environment variable with
/opt/sas/viya/home/sas-pyconfig/extlang.xml
so that is the file we need to create. We are using the sas-pyconfig PVC for this file since it is a location that is accessible to CAS.
- Get the path for the sas-pyconfig PVC.
volume=$(kubectl describe pvc sas-pyconfig | grep Volume: | awk '{print $NF}') pvPath=$(kubectl describe pv ${volume} | grep Path: | awk '{print $NF}') echo pvPath is ${pvPath}
- Create the extlang.xml file on the sas-pyconfig PVC. The permissions in this file allow only geladm, Henrik, and Delilah to access Python and R from CAS.
sudo -u sas tee ${pvPath}/extlang.xml > /dev/null << EOF <EXTLANG version="1.0" mode="ALLOW" allowAllUsers="BLOCK"> <DEFAULT scratchDisk="/tmp" diskAllowlist="/opt/sas/viya/home/sas-pyconfig" userSetScratchDisk="BLOCK"> <LANGUAGE name="PYTHON3" interpreter="/opt/sas/viya/home/sas-pyconfig/default_py/bin/python3" userSetEnv="BLOCK" userSetInterpreter="BLOCK"> </LANGUAGE> <LANGUAGE name="R" interpreter="/opt/sas/viya/home/sas-pyconfig/default_r/bin/Rscript" userSetEnv="BLOCK" userSetInterpreter="BLOCK"> </LANGUAGE> </DEFAULT> <GROUP name="geladm"> <LANGUAGE name="PYTHON3" userInlineCode="ALLOW" userSetEnv="ALLOW" userSetInterpreter="ALLOW" /> <LANGUAGE name="R" userInlineCode="ALLOW" userSetEnv="ALLOW" userSetInterpreter="ALLOW" /> </GROUP> <GROUP name="analysts" users="Henrik,Delilah"> <LANGUAGE name="PYTHON3" userInlineCode="ALLOW"/> <LANGUAGE name="R" userInlineCode="ALLOW"/> </GROUP> </EXTLANG> EOF
- You should see
extlang.xml
in the listing of the sas-pyconfig volume.
ls -alF ${pvPath}
lrwxrwxrwx 1 sas sas 56 Apr 25 18:37 default_py -> /opt/sas/viya/home/sas-pyconfig/Python-3.9.16.1714081401 lrwxrwxrwx 1 sas sas 50 Apr 25 18:16 default_r -> /opt/sas/viya/home/sas-pyconfig/R-4.2.3.1714081401 -rw-r--r-- 1 sas sas 1198 Apr 26 12:28 extlang.xml -rw-r--r-- 1 sas sas 1154 Apr 25 18:37 md5sum drwxr-xr-x 8 sas sas 83 Apr 25 18:26 Python-3.9.16.1714081401 drwxr-xr-x 5 sas sas 43 Apr 25 17:49 R-4.2.3.1714081401
Review kustomization changes
Run the following command to view the cumulative changes you have made to kustomization.yaml. Your changes are in green in the right column.
icdiff -W /tmp/gelcorp/kustomization_03-036.yaml ${deploy}/kustomization.yaml
Apply changes
With the configuration complete, rebuild the SAS deployment to apply your changes to the cluster.
Apply your changes to the deployment using the
sas-orchestration deploy
command.
cd ~/project/deploy rm -rf /tmp/${current_namespace}/deploy_work/* source ~/project/deploy/.${current_namespace}_vars docker run --rm \ -v ${PWD}/license:/license \ -v ${PWD}/${current_namespace}:/${current_namespace} \ -v ${HOME}/.kube/config_portable:/kube/config \ -v /tmp/${current_namespace}/deploy_work:/work \ -e KUBECONFIG=/kube/config \ --user $(id -u):$(id -g) \ \ sas-orchestration \ deploy --namespace ${current_namespace} \ --deployment-data /license/SASViyaV4_${_order}_certs.zip \ --license /license/SASViyaV4_${_order}_license.jwt \ --user-content /${current_namespace} \ --cadence-name ${_cadenceName} \ --cadence-version ${_cadenceVersion} \ --image-registry ${_viyaMirrorReg}
When the deploy command completes successfully, the final message should say The deploy command completed successfully as shown in the log snippet below.
The deploy command started Generating deployment artifacts Generating deployment artifacts complete Generating kustomizations Generating kustomizations complete Generating manifests Applying manifests > start_leading gelcorp [...more...] > kubectl delete --namespace gelcorp --wait --timeout 7200s --ignore-not-found configmap sas-deploy-lifecycle-operation-variables configmap "sas-deploy-lifecycle-operation-variables" deleted > stop_leading gelcorp Applying manifests complete The deploy command completed successfully
If the sas-orchestration deploy command fails, review the steps in 99_Additional_Topics/03_Troubleshoot_SAS_Orchestration_Deploy to help you troubleshoot any problems.
Validate Python integration
Let’s use a simple program in SAS Studio to verify that you can run Python code.
Get the SAS Studio URL.
gellow_urls | grep "SAS Studio"
Open SAS Studio and log in as
Henrik:lnxsas
. If the SAS Studio compute context does not initialize successfully, wait 2 minutes and then re-select the SAS Studio compute context, which will try to launch another compute server for you. You may need to repeat this a few more times if the servers are under load.
Paste this code into SAS Studio Code pane.
proc python; submit; import sys print(sys.version) print("hello world") endsubmit; run;
Verify in the log that Python initialized and notice the log message that cites the Python release.
80 proc python; 81 submit NOTE: Python initialized. Python 3.9.16 (main, Apr 25 2024, 22:21:24) [GCC 8.5.0 20210514 (Red Hat 8.5.0-20)] on linux Type "help", "copyright", "credits" or "license" for more information. >>> >>> 81 ! ; 82 import sys 83 print(sys.version) 84 print("hello world") 85 endsubmit; 86 run; >>> 3.9.16 (main, Apr 25 2024, 22:21:24) [GCC 8.5.0 20210514 (Red Hat 8.5.0-20)] hello world >>> NOTE: PROCEDURE PYTHON used (Total process time): real time 1.15 seconds cpu time 0.05 seconds
Sign out of SAS Studio as you have completed the exercise.
SAS Viya Administration Operations
Lesson 06, Section 1 Exercise: Configure a Queue
Queue Management
In this section, you will inspect the defined queues, configure a new queue and a new context, and then submit workloads to utilize them.
Table of contents
- Queue Management
- Table of contents
- Explore and define queues
- Submit and interact with jobs
- Associate contexts with queues
Explore and define queues
View the defined queues in SAS Environment Manager.
Authenticate to the CLI as a SAS Administrator and view queues with the
workload-orchestrator
plugin./opt/pyviyatools/loginviauthinfo.py sas-viya workload-orchestrator queues list
Expected output:
{ "items": [ { "configInfo": { "activeOverride": "", "isDefaultQueue": true, "maxJobs": -1, "maxJobsPerHost": -1, "maxJobsPerUser": -1, "priority": 10, "scalingMinJobs": -1, "scalingMinSecs": -1, "willRestartJobs": false }, "name": "default", "processingInfo": { "jobsPending": 0, "jobsRunning": 0, "jobsSuspended": 0, "state": "OPEN-ACTIVE" }, "tenant": "uaa", "version": 1 } ] }
Log on to Environment Manager as
geladm:lnxsas
and opt in to Assumable Groups.
Open the Workload Orchestrator page, switch to the Configuration tab, and open the Queues panel.
Click the New queue button to define a new queue with the following settings:
- Name:
adhoc
- Priority:
5
- Maximum jobs per user:
2
- Users:
Finance
(Hint: Click the ‘identities’ icon next to the Users field and in the dialog, add theFinance
group to the Selected Identities panel.) - Administrators:
Delilah
- Limits:
- maxMemory:
0.001
- maxClockTime:
70
Click the Save icon.
Submit and interact with jobs
Run the following to authenticate to the CLI as user Delilah:
# Create authinfo file for Delilah tee ~/.authinfo_Delilah > /dev/null << EOF default user Delilah password lnxsas EOF chmod 600 ~/.authinfo_Delilah # log in to the CLI as Delilah /opt/pyviyatools/loginviauthinfo.py -f ~/.authinfo_Delilah
Submit a job as Delilah:
sas-viya batch jobs submit-pgm --pgm /mnt/workshop_files/workshop_content/Utils/swo_work/doWork1mins.sas -c default --queue adhoc
What does the message tell you about the result of issuing the above command?
View the answer
While Delilah is a queue administrator, she does not have permission to submit jobs to the
adhoc
queue.
Inactivate the adhoc queue:
sas-viya workload-orchestrator queues open-inactivate --queue adhoc
Expected output:
The queue "adhoc" is set successfully to "OPEN-INACTIVE".
Now try submitting a job as geladm, a SAS administrator, to the inactivated
adhoc
queue.
/opt/pyviyatools/loginviauthinfo.py sas-viya batch jobs submit-pgm --pgm /mnt/workshop_files/workshop_content/Utils/swo_work/doWork1mins.sas -c default --queue adhoc
Return to SAS Environment Manager and view the Workload Orchestrator > Jobs page.
What is the status of the job you submitted?
View the answer
It is in a PENDING state, because inactivated queues can accept jobs, but will not process them until the queue is reactivated.
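If you prefer the command line, you can confirm the job's state with the same workload-orchestrator plugin call that is used again later in this exercise:
sas-viya --output text workload-orchestrator jobs list --queue adhoc --state ALL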
Go to the Queues tab and Activate the
adhoc
queue. Note that geladm has the privilege to do so as a SAS Administrator.
Go to the Jobs tab and check that the job starts and runs.
Return to MobaXterm and try running another job with the CLI, this time one that takes longer to run.
sas-viya batch jobs submit-pgm --pgm /mnt/workshop_files/workshop_content/Utils/swo_work/doWork2mins.sas -c default --queue adhoc
Run the following to view the status of the job as it executes.
watch sas-viya --output text workload-orchestrator jobs list --queue adhoc --state ALL
Wait for the job to finish execution. What happens to the doWork2mins job? Why?
View the answer
The job gets terminated after approximately 70 seconds due to the ‘maxClockTime’ limit you specified for the adhoc queue.
Press Ctrl + C to return to the terminal prompt.
Associate contexts with queues
Go to SAS Environment Manager’s Contexts area from the navigation menu. From the drop-down, select
Batch contexts
. Select the
default
context and then click the pencil icon to edit the context.
On the Advanced tab, specify
default
for the SAS Workload Orchestrator queue field.
Click Save.
Once again try submitting another batch job to the
adhoc
queue with thedefault
context.
sas-viya batch jobs submit-pgm --pgm /mnt/workshop_files/workshop_content/Utils/swo_work/doWork10mins.sas -c default --queue adhoc
Use SAS Environment Manager to see which queue the job is submitted to. When you find the answer, click the Cancel icon to terminate the job.
SAS Viya Administration Operations
Lesson 07, Section 0 Exercise: Default CAS Server Review
Review the Default CAS Server
In this exercise you will examine the default CAS Server and its Kubernetes components.
Table of contents
- Set the namespace
- List all CAS-related pods
- List all
CASDeployment
- Look at the CAS pods and containers
- List the volumes available to CAS Server pods
- Lessons learned
Set the namespace
gel_setCurrentNamespace gelcorp
List all CAS-related pods
This lists all the pods that contain “sas-cas” in their name, a single command that returns both the initial CAS pods and the CAS Server pods.
kubectl get pods \
-o wide \
| { head -1; grep "sas-cas"; }
Note: the braces (“{}”) group multiple commands on the receiving side of the pipe (“|”), and the “head -1” command preserves the header line of the kubectl command output.
List all CASDeployment
Each CASDeployment represents a single CAS Server in the Viya deployment.
This command lists the CASDeployments (CAS Server instances) that exist in your Viya deployment.
kubectl get casdeployments
NAME AGE
default 56m
Look at the CAS pods and containers
Pods are the smallest deployable units of computing that you can create and manage in Kubernetes. A pod can contain a single container or a group of containers.
List the CAS Server pods.
The command below will list all CAS Server pods. The CAS operator pod manages all
CASDeployments
.kubectl get pods \ --selector="app.kubernetes.io/managed-by==sas-cas-operator" \ -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES sas-cas-server-default-controller 3/3 Running 0 60m 10.42.2.221 intnode01 <none> <none>
Currently only a single default CAS Server exists. The default CAS Server is an SMP server so you do not see workers or backup controller pods in the listing. You will see those later in the workshop.
In other CAS configurations you may see other pods listed such as
- sas-cas-server-default-controller: a CAS Server controller (SMP & MPP).
- sas-cas-server-default-backup: a CAS Server backup controller (MPP only).
- sas-cas-server-default-worker-[0..N]: a CAS Server worker (MPP only).
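For an MPP configuration, you can list just the worker pods by filtering on node type. This is a sketch; it assumes the casoperator.sas.com/node-type label key, which corresponds to the node-type metadata you extract in the next step (it returns nothing for the SMP default server).
kubectl get pods \
  --selector="casoperator.sas.com/server==default,casoperator.sas.com/node-type==worker" \
  -o wide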
Look at the details of the CAS Server pod.
This command lists details about the CAS controller which provides information about the type of CAS Server you have.
kubectl describe pods \ \ sas-cas-server-default-controller | grep " casoperator." \ | awk -F"/" '{print $2}'
Click here to see the output
cas-cfg-mode=smp cas-env-consul-name=cas-shared-default controller-active=1 controller-index=0 instance-index=0 node-type=controller server=default service-name=primary
Possible values are:
- cas-cfg-mode: smp or mpp
- cas-env-consul-name and server are metadata information about the CAS Server
- node-type: controller or worker
- if controller:
- controller-active: 1 or 0 (0=inactive; 1=active)
- controller-index: 0 or 1 (0=primaryController; 1=secondaryController)
- if worker:
- worker-index: 0..N (the worker number)
- instance-index: 0..N (exists only when state transfer is enabled)
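You can also surface some of these values as extra columns with kubectl's -L option. This is a sketch; it assumes the keys shown above are pod labels under the casoperator.sas.com/ prefix.
kubectl get pods \
  --selector="app.kubernetes.io/managed-by==sas-cas-operator" \
  -L casoperator.sas.com/node-type \
  -L casoperator.sas.com/controller-active \
  -L casoperator.sas.com/instance-index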
List the containers in your CAS Server pod
The
kubectl top pods
command is normally used to get information about a pod's resource consumption. But when used with the
parameter, it is also a very easy way to list all of a pod's containers.
kubectl top pods \ \ sas-cas-server-default-controller --containers
POD NAME CPU(cores) MEMORY(bytes) sas-cas-server-default-controller sas-cas-server 118m 70Mi sas-cas-server-default-controller sas-backup-agent 1m 18Mi sas-cas-server-default-controller sas-consul-agent 26m 21Mi
The
NAME
field contains the names of all CAS Server pod containers.
Note that the
sas-cas-server
container was namedcas
before 2022.09
List the volumes available to CAS Server pods
A Kubernetes volume is essentially a storage area accessible to all containers running in a pod. In contrast to the container-local filesystem, the data in volumes is preserved across container restarts. Kubernetes supports many types of volumes. A pod can use any number of volume types simultaneously.
Note that the
state transfer
is enabled by default in this SAS Viya deployment for the cas-shared-default server: a GEL team deployment choice. This has an impact on the number of volumes that are mounted to the CAS server (more details below).
List all current default CAS Server volumes.
This command lists all Kubernetes volumes that are created for a CAS Server. You can see that a CAS Server uses volumes of many different types.
kubectl get pods \ \ sas-cas-server-default-controller -o=json \ | jq '[.spec.volumes[] | if has("configMap") then "Name: "+.name, "Type: configMap", "" elif has("emptyDir") then "Name: "+.name, "Type: emptyDir", "" elif has("hostPath") then "Name: "+.name, "Type: hostPath", "Path: "+.hostPath.path, "" elif has("nfs") then "Name: "+.name, "Type: nfs", "Path: "+.nfs.path, "Server: "+.nfs.server, "" elif has("persistentVolumeClaim") then "Name: "+.name, "Type: persistentVolumeClaim", "Claim Name: "+.persistentVolumeClaim.claimName, "" elif has("secret") then "Name: "+.name, "Type: secret", "" else empty end]' \ | tr -d '",[]'
Click here to see the output
Name: cas-default-permstore-volume Type: persistentVolumeClaim Claim Name: cas-default-permstore Name: cas-default-data-volume Type: persistentVolumeClaim Claim Name: cas-default-data Name: cas-default-cache-volume Type: emptyDir Name: cas-default-config-volume Type: emptyDir Name: cas-tmp-volume Type: emptyDir Name: cas-license-volume Type: secret Name: commonfilesvols Type: persistentVolumeClaim Claim Name: sas-commonfiles Name: backup Type: persistentVolumeClaim Claim Name: sas-cas-backup-data Name: tmp Type: emptyDir Name: consul-tmp-volume Type: emptyDir Name: certframe-token Type: secret Name: security Type: emptyDir Name: customer-provided-ca-certificates Type: configMap Name: sas-viya-gelcontent-pvc-volume Type: persistentVolumeClaim Claim Name: gelcontent-data Name: sudo-ts-tmp Type: emptyDir Name: sas-quality-knowledge-base-volume Type: persistentVolumeClaim Claim Name: sas-quality-knowledge-base Name: sas-rdutil-dir Type: configMap Name: cas-default-transfer-volume Type: persistentVolumeClaim Claim Name: sas-cas-transfer-data Name: astores-volume Type: persistentVolumeClaim Claim Name: sas-microanalytic-score-astores Name: sas-viya-gelcorp-volume Type: nfs Path: /shared/gelcontent Server: pdcesx03145.race.sas.com Name: cas-workers Type: secret
The different types of volumes you see are:
- Persistent volumes: used to store data that needs to be persisted when the pods restart.
- persistentVolumeClaim: used to mount Persistent Volumes into CAS Server pods.
- nfs: an NFS volume mounted directly to the CAS Server pod (Server = NFS server, Path = NFS path). Automatically remounted each time the pod restarts.
- Ephemeral volumes: are recreated each time the pod restarts.
- configMap: each data item in the ConfigMap is represented by an individual file in the volume.
- emptyDir: created when a pod is first assigned to a Kubernetes node and exists as long as that pod is running on that node.
- secret: used to pass sensitive information, such as passwords, to pods.
Note that emptyDir, configMap, and secret are local ephemeral storage managed by Kubernetes on each cluster node.
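If you want a quick tally of how many volumes of each type the pod uses, you can reuse the jq approach from the listing above (a sketch, assuming jq is available as in the previous command):
kubectl get pods sas-cas-server-default-controller -o=json \
  | jq '[.spec.volumes[] | keys[] | select(. != "name")] | group_by(.) | map({type: .[0], count: length})'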
The
commonfilesvols
persistentVolumeClaim exists to store all CAS Server binaries and files that are required for CAS servers to run. This volume is shared by all CAS server pods in a Viya deployment to help reduce the size of the cas container in each CAS Server pod.
The
sas-quality-knowledge-base-volume
persistentVolumeClaim exists because the SAS Data Quality product is licensed.
The
sas-viya-gelcontent-pvc-volume
persistentVolumeClaim exists because of the 03_021_Mount_NFS_to_Viya hands-on.
The
sas-viya-gelcorp-volume
nfs volume exists because of the 02_021_Kustomize hands-on.
These persistent volumes are the key volumes for the CAS server (their names contain
cas-
):cas-default-permstore
: persists the metadata for CAS including caslib definitions, permissions, etc.cas-default-data-volume
: stores data that is saved and possibly reloaded into the CAS Server.sas-cas-backup-data
: stores the CAS Server backups.cas-default-transfer-volume
used for the state transfer (exists only when state transfer is enabled for a CAS server).
List the CAS Server
cas
container mounted volumes.This command lists all volumes mounted to the
cas
container of the default CAS server.kubectl get pods \ \ sas-cas-server-default-controller -o=json \ | jq '[.spec.containers[0].volumeMounts[] | "Name: "+.name, "Mount path:"+.mountPath, ""]' \ | tr -d '",[]'
Note that the
containers[0]
is thecas
container.0
is always the index of thecas
container into thecontainers[]
array.Click here to see the output
Name: cas-default-permstore-volume Mount path:/cas/permstore Name: cas-default-data-volume Mount path:/cas/data Name: cas-default-cache-volume Mount path:/cas/cache Name: cas-default-config-volume Mount path:/cas/config Name: cas-tmp-volume Mount path:/tmp Name: cas-license-volume Mount path:/cas/license Name: commonfilesvols Mount path:/opt/sas/viya/home/commonfiles Name: podinfo Mount path:/etc/podinfo Name: backup Mount path:/sasviyabackup Name: security Mount path:/security Name: security Mount path:/opt/sas/viya/config/etc/SASSecurityCertificateFramework/cacerts Name: security Mount path:/opt/sas/viya/config/etc/SASSecurityCertificateFramework/private Name: sas-viya-gelcontent-pvc-volume Mount path:/mnt/gelcontent Name: sudo-ts-tmp Mount path:/run/sudo Name: sas-rdutil-dir Mount path:/rdutil Name: sas-quality-knowledge-base-volume Mount path:/opt/sas/viya/home/share/refdata/qkb Name: cas-default-transfer-volume Mount path:/cas/transferdir Name: astores-volume Mount path:/models/resources/viya Name: sas-viya-gelcorp-volume Mount path:/gelcontent Name: cas-workers Mount path:/var/casdata Name: kube-api-access-l5r68 Mount path:/var/run/secrets/kubernetes.io/serviceaccount
The
Mount path
is the cas container local path where the volume is attach.A single volume can be attached to multiple mount path (e.g.,
security
)List the CAS Server specific defined Persistent Volumes (pv).
The Persistent Volumes are created and managed at the Kubernetes cluster level. They are a Kubernetes cluster resource, not a namespace resource. Because of that, when the
persistentVolumes
resources is queried by using the kubectl CLI, the--namespace
argument is ignored.kubectl get persistentVolumes \ -o wide \ | { head -1; grep "cas-"; }
Click here to see the output
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE VOLUMEMODE pvc-15d7fbbb-c15a-4685-a21e-e61adadc7056 8Gi RWX Delete Bound gelcorp/cas-default-data nfs-client 4d5h Filesystem pvc-1fd1be30-bf19-43c7-87ae-cd4f3af30028 100Mi RWX Delete Bound gelcorp/cas-default-permstore nfs-client 4d5h Filesystem pvc-ddf16678-4759-46df-8e85-2c03b00b5f52 8Gi RWX Delete Bound gelcorp/sas-cas-backup-data nfs-client 4d5h Filesystem pvc-fa215d9b-14e6-4701-a510-2577736cb566 8Gi RWX Delete Bound gelcorp/sas-cas-transfer-data nfs-client 4d5h Filesystem
You can note from this output that four volumes are created for the cas-shared-default server (three, plus the sas-cas-transfer-data volume that exists because state transfer is enabled). In this Viya deployment they are all defined with the NFS storage class.
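A sketch of how the same persistentVolume-to-claim pairing can be pulled directly from the PV objects, assuming the standard .spec.claimRef field:
kubectl get persistentVolumes \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.claimRef.namespace}{"/"}{.spec.claimRef.name}{"\n"}{end}' \
  | grep "cas-"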
All of these Viya
persistentVolumes
names are prefixed bypvc-
.The
CLAIM
field contains interesting information:<NAMESPACE>/<persistentVolumeClaim NAME>
.List the CAS server specific Persistent Volumes Claims (pvc).
The Persistent Volume Claims are created and managed at the namespace level. They are a namespace resource. Because of that, when the
persistentVolumeClaims
resources is queried by using the kubectl CLI, the--namespace
argument is important.kubectl get persistentVolumeClaims \ -o wide \ | { head -1; grep "cas-"; }
Click here to see the output
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE VOLUMEMODE cas-default-data Bound pvc-15d7fbbb-c15a-4685-a21e-e61adadc7056 8Gi RWX nfs-client 4d5h Filesystem cas-default-permstore Bound pvc-1fd1be30-bf19-43c7-87ae-cd4f3af30028 100Mi RWX nfs-client 4d5h Filesystem sas-cas-backup-data Bound pvc-ddf16678-4759-46df-8e85-2c03b00b5f52 8Gi RWX nfs-client 4d5h Filesystem sas-cas-transfer-data Bound pvc-fa215d9b-14e6-4701-a510-2577736cb566 8Gi RWX nfs-client 4d5h Filesystem
The
persistentVolumeClaims
create a link between apersistentVolume
(Kubernetes cluster resources) and a volume defined for a pod in a specific namespace.Regarding the
persistentVolumeClaims
, if you compare this output with the previous command output you can note that- The
NAME
is part of apersistentVolumes
CLAIM
field - The
VOLUME
corresponds to apersistentVolumes
NAME
field CAPACITY
,ACCESS MODES
,STORAGECLASS
,AGE
, andVOLUMEMODE
are exactly the same.
- The
Lessons learned
The default CAS Server is an SMP server.
The CAS server pod has three containers
- sas-cas-server
- sas-backup-agent
- sas-consul-agent
Three (plus one) specific volumes are linked via persistentVolumeClaims to container-local filesystem mount paths, as summarized in the table below.
“Three plus one” because the
cas-default-transfer-volume
volume exists only if thestate transfer
is enabled for the CAS server.cas container
Kubernetes
Mount path
Volume
Claim
Pesistent volume
/cas/permstore
cas-default-permstore-volume
cas-default-permstore
pvc-15d7fbbb-c15a-4685-a21e-e61adadc7056
/cas/data
cas-default-data-volume
cas-default-data
pvc-1fd1be30-bf19-43c7-87ae-cd4f3af30028
/sasviyabackup
backup
sas-cas-backup-data
pvc-ddf16678-4759-46df-8e85-2c03b00b5f52
/cas/transferdir
cas-default-transfer-volume
sas-cas-transfer-data
pvc-fa215d9b-14e6-4701-a510-2577736cb566
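A sketch of one way to regenerate the claim-to-volume part of this mapping straight from the PVC objects (uses the standard .spec.volumeName field):
kubectl get persistentVolumeClaims \
  -o custom-columns='CLAIM:.metadata.name,VOLUME:.spec.volumeName' \
  | { head -1; grep "cas-"; }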
SAS Viya Administration Operations
Lesson 07, Section 1 Exercise: Add a New CAS Server
Add a new CAS server
In this exercise you will add a new CAS server to your Viya deployment.
Table of contents
Set the namespace
gel_setCurrentNamespace gelcorp
Create the new CAS server
The
create-cas-server.sh
script generates all of the manifests you need to create, deploy, and configure a new CAS server. Look at the options of the scriptcreate-cas-server.sh
to get an idea of what you can do with it.bash ~/project/deploy/${current_namespace}/sas-bases/examples/cas/create/create-cas-server.sh \ --help
Flags: -h --help help -i, --instance CAS server instance name -o, --output Output location. If undefined, default to working directory. -v, --version CAS server creation utility version -w, --workers Specify the number of CAS worker nodes. Default is 0 (SMP). -b, --backup Set this to include a CAS backup controller. Disabled by default. -t, --tenant Set the tenant name. default is shared. -r, --transfer Set this to enable support for state transfer between restarts. Disabled by default. -a, --affinity Specify the node affinity and toleration to use for this deployment. Default is 'cas'. -q, --required-affinity Set this flag to have the node affinity be a required node affinity. Default is preferred node affinity.
Important notes:
- “-a, –affinity” and “-q, –required-affinity” let the SAS Viya administrator decide on which Kubernetes node pool the CAS server pods should start, and whether that placement is mandatory.
- The “-r, –transfer” option enables or disables state transfer between CAS server restarts, which keeps loaded data and CAS sessions persistent across those restarts. In this workshop we activate this option by default; you will see its impact on CAS servers later.
Use
create-cas-server.sh
to create a new distributed gelcorp CAS server with a backup controller and two workers. We want the manifests for the new CAS server to be placed in the~/project/deploy/gelcorp/site-config
directory.bash ~/project/deploy/${current_namespace}/sas-bases/examples/cas/create/create-cas-server.sh \ --instance gelcorp \ --output ~/project/deploy/${current_namespace}/site-config \ --workers 2 \ --backup 1 \ --transfer 1
Note that we created the gelcorp CAS server using the
--transfer
option to enable the CAS server state transfer.
The name of the new CAS server will be cas-shared-gelcorp.
Fri May 13 12:08:48 EDT 2022 - instance = gelcorp Fri May 13 12:08:48 EDT 2022 - tenant = Fri May 13 12:08:48 EDT 2022 - output = /home/cloud-user/project/deploy/gelcorp/site-config make: *** No rule to make target `install'. Stop. output directory does not exist: /home/cloud-user/project/deploy/gelcorp/site-config/ creating directory: /home/cloud-user/project/deploy/gelcorp/site-config/ Generating artifacts... 100.0% [=======================================================================] |-cas-shared-gelcorp (root directory) |-cas-shared-gelcorp-cr.yaml |-kustomization.yaml |-shared-gelcorp-pvc.yaml |-annotations.yaml |-backup-agent-patch.yaml |-cas-consul-sidecar.yaml |-cas-fsgroup-security-context.yaml |-cas-sssd-sidecar.yaml |-kustomizeconfig.yaml |-provider-pvc.yaml |-transfer-pvc.yaml |-enable-binary-port.yaml |-enable-http-port.yaml |-configmaps.yaml |-state-transfer.yaml |-node-affinity.yaml |-require-affinity.yaml create-cas-server.sh complete!
As shown in the command output, all of the gelcorp CAS server manifests are written to the
/home/cloud-user/project/deploy/gelcorp/site-config/cas-shared-gelcorp
directory.
Click here if you want to list the gelcorp CAS server manifests
ls -al ~/project/deploy/${current_namespace}/site-config/cas-shared-gelcorp
You should see…
total 80 drwxrwxr-x 2 cloud-user cloud-user 4096 May 12 08:20 . drwxr-xr-x 7 cloud-user cloud-user 4096 May 12 08:20 .. -rw-rw-r-- 1 cloud-user cloud-user 203 May 12 08:20 annotations.yaml -rw-rw-r-- 1 cloud-user cloud-user 3761 May 12 08:20 backup-agent-patch.yaml -rw-rw-r-- 1 cloud-user cloud-user 2856 May 12 08:20 cas-consul-sidecar.yaml -rw-rw-r-- 1 cloud-user cloud-user 359 May 12 08:20 cas-fsgroup-security-context.yaml -rw-rw-r-- 1 cloud-user cloud-user 5814 May 12 08:20 cas-shared-gelcorp-cr.yaml -rw-rw-r-- 1 cloud-user cloud-user 2282 May 12 08:20 cas-sssd-sidecar.yaml -rw-rw-r-- 1 cloud-user cloud-user 259 May 12 08:20 configmaps.yaml -rw-rw-r-- 1 cloud-user cloud-user 304 May 12 08:20 enable-binary-port.yaml -rw-rw-r-- 1 cloud-user cloud-user 298 May 12 08:20 enable-http-port.yaml -rw-rw-r-- 1 cloud-user cloud-user 340 May 12 08:20 kustomization.yaml -rw-rw-r-- 1 cloud-user cloud-user 1267 May 12 08:20 kustomizeconfig.yaml -rw-rw-r-- 1 cloud-user cloud-user 1353 May 12 08:20 node-affinity.yaml -rw-rw-r-- 1 cloud-user cloud-user 291 May 12 08:20 provider-pvc.yaml -rw-rw-r-- 1 cloud-user cloud-user 486 May 12 08:20 require-affinity.yaml -rw-rw-r-- 1 cloud-user cloud-user 652 May 12 08:20 shared-gelcorp-pvc.yaml -rw-rw-r-- 1 cloud-user cloud-user 433 May 12 08:20 state-transfer.yaml -rw-rw-r-- 1 cloud-user cloud-user 396 May 12 08:20 transfer-pvc.yaml
The next step is to modify
~/project/deploy/gelcorp/kustomization.yaml
to include a reference to the cas-shared-gelcorp manifests.
Back up the current
kustomization.yaml
file.cp -p ~/project/deploy/${current_namespace}/kustomization.yaml /tmp/${current_namespace}/kustomization_07-021-01.yaml
Use this
yq
command to add a reference to thesite-config/cas-shared-gelcorp
manifests in theresources
field of the Viya deploymentkustomization.yaml
file.[[ $(grep -c "site-config/cas-shared-gelcorp" ~/project/deploy/${current_namespace}/kustomization.yaml) == 0 ]] && \ yq4 eval -i '.resources += ["site-config/cas-shared-gelcorp"]' ~/project/deploy/${current_namespace}/kustomization.yaml
Alternatively, you can update the
~/project/deploy/gelcorp/kustomization.yaml
file using your favorite text editor:[...] resources: [... previous transformers items ...] - site-config/cas-shared-gelcorp [...]
Verify that the update is in place.
cat ~/project/deploy/${current_namespace}/kustomization.yaml
Search and ensure that
site-config/cas-shared-gelcorp
exists in theresources
field of the Viya deploymentkustomization.yaml
file.Click here to see the output
--- namespace: gelcorp resources: - sas-bases/base # GEL Specifics to create CA secret for OpenSSL Issuer - site-config/security/gel-openssl-ca - sas-bases/overlays/network/networking.k8s.io # Using networking.k8s.io API since 2021.1.6 - site-config/security/openssl-generated-ingress-certificate.yaml # Default to OpenSSL Issuer in 2021.2.6 - sas-bases/overlays/cas-server - sas-bases/overlays/crunchydata/postgres-operator # New Stable 2022.10 - sas-bases/overlays/postgres/platform-postgres # New Stable 2022.10 - sas-bases/overlays/internal-elasticsearch # New Stable 2020.1.3 - sas-bases/overlays/update-checker # added update checker ## disable CAS autoresources to keep things simpler #- sas-bases/overlays/cas-server/auto-resources # CAS-related #- sas-bases/overlays/crunchydata_pgadmin # Deploy the sas-crunchy-data-pgadmin container - remove 2022.10 - site-config/sas-prepull/add-prepull-cr-crb.yaml - sas-bases/overlays/cas-server/state-transfer # Enable state transfer for the cas-shared-default CAS server - new PVC sas-cas-transfer-data - site-config/sas-microanalytic-score/astores/resources.yaml - site-config/gelcontent_pvc.yaml - site-config/cas-shared-gelcorp configurations: - sas-bases/overlays/required/kustomizeconfig.yaml transformers: - sas-bases/overlays/internal-elasticsearch/sysctl-transformer.yaml # New Stable 2020.1.3 - sas-bases/overlays/startup/ordered-startup-transformer.yaml - site-config/cas-enable-host.yaml - sas-bases/overlays/required/transformers.yaml - site-config/mirror.yaml #- site-config/daily_update_check.yaml # change the frequency of the update-check #- sas-bases/overlays/cas-server/auto-resources/remove-resources.yaml # CAS-related ## temporarily removed to alleviate RACE issues - sas-bases/overlays/internal-elasticsearch/internal-elasticsearch-transformer.yaml # New Stable 2020.1.3 - sas-bases/overlays/sas-programming-environment/enable-admin-script-access.yaml # To enable admin scripts #- sas-bases/overlays/scaling/zero-scale/phase-0-transformer.yaml #- sas-bases/overlays/scaling/zero-scale/phase-1-transformer.yaml - sas-bases/overlays/cas-server/state-transfer/support-state-transfer.yaml # Enable state transfer for the cas-shared-default CAS server - enable and mount new PVC - site-config/change-check-interval.yaml - sas-bases/overlays/sas-microanalytic-score/astores/astores-transformer.yaml - site-config/sas-pyconfig/change-configuration.yaml - site-config/sas-pyconfig/change-limits.yaml - site-config/cas-add-nfs-mount.yaml - site-config/cas-add-allowlist-paths.yaml - site-config/cas-modify-user.yaml components: - sas-bases/components/crunchydata/internal-platform-postgres # New Stable 2022.10 - sas-bases/components/security/core/base/full-stack-tls - sas-bases/components/security/network/networking.k8s.io/ingress/nginx.ingress.kubernetes.io/full-stack-tls patches: - path: site-config/storageclass.yaml target: kind: PersistentVolumeClaim annotationSelector: sas.com/component-name in (sas-backup-job,sas-data-quality-services,sas-commonfiles,sas-cas-operator,sas-pyconfig) - path: site-config/cas-gelcontent-mount-pvc.yaml target: group: viya.sas.com kind: CASDeployment name: .* version: v1alpha1 - path: site-config/compute-server-add-nfs-mount.yaml target: labelSelector: sas.com/template-intent=sas-launcher version: v1 kind: PodTemplate - path: site-config/compute-server-annotate-podtempate.yaml target: name: sas-compute-job-config version: v1 kind: PodTemplate secretGenerator: - name: sas-consul-config behavior: merge files: - 
SITEDEFAULT_CONF=site-config/sitedefault.yaml - name: sas-image-pull-secrets behavior: replace type: kubernetes.io/dockerconfigjson files: - .dockerconfigjson=site-config/crcache-image-pull-secrets.json configMapGenerator: - name: ingress-input behavior: merge literals: - INGRESS_HOST=gelcorp.pdcesx03145.race.sas.com - name: sas-shared-config behavior: merge literals: - SAS_SERVICES_URL=https://gelcorp.pdcesx03145.race.sas.com # # This is to fix an issue that only appears in very slow environments. # # Do not do this at a customer site - name: sas-go-config behavior: merge literals: - SAS_BOOTSTRAP_HTTP_CLIENT_TIMEOUT_REQUEST='15m' - name: input behavior: merge literals: - IMAGE_REGISTRY=crcache-race-sas-cary.unx.sas.com
Keep a copy of the current
manifest.yaml
file.cp -p /tmp/${current_namespace}/deploy_work/deploy/manifest.yaml /tmp/${current_namespace}/manifest_07-021-01.yaml
Run the
sas-orchestration
deploy command.cd ~/project/deploy rm -rf /tmp/${current_namespace}/deploy_work/* source ~/project/deploy/.${current_namespace}_vars docker run --rm \ -v ${PWD}/license:/license \ -v ${PWD}/${current_namespace}:/${current_namespace} \ -v ${HOME}/.kube/config_portable:/kube/config \ -v /tmp/${current_namespace}/deploy_work:/work \ -e KUBECONFIG=/kube/config \ --user $(id -u):$(id -g) \ \ sas-orchestration \ deploy --namespace ${current_namespace} \ --deployment-data /license/SASViyaV4_${_order}_certs.zip \ --license /license/SASViyaV4_${_order}_license.jwt \ --user-content /${current_namespace} \ --cadence-name ${_cadenceName} \ --cadence-version ${_cadenceVersion} \ --image-registry ${_viyaMirrorReg}
When the deploy command completes successfully, the final message should say The deploy command completed successfully, as shown in the log snippet below.
The deploy command started [...] The deploy command completed successfully
If the sas-orchestration deploy command fails, check out the steps in 99_Additional_Topics/03_Troubleshoot_SAS_Orchestration_Deploy to help you troubleshoot any problems.
Look at the existing
CASDeployment
custom resourceskubectl get casdeployment
You should now see the new CAS server you created.
NAME AGE default 3h8m shared-gelcorp 48s
It may take several more minutes for the gelcorp CAS server to fully initialize. The following command will notify you when the CAS server is ready.
kubectl wait pods \ --selector="casoperator.sas.com/server==shared-gelcorp" \ --for condition=ready \ --timeout 15m
You should see these messages in the output.
pod/sas-cas-server-shared-gelcorp-backup condition met pod/sas-cas-server-shared-gelcorp-controller condition met pod/sas-cas-server-shared-gelcorp-worker-0 condition met pod/sas-cas-server-shared-gelcorp-worker-1 condition met
While you are waiting for the CAS server to be ready, you can use OpenLens to monitor the CAS pods.
- Open OpenLens and connect to your GEL Kubernetes cluster.
- Navigate to Workloads –> Pods and then filter on
- namespace: gelcorp
- sas-cas-server
You can sort by
Age
ascending to place the newest pods at the top of the list.
When your CAS pods show a status of Running, you can display the status of all the gelcorp CAS server pods by running this command.
kubectl get pods \ --selector="casoperator.sas.com/server==shared-gelcorp" \ -o wide
You should see something like…
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES sas-cas-server-shared-gelcorp-backup 3/3 Running 0 10m26s 10.42.0.83 intnode02 <none> <none> sas-cas-server-shared-gelcorp-controller 3/3 Running 0 10m26s 10.42.4.168 intnode04 <none> <none> sas-cas-server-shared-gelcorp-worker-0 3/3 Running 0 10m21s 10.42.2.63 intnode03 <none> <none> sas-cas-server-shared-gelcorp-worker-1 3/3 Running 0 10m21s 10.42.3.115 intnode05 <none> <none>
The gelcorp CAS server has now started and is ready to be used.
Examine the cas-shared-gelcorp server
List the pod containers for the CAS controller.
kubectl top pods \ \ sas-cas-server-shared-gelcorp-controller --containers
POD NAME CPU(cores) MEMORY(bytes) sas-cas-server-shared-gelcorp-controller sas-cas-server 19m 63Mi sas-cas-server-shared-gelcorp-controller sas-backup-agent 1m 29Mi sas-cas-server-shared-gelcorp-controller sas-consul-agent 18m 24Mi
Does the list of containers differ from what you saw for the default CAS server?
List the CAS server Persistent Volumes (pv).
kubectl get persistentVolumes \ -o wide \ | { head -1; \ grep -E "${current_namespace}\/(sas-)?cas-" \ | grep "shared-gelcorp"; }
Click here to see the output
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE VOLUMEMODE pvc-62c75280-6e36-44aa-a60c-53de62f0271a 8Gi RWX Delete Bound gelcorp/cas-shared-gelcorp-data nfs-client 9m19s Filesystem pvc-a018e7ff-a008-494b-92ac-43d1d28d919e 8Gi RWX Delete Bound gelcorp/sas-cas-transfer-data-shared-gelcorp nfs-client 9m18s Filesystem pvc-b0a73e8d-10b3-43d8-a982-c1724e62a19c 100Mi RWX Delete Bound gelcorp/cas-shared-gelcorp-permstore nfs-client 9m19s Filesystem pvc-b757acf9-3504-4a1e-9641-03a0dd793359 4Gi RWX Delete Bound gelcorp/sas-cas-backup-data-shared-gelcorp nfs-client 9m19s Filesystem
Note that an additional persistent volume was created because we enabled state transfer for the gelcorp CAS server.
Does the list of persistent volumes look different?
List the CAS server Persistent Volumes Claims.
kubectl get persistentvolumeclaims \ -o wide \ | { head -1; \ grep "shared\-${current_namespace}"; }
Click here to see the output
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE VOLUMEMODE cas-shared-gelcorp-data Bound pvc-62c75280-6e36-44aa-a60c-53de62f0271a 8Gi RWX nfs-client 10m Filesystem cas-shared-gelcorp-permstore Bound pvc-b0a73e8d-10b3-43d8-a982-c1724e62a19c 100Mi RWX nfs-client 10m Filesystem sas-cas-backup-data-shared-gelcorp Bound pvc-b757acf9-3504-4a1e-9641-03a0dd793359 4Gi RWX nfs-client 10m Filesystem sas-cas-transfer-data-shared-gelcorp Bound pvc-a018e7ff-a008-494b-92ac-43d1d28d919e 8Gi RWX nfs-client 10m Filesystem
Note that an additional persistent volume claim was created because we enabled state transfer for the gelcorp CAS server.
Do you see any differences in the PVCs compared to the default CAS server?
Lessons learned
- It is easy to add a new CAS server to your deployment using the
create-cas-server.sh
script. - You can add either an SMP or MPP CAS server depending the parameters
you passed to the
create-cas-server.sh
script. - The
create-cas-server.sh
script creates all of the manifests needed to create, deploy, and configure a new CAS server. - You must add a reference in
kustomization.yaml
to the location of your new CAS server manifests to add the CAS server to your deployment.
SAS Viya Administration Operations
Lesson 07, Section 2 Exercise: Stop and Restart CAS
Start/Stop/Restart a CAS server
In this exercise you will learn how to stop, start, and restart a CAS server.
Table of contents
- Set the namespace
- Check the status of the cas-shared-gelcorp server pods
- Stop the cas-shared-gelcorp server
- Start the cas-shared-gelcorp server
- Restart the cas-shared-gelcorp server
- Lessons learned
Set the namespace
gel_setCurrentNamespace gelcorp
Check the status of the cas-shared-gelcorp server pods
Verify that the cas-shared-gelcorp server is running.
kubectl get pods \
--selector="casoperator.sas.com/server==shared-gelcorp" \
-o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
sas-cas-server-shared-gelcorp-backup 3/3 Running 0 17h 10.42.0.83 intnode02 <none> <none>
sas-cas-server-shared-gelcorp-controller 3/3 Running 0 17h 10.42.4.168 intnode04 <none> <none>
sas-cas-server-shared-gelcorp-worker-0 3/3 Running 0 17h 10.42.2.63 intnode03 <none> <none>
sas-cas-server-shared-gelcorp-worker-1 3/3 Running 0 17h 10.42.3.115 intnode05 <none> <none>
Open OpenLens and connect to your GEL Kubernetes cluster.
Navigate to Workloads –> Pods and then filter on
- namespace: gelcorp
- sas-cas-server-shared-gelcorp
Sort by
Age
ascending.
IMPORTANT: Leave OpenLens open and do not change the filtering as you will come back to this display during later steps.
Stop the cas-shared-gelcorp server
When you stop a CAS server, all CAS server pod instances are stopped and deleted and no new pod instances are automatically restarted by the operator, regardless of the replicas setting. The administrator will need to execute a start command to create new CAS server pod instances.
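If you want to check the current value of the shutdown flag before (or after) patching it, you can read it straight from the CASDeployment; an empty result simply means the field has not been set yet.
kubectl get casdeployment shared-gelcorp -o jsonpath='{.spec.shutdown}'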
Stop the cas-shared-gelcorp server by setting value of
/spec/shutdown
totrue
in theCASDeployment
.kubectl patch casdeployment \ \ shared-gelcorp --type=json -p='[{"op": "add", "path": "/spec/shutdown", "value":true}]'
casdeployment.viya.sas.com/shared-gelcorp patched
Quickly return to OpenLens and watch the impact of your command on the CAS pods.
All cas-shared-gelcorp pods should show a status of Terminating.
Once the pods terminate, no new cas-shared-gelcorp pods should appear because the CAS server has stopped.
After a few seconds, run this command to list the cas-shared-gelcorp pods.
kubectl get pods \ --selector="casoperator.sas.com/server==shared-gelcorp" \ -o wide
You should see…
No resources found in gelcorp namespace.
The cas-shared-gelcorp server has completely stopped.
Start the cas-shared-gelcorp server
When the CAS server start command is executed, new instances of the CAS server pods are started and the CAS server is available for use. The CAS server is configured as defined but no previously loaded tables will be available except those tables pre-loaded at startup during session zero processing.
Start the cas-shared-gelcorp server by setting the value of
/spec/shutdown
tofalse
in theCASDeployment
kubectl patch casdeployment \ \ shared-gelcorp --type=json -p='[{"op": "add", "path": "/spec/shutdown", "value":false}]'
casdeployment.viya.sas.com/shared-gelcorp patched
It may take a few minutes for the gelcorp CAS server to fully restart. The following command will notify you when the gelcorp CAS server controller is ready.
kubectl wait pods \ --selector="casoperator.sas.com/server==shared-gelcorp" \ --for condition=ready \ --timeout 15m
It will take CAS some time to start, so quickly switch back to OpenLens and monitor the impact of your command on the CAS pods.
You should see that cas-shared-gelcorp pods go from Pending…
…to Running with some containers still yellow…
… to Running with all containers green.
Return to your MobaXterm session and you should see these messages in the output informing you that the CAS server is ready.
pod/sas-cas-server-shared-gelcorp-backup condition met pod/sas-cas-server-shared-gelcorp-controller condition met pod/sas-cas-server-shared-gelcorp-worker-0 condition met pod/sas-cas-server-shared-gelcorp-worker-1 condition met
Now look once more at the status of the cas-shared-gelcorp pods.
kubectl get pods \ --selector="casoperator.sas.com/server==shared-gelcorp" \ -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES sas-cas-server-shared-gelcorp-backup 3/3 Running 0 4m2s 10.42.0.85 intnode02 <none> <none> sas-cas-server-shared-gelcorp-controller 3/3 Running 0 4m2s 10.42.4.200 intnode04 <none> <none> sas-cas-server-shared-gelcorp-worker-0 3/3 Running 0 3m58s 10.42.2.65 intnode03 <none> <none> sas-cas-server-shared-gelcorp-worker-1 3/3 Running 0 3m58s 10.42.3.117 intnode05 <none> <none>
The cas-shared-gelcorp server is started and all required pods are running.
Restart the cas-shared-gelcorp server
When the administrator restarts a CAS server, all instances of the CAS pods are stopped and then a new instance of each CAS server pod is immediately restarted. The CAS server is restarted as configured but no previously loaded tables are available except those tables pre-loaded at startup during session zero processing.
To restart the cas-shared-gelcorp server simply delete the CAS server pods. Kubernetes will notice that your deployment no longer has the pods you requested and will automatically restart new instances of the deleted pods.
kubectl delete pod \ --selector="casoperator.sas.com/server==shared-gelcorp"
You should see these messages in the output.
pod "sas-cas-server-shared-gelcorp-backup" deleted pod "sas-cas-server-shared-gelcorp-controller" deleted pod "sas-cas-server-shared-gelcorp-worker-0" deleted pod "sas-cas-server-shared-gelcorp-worker-1" deleted
Quickly switch back to OpenLens and watch what happens.
You should see all cas-shared-gelcorp pods Terminating.
Then you should see new pods automatically start to replace the deleted ones.
It may take a few minutes for the CAS server to fully restart.
The following command will notify you when the gelcorp CAS server controller is ready.
kubectl wait pods \ --selector="casoperator.sas.com/server==shared-gelcorp" \ --for condition=ready \ --timeout 15m
You should see these messages in the output.
pod/sas-cas-server-shared-gelcorp-backup condition met pod/sas-cas-server-shared-gelcorp-controller condition met pod/sas-cas-server-shared-gelcorp-worker-0 condition met pod/sas-cas-server-shared-gelcorp-worker-1 condition met
Check the status of the cas-shared-gelcorp pods.
kubectl get pods \ --selector="casoperator.sas.com/server==shared-gelcorp" \ -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES sas-cas-server-shared-gelcorp-backup 3/3 Running 0 2m28s 10.42.3.119 intnode05 <none> <none> sas-cas-server-shared-gelcorp-controller 3/3 Running 0 2m28s 10.42.2.67 intnode03 <none> <none> sas-cas-server-shared-gelcorp-worker-0 3/3 Running 0 2m40s 10.42.4.202 intnode04 <none> <none> sas-cas-server-shared-gelcorp-worker-1 3/3 Running 0 2m28s 10.42.0.87 intnode02 <none> <none>
The cas-shared-gelcorp server has been successfully restarted.
Lessons learned
- There is a difference between stopping a CAS server and restarting a CAS server.
- Stopping a CAS server deletes all CAS server pod instances. New CAS server pod instances will be created only when the CAS Server is started.
- Restarting a CAS server deletes all CAS server pod instances and new pods instances are automatically, and immediately, started in their place.
- Stopping or restarting a CAS server unloads all in-memory tables.
- Stopping a CAS server can be useful in a multi-CAS server configuration to free up resources or to quickly remove access to the data loaded in a particular CAS server.
SAS Viya Administration Operations
Lesson 07, Section 2 Exercise: Using State Transfer
Restart a CAS server using state transfer
In this exercise you will use the CAS server state transfer capability to restart the CAS server. This will preserve the sessions, tables, and state of a running CAS server.
Table of contents
- Set the namespace
- Current cas-shared-gelcorp CAS server pods
- Current CAS servers loaded tables
- Load tables in the cas-shared-gelcorp CAS server
- Execute the cas-shared-gelcorp CAS server state transfer and monitor it
- Look at the impact of the CAS server state transfer on data and CAS session
- Lessons learned
Set the namespace
gel_setCurrentNamespace gelcorp
Current cas-shared-gelcorp CAS server pods
Using OpenLens, you can see the current status of the cas-shared-gelcorp server. The cas-shared-gelcorp server is MPP with a backup controller and two workers.
- Navigate to Workloads –> Pods and then filter on
- namespace: gelcorp
- sas-cas-server-shared-gelcorp
The name of each cas-shared-gelcorp server pod follows the pattern sas-cas-server-<CASServerInstanceName>-<CASServerNodeType>.
If you double-click a cas-shared-gelcorp server pod, you can see detailed information about the pod.
Because the cas-shared-gelcorp server was created
with the state transfer option, a new label is defined for each CAS
server pod: casoperator.sas.com/instance-index
.
By default the instance-index
is set to 0
.
This value is not used with the first instance of the CAS server pods.
You will see later in this hands-on how it will be used.
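You can also display the instance-index value for each pod from the command line (a sketch using kubectl's label-column option):
kubectl get pods \
  --selector="casoperator.sas.com/server==shared-gelcorp" \
  -L casoperator.sas.com/instance-index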
Current CAS servers loaded tables
Open SAS Environment Manager, log in as
geladm
, and assume theSASAdministrators
membership.gellow_urls | grep "SAS Environment Manager"
Look at the loaded tables properties
All of these tables are loaded on the cas-shared-default server. No table is currently loaded into the cas-shared-gelcorp server.
DO NOT LOG OFF SAS ENVIRONMENT MANAGER, you will have to go back to SAS Environment Manager later in this hands-on to monitor
Available
(loaded) tables again.
Load tables in the cas-shared-gelcorp CAS server
Open SAS Studio, log in as
geladm
, and assume theSASAdministrators
membership.gellow_urls | grep "SAS Studio"
In the
Explorer
panel, navigate toFolder Shortcuts \ Shortcut to My Folder \ My SAS Code
, and then open theTestCASServerStateTransfer.sas
program.You can fold all region code section if you want, like in the screen below.
Execute the
Step1
code to start a CAS session and load tables in the cas-shared-gelcorp server.Select the first region code and submit it by clicking on the run button.
In the
Log
panel, search for the CAS session UUID and note it. You will have to compare this value with the CAS session UUID from the new instance of the cas-shared-gelcorp server later in this hands-on.Go back to your SAS environment Manager session. Then, in the
Data
panel, refresh theAvailable
tab content.Look at the loaded tables properties.
Two tables were loaded into the cas-shared-gelcorp server, as expected from the SAS program above.
Execute the cas-shared-gelcorp CAS server state transfer and monitor it
If you look again at the SAS code in SAS Studio, the
Step2
consists of waiting until the cas-shared-gelcorp server state transfer command is executed in MobaXterm.
Go back to your MobaXterm session to run the kubectl command below.
This command will patch the cas-shared-gelcorp server
CASDeployment
custom resource to initiate the CAS server state transfer process.kubectl patch casdeployment \ \ shared-gelcorp --type='json' \ -p='[{"op": "replace", "path": "/spec/startStateTransfer", "value":true}]'
casdeployment.viya.sas.com/shared-gelcorp patched
Switch back to OpenLens to monitor the impact of the state transfer process against the cas-shared-gelcorp server pods.
Navigate to Workloads –> Pods and then filter on
- namespace: gelcorp
- sas-cas-server-shared-gelcorp
State transfer step 1: a new instance of the cas-shared-gelcorp server pods is started in the SAS Viya deployment.
These pod instances are started for the same CAS server, based on the same
CASDeployment
custom resource. The cas-shared-gelcorp server instance name does not change (same as theCASDeployment
custom resource), but if you look at the cas-shared-gelcorp server pod names, they have changed slightly.
The name of each new instance of the cas-shared-gelcorp pods now follows the pattern sas-cas-server-<CASServerInstanceName>-<CASServerInstanceIndex>-<CASServerNodeType>.
This step can take a few minutes depending on Kubernetes cluster resource availability. Kubernetes has to find where the new instance of the CAS server pods can start based on resource availability and other rules such as workload node placement.
State transfer step 2: when the two instances of the cas-shared-gelcorp server pods are running, all loaded data and CAS sessions are transferred to the new instance of the CAS server.
This step can take a while since all loaded tables and existing CAS session information from the previous instance of the CAS server have to be saved as JSON files into the
cas-default-transfer-volume
volume. All of these JSON files are used to reload the data into the new instance of the CAS Server, and all CAS sessions have to be recreated as they were (see the sketch after the list below).
The data that is transferred is all data that was loaded in the CAS server:
- The global data
- The users’ sessions data
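While the transfer is running, you can peek at the JSON transfer files from the controller's sas-cas-server container. This is a sketch: the /cas/transferdir mount path and the container name come from the earlier volume listings, and the controller pod name shown here is the one from the original instance.
kubectl exec sas-cas-server-shared-gelcorp-controller -c sas-cas-server -- ls -l /cas/transferdir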
Be careful…
- Two instances of the CAS server pods have to run simultaneously for a few minutes.
- The data has to be loaded in both instances of the CAS server for a few minutes.
This requires a lot of extra resources in the Kubernetes cluster to support this CAS server capability.
SAS Viya administrators have to think carefully about the cost impacts before enabling CAS server state transfer.
The CAS server state transfer capability has to be discussed with the architect, the customer, and the deployment team before SAS Viya is deployed.
State transfer step 3: when the transfer of the data and sessions is finished, the cas-shared-gelcorp server is fully restarted (a new instance of the CAS server pods has started and the previous instance is terminated).
The new instance of the cas-shared-gelcorp server is ready to be used by the users.
You will see in next steps of this hands-on the impact for the users regarding the loaded tables and existing CAS sessions.
Look at the new instance of the cas-shared-gelcorp server pods in OpenLens.
You can see that the new instance of the cas-shared-gelcorp server is running. Because it is a new instance, a label was updated on each pod of the CAS server:
casoperator.sas.com/instance-index
.The
casoperator.sas.com/instance-index
label value is incremented by one each time the state transfer is initiated for the CAS server.
Look at the impact of the CAS server state transfer on data and CAS session
Go back to your SAS Studio session.
Execute the
Step3
of the SAS programSelect the
Step3
region code and submit it by clicking on the run button.In the
Log
panel, search for the CAS session UUID and compare it with the one that you noted earlier in this hands-on.
You can see that the CAS session UUID did not change. The CAS session was transferred to the new instance of the cas-shared-gelcorp server.
Go back to your SAS environment Manager session. Then, in the
Data
panel, refresh theAvailable
tab content.Look at the loaded tables properties.
Lessons learned
By enabling the CAS server state transfer capability, it is possible to preserve the sessions, tables, and state of a running CAS server for a new CAS server instance that is started as part of a CAS server update (applying new configuration, changing the topology, or updating the CAS server pods).
The CAS server state transfer capability requires roughly double the resources normally used by the CAS server, for a few minutes.
Using this restart process, users are less impacted and there is no downtime.
SAS Viya Administration Operations
Lesson 07, Section 3 Exercise: Configure CAS Startup
Modify the CAS Server Configuration Files
In this exercise we will review CAS server configuration and configure CAS startup for session zero processing.
Modifying the CAS Server configuration files requires a restart of your CAS servers, which terminates all active connections and sessions and loses any in-memory data. However, all CAS server configuration and the permstore are persisted.
Table of contents
- Set the namespace and authenticate
- Make the HR data and custom formats available to the cas-shared-gelcorp server
- Inspect CAS startup parameters
- Modify the CAS server session zero processing to load HR tables
- Lesson learned
Set the namespace and authenticate
gel_setCurrentNamespace gelcorp
/opt/pyviyatools/loginviauthinfo.py
Make the HR data and custom formats available to the cas-shared-gelcorp server
The goal of this hands-on is to configure what happens during cas-shared-gelcorp CAS Server session zero processing.
When the cas-shared-gelcorp server starts, we would like for certain HR tables to be loaded and for the custom formats the tables reference to be available to CAS.
Workaround required for the current version of Viya
Without this workaround, the user that owns the cas-shared-gelcorp CAS server process, regardless of its group membership settings, will not be able to access the required data because secondary group memberships are not defined until the end of CAS server startup, which is too late for session zero processing.
# Change the permissions on required HR directories and files
sudo chmod o+rx /shared/gelcontent/gelcorp/hr
sudo chmod o+rx /shared/gelcontent/gelcorp/hr/data
sudo chmod o+r /shared/gelcontent/gelcorp/hr/data/*.*
sudo chmod o+rx /shared/gelcontent/gelcorp/hr/formats
sudo chmod o+r /shared/gelcontent/gelcorp/hr/formats/*.*
This adds rx (read/execute) permissions on the required directories so that the cas user can access them, and r (read) permission on the files so that the cas user can read them.
These file accesses are required for you to be able to test the CAS session-zero script on the cas-shared-gelcorp server.
HOPE THIS BUG WILL BE FIXED SOON.
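If you want to double-check that the workaround took effect before moving on, a simple read-only verification (using only the paths from the commands above) is:
# Confirm the 'other' permission bits on the HR directories and data files
ls -ld /shared/gelcontent/gelcorp/hr /shared/gelcontent/gelcorp/hr/data /shared/gelcontent/gelcorp/hr/formats
ls -l /shared/gelcontent/gelcorp/hr/data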
Make the HR user defined formats available to the cas-shared-gelcorp server
List the user-defined formats currently available in the cas-shared-gelcorp server.
gel_sas_viya --output text \
    cas format-libraries --server cas-shared-gelcorp list
You should see…
There are no SAS format libraries in the server "cas-shared-gelcorp".
Make the HR user defined formats available to the cas-shared-gelcorp server.
gel_sas_viya cas format-libraries \
    --server cas-shared-gelcorp \
    create --format-library HRFORMATS \
    --search-order append \
    --source-path /gelcontent/gelcorp/hr/formats/formats.sas7bcat \
    --caslib "formats" \
    --su \
    --force
You should see…
The SAS format library "HRFORMATS" was successfully created and was appended to the end of the SAS format search order.
This command reads the HR user-defined formats SAS catalog (formats.sas7bcat) and loads it as a CAS server format library (fmtLibName) named HRFORMATS. The new CAS user format library is then stored inside the Formats global CAS library.
Validate using the sas-viya CLI.
gel_sas_viya --output text \
    cas format-libraries --server cas-shared-gelcorp list
You should see…
Format Library   Present in Format Search Path   Scope     Persisted   Caslib    Table
HRFORMATS        false                           global    true        FORMATS   HRFORMATS
HRFORMATS        true                            session   true        FORMATS   HRFORMATS
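As an optional extra check, you can list the tables of the Formats caslib with the same cas tables plugin used later in this hands-on. Assuming the caslib name Formats shown in the output above, the HRFORMATS table should appear in the list (its state may show as loaded or unloaded):
gel_sas_viya --output text \
    cas tables --server cas-shared-gelcorp \
    list --caslib Formats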
Let's look in the cas-shared-gelcorp server permstore to see what happened.
Since we enabled state transfer for the cas-shared-gelcorp server, the name of the CAS server pods changes each time a state transfer is initiated. The name of the pods contains the instance-index value until the CAS server pods are restarted (deleted and then recreated).
Because the name of the CAS server pods is not stable, we have to extract the required pod name using label filtering to be able to access a specific CAS server pod.
The command below returns the current name of the cas-shared-gelcorp server controller pod.
_CASControllerPodName=$(kubectl get pod \
    --selector "casoperator.sas.com/server==shared-gelcorp,casoperator.sas.com/node-type==controller,casoperator.sas.com/controller-index==0" \
    --no-headers \
    | awk '{printf $1}')
echo ${_CASControllerPodName}

kubectl exec -it ${_CASControllerPodName} \
    -c sas-cas-server \
    -- bash -c "cat /cas/permstore/primaryctrl/addfmtlibs_startup.lua"
--------
-- Format Library persistence file #1, Version 1.0
--------
log.info('----------------------------------------')
log.info('Lua: Running add_fmt_libs.lua')
log.info('----------------------------------------')
s:sessionProp_addFmtLib{caslib="Formats",fmtLibName="HRFORMATS",name="hrformats.sashdat",replace=true,promote=true}
A Lua file was created with the CAS format library definition. This means that the HR user-defined formats will be reloaded into the cas-shared-gelcorp server each time it is restarted.
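If you are curious about what else the permstore contains, you can reuse the ${_CASControllerPodName} variable set by the previous command to list the persistence directory (the path is the one used above; the listing itself is a read-only ls):
kubectl exec -it ${_CASControllerPodName} \
    -c sas-cas-server \
    -- bash -c "ls -l /cas/permstore/primaryctrl/"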
Validate using the SAS Environment Manager.
Open SAS Environment Manager, log in as geladm, and assume the SASAdministrators membership.
gellow_urls | grep "SAS Environment Manager"
Navigate to the User-Defined Formats page and then select cas-shared-gelcorp from the Server: drop-down list.
Verify that HRFORMATS is listed in the Format Library list.
You are now able to see the HRFORMATS formats.
If you look at the edu format properties, you can see the value ranges and labels that the format defines.
Make HR data available to the cas-shared-gelcorp server
List the current global CAS libraries available to the cas-shared-gelcorp server.
gel_sas_viya --output text \
    cas caslibs --server cas-shared-gelcorp list
You should see…
Name Source Type Description Scope Path CASUSER(geladm) PATH Personal File System Caslib global /cas/data/caslibs/casuserlibraries/geladm/ Formats PATH Stores user defined formats. global /cas/data/caslibs/formats/ ModelPerformanceData PATH Stores performance data output for the Model Management service. global /cas/data/caslibs/modelMonitorLibrary/ Models PATH Stores models created by Visual Analytics for use in other analytics or SAS Studio. global /cas/data/caslibs/models/ Public PATH Shared and writeable caslib, accessible to all users. global /cas/data/caslibs/public/ Samples PATH Stores sample data, supplied by SAS. global /cas/data/caslibs/samples/ SystemData PATH Stores application generated data, used for general reporting. global /cas/data/caslibs/sysData/
There should be no CAS library defined to access the HR department data (/gelcontent/gelcorp/hr/data).
Create an HR CAS library for the cas-shared-gelcorp server.
gel_sas_viya cas caslibs \
    create path --caslib hrdl \
    --path /gelcontent/gelcorp/hr/data \
    --server cas-shared-gelcorp \
    --description "gelcontent_for_HR_department" \
    --superuser
You should see…
The requested caslib "hrdl" has been added successfully. Caslib Properties Name hrdl Server cas-shared-gelcorp Description gelcontent_for_HR_department Source Type PATH Path /gelcontent/gelcorp/hr/data/ Scope global Caslib Attributes active true personal false subDirs false
Validate the new caslib using the sas-viya command.
gel_sas_viya --output text \
    cas caslibs --server cas-shared-gelcorp list
You should see…
Name Source Type Description Scope Path CASUSER(geladm) PATH Personal File System Caslib global /cas/data/caslibs/casuserlibraries/geladm/ Formats PATH Stores user defined formats. global /cas/data/caslibs/formats/ hrdl PATH gelcontent_for_the_HR_department global /gelcontent/gelcorp/hr/data/ ModelPerformanceData PATH Stores performance data output for the Model Management service. global /cas/data/caslibs/modelMonitorLibrary/ Models PATH Stores models created by Visual Analytics for use in other analytics or SAS Studio. global /cas/data/caslibs/models/ Public PATH Shared and writeable caslib, accessible to all users. global /cas/data/caslibs/public/ Samples PATH Stores sample data, supplied by SAS. global /cas/data/caslibs/samples/ SystemData PATH Stores application generated data, used for general reporting. global /cas/data/caslibs/sysData/
Validate the CAS library using SAS Environment Manager.
Open SAS Environment Manager, log in as geladm, and assume the SASAdministrators membership.
gellow_urls | grep "SAS Environment Manager"
Navigate to the Data page and select the Data Sources tab.
Select cas-shared-gelcorp as the CAS server.
You should now be able to see the hrdl CAS library.
Inspect CAS startup parameters
First, review the CAS configuration to see the parameters that
control the behavior of the CAS server when it starts (session zero). In
this task we will inspect the configuration using the
sas-viya
command-line interface.
Use the configuration plugin to list all cas-shared-gelcorp server configuration instances.
gel_sas_viya --output text \
    configuration configurations --service cas-shared-gelcorp \
    list --definition-name sas.cas.instance.config
Click here to see the output
Id                                     DefinitionName            Name               Services             IsDefault
213a06db-e5ac-42cb-a541-8120562b01c3   sas.cas.instance.config   config             cas-shared-gelcorp   true
8ed88586-5b40-4da4-b86d-c7cd1d6973c4   sas.cas.instance.config   delete             cas-shared-gelcorp   true
63347f67-7a35-44cc-b78a-94c3c98cd2fa   sas.cas.instance.config   logconfig          cas-shared-gelcorp   true
cd90210f-a039-44c4-99aa-ff4eb3e25e1f   sas.cas.instance.config   sessionlogconfig   cas-shared-gelcorp   true
eae686a8-2503-4710-8bfb-07b5b7c4691a   sas.cas.instance.config   settings           cas-shared-gelcorp   true
1a7bc8e9-d706-451f-ad79-c4a907f15e51   sas.cas.instance.config   startup            cas-shared-gelcorp   true
You will be able to see these configuration instances later using SAS Environment Manager.
Let’s start by looking at the current startup settings by listing details of the sas.cas.instance.config:startup configuration. To do this, we need to get the ID of that particular configuration instance so we can pass it into the show command.
Get the instance ID for the cas-shared-gelcorp startup configuration. This is basically the same command you just ran, with a bit of extra code to strip out just the ID value.
_CAS_Startup_ConfigInstance_Id=$(gel_sas_viya --output text \
    configuration configurations --service cas-shared-gelcorp \
    list --definition-name sas.cas.instance.config \
    | grep "startup" \
    | awk '{printf $1}')
echo ID=${_CAS_Startup_ConfigInstance_Id}
Now that we have the instance ID, show the details of the cas-shared-gelcorp startup configuration instance.
First, use the sas-viya command:
gel_sas_viya --output text \
    configuration configurations --id=${_CAS_Startup_ConfigInstance_Id} show
id                 : 1a7bc8e9-d706-451f-ad79-c4a907f15e51
metadata.isDefault : true
metadata.mediaType : application/vnd.sas.configuration.config.sas.cas.instance.config+json;version=1
metadata.services  : [cas-shared-gelcorp]
name               : startup
contents           : -- CAS session-zero startup script extensions.
--
-- Lua-formatted SWAT client code
-- that executes specified actions during session-zero prior to
-- clients connecting to CAS.
--
-- s:table_addCaslib{ name="sales", description="Sales data", dataSource={srcType="path"}, path="/data/sales" }
The contents value contains Lua code that is executed during CAS server startup (session zero). The configuration details are stored in the SAS Infrastructure Data Server (PostgreSQL).
During CAS server initialization, the Lua code from the contents value is extracted from the configuration and written to the /cas/config/casstartup_usermods.lua file inside the CAS pod. When session zero processing takes place, CAS reads the /cas/config/casstartup_usermods.lua file and carries out the instructions.
To prove that, take a look inside the CAS server controller pod. You should see that the contents of casstartup_usermods.lua match the value from the configuration instance.
_CASControllerPodName=$(kubectl get pod \
    --selector "casoperator.sas.com/server==shared-gelcorp,casoperator.sas.com/node-type==controller,casoperator.sas.com/controller-index==0" \
    --no-headers \
    | awk '{printf $1}')
echo ${_CASControllerPodName}

kubectl exec -it ${_CASControllerPodName} \
    -c sas-cas-server \
    -- bash -c "cat /cas/config/casstartup_usermods.lua"
-- CAS session-zero startup script extensions.
--
-- Lua-formatted SWAT client code
-- that executes specified actions during session-zero prior to
-- clients connecting to CAS.
--
-- s:table_addCaslib{ name="sales", description="Sales data", dataSource={srcType="path"}, path="/data/sales" }
You can also use SAS Environment Manager to examine the configuration instance.
Open SAS Environment Manager, log in as geladm, and assume the SASAdministrators membership.
gellow_urls | grep "SAS Environment Manager"
Navigate to the Configuration page and then select Definitions as the view.
Filter on sas-cas, select sas.cas.instance.config, and click the Collapse all icon (double arrows to top).
You should now be able to see all of the sas.cas.instance.config definitions for all CAS servers (shared-default and shared-gelcorp).
Expand the cas-shared-gelcorp: startup definition.
You can now see the Lua code that will populate the casstartup_usermods.lua file in the CAS server pod.
Quiz time! Now that you have seen the Lua code many times, what processing will actually take place when the CAS server starts?
Click here to see the output
Nothing!
All lines are commented out ("--").
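A quick way to confirm this from the command line, assuming the ${_CASControllerPodName} variable is still set from the earlier kubectl exec step, is to print only the lines of casstartup_usermods.lua that do not start with the Lua comment marker; if nothing substantive is printed, nothing will be executed during session zero:
kubectl exec -it ${_CASControllerPodName} \
    -c sas-cas-server \
    -- bash -c "grep -v '^--' /cas/config/casstartup_usermods.lua"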
Modify the CAS server session zero processing to load HR tables
In the previous steps you defined an HR caslib to make HR data accessible to CAS but that does not force the loading of any tables into memory.
In this step, you will modify the
cas-shared-gelcorp: startup
definition to pre-load two HR
tables each time the cas-shared-gelcorp server
starts.
Open SAS Environment Manager, log in as geladm, and assume the SASAdministrators membership.
gellow_urls | grep "SAS Environment Manager"
Navigate to the Configuration page and select Definitions as the view.
Filter on sas-cas, select sas.cas.instance.config, and click the Collapse all icon (double arrows to top).
You should now be able to see all the sas.cas.instance.config definitions for all existing CAS servers (shared-default and shared-gelcorp).
Edit the cas-shared-gelcorp: startup definition instance by clicking the pencil icon.
Add the following lines to the contents property's text field, below the existing text, and then save your change. These Lua commands instruct CAS to load the HR_SUMMARY and HRDATA tables into memory from the hrdl caslib.
-- Add User Defined Formats permanently and re-loadable
------------------------------------------------------
-- Not required since defined CAS formats libraries are automatically loaded since Stable 2021.1.4
-- Add HR tables to be reloaded at CAS Server start
---------------------------------------------------
---- Load HR summary table
s:table_loadTable{caslib="hrdl", casOut={caslib="hrdl",replication=0.0}, path="hr_summary.csv", promote=true }
---- Load HR data table
s:table_loadTable{caslib="hrdl", casOut={caslib="hrdl",replication=0.0}, path="hrdata.sas7bdat", promote=true }
The cas-shared-gelcorp server needs to be restarted for the change to take effect.
Since we enabled state transfer for the cas-shared-gelcorp server, you now have two choices for restarting the CAS server.
Choice 1: initiate the state transfer
All loaded tables and active CAS sessions will be kept.
The casoperator.sas.com/instance-index label for all pods of the CAS server will be incremented by 1.
kubectl patch casdeployment \
    shared-gelcorp --type='json' -p='[{"op": "replace", "path": "/spec/startStateTransfer", "value":true}]'

sleep 60s

kubectl wait pods \
    --selector="casoperator.sas.com/server==shared-gelcorp" \
    --for condition=ready \
    --timeout 15m
Choice 2: delete the CAS server pods
All loaded tables and active CAS sessions will be lost.
The casoperator.sas.com/instance-index label for all pods of the CAS server will be reset to 0.
kubectl delete pod \
    --selector="casoperator.sas.com/server==shared-gelcorp"

sleep 60s

kubectl wait pods \
    --selector="casoperator.sas.com/server==shared-gelcorp" \
    --for condition=ready \
    --timeout 15m
While you are waiting, switch to OpenLens and monitor the CAS pod activity.
Open OpenLens, connect to the GEL Kubernetes cluster, navigate to Workloads/Pods, and then filter on:
- namespace: gelcorp
- sas-cas-server-shared-gelcorp
As you saw in the last exercise, all cas-shared-gelcorp pods should terminate.
Then you should see the cas-shared-gelcorp pods restart.
When all containers are green, the cas-shared-gelcorp server is started and is ready for you to use.
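If you prefer to stay in the terminal instead of OpenLens, a rough equivalent is to watch the pods until they are all back to Running and fully ready (press Ctrl+C to stop watching):
kubectl get pods \
    --selector="casoperator.sas.com/server==shared-gelcorp" \
    --watch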
Once the server is ready, verify that the HR tables have been loaded into memory.
Using SAS Environment Manager
Open SAS Environment Manager, log in as geladm, and assume the SASAdministrators membership.
gellow_urls | grep "SAS Environment Manager"
Navigate to the Data page.
On the Available tab, which shows tables loaded in memory, you should see that the two HR tables, HR_SUMMARY and HRDATA, are loaded.
Click on the HRDATA table name to display its details, noticing that the custom format EDU. has been applied to the Education variable, which is a double.
Switch to the Sample Data tab and verify that the EDU. custom format has been applied to the Education values.
Using sas-viya.
Look at the cas-shared-gelcorp: startup configuration instance.
As you did before, get the cas-shared-gelcorp server startup configuration instance ID.
_CAS_Startup_ConfigInstance_Id=$(gel_sas_viya --output text \
    configuration configurations --service cas-shared-gelcorp \
    list --definition-name sas.cas.instance.config \
    | grep "startup" \
    | awk '{printf $1}')
echo ID=${_CAS_Startup_ConfigInstance_Id}
Show details of the cas-shared-gelcorp server startup configuration instance.
gel_sas_viya --output text \
    configuration configurations --id=${_CAS_Startup_ConfigInstance_Id} show
You should see…
id : 3b957407-eaf8-4a91-9ca1-99c4fb95790e metadata.isDefault : false metadata.mediaType : application/vnd.sas.configuration.config.sas.cas.instance.config+json;version=1 metadata.services : [cas-shared-gelcorp] name : startup contents : -- CAS session-zero startup script extensions. -- -- Lua-formatted SWAT client code -- that executes specified actions during session-zero prior to -- clients connecting to CAS. -- s:table_addCaslib{ name="sales", description="Sales data", dataSource={srcType="path"}, path="/data/sales" } -- Add User Defined Formats permanently and re-loadable ------------------------------------------------------ -- Not required since defined CAS formats libraries are automatically loaded since Stable 2021.1.4 -- Add HR tables to be reloaded at CAS Server start --------------------------------------------------- ---- Load HR summary table s:table_loadTable{caslib="hrdl", casOut={caslib="hrdl",replication=0.0}, path="hr_summary.csv", promote=true } ---- Load HR data table s:table_loadTable{caslib="hrdl", casOut={caslib="hrdl",replication=0.0}, path="hrdata.sas7bdat", promote=true }
Verify that the HRFORMATS format library is in the list of available format libraries.
gel_sas_viya --output text \
    cas format-libraries --server cas-shared-gelcorp list
You should see:
Format Library   Present in Format Search Path   Scope     Persisted   Caslib    Table
HRFORMATS        false                           global    true        FORMATS   HRFORMATS
HRFORMATS        true                            session   true        FORMATS   HRFORMATS
List the formats from the HRFORMATS CAS format library.
gel_sas_viya --output text \
    cas format-libraries --format-library HRFORMATS \
    show-formats --server cas-shared-gelcorp
Format Name   Version
edu           1
perf          1
rate          1
work          1
List the files from the hrdl global CAS library and see which ones were automatically loaded when the CAS server restarted.
gel_sas_viya --output text \
    cas tables --server cas-shared-gelcorp \
    list --caslib hrdl
Name                 Source Table Name             Scope    State
EMPLOYEE_NEW         employee_new.sas7bdat         None     unloaded
HR_SUMMARY           hr_summary.csv                global   loaded
HRDATA               hrdata.sas7bdat               global   loaded
PERFORMANCE_LOOKUP   performance_lookup.sas7bdat   None     unloaded
Using kubectl, look at the cas-shared-gelcorp server casstartup_usermods.lua file content.
_CASControllerPodName=$(kubectl get pod \
    --selector "casoperator.sas.com/server==shared-gelcorp,casoperator.sas.com/node-type==controller,casoperator.sas.com/controller-index==0" \
    --no-headers \
    | awk '{printf $1}')
echo ${_CASControllerPodName}

kubectl exec -it ${_CASControllerPodName} \
    -c sas-cas-server \
    -- bash -c "cat /cas/config/casstartup_usermods.lua"
Click here to see the output
-- CAS session-zero startup script extensions. -- -- Lua-formatted SWAT client code -- that executes specified actions during session-zero prior to -- clients connecting to CAS. -- s:table_addCaslib{ name="sales", description="Sales data", dataSource={srcType="path"}, path="/data/sales" } -- Add User Defined Formats permanently and reloadable ------------------------------------------------------ -- Not required since defined CAS formats libraries are automatically loaded since Stable 2021.1.4 -- Add HR tables to be reloaded at CAS Server start --------------------------------------------------- ---- Load HR summary table s:table_loadTable{caslib="hrdl", casOut={caslib="hrdl",replication=0.0}, path="hr_summary.csv", promote=true } ---- Load HR data table s:table_loadTable{caslib="hrdl", casOut={caslib="hrdl",replication=0.0}, path="hrdata.sas7bdat", promote=true }
Lesson learned
- CAS server usermods files must be managed using either SAS Environment Manager or the sas-viya CLI.
- The CAS server usermods files should not be modified directly inside the cas container of the CAS server pods.
- When pre-loading data in session zero processing, you must
- Make sure you have a caslib defined for the data
- Make sure user-defined formats used by the tables are available to CAS
- Add Lua code to the
sas.cas.instance.config:startup
configuration to load the tables.
- Modifying
sas.cas.instance.config
definitions requires restarting the CAS server to pick up the changes. You do not have to update the entire Viya deployment though.
SAS Viya Administration Operations
Lesson 07, Section 4 Exercise: Change Topology of an Additional CAS Server
Managing CAS Server Topology for Servers You Added
The steps to modify the topology of CAS servers you add to the deployment differ from the steps to modify the default CAS server.
In this hands-on you will learn how to modify the topology of CAS
servers you have added to the deployment. You will re-run the
create-cas-server.sh
script with different parameters to
re-generate a new set of manifests for the
cas-shared-gelcorp server to modify its current
topology. You will also be able to confirm that the configuration
changes you made earlier are preserved when you modify the topology as
long as the CAS instance name is kept the same.
The topology change technique you will follow here is different from the one recommended in the SAS Viya documentation, and it does not work for the cas-shared-default server. The technique for modifying the default CAS server's topology is covered in the 07_052_CAS_Manage_Topology_Default_Optional.md hands-on.
After making a topology change you then look at the impact of the change on the cas-shared-gelcorp content.
Table of contents
- Set the namespace
- The current topology of cas-shared-gelcorp server
- Modify the cas-shared-gelcorp server topology
- Look at the impact of the topology change on cas-shared-gelcorp server configuration
- Lessons learned
Set the namespace
gel_setCurrentNamespace gelcorp
/opt/pyviyatools/loginviauthinfo.py
The current topology of cas-shared-gelcorp server
Recall that in previous exercises you:
- Created cas-shared-gelcorp as an MPP CAS Server
with
- a CAS controller
- a CAS backup controller
- two CAS workers
- Configured cas-shared-gelcorp session zero to load specific HR tables
- Relocated CAS_DISK_CACHE for cas-shared-gelcorp
server from an
emptyDir
volume to ahostPath
volume.
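Before changing anything, you can also read the current topology straight from the CASDeployment custom resource. This is a sketch that assumes the spec field names workers and backupControllers, the same paths patched elsewhere in this workshop:
# Print the current worker and backup controller counts from the CASDeployment spec
kubectl get casdeployment shared-gelcorp \
    -o jsonpath='workers={.spec.workers} backupControllers={.spec.backupControllers}{"\n"}'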
Modify the cas-shared-gelcorp server topology
Now let’s modify the topology of cas-shared-gelcorp server so that it has one CAS controller and four CAS workers and no longer has a backup controller.
Because you added cas-shared-gelcorp server using
the create-cas-server.sh
script, you can simply re-run the
script with the appropriate parameter changes needed to define the new
topology you want the server to have. And because we are updating an
existing CAS server, we will be careful to keep the
--instance gelcorp
parameter the same as we did when we
created the server.
Re-generate the cas-shared-gelcorp server manifests by re-running create-cas-server.sh with parameters to increase the number of workers to four and to remove the backup controller.
echo "y" | bash ~/project/deploy/${current_namespace}/sas-bases/examples/cas/create/create-cas-server.sh \
    --instance gelcorp \
    --output ~/project/deploy/${current_namespace}/site-config \
    --workers 4 \
    --backup 0 \
    --transfer 1
Fri May 13 12:10:25 EDT 2022 - instance = gelcorp Fri May 13 12:10:25 EDT 2022 - tenant = Fri May 13 12:10:25 EDT 2022 - output = /home/cloud-user/project/deploy/gelcorp/site-config make: *** No rule to make target `install'. Stop. output directory does not exist: /home/cloud-user/project/deploy/gelcorp/site-config/ creating directory: /home/cloud-user/project/deploy/gelcorp/site-config/ Generating artifacts... 100.0% [=======================================================================] |-cas-shared-gelcorp (root directory) |-cas-shared-gelcorp-cr.yaml |-kustomization.yaml |-shared-gelcorp-pvc.yaml |-annotations.yaml |-backup-agent-patch.yaml |-cas-consul-sidecar.yaml |-cas-fsgroup-security-context.yaml |-cas-sssd-sidecar.yaml |-kustomizeconfig.yaml |-provider-pvc.yaml |-transfer-pvc.yaml |-enable-binary-port.yaml |-enable-http-port.yaml |-configmaps.yaml |-state-transfer.yaml |-node-affinity.yaml create-cas-server.sh complete!
Keep a copy of the current manifest.yaml file.
cp -p /tmp/${current_namespace}/deploy_work/deploy/manifest.yaml /tmp/${current_namespace}/manifest_07-051-01.yaml
Run the sas-orchestration deploy command.
cd ~/project/deploy
rm -rf /tmp/${current_namespace}/deploy_work/*
source ~/project/deploy/.${current_namespace}_vars

docker run --rm \
    -v ${PWD}/license:/license \
    -v ${PWD}/${current_namespace}:/${current_namespace} \
    -v ${HOME}/.kube/config_portable:/kube/config \
    -v /tmp/${current_namespace}/deploy_work:/work \
    -e KUBECONFIG=/kube/config \
    --user $(id -u):$(id -g) \
    sas-orchestration \
    deploy --namespace ${current_namespace} \
    --deployment-data /license/SASViyaV4_${_order}_certs.zip \
    --license /license/SASViyaV4_${_order}_license.jwt \
    --user-content /${current_namespace} \
    --cadence-name ${_cadenceName} \
    --cadence-version ${_cadenceVersion} \
    --image-registry ${_viyaMirrorReg}
When the deploy command completes successfully the final message should say The deploy command completed successfully as shown in the log snippet below.
The deploy command started [...] The deploy command completed successfully
If the sas-orchestration deploy command fails, check out the steps in 99_Additional_Topics/03_Troubleshoot_SAS_Orchestration_Deploy to help you troubleshoot any problems.
It may take several more minutes for the cas-shared-gelcorp server to fully initialize. The following command will notify you when the CAS Server is ready.
kubectl wait pods \
    --selector="casoperator.sas.com/server==shared-gelcorp" \
    --for condition=ready \
    --timeout 15m
You should see these messages in the output.
pod/sas-cas-server-shared-gelcorp-backup condition met
pod/sas-cas-server-shared-gelcorp-controller condition met
pod/sas-cas-server-shared-gelcorp-worker-0 condition met
pod/sas-cas-server-shared-gelcorp-worker-1 condition met
pod/sas-cas-server-shared-gelcorp-worker-2 condition met
pod/sas-cas-server-shared-gelcorp-worker-3 condition met
Now take a look at the cas-shared-gelcorp pods. Does anything look strange to you?
kubectl get pods \
    --selector="casoperator.sas.com/server==shared-gelcorp" \
    -o wide
You should see something like…
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES sas-cas-server-shared-gelcorp-3-backup 3/3 Running 0 27m 10.42.0.125 intnode03 <none> <none> sas-cas-server-shared-gelcorp-3-controller 3/3 Running 0 27m 10.42.4.72 intnode05 <none> <none> sas-cas-server-shared-gelcorp-3-worker-0 3/3 Running 0 27m 10.42.2.97 intnode02 <none> <none> sas-cas-server-shared-gelcorp-3-worker-1 3/3 Running 0 27m 10.42.3.79 intnode04 <none> <none> sas-cas-server-shared-gelcorp-3-worker-2 3/3 Running 0 6m7s 10.42.1.62 intnode01 <none> <none> sas-cas-server-shared-gelcorp-3-worker-3 3/3 Running 0 6m7s 10.42.0.129 intnode03 <none> <none>
The cas-shared-gelcorp server has started but it does not have the topology you may have expected. The additional CAS workers have been added (+2) but the backup controller still exists even though you modified the topology to remove it.
To fully implement the CAS server topology changes you must now restart the CAS server.
Since we enabled state transfer for the cas-shared-gelcorp server, you now have two choices for restarting the CAS server.
Choice 1: initiate the state transfer
All loaded tables and active CAS sessions will be kept.
The casoperator.sas.com/instance-index label for all pods of the CAS server will be incremented by 1.
kubectl patch casdeployment \
    shared-gelcorp --type='json' -p='[{"op": "replace", "path": "/spec/startStateTransfer", "value":true}]'

sleep 60s

kubectl wait pods \
    --selector="casoperator.sas.com/server==shared-gelcorp" \
    --for condition=ready \
    --timeout 15m
Choice 2: delete the CAS server pods
All loaded tables and active CAS sessions will be lost.
The casoperator.sas.com/instance-index label for all pods of the CAS server will be reset to 0.
kubectl delete pod \
    --selector="casoperator.sas.com/server==shared-gelcorp"

sleep 60s

kubectl wait pods \
    --selector="casoperator.sas.com/server==shared-gelcorp" \
    --for condition=ready \
    --timeout 15m
You should see something like this.
pod/sas-cas-server-shared-gelcorp-controller condition met
pod/sas-cas-server-shared-gelcorp-worker-0 condition met
pod/sas-cas-server-shared-gelcorp-worker-1 condition met
pod/sas-cas-server-shared-gelcorp-worker-2 condition met
pod/sas-cas-server-shared-gelcorp-worker-3 condition met
Now the cas-shared-gelcorp server has the topology you configured: one controller and four workers.
Look at the impact of the topology change on cas-shared-gelcorp server configuration
Using the kubectl CLI
List the configuration files on the cas-shared-gelcorp server controller and note when they were created.
_CASControllerPodName=$(kubectl get pod \
    --selector "casoperator.sas.com/server==shared-gelcorp,casoperator.sas.com/node-type==controller,casoperator.sas.com/controller-index==0" \
    --no-headers \
    | awk '{printf $1}')
echo ${_CASControllerPodName}

kubectl exec -it ${_CASControllerPodName} \
    -c sas-cas-server \
    -- bash -c "ls -al /cas/config/"
Click here to see the output
total 184 drwxrwsrwx 6 root sas 4096 Jun 1 16:40 . drwxr-xr-x 1 root root 21 Jun 1 16:39 .. -rw-r--r-- 1 1002 sas 1549 Jun 1 16:40 casconfig_container.lua -rw-r--r-- 1 1002 sas 545 Jun 1 16:39 casconfig_deployment.lua -rw-r--r-- 1 1002 sas 11443 Jun 1 16:39 casconfig.lua -rw-r--r-- 1 1002 sas 287 Jun 1 16:34 casconfig_usermods.lua -rw-r--r-- 1 1002 sas 711 Jun 1 16:40 cas_container.settings -rwx------ 1 1002 sas 65 Jun 1 16:40 cas_key -rw-r--r-- 1 1002 sas 5 Jun 1 16:40 cas.pid -rw-r--r-- 1 1002 sas 1282 Jun 1 16:39 cas.settings -rw-r--r-- 1 1002 sas 855 Jun 1 16:39 casstartup.lua -rw-r--r-- 1 1002 sas 1124 Jun 1 16:34 casstartup_usermods.lua -rw-r--r-- 1 1002 sas 163 Jun 1 16:40 cas_usermods_bootstrap.log -rw-r--r-- 1 1002 sas 217 Jun 1 16:34 cas_usermods.settings -rw-r--r-- 1 1002 sas 1744 Jun 1 16:39 cas.yml drwxr-sr-x 2 1002 sas 6 Jun 1 16:39 conf.d -rw-r--r-- 1 1002 sas 8 Jun 1 16:39 .configrc -rw-r--r-- 1 1002 sas 41602 Jun 1 16:39 confLog.json -rwxr-x--- 1 1002 sas 5531 Jun 1 16:40 crsplanning-qp-logback.xml -rw-r--r-- 1 1002 sas 98 Jun 1 16:34 kv.log -rwxr-xr-x 1 1002 sas 3024 Jun 1 16:39 launchconfig -rw-r--r-- 1 1002 sas 1296 Jun 1 16:34 logconfig.session.xml -rw-r--r-- 1 1002 sas 3814 Jun 1 16:39 logconfig.trace.xml -rw-r--r-- 1 1002 sas 3210 Jun 1 16:34 logconfig.xml -rw-r--r-- 1 1002 sas 1354 Jun 1 16:39 node.lua -rw-r--r-- 1 1002 sas 2875 Jun 1 16:40 node_usermods.lua -rwxr-xr-x 1 1002 sas 12825 Jun 1 16:39 perms.xml -rw-r--r-- 1 1002 sas 456 Jun 1 16:40 sas-cas-container -rw-r--r-- 1 1002 sas 8089 Jun 1 16:40 sas-configuration-configurations-v1.json -rw-r--r-- 1 1002 sas 1216 Jun 1 16:40 sas-configuration-definitions-v1.json drwxr-sr-x 3 1002 sas 17 Jun 1 16:40 share drwxr-sr-x 2 1002 sas 171 Jun 1 16:40 start.d drwxr-sr-x 2 1002 sas 47 Jun 1 16:40 tokens -rw-r--r-- 1 1002 sas 91 Jun 1 16:40 usermodsdelete.sh
You can see that all of the CAS server configuration files have been regenerated.
List out the casconfig_container.lua file, which contains CAS environment variables. Was the previous configuration of CAS_DISK_CACHE retained even though you changed the topology?
_CASControllerPodName=$(kubectl get pod \
    --selector "casoperator.sas.com/server==shared-gelcorp,casoperator.sas.com/node-type==controller,casoperator.sas.com/controller-index==0" \
    --no-headers \
    | awk '{printf $1}')
echo ${_CASControllerPodName}

kubectl exec -it ${_CASControllerPodName} \
    -c sas-cas-server \
    -- bash -c "cat /cas/config/casconfig_container.lua"
-- Inserting section capturing variables set on container creation. cas.dqlocale = 'ENUSA' cas.hostknownby = 'controller.sas-cas-server-shared-gelcorp.gelcorp' cas.initialworkers = 4 cas.dqsetuploc = 'QKB CI 33' cas.elastic = 'true' cas.gcport = 5571 cas.servicesbaseurl = 'https://gelcorp.***********.race.sas.com' cas.machinelist = '/dev/null' cas.userloc = '/cas/data/caslibs/casuserlibraries/%USER' cas.permstore = '/cas/permstore' cas.mode = 'mpp' cas.initialbackups = 0 cas.keyfile = '/cas/config/cas_key' cas.colocation = 'none' env.CONSUL_HTTP_ADDR = 'https://localhost:8500' env.CAS_VIRTUAL_HOST = 'controller.sas-cas-server-shared-gelcorp.gelcorp' env.CAS_DEPLOYED_LOGCFGLOC = '/opt/sas/viya/config/etc/cas/default/logconfig.xml' env.CASDATADIR_CASLIBS = '/cas/data/caslibs' env.CASDATADIR = '/cas/data' env.CAS_VIRTUAL_PORT = 8777 env.CONSUL_CACERT = '/security/trustedcerts.pem' env.CAS_VIRTUAL_PATH = '/cas-shared-gelcorp-http' env.CAS_K8S_SERVICE_NAME = 'sas-cas-server-shared-gelcorp-client' env.CAS_USE_CONSUL = 'true' env.CONSUL_NAME = 'cas-shared-gelcorp' env.CASDEPLOYMENT_SPEC_ALLOWLIST_APPEND = '/cas/data/caslibs:/gelcontent:/mnt/gelcontent/' env.CASPERMSTORE = '/cas/permstore' env.CAS_VIRTUAL_PROTO = 'http' env.CASDATADIR_APPS = '/cas/data/apps' env.CAS_DISK_CACHE = '/casdiskcache/cdc01:/casdiskcache/cdc02:/casdiskcache/cdc03:/casdiskcache/cdc04' env.CAS_INSTANCE_MODE = 'shared' env.CAS_LICENSE = '/cas/license/license.sas' env.CLIENT_ID = 'cas-shared-gelcorp' env.CLIENT_SECRET_LOC = '/cas/config/tokens/client.secret'
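If you only care about the CAS_DISK_CACHE setting, you can narrow the previous command to that single line (this reuses the ${_CASControllerPodName} variable set just above):
kubectl exec -it ${_CASControllerPodName} \
    -c sas-cas-server \
    -- bash -c "grep CAS_DISK_CACHE /cas/config/casconfig_container.lua"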
Now let’s make sure the CAS startup configuration for session zero processing is still in place.
Using the same code you ran in an earlier exercise, get the instance ID for the cas-shared-gelcorp server startup configuration.
_CAS_Startup_ConfigInstance_Id=$(gel_sas_viya --output text \
    configuration configurations --service cas-shared-gelcorp \
    list --definition-name sas.cas.instance.config \
    | grep "startup" \
    | awk '{printf $1}')
echo ID=${_CAS_Startup_ConfigInstance_Id}
Now that we have the instance ID, show the details of the cas-shared-gelcorp server startup configuration instance.
gel_sas_viya --output text \
    configuration configurations --id=${_CAS_Startup_ConfigInstance_Id} show
You should see that your previous configuration change to load the HR tables is still in place.
id : 068e19bc-4819-44a1-aef7-493f26fceaae metadata.isDefault : false metadata.mediaType : application/vnd.sas.configuration.config.sas.cas.instance.config+json;version=1 metadata.services : [cas-shared-gelcorp] name : startup contents : -- CAS session-zero startup script extensions. -- -- Lua-formatted SWAT client code -- that executes specified actions during session-zero prior to -- clients connecting to CAS. -- s:table_addCaslib{ name="sales", description="Sales data", dataSource={srcType="path"}, path="/data/sales" } -- Add User Defined Formats permanently and re-loadable ------------------------------------------------------ -- Not required since defined CAS formats libraries are automatically loaded since Stable 2021.1.4 -- Add HR tables to be reloaded at CAS Server start --------------------------------------------------- ---- Load HR summary table s:table_loadTable{caslib="hrdl", casOut={caslib="hrdl",replication=0.0}, path="hr_summary.csv", promote=true } ---- Load HR data table s:table_loadTable{caslib="hrdl", casOut={caslib="hrdl",replication=0.0}, path="hrdata.sas7bdat", promote=true }
Using SAS Environment Manager
Open SAS Environment Manager, log in as geladm, and assume the SASAdministrators membership.
gellow_urls | grep "SAS Environment Manager"
Verify the CAS server topology.
Open the Servers page.
Right-click on the cas-shared-gelcorp server and select the Configuration option.
Switch to the Nodes tab. You can see that the cas-shared-gelcorp server has a primary controller and four workers.
Verify the CAS libraries and tables.
All global CAS libraries remain even after you restart the cas-shared-gelcorp server.
Open the Data page.
On the Data Sources tab, expand the cas-shared-gelcorp connection to display its CAS libraries.
Verify that you still see the hrdl caslib created in a previous exercise.
Expand the hrdl CAS library to display its tables.
After you restarted the cas-shared-gelcorp server, the data files remained available (data sources), and some tables are loaded into memory because of the session zero processing defined in a previous hands-on.
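You can confirm the same thing from the command line with the cas tables plugin you used earlier; HR_SUMMARY and HRDATA should again show a state of loaded:
gel_sas_viya --output text \
    cas tables --server cas-shared-gelcorp \
    list --caslib hrdl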
Lessons learned
It is easy to change the topology of a non-default CAS server.
Re-run create-cas-server.sh with different --workers and --backup values, but keep the same --instance and --output values. This will overwrite the CAS server's existing manifests.
Regenerate and apply the SASDeployment custom resource.
Restart the CAS server. In-memory data is lost unless it is reloaded by the CAS server session zero processing.
Changing the topology does not result in losing any preexisting configurations such as:
- User Defined Formats,
- Global CAS Libraries,
- Session zero processing,
- CAS_DISK_CACHE relocation
SAS Viya Administration Operations
Lesson 07, Section 4 Exercise: Change Topology of Default CAS Server
Managing cas-shared-default server topology - OPTIONAL
In this exercise you will:
- Review the cas-shared-default server topology, configuration, and content
- Convert cas-shared-default from an SMP server to an MPP server by adding CAS worker nodes.
- Add a backup controller to the cas-shared-default MPP server.
- Look at the impact of the topology changes on the cas-shared-default server content.
Table of contents
- Set the namespace
- The current cas-shared-default server
- Convert cas-shared-default server from SMP to MPP
- Add a backup controller to the MPP server
- Apply the topology modifications to the cas-shared-default server
- Validate the cas-shared-default server topology changes
- Lessons learned
- (For your info only) Rollback the cas-shared-default server from MPP CAS server to SMP
Set the namespace
gel_setCurrentNamespace gelcorp
The current cas-shared-default server
From the initial Viya deployment, the cas-shared-default server is SMP.
Review your current CAS server configuration
Access some of your CAS server metadata using the kubectl CLI
kubectl describe pods \
    sas-cas-server-default-controller \
    | grep " casoperator." \
    | awk -F"/" '{print $2}'
Note that currently the cas-shared-default server is SMP.
Click here to see the output
cas-cfg-mode=smp
cas-env-consul-name=cas-shared-default
controller-active=1
controller-index=0
node-type=controller
server=default
service-name=primary
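If you only want the server mode, a hedged one-liner is to read the cas-cfg-mode label directly from the controller pod. This assumes the full label key is the casoperator.sas.com prefix plus the name shown above:
kubectl get pod sas-cas-server-default-controller \
    -o jsonpath='{.metadata.labels.casoperator\.sas\.com/cas-cfg-mode}{"\n"}'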
List your CAS server pods.
kubectl get pods \
    --selector="casoperator.sas.com/server==default"
Click here to see the output
NAME                                READY   STATUS    RESTARTS   AGE
sas-cas-server-default-controller   3/3     Running   0          2m19s
Using SAS Environment Manager
Open SAS Environment Manager, log in as geladm, and assume the SASAdministrators membership.
gellow_urls | grep "SAS Environment Manager"
Navigate to the Servers page and right-click on the cas-shared-default server. Then click on Configuration.
Navigate to the Nodes tab. You can now see the current cas-shared-default server configuration: SMP (a single CAS controller, no workers).
Using OpenLens
Open OpenLens and connect to your GEL Kubernetes cluster.
Navigate to Workloads -> Pods and then filter on
- namespace: gelcorp
- sas-cas-server-default
Convert cas-shared-default server from SMP to MPP
To convert an SMP CAS server to MPP, you need to modify the CAS server deployment using a PatchTransformer that changes the number of CAS workers from 0 to the desired number of workers.
The number of workers is specified using the
sas-bases/examples/cas/configure/cas-manage-workers.yaml
manifest.
View the cas-manage-workers.yaml file.
cat ~/project/deploy/${current_namespace}/sas-bases/examples/cas/configure/cas-manage-workers.yaml
Click here to see the cas-manage-workers.yaml content
# This block of code is for specifying the number of workers in an MPP # deployment. Do not use this block for SMP deployments. The default value is 2 --- apiVersion: builtin kind: PatchTransformer metadata: name: cas-manage-workers patch: |- - op: replace path: /spec/workers value: {{ NUMBER-OF-WORKERS }} target: group: viya.sas.com kind: CASDeployment # Uncomment this to apply to all CAS servers: name: .* # Uncomment this to apply to one particular named CAS server: #name: {{ NAME-OF-SERVER }} # Uncomment this to apply to the default CAS server: #labelSelector: "sas.com/cas-server-default" version: v1alpha1
Create a cas-manage-workers-cas-shared-default.yaml file with two workers in the site-config directory.
Copy the cas-manage-workers.yaml manifest into the project site-config directory.
cp -p ~/project/deploy/${current_namespace}/sas-bases/examples/cas/configure/cas-manage-workers.yaml ~/project/deploy/${current_namespace}/site-config/cas-manage-workers-cas-shared-default.yaml
chmod 664 ~/project/deploy/${current_namespace}/site-config/cas-manage-workers-cas-shared-default.yaml
Change the target filtering in the cas-manage-workers-cas-shared-default.yaml file.
Only the cas-shared-default server has to be modified. By default, the provided manifest targets all CAS servers. Because of that, the manifest must be modified so that the topology changes apply only to the cas-shared-default server.
sed -i 's/name: \.\*/\#name: \.\*/' ~/project/deploy/${current_namespace}/site-config/cas-manage-workers-cas-shared-default.yaml
sed -i 's/\#labelSelector: /labelSelector: /g' ~/project/deploy/${current_namespace}/site-config/cas-manage-workers-cas-shared-default.yaml
Change the number of workers.
_numberOfWorkers=2
sed -i "/value:/{n;s/.*/ ${_numberOfWorkers}/}" ~/project/deploy/${current_namespace}/site-config/cas-manage-workers-cas-shared-default.yaml
Click here to see the cas-manage-workers-cas-shared-default.yaml content
cat ~/project/deploy/${current_namespace}/site-config/cas-manage-workers-cas-shared-default.yaml
# This block of code is for specifying the number of workers in an MPP # deployment. Do not use this block for SMP deployments. The default value is 2 --- apiVersion: builtin kind: PatchTransformer metadata: name: cas-manage-workers patch: |- - op: replace path: /spec/workers value: 2 target: group: viya.sas.com kind: CASDeployment # Uncomment this to apply to all CAS servers: #name: .* # Uncomment this to apply to one particular named CAS server: #name: {{ NAME-OF-SERVER }} # Uncomment this to apply to the default CAS server: labelSelector: "sas.com/cas-server-default" version: v1alpha1
Modify ~/project/deploy/gelcorp/kustomization.yaml to reference the CAS server manifest.
Back up the current kustomization.yaml file.
cp -p ~/project/deploy/${current_namespace}/kustomization.yaml /tmp/${current_namespace}/kustomization_07-052-01.yaml
In the transformers field, add the line "- site-config/cas-manage-workers-cas-shared-default.yaml" using the yq tool:
[[ $(grep -c "site-config/cas-manage-workers-cas-shared-default.yaml" ~/project/deploy/${current_namespace}/kustomization.yaml) == 0 ]] && \
    yq4 eval -i '.transformers += ["site-config/cas-manage-workers-cas-shared-default.yaml"]' ~/project/deploy/${current_namespace}/kustomization.yaml
Alternatively, you can update the ~/project/deploy/gelcorp/kustomization.yaml file using your favorite text editor:
[...]
transformers:
[... previous transformers items ...]
- site-config/cas-manage-workers-cas-shared-default.yaml
[...]
Verify that the update is included in kustomization.yaml.
cat ~/project/deploy/${current_namespace}/kustomization.yaml
Search for - site-config/cas-manage-workers-cas-shared-default.yaml in the transformers field of ~/project/deploy/gelcorp/kustomization.yaml.
Click here to see the output
--- namespace: gelcorp resources: - sas-bases/base # GEL Specifics to create CA secret for OpenSSL Issuer - site-config/security/gel-openssl-ca - sas-bases/overlays/network/networking.k8s.io # Using networking.k8s.io API since 2021.1.6 - site-config/security/openssl-generated-ingress-certificate.yaml # Default to OpenSSL Issuer in 2021.2.6 - sas-bases/overlays/cas-server - sas-bases/overlays/crunchydata/postgres-operator # New Stable 2022.10 - sas-bases/overlays/postgres/platform-postgres # New Stable 2022.10 - sas-bases/overlays/internal-elasticsearch # New Stable 2020.1.3 - sas-bases/overlays/update-checker # added update checker ## disable CAS autoresources to keep things simpler #- sas-bases/overlays/cas-server/auto-resources # CAS-related #- sas-bases/overlays/crunchydata_pgadmin # Deploy the sas-crunchy-data-pgadmin container - remove 2022.10 - site-config/sas-prepull/add-prepull-cr-crb.yaml - sas-bases/overlays/cas-server/state-transfer # Enable state transfer for the cas-shared-default CAS server - new PVC sas-cas-transfer-data - site-config/sas-microanalytic-score/astores/resources.yaml - site-config/gelcontent_pvc.yaml - site-config/cas-shared-gelcorp configurations: - sas-bases/overlays/required/kustomizeconfig.yaml transformers: - sas-bases/overlays/internal-elasticsearch/sysctl-transformer.yaml # New Stable 2020.1.3 - sas-bases/overlays/startup/ordered-startup-transformer.yaml - site-config/cas-enable-host.yaml - sas-bases/overlays/required/transformers.yaml - site-config/mirror.yaml #- site-config/daily_update_check.yaml # change the frequency of the update-check #- sas-bases/overlays/cas-server/auto-resources/remove-resources.yaml # CAS-related ## temporarily removed to alleviate RACE issues - sas-bases/overlays/internal-elasticsearch/internal-elasticsearch-transformer.yaml # New Stable 2020.1.3 - sas-bases/overlays/sas-programming-environment/enable-admin-script-access.yaml # To enable admin scripts #- sas-bases/overlays/scaling/zero-scale/phase-0-transformer.yaml #- sas-bases/overlays/scaling/zero-scale/phase-1-transformer.yaml - sas-bases/overlays/cas-server/state-transfer/support-state-transfer.yaml # Enable state transfer for the cas-shared-default CAS server - enable and mount new PVC - site-config/change-check-interval.yaml - sas-bases/overlays/sas-microanalytic-score/astores/astores-transformer.yaml - site-config/sas-pyconfig/change-configuration.yaml - site-config/sas-pyconfig/change-limits.yaml - site-config/cas-add-nfs-mount.yaml - site-config/cas-add-allowlist-paths.yaml - site-config/cas-modify-user.yaml - site-config/cas-manage-casdiskcache-shared-gelcorp.yaml - site-config/cas-manage-workers-cas-shared-default.yaml components: - sas-bases/components/crunchydata/internal-platform-postgres # New Stable 2022.10 - sas-bases/components/security/core/base/full-stack-tls - sas-bases/components/security/network/networking.k8s.io/ingress/nginx.ingress.kubernetes.io/full-stack-tls patches: - path: site-config/storageclass.yaml target: kind: PersistentVolumeClaim annotationSelector: sas.com/component-name in (sas-backup-job,sas-data-quality-services,sas-commonfiles,sas-cas-operator,sas-pyconfig) - path: site-config/cas-gelcontent-mount-pvc.yaml target: group: viya.sas.com kind: CASDeployment name: .* version: v1alpha1 - path: site-config/compute-server-add-nfs-mount.yaml target: labelSelector: sas.com/template-intent=sas-launcher version: v1 kind: PodTemplate - path: site-config/compute-server-annotate-podtempate.yaml target: name: sas-compute-job-config version: v1 
kind: PodTemplate secretGenerator: - name: sas-consul-config behavior: merge files: - SITEDEFAULT_CONF=site-config/sitedefault.yaml - name: sas-image-pull-secrets behavior: replace type: kubernetes.io/dockerconfigjson files: - .dockerconfigjson=site-config/crcache-image-pull-secrets.json configMapGenerator: - name: ingress-input behavior: merge literals: - INGRESS_HOST=gelcorp.pdcesx03145.race.sas.com - name: sas-shared-config behavior: merge literals: - SAS_SERVICES_URL=https://gelcorp.pdcesx03145.race.sas.com # # This is to fix an issue that only appears in very slow environments. # # Do not do this at a customer site - name: sas-go-config behavior: merge literals: - SAS_BOOTSTRAP_HTTP_CLIENT_TIMEOUT_REQUEST='15m' - name: input behavior: merge literals: - IMAGE_REGISTRY=crcache-race-sas-cary.unx.sas.com
Normally at this step, you would back up the current manifest.yaml file and then run the sas-orchestration deploy command. But because you also have to add a backup controller to the cas-shared-default server, these two steps will be run later to apply all topology modifications in one step.
Note: to add or remove CAS workers for the cas-shared-default server, you just have to modify ~/project/deploy/gelcorp/site-config/cas-manage-workers-cas-shared-default.yaml.
Add a backup controller to the MPP server
View the cas-manage-backup.yaml file.
cat ~/project/deploy/${current_namespace}/sas-bases/examples/cas/configure/cas-manage-backup.yaml
Click here to see the cas-manage-backup.yaml content
--- apiVersion: builtin kind: PatchTransformer metadata: name: cas-manage-backup patch: |- - op: replace path: /spec/backupControllers value: 1 target: group: viya.sas.com kind: CASDeployment # Uncomment this to apply to all CAS servers: name: .* # Uncomment this to apply to one particular named CAS server: #name: {{ NAME-OF-SERVER }} # Uncomment this to apply to the default CAS server: #labelSelector: "sas.com/cas-server-default" version: v1alpha1
Create the cas-manage-backup-cas-shared-default.yaml file in the site-config directory to add a backup controller to the cas-shared-default server.
Copy the cas-manage-backup.yaml manifest into the project site-config directory.
cp -p ~/project/deploy/${current_namespace}/sas-bases/examples/cas/configure/cas-manage-backup.yaml ~/project/deploy/${current_namespace}/site-config/cas-manage-backup-cas-shared-default.yaml
chmod 664 ~/project/deploy/${current_namespace}/site-config/cas-manage-backup-cas-shared-default.yaml
Change the target filtering in the cas-manage-backup-cas-shared-default.yaml file.
Only the cas-shared-default server has to be modified. By default, the provided manifest targets all CAS servers. Because of that, the manifest must be modified so that the topology changes apply only to the cas-shared-default server.
sed -i 's/name: \.\*/\#name: \.\*/' ~/project/deploy/${current_namespace}/site-config/cas-manage-backup-cas-shared-default.yaml
sed -i 's/\#labelSelector: /labelSelector: /g' ~/project/deploy/${current_namespace}/site-config/cas-manage-backup-cas-shared-default.yaml
Click here to see the cas-manage-backup-cas-shared-default.yaml content
cat ~/project/deploy/${current_namespace}/site-config/cas-manage-backup-cas-shared-default.yaml
# This block of code is for specifying adding a backup controller in an MPP # deployment. Do not use this block for SMP deployments. --- apiVersion: builtin kind: PatchTransformer metadata: name: cas-manage-backup patch: |- - op: replace path: /spec/backupControllers value: 1 target: group: viya.sas.com kind: CASDeployment # Uncomment this to apply to all CAS servers: #name: .* # Uncomment this to apply to one particular named CAS server: #name: {{ NAME-OF-SERVER }} # Uncomment this to apply to the default CAS server: labelSelector: "sas.com/cas-server-default" version: v1alpha1
Modify ~/project/deploy/gelcorp/kustomization.yaml to reference the CAS server manifest.
Back up the current kustomization.yaml file.
cp -p ~/project/deploy/${current_namespace}/kustomization.yaml /tmp/${current_namespace}/kustomization_07-052-02.yaml
In the transformers field, add the line "- site-config/cas-manage-backup-cas-shared-default.yaml" using the yq tool:
[[ $(grep -c "site-config/cas-manage-backup-cas-shared-default.yaml" ~/project/deploy/${current_namespace}/kustomization.yaml) == 0 ]] && \
    yq4 eval -i '.transformers += ["site-config/cas-manage-backup-cas-shared-default.yaml"]' ~/project/deploy/${current_namespace}/kustomization.yaml
Alternatively, you can update the ~/project/deploy/gelcorp/kustomization.yaml file using your favorite text editor:
[...]
transformers:
[... previous transformers items ...]
- site-config/cas-manage-backup-cas-shared-default.yaml
[...]
Verify that the modification is in place.
cat ~/project/deploy/${current_namespace}/kustomization.yaml
Search for - site-config/cas-manage-backup-cas-shared-default.yaml in the transformers field of ~/project/deploy/gelcorp/kustomization.yaml.
Click here to see the output
--- namespace: gelcorp resources: - sas-bases/base # GEL Specifics to create CA secret for OpenSSL Issuer - site-config/security/gel-openssl-ca - sas-bases/overlays/network/networking.k8s.io # Using networking.k8s.io API since 2021.1.6 - site-config/security/openssl-generated-ingress-certificate.yaml # Default to OpenSSL Issuer in 2021.2.6 - sas-bases/overlays/cas-server - sas-bases/overlays/crunchydata/postgres-operator # New Stable 2022.10 - sas-bases/overlays/postgres/platform-postgres # New Stable 2022.10 - sas-bases/overlays/internal-elasticsearch # New Stable 2020.1.3 - sas-bases/overlays/update-checker # added update checker ## disable CAS autoresources to keep things simpler #- sas-bases/overlays/cas-server/auto-resources # CAS-related #- sas-bases/overlays/crunchydata_pgadmin # Deploy the sas-crunchy-data-pgadmin container - remove 2022.10 - site-config/sas-prepull/add-prepull-cr-crb.yaml - sas-bases/overlays/cas-server/state-transfer # Enable state transfer for the cas-shared-default CAS server - new PVC sas-cas-transfer-data - site-config/sas-microanalytic-score/astores/resources.yaml - site-config/gelcontent_pvc.yaml - site-config/cas-shared-gelcorp configurations: - sas-bases/overlays/required/kustomizeconfig.yaml transformers: - sas-bases/overlays/internal-elasticsearch/sysctl-transformer.yaml # New Stable 2020.1.3 - sas-bases/overlays/startup/ordered-startup-transformer.yaml - site-config/cas-enable-host.yaml - sas-bases/overlays/required/transformers.yaml - site-config/mirror.yaml #- site-config/daily_update_check.yaml # change the frequency of the update-check #- sas-bases/overlays/cas-server/auto-resources/remove-resources.yaml # CAS-related ## temporarily removed to alleviate RACE issues - sas-bases/overlays/internal-elasticsearch/internal-elasticsearch-transformer.yaml # New Stable 2020.1.3 - sas-bases/overlays/sas-programming-environment/enable-admin-script-access.yaml # To enable admin scripts #- sas-bases/overlays/scaling/zero-scale/phase-0-transformer.yaml #- sas-bases/overlays/scaling/zero-scale/phase-1-transformer.yaml - sas-bases/overlays/cas-server/state-transfer/support-state-transfer.yaml # Enable state transfer for the cas-shared-default CAS server - enable and mount new PVC - site-config/change-check-interval.yaml - sas-bases/overlays/sas-microanalytic-score/astores/astores-transformer.yaml - site-config/sas-pyconfig/change-configuration.yaml - site-config/sas-pyconfig/change-limits.yaml - site-config/cas-add-nfs-mount.yaml - site-config/cas-add-allowlist-paths.yaml - site-config/cas-modify-user.yaml - site-config/cas-manage-casdiskcache-shared-gelcorp.yaml - site-config/cas-manage-workers-cas-shared-default.yaml - site-config/cas-manage-backup-cas-shared-default.yaml components: - sas-bases/components/crunchydata/internal-platform-postgres # New Stable 2022.10 - sas-bases/components/security/core/base/full-stack-tls - sas-bases/components/security/network/networking.k8s.io/ingress/nginx.ingress.kubernetes.io/full-stack-tls patches: - path: site-config/storageclass.yaml target: kind: PersistentVolumeClaim annotationSelector: sas.com/component-name in (sas-backup-job,sas-data-quality-services,sas-commonfiles,sas-cas-operator,sas-pyconfig) - path: site-config/cas-gelcontent-mount-pvc.yaml target: group: viya.sas.com kind: CASDeployment name: .* version: v1alpha1 - path: site-config/compute-server-add-nfs-mount.yaml target: labelSelector: sas.com/template-intent=sas-launcher version: v1 kind: PodTemplate - path: 
site-config/compute-server-annotate-podtempate.yaml target: name: sas-compute-job-config version: v1 kind: PodTemplate secretGenerator: - name: sas-consul-config behavior: merge files: - SITEDEFAULT_CONF=site-config/sitedefault.yaml - name: sas-image-pull-secrets behavior: replace type: kubernetes.io/dockerconfigjson files: - .dockerconfigjson=site-config/crcache-image-pull-secrets.json configMapGenerator: - name: ingress-input behavior: merge literals: - INGRESS_HOST=gelcorp.pdcesx03145.race.sas.com - name: sas-shared-config behavior: merge literals: - SAS_SERVICES_URL=https://gelcorp.pdcesx03145.race.sas.com # # This is to fix an issue that only appears in very slow environments. # # Do not do this at a customer site - name: sas-go-config behavior: merge literals: - SAS_BOOTSTRAP_HTTP_CLIENT_TIMEOUT_REQUEST='15m' - name: input behavior: merge literals: - IMAGE_REGISTRY=crcache-race-sas-cary.unx.sas.com
Note: to add or remove the CAS backup controller for the cas-shared-default server, you just have to modify the ~/project/deploy/gelcorp/site-config/cas-manage-backup-cas-shared-default.yaml file.
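For reference, a minimal way to inspect that setting from the command line is shown below; the exact key and value layout inside the patch is an assumption based on the rollback notes later in this exercise.
# Show the backup-controller setting currently in the patch (backupControllers key assumed)
grep -n "backupControllers" ~/project/deploy/${current_namespace}/site-config/cas-manage-backup-cas-shared-default.yaml
# Hypothetical example only: flipping the value to 0 would remove the backup controller again
# sed -i 's/value: 1/value: 0/' ~/project/deploy/${current_namespace}/site-config/cas-manage-backup-cas-shared-default.yaml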
Apply the topology modifications to the cas-shared-default server
Keep a copy of the current
manifest.yaml
file.cp -p /tmp/${current_namespace}/deploy_work/deploy/manifest.yaml /tmp/${current_namespace}/manifest_07-052-02.yaml
Run the
sas-orchestration
deploy command.cd ~/project/deploy rm -rf /tmp/${current_namespace}/deploy_work/* source ~/project/deploy/.${current_namespace}_vars docker run --rm \ -v ${PWD}/license:/license \ -v ${PWD}/${current_namespace}:/${current_namespace} \ -v ${HOME}/.kube/config_portable:/kube/config \ -v /tmp/${current_namespace}/deploy_work:/work \ -e KUBECONFIG=/kube/config \ --user $(id -u):$(id -g) \ \ sas-orchestration \ deploy --namespace ${current_namespace} \ --deployment-data /license/SASViyaV4_${_order}_certs.zip \ --license /license/SASViyaV4_${_order}_license.jwt \ --user-content /${current_namespace} \ --cadence-name ${_cadenceName} \ --cadence-version ${_cadenceVersion} \ --image-registry ${_viyaMirrorReg}
When the deploy command completes successfully the final message should say The deploy command completed successfully as shown in the log snippet below.
The deploy command started [...] The deploy command completed successfully
If the sas-orchestration deploy command fails, check out the steps in 99_Additional_Topics/03_Troubleshoot_SAS_Orchestration_Deploy to help you troubleshoot any problems.
It may take several more minutes for the cas-shared-default server to fully initialize. The following command will notify you when the CAS Server is ready.
kubectl wait pods \ --selector="casoperator.sas.com/server==default" \ --for condition=ready \ --timeout 15m
You should see these messages in the output.
pod/sas-cas-server-default-backup condition met pod/sas-cas-server-default-controller condition met pod/sas-cas-server-default-worker-0 condition met pod/sas-cas-server-default-worker-1 condition met
The expected cas-shared-default server pods are running, but because you changed the topology from SMP to MPP, you must now restart the CAS server to fully apply the new topology.
Restart the cas-shared-default server so that it is aware of the new CAS backup controller.
Since state transfer is enabled for the cas-shared-default server, you now have two choices for restarting the CAS server.
Choice 1: initiate the state transfer
All loaded tables and active CAS sessions will be kept.
The
casoperator.sas.com/instance-index
label for all pods of the CAS server will be incremented by 1.kubectl patch casdeployment \ \ default --type='json' -p='[{"op": "replace", "path": "/spec/startStateTransfer", "value":true}]' sleep 60s kubectl wait pods \ --selector="casoperator.sas.com/server==default" \ --for condition=ready \ --timeout 15m
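If you used Choice 1, a quick way to confirm that the instance-index label was incremented is to display it as a column (the -L option of kubectl get adds a column with the label's value):
kubectl get pods \
  --selector="casoperator.sas.com/server==default" \
  -L casoperator.sas.com/instance-index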
Choice 2: delete the CAS server pods
All loaded tables and active CAS sessions will be lost.
The
casoperator.sas.com/instance-index
label for all pods of the CAS server will be reset to 0.kubectl delete pod \ --selector="casoperator.sas.com/server==default" sleep 60s kubectl wait pods \ --selector="casoperator.sas.com/server==default" \ --for condition=ready \ --timeout 15m
Access some of your CAS server metadata using the kubectl CLI, and confirm that the cas-shared-default server is now MPP with a backup controller and two workers.
kubectl describe pods \ --selector="casoperator.sas.com/server==default" \ | grep " casoperator." \ | awk -F"/" '{print $2}' \ | sed '/cas-cfg-mode=/i\ '
Note that the cas-shared-default server is now MPP.
Click here to see the output
cas-cfg-mode=mpp cas-env-consul-name=cas-shared-default controller-active=0 controller-index=1 node-type=controller server=default service-name=backup cas-cfg-mode=mpp cas-env-consul-name=cas-shared-default controller-active=1 controller-index=0 node-type=controller server=default service-name=primary cas-cfg-mode=mpp cas-env-consul-name=cas-shared-default node-type=worker server=default service-name=worker worker-index=0 cas-cfg-mode=mpp cas-env-consul-name=cas-shared-default node-type=worker server=default service-name=worker worker-index=1
Validate the cas-shared-default server topology changes
Using SAS Environment Manager
Open SAS Environment Manager, log in as
geladm
, and assume theSASAdministrators
membership.gellow_urls | grep "SAS Environment Manager"
Navigate to the Servers page, right-click the cas-shared-default server, and click Configuration. Then navigate to the Nodes tab.
You can now see the current cas-shared-default server configuration: MPP with two controllers (primary and secondary/backup) and two workers.
Using OpenLens
Open OpenLens and connect to your GEL Kubernetes cluster.
Navigate to Workloads –> Pods and then filter on
- namespace: gelcorp
- sas-cas-server-default
You can see all the cas-shared-default server pods: one primary controller, one backup controller, and two workers.
Lessons learned
- By default the provided cas-shared-default server manifests (sas-bases/overlays/cas-server/) are configured for an SMP CAS server.
- It is easy to convert the cas-shared-default server from SMP to MPP (only a single controller by default).
- When the cas-shared-default server is MPP, it is easy to modify the number of CAS workers.
- It is easy to add or remove a backup controller in an existing MPP server.
- In-memory data is lost each time the CAS server is restarted if CAS state transfer is not enabled and used for the CAS server.
(For your information only) Roll back the cas-shared-default server from MPP to SMP
FOR YOUR INFORMATION ONLY - DO NOT PERFORM THE INSTRUCTIONS BELOW DURING THIS WORKSHOP
Click here to see the required steps
Remove CAS server workers and secondary/backup controller: 2 methods
- Keep a copy of the current kustomization.yaml file, then remove the site-config/cas-manage-workers-cas-shared-default.yaml and site-config/cas-manage-backup-cas-shared-default.yaml file references from the transformers field of the kustomization.yaml file.
or
Keep a copy of the current site-config/cas-manage-workers-cas-shared-default.yaml and site-config/cas-manage-backup-cas-shared-default.yaml files (if they exist), and then modify them:
- Set the workers value to 0 in site-config/cas-manage-workers-cas-shared-default.yaml (remove all workers)
- Set the backupControllers value to 0 in site-config/cas-manage-backup-cas-shared-default.yaml (remove the secondary/backup controller)
Note: this strategy does not require modifying the kustomization.yaml file.
- Keep a copy of the current manifest.yaml file.
- Run the sas-orchestration deploy command.
- Restart the cas-shared-default server so that it is aware of the new topology (mainly because the backup controller was removed).
The cas-shared-default server will be back to SMP.
SAS Viya Administration Operations
Lesson 07, Section 5 Exercise: Configure CAS for External Access
Access CAS server from outside the Viya deployment namespace
In this exercise you will enable both the binary and HTTP services for the cas-shared-gelcorp server so that you can access this CAS server from outside the Viya deployment.
You can look at this GEL blog for more information about accessing CAS from outside its SAS Viya deployment namespace
Table of contents
- Set the namespace
- Access the cas-shared-gelcorp server using the default HTTP ingress
- Enable both binary and HTTP services for the cas-shared-gelcorp server
- Access the cas-shared-gelcorp server using the binary service
- Access the cas-shared-gelcorp server using the HTTP service
- Lessons learned
Set the namespace
gel_setCurrentNamespace gelcorp
Access the cas-shared-gelcorp server using the default HTTP ingress
In this step of the hands-on, you will have to test the default HTTP ingress connection to the cas-shared-gelcorp server by listing the CAS server nodes.
You will do it from:
- The Windows machine (sas-client) using the Postman application.
- The Linux machine through a MobaXterm session using curl.
On Windows, use Postman query to access the cas-shared-gelcorp server
Using the Postman application, test the cas-shared-gelcorp server HTTP ingress access.
Open Postman on your sas-client (Windows) machine
Update the
RACE
environment{{racemachine}}
current variable valueA Postman
RACE
environment was created for you with some variables that will be used by the Postman queries.The
{{racemachine}}
variable values (initial and current) were set tomachineName
.You have to replace the
{{racemachine}}
variable current value with the short name of yoursasnode01
machine. You can find it on the prompt of a MobaXterm terminal, or by running this command in MobaXterm.echo "The required RACE machine name: $(hostname)"
Copy this returned value to the
CURRENT VALUE
of the{{racemachine}}
variable.Then save the modified
RACE
Postman environment.Run the
Get NodeNames
query from theHTTP Ingress
Postman collectionA Postman collections were created for you to query the the cas-shared-gelcorp server.
At this step, open the
HTTP ingress collection
and ensure that theRACE
Postman environment is selected.Then just send the query to get the result.
On Linux, use a curl query to access the cas-shared-gelcorp server
From a MobaXterm session, run the command below to test the cas-shared-gelcorp server HTTP ingress using curl.
curl --user geladm:lnxsas https://gelcorp.$(hostname -f)/cas-shared-gelcorp-http/cas/nodeNames
[ "controller.sas-cas-server-shared-gelcorp.gelcorp.svc.cluster.local", "worker-1.sas-cas-server-shared-gelcorp.gelcorp.svc.cluster.local", "worker-0.sas-cas-server-shared-gelcorp.gelcorp.svc.cluster.local", "worker-3.sas-cas-server-shared-gelcorp.gelcorp.svc.cluster.local", "worker-2.sas-cas-server-shared-gelcorp.gelcorp.svc.cluster.local" ]
Enable both binary and HTTP services for the cas-shared-gelcorp server
To enable the binary and HTTP services, SAS provides a patchTransformer manifest in the SAS Viya deployment assets. You can find this manifest in $deploy/sas-bases/examples/cas/configure/cas-enable-external-services.yaml.
A single patchTransformer manifest is used to manage both the binary and HTTP services, and by default it applies to all CAS servers in a SAS Viya deployment (target option: name: .*).
In this hands-on, because we want to enable both binary and HTTP services for the cas-shared-gelcorp server only, you will have to copy and then modify the provided cas-enable-external-services.yaml manifest.
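The target name you will set in a later step is the CASDeployment resource name, not a pod name. If you want to double-check it first, a quick way to list the CASDeployment names (using the standard -o name output format) is:
kubectl get casdeployments -o name
# Expect something like:
#   casdeployment.viya.sas.com/default
#   casdeployment.viya.sas.com/shared-gelcorp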
Copy the provided
cas-enable-external-services.yaml
patchTransformer
manifest in thesite-config
directorycp -p ~/project/deploy/${current_namespace}/sas-bases/examples/cas/configure/cas-enable-external-services.yaml ~/project/deploy/${current_namespace}/site-config/cas-enable-external-services_shared-gelcorp.yaml
Click here to see the cas-enable-external-services_shared-gelcorp.yaml content
cat ~/project/deploy/${current_namespace}/site-config/cas-enable-external-services_shared-gelcorp.yaml
--- apiVersion: builtin kind: PatchTransformer metadata: name: cas-enable-external-services patch: |- # After you set publishBinaryService to true, apply # the manifest, you can view the Service with # `kubectl get svc sas-cas-server-default-bin` - op: add path: /spec/publishBinaryService value: true # After you set publishHTTPService to true, apply # the manifest, you can view the Service with # `kubectl get svc sas-cas-server-default-http` #- op: add # path: /spec/publishHTTPService # value: true # By default, the services are added as NodePorts. # To configure them as LoadBalancers, uncomment the following # service template and optionally, set source ranges. # # Note: Setting the service template to LoadBalancer # affects all CAS services, including the publishDCServices # and publishEPCSService if those are set for SAS/ACCESS and # Data Connectors. # - op: add # path: /spec/serviceTemplate # value: # spec: # type: LoadBalancer # loadBalancerSourceRanges: # - 192.168.0.0/16 # - 10.0.0.0/8 # # Note: Some cloud providers may require additional settings # in the service template. For example, adding the following # annotation lets you set the load balancer timeout on AWS: # # - op: add # path: /spec/serviceTemplate # value: # spec: # type: LoadBalancer # loadBalancerSourceRanges: # - 192.168.0.0/16 # - 10.0.0.0/8 # metadata: # annotations: # service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "300" # # Consult your cloud provider's documentation for more information.target: group: viya.sas.com kind: CASDeployment # Uncomment this to apply to all CAS servers: name: .* # Uncomment this to apply to one particular named CAS server: #name: {{ NAME-OF-SERVER }} # Uncomment this to apply to the default CAS server: #labelSelector: "sas.com/cas-server-default" version: v1alpha1
Modify the
site-config/cas-enable-external-services_shared-gelcorp.yaml
patchTransformer
manifest to apply it only on the cas-shared-gelcorp serverEnable the HTTP service
You have to uncomment the lines that set the HTTP service (
publishHTTPService
).Note that the lines that set the binary service (
publishBinaryService
) are not commented by default.Use these sed commands to modify the
~/project/deploy/gelcorp/site-config/cas-enable-external-services_shared-gelcorp.yaml
manifest:sed -i 's/\#- op: add/- op: add/' ~/project/deploy/${current_namespace}/site-config/cas-enable-external-services_shared-gelcorp.yaml sed -i "s/\# path: \/spec\/publishHTTPService/ path: \/spec\/publishHTTPService/g" ~/project/deploy/${current_namespace}/site-config/cas-enable-external-services_shared-gelcorp.yaml sed -i "s/\# value: true/ value: true/g" ~/project/deploy/${current_namespace}/site-config/cas-enable-external-services_shared-gelcorp.yaml
Alternatively, you can update the
~/project/deploy/gelcorp/site-config/cas-enable-external-services_shared-gelcorp.yaml
manifest using your favorite text editor:--- apiVersion: builtin kind: PatchTransformer metadata: name: cas-enable-external-services patch: |- # After you set publishBinaryService to true, apply # the manifest, you can view the Service with # `kubectl get svc sas-cas-server-default-bin` - op: add path: /spec/publishBinaryService value: true # After you set publishHTTPService to true, apply # the manifest, you can view the Service with # `kubectl get svc sas-cas-server-default-http` - op: add path: /spec/publishHTTPService value: true # By default, the services are added as NodePorts.[...]
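Either way, a quick check that both publish operations are now active (a simple grep; the -A1 option prints the line that follows each matching path line):
grep -A1 "path: /spec/publish" ~/project/deploy/${current_namespace}/site-config/cas-enable-external-services_shared-gelcorp.yaml
# Both publishBinaryService and publishHTTPService should now be uncommented and followed by "value: true"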
Change the Target filtering in the
cas-enable-external-services_shared-gelcorp.yaml
patchTransformer
manifestOnly the cas-shared-gelcorp server has to be modified. By default, the provided manifest targets all CAS Server. Because of that, it is required to modify the manifest to enable the binary and HTTP services only for the cas-shared-gelcorp server.
Use these sed commands to modify the
~/project/deploy/gelcorp/site-config/cas-enable-external-services_shared-gelcorp.yaml
manifest:sed -i 's/name: \.\*/\#name: \.\*/' ~/project/deploy/${current_namespace}/site-config/cas-enable-external-services_shared-gelcorp.yaml sed -i "s/\#name: {{ NAME-OF-SERVER }}/name: shared-${current_namespace}/g" ~/project/deploy/${current_namespace}/site-config/cas-enable-external-services_shared-gelcorp.yaml
Look at the modifications you made in the
cas-enable-external-services_shared-gelcorp.yaml
manifestcat ~/project/deploy/${current_namespace}/site-config/cas-enable-external-services_shared-gelcorp.yaml
Alternatively, you can update the
~/project/deploy/gelcorp/site-config/cas-enable-external-services_shared-gelcorp.yaml
manifest using your favorite text editor:[...] # Consult your cloud provider's documentation for more information. target: group: viya.sas.com kind: CASDeployment # Uncomment this to apply to all CAS servers: #name: .* # Uncomment this to apply to one particular named CAS server: name: shared-gelcorp # Uncomment this to apply to the default CAS server: #labelSelector: "sas.com/cas-server-default" version: v1alpha1
Click here to see the resulting cas-enable-external-services_shared-gelcorp.yaml content
--- apiVersion: builtin kind: PatchTransformer metadata: name: cas-enable-external-services patch: |- # After you set publishBinaryService to true, apply # the manifest, you can view the Service with # `kubectl get svc sas-cas-server-default-bin` - op: add path: /spec/publishBinaryService value: true # After you set publishHTTPService to true, apply # the manifest, you can view the Service with # `kubectl get svc sas-cas-server-default-http` - op: add path: /spec/publishHTTPService value: true # By default, the services are added as NodePorts. # To configure them as LoadBalancers, uncomment the following # service template and optionally, set source ranges. # # Note: Setting the service template to LoadBalancer # affects all CAS services, including the publishDCServices # and publishEPCSService if those are set for SAS/ACCESS and # Data Connectors. # - op: add # path: /spec/serviceTemplate # value: # spec: # type: LoadBalancer # loadBalancerSourceRanges: # - 192.168.0.0/16 # - 10.0.0.0/8 # # Note: Some cloud providers may require additional settings # in the service template. For example, adding the following # annotation lets you set the load balancer timeout on AWS: # # - op: add # path: /spec/serviceTemplate # value: # spec: # type: LoadBalancer # loadBalancerSourceRanges: # - 192.168.0.0/16 # - 10.0.0.0/8 # metadata: # annotations: # service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "300" # # Consult your cloud provider's documentation for more information.target: group: viya.sas.com kind: CASDeployment # Uncomment this to apply to all CAS servers: #name: .* # Uncomment this to apply to one particular named CAS server: name: shared-gelcorp # Uncomment this to apply to the default CAS server: #labelSelector: "sas.com/cas-server-default" version: v1alpha1
Modify
~/project/deploy/gelcorp/kustomization.yaml
to reference thepatchTransformer
manifest.Backup the current
kustomization.yaml
file.cp -p ~/project/deploy/${current_namespace}/kustomization.yaml /tmp/${current_namespace}/kustomization_07-081-01.yaml
In the
transformers
field add the line "- site-config/cas-enable-external-services_shared-gelcorp.yaml" using the yq tool:
[[ $(grep -c "site-config/cas-enable-external-services_shared-gelcorp.yaml" ~/project/deploy/${current_namespace}/kustomization.yaml) == 0 ]] && \
yq4 eval -i ".transformers += [\"site-config/cas-enable-external-services_shared-gelcorp.yaml\"]" ~/project/deploy/${current_namespace}/kustomization.yaml
Alternatively, you can update the
~/project/deploy/gelcorp/kustomization.yaml
file using your favorite text editor:[...] transformers: [... previous transformers items ...] - site-config/cas-enable-external-services_shared-gelcorp.yaml [...]
Verify that the modification is in place.
cat ~/project/deploy/${current_namespace}/kustomization.yaml
Search for
- site-config/cas-enable-external-services_shared-gelcorp.yaml
in the transformers field of the kustomization.yaml file.
Click here to see the output
--- namespace: gelcorp resources: - sas-bases/base # GEL Specifics to create CA secret for OpenSSL Issuer - site-config/security/gel-openssl-ca - sas-bases/overlays/network/networking.k8s.io # Using networking.k8s.io API since 2021.1.6 - site-config/security/openssl-generated-ingress-certificate.yaml # Default to OpenSSL Issuer in 2021.2.6 - sas-bases/overlays/cas-server - sas-bases/overlays/crunchydata/postgres-operator # New Stable 2022.10 - sas-bases/overlays/postgres/platform-postgres # New Stable 2022.10 - sas-bases/overlays/internal-elasticsearch # New Stable 2020.1.3 - sas-bases/overlays/update-checker # added update checker ## disable CAS autoresources to keep things simpler #- sas-bases/overlays/cas-server/auto-resources # CAS-related #- sas-bases/overlays/crunchydata_pgadmin # Deploy the sas-crunchy-data-pgadmin container - remove 2022.10 - site-config/sas-prepull/add-prepull-cr-crb.yaml - sas-bases/overlays/cas-server/state-transfer # Enable state transfer for the cas-shared-default CAS server - new PVC sas-cas-transfer-data - site-config/sas-microanalytic-score/astores/resources.yaml - site-config/gelcontent_pvc.yaml - site-config/cas-shared-gelcorp configurations: - sas-bases/overlays/required/kustomizeconfig.yaml transformers: - sas-bases/overlays/internal-elasticsearch/sysctl-transformer.yaml # New Stable 2020.1.3 - sas-bases/overlays/startup/ordered-startup-transformer.yaml - site-config/cas-enable-host.yaml - sas-bases/overlays/required/transformers.yaml - site-config/mirror.yaml #- site-config/daily_update_check.yaml # change the frequency of the update-check #- sas-bases/overlays/cas-server/auto-resources/remove-resources.yaml # CAS-related ## temporarily removed to alleviate RACE issues - sas-bases/overlays/internal-elasticsearch/internal-elasticsearch-transformer.yaml # New Stable 2020.1.3 - sas-bases/overlays/sas-programming-environment/enable-admin-script-access.yaml # To enable admin scripts #- sas-bases/overlays/scaling/zero-scale/phase-0-transformer.yaml #- sas-bases/overlays/scaling/zero-scale/phase-1-transformer.yaml - sas-bases/overlays/cas-server/state-transfer/support-state-transfer.yaml # Enable state transfer for the cas-shared-default CAS server - enable and mount new PVC - site-config/change-check-interval.yaml - sas-bases/overlays/sas-microanalytic-score/astores/astores-transformer.yaml - site-config/sas-pyconfig/change-configuration.yaml - site-config/sas-pyconfig/change-limits.yaml - site-config/cas-add-nfs-mount.yaml - site-config/cas-add-allowlist-paths.yaml - site-config/cas-modify-user.yaml - site-config/cas-manage-casdiskcache-shared-gelcorp.yaml - site-config/cas-manage-workers-cas-shared-default.yaml - site-config/cas-manage-backup-cas-shared-default.yaml - site-config/cas-enable-external-services_shared-gelcorp.yaml components: - sas-bases/components/crunchydata/internal-platform-postgres # New Stable 2022.10 - sas-bases/components/security/core/base/full-stack-tls - sas-bases/components/security/network/networking.k8s.io/ingress/nginx.ingress.kubernetes.io/full-stack-tls patches: - path: site-config/storageclass.yaml target: kind: PersistentVolumeClaim annotationSelector: sas.com/component-name in (sas-backup-job,sas-data-quality-services,sas-commonfiles,sas-cas-operator,sas-pyconfig) - path: site-config/cas-gelcontent-mount-pvc.yaml target: group: viya.sas.com kind: CASDeployment name: .* version: v1alpha1 - path: site-config/compute-server-add-nfs-mount.yaml target: labelSelector: sas.com/template-intent=sas-launcher version: v1 kind: 
PodTemplate - path: site-config/compute-server-annotate-podtempate.yaml target: name: sas-compute-job-config version: v1 kind: PodTemplate secretGenerator: - name: sas-consul-config behavior: merge files: - SITEDEFAULT_CONF=site-config/sitedefault.yaml - name: sas-image-pull-secrets behavior: replace type: kubernetes.io/dockerconfigjson files: - .dockerconfigjson=site-config/crcache-image-pull-secrets.json configMapGenerator: - name: ingress-input behavior: merge literals: - INGRESS_HOST=gelcorp.pdcesx03145.race.sas.com - name: sas-shared-config behavior: merge literals: - SAS_SERVICES_URL=https://gelcorp.pdcesx03145.race.sas.com # # This is to fix an issue that only appears in very slow environments. # # Do not do this at a customer site - name: sas-go-config behavior: merge literals: - SAS_BOOTSTRAP_HTTP_CLIENT_TIMEOUT_REQUEST='15m' - name: input behavior: merge literals: - IMAGE_REGISTRY=crcache-race-sas-cary.unx.sas.com
Now let’s rebuild and apply the Viya deployment manifest.
Keep a copy of the current
manifest.yaml
file.cp -p /tmp/${current_namespace}/deploy_work/deploy/manifest.yaml /tmp/${current_namespace}/manifest_07-081-01.yaml
Generate the SAS Deployment Custom Resource
cd ~/project/deploy rm -rf /tmp/${current_namespace}/deploy_work/* source ~/project/deploy/.${current_namespace}_vars docker run --rm \ -v ${PWD}/license:/license \ -v ${PWD}/${current_namespace}:/${current_namespace} \ -v ${HOME}/.kube/config_portable:/kube/config \ -v /tmp/${current_namespace}/deploy_work:/work \ -e KUBECONFIG=/kube/config \ --user $(id -u):$(id -g) \ \ sas-orchestration \ deploy --namespace ${current_namespace} \ --deployment-data /license/SASViyaV4_${_order}_certs.zip \ --license /license/SASViyaV4_${_order}_license.jwt \ --user-content /${current_namespace} \ --cadence-name ${_cadenceName} \ --cadence-version ${_cadenceVersion} \ --image-registry ${_viyaMirrorReg}
When the deploy command completes successfully the final message should say The deploy command completed successfully as shown in the log snippet below.
The deploy command started [...] The deploy command completed successfully
If the sas-orchestration deploy command fails, check out the steps in 99_Additional_Topics/03_Troubleshoot_SAS_Orchestration_Deploy to help you troubleshoot any problems.
Restart the cas-shared-gelcorp server using the state transfer capability
Since state transfer is enabled for the cas-shared-gelcorp server, you now have two choices for restarting the CAS server.
Choice 1: initiate the state transfer
All loaded tables and active CAS sessions will be kept.
The
casoperator.sas.com/instance-index
label for all pods of the CAS server will be incremented by 1.kubectl patch casdeployment \ \ shared-gelcorp --type='json' \ -p='[{"op": "replace", "path": "/spec/startStateTransfer", "value":true}]' sleep 60s kubectl wait pods \ --selector="casoperator.sas.com/server==shared-gelcorp" \ --for condition=ready \ --timeout 15m
Choice 2: delete the CAS server pods
All loaded tables and active CAS sessions will be lost.
The
casoperator.sas.com/instance-index
label for all pods of the CAS server will be reset to 0.kubectl delete pod \ --selector="casoperator.sas.com/server==shared-gelcorp" sleep 60s kubectl wait pods \ --selector="casoperator.sas.com/server==shared-gelcorp" \ --for condition=ready \ --timeout 15m
You should see these messages in the output.
pod/sas-cas-server-shared-gelcorp-controller condition met pod/sas-cas-server-shared-gelcorp-worker-0 condition met pod/sas-cas-server-shared-gelcorp-worker-1 condition met pod/sas-cas-server-shared-gelcorp-worker-2 condition met pod/sas-cas-server-shared-gelcorp-worker-3 condition met
Find the information regarding the newly enabled binary and HTTP services for the cas-shared-gelcorp server.
kubectl get services \ --selector "casoperator.sas.com/server=shared-gelcorp" \ | grep -E "NAME | NodePort | LoadBalancer "
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE sas-cas-server-shared-gelcorp-bin NodePort 10.43.154.32 <none> 5570:21761/TCP 7m59s sas-cas-server-shared-gelcorp-http NodePort 10.43.141.106 <none> 8777:11115/TCP,80:27041/TCP,443:24107/TCP 7m59s
Access the cas-shared-gelcorp server using the binary service
Use the SAS 9.4 display manager to access the cas-shared-gelcorp server
Find the required connection information:
the host
echo "The required RACE machine name: $(hostname)"
the port
The information you got above about the binary service port looks like this:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE sas-cas-server-shared-gelcorp-bin NodePort 10.43.154.32 <none> 5570:21761/TCP 7m59s
The required port is the node port mapped to CAS binary port 5570, that is, the second value in "5570:21761/TCP" (21761 in this example).
The commands below will provide you with the required value.
_binaryServicePort=$(kubectl get services \ --selector "casoperator.sas.com/server=shared-gelcorp" \ | grep "shared-gelcorp-bin" \ | awk '{printf $5}' \ | awk -F ":" '{printf $2}' \ | awk -F "/" '{printf $1}') echo "The required binary service port number: ${_binaryServicePort}"
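Alternatively, a shorter form using a kubectl JSONPath filter; the service name sas-cas-server-shared-gelcorp-bin is taken from the service listing above:
# Read the nodePort assigned to the CAS binary port (5570) directly from the service
_binaryServicePort=$(kubectl get service sas-cas-server-shared-gelcorp-bin \
  -o jsonpath='{.spec.ports[?(@.port==5570)].nodePort}')
echo "The required binary service port number: ${_binaryServicePort}"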
Open the SAS 9.4 display manager
Open the CASServer_TestExternalAccess.sas program and modify it using the connection information (host and port) you got above.
The provided CASServer_TestExternalAccess.sas program needs to be modified to set the required host and port parameters. You have to replace the template values with the values you found in the steps above.
- Replace <machinename> with the host short name
- Replace <port> with the binary service port number
Execute it to validate the connection to the cas-shared-gelcorp server using its binary service.
Now you just have to submit the SAS code by pressing the
F3
key.
You can see in the Log window that it was possible to connect to the cas-shared-gelcorp server through its binary service from a SAS 9.4 client.
If you are more curious, you can look at the CAS session details and the mounted CAS libraries content.
Access the cas-shared-gelcorp server using the HTTP service
In this step of the hands-on, you will have to test the HTTP service connection to the cas-shared-gelcorp server by listing the CAS server nodes.
You will do it from:
- The Windows machine (sas-client) using the Postman application.
- The Linux machine through a MobaXterm session using curl.
On Windows, use Postman query to access the cas-shared-gelcorp server through its HTTP service
Using the Postman application, test the cas-shared-gelcorp server HTTP service access.
Open Postman on your sas-client (Windows) machine
Update the
RACE
environment{{HTTPServicePort}}
current variable valueA Postman
RACE
environment was created for you with some variables that will be used by the Postman queries.The
{{HTTPServicePort}}
variable values (initial and current) were set toservicePort
.You have to replace the
{{HTTPServicePort}}
variable current value with the value returned by running this command in MobaXterm.
The information you got above about the HTTP service port looks like this:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE sas-cas-server-shared-gelcorp-http NodePort 10.43.141.106 <none> 8777:11115/TCP,80:27041/TCP,443:24107/TCP 7m59s
The required port is the node port mapped to HTTPS port 443, that is, the last mapping in "8777:11115/TCP,80:27041/TCP,443:24107/TCP" (24107 in this example).
The commands below will provide you with the required value.
_HTTPServicePort=$(kubectl get services \ --selector "casoperator.sas.com/server=shared-gelcorp" \ | grep "shared-gelcorp-http" \ | awk '{printf $5}' \ | awk -F "," '{printf $3}' \ | awk -F ":" '{printf $2}' \ | awk -F "/" '{printf $1}') echo "The required HTTP service port number: ${_HTTPServicePort}"
Copy this returned value to the
CURRENT VALUE
of the{{HTTPServicePort}}
variable.Then save the modified
RACE
Postman environment.Run the
Get NodeNames
query from theHTTP Service
Postman collectionA Postman collections were created for you to query the the cas-shared-gelcorp server.
At this step, open the
HTTP Service
collection and ensure that theRACE
Postman environment is selected.Then just send the query to get the result.
On Linux, use a curl query to access the cas-shared-gelcorp server
From a MobaXterm session, run the command below to test the cas-shared-gelcorp server HTTP service using curl.
_HTTPServicePort=$(kubectl get services \ --selector "casoperator.sas.com/server=shared-gelcorp" \ | grep "shared-gelcorp-http" \ | awk '{printf $5}' \ | awk -F "," '{printf $3}' \ | awk -F ":" '{printf $2}' \ | awk -F "/" '{printf $1}') echo ${_HTTPServicePort} curl --user geladm:lnxsas https://gelcorp.$(hostname -f):${_HTTPServicePort}/cas/nodeNames
[ "controller.sas-cas-server-shared-gelcorp.gelcorp.svc.cluster.local", "worker-1.sas-cas-server-shared-gelcorp.gelcorp.svc.cluster.local", "worker-0.sas-cas-server-shared-gelcorp.gelcorp.svc.cluster.local", "worker-3.sas-cas-server-shared-gelcorp.gelcorp.svc.cluster.local", "worker-2.sas-cas-server-shared-gelcorp.gelcorp.svc.cluster.local" ]
Lessons learned
By default in each SAS Viya deployment, the HTTP ingress is enabled for all CAS servers. This HTTP ingress allows access to a CAS server from outside the SAS Viya deployment using the CAS REST API only. It is not possible to access the CAS server directly using the binary protocol from SAS clients.
The SAS Viya administrator can decide to enable two services that allow access to a CAS server from outside the SAS Viya deployment: the binary and HTTP services. These services are Kubernetes NodePort services by default, but they can be configured to use a LoadBalancer (refer to the documentation for those settings).
The binary and HTTP services can be enabled individually or simultaneously.
Enabling the binary and HTTP services requires regenerating the SASDeployment custom resource (or the site.yaml file), applying it, and then restarting the CAS servers.
SAS Viya Administration Operations
Lesson 07, Section 6 Exercise: Remove a CAS Server
Remove a CAS Server (non cas-shared-default) - OPTIONAL
In this exercise you will remove the cas-shared-gelcorp server from your Viya deployment.
Table of contents
Set the namespace
gel_setCurrentNamespace gelcorp
Remove the cas-shared-gelcorp server
Delete the cas-shared-gelcorp server
CASDeployment
List current
CASDeployments
kubectl get casdeployments
You should see…
NAME AGE default 10d shared-gelcorp 3h25m
Delete the cas-shared-gelcorp server
CASDeployment
kubectl delete casdeployments \ shared-gelcorp
You should see…
casdeployment.viya.sas.com "shared-gelcorp" deleted
Validate that the cas-shared-gelcorp server
CASDeployment
was deletedkubectl get casdeployments
You should see…
NAME AGE default 10d
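You can also confirm that the CAS operator has torn down the cas-shared-gelcorp pods; once cleanup finishes, the command below should report that no resources were found.
kubectl get pods \
  --selector="casoperator.sas.com/server==shared-gelcorp"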
If you do not plan to reuse the shared-gelcorp server, you can delete the cas-shared-gelcorp server manifests
rm -rf ~/project/deploy/${current_namespace}/site-config/cas-shared-gelcorp ls -al ~/project/deploy/${current_namespace}/site-config/*shared-gelcorp*
Using your favorite text editor, remove (or comment out) the cas-shared-gelcorp server manifest references from the kustomization.yaml file inside the current project directory.
Backup the current
kustomization.yaml
file.cp -p ~/project/deploy/${current_namespace}/kustomization.yaml /tmp/${current_namespace}/kustomization_07-061-01.yaml
Remove (or comment out) these references from the kustomization.yaml file:
- In the resources section: - site-config/cas-shared-gelcorp
- In the transformers section: - site-config/cas-manage-topology-shared-gelcorp.yaml
[[ $(grep -c "site-config/cas-shared-gelcorp" ~/project/deploy/${current_namespace}/kustomization.yaml) == 1 ]] && \ _itemIndex=$(yq4 eval '.resources.[] | select(. == "site-config/cas-shared-gelcorp") | path | .[1]' ~/project/deploy/${current_namespace}/kustomization.yaml) && \ yq4 eval -i 'del.resources.['${_itemIndex}']' ~/project/deploy/${current_namespace}/kustomization.yaml
Check that the update was applied
yq4 eval '.resources.[] | select(. == "*shared-gelcorp*")' ~/project/deploy/${current_namespace}/kustomization.yaml
You should see no references to these CAS Server manifests in the
kustomization.yaml
file in the current project directory.
Keep a copy of the current
manifest.yaml
file.cp -p /tmp/${current_namespace}/deploy_work/deploy/manifest.yaml /tmp/${current_namespace}/manifest_07-061-01.yaml
Run the
sas-orchestration
deploy command.cd ~/project/deploy rm -rf /tmp/${current_namespace}/deploy_work/* source ~/project/deploy/.${current_namespace}_vars docker run --rm \ -v ${PWD}/license:/license \ -v ${PWD}/${current_namespace}:/${current_namespace} \ -v ${HOME}/.kube/config_portable:/kube/config \ -v /tmp/${current_namespace}/deploy_work:/work \ -e KUBECONFIG=/kube/config \ --user $(id -u):$(id -g) \ \ sas-orchestration \ deploy --namespace ${current_namespace} \ --deployment-data /license/SASViyaV4_${_order}_certs.zip \ --license /license/SASViyaV4_${_order}_license.jwt \ --user-content /${current_namespace} \ --cadence-name ${_cadenceName} \ --cadence-version ${_cadenceVersion} \ --image-registry ${_viyaMirrorReg}
When the deploy command completes successfully the final message should say The deploy command completed successfully as shown in the log snippet below.
The deploy command started [...] The deploy command completed successfully
If the sas-orchestration deploy command fails, check out the steps in 99_Additional_Topics/03_Troubleshoot_SAS_Orchestration_Deploy to help you troubleshoot any problems.
Look at the existing
CASDeployment
custom resourceskubectl get casdeployment
You should see…
NAME AGE default 3h8m
You can now look at the status of all CAS server pods by running this command.
kubectl get pods \ --selector="app.kubernetes.io/managed-by==sas-cas-operator" \ -o wide
You should see something like…
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES sas-cas-server-default-controller 3/3 Running 0 3d13h 10.42.2.74 intnode03 <none> <none>
Only the cas-shared-default server is now started and ready to be used; there is no longer a cas-shared-gelcorp server.
Lessons learned
It is easy to remove a CAS Server from your Viya deployment.
Deleting the CAS server's CASDeployment is a proven practice that guarantees the CAS server becomes inactive immediately (otherwise, the CAS server stays active until the new SASDeployment custom resource is fully applied).
Removing the CAS Server manifests from the
kustomization.yaml
file is mandatory.
Deleting the CAS server manifests from the site-config directory is not mandatory. These manifests can be preserved, specifically if you plan to re-onboard the CAS server in the near future.
Running the sas-orchestration deploy command is mandatory.
Never remove the cas-shared-default server since it is required for the Viya deployment to be fully functional.
SAS Viya Administration Operations
Lesson 08, Section 0 Exercise: Start and Stop SAS Viya
Starting and Stopping SAS Viya
In this exercise we will walk through the process for stopping and automatically restarting individual Viya pods, as well as completely stopping and then starting the entire Viya deployment. We will also look at the commands that can be run to monitor the progress of the shutdown and startup sequences and validate readiness of the deployment.
Table of contents
Set the namespace, the sas-viya CLI profile, and authenticate
Set the current namespace and log on.
gel_setCurrentNamespace gelcorp /opt/pyviyatools/loginviauthinfo.py
Restart a pod
In this step, we will try restarting one individual pod, but the same process can be used to restart a set of pods or all pods in the namespace.
View the list of pods in your namespace.
kubectl get pods
Delete a single pod. For example, kill the sas-transfer pod.
kubectl delete pod -l app=sas-transfer
Try viewing the list of pods again, this time filtering using the label for the sas-transfer pod only.
kubectl get pods -l app=sas-transfer
NAME READY STATUS RESTARTS AGE sas-transfer-59bc4c9966-mwnvm 1/1 Running 0 77s
Note that the transfer pod appears to be running, but the AGE column has been updated. The pod is newer than the other pods because it was automatically restarted when we deleted it.
Remember that Kubernetes uses ReplicaSets to maintain a stable set of running pods. If a pod dies (or is killed), a new, identical replica is automatically started to achieve the desired declared state.
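If you are curious which ReplicaSet recreated the pod, one way to see its owner (assuming the standard ownerReferences metadata on the pod) is:
kubectl get pods -l app=sas-transfer \
  -o jsonpath='{.items[0].metadata.ownerReferences[0].kind}/{.items[0].metadata.ownerReferences[0].name}{"\n"}'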
NOTE: If intending to restart all pods in the namespace, remember to delete the pods and not the namespace. Deleting the namespace will delete all resources, including pods, secrets, services, deployments, and replicasets, and nothing will be automatically restarted. The namespace will need to be redeployed by building and applying the manifest file.
To restart all pods (do not run in the workshop environment), run:
kubectl delete pods --all
Stop the entire environment
It may sometimes be necessary to completely stop the SAS Viya deployment (all pods, jobs, etc.) without automatically restarting. For example, it may be necessary to stop in order to perform maintenance on the cluster nodes.
A Kubernetes cronjob is provided to perform start and stop operations whilst observing dependencies.
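You can confirm that the lifecycle cronjobs exist in your namespace before creating the ad-hoc jobs; the names below match the ones used in the following steps.
kubectl get cronjobs | grep -E "sas-stop-all|sas-start-all"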
Create a job to run an ad-hoc stop operation using the included
sas-stop-all
cronjob.kubectl create job sas-stop-all-`date +%s` --from cronjobs/sas-stop-all -n gelcorp
Expected output:
job.batch/sas-stop-all-1637717983 created
Follow the pod log to check the status of the stop operation
kubectl logs --follow \ $(kubectl get po --no-headers -l "job-name=$(kubectl get job |grep sas-stop-all |awk '{print $1}')" | awk '{print $1'}) | gel_log
Note that the log output indicates that the stop operation performs tasks such as stopping the operators, suspending jobs, and scaling deployments to zero replicas.
The stop operation is finished when the log displays the message ‘The lifecycle run command completed successfully’.
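Once the operation reports completion, one quick way to confirm the scale-down is to list the desired replica counts; most deployments should now show 0 (a sketch using standard kubectl custom columns):
kubectl get deployments \
  -o custom-columns=NAME:.metadata.name,DESIRED:.spec.replicas \
  | head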
View the pods that are still running after the stop operation has completed.
kubectl get pods
Which components are still running?
The remaining pods are pods from previously executed jobs (note that 0/1 containers are ready for these pods, and their status is Completed). The exception is the Prometheus Pushgateway, a monitoring component; it continues to run in the namespace and is not included in the stop or start lifecycle operations. If any Compute sessions were running (for example, if users were logged in to SAS Studio), the pods for those sessions will also be running. Log out of those sessions (or delete the pods) to terminate them.
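To narrow the view to pods that are actually still running, you can filter on the pod phase (a standard kubectl field selector):
kubectl get pods --field-selector=status.phase=Running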
Start the environment
Create a job to immediately run an ad-hoc start operation using the included
sas-start-all
cronjob.kubectl create job sas-start-all-`date +%s` --from cronjobs/sas-start-all -n gelcorp
Expected output:
job.batch/sas-start-all-1637719104
Follow the pod log to monitor the status of the start operation.
kubectl logs --follow \ $(kubectl get po --no-headers -l "job-name=$(kubectl get job |grep sas-start-all |awk '{print $1}')" | awk '{print $1'}) | gel_log
Note that the log output indicates that the start operation performs tasks such as starting the operators, resuming jobs, and scaling deployments back up.
Note: Messages like the following may appear in the log:
JSON path '{.spec.replicas}' in resource 'apps/v1' 'Deployment' 'sas-data-profiles': replicas is not found
This indicates the start operation is attempting (but failing) to find the previous replica count from the deployment spec. In this case, the pod is started with a default of 1 replica. These messages can be safely ignored.
The start operation is finished when the log displays the message:
The lifecycle run command completed successfully
.Although the start operation has finished executing, pods are still starting in the background. View the status of the pods in the Viya namespace.
kubectl get pods
Note that many pods are still starting (with 0/1 containers ready). Continue to the next step to monitor the startup process.
Monitoring startup progress
There are several ways to monitor the startup process. This is one example, but you can choose from many applications or operating system tools to view the progress of startup and shut down procedures.
Try monitoring the starting pods by using sas-readiness with tmux. First, open a new MobaXterm tab and run the following to define a tmux session to watch the pods.
SessName=gelcorp_watch NS=gelcorp tmux new -s $SessName -d tmux send-keys \ -t $SessName "watch 'kubectl get pods -o wide -n ${NS} | grep 0/ | grep -v Completed ' " C-m tmux split-window -v -t $SessName tmux send-keys \ -t $SessName "watch -n 5 \"kubectl -n ${NS} logs -l app=sas-readiness |gel_log | tail -n 1 \"" C-m tmux split-window -v -t $SessName tmux send-keys \ -t $SessName "kubectl wait -n ${NS} --for=condition=ready pod -l app=sas-readiness --timeout=2700s" C-m
Attach to the tmux session.
tmux a -t ${SessName}
Watch the pods, ensuring they disappear from the top pane as they return to Running state. The centre and bottom panes show the output of the readiness checks. Ensure they are completed to indicate the deployment is ready for use again.
When all pods are started (after approximately 15 minutes), detach from the tmux session by pressing Ctrl + b, then d.
Execute the gel_ReadyViya4 function as a final validation test. This function has been created to provide an easy way to query readiness and stability of the Viya deployment.
gel_ReadyViya4 -n gelcorp -r 60 -rs 10
In the command above, the
-r
flag specifies the number of minutes to wait for the first ready message, and the-rs
flag defines the sensitivity (deployment is considered ‘ready’ even if there are this number of non-responsive endpoints).In output, look for the following messages:
NOTE: All checks passed. Marking as ready. The first recorded failure was 16m3s ago. NOTE: Readiness detected based on parameters used.
Note: In a customer environment, you can use either of the above approaches to monitor and check readiness of your deployment.
SAS Viya Administration Operations
Lesson 08, Section 1 Exercise: Apply a Patch Update
Applying Patch Updates
The Deployment Operator can be used to update your Viya deployment automatically. In addition to updating the Viya software version, it can also be used to update licenses, add or remove products, and switch cadences. In this exercise, you will check for and apply any patches that may be available for the deployed version of SAS Viya using the Deployment Operator.
Table of contents
Prerequisite steps
The output of the command that queries the sas-deployment ConfigMap also displays information about the cadence name, version and release.
Run the command to check the release number of the SAS software running in your environment:
kubectl -n gelcorp get cm -o yaml | grep 'CADENCE' | head -8
Expected output should be similar to this except for the SAS_CADENCE_RELEASE:
SAS_BASE_CADENCE_NAME: stable SAS_BASE_CADENCE_VERSION: "2024.03" SAS_CADENCE_DISPLAY_NAME: Long-Term Support 2024.03 SAS_CADENCE_DISPLAY_SHORT_NAME: Long-Term Support SAS_CADENCE_DISPLAY_VERSION: "2024.03" SAS_CADENCE_NAME: lts SAS_CADENCE_RELEASE: "20240930.1727729121985" SAS_CADENCE_VERSION: "2024.03"
As shown in the output, this version of LTS 2024.03 is based on the Stable 2024.03 version, indicated by
SAS_BASE_CADENCE_NAME
andSAS_BASE_CADENCE_VERSION
.Also note the value of
SAS_CADENCE_RELEASE
, which indicates the specific release number (the most granular level). This is the ‘patch level’ that has been applied.Next, check if there are any patches available for the deployed version using the Update Checker.
The Update Checker is a job that runs on a schedule (as a Kubernetes cronjob), but it can be run on-demand as an ad-hoc job.
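If you are curious about when the scheduled run happens, you can read the schedule from the cronjob itself; the cronjob name matches the one used in the next command.
kubectl get cronjob sas-update-checker -o jsonpath='{.spec.schedule}{"\n"}'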
Create the ad-hoc job from the Kubernetes cronjob.
kubectl create job --from=cronjob/sas-update-checker update-checker-manual
You should see:
job.batch/update-checker-manual created
View the job’s pod log to see the Update Checker report output:
kubectl logs -f $(kubectl get pods | grep update-checker-manual | awk '{print $1}' ) | gel_log
Note: The output of the report provides an alternative way to check the cadence, version and release of SAS software you have deployed.
The output may indicate that patches are available, but there may not be any, depending on the specific release of SAS Viya you are running and whether any newer releases are available for your deployed version at the time you run the Update Checker.
If patches are available, they are indicated by a line in the report output indicating availability of a new version (a similar line will indicate whether an update is available for your deployed cadence):
New release available for deployed version Long-Term Support 2024.03: Long-Term Support 2024.03 20240930.1727729121985.
Additional detail is also displayed for releases available for individual products.
If there are not any new patches available, the following will be displayed in the output:
No new release available for deployed version Long-Term Support 2024.03.
If there are no patches available for your deployment, you may skip this exercise and only return to complete the subsequent tasks after the update checker job shows that patches are available. You will need to delete the
update-checker-manual
job and recreate/re-run it. Depending on when patches ship, there may be a delay of several days before patches are available for your environment.
Delete the update-checker job and its associated pod.
kubectl delete job update-checker-manual
The recommended practice is to download the latest Deployment Assets for the target release and review the enclosed README.md files for any product-specific pre-update tasks that may apply. While there are unlikely to be many (if any) manual steps when applying patches, updated software introduced in a patch may require manual steps to be performed (e.g. updates to kustomization.yaml).
In this exercise, in order to demonstrate the patch update process in a simple way, you will use the existing Deployment Assets that were used for the initial deployment of the environment.
Click here to view the typical process for downloading new assets (not required for this exercise)
- Backup the existing $deploy directory.
- Download and copy new deployment assets to the $deploy directory
- Delete the
sas-bases
directory withrm -rf sas-bases
- Extract the new assets
- Review README files in
sas-bases
for applicable product-specific configuration changes that need to be performed
Applying the latest patch release
Updating your software (to a new release/patch or version) with the
Deployment Operator requires that you specify in the SASDeployment CR
file the release that you would like to apply. You can also apply the
latest available patch release (instead of a specific release)
by simply inserting a blank release number for the
cadenceRelease
parameter in the CR file as per the below
instructions.
Deploy
Perform the update by running the sas-orchestration deploy task. Note the value of the
--cadence-release
parameter; in a customer environment, the value for this property should be set to the release number matching your deployment assets (which can be obtained from$deploy/sas-bases/.orchestration/cadence.yaml
). For this exercise, we have set it to a value of""
, which will result in the latest available release being applied.cd ~/project/deploy rm -rf /tmp/${current_namespace}/deploy_work/* source ~/project/deploy/.${current_namespace}_vars docker run --rm \ -v ${PWD}/license:/license \ -v ${PWD}/${current_namespace}:/${current_namespace} \ -v ${HOME}/.kube/config_portable:/kube/config \ -v /tmp/${current_namespace}/deploy_work:/work \ -e KUBECONFIG=/kube/config \ --user $(id -u):$(id -g) \ \ sas-orchestration \ deploy --namespace ${current_namespace} \ --deployment-data /license/SASViyaV4_${_order}_certs.zip \ --license /license/SASViyaV4_${_order}_license.jwt \ --user-content /${current_namespace} \ --cadence-name ${_cadenceName} \ --cadence-version ${_cadenceVersion} \ --cadence-release "" \ --image-registry ${_viyaMirrorReg}
The deployment may take around 10 minutes to complete. When it is done, you will see the following in the terminal:
Applying manifests complete The deploy command completed successfully
Validation
Verify the release number of SAS Viya software you are now running by submitting the following command:
kubectl -n gelcorp get cm -o yaml | grep 'CADENCE' | head -8
The output should indicate that you are now running a later
SAS_CADENCE_RELEASE
of Viya software than you were running when you executed this command at the beginning of the exercise.SAS_BASE_CADENCE_NAME: stable SAS_BASE_CADENCE_VERSION: "2024.03" SAS_CADENCE_DISPLAY_NAME: Long-Term Support 2024.03 SAS_CADENCE_DISPLAY_SHORT_NAME: Long-Term Support SAS_CADENCE_DISPLAY_VERSION: "2024.03" SAS_CADENCE_NAME: lts SAS_CADENCE_RELEASE: "20240930.1727729121985" SAS_CADENCE_VERSION: "2024.03"
Note: Another way to verify the release of software you are now running is to re-run an ad-hoc Update Checker job.
Execute the gel_ReadyViya4 function as an additional check of readiness and stability after the update.
gel_ReadyViya4 -n gelcorp -r 60 -rs 10
In the output, look for the following messages:
NOTE: All checks passed. Marking as ready. The first recorded failure was 3m8s ago. NOTE: Readiness detected based on parameters used.
Cleanup
IMPORTANT: This step must be run to avoid issues with later exercises.
Execute the following command as
cloud-user
in your MobaXterm terminal session:~/PSGEL260-sas-viya-4.0.1-administration/scripts/gel_tools/gel_getSaveSASViyaDeploymentCadence.sh
SAS Viya Administration Operations
Lesson 08, Section 2 Exercise: Update to a New Version
Updating SAS Software
In this task, you will change the cadence of your Viya deployment
from LTS to Stable using the sas-orchestration deploy
task.
Table of contents
Switch Cadence
You can view the deployed version information by running:
kubectl get cm -o yaml | grep CADENCE | head -8
You should see output similar to this:
SAS_BASE_CADENCE_NAME: stable SAS_BASE_CADENCE_VERSION: "2024.03" SAS_CADENCE_DISPLAY_NAME: Long-Term Support 2024.03 SAS_CADENCE_DISPLAY_SHORT_NAME: Long-Term Support SAS_CADENCE_DISPLAY_VERSION: "2024.03" SAS_CADENCE_NAME: lts SAS_CADENCE_RELEASE: "20240930.1727729121985" SAS_CADENCE_VERSION: "2024.03"
As shown in the output, you are running LTS version 2024.03 as shown in the
SAS_CADENCE_DISPLAY_NAME
field. Also note that this version is based on the Stable 2024.03 version, indicated bySAS_BASE_CADENCE_NAME
andSAS_BASE_CADENCE_VERSION
There are some restrictions to consider when switching cadence. When moving from LTS to Stable, the target Stable version must be the same as or newer than the source LTS version.
From the source Long-Term Support 2024.03 version deployed in the workshop environment, you can perform a single update to any Stable version at the same level or newer. In the following task, the cadence will be switched to Stable 2024.06.
Retrieve new deployment assets. Prior to an update, deployment assets for the target version must be downloaded and the enclosed README.md files should be reviewed for any product-specific pre-upgrade tasks (for example, tasks that require updates to the kustomization.yaml file).
Typically, new assets for the target version must be downloaded from my.sas.com or using the viya4-orders-cli. In the workshop environment, the new assets have already been downloaded.
Copy the new assets into your $deploy directory.
Backup the existing $deploy directory.
cp -pr ~/project/deploy/gelcorp ~/project/deploy/gelcorp_07-031
Copy the new deployment assets to the $deploy directory.
cp -p /mnt/workshop_files/workshop_content/updating-data/SASViyaV4_9CV11D_stable_2024.03_*_deploymentAssets_*.tgz ~/project/deploy/gelcorp
Delete
sas-bases
, and extract the new assets.
cd ~/project/deploy/gelcorp
rm -rf sas-bases
tar xfv ~/project/deploy/gelcorp/SASViyaV4_9CV11D_stable_2024.03*
At this point in the process, any relevant pre-upgrade tasks would typically be performed. These may include tasks such as:
- Downloading the correct version of the sas-orchestration image or updating the SAS Deployment Operator
- Product-specific configuration tasks as outlined in README files in the $deploy directory
- Changes to configure your environment to use a mirror registry
- Regenerating CAS Servers (for multi-tenant environments and deployments with multiple CAS servers)
- Pausing SingleStore (if deployed)
- Before Deployment steps documented in the Deployment Notes for your target version, as well as all applicable interim versions, excluding those that are marked as not applicable to deployments that were deployed with sas-orchestration (these will be automatically performed by the sas-orchestration deploy task).
IMPORTANT: Note that the version of deployment assets used does not impact the version of the software that is downloaded as part of the update. Software updates are downloaded directly from the image registry for the cadence version specified in the deploy command, regardless of the version of deployment assets being used. However, product-specific configuration changes can be specific to a particular version of Viya (e.g. changes to TLS that were delivered in 2020.1.3 required additions/deletions from kustomization.yaml). As such, it is recommended to download the latest deployment assets and to carefully review the README files for configuration tasks and manual steps that may be applicable.
For this particular upgrade of the workshop environment from LTS 2024.03 to Stable 2024.06, no pre-upgrade tasks are necessary (no relevant tasks in the Deployment Notes for Stable 2024.06).
Deploy
Before you begin the deployment, you must copy the value that is shown for
release
in the$deploy/sas-bases/.orchestration/cadence.yaml
file from the deployment assets. The following command will store the value in a variable you can refer to in the next step.cadenceReleaseNum=$(yq r sas-bases/.orchestration/cadence.yaml spec.release)
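Before running the deploy, it is worth confirming that the variable was actually populated; an empty value would make the deploy pick up the latest available release instead of the one matching your assets, as described in the previous exercise.
echo "Target cadence release: ${cadenceReleaseNum}"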
Run the sas-orchestration deploy command to start the upgrade, ensuring you insert the appropriate values for the
--cadence-name
and--cadence-version
parameters. Remember to also insert the target release number (matching your deployment assets) as the value for the--cadence-release
flag.cd ~/project/deploy rm -rf /tmp/${current_namespace}/deploy_work/* source ~/project/deploy/.${current_namespace}_vars docker run --rm \ -v ${PWD}/license:/license \ -v ${PWD}/${current_namespace}:/${current_namespace} \ -v ${HOME}/.kube/config_portable:/kube/config \ -v /tmp/${current_namespace}/deploy_work:/work \ -e KUBECONFIG=/kube/config \ --user $(id -u):$(id -g) \ \ sas-orchestration \ deploy --namespace ${current_namespace} \ --deployment-data /license/SASViyaV4_${_order}_certs.zip \ --license /license/SASViyaV4_${_order}_license.jwt \ --user-content /${current_namespace} \ --cadence-name "stable" \ --cadence-version "2024.06" \ --cadence-release ${cadenceReleaseNum} \ --image-registry ${_viyaMirrorReg}
When the deployment completes, you will see the following in the terminal:
Applying manifests complete The deploy command completed successfully
WARNING: If you get an error regarding the sas-orchestration tool like the following:
Error: Orchestration version is '1.93.2'; expected orchestration version is '1.97.4-20230503.1683146435603'
you must update the sas-orchestration tool before you can run the sas-orchestration deploy command above successfully.
To update the sas-orchestration tool, you must execute these tasks:
In your environment, go to the SAS Viya Platform deployment assets directory (in this workshop: /home/cloud-user/project/deploy/gelcorp/sas-bases)
Navigate to this directory: examples/kubernetes-tools/
Look at the Prerequisites section of the README.md file
Then run the two docker CLI commands below after extracting the required sas-orchestration tool image version:
Extract the required image version of the sas-orchestration tool:
_requiredVersion=$(grep "docker pull" ~/project/deploy/gelcorp/sas-bases/examples/kubernetes-tools/README.md | awk -F ":" '{print $2}')
Load/Pull the required image.
Note: In the docker commands below, we change the sas-orchestration tool docker image repository, since we use a mirror repository in this workshop.
docker pull crcache-race-sas-cary.unx.sas.com/viya-4-x64_oci_linux_2-docker/sas-orchestration:${_requiredVersion}
Tag the new image. This lets you run the sas-orchestration tool without having to pass the image version every time (the tag acts like an alias).
docker tag crcache-race-sas-cary.unx.sas.com/viya-4-x64_oci_linux_2-docker/sas-orchestration:${_requiredVersion} sas-orchestration
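Optionally, confirm that both the versioned image and the new untagged sas-orchestration alias are present in your local Docker image cache:
docker images | grep sas-orchestration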
The new version of the sas-orchestration tool is now updated and available in your environment. You can re-execute the sas-orchestration deploy command above.
Post-update Tasks
When the update has completed successfully, once again refer to the Deployment Notes and perform any documented After Deployment Commands for the interim and target versions. For this workshop, there are no manual post-update tasks to perform.
Validation
Verify that the cadence has been switched in the output of the previous command. You can also verify the cadence version of SAS Viya you are now running by submitting the following command:
kubectl -n gelcorp get cm -o yaml | grep 'CADENCE' | head -6
The output should indicate that you are now running a later
SAS_CADENCE_VERSION
of Viya software than you were running when you executed this command at the beginning of the exercise.SAS_CADENCE_DISPLAY_NAME: Stable 2024.06 SAS_CADENCE_DISPLAY_SHORT_NAME: Stable SAS_CADENCE_DISPLAY_VERSION: "2024.06" SAS_CADENCE_NAME: stable SAS_CADENCE_RELEASE: "20240612.1702438953990" SAS_CADENCE_VERSION: "2024.06"
Note: Now that you have switched to the Stable cadence, there is no SAS_BASE_CADENCE_NAME field; only LTS versions are based on an earlier Stable cadence version.
Execute the gel_ReadyViya4 function as an additional check of readiness and stability after the update.
gel_ReadyViya4 -n gelcorp -r 60 -rs 10
In the output, look for the following messages:
log NOTE: All checks passed. Marking as ready. The first recorded failure was 14m29s ago. NOTE: Readiness detected based on parameters used.
Another way to verify the release of software you are now running is to re-run an ad-hoc Update Checker job (delete the previously run job if necessary).
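If you prefer to do that from the command line, one approach is sketched below. It assumes the Update Checker CronJob created by the sas-bases update-checker overlay is named sas-update-checker; the job name sas-update-checker-adhoc is arbitrary.
# Remove any previous ad-hoc run, create a new job from the CronJob, wait for it, and review its log
kubectl delete job sas-update-checker-adhoc --ignore-not-found
kubectl create job sas-update-checker-adhoc --from=cronjob/sas-update-checker
kubectl wait --for=condition=complete job/sas-update-checker-adhoc --timeout=10m
kubectl logs job/sas-update-checker-adhoc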
Cleanup
IMPORTANT: This step must be run to avoid issues with later exercises.
Execute the following command as
cloud-user
in your MobaXterm terminal session:~/PSGEL260-sas-viya-4.0.1-administration/scripts/gel_tools/gel_getSaveSASViyaDeploymentCadence.sh
SAS Viya Administration Operations
Lesson 08, Section 3 Exercise: Update a License
Updating Licenses
In this exercise, we will renew the SAS license. Considering SAS Viya’s cadence lifecycle and the fact that deployment assets include a license, the license will always be current on a Stable cadence deployment as long as the deployment complies with the support policy (must be no more than four months/versions old). Licenses only need to be renewed on deployments running LTS cadence that are not updated at least once per year.
Table of contents
Set the namespace, the sas-viya CLI profile, and authenticate
Set current namespace and log on.
gel_setCurrentNamespace gelcorp /opt/pyviyatools/loginviauthinfo.py
Review existing license
First, review the existing license file and product expiry dates.
Get the URL for SAS Environment Manager and then click the link in the terminal window.
gellow_urls | grep "SAS Environment Manager"
Log on as geladm:lnxsas and navigate to the Licenses area and review the list of products and expiration dates.
Apply new license
Renewal licenses are typically available from the customer’s My SAS portal, on the Orders page. For this hands-on activity, the renewal license has already been obtained from the portal and saved as renewal-license.jwt on the sasnode01 server in the cluster.
Copy the renewal license file to the existing license directory (in the deploy directory).
cp /mnt/workshop_files/workshop_content/updating-data/renewal-license.jwt \ /home/cloud-user/project/deploy/license/
Deploy
Run the sas-orchestration deploy command to start the license update, ensuring you pass in the path to the new license for the
--license
parameter.cd ~/project/deploy rm -rf /tmp/${current_namespace}/deploy_work/* source ~/project/deploy/.${current_namespace}_vars docker run --rm \ -v ${PWD}/license:/license \ -v ${PWD}/${current_namespace}:/${current_namespace} \ -v ${HOME}/.kube/config_portable:/kube/config \ -v /tmp/${current_namespace}/deploy_work:/work \ -e KUBECONFIG=/kube/config \ --user $(id -u):$(id -g) \ \ sas-orchestration \ deploy --namespace ${current_namespace} \ --deployment-data /license/SASViyaV4_${_order}_certs.zip \ --license ./license/renewal-license.jwt \ --user-content /${current_namespace} \ --cadence-name ${_cadenceName} \ --cadence-version ${_cadenceVersion} \ --cadence-release "" \ --image-registry ${_viyaMirrorReg}
Validation
Use the sas-viya CLI’s licenses plugin to verify the license has been updated.
sas_viya --output text licenses products list
Verify that the expiry dates have been extended.
log Product Name Product ID Status Max CPU Count Expiration Date (UTC) Grace Period End (UTC) Warning Period End (UTC) Base SAS 0 current No CPU limit 2026-11-11 2026-12-26 2027-02-09 SAS/STAT 1 current No CPU limit 2026-11-11 2026-12-26 2027-02-09 SAS/GRAPH 2 current No CPU limit 2026-11-11 2026-12-26 2027-02-09 Enterprise Miner Server 50 current No CPU limit 2026-11-11 2026-12-26 2027-02-09 SAS/Secure 94 current No CPU limit 2026-11-11 2026-12-26 2027-02-09 Cloud Analytic Services SAS Client 1000 current No CPU limit 2026-11-11 2026-12-26 2027-02-09 ...
SAS Viya Administration Operations
Lesson 08, Section 5 Exercise: Relocate SASWORK
Relocate SAS Programming Run-Time Temporary Files
In this exercise, we will relocate the temporary files created for launched compute, connect, and batch sessions from an emptyDir in the pod to a hostPath on the node at /viyavolume. We will demonstrate the change with a SAS Batch program, but the change also affects SAS Compute and launched SAS Connect sessions. It does not affect spawned SAS Connect sessions.
In this hands-on exercise
- Open a second terminal session
- Submit a batch job which creates a SASWORK dataset
- Find that dataset in SASWORK on a node’s host filesystem
- Create a /viyavolume directory on each node
- Relocate the viya volume in programming run-time pods to use /viyavolume
- Submit the SAS batch job which creates a dataset in SASWORK again
- Find dataset in SASWORK again
- OPTIONAL:
See left-over temporary files and directories
- See /viyavolume directory structure left by the SAS batch jobs we ran
- Exercise for the reader: See /viyavolume directory structure while a SAS Studio session is running, and after it finishes
- Exercise for the reader: See /viyavolume directory structure left behind by a crashed SAS programming run-time session
Open a second terminal session
In MobaXterm, you might have one SSH connection to sasnode01 as cloud-user open already. If not, open one now.
Then, open a second SSH connection to sasnode01 as cloud-user so that you have two open at the same time. We will use both SSH connections in this exercise.
Submit a batch job which creates a SASWORK dataset
In MobaXterm, in your first SSH connection to sasnode01 as cloud-user, run the bash commands below, all at once:
tee /shared/gelcontent/gelcorp/shared/code/create_data_and_sleep.sas > /dev/null << EOF data dummy_data; do i=1 to 11; j=ranuni(1234); output; end; run; /* Keep the session open for five minutes (5 x 60 seconds) */ data _null_; call sleep(5,60); run; EOF ls -al /shared/gelcontent/gelcorp/shared/code/create_data_and_sleep.sas cat /shared/gelcontent/gelcorp/shared/code/create_data_and_sleep.sas
Then paste and run these commands all at once, to submit
create_data_and_sleep.sas
to run as a batch job:_sasprogram=/gelcontent/gelcorp/shared/code/create_data_and_sleep.sas # Run the SAS program as a batch job gel_sas_viya batch jobs submit-pgm --rem-pgm-path ${_sasprogram} --context default --watchoutput --waitnoresults --results-dir /tmp
Note: This program creates a temporary dataset, then sleeps for five minutes.
Watch the output from the command above until you see SAS log output begin to appear, but do not wait for the program to finish running!
Leave this terminal tab open with the program still running.
Find that dataset in SASWORK on a node’s host filesystem
In your second connection to sasnode01, while the SAS program
create_data_and_sleep.sas
is still running in your first connection, run this ansible command:ansible 'sasnode*' -b -m shell -a 'find /var/lib /viyavolume -type f -name "dummy_data.sas7bdat" -print | xargs --no-run-if-empty sudo ls -al'
Example results when create_data_and_sleep.sas is running
Note: In the example results below,
dummy_data.sas7bdat
was on sasnode05. Our workshop cluster is configured so that batch jobs can run on any node.sasnode05 | CHANGED | rc=0 >> -rw-r--r-- 1 geladm sasadmins 131072 Sep 19 13:18 /var/lib/kubelet/pods/63a7c2c3-4c3e-4eff-8d8b-1ace89ac3498/volumes/kubernetes.io~empty-dir/viya/tmp/batch/default/SAS_workC299000001D9_sas-batch-server-8aa7518e-450c-4fce-bb2f-69da17eeba16-23/dummy_data.sas7bdatfind: ‘/viyavolume’: No such file or directory sasnode03 | CHANGED | rc=0 >> find: ‘/viyavolume’: No such file or directory sasnode02 | CHANGED | rc=0 >> find: ‘/viyavolume’: No such file or directory sasnode04 | CHANGED | rc=0 >> find: ‘/viyavolume’: No such file or directory sasnode01 | CHANGED | rc=0 >> find: ‘/viyavolume’: No such file or directory
The dummy_data.sas7bdat file is at a path something like:
/var/lib/kubelet/pods/63a7c2c3-4c3e-4eff-8d8b-1ace89ac3498/volumes/kubernetes.io~empty-dir/viya/tmp/batch/default/SAS_workC299000001D9_sas-batch-server-8aa7518e-450c-4fce-bb2f-69da17eeba16-23/dummy_data.sas7bdat
This path begins with
/var/lib/kubelet/pods/<guid>/volumes/kubernetes.io~empty-dir/viya
, which is where our RKE Kubernetes deployment has created the host path for the pod’s emptyDir named viya. It is under Kubernetes’ control, and we are not really supposed to be doing anything with this directory.
Below that path, the SAS_work directory is at …
/tmp/batch/default/SAS_work<session_id>_<sas-launcher-pod-name>
.
Create a /viyavolume directory on each node
Next we will create a directory on each node in our cluster, which we can mount into launched programming run-time pods as the ‘viya’ volume. This volume is where temporary files are created by processes running inside the SAS programming run-time container within those pods. If we mount our own hostpath volume into the pods, it will be used instead of the default emptyDir.
Note: In our RACE environment we do not have a more performant or larger volume available. So we will just create a directory on the boot disk, at the root of each node’s filesystem, as
/viyavolume
. In a production cluster, you would mount larger or more performant storage to the relevant nodes in your cluster, at whatever path you choose, and that path would be what you specify in the .path property of the ‘viya’ volume definition in the PodTemplate overlay created in the next step, where you see/viyavolume
below.
Run this in MobaXterm, at the sasnode01 shell prompt, to create a directory on each node in our cluster to be the host path for viya volumes in any pods that have one and run on that node:
# Create volumes on each node for programming run-time temporary files ansible 'sasnode*' -b -m 'shell' -a 'mkdir -p /viyavolume/; chmod 1777 /viyavolume/' ansible 'sasnode*' -m 'shell' -a 'echo "List /viyavolume/ directory content"; ls -al /viyavolume/'
Relocate the viya volume in programming run-time pods to use /viyavolume
Run this to create a patchTransformer manifest that deletes the default emptyDir viya volume from the launched SAS Programming Run-time pod templates and adds a hostPath viya volume pointing to the new directory path.
# Relocate viya volume - sas-launcher-jobs tee ~/project/deploy/${current_namespace}/site-config/change-viya-volume-storage-class.yaml > /dev/null <<EOF apiVersion: builtin kind: PatchTransformer metadata: name: delete-viya-volume patch: |- apiVersion: v1 kind: PodTemplate metadata: name: change-viya-volume-storage-class template: spec: volumes: - \$patch: delete name: viya target: kind: PodTemplate labelSelector: "sas.com/template-intent=sas-launcher" --- apiVersion: builtin kind: PatchTransformer metadata: name: add-viya-volume patch: |- - op: add path: /template/spec/volumes/- value: name: viya hostPath: path: /viyavolume target: kind: PodTemplate labelSelector: "sas.com/template-intent=sas-launcher" EOF
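Optionally, before wiring the new file into kustomization.yaml, you can list the deployed PodTemplates that carry the label the PatchTransformer targets; these are the launched programming run-time templates that will pick up the new viya volume:
kubectl get podtemplates -l "sas.com/template-intent=sas-launcher"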
Make a copy of your
kustomization.yaml
file, so we can see the effect of adding a line to it in the next step.cp -p ~/project/deploy/gelcorp/kustomization.yaml ~/project/deploy/gelcorp/kustomization-08-071-01.yaml
Update your kustomization.yaml to reference this PatchTransformer:
Modify
~/project/deploy/gelcorp/kustomization.yaml
to referencesite-config/change-viya-volume-storage-class.yaml
. The change-viya-volume-storage-class.yaml needs to be referenced before sas-bases/overlays/required/transformers.yaml:[[ $(grep -c "site-config/change-viya-volume-storage-class.yaml" ~/project/deploy/${current_namespace}/kustomization.yaml) == 0 ]] && \ sed -i '/sas-bases\/overlays\/required\/transformers.yaml/i \ \ \- site-config\/change-viya-volume-storage-class.yaml' ~/project/deploy/${current_namespace}/kustomization.yaml
Alternatively, you could have manually edited the transformers section to add the reference as shown below.
transformers: ... - site-config/change-viya-volume-storage-class.yaml - sas-bases/overlays/required/transformers.yaml ...
Run the following command to view the change this sed command made to your
kustomization.yaml
. The changes are in green in the right column.icdiff -W ~/project/deploy/gelcorp/kustomization-08-071-01.yaml ~/project/deploy/gelcorp/kustomization.yaml
You should see
change-viya-volume-storage-class.yaml
in green on the right, beforetransformers.yaml
:Delete the
kustomization-08-071-01.yaml
file, so that it is not inadvertently included in gelcorp-sasdeployment.yaml in the next step.rm ~/project/deploy/gelcorp/kustomization-08-071-01.yaml
Build and Apply using SAS-Orchestration Deploy
Keep a copy of the current
manifest.yaml
file.cp -p /tmp/${current_namespace}/deploy_work/deploy/manifest.yaml /tmp/${current_namespace}/manifest_08-071-01.yaml
Run the
sas-orchestration
deploy command.cd ~/project/deploy rm -rf /tmp/${current_namespace}/deploy_work/* source ~/project/deploy/.${current_namespace}_vars docker run --rm \ -v ${PWD}/license:/license \ -v ${PWD}/${current_namespace}:/${current_namespace} \ -v ${HOME}/.kube/config_portable:/kube/config \ -v /tmp/${current_namespace}/deploy_work:/work \ -e KUBECONFIG=/kube/config \ --user $(id -u):$(id -g) \ \ sas-orchestration \ deploy --namespace ${current_namespace} \ --deployment-data /license/SASViyaV4_${_order}_certs.zip \ --license /license/SASViyaV4_${_order}_license.jwt \ --user-content /${current_namespace} \ --cadence-name ${_cadenceName} \ --cadence-version ${_cadenceVersion} \ --image-registry ${_viyaMirrorReg}
When the deploy command completes successfully, the final message should say ‘The deploy command completed successfully’ as shown in the log snippet below.
The deploy command started [...] The deploy command completed successfully
If the sas-orchestration deploy command fails, check out the steps in 99_Additional_Topics/03_Troubleshoot_SAS_Orchestration_Deploy to help you troubleshoot any problems.
Submit the SAS batch job which creates a dataset in SASWORK again
In your first SSH connection to sasnode01 in MobaXterm, submit
create_data_and_sleep.sas
as a batch job:_sasprogram=/gelcontent/gelcorp/shared/code/create_data_and_sleep.sas # Run the SAS program as a batch job gel_sas_viya batch jobs submit-pgm --rem-pgm-path ${_sasprogram} --context default --watchoutput --waitnoresults --results-dir /tmp
Watch the output from the command above until you see SAS log output begin to appear, but do not wait for the program to finish running!
Leave this terminal tab open with the program still running.
Find dataset in SASWORK again
In your second connection to sasnode01, while the SAS program
create_data_and_sleep.sas
is still running in your first connection, run this ansible command:ansible 'sasnode*' -b -m shell -a 'find /var/lib /viyavolume -type f -name "dummy_data.sas7bdat" -print | xargs --no-run-if-empty sudo ls -al'
Example results when create_data_and_sleep.sas is running
Note: In the example results below,
dummy_data.sas7bdat
was on sasnode05 again. Our workshop cluster is configured so that batch jobs can run on any node.sasnode05 | CHANGED | rc=0 >> -rw-r--r-- 1 geladm sasadmins 131072 Sep 23 07:39 /viyavolume/tmp/batch/default/SAS_workD3C2000001DB_sas-batch-server-6a512e8a-be11-4739-9cb7-a5bb821d97b6-27/dummy_data.sas7bdat sasnode03 | CHANGED | rc=0 >> sasnode04 | CHANGED | rc=0 >> sasnode02 | CHANGED | rc=0 >> sasnode01 | CHANGED | rc=0 >>
The dummy_data.sas7bdat file is at a path something like:
/viyavolume/tmp/batch/default/SAS_workD3C2000001DB_sas-batch-server-6a512e8a-be11-4739-9cb7-a5bb821d97b6-27/dummy_data.sas7bdat
This path begins with
/viyavolume
, which is the host path we asked the SAS programming run-time pods to use for the temporary volume named viya. It is under our control.
Below that path, the SAS_work directory is at …
/tmp/batch/default/SAS_work<session_id>_<sas-launcher-pod-name>
, as it was before.
OPTIONAL: See left-over temporary files and directories
When the SAS Programming Run-time finishes and exits normally (i.e. not a crash), it does a reasonable job of deleting any temporary files left behind by the job, but it does leave some of the directory structure behind.
- Wait for the batch job to finish running.
See /viyavolume directory structure left by the SAS batch jobs we ran
Run this command in either of your SSH connections to sasnode01. It shows directory structure and files (if there are any) left over after recent batch jobs and other launched programming run-time sessions have run and ended:
ansible 'sasnode*' -b -m shell -a 'tree /viyavolume/'
Expected results:
sasnode05 | CHANGED | rc=0 >> /viyavolume/ ├── log │ ├── batch │ │ └── default │ ├── compsrv │ │ └── default │ └── connectserver │ └── default ├── run │ ├── batch │ │ └── default │ │ └── uid4000 │ ├── compsrv │ │ └── default │ └── connectserver │ └── default ├── spool │ ├── batch │ │ └── default │ ├── compsrv │ │ └── default │ └── connectserver │ └── default └── tmp ├── batch │ └── default ├── compsrv │ └── default └── connectserver └── default 29 directories, 0 files sasnode03 | CHANGED | rc=0 >> /viyavolume/ 0 directories, 0 files sasnode02 | CHANGED | rc=0 >> /viyavolume/ 0 directories, 0 files sasnode04 | CHANGED | rc=0 >> /viyavolume/ 0 directories, 0 files sasnode01 | CHANGED | rc=0 >> /viyavolume/ 0 directories, 0 files
As you can see in the results (even if they do not exactly match the example above), a launched SAS programming run-time pod creates a whole directory structure for its temporary files, whether it will use all of them or not:
At the top level of the viya volume in the pods, it creates subdirectories called log, run, spool, and tmp.
Below each of these is a batch, a compsrv, and a connectserver subdirectory.
Below each of those is a default directory.
Note: I believe the default directories are named for the compute/connect/batch context; we used the default batch context to run our program.
Most of those directories are empty, except /viyavolume/run/batch/default which contains an extra subdirectory, named uid4000. We ran our batch job as user geladm, whose POSIX uid in this environment is 4000.
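Since the node-level listings earlier showed files owned by geladm, the cluster nodes can resolve that account. If you want to confirm the uid mapping yourself, a quick check from sasnode01 is:
ansible 'sasnode*' -m shell -a 'id geladm'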
Exercise for the reader: See /viyavolume directory structure while a SAS Studio session is running, and after it finishes
Start a SAS Studio session, and then re-run the ansible command above to see what files are present in the /viyavolume directory on the pod’s host node while a compute session is running.
Question: Are there files present in the temporary directory that you were not expecting to see?
Sign out of SAS Studio. Then re-run the same ansible command again, to see what is left behind after a compute session.
Exercise for the reader: See /viyavolume directory structure left behind by a crashed SAS programming run-time session
By now you know everything you need to try deliberately crashing a compute or batch session in a SAS Compute or SAS Batch pod, and then to find out what files it leaves in the /viyavolume directory on the pod’s host node.
Hint:
%macro oops; %abort abend; %mend; %oops;
SAS Viya Administration Operations
Lesson 08, Section 5 Exercise: Relocate CAS_DISK_CACHE
Configure a new location for CAS_DISK_CACHE
In this exercise you will reconfigure the cas-shared-gelcorp server to relocate its CAS_DISK_CACHE from the default emptyDir location to a hostPath volume location that you will create on each Kubernetes node that you expect to host CAS server pods.
Table of contents
- Set the namespace
- Identify the current location of CAS_DISK_CACHE
- Reconfigure CAS to relocate CAS_DISK_CACHE
- Lesson learned
Set the namespace
gel_setCurrentNamespace gelcorp
Identify the current location of CAS_DISK_CACHE
Before you make any changes let’s examine the CAS server configuration to verify where CAS_DISK_CACHE is currently located.
Open SAS Environment Manager, log in as
geladm
, and assume theSASAdministrators
membership.gellow_urls | grep "SAS Environment Manager"
Navigate to the
Servers
page.Right-click on
cas-shared-gelcorp
and select theConfiguration
option.Navigate to the Nodes tab
Choose the
Controller
node, right-click on it, and select theRuntime Environment
option.Scroll through the Environment Variable table until you find the
CAS_DISK_CACHE
variable. You should see this:
Let’s try using kubectl to see if we can obtain the same information. Normally we would expect to find this information in a
CASENV_CAS_DISK_CACHE
environment variable on the CAS controller, so let’s use kubectl to display and filter candidate variable values. Notice that we have specified in the command that we want to execute the command in the CAS server container of the sas-cas-server-shared-gelcorp-controller pod._CASControllerPodName=$(kubectl get pod \ --selector "casoperator.sas.com/server==shared-gelcorp,casoperator.sas.com/node-type==controller,casoperator.sas.com/controller-index==0" \ --no-headers \ | awk '{printf $1}') echo ${_CASControllerPodName} kubectl exec -it ${_CASControllerPodName} \ -c sas-cas-server \ -- env \ | grep "CAS" \ | grep -v "SAS_"
You should see something like…
CASCFG_DQSETUPLOC=QKB CI 33 CASCFG_HOSTKNOWNBY=controller.sas-cas-server-shared-gelcorp.gelcorp CASENV_CAS_VIRTUAL_HOST=controller.sas-cas-server-shared-gelcorp.gelcorp CASKEY=ce087d3dbd1a5a39abe248a8e9b6a36d488e101550e016531b05b49b46c7687c CASCFG_DQLOCALE=ENUSA CASCFG_INITIALBACKUPS=1 CAS_POD_NAME=sas-cas-server-shared-gelcorp-controller CASCFG_MODE=mpp CASCONTROLLERHOST=controller.sas-cas-server-shared-gelcorp.gelcorp CASCFG_INITIALWORKERS=2 CASBACKUPHOST=backup.sas-cas-server-shared-gelcorp.gelcorp CASCLOUDNATIVE=1 CASENV_CAS_VIRTUAL_PATH=/cas-shared-gelcorp-http CAS_CLIENT_SSL_CA_LIST=/security/trustedcerts.pem CASENV_CONSUL_NAME=cas-shared-gelcorp CASENV_CAS_K8S_SERVICE_NAME=sas-cas-server-shared-gelcorp-client CASENV_CASDEPLOYMENT_SPEC_ALLOWLIST_APPEND=/cas/data/caslibs:/gelcontent:/mnt/gelcontent/ CASENV_CASDATADIR=/cas/data CASENV_CASPERMSTORE=/cas/permstore CASCFG_GCPORT=5571 CASENV_CAS_VIRTUAL_PROTO=http CASENV_CAS_VIRTUAL_PORT=8777 CASENV_CAS_LICENSE=/cas/license/license.sas
Remember that we are looking for the
CASENV_CAS_DISK_CACHE
variable value but it appears thatCASENV_CAS_DISK_CACHE
is not defined. This is because CAS_DISK_CACHE is using the defaultemptyDir
volume. In this case the CAS server uses the default/cas/cache
path to locateCAS_DISK_CACHE
in anemptyDir
volume.
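If you want further evidence, you can look at the /cas/cache mount point from inside the controller’s CAS container (using the same container name as the command above). An emptyDir is backed by a directory on the node’s own disk, so df reports the node filesystem rather than a dedicated volume:
kubectl exec ${_CASControllerPodName} -c sas-cas-server -- df -h /cas/cache
kubectl exec ${_CASControllerPodName} -c sas-cas-server -- ls -al /cas/cache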
Reconfigure CAS to relocate CAS_DISK_CACHE
To relocate CAS_DISK_CACHE you will need to
- Create the directories for the CAS_DISK_CACHE on each Kubernetes node used for CAS pods
- Create a
patchTransformer
manifest to- mount the directories to the CAS pods
- configure CAS to use the new CAS_DISK_CACHE location
- Add the
patchTransformer
manifest tokustomization.yaml
- Rebuild and apply your new SASDeployment custom resource to implement the changes.
Create the new directories for CAS_DISK_CACHE on each Kubernetes cluster node. In this case, we are going to use Ansible to create the directories on all of the nodes.
_casInstance=shared-gelcorp ansible 'sasnode*' \ -b \ -m 'shell' \ -a "mkdir -p /casdiskcache/${_casInstance}; chmod 777 /casdiskcache; chmod 777 /casdiskcache/${_casInstance};" ansible 'sasnode*' \ -b \ -m 'shell' \ -a "mkdir -p /casdiskcache/${_casInstance}/cdc01; mkdir -p /casdiskcache/${_casInstance}/cdc02; mkdir -p /casdiskcache/${_casInstance}/cdc03; mkdir -p /casdiskcache/${_casInstance}/cdc04; chmod 1777 /casdiskcache/${_casInstance}/*;" ansible 'sasnode*' \ -b \ -m 'shell' \ -a "ls -al /casdiskcache/${_casInstance};"
Now that the directories exist, create a
patchTransformer
manifest to mount the directories to CAS pods and to configure CAS to set the value for theCASENV_CAS_DISK_CACHE
environment variable so CAS will use the new directories.tee ~/project/deploy/${current_namespace}/site-config/cas-manage-casdiskcache-${_casInstance}.yaml > /dev/null << EOF # This patchTranformer file is created for the ${_casInstance} CAS server only --- # This block of code is for creating the CAS server mount point - node physical /casdiskcache/default mounted to container /casdiskcache apiVersion: builtin kind: PatchTransformer metadata: name: cas-add-host-mount-casdiskcache-${_casInstance} patch: |- - op: add path: /spec/controllerTemplate/spec/volumes/- value: name: casdiskcache hostPath: path: /casdiskcache/${_casInstance} - op: add path: /spec/controllerTemplate/spec/containers/0/volumeMounts/- value: name: casdiskcache mountPath: /casdiskcache target: group: viya.sas.com kind: CASDeployment # Target filtering, chose/uncomment one of these option: # To filter the default CAS server (cas-shared-default) only: #labelSelector: "sas.com/cas-server-default" # To filter another CAS server (casdeployments): #name: <CASInstanceName> name: ${_casInstance} # To filter all CAS servers: #name: .* version: v1alpha1 --- # This block of code is for adding environment variables for the CAS server. apiVersion: builtin kind: PatchTransformer metadata: name: cas-add-environment-variables-casdiskcache-${_casInstance} patch: |- - op: add path: /spec/controllerTemplate/spec/containers/0/env/- value: name: CASENV_CAS_DISK_CACHE value: "/casdiskcache/cdc01:/casdiskcache/cdc02:/casdiskcache/cdc03:/casdiskcache/cdc04" target: group: viya.sas.com kind: CASDeployment # Target filtering, chose/uncomment one of these option: # To filter the default CAS server (cas-shared-default) only: #labelSelector: "sas.com/cas-server-default" # To filter another CAS server (casdeployments): #name: <CASInstanceName> name: ${_casInstance} # To filter all CAS servers: #name: .* version: v1alpha1 EOF
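A quick syntax check of the new manifest can save a failed deploy later; yq4 exits with an error if the file does not parse as valid YAML:
yq4 eval '.' ~/project/deploy/${current_namespace}/site-config/cas-manage-casdiskcache-${_casInstance}.yaml > /dev/null \
  && echo "cas-manage-casdiskcache-${_casInstance}.yaml parses OK"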
Now add a reference to
cas-manage-casdiskcache-shared-gelcorp.yaml
in thekustomization.yaml
file.Backup the current
kustomization.yaml
file.cp -p ~/project/deploy/${current_namespace}/kustomization.yaml /tmp/${current_namespace}/kustomization_05-042-01.yaml
Use this
yq
command to add a reference to thecas-manage-casdiskcache-shared-gelcorp.yaml
manifest in thetransformers
field of the Viya deploymentkustomization.yaml
file. While the command may look complicated, it is simply adding the reference after making sure that the reference does not already exist.[[ $(grep -c "site-config/cas-manage-casdiskcache-${_casInstance}.yaml" ~/project/deploy/${current_namespace}/kustomization.yaml) == 0 ]] && \ yq4 eval -i '.transformers += ["site-config/cas-manage-casdiskcache-'${_casInstance}'.yaml"]' ~/project/deploy/${current_namespace}/kustomization.yaml
Alternatively, you can update the Viya deployment
kustomization.yaml
file using your favorite text editor:[...] transformers: [... previous transformers items ...] - site-config/cas-manage-casdiskcache-shared-gelcorp.yaml [...]
Check that the update is in place.
cat ~/project/deploy/${current_namespace}/kustomization.yaml
Make sure that
site-config/cas-manage-casdiskcache-shared-gelcorp.yaml
exists in thetransformers
field of the Viya deploymentkustomization.yaml
file.Click here to see the output
--- namespace: gelcorp resources: - sas-bases/base # GEL Specifics to create CA secret for OpenSSL Issuer - site-config/security/gel-openssl-ca - sas-bases/overlays/network/networking.k8s.io # Using networking.k8s.io API since 2021.1.6 - site-config/security/openssl-generated-ingress-certificate.yaml # Default to OpenSSL Issuer in 2021.2.6 - sas-bases/overlays/cas-server - sas-bases/overlays/crunchydata/postgres-operator # New Stable 2022.10 - sas-bases/overlays/postgres/platform-postgres # New Stable 2022.10 - sas-bases/overlays/internal-elasticsearch # New Stable 2020.1.3 - sas-bases/overlays/update-checker # added update checker ## disable CAS autoresources to keep things simpler #- sas-bases/overlays/cas-server/auto-resources # CAS-related #- sas-bases/overlays/crunchydata_pgadmin # Deploy the sas-crunchy-data-pgadmin container - remove 2022.10 - site-config/sas-prepull/add-prepull-cr-crb.yaml - sas-bases/overlays/cas-server/state-transfer # Enable state transfer for the cas-shared-default CAS server - new PVC sas-cas-transfer-data - site-config/sas-microanalytic-score/astores/resources.yaml - site-config/gelcontent_pvc.yaml - site-config/cas-shared-gelcorp configurations: - sas-bases/overlays/required/kustomizeconfig.yaml transformers: - sas-bases/overlays/internal-elasticsearch/sysctl-transformer.yaml # New Stable 2020.1.3 - sas-bases/overlays/startup/ordered-startup-transformer.yaml - site-config/cas-enable-host.yaml - sas-bases/overlays/required/transformers.yaml - site-config/mirror.yaml #- site-config/daily_update_check.yaml # change the frequency of the update-check #- sas-bases/overlays/cas-server/auto-resources/remove-resources.yaml # CAS-related ## temporarily removed to alleviate RACE issues - sas-bases/overlays/internal-elasticsearch/internal-elasticsearch-transformer.yaml # New Stable 2020.1.3 - sas-bases/overlays/sas-programming-environment/enable-admin-script-access.yaml # To enable admin scripts #- sas-bases/overlays/scaling/zero-scale/phase-0-transformer.yaml #- sas-bases/overlays/scaling/zero-scale/phase-1-transformer.yaml - sas-bases/overlays/cas-server/state-transfer/support-state-transfer.yaml # Enable state transfer for the cas-shared-default CAS server - enable and mount new PVC - site-config/change-check-interval.yaml - sas-bases/overlays/sas-microanalytic-score/astores/astores-transformer.yaml - site-config/sas-pyconfig/change-configuration.yaml - site-config/sas-pyconfig/change-limits.yaml - site-config/cas-add-nfs-mount.yaml - site-config/cas-add-allowlist-paths.yaml - site-config/cas-modify-user.yaml - site-config/cas-manage-casdiskcache-shared-gelcorp.yaml components: - sas-bases/components/crunchydata/internal-platform-postgres # New Stable 2022.10 - sas-bases/components/security/core/base/full-stack-tls - sas-bases/components/security/network/networking.k8s.io/ingress/nginx.ingress.kubernetes.io/full-stack-tls patches: - path: site-config/storageclass.yaml target: kind: PersistentVolumeClaim annotationSelector: sas.com/component-name in (sas-backup-job,sas-data-quality-services,sas-commonfiles,sas-cas-operator,sas-pyconfig) - path: site-config/cas-gelcontent-mount-pvc.yaml target: group: viya.sas.com kind: CASDeployment name: .* version: v1alpha1 - path: site-config/compute-server-add-nfs-mount.yaml target: labelSelector: sas.com/template-intent=sas-launcher version: v1 kind: PodTemplate - path: site-config/compute-server-annotate-podtempate.yaml target: name: sas-compute-job-config version: v1 kind: PodTemplate secretGenerator: - name: sas-consul-config 
behavior: merge files: - SITEDEFAULT_CONF=site-config/sitedefault.yaml - name: sas-image-pull-secrets behavior: replace type: kubernetes.io/dockerconfigjson files: - .dockerconfigjson=site-config/crcache-image-pull-secrets.json configMapGenerator: - name: ingress-input behavior: merge literals: - INGRESS_HOST=gelcorp.pdcesx03145.race.sas.com - name: sas-shared-config behavior: merge literals: - SAS_SERVICES_URL=https://gelcorp.pdcesx03145.race.sas.com # # This is to fix an issue that only appears in very slow environments. # # Do not do this at a customer site - name: sas-go-config behavior: merge literals: - SAS_BOOTSTRAP_HTTP_CLIENT_TIMEOUT_REQUEST='15m' - name: input behavior: merge literals: - IMAGE_REGISTRY=crcache-race-sas-cary.unx.sas.com
Now let’s rebuild and apply the Viya deployment manifest to apply the new CAS_DISK_CACHE setting.
Keep a copy of the current
manifest.yaml
file.cp -p /tmp/${current_namespace}/deploy_work/deploy/manifest.yaml /tmp/${current_namespace}/manifest_05-042-01.yaml
Run the
sas-orchestration
deploy command.cd ~/project/deploy rm -rf /tmp/${current_namespace}/deploy_work/* source ~/project/deploy/.${current_namespace}_vars docker run --rm \ -v ${PWD}/license:/license \ -v ${PWD}/${current_namespace}:/${current_namespace} \ -v ${HOME}/.kube/config_portable:/kube/config \ -v /tmp/${current_namespace}/deploy_work:/work \ -e KUBECONFIG=/kube/config \ --user $(id -u):$(id -g) \ \ sas-orchestration \ deploy --namespace ${current_namespace} \ --deployment-data /license/SASViyaV4_${_order}_certs.zip \ --license /license/SASViyaV4_${_order}_license.jwt \ --user-content /${current_namespace} \ --cadence-name ${_cadenceName} \ --cadence-version ${_cadenceVersion} \ --image-registry ${_viyaMirrorReg}
When the deploy command completes successfully the final message should say The deploy command completed successfully as shown in the log snippet below.
The deploy command started [...] The deploy command completed successfully
If the sas-orchestration deploy command fails, check out the steps in 99_Additional_Topics/03_Troubleshoot_SAS_Orchestration_Deploy to help you troubleshoot any problems.
Restart the cas-shared-gelcorp server so that it is aware of the new CAS_DISK_CACHE configuration.
Since we enabled state transfer for the cas-shared-gelcorp server, you now have two choices for restarting the CAS server.
Choice 1: initiate the state transfer
All loaded tables and active CAS sessions will be kept.
The
casoperator.sas.com/instance-index
label for all pods of the CAS server will be incremented by 1.kubectl patch casdeployment shared-gelcorp \ --type='json' \ -p='[{"op": "replace", "path": "/spec/startStateTransfer", "value":true}]'
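If you choose this option, one way to follow the rolling transfer from the command line is to watch the pods with the instance-index label displayed as a column (press Ctrl+C to stop watching):
kubectl get pods \
  --selector="casoperator.sas.com/server==shared-gelcorp" \
  -L casoperator.sas.com/instance-index \
  --watch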
Choice 2: delete the CAS server pods
All loaded tables and active CAS sessions will be lost.
The
casoperator.sas.com/instance-index
label for all pods of the CAS server will be reset to 0.kubectl delete pod \ --selector="casoperator.sas.com/server==shared-gelcorp"
Quickly switch over to OpenLens and watch what happens to the CAS pods.
If you switch over to OpenLens fast enough you may be able to see the cas-shared-gelcorp pods terminate.
Then you should see all cas-shared-gelcorp pods restart.
When all pods are Running and the containers of all cas-shared-gelcorp pods show green the server is ready to be used.
As one last validation step, run the following command to make sure the CAS server is ready.
kubectl wait pods \ --selector="casoperator.sas.com/server==shared-gelcorp" \ --for condition=ready --timeout 15m
You should see these messages in the output.
pod/sas-cas-server-shared-gelcorp-backup condition met pod/sas-cas-server-shared-gelcorp-controller condition met pod/sas-cas-server-shared-gelcorp-worker-0 condition met pod/sas-cas-server-shared-gelcorp-worker-1 condition met
The cas-shared-gelcorp server is now reconfigured to use the new CAS_DISK_CACHE location.
Let’s validate the new CAS_DISK_CACHE settings by repeating the steps you did at the start of this exercise.
Open SAS Environment Manager, log in as
geladm
, and assume theSASAdministrators
membership.gellow_urls | grep "SAS Environment Manager"
Navigate to the
Servers
page.Right-click on
cas-shared-gelcorp
and select theConfiguration
option.Navigate to the Nodes tab
Choose the
Controller
node, right-click on it, and select theRuntime Environment
option.Scroll through the Environment Variable table until you find the
CAS_DISK_CACHE
variable. You should see that CAS_DISK_CACHE is pointing to the newhostpath
location.
Let’s try using kubectl again to see if we can obtain the CAS_DISK_CACHE information. Remember that earlier the
CASENV_CAS_DISK_CACHE
environment variable was undefined on the CAS controller._CASControllerPodName=$(kubectl get pod \ --selector "casoperator.sas.com/server==shared-gelcorp,casoperator.sas.com/node-type==controller,casoperator.sas.com/controller-index==0" \ --no-headers \ | awk '{printf $1}') echo ${_CASControllerPodName} kubectl exec -it ${_CASControllerPodName} \ -c sas-cas-server \ -- env \ | grep "CAS" \ | grep -v "SAS_"
Do you see the
CASENV_CAS_DISK_CACHE
variable this time?You should see something like this.
CASCONTROLLERHOST=controller.sas-cas-server-shared-gelcorp.gelcorp CASENV_CAS_DISK_CACHE=/casdiskcache/cdc01:/casdiskcache/cdc02:/casdiskcache/cdc03:/casdiskcache/cdc04 CASENV_CAS_VIRTUAL_HOST=controller.sas-cas-server-shared-gelcorp.gelcorp CASBACKUPHOST=backup.sas-cas-server-shared-gelcorp.gelcorp CASENV_CAS_K8S_SERVICE_NAME=sas-cas-server-shared-gelcorp-client CASENV_CONSUL_NAME=cas-shared-gelcorp CASCFG_HOSTKNOWNBY=controller.sas-cas-server-shared-gelcorp.gelcorp CASENV_CAS_VIRTUAL_PATH=/cas-shared-gelcorp-http CASCFG_DQSETUPLOC=QKB CI 33 CASKEY=ce087d3dbd1a5a39abe248a8e9b6a36d488e101550e016531b05b49b46c7687c CASCFG_INITIALBACKUPS=1 CASCFG_DQLOCALE=ENUSA CASCFG_INITIALWORKERS=2 CAS_CLIENT_SSL_CA_LIST=/security/trustedcerts.pem CAS_POD_NAME=sas-cas-server-shared-gelcorp-controller CASCFG_MODE=mpp CASENV_CASDEPLOYMENT_SPEC_ALLOWLIST_APPEND=/cas/data/caslibs:/gelcontent:/mnt/gelcontent/ CASCLOUDNATIVE=1 CASENV_CASDATADIR=/cas/data CASENV_CASPERMSTORE=/cas/permstore CASCFG_GCPORT=5571 CASENV_CAS_VIRTUAL_PROTO=http CASENV_CAS_VIRTUAL_PORT=8777 CASENV_CAS_LICENSE=/cas/license/license.sas
Extra credit: Look at the contents of the CAS_DISK_CACHE on the Kubernetes nodes’ file systems.
ansible 'sasnode*' \ -b \ -m 'shell' \ -a 'lsof -nP -c cas 2>/dev/null \ | grep "(deleted)" \ | grep -E "casdiskcache|sasnode"' \ | grep -v "| FAILED |" \ | grep -v "non-zero return code"
This allows you to see the blocks of data loaded into the cas-shared-gelcorp server. Remember that the HR tables were automatically reloaded because of the session zero settings you configured earlier.
You should see something like…
sasnode03 | CHANGED | rc=0 >> cas 23335 sas 24r REG 8,3 801896 168188636 /casdiskcache/cdc03/casmap_1487_48FEF3E0_0x7f3816287418_801896 (deleted) cas 23335 sas 27r REG 8,3 253016 172597847 /casdiskcache/cdc04/casmap_1487_48FEF4CB_0x7f3816287418_253016 (deleted) cas 23335 sas 28r REG 8,3 3504 164078333 /casdiskcache/cdc02/casmap_1487_48FF1F0F_0x7f380fdf72a8_3504 (deleted) cas 23335 sas 30r REG 8,3 5042208 168188637 /casdiskcache/cdc03/casmap_1487_48FF20BE_0x7f380fcf1008_5042208 (deleted) cas 23335 sas 32r REG 8,3 5052576 172602757 /casdiskcache/cdc04/casmap_1487_48FF2135_0x7f380fdf72a8_5052576 (deleted) sasnode02 | CHANGED | rc=0 >> cas 4201 sas 24r REG 8,3 804200 163935601 /casdiskcache/cdc04/casmap_1490_48FEF3E1_0x7f2d7cc38418_804200 (deleted) cas 4201 sas 27r REG 8,3 253016 152216596 /casdiskcache/cdc01/casmap_1490_48FEF4CC_0x7f2d7cc38418_253016 (deleted) cas 4201 sas 28r REG 8,3 3504 163935611 /casdiskcache/cdc04/casmap_1490_48FF1EFE_0x7f2d7cc38008_3504 (deleted) cas 4201 sas 30r REG 8,3 5052576 152216598 /casdiskcache/cdc01/casmap_1490_48FF20AE_0x7f2d7cc38008_5052576 (deleted) cas 4201 sas 32r REG 8,3 5042208 155530074 /casdiskcache/cdc02/casmap_1490_48FF2157_0x7f2d768232a8_5042208 (deleted) sasnode04 | CHANGED | rc=0 >>
Lesson learned
To relocate CAS_DISK_CACHE from its default emptyDir
volume to a hostpath
volume you will need to:
- Create directories for the CAS_DISK_CACHE on each Kubernetes node used for CAS pods
- Create a
patchTransformer
manifest to- mount the directories to the CAS pods
- configure CAS to use the new CAS_DISK_CACHE location
- Add the
patchTransformer
manifest tokustomization.yaml
- Rebuild and apply your new SASDeployment custom resource to implement the changes
- Restart the CAS server
SAS Viya Administration Operations
Lesson 09, Section 1 Exercise: Configure Compute CPU and Memory
Configure Compute Memory Limits
In its default configuration, the SAS Viya compute pod template specifies that its main container should be limited to using 2GB of memory. SAS Viya’s default value for the MEMSIZE SAS system option is also 2GB.
In this exercise, we will see what happens when you increase MEMSIZE above the compute pod’s memory limit, and then run SAS code which tries to use more memory than that limit: although SAS tries to prevent us from doing so, it is not always successful, and we can cause the compute pod to be killed by Kubernetes.
We will then increase the compute pod’s main container’s memory limit, so that the same SAS code runs successfully.
In this hands-on exercise
- Start a Compute Session in SAS Studio
- See the current value of the MEMSIZE SAS system option
- Create several large datasets and try to load them into memory
- Try to increase memsize in a running compute server
- Create a compute context with increased memsize
- Create several large datasets and try to load them into memory in a compute context with memsize = 4G
- See the memory limits and usage for the sas-programming-environment container
- Increase the sas-compute-job-config Memory limit
- Try to load large datasets into memory in a compute server with a memory limit of 4G
- See the new memory limits and usage for the sas-programming-environment container
Start a Compute Session in SAS Studio
In Chrome, open SAS Studio. If you need the URL, run this and Ctrl + click the link it outputs:
gellow_urls | grep "SAS Studio"
Log in to SAS Studio as:
- username:
Delilah
- password:
lnxsas
- username:
When SAS Studio opens, wait until you have a compute session running under the “SAS Studio compute context”.
See the current value of the MEMSIZE SAS system option
As the documentation explains, the MEMSIZE system option specifies the limit on the total amount of virtual memory that can be used by a SAS session.
Open a new SAS Program tab.
Copy the following into the new SAS Program tab, and run it:
proc options option=memsize; run;
Expected SAS log output:
80 proc options option=memsize; run; SAS (r) Proprietary Software Release V.04.00 TS1M0 MEMSIZE=2147483648 Specifies the limit on the amount of virtual memory that can be used during a SAS session.
Here, we see that MEMSIZE=2147483648 bytes, which is 2 Gigabytes (= 2 * (1024 ^ 3) bytes), the default value in SAS Viya.
So, SAS should prevent your session from using more than 2 GB of memory. Let’s try using more to see what happens.
Create several large datasets and try to load them into memory
In SAS Studio, still signed in as Delilah, still in a compute session under the “SAS Studio compute context”, copy the following code into the new SAS Program tab, and run it:
%let libraryname=work; %let datasetname=bigtable; %let rows=671089; %macro generate(n_rows,n_num_cols,n_char_cols,outdata=test,seed=0); data &outdata; array nums[&n_num_cols]; array chars[&n_char_cols] $; temp = "abcdefghijklmnopqrstuvwxyz"; do i=1 to &n_rows; do j=1 to &n_num_cols; nums[j] = ranuni(&seed); end; do j=1 to &n_char_cols; chars[j] = substr(temp,ceil(ranuni(&seed)*18),8); end; output; end; drop i j temp; run; %mend; %generate(&rows.,100,100,outdata=&datasetname); %generate(&rows.,100,100,outdata=&datasetname.2); %generate(&rows.,100,100,outdata=&datasetname.3); PROC SQL ; TITLE ‘Filesize for &datasetname Data Set’ ; SELECT libname, memname, memtype, FILESIZE FORMAT=SIZEKMG., FILESIZE FORMAT=SIZEK. FROM DICTIONARY.TABLES WHERE libname = upper("&libraryname") AND memname CONTAINS upper("&datasetname") AND memtype = "DATA" ; QUIT ; * Load datasets into memory; sasfile &libraryname..&datasetname load; sasfile &libraryname..&datasetname.2 load; sasfile &libraryname..&datasetname.3 load;
Expected results - each of the datasets created is about 1GB:
Switch to the Log tab on the right hand panel in SAS Studio, and look for the log output from running the three
sasfile ... load;
statements at the end of the program.When the compute server tries to run the three
sasfile &libraryname..&datasetname load;
statements at the end of that program, you should see a NOTE, a WARNING and an ERROR in the SAS program log, something like this:123 * Load datasets into memory; 124 sasfile &libraryname..&datasetname load; NOTE: The file WORK.BIGTABLE.DATA has been loaded into memory by the SASFILE statement. 125 sasfile &libraryname..&datasetname.2 load; WARNING: Only 3950 of 8286 pages of WORK.BIGTABLE2.DATA can be loaded into memory by the SASFILE statement. 126 sasfile &libraryname..&datasetname.3 load; ERROR: File WORK.BIGTABLE3.DATA is damaged. I/O processing did not complete. 127
When the compute server tried to load the three datasets into memory, it successfully loaded the first one. It could only load part of the second dataset, because the dataset is larger than the compute server’s remaining unallocated memory below the MEMSIZE limit. It failed to load the third dataset entirely, possibly because no memory was left available.
Tip: To find the .sas7bdat files for the three datasets on your cluster, in MobaXterm, in an SSH session to sasnode01 as cloud-user, run this command, which searches for them on each node:
ansible 'sasnode*' -b -m shell -a 'find /var/lib -type f -name "bigtable*.sas7bdat" -print | xargs --no-run-if-empty sudo ls -al'
Example output is below. The command found all three bigtable dataset files on sasnode02, but they might be on a different node when you run this. It depends where SAS Workload Management decided to start the compute server pod:
sasnode02 | CHANGED | rc=0 >> -rw-r--r-- 1 1283006467 1283006467 1086193664 Jul 31 14:10 /var/lib/kubelet/pods/2e7392e7-4012-48f7-b5ae-7889e7837dcb/volumes/kubernetes.io~empty-dir/viya/tmp/compsrv/default/313c5741-1776-4f36-9a61-2eda79bc23de/SAS_workC168000001A2_sas-compute-server-c0a20423-be04-4455-b754-b8b5fc0dd1f1-33/bigtable2.sas7bdat -rw-r--r-- 1 1283006467 1283006467 1086193664 Jul 31 14:10 /var/lib/kubelet/pods/2e7392e7-4012-48f7-b5ae-7889e7837dcb/volumes/kubernetes.io~empty-dir/viya/tmp/compsrv/default/313c5741-1776-4f36-9a61-2eda79bc23de/SAS_workC168000001A2_sas-compute-server-c0a20423-be04-4455-b754-b8b5fc0dd1f1-33/bigtable3.sas7bdat -rw-r--r-- 1 1283006467 1283006467 1086193664 Jul 31 14:10 /var/lib/kubelet/pods/2e7392e7-4012-48f7-b5ae-7889e7837dcb/volumes/kubernetes.io~empty-dir/viya/tmp/compsrv/default/313c5741-1776-4f36-9a61-2eda79bc23de/SAS_workC168000001A2_sas-compute-server-c0a20423-be04-4455-b754-b8b5fc0dd1f1-33/bigtable.sas7bdat sasnode05 | CHANGED | rc=0 >> sasnode04 | CHANGED | rc=0 >> sasnode01 | CHANGED | rc=0 >> sasnode03 | CHANGED | rc=0 >>
Try to increase memsize in a running compute server
Still in SAS Studio as Delilah, choose New > SAS Program from the menu to open another SAS program window.
In the second SAS Program window, try running this SAS statement:
options memsize=4G;
Note: Possible result: sometimes, the log might contain about four error messages like this:
ERROR: XOB failure detected. Aborted during the COMPILATION phase.
If you see these errors, they indicate that the compute server failed to run SAS Studio’s preamble code. If this happens, your compute server is not in a healthy state. You can fix that by starting a new compute server, and inside it a new compute session, as follows:
Still in SAS Studio as Delilah, choose Options > Reset SAS session, and in the Reset Session prompt, click Reset.
Wait while the new compute session is started in a new compute server. This may take 30 seconds or so.
Try running the same SAS options statement in your second SAS program tab again:
options memsize=4G;
The expected result is a warning, saying you are not allowed to change the memsize in a compute server after it has finished starting up:
80 options memsize=4G; ------- 30 WARNING 30-12: SAS option MEMSIZE is valid only at startup of the SAS System. The SAS option is ignored.
Because MEMSIZE can only be changed during compute server initialization, by default only SAS Administrators can change the MEMSIZE of a compute server (for example, by setting it in the compute context, as we do next). This is a good thing.
Create a compute context with increased memsize
We will create a compute context with a memsize of 4GB, so that it can fully load 3GB of tables into memory with ‘room to spare’, because we know that some of its total memory capacity will already be in use for other things.
In Firefox, open SAS Environment Manager. If you need the URL, run this in your MobaXterm session connected to sasnode01 as cloud-user:
gellow_urls | grep "SAS Environment Manager"
Click and drag your mouse pointer to select the SAS Environment Manager URL to the clipboard, then paste it into the address bar in Firefox.
In Firefox, log in to SAS Environment Manager as:
- username:
geladm
- password:
lnxsas
- username:
Navigate to the Contexts page in SAS Environment Manager.
In the Contexts page, from the View menu select Compute contexts.
Right-click the SAS Studio compute context, and select Copy from the popup menu.
Name the copy of the compute context “SAS Studio compute context with memsize 4G”, and on the Advanced tab, paste this into the box labelled “Enter each SAS option on a new line:”
-memsize 4G
The Advanced tab of the new compute context dialog should look like this:
Click Save to save the new compute context. You should see it in the list of compute contexts.
Create several large datasets and try to load them into memory in a compute context with memsize = 4G
Switch back to Chrome.
In Chrome, still in SAS Studio, still signed in as Delilah, click the Reload the page button, or press F5 to reload the web page. Click Reload if prompted in a popup dialog. This will start a new SAS Studio session, but you will still be signed in as Delilah.
Note: The previous compute server and its pod are not terminated right away. They will eventually time out and be terminated.
When the SAS Studio page has reloaded, and a new compute session has finished starting, click on the compute context menu, and choose ‘SAS Studio compute context with memsize 4G’ from the dropdown list:
In the Change Compute Context popup, click Change to confirm.
When the new compute session has started under “SAS Studio compute context with memsize 4G”, run the same proc options statement you ran earlier to see the new value of memsize:
proc options option=memsize; run;
Expected SAS log output:
80 proc options option=memsize; run; SAS (r) Proprietary Software Release V.04.00 TS1M0 MEMSIZE=4294967296 Specifies the limit on the amount of virtual memory that can be used during a SAS session.
Here, we see that MEMSIZE=4294967296 bytes, which is 4 Gigabytes (= 4 * (1024 ^ 3) bytes). This shows that the SAS option you added to the new compute context worked. As far as SAS is concerned, this should be enough to load a 3GB table.
Copy the following code into the new SAS Program tab, and run it. This is the same code you ran earlier:
%let libraryname=work; %let datasetname=bigtable; %let rows=671089; %macro generate(n_rows,n_num_cols,n_char_cols,outdata=test,seed=0); data &outdata; array nums[&n_num_cols]; array chars[&n_char_cols] $; temp = "abcdefghijklmnopqrstuvwxyz"; do i=1 to &n_rows; do j=1 to &n_num_cols; nums[j] = ranuni(&seed); end; do j=1 to &n_char_cols; chars[j] = substr(temp,ceil(ranuni(&seed)*18),8); end; output; end; drop i j temp; run; %mend; %generate(&rows.,100,100,outdata=&datasetname); %generate(&rows.,100,100,outdata=&datasetname.2); %generate(&rows.,100,100,outdata=&datasetname.3); PROC SQL ; TITLE ‘Filesize for &datasetname Data Set’ ; SELECT libname, memname, memtype, FILESIZE FORMAT=SIZEKMG., FILESIZE FORMAT=SIZEK. FROM DICTIONARY.TABLES WHERE libname = upper("&libraryname") AND memname CONTAINS upper("&datasetname") AND memtype = "DATA" ; QUIT ; * Load datasets into memory; sasfile &libraryname..&datasetname load; sasfile &libraryname..&datasetname.2 load; sasfile &libraryname..&datasetname.3 load;
Expected results - each of the datasets created is about 1GB:
However, what happens next seems to vary between two similarly likely alternatives. Sometimes, the program completes with error messages like this, and the compute session continues to run in a sas-compute-server pod which remains running:
119 * Load datasets into memory; 120 sasfile &libraryname..&datasetname load; NOTE: The file WORK.BIGTABLE.DATA has been loaded into memory by the SASFILE statement. 121 sasfile &libraryname..&datasetname.2 load; ERROR: File WORK.BIGTABLE2.DATA is damaged. I/O processing did not complete. 122 sasfile &libraryname..&datasetname.3 load; ERROR: File WORK.BIGTABLE3.DATA is damaged. I/O processing did not complete. 123
On other occasions, the program might not finish before you see this error message in SAS Studio:
If you see SAS program log messages like those above and your SAS compute session continues to work, try simply running the program again. In our experience, it usually does not take many attempts at running the program above in a compute context with MEMSIZE 4G for the compute container to use more than 2G of memory, and for Kubernetes’ OOM killer to kill the sas-compute-server pod. It happens reasonably often on the first attempt.
If, or hopefully when, you see the error dialog in SAS Studio saying SAS Session Problem Detected, Reset the session and wait for the new compute session to start.
See the memory limits and usage for the sas-programming-environment container
In your MobaXterm session connected to sasnode01 as cloud-user, run this to view the resources requested for the sas-programming-environment container in Delilah’s compute server pod:
MY_POD=`kubectl get pods --no-headers -l launcher.sas.com/username=Delilah | awk '{print $1}'` kubectl get pod $MY_POD -o json | jq -r '.spec.containers[] | select (.name=="sas-programming-environment") | .name, .resources '
Note: The first command of the two above gets a list of pods launched on behalf of the user Delilah, and puts the first value of the output line, which is the pod name, in variable called MY_POD. The second command then gets a description of that pod in JSON format, and uses jq to select the pod spec’s sas-programming-environment container, and then pretty-print the container’s name and resources section for easier reading.
Expected output :
sas-programming-environment { "limits": { "cpu": "2", "memory": "2Gi" }, "requests": { "cpu": "50m", "memory": "300M" } }
Run the following command to find out what resources the pod is actually using at the moment in time when you run the command:
MY_POD=`kubectl get pods --no-headers -l launcher.sas.com/username=Delilah | awk '{print $1}'` kubectl top pod $MY_POD
Since at this point in the exercise, your SAS Studio session was recently reset, this compute pod is freshly started and is not likely to be using much memory.
Your turn. Use what you have learned in this exercise so far to run the SAS program above, and watch how much memory your compute server uses while it runs.
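One simple way to keep an eye on the pod's memory while the program runs is to poll kubectl top every few seconds, for example with the watch utility (assuming it is installed on sasnode01):

MY_POD=$(kubectl get pods --no-headers -l launcher.sas.com/username=Delilah | awk '{print $1}')
watch -n 10 kubectl top pod $MY_POD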
Tip: You may also use OpenLens for this, but OpenLens does not always appear to report memory usage correctly; it sometimes reports double the value that the kubectl command above reports.
Increase the sas-compute-job-config Memory limit
Create a PodTemplate overlay that defines a higher memory limit specifically for the Compute Server's
sas-compute-job-config
PodTemplate.# Set Compute Server Memory requests and limits tee ~/project/deploy/$current_namespace/site-config/compute-memory-limits.yaml > /dev/null <<EOF ################################################################################### # Kustomize patch configuration to set the default and maximum memory limit # for launched compute server pods to 4000M. # # This PatchTransformer will target only the compute server podTemplate, # with name=sas-compute-job-config. This does not include the batch, connect, # or general-purpose programming run-time podTemplates. # # We left a commented-out alternative target in the file, which would select # all launched podTemplates except for the sas-cas-pod-template. ############################################################################### --- apiVersion: builtin kind: PatchTransformer metadata: name: compute-memory-limits patch: |- - op: add path: /metadata/annotations/launcher.sas.com~1default-memory-limit value: 4000M - op: add path: /metadata/annotations/launcher.sas.com~1max-memory-limit value: 4000M target: kind: PodTemplate # labelSelector: sas.com/template-intent=sas-launcher,workload.sas.com/class=compute name: sas-compute-job-config EOF
Make a copy of your
kustomization.yaml
file, so we can see the effect of adding two new lines to it in the next step.cp -p ~/project/deploy/gelcorp/kustomization.yaml ~/project/deploy/gelcorp/kustomization-09-011-01.yaml
Use a script to update your kustomization.yaml to reference the overlay:
# Insert reference to memory limits patch transformer, if it is not already in kustomization.yaml [[ $(grep -c "site-config/compute-memory-limits.yaml" ~/project/deploy/${current_namespace}/kustomization.yaml) == 0 ]] && \ yq4 eval -i '.transformers += ["site-config/compute-memory-limits.yaml"]' ~/project/deploy/${current_namespace}/kustomization.yaml
Run the following command to view the change this yq4 command made to your
kustomization.yaml
. The changes are in green in the right column.icdiff ~/project/deploy/gelcorp/kustomization-09-011-01.yaml ~/project/deploy/gelcorp/kustomization.yaml
Delete the
kustomization-09-011-01.yaml
file, so that it is not inadvertently included in gelcorp-sasdeployment.yaml in the next step.rm ~/project/deploy/gelcorp/kustomization-09-011-01.yaml
Build and Apply using SAS-Orchestration Deploy
Keep a copy of the current
manifest.yaml
file.cp -p /tmp/${current_namespace}/deploy_work/deploy/manifest.yaml /tmp/${current_namespace}/manifest_09-011-01.yaml
Run the
sas-orchestration
deploy command.cd ~/project/deploy rm -rf /tmp/${current_namespace}/deploy_work/* source ~/project/deploy/.${current_namespace}_vars docker run --rm \ -v ${PWD}/license:/license \ -v ${PWD}/${current_namespace}:/${current_namespace} \ -v ${HOME}/.kube/config_portable:/kube/config \ -v /tmp/${current_namespace}/deploy_work:/work \ -e KUBECONFIG=/kube/config \ --user $(id -u):$(id -g) \ \ sas-orchestration \ deploy --namespace ${current_namespace} \ --deployment-data /license/SASViyaV4_${_order}_certs.zip \ --license /license/SASViyaV4_${_order}_license.jwt \ --user-content /${current_namespace} \ --cadence-name ${_cadenceName} \ --cadence-version ${_cadenceVersion} \ --image-registry ${_viyaMirrorReg}
When the deploy command completes successfully the final message should say The deploy command completed successfully as shown in the log snippet below.
The deploy command started [...] The deploy command completed successfully
If the sas-orchestration deploy command fails, check out the steps in 99_Additional_Topics/03_Troubleshoot_SAS_Orchestration_Deploy to help you troubleshoot any problems.
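Optionally, before starting a new compute session, you can confirm that the new annotations landed on the sas-compute-job-config PodTemplate itself. A quick grep of the PodTemplate definition should show the new 4000M values:

kubectl get podtemplate sas-compute-job-config -o yaml | grep -E "default-memory-limit|max-memory-limit"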
Try to load large datasets into memory in a compute server with a memory limit of 4G
Switch back to Chrome. Sign in to SAS Studio again as Delilah if your session has timed out.
In Chrome, in SAS Studio as Delilah, make sure you have a compute session started under the “SAS Studio compute context with memsize 4G”. You may have to reset your SAS session.
Then copy the following code into a new SAS Program tab, and run it. This is the same code you ran earlier:
%let libraryname=work; %let datasetname=bigtable; %let rows=671089; %macro generate(n_rows,n_num_cols,n_char_cols,outdata=test,seed=0); data &outdata; array nums[&n_num_cols]; array chars[&n_char_cols] $; temp = "abcdefghijklmnopqrstuvwxyz"; do i=1 to &n_rows; do j=1 to &n_num_cols; nums[j] = ranuni(&seed); end; do j=1 to &n_char_cols; chars[j] = substr(temp,ceil(ranuni(&seed)*18),8); end; output; end; drop i j temp; run; %mend; %generate(&rows.,100,100,outdata=&datasetname); %generate(&rows.,100,100,outdata=&datasetname.2); %generate(&rows.,100,100,outdata=&datasetname.3); PROC SQL ; TITLE ‘Filesize for &datasetname Data Set’ ; SELECT libname, memname, memtype, FILESIZE FORMAT=SIZEKMG., FILESIZE FORMAT=SIZEK. FROM DICTIONARY.TABLES WHERE libname = upper("&libraryname") AND memname CONTAINS upper("&datasetname") AND memtype = "DATA" ; QUIT ; * Load datasets into memory; sasfile &libraryname..&datasetname load; sasfile &libraryname..&datasetname.2 load; sasfile &libraryname..&datasetname.3 load;
Expected results - each of the datasets created is about 1GB (the results table is the same as when we saw it earlier).
This time, all three large datasets should be loaded into memory successfully, and the corresponding log messages look like this:
119 * Load datasets into memory; 120 sasfile &libraryname..&datasetname load; NOTE: The file WORK.BIGTABLE.DATA has been loaded into memory by the SASFILE statement. 121 sasfile &libraryname..&datasetname.2 load; NOTE: The file WORK.BIGTABLE2.DATA has been loaded into memory by the SASFILE statement. 122 sasfile &libraryname..&datasetname.3 load; NOTE: The file WORK.BIGTABLE3.DATA has been loaded into memory by the SASFILE statement.
Your SAS compute session should continue to run without issue.
See the new memory limits and usage for the sas-programming-environment container
In your MobaXterm session connected to sasnode01 as cloud-user, run this to view the resources requested for the sas-programming-environment container in Delilah’s compute server pod:
MY_POD=`kubectl get pods --no-headers -l launcher.sas.com/username=Delilah | awk '{print $1}'` kubectl get pod $MY_POD -o json | jq -r '.spec.containers[] | select (.name=="sas-programming-environment") | .name, .resources '
Note: The first of the two commands above gets the list of pods launched on behalf of the user Delilah and puts the first value of the output line, which is the pod name, in a variable called MY_POD. The second command then gets a description of that pod in JSON format and uses jq to select the sas-programming-environment container from the pod spec, then pretty-prints the container's name and resources section for easier reading.
Expected output - notice that the memory limit is now 4G, instead of 2G as we saw earlier:
sas-programming-environment { "limits": { "cpu": "2", "memory": "4G" }, "requests": { "cpu": "50m", "memory": "300M" } }
Run the following command to find out what resources the pod is actually using at the moment in time when you run the command:
MY_POD=`kubectl get pods --no-headers -l launcher.sas.com/username=Delilah | awk '{print $1}'` kubectl top pod $MY_POD
Example output:
NAME CPU(cores) MEMORY(bytes) sas-compute-server-388f7b0a-07c2-4672-903f-d01a8af37ede-38 1m 3488Mi
The compute server pod is using much more than 2GB memory, and should not be killed by Kubernetes as long as it does not exceed the new, higher limit of 4GB memory that we set.
You now know how to change both the MEMSIZE option and the compute server memory limit to similar values, so that the SAS compute server can successfully run SAS programs that require more than 2GB of memory. The example program used in this exercise is contrived, but it demonstrates the issues you will likely see when a SAS program requires more memory than the limits in SAS and Kubernetes allow, and how those limits can be adjusted to let the program run.
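If you ever need to confirm from inside a SAS session which MEMSIZE value the compute server is actually running with, the standard PROC OPTIONS statement reports it to the SAS log:

proc options option=memsize value;
run;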
SAS Viya Administration Operations
Lesson 10, Section 0 Exercise: Defining Alerts
In this exercise, we will create a new alert for Prometheus AlertManager by defining a PrometheusRule. The alert will be configured to send an alert notification when metric values meet the condition specified in the rule.
- View metrics in the Prometheus Expression Browser
- Create a rule
- Define a routing tree
- Manage firing alerts
View metrics in the Prometheus Expression Browser
In this step, we will explore the Prometheus UI and examine the metrics that can be queried. PromQL expressions are used to query the metrics collected by Prometheus. These form the basis for alert conditions.
Log on to the Prometheus UI. The URL can be retrieved by running:
gellow_urls | grep "Prometheus"
On the Graph page, PromQL queries can be entered in the Expression box. Metrics can be selected from the Metrics Explorer by clicking the globe icon next to the Execute button.
Select (or find with auto-complete) the container_memory_usage_bytes metric and click Execute. The metric values are displayed in bytes. View them in GB by changing the expression to:
container_memory_usage_bytes / (1024 * 1024 * 1024)
Review the results in the table.
Filter the results again by modifying the query to display SAS Viya containers (i.e. in pods with names beginning with ‘
sas-
’) from thegelcorp
namespace only.container_memory_usage_bytes{container!~"POD",namespace="gelcorp",pod=~"sas-.+"} / (1024 * 1024 * 1024)
Click the Graph button to view the time-series chart. Use the displayed information to answer the following:
- Which container is using the most memory?
- Which pod is it in?
How could the query be modified to display results for intnode03 only?
View the answer
container_memory_usage_bytes{namespace="gelcorp",pod=~"sas-.+",node="intnode03"} / (1024 * 1024 * 1024)
Run the new query from the previous answer. Which container is using the most memory now?
Create a rule
An alert condition can be specified as a PromQL query in a PrometheusRule definition. In this step, define a PrometheusRule that creates an alert rule which triggers when the amount of memory consumed by containers inside SAS Viya pods in the gelcorp namespace is more than 5% of the total available memory on a Kubernetes cluster node. IMPORTANT: A threshold of 5% is unusually low; this is intentional for this demonstration, to ensure the alert fires.
Using PromQL, this condition can be expressed as:
((sum by (node) (container_memory_usage_bytes{namespace="gelcorp",pod=~"sas-.+"})) / (sum by (node) (kube_node_status_capacity{resource="memory"})) * 100) > 5
Now create the PrometheusRule to set up the alert.
Create a YAML file that defines a new PrometheusRule containing the query (in the expr element).
tee ~/PrometheusRule.yaml > /dev/null << EOF apiVersion: monitoring.coreos.com/v1 kind: PrometheusRule metadata: labels: prometheus: prometheus-operator-prometheus role: alert-rules name: prometheus-viya-rules namespace: v4mmon spec: groups: - name: custom-viya-alerts rules: - alert: ViyaMemoryUsage annotations: description: Total SAS Viya namespace container memory usage is more than 5% of total memory capacity. summary: SAS Viya container high memory usage runbook: https://gelgitlab.race.sas.com/GEL/workshops/PSGEL260-sas-viya-4.0.1-administration/-/blob/master/04_Observability/images/runbook.md expr: ((sum by (node) (container_memory_usage_bytes{namespace="gelcorp",pod=~"sas-.+"})) / (sum by (node) (kube_node_status_capacity{resource="memory"})) * 100) > 5 labels: severity: critical EOF
Apply the rule to the namespace.
kubectl create --filename ~/PrometheusRule.yaml -n v4mmon
View the output
prometheusrule.monitoring.coreos.com/prometheus-viya-rules created
Define a routing tree
Firing alerts can send alert notifications to nominated people to let them know that an alert condition has been met. Routing, which is the process that defines who gets notified and how, is specified in the Alertmanager configuration in the form of a routing tree. A routing tree defines receivers (the persons or channels to whom alert notifications are delivered) and routes (the conditions that determine which receiver specific alert notifications are sent to).
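For illustration only (this is not used in the exercise), a routing tree with a child route might look like the hypothetical snippet below, where alerts labelled severity=critical go to a pager receiver and everything else falls through to the default email receiver. All names and addresses here are made up, and depending on your Alertmanager version the newer matchers syntax may be preferred over match:

route:
  receiver: default-email            # fallback receiver for all alerts
  routes:
    - match:
        severity: critical           # child route matches only critical alerts
      receiver: oncall-pager
receivers:
  - name: default-email
    email_configs:
      - to: admins@example.com       # hypothetical address
  - name: oncall-pager
    webhook_configs:
      - url: https://example.com/pager   # hypothetical endpoint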
Define a routing tree to send all firing alerts to a receiver called viya-admins-email-alert, and configure this receiver to send alert notifications to the cloud-user’s email address.
Create a YAML file containing the necessary configuration to define the routing of the alert (when it fires) to cloud-user@localhost.com.
tee ~/alertmanager.yaml > /dev/null << EOF global: smtp_smarthost: $(hostname -f):1025 smtp_from: 'alertmanager@gelcorp.com' smtp_require_tls: false resolve_timeout: 5m route: receiver: viya-admins-email-alert group_wait: 30s group_interval: 5m repeat_interval: 12h receivers: - name: viya-admins-email-alert email_configs: - to: cloud-user@localhost.com headers: Subject: 'Prometheus AM Alert Triggered' send_resolved: true require_tls: false EOF
The values defined in the global section of this file contain connection information for the default local mail server. The route section does not contain any child routes and does not perform any label matching or filtering, so all firing alerts are routed to the single receiver, viya-admins-email-alert.
The Prometheus AlertManager configuration is stored as a secret (alertmanager-v4m-alertmanager) in the v4mmon namespace. For the configuration to be updated with the contents of the YAML file, the secret must be updated.
First, encode the YAML with base64 encoding. Run the command below to store the encoded string in a variable.
encodedamcfg=$(cat ~/alertmanager.yaml | base64 -w0)
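To sanity-check that the encoded string round-trips correctly, you can decode it and compare it with the original file; no output from diff means the two match:

echo $encodedamcfg | base64 -d | diff - ~/alertmanager.yaml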
The resulting encoded string must now be added (in the alertmanager.yaml element) to a new YAML file, alertmanager-secret.yaml, as shown below, in order to update the alertmanager-v4m-alertmanager secret.
The command below inserts the encoded value of the encodedamcfg variable into the new YAML file.
tee ~/alertmanager-secret.yaml > /dev/null << EOF apiVersion: v1 data: alertmanager.yaml: $(echo $encodedamcfg) kind: Secret metadata: name: alertmanager-v4m-alertmanager namespace: v4mmon type: Opaque EOF
Update the secret.
kubectl apply -f ~/alertmanager-secret.yaml -n v4mmon
View the output
Warning: resource secrets/alertmanager-v4m-alertmanager is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically. secret/alertmanager-v4m-alertmanager configured
Check that the configuration has been updated using the
amtool
CLI deployed in the Alertmanager pod.# get alertmanager url amurl=$(gellow_urls | grep "Alert Manager"|awk '{print $6}') # check config kubectl -n v4mmon exec -it alertmanager-v4m-alertmanager-0 -- amtool --alertmanager.url=$amurl config show
The new configuration may take a minute to take effect. When it does, the output will appear as follows, with the new receiver defined:
global: resolve_timeout: 5m http_config: follow_redirects: true smtp_from: alertmanager@gelcorp.com smtp_hello: localhost smtp_smarthost: pdcesx02109.race.sas.com:1025 smtp_require_tls: false pagerduty_url: https://events.pagerduty.com/v2/enqueue opsgenie_api_url: https://api.opsgenie.com/ wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/ victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/ telegram_api_url: https://api.telegram.org route: receiver: viya-admins-email-alert continue: false group_wait: 30s group_interval: 5m repeat_interval: 12h receivers: - name: viya-admins-email-alert email_configs: - send_resolved: false to: cloud-user@localhost.com from: alertmanager@gelcorp.com hello: localhost smarthost: pdcesx02109.race.sas.com:1025 headers: From: alertmanager@gelcorp.com Send_resolved: "true" Subject: Prometheus AM Alert Triggered To: cloud-user@localhost.com html: '{{ template "email.default.html" . }}' require_tls: false templates: []
Test the new route using
amtool
by simulating an alert being triggered.kubectl -n v4mmon exec -it alertmanager-v4m-alertmanager-0 -- \ amtool --alertmanager.url=$amurl config routes test -v severity=high
Expected output:
viya-admins-email-alert
Since you now only have one route in the Alertmanager configuration, all alerts, regardless of label values, will be sent to the sole receiver when they begin firing.
Manage firing alerts
Check to see if the alert correctly fires, and that AlertManager sends an email notification when it does. Note that because the threshold for the alert condition was set so low (5%), it will begin firing almost immediately.
In the workshop environment, an email client (Evolution) has been
installed and configured to receive emails sent to
cloud-user@localhost.com.
Open Evolution by launching it directly from MobaXterm on sasnode01.
evolution
Verify that an alert notification email has been sent, and that the ViyaMemoryUsage alert appears in the list of triggered alerts.
Note that the alert notification email displays all firing alerts, because we only defined one route and one receiver in the routing tree, and they become the defaults for all alerts.
Click the link to View in AlertManager at the top of the notification email. This will open the AlertManager UI in your browser.
Expand the list of “Not grouped” alerts and find the ViyaMemoryUsage alert.
Why are there multiple alerts firing for this rule? (Hint: click Info to view additional details.)
Silence the alert (firing for intnode01) by clicking on the Silence button. Set a 48 hour silence for the alert firing for intnode01.
- Enter your name in the Creator field.
- Remove the “node=intnode01” matcher (to silence the alert for any node for which it is firing) by clicking the trashcan icon.
- Enter a comment in the Comment field.
Click Create.
On the silence confirmation page, note that 5 alerts have been silenced (one for each node).
Head back to the Alerts page and verify the silence has taken effect (the alert is no longer firing).
Close the browser tab and the Evolution mail client.
SAS Viya Administration Operations
Lesson 11, Section 0 Exercise: Troubleshoot Issues
If you have not completed the rest of the course, please follow the instructions in 01_Introduction/01_901_Fast-Forward_Instructions to run the exercise solutions for Chapters 2 and 3.
Issue 1
Symptoms and Description: Users report that their VA reports are not working, and power users indicate they cannot access their data in CAS.
The Problem
Create the problem
bash -c "/home/cloud-user/PSGEL260-sas-viya-4.0.1-administration/10_Troubleshooting/scripts/issue001_create.sh"
Open SAS Drive using the generated link and logon as geladm:lnxsas
gellow_urls | grep "SAS Drive"
Navigate to
/ Products / SAS Visual Analytics / Samples
and open the 'Retail Insights' report. What happens?
Troubleshooting
I can fix it!
- Develop a strategy for how you will fix the problem, then try and implement your fix.
Ask me some questions to guide me through the problem identification and resolution
Click here to be asked some questions.
- Does the error indicate what the next step should be?
- Is the CAS Server running?
- What is the status of CAS related pods?
- Can you view the logs of the CAS Server?
- Can you view details of the CAS controller pod?
- Can you restart the CAS Server?
Guide me through the process
Click here to get a guided troubleshooting process.
Logon to Environment Manager as geladm:lnxsas. Select Servers. What do you see in relation to the CAS Server?
What Viya services could be the problem? Use kubectl to get the pods that are managed by the CAS operator.
kubectl get pod --selector='app.kubernetes.io/instance=default'
Expected output:
NAME READY STATUS RESTARTS AGE sas-cas-server-default-controller 0/3 Pending 0 8m8s
Looks like the CAS controller is Pending. A Pending pod is waiting to be scheduled on a node, or for at least one of its containers to initialize. Perform a describe on the CAS controller and review the Events section of the output.
kubectl describe pod --selector='app.kubernetes.io/instance=default'
Expected output:
Events: Type Reason Age From Message ---- ------ ---- ---- ------- Warning FailedScheduling 3m48s default-scheduler 0/5 nodes are available: 5 Insufficient cpu. preemption: 0/5 nodes are available: 5 No preemption victims found for incoming pod. Warning FailedScheduling 105s default-scheduler 0/5 nodes are available: 5 Insufficient cpu. preemption: 0/5 nodes are available: 5 No preemption victims found for incoming pod.
Can you determine the issue from the message? The pod cannot find a node with enough CPU to start the CAS controller. The Pending status usually means that Kubernetes cannot find a place to start the pod because of resource issues: disk, memory, or CPU.
Have I run out of CPU on my nodes? It doesn't look like it, but in real life this could be the problem.
kubectl top nodes
Expected output:
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY% intnode01 1830m 22% 39795Mi 62% intnode02 714m 8% 33957Mi 52% intnode03 1133m 14% 24021Mi 37% intnode04 1585m 19% 28259Mi 44% intnode05 2342m 29% 20552Mi 32%
How much CPU is CAS asking for from Kubernetes? The command below shows the requests settings for each container in the CAS pod. Kubernetes looks for a node that can satisfy the sum of the CPU requests and memory requests defined for the pod.
kubectl describe pod --selector='app.kubernetes.io/managed-by=sas-cas-operator' | grep "Requests:" -A3 -B7
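To see how much headroom each node really has, comparing each node's allocatable capacity with what is already requested can help; the grep context below may need adjusting, but something like this works on most clusters:

kubectl describe nodes | grep -A 8 "Allocated resources"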
The CAS container is requesting too much CPU. To fix it, you would either adjust the requests settings for the CAS server or make sure you have nodes available that can satisfy the CPU request. This is obviously a problem we created for you. Please proceed to the Fix it section.
Fix it
Run the following script to fix the problem.
bash -c "/home/cloud-user/PSGEL260-sas-viya-4.0.1-administration/10_Troubleshooting/scripts/issue001_fix.sh"
Issue 2
The Problem
Symptoms and Description: Users report they cannot run a program in SAS Studio
Create the problem
bash -c "/home/cloud-user/PSGEL260-sas-viya-4.0.1-administration/10_Troubleshooting/scripts/issue002_create.sh"
Open SAS Studio using the generated link and logon as geladm:lnxsas
gellow_urls | grep "SAS Studio"
Can you submit any SAS Code?
Troubleshooting
I can fix it!
- Develop a strategy for how you will fix the problem, then try and implement your fix.
Ask me some questions to guide me through the problem identification and resolution
Click here to be asked some questions.
- Can you logon to SAS Studio as an administrator and run a SAS Program? What happens?
- Is there a message that helps you understand what is wrong?
- What log can you check to see what is going on?
- Does the message in the log help you?
- Where will you fix the problem?
Guide me through the process
Click here to get a guided troubleshooting process.
Logon to SAS Studio as geladm:lnxsas.
In a terminal window on sasnode01, find the launcher pod owned by geladm.
kubectl get pod -l launcher.sas.com/requested-by-client=sas.studio,launcher.sas.com/username=geladm
View the log from the pod. Is there any useful information?
kubectl logs -l launcher.sas.com/requested-by-client=sas.studio,launcher.sas.com/username=geladm | klog
Looks like a SAS Session cannot start.
Defaulted container "sas-programming-environment" out of: sas-programming-environment, sas-certframe (init), sas-config-init (init) ERROR 2023-05-17 15:36:29.217 +0000 [compsrv] - ERROR: (SASXKRIN): KERNEL RESOURCE INITIALIZATION FAILED. ERROR 2023-05-17 15:36:29.217 +0000 [compsrv] - ERROR: Unable to initialize the SAS kernel. INFO 2023-05-17 15:36:29.253 +0000 [compsrv] - Request [00000002] >> GET /compute/sessions/6c000a0a-fcfd-40ba-9e55-4cac56ae6dd8-ses0000/state INFO 2023-05-17 15:36:29.253 +0000 [compsrv] - Response [00000002] << HTTP/1.1 200 OK INFO 2023-05-17 15:36:29.408 +0000 [compsrv] - Request [00000003] >> POST /compute/sessions/6c000a0a-fcfd-40ba-9e55-4cac56ae6dd8-ses0000/jobs ERROR 2023-05-17 15:36:29.409 +0000 [compsrv] - The session requested is currently in a failed or stopped state. INFO 2023-05-17 15:36:29.409 +0000 [compsrv] - Response [00000003] << HTTP/1.1 400 Bad Request INFO 2023-05-17 15:36:29.410 +0000 [compsrv] - Header [00000003] << Content-Type: application/vnd.sas.error+json;version=2;charset=utf-8 INFO 2023-05-17 15:36:29.410 +0000 [compsrv] - Header [00000003] << Content-Length: 412 INFO 2023-05-17 15:36:29.410 +0000 [compsrv] - Data [00000003] << {"details":["ERROR: Unrecognized SAS option name YOUBROKEIT.","ERROR: (SASXKRIN): KERNEL RESOURCE INITIALIZATION FAILED.","ERROR: Unable to initialize the SAS kernel."],"errorCode":5113,"errors":[],"httpStatusCode":400,"id":"","links":[],"message":"The session requested is currently in a failed or stopped state.","remediation":"Correct the errors in the session request, and create a new session.","version":2}
A key piece of information from the log is "ERROR: Unrecognized SAS option name YOUBROKEIT." followed by "ERROR: (SASXKRIN): KERNEL RESOURCE INITIALIZATION FAILED." SAS options are set in the SAS config or SAS autoexec. In SAS Viya, these files are modified with SAS Environment Manager. (See this blog post.)
Sign in to SAS Environment manager as geladm:lnxsas.
- In the vertical navigation bar, select Configuration.
- Using the View: drop-down list, choose Definitions and select sas.compute.server.
- Click the edit button next to Compute service: configuration_options.
- Edit the configuration to fix the problem.
Test the fix by logging out and logging in again to SAS Studio.
Fix it
Run the following script to fix the problem, or skip this step if you fixed it yourself.
bash -x "/home/cloud-user/PSGEL260-sas-viya-4.0.1-administration/10_Troubleshooting/scripts/issue002_fix.sh"
Issue 3
Symptoms and Description: The Viya administrator is trying to make a configuration change in the environment. The process is failing. Can you help?
The Problem
Create the problem
bash -c "/home/cloud-user/PSGEL260-sas-viya-4.0.1-administration/10_Troubleshooting/scripts/issue003_create.sh"
Run the
orchestrate deploy
command and review the output.cd ~/project/deploy rm -rf /tmp/${current_namespace}/deploy_work/* source ~/project/deploy/.${current_namespace}_vars docker run --rm \ -v ${PWD}/license:/license \ -v ${PWD}/${current_namespace}:/${current_namespace} \ -v ${HOME}/.kube/config_portable:/kube/config \ -v /tmp/${current_namespace}/deploy_work:/work \ -e KUBECONFIG=/kube/config \ --user $(id -u):$(id -g) \ \ sas-orchestration \ deploy --namespace ${current_namespace} \ --deployment-data /license/SASViyaV4_${_order}_certs.zip \ --license /license/SASViyaV4_${_order}_license.jwt \ --user-content /${current_namespace} \ --cadence-name ${_cadenceName} \ --cadence-version ${_cadenceVersion} \ --image-registry ${_viyaMirrorReg}
What happens?
Troubleshooting
I can fix it!
- Develop a strategy for how you will fix the problem, then try and implement your fix.
Ask me some questions to guide me through the problem identification and resolution
Click here to be asked some questions.
- Can you use the
sas-orchestration deploy
command to apply the change? - What message is returned?
- Can you manually build a Kubernetes manifest?
Guide me through the process
Click here to get a guided troubleshooting process.
Run the
orchestrate deploy
command and review the output.cd ~/project/deploy rm -rf /tmp/${current_namespace}/deploy_work/* source ~/project/deploy/.${current_namespace}_vars docker run --rm \ -v ${PWD}/license:/license \ -v ${PWD}/${current_namespace}:/${current_namespace} \ -v ${HOME}/.kube/config_portable:/kube/config \ -v /tmp/${current_namespace}/deploy_work:/work \ -e KUBECONFIG=/kube/config \ --user $(id -u):$(id -g) \ \ sas-orchestration \ deploy --namespace ${current_namespace} \ --deployment-data /license/SASViyaV4_${_order}_certs.zip \ --license /license/SASViyaV4_${_order}_license.jwt \ --user-content /${current_namespace} \ --cadence-name ${_cadenceName} \ --cadence-version ${_cadenceVersion} \ --image-registry ${_viyaMirrorReg}
The output from the
sas-orchestration deploy
command notes "Error accumulating resources". This usually means there is a problem in the kustomization.yaml file that prevents the manifest from being built. If you look closely at the message, you will see the text "overlays/cas-servers: no such file or directory, get: invalid source string: sas-bases/overlays/cas-servers". TIP: A simple way to test whether your kustomization.yaml has errors is to use kustomize to do a manual build and see if it is successful. Make sure you output the manifests to a temporary location outside of your project directory.
cd ~/project/deploy/gelcorp kustomize build -o /tmp/site.yaml
Expected output:
Error: accumulating resources: accumulateFile "accumulating resources from 'sas-bases/overlays/cas-servers': evalsymlink failure on '/home/cloud-user/project/deploy/gelcorp/sas-bases/overlays/cas-servers' : lstat /home/cloud-user/project/deploy/gelcorp/sas-bases/overlays/cas-servers: no such file or directory", loader.New "Error loading sas-bases/overlays/cas-servers with git: url lacks host: sas-bases/overlays/cas-servers, dir: evalsymlink failure on '/home/cloud-user/project/deploy/gelcorp/sas-bases/overlays/cas-servers' : lstat /home/cloud-user/project/deploy/gelcorp/sas-bases/overlays/cas-servers: no such file or directory, get: invalid source string: sas-bases/overlays/cas-servers"
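To locate the offending reference in your kustomization file (the kustomize error suggests it lives there), a simple grep should point at the line that needs editing:

grep -n "cas-servers" ~/project/deploy/gelcorp/kustomization.yaml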
The paths in the console output from the
sas-orchestration deploy
command are valid inside the running Docker container. The paths in the message from the manual kustomize build point to the actual source files inside our project directory. From the message we can see there is a problem with the reference to /home/cloud-user/project/deploy/gelcorp/sas-bases/overlays/cas-servers. In this case it is a typo: it should be cas-server. To fix the problem, edit the kustomization.yaml file and repeat step 1 to run the
orchestration deploy
command. It should now work.
Fix it
Run the following script to fix the problem.
bash -c "/home/cloud-user/PSGEL260-sas-viya-4.0.1-administration/10_Troubleshooting/scripts/issue003_fix.sh"
Issue 4
Symptoms and Description: SAS jobs submitted in batch are failing to complete successfully. Can you help?
The Problem
Create the problem
bash -c "/home/cloud-user/PSGEL260-sas-viya-4.0.1-administration/10_Troubleshooting/scripts/issue004_create.sh"
Try running a batch job.
sas-viya batch jobs submit-pgm --pgm /home/cloud-user/PSGEL260-sas-viya-4.0.1-administration/files/code/doWork1mins.sas -c default
What happens?
Troubleshooting
I can fix it!
- Develop a strategy for how you will fix the problem, then try and implement your fix.
Ask me some questions to guide me through the problem identification and resolution
Click here to be asked some questions.
- Do interactive jobs work? Can you run code in SAS Studio? What queue do these interactive jobs use?
- At what point do the jobs fail?
- Can you run a job using a different context or queue? Are there any differences?
- Does the status of the job in Jobs tab of SAS Environment Manager’s Workload Orchestrator area provide any clues?
If you have identified the problem you can move onto the Fix it section.
Guide me through the process
Click here to get a guided troubleshooting process.
Check to see if interactive jobs work. First, get the URL for SAS Studio.
gellow_urls | grep "SAS Studio"
Log on as
geladm:lnxsas
. Note that you successfully establish a connection to the SAS Studio compute context. Try running some code:
data work.large_cars; set sashelp.cars; do i = 1 to 1000; /* Replicate the dataset 1000 times */ output; end; run; proc sort data=work.large_cars out=work.sorted_cars; by make model type; run;
After a while, perhaps even before you can execute the code, an error is displayed:
The issue therefore seems to be affecting interactive jobs as well as batch jobs.
Remember that all SAS Compute workloads are submitted as jobs to SAS Workload Orchestrator queues. Check which queue the SAS Studio compute context is using by going to SAS Environment Manager using the Manage Environment link from the navigation menu.
Click on Contexts and select Compute contexts from the dropdown box. Click on the SAS Studio compute context to view its properties.
Note that there is no value for SAS Workload Orchestrator queue, which tells us that this context will send jobs to the default queue.
Investigate the failing batch jobs further. Note that the failing jobs are submitted using the default context, but fail after several seconds. The command does not specify a queue, which means they too are added to the default queue.
If you created an adhoc queue in an earlier exercise, try submitting jobs to that queue.
sas-viya batch jobs submit-pgm --pgm /home/cloud-user/PSGEL260-sas-viya-4.0.1-administration/files/code/doWork1mins.sas -c default -q adhoc
Do you see the same behaviour?
Compare the default queue with queues created earlier using the CLI.
sas-viya workload-orchestrator queues list
Is there anything in the queue configurations that may explain the behaviour you are seeing?
Return to SAS Environment Manager and navigate to the Workload Orchestrator area's Jobs page (or use the CLI's workload-orchestrator plugin to view the jobs).
The failed jobs have a state of
KILLED-LIMIT
. This gives us a clue as to why the jobs are being terminated.Did you see anything about limits in the default queue configuration?
Click on one of the failed job IDs to view more information. Click on the Limits page.
Note that the current value for the maxClockTime resource is greater than the defined maximum value of 10.
This reflects the limit defined in the queue configuration for the default queue seen earlier.
... "queues": [ { "isDefaultQueue": true, "limits": [ { "name": "maxClockTime", "value": 10 } ], "maxJobs": -1, "maxJobsPerHost": -1, "maxJobsPerUser": -1, "name": "default", "priority": 10, "scalingMinJobs": -1, "scalingMinSecs": -1, "tenant": "uaa", "willRestartJobs": false } ],
With this limit defined, all jobs in the queue will fail after 10 seconds.
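Assuming the CLI returns the JSON structure shown above (a top-level queues array), a jq filter can pull out just the default queue's limits for a quick check; if your CLI profile does not already output JSON, you may need to request JSON output first:

sas-viya workload-orchestrator queues list | jq '.queues[] | select(.isDefaultQueue == true) | .limits'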
Fix the problem by removing the limit. In SAS Environment Manager’s Workload Orchestrator area, click on the Configuration tab, and then click Queues.
Expand the default queue, and scroll to the bottom to view the defined limit. Click the trashcan icon to delete the maxClockTime limit (or increase it).
Click the Save button to apply the change.
Try submitting another batch job to validate.
sas-viya batch jobs submit-pgm --pgm /home/cloud-user/PSGEL260-sas-viya-4.0.1-administration/files/code/doWork1mins.sas -c default
This time, the job will finish executing successfully.
Fix it
Run the following script to fix the problem.
bash -c "/home/cloud-user/PSGEL260-sas-viya-4.0.1-administration/10_Troubleshooting/scripts/issue004_fix.sh"
Issue 5
Symptoms and Description: CAS Management Service is not available from SAS Environment Manager
The Problem
Create the problem
bash -x "/home/cloud-userPSGEL260-sas-viya-4.0.1-administration/10_Troubleshooting/scripts/issue005_create.sh"
Troubleshooting
I can fix it!
- Develop a strategy for how you will fix the problem, then try and implement your fix.
Ask me some questions to guide me through the problem identification and resolution
Click here to be asked some questions.
- Does the error indicate what the next step should be?
- What is the status of the CAS control pod?
- What is the status of CAS related pods?
- Can you view the logs of the CAS Server?
- Can you view details of the CAS controller pod?
- Can you restart the CAS Server?
Guide me through the process
Click here to get a guided troubleshooting process.
Logon to Environment Manager as geladm:lnxsas. Select Servers. What do you see in relation to the CAS Server?
The pod that services the CAS Management service is sas-cas-control. Check the status of CAS control.
kubectl get pods -l app=sas-cas-control
Expected output:
NAME READY STATUS RESTARTS AGE sas-cas-control-69c657fd7c-lsdng 0/1 Running 0 136m
The pod is running, but its container is not ready (0/1). Do a describe of CAS control and review the Events section. This shows the pod is not ready, but with no clear reason why.
kubectl describe pod -l app=sas-cas-control
Expected output:
Warning Unhealthy 3m9s (x317 over 86m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 503
The next step is to look at the log of CAS control. Now we have more information: it appears there is no ready CAS server.
kubectl logs -l app=sas-cas-control | gel_log
Expected output:
INFO 2024-09-05T16:58:56.764673+00:00 [sas-cas-control]- no ready CAS servers and no shutdown CAS servers found, so cas-control is not ready INFO 2024-09-05T16:59:13.448135+00:00 [sas-cas-control]- no ready CAS servers, so cas-control is not ready INFO 2024-09-05T16:59:13.448172+00:00 [sas-cas-control]- checking for shutdown CAS servers INFO 2024-09-05T16:59:13.520353+00:00 [sas-cas-control]- no ready CAS servers and no shutdown CAS servers found, so cas-control is not ready INFO 2024-09-05T16:59:33.433562+00:00 [sas-cas-control]- no ready CAS servers, so cas-control is not ready INFO 2024-09-05T16:59:33.433598+00:00 [sas-cas-control]- checking for shutdown CAS servers
Use kubectl to get the pods that are managed by the CAS operator. Notice that the status of sas-cas-server-default-controller shows Init:0/2. In Kubernetes, the status Init:0/2 indicates that the pod has two init containers and neither of them has completed successfully yet. (Init containers perform startup tasks that must complete before the main application containers start.) The output here indicates there is a problem starting the CAS server.
kubectl get pod --selector='app.kubernetes.io/instance=default'
Expected output:
NAME READY STATUS RESTARTS AGE sas-cas-server-default-controller 0/3 Init:0/2 0 93m
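To see exactly which init container is stuck and in what state, you can query the init container statuses directly; this jsonpath expression uses the standard Pod status fields and prints each init container's name and raw state:

kubectl get pod sas-cas-server-default-controller -o jsonpath='{range .status.initContainerStatuses[*]}{.name}{"\t"}{.state}{"\n"}{end}'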
For more information perform a describe on the CAS controller. Review the events section of the output.
kubectl describe pod --selector='app.kubernetes.io/instance=default'
Expected output:
Events: Type Reason Age From Message ---- ------ ---- ---- ------- Warning FailedMount 88s (x48 over 82m) kubelet MountVolume.SetUp failed for volume "sas-viya-gelcorp-volume" : mount failed: exit status 32 Mounting command: mount Mounting arguments: -t nfs mynfs.sco.com:/shared/gelcontent /var/lib/kubelet/pods/1ef1388d-2913-4e00-ac29-25c665d7abf6/volumes/kubernetes.io~nfs/sas-viya-gelcorp-volume Output: mount.nfs: Failed to resolve server mynfs.sco.com: Name or service not known
Can you determine the issue from the message? It looks like a mount command is failing. For further debugging, we could check the event log.
kubectl get events | grep sas-cas
Expected output:
47s Warning Unhealthy pod/sas-cas-control-69c657fd7c-lsdng Readiness probe failed: HTTP probe failed with statuscode: 503 58s Warning FailedMount pod/sas-cas-server-default-controller MountVolume.SetUp failed for volume "sas-viya-gelcorp-volume" : mount failed: exit status 32...
The message from the kubectl describe provides the best clue to the issue:
mount.nfs: Failed to resolve server mynfs.sco.com: Name or service not known
. The mount of the shared storage inside the pod is failing because the pod cannot access mynfs.sco.com. The events command indicates that the name of the failing volume is "sas-viya-gelcorp-volume".
Fix it
The problem for us was created by entering an unknown host as the hostname of the NFS server. We can fix it by setting the correct hostname. Run the following script to fix the problem.
bash -x "/home/cloud-userPSGEL260-sas-viya-4.0.1-administration/10_Troubleshooting/scripts/issue005_fix.sh"