Recently we ran in some issues when doing a training with several trainees on our SAP Datahub environment that is running on a Kubernetes cluster deployed on Azure. If you do a default deployment of Kubernetes with advanced networking, you end up with a pod limit of 30 pods per node. This is something you need to consider before installation, since it can only be set during initial deployment of the cluster and it cannot be changed afterwards (see this article).
We deployed SAP Datahub on a kubernetes cluster on Azure currently running 3 nodes (8 vcpus, 32 GB memory). For the little exercises we had forseen this should have been plenty, but still we ran into issues. Expanding the resources to 8 nodes temporarily resolved the issue, but soon we were having troubles again. Here is an explanation why.
Every node in the cluster is limited to 30 pods. When starting up a kubernetes cluster, kubernetes itself already starts a number of pods on each node to provide a number of services to the applications that will be deployed onto the cluster:
- network services
- dns services
- proxy services
- kubernets dashboard
- monitoring services
If you look at our 3 node deployment already 25 pods are taken by kuberenetes itself:
Also the datahub installation needs a considrable number of pods running to provide all the core services needed to run the environment. If I look at the pods in the kubernetes “SAPDATAHUB” (name of the namespace we where we deployed our datahub to in kuberenetes), we have 51 pods running for the core system. Actually 49, since 2 pods are there after using the sapdatahub launchpad and using the system management, which I will explain later.
25 pods used by kuberenetes and 49 pods used by means 76 pods of the 90 available on our 3 nodes deployment are already used and I haven’t even started using the application yet. Because,when you launch the Datahub launchpad a pod is started on the cluster. As soon as you start one of the datahub application by clicking on of the tiles, at least one other pods is started on the cluster.
If I start using the Connection Mangement, Meta Data Explorer, Modeler, Vora Tools and system management, you can retrieve these pods back in the Kuberenets dashboard. Using the launchpad and starting up the applications launches pods that are dedicated to the user. This means that another user will spin-up another set of pods dedicated to his user. The initial delays experienced when using the launchpad and applications for the first time as a new user are caused by the spinup of the needed pods on the cluster.
Be aware that the launchpad and application pods stay active on the cluster even if you logout of datahub. A user that already logged in before will reuse the already started pods. You will however notice you have no longer the initial startup delay.
If you start doing data Profiling via the “Meta explorer” or start executing graphs you created in the “Modeler” the demanded process will perform their execution tasks by submitting pods to the cluster. The next screenshot shows the additonal pods launched by starting a profiling job on a dataset.
In the case of the profiling, one coordinator pods is started that will stay active and dedicated to the user, the other pods will end and free the pod allocation on the kubernetes node where they ran.
Once you run into your pod-limit, the pods will no longer startup and start waiting for pod-slots to free up by pods that completed or pod-slots that become available by up-scaling the kubernetes cluster through the addition of a new node. Another way to free up some pods is by deleting application instances for some of the users using the System Management application available via Launchpad. Deleting User instances will also delete the related pod.
To work around the issue we faced during the training, we spinned up some additional nodes, but because of our pod-limits, two training users would completely allocate one VM of 8vcpu, 32 GB, leaving available resources under utilized. When looking at the cpu and memory request the concerned node only 1/3 of the cpu and memory where allocated. If you spun up your kuberenetes cluster with even a more powerful vm, the resource loss becomes even greater.
So when deploying you kubernetes cluster, you should probably consider a higher pod-limit than 30. Azure allows for a maximum pod-limit of 250, while kubernetes doesn’t recommend to go over 100.
Feel free to give any comment or feedback.