Here at Tomorrow.io, we lean on fantastic open source tools such as JupyterHub to boost scientist productivity and support science and data science applications. Our preferred way to deploy and host a JupyterHub instance that all of our R&D team can use these days is to run a custom flavored “Pangeo cluster” on top of Google Kubernetes Engine, with bells and whistles for providing the team with whatever resources they may need to tackle big weather and climate data problems.
But as an enterprise company, we take security very seriously. So just how do you secure a Pangeo GKE cluster on an untrusted network with no VPN client while allowing your users to authenticate seamlessly using their regular SSO credentials? With Identity Aware Proxy of course!
Identity Aware Proxy (IAP) is a Google Cloud-specific solution that enables OAuth-based authentication to HTTPS or SSH/TCP resources. Here I am using IAP in two contexts:
- The first is to secure the Pangeo cluster with IAP from the ground up using an open source Terraform module that creates a locked-down cluster. This cluster relies on an IAP-enabled bastion host running an HTTPS proxy for kubectl access.
- The second is to secure Pangeo itself with IAP, which requires offloading to an external HTTPS load balancer secured with a Google Managed Certificate.
The end result is an authentication screen that prompts users for credentials before they view the HTTPS-secured Pangeo landing page.
Hardened Private GKE Cluster
The “Safer Cluster” follows Google’s published recommendations for hardening GKE clusters. There were three main consequences of using this secured private cluster.
The first consequence was that kubectl and helm must be used through the proxy, as documented on the Terraform module page. I had to be logged in through the gcloud utility (which provides the IAP authentication) and keep a persistent SSH connection to the bastion host open in one of my terminal screens as I worked. I also had to set an environment variable in the terminal where I ran kubectl:
export HTTPS_PROXY=localhost:8888
Once I did that, I could use kubectl and helm as usual, although intermittently they would fail with a connection error; closing and reopening the SSH connection to the bastion host resolved this. One caveat: this environment variable interfered with other commands that use a proxy, such as gcloud auth login, which I resolved by unsetting the variable.
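Putting the workflow together, it looks roughly like the following. This is a sketch, not our exact setup: the bastion host name, project, and zone are placeholders you would replace with your own values from the Terraform module's outputs.

```shell
# Log in with gcloud -- this is what satisfies the IAP authentication.
gcloud auth login

# Keep this SSH tunnel to the bastion open in a dedicated terminal.
# It forwards local port 8888 to the HTTPS proxy running on the bastion.
gcloud compute ssh my-bastion-host \
  --tunnel-through-iap \
  --project my-gcp-project \
  --zone us-central1-a \
  -- -L 8888:127.0.0.1:8888

# In the terminal where you run kubectl and helm:
export HTTPS_PROXY=localhost:8888
kubectl get pods

# Unset it before running commands it interferes with, e.g. gcloud auth login:
unset HTTPS_PROXY
```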
The second consequence of using the secure cluster was the hardened Pod Security Policy. The advantage here is that no privileged pods can be created. However, when I naively used Helm to install Pangeo without understanding this, services could not launch any pods at all, even unprivileged ones. I had to create a Pod Security Policy that allowed the creation of unprivileged pods, then create a cluster role and cluster role binding that permitted all authenticated users to use this policy.
podsecuritypolicy.yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
spec:
  privileged: false  # Don't allow privileged pods!
  # The rest fills in some required fields.
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:
  - '*'
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: psp:restricted
rules:
- apiGroups:
  - extensions
  resources:
  - podsecuritypolicies
  resourceNames:
  - restricted  # the psp we are giving access to
  verbs:
  - use
---
# This applies psp/restricted to all authenticated users
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: psp:restricted
subjects:
- kind: Group
  name: system:authenticated  # All authenticated users
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: psp:restricted  # A reference to the role above
  apiGroup: rbac.authorization.k8s.io
Then services were able to create pods as usual.
The third main consequence of using the hardened cluster is that it does not work with SaaS continuous deployment providers that operate over the public internet, such as Terraform Cloud. Configuring CD for this setup is outside the scope of this document.
Securing the Pangeo Application with IAP
For this, I relied in part on Google’s documentation; however, several of the necessary steps are missing from it. A blog post from DoiT about securing Vault includes helpful information on IAP about halfway down.
Summary of the steps for enabling IAP for Pangeo:
These first five steps are documented in Google’s documentation:
- Create an OAuth consent screen
- Create OAuth credentials with a client ID and secret
- Enable IAP for the Google project
- Create a Kubernetes secret with the OAuth credential client ID and secret
- Create a Backend Config referencing this Kubernetes secret
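As a sketch, the Kubernetes secret and BackendConfig from the last two steps might look like the following. The names iap-backend-config and my-oauth-secret are placeholders, not the names from our deployment.

```yaml
# The secret holding the OAuth client ID and secret can be created with:
#   kubectl create secret generic my-oauth-secret \
#     --from-literal=client_id=YOUR_CLIENT_ID \
#     --from-literal=client_secret=YOUR_CLIENT_SECRET
apiVersion: cloud.google.com/v1beta1
kind: BackendConfig
metadata:
  name: iap-backend-config
spec:
  iap:
    enabled: true
    oauthclientCredentials:
      secretName: my-oauth-secret
```

The BackendConfig name is what the proxy-public service annotation references later.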
These next steps were not part of the Google documentation but were validated by an expert from DoiT:
- Create a global static IP address (not regional)
- Create a DNS record pointing to the IP address
- Configure the Pangeo proxy-public service to be of type NodePort with an annotation referencing the Backend Config
- Create a Google Managed Certificate
- Create an Ingress with an annotation referencing the Google Managed Certificate and the global static IP address
- Validate the domain ownership with Google Webmaster Tools
- Once the domain is validated, delete and recreate the ingress and the certificate
- Add Users as the IAP secured web users for the Ingress
These steps were specific to configuring Pangeo to work with IAP and HTTPS:
- Configure the Dask Gateway Address to point to the internal service, not the external domain
- Ensure that both Dask (>=2.19.0) and Bokeh (>=2.1.1) are up to date in the image
Create a global static IP address
Follow https://cloud.google.com/compute/docs/ip-addresses/reserve-static-external-ip-address and make a note of the name of the IP address so it can be associated with the Ingress.
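For reference, reserving and inspecting the address can be done from the CLI; the address name below is a placeholder.

```shell
# Reserve a global (not regional) static IP. The name is what the
# Ingress annotation will reference.
gcloud compute addresses create pangeo-static-ip --global

# Look up the reserved address so you can point DNS at it.
gcloud compute addresses describe pangeo-static-ip --global
```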
Create a DNS record pointing to the IP address
Follow https://cloud.google.com/dns/docs/records/ to create a DNS record that points to the static IP address created above.
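If your zone is hosted in Cloud DNS, a sketch of the record creation (zone name, domain, and IP address are placeholders):

```shell
# Add an A record pointing your domain at the reserved static IP.
gcloud dns record-sets transaction start --zone=my-zone
gcloud dns record-sets transaction add 203.0.113.10 \
  --name=pangeo.example.com. --ttl=300 --type=A --zone=my-zone
gcloud dns record-sets transaction execute --zone=my-zone
```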
Configure the Pangeo proxy-public service
This should be done with Helm. Enable HTTPS with type set to offload and hosts set to an array containing your domain. Set the proxy service type to NodePort and add the annotation referencing the IAP Backend Config.
pangeo:
  jupyterhub:
    proxy:
      https:
        enabled: true
        type: offload
        hosts:
        - pangeo.example.com
      service:
        type: NodePort
        labels: {}
        annotations:
          beta.cloud.google.com/backend-config: '{"default": "iap-backend-config"}'
Create a Google Managed Certificate
I added this to our helm deployment:
---
apiVersion: networking.gke.io/v1beta1
kind: ManagedCertificate
metadata:
  name: {{ .Values.cert_name }}
spec:
  domains:
  - {{ .Values.domain }}
Be aware that both the certificate and the Ingress take some time to provision; they will not work immediately after deployment.
Create an Ingress
At first I used the Ingress YAML file that ships with JupyterHub to create the Ingress, but the path is hard-coded into the Helm template file, which caused a 404 when it came up. Therefore I created my own Ingress as a Helm template:
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: pangeo-ingress
  annotations:
    kubernetes.io/ingress.global-static-ip-name: {{ .Values.static_ip_name }}
    networking.gke.io/managed-certificates: {{ .Values.cert_name }}
spec:
  backend:
    serviceName: proxy-public
    servicePort: 80
Validate the domain ownership
Navigate to https://www.google.com/webmasters/verification/home?hl=en and click “Add a Property”. Enter the domain you created. Click “Alternate Methods” and select “Domain Name Provider”, then select Google Domains. It will give you a TXT record to add to your DNS. Add the record as documented in https://cloud.google.com/dns/docs/records/ . Wait a minute or two, then click “Verify”. If it doesn’t go through right away, wait another few minutes and try again.
Recreate the ingress and the certificate
I experienced an issue with the Certificate in which it was stuck in Provisioning state after I changed the IP address it pointed to and verified the domain. Deleting the Ingress and the Certificate and recreating them cleared this up.
Add Users as IAP-Secured Web App Users
In the Identity Aware Proxy screen under Security you should see your Ingress with IAP turned on. Select this Ingress and, in the side panel, click “Add Member”. The member can be any user or group you choose. If you skip this step, your users will be unable to log in and will get a denial screen.
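The same grant can be made from the CLI. This is a sketch: the backend service name is generated by GKE for your Ingress (you can read it off the IAP console), and the member is a placeholder.

```shell
# Grant a user access through IAP to the HTTPS resource behind the Ingress.
gcloud iap web add-iam-policy-binding \
  --resource-type=backend-services \
  --service=k8s-be-12345--abcdef \
  --member=user:scientist@example.com \
  --role=roles/iap.httpsResourceAccessor
```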
Configure the Dask Gateway Address
The Dask Gateway should not point to the external domain (in our case pangeo-dev.research.tomorrow.io) but to the internal service (in our case traefik-tomorrow-io-prod-dask-gateway.tomorrow-io-prod, where tomorrow-io-prod is the namespace) so that it bypasses the IAP challenge.
singleuser:
  extraEnv:
    # DASK_GATEWAY__ADDRESS: "https://pangeo-dev.research.tomorrow.io/services/dask-gateway/" # Wrong
    DASK_GATEWAY__ADDRESS: "http://traefik-tomorrow-io-prod-dask-gateway.tomorrow-io-prod/services/dask-gateway" # Right
To determine your service’s URL, take the service name of your Dask Gateway and combine it with the namespace it is running in.
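The in-cluster address follows the usual Kubernetes service-DNS pattern, which can be sketched with a small hypothetical helper (the service and namespace names below are illustrative placeholders, not our real ones):

```python
def dask_gateway_address(service_name: str, namespace: str) -> str:
    """Build the in-cluster Dask Gateway URL from a service name and namespace."""
    return f"http://{service_name}.{namespace}/services/dask-gateway"

# Example with placeholder names:
print(dask_gateway_address("traefik-pangeo-dask-gateway", "pangeo-prod"))
# http://traefik-pangeo-dask-gateway.pangeo-prod/services/dask-gateway
```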
Ensure that both Dask and Bokeh are up to date
While building the Docker images for Kubernetes we ran into two issues that were resolved by upgrading package versions. The first was a breaking change with Dask in 2.19.0, documented here. The second was a bug that caused the dashboard to return intermittent 404s, introduced in Bokeh 2.1.0 but resolved in 2.1.1 and documented here.
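One way to guard against regressions is to pin minimum versions in the user image build; this is a sketch for a pip-based image, and the exact install mechanism (pip, conda, environment file) will depend on your base image.

```shell
# Pin the minimum versions known to work with this setup.
pip install "dask>=2.19.0" "bokeh>=2.1.1"
```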
Now you know how IAP can be leveraged from the ground up to create a secure Kubernetes environment for Pangeo, while preserving easy access for your users.
How do you secure your Pangeo cluster?