Expose Kubernetes Volumes Securely Over HTTP: How to Serve PVC on the Internet

Create Kubernetes manifests to expose PersistentVolumeClaims.

Photo by Uriel Soberanes on Unsplash

Intro

You may have encountered a situation in your daily product development where you needed to get your hands on some persisted files residing in a Kubernetes cluster. One common and safe approach is port-forwarding, whether with the help of kubectl or plain SSH through a bastion host.

In either case, after you’re done with the task, you’d terminate the session, and for every future interaction you’d repeat the same manual process.

Security-wise, it might be ideal to keep your environment as sealed as possible, giving adversaries no opening, and that is a valid reason to keep it that way.

But, if you want long-running exposure to the underlying storage out on the internet, this article is for you.

First Things First: Authentication

As this file server will be exposed publicly to the internet, your first and most important line of defense is the authentication layer. To put that into perspective, a formal definition of authentication is necessary.

Authentication is the act of proving an assertion, such as the identity of a computer system user. [source]

In layperson’s terms, authentication happens when a user of a system proves they are who they claim to be!

Now that we’ve cleared that up, let’s dig into some options for integrating authentication into our web server (described further below).

  • Using Nginx or Apache as a proxy, with the help of htpasswd, an Apache tool that allows storing an encrypted username-password pair in a file, which can later be used to verify a given password.
  • Ory Oathkeeper as a proxy, with the help of Kratos, another of Ory’s products, as the identity provider. This is somewhat more complex than the earlier approach, and mastering the configuration and provisioning of the two involves a learning curve. I will cover this in a later article, so stay tuned! 😉

Of course, you can add many more to this list, but for the sake of keeping this article short (and, honestly, because I don’t know many other solutions), I’ll stick to the two items above for the moment.

Another point worth mentioning: since this article is about exposure to the internet, I’m not covering private-network solutions here. Still, you can imagine that would also be a safe option.

Now that we know Ory’s products are not the easiest to provision, and that the author is no authentication expert 😁, let’s keep it simple and go with the first approach.

Create the htpasswd File

htpasswd is quite a simple tool for enforcing a Basic authentication mechanism on any platform. It receives a username and a password as input and writes a one-way hashed password to a file or standard output, which can later be used to verify the user’s credentials. The hash cannot be reversed (de-hashed) to the original password in a reasonable amount of time, at least not in 2023, with our current computing capacity!

For a simple demonstration, look at the snippet below.
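Since the original snippet isn’t reproduced here, the following is a minimal sketch of the idea; the file name, username, and passwords are placeholders:

```shell
# create a new credentials file with a bcrypt-hashed password for user "alice"
# (-c creates the file, -b takes the password from the command line, -B uses bcrypt)
htpasswd -cbB .htpasswd alice 's3cret'

# verify with the correct password: exits 0
htpasswd -vb .htpasswd alice 's3cret' && echo "correct password: verified"

# verify with a wrong password: exits non-zero
htpasswd -vb .htpasswd alice 'wrong' || echo "wrong password: rejected"
```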

It creates a new file for a user and then tries to verify it with both the correct password and a wrong one.

We will use the same in our "Secure File Server," exposed publicly to the internet.

Reverse Proxy

Unless you want to handle authentication in the file-server layer (I know I won’t), you’ll use a reverse proxy sitting right in front, receiving all traffic and rejecting any request with the wrong credentials. You may even add other restrictive measures, including but not limited to rate-limiting, logging, instrumentation, reporting, etc.

Apache & Nginx can both work with an htpasswd-generated file to verify credentials; see each project’s documentation on Basic authentication for more information.

I’m sure other web servers can do the same job as the two mentioned here.

In this article, I’m going to go with Nginx, and since this will be hosted in Kubernetes, it will be a Docker container running Nginx. This lets me mount any number of config files into the /etc/nginx/conf.d directory, where the Nginx web server process picks them up.

So, if I can mount any config file into that directory, I can write the config in a Kubernetes ConfigMap and mount it into the container’s desired directory. This is both powerful and quite flexible.

This is the configuration I’m about to mount into the Nginx container.
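A minimal version of that configuration might look like the sketch below; the realm name, the htpasswd path, and the upstream address (Python’s file server listening on port 8000 in the same Pod) are assumptions, not taken from the original article:

```nginx
server {
    listen 80;

    # require a valid username/password from the htpasswd file
    auth_basic           "Restricted file server";
    auth_basic_user_file /etc/nginx/htpasswd/.htpasswd;

    location / {
        # forward authenticated traffic to the static file server
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
    }
}
```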

The entry named proxy_pass you see in the configuration file points to the file server that will expose the file system’s directory using the HTTP protocol. More on this in the next section.

Photo by Carl Barcelo on Unsplash

File Server Over HTTP

There are many static web servers capable of serving a directory over HTTP.

Of course, the list of candidates could grow much longer, but we’re trying to keep this short and informative. 😇

In this article, I’ll be using Python’s built-in module: http.server. It has a simple, intuitive interface that makes it trivial to use.

The way you can serve static content with it is as below:

ADDRESS=0.0.0.0 PORT=8000 DIRECTORY=/tmp
python -m http.server -b $ADDRESS -d $DIRECTORY $PORT

This works very well, especially since you don’t need to do a lot of magic to make it work.

Having this web server running and accessible from the Nginx container means you can mount your PersistentVolumeClaims into the static web server and place the Nginx described above right in front of it, gating unauthenticated access to your precious data in the Kubernetes cluster.

Mount Kubernetes ConfigMap as Volume

Before we wrap it all up into one unified manifest, one last critical piece of this approach needs a little explanation. If you’re already a master of mounting a ConfigMap as a Volume into a container in Kubernetes, feel free to skip this section.

To mount a Kubernetes ConfigMap as a Volume, you define it in the volumes section of the Pod spec, like below [source]:
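Since the original manifest isn’t reproduced here, the following is a reconstruction consistent with the log output shown afterwards; the resource names (demo, demo-config) and the busybox image are assumptions:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: demo-config
data:
  favorite-color: red
  names.txt: |
    Alex
    Jane
    Sam
---
apiVersion: batch/v1
kind: Job
metadata:
  name: demo
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: demo
          image: busybox
          # list the mounted directory, then print both files
          command: ["sh", "-c", "ls -la /config && cat /config/favorite-color /config/names.txt"]
          volumeMounts:
            - name: config
              mountPath: /config
      volumes:
        # the ConfigMap is referenced here and mounted above
        - name: config
          configMap:
            name: demo-config
```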

Right at the same level as containers, there is a volumes section, which accepts a couple of volume types, one being configMap. This allows defining some data or script in a ConfigMap and passing it as a volume to the running container.

After creating the manifest above, here’s what the logs will show from the container.

kubectl logs job/demo -c demo
total 0      
drwxr-xr-x    2 root     root          45 Feb  4 06:58 ..2023_02_04_06_58_34.2149416564
lrwxrwxrwx    1 root     root          32 Feb  4 06:58 ..data -> ..2023_02_04_06_58_34.2149416564
lrwxrwxrwx    1 root     root          21 Feb  4 06:58 favorite-color -> ..data/favorite-color
lrwxrwxrwx    1 root     root          16 Feb  4 06:58 names.txt -> ..data/names.txt
red
Alex
Jane
Sam

Wrapping it All Together

Now that we’ve seen all the pieces one by one, it’s time to put them together into a unified manifest serving one sole purpose: a file server over HTTP.
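A sketch of what that unified manifest might look like is below, combining the Nginx proxy, the Python file server, the htpasswd file (stored in a Secret here), and the PVC. Every resource name, image tag, and the claim name my-pvc are assumptions; adapt them to your cluster:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: file-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: file-server
  template:
    metadata:
      labels:
        app: file-server
    spec:
      containers:
        # the reverse proxy enforcing Basic authentication
        - name: nginx
          image: nginx:1.25
          ports:
            - containerPort: 80
          volumeMounts:
            - name: nginx-config
              mountPath: /etc/nginx/conf.d
            - name: htpasswd
              mountPath: /etc/nginx/htpasswd
        # the static file server, bound to localhost so only Nginx reaches it
        - name: http-server
          image: python:3.11-alpine
          command: ["python", "-m", "http.server", "-b", "127.0.0.1", "-d", "/data", "8000"]
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: nginx-config
          configMap:
            name: nginx-config
        - name: htpasswd
          secret:
            secretName: htpasswd
        - name: data
          persistentVolumeClaim:
            claimName: my-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: file-server
spec:
  selector:
    app: file-server
  ports:
    - port: 80
      targetPort: 80
```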

The second file, an Ingress resource, is optional, but it’s included since this article is about publicly exposing an HTTP static web server; you will only get internet exposure if you create the Ingress.

As another safety measure, you can assign a UUID-generated value to your subdomain, so the username & password aren’t your ONLY line of defense. It can be something like this:

415cb00c-5310-4877-8e28-34b05cadc99d.example.com
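An Ingress using such a randomized host might be sketched as below; the Service name file-server is an assumption, and in practice you’d also add a tls section (or rely on a tool like cert-manager) for HTTPS:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: file-server
spec:
  rules:
    - host: 415cb00c-5310-4877-8e28-34b05cadc99d.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: file-server
                port:
                  number: 80
```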

Otherwise, you’re only as safe as your username & password, and if your brand gives away the username you’ve assigned, then you’re only as secure as your password, and that’s not the position you’d like to put yourself in!

Also, remember that you will need HTTPS. You never want to have some random fellow eavesdrop on your connection, watching you transmit precious customer data over the internet.

Photo by Haley Phelps on Unsplash

Conclusion

Since this is Kubernetes we’re talking about, there’s no tie-in to any cloud provider; you can apply this practice anywhere there’s a Kubernetes cluster.

This effectively means you can securely expose your dynamically provisioned persistent volumes to the internet!

Take the "secure" part with a grain of salt, since this is not the safest bet, but you can still protect yourself by assigning a random string to the subdomain of your Ingress. That way, an attacker would have to guess the URL out of an astronomical number of combinations, which would effectively take many years; we may all be long gone by then!

Have an excellent rest of the day. Stay tuned, and take care!


Acknowledgment

I hope you found this article helpful. Here’s a list of some of my previous work you might enjoy.

How to Set Up Ingress Controller in AWS EKS

What is HAProxy & how to get the most out of it?

How to Write Your Own GitHub Action

12-Factor App For Dummies

Stop Committing Configurations to your Source Code
