Highly Available MariaDB Galera Cluster in Bare Metal k3s Kubernetes
How Florida SouthWestern State College brought MariaDB Galera to its on-campus bare metal k3s Kubernetes cluster.
Written by Ted Tramonte
While architecting our first Kubernetes cluster at Florida SouthWestern State College (FSW), we had many decisions to make to ensure we finished with a feature-filled and production-ready environment. Perhaps the biggest decision we needed to make was how to handle our existing database. At the time, all of FSW's web applications relied on an aging MariaDB Galera cluster. After giving it some thought, we decided that running MariaDB in a container was fundamentally no different from running it directly on a host, since our backup processes could still reach the data. So, we made the choice to bring up MariaDB Galera in our cluster to provide database services to our applications.
Our goal was to have access to a highly available and replicated MariaDB Galera cluster from inside and outside of our bare metal, MetalLB load balanced k3s Kubernetes cluster. The installation was simple enough using Bitnami's MariaDB Galera Helm chart. The only thing keeping us from finishing the deployment was exposing the application to our network, and that turned out to be a problem. FSW's network is highly utilized and IP reservations are at a premium. Our MetalLB installation was configured to hand out precisely one IP address, the only one we were cleared to use, and k3s' installation of Traefik had already claimed it for its LoadBalancer service. Securing another was potentially impossible. How could we get MariaDB Galera deployed in this uniquely limited scenario?
TL;DR Solution Spoiler
We settled on MetalLB IP address sharing.
Attempt #1: Traefik
Since Traefik was already in the cluster, that was our first stop.
I like Traefik a lot, until I have to turn to their documentation. Since Traefik can talk to so many different types of infrastructure, there are pages upon pages of documentation on how to achieve a particular goal. This is aggravated by the transition from Traefik 1.x to Traefik 2.x, which left conflicting statements and Google results lying around. Overall, things are not great for the user who just needs to get things done.
My first instinct was to simply (naively) create an IngressRoute in the cluster pointing to the ClusterIP service deployed by the MariaDB Galera Helm chart. That will do the job in almost every case. However, the more networking-savvy members of the team pointed out that MariaDB doesn't speak HTTP, only TCP.
Fortunately, Traefik can speak TCP directly with an IngressRouteTCP. Traefik's TCP routers have only one possible routing rule, HostSNI, which can only be read from TLS requests, and MariaDB doesn't speak TLS either. It turns out that HostSNI(`*`) allows non-TLS requests to use the router (seriously, look at this madness). Even after working that out, traffic was still unable to reach the database from outside of the cluster.
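For illustration, an IngressRouteTCP of the kind described above could look roughly like the sketch below. The entry point name `mariadb` and the resource name are assumptions, the service name and namespace are borrowed from the service listing later in this post, and the `apiVersion` is the one used by the Traefik 2.x CRDs. Treat it as a sketch of the approach rather than our exact manifest.

```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRouteTCP
metadata:
  name: mariadb-tcp          # hypothetical resource name
  namespace: mariadb
spec:
  entryPoints:
    - mariadb                # hypothetical entry point; it must exist in Traefik's static configuration
  routes:
    # The wildcard is the only HostSNI value that matches non-TLS TCP traffic
    - match: HostSNI(`*`)
      services:
        - name: mariadb      # Service in front of the MariaDB Galera pods
          port: 3306
```

A route like this only helps if Traefik is actually listening on the referenced entry point.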
In a moment of clarity, we realized that k3s' Traefik installation ships with two external entrypoints for ports 80 and 443, but none for MariaDB's port 3306. After playing around with different ways of configuring a new entrypoint on k3s' installation of Traefik, we still saw HTTP 404 responses rather than TCP traffic, meaning we were still talking to Traefik rather than MariaDB.
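One way to attempt adding such an entry point is through the `ports` values of the Traefik Helm chart, passed in via a k3s HelmChartConfig. The sketch below assumes the key layout used by Traefik 2.x era chart releases (newer releases changed the shape of `expose`), and the entry point name `mariadb` is hypothetical. It shows the general shape of the approach, not a configuration that worked for us.

```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    ports:
      mariadb:              # hypothetical entry point name
        port: 3306          # port Traefik listens on inside its pod
        expose: true        # publish the port on Traefik's Service
        exposedPort: 3306   # port opened on the Service
        protocol: TCP
```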
We decided to start investigating other solutions.
Attempt #2: NodePort
NodePort services are the easiest way to expose an application outside of your cluster. The port you specify is opened on every node in the cluster and mapped to the application's port. Trying this was as easy as modifying our MariaDB Galera values to use a NodePort rather than a ClusterIP.
While this worked, there was no load balancing. Clients had to connect with a specific NodeIP:port combination and would always connect to the pod on that node. To reach a different pod, clients had to manually switch IPs, and that's not highly available.
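As a rough sketch, the NodePort experiment amounts to a values.yaml fragment like the one below. The key names follow the same chart layout as the values shown later in this post and may differ between chart releases, and the port number 30306 is a hypothetical choice from the default NodePort range.

```yaml
# values.yaml fragment (sketch)
service:
  type: NodePort
  port: 3306
  nodePort: 30306   # hypothetical; omit to let Kubernetes assign a port
```

A client would then connect to any node's IP on port 30306, which is exactly the per-node coupling described above.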
Solution: MetalLB IP Address Sharing
Finally, we turned our attention back to MetalLB. Since our IP address pool is limited, we needed to find a way to reuse the IP that Traefik uses.
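According to MetalLB's documentation, it is possible to colocate services on the same IP when certain conditions are met: the services must carry the same sharing key, they must request different ports, and they must either both use the Cluster external traffic policy or both point to the exact same set of pods. Traefik listens on ports 80 and 443 and MariaDB on port 3306, so after reviewing our needs, we decided we met these conditions. In order to start sharing our single IP address between Traefik and MariaDB Galera, we had to set annotations with the same value on their respective services.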
Configuring k3s Deployed Traefik for MetalLB IP Sharing
Our first task was to modify k3s' installation of Traefik to have a specific annotation on its LoadBalancer service. While attempting to configure an IngressRouteTCP above, I learned that Rancher has abstracted away Helm charts and Helm chart configurations into HelmChart and HelmChartConfig CRDs, for better or worse. I prefer to keep things as vanilla as possible, so we decided to leverage HelmChartConfig to add the necessary annotation.
Let's assume that our single available IP for MetalLB to hand out is 10.0.10.10. Our HelmChartConfig would look something like this:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    service:
      enabled: true
      type: LoadBalancer
      annotations:
        metallb.universe.tf/allow-shared-ip: our-shared-ip
      spec:
        loadBalancerIP: "10.0.10.10"
The value of the annotation, our-shared-ip, just needs to be the same on every service that shares the IP. As far as I can tell, the reason for this is so that cluster administrators can section off access to sharing specific IPs by using a unique value per IP address. We chose a slightly more descriptive name to better indicate what's being configured to our future selves.
We used kubectl apply to deploy our configuration and waited for k3s to redeploy Traefik, after which we verified that Traefik's LoadBalancer service had the proper annotation.
Configuring MariaDB Galera for MetalLB IP Sharing
Next, we needed to modify the values.yaml providing the configuration for the MariaDB Galera Helm chart. This is a much less involved process, as the Helm chart already accepts values to customize the service resource.
Again, assuming our single available IP for MetalLB to hand out is 10.0.10.10 and our annotation value is our-shared-ip, our values.yaml would contain a fragment similar to this:
# ...
service:
  type: LoadBalancer
  port: 3306
  loadBalancerIP: "10.0.10.10"
  annotations:
    metallb.universe.tf/allow-shared-ip: our-shared-ip
# ...
After modifying our values.yaml, we could either helm upgrade our existing installation of MariaDB Galera or tear it down and helm install from scratch in order to make the change. We opted to start over, since the database had yet to be modified.
Results
Finally, after configuring the annotations on both Traefik and MariaDB Galera, we looked at the services in our cluster. Both applications had services of type LoadBalancer with external IPs of 10.0.10.10:
[root@vm /]# kubectl get svc --all-namespaces
NAMESPACE     NAME      TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE
...
kube-system   traefik   LoadBalancer   10.43.148.64   10.0.10.10    80:31570/TCP,443:30813/TCP   1d
...
mariadb       mariadb   LoadBalancer   10.43.64.58    10.0.10.10    3306:30260/TCP               1d
...
We then verified that the various addresses behaved the way we wanted. Requests to dns-for-my-cluster.domain.example:3306, to 10.0.10.10:3306, and to any nodeIP:30260 combination (a random port assigned by Kubernetes, as with a NodePort service) were all met with responses from MariaDB Galera. Further, traffic sent to any of those addresses was being routed to any of the available MariaDB Galera pods, giving us high availability no matter how a client was configured. At this point, we were satisfied with our results.
Getting this configured was very much a team effort, and I got to experience firsthand some quirks of running applications that were designed before the containerized microservice paradigm became mainstream in a Kubernetes cluster. Despite MetalLB pretty plainly mentioning this ability in its documentation, I don't think IP address sharing is a very common configuration scenario, and I hope this post helps others see what options they have for deploying MariaDB Galera in a bare metal, k3s flavored, production environment cluster.