Nomad on premises, Traefik and high availability

When you run a cluster, you have to handle incoming traffic from the internet. In our case, traffic arriving at a public address is forwarded to a load balancer, which is Traefik.

Traefik runs as a single instance on one machine in the cluster, and which machine that is can change depending on the cluster's health. We therefore need a way to tie external ingress to an IP address that follows Traefik around, much as a network load balancer would. If you have run Kubernetes on premises, you may know MetalLB, which addresses the same problem there.

The approach we found is to run Traefik alongside a companion keepalived daemon in the same Nomad task group. Tasks in a group are always scheduled together on the same node, so when Traefik stops and is recreated on a different machine, keepalived moves with it and assigns the floating IP to the machine it is now running on.

In this setup, the Nomad nodes have addresses in the 192.168.8.140-250 range, and the floating IP address is 192.168.8.100. The firewall applies DNAT (Destination Network Address Translation) to this floating IP for ports 80 and 443. Only one keepalived instance is in the MASTER state at a time, and that instance holds the floating IP.

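This may not be the primary use case for keepalived, but it does exactly what we need: when it becomes master, it announces the floating IP with gratuitous ARP (Address Resolution Protocol) messages, so the rest of the network learns where the address now lives.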

Below is a Nomad HCL configuration file for setting up Traefik with Let's Encrypt and a floating IP:

job "traefik" {
  datacenters = ["dc1"]
  type        = "service"

  group "traefik" {

    constraint {
      operator = "distinct_hosts"
      value    = "true"
    }
    
    volume "traefik_data_le" {
      type            = "csi"
      source          = "traefik_data"
      read_only       = false
      attachment_mode = "file-system"
      access_mode     = "multi-node-multi-writer"
    }


    network {
      port "http" {
        static = 80
      }
      port "https" {
        static = 443
      }
      port "admin" {
        static = 8080
      }
    }

    service {
      name     = "traefik-http"
      provider = "nomad"
      port     = "http"
    }

    service {
      name     = "traefik-https"
      provider = "nomad"
      port     = "https"
    }
    
    task "keepalived" {
      driver = "docker"
      env {
        KEEPALIVED_VIRTUAL_IPS    = "192.168.8.100/24"
        KEEPALIVED_UNICAST_PEERS  = ""
        KEEPALIVED_STATE          = "MASTER"
        KEEPALIVED_VIRTUAL_ROUTES = ""
      }
      config {
        image        = "visibilityspots/keepalived:2.2.7"
        network_mode = "host"
        privileged   = true
        cap_add      = ["NET_ADMIN", "NET_BROADCAST", "NET_RAW"]
      }
    }

    task "server" {
      driver = "docker"

      config {
        image = "traefik:2.10.4"
        network_mode = "host"
        ports = ["admin", "http", "https"]
        args = [
          "--api.dashboard=true",
          "--entrypoints.web.address=:${NOMAD_PORT_http}",
          "--entrypoints.websecure.address=:${NOMAD_PORT_https}",
          "--entrypoints.traefik.address=:${NOMAD_PORT_admin}",
          "--certificatesresolvers.letsencryptresolver.acme.email=email@email",
          "--certificatesresolvers.letsencryptresolver.acme.storage=/letsencrypt/acme.json",
          "--certificatesresolvers.letsencryptresolver.acme.httpchallenge.entrypoint=web",
          "--entrypoints.web.http.redirections.entryPoint.to=websecure",
          "--entrypoints.web.http.redirections.entryPoint.scheme=https",
          "--providers.nomad=true",
          "--providers.nomad.endpoint.address=http://192.168.8.140:4646" # address of your Nomad server
        ]
      }
      volume_mount {
        volume      = "traefik_data_le"
        destination = "/letsencrypt/"
      }
    }

  }
}
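
The job above expects a CSI volume named traefik_data to exist before it is scheduled. As a rough sketch only, a volume specification could look like the following. The plugin_id value is hypothetical and must match a CSI plugin that is actually running in your cluster; depending on the plugin, extra fields such as a capacity or plugin-specific parameters may also be required.

```hcl
# volume.hcl: minimal sketch of a CSI volume specification for the Traefik job.
id   = "traefik_data" # must match "source" in the job's volume block
name = "traefik_data"
type = "csi"

# Hypothetical plugin ID: replace it with the ID of your own CSI plugin.
plugin_id = "my-csi-plugin"

# Same modes as requested by the volume block in the Traefik job.
capability {
  access_mode     = "multi-node-multi-writer"
  attachment_mode = "file-system"
}
```

With a plugin that supports volume creation, such a file would typically be submitted with `nomad volume create volume.hcl`.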

For keepalived to run, the Docker plugin configuration on the Nomad clients must allow privileged containers and the required Linux capabilities:

plugin "docker" {
  config {
    allow_privileged = true
    allow_caps       = [...,"NET_ADMIN","NET_BROADCAST","NET_RAW"]
  }
}
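
Once Traefik is running with the Nomad provider enabled, other jobs can be published through it by tagging their services. The job below is an illustrative sketch rather than part of our setup: the job name, the traefik/whoami test image and the whoami.example.com hostname are assumptions. Only the websecure entrypoint and the letsencryptresolver resolver come from the Traefik job above.

```hcl
job "whoami" {
  datacenters = ["dc1"]
  type        = "service"

  group "whoami" {
    network {
      # Dynamic host port mapped to port 80 inside the container.
      port "http" {
        to = 80
      }
    }

    service {
      name     = "whoami"
      provider = "nomad" # Nomad native service discovery, read by Traefik
      port     = "http"

      tags = [
        "traefik.enable=true",
        # Hypothetical hostname: use a DNS name that resolves to your public address.
        "traefik.http.routers.whoami.rule=Host(`whoami.example.com`)",
        "traefik.http.routers.whoami.entrypoints=websecure",
        "traefik.http.routers.whoami.tls.certresolver=letsencryptresolver",
      ]
    }

    task "whoami" {
      driver = "docker"

      config {
        image = "traefik/whoami"
        ports = ["http"]
      }
    }
  }
}
```

Requests for that hostname then reach the firewall, are translated to the floating IP, and are routed by Traefik to whichever node runs the whoami allocation.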

If you have any feedback, please contact us: contact@educ.cloud