I ran into several issues when trying to create a Fedora CoreOS template on Proxmox using Packer, so here’s how I got it working.

Overview

  • Using Packer, we will boot a VM from the CoreOS live ISO. We will then run coreos-installer to install CoreOS onto a disk attached to that VM, and finally Packer will convert the VM into a template.
  • The Packer VM needs qemu-guest-agent running so that Packer can discover its IP address.
  • The Packer VM needs our SSH public key so that Packer can SSH in and run commands for further setup.

Ignition configs

CoreOS uses a tool called Ignition to perform first-boot configuration. While you can write Ignition configurations by hand (it’s JSON after all), Fedora recommends writing your configs in Butane and using the eponymous CLI to convert them to the Ignition format. We need our Ignition/Butane config to do two things:

  1. Run qemu-guest-agent on startup
  2. Create a user with sudo privileges and our SSH public key

Fedora CoreOS does not ship a traditional package manager, so qemu-guest-agent is not easily obtainable (in the live installer environment the root filesystem is read-only, so you can't layer it with rpm-ostree either). The only way I could think of was to run it inside a Docker container with --network host so that it reports the host VM's IP address instead of the container's internal one. There is no official Docker image for qemu-guest-agent; I opted to use this one. You can either use it (the Dockerfile looks trustworthy) or build your own.

Additionally, I ran into a third requirement specific to the Packer VM: Docker needs to be given a different data-root. By default, the Docker daemon writes its data to a subdirectory of /var, but the qemu-guest-agent container proved too large for whatever storage device the live ISO mounts at /var. After logging in, I noticed that /tmp had nearly a gigabyte of free space, so for the Packer VM I configured Docker to use /tmp/docker as its data directory.

All that said, here’s the butane config file for the Packer VM:

variant: fcos
version: 1.4.0
storage:
  files:
    - path: /etc/docker/daemon.json
      mode: 0600
      contents:
        inline: |
          {
            "data-root": "/tmp/docker"
          }          
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - ssh-rsa AAAAB3...
      groups:
        - wheel
        - sudo
systemd:
  units:
    - name: qemu-guest-agent.service
      enabled: true
      contents: |
        [Unit]
        Description=Runs qemu guest agent inside docker
        [Service]
        Type=oneshot
        RemainAfterExit=yes
        ExecStart=docker run -d --privileged --network host -v /dev/virtio-ports/org.qemu.guest_agent.0:/dev/virtio-ports/org.qemu.guest_agent.0  eleh/qemu-guest-agent
        [Install]
        WantedBy=multi-user.target        

Next up, the final VM (the one Packer will convert into a template) needs its own Ignition config file. This will be used by Ignition on the first boot of any VM you create by cloning the template. It is largely identical to the installer config, minus the change to Docker's data root.

variant: fcos
version: 1.4.0
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - ssh-rsa AAAAB3...
      groups:
        - wheel
        - sudo
systemd:
  units:
    - name: qemu-guest-agent.service
      enabled: true
      contents: |
        [Unit]
        Description=Runs qemu guest agent inside docker
        [Service]
        Type=oneshot
        RemainAfterExit=yes
        ExecStart=docker run -d --privileged --network host -v /dev/virtio-ports/org.qemu.guest_agent.0:/dev/virtio-ports/org.qemu.guest_agent.0  eleh/qemu-guest-agent
        [Install]
        WantedBy=multi-user.target        

You can convert a butane file to the ignition format by running

butane --pretty --strict config/input.bu > output.ign
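
If you don't have the butane binary installed locally, the same conversion can be done with the official container image (this assumes podman is available; docker run works the same way):

podman run --interactive --rm quay.io/coreos/butane:release --pretty --strict < config/input.bu > output.ign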

Packer build

We’re going to use a few features of Packer to achieve our goal:

  • The Proxmox ISO builder provided by Packer
  • The HTTP server Packer provides to serve our Packer VM’s Ignition config
  • additional_iso_files, provided by the Proxmox builder, to attach an extra ISO carrying our template’s Ignition config
  • The Shell provisioner provided by Packer to install CoreOS

Create a directory with the following files:

packer-root
|- config
|  |- installer.bu
|  |- template.bu
|- proxmox-coreos.pkr.hcl

In the config below, connection details and secrets (such as your Proxmox API token) are read from variables rather than embedded inline; define those in your own variables file and keep it private (i.e., off GitHub). Change the values of fields like iso_file and template_name to taste, and note that you will have to manually download that ISO to your Proxmox node first. If your storage pools on Proxmox have different names, you’ll also have to change local to your pool’s name in several places.
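
Something along these lines works for the variable declarations (a sketch only; the file names and example values are placeholders, and the real values belong in a *.auto.pkrvars.hcl file that stays out of version control):

# variables.pkr.hcl
variable "proxmox" {
  type = object({
    node     = string
    username = string
    token    = string
    api_url  = string
  })
  sensitive = true
}

# credentials.auto.pkrvars.hcl -- keep this one private
proxmox = {
  node     = "proxmox"
  username = "packer@pve!packer"
  token    = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
  api_url  = "https://192.168.1.10:8006/api2/json"
}

With the variables in place, here’s the full Packer config: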

packer {
  required_plugins {
    proxmox = {
      version = ">= 1.1.0"
      source = "github.com/hashicorp/proxmox"
    }
  }
}

source "proxmox" "coreos" {
  // proxmox configuration
  insecure_skip_tls_verify = true
  node = var.proxmox.node
  username = var.proxmox.username
  token = var.proxmox.token
  proxmox_url = var.proxmox.api_url

  # Keystrokes Packer types at the boot menu to edit the kernel command line and point
  # Ignition at the installer config served over Packer's built-in HTTP server
  boot_wait = "2s"
  boot_command = [
    "<spacebar><wait><spacebar><wait><spacebar><wait><spacebar><wait><spacebar><wait>",
    "<tab><wait>",
    "<down><down><end>",
    " ignition.config.url=http://{{ .HTTPIP }}:{{ .HTTPPort }}/installer.ign",
    "<enter>"
  ]

  # This supplies our installer ignition file
  http_directory = "config"

  # This supplies our template ignition file
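  # It is attached to the VM as an extra CD-ROM drive, which the shell provisioner below mounts at /dev/sr1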
  additional_iso_files {
    cd_files = ["./config/template.ign"]
    iso_storage_pool = "local"
    unmount = true
  }

  # CoreOS does not support CloudInit
  cloud_init = false
  qemu_agent = true

  scsi_controller = "virtio-scsi-pci"

  cpu_type = "host"
  cores = "2"
  memory = "2048"
  os = "l26"

  vga {
    type = "qxl"
    memory = "16"
  }

  network_adapters {
    model = "virtio"
    bridge = "vmbr0"
  }

  disks {
    disk_size = "45G"
    storage_pool = "local-lvm"
    storage_pool_type = "lvm"
    type = "virtio"
  }

  iso_file = "local:iso/fedora-coreos-37.20221106.3.0-live.x86_64.iso"
  unmount_iso = true
  template_name = "coreos-37.20221106.3.0"
  template_description = "Fedora CoreOS"

  ssh_username = "core"
  ssh_private_key_file = "~/.ssh/id_rsa"
  ssh_timeout = "20m"
}

build {
  sources = ["source.proxmox.coreos"]

  provisioner "shell" {
    inline = [
      "sudo mkdir /tmp/iso",
      "sudo mount /dev/sr1 /tmp/iso -o ro",
      "sudo coreos-installer install /dev/vda --ignition-file /tmp/iso/template.ign",
      # Packer's shutdown command doesn't seem to work, likely because we run qemu-guest-agent
      # inside a docker container.
      # This will shutdown the VM after 1 minute, which is less than the duration that Packer
      # waits for its shutdown command to complete, so it works out.
      "sudo shutdown -h +1"
    ]
  }
}

Bringing it all together

Go to packer-root and generate the ignition configs:

butane --pretty --strict config/installer.bu > config/installer.ign
butane --pretty --strict config/template.bu > config/template.ign

Install the required Proxmox plugin with packer init, then build the template:

packer init .
packer build --on-error=ask .

Assuming the build succeeds, you should see your new VM template on your Proxmox node. If it fails, check the console of the VM created by Packer for errors. You can also get detailed Packer logs by exporting the following environment variables before running packer build:

export PACKER_LOG_PATH="/tmp/packer.log"
export PACKER_LOG=10

Running packer after exporting these variables will create a detailed log file at /tmp/packer.log.
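
Once the template shows up, you can sanity-check it by cloning it from the Proxmox shell. The VM IDs below are placeholders; use the ID Proxmox assigned to your template:

qm clone 108 200 --name coreos-test --full
qm start 200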

Alternate approach: static IPs and rpm-ostree

If you don’t want to use an unofficial Docker image for qemu-guest-agent, you can assign static IPs to the Packer VM and to the template. Additionally, you can install qemu-guest-agent in the template using rpm-ostree (but not in the Packer VM, because the live ISO’s root filesystem is mounted read-only).

Static IP for Packer VM

To boot the Packer VM with a static IP, change the boot_command line that starts with ignition.config.url to:

" ignition.config.url=http://{{ .HTTPIP }}:{{ .HTTPPort }}/installer.ign net.ifnames=0",

and change your installer.bu file to:

variant: fcos
version: 1.4.0
storage:
  files:
    - path: /etc/NetworkManager/system-connections/eth0.nmconnection
      mode: 0600
      contents:
        inline: |
          [connection]
          id=eth0
          type=ethernet
          interface-name=eth0
          [ipv4]
          address1=192.168.1.200/24,192.168.1.1
          dhcp-hostname=k3s-test-controller
          dns=192.168.1.1;
          dns-search=
          may-fail=false
          method=manual          
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCkayHzoWIWE4P1z3+qOoyfdnapU8ATcYUriXDsdGkyncEZpnz4jHqZsp0EVZhtSg668H8+aEDd4RSYHvmprXWZJQe+CUIQRIfazch8mCmlVYpRVqtjms3ya7S6WWl96+jwecEwQf0eDYojFry+S5A8+cZmIZfsQb6PkRr350OxzufH2dii96zS9aIOFz7NiVn/qB+mhyMuicrPqzx0HJjK4t8p2WFMAQsPrFqWwWlX/nDr0xFDmPUZlh4SEhznSB+ai99B0FFsjaHyhlSGBL56Sy0TL3CGXWcaW5kwQhzf9P1n/WK+83j8CLkD/xwxhB5MdhNUWIY7c02QWIeU9RPOU6Y8Qf4sgKpd6/CKROJC/SkBDFpE6MMX24/UejR1PPFP+qwg6XnX2g08gIonfI9tKBTsMAPib2D13ZSUK/QgxmOV33hfbiDPXmyXFeLuzW/GIuP9PWbe6qNYoDL2ZUk/BK3kgLWd4gXtVS3Gtu/DEiw+3kCwjP85VBW0NUx7GbM= amey@ubuntu
      groups:
        - wheel
        - sudo

Static IP for template VM

Change your template.bu file to:

variant: fcos
version: 1.4.0
kernel_arguments:
  should_exist:
    - net.ifnames=0
storage:
  files:
    - path: /etc/NetworkManager/system-connections/eth0.nmconnection
      mode: 0600
      contents:
        inline: |
          [connection]
          id=eth0
          type=ethernet
          interface-name=eth0
          [ipv4]
          address1=192.168.1.201/24,192.168.1.1
          dhcp-hostname=k3s-test-controller
          dns=192.168.1.1;
          dns-search=
          may-fail=false
          method=manual          
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCkayHzoWIWE4P1z3+qOoyfdnapU8ATcYUriXDsdGkyncEZpnz4jHqZsp0EVZhtSg668H8+aEDd4RSYHvmprXWZJQe+CUIQRIfazch8mCmlVYpRVqtjms3ya7S6WWl96+jwecEwQf0eDYojFry+S5A8+cZmIZfsQb6PkRr350OxzufH2dii96zS9aIOFz7NiVn/qB+mhyMuicrPqzx0HJjK4t8p2WFMAQsPrFqWwWlX/nDr0xFDmPUZlh4SEhznSB+ai99B0FFsjaHyhlSGBL56Sy0TL3CGXWcaW5kwQhzf9P1n/WK+83j8CLkD/xwxhB5MdhNUWIY7c02QWIeU9RPOU6Y8Qf4sgKpd6/CKROJC/SkBDFpE6MMX24/UejR1PPFP+qwg6XnX2g08gIonfI9tKBTsMAPib2D13ZSUK/QgxmOV33hfbiDPXmyXFeLuzW/GIuP9PWbe6qNYoDL2ZUk/BK3kgLWd4gXtVS3Gtu/DEiw+3kCwjP85VBW0NUx7GbM= amey@ubuntu
      groups:
        - wheel
        - sudo

Install qemu-guest-agent using rpm-ostree

Full credit for this approach goes to the author of this blog post. Add the following to the relevant sections of your template.bu file:

storage:
  files:
    - path: /usr/local/bin/install-qemu-guest-agent
      mode: 0755
      contents:
        inline: |
          #!/usr/bin/env bash
          set -euo pipefail

          rpm-ostree install qemu-guest-agent          
systemd:
  units:
    - name: install-qemu-guest-agent.service
      enabled: true
      contents: |
        [Unit]
        After=network-online.target
        Wants=network-online.target
        Before=systemd-user-sessions.service
        OnFailure=emergency.target
        OnFailureJobMode=replace-irreversibly
        ConditionPathExists=!/var/lib/qemu-guest-agent-installed
        [Service]
        RemainAfterExit=yes
        Type=oneshot
        ExecStart=/usr/local/bin/install-qemu-guest-agent
        ExecStartPost=/usr/bin/touch /var/lib/qemu-guest-agent-installed
        ExecStartPost=/usr/bin/systemctl --no-block reboot
        StandardOutput=kmsg+console
        StandardError=kmsg+console
        [Install]
        WantedBy=multi-user.target        

Debugging the Terraform provider for Proxmox

The main Terraform provider for Proxmox is not very polished. It works, but it doesn’t appear to perform any validation of your inputs, and it does a terrible job of communicating errors thrown by Proxmox. Consequently, any time you make a mistake in a resource, you’re likely to see an extremely unhelpful message that says:

400 Parameter Validation failed

When this happens, add pm_debug = true to your provider configuration:

provider "proxmox" {
  ...
  pm_debug = true
}

and run TF_LOG=TRACE terraform apply to get detailed logs from Terraform. The actual problem with your resource will be somewhere in the output.


Fixing Packer’s ISO upload error on Proxmox

While attempting to build an Ubuntu 22.04 image as a Proxmox VM using Packer, I got the following error:

==> proxmox.ubuntu_k3s: Post "https://<ip>/api2/json/nodes/proxmox/storage/local/upload": write tcp <local ip>-><ip>: write: broken pipe
==> proxmox.ubuntu_k3s: delete volume failed: 501 Method 'DELETE /nodes/proxmox/storage/local/content/' not implemented
Build 'proxmox.ubuntu_k3s' errored after 20 milliseconds 690 microseconds: 501 Method 'DELETE /nodes/proxmox/storage/local/content/' not implemented

After a few searches, I found an open issue on GitHub pointing to missing permissions for the Proxmox user Packer authenticates as. The fix was to add the correct Datastore privileges. I probably added more than strictly necessary because I wasn’t sure which one was needed, but here’s the set that worked for me:

pveum role modify <role-name-here> -privs "VM.Allocate VM.Clone VM.Config.CDROM VM.Config.CPU VM.Config.Cloudinit VM.Config.Disk VM.Config.HWType VM.Config.Memory VM.Config.Network VM.Config.Options VM.Monitor VM.Audit VM.PowerMgmt Datastore.Allocate Datastore.AllocateSpace Datastore.AllocateTemplate Datastore.Audit Sys.Audit VM.Console"

Authenticating Terraform with Google Cloud

Recently, I had a hell of a time figuring out how to handle authentication for running Terraform against Google Cloud. Most of Google’s documentation is way more complicated and uses a lot more jargon than the corresponding AWS documentation. Additionally, most of the existing blog posts I could find by other users talked about creating a service account and then downloading its key, and Google recommends against that for security reasons. Instead, they recommend setting up Workload Identity Federation, but all of their documents link to more documents, without actually telling you where to start.

After a few hours of experimentation and searching, I finally have it: you don’t need any provider configuration in Terraform.

For desktop usage

On your desktop, run gcloud auth application-default login; the Google Cloud provider for Terraform will pick up whatever it needs from the application default credentials file that the gcloud CLI creates. After this, terraform apply will just work.
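
In other words, the whole desktop flow is just the following (assuming the gcloud CLI and Terraform are already installed):

gcloud auth application-default login   # opens a browser and stores Application Default Credentials locally
terraform init
terraform apply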

For GitHub actions
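
For CI, the rough shape is to let the google-github-actions/auth action exchange the job’s GitHub OIDC token for short-lived Google credentials, which the Terraform provider then picks up automatically. The workflow below is only a sketch: the action versions, project number, pool, provider, and service account names are placeholders for your own Workload Identity Federation setup.

name: terraform
on: [push]
permissions:
  contents: read
  id-token: write   # lets the job request an OIDC token
jobs:
  apply:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: projects/123456789/locations/global/workloadIdentityPools/my-pool/providers/my-provider
          service_account: terraform@my-project.iam.gserviceaccount.com
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
      - run: terraform apply -auto-approve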

You’re welcome.


Changing the k3s master node’s hard disk

Today, I found myself needing to change the master/leader node on my k3s cluster. Or rather, I found myself needing to enable full-disk encryption on the hard disk attached to the master node. After trying and failing to add LUKS encryption to one of the worker nodes (it wouldn’t finish booting afterwards), I ended up starting from scratch with a fresh OS installation on all the workers. I didn’t want to do this on the master, since that would have (maybe?) forced me to start my cluster from scratch.

If you’re running an HA cluster, you can create a new node, add it to the control plane, and then remove the old master node. I’m running a single-master cluster, so that option was out. The way I ended up achieving this result was by following this sequence of steps:

  1. Attach a new hard disk to the master node
  2. Boot from a live cd and install a fresh copy of the OS to the new hard disk, this time with full-disk encryption
  3. Boot into the new OS
  4. Mount the old hard disk
  5. Copy over the following files and folders from the old disk to the new one (see the sketch after this list):
    1. The k3s binary, usually at /usr/local/bin/k3s
    2. /etc/rancher
    3. /var/lib/rancher
    4. /etc/systemd/system/k3s.service
  6. Once all of the above are copied over, enable and start the service with:
    sudo systemctl enable k3s
    sudo systemctl start k3s
    
  7. Shut down and remove the old hard disk
  8. Boot up the master node
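
Concretely, steps 4 through 6 might look something like the following. This is only a sketch: the device name /dev/sdb3 and the mount point are assumptions, and it assumes /etc/rancher and /var/lib/rancher don’t already exist on the fresh install.

# step 4: mount the old disk read-only
sudo mkdir -p /mnt/olddisk
sudo mount -o ro /dev/sdb3 /mnt/olddisk

# step 5: copy the k3s binary, config, state, and unit file
sudo cp -a /mnt/olddisk/usr/local/bin/k3s /usr/local/bin/k3s
sudo cp -a /mnt/olddisk/etc/rancher /etc/
sudo cp -a /mnt/olddisk/var/lib/rancher /var/lib/
sudo cp -a /mnt/olddisk/etc/systemd/system/k3s.service /etc/systemd/system/k3s.service

# step 6: enable and start the service
sudo systemctl daemon-reload
sudo systemctl enable k3s
sudo systemctl start k3s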

With this, your cluster should be up and running once more. I followed these steps to change the hard disk, but if you also want to change the node itself, just make sure the new master node has the same hostname and IP address as the old one.