Evolution of GitHub Action for Kamal

Right after my first try of Kamal (MRSK) in the spring of 2023, I understood that an ideal use case would be running it as a GitHub Action. Almost a year passed, and my 30-line action has grown and become full-featured, configurable, and reusable. In this post, I will share the evolution of the action and the lessons learned.

Before we start, let me remind you what Kamal is. Kamal is a Ruby library created by 37signals to orchestrate the deployment of Docker containers. Before switching to Kamal, I had a bunch of scripts and technologies to deploy my applications. Kamal allowed me to simplify the deployment process and make it more reliable. Also, there were already some GitHub workflows to run lints and tests. In this article, I will mention only the deployment part.

First Try

In my initial article about Kamal, I already posted the first version of the GitHub Action. It was a simple action that used the ruby/setup-ruby action to install Ruby, then webfactory/ssh-agent to configure SSH agent, then prepared AWS crednetials and then run the kamal envify and kamal deploy command. The code is below.


name: Kamal

on:
  push:
    branches:
      - main

jobs:
  spec:
    uses: ./.github/workflows/specs.yml
  lint:
    uses: ./.github/workflows/lint_code.yml

  build_and_deploy:
    needs: [spec, lint]
    runs-on: ubuntu-latest
    timeout-minutes: 20
    outputs:
      image: ${{ steps.build.outputs.image }}
    env:
      RAILS_ENV: production
    steps:
      - uses: actions/checkout@v3
        with:
          ref: ${{ github.event.workflow_run.head_branch }}

      - uses: webfactory/ssh-agent@v0.7.0
        with:
          ssh-private-key: ${{ secrets.SSH_PRIVATE_KEY }}

      - uses: ruby/setup-ruby@v1
        env:
          BUNDLE_GEMFILE: ./kamal/Gemfile
        with:
          ruby-version: 3.2.2
          bundler-cache: true

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
        with:
          driver-opts: image=moby/buildkit:master

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id    : ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region           : us-east-1
          mask-aws-account-id  : 'true'

      - name: Login to Amazon ECR
        id  : login-ecr
        uses: aws-actions/amazon-ecr-login@v1

      - name: Kamal Envify
        id  : kamal-envify
        env :
          KAMAL_REGISTRY_PASSWORD: ${{ steps.login-ecr.outputs.docker_password_YOUR_AWS_ACCOUNT_ID_dkr_ecr_YOUR_AWS_REGION_amazonaws_com }}
          DATABASE_URL: ${{ secrets.DATABASE_URL }}
          REDIS_URL: ${{ secrets.REDIS_URL }}
          RAILS_MASTER_KEY: ${{ secrets.RAILS_MASTER_KEY }}
          DOCKER_BUILDKIT: 1
          BUNDLE_GEMFILE: ./kamal/Gemfile
        run: |
          ./bin/kamal envify

      - name: Kamal Deploy
        id: kamal-deploy
        run: |
          ./bin/kamal deploy

There are several things to pay attention to. First, at lines 27-29, I setup SSH agent to be able to connect to the instances and run Docker commands, SSH private key is stored in the GitHub secrets, which is a convenient way to store sensitive data.

On lines 38-41 I setted up Docker Buildx, and it was very important to download Buildkit from the repository master branch. At the moment of spring-winter of 2023, Buildkit didn’t support cache manifests for AWS ECR, and I had to use the master branch to get the feature (Issue #876).

On lines 43-53, I configured AWS credentials and logged in to Amazon ECR. It is not the most secure way to login to AWS, but definitely was the easiest way to do it at that time.

This setup had several drawbacks.

  • The first and the most important, the action was not reusable. To configure deploy to another environment, I had to copy-paste the action completely.

  • Besides that, the action didn’t provide an ability to run deploy without running specs and lints. It was not a big deal, but sometimes I wanted to run deploy separately.

  • The action didn’t provide an ability to run deploy to another branch.

  • Sometimes I wanted to restart Traefik container. This cound be done from the local machine, but for me it would be better to do it from the action.

Action’s File Structure

Together with JetRockets DevOps team we incrementally improved the action, added new features, and made it more configurable. Finally we come the set of actions that can be used in almost any project.

For a better understanding of the changes, let’s look at the directory structure of the actions first.


.github/
├─ workflows/
   ├─ build_deploy/
      ├─ action.yaml
   ├─ pre-build/
      ├─ action.yaml
   ├─ 01.build_deploy_production.yaml
   ├─ 02.build_deploy_staging.yaml
   ├─ 03.database_backup.yaml
   ├─ 04.build_deploy_manually.yaml
   ├─ 05.validate_pull_request.yaml
   ├─ 06.kamal_run_command.yaml
   ├─ _lint.yaml
   ├─ _specs.yaml

You may notice two directories, build_deploy and pre-deploy, both have file action.yaml inside. These are composite actions that include all the necessary steps to build and deploy the application. Also, some workflows are named with a leading underscore. These are reusable workflows that consist of several jobs and steps.

How are Composite Actions different from Reusable Workflows?

  • Composite Actions allow you to bundle multiple existing workflow steps into a single action.
  • A Composite Action cannot be used without a repo checkout while Reusable Workflows can be used without a checkout.
  • A Reusable Workflow can include multiple jobs and multiple steps within those jobs. However, Composite Actions can only have one job.
  • Reusable Workflow can use Secrets by declaring them to a workflow via parameters while Composite Actions cannot use Secrets in a flexible way.

Files with numeric prefixes are main workflow definitions that call the reusable workflows and composite actions. They cover the most common usecases for a modern Rails project: deploy to production and staging, database backup, manual deploy, pull request validation, and Kamal command execution.

Pre Build Action

Lets start with the pre-build action. It is a composite action that includes all the necessary steps to prepare the environment for the build and deploy action. The file is below.


# pre-build/action.yml

name: Pre-Build

inputs:
  database-url:
    type: string
  redis-url:
    type: string
  rails-master-key:
    type: string
  aws_role_access:
    type: string
  ssh_private_key:
    type: string
  environment:
    type: string

runs:
  using: composite
  steps:
    - uses: webfactory/ssh-agent@v0.8.0
      with:
        ssh-private-key: ${{ inputs.ssh-private-key }}

    - uses: ruby/setup-ruby@v1
      env:
        BUNDLE_GEMFILE: ./Gemfile
      with:
        ruby-version: .ruby-version
        bundler-cache: true

    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v3

    - name: aws-cred-configure
      uses: aws-actions/configure-aws-credentials@v4
      with:
        role-to-assume   : ${{ inputs.aws-role-access }}
        role-session-name: samplerolesession
        aws-region       : es-east-1
        mask-aws-account-id: 'true'

    - name: login-to-aws-ecr
      id: login-ecr
      uses: aws-actions/amazon-ecr-login@v2
      with:
        mask-password: 'true'

    - name: Kamal Envify
      shell: bash
      id  : kamal-envify
      env :
        KAMAL_REGISTRY_PASSWORD: ${{ steps.login-ecr.outputs.docker_password_AWS_ACCOUNT_ID_dkr_ecr_eu_west_2_amazonaws_com }}
        DATABASE_URL: ${{ inputs.database-url }}
        REDIS_URL: ${{ inputs.redis-url }}
        RAILS_MASTER_KEY: ${{ inputs.rails-master-key }}
        DOCKER_BUILDKIT: 1
      run: |
        ./bin/kamal envify --destination=${{ inputs.environment }}

Lets walk line by line through the file. Lines 5-17 describe the inputs of the action and are not interesting to us. The first step is to setup SSH agent, and it is the same as in the first version of the action. However, as you may see, it uses ssh-private-key from the action inputs, which allows us to use the action in different environments. After that, I setup Ruby, and it is also the same as in the first version of the action.

Docker Buildx setup step differs from the initial workflow definition. It has been updated to the version 3 and now supports AWS ECR image cache out the box, so we don’t need to define driver-opts anymore.

The next step is to configure AWS credentials. It is completely different from what I initially had. Instead of access-key-id and secret-access-key authentication, I switched to the UIDC role-based authentication, which is more secure and GitHub advises to use it. If you need a more detailed explanation of how to configure OpenID Connect in AWS, I suggest you read this excellent guide. After the authentication is done, I login to Amazon ECR on lines 44-48.

The final step is to run kamal-envify, which prepares environment variables for the deployment. The command is the same as in the first version of the action, but I added the --destination flag to the command, which allows me to deploy to different environments.

Build & Deploy Action

The next composite action is defined in build_deploy folder and it is relatively simple.


# build-deploy/action.yml
name: Build & Deploy

inputs:
  environment:
    type: string

runs:
  using: composite
  steps:
    - name: Kamal Deploy
      shell: bash
      id: kamal-deploy
      run: |
        ./bin/kamal deploy --destination=${{ inputs.environment }}

    - name: Kamal Release
      shell: bash
      if: ${{ cancelled() }}
      run: |
        ./bin/kamal lock release --destination=${{ inputs.environment }}

Since all preparations are done in pre-build, when this action starts, I am ready to run the kamal deploy command with the selected environment.

Kamal creates a lock file before starting the build and deployment process. Usually lock is released when deployment is finished, but if the deployment is cancelled, the lock is not released. And next workflow run will fail. To avoid this, I added the kamal lock release command to the action. Later this article, we will use this small hack to handle concurrent deployments correctly.

Workflow Definitions

The main workflow definitions are very simple. They just call the composite action together with reusable workflows and pass the necessary parameters. Below is the example of the 01.build_deploy_production.yaml file.


# 01.build_deploy_production.yaml
name: 01. Build & Deploy Production

permissions:
  id-token: write
  contents: read

on:
  release:
    types: [published]

jobs:
  spec:
    uses: ./.github/workflows/_specs.yaml
    secrets: inherit

  lint:
    uses: ./.github/workflows/_lint_code.yaml
    secrets: inherit

  build_and_deploy:
    name: build-deploy-production
    concurrency:
      group: production_environment
      cancel-in-progress: true
    environment:
      name: production
      url: https://onetribe.team
    needs:
      - spec
      - lint
    runs-on: ubuntu-latest
    timeout-minutes: 20
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.workflow_run.head_branch }}

      - name: Pre Build
        uses: ./.github/workflows/pre-build
        with:
          database-url: ${{ secrets.DATABASE_URL_PRODUCTION_ADMIN }}
          redis-url: ${{ secrets.REDIS_URL_PRODUCTION_ADMIN }}
          rails-master-key: ${{ secrets.RAILS_MASTER_KEY }}
          aws-role-access: ${{ secrets.AWS_ROLE_ACCESS }}
          ssh-private-key: ${{ secrets.SSH_PRIVATE_KEY }}
          environment: production

      - name: Build & Deploy
        uses: ./.github/workflows/build-deploy
        with:
          environment: production

The concurrency section in lines 23-25 deserves attention in the file. It allows me to run only one deployment at a time. GitHub has a great documentation section that covers all possible use cases. If a deployment is already running, it will be canceled, and the "Kamal Release" step from the previous workflow run will be executed. This is an essential feature because it allows me to avoid concurrent deployments and handle them correctly.

Staging deploy is defined in the 02.build_deploy_staging.yaml file and is similar to production, except for the event that starts the workflow: for staging deploy I use the push event to GIT staging branch, instead of the release.


# 02.build_deploy_staging.yaml
name: 02. Build Staging

permissions:
  id-token: write
  contents: read

on:
  push:
    branches:
      - staging

# ...

In this article I will not cover database backup workflow, defined in the 03.database_backup.yaml file, because it is not related to the theme of the article. However lets look at 04.deploy_manually.yaml, 05.validate_pull_request.yaml, and 06.kamal_run_command.yaml files.

The 04.build_deploy_manually.yaml file is below.


# 04.build_deploy_manually.yaml
name: 04. Deploy Manually

permissions:
  id-token: write
  contents: read

on:
  workflow_dispatch:
    inputs:
      environment:
        description: 'Environment'
        required: true
        default: 'staging'
        type: choice
        options:
          - production
          - staging

jobs:
  build-production:
    name: deploy-production
    concurrency:
      group: production_environment
      cancel-in-progress: true
    environment:
      name: production
      url: https://onetribe.team
    if: ${{ github.event.inputs.environment == 'production' }}
    runs-on: ubuntu-latest
    timeout-minutes: 20
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.workflow_run.head_branch }}

      - name: Pre Build
        uses: ./.github/workflows/pre-build
        with:
          database-url: ${{ secrets.DATABASE_URL_PRODUCTION_ADMIN }}
          redis-url: ${{ secrets.REDIS_URL_PRODUCTION_ADMIN }}
          rails-master-key: ${{ secrets.RAILS_MASTER_KEY }}
          aws-role-access: ${{ secrets.AWS_ROLE_ACCESS }}
          ssh-private-key: ${{ secrets.SSH_PRIVATE_KEY }}
          environment: production

      - name: Build and Deploy
        uses: ./.github/workflows/build-deploy
        with:
          environment: production

  build-staging:
    name: deploy-staging
    concurrency:
      group: staging_environment
      cancel-in-progress: true

    # ...
    # staging deploy is similar to production, described above and I will not show it completely.

Pull request validation is defined in the 05.validate_pull_request.yaml, it is the most small and simple workflow. It runs specs and lints, triggered by the pull_request event and also can be triggered manually.


# 05.validate_pull_request.yaml
name: 05. Validate Pull Request

permissions:
  id-token: write
  contents: read

on:
  pull_request:
  workflow_dispatch:

jobs:
  spec:
    uses: ./.github/workflows/_specs.yaml
    secrets: inherit

  lint:
    uses: ./.github/workflows/_lint_code.yaml
    secrets: inherit

The last workflow that I want to cover in this article is the 06.kamal_run_command.yaml file. Sometimes I need to restart the Traefik container or maybe start or stop accessory containers. I can do it from the local machine, but this requires environment setup and is not always convenient. This workflow allows me to run any command from the list of predefined commands.


name: 06. Kamal run command

permissions:
  id-token: write
  contents: read

on:
  workflow_dispatch:
    inputs:
      command:
        description: 'Commands'
        required: true
        type: choice
        options:
          - traefik reboot --rolling
          - accessory reboot pg_hero
      environment:
        description: 'Environment'
        required: true
        type: choice
        options:
          - staging
          - production

jobs:
  kamal_run_command:
    name: Kamal run command
    runs-on: ubuntu-latest
    timeout-minutes: 20
    concurrency:
      group: ${{ github.event.inputs.environment }}_environment
      cancel-in-progress: false
    environment:
      name: ${{ github.event.inputs.environment }}
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.workflow_run.head_branch }}

      - uses: ./.github/workflows/pre-build
        name: Pre Build
        with:
          database-url: ${{ github.event.inputs.environment == 'production' && secrets.DATABASE_URL_PRODUCTION || secrets.DATABASE_URL_STAGING }}
          redis-url: ${{ github.event.inputs.environment == 'production' && secrets.REDIS_URL_PRODUCTION || secrets.REDIS_URL_STAGING }}
          rails-master-key: ${{ secrets.RAILS_MASTER_KEY }}
          aws-role-access: ${{ secrets.AWS_ROLE_ACCESS }}
          ssh-private-key: ${{ secrets.SSH_PRIVATE_KEY }}
          environment: ${{ github.event.inputs.environment }}

      - name: kamal ${{ github.event.inputs.command }} --destination=${{ github.event.inputs.environment }}
        run: |
          ./bin/kamal ${{ github.event.inputs.command }} --destination=${{ github.event.inputs.environment }}

Conclusion

The action has grown from a simple 30-line action to a set of reusable workflows and composite actions. It is now full-featured, configurable, and reusable. It allows me to run deploy to different environments, run deploy without running specs and lints, and restart Traefik container and accessories.

I use this or similar setup of workflows for about six months and what can I say? It covers all my needs and can be easily adopted for any new features. I hope this article will help you to build your own action and workflows. If you have any questions, feel free to ask me in the comments.