Separating Cloud and Non-Cloud Functional Tests in PR Workflows

By Yetkin Timocin | Wednesday, September 18, 2024

Functional Testing in Radius

Radius applies the test pyramid, dividing the tests for each feature into groups. Unit tests check individual units, such as a single function, and functional tests check that multiple Radius components integrate and interact as expected.

Functional tests in Radius create resources in Kubernetes clusters that are spun up specifically for the tests and destroyed at the end of each test run. Some functional tests also create and use cloud resources on Azure and AWS, which requires sensitive data such as cloud provider secrets. For example, when one of our functional tests creates a Radius environment, that environment needs cloud credentials to create resources on Azure and/or AWS. Some of this sensitive information is stored at the organization level in GitHub, and some at the repository level.

Because the functional tests handle sensitive information and must run for every pull request, we implemented a process in which a Radius maintainer or approver must approve each functional test run. This blog post explores the challenges we faced with the functional testing workflow and how we implemented a new workflow to streamline our development process, reduce bottlenecks for contributors, and improve the overall contributor experience.

Challenges with Functional Testing in Radius

As discussed above, Radius has two types of functional tests: those that create and use cloud resources, and those that do not require any cloud resources. You can learn more about our functional tests here. Among the functional tests that create and use cloud resources, we have added several that create resources on different clouds, such as Azure and AWS.

One of the most important challenges the Radius development team identified in the functional testing workflow is the need to check pull requests from forked repositories for attempts to expose sensitive data, such as cloud credentials, other secrets, or configurations. After an initial review of the pull request, a maintainer or approver of the main repository must approve and start the functional test check. If all tests pass, the pull request can be marked as good to go.

This process can slow down pull request turnaround (the time from PR creation to merging into the main branch). We knew we needed to make our pull request process more efficient to give our contributors a smoother experience. At Radius, we are always striving to improve the experience for our users and contributors.

In brief, the challenge was to separate the functional tests that use cloud resources from those that don’t, so that fewer tests require maintainer approval.

Previously, we ran all the functional tests together in a single workflow. That workflow, now renamed functional-test-cloud.yaml, remains largely the same apart from a few changes. The most important change, as you might guess, is that it now runs only the functional tests that create and use cloud resources. Before running these tests, the workflow builds images containing the changes introduced in the pull request and pushes them to a container registry that the host machine created by the workflow can access. Radius uses GitHub Container Registry (GHCR) for this and pushes all the images used by the tests there.
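The build-and-push step described above can be sketched in its simplest form as a small bash function. This is an illustration, not the Radius build script: the image names, the Dockerfile layout (deploy/IMAGE/Dockerfile), and the tag are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: build each image from the PR checkout and push it to a
# registry. Image names, Dockerfile paths, and tags are placeholders.
set -euo pipefail

# build_and_push REGISTRY TAG IMAGE...
# Each IMAGE is assumed to have its Dockerfile at ./deploy/IMAGE/Dockerfile.
build_and_push() {
  local registry="$1" tag="$2"
  shift 2
  local image ref
  for image in "$@"; do
    ref="${registry}/${image}:${tag}"
    # Build from the repository root so the PR's changes are included.
    docker build -t "$ref" -f "deploy/${image}/Dockerfile" .
    docker push "$ref"
  done
}
```

For GHCR, log in first, for example with echo "$GITHUB_TOKEN" | docker login ghcr.io -u "$GITHUB_ACTOR" --password-stdin (assuming the token is exported as GITHUB_TOKEN), then run build_and_push ghcr.io/example-org pr-123 image-a image-b. The same function can target a registry on the host machine by changing only the first argument.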

Our Solution

Adding the New Workflow

As mentioned above, we renamed our existing workflow to functional-test-cloud and added a new one called functional-test-noncloud. The new workflow runs the functional tests that don’t use cloud resources, without requiring approval from a Radius maintainer or approver.

In this new workflow, we aimed to eliminate any dependency on cloud resources: everything runs on the host machine and in a Kubernetes cluster on that machine. This meant we would no longer run these functional tests on an AKS or EKS cluster, nor use any Azure resource groups or AWS resources. In addition, the workflow uses no repository-level or organization-level secrets.

We decided to use a KinD cluster and a secure Docker registry for uploading the images built in each run. Each test run creates its own KinD cluster and secure Docker registry on the host machine and destroys them when the run finishes. This approach ensures that no resources are left dangling in the cloud and no leftover images accumulate on GHCR. It also means this workflow needs neither secrets nor approval from a maintainer or approver.
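The per-run lifecycle described above (create the cluster and registry, run the tests, then destroy both) can be sketched as a wrapper whose cleanup runs even when the tests fail. This is an illustrative pattern with hypothetical names, not the Radius workflow; the wrapped command is assumed to create the KinD cluster and the registry container and then run the tests.

```shell
#!/usr/bin/env bash
# Minimal sketch of "create per run, destroy after the run". Cluster and
# registry names are hypothetical; the wrapped command is assumed to create
# them and run the tests.
set -euo pipefail

# with_ephemeral_env CLUSTER REGISTRY COMMAND [ARGS...]
# Runs COMMAND in a subshell whose EXIT trap deletes the KinD cluster and the
# registry container, whether COMMAND succeeds or fails.
with_ephemeral_env() {
  local cluster="$1" registry="$2"
  shift 2
  (
    cleanup() {
      # Best-effort teardown: ignore errors if a resource was never created.
      kind delete cluster --name "$cluster" >/dev/null 2>&1 || true
      docker rm -f "$registry" >/dev/null 2>&1 || true
    }
    trap cleanup EXIT
    "$@"
  )
}
```

A call might look like with_ephemeral_env functional-tests myregistry.test ./run-functional-tests.sh, where the script name is hypothetical. Scoping the EXIT trap to a subshell keeps cleanup tied to a single run, and bash preserves the command's exit status through the trap, so a failing test still fails the CI job.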

Creating the Secure Docker Registry

Documentation on creating an insecure (HTTP) Docker registry is widely available, but far fewer guides cover creating a secure (HTTPS) one. The KinD user guide on creating a cluster with a local registry (see References) is a good place to start if you are experimenting with KinD clusters and Docker registries.

Here are the steps to create a secure Docker registry:

Create a directory for the certificates you will generate for HTTPS (HTTP over TLS) communication.
Create certificates for the Docker registry. You can see how we did this in Radius here.
Add the certificate to the system trust store on the host machine.
If you use a specific registry name, add it to /etc/hosts so that it resolves to localhost on the host machine.
Create the secure Docker registry with the docker run command, passing in the certificate details.
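The five steps above can be sketched as a set of bash functions. This is a minimal sketch under stated assumptions, not the Radius implementation: it assumes an Ubuntu or Debian host with sudo (as on GitHub-hosted Ubuntu runners), OpenSSL 1.1.1 or later for -addext, the upstream registry:2 image listening on port 443, and a hypothetical hostname such as myregistry.test. REGISTRY_HTTP_TLS_CERTIFICATE and REGISTRY_HTTP_TLS_KEY are the registry image's standard settings for serving TLS.

```shell
#!/usr/bin/env bash
# Sketch of the five steps above, not the Radius implementation. Assumes an
# Ubuntu/Debian host with sudo, OpenSSL 1.1.1+ (for -addext), the upstream
# registry:2 image, and a hypothetical hostname such as myregistry.test.
set -euo pipefail

# Steps 1-2: a certificate directory plus a self-signed certificate whose
# subjectAltName matches the registry hostname (TLS clients check the SAN).
create_registry_cert() {
  local dir="$1" host="$2"
  mkdir -p "$dir/certs/$host"
  openssl req -x509 -newkey rsa:4096 -nodes -sha256 -days 1 \
    -subj "/CN=$host" -addext "subjectAltName=DNS:$host" \
    -keyout "$dir/$host.key" -out "$dir/$host.crt" 2>/dev/null
  # CA-only copy in a per-host directory, suitable for mounting into KinD nodes.
  cp "$dir/$host.crt" "$dir/certs/$host/ca.crt"
}

# Step 3: trust the certificate on the host. The /etc/docker/certs.d copy is
# an extra: Docker reads per-registry CAs from there without a restart.
trust_registry_cert() {
  local dir="$1" host="$2"
  sudo cp "$dir/$host.crt" "/usr/local/share/ca-certificates/$host.crt"
  sudo update-ca-certificates
  sudo mkdir -p "/etc/docker/certs.d/$host"
  sudo cp "$dir/$host.crt" "/etc/docker/certs.d/$host/ca.crt"
}

# Step 4: resolve the custom hostname to localhost on the host machine.
hosts_entry() { printf '127.0.0.1 %s\n' "$1"; }
add_hosts_entry() { hosts_entry "$1" | sudo tee -a /etc/hosts >/dev/null; }

# Step 5: serve the registry over HTTPS on port 443 using the cert and key.
run_secure_registry() {
  local dir="$1" host="$2"
  docker run -d --restart=always --name "$host" -p 443:443 \
    -v "$dir:/certs:ro" \
    -e REGISTRY_HTTP_ADDR=0.0.0.0:443 \
    -e REGISTRY_HTTP_TLS_CERTIFICATE="/certs/$host.crt" \
    -e REGISTRY_HTTP_TLS_KEY="/certs/$host.key" \
    registry:2
}

# Runs all five steps and prints the directory holding the certificates.
setup_secure_registry() {
  local host="$1" dir
  dir="$(mktemp -d)"
  create_registry_cert "$dir" "$host"
  trust_registry_cert "$dir" "$host"
  add_hosts_entry "$host"
  run_secure_registry "$dir" "$host" >/dev/null
  printf '%s\n' "$dir"
}
```

After sourcing the file, CERT_DIR="$(setup_secure_registry myregistry.test)" runs the steps, and images can then be pushed as myregistry.test/NAME:TAG. The certs/HOST subdirectory holds only the CA certificate, so it can be mounted into KinD nodes under /etc/containerd/certs.d/HOST.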

Creating the KinD Cluster

After setting up the secure Docker registry on the host machine, the next step is to create the Kubernetes cluster for running the functional tests. We chose KinD (Kubernetes in Docker) for managing these clusters. Here is an example of how you can create a KinD cluster:

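# Runs in a GitHub Actions step: the inputs.* values are Actions expressions, filled in before bash runs the script.
cat <<EOF | kind create cluster --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraMounts:
    # Mount the registry's certificate directory where containerd looks up per-registry config.
    - containerPath: "/etc/containerd/certs.d/${{ inputs.registry-name }}"
      hostPath: "${{ inputs.temp-cert-dir }}/certs/${{ inputs.registry-server }}"
# Tell containerd's CRI plugin to read registry configuration from /etc/containerd/certs.d.
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri".registry]
    config_path = "/etc/containerd/certs.d"
EOF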

As you can see, the script mounts the host directory containing the certificates into the node container at a specified path. These are the secure Docker registry’s certificates, and the cluster needs to recognize them to communicate with the registry. The containerdConfigPatches entry sets config_path so that containerd reads per-registry configuration, including these certificates, from /etc/containerd/certs.d.
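Outside GitHub Actions, the inputs.* expressions in the config above are not available. Below is a sketch of the same KinD configuration with those values replaced by shell parameters so it can be tried locally. The function names and cluster name are hypothetical, and the containerd patch is kept exactly as in the original, so this assumes a KinD node image whose containerd accepts that setting.

```shell
#!/usr/bin/env bash
# Sketch: the KinD config above with the GitHub Actions inputs replaced by
# shell parameters. Function and cluster names are hypothetical.
set -euo pipefail

# kind_config REGISTRY_HOST CA_DIR
# REGISTRY_HOST: registry name as it appears in image references.
# CA_DIR: absolute host path of a directory holding that registry's CA cert.
kind_config() {
  local registry_host="$1" ca_dir="$2"
  cat <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraMounts:
    - containerPath: "/etc/containerd/certs.d/${registry_host}"
      hostPath: "${ca_dir}"
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri".registry]
    config_path = "/etc/containerd/certs.d"
EOF
}

# create_kind_cluster NAME REGISTRY_HOST CA_DIR
# Pipes the generated config to kind on stdin, as the original script does.
create_kind_cluster() {
  kind_config "$2" "$3" | kind create cluster --name "$1" --config=-
}
```

For example, create_kind_cluster test-cluster myregistry.test /abs/path/certs/myregistry.test. The mounted directory should contain the registry’s CA certificate (for example ca.crt); containerd documents a fallback to Docker-style .crt files in a host directory that has no hosts.toml.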

You can find the details of the action we created, which sets up a KinD cluster with or without a secure Docker registry, here.

After creating the secure Docker registry and connecting it with the KinD cluster, the workflow is ready to run the functional tests.
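Before the functional tests start, it can help to confirm that the cluster really pulls from the registry over HTTPS. The smoke test below is a hypothetical addition, not part of the Radius workflow. It connects the registry container to KinD’s default kind Docker network (as the KinD local-registry guide does), pushes busybox under the registry’s name, and waits for a pod using that image to succeed. It assumes kubectl 1.23 or later (for --for=jsonpath), a registry container named after the registry hostname, and KinD nodes that can resolve that name on the kind network.

```shell
#!/usr/bin/env bash
# Hypothetical smoke test: can a KinD pod pull an image from the secure
# registry? Assumes the registry container is named after its hostname.
set -euo pipefail

# smoke_test_registry REGISTRY_HOST
smoke_test_registry() {
  local host="$1" ref="$1/busybox:smoke"
  # Put the registry on KinD's network; ignore "already connected" errors.
  docker network connect kind "$host" 2>/dev/null || true
  # Push a small public image under the registry's name.
  docker pull busybox:latest
  docker tag busybox:latest "$ref"
  docker push "$ref"
  # Run it once in the cluster and wait for the pod to complete successfully.
  kubectl run registry-smoke --image="$ref" --restart=Never -- true
  kubectl wait --for=jsonpath='{.status.phase}'=Succeeded pod/registry-smoke --timeout=120s
  kubectl delete pod registry-smoke --wait=false
}
```

If the wait times out, kubectl describe pod registry-smoke will typically show ErrImagePull or ImagePullBackOff, which usually points at the certificate mount or name resolution rather than the tests themselves.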

Summary

As a frequent contributor to Radius, I think separating these tests has made the pull request process smoother for me. I know that I still need approval to kick-start some of the functional tests, but the set of functional tests that require approval is now smaller. This change has significantly reduced the time it takes to get feedback on my pull requests, allowing me to iterate more quickly and efficiently.

Additionally, this separation has made it easier for all contributors to get involved without facing delays. By running non-cloud tests immediately, we can catch issues earlier in the development process. This not only improves the overall quality of the code but also fosters a more collaborative and inclusive environment for all contributors.

We are always looking to improve our process, so please let us know what you think about this addition to Radius.

Learn more and contribute

The Radius maintainers are excited to keep collaborating with the open-source community to grow Radius’s feature set, and they welcome all contributions from the community.

We’re looking for people to join us! To get started with Radius today, please see:

Learn more from the documentation.
Explore the open-source code repositories.
Engage with the community.

References

https://kind.sigs.k8s.io/docs/user/local-registry/
