Hands-on Kubernetes Operator Development: Reconcile loop
- Introduction & Environment Bootstrap
- Implementing Main Reconcile Logic (this post)
- Implementing Resource Cleanup
- Implementing Webhooks
- Testing Your Operator
In the first part of our series, we introduced the concept of Kubernetes Operators and walked through setting up a new project using Kubebuilder. We ended by defining the structure of our custom resource, the Tenant, and discussed how this resource will be used to manage multi-tenant environments in a Kubernetes cluster.
In this second part, we're diving deep into the core of our operator - the reconciliation loop.
What is the Reconciliation Loop?
At the heart of every operator is the reconciliation loop. This is a function that observes the current state of the system and compares it to the desired state, as defined by our Tenant custom resource. If the current and desired states differ, the reconcile function makes the necessary changes to bring the system to its desired state.
The reconciliation loop is called every time a watch event is triggered for the operator's primary resources, in our case, the Tenant custom resources.
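Stripped of the controller-runtime machinery, the loop boils down to: observe the current state, compare it with the desired state, act on the difference. Here is a minimal, self-contained sketch of that idea (toy types, not the real API):

package main

import "fmt"

// State is a toy stand-in for cluster state: the set of namespaces that exist.
type State map[string]bool

// reconcile compares desired vs. current state and "creates" whatever is
// missing. A real operator does the same dance with API objects.
func reconcile(desired, current State) {
	for ns := range desired {
		if !current[ns] {
			fmt.Println("creating", ns) // the corrective action
			current[ns] = true
		}
	}
}

func main() {
	desired := State{"ns1": true, "ns2": true}
	current := State{"ns1": true}
	reconcile(desired, current) // prints: creating ns2
}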
To react to changes to our CRD, the Tenant operator watches events (create, update, delete) on instances of the Tenant CRD. Whenever a Tenant object is created, updated, or deleted in the Kubernetes cluster, an event fires and our operator's Reconciler is triggered. Kubebuilder has already wired this up for us (in internal/controller/tenant_controller.go):
// SetupWithManager sets up the controller with the Manager.
func (r *TenantReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&multitenancyv1.Tenant{}).
		Complete(r)
}
The .For(&multitenancyv1.Tenant{}) call specifies the resource type to watch: it tells the controller to trigger reconciliation whenever a Tenant resource changes.
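As an aside, controller-runtime can also watch secondary resources that the operator creates. A possible extension, assuming we set owner references on the namespaces we create (which our code below does not do yet), would be to add an Owns clause:

// Hypothetical variant: also reconcile a Tenant when one of the
// namespaces it owns changes (e.g. someone deletes it by hand).
// Requires setting an owner reference on each created namespace.
func (r *TenantReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&multitenancyv1.Tenant{}).
		Owns(&corev1.Namespace{}).
		Complete(r)
}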
Understanding the Reconcile function
In our project, the main reconciler is represented by the TenantReconciler struct. The Client field in this struct is used to read and write Kubernetes objects, and the Scheme field is used to convert between different API versions.
type TenantReconciler struct {
	client.Client
	Scheme *runtime.Scheme
}
The Reconcile function is what the operator executes whenever a Tenant object changes in the Kubernetes cluster. This is where we define how our operator should react to these events and take corrective measures to ensure the actual state matches the desired state defined in the Tenant object.
The Reconcile method has the following signature:
// +kubebuilder:rbac:groups=multitenancy.codereliant.io,resources=*,verbs=*
// +kubebuilder:rbac:groups="",resources=namespaces,verbs=*
// +kubebuilder:rbac:groups=rbac.authorization.k8s.io,resources=*,verbs=*
func (r *TenantReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// reconcile implementation
}
Let's dissect the components of this function:
- // +kubebuilder:rbac...: markers for the Kubebuilder tool, used to generate the RBAC rules the operator needs in order to function. In our case we need access to manage Tenants, Namespaces, and RoleBindings.
- ctx context.Context: the first parameter is a context, commonly used in Go to control the execution of functions that might take some time to complete. It can be used to handle timeouts or cancel long-running tasks.
- req ctrl.Request: this request object contains the information about the event that triggered the reconciliation.
- ctrl.Result: the result that the function returns. It can be used to specify that the function should be requeued and executed again after some time, which is useful when not all conditions for a state transition can be met in a single execution (see the sketch below).
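For example, a reconciler that cannot make progress yet can ask to be called again after a delay. A minimal sketch (checkSomePrecondition is a hypothetical helper, the one-minute delay is an arbitrary choice, and the time package must be imported):

func (r *TenantReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	if !checkSomePrecondition() { // hypothetical: e.g. an external system isn't ready yet
		// Try again in a minute instead of waiting for the next watch event.
		return ctrl.Result{RequeueAfter: time.Minute}, nil
	}
	return ctrl.Result{}, nil
}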
It's important to remember that the reconciliation function must be idempotent: it can be called multiple times for the same resource, and the result should be the same. It also needs to handle all edge cases that might occur and recover from possible errors.
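One convenient way to get idempotency is controller-runtime's controllerutil.CreateOrUpdate helper, which creates the object if it is missing and patches it otherwise. A sketch of the pattern (ensureNamespaceIdempotent is a hypothetical helper name; our operator below uses explicit Get/Create calls instead):

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

func (r *TenantReconciler) ensureNamespaceIdempotent(ctx context.Context, name, adminEmail string) error {
	ns := &corev1.Namespace{ObjectMeta: metav1.ObjectMeta{Name: name}}
	_, err := controllerutil.CreateOrUpdate(ctx, r.Client, ns, func() error {
		// The mutate function runs on both the create and update paths,
		// so it must itself be idempotent.
		if ns.Annotations == nil {
			ns.Annotations = map[string]string{}
		}
		ns.Annotations["adminEmail"] = adminEmail
		return nil
	})
	return err
}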
Reconciliation implementation
As we discussed in the previous post, our controller will create namespaces and RoleBindings based on the Tenant spec. So let's start by implementing the Reconcile function:
func (r *TenantReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	log := log.FromContext(ctx)

	tenant := &multitenancyv1.Tenant{}
	log.Info("Reconciling tenant")

	// Fetch the Tenant instance
	if err := r.Get(ctx, req.NamespacedName, tenant); err != nil {
		// Tenant object not found, it might have been deleted
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Loop through each namespace defined in the Tenant spec:
	// ensure the namespace exists, and if not, create it,
	// then ensure RoleBindings for each namespace.
	for _, ns := range tenant.Spec.Namespaces {
		log.Info("Ensuring Namespace", "namespace", ns)
		if err := r.ensureNamespace(ctx, tenant, ns); err != nil {
			log.Error(err, "unable to ensure Namespace", "namespace", ns)
			return ctrl.Result{}, err
		}

		log.Info("Ensuring Admin RoleBinding", "namespace", ns)
		if err := r.EnsureRoleBinding(ctx, ns, tenant.Spec.AdminGroups, "admin"); err != nil {
			log.Error(err, "unable to ensure Admin RoleBinding", "namespace", ns)
			return ctrl.Result{}, err
		}

		if err := r.EnsureRoleBinding(ctx, ns, tenant.Spec.UserGroups, "edit"); err != nil {
			log.Error(err, "unable to ensure User RoleBinding", "namespace", ns)
			return ctrl.Result{}, err
		}
	}

	// Update the Tenant status with the current state
	tenant.Status.NamespaceCount = len(tenant.Spec.Namespaces)
	tenant.Status.AdminEmail = tenant.Spec.AdminEmail
	if err := r.Status().Update(ctx, tenant); err != nil {
		log.Error(err, "unable to update Tenant status")
		return ctrl.Result{}, err
	}

	return ctrl.Result{}, nil
}
This function attempts to fetch a Tenant instance and, if it exists, loops over each namespace defined in the Tenant spec, ensuring that each namespace and its corresponding RoleBindings exist; if they do not, the function creates them. Once completed, it updates the Tenant status with the number of namespaces and the admin email, mirroring the current state of the resource in the cluster. If an error is encountered at any step, the function logs it and returns it, which causes controller-runtime to requeue the request.
Now let's implement the corresponding "ensure" functions, starting with ensureNamespace:
const (
	tenantOperatorAnnotation = "tenant-operator"
)

func (r *TenantReconciler) ensureNamespace(ctx context.Context, tenant *multitenancyv1.Tenant, namespaceName string) error {
	log := log.FromContext(ctx)

	// Define a namespace object
	namespace := &corev1.Namespace{}

	// Attempt to get the namespace with the provided name
	err := r.Get(ctx, client.ObjectKey{Name: namespaceName}, namespace)
	if err != nil {
		// If the namespace doesn't exist, create it
		if apierrors.IsNotFound(err) {
			log.Info("Creating Namespace", "namespace", namespaceName)
			namespace := &corev1.Namespace{
				ObjectMeta: metav1.ObjectMeta{
					Name: namespaceName,
					Annotations: map[string]string{
						"adminEmail": tenant.Spec.AdminEmail,
						"managed-by": tenantOperatorAnnotation,
					},
				},
			}

			// Attempt to create the namespace
			if err = r.Create(ctx, namespace); err != nil {
				return err
			}
		} else {
			// Some other error occurred while fetching the namespace
			return err
		}
	} else {
		// If the namespace already exists, check for required annotations
		log.Info("Namespace already exists", "namespace", namespaceName)
		// Logic for checking annotations
	}

	return nil
}
Similarly, the RoleBinding management function (left as a skeleton for now):
func (r *TenantReconciler) EnsureRoleBinding(ctx context.Context, namespaceName string, groups []string, clusterRoleName string) error {
	// roleBinding management implementation
}
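A possible implementation, consistent with the namespace-role-rb naming we'll see in the verification step below (a sketch, not necessarily the exact code from the repo; rbacv1 is k8s.io/api/rbac/v1):

func (r *TenantReconciler) EnsureRoleBinding(ctx context.Context, namespaceName string, groups []string, clusterRoleName string) error {
	// One RoleBinding per namespace and cluster role, e.g. "tenant-sample-ns1-admin-rb"
	name := namespaceName + "-" + clusterRoleName + "-rb"

	// Bind every group to the given cluster role.
	subjects := make([]rbacv1.Subject, 0, len(groups))
	for _, group := range groups {
		subjects = append(subjects, rbacv1.Subject{
			Kind:     rbacv1.GroupKind,
			APIGroup: rbacv1.GroupName,
			Name:     group,
		})
	}

	roleBinding := &rbacv1.RoleBinding{}
	err := r.Get(ctx, client.ObjectKey{Namespace: namespaceName, Name: name}, roleBinding)
	if apierrors.IsNotFound(err) {
		// RoleBinding doesn't exist yet, create it
		roleBinding = &rbacv1.RoleBinding{
			ObjectMeta: metav1.ObjectMeta{
				Name:      name,
				Namespace: namespaceName,
			},
			Subjects: subjects,
			RoleRef: rbacv1.RoleRef{
				APIGroup: rbacv1.GroupName,
				Kind:     "ClusterRole",
				Name:     clusterRoleName,
			},
		}
		return r.Create(ctx, roleBinding)
	}
	// Note: reconciling subjects on an existing RoleBinding is omitted here.
	return err
}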
Verifying the Operator
Now that we're done with the first-pass implementation, let's test it out! With Kubebuilder and Kind it's as simple as executing make run:
$ make run
go fmt ./...
go vet ./...
go run ./cmd/main.go
2023-07-06T20:21:45-07:00 INFO controller-runtime.metrics Metrics server is starting to listen {"addr": ":8080"}
2023-07-06T20:21:45-07:00 INFO setup starting manager
And the controller is up and running in your local environment. By checking the logs, we can see that it's already doing some work, based on the sample Tenant custom resource we created earlier:
2023-07-06T20:21:45-07:00 INFO Reconciling tenant {"controller": "tenant", "controllerGroup": "multitenancy.codereliant.io", "controllerKind": "Tenant", "Tenant": {"name":"tenant-sample"}, "namespace": "", "name": "tenant-sample", "reconcileID": "b6b1864b-7b6e-46ab-baa1-e3e4bc741f67"}
2023-07-06T20:21:45-07:00 INFO Ensuring Namespace {"controller": "tenant", "controllerGroup": "multitenancy.codereliant.io", "controllerKind": "Tenant", "Tenant": {"name":"tenant-sample"}, "namespace": "", "name": "tenant-sample", "reconcileID": "b6b1864b-7b6e-46ab-baa1-e3e4bc741f67", "namespace": "tenant-sample-ns1"}
2023-07-06T20:21:45-07:00 INFO Ensuring Admin RoleBinding {"controller": "tenant", "controllerGroup": "multitenancy.codereliant.io", "controllerKind": "Tenant", "Tenant": {"name":"tenant-sample"}, "namespace": "", "name": "tenant-sample", "reconcileID": "b6b1864b-7b6e-46ab-baa1-e3e4bc741f67", "namespace": "tenant-sample-ns1"}
We can confirm that resources were successfully created:
$ kubectl get namespaces
...
tenant-sample-ns1 Active 28s
tenant-sample-ns2 Active 28s
tenant-sample-ns3 Active 28s
$ kubectl get rolebinding -n tenant-sample-ns1
NAME ROLE AGE
tenant-sample-ns1-admin-rb ClusterRole/admin 28s
tenant-sample-ns1-edit-rb ClusterRole/edit 28s
$ kubectl get tenants
NAME EMAIL NAMESPACECOUNT
tenant-sample admin@yourdomain.com 3
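The EMAIL and NAMESPACECOUNT columns come from printcolumn markers on the Tenant type we defined in part one; they would look something like this (the JSONPath values assume camelCase JSON tags on the status fields):

// +kubebuilder:printcolumn:name="Email",type=string,JSONPath=`.status.adminEmail`
// +kubebuilder:printcolumn:name="NamespaceCount",type=integer,JSONPath=`.status.namespaceCount`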
Congrats, your controller is up and running!
Wrapping Up
That concludes the second part of our series on creating a Kubernetes operator from scratch.
But the journey doesn't stop here. There are plenty of aspects we will explore to enhance our operator's functionality and reliability. In the next post, we'll cover cleaning up the resources once the custom resource is deleted.
The code discussed in this blog series can be found in this GitHub repo.