Build Your Own Serverless: part 4
In part 3 of the Build Your Own Serverless series, we went over adding SQLite persistence with Gorm, a new hostname strategy, switching from exec.Command to the Docker Go client, and graceful shutdown to clean up resources.
In this post we will be covering advanced topics:
- Support of versioning
- Version Promotion (Canary, A/B & more)
- Environment variables
- Garbage collection of idle containers
Follow along to learn these advanced topics and take our serverless platform to the next level!
Versioning & Environment Variables
Versioning is a powerful capability that will enable us to easily create and track new versions of our cLess containers. Each version will have a clear relationship to our overall service definition, giving us full visibility into what container versions are currently running as well as a history of how our service has evolved over time.
Data Structure Definition:
Previously we put all the serverless resource attributes into our `ServiceDefinition` object, which contained everything needed to start a container and keep track of it. Instead, we will extract the container attributes into a new struct named `ServiceVersion` and add a property `EnvVars` of type `datatypes.JSONSlice[string]`, which will store the container's environment variables like `ENV=prod`:
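A minimal sketch of what this struct could look like; the field names and json tags are illustrative, chosen to match the API payloads used later in this post:

```go
import (
	"gorm.io/datatypes"
	"gorm.io/gorm"
)

// ServiceVersion holds the container attributes for a single version of a service.
type ServiceVersion struct {
	gorm.Model
	ServiceDefinitionID uint                        `json:"service_definition_id"`
	ImageName           string                      `json:"image_name"`
	ImageTag            string                      `json:"image_tag"`
	Port                int                         `json:"port"`
	// EnvVars stores "KEY=value" entries, serialized to a JSON column by Gorm.
	EnvVars datatypes.JSONSlice[string] `json:"env_vars"` // e.g. ["ENV=prod"]
}
```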
Now we need to establish a has-many relationship between `ServiceDefinition` and `ServiceVersion`:
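With Gorm, a slice field is enough to declare the relationship; a sketch (other fields are assumptions):

```go
// ServiceDefinition now owns a slice of versions; Gorm infers the has-many
// relationship through the ServiceDefinitionID foreign key on ServiceVersion.
type ServiceDefinition struct {
	gorm.Model
	Name     string           `json:"name"`
	Host     string           `json:"host"`
	Versions []ServiceVersion `json:"versions"`
}
```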
Methods & Persistence:
Now that the types have been clearly defined, we need to refactor the `isValid` method on `ServiceDefinition` and add the same method to `ServiceVersion`:
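A sketch of what these checks could look like (the exact rules are an assumption; `errors` is from the standard library):

```go
func (sd *ServiceDefinition) isValid() error {
	if sd.Name == "" {
		return errors.New("service name is required")
	}
	return nil
}

func (sv *ServiceVersion) isValid() error {
	if sv.ImageName == "" || sv.ImageTag == "" {
		return errors.New("image name and tag are required")
	}
	if sv.Port <= 0 || sv.Port > 65535 {
		return errors.New("port must be between 1 and 65535")
	}
	return nil
}
```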
Let's extend our repository definition to support persistence of new versions:
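As a sketch, the interface could grow two methods along these lines (the names are assumptions, not the repo's exact API):

```go
type ServiceDefinitionRepository interface {
	// ...existing methods elided...
	AddVersion(serviceName string, version *ServiceVersion) error
	ListVersions(serviceName string) ([]ServiceVersion, error)
}
```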
In our `SqliteServiceDefinitionRepository` we will implement the new function `AddVersion`, using Gorm associations to make managing the relationship easier:
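A minimal sketch, assuming the repository holds a `*gorm.DB` in a `db` field:

```go
func (r *SqliteServiceDefinitionRepository) AddVersion(serviceName string, version *ServiceVersion) error {
	var def ServiceDefinition
	if err := r.db.Where("name = ?", serviceName).First(&def).Error; err != nil {
		return err
	}
	// The Versions association sets ServiceDefinitionID and inserts the row for us.
	return r.db.Model(&def).Association("Versions").Append(version)
}
```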
Server API
We need a REST API for managing `ServiceVersion`, similar to what we did before for `ServiceDefinition`. We also need a function on `ServiceDefinitionManager`, because we don't want to call the repository directly from our HTTP server routes:
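A sketch of the manager method, assuming the manager holds the repository in a `repository` field:

```go
func (m *ServiceDefinitionManager) AddVersion(serviceName string, version *ServiceVersion) error {
	// validate before touching storage
	if err := version.isValid(); err != nil {
		return err
	}
	return m.repository.AddVersion(serviceName, version)
}
```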
Within `StartAdminServer` we will add the `ServiceVersion` routes:
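As a sketch, assuming a Gin-style router and a hypothetical `ListVersions` passthrough on the manager:

```go
router.POST("/serviceDefinitions/:name/versions", func(c *gin.Context) {
	var version ServiceVersion
	if err := c.ShouldBindJSON(&version); err != nil {
		c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
		return
	}
	if err := manager.AddVersion(c.Param("name"), &version); err != nil {
		c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
		return
	}
	c.JSON(http.StatusCreated, version)
})

router.GET("/serviceDefinitions/:name/versions", func(c *gin.Context) {
	versions, err := manager.ListVersions(c.Param("name")) // hypothetical helper
	if err != nil {
		c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
		return
	}
	c.JSON(http.StatusOK, versions)
})
```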
As you can see, with minimal refactoring we were able to incorporate versions into cLess. But that is only a third of the story: we still need a way to promote new versions to be served, which means the proxy needs a concept of which version to route to.
Version Promotion (Canary, A/B & more):
We will implement versioning capabilities that go beyond just pinning a `versionID` to a `ServiceDefinition`. We can split traffic between multiple versions with precise weighting, enabling advanced deployment patterns like canary testing. Rather than routing 100% of traffic to a single version, we can divide traffic across versions as needed: sending 1% to a new version for incremental testing, a 50/50 split for A/B comparison, or any ratio we choose. This unlocks fine-grained control over our version rollout strategy.
Traffic Weights:
Let's create a struct `TrafficWeight` that will contain the traffic split definitions, and reference it in `ServiceDefinition`:
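A minimal sketch, reusing `datatypes.JSONSlice` for the weights column; the json tags match the payloads used later, while everything else is illustrative. `ServiceDefinition` gains a `TrafficWeights []TrafficWeight` field for the has-many relationship described next:

```go
// TrafficWeight captures one traffic split for a service at a point in time.
type TrafficWeight struct {
	gorm.Model
	ServiceDefinitionID uint                               `json:"service_definition_id"`
	Weights             datatypes.JSONSlice[VersionWeight] `json:"weights"`
}

// VersionWeight assigns a share of traffic to a specific version.
type VersionWeight struct {
	ServiceVersionID uint `json:"service_version_id"`
	Weight           int  `json:"weight"`
}
```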
We will use a has-many relationship between `ServiceDefinition` and `TrafficWeight`. This gives us full historical traceability of how traffic has been split across versions over time: if a particular split causes issues, we can quickly revert to a known good distribution, and the history of `TrafficWeights` serves as an audit trail showing how traffic shifted as new versions rolled out. It also lets us easily recreate previous traffic splitting strategies. This ensures low-risk rollbacks while freeing us to experiment confidently with different version promotion patterns.
Method for validating `TrafficWeight`:
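A sketch of one reasonable validation policy, requiring non-negative weights that sum to 100 (the exact rules are an assumption):

```go
func (tw *TrafficWeight) isValid() error {
	if len(tw.Weights) == 0 {
		return errors.New("at least one version weight is required")
	}
	total := 0
	for _, w := range tw.Weights {
		if w.Weight < 0 {
			return errors.New("weights must be non-negative")
		}
		total += w.Weight
	}
	if total != 100 {
		return errors.New("weights must sum to 100")
	}
	return nil
}
```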
Let's add persistence as well, similar to what we did with `ServiceVersion`:
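A sketch mirroring `AddVersion`, again leaning on Gorm associations:

```go
func (r *SqliteServiceDefinitionRepository) AddTrafficWeight(serviceName string, tw *TrafficWeight) error {
	var def ServiceDefinition
	if err := r.db.Where("name = ?", serviceName).First(&def).Error; err != nil {
		return err
	}
	return r.db.Model(&def).Association("TrafficWeights").Append(tw)
}
```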
To manage these traffic weights we will need a REST API and an `AddTrafficWeight` method in our `ServiceDefinitionManager`:
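A sketch of both pieces, with the same Gin-style assumption as before:

```go
func (m *ServiceDefinitionManager) AddTrafficWeight(serviceName string, tw *TrafficWeight) error {
	if err := tw.isValid(); err != nil {
		return err
	}
	return m.repository.AddTrafficWeight(serviceName, tw)
}

// and the route inside StartAdminServer:
router.POST("/serviceDefinitions/:name/trafficWeights", func(c *gin.Context) {
	var tw TrafficWeight
	if err := c.ShouldBindJSON(&tw); err != nil {
		c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
		return
	}
	if err := manager.AddTrafficWeight(c.Param("name"), &tw); err != nil {
		c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
		return
	}
	c.JSON(http.StatusCreated, tw)
})
```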
Now that we have the ability to store and manage traffic weights, let's add the ability to use them. We will implement a method called `ChooseVersion`, which will pick a version from the latest `TrafficWeight` of a `ServiceDefinition`.
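A sketch of a weighted random pick, assuming the latest `TrafficWeight` is the last element of the loaded slice (`math/rand` provides `Intn`):

```go
// ChooseVersion does a weighted random draw over the most recent TrafficWeight.
func (sd *ServiceDefinition) ChooseVersion() (uint, error) {
	if len(sd.TrafficWeights) == 0 {
		return 0, errors.New("no traffic weights defined")
	}
	latest := sd.TrafficWeights[len(sd.TrafficWeights)-1]
	total := 0
	for _, w := range latest.Weights {
		total += w.Weight
	}
	if total <= 0 {
		return 0, errors.New("traffic weights sum to zero")
	}
	// Draw a number in [0, total) and walk the cumulative distribution.
	n := rand.Intn(total)
	for _, w := range latest.Weights {
		if n < w.Weight {
			return w.ServiceVersionID, nil
		}
		n -= w.Weight
	}
	return latest.Weights[len(latest.Weights)-1].ServiceVersionID, nil
}
```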
We will see later how to use this method in `main.go`.
Container Manager:
With all these changes, some minimal refactoring is also needed in the `DockerContainerManager`. Currently it relies on `ServiceDefinition`; instead we will create a struct called `ExternalServiceDefinition`, which will help integrate with other modules and keep the admin module loosely coupled from the container module.
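A sketch of the struct; `GetKey` matches the `serviceID:versionID` shape visible in the log output later in this post, while the field set is an assumption:

```go
// ExternalServiceDefinition is the container module's view of a service:
// the service identity plus the single version to run.
type ExternalServiceDefinition struct {
	ID      uint
	Name    string
	Host    string
	Version ServiceVersion
}

// GetKey uniquely identifies a (service, version) pair, e.g. "2:6".
func (e *ExternalServiceDefinition) GetKey() string {
	return fmt.Sprintf("%d:%d", e.ID, e.Version.ID)
}
```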
Now we just need to replace `ServiceDefinition` with `ExternalServiceDefinition` in `DockerContainerManager`. The change isn't hard; here is an example for `GetRunningServiceForHost`:
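A sketch of the relevant fragment, with `findVersion` as a hypothetical helper that resolves the requested version ID against the definition's versions:

```go
// inside GetRunningServiceForHost (sketch): resolve the requested version,
// wrap it in an ExternalServiceDefinition, and key the container map by it.
sVersion, err := findVersion(sDef, versionID) // hypothetical helper
if err != nil {
	return nil, err
}
sExternalDef := &ExternalServiceDefinition{
	ID:      sDef.ID,
	Name:    sDef.Name,
	Host:    sDef.Host,
	Version: *sVersion,
}
rSvc, exists := cm.containers[sExternalDef.GetKey()]
```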
Main.go
The only thing that changes in our `main.go` is the HTTP handler:
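A sketch of what the handler could look like, with `GetServiceDefinitionByHost` and `proxyTo` as hypothetical helpers:

```go
// resolve the service by host, pick a version via ChooseVersion,
// and forward the request to the matching container.
http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
	sDef, err := manager.GetServiceDefinitionByHost(r.Host) // hypothetical lookup
	if err != nil {
		http.Error(w, "service not found", http.StatusNotFound)
		return
	}
	versionID, err := sDef.ChooseVersion()
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	addr, err := containerManager.GetRunningServiceForHost(r.Host, versionID)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	proxyTo(w, r, *addr) // hypothetical reverse-proxy helper
})
```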
Environment variables
We have already added the `EnvVars` property to `ServiceVersion` above; here we will cover how to pass it to the container that is started by the container manager:
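A sketch of the relevant fragment of `startContainer`; `container.Config` is from the Docker Go client (`github.com/docker/docker/api/types/container`), and its `Env` field takes exactly the `KEY=value` strings we stored:

```go
// EnvVars is a datatypes.JSONSlice[string], whose underlying type is
// []string, so it can be handed to container.Config.Env as-is.
config := &container.Config{
	Image: fmt.Sprintf("%s:%s", sExternalDef.Version.ImageName, sExternalDef.Version.ImageTag),
	Env:   sExternalDef.Version.EnvVars,
}
```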
Not a lot was needed, since `ServiceVersion` stores environment variables in the same format the Go Docker client API expects; we just pass them through to `container.Config`.
Garbage Collection of Idle Containers
Since cLess is a serverless platform, we should have a way to stop containers that are idle, i.e. that haven't received traffic for a specific amount of time. For instance, we can say that if a container hasn't received traffic for 2 minutes, we can garbage collect it.
For that we need a way to track the last time a `RunningService` has been accessed:
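A sketch of the struct with the new field (the other fields are illustrative):

```go
type RunningService struct {
	ContainerID string
	Port        int
	// LastTimeAccessed is updated on every request the service handles;
	// the garbage collector compares it against the idle threshold.
	LastTimeAccessed time.Time
}
```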
The `LastTimeAccessed` property will keep track of the last time the running service served traffic.
We also need to keep it updated; we will do that in `GetRunningServiceForHost`, since that method gets called every time the service is accessed:
```go
func (cm *DockerContainerManager) GetRunningServiceForHost(host string, version uint) (*string, error) {
	// ...
	cm.mutex.Lock()
	defer cm.mutex.Unlock()
	rSvc, exists := cm.containers[sExternalDef.GetKey()]
	if !exists {
		rSvc, err = cm.startContainer(sExternalDef)
		if err != nil {
			return nil, err
		}
	}
	// record the access so the garbage collector knows this container is live
	rSvc.LastTimeAccessed = time.Now()
	// ...
}
```
Let's create a method on `DockerContainerManager` named `garbageCollectIdleContainers`, which will run an infinite loop with a `time.Sleep` between runs:
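A sketch using zerolog (which the log output below suggests) and a hypothetical `stopContainer` helper wrapping the Docker client; the sleep interval and 2-minute threshold are illustrative:

```go
func (cm *DockerContainerManager) garbageCollectIdleContainers() {
	for {
		time.Sleep(70 * time.Second) // interval is illustrative
		log.Info().Msg("Garbage collecting idle containers")
		cm.mutex.Lock()
		for key, rSvc := range cm.containers {
			if time.Since(rSvc.LastTimeAccessed) < 2*time.Minute {
				continue // still hot, keep it
			}
			log.Info().
				Str("svc key", key).
				Str("containerID", rSvc.ContainerID).
				Msg("Removing idle container")
			if err := cm.stopContainer(rSvc); err != nil { // hypothetical helper
				log.Error().Err(err).Msg("failed to remove idle container")
				continue
			}
			delete(cm.containers, key)
		}
		cm.mutex.Unlock()
	}
}
```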
We need to call this method without blocking when we create the `DockerContainerManager` singleton:
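A sketch of the constructor fragment; a plain `go` statement keeps construction non-blocking:

```go
func NewDockerContainerManager( /* ... */ ) *DockerContainerManager {
	cm := &DockerContainerManager{
		containers: make(map[string]*RunningService),
		// ...
	}
	// run the collector in the background; it loops for the life of the process
	go cm.garbageCollectIdleContainers()
	return cm
}
```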
Testing Time:
For testing all the new features we added to cLess, we will create a simple Python Flask app:
```python
from flask import Flask
import os

app = Flask(__name__)

@app.route('/')
def hello_world():
    greeting = os.environ.get('GREETING', 'cLess')
    return f'Hello, {greeting}!'
```
We will skip the ceremony of packaging the app in a Dockerfile and building the image; if you need help with that, you can find it in the GitHub repo.
We will issue a few curl commands to create the service and its versions:
```bash
curl -X POST -H "Content-Type: application/json" \
  -d '{"name":"my-python-app"}' \
  http://admin.cless.cloud/serviceDefinitions

curl -X POST -H "Content-Type: application/json" \
  -d '{"image_name":"python-docker", "image_tag":"latest", "port":8080, "env_vars":["GREETING=Go"]}' \
  http://admin.cless.cloud/serviceDefinitions/my-python-app/versions

curl -X POST -H "Content-Type: application/json" \
  -d '{"image_name":"python-docker", "image_tag":"latest", "port":8080, "env_vars":["GREETING=Docker"]}' \
  http://admin.cless.cloud/serviceDefinitions/my-python-app/versions

curl -X POST -H "Content-Type: application/json" \
  -d '{"image_name":"python-docker", "image_tag":"latest", "port":8080, "env_vars":["GREETING=sqlite"]}' \
  http://admin.cless.cloud/serviceDefinitions/my-python-app/versions
```
The only difference between these versions is the environment variable `GREETING`.
We can get the version IDs by listing versions:
```
curl -s http://admin.cless.cloud/serviceDefinitions/my-python-app/versions | jq '.[].ID'
6
7
8
```
We will use these version IDs to create the traffic weight distribution:
```bash
curl -X POST -H "Content-Type: application/json" \
  -d '{"weights":[{"service_version_id":6, "weight": 10}, {"service_version_id":7, "weight": 30}, {"service_version_id":8, "weight": 60}]}' \
  http://admin.cless.cloud/serviceDefinitions/my-python-app/trafficWeights
```
This will split traffic across 3 versions:
- Version 6 (`GREETING=Go`): 10% of traffic.
- Version 7 (`GREETING=Docker`): 30% of traffic.
- Version 8 (`GREETING=sqlite`): 60% of traffic.
Let's get the host assigned to this app, so we can test the distribution with a bash command:
```
curl -s http://admin.cless.cloud/serviceDefinitions/my-python-app | jq '.host'
"app-51.cless.cloud"

for i in {1..100}; do curl -s app-51.cless.cloud >> data.txt; echo "" >> data.txt; done
cat data.txt | sort | uniq -c
  36 Hello, Docker!
   8 Hello, Go!
  56 Hello, sqlite!
```
We ran 100 requests, and the distribution is pretty close to the traffic weights we set up.
In the logs we can also see the idle containers garbage collector at work:
{"level":"info","time":"2023-08-23T09:29:19-04:00","message":"Garbage collecting idle containers"}
{"level":"info","time":"2023-08-23T09:30:29-04:00","message":"Garbage collecting idle containers"}
{"level":"info","svc key":"2:6","containerID":"ab61f05eb96bcc31c3ce96270744b421aa1a11a9399cdda0fc2422c8ba38a83d","time":"2023-08-23T09:30:29-04:00","message":"Removing idle container"}
{"level":"info","svc key":"2:7","containerID":"7021c36041f51e053c596eb3da0527423fc84e3601fd6c2c23978605e8c7bc44","time":"2023-08-23T09:30:29-04:00","message":"Removing idle container"}
{"level":"info","svc key":"2:8","containerID":"0d1809adbc97bc3c9b5e42eb4187569131946dccd079150939064b1b7935c512","time":"2023-08-23T09:30:29-04:00","message":"Removing idle container"}
Running `docker ps` confirms that no idle containers are left running.
Conclusion:
Versioning:
With versioning, we can iteratively improve our cLess platform while maintaining governance over our container portfolio. Upgrading becomes simpler, as we can promote validated versions into production with minimal effort. Versioning unlocks new levels of agility, control, and auditability for our cLess applications.
Traffic Weights:
We can now validate new versions with a portion of live traffic before ramping up, or roll back issues by shifting traffic away from faulty versions. This gives us the tools to implement the versioning workflows that best suit our use case. The end result is lower-risk deployments and greater application stability.
Idle Containers:
cLess now provides an automatic garbage collector for idle containers, unlocking new optimization capabilities. When container instances are inactive for a specified time period, cLess gracefully scales them to zero. This allows us to run lean, paying only for the compute we need at any given moment. As demand ramps up, cLess seamlessly spins up a new container to serve traffic. The end result is maximized efficiency and minimized waste.
The full code is in part-4 of the GitHub repo.