The Fail Fast Principle
The fail fast principle is a design approach in which an application reports an error as soon as it is detected, rather than trying to continue execution in a possibly invalid state. Detecting and propagating failures right away keeps localized faults from cascading across system components.
Applying fail fast principles in distributed architectures provides several advantages:
- Localizes failures - Stopping a faulty component quickly contains the issue before it cascades. Failures stay isolated to specific services.
- Reduces debugging costs - When processes terminate immediately at the source of errors, it's easier to pinpoint root causes based on crash logs and traces.
- Allows graceful degradation - When an unhealthy service shuts down quickly, load balancers can route traffic to healthy nodes. The overall system remains operational, possibly in a degraded mode.
- Improves reliability - By assuming processes can crash at any time, developers build more resilient systems that handle failures gracefully.
Practical Examples
Let's consider three scenarios where the fail fast pattern applies.
Failing Fast with Network Calls
Network communication between services is prone to timeouts and failures. Make requests fail fast by setting short timeouts and immediately returning errors:
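// Give up after 100ms instead of waiting on a slow dependency.
client := &http.Client{Timeout: 100 * time.Millisecond}
resp, err := client.Get("http://remote-service")
if err != nil {
	// Wrap with %w so callers can still inspect the underlying error.
	return fmt.Errorf("request failed: %w", err)
}
defer resp.Body.Close()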
This keeps the caller from waiting indefinitely on delayed responses, and returning the error immediately avoids retrying requests that are unlikely to succeed. Without aggressive downstream timeouts, your service holds these connections open, which can exhaust sockets and other resources and bring the service to a halt.
Validating Startup Health Checks
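Services should check dependent resources like databases during initialization and terminate early if they are unavailable:
// sql.Open only validates its arguments; it does not open a connection.
// The DSN is a placeholder, assuming the go-sql-driver/mysql driver.
db, err := sql.Open("mysql", "user:password@tcp(localhost:3306)/dbname")
if err != nil {
	log.Fatalf("invalid database configuration: %v", err)
}
// Ping forces a real connection, so an unreachable database stops startup here.
if err := db.Ping(); err != nil {
	log.Fatalf("database unavailable: %v", err)
}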
Failing fast on startup ensures components don't stay up in a degraded mode. It also reduces debugging costs and MTTR, provided proper monitoring and alerting are in place.
Securing APIs with Request Validation
APIs should validate headers, auth tokens, and payload before handling requests:
// authenticate rejects requests that carry no auth token.
func authenticate(r *http.Request) error {
	token := r.Header.Get("Auth-Token")
	if token == "" {
		return fmt.Errorf("no auth token provided")
	}
	// Validate token...
	return nil
}

// handleRequest refuses unauthenticated requests before doing any work.
func handleRequest(w http.ResponseWriter, r *http.Request) {
	if err := authenticate(r); err != nil {
		http.Error(w, "authentication failed", http.StatusUnauthorized)
		return
	}
	// Process request
}
Defensive programming with proper request validation is fundamental to secure cloud-native applications. The fail fast principle says to reject bad inputs early before any damage is done.
Best Practices
Applied carelessly, the fail fast pattern adds overhead and can even make a system less stable, for example when a service exits on transient errors that a retry would have absorbed. Apply it deliberately and pair it with the practices below.
Backoff Strategies
Backoff strategies matter whenever clients retry against a failed component or service that is being restarted. Spacing out retries prevents a thundering herd problem, where all clients retry simultaneously and overload the recovering service.
Two common backoff approaches are: