StatefulSets & DaemonSets
Managing stateful applications and node-level daemons in Kubernetes
Introduction to StatefulSets
StatefulSets are specialized workload resources in Kubernetes designed for applications that require stable networking, persistent storage, and ordered, graceful deployment and scaling. Unlike Deployments, which treat pods as interchangeable, StatefulSets maintain a sticky identity for each pod, ensuring predictability in distributed systems.
StatefulSets are ideal for applications such as:
- Distributed databases (MongoDB, Cassandra, MySQL)
- Message brokers with persistent storage (Kafka, RabbitMQ)
- Applications requiring stable network identifiers
- Systems that need ordered scaling, updates, or rollbacks
The StatefulSet controller creates pods with predictable naming patterns (e.g., app-0
, app-1
, app-2
) and ensures stable network identities through headless services, making them suitable for clustered applications where peers need to discover each other.
StatefulSet Core Concepts
Pod Identity
- Stable network identity: Each pod maintains the same hostname and DNS record across rescheduling
- Ordinal indices: Pods are named with sequential indices (0, 1, 2...) that never change
- DNS entries: Each pod gets a DNS record:
pod-name.service-name.namespace.svc.cluster.local
- Sticky identity: Pods maintain their identity even after being rescheduled
- Predictable naming: Facilitates peer discovery and leader election
- Service targeting: Allows targeting specific instances by ordinal index
Ordered Deployment
- Sequential creation: Pods are created one at a time in order (0, 1, 2...)
- Ordered readiness: Each pod must be running and ready before next one starts
- Predictable bootstrapping: Important for databases that need primary node first
- Failure handling: If any pod fails to become ready, subsequent creations are blocked
- Manual intervention: May require troubleshooting when deployment stalls
- Initialization control: Ensures proper cluster formation for distributed systems
Ordered Scaling
- Controlled scale-up: Pods are created in ascending order (0, 1, 2...)
- Controlled scale-down: Pods are terminated in descending order (N, N-1...)
- Safety guarantees: Prevents split-brain scenarios in clustered applications
- Data consistency: Helps maintain consensus in distributed systems
- Replica management: Total ordering of pod lifecycle operations
- Predictable behavior: Critical for stateful applications with peer awareness
Persistent Storage
- Stable storage: Each pod gets its own PersistentVolumeClaim
- Storage retention: PVCs are preserved when pods are rescheduled
- Volume templates: Automatically create PVCs for each pod
- Storage identity: PVCs maintain a 1:1 relationship with pods
- Deletion protection: Storage survives pod termination by default
- Data persistence: Critical for databases and other stateful workloads
StatefulSet YAML Definition
This StatefulSet will create three MongoDB pods (mongodb-0
, mongodb-1
, mongodb-2
) with dedicated persistent storage for each pod.
StatefulSet Lifecycle
The StatefulSet controller manages the full lifecycle of the pods it creates:
- Creation
- Pods are created in strict sequential order, starting with index 0
- Each new pod waits for the previous pod to be running and ready
- If a pod fails to become ready, creation of subsequent pods is blocked
- Updates
- With RollingUpdate strategy, pods are updated in reverse order, starting with the highest index
- Each pod is updated one at a time, waiting for readiness before proceeding
- Updates can be paused if a pod fails to become ready
- Scaling
- When scaling up, new pods are created in sequential order
- When scaling down, pods are terminated in reverse order
- Scaling operations respect the same ordering guarantees as creation
- Deletion
- When a StatefulSet is deleted, pods are terminated in reverse order
- PersistentVolumeClaims are not deleted automatically
- Orphaned PVCs need to be manually deleted if no longer needed
StatefulSet deletion does not guarantee termination of all pods. For graceful termination, scale the StatefulSet to 0 replicas before deletion:
This approach helps prevent data corruption in stateful applications by ensuring clean shutdown.
StatefulSet Usage Scenarios
Distributed Database Cluster
StatefulSets are ideal for database clusters where each node needs:
- A stable identity for peer discovery
- Persistent storage that survives restarts
- Predictable ordering for primary/secondary initialization
For example, a Cassandra cluster requires nodes to know about each other and join the ring in a controlled manner:
Replicated Key-Value Store
For services like etcd or ZooKeeper that require quorum-based consensus:
Introduction to DaemonSets
DaemonSets ensure that a copy of a pod runs on all (or a subset of) nodes in a Kubernetes cluster. As nodes are added to the cluster, pods are automatically added to them; as nodes are removed, those pods are garbage collected. Deleting a DaemonSet cleans up all the pods it created.
DaemonSets are ideal for:
- Node monitoring agents (Prometheus Node Exporter, Fluentd)
- Log collectors (Filebeat, Logstash)
- Network plugins (CNI providers like Calico, Weave)
- Storage daemons (Ceph, GlusterFS)
- Security agents that need to run on every node
The key characteristic of a DaemonSet is that it places exactly one pod instance per qualifying node, making it perfect for infrastructure-level services.
DaemonSet Core Concepts
Node Coverage
- One pod per node: Ensures a single instance runs on each node
- Automatic pod creation: New pods are created when nodes join the cluster
- Automatic cleanup: Pods are removed when nodes are drained or deleted
- Node selection: Can target specific nodes using nodeSelector or node affinity
- Taint tolerance: Can run on nodes with taints by specifying tolerations
- Guaranteed presence: Critical for node-level monitoring and services
Pod Scheduling
- Bypass scheduler: Traditionally used the NodeController to place pods directly
- Modern scheduling: Uses default scheduler with node affinity in newer versions
- Placement guarantee: Ensures pods run even on nodes that might be unschedulable
- Update strategy: Controls how pods are updated across the cluster
- Pod management policy: Determines if pods are created in parallel or sequentially
- Revision history: Tracks changes for rollback purposes
Use Cases
- Cluster networking: CNI plugin components on every node
- Monitoring: Node exporters for metrics collection
- Log aggregation: Collecting logs from every node
- Storage provisioners: Node-level storage drivers
- Security scanners: Vulnerability detection on each node
- Service meshes: Proxies or sidecars that need node-wide presence
DaemonSet YAML Definition
This DaemonSet deploys Fluentd for log collection on every node in the cluster, including control plane nodes (via tolerations).
DaemonSet Scheduling
DaemonSet pods are scheduled on nodes in two different ways, depending on the Kubernetes version:
- Traditional (Pre-1.12): DaemonSet controller directly created pods on nodes, bypassing the scheduler
- Modern (1.12+): DaemonSet controller creates pods with
NodeAffinity
and atolerationSeconds: 0
to ensure they're scheduled on appropriate nodes
The modern approach offers several advantages:
- Respects resource constraints and quality of service
- Allows pod preemption
- Provides consistent scheduling behavior
- Integrates with scheduling features like pod priority
The DaemonSet controller ensures that pods are placed on all qualifying nodes even if the node is marked as unschedulable (cordoned) to prevent normal pods from being scheduled.
Node Selection for DaemonSets
DaemonSets can target specific nodes using three mechanisms:
- nodeSelector: Simple key-value matching for node labels
- Node Affinity: More expressive matching with logical operators
- Taints and Tolerations: Allow pods to schedule on nodes with matching taints
These mechanisms can be combined to create sophisticated node targeting rules for DaemonSet pods.
DaemonSet Updates
DaemonSets support two update strategies:
- OnDelete: Pods are only updated when manually deleted
- No automatic updates
- Administrator controls timing
- Manual rollout across the cluster
- Good for critical infrastructure components
- RollingUpdate (default): Pods are automatically updated in controlled fashion
- Controlled by
maxUnavailable
(default: 1) - Updates occur in batches
- Respects pod disruption budget
- Supports automatic rollbacks
- Controlled by
Rolling updates allow for controlled deployment of new versions:
DaemonSet Usage Scenarios
Monitoring Agent
Deploy a Prometheus Node Exporter on every node to collect system metrics:
Storage Provisioner
Deploy a storage provisioner daemon on selected nodes with specific hardware:
StatefulSets vs DaemonSets vs Deployments
Understanding when to use each controller type is crucial for proper application architecture:
Feature | StatefulSet | DaemonSet | Deployment |
---|---|---|---|
Use case | Stateful applications | Node-level services | Stateless applications |
Pod identity | Stable, persistent | Tied to node | Interchangeable |
Scaling | Ordered, sequential | One per node | Any order, parallel |
Storage | Persistent per pod | Usually host mounts | Usually ephemeral |
Networking | Stable DNS names | Usually host network | Service abstraction |
Update strategy | Ordered, in-place | Per node, controlled | Controlled replacement |
Node placement | Any suitable node | One per matching node | Any suitable node |
Typical examples | Databases, message queues | Monitoring, logging agents | Web servers, API services |
Best Practices
StatefulSet Best Practices
- Use headless services
- Create a headless service (clusterIP: None) for StatefulSet DNS entries
- Essential for peer discovery in clustered applications
- Enables direct pod-to-pod communication
- Plan for pod failures
- Implement proper readiness probes to prevent cascading failures
- Use PodDisruptionBudgets to maintain quorum during maintenance
- Create proper initialization and shutdown procedures
- Manage storage carefully
- Use storage classes appropriate for your application's performance needs
- Implement backup solutions for persistent volumes
- Consider using volumeClaimTemplates with different storage classes for different roles
- Initialize stateful applications properly
- Provide initialization containers for complex setup
- Use pod hostname and ordinal index for dynamic configuration
- Consider custom bootstrapping scripts for cluster formation
- Plan updates strategically
- Use partition updates for controlled rollouts
- Test update strategies thoroughly before production
- Consider manual updates for critical database workloads
DaemonSet Best Practices
- Resource management
- Always set resource limits and requests
- Keep resource footprint small to avoid impacting node capacity
- Consider using priorityClass for critical system DaemonSets
- Minimize privileges
- Only request necessary permissions and capabilities
- Use SecurityContext to restrict container privileges
- Avoid running as root unless absolutely necessary
- Handle node variants
- Use node affinity and tolerations for different node types
- Consider separate DaemonSets for significantly different node architectures
- Handle OS-specific variations with appropriate container images
- Update cautiously
- Use slow rolling updates for critical components
- Test updates thoroughly before cluster-wide deployment
- Consider canary deployments on a subset of nodes first
- Monitor performance impact
- Ensure DaemonSets don't degrade node performance
- Monitor resource usage across the cluster
- Be aware of the cumulative impact of multiple DaemonSets
Advanced Features and Patterns
StatefulSet Update Partitioning
Update partitioning allows rolling out changes to a subset of pods:
This creates a controlled update where only some pods receive the new version, enabling advanced rollout strategies.
Parallel Pod Management
For StatefulSets that don't require ordered deployment:
This allows pods to be created or terminated in parallel, speeding up scaling operations when strict ordering isn't needed.
DaemonSet Rolling Back
To roll back a DaemonSet to a previous revision:
Controlling Pod Deletion Cost
For DaemonSets, you can influence which pods are deleted first during updates:
This can be useful when some nodes are more critical than others.
Troubleshooting
Common issues and their solutions:
- StatefulSet pod stuck in Pending
- Check PVC provisioning (
kubectl get pvc
) - Verify storage class exists and provisioner is running
- Check for resource constraints on nodes
- Check PVC provisioning (
- DaemonSet not scheduling on a node
- Verify node labels match nodeSelector
- Check for taints that need tolerations
- Examine node conditions and capacity
- StatefulSet update stuck
- Check readiness probes in updated pods
- Verify ordinal index causing the blockage
- May need manual intervention if pod can't become ready
- DaemonSet creates too many or too few pods
- Verify node selectors and affinities
- Check if nodes are cordoned or have unexpected taints
- Ensure nodes are in Ready state
Conclusion
StatefulSets and DaemonSets are specialized controllers that extend Kubernetes' capabilities beyond simple stateless applications. StatefulSets enable complex distributed systems with state, while DaemonSets ensure critical services run on every appropriate node.
Understanding when and how to use these controllers is crucial for building robust, scalable Kubernetes applications. By following the patterns and best practices outlined in this guide, you can effectively leverage these powerful resources to build sophisticated application architectures on Kubernetes.