Docker Storage Drivers

Understanding Docker storage drivers, their types, and best practices

Introduction to Docker Storage Drivers

Docker storage drivers (also known as graph drivers) are responsible for managing how images and containers are stored and accessed on your Docker host. They handle the details of how read-write layers are implemented and how data is shared between images and containers.

Storage drivers form a critical part of Docker's architecture, enabling the efficient storage and management of container data. They implement Docker's Union File System (UnionFS) concept, which allows multiple file systems to be mounted simultaneously with a unified view.

Why Storage Drivers Matter

Storage drivers directly impact:

  • Container performance and efficiency
  • Image build and deployment times
  • Disk space utilization
  • System stability and reliability
  • Application I/O performance
  • Docker host resource consumption

Understanding how different storage drivers work and their trade-offs is essential for optimizing Docker deployments, especially in production environments where performance and reliability are critical.

Storage Driver Types

Docker supports several storage drivers, each with its own advantages and trade-offs:

Overlay2

  • Most widely used and recommended driver
  • Best performance and stability for most use cases
  • Linux-only; Windows hosts use the separate windowsfilter driver
  • Uses copy-on-write and page cache efficiently
  • Lower memory usage compared to other drivers
  • Requires kernel version 4.0 or higher for best performance
  • Supported on most modern Linux distributions out of the box
  • Provides good balance between performance and functionality
  • Uses native overlay filesystem support in the kernel
  • Efficiently manages layer sharing between containers

AUFS (Advanced Union File System)

  • One of the oldest storage drivers
  • Stable but deprecated, and removed in recent Docker Engine releases
  • Primarily available on Ubuntu and Debian kernels
  • Higher memory usage than overlay2
  • Complex implementation with many layers
  • Used to be the default driver for Docker
  • Provides stable performance for legacy systems
  • Better read performance than write performance
  • Not included in mainline Linux kernel
  • Required for older Docker deployments

Devicemapper

  • Block-level storage rather than file-level
  • Better for high-write workloads
  • Historically common on Red Hat Enterprise Linux, now deprecated in favor of overlay2
  • Direct-lvm mode recommended for production
  • Higher CPU usage compared to overlay2
  • Allows for storage quotas and snapshot capabilities
  • Supports thin provisioning for better space utilization
  • Provides good isolation between containers
  • Performance can degrade with many layers
  • Requires proper configuration for production use

BTRFS

  • Advanced filesystem features
  • Built-in volume management
  • Supports snapshots and quotas
  • Higher disk space usage
  • Limited platform support
  • Provides native copy-on-write capabilities
  • Excellent for systems already using BTRFS
  • Efficient for large files and databases
  • Integrated compression features
  • Requires /var/lib/docker to be on a Btrfs filesystem

ZFS

  • Advanced filesystem features
  • Data integrity protection
  • Built-in volume management
  • Higher memory requirements
  • Limited platform support
  • Native copy-on-write implementation
  • Excellent data protection and integrity
  • Advanced compression capabilities
  • Supports encryption and deduplication
  • Memory-intensive but very reliable

VFS

  • Simple but inefficient
  • No copy-on-write support
  • Used mainly for testing
  • Works everywhere
  • Not recommended for production
  • Completely stable and predictable behavior
  • Each layer is a complete copy (no sharing)
  • Consumes the most disk space
  • Useful for debugging and specialized environments
  • Provides baseline for comparing other drivers

Storage Driver Architecture

The following subsections break down how storage drivers organize data on disk and assemble layers into a running container's filesystem.

Docker Storage Architecture in Detail

Directory Structure

Docker organizes its storage in specific locations:

# Default Docker root directory
/var/lib/docker/

# Storage driver-specific layer data
/var/lib/docker/<storage-driver>/

# Image metadata
/var/lib/docker/image/<storage-driver>/imagedb/

# Layer metadata
/var/lib/docker/image/<storage-driver>/layerdb/

# Volumes (stored outside the union filesystem)
/var/lib/docker/volumes/

The exact structure varies depending on the storage driver in use, but this general organization applies to all drivers.

Layer Organization

Images and containers are organized as a series of layers:

  1. Base Layer: Usually a minimal operating system from the base image
  2. Intermediate Layers: Each instruction in a Dockerfile creates a layer
  3. Container Layer: A writable layer created when a container starts

When you build or pull an image, Docker downloads and extracts each layer individually, and the storage driver assembles them into a cohesive filesystem.
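
You can see this layering directly from the CLI. A quick check, using the public alpine image as an example:

# Pull an image and list the content-addressed layers in its filesystem
docker pull alpine:3.19
docker image inspect --format '{{json .RootFS.Layers}}' alpine:3.19

# Show how each Dockerfile instruction contributed a layer
docker history alpine:3.19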

Copy-on-Write (CoW)

The fundamental principle behind Docker storage drivers:

  1. Initial State
    • Files shared from image layers
    • No duplicate storage
    • Fast container startup
    • Efficient memory usage
    • Minimal disk space consumption
  2. Read and Write Path
    1. File read request
    2. Search through layers (top-down)
    3. Return first found instance
    4. For writes: copy file to container layer
    5. Modify copy in container layer
    6. Subsequent reads return the modified file
    
  3. Performance Implications
    • First write is expensive (copy operation)
    • Subsequent reads are fast (from container layer)
    • Layer depth affects performance (search time)
    • Size of files impacts efficiency (copy time)
    • Fragmentation can occur over time
    • Write amplification with large files
    • Random writes can be slower than sequential
    • Metadata operations have varying costs
    • Driver-specific optimizations apply
    • Underlying storage media matters significantly

Copy-Up Operation

When a container needs to modify a file that exists in a lower layer:

  1. The storage driver performs a "copy-up" operation to copy the file to the container's writable layer
  2. The container then modifies its own copy of the file
  3. All subsequent read operations on that file are served from the container layer
  4. Other containers using the same image continue to use the original, unmodified file

This process is fundamental to Docker's efficiency but can impact performance for write-heavy workloads or when modifying large files.
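
The effect of copy-up is easy to observe with docker diff, which lists paths added (A), changed (C), or deleted (D) in a container's writable layer. A small demonstration using the public nginx:alpine image (the container name is arbitrary):

# Start a container and modify a file that originates in an image layer
docker run -d --name cow-demo nginx:alpine
docker exec cow-demo sh -c 'echo "# modified" >> /etc/nginx/nginx.conf'

# The modified file now lives in the writable layer (C = changed)
docker diff cow-demo

# Clean up
docker rm -f cow-demo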

Storage Driver Comparison

Selecting the right storage driver involves considering various performance characteristics:

Driver        Read Performance  Write Performance  Space Efficiency  Memory Usage  Use Case
overlay2      Excellent         Good               Good              Low           General purpose
aufs          Good              Moderate           Good              Moderate      Legacy systems
devicemapper  Moderate          Good               Excellent         Moderate      Write-intensive
btrfs         Good              Good               Moderate          Moderate      Data integrity
zfs           Good              Good               Excellent         High          Data protection
vfs           Moderate          Poor               Poor              Low           Testing

Performance Characteristics

Each storage driver handles different operations with varying efficiency:

  1. Reading Files
    • First read can be slower due to layer lookup
    • Subsequent reads benefit from page cache
    • Layer depth impacts read performance
    • Small files generally perform better than large files
  2. Writing New Files
    • Generally fast across all drivers
    • Written directly to the container layer
    • Performance similar to native filesystem
  3. Modifying Existing Files
    • Performance varies significantly by driver
    • Copy-up operation can be expensive (a rough benchmark follows this list)
    • Large files suffer more performance penalty
    • Block-based drivers can be more efficient for large files
  4. Deleting Files
    • Implementation varies by driver
    • Can create "whiteout" files in some drivers
    • May not reclaim space immediately
    • Impact on read performance varies
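
A rough way to feel the difference between these paths is to compare writes to the container layer against writes to a volume. This is only a sketch (dd is a crude benchmark, this assumes the image's dd supports conv=fsync and reports throughput, and results vary by driver, filesystem, and disk):

# Write 256 MB through the storage driver (container writable layer)
docker run --rm alpine sh -c 'dd if=/dev/zero of=/testfile bs=1M count=256 conv=fsync'

# Write the same amount to a volume, which bypasses the storage driver
docker run --rm -v benchvol:/data alpine \
  sh -c 'dd if=/dev/zero of=/data/testfile bs=1M count=256 conv=fsync'

# Remove the test volume
docker volume rm benchvol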

Storage Driver Configuration

Configuring storage drivers in Docker:

{
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true",
    "overlay2.size=20G"
  ]
}

This configuration goes in /etc/docker/daemon.json, or the driver can be specified with the --storage-driver flag when starting the Docker daemon. Note that overlay2.size only works when the backing filesystem is xfs mounted with the pquota option.
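
After restarting the daemon, confirm which driver is actually in use:

# Show just the active storage driver
docker info --format '{{.Driver}}'

# Show the full storage section, including backing filesystem details
docker info | grep -A8 'Storage Driver'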

Driver-Specific Options

Each storage driver supports specific configuration options:

Overlay2 Options:

{
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true",
    "overlay2.size=20G"
  ]
}

Devicemapper Options:

{
  "storage-driver": "devicemapper",
  "storage-opts": [
    "dm.thinpooldev=/dev/mapper/thin-pool",
    "dm.use_deferred_removal=true",
    "dm.use_deferred_deletion=true",
    "dm.basesize=20G"
  ]
}

ZFS Options:

{
  "storage-driver": "zfs",
  "storage-opts": [
    "zfs.fsname=zroot/docker"
  ]
}

Best Practices

Performance Optimization

  1. Layer Management
    # Good - Fewer layers
    RUN apt-get update && \
        apt-get install -y package1 package2 && \
        rm -rf /var/lib/apt/lists/*
    
    # Bad - Multiple layers
    RUN apt-get update
    RUN apt-get install -y package1
    RUN apt-get install -y package2
    

    Reducing the number of layers in your images has several benefits:
    • Faster image builds
    • More efficient storage usage
    • Better layer cache utilization
    • Improved container startup time
    • Reduced complexity in the union mount
  2. File Operations
    • Minimize large file operations
    • Use appropriate file sizes
    • Consider layer impact
    • Optimize write patterns
    • Avoid frequent modifications to large files
    • Use volumes for write-intensive workloads
    • Batch small file operations when possible
    • Be aware of file fragmentation
    • Consider file compression strategies
    • Implement appropriate buffer sizes
  3. Cache Usage (a build sketch follows this list)
    • Leverage build cache
    • Use multi-stage builds
    • Implement proper layer ordering
    • Clean up unnecessary files
    • Be strategic about file copying
    • Optimize package manager caches
    • Remove temporary files in the same layer
    • Use .dockerignore effectively
    • Consider buildkit cache mounts
    • Implement CI/CD cache strategies
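
A minimal sketch that combines several of these techniques for a hypothetical Go service (adjust for your own stack; the cache mount requires BuildKit):

# syntax=docker/dockerfile:1
FROM golang:1.22 AS build
WORKDIR /src
# Copy dependency manifests first so this layer stays cached until they change
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# BuildKit cache mount keeps the compiler cache out of the image layers
RUN --mount=type=cache,target=/root/.cache/go-build \
    CGO_ENABLED=0 go build -o /out/app .

# Final stage ships only the binary, not the toolchain layers
FROM alpine:3.19
COPY --from=build /out/app /usr/local/bin/app
ENTRYPOINT ["app"]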

Detailed Driver Configurations

Overlay2 Optimization

Overlay2 is the recommended driver for most use cases. To optimize it:

  1. Filesystem Selection
    • Use XFS for best performance (ext4 also works well)
    • Ensure d_type support is enabled (a quick check follows this list)
    • Consider noatime mount option
    • Use appropriate filesystem block size
  2. Configuration Tuning
    {
      "storage-driver": "overlay2",
      "storage-opts": [
        "overlay2.size=20G",
        "overlay2.override_kernel_check=true"
      ]
    }
    
  3. Kernel Parameters
    • Ensure kernel version is 4.0 or higher (5.0+ recommended)
    • Check for overlay-related kernel modules
    • Consider increasing inotify limits
    • Tune page cache parameters
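
If /var/lib/docker sits on xfs, you can check d_type support directly (the filesystem must be formatted with ftype=1); Docker also reports the result itself:

# xfs must be formatted with ftype=1 for overlay2 to work correctly
xfs_info /var/lib/docker | grep ftype

# Docker reports d_type support in its info output
docker info | grep -i 'd_type'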

Devicemapper Production Setup

For production use, devicemapper should be configured in direct-lvm mode:

  1. Create a thin pool
    # Create physical volume
    pvcreate /dev/xvdf
    
    # Create volume group
    vgcreate docker /dev/xvdf
    
    # Create logical volumes
    lvcreate --wipesignatures y -n thinpool docker -l 95%VG
    lvcreate --wipesignatures y -n thinpoolmeta docker -l 1%VG
    
    # Convert to thin pool
    lvconvert -y --zero n -c 512K --thinpool docker/thinpool \
      --poolmetadata docker/thinpoolmeta
    
  2. Configure thin pool autoextension
    # /etc/lvm/profile/docker-thinpool.profile
    activation {
      thin_pool_autoextend_threshold=80
      thin_pool_autoextend_percent=20
    }
    
  3. Apply the profile
    lvchange --metadataprofile docker-thinpool docker/thinpool
    
  4. Configure Docker daemon
    {
      "storage-driver": "devicemapper",
      "storage-opts": [
        "dm.thinpooldev=/dev/mapper/docker-thinpool",
        "dm.use_deferred_removal=true",
        "dm.use_deferred_deletion=true",
        "dm.basesize=20G",
        "dm.fs=xfs"
      ]
    }
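
After restarting Docker, it is worth confirming the thin pool is actually in use; a quick check using the pool name configured above:

# Restart the daemon, then verify the pool appears in docker info
sudo systemctl restart docker
docker info | grep -i 'pool name'

# Watch data and metadata usage; keep both well below 100%
sudo lvs docker/thinpool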
    

Troubleshooting

Common storage driver issues and solutions:

  1. Performance Problems
    • Check layer depth
    • Monitor I/O patterns
    • Evaluate driver settings
    • Review resource usage
    • Analyze disk I/O metrics
    • Check for fragmentation
    • Review container activity
    • Analyze application I/O patterns
    • Check filesystem mount options
    • Verify hardware performance
  2. Space Issues
    # Check space usage
    docker system df
    
    # Detailed space usage
    docker system df -v
    
    # Clean up unused resources
    docker system prune
    
    # Aggressive cleanup
    docker system prune -a --volumes
    
    # Inspect specific container
    docker inspect container_id
    

    Common space-related issues:
    • Leaked volume mounts
    • Orphaned containers
    • Unused images and layers
    • Excessive container logs (a log-rotation snippet follows this list)
    • Unmanaged build cache
    • Improper layer management
    • Incorrectly sized thin pools
    • Database growth in containers
    • Log file accumulation
    • Temporary file buildup
  3. Driver-Specific Issues
    Overlay2 Issues:
    • Inode exhaustion
    • d_type not supported
    • Kernel version incompatibility
    • SELinux conflicts
    • Mount option incompatibilities

    Devicemapper Issues:
    • Thin pool exhaustion
    • Metadata space depletion
    • Device busy errors
    • Udev sync issues
    • Device removal problems

    BTRFS/ZFS Issues:
    • Fragmentation
    • Memory pressure
    • Snapshot management
    • Dataset limits
    • Pool exhaustion
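
Several of the space issues above, container logs in particular, can be capped in daemon.json. With the default json-file log driver, for example:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}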

Storage Driver Internals

Understanding the internal mechanisms of storage drivers can help with troubleshooting and optimization:

Overlay2 Internals

Overlay2 uses the overlay filesystem to implement layers:

upperdir/     (container layer)
lowerdir[0]/  (topmost image layer)
lowerdir[1]/  (next image layer)
...
lowerdir[n]/  (base image layer)
merged/       (unified view)
work/         (overlay work directory)

When a file is accessed:

  1. The overlay filesystem checks the upperdir first
  2. If not found, it checks each lowerdir in order
  3. When a file is modified, it's copied to the upperdir

Overlay2 is efficient because:

  • It uses the page cache effectively
  • Lookup is optimized with multiple lowerdirs
  • It has relatively low metadata overhead
  • Modern kernel implementations are highly optimized
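
These directories are visible on the host. One way to locate them for a running container (the container name here is a placeholder):

# Print the lowerdir/upperdir/workdir/merged paths for a container
docker inspect --format '{{json .GraphDriver.Data}}' my-container

# The unified view also appears as a kernel overlay mount
mount | grep overlay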

Devicemapper Internals

Devicemapper works at the block level rather than the file level:

  1. Thin Provisioning: Allocates blocks only when written
  2. Snapshots: Creates block-level deltas between layers
  3. Copy-on-Write: Copies blocks when they're modified

The devicemapper driver creates:

  • A base device for the base image
  • Snapshot devices for each layer
  • A snapshot device for the container

Each layer contains:

  • Block-level differences from its parent
  • Metadata about the device
  • Reference counting for shared blocks
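
The pool's block-level state can be inspected directly with device-mapper tooling; a sketch assuming the docker-thinpool name used earlier:

# Show data and metadata block usage for the thin pool
sudo dmsetup status docker-thinpool

# List the device-mapper devices Docker has created
sudo dmsetup ls | grep docker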

Monitoring and Maintenance

Essential monitoring practices:

  1. Resource Monitoring
    # Storage usage
    docker system df -v
    
    # Container disk usage
    docker ps -s
    
    # Detailed inspection
    docker inspect container_id
    
    # Layer information
    docker history image_name
    
    # Real-time disk I/O
    iostat -xz 5
    
    # Filesystem-specific stats
    df -i  # inode usage
    
  2. Performance Metrics
    Key metrics to monitor:
    • I/O operations per second (IOPS)
    • I/O throughput (MB/s)
    • I/O latency
    • Read vs write ratio
    • Sequential vs random access
    • Layer access times
    • Cache hit rates
    • Storage latency
    • Metadata operation costs
    • Container startup time
  3. Maintenance Tasks
    Regular maintenance procedures:
    # Prune unused containers
    docker container prune
    
    # Prune unused images
    docker image prune
    
    # Prune unused volumes
    docker volume prune
    
    # Prune everything unused
    docker system prune -a
    
    # Check thin pool usage (devicemapper)
    sudo lvs -a | grep docker
    
    # Deactivate and re-activate the thin pool to clear stuck devices
    sudo lvchange -y -an docker/thinpool
    sudo lvchange -y -ay docker/thinpool
    

Storage Driver Migration

Sometimes you may need to change storage drivers. This process requires careful planning:

  1. Pre-Migration Preparation
    • Back up all important data
    • Document existing configurations
    • Save image list: docker image ls -a > images.txt
    • Save container list: docker ps -a > containers.txt
    • Ensure sufficient disk space
    • Plan for downtime
  2. Migration Process
    # Stop Docker
    sudo systemctl stop docker
    
    # Edit daemon.json to change driver
    sudo nano /etc/docker/daemon.json
    
    # Move or backup old data
    sudo mv /var/lib/docker /var/lib/docker.bak
    
    # Start Docker with new driver
    sudo systemctl start docker
    
    # Verify new driver
    docker info | grep "Storage Driver"
    
  3. Post-Migration Tasks
    • Re-pull or reload images (a sketch follows this list)
    • Recreate containers
    • Verify application functionality
    • Monitor performance
    • Clean up old storage if migration is successful
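
If you saved an image list before migrating, a sketch like the following can re-pull the tagged images. It assumes images.txt holds default docker image ls output (REPOSITORY and TAG in the first two columns); untagged images are skipped:

# Re-pull every tagged image recorded before the migration
awk 'NR>1 && $2 != "<none>" {print $1 ":" $2}' images.txt | xargs -r -n1 docker pull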

Future Developments

Emerging trends in Docker storage:

  1. Enhanced Drivers
    • Better performance
    • Improved security
    • Enhanced features
    • Greater compatibility
    • More efficient algorithms
    • Lower overhead implementations
    • Better caching mechanisms
    • More efficient copy-on-write
    • Improved isolation guarantees
    • Enhanced monitoring capabilities
  2. Integration Improvements
    • Cloud storage integration
    • Kubernetes compatibility
    • Enhanced monitoring
    • Automated optimization
    • CSI (Container Storage Interface) adoption
    • Cross-platform storage solutions
    • Hybrid storage strategies
    • Edge computing support
    • Standardized benchmarking
    • Seamless migration tools
  3. Security Enhancements
    • Enhanced isolation
    • Better access controls
    • Improved encryption
    • Advanced auditing
    • CVE vulnerability reduction
    • Rootless container storage
    • Content trust integration
    • Compliance-focused features
    • Supply chain security
    • Runtime attestation

Specialized Use Cases

High-Performance Computing

For I/O intensive workloads:

  • Consider direct-lvm devicemapper with SSD
  • Use volumes with XFS for write-heavy workloads
  • Tune filesystem parameters for large file operations
  • Consider binding to specific high-performance devices
  • Evaluate custom storage solutions with pass-through capabilities

Edge and IoT Devices

For resource-constrained environments:

  • Use overlay2 with size limits
  • Implement aggressive pruning policies
  • Consider read-only container filesystems (example below)
  • Use tmpfs for temporary data
  • Implement wear-leveling strategies for flash storage
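
A read-only root filesystem combined with tmpfs for scratch data is a common pattern here; a sketch (the image name is hypothetical):

# Root filesystem is immutable; /tmp is a size-capped in-memory mount
docker run -d --read-only --tmpfs /tmp:rw,size=64m --name edge-app myapp:latest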

Large-Scale Deployments

For deployments with thousands of containers:

  • Implement centralized monitoring
  • Use consistent storage drivers across hosts
  • Consider storage networking impacts
  • Implement automated maintenance
  • Plan for non-disruptive upgrades

Conclusion

Docker storage drivers provide the foundation for container storage management. Understanding their characteristics, strengths, and limitations is essential for optimizing Docker deployments.

Key takeaways:

  • Overlay2 is the recommended driver for most use cases
  • Choose your storage driver based on workload characteristics
  • Use volumes for persistent, performance-critical data
  • Implement regular maintenance and monitoring
  • Keep up with driver developments and best practices

By applying these principles and best practices, you can ensure your Docker storage infrastructure is performant, reliable, and maintainable.