System Documentation · Build Journal

Self-Hosted
Homelab

A privacy-first, production-grade homelab running across five servers - built and iterated through February to May 2026. Every architectural decision, automation script and security hardening choice documented here as a living build journal.

4 + 1 Servers

18+ Services

14 TB Storage

0 Cloud Dependencies

Ubuntu 24.04 · DietPi · Docker · LVM · WireGuard · Cloudflare Tunnel · Zero Trust · NUT UPS · DoH · Feb – May 2026

Fully live · In Production

Architecture Overview

A five-server environment built around a single-ingress reverse proxy architecture - all external traffic enters through one node, with each server scoped to a distinct role. Cloudflare sits in front for DNS, TLS and bot protection. WireGuard provides encrypted remote access. NUT monitors the UPS and triggers a coordinated graceful shutdown on power loss.

H-01 host_1_home

Mini PC · J4125 · 8 GB DDR4

Network ingress, DNS filtering and VPN gateway. All external traffic enters here via Nginx Proxy Manager. Hosts the central observability stack.

NPM Pi-hole WireGuard Cloudflare Tunnel Grafana Prometheus Loki

H-02 host_2_cloud

ThinkStation M700 SFF · i5-6400 · 32 GB DDR4

Primary services host. Runs all user-facing applications on Docker with an 8 TB data volume via LVM.

Nextcloud Immich Jellyfin DocBot Feedback API

H-03 host_3_mail

ThinkCentre M900 Tiny · i7-6700T · 32 GB DDR4

Dedicated mail and document server. Full email sovereignty with Mailcow, plus Paperless-ngx for document management and a few other miscellaneous services.

Mailcow Paperless-ngx Brevo Relay

H-04 host_4_backup

ThinkCentre M910q Tiny · i5-6400T · 16 GB DDR4

Dedicated backup node for multiple services. Also hosts network monitoring services. Planned for SSO authentication and password management services later.

NetAlertX

H-05 host_5_power

Raspberry Pi 4 · 2 GB · DietPi

UPS monitoring primary (NUT), secondary DNS filtering (Pi-hole) and remote Alloy metrics agent. The UPS is connected here - on power loss it coordinates a graceful shutdown across all servers.

NUT Primary Pi-hole DoH Alloy

Key Learnings

01 / Architecture

Single-ingress scales cleanly

Routing all external traffic through one reverse proxy node makes TLS management, port exposure and hardening dramatically simpler than distributing ingress across servers.

02 / Storage

LVM planning pays dividends

Designing volume groups with unallocated buffer upfront means storage can expand without rebuilding services or migrating data mid-operation.

03 / Email

Email sovereignty is achievable

Residential deliverability is the core challenge. A reputable SMTP relay solves it without building IP reputation - the rest of the stack is surprisingly manageable.

04 / VPN

Native beats Docker for critical path

WireGuard running native rather than Docker proved more reliable for boot persistence - Docker adds a dependency layer exactly where you need reliability most.

05 / Operations

Automate operational noise early

Biweekly maintenance automation and email alerting eliminate the mental overhead of remembering to patch and verify - the stack becomes genuinely low-maintenance.

06 / AI

Local AI is more capable than expected

A CPU-only RAG assistant running on consumer hardware is surprisingly usable for document querying - the privacy trade-off versus cloud APIs is well worth it.

07 / Power

UPS integration is non-negotiable

NUT with voltage-based thresholds (battery.charge not available on all hardware) gives a reliable 10-minute window for coordinated graceful shutdown across the full fleet.

08 / Privacy

DoH closes the last DNS gap

Pi-hole filters ads but DNS queries remain visible to the ISP without DoH. dnscrypt-proxy as the Pi-hole upstream closes this gap with minimal complexity.

Network Ingress & Reverse Proxy

Single-Ingress via Nginx Proxy Manager

host_1_home · NPM · Cloudflare Tunnel · Let's Encrypt

All external traffic enters through host_1_home via Nginx Proxy Manager, which routes requests to backend services across the LAN. Only one node needs to be hardened for external exposure. Cloudflare sits in front with Bot Fight Mode, email obfuscation and proxied DNS. TLS certificates are issued via Let's Encrypt using the Cloudflare DNS challenge - no port 80 exposure required.

A Cloudflare Tunnel replaces DDNS-managed A records for all web-facing subdomains. The server initiates an outbound-only connection to Cloudflare over port 443 - home IP is never exposed in public DNS. Force SSL is disabled on all tunneled NPM proxy hosts; Cloudflare terminates SSL at the edge and the tunnel carries HTTP to NPM internally.

Four browser-based SSH endpoints are exposed via Cloudflare Zero Trust - email OTP authentication, country-restricted to two regions, 6-hour session. No SSH client installation required for emergency access from any device.

Key Decisions

Cloudflare Tunnel over DDNS A records - home IP never appears in public DNS, traffic routes through Cloudflare's trusted IP ranges.

DNS challenge for SSL avoids exposing port 80 on a residential connection for ACME validation.

Force SSL disabled on tunneled NPM hosts - Cloudflare terminates TLS at edge; enabling it causes redirect loops.

Zero Trust browser SSH over Apache Guacamole - simpler, no VM required, already integrated with the tunnel infrastructure.

Nginx Proxy Manager Cloudflare Tunnel Cloudflare Zero Trust Let's Encrypt DNS Challenge

DNS Filtering & DNS-over-HTTPS

Dual Pi-hole with DoH Upstream

host_1_home · host_5_power · Pi-hole · dnscrypt-proxy · Hagezi

Two Pi-hole instances run across the fleet - primary on host_1_home (Docker), secondary on host_5_power (DietPi native). The router's DHCP server assigns both as DNS servers for all LAN clients, with the secondary as pure failover - not active-active.

Both instances use the Hagezi Multi Pro blocklist alongside StevenBlack. DNS queries are forwarded upstream through dnscrypt-proxy 2.1.15 running on each host, wrapping all queries in HTTPS to Cloudflare's DoH endpoint - ISP cannot see resolved domains.

Pi-hole is deployed as opt-in per device rather than a network-wide blanket, protecting household appliances and guests from DNS filtering side-effects. Corporate devices using Zscaler tunnel their DNS through the VPN automatically - no conflict.

Key Decisions

dnscrypt-proxy binary from GitHub (v2.1.15) over apt package - the apt version is too old for custom listen address configuration.

Independent secondary Pi-hole (no Gravity Sync) - simpler to manage; blocklist updates are independent and failures are isolated.

Opt-in per device rather than network-wide - avoids breaking smart home devices and household appliances that misbehave with aggressive DNS filtering.

Pi-hole dnscrypt-proxy Hagezi Pro DoH Cloudflare

VPN

WireGuard VPN Gateway

host_1_home · Native · wg0

WireGuard runs natively on host_1_home as a system service, providing encrypted remote access to the full homelab subnet. Peers include the Windows laptop, Chromebook and Android phone - each provisioned with individual keys. SSH access to all servers is gated to LAN and WireGuard subnet only.

Key Decisions

Native WireGuard over Docker - better boot persistence and fewer moving parts in a critical path component that must survive Docker daemon restarts.

SSH restricted to LAN and VPN subnet only - no public SSH exposure on any server.

WireGuard wg-quick UFW

Container Management

Portainer CE · Gitea · Multi-host Stack Management

host_2_cloud (server) · host_1_home, host_3_mail, host_4_backup (agents)

Portainer CE 2.39.0 LTS is deployed as the unified container management layer across the homelab, with Gitea 1.26.2 running co-located on host_2_cloud as the local Git backend for all stack definitions. Every Docker service across all four hosts is defined as a Git-backed compose stack, version-controlled in Gitea and deployed through Portainer.

Portainer Server runs on host_2_cloud - selected for its 32 GB RAM headroom. Standard Portainer Agents are deployed on host_1_home, host_3_mail and host_4_backup, each reachable by the server over the LAN on a custom port. host_5_power is excluded entirely. It carries no Docker runtime and Alloy runs there as a native binary. All compose files live under a specific location on each host, with one dedicated Gitea repo per host.

Stack secrets are managed as Portainer environment variables - no .env files remain on disk. Portainer and Gitea both use bind mounts, covered by the existing rsync backup job to host_4_backup. Portainer is reachable through the NPM reverse proxy on its own subdomain - no public exposure.

Mailcow on host_3_mail is deliberately Portainer-unmanaged. Its update script (mailcow-update.sh) is tightly coupled to code location and runs its own docker compose internally - Portainer managing this stack would conflict with the native updater and break the update path. Mailcow retains its own systemd startup handling; all other stacks restart cleanly via Docker restart policy after reboot.

Key Decisions

Portainer Server on host_2_cloud - 32 GB RAM provides sufficient headroom.

Standard agent over edge agent - LAN-only homelab; edge agent complexity adds nothing in this topology.

One Gitea repo per host - Portainer Git stack integration points to repo root; per-host isolation prevents cross-host deployment mistakes.

Portainer env vars over .env files - centralised secret management; secrets survive stack redeployment without file presence on the host.

Startup ownership: systemd owns Portainer itself; all other stacks rely on restart: unless-stopped - clean separation removes the race condition between docker-compose-startup.service and Portainer on reboot.

Mailcow left unmanaged - update toolchain conflict risk too high; mailcow-update.sh owns its own compose lifecycle.

Portainer CE Gitea Docker Compose Git-backed Stacks Portainer Agent NPM Reverse Proxy Ubuntu 24.04

Management Coverage

Portainer CE - live environment view across all four managed hosts

Public Services

All services are self-hosted, accessible over HTTPS via Cloudflare Tunnel - home IP never exposed in DNS. These might not open in corporate networks as they are hosted on a different domain.

Photos

Immich - AI-powered self-hosted photo & video library

host_2_cloud

Open

Cloud

Nextcloud - personal cloud storage & file sync

host_2_cloud

Open

Mail

Mailcow - full email sovereignty with DKIM/SPF/DMARC

host_3_mail

Open

Media

Jellyfin - private media server for films, TV & music

host_2_cloud

Open

Documents

Paperless-ngx - self-hosted document management with OCR

host_3_mail

Open

Authentik

Self-hosted SSO and identity provider - centralised authentication for homelab services.

host_2_cloud

Open

Memos

Self-hosted note-taking and personal knowledge base - lightweight, private, local.

host_2_cloud

Open

Vault

Vaultwarden - self-hosted Bitwarden-compatible password manager

host_2_cloud

Open

Internal Services

LAN and WireGuard VPN access only - not publicly exposed.

Portainer

Container management across all Docker hosts - Git-backed stacks via Gitea.

LAN & VPN only

Gitea

Self-hosted Git backend - source of truth for all homelab compose files.

LAN & VPN only

DocBot

Local RAG AI assistant - query personal documents without cloud APIs.

LAN & VPN only

Pi-hole

Network-wide DNS filtering - ad and tracker blocking with query logging.

LAN & VPN only

Fleet Dashboard

Live server metrics across all nodes - CPU, RAM, containers and uptime.

LAN & VPN only

Monitoring

Grafana dashboards across all nodes - metrics, logs and alert rules.

LAN & VPN only

NetAlertX

Connected device monitoring - detects new and unknown devices joining the network.

LAN & VPN only

FileUpload

Browser-based file upload portal for devices connected to the home network - no app required.

LAN & VPN only

Cloud & Media Stack

Self-Hosted Cloud, Photos & Media

host_2_cloud · Nextcloud · Immich · Jellyfin · DocBot · LVM

Nextcloud is the primary cloud storage platform - a full self-hosted alternative to Google Drive, running on Docker with an 8 TB external data volume managed via LVM. The LVM layout was designed from the start to allow incremental expansion without service interruption.

Immich runs alongside Nextcloud for photo management, deliberately separated to leverage its machine-learning photo features and superior mobile experience. External libraries mount Nextcloud photo folders as read-only - single source of truth, with Immich's AI features on top.

Jellyfin runs in host network mode for smooth local media streaming, accessible on the home network or remotely via WireGuard. DocBot is a private RAG pipeline - Ollama, Mistral 7B, FastAPI, ChromaDB - for querying personal documents without sending data to external APIs.

Key Decisions

LVM designed for incremental expansion - storage grows without rebuilding containers or reconfiguring services.

Immich separated from Nextcloud for performance and AI features; external libraries keep Nextcloud as the single source of truth.

DocBot intentionally runs on CPU inference - fully local with no GPU dependency or cloud API calls.

Nextcloud Immich Jellyfin Ollama ChromaDB FastAPI Docker Compose LVM

Mail, Documents & Automation

Self-Hosted Mail & Document Stack

host_3_mail · Mailcow · Paperless-ngx

Mailcow manages the complete email stack - DKIM, SPF and DMARC all verified and active. Outbound mail routes through a Brevo SMTP relay to solve residential IP deliverability without months of reputation building. A dedicated 500 GB LVM volume is allocated for mail data with growth headroom reserved.

Paperless-ngx runs on the same server for document management with OCR - all household documents, receipts and correspondence are ingested, tagged and made searchable. Public access via Cloudflare Tunnel at a dedicated subdomain.

Key Decisions

Brevo SMTP relay for deliverability - avoids the months-long process of building residential IP reputation from scratch.

Mailcow's built-in netfilter handles mail port protection - adding Fail2ban on top would conflict with its own port management.

Paperless publicly accessible via tunnel - document access from anywhere without VPN dependency.

Mailcow Brevo Relay DKIM · SPF · DMARC Paperless-ngx LVM Docker

Visitor Feedback System

Self-Hosted Feedback Widget

host_2_cloud · FastAPI · SQLite · Docker · Cloudflare Tunnel

A fully self-hosted feedback widget deployed across both sites. Visitors can leave a thumbs up/down rating and a comment - all data stays on-premises with zero third-party exposure. Formspree was evaluated and rejected to avoid handing visitor data to an external service.

The API runs as a Docker container on host_2_cloud, exposed via Cloudflare Tunnel through NPM. Submissions are stored in SQLite and trigger an email via the self-hosted Mailcow stack. CORS is locked to both domains, and submissions are rate-limited to 5 per IP per hour. The mail password is injected at container startup via an entrypoint script reading from a .env file - never baked into the image.
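The rate limit described above could be approximated with a small sliding-window counter. This is a minimal sketch, not the widget's actual code: the class name, the in-memory storage and the injectable clock are all assumptions made for illustration, and a real FastAPI deployment might use a middleware or library instead.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` events per key within the trailing `window` seconds."""

    def __init__(self, limit=5, window=3600.0, clock=time.monotonic):
        self.limit = limit            # 5 submissions, per the text
        self.window = window          # one hour, in seconds
        self.clock = clock            # injectable so tests do not need to sleep
        self._events = defaultdict(deque)  # key (client IP) -> accepted timestamps

    def allow(self, key):
        now = self.clock()
        events = self._events[key]
        # Drop timestamps that have aged out of the window.
        while events and now - events[0] >= self.window:
            events.popleft()
        if len(events) >= self.limit:
            return False              # over the limit: reject this submission
        events.append(now)
        return True
```

State held in memory like this is lost on container restart, which is probably acceptable for a low-volume feedback form; persisting counters in SQLite would be the stricter alternative.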

Key Decisions

Self-hosted over Formspree - visitor feedback is data; keeping it on-premises was a deliberate privacy choice.

SQLite over a full database - feedback volume is low; single-container simplicity wins.

Password via .env and entrypoint script - secrets stay out of the image layer, survive rebuilds without re-entering credentials.

FastAPI SQLite Docker msmtp CORS CSP

Hardening & Access Control

Server Hardening Baseline

All servers · SSH · Fail2ban · UFW · nftables

SSH runs on a non-standard port across all servers with key-based authentication only - keys provisioned from known devices (laptop, Chromebook, phone). Root login is disabled everywhere. Fail2ban is configured with a 3-retry threshold and an aggressive ban duration, with LAN subnet whitelisted to prevent self-lockout. The recidive jail escalates repeat offenders.

UFW rules on the three Ubuntu servers restrict all traffic to expected ports only - SSH, HTTP/HTTPS and service-specific ports scoped to LAN or NPM IP only. host_5_power uses nftables with a drop-all policy; only DNS, SSH and Pi-hole ports are open. IPv6 is disabled across all servers - no global IPv6 address is assigned.

Docker services on host_2_cloud are bound to 127.0.0.1 for internal-only services - DocBot backend and Ollama were unintentionally exposed on 0.0.0.0 and corrected during hardening review.
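An exposure like the one found in the hardening review can be spotted by scanning the Ports column that `docker ps` prints. The sketch below is illustrative only: the function name is hypothetical, and it assumes the usual `host_ip:host_port->container_port/proto` layout of that column.

```python
def publicly_bound(ports_field):
    """Return the mappings in a `docker ps` Ports string that listen on all interfaces."""
    exposed = []
    for mapping in ports_field.split(","):
        mapping = mapping.strip()
        if "->" not in mapping:
            # Exposed-but-unpublished ports (e.g. "80/tcp") have no host binding.
            continue
        host_side = mapping.split("->", 1)[0]
        # Split off the port; what remains is the bind address (IPv4 or IPv6).
        host_ip = host_side.rsplit(":", 1)[0]
        if host_ip in ("0.0.0.0", "::", "[::]"):
            exposed.append(mapping)
    return exposed
```

A service published as `127.0.0.1:8000->8000/tcp` yields an empty list, while one published on `0.0.0.0` is reported and can then be rebound in its compose file.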

Key Decisions

Aggressive Fail2ban ban duration - residential IPs rarely change legitimately, so an aggressive ban meaningfully reduces brute-force surface.

Mailcow's built-in netfilter handles mail port protection exclusively - host Fail2ban not installed on host_3_mail to avoid conflicts.

host_5_power uses nftables drop-all - SSH is LAN-only and Fail2ban is unnecessary with this posture.

Fail2ban UFW nftables SSH Key Auth Non-standard Port

Vulnerability Assessment

VAPT - April 2026

7 domains · nmap · sslyze · nikto · header audit

A structured vulnerability assessment and penetration test (VAPT) was run across all 7 public-facing domains using nmap, sslyze, nikto and a full HTTP security header audit. SSL/TLS was scanned directly against the origin over its LAN IP, bypassing Cloudflare. All findings were triaged, then fixed or formally accepted with rationale.

Findings & Outcomes

HSTS missing - fixed via Cloudflare dashboard and NPM nginx config. Max-age 12 months, includeSubDomains, preload.

Missing security headers on static sites - X-Frame-Options, CSP, Referrer-Policy, Permissions-Policy added to NPM Advanced config.

Port 80 returning HTTP 200 on mail subdomain - accepted as-is; Mailcow default, modifying risks mail delivery.

Missing headers on photos subdomain - accepted; Immich sets its own internal headers and overrides NPM-level headers.

nmap sslyze nikto HSTS CSP TLS 1.2 / 1.3

Header	Static Sites	Cloud / Photos	Mail
Strict-Transport-Security	✓ 12mo	✓	⚠ 6mo
X-Frame-Options	✓	✓	✓
X-Content-Type-Options	✓	✓	✓
Content-Security-Policy	✓	accepted	accepted
Referrer-Policy	✓	accepted	✓
Permissions-Policy	✓	accepted	accepted
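The header audit summarised in the table above amounts to comparing each response against a required list. A minimal sketch follows; the function names are hypothetical, and the HSTS helper assumes "12 months" is expressed as 365 days (31,536,000 seconds), which the source does not state explicitly.

```python
REQUIRED_HEADERS = (
    "Strict-Transport-Security",
    "X-Frame-Options",
    "X-Content-Type-Options",
    "Content-Security-Policy",
    "Referrer-Policy",
    "Permissions-Policy",
)


def missing_headers(response_headers, required=REQUIRED_HEADERS):
    """Return the required headers absent from a response (names compared case-insensitively)."""
    present = {name.lower() for name in response_headers}
    return [name for name in required if name.lower() not in present]


def hsts_value(max_age_seconds=365 * 24 * 3600, include_subdomains=True, preload=True):
    """Build a Strict-Transport-Security value matching the policy described in the findings."""
    parts = [f"max-age={max_age_seconds}"]
    if include_subdomains:
        parts.append("includeSubDomains")
    if preload:
        parts.append("preload")
    return "; ".join(parts)
```

Feeding `missing_headers` the header mapping of any HTTP response gives the per-domain gaps; accepted exceptions (such as the photos subdomain) would simply be recorded rather than fixed.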

Next VAPT scheduled: October 2026

UPS Monitoring & Graceful Shutdown

NUT UPS Fleet Protection

host_5_power · NUT Primary · Green Cell AIO 600VA

A Green Cell AIO 600VA UPS is connected via USB to host_5_power, which acts as the NUT primary server. All four main servers are NUT secondaries, polling the primary every 5 seconds. On power loss, a 10-minute countdown begins - if mains is not restored within that window, a coordinated graceful shutdown fires across all four servers simultaneously via SSH, then host_5_power powers off 30 seconds later.

The UPS does not report battery.charge - only battery.voltage. A voltage-based low threshold is set via override in the NUT driver config. USB cable disconnection has a 1-hour grace period to avoid false shutdowns from accidental cable disconnects. Total time from outage to all servers off is approximately 12 minutes.
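The countdown behaviour can be pictured as a small state machine. This is a sketch under stated assumptions rather than the NUT configuration itself: the class and parameter names are hypothetical, and treating a low-voltage reading as an immediate trigger is an assumption the source does not confirm. In NUT this kind of timer is usually expressed with upssched timers rather than custom code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShutdownTimer:
    delay: float = 600.0                       # 10-minute countdown from the text
    on_battery_since: Optional[float] = None   # timestamp when mains was lost

    def update(self, now, on_battery, low_battery=False):
        """Feed one UPS status sample; return True when shutdown should fire."""
        if not on_battery:
            # Mains restored: cancel any running countdown.
            self.on_battery_since = None
            return False
        if self.on_battery_since is None:
            self.on_battery_since = now        # countdown starts at first on-battery sample
        # Assumption: a low-voltage flag short-circuits the countdown.
        return low_battery or (now - self.on_battery_since >= self.delay)


def is_low(voltage, threshold):
    """Voltage-based low-battery check; the threshold is site-specific and not given in the source."""
    return voltage <= threshold
```

With a 5-second polling interval, `update` would be called on each poll; a brief flicker resets the timer, which is what keeps short outages from triggering a shutdown.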

This system replaced the previous ping-based Power Sentinel watchdog, which was a dead man's switch dependent on continuous network reachability. NUT is purpose-built for UPS integration and provides more reliable shutdown semantics.

Key Decisions

10-minute countdown before shutdown - provides sufficient window for brief outages and power flickers without triggering a false shutdown.

Voltage-based threshold over charge percentage - this UPS model does not expose battery.charge; voltage is the reliable signal.

USB cable disconnect grace period (1 hour) - prevents false shutdowns from accidental disconnects; fail-safe is disabled by design.

NUT replaced ping-based Power Sentinel - purpose-built for UPS integration, more reliable shutdown semantics, no dependency on network reachability.

NUT nutdrv_qx Green Cell AIO 600VA SSH Shutdown Systemd

Docker Boot Resilience

Automatic Stack Recovery on Boot

All servers · systemd · docker-compose-startup · live-restore

After a graceful shutdown or Docker package upgrade, containers were not restarting automatically despite restart: always policies. Two root causes: Docker package upgrades via the weekly maintenance script restarted the daemon and wiped container state; and Docker's policy was not restoring state reliably on boot across all machines.

Three fixes were applied fleet-wide. A systemd oneshot service fires on every boot, waiting for Docker to fully initialise before bringing up all stacks in order. The ubuntu-update.sh maintenance script was updated to restart all stacks after any Docker package upgrade. Live restore is enabled on all three servers so containers survive daemon restarts without stopping.

host_2_cloud has an additional boot dependency - Docker itself is held until all five LVM data mounts are confirmed ready, preventing container startup races against slow disk initialisation.
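The mount gate can be sketched as a poll-until-ready loop. The source does not say which mechanism holds Docker back (systemd's `RequiresMountsFor=` is one native option), so the function below is only an illustration of the idea; its name, timeout and interval are assumptions.

```python
import os
import time


def wait_for_mounts(paths, timeout=120.0, interval=2.0,
                    is_mounted=os.path.ismount, sleep=time.sleep,
                    clock=time.monotonic):
    """Block until every path is a mount point; return the paths still missing at timeout."""
    deadline = clock() + timeout
    pending = list(paths)
    while True:
        # Re-check only the mounts that were not ready last time.
        pending = [p for p in pending if not is_mounted(p)]
        if not pending:
            return []                 # everything mounted: safe to start Docker
        if clock() >= deadline:
            return pending            # give up and report what is missing
        sleep(interval)
```

A wrapper would start Docker only when the returned list is empty, and log the missing mount points otherwise so a slow or failed disk is visible in the boot log.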

Key Decisions

systemd oneshot over restart: always alone - oneshot gives explicit ordering control and logs to a dedicated file; restart policy is a fallback, not the primary mechanism.

10-second pre-start delay - gives Docker daemon time to fully initialise before stacks are started.

Mount gating on host_2_cloud - Docker held until all five LVM volumes are ready, preventing Nextcloud and Immich from starting before their data volumes mount.

Systemd Oneshot Docker Live Restore LVM Mount Gating Docker Compose

Maintenance Automation

Biweekly Automated Maintenance Cycle

All servers · cron · msmtp · ordered shutdown

Each server runs an independent biweekly maintenance script on a fixed Sunday schedule. The cycle has four phases: a 24-hour advance notice email on Saturday, a maintenance-start announcement on Sunday, an ordered Docker stack shutdown followed by a reboot, and a post-reboot health report emailed to the ops log recipients.

A biweekly gate function inside each script anchors to a fixed start date - cron fires every weekend, but the script exits silently if it is not a maintenance week. All scripts send email via msmtp routing through the self-hosted Mailcow stack. The post-reboot report includes a live docker ps output so any failed containers are immediately visible.
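The biweekly gate reduces to counting whole weeks since the anchor date. The real scripts are Bash; the Python sketch below only illustrates the arithmetic, and the function name and the anchor date used in the example are hypothetical.

```python
from datetime import date


def is_maintenance_week(today, anchor, period_weeks=2):
    """True when `today` falls a whole multiple of `period_weeks` weeks after the anchor week."""
    days = (today - anchor).days
    if days < 0:
        return False                  # before the schedule started
    # Integer-divide into weeks, then keep every `period_weeks`-th week.
    return (days // 7) % period_weeks == 0


# Example with an assumed anchor of Sunday 1 February 2026:
# cron fires every Sunday, but only alternate Sundays pass the gate.
anchor = date(2026, 2, 1)
run_today = is_maintenance_week(date(2026, 2, 15), anchor)
```

The Saturday notice job would apply the same check to the following day's date, since the Saturday itself falls in the preceding seven-day bucket.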

Key Decisions

Biweekly cadence - short enough to stay current with security patches, long enough to avoid operational noise.

Ordered Docker shutdown before reboot - prevents unclean container state and data corruption on storage-heavy services.

msmtp on all servers via self-hosted mail stack - every automated script sends real, deliverable email reports with no external dependency.

Bash Automation Cron msmtp Ordered Shutdown Email Alerting

Monitoring Stack

Grafana · Prometheus · Loki · Alloy

All servers · host_1_home central · Grafana Alloy agents

A full self-hosted observability stack is deployed across all five servers. Prometheus handles metrics collection, Loki aggregates logs with 30-day retention and Grafana provides unified dashboards and alerting - all running centrally on host_1_home. Grafana Alloy is the unified agent on every node, shipping both metrics and logs to the central stack over the LAN.

Remote Alloy agents run as Docker containers on host_2_cloud, host_3_mail and host_4_backup. On host_5_power (DietPi, no Docker runtime), Alloy runs as a native ARM64 binary with a systemd service. A standalone cAdvisor container runs on all Docker servers for per-container metrics - required because Alloy's embedded cAdvisor does not support cgroup v2 with systemd driver on Ubuntu 24.04.

A lightweight docker-api service runs on each Docker server, exposing a live docker ps feed as JSON. This powers the Fleet Dashboard - a LAN-only overview at a dedicated subdomain showing live CPU, RAM, temperature, uptime and container count per server. Clicking any server card opens a drill-down modal with per-container CPU, RAM, port mappings and uptime. Auto-refreshes every 30 seconds, accessible via WireGuard from anywhere.
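A feed like the one docker-api exposes can be built from `docker ps --format '{{json .}}'`, which prints one JSON object per container per line. The sketch below is an assumption about how such a service might work, not the actual docker-api code; the function names and the selected fields are illustrative.

```python
import json
import subprocess


def parse_docker_ps(output):
    """Parse line-delimited JSON from `docker ps --format '{{json .}}'` into plain dicts."""
    containers = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        containers.append({
            "name": row.get("Names", ""),
            "image": row.get("Image", ""),
            "state": row.get("State", ""),
            "status": row.get("Status", ""),
            "ports": row.get("Ports", ""),
        })
    return containers


def live_containers():
    """Run `docker ps` on the local host and return the parsed container list."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{json .}}"],
        capture_output=True, text=True, check=True,
    )
    return parse_docker_ps(result.stdout)
```

Serialising the result of `live_containers()` with `json.dumps` gives a payload the dashboard could poll every 30 seconds; container count per server is simply the length of the list.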

Grafana is accessible on the LAN only - no public Cloudflare record - so observability data stays fully off the public internet. SMTP alerting routes through the self-hosted Mailcow stack.

Key Decisions

Central stack on host_1_home - already the ingress node; co-locating observability minimises inter-server hops for scraping.

Alloy over individual node_exporter + Promtail - one agent per node handles both metrics and logs, reducing configuration surface.

Standalone cAdvisor over Alloy's embedded cAdvisor - Alloy's embedded version does not support cgroup v2 + systemd driver, which is the Ubuntu 24.04 default.

Native binary on host_5_power - DietPi has no Docker; ARM64 Alloy binary with systemd is the clean fit, no runtime dependency added.

Fleet Dashboard LAN-only, no public Cloudflare record - observability data (IPs, container names, port mappings) stays fully off the public internet.

Grafana LAN-only, no public Cloudflare record - same reasoning as above.

Loki 30-day retention - balances audit usefulness against disk usage, and automatically purges logs containing IPs and ban events.

Grafana Prometheus Loki Grafana Alloy cAdvisor docker-api Fleet Dashboard Node Exporter SMTP Alerting

Active Alert Rules

Three alert rules are active, all routed to email via the self-hosted Mailcow stack.

High RAM Usage

Fires when node memory utilisation exceeds 85% for more than 5 minutes. Evaluated across all four nodes via the nodename label.

Source: Prometheus · Pending: 5m · Keep firing: 5m

Container Down

Fires when any named container disappears from recent Prometheus scrapes - indicating an unexpected stop or crash.

Source: Prometheus · Pending: 2m · No-data: OK

Fail2ban Ban Detected

Fires when a ban event appears in the Loki log stream. Useful for monitoring active brute-force attempts across all servers.

Source: Loki · Pending: none · Keep firing: 5m
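The pending period on the High RAM Usage rule means a breach must persist before the alert fires. The sketch below is a simplified model of that evaluation, not Grafana's implementation; the function names are hypothetical, and computing utilisation from total and available memory is an assumption about the underlying query.

```python
def memory_utilisation(total_bytes, available_bytes):
    """Memory utilisation as a percentage of total (assumed formula)."""
    return 100.0 * (1.0 - available_bytes / total_bytes)


def firing_since(samples, threshold=85.0, pending=300.0):
    """Return the timestamp at which the alert first fires, or None.

    `samples` is an iterable of (timestamp_seconds, value) pairs in time order.
    The alert fires once the value has stayed above `threshold` for `pending` seconds.
    """
    breach_start = None
    for ts, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = ts     # first sample of a new breach
            if ts - breach_start >= pending:
                return ts
        else:
            breach_start = None       # any recovery resets the pending timer
    return None
```

A short spike that drops back under 85% within five minutes never fires, which is the point of the pending window.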

Agent Coverage

Data Source	host_1_home	host_2_cloud	host_3_mail	host_4_backup	host_5_power
Node metrics (CPU/RAM/disk)	✓	✓	✓	✓	✓
Docker container metrics (cAdvisor)	✓	✓	✓	✓	-
Fail2ban logs	✓	✓	✓	✓	-
UFW / firewall logs	✓	✓	✓	✓	-
Syslog	✓	✓	✓	✓	-
Pi-hole metrics	✓	-	-	-	✓
Mail logs (Mailcow)	-	-	✓	-	-
NPM proxy logs	✓	-	-	-	-

Backup Architecture

Incremental rsync Mirror to a Dedicated Node

host_4_backup · rsync over SSH · WD Red 2 TB NAS SSD

host_4_backup is a dedicated ThinkCentre M910q running as the sole backup destination for the fleet. A 1 TB ext4 partition on a WD Red NAS SSD is mounted at a fixed path and serves as the single landing zone for all inbound backup jobs. A further ~800 GB remains unallocated on the drive, reserved for future growth without repartitioning.

All backups use rsync over SSH in a push model - each source server initiates its own transfer on schedule, writing to a dedicated subdirectory on host_4_backup. This keeps the backup node passive: it never reaches out to source servers and requires no knowledge of their internal layout. Delta-only transfers keep transfer windows short even for large data volumes.
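The push model can be illustrated by how a source server might assemble its rsync invocation. The real scripts are Bash; this Python sketch only shows the shape of the command, and the function name, paths, user and port are hypothetical. The `--delete` flag is an assumption that follows from the mirror semantics described here.

```python
def rsync_push_command(source_dir, dest_host, dest_dir, ssh_port=22, excludes=()):
    """Build an rsync-over-SSH argument list that mirrors source_dir to the backup node."""
    cmd = [
        "rsync",
        "-a",                         # archive mode: preserve permissions, times, owners
        "--delete",                   # mirror: remove files on the target that are gone at the source
        "-e", f"ssh -p {ssh_port}",   # transport over SSH on the given port
    ]
    for pattern in excludes:
        cmd.append(f"--exclude={pattern}")
    # A trailing slash copies the directory's contents rather than the directory itself.
    src = source_dir.rstrip("/") + "/"
    cmd += [src, f"{dest_host}:{dest_dir}"]
    return cmd
```

Passing the list to `subprocess.run(cmd, check=True)` on the source server would run the transfer; because rsync only sends changed data, repeat runs stay short even on large volumes.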

All jobs run hot against host filesystem paths - no containers are stopped or paused during backup. For the mail stack specifically, rsync reads directly from Docker volume paths on the host filesystem, bypassing the mail daemon entirely and avoiding the UID remapping issues that container-level stops can trigger.

Key Decisions

rsync mirror over snapshots - single live copy with delta transfers; no versioning overhead for a homelab context.

Push model - source servers own their backup schedule; host_4_backup remains a passive receiver with no credentials to source systems.

Hot backup against host filesystem paths - avoids container restarts, eliminates UID remapping risk on the mail stack.

Dedicated node over co-hosting - isolates backup I/O from production workloads and keeps failure domains separate.

1 TB active partition with ~800 GB unallocated - expansion without repartitioning or service interruption.

rsync SSH ext4 LVM Cron Bash

Backup Schedule

Four independent backup jobs, each scoped to one source server and running on its own cron schedule; the host_2_cloud job covers two data sets, listed separately below. All land on host_4_backup.

Source	What	Cadence
host_3_mail	Mail stack - vmail, database, SOGo data, config	Every 4 hours
host_3_mail	Document management - media, data, database, inbox	Daily 01:45
host_2_cloud	Docker volumes + service configs	Daily 03:00
host_2_cloud	Personal cloud storage - selected folders only	Daily 03:00
host_1_home	Service configs + Monitoring stack volume	Daily 02:00

Exclusions

Several data categories are deliberately excluded from backup. All are either fully rebuildable or carry insufficient value to justify the transfer and storage cost.

Data	Source	Reason
AI model weights (LLM)	host_2_cloud	Fully rebuildable via `ollama pull` - 4.4 GB, not worth transfer cost
AI photo model cache	host_2_cloud	Auto-regenerated on service start - ~800 MB
Screenshots	host_2_cloud	Transient / low value - excluded by design
WhatsApp media exports	host_2_cloud	Large volume, already retained on device - excluded to save space
Mail spam/AV databases	host_3_mail	Auto-regenerated by Rspamd and ClamAV on startup
Mail cache (Redis)	host_3_mail	In-memory cache - rebuildable, no persistent value

Script Overview

Each backup job has its own dedicated script, running as root via cron. All scripts share a common pattern: Berlin timezone logging, clean trap handling on interruption and email notification on success or failure via the self-hosted mail stack.

Script	Runs on	Covers
`mailcow-backup`	host_3_mail	Mail stack volumes + config
`paperless-backup`	host_3_mail	Document management volumes
`cloud-backup`	host_2_cloud	Docker volumes, service configs, cloud storage folders
`home-backup`	host_1_home	Service configs, observability stack volume

Common Script Pattern

Timezone-aware logging - all timestamps use Europe/Berlin, generated fresh per log entry.

Trap on INT/TERM - logs the interruption cleanly and exits without leaving partial transfers.

Runs as root via cron - no sudo calls inside scripts; avoids credential prompts in unattended context.

Email notification on success and failure - routed through the self-hosted mail stack via msmtp.
