sunnycloud.win / personal-infrastructure

System Documentation · Build Journal

Self-Hosted Homelab

Subhajit Debnath

A privacy-first, production-grade homelab running across five servers - built and iterated from February to May 2026. Every architectural decision, automation script and security hardening choice is documented here as a living build journal.

4 + 1 Servers
18+ Services
14 TB Storage
0 Cloud Dependencies
Ubuntu 24.04 · DietPi · Docker · LVM · WireGuard · Cloudflare Tunnel · Zero Trust · NUT UPS · DoH · Feb – May 2026
Fully live · In Production

A five-server environment built around a single-ingress reverse proxy architecture - all external traffic enters through one node, with each server scoped to a distinct role. Cloudflare sits in front for DNS, TLS and bot protection. WireGuard provides encrypted remote access. NUT monitors the UPS and triggers a coordinated graceful shutdown on power loss.

H-01 host_1_home
Mini PC · J4125 · 8 GB DDR4
Network ingress, DNS filtering and VPN gateway. All external traffic enters here via Nginx Proxy Manager. Hosts the central observability stack.
NPM Pi-hole WireGuard Cloudflare Tunnel Grafana Prometheus Loki
H-02 host_2_cloud
ThinkStation M700 SFF · i5-6400 · 32 GB DDR4
Primary services host. Runs all user-facing applications on Docker with an 8 TB data volume via LVM.
Nextcloud Immich Jellyfin DocBot Feedback API
H-03 host_3_mail
ThinkCentre M900 Tiny · i7-6700T · 32 GB DDR4
Dedicated mail and document server. Full email sovereignty with Mailcow, plus Paperless-ngx for document management and a few other miscellaneous services.
Mailcow Paperless-ngx Brevo Relay
H-04 host_4_backup
ThinkCentre M910q Tiny · i5-6400T · 16 GB DDR4
Dedicated backup node for multiple services that also hosts network monitoring. SSO authentication and password management services are planned for later.
NetAlertX
H-05 host_5_power
Raspberry Pi 4 · 2 GB · DietPi
UPS monitoring primary (NUT), secondary DNS filtering (Pi-hole) and remote Alloy metrics agent. The UPS is connected here - on power loss it coordinates a graceful shutdown across all servers.
NUT Primary Pi-hole DoH Alloy
01 / Architecture
Single-ingress scales cleanly
Routing all external traffic through one reverse proxy node makes TLS management, port exposure and hardening dramatically simpler than distributing ingress across servers.
02 / Storage
LVM planning pays dividends
Designing volume groups with unallocated buffer upfront means storage can expand without rebuilding services or migrating data mid-operation.
03 / Email
Email sovereignty is achievable
Residential deliverability is the core challenge. A reputable SMTP relay solves it without building IP reputation - the rest of the stack is surprisingly manageable.
04 / VPN
Native beats Docker for critical path
WireGuard running native rather than Docker proved more reliable for boot persistence - Docker adds a dependency layer exactly where you need reliability most.
05 / Operations
Automate operational noise early
Biweekly (every two weeks) maintenance automation and email alerting remove the mental overhead of remembering to patch and verify - the stack becomes genuinely low-maintenance.
06 / AI
Local AI is more capable than expected
A CPU-only RAG assistant running on consumer hardware is surprisingly usable for document querying - the privacy trade-off versus cloud APIs is well worth it.
07 / Power
UPS integration is non-negotiable
NUT with voltage-based thresholds (battery.charge not available on all hardware) gives a reliable 10-minute window for coordinated graceful shutdown across the full fleet.
08 / Privacy
DoH closes the last DNS gap
Pi-hole filters ads but DNS queries remain visible to the ISP without DoH. dnscrypt-proxy as the Pi-hole upstream closes this gap with minimal complexity.
01
Single-Ingress via Nginx Proxy Manager
host_1_home · NPM · Cloudflare Tunnel · Let's Encrypt

All external traffic enters through host_1_home via Nginx Proxy Manager, which routes requests to backend services across the LAN. Only one node needs to be hardened for external exposure. Cloudflare sits in front with Bot Fight Mode, email obfuscation and proxied DNS. TLS certificates are issued via Let's Encrypt using the Cloudflare DNS challenge - no port 80 exposure required.

A Cloudflare Tunnel replaces DDNS-managed A records for all web-facing subdomains. The cloudflared connector on the server initiates outbound-only connections to Cloudflare's edge, so no inbound ports are opened and the home IP never appears in public DNS. Force SSL is disabled on all tunneled NPM proxy hosts; Cloudflare terminates TLS at the edge and the tunnel carries plain HTTP to NPM internally.

Four browser-based SSH endpoints are exposed via Cloudflare Zero Trust - email OTP authentication, country-restricted to two regions, 6-hour session. No SSH client installation required for emergency access from any device.

Key Decisions
Cloudflare Tunnel over DDNS A records - home IP never appears in public DNS, traffic routes through Cloudflare's trusted IP ranges.
DNS challenge for SSL avoids exposing port 80 on a residential connection for ACME validation.
Force SSL disabled on tunneled NPM hosts - Cloudflare terminates TLS at the edge; enabling it causes redirect loops.
Zero Trust browser SSH over Apache Guacamole - simpler, no VM required, already integrated with the tunnel infrastructure.
Nginx Proxy Manager Cloudflare Tunnel Cloudflare Zero Trust Let's Encrypt DNS Challenge
02
Dual Pi-hole with DoH Upstream
host_1_home · host_5_power · Pi-hole · dnscrypt-proxy · Hagezi

Two Pi-hole instances run across the fleet - primary on host_1_home (Docker), secondary on host_5_power (DietPi native). The router's DHCP server assigns both as DNS servers for all LAN clients, with the secondary as pure failover - not active-active.

Both instances use the Hagezi Multi Pro blocklist alongside StevenBlack. DNS queries are forwarded upstream through dnscrypt-proxy 2.1.15 running on each host, which wraps every query in HTTPS to Cloudflare's DoH endpoint - the ISP cannot see which domains are being looked up.
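To make "wrapping queries in HTTPS" concrete, here is a minimal Python sketch of a DNS-over-HTTPS GET request as defined by RFC 8484: a binary DNS query is base64url-encoded into the `dns` parameter and sent with the `application/dns-message` media type. It illustrates the kind of request a DoH client such as dnscrypt-proxy sends upstream; it is not dnscrypt-proxy's actual code, and the helper names are hypothetical.

```python
import base64
import struct
import urllib.request


def build_query(name: str, qtype: int = 1, qid: int = 0) -> bytes:
    """Build a binary DNS query (qtype 1 = A record) for `name`."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/AR counts = 0
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero byte
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN
    return header + question


def doh_get_url(endpoint: str, query: bytes) -> str:
    # RFC 8484 GET: base64url without padding in the `dns` parameter
    encoded = base64.urlsafe_b64encode(query).rstrip(b"=").decode("ascii")
    return f"{endpoint}?dns={encoded}"


def resolve(name: str, endpoint: str = "https://cloudflare-dns.com/dns-query") -> bytes:
    """Send the query over HTTPS and return the raw DNS response message."""
    req = urllib.request.Request(
        doh_get_url(endpoint, build_query(name)),
        headers={"Accept": "application/dns-message"},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.read()
```

To anything on the path, `resolve("example.com")` looks like an ordinary HTTPS exchange with cloudflare-dns.com. RFC 8484 suggests a DNS ID of 0 for GET requests so responses stay cache-friendly, which is why `qid` defaults to 0.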

Pi-hole is deployed as opt-in per device rather than a network-wide blanket, protecting household appliances and guests from DNS filtering side-effects. Corporate devices using Zscaler tunnel their DNS through the VPN automatically - no conflict.

Key Decisions
dnscrypt-proxy binary from GitHub (v2.1.15) over apt package - the apt version is too old for custom listen address configuration.
Independent secondary Pi-hole (no Gravity Sync) - simpler to manage; blocklist updates are independent and failures are isolated.
Opt-in per device rather than network-wide - avoids breaking smart home devices and household appliances that misbehave with aggressive DNS filtering.
Pi-hole dnscrypt-proxy Hagezi Pro DoH Cloudflare
03
WireGuard VPN Gateway
host_1_home · Native · wg0

WireGuard runs natively on host_1_home as a system service, providing encrypted remote access to the full homelab subnet. Peers include the Windows laptop, Chromebook and Android phone - each provisioned with individual keys. SSH access to all servers is gated to LAN and WireGuard subnet only.
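The SSH gating rule boils down to a subnet membership test. This Python sketch expresses it with the standard `ipaddress` module. Both subnets are hypothetical placeholders because the journal does not publish the real LAN or WireGuard ranges, and on the servers the rule is enforced by the firewall, not by code like this.

```python
import ipaddress

# Hypothetical ranges - substitute the real LAN and wg0 subnets.
ALLOWED_SSH_SOURCES = [
    ipaddress.ip_network("192.168.1.0/24"),  # assumed home LAN
    ipaddress.ip_network("10.8.0.0/24"),     # assumed WireGuard peer subnet
]


def ssh_allowed(source_ip: str) -> bool:
    """True if the source address sits in the LAN or the VPN subnet."""
    addr = ipaddress.ip_address(source_ip)
    # Membership across IP versions simply evaluates to False.
    return any(addr in net for net in ALLOWED_SSH_SOURCES)
```

Because each WireGuard peer receives its own key and tunnel address, a VPN client lands inside the allowed subnet only after the handshake succeeds. That is what makes "LAN or VPN only" a meaningful boundary.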

Key Decisions
Native WireGuard over Docker - better boot persistence and fewer moving parts in a critical path component that must survive Docker daemon restarts.
SSH restricted to LAN and VPN subnet only - no public SSH exposure on any server.
WireGuard wg-quick UFW
01
Portainer CE · Gitea · Multi-host Stack Management
host_2_cloud (server) · host_1_home, host_3_mail, host_4_backup (agents)

Portainer CE 2.39.0 LTS is deployed as the unified container management layer across the homelab, with Gitea 1.26.2 running co-located on host_2_cloud as the local Git backend for all stack definitions. Every Docker service across all four hosts is defined as a Git-backed compose stack, version-controlled in Gitea and deployed through Portainer.

Portainer Server runs on host_2_cloud, selected for its 32 GB RAM headroom. Standard Portainer Agents are deployed on host_1_home, host_3_mail and host_4_backup, each reachable by the server over the LAN on a custom port. host_5_power is excluded entirely: it has no Docker runtime, and Alloy runs there as a native binary. All compose files live under a fixed location on each host, with one dedicated Gitea repo per host.

Stack secrets are managed as Portainer environment variables - no .env files remain on disk. Portainer and Gitea both use bind mounts, which are covered by the existing rsync backup job on sunnybackup. Portainer is reachable through the NPM reverse proxy on a dedicated subdomain - no public exposure.

Mailcow on sunnymail is deliberately left unmanaged by Portainer. Its update script (mailcow-update.sh) is tightly coupled to the Mailcow installation directory and runs its own docker compose internally, so letting Portainer manage this stack would conflict with the native updater and break the update path. Mailcow keeps its own systemd startup handling; all other stacks restart cleanly via Docker restart policy after reboot.

Key Decisions
Portainer Server on host_2_cloud - 32 GB RAM provides sufficient headroom.
Standard agent over edge agent - LAN-only homelab; edge agent complexity adds nothing in this topology.
One Gitea repo per host - Portainer Git stack integration points to repo root; per-host isolation prevents cross-host deployment mistakes.
Portainer env vars over .env files - centralised secret management; secrets survive stack redeployment without file presence on the host.
Startup ownership: systemd owns Portainer itself; all other stacks rely on restart: unless-stopped - clean separation removes the race condition between docker-compose-startup.service and Portainer on reboot.
Mailcow left unmanaged - update toolchain conflict risk too high; mailcow-update.sh owns its own compose lifecycle.
Portainer CE Gitea Docker Compose Git-backed Stacks Portainer Agent NPM Reverse Proxy Ubuntu 24.04
Portainer CE - live environment view across all four managed hosts

LAN and WireGuard VPN access only - not publicly exposed.

Portainer
Container management across all Docker hosts - Git-backed stacks via Gitea.
LAN & VPN only
Gitea
Self-hosted Git backend - source of truth for all homelab compose files.
LAN & VPN only
DocBot
Local RAG AI assistant - query personal documents without cloud APIs.
LAN & VPN only
Pi-hole
Network-wide DNS filtering - ad and tracker blocking with query logging.
LAN & VPN only
Fleet Dashboard
Live server metrics across all nodes - CPU, RAM, containers and uptime.
LAN & VPN only
Monitoring
Grafana dashboards across all nodes - metrics, logs and alert rules.
LAN & VPN only
NetAlertX
Connected device monitoring - detects new and unknown devices joining the network.
LAN & VPN only
FileUpload
Browser-based file upload portal for devices on the home network - no app required.
LAN & VPN only
01
Self-Hosted Cloud, Photos & Media
host_2_cloud · Nextcloud · Immich · Jellyfin · DocBot · LVM

Nextcloud is the primary cloud storage platform - a full self-hosted alternative to Google Drive, running on Docker with an 8 TB external data volume managed via LVM. The LVM layout was designed from the start to allow incremental expansion without service interruption.

Immich runs alongside Nextcloud for photo management, deliberately separated to leverage its machine-learning photo features and superior mobile experience. External libraries mount Nextcloud photo folders as read-only - single source of truth, with Immich's AI features on top.

Jellyfin runs in host network mode for smooth local media streaming, accessible on the home network or remotely via WireGuard. DocBot is a private RAG pipeline - Ollama, Mistral 7B, FastAPI, ChromaDB - for querying personal documents without sending data to external APIs.
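A RAG pipeline of this shape retrieves the most relevant document chunks, then asks the local model to answer from them alone. The Python sketch below shows that flow under stated assumptions: chunks are already embedded into a ChromaDB collection, here hypothetically named `documents` and stored at a hypothetical path, and Ollama is listening on its default port with the `mistral` model pulled. DocBot's real code, prompt and FastAPI layer are not shown in the journal.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_prompt(question: str, chunks: list[str]) -> str:
    """Number the retrieved chunks and constrain the model to them."""
    context = "\n\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    return (
        "Answer using only the context below. If the answer is not there, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def retrieve(question: str, collection_name: str = "documents",
             db_path: str = "./chroma", k: int = 4) -> list[str]:
    import chromadb  # imported lazily so build_prompt works without it installed
    client = chromadb.PersistentClient(path=db_path)
    collection = client.get_collection(collection_name)
    result = collection.query(query_texts=[question], n_results=k)
    return result["documents"][0]  # one list of chunks per query text


def generate(prompt: str, model: str = "mistral") -> str:
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(OLLAMA_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    # CPU-only inference can be slow, hence the generous timeout.
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.loads(resp.read())["response"]


def ask(question: str) -> str:
    return generate(build_prompt(question, retrieve(question)))
```

Every call stays on localhost, which is the point of the design: neither the question nor the retrieved chunks ever leave the server.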

Key Decisions
LVM designed for incremental expansion - storage grows without rebuilding containers or reconfiguring services.
Immich separated from Nextcloud for performance and AI features; external libraries keep Nextcloud as the single source of truth.
DocBot intentionally runs on CPU inference - fully local with no GPU dependency or cloud API calls.
Nextcloud Immich Jellyfin Ollama ChromaDB FastAPI Docker Compose LVM
02
Self-Hosted Mail & Document Stack
host_3_mail · Mailcow · Paperless-ngx

Mailcow manages the complete email stack - DKIM, SPF and DMARC all verified and active. Outbound mail routes through a Brevo SMTP relay to solve residential IP deliverability without months of reputation building. A dedicated 500 GB LVM volume is allocated for mail data with growth headroom reserved.

Paperless-ngx runs on the same server for document management with OCR - all household documents, receipts and correspondence are ingested, tagged and made searchable. It is publicly accessible via Cloudflare Tunnel at a dedicated subdomain.

Key Decisions
Brevo SMTP relay for deliverability - avoids the months-long process of building residential IP reputation from scratch.
Mailcow's built-in netfilter handles mail port protection - adding Fail2ban on top would conflict with its own port management.
Paperless publicly accessible via tunnel - document access from anywhere without VPN dependency.
Mailcow Brevo Relay DKIM · SPF · DMARC Paperless-ngx LVM Docker
03
Self-Hosted Feedback Widget
host_2_cloud · FastAPI · SQLite · Docker · Cloudflare Tunnel

A fully self-hosted feedback widget deployed across both sites. Visitors can leave a thumbs up/down rating and a comment - all data stays on-premises with zero third-party exposure. Formspree was evaluated and rejected to avoid handing visitor data to an external service.

The API runs as a Docker container on host_2_cloud, exposed via Cloudflare Tunnel through NPM. Submissions are stored in SQLite and trigger an email notification via the self-hosted Mailcow stack. CORS is locked to both domains, and submissions are rate-limited to 5 per IP per hour. The mail password is injected at container startup by an entrypoint script that reads it from an .env file - it is never baked into the image.
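A "5 submissions per IP per hour" limit is easiest to reason about as a sliding window. Here is a minimal in-memory Python sketch, assuming a single API process (state is lost on restart, which is acceptable at this volume). The class name is hypothetical, and this is not necessarily how the feedback API implements its limit.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` hits per key within any `window_s`-second window."""

    def __init__(self, limit: int = 5, window_s: float = 3600.0, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._hits = defaultdict(deque)  # key -> timestamps of accepted hits

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

In a FastAPI handler you would call `allow(...)` before writing to SQLite and return HTTP 429 when it fails. Behind Cloudflare Tunnel and NPM, however, `request.client.host` would typically be the proxy's address, so the key would need to come from a forwarded header such as Cloudflare's `CF-Connecting-IP`.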

Key Decisions
Self-hosted over Formspree - visitor feedback is data; keeping it on-premise was a deliberate privacy choice.
SQLite over a full database - feedback volume is low; single-container simplicity wins.
Password via .env and entrypoint script - secrets stay out of the image layer, survive rebuilds without re-entering credentials.
FastAPI SQLite Docker msmtp CORS CSP
01
Server Hardening Baseline
All servers · SSH · Fail2ban · UFW · nftables

SSH runs on a non-standard port across all servers with key-based authentication only - keys are provisioned from known devices (laptop, Chromebook, phone). Root login is disabled everywhere. Fail2ban is configured with a 3-retry threshold and an aggressive ban duration, with the LAN subnet whitelisted to prevent self-lockout. The recidive jail escalates bans for repeat offenders.

UFW rules on the three Ubuntu servers restrict all traffic to expected ports only - SSH, HTTP/HTTPS and service-specific ports, scoped to the LAN or the NPM host's IP only. host_5_power uses nftables with a drop-all policy; only DNS, SSH and Pi-hole ports are open. IPv6 is disabled across all servers - no global IPv6 address is assigned.

On host_2_cloud, internal-only Docker services are bound to 127.0.0.1. The DocBot backend and Ollama had been unintentionally exposed on 0.0.0.0; this was caught and corrected during the hardening review.
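A 0.0.0.0 binding is easy to miss by eye in `docker ps` output. This Python sketch flags every published port whose host address is not loopback. It parses the `Ports` column (for example from `docker ps --format '{{.Names}}\t{{.Ports}}'`). The function name is hypothetical, and the sketch is not the check used in the actual review.

```python
import ipaddress


def exposed_bindings(ports_field: str) -> list[str]:
    """Return published port mappings that listen on a non-loopback address.

    Entries look like '127.0.0.1:8000->8000/tcp', '0.0.0.0:11434->11434/tcp'
    or ':::11434->11434/tcp'; entries without '->' are not published on the host.
    """
    exposed = []
    for entry in (part.strip() for part in ports_field.split(",")):
        if "->" not in entry:
            continue  # container-internal port only
        host_side, _, _ = entry.partition("->")
        host_ip, _, _ = host_side.rpartition(":")  # split off the host port
        if not ipaddress.ip_address(host_ip.strip("[]")).is_loopback:
            exposed.append(entry)
    return exposed
```

Run against every container on a schedule, any non-empty result is a candidate for the same fix applied to DocBot and Ollama: rebind to 127.0.0.1 and let NPM or the internal Docker network reach the service instead.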

Key Decisions
Aggressive Fail2ban ban duration - residential IPs rarely change legitimately, so an aggressive ban meaningfully reduces brute-force surface.
Mailcow's built-in netfilter handles mail port protection exclusively - host Fail2ban not installed on host_3_mail to avoid conflicts.
host_5_power uses nftables drop-all - SSH is LAN-only and Fail2ban is unnecessary with this posture.
Fail2ban UFW nftables SSH Key Auth Non-standard Port
02
VAPT - April 2026
7 domains · nmap · sslyze · nikto · header audit

A structured vulnerability and penetration test was run across all 7 public-facing domains using nmap, sslyze, nikto and a full HTTP security header audit. SSL/TLS was scanned directly against the origin (bypassing Cloudflare) via its LAN IP. All findings were triaged and either fixed or formally accepted with rationale.

Findings & Outcomes
HSTS missing - fixed via Cloudflare dashboard and NPM nginx config. Max-age 12 months, includeSubDomains, preload.
Missing security headers on static sites - X-Frame-Options, CSP, Referrer-Policy, Permissions-Policy added to NPM Advanced config.
Port 80 returning HTTP 200 on the mail subdomain - accepted as-is; this is Mailcow's default, and modifying it risks breaking mail delivery.
Missing headers on photos subdomain - accepted; Immich sets its own internal headers and overrides NPM-level headers.
nmap sslyze nikto HSTS CSP TLS 1.2 / 1.3
Header Static Sites Cloud / Photos Mail
Strict-Transport-Security ✓ 12mo ⚠ 6mo
X-Frame-Options
X-Content-Type-Options
Content-Security-Policy accepted accepted
Referrer-Policy accepted
Permissions-Policy accepted accepted
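The header half of the audit is easy to re-run between scheduled VAPTs. Here is a Python sketch that checks for the six headers in the table and treats HSTS as weak unless it has a max-age of at least one year (31,536,000 s, the hstspreload.org minimum) plus includeSubDomains and preload. The function names are hypothetical, and this is not the tooling used in the April audit.

```python
import urllib.request

REQUIRED = [
    "Strict-Transport-Security",
    "X-Frame-Options",
    "X-Content-Type-Options",
    "Content-Security-Policy",
    "Referrer-Policy",
    "Permissions-Policy",
]
ONE_YEAR_S = 365 * 24 * 60 * 60  # 31536000


def hsts_ok(value: str) -> bool:
    directives = [d.strip().lower() for d in value.split(";") if d.strip()]
    max_age = 0
    for d in directives:
        if d.startswith("max-age="):
            max_age = int(d.split("=", 1)[1].strip('"'))
    return max_age >= ONE_YEAR_S and "includesubdomains" in directives and "preload" in directives


def audit(headers: dict) -> list[str]:
    """Return missing headers, plus a marker if HSTS is present but too weak."""
    lower = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    findings = [h for h in REQUIRED if h.lower() not in lower]
    hsts = lower.get("strict-transport-security")
    if hsts is not None and not hsts_ok(hsts):
        findings.append("Strict-Transport-Security (weak)")
    return findings


def fetch_headers(url: str) -> dict:
    with urllib.request.urlopen(url, timeout=10) as resp:
        return dict(resp.headers.items())
```

Under this rule a 6-month HSTS max-age, like the ⚠ entry above, would be reported as weak. Accepted findings such as the Immich-managed headers would need to be tracked in an allow-list so they do not resurface as noise.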

Next VAPT scheduled: October 2026

01
NUT UPS Fleet Protection
host_5_power · NUT Primary · Green Cell AIO 600VA

A Green Cell AIO 600VA UPS is connected via USB to host_5_power, which acts as the NUT primary server. All four main servers are NUT secondaries, polling the primary every 5 seconds. On power loss, a 10-minute countdown begins - if mains is not restored within that window, a coordinated graceful shutdown fires across all four servers simultaneously via SSH, then host_5_power powers off 30 seconds later.

The UPS does not report battery.charge - only battery.voltage. A voltage-based low threshold is set via override in the NUT driver config. USB cable disconnection has a 1-hour grace period to avoid false shutdowns from accidental cable disconnects. Total time from outage to all servers off is approximately 12 minutes.
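The shutdown decision combines three signals: whether the UPS is on battery, how long it has been on battery, and a voltage floor standing in for the missing `battery.charge`. Inside NUT this kind of timer is usually handled by upsmon and upssched configuration; the Python sketch below only models the decision logic. It reads `upsc`-style `key: value` output, uses the standard NUT status flags `OL`, `OB` and `LB`, and assumes a hypothetical 11.5 V threshold because the journal does not publish the real override value.

```python
COUNTDOWN_S = 600    # 10-minute on-battery window, from the journal
LOW_VOLTAGE = 11.5   # hypothetical cut-off; the real override value is not published


def parse_upsc(text: str) -> dict:
    """Parse `upsc` output, which prints one `key: value` pair per line."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            values[key.strip()] = value.strip()
    return values


def should_shutdown(ups: dict, on_battery_for_s: float) -> bool:
    flags = ups.get("ups.status", "").split()
    if "OB" not in flags:
        return False  # on mains (OL): any pending countdown is cancelled
    if "LB" in flags:
        return True   # the UPS itself reports low battery
    try:
        voltage = float(ups["battery.voltage"])
    except (KeyError, ValueError):
        voltage = None
    if voltage is not None and voltage <= LOW_VOLTAGE:
        return True   # voltage stands in for the missing battery.charge
    return on_battery_for_s >= COUNTDOWN_S
```

The timeline in the journal follows from this: 600 s of countdown, then the time the four servers need to shut down, then the 30 s delay before host_5_power powers off, for roughly 12 minutes in total.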

This system replaced the previous ping-based Power Sentinel watchdog, which was a dead man's switch dependent on continuous network reachability. NUT is purpose-built for UPS integration and provides more reliable shutdown semantics.

Key Decisions
10-minute countdown before shutdown - provides sufficient window for brief outages and power flickers without triggering a false shutdown.
Voltage-based threshold over charge percentage - this UPS model does not expose battery.charge; voltage is the reliable signal.
USB cable disconnect grace period (1 hour) - prevents false shutdowns from accidental disconnects; fail-safe is disabled by design.
NUT replaced ping-based Power Sentinel - purpose-built for UPS integration, more reliable shutdown semantics, no dependency on network reachability.
NUT nutdrv_qx Green Cell AIO 600VA SSH Shutdown Systemd
02
Automatic Stack Recovery on Boot
All servers · systemd · docker-compose-startup · live-restore

After a graceful shutdown or a Docker package upgrade, containers were not restarting automatically despite restart: always policies. There were two root causes: Docker package upgrades run by the weekly maintenance script restarted the daemon and wiped container state, and Docker's restart policy was not reliably restoring state on boot across all machines.

Three fixes were applied fleet-wide. A systemd oneshot service fires on every boot, waiting for Docker to fully initialise before bringing up all stacks in order. The ubuntu-update.sh maintenance script was updated to restart all stacks after any Docker package upgrade. Live restore is enabled on all three servers so containers survive daemon restarts without stopping.

host_2_cloud has an additional boot dependency - Docker itself is held until all five LVM data mounts are confirmed ready, preventing container startup races against slow disk initialisation.
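On the servers this is a systemd oneshot unit plus a script, and on host_2_cloud the mount gate holds the Docker service itself. The Python sketch below folds the same sequence into one function for illustration: wait for the data mounts, wait until `docker info` succeeds, pause 10 seconds, then run `docker compose up -d` for each stack in order. The mount points and stack directories passed to `boot()` are hypothetical.

```python
import os
import subprocess
import time


def wait_until(check, timeout_s=300.0, interval_s=2.0, sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it returns True or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def mounts_ready(paths) -> bool:
    return all(os.path.ismount(p) for p in paths)


def docker_ready() -> bool:
    try:
        return subprocess.run(["docker", "info"], capture_output=True).returncode == 0
    except FileNotFoundError:  # Docker CLI not available
        return False


def boot(mount_points, stack_dirs, pre_start_delay_s=10):
    if not wait_until(lambda: mounts_ready(mount_points)):
        raise SystemExit("data volumes not mounted; refusing to start stacks")
    if not wait_until(docker_ready):
        raise SystemExit("Docker daemon did not come up")
    time.sleep(pre_start_delay_s)  # settle time, mirroring the 10-second pre-start delay
    for stack in stack_dirs:       # list order is start order
        subprocess.run(["docker", "compose", "up", "-d"], cwd=stack, check=True)
```

Failing loudly when the mounts never appear is deliberate: starting Nextcloud or Immich against an empty mount point would let them initialise fresh state on the root disk.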

Key Decisions
systemd oneshot over restart: always alone - oneshot gives explicit ordering control and logs to a dedicated file; restart policy is a fallback, not the primary mechanism.
10-second pre-start delay - gives Docker daemon time to fully initialise before stacks are started.
Mount gating on host_2_cloud - Docker held until all five LVM volumes are ready, preventing Nextcloud and Immich from starting before their data volumes mount.
Systemd Oneshot Docker Live Restore LVM Mount Gating Docker Compose
03
Biweekly Automated Maintenance Cycle
All servers · cron · msmtp · ordered shutdown

Each server runs an independent biweekly (every two weeks) maintenance script on a fixed Sunday schedule. The cycle has four phases: a 24-hour advance notice email on Saturday, a maintenance-start announcement on Sunday, an ordered Docker stack shutdown followed by a reboot, and a post-reboot health report emailed to the ops log recipients.

A biweekly gate function inside each script anchors to a fixed start date - cron fires every weekend, but the script exits silently if it is not a maintenance week. All scripts send email via msmtp routing through the self-hosted Mailcow stack. The post-reboot report includes a live docker ps output so any failed containers are immediately visible.
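The gate amounts to a week-parity check against a fixed anchor date. The production scripts are Bash; this Python sketch shows the same arithmetic. The anchor Sunday is hypothetical because the journal does not give the real start date.

```python
from datetime import date, timedelta

# Hypothetical anchor: the first maintenance Sunday (1 Feb 2026 is a Sunday).
ANCHOR_SUNDAY = date(2026, 2, 1)


def is_maintenance_week(day: date, anchor: date = ANCHOR_SUNDAY) -> bool:
    """True if `day` falls in an even-numbered week counted from the anchor."""
    weeks_since_anchor = (day - anchor).days // 7
    return weeks_since_anchor % 2 == 0


def notice_due(saturday: date) -> bool:
    """The Saturday notice fires only if the next day is a maintenance Sunday."""
    return is_maintenance_week(saturday + timedelta(days=1))


def should_run_today(today: date) -> bool:
    # Cron fires every Sunday; in off weeks the script exits silently.
    return today.weekday() == 6 and is_maintenance_week(today)
```

Cron has no native every-other-week field, so anchoring to a fixed date keeps the two-week parity stable across month and year boundaries.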

Key Decisions
Biweekly cadence - short enough to stay current with security patches, long enough to avoid operational noise.
Ordered Docker shutdown before reboot - prevents unclean container state and data corruption on storage-heavy services.
msmtp on all servers via self-hosted mail stack - every automated script sends real, deliverable email reports with no external dependency.
Bash Automation Cron msmtp Ordered Shutdown Email Alerting
01
Grafana · Prometheus · Loki · Alloy
All servers · host_1_home central · Grafana Alloy agents

A full self-hosted observability stack is deployed across all five servers. Prometheus handles metrics collection, Loki aggregates logs with 30-day retention and Grafana provides unified dashboards and alerting - all running centrally on host_1_home. Grafana Alloy is the unified agent on every node, shipping both metrics and logs to the central stack over the LAN.

Remote Alloy agents run as Docker containers on host_2_cloud, host_3_mail and host_4_backup. On host_5_power (DietPi, no Docker runtime), Alloy runs as a native ARM64 binary with a systemd service. A standalone cAdvisor container runs on all Docker servers for per-container metrics - required because Alloy's embedded cAdvisor does not support cgroup v2 with the systemd cgroup driver on Ubuntu 24.04.

A lightweight docker-api service runs on each Docker server, exposing a live docker ps feed as JSON. This powers the Fleet Dashboard - a LAN-only overview at a dedicated subdomain showing live CPU, RAM, temperature, uptime and container count per server. Clicking any server card opens a drill-down modal with per-container CPU, RAM, port mappings and uptime. Auto-refreshes every 30 seconds, accessible via WireGuard from anywhere.
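A service like docker-api can be very small. This Python sketch shells out to `docker ps --format '{{json .}}'` (one JSON object per line) and serves the parsed list over HTTP. The `/containers` route, the handler name and the use of the standard-library server are assumptions, not the real service's design.

```python
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_docker_ps(output: str) -> list:
    """`docker ps --format '{{json .}}'` prints one JSON object per line."""
    return [json.loads(line) for line in output.splitlines() if line.strip()]


def list_containers() -> list:
    result = subprocess.run(
        ["docker", "ps", "--format", "{{json .}}"],
        capture_output=True, text=True, check=True,
    )
    return parse_docker_ps(result.stdout)


class ContainersHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/containers":  # hypothetical route
            self.send_error(404)
            return
        body = json.dumps(list_containers()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(bind_address: str, port: int) -> None:
    # Bind to the LAN-facing address only; firewall rules still decide who may call it.
    HTTPServer((bind_address, port), ContainersHandler).serve_forever()
```

Anything that can run `docker ps` effectively has root-level access to the Docker daemon, which is a further reason to keep this feed LAN-only, as the Fleet Dashboard is.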

Grafana is accessible on the LAN only - no public Cloudflare record - so observability data stays fully off the public internet. SMTP alerting routes through the self-hosted Mailcow stack.

Key Decisions
Central stack on host_1_home - already the ingress node; co-locating observability minimises inter-server hops for scraping.
Alloy over individual node_exporter + Promtail - one agent per node handles both metrics and logs, reducing configuration surface.
Standalone cAdvisor over Alloy's embedded cAdvisor - Alloy's embedded version does not support cgroup v2 + systemd driver, which is the Ubuntu 24.04 default.
Native binary on host_5_power - DietPi has no Docker; ARM64 Alloy binary with systemd is the clean fit, no runtime dependency added.
Fleet Dashboard LAN-only, no public Cloudflare record - observability data (IPs, container names, port mappings) stays fully off the public internet.
Grafana LAN-only, no public Cloudflare record - same reasoning as above.
Loki 30-day retention - balances audit usefulness against disk usage and automatically purges logs containing IPs and ban events.
Grafana Prometheus Loki Grafana Alloy cAdvisor docker-api Fleet Dashboard Node Exporter SMTP Alerting

Three alert rules are active, all routed to email via the self-hosted Mailcow stack.

High RAM Usage
Fires when node memory utilisation exceeds 85% for more than 5 minutes. Evaluated across all four nodes via the nodename label.
Source: Prometheus · Pending: 5m · Keep firing: 5m
Container Down
Fires when any named container disappears from recent Prometheus scrapes - indicating an unexpected stop or crash.
Source: Prometheus · Pending: 2m · No-data: OK
Fail2ban Ban Detected
Fires when a ban event appears in the Loki log stream. Useful for monitoring active brute-force attempts across all servers.
Source: Loki · Pending: none · Keep firing: 5m
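The High RAM rule combines a utilisation formula with a pending period. The utilisation is commonly derived from node_exporter as 1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes. The Python sketch below models that formula and a simplified pending window: the alert fires only after the condition has held for 5 minutes without a healthy sample in between. Grafana's evaluator works on its own evaluation interval, so treat this as an illustration of the semantics, not a reimplementation.

```python
def memory_utilisation(mem_total_bytes: float, mem_available_bytes: float) -> float:
    # Same idea as 1 - MemAvailable / MemTotal from node_exporter
    return 1.0 - mem_available_bytes / mem_total_bytes


def is_firing(samples, threshold=0.85, pending_s=300):
    """samples: (unix_ts, utilisation) pairs in time order, one per evaluation.

    Returns True once the condition has held continuously for at least
    `pending_s` seconds, i.e. the alert has left the Pending state.
    """
    breach_started = None
    for ts, value in samples:
        if value > threshold:
            if breach_started is None:
                breach_started = ts
        else:
            breach_started = None  # any healthy sample resets the pending timer
    if breach_started is None:
        return False
    return samples[-1][0] - breach_started >= pending_s
```

The pending period is what keeps short spikes, such as a container rebuild, from generating email while still catching sustained memory pressure.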
Data Source host_1_home host_2_cloud host_3_mail host_4_backup host_5_power
Node metrics (CPU/RAM/disk)
Docker container metrics (cAdvisor) -
Fail2ban logs -
UFW / firewall logs -
Syslog -
Pi-hole metrics ---
Mail logs (Mailcow) ----
NPM proxy logs ----
01
Incremental rsync Mirror to a Dedicated Node
host_4_backup · rsync over SSH · WD Red 2 TB NAS SSD

host_4_backup is a dedicated ThinkCentre M910q running as the sole backup destination for the fleet. A 1 TB ext4 partition on a WD Red NAS SSD is mounted at a fixed path and serves as the single landing zone for all inbound backup jobs. A further ~800 GB remains unallocated on the drive, reserved for future growth without repartitioning.

All backups use rsync over SSH in a push model - each source server initiates its own transfer on schedule, writing to a dedicated subdirectory on host_4_backup. This keeps the backup node passive: it never reaches out to source servers and requires no knowledge of their internal layout. Delta-only transfers keep transfer windows short even for large data volumes.
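In a push model each source server builds and runs its own rsync command toward host_4_backup. This Python sketch assembles such a command as an argument list. The user name, SSH port and paths are hypothetical. `--delete` (exact mirror) and `--numeric-ids` (keep raw UIDs/GIDs, relevant to the mail stack's container-owned files) are assumptions that fit the mirror design, not flags confirmed by the journal.

```python
import shlex


def rsync_push_cmd(sources, dest_host, dest_dir, user="backup", ssh_port=22, excludes=()):
    """Build an rsync-over-SSH push command as an argument list (no shell needed).

    -a            archive mode (-rlptgoD)
    --delete      remove files on the destination that vanished at the source
    --numeric-ids transfer raw UIDs/GIDs instead of mapping by name
    """
    cmd = ["rsync", "-a", "--delete", "--numeric-ids", "-e", f"ssh -p {ssh_port}"]
    cmd += [f"--exclude={pattern}" for pattern in excludes]
    cmd += list(sources)
    cmd.append(f"{user}@{dest_host}:{dest_dir.rstrip('/')}/")  # per-source landing directory
    return cmd


def as_log_line(cmd) -> str:
    """Shell-quoted rendering, useful for the job log."""
    return shlex.join(cmd)
```

Only the source side holds the SSH key and knows what to send, which matches the journal's point that host_4_backup stays passive and has no credentials to the source systems.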

All jobs run hot against host filesystem paths - no containers are stopped or paused during backup. For the mail stack specifically, rsync reads directly from Docker volume paths on the host filesystem, bypassing the mail daemon entirely and avoiding the UID remapping issues that container-level stops can trigger.

Key Decisions
rsync mirror over snapshots - single live copy with delta transfers; no versioning overhead for a homelab context.
Push model - source servers own their backup schedule; host_4_backup remains a passive receiver with no credentials to source systems.
Hot backup against host filesystem paths - avoids container restarts, eliminates UID remapping risk on the mail stack.
Dedicated node over co-hosting - isolates backup I/O from production workloads and keeps failure domains separate.
1 TB active partition with ~800 GB unallocated - expansion without repartitioning or service interruption.
rsync SSH ext4 LVM Cron Bash

Four independent backup jobs, each scoped to one source server and running on its own cron schedule. All land on host_4_backup.

Source · What · Cadence
host_3_mail · Mail stack - vmail, database, SOGo data, config · Every 4 hours
host_3_mail · Document management - media, data, database, inbox · Daily 01:45
host_2_cloud · Docker volumes + service configs · Daily 03:00
host_2_cloud · Personal cloud storage - selected folders only · Daily 03:00
host_1_home · Service configs + Monitoring stack volume · Daily 02:00

Several data categories are deliberately excluded from backup. All are either fully rebuildable or carry insufficient value to justify the transfer and storage cost.

Data · Source · Reason
AI model weights (LLM) · host_2_cloud · Fully rebuildable via ollama pull - 4.4 GB, not worth the transfer cost
AI photo model cache · host_2_cloud · Auto-regenerated on service start - ~800 MB
Screenshots · host_2_cloud · Transient / low value - excluded by design
WhatsApp media exports · host_2_cloud · Large volume, already retained on device - excluded to save space
Mail spam/AV databases · host_3_mail · Auto-regenerated by Rspamd and ClamAV on startup
Mail cache (Redis) · host_3_mail · In-memory cache - rebuildable, no persistent value

One dedicated backup script per source server, running as root via cron. All scripts share a common pattern: Berlin timezone logging, clean trap handling on interruption and email notification on success or failure via the self-hosted mail stack.

Script · Runs on · Covers
mailcow-backup · host_3_mail · Mail stack volumes + config
paperless-backup · host_3_mail · Document management volumes
cloud-backup · host_2_cloud · Docker volumes, service configs, cloud storage folders
home-backup · host_1_home · Service configs, observability stack volume
Common Script Pattern
Timezone-aware logging - all timestamps use Europe/Berlin, generated fresh per log entry.
Trap on INT/TERM - logs the interruption cleanly and exits without leaving partial transfers.
Runs as root via cron - no sudo calls inside scripts; avoids credential prompts in unattended context.
Email notification on success and failure - routed through the self-hosted mail stack via msmtp.
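The backup scripts themselves are Bash. As an illustration of the shared pattern, here is a Python sketch of the same ideas: Europe/Berlin timestamps generated per entry, a handler for SIGINT/SIGTERM that logs and exits, and a success-or-failure email piped to `msmtp -t`, which takes recipients from the `To:` header. Function names, the subject format and the message body are hypothetical.

```python
import signal
import subprocess
import sys
from datetime import datetime
from zoneinfo import ZoneInfo


def log(message: str) -> None:
    # Fresh timestamp per entry, always rendered in Europe/Berlin.
    stamp = datetime.now(ZoneInfo("Europe/Berlin")).strftime("%Y-%m-%d %H:%M:%S %Z")
    print(f"[{stamp}] {message}", flush=True)


def _on_signal(signum, _frame):
    log(f"Interrupted by {signal.Signals(signum).name}, exiting")
    # SystemExit propagates through subprocess.run(), which kills the running child.
    sys.exit(128 + signum)


def status_subject(job: str, returncode: int) -> str:
    status = "OK" if returncode == 0 else f"FAILED (exit {returncode})"
    return f"[{job}] backup {status}"


def build_mail(recipient: str, subject: str, body: str) -> str:
    # msmtp -t reads the recipient list from the To: header.
    return f"To: {recipient}\nSubject: {subject}\n\n{body}\n"


def run_job(job: str, cmd: list, recipient: str) -> int:
    signal.signal(signal.SIGINT, _on_signal)
    signal.signal(signal.SIGTERM, _on_signal)
    log(f"{job}: starting")
    returncode = subprocess.run(cmd).returncode
    subject = status_subject(job, returncode)
    log(subject)
    mail = build_mail(recipient, subject, f"Command finished with exit code {returncode}.")
    subprocess.run(["msmtp", "-t"], input=mail, text=True, check=True)
    return returncode
```

Sending mail on both outcomes means silence itself becomes a signal: if a scheduled report never arrives, the job did not run at all.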