# Backup Service

This repository contains a simple backup system composed of a server and a
client.  The service was designed to meet the requirements outlined in the
problem statement: it provides a REST API for storing and retrieving
deduplicated backups for multiple users, a web interface for administrators to
monitor clients, configurable retention policies (by age or version count),
support for both local filesystem storage and Amazon S3, and a Dockerised
client that periodically uploads files and runs pre‑backup commands.

## Architecture Overview

The system is split into two components:

1. **Server** (`./server`)
   * Built with [FastAPI](https://fastapi.tiangolo.com/) and SQLAlchemy.
   * Stores user accounts, clients, deduplicated file hashes, backup records
     and client logs in a relational database.
   * Uses SHA‑256 to detect duplicate uploads and stores each unique file only
     once.  A key–value style table (`file_hashes`) maps the content hash to
     the storage path.
   * Supports local filesystem storage or S3.  When the `S3_BUCKET`
     environment variable is set, files are uploaded to S3; otherwise they are
     saved under `./data`.  S3 lifecycle rules can be used to automatically
     expire old versions of objects.
   * Implements retention policies on a per‑user basis.  Administrators can
     specify either a maximum number of versions or a maximum age (in days);
     when a new backup is uploaded, older versions outside the policy are
     pruned, preserving only the latest copy.
   * Provides a minimal HTML dashboard (`/`) displaying clients, their tokens,
     and last ping/backup times.  Forms for creating users and clients are
     included as a starting point.

2. **Client** (`./client`)
   * Written in Python and runs continuously inside a container.
   * Authenticates to the server using credentials provided via environment
     variables, registers itself to obtain a unique client token, then
     periodically sends pings and backups.
   * Recursively scans directories listed in `MONITORED_PATHS`, computes
     SHA‑256 hashes of each file and uploads only those that have changed
     since the previous run.  This reduces network and storage overhead while
     still allowing the server to deduplicate identical content.
   * Supports optional `PRE_COMMANDS` that run before each backup cycle.  This
     feature can be used to generate database dumps (e.g. running
     `pg_dump`) or any other preparatory work.
   * Sends log messages to the server when errors occur to aid debugging.
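
The hashing steps described above can be sketched roughly as follows. This is
a minimal illustration, not the repository's actual code: `hash_file`,
`scan_changed` and `DedupStore` are hypothetical names, the in-memory
dictionary stands in for the `file_hashes` table, and the on-disk layout
(sharding by the first two hex characters) is an assumption.

```python
import hashlib
from pathlib import Path


def hash_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_changed(roots: list, previous: dict) -> tuple:
    """Client side: hash every file under `roots` and report what changed.

    `previous` maps file path -> hash from the last run. Returns the new
    mapping and the list of paths whose hash is new or different.
    """
    current, changed = {}, []
    for root in roots:
        for path in sorted(Path(root).rglob("*")):
            if not path.is_file():
                continue
            digest = hash_file(path)
            current[str(path)] = digest
            if previous.get(str(path)) != digest:
                changed.append(path)
    return current, changed


class DedupStore:
    """Server side: keep each unique blob once, keyed by content hash."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        # Hypothetical stand-in for the `file_hashes` table (hash -> path).
        self.file_hashes = {}

    def put(self, data: bytes) -> tuple:
        """Store `data`; return (hash, True) if new, (hash, False) if duplicate."""
        digest = hashlib.sha256(data).hexdigest()
        if digest in self.file_hashes:
            return digest, False  # identical content already stored
        path = self.root / digest[:2] / digest  # assumed layout
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
        self.file_hashes[digest] = path
        return digest, True
```

Running `scan_changed` twice without modifying any file yields an empty
change list on the second pass, and uploading the same bytes twice to
`DedupStore.put` writes only one file.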

### Web Interface and Configuration

The server includes a simple web interface built with Jinja2 templates:

* `/clients` – lists all registered clients with last ping/backup times and
  displays the pre‑backup commands configured for each client.  It includes
  forms to create new users and new clients.
* `/clients/{id}` – shows details for a specific client.  Administrators can
  edit the **pre‑backup commands** for that client using a multiline text
  area.  The page also lists recent backups (with download links) and the
  last 50 log messages.

Behind the scenes, pre‑backup commands are stored in the client record in
the database.  Clients call `/api/clients/{token}/config` to retrieve their
commands before each backup cycle.  This allows administrators to update
backup behaviour centrally without redeploying clients.
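
As a rough sketch of the client side of this exchange, the helpers below build
the config URL, extract the commands from a response body, and run them.
The response format is an assumption: this README does not specify it, so the
example assumes a JSON object with a hypothetical `pre_commands` field holding
the multiline text. The function names are illustrative only.

```python
import json
import subprocess
from urllib.parse import quote


def config_url(server_url: str, token: str) -> str:
    """Build the URL of the per-client config endpoint."""
    return f"{server_url.rstrip('/')}/api/clients/{quote(token, safe='')}/config"


def parse_pre_commands(body: str) -> list:
    """Split the (assumed) `pre_commands` text field into one command per line."""
    payload = json.loads(body)
    text = payload.get("pre_commands") or ""
    return [line.strip() for line in text.splitlines() if line.strip()]


def run_pre_commands(commands: list) -> list:
    """Run each command through the shell and return the exit codes in order."""
    return [
        subprocess.run(cmd, shell=True, check=False).returncode
        for cmd in commands
    ]
```

Because the commands are executed through a shell, a design like this relies
on the server being trusted and on access to the client detail page being
restricted to administrators.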

### Client Web Interface

In addition to the server dashboard, the backup **client** offers its own
minimal web interface.  When the client container starts it opens a small
FastAPI application on port **8080** that presents a form where you can
enter the server URL, username and password for registration, an optional
client name, and the directories to monitor for backups.  Once you submit
the form the client stores the configuration, registers itself with the
server, and begins running backup cycles automatically.  A confirmation
page provides a direct link to the new client's page on the server.

This interface is enabled by default to make configuration easy.  You can
disable it by setting the environment variable `CLIENT_UI_ENABLED=false`.
When the UI is disabled the client does not launch the HTTP server and will
exit if mandatory environment variables (`SERVER_URL`, `USERNAME`,
`PASSWORD` and `MONITORED_PATHS`) are missing.  Use a `.env` file (see
`.env.example`) or set environment variables in your compose file to
configure the client non‑interactively.
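
The startup behaviour described above might look something like the sketch
below. The variable names come from this README; the function names are
hypothetical, and treating any value other than `false` as "enabled" is an
assumption.

```python
from typing import Mapping

REQUIRED_VARS = ("SERVER_URL", "USERNAME", "PASSWORD", "MONITORED_PATHS")


def ui_enabled(env: Mapping[str, str]) -> bool:
    """The UI is on by default; only an explicit 'false' turns it off (assumed)."""
    return env.get("CLIENT_UI_ENABLED", "true").strip().lower() != "false"


def missing_vars(env: Mapping[str, str]) -> list:
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def check_startup(env: Mapping[str, str]) -> str:
    """Return 'ui' or 'headless'; exit if headless mode lacks required settings."""
    if ui_enabled(env):
        return "ui"
    missing = missing_vars(env)
    if missing:
        raise SystemExit(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return "headless"
```

A real entry point would pass `os.environ` to `check_startup` and start either
the configuration UI or the backup loop depending on the result.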

## Deployment with Docker Compose

The repository provides three Compose configurations to support different
deployment scenarios:

| Compose file | Description |
|--------------|-------------|
| **`docker-compose.yml`** | Launches the server, PostgreSQL, MinIO and a client in one stack.  Useful for local testing or demonstration where all components run on the same host. |
| **`docker-compose.server.yml`** | Starts only the server stack (FastAPI app, database and MinIO).  Use this when deploying the server to a dedicated host or cloud. |
| **`docker-compose.client.yml`** | Runs just the client container.  Use this to deploy the backup agent on a separate machine and point it at your existing server.  The client exposes port 8080 for its configuration UI. |

To run the all‑in‑one configuration:

```bash
cd backup_service
docker compose up --build
```

To run just the server or just the client, specify the appropriate compose file:

```bash
docker compose -f docker-compose.server.yml up --build
```

and, on a different host or in a separate terminal:

```bash
docker compose -f docker-compose.client.yml up --build
```

By default the server uses SQLite for simplicity and stores uploaded files in
a volume mounted at `/app/data`.  The provided compose files demonstrate how
to switch to PostgreSQL and MinIO by setting `DATABASE_URL`, `S3_BUCKET`,
`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION` and `S3_ENDPOINT`.
Consult the comments in the compose files and the included `.env.example`
for guidance.  The client container mounts a volume called `client_data` at
`/data`; any files placed in this directory will be backed up.  You can
configure the client by editing environment variables in the compose file,
by supplying a `.env` file, or via the built‑in web interface on port 8080.
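
To make the storage switch concrete, here is a small sketch of how a server
could choose its backend from the environment variables listed above. It only
resolves configuration and does not talk to S3; `StorageConfig` and
`storage_from_env` are hypothetical names, and the real server may structure
this differently.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class StorageConfig:
    backend: str                    # "s3" or "local"
    bucket: Optional[str] = None    # from S3_BUCKET
    endpoint: Optional[str] = None  # from S3_ENDPOINT (e.g. a MinIO URL)
    region: Optional[str] = None    # from AWS_REGION
    local_root: str = "./data"      # used when no bucket is configured


def storage_from_env(env: Mapping[str, str]) -> StorageConfig:
    """Pick S3 when S3_BUCKET is set and non-empty, else local storage."""
    bucket = env.get("S3_BUCKET", "").strip()
    if bucket:
        return StorageConfig(
            backend="s3",
            bucket=bucket,
            endpoint=env.get("S3_ENDPOINT") or None,
            region=env.get("AWS_REGION") or None,
        )
    return StorageConfig(backend="local")
```

Keeping this decision in one function means the rest of the code can depend on
a single configuration object rather than reading environment variables in
several places.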

## Usage Notes

* Before starting the client you must create a user on the server.  One way
  to do this is to run the server, visit `http://localhost:8000` in a
  browser, authenticate using a token from `/api/login`, and use the “Create
  User” form.  Alternatively, you can call the `/api/register_user` endpoint
  directly using a bearer token from an existing admin.
* On first startup, the server will create an initial admin user if the
  database is empty.  Configure `ADMIN_USERNAME` and `ADMIN_PASSWORD` (or
  `USERNAME`/`PASSWORD` for backward compatibility) in your environment or
  `.env` file to control these credentials.
* Ensure that the retention policies set on each user reflect your backup
  strategy.  For example, specifying `retention_versions=5` keeps the five
  most recent versions of each file; specifying `retention_days=30` retains
  versions from the last 30 days.
* When using S3, consider configuring lifecycle rules to automatically expire
  old objects or transition them to cheaper storage classes.  S3 lifecycle
  rules can automate the deletion of objects after a specified period to meet
  data retention requirements.
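
The two retention settings mentioned above can be illustrated with a short
sketch that decides which versions of a single file to prune. The `Version`
record and `versions_to_prune` function are hypothetical, and always keeping
the newest version regardless of its age is an assumption about the intended
behaviour.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Version:
    version_id: int
    created_at: datetime


def versions_to_prune(
    versions: list,
    *,
    retention_versions: Optional[int] = None,
    retention_days: Optional[int] = None,
    now: Optional[datetime] = None,
) -> list:
    """Return the versions of one file that fall outside the policy."""
    now = now or datetime.now(timezone.utc)
    ordered = sorted(versions, key=lambda v: v.created_at, reverse=True)
    if not ordered:
        return []
    keep = {ordered[0].version_id}  # assumption: never prune the latest copy
    if retention_versions is not None:
        # Keep the N most recent versions.
        keep.update(v.version_id for v in ordered[:retention_versions])
    elif retention_days is not None:
        # Keep everything created within the last N days.
        cutoff = now - timedelta(days=retention_days)
        keep.update(v.version_id for v in ordered if v.created_at >= cutoff)
    else:
        return []  # no policy configured: keep everything
    return [v for v in ordered if v.version_id not in keep]
```

With six versions created 0, 10, 20, 30, 40 and 50 days ago,
`retention_versions=2` marks the four oldest for pruning, while
`retention_days=30` marks only the two versions older than 30 days.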

## Limitations and Future Work

This sample implementation is intended as a starting point.  Some features
that could be improved include:

* A richer web interface for managing users, clients and retention policies.
* More granular client configuration (e.g. inclusion/exclusion patterns,
  incremental or differential backups) and scheduling via cron.
* Support for compressing and encrypting data before upload.
* Streaming large files to the server to avoid loading them entirely into
  memory.
* Integration tests and better error handling.

Despite these limitations, the provided code demonstrates the core
functionality required for a secure, deduplicated backup service and
provides a foundation for further development.