Server
The server is hosted in the Lys data center at EPFL. Due to legal limitations on EPFL's side, it has a direct connection to the ISP and is therefore not on the school's network. Access to the server room can be requested through the AGEP's IT manager, who is responsible for the management of the rack.
The server can be accessed over SSH as the user clic-admin; a corresponding .ssh/config entry is shown below. Key management is done through the init.yaml playbook in the infra repo.
Host clic
    HostName clic.epfl.ch
    User clic-admin
Connectivity
The connection is provided by Saitis, along with an IPv4 address (62.220.153.185) and an IPv6 subnet (2001:788:f185::/48) exposed onto 2001:788:153:a:185::2/80 (important: the IPv6 subnet is not yet set up on the server). It uses port 35 of the Saitis switch. The current netplan config, /etc/netplan/static.yaml, is shown below and can be deployed using netplan generate && netplan apply.
network:
  version: 2
  renderer: networkd
  ethernets:
    enp7s4:
      addresses:
        - 62.220.153.185/24
      nameservers:
        addresses:
          - 8.8.8.8
      routes:
        - to: default
          via: 62.220.153.1
          metric: 200
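As a hedged sketch only: the IPv6 address mentioned above (2001:788:153:a:185::2/80) could eventually be added to the same ethernets entry roughly as follows. The IPv6 gateway/route is not documented on this page, so confirm it (e.g. with Saitis) before applying anything.

```yaml
# Hypothetical sketch -- adds the documented IPv6 address only.
# The IPv6 route/gateway is not documented here and must be confirmed first.
network:
  version: 2
  renderer: networkd
  ethernets:
    enp7s4:
      addresses:
        - 62.220.153.185/24
        - 2001:788:153:a:185::2/80   # documented above, not yet applied on the server
```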
Physical access
The server is stored in the Lys data center, rack 19, slots 9-11. It can be accessed at any time, although the security staff sometimes locks the room after 9 PM. Obviously, make sure that everything is in place when leaving the room: both the front and back panels of the rack are closed, the screens are put back on the right of the entrance (there are marks on the ground), etc.
The data center provides screens and keyboards, so there is no need to bring yours. Note that the server's USB ports are quite capricious; use the ones directly below the Ethernet port to avoid most issues. The server's power buttons (red: start, black: reset) are inside the case and can be reached from the back using a pen.
Infrastructure
Infrastructure configuration lives in the Clic Infra repository. Each service is hosted as a Docker stack. This makes it easy to add new services without worrying about dependencies, architecture, etc. Updating the server when the configuration has been modified is done using the [Ansible](https://www.ansible.com/) tool.
Important note: the reverse-proxy (Caddy) is not in a Docker container, but directly installed on the machine. See the repo's README.
Add/Remove/Modify a service
Modifications to the infrastructure must first be made in the infra repository, then applied to the server using Ansible (see below).
Each service is configured via a docker-compose.yaml file (doc), in a role unique to the service. Services generally do not communicate with each other, except via their exposed APIs (e.g. the website accesses Directus via its public URL, not via a dedicated virtual network). If a service requires a database, it is instantiated in the same stack as the service and is exclusively dedicated to it.
Warning: stacks must be modifiable with a simple docker stack deploy; don't use Docker configs, as these are immutable. The stack would have to be removed and then redeployed, which is sub-optimal. Instead, configure services via their environment variables.
If the service needs to be accessible via the Internet, it must expose one or more ports. As of now, all ports of the machine are accessible, but this should be changed for security reasons.
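As a minimal sketch of what such a stack could look like (the service name, image, port and variable names below are hypothetical; the real stacks live in the clic-infra roles):

```yaml
# Hypothetical docker-compose.yaml for a service stack with a dedicated database
version: "3.8"

services:
  app:
    image: ghcr.io/clicepfl/example-app:latest   # hypothetical image
    ports:
      - "8080:8080"                              # exposed port, reachable from the Internet
    environment:
      # configuration goes through environment variables, not Docker configs (immutable)
      DATABASE_URL: postgres://example:${EXAMPLE_DB_PASSWORD}@db:5432/example

  db:
    image: postgres:16
    environment:
      POSTGRES_USER: example
      POSTGRES_DB: example
      POSTGRES_PASSWORD: ${EXAMPLE_DB_PASSWORD}  # injected at deploy time (see Secret management)
    volumes:
      - db-data:/var/lib/postgresql/data

volumes:
  db-data:
```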
Once everything is set up and pushed to the clic-infra repository, use ansible-pull -K -U https://github.com/clicepfl/clic-infra.git -e @/var/secrets.yaml deploy.yaml -e SERVICES=<new-service>,webhook -e WH_BUILD=false. The webhook needs to be reconfigured in order to allow redeployment requests for that service.
Secret management
Services may require private (or server-specific) information, such as an admin password or a mailbox password. These values are stored in a secrets.yaml file (located at /var/secrets.yaml) and loaded by Ansible during deployment. They must be referenced in the service role's argument list (roles/<service>/meta/argument_specs.yaml) and passed as environment variables to the stack deployment task (roles/<service>/tasks/main.yaml). They can then be retrieved from each stack's configuration using the syntax ${MY_SECRET}.
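As a hedged illustration of that chain (all names here are hypothetical; the only documented part of the layout is the services.* structure, e.g. services.webhook.github.secret further below), a secret could flow roughly like this:

```yaml
# /var/secrets.yaml -- hypothetical excerpt
services:
  example:
    db_password: "change-me"
```

```yaml
# roles/example/tasks/main.yaml -- hypothetical sketch; the real task layout may differ
- name: Deploy the example stack
  ansible.builtin.command:
    cmd: docker stack deploy -c docker-compose.yaml example
    chdir: "{{ role_path }}/files"   # illustrative location of the stack file
  environment:
    EXAMPLE_DB_PASSWORD: "{{ services.example.db_password }}"
```

The stack's docker-compose.yaml can then reference the value as ${EXAMPLE_DB_PASSWORD}, as described above.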
Deployment
Deployment is carried out using the deploy.yaml playbook, which deploys the stacks referenced in the SERVICES variable (a comma-separated list), or all of them if the variable is empty/undefined. The WH_BUILD variable controls whether the webhook needs to be rebuilt; if set to false, the hook will only be reconfigured.
Example:
# Deploy nextcloud
ansible-pull -K -U https://github.com/clicepfl/clic-infra.git -e @/var/secrets.yaml deploy.yaml -e SERVICES=nextcloud
# Deploy the website and directus
ansible-pull -K -U https://github.com/clicepfl/clic-infra.git -e @/var/secrets.yaml deploy.yaml -e SERVICES=website,directus
# Build and configure the webhook
ansible-pull -K -U https://github.com/clicepfl/clic-infra.git -e @/var/secrets.yaml deploy.yaml -e SERVICES=webhook
# Configure the webhook without rebuilding it
ansible-pull -K -U https://github.com/clicepfl/clic-infra.git -e @/var/secrets.yaml deploy.yaml -e SERVICES=webhook -e WH_BUILD=false
The -K argument forces the command to ask for the BECOME password, i.e. the password for sudo. This is required as some services, like Caddy or the webhook, require root access to be redeployed.
Backups
Backups are managed by BorgBackup, enabling encrypted, incremental backups. The Borgmatic tool, configured by the deploy.yaml playbook, periodically generates and uploads backups. Its configuration (also in the playbook) must be updated whenever a service is added/removed/modified.
To manually create a backup, you can run sudo borgmatic (ideally in a detached screen session, screen sudo borgmatic, since it takes several hours to complete). You can inspect the backup repository using the borg CLI tool.
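For orientation only, a Borgmatic configuration has roughly the following shape (recent Borgmatic versions use this flat layout; the paths, repository and retention values below are placeholders, the actual configuration is generated by the deploy.yaml playbook):

```yaml
# Hypothetical borgmatic config excerpt -- placeholders only, not the server's real config
source_directories:
  - /var/lib/docker/volumes
  - /var/secrets.yaml
repositories:
  - path: ssh://backup-user@backup-host/./clic   # placeholder repository
    label: offsite
keep_daily: 7
keep_weekly: 4
keep_monthly: 6
```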
Installation
In the event of a complete server reinstallation, here are the steps to follow:
- Use the init playbook: create an inventory on your computer (see the docs, and the sketch after this list), then run ansible-playbook -i <inventory> <path/to/init.yaml>.
- SSH into the server.
- Set up the /var/secrets.yaml file (see the example in the infra repo). Move the database dumps to the directories set in this file. If you do not have access to the Directus tokens, use a stub.
- Generate an SSH key for the default user and one for root, and save both in the deployment keys of the infra repository.
- Launch the stacks with the deploy.yaml playbook: ansible-pull -U https://github.com/clicepfl/clic-infra.git -e @/var/secrets.yaml deploy.yaml.
- Check the databases' content, using docker exec -it <container-name> bash then either psql -U <user> <database> or mysql --user <user> -p. The dump is present in /docker-entrypoint-initdb.d/<smth>.sql and should already be applied; if it is not, it can be piped into the database connection command shown previously: psql -U <user> <database> < /docker-entrypoint-initdb.d/init.sql.
- Stop all the services (docker stack rm <stack list>, using docker stack ls to list the stacks).
- Copy/move the file content into the volumes (stored under /var/lib/docker/volumes/<volume-name>/_data), after having removed the content placed by the first run of each service. Skip database volumes (already initialized).
- If needed, regenerate tokens for Directus and set them in the secrets file.
- Restart all the services using the deploy.yaml playbook.
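For the first step, a minimal inventory could look like the following (the hostname and user are taken from the SSH section at the top of this page; the init playbook may expect additional variables):

```yaml
# inventory.yaml -- hypothetical minimal inventory for the init playbook
all:
  hosts:
    clic:
      ansible_host: clic.epfl.ch
      ansible_user: clic-admin
```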
Everything should be set now!
Webhook
The webhook is a systemd service running directly on the server (i.e. not in a container). It allows GitHub to trigger the redeployment of some services, for example when a new version is available. It is exposed on port 4000.
For more detail, see the dedicated README in the infra repository.
Configuring the webhook for a GitHub repository
- On GitHub, open the settings of the repository, then go to "Webhooks" and "Add webhook".
- Set the following values:
  - URL: https://clic.epfl.ch:4000/website
  - Content type: application/json
  - Secret: found in /var/secrets.yaml, under services.webhook.github.secret
  - SSL verification: Enabled
  - Events: depends on when you want the redeployment to be done, but usually "Packages" if there is a Docker build in the CI (in that case, don't forget to uncheck "Pushes").
- Save the webhook.
From now on, every time a new version is available, the server will automatically redeploy the service.
Help I lost access to the server :'(
In the unfortunate case where the server is unreachable, it can be physically accessed at the Lys data center. See the Physical access section above.