# Installing Prometheus + Grafana with Xen-Exporter for XCP-NG monitoring (+ extras)

## Hardware requirements:

- Processor: 1 core (single-core processes)
- RAM: ~512 MB (1 GB recommended for the first install)
- Disk: 10 GB (system + ~5 GB is enough for ~30 days of data at a 60-second scrape interval)

## Software used:

- OS: Ubuntu Server Minimal
- Prometheus -> scrapes VM metrics
- Grafana -> displays Prometheus metrics
- Xen-exporter -> gathers metrics from XCP-NG
- Node-exporter (optional) -> gathers metrics from any PC
- Alertmanager (optional) -> handles alerts triggered by Prometheus rules

## First step:

- Boot your VM with Ubuntu Server Minimal, install the system and reboot.

## Installation:

1. Update the system and install dependencies

`sudo apt update && sudo apt upgrade -y`

`sudo apt install -y wget curl tar git software-properties-common`

2. Install Prometheus

`cd /tmp`

`wget https://github.com/prometheus/prometheus/releases/download/v3.6.0/prometheus-3.6.0.linux-amd64.tar.gz`

`tar xvf prometheus-3.6.0.linux-amd64.tar.gz`

`sudo mv prometheus-3.6.0.linux-amd64 /usr/local/prometheus`

3. Create a Prometheus user

`sudo useradd -rs /bin/false prometheus`

`sudo mkdir /etc/prometheus`

`sudo mkdir /var/lib/prometheus`

`sudo chown prometheus:prometheus /usr/local/prometheus /etc/prometheus /var/lib/prometheus`

4. Create a basic Prometheus config

`sudo nano /etc/prometheus/prometheus.yml`

```
global:
  scrape_interval: 60s
  scrape_timeout: 10s
  evaluation_interval: 60s

# ----- This is the WebUI:
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
```

5. Set proper ownership

`sudo chown prometheus:prometheus /etc/prometheus/prometheus.yml`
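Before moving on, the "~5 GB for ~30 days" sizing from the hardware requirements can be sanity-checked with the usual rule of thumb: needed disk ≈ retention time × ingested samples per second × bytes per sample, with roughly 1–2 bytes per compressed sample. A minimal sketch; the target and series counts are made-up example numbers, so plug in your own:

```python
# Rough Prometheus TSDB sizing: retention_seconds * samples_per_second * bytes_per_sample.
# ~1-2 bytes/sample after compression is the usual ballpark; 2 is a safe upper bound.
def estimate_tsdb_bytes(targets, series_per_target, scrape_interval_s,
                        retention_days, bytes_per_sample=2.0):
    samples_per_second = targets * series_per_target / scrape_interval_s
    retention_seconds = retention_days * 24 * 3600
    return retention_seconds * samples_per_second * bytes_per_sample

# e.g. 8 exporters exposing ~1500 series each, scraped every 60s, kept for 30 days:
print(estimate_tsdb_bytes(8, 1500, 60, 30) / 2**30)  # ~0.97 GiB, well under the 5 GB cap
```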
6. Create a systemd service

`sudo nano /etc/systemd/system/prometheus.service`

```
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/prometheus/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus \
    --web.console.templates=/usr/local/prometheus/consoles \
    --web.console.libraries=/usr/local/prometheus/console_libraries \
    --storage.tsdb.retention.time=30d \
    --storage.tsdb.retention.size=5GB
Restart=always

[Install]
WantedBy=multi-user.target
```

7. Start Prometheus

`sudo systemctl daemon-reload`

`sudo systemctl enable --now prometheus`

`sudo systemctl status prometheus`

The Prometheus web interface will be at http://PROMETHEUS-VM-IP:9090

8. Install Grafana (add the signing key first, then the repository)

`wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -`

`sudo add-apt-repository "deb https://packages.grafana.com/oss/deb stable main"`

`sudo apt update && sudo apt install -y grafana`
9. Check the systemd service for Grafana (default from install)

`/etc/systemd/system/grafana-server.service`

```
[Unit]
Description=Grafana instance
Documentation=http://docs.grafana.org
Wants=network-online.target
After=network-online.target
After=postgresql.service mariadb.service mysql.service influxdb.service

[Service]
EnvironmentFile=/etc/default/grafana-server
User=grafana
Group=grafana
Type=simple
Restart=on-failure
WorkingDirectory=/usr/share/grafana
RuntimeDirectory=grafana
RuntimeDirectoryMode=0750
ExecStart=/usr/share/grafana/bin/grafana server \
    --config=${CONF_FILE} \
    --pidfile=${PID_FILE_DIR}/grafana-server.pid \
    --packaging=deb \
    cfg:default.paths.logs=${LOG_DIR} \
    cfg:default.paths.data=${DATA_DIR} \
    cfg:default.paths.plugins=${PLUGINS_DIR} \
    cfg:default.paths.provisioning=${PROVISIONING_CFG_DIR}
LimitNOFILE=10000
TimeoutStopSec=20
CapabilityBoundingSet=
DeviceAllow=
LockPersonality=true
MemoryDenyWriteExecute=false
NoNewPrivileges=true
PrivateDevices=true
PrivateTmp=true
ProtectClock=true
ProtectControlGroups=true
ProtectHome=true
ProtectHostname=true
ProtectKernelLogs=true
ProtectKernelModules=true
ProtectKernelTunables=true
ProtectProc=invisible
ProtectSystem=full
RemoveIPC=true
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
RestrictNamespaces=true
RestrictRealtime=true
RestrictSUIDSGID=true
SystemCallArchitectures=native
UMask=0027

[Install]
WantedBy=multi-user.target
```

10. Enable and start Grafana

`sudo systemctl daemon-reload`

`sudo systemctl enable --now grafana-server`

`sudo systemctl status grafana-server`

The Grafana web interface will be at http://GRAFANA-VM-IP:3000 (u: `admin` p: `admin`)

# Installing Xen-exporter without Docker

1. Clone the repo

`cd ~/`

`git clone https://github.com/MikeDombo/xen-exporter.git && cd xen-exporter`

2. Install the required Python packages

`sudo apt install python3-pip python3-venv -y`
3. Install the requirements in the virtual env

`python3 -m venv venv`

`source venv/bin/activate`

`pip install -r requirements.txt`

You can test it manually before wrapping it in a systemd service. Replace `YOUR-XCP-IP` and `YOUR-XCP-PASSWORD` with your values:

```
XEN_HOST="YOUR-XCP-IP" \
XEN_USER="root" \
XEN_PASSWORD="YOUR-XCP-PASSWORD" \
XEN_SSL_VERIFY="false" \
python3 xen-exporter.py
```

4. Create a systemd service

`sudo nano /etc/systemd/system/xen-exporter.service`

Replace `YOUR-USER`, `YOUR-GROUP` and the credentials with yours. (systemd does not allow comments at the end of a line, so they stay on their own lines.)

```
[Unit]
Description=Xen Exporter for Prometheus
After=network.target

[Service]
Type=simple
WorkingDirectory=/home/YOUR-USER/xen-exporter
ExecStart=/home/YOUR-USER/xen-exporter/venv/bin/python3 /home/YOUR-USER/xen-exporter/xen-exporter.py
# CHANGE XEN_HOST AND XEN_PASSWORD TO YOURS!
Environment="XEN_HOST=YOUR-XCP-IP"
Environment="XEN_USER=root"
Environment="XEN_PASSWORD=YOUR-XCP-PASSWORD"
Environment="XEN_SSL_VERIFY=false"
Restart=always
# CHANGE USER AND GROUP TO YOURS!
User=YOUR-USER
Group=YOUR-GROUP

[Install]
WantedBy=multi-user.target
```

5. Enable and start

`sudo systemctl daemon-reload`

`sudo systemctl enable --now xen-exporter`

6. Add it to Prometheus

`sudo nano /etc/prometheus/prometheus.yml`

Append this job under `scrape_configs:`:

```
  - job_name: 'xenserver'
    static_configs:
      - targets: ['dashboard-vm-ip:9100']
```

7. Restart Prometheus

`sudo systemctl restart prometheus`

## Configure Grafana to see Xen-exporter

1. Open Grafana: http://GRAFANA-VM-IP:3000
2. Login: admin/admin -> change password
3. Add data source:

```
Type: Prometheus
URL: http://PROMETHEUS_VM_IP:9090
```

4. Add the Dashboards

- Click `+` -> `Import`
- Enter dashboard **ID 16588** (Xen Prometheus)
- Select **Prometheus** data source -> `Import`

# Installing Node-exporter (optional)

### **IMPORTANT** Do these steps **on each VM** you want data exported from.

*Note: If you also want metrics from the VM running Prometheus itself, use a different port (e.g. 9111) when doing these steps on that specific VM so it doesn't clash with xen-exporter's port 9100. Ignore this note if you won't use xen-exporter at all.*
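A quick way to check for the port clash described in the note before picking an exporter port — a minimal Python sketch (the port numbers are just the examples from this guide):

```python
import socket

def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is currently listening on host:port (TCP)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) != 0

# e.g. on the Prometheus VM: 9100 may be taken by xen-exporter, so fall back to 9111
for candidate in (9100, 9111):
    if port_is_free(candidate):
        print(f"port {candidate} looks free")
        break
```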
1. Create a Node Exporter user on each VM

`sudo useradd -rs /bin/false node_exporter`

2. Download Node Exporter on each VM

`cd /tmp`

`wget https://github.com/prometheus/node_exporter/releases/download/v1.9.1/node_exporter-1.9.1.linux-amd64.tar.gz`

`tar xvf node_exporter-1.9.1.linux-amd64.tar.gz`

`sudo mv node_exporter-1.9.1.linux-amd64 /usr/local/node_exporter`

`sudo chown -R node_exporter:node_exporter /usr/local/node_exporter`

3. Create a systemd service for Node Exporter on each VM

`sudo nano /etc/systemd/system/node_exporter.service`

```
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/node_exporter/node_exporter
Restart=always

[Install]
WantedBy=multi-user.target
```

4. Enable and start the service on each VM

`sudo systemctl daemon-reload`

`sudo systemctl enable --now node_exporter`

`sudo systemctl status node_exporter`

Test with `curl http://localhost:9100/metrics` on each VM.

### *Back to the Prometheus VM!*

5. Configure Prometheus to scrape all VMs

`sudo nano /etc/prometheus/prometheus.yml`

Replace `VM_IP#` with each VM's IP or DDNS name:

```
global:
  scrape_interval: 60s
  scrape_timeout: 10s
  evaluation_interval: 60s

# ------- WebUI:
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

# ------- XCP-NG: (if using xen-exporter)
  - job_name: 'xenserver'
    static_configs:
      - targets: ['dashboard-vm-ip:9100']

# ------- VMS:
  - job_name: 'node_exporters'
    static_configs:
      - targets:
          - 'VM1_IP:9100'
          - 'VM2_IP:9100'
          - 'localhost:9111' # e.g. for node-exporter on the Prometheus VM
```

6. Reload Prometheus

`sudo systemctl restart prometheus`

You can now verify the targets at: http://PROMETHEUS-VM-IP:9090/targets

## Configure Grafana to see Node-exporter

1. Open Grafana: http://GRAFANA-VM-IP:3000
2. Login: admin/admin -> change password
3. Add data source:

```
Type: Prometheus
URL: http://PROMETHEUS_VM_IP:9090
```
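If Grafana shows no data after adding the data source, it helps to ask Prometheus directly which scrape targets are down, instead of eyeballing the /targets page. A hedged sketch using only the standard library — `PROMETHEUS-VM-IP` is a placeholder, and the parsing follows the documented `/api/v1/query` vector response format:

```python
import json
import urllib.request

def down_targets(api_response):
    """Extract instance labels from a decoded /api/v1/query response for `up == 0`."""
    return [r["metric"].get("instance", "?") for r in api_response["data"]["result"]]

def fetch_down_targets(base_url="http://PROMETHEUS-VM-IP:9090"):
    url = base_url + "/api/v1/query?query=up%3D%3D0"  # `up == 0`, URL-encoded
    with urllib.request.urlopen(url) as resp:
        return down_targets(json.load(resp))

# Trimmed example of the vector result format the API returns:
sample = {"status": "success",
          "data": {"resultType": "vector",
                   "result": [{"metric": {"__name__": "up",
                                          "instance": "VM2_IP:9100",
                                          "job": "node_exporters"},
                               "value": [1700000000, "0"]}]}}
print(down_targets(sample))  # -> ['VM2_IP:9100']
```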
4. Add the Dashboards

- Click `+` -> `Import`
- Enter dashboard **ID 1860** (Node Exporter Full)
- Select **Prometheus** data source -> `Import`

# Configuring Alerts with Alertmanager (WhatsApp Webhook)

1. Download and install Alertmanager

`cd /tmp`

`wget https://github.com/prometheus/alertmanager/releases/download/v0.28.1/alertmanager-0.28.1.linux-amd64.tar.gz`

`tar xvf alertmanager-0.28.1.linux-amd64.tar.gz`

`sudo mkdir -p /usr/local/alertmanager`

`sudo mv alertmanager-0.28.1.linux-amd64/{alertmanager,amtool} /usr/local/alertmanager/`

`sudo mkdir -p /etc/alertmanager`

`sudo cp alertmanager-0.28.1.linux-amd64/alertmanager.yml /etc/alertmanager/alertmanager.yml`

2. Configure alertmanager.yml

`sudo nano /etc/alertmanager/alertmanager.yml`

**This config is based on a webhook.** There are other ways Prometheus can alert (email, etc.) not covered here.

```
route:
  group_by: ['alertname']
  group_wait: 15s
  group_interval: 1m
  repeat_interval: 5m
  receiver: 'web.hook'

receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://127.0.0.1:9095/'

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
```

3. Create a systemd service for Alertmanager

`sudo nano /etc/systemd/system/alertmanager.service`

Replace `YOUR-USER` and `YOUR-GROUP`. (systemd does not allow comments at the end of a line, so they stay on their own lines.)

```
[Unit]
Description=Prometheus Alertmanager
Wants=network-online.target
After=network-online.target

[Service]
# CHANGE USER AND GROUP TO YOURS!
User=YOUR-USER
Group=YOUR-GROUP
Type=simple
ExecStart=/usr/local/alertmanager/alertmanager \
    --config.file=/etc/alertmanager/alertmanager.yml \
    --storage.path=/var/lib/alertmanager
Restart=always

[Install]
WantedBy=multi-user.target
```

4. Fix permissions (give ownership to the same user you set in the unit above, so Alertmanager can write its data)

`sudo mkdir -p /var/lib/alertmanager`

`sudo chown -R YOUR-USER:YOUR-GROUP /var/lib/alertmanager /etc/alertmanager`
5. Start the Alertmanager service

`sudo systemctl daemon-reload`

`sudo systemctl enable --now alertmanager`

`sudo systemctl status alertmanager`

Alertmanager should now be listening on port 9093. You can verify with: `ss -tlnp | grep 9093`

6. CallMeBot webhook bridge

This is a tiny Python HTTP server that listens on 127.0.0.1:9095 and invokes the `callmebot "message"` command for each alert received.

`sudo apt install python3-flask -y`

`sudo nano /usr/local/bin/whatsapp-webhook.py`

```
#!/usr/bin/env python3
from flask import Flask, request
import subprocess

app = Flask(__name__)

@app.route('/', methods=['POST'])
def webhook():
    data = request.json
    if not data:
        return "no data", 400
    alerts = data.get("alerts", [])
    for alert in alerts:
        summary = alert.get("annotations", {}).get("summary", "No summary")
        subprocess.Popen(["/usr/local/bin/callmebot", summary])
    return "ok", 200

if __name__ == '__main__':
    # Bind to localhost only: Alertmanager reaches it via 127.0.0.1
    app.run(host="127.0.0.1", port=9095)
```

Make it executable:

`sudo chmod +x /usr/local/bin/whatsapp-webhook.py`

7. Create a systemd service for the webhook

`sudo nano /etc/systemd/system/whatsapp-webhook.service`

Replace `YOUR-USER`:

```
[Unit]
Description=WhatsApp Webhook for Alertmanager
After=network.target

[Service]
# CHANGE USER TO YOURS!
User=YOUR-USER
ExecStart=/usr/bin/python3 /usr/local/bin/whatsapp-webhook.py
Restart=always

[Install]
WantedBy=multi-user.target
```

8. Enable and start the service

`sudo systemctl daemon-reload`

`sudo systemctl enable --now whatsapp-webhook`

`sudo systemctl status whatsapp-webhook`

Verify it listens only on localhost: `ss -tlnp | grep 9095` (should show 127.0.0.1:9095)

9. Wire Prometheus -> Alertmanager

`sudo nano /etc/prometheus/prometheus.yml`

```
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']
```
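The bridge above only reads the `summary` annotation from each alert. For reference, Alertmanager POSTs JSON in its version-4 webhook format; here is a minimal sketch of extracting the summaries the same way the Flask handler does (the alert contents are example data):

```python
def extract_summaries(payload):
    """Pull the summary annotation out of each alert, like the Flask bridge does."""
    return [a.get("annotations", {}).get("summary", "No summary")
            for a in payload.get("alerts", [])]

# Trimmed example of Alertmanager's v4 webhook payload:
payload = {
    "version": "4",
    "status": "firing",
    "receiver": "web.hook",
    "alerts": [
        {"status": "firing",
         "labels": {"alertname": "NodeDown", "severity": "critical"},
         "annotations": {"summary": "Instance pihole down"}},
    ],
}
print(extract_summaries(payload))  # -> ['Instance pihole down']
```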
10. Restart Prometheus

`sudo systemctl restart prometheus.service`

## Creating Alert rules

`sudo nano /etc/prometheus/alert.rules.yml`

```
groups:
  - name: basic-alerts
    rules:
      - alert: NodeDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} has been unreachable for 1 minute."
```

1. Tell Prometheus about the new rules

`sudo nano /etc/prometheus/prometheus.yml`

```
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

rule_files:
  - "/etc/prometheus/alert.rules.yml"
```

2. Restart Prometheus

`sudo systemctl restart prometheus.service`

# Final working config files from **my** setup

### These are the files on **my VM**, with the *user/group* `dash`

Use them for reference in case of errors, but remember to change users/groups and credentials.

- `/etc/prometheus/prometheus.yml`

```
global:
  scrape_interval: 60s
  scrape_timeout: 10s
  evaluation_interval: 60s

# ------ Alertmanager:
# Config file -> /etc/alertmanager/alertmanager.yml
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - "localhost:9093"

rule_files:
  - "/etc/prometheus/alert.rules.yml"

# ------- WebUI:
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

# ------- VMS:
  - job_name: 'node_exporters'
    static_configs:
      - targets: ['192.168.15.221:9100']
        labels:
          hostname: 'xo'
      - targets: ['192.168.15.222:9100']
        labels:
          hostname: 'pihole'
      - targets: ['192.168.15.223:9100']
        labels:
          hostname: 'copyparty'
      - targets: ['192.168.15.224:9100']
        labels:
          hostname: 'media'
      - targets: ['192.168.15.225:9100']
        labels:
          hostname: 'vaultwarden'
      - targets: ['192.168.15.226:9100']
        labels:
          hostname: 'minecraft'

# -------- XCP-NG Host: (xen-exporter on localhost)
  - job_name: 'xen_exporter'
    static_configs:
      - targets: ['192.168.15.227:9100']
        labels:
          hostname: 'XCP-NG'

# -------- Oracle VPS tunnel: (forward SSH tunnel on localhost)
  - job_name: 'vps_node_exporter'
    static_configs:
      - targets:
          - 'localhost:9101'
        labels:
          hostname: 'Oracle VPS'

# -------- Prometheus VM itself:
  - job_name: 'prometheus_vm'
    static_configs:
      - targets: ['localhost:9102']
        labels:
          hostname: 'Dashboards'
```

- `/etc/alertmanager/alertmanager.yml`

```
route:
  group_by: ['alertname']
  group_wait: 15s
  group_interval: 1m
  repeat_interval: 5m
  receiver: 'web.hook'

receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://127.0.0.1:9095/'

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
```

- `/etc/prometheus/alert.rules.yml`

```
groups:
  - name: critical-alerts
    rules:
      # Node unreachable
      - alert: NodeDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.hostname }} down"
          description: "{{ $labels.hostname }} has been unreachable for 1 minute."

      # CPU over 90% for 10 minutes
      - alert: CPUPeak
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "CPU from instance {{ $labels.hostname }} peaking"
          description: "{{ $labels.hostname }} has been over 90% CPU for 10 minutes."

      # Disk low space (less than 10% free)
      - alert: DiskLow
        expr: (node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 < 10
        # for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Disk space low on {{ $labels.hostname }}"
          description: "Less than 10% disk space available on {{ $labels.hostname }}."

      # RAM usage over 95%
      - alert: RAMHigh
        expr: (1 - ((node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes)) * 100 > 95
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "RAM usage high on {{ $labels.hostname }}"
          description: "Memory usage is over 95% on {{ $labels.hostname }}."
  - name: warning-alerts
    rules:
      # CPU over 70% for 5 minutes
      - alert: CPUWarning
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 70
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "CPU usage high on {{ $labels.hostname }}"
          description: "CPU usage is above 70% on {{ $labels.hostname }}."

      # Disk less than 20% free
      - alert: DiskWarning
        expr: (node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 < 20
        # for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Disk usage warning on {{ $labels.hostname }}"
          description: "Disk usage is above 80% on {{ $labels.hostname }}."

      # RAM over 85%
      - alert: RAMWarning
        expr: (1 - ((node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes)) * 100 > 85
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "RAM usage warning on {{ $labels.hostname }}"
          description: "Memory usage is over 85% on {{ $labels.hostname }}."

      # Network: high incoming traffic (>80% of interface capacity)
      - alert: NetworkInHigh
        expr: rate(node_network_receive_bytes_total[5m]) * 8 > (0.8 * 1e9) # adjust: 1e9 = 1 Gbps link
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High network IN traffic on {{ $labels.hostname }}"
          description: "Incoming network traffic exceeded 80% of interface capacity."

      # Network: high outgoing traffic (>80% of interface capacity)
      - alert: NetworkOutHigh
        expr: rate(node_network_transmit_bytes_total[5m]) * 8 > (0.8 * 1e9) # adjust: 1e9 = 1 Gbps link
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High network OUT traffic on {{ $labels.hostname }}"
          description: "Outgoing network traffic exceeded 80% of interface capacity."

      # Load average over 5 (1-min load)
      - alert: LoadHigh
        expr: node_load1 > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High load on {{ $labels.hostname }}"
          description: "1-minute load average is over 5."
      # Load average over 3 (5-min load)
      - alert: LoadHigh5
        expr: node_load5 > 3
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High load on {{ $labels.hostname }}"
          description: "5-minute load average is over 3."

      # Load average over 2 (15-min load)
      - alert: LoadHigh15
        expr: node_load15 > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High load on {{ $labels.hostname }}"
          description: "15-minute load average is over 2."
```

- `/etc/systemd/system/alertmanager.service`

```
[Unit]
Description=Prometheus Alertmanager
Wants=network-online.target
After=network-online.target

[Service]
User=dash
Group=dash
Type=simple
ExecStart=/usr/local/alertmanager/alertmanager \
    --config.file=/etc/alertmanager/alertmanager.yml \
    --storage.path=/var/lib/alertmanager
Restart=always

[Install]
WantedBy=multi-user.target
```

- `/etc/systemd/system/prometheus.service`

```
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/prometheus/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus \
    --web.console.templates=/usr/local/prometheus/consoles \
    --web.console.libraries=/usr/local/prometheus/console_libraries \
    --storage.tsdb.retention.time=30d \
    --storage.tsdb.retention.size=5GB
Restart=always

[Install]
WantedBy=multi-user.target
```

- `/etc/systemd/system/whatsapp-webhook.service`

```
[Unit]
Description=WhatsApp Webhook for Alertmanager
After=network.target

[Service]
User=dash
ExecStart=/usr/bin/python3 /usr/local/bin/whatsapp-webhook.py
Restart=always

[Install]
WantedBy=multi-user.target
```

- `/etc/systemd/system/xen-exporter.service` (Change to your credentials!)
```
[Unit]
Description=Xen Exporter for Prometheus
After=network.target

[Service]
Type=simple
WorkingDirectory=/home/dash/xen-exporter
ExecStart=/home/dash/xen-exporter/venv/bin/python3 /home/dash/xen-exporter/xen-exporter.py
# CHANGE XEN_HOST AND XEN_PASSWORD TO YOURS!
Environment="XEN_HOST=myXCPngIP"
Environment="XEN_USER=root"
Environment="XEN_PASSWORD=myAwesomePassword"
Environment="XEN_SSL_VERIFY=false"
Restart=always
User=dash
Group=dash

[Install]
WantedBy=multi-user.target
```

- `/usr/local/bin/whatsapp-webhook.py`

```
#!/usr/bin/env python3
from flask import Flask, request
import subprocess

app = Flask(__name__)

@app.route('/', methods=['POST'])
def webhook():
    data = request.json
    if not data:
        return "no data", 400
    alerts = data.get("alerts", [])
    for alert in alerts:
        summary = alert.get("annotations", {}).get("summary", "No summary")
        subprocess.Popen(["/usr/local/bin/callmebot", summary])
    return "ok", 200

if __name__ == '__main__':
    app.run(host="127.0.0.1", port=9095)
```

- `/usr/local/bin/callmebot` (Change to your credentials!)

```
#!/bin/bash
#
# callmebot - Send WhatsApp messages via CallMeBot from the terminal
#
set +H

# Your phone number and API key
PHONE="5521YOURPHONENUMBER"   # CHANGE TO YOURS!
APIKEY="myAPIkey"             # CHANGE TO YOURS!

# Check if a message was passed
if [ $# -eq 0 ]; then
    echo "Usage: callmebot \"Your message here\""
    exit 1
fi

# Join all arguments into a single string
MESSAGE="$*"

# URL-encode only reserved characters, leave UTF-8 (like emojis) alone
rawurlencode() {
    local string="$1"
    local encoded=""
    local i c
    for (( i=0; i<${#string}; i++ )); do
        c="${string:$i:1}"
        case "$c" in
            [a-zA-Z0-9.~_-]) encoded+="$c" ;;
            ' ') encoded+='+' ;;
            *)
                # Encode only ASCII < 128
                if [[ "$c" =~ [[:cntrl:]] || $(LC_CTYPE=C printf '%d' "'$c") -lt 128 ]]; then
                    printf -v o '%%%02X' "'$c"
                    encoded+="$o"
                else
                    encoded+="$c"
                fi
                ;;
        esac
    done
    echo "$encoded"
}

# Encode the message safely
ENCODED_MESSAGE=$(rawurlencode "$MESSAGE")

# Send the message
curl -s "https://api.callmebot.com/whatsapp.php?phone=${PHONE}&text=${ENCODED_MESSAGE}&apikey=${APIKEY}" \
    > /dev/null

# Optional confirmation
echo "Message sent: $MESSAGE"
```

## Setup CallMeBot

1. Add the phone number **+34 684 783 708** to your phone contacts. (Name it as you wish.)
2. Send the message **"I allow callmebot to send me messages"** to the new contact (using WhatsApp, of course).
3. Wait until you receive the message "API Activated for your phone number. Your APIKEY is 123123" from the bot. *Note: If you don't receive the API key within 2 minutes, try again after 24 hours.*
4. The WhatsApp message from the bot contains the API key needed to send messages through the API.

# Tips and Tricks

### Remove all data to start from scratch:

`sudo systemctl stop prometheus`

`sudo rm -rf /var/lib/prometheus/*`

`sudo systemctl restart prometheus`

- This removes all gathered data, so Grafana will have nothing to display. It is useful after setting everything up, since renaming hosts or changing config files may duplicate data on dashboards.
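### Cross-check the callmebot URL encoding:

The `rawurlencode` helper in the callmebot script above does form-style encoding: spaces become `+`, reserved ASCII becomes `%XX`, and multi-byte UTF-8 passes through. Python's standard library handles the space/ASCII part the same way, so you can use it to spot-check how a message will look on the wire before blaming the API — a minimal sketch:

```python
from urllib.parse import quote_plus

# quote_plus: spaces -> '+', reserved ASCII -> %XX, like the bash helper
# (the bash version additionally leaves non-ASCII UTF-8 bytes unencoded).
msg = "Instance pihole down & unreachable"
print(quote_plus(msg))  # -> Instance+pihole+down+%26+unreachable
```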
### Change data retention period/size:

- Edit `/etc/systemd/system/prometheus.service` (the copy under `multi-user.target.wants/` is just a symlink to it).
- Change these lines to whatever you wish:

`--storage.tsdb.retention.time=30d \` keeps at most 30 days of data.

`--storage.tsdb.retention.size=5GB` keeps at most 5 GB of data.

- Restart the service: `sudo systemctl daemon-reload && sudo systemctl restart prometheus`

### Change `Nodename` in Grafana (defaults to the VM's /etc/hostname):

Enter the dashboard at http://GRAFANA-VM-IP:3000 and click `edit` -> `settings` -> `variables`. Click the `Nodename` variable and it can be changed to *uuid*, *hostname*, etc.

### Force hostnames on Prometheus metrics:

You can attach specific hostnames to machines in prometheus.yml

From:

```
# ------- VMS:
  - job_name: 'node_exporters'
    static_configs:
      - targets:
          - '192.168.15.221:9100'
          - '192.168.15.222:9100'
```

To:

```
# ------- VMS:
  - job_name: 'node_exporters'
    static_configs:
      - targets: ['192.168.15.221:9100']
        labels:
          hostname: 'xo'
      - targets: ['192.168.15.222:9100']
        labels:
          hostname: 'pihole'
```

### Alerts in general:

The `alert.rules.yml` and `alertmanager.yml` files are part of Prometheus's standard alerting system and have no direct relation to **my** WhatsApp webhook. **If you want to use a different alert type, keep these** and delete (or don't install) the whatsapp* related files/services. *So keep the alert services and configs.*

### Prometheus/Grafana/Alerts synchronization:

With this setup, scrapes happen every minute, so Grafana dashboards also refresh every minute and alerts wait at least 1 minute before firing. If you change the scrape interval in prometheus.yml, keep in mind you may need to adjust the matching timers everywhere (Grafana refresh rate, alert `for:` durations, Alertmanager wait intervals) so no work is wasted.
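As a back-of-envelope example of how those timers stack up, here is the worst-case delay from a VM dying to the first WhatsApp message with the values used in this guide (a rough sketch; real timing also depends on how the intervals happen to align):

```python
# Values from prometheus.yml, alert.rules.yml and alertmanager.yml above (in seconds)
scrape_interval = 60  # how often `up` is sampled
for_duration    = 60  # NodeDown must stay true this long (for: 1m)
eval_interval   = 60  # how often alert rules are re-evaluated
group_wait      = 15  # Alertmanager waits this long before the first notification

# The failure can land just after a scrape, the `for:` window must elapse, the
# next rule evaluation must notice it, and Alertmanager adds group_wait on top.
worst_case = scrape_interval + for_duration + eval_interval + group_wait
print(worst_case)  # -> 195 seconds, i.e. a bit over 3 minutes
```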

Also keep in mind that reducing the scrape interval GREATLY increases disk usage, so plan accordingly.

### Grafana's own alerts *vs* Prometheus Alertmanager:

Grafana has its own alert system. It can be useful in very specific cases, I guess, but I'm not using it. You can play with Grafana's alerts within the UI itself via the `...` menu on each dashboard panel.

Prometheus alerts, on the other hand, fire for all VMs (unless excluded), so they don't depend on Grafana or on a particular dashboard — usually the more "correct" and robust way to do this.