Process Monitoring Implementation#

This document describes the complete process monitoring implementation for Rust proplets in the Propeller distributed task execution system.

Overview#

OS-level process monitoring covers two components:

  • Rust Proplet - Collects cross-platform metrics using the sysinfo crate
  • Manager - Not yet integrated; ready to consume the published metrics for aggregation and visualization

Monitoring Profiles#

Profiles define which metrics to collect, how often, and how much history to retain.

The Rust implementation provides two built-in profiles:

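| Profile             | Interval | Metrics | Export | History | Use Case            |
|---------------------|----------|---------|--------|---------|---------------------|
| Standard            | 10s      | All     | Yes    | 100     | General purpose     |
| Long-running Daemon | 120s     | All     | Yes    | 500     | Background services |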
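
The History column caps how many samples are retained. A minimal sketch of such a bounded history, assuming the oldest sample is dropped once the cap is reached; `MetricsHistory` is a hypothetical type for illustration and is not taken from proplet-rs:

```rust
use std::collections::VecDeque;

/// Hypothetical fixed-capacity history: once full, the oldest sample is evicted.
pub struct MetricsHistory<T> {
    capacity: usize,
    samples: VecDeque<T>,
}

impl<T> MetricsHistory<T> {
    /// Creates a history that retains at most `capacity` samples.
    pub fn new(capacity: usize) -> Self {
        Self {
            capacity,
            samples: VecDeque::with_capacity(capacity),
        }
    }

    /// Appends a sample, evicting the oldest one if the buffer is full.
    pub fn push(&mut self, sample: T) {
        if self.capacity == 0 {
            return; // assumption: a zero capacity means nothing is retained
        }
        if self.samples.len() == self.capacity {
            self.samples.pop_front();
        }
        self.samples.push_back(sample);
    }

    pub fn len(&self) -> usize {
        self.samples.len()
    }

    pub fn is_empty(&self) -> bool {
        self.samples.is_empty()
    }

    /// Oldest retained sample, if any.
    pub fn oldest(&self) -> Option<&T> {
        self.samples.front()
    }

    /// Most recent sample, if any.
    pub fn latest(&self) -> Option<&T> {
        self.samples.back()
    }
}
```

With the Standard profile's cap of 100, pushing 150 samples would leave the most recent 100 in the buffer.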

Custom Profiles#

You can also define custom monitoring profiles via JSON configuration with the following options:

  • enabled: Enable/disable monitoring (default: true)
  • interval: Collection interval in seconds (default: 10)
  • collect_cpu: Collect CPU metrics (default: true)
  • collect_memory: Collect memory metrics (default: true)
  • collect_disk_io: Collect disk I/O metrics (default: true)
  • collect_network_io: Collect network I/O metrics
  • collect_threads: Collect thread count (default: true)
  • collect_file_descriptors: Collect file descriptor count (default: true)
  • export_to_mqtt: Publish metrics to MQTT (default: true)
  • retain_history: Keep metrics history (default: true)
  • history_size: Maximum history entries (default: 100)
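
The options and defaults above can be sketched as a plain Rust struct. This is an illustrative model only: the struct name, the `interval_secs` field name, the `collect_network_io` default (the option appears in the per-task JSON example), and the `validate` rules are assumptions rather than the proplet-rs definitions.

```rust
/// Hypothetical in-memory form of a monitoring profile.
#[derive(Debug, Clone, PartialEq)]
pub struct MonitoringProfile {
    pub enabled: bool,
    pub interval_secs: u64,
    pub collect_cpu: bool,
    pub collect_memory: bool,
    pub collect_disk_io: bool,
    pub collect_network_io: bool, // default assumed to be true
    pub collect_threads: bool,
    pub collect_file_descriptors: bool,
    pub export_to_mqtt: bool,
    pub retain_history: bool,
    pub history_size: usize,
}

impl Default for MonitoringProfile {
    /// Documented defaults, which also match the Standard profile.
    fn default() -> Self {
        Self {
            enabled: true,
            interval_secs: 10,
            collect_cpu: true,
            collect_memory: true,
            collect_disk_io: true,
            collect_network_io: true,
            collect_threads: true,
            collect_file_descriptors: true,
            export_to_mqtt: true,
            retain_history: true,
            history_size: 100,
        }
    }
}

impl MonitoringProfile {
    /// Long-running Daemon profile: 120 s interval, 500 history entries.
    pub fn long_running_daemon() -> Self {
        Self {
            interval_secs: 120,
            history_size: 500,
            ..Self::default()
        }
    }

    /// Assumed sanity checks; the real implementation may differ.
    pub fn validate(&self) -> Result<(), String> {
        if self.enabled && self.interval_secs == 0 {
            return Err("interval must be at least 1 second".to_string());
        }
        if self.retain_history && self.history_size == 0 {
            return Err("history_size must be positive when retain_history is set".to_string());
        }
        Ok(())
    }
}
```

A custom profile can then override only the fields it cares about, for example `MonitoringProfile { interval_secs: 5, ..MonitoringProfile::default() }`.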

Metrics Collected#

Common Metrics (All Platforms)#

  • CPU usage percentage
  • Memory usage (bytes and percentage)
  • Disk I/O (read/write bytes)
  • Network I/O (rx/tx bytes)
  • Process uptime

Platform-Specific Metrics#

Support for the following metrics varies across Linux, macOS, and Windows:

  • Thread count (limited on some platforms)
  • File descriptors
  • Detailed memory stats

MQTT Topics#

Proplet-Level Metrics#

m/{domain_id}/c/{channel_id}/control/proplet/metrics

Publishes overall proplet health metrics.

Task-Level Metrics#

m/{domain_id}/c/{channel_id}/control/proplet/task_metrics
m/{domain_id}/c/{channel_id}/metrics/proplet

Publishes per-task process metrics.
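
The `control/proplet` topic templates can be filled in with a small helper. The sketch below is illustrative: the function names are hypothetical, and rejecting IDs that contain `/`, `+`, or `#` is an added assumption intended to keep an ID from altering the topic structure.

```rust
/// Rejects IDs that would change the topic structure or act as MQTT wildcards.
fn check_segment(name: &str, value: &str) -> Result<(), String> {
    if value.is_empty() || value.contains(|c: char| matches!(c, '/' | '+' | '#')) {
        return Err(format!("invalid {}: {:?}", name, value));
    }
    Ok(())
}

/// Builds `m/{domain_id}/c/{channel_id}/control/proplet/{suffix}`.
fn control_topic(domain_id: &str, channel_id: &str, suffix: &str) -> Result<String, String> {
    check_segment("domain_id", domain_id)?;
    check_segment("channel_id", channel_id)?;
    Ok(format!(
        "m/{}/c/{}/control/proplet/{}",
        domain_id, channel_id, suffix
    ))
}

/// Topic for overall proplet health metrics.
pub fn proplet_metrics_topic(domain_id: &str, channel_id: &str) -> Result<String, String> {
    control_topic(domain_id, channel_id, "metrics")
}

/// Topic for per-task process metrics.
pub fn task_metrics_topic(domain_id: &str, channel_id: &str) -> Result<String, String> {
    control_topic(domain_id, channel_id, "task_metrics")
}
```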

Message Format#

{
  "task_id": "uuid",
  "proplet_id": "uuid",
  "metrics": {
    "cpu_percent": 42.5,
    "memory_bytes": 67108864,
    "memory_percent": 1.5,
    "disk_read_bytes": 1048576,
    "disk_write_bytes": 524288,
    "network_rx_bytes": 4096,
    "network_tx_bytes": 8192,
    "uptime_seconds": 120,
    "thread_count": 4,
    "file_descriptor_count": 12,
    "timestamp": "2025-01-15T10:30:00.000Z"
  },
  "aggregated": {
    "avg_cpu_usage": 38.2,
    "max_cpu_usage": 65.0,
    "avg_memory_usage": 62914560,
    "max_memory_usage": 71303168,
    "total_disk_read": 2097152,
    "total_disk_write": 1048576,
    "total_network_rx": 12288,
    "total_network_tx": 24576,
    "sample_count": 24
  }
}
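
The `aggregated` block summarizes the retained samples. A minimal sketch of how such a summary could be computed, assuming each sample carries per-interval byte counts so that the `total_*` fields are plain sums; the `Sample` and `Aggregated` types and the `aggregate` function are hypothetical and not taken from proplet-rs:

```rust
/// Hypothetical subset of one metrics sample.
#[derive(Debug, Clone, Copy)]
pub struct Sample {
    pub cpu_percent: f64,
    pub memory_bytes: u64,
    pub disk_read_bytes: u64,
    pub disk_write_bytes: u64,
    pub network_rx_bytes: u64,
    pub network_tx_bytes: u64,
}

/// Mirrors the fields of the `aggregated` JSON object.
#[derive(Debug, Clone, PartialEq)]
pub struct Aggregated {
    pub avg_cpu_usage: f64,
    pub max_cpu_usage: f64,
    pub avg_memory_usage: u64,
    pub max_memory_usage: u64,
    pub total_disk_read: u64,
    pub total_disk_write: u64,
    pub total_network_rx: u64,
    pub total_network_tx: u64,
    pub sample_count: usize,
}

/// Returns `None` when there are no samples to summarize.
pub fn aggregate(samples: &[Sample]) -> Option<Aggregated> {
    if samples.is_empty() {
        return None;
    }
    let n = samples.len();
    let cpu_sum: f64 = samples.iter().map(|s| s.cpu_percent).sum();
    let max_cpu = samples
        .iter()
        .map(|s| s.cpu_percent)
        .fold(f64::MIN, f64::max);
    // Sum in u128 so that many large memory readings cannot overflow.
    let mem_sum: u128 = samples.iter().map(|s| s.memory_bytes as u128).sum();
    let max_mem = samples.iter().map(|s| s.memory_bytes).max()?;

    Some(Aggregated {
        avg_cpu_usage: cpu_sum / n as f64,
        max_cpu_usage: max_cpu,
        avg_memory_usage: (mem_sum / n as u128) as u64,
        max_memory_usage: max_mem,
        total_disk_read: samples.iter().map(|s| s.disk_read_bytes).sum(),
        total_disk_write: samples.iter().map(|s| s.disk_write_bytes).sum(),
        total_network_rx: samples.iter().map(|s| s.network_rx_bytes).sum(),
        total_network_tx: samples.iter().map(|s| s.network_tx_bytes).sum(),
        sample_count: n,
    })
}
```

If the collector reports cumulative counters instead of per-interval values, the totals would be taken from the latest sample rather than summed.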

Configuration#

Rust Proplet Environment Variables#

export PROPLET_ENABLE_MONITORING=true     # Enable/disable monitoring (default: true)
export PROPLET_METRICS_INTERVAL=10        # Proplet-level metrics interval in seconds (default: 10)
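
Reading these two variables with their documented defaults might look like the sketch below. The struct and function names are hypothetical, and falling back to the default when a value is missing, unparsable, or zero is an assumption about the desired behavior.

```rust
use std::env;

/// Hypothetical holder for the two documented settings.
#[derive(Debug, Clone, PartialEq)]
pub struct MonitoringEnv {
    pub enabled: bool,
    pub interval_secs: u64,
}

/// Parses the settings through `lookup`, which makes the logic testable
/// without touching the real process environment.
pub fn parse_monitoring_env<F>(lookup: F) -> MonitoringEnv
where
    F: Fn(&str) -> Option<String>,
{
    // Only "true" and "false" are accepted; anything else keeps the default.
    let enabled = lookup("PROPLET_ENABLE_MONITORING")
        .and_then(|v| v.trim().parse::<bool>().ok())
        .unwrap_or(true);

    // Interval in seconds; zero or invalid values keep the default of 10.
    let interval_secs = lookup("PROPLET_METRICS_INTERVAL")
        .and_then(|v| v.trim().parse::<u64>().ok())
        .filter(|secs| *secs > 0)
        .unwrap_or(10);

    MonitoringEnv {
        enabled,
        interval_secs,
    }
}

/// Reads the settings from the real process environment.
pub fn monitoring_env_from_process() -> MonitoringEnv {
    parse_monitoring_env(|key| env::var(key).ok())
}
```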

Per-Task Configuration (JSON)#

{
  "monitoring_profile": {
    "enabled": true,
    "interval": 5,
    "collect_cpu": true,
    "collect_memory": true,
    "collect_disk_io": true,
    "collect_network_io": true,
    "collect_threads": true,
    "collect_file_descriptors": true,
    "export_to_mqtt": true,
    "retain_history": true,
    "history_size": 200
  }
}

Performance Impact#

Measured overhead across platforms:

| Profile   | CPU Overhead | Memory Overhead |
|-----------|--------------|-----------------|
| Minimal   | < 0.1%       | ~1 MB           |
| Standard  | < 0.5%       | ~2 MB           |
| Intensive | < 2%         | ~5 MB           |

Usage Examples#

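// Go: start a task with the standard monitoring profile.
// Go cannot take the address of a function call result directly,
// so bind the profile to a variable first.
profile := monitoring.StandardProfile()

t := task.Task{
    ID:                "task-123",
    Name:              "compute",
    ImageURL:          "registry.example.com/compute:v1",
    Daemon:            false,
    MonitoringProfile: &profile,
}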

Rust - Start Task with Monitoring#

{
  "id": "550e8400-e29b-41d4-a716-446655440001",
  "functionName": "compute",
  "imageURL": "registry.example.com/compute:v1",
  "daemon": false,
  "monitoringProfile": {
    "enabled": true,
    "interval": 10,
    "collect_cpu": true,
    "collect_memory": true,
    "export_to_mqtt": true,
    "retain_history": true,
    "history_size": 100
  }
}

Integration with Monitoring Systems#

Prometheus#

Use an MQTT-to-Prometheus exporter and add it as a Prometheus scrape target:

scrape_configs:
  - job_name: "propeller"
    static_configs:
      - targets: ["mqtt-exporter:9641"]

Grafana#

Create dashboards with:

  • CPU usage over time
  • Memory consumption trends
  • Disk/Network I/O rates
  • Per-task resource usage

Custom Monitoring#

Subscribe to MQTT topics:

mosquitto_sub -h localhost -t "m/+/c/+/control/proplet/metrics" -t "m/+/c/+/control/proplet/task_metrics" -v
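
MQTT topic filters define only two wildcards: `+` matches exactly one topic level and `#` matches all remaining levels and must be the last level. Any other character, including `*`, is matched literally. The sketch below implements that matching rule so a filter can be checked against the metrics topics before subscribing. `topic_matches` is a hypothetical helper, and it ignores the special handling of topics that start with `$`.

```rust
/// Returns true if `topic` matches the MQTT topic `filter`.
/// `+` matches exactly one level; `#` matches the parent level and
/// everything below it, and is only valid as the final level.
pub fn topic_matches(filter: &str, topic: &str) -> bool {
    let mut filter_levels = filter.split('/');
    let mut topic_levels = topic.split('/');
    loop {
        match (filter_levels.next(), topic_levels.next()) {
            // `#` consumes the rest of the topic, but must end the filter.
            (Some("#"), _) => return filter_levels.next().is_none(),
            // `+` accepts any single level.
            (Some("+"), Some(_)) => continue,
            // Literal levels must be identical.
            (Some(f), Some(t)) if f == t => continue,
            // Both exhausted at the same time: full match.
            (None, None) => return true,
            // Length mismatch or differing literal level.
            _ => return false,
        }
    }
}
```

For example, `m/+/c/+/control/proplet/metrics` matches the proplet-level topic of any domain and channel, while `m/+/c/+/#` matches every topic under a channel.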

Testing#

Manual Test#

# Start proplet
./build/proplet

# Submit a task
curl -X POST http://localhost:8080/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "id": "test-123",
    "name": "compute",
    "file": "...",
    "monitoring_profile": {
      "enabled": true,
      "interval": 5,
      "export_to_mqtt": true
    }
  }'

# Monitor metrics
mosquitto_sub -h localhost -t "m/+/c/+/control/proplet/metrics" -t "m/+/c/+/control/proplet/task_metrics" -v

Future Enhancements#

  1. Manager Integration
       • Aggregate metrics from all proplets
       • Historical metrics storage
       • Metrics API endpoints
       • Alerting on anomalies

  2. Advanced Metrics
       • GPU usage (if available)
       • Container-specific metrics (cgroups)
       • Custom application metrics
       • Distributed tracing correlation

  3. Optimization
       • Adaptive sampling rates
       • Metric compression
       • Batched MQTT publishing
       • Metrics rollups/aggregation

  4. Visualization
       • Built-in dashboards
       • Real-time metric streaming
       • Historical trend analysis
       • Anomaly detection

References#

  • Rust Implementation: proplet-rs/src/monitoring/
  • Examples: examples/monitoring-example.md
  • Rust Docs: proplet-rs/MONITORING.md