Mirror of https://github.com/ferdzo/iotDashboard.git (synced 2026-04-05 17:16:26 +00:00)
# Database Writer Service

A robust, production-ready service that reads sensor data from Redis streams and writes it to PostgreSQL/TimescaleDB. Part of the IoT Dashboard project.

## Features

- ✅ **Reliable consumption** from Redis streams using consumer groups
- ✅ **Batch processing** for high throughput
- ✅ **At-least-once delivery** with message acknowledgments
- ✅ **Dead letter queue** for failed messages
- ✅ **Connection pooling** for database efficiency
- ✅ **Graceful shutdown** handling
- ✅ **Versioned schema** managed with Alembic migrations
- ✅ **Structured logging** with JSON output
- ✅ **Health checks** for monitoring
- ✅ **TimescaleDB support** for time-series optimization

## Architecture

```
Redis Streams → Consumer Group → Transform → Database → Acknowledge
                                     ↓
                              Failed messages
                                     ↓
                             Dead Letter Queue
```

### Components

- **`main.py`**: Service orchestration and processing loop
- **`redis_reader.py`**: Redis stream consumer with fault tolerance
- **`db_writer.py`**: Database operations with connection pooling
- **`models.py`**: SQLAlchemy models defining the database schema
- **`schema.py`**: Data transformation and validation
- **`config.py`**: Configuration management
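The loop in `main.py` is more elaborate, but the write/ack/dead-letter flow above can be sketched roughly as follows. The function and parameter names here are illustrative, not the service's actual API; the I/O callbacks stand in for the database writer, Redis `XACK`, and the DLQ producer:

```python
def process_batch(messages, write_fn, ack_fn, dead_letter_fn):
    """Write a batch to the database, ack successes, dead-letter failures.

    messages: list of (message_id, row_dict) pairs already transformed.
    write_fn / ack_fn / dead_letter_fn: injected I/O callbacks (illustrative).
    """
    written, failed = 0, 0
    for msg_id, row in messages:
        try:
            write_fn(row)               # INSERT into sensor_readings
            ack_fn(msg_id)              # XACK: remove from the pending list
            written += 1
        except Exception as exc:        # failed rows go to the DLQ instead
            dead_letter_fn(msg_id, row, str(exc))
            failed += 1
    return {"written": written, "failed": failed}
```

Injecting the callbacks keeps the batch logic testable without a live Redis or database connection.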

## Quick Start

### Prerequisites

- Python 3.13+
- [uv](https://github.com/astral-sh/uv) package manager
- Redis server with streams
- PostgreSQL or TimescaleDB
### Installation

1. **Navigate to the service directory**:

   ```bash
   cd services/db_write
   ```

2. **Copy and configure environment variables**:

   ```bash
   cp .env.example .env
   # Edit .env with your DATABASE_URL and other settings
   ```

3. **Install dependencies**:

   ```bash
   uv sync
   ```

4. **Set up the database schema** (IMPORTANT - do this before running):

   ```bash
   # Review the schema in models.py first
   cat models.py

   # Create the initial migration
   chmod +x migrate.sh
   ./migrate.sh create "initial schema"

   # Review the generated migration
   ls -lt alembic/versions/

   # Apply migrations
   ./migrate.sh upgrade
   ```

5. **Run the service**:

   ```bash
   uv run main.py
   ```

Or use the standalone script:

```bash
chmod +x run-standalone.sh
./run-standalone.sh
```

### ⚠️ Important: Schema Management

This service uses **Alembic** for database migrations; it will NOT create tables automatically.

- The schema is defined in `models.py`
- Migrations are managed with `./migrate.sh` or `alembic` commands
- See `SCHEMA_MANAGEMENT.md` for a detailed guide

## Schema Management

This service uses **SQLAlchemy** for models and **Alembic** for migrations.

### Key Files

- **`models.py`**: Define your database schema here (SQLAlchemy models)
- **`alembic/`**: Migration scripts directory
- **`migrate.sh`**: Helper script for common migration tasks
- **`SCHEMA_MANAGEMENT.md`**: Comprehensive migration guide

### Quick Migration Commands

```bash
# Create a new migration after editing models.py
./migrate.sh create "add new column"

# Apply pending migrations
./migrate.sh upgrade

# Check migration status
./migrate.sh check

# View migration history
./migrate.sh history

# Roll back the last migration
./migrate.sh downgrade 1
```

**See `SCHEMA_MANAGEMENT.md` for detailed documentation.**

## Configuration

All configuration is done via environment variables. See `.env.example` for all available options.

### Required Settings

```bash
# Redis connection
REDIS_HOST=localhost
REDIS_PORT=6379

# Database connection
DATABASE_URL=postgresql://user:password@localhost:5432/iot_dashboard
```

### Optional Settings

```bash
# Consumer configuration
CONSUMER_GROUP_NAME=db_writer    # Consumer group name
CONSUMER_NAME=worker-01          # Unique consumer name
BATCH_SIZE=100                   # Messages per batch
BATCH_TIMEOUT_SEC=5              # Read timeout
PROCESSING_INTERVAL_SEC=1        # Delay between batches

# Stream configuration
STREAM_PATTERN=mqtt_stream:*     # Stream name pattern
DEAD_LETTER_STREAM=mqtt_stream:failed

# Database
TABLE_NAME=sensor_readings       # Target table name
ENABLE_TIMESCALE=false           # Use TimescaleDB features

# Logging
LOG_LEVEL=INFO                   # DEBUG, INFO, WARNING, ERROR
LOG_FORMAT=json                  # json or console
```
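`config.py`'s actual implementation may differ, but reading these variables with typed defaults can be sketched like this (the `Settings` fields shown are a subset, chosen for illustration):

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Illustrative subset of the service's settings; config.py may differ."""
    redis_host: str
    redis_port: int
    batch_size: int
    stream_pattern: str


def load_settings() -> Settings:
    """Build Settings from environment variables, with sane defaults."""
    env = os.environ
    return Settings(
        redis_host=env.get("REDIS_HOST", "localhost"),
        redis_port=int(env.get("REDIS_PORT", "6379")),
        batch_size=int(env.get("BATCH_SIZE", "100")),
        stream_pattern=env.get("STREAM_PATTERN", "mqtt_stream:*"),
    )
```

Converting at load time (`int(...)`) means a malformed value fails fast at startup rather than mid-batch.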

## Data Flow

### Input (Redis Streams)

The service reads from Redis streams named with the format:

```
mqtt_stream:{device_id}:{sensor_type}
```

Each message contains:

```
{
  "value": "23.5",
  "timestamp": "2023-10-18T14:30:00Z",
  "metadata": "{...}"          (optional)
}
```
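The stream key and message fields map onto a table row roughly as follows. This is a simplified sketch of what `schema.py` does; the real transform also validates input and handles the optional `metadata` field:

```python
def parse_stream_key(key: str) -> tuple[str, str]:
    """Split 'mqtt_stream:{device_id}:{sensor_type}' into its parts."""
    prefix, device_id, sensor_type = key.split(":", 2)
    if prefix != "mqtt_stream":
        raise ValueError(f"unexpected stream key: {key!r}")
    return device_id, sensor_type


def to_row(stream_key: str, message: dict) -> dict:
    """Build a sensor_readings row from one stream message."""
    device_id, sensor_type = parse_stream_key(stream_key)
    return {
        "timestamp": message["timestamp"],
        "device_id": device_id,
        "sensor_type": sensor_type,
        "value": float(message["value"]),   # stored as DOUBLE PRECISION
    }
```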

### Output (Database)

Data is written to the `sensor_readings` table:

```sql
CREATE TABLE sensor_readings (
    id BIGSERIAL PRIMARY KEY,
    timestamp TIMESTAMPTZ NOT NULL,
    device_id VARCHAR(100) NOT NULL,
    sensor_type VARCHAR(100) NOT NULL,
    value DOUBLE PRECISION NOT NULL,
    metadata JSONB,
    created_at TIMESTAMPTZ DEFAULT NOW()
);
```

**Note**: The table is created by the Alembic migrations (see Schema Management); the service does not create it automatically.

## Running with Docker

### Build the image

```bash
docker build -t db-writer:latest .
```

### Run the container

```bash
docker run -d \
  --name db-writer \
  -e REDIS_HOST=redis \
  -e DATABASE_URL=postgresql://user:pass@postgres:5432/iot \
  db-writer:latest
```

## Consumer Groups

The service uses Redis consumer groups for reliable, distributed processing:

- **Multiple instances**: Run multiple workers for load balancing
- **Fault tolerance**: Messages are not lost if a consumer crashes
- **Acknowledgments**: Messages are only removed after successful processing
- **Pending messages**: Unacknowledged messages can be reclaimed

### Running Multiple Workers

```bash
# Terminal 1
CONSUMER_NAME=worker-01 uv run main.py

# Terminal 2
CONSUMER_NAME=worker-02 uv run main.py
```

All workers in the same consumer group share the load.

## Error Handling

### Dead Letter Queue

Failed messages are sent to the dead letter stream (`mqtt_stream:failed`) along with error information:

```
{
  "original_stream": "mqtt_stream:esp32:temperature",
  "original_id": "1634567890123-0",
  "device_id": "esp32",
  "sensor_type": "temperature",
  "value": "23.5",
  "error": "Database connection failed",
  "failed_at": "1634567890.123"
}
```
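Building such an entry from a failed message can be sketched as follows; the helper name is illustrative, and the service's actual DLQ code may carry additional fields:

```python
import time


def build_dlq_entry(stream: str, msg_id: str, fields: dict,
                    error: Exception) -> dict:
    """Flatten a failed message into the dead-letter format shown above."""
    # stream looks like 'mqtt_stream:{device_id}:{sensor_type}'
    device_id, sensor_type = stream.split(":", 2)[1:]
    return {
        "original_stream": stream,
        "original_id": msg_id,
        "device_id": device_id,
        "sensor_type": sensor_type,
        "value": fields.get("value", ""),
        "error": str(error),
        "failed_at": f"{time.time():.3f}",   # Unix timestamp, as a string
    }
```

Every value is a string, because Redis stream fields are stored as flat string pairs.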

### Retry Strategy

- **Transient errors**: Automatic retry with backoff
- **Data errors**: Sent straight to the DLQ
- **Connection errors**: Reconnection attempts
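A minimal sketch of the transient-error path; the attempt count and backoff constants here are illustrative, not the service's actual values:

```python
import time


def retry_with_backoff(operation, max_attempts=3, base_delay=0.1):
    """Retry a transient operation, doubling the delay between attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise                # exhausted: let the caller dead-letter it
            time.sleep(base_delay * 2 ** (attempt - 1))
```

Data errors (bad values, schema mismatches) are deliberately not caught here; retrying them would never succeed, so they go straight to the DLQ.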

## Monitoring

### Health Checks

Check service health programmatically:

```python
from main import DatabaseWriterService

service = DatabaseWriterService()
health = service.health_check()
print(health)
# {
#     'running': True,
#     'redis': True,
#     'database': True,
#     'stats': {...}
# }
```

### Logs

The service emits structured logs:

```json
{
  "event": "Processed batch",
  "rows_written": 100,
  "messages_acknowledged": 100,
  "timestamp": "2023-10-18T14:30:00Z",
  "level": "info"
}
```
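The service's logging setup isn't shown here, and it may use a dedicated structured-logging library; one way to get similar JSON output with only the standard library is:

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "event": record.getMessage(),
            "level": record.levelname.lower(),
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%SZ"),
        }
        # Extra key/value pairs attached via extra={"fields": {...}}
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("db_writer")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("Processed batch", extra={"fields": {"rows_written": 100}})
```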

### Statistics

The service tracks runtime statistics:

- `messages_read`: Total messages consumed
- `messages_written`: Total rows inserted
- `messages_failed`: Failed messages sent to the DLQ
- `batches_processed`: Number of successful batches
- `errors`: Total errors encountered

## Development

### Project Structure

```
db_write/
├── config.py              # Configuration management
├── db_writer.py           # Database operations
├── redis_reader.py        # Redis stream consumer
├── schema.py              # Data models and transformation
├── models.py              # SQLAlchemy models
├── main.py                # Service entry point
├── alembic/               # Migration scripts
├── migrate.sh             # Migration helper script
├── pyproject.toml         # Dependencies
├── .env.example           # Configuration template
└── README.md              # This file
```
### Adding Dependencies

```bash
uv add package-name
```

### Running Tests

```bash
uv run pytest
```

## Troubleshooting

### Service won't start

1. **Check configuration**: Verify all required environment variables are set
2. **Test connections**: Ensure Redis and PostgreSQL are accessible
3. **Check logs**: Look for specific error messages

### No messages being processed

1. **Check that streams exist**: `redis-cli KEYS "mqtt_stream:*"`
2. **Verify the consumer group**: The service creates it automatically, but check the Redis logs
3. **Check the stream pattern**: Ensure `STREAM_PATTERN` matches your stream names

### Messages going to the dead letter queue

1. **Check the DLQ**: `redis-cli XRANGE mqtt_stream:failed - + COUNT 10`
2. **Review error messages**: Each DLQ entry contains the error reason
3. **Validate the data format**: Ensure messages match the expected schema

### High memory usage

1. **Reduce the batch size**: Lower `BATCH_SIZE` in the configuration
2. **Check the connection pool**: You may need to adjust the pool size
3. **Monitor pending messages**: Use `XPENDING` to check the backlog

## Performance Tuning

### Throughput Optimization

- **Increase batch size**: Process more messages per batch
- **Multiple workers**: Run more consumer instances
- **Connection pooling**: Adjust the pool size based on load
- **Processing interval**: Reduce the delay between batches
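These batching knobs boil down to a flush policy: write when the batch is full, or when it has waited too long. An illustrative sketch of that policy (not the service's actual batching code; the injectable `clock` exists only to make it testable):

```python
import time


class BatchBuffer:
    """Accumulate items and flush when the batch is full or stale."""

    def __init__(self, flush_fn, batch_size=100, timeout_sec=5.0, clock=None):
        self._flush_fn = flush_fn
        self._batch_size = batch_size
        self._timeout = timeout_sec
        self._clock = clock or time.monotonic
        self._items = []
        self._started = None          # when the current batch began filling

    def add(self, item):
        if self._started is None:
            self._started = self._clock()
        self._items.append(item)
        full = len(self._items) >= self._batch_size
        stale = self._clock() - self._started >= self._timeout
        if full or stale:
            self.flush()

    def flush(self):
        if self._items:
            self._flush_fn(self._items)
            self._items, self._started = [], None
```

A larger `batch_size` raises throughput at the cost of latency; a smaller `timeout_sec` does the opposite, which is the trade-off the two subsections here describe.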

### Latency Optimization

- **Decrease batch size**: Process smaller batches more frequently
- **Reduce the timeout**: Lower `BATCH_TIMEOUT_SEC`
- **Single worker**: Avoid consumer-group coordination overhead

## Production Deployment

### Recommended Settings

```bash
BATCH_SIZE=500
PROCESSING_INTERVAL_SEC=0.1
LOG_LEVEL=INFO
LOG_FORMAT=json
ENABLE_TIMESCALE=true
```

### Monitoring

- Monitor consumer lag with Redis `XPENDING`
- Track database insert latency
- Alert when the error rate exceeds 5%
- Monitor DLQ depth

### Scaling

1. **Horizontal**: Add more consumer instances, each with a unique `CONSUMER_NAME`
2. **Vertical**: Increase resources for database writes
3. **Database**: Use TimescaleDB for better time-series performance

## License

Part of the IoT Dashboard project.