Database Writer Service
A robust, production-ready service that reads sensor data from Redis streams and writes it to PostgreSQL/TimescaleDB. Part of the IoT Dashboard project.
Features
- ✅ Reliable consumption from Redis streams using consumer groups
- ✅ Batch processing for high throughput
- ✅ At-least-once delivery with message acknowledgments
- ✅ Dead letter queue for failed messages
- ✅ Connection pooling for database efficiency
- ✅ Graceful shutdown handling
- ✅ Flexible schema that adapts to changes
- ✅ Structured logging with JSON output
- ✅ Health checks for monitoring
- ✅ TimescaleDB support for time-series optimization
Architecture
Redis Streams → Consumer Group → Transform → Database → Acknowledge
                                     ↓
                              Failed messages
                                     ↓
                             Dead Letter Queue
Components
- `main.py`: Service orchestration and processing loop
- `redis_reader.py`: Redis stream consumer with fault tolerance
- `db_writer.py`: Database operations with connection pooling
- `schema.py`: Data transformation and validation
- `config.py`: Configuration management
Quick Start
Prerequisites
- Python 3.13+
- uv package manager
- Redis server with streams
- PostgreSQL or TimescaleDB
Installation
1. Navigate to the service directory:

   cd services/db_write

2. Copy and configure environment variables:

   cp .env.example .env
   # Edit .env with your DATABASE_URL and other settings

3. Install dependencies:

   uv sync

4. Set up the database schema (IMPORTANT - do this before running):

   # Review the schema in models.py first
   cat models.py

   # Create initial migration
   chmod +x migrate.sh
   ./migrate.sh create "initial schema"

   # Review the generated migration
   ls -lt alembic/versions/

   # Apply migrations
   ./migrate.sh upgrade

5. Run the service:

   uv run main.py

   Or use the standalone script:

   chmod +x run-standalone.sh
   ./run-standalone.sh
⚠️ Important: Schema Management
This service uses Alembic for database migrations. The service will NOT create tables automatically.
- Schema is defined in `models.py`
- Migrations are managed with `./migrate.sh` or `alembic` commands
- See `SCHEMA_MANAGEMENT.md` for a detailed guide
Schema Management
This service uses SQLAlchemy for models and Alembic for migrations.
Key Files
- `models.py`: Define your database schema here (SQLAlchemy models)
- `alembic/`: Migration scripts directory
- `migrate.sh`: Helper script for common migration tasks
- `SCHEMA_MANAGEMENT.md`: Comprehensive migration guide
Quick Migration Commands
# Create a new migration after editing models.py
./migrate.sh create "add new column"
# Apply pending migrations
./migrate.sh upgrade
# Check migration status
./migrate.sh check
# View migration history
./migrate.sh history
# Rollback last migration
./migrate.sh downgrade 1
See SCHEMA_MANAGEMENT.md for detailed documentation.
Configuration
All configuration is done via environment variables. See .env.example for all available options.
Required Settings
# Redis connection
REDIS_HOST=localhost
REDIS_PORT=6379
# Database connection
DATABASE_URL=postgresql://user:password@localhost:5432/iot_dashboard
Optional Settings
# Consumer configuration
CONSUMER_GROUP_NAME=db_writer # Consumer group name
CONSUMER_NAME=worker-01 # Unique consumer name
BATCH_SIZE=100 # Messages per batch
BATCH_TIMEOUT_SEC=5 # Read timeout
PROCESSING_INTERVAL_SEC=1 # Delay between batches
# Stream configuration
STREAM_PATTERN=mqtt_stream:* # Stream name pattern
DEAD_LETTER_STREAM=mqtt_stream:failed
# Database
TABLE_NAME=sensor_readings # Target table name
ENABLE_TIMESCALE=false # Use TimescaleDB features
# Logging
LOG_LEVEL=INFO # DEBUG, INFO, WARNING, ERROR
LOG_FORMAT=json # json or console
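For illustration, these variables might be resolved along the following lines. This is a minimal sketch, not the actual contents of `config.py`; the `Settings` and `load_settings` names are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Service settings resolved from environment variables with defaults."""
    redis_host: str
    redis_port: int
    batch_size: int
    batch_timeout_sec: float
    stream_pattern: str


def load_settings(env=os.environ) -> Settings:
    """Build Settings from an environment mapping, applying the documented defaults."""
    return Settings(
        redis_host=env.get("REDIS_HOST", "localhost"),
        redis_port=int(env.get("REDIS_PORT", "6379")),
        batch_size=int(env.get("BATCH_SIZE", "100")),
        batch_timeout_sec=float(env.get("BATCH_TIMEOUT_SEC", "5")),
        stream_pattern=env.get("STREAM_PATTERN", "mqtt_stream:*"),
    )
```

Passing the environment mapping as a parameter keeps the loader easy to unit-test without mutating `os.environ`.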
Data Flow
Input (Redis Streams)
The service reads from Redis streams with the format:
mqtt_stream:{device_id}:{sensor_type}
Each message contains:
{
"value": "23.5",
"timestamp": "2023-10-18T14:30:00Z",
"metadata": "{...}" (optional)
}
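Combining the stream key and message fields into a database row (the job of `schema.py`) could look roughly like this sketch; `message_to_row` is a hypothetical helper, not the service's actual API:

```python
from datetime import datetime


def message_to_row(stream_key: str, fields: dict) -> dict:
    """Split `mqtt_stream:{device_id}:{sensor_type}` and coerce field types."""
    prefix, device_id, sensor_type = stream_key.split(":", 2)
    if prefix != "mqtt_stream":
        raise ValueError(f"unexpected stream key: {stream_key}")
    return {
        "device_id": device_id,
        "sensor_type": sensor_type,
        # A non-numeric value raises ValueError, routing the message to the DLQ.
        "value": float(fields["value"]),
        # Normalize the trailing "Z" so fromisoformat yields an aware datetime.
        "timestamp": datetime.fromisoformat(fields["timestamp"].replace("Z", "+00:00")),
        "metadata": fields.get("metadata"),
    }
```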
Output (Database)
Data is written to the sensor_readings table:
CREATE TABLE sensor_readings (
id BIGSERIAL PRIMARY KEY,
timestamp TIMESTAMPTZ NOT NULL,
device_id VARCHAR(100) NOT NULL,
sensor_type VARCHAR(100) NOT NULL,
value DOUBLE PRECISION NOT NULL,
metadata JSONB,
created_at TIMESTAMPTZ DEFAULT NOW()
);
Note: The table is created by Alembic migrations (see Schema Management); the service does not create it automatically.
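A batch insert into this table can be sketched with a single `executemany` call. This is illustrative only; the real `db_writer.py` may use a different strategy, and `write_batch` is a hypothetical name.

```python
INSERT_SQL = (
    "INSERT INTO sensor_readings "
    "(timestamp, device_id, sensor_type, value, metadata) "
    "VALUES (%s, %s, %s, %s, %s)"
)


def write_batch(cursor, rows) -> int:
    """Insert a batch of row dicts in one executemany round trip.

    The cursor is any DB-API 2.0 cursor (e.g. from a psycopg connection pool).
    """
    params = [
        (r["timestamp"], r["device_id"], r["sensor_type"], r["value"], r.get("metadata"))
        for r in rows
    ]
    cursor.executemany(INSERT_SQL, params)
    return len(params)
```

Because the cursor is passed in, the same function works with a pooled connection in production and a stub in tests.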
Running with Docker
Build the image
docker build -t db-writer:latest .
Run the container
docker run -d \
--name db-writer \
-e REDIS_HOST=redis \
-e DATABASE_URL=postgresql://user:pass@postgres:5432/iot \
db-writer:latest
Consumer Groups
The service uses Redis consumer groups for reliable, distributed processing:
- Multiple instances: Run multiple workers for load balancing
- Fault tolerance: Messages are not lost if a consumer crashes
- Acknowledgments: Messages are only removed after successful processing
- Pending messages: Unacknowledged messages can be reclaimed
Running Multiple Workers
# Terminal 1
CONSUMER_NAME=worker-01 uv run main.py
# Terminal 2
CONSUMER_NAME=worker-02 uv run main.py
All workers in the same consumer group will share the load.
Error Handling
Dead Letter Queue
Failed messages are sent to the dead letter stream (mqtt_stream:failed) with error information:
{
"original_stream": "mqtt_stream:esp32:temperature",
"original_id": "1634567890123-0",
"device_id": "esp32",
"sensor_type": "temperature",
"value": "23.5",
"error": "Database connection failed",
"failed_at": "1634567890.123"
}
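An entry in this format might be built like so before being written to the failed stream with `XADD`; `dead_letter_entry` is a hypothetical helper, not the service's actual API:

```python
import time


def dead_letter_entry(stream_key: str, msg_id: str, fields: dict, error) -> dict:
    """Build the flat field map recorded for a failed message."""
    _, device_id, sensor_type = stream_key.split(":", 2)
    return {
        "original_stream": stream_key,
        "original_id": msg_id,
        "device_id": device_id,
        "sensor_type": sensor_type,
        "value": fields.get("value", ""),
        "error": str(error),
        # Stream fields are strings, so the epoch timestamp is formatted.
        "failed_at": f"{time.time():.3f}",
    }
```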
Retry Strategy
- Transient errors: Automatic retry with backoff
- Data errors: Immediate send to DLQ
- Connection errors: Reconnection attempts
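As a sketch of how errors could be classified and retry delays computed — the concrete exception types and backoff parameters here are assumptions, not the service's actual values:

```python
def backoff_delays(retries: int = 5, base: float = 0.5, cap: float = 30.0):
    """Exponential backoff delays for transient errors, capped at `cap` seconds."""
    return [min(cap, base * (2 ** i)) for i in range(retries)]


def is_transient(error: Exception) -> bool:
    """Data errors (bad payloads) go straight to the DLQ; everything else
    (connection failures, timeouts) is retried with backoff."""
    return not isinstance(error, (ValueError, KeyError, TypeError))
```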
Monitoring
Health Checks
Check service health programmatically:
from main import DatabaseWriterService
service = DatabaseWriterService()
health = service.health_check()
print(health)
# {
# 'running': True,
# 'redis': True,
# 'database': True,
# 'stats': {...}
# }
Logs
The service outputs structured logs:
{
"event": "Processed batch",
"rows_written": 100,
"messages_acknowledged": 100,
"timestamp": "2023-10-18T14:30:00Z",
"level": "info"
}
Statistics
Runtime statistics are tracked:
- `messages_read`: Total messages consumed
- `messages_written`: Total rows inserted
- `messages_failed`: Failed messages sent to the DLQ
- `batches_processed`: Number of successful batches
- `errors`: Total errors encountered
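These counters could be tracked with a simple dataclass along these lines (a sketch; the service's actual stats object is not shown):

```python
from dataclasses import dataclass, asdict


@dataclass
class Stats:
    """Running totals reported by the health check."""
    messages_read: int = 0
    messages_written: int = 0
    messages_failed: int = 0
    batches_processed: int = 0
    errors: int = 0

    def record_batch(self, read: int, written: int, failed: int) -> None:
        """Fold one batch's outcome into the running totals."""
        self.messages_read += read
        self.messages_written += written
        self.messages_failed += failed
        self.batches_processed += 1
```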
Development
Project Structure
db_write/
├── config.py # Configuration management
├── db_writer.py # Database operations
├── redis_reader.py # Redis stream consumer
├── schema.py # Data models and transformation
├── main.py # Service entry point
├── pyproject.toml # Dependencies
├── .env.example # Configuration template
└── README.md # This file
Adding Dependencies
uv add package-name
Running Tests
uv run pytest
Troubleshooting
Service won't start
- Check configuration: Verify all required environment variables are set
- Test connections: Ensure Redis and PostgreSQL are accessible
- Check logs: Look for specific error messages
No messages being processed
- Check streams exist: `redis-cli KEYS "mqtt_stream:*"`
- Verify the consumer group: the service creates it automatically, but check Redis logs
- Check the stream pattern: ensure `STREAM_PATTERN` matches your stream names
Messages going to dead letter queue
- Check the DLQ: `redis-cli XRANGE mqtt_stream:failed - + COUNT 10`
- Review error messages: each DLQ entry contains the error reason
- Validate the data format: ensure messages match the expected schema
High memory usage
- Reduce batch size: lower `BATCH_SIZE` in the configuration
- Check the connection pool: the pool size may need adjustment
- Monitor pending messages: use `XPENDING` to check the backlog
Performance Tuning
Throughput Optimization
- Increase batch size: Process more messages per batch
- Multiple workers: Run multiple consumer instances
- Connection pooling: Adjust pool size based on load
- Processing interval: Reduce delay between batches
Latency Optimization
- Decrease batch size: Process smaller batches more frequently
- Reduce timeout: lower `BATCH_TIMEOUT_SEC`
- Single worker: avoid consumer group coordination overhead
Production Deployment
Recommended Settings
BATCH_SIZE=500
PROCESSING_INTERVAL_SEC=0.1
LOG_LEVEL=INFO
LOG_FORMAT=json
ENABLE_TIMESCALE=true
Monitoring
- Monitor consumer lag using Redis `XPENDING`
- Alert on error rate > 5%
- Monitor DLQ depth
Scaling
- Horizontal: add more consumer instances, each with a unique `CONSUMER_NAME`
- Database: Use TimescaleDB for better time-series performance
License
Part of the IoT Dashboard project.