mirror of
https://github.com/ferdzo/iotDashboard.git
synced 2026-04-05 01:06:24 +00:00
Update README to remove features and architecture
Removed features and architecture sections from README.
A robust, production-ready service that reads sensor data from Redis streams and writes it to PostgreSQL/TimescaleDB. Part of the IoT Dashboard project.

## Features

- ✅ **Reliable consumption** from Redis streams using consumer groups
- ✅ **Batch processing** for high throughput
- ✅ **At-least-once delivery** with message acknowledgments
- ✅ **Dead letter queue** for failed messages
- ✅ **Connection pooling** for database efficiency
- ✅ **Graceful shutdown** handling
- ✅ **Flexible schema** that adapts to changes
- ✅ **Structured logging** with JSON output
- ✅ **Health checks** for monitoring
- ✅ **TimescaleDB support** for time-series optimization
## Architecture

```
Redis Streams → Consumer Group → Transform → Database → Acknowledge
                                     ↓
                              Failed messages
                                     ↓
                             Dead Letter Queue
```

### Components

- **`main.py`**: Service orchestration and processing loop
- **`redis_reader.py`**: Redis stream consumer with fault tolerance
- **`db_writer.py`**: Database operations with connection pooling
- **`schema.py`**: Data transformation and validation
- **`config.py`**: Configuration management
## Quick Start

### Prerequisites

- Python 3.13+
- [uv](https://github.com/astral-sh/uv) package manager
- Redis server with streams
- PostgreSQL or TimescaleDB

### Installation

1. **Navigate to the service directory**:
   ```bash
   cd services/db_write
   ```

2. **Copy and configure environment variables**:
   ```bash
   cp .env.example .env
   # Edit .env with your DATABASE_URL and other settings
   ```

3. **Install dependencies**:
   ```bash
   uv sync
   ```

4. **Set up the database schema** (IMPORTANT: do this before running):
   ```bash
   # Review the schema in models.py first
   cat models.py

   # Create the initial migration
   chmod +x migrate.sh
   ./migrate.sh create "initial schema"

   # Review the generated migration
   ls -lt alembic/versions/

   # Apply migrations
   ./migrate.sh upgrade
   ```

5. **Run the service**:
   ```bash
   uv run main.py
   ```

   Or use the standalone script:
   ```bash
   chmod +x run-standalone.sh
   ./run-standalone.sh
   ```
### ⚠️ Important: Schema Management

This service uses **Alembic** for database migrations. The service will NOT create tables automatically.

- The schema is defined in `models.py`
- Migrations are managed with `./migrate.sh` or `alembic` commands
- See `SCHEMA_MANAGEMENT.md` for a detailed guide

## Schema Management

This service uses **SQLAlchemy** for models and **Alembic** for migrations.

### Key Files

- **`models.py`**: Define your database schema here (SQLAlchemy models)
- **`alembic/`**: Migration scripts directory
- **`migrate.sh`**: Helper script for common migration tasks
- **`SCHEMA_MANAGEMENT.md`**: Comprehensive migration guide

### Quick Migration Commands

```bash
# Create a new migration after editing models.py
./migrate.sh create "add new column"

# Apply pending migrations
./migrate.sh upgrade

# Check migration status
./migrate.sh check

# View migration history
./migrate.sh history

# Roll back the last migration
./migrate.sh downgrade 1
```

**See `SCHEMA_MANAGEMENT.md` for detailed documentation.**
## Configuration

All configuration is done via environment variables. See `.env.example` for all available options.

### Required Settings

```bash
# Redis connection
REDIS_HOST=localhost
REDIS_PORT=6379

# Database connection
DATABASE_URL=postgresql://user:password@localhost:5432/iot_dashboard
```

### Optional Settings

```bash
# Consumer configuration
CONSUMER_GROUP_NAME=db_writer    # Consumer group name
CONSUMER_NAME=worker-01          # Unique consumer name
BATCH_SIZE=100                   # Messages per batch
BATCH_TIMEOUT_SEC=5              # Read timeout
PROCESSING_INTERVAL_SEC=1        # Delay between batches

# Stream configuration
STREAM_PATTERN=mqtt_stream:*     # Stream name pattern
DEAD_LETTER_STREAM=mqtt_stream:failed

# Database
TABLE_NAME=sensor_readings       # Target table name
ENABLE_TIMESCALE=false           # Use TimescaleDB features

# Logging
LOG_LEVEL=INFO                   # DEBUG, INFO, WARNING, ERROR
LOG_FORMAT=json                  # json or console
```
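Settings like these are typically read once at startup into a single object. A minimal sketch of environment-based loading, with defaults matching the documented values (the `ServiceConfig` class and `load_config` function are illustrative, not the actual `config.py` API):

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceConfig:
    # Hypothetical config holder; the real config.py may differ.
    redis_host: str
    redis_port: int
    database_url: str
    batch_size: int
    stream_pattern: str


def load_config() -> ServiceConfig:
    """Read settings from the environment, falling back to the documented defaults."""
    return ServiceConfig(
        redis_host=os.environ.get("REDIS_HOST", "localhost"),
        redis_port=int(os.environ.get("REDIS_PORT", "6379")),
        database_url=os.environ["DATABASE_URL"],  # required: no default, raises KeyError if unset
        batch_size=int(os.environ.get("BATCH_SIZE", "100")),
        stream_pattern=os.environ.get("STREAM_PATTERN", "mqtt_stream:*"),
    )
```

Keeping the required `DATABASE_URL` lookup as a plain indexing operation means a missing value fails loudly at startup rather than mid-run.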
## Data Flow

### Input (Redis Streams)

The service reads from Redis streams named with the format:

```
mqtt_stream:{device_id}:{sensor_type}
```

Each message contains a `value`, a `timestamp`, and an optional `metadata` field (a JSON-encoded string):

```json
{
  "value": "23.5",
  "timestamp": "2023-10-18T14:30:00Z",
  "metadata": "{...}"
}
```
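Because the device and sensor are encoded in the stream name rather than the message body, the transform step has to recover them from the key before a row can be built. A rough sketch of that step (the function names are illustrative, not the actual `schema.py` API):

```python
from datetime import datetime


def parse_stream_key(stream: str) -> tuple[str, str]:
    """Split 'mqtt_stream:{device_id}:{sensor_type}' into its two parts."""
    prefix, device_id, sensor_type = stream.split(":", 2)
    if prefix != "mqtt_stream":
        raise ValueError(f"unexpected stream name: {stream!r}")
    return device_id, sensor_type


def to_row(stream: str, message: dict) -> dict:
    """Combine the stream key and message fields into one database row."""
    device_id, sensor_type = parse_stream_key(stream)
    return {
        "device_id": device_id,
        "sensor_type": sensor_type,
        "value": float(message["value"]),  # stored as DOUBLE PRECISION
        # fromisoformat does not accept a trailing 'Z', so normalize it first
        "timestamp": datetime.fromisoformat(message["timestamp"].replace("Z", "+00:00")),
        "metadata": message.get("metadata"),  # optional JSON string, may be absent
    }
```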
### Output (Database)

Data is written to the `sensor_readings` table:

```sql
CREATE TABLE sensor_readings (
    id BIGSERIAL PRIMARY KEY,
    timestamp TIMESTAMPTZ NOT NULL,
    device_id VARCHAR(100) NOT NULL,
    sensor_type VARCHAR(100) NOT NULL,
    value DOUBLE PRECISION NOT NULL,
    metadata JSONB,
    created_at TIMESTAMPTZ DEFAULT NOW()
);
```

**Note**: The table is created by the Alembic migrations, not by the service itself (see the Schema Management section above).
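Rows are inserted in batches rather than one at a time. The idea can be sketched with the stdlib `sqlite3` module standing in for PostgreSQL (the real service uses a pooled PostgreSQL connection; the DDL below is a simplified stand-in for the schema above, and `write_batch` is illustrative, not the `db_writer.py` API):

```python
import sqlite3

# SQLite stand-in for the PostgreSQL table (types simplified for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sensor_readings (
        id INTEGER PRIMARY KEY,
        timestamp TEXT NOT NULL,
        device_id TEXT NOT NULL,
        sensor_type TEXT NOT NULL,
        value REAL NOT NULL,
        metadata TEXT
    )
""")


def write_batch(conn: sqlite3.Connection, rows: list[dict]) -> int:
    """Insert a whole batch with one executemany call and one commit."""
    conn.executemany(
        "INSERT INTO sensor_readings (timestamp, device_id, sensor_type, value, metadata) "
        "VALUES (:timestamp, :device_id, :sensor_type, :value, :metadata)",
        rows,
    )
    conn.commit()
    return len(rows)


batch = [
    {"timestamp": "2023-10-18T14:30:00Z", "device_id": "esp32",
     "sensor_type": "temperature", "value": 23.5, "metadata": None},
    {"timestamp": "2023-10-18T14:30:05Z", "device_id": "esp32",
     "sensor_type": "temperature", "value": 23.6, "metadata": None},
]
written = write_batch(conn, batch)
```

Committing once per batch rather than once per row is where most of the throughput gain comes from.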
## Running with Docker

### Build the image

```bash
docker build -t db-writer:latest .
```

### Run the container

```bash
docker run -d \
  --name db-writer \
  -e REDIS_HOST=redis \
  -e DATABASE_URL=postgresql://user:pass@postgres:5432/iot \
  db-writer:latest
```
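In practice the container runs alongside Redis and PostgreSQL. A hedged `docker-compose` sketch of that wiring (the service names, image tags, and credentials are assumptions for illustration, not from this repo):

```yaml
services:
  redis:
    image: redis:7
  postgres:
    image: timescale/timescaledb:latest-pg16
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
      POSTGRES_DB: iot
  db-writer:
    build: .
    environment:
      REDIS_HOST: redis
      DATABASE_URL: postgresql://user:pass@postgres:5432/iot
    depends_on:
      - redis
      - postgres
```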
## Consumer Groups

The service uses Redis consumer groups for reliable, distributed processing:

- **Multiple instances**: Run multiple workers for load balancing
- **Fault tolerance**: Messages are not lost if a consumer crashes
- **Acknowledgments**: Messages are only removed after successful processing
- **Pending messages**: Unacknowledged messages can be reclaimed

### Running Multiple Workers

```bash
# Terminal 1
CONSUMER_NAME=worker-01 uv run main.py

# Terminal 2
CONSUMER_NAME=worker-02 uv run main.py
```

All workers in the same consumer group share the load.
## Error Handling

### Dead Letter Queue

Failed messages are sent to the dead letter stream (`mqtt_stream:failed`) along with error information:

```json
{
  "original_stream": "mqtt_stream:esp32:temperature",
  "original_id": "1634567890123-0",
  "device_id": "esp32",
  "sensor_type": "temperature",
  "value": "23.5",
  "error": "Database connection failed",
  "failed_at": "1634567890.123"
}
```

### Retry Strategy

- **Transient errors**: Automatic retry with backoff
- **Data errors**: Sent straight to the DLQ
- **Connection errors**: Reconnection attempts
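The split between retrying and dead-lettering comes down to classifying the exception. A sketch of that decision plus the DLQ payload shown above (the helper names and the exact set of transient error classes are illustrative, not the actual service API):

```python
import time

# Error classes worth retrying; anything else is treated as a data error.
TRANSIENT_ERRORS = (ConnectionError, TimeoutError)


def should_retry(exc: Exception) -> bool:
    """Transient errors get retried with backoff; data errors go to the DLQ."""
    return isinstance(exc, TRANSIENT_ERRORS)


def dead_letter_entry(stream: str, message_id: str, fields: dict, exc: Exception) -> dict:
    """Build a DLQ payload like the one shown above from a failed message."""
    device_id, sensor_type = stream.split(":", 2)[1:]
    return {
        "original_stream": stream,
        "original_id": message_id,
        "device_id": device_id,
        "sensor_type": sensor_type,
        "value": fields.get("value", ""),
        "error": str(exc),
        "failed_at": f"{time.time():.3f}",
    }
```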
## Monitoring

### Health Checks

Check service health programmatically:

```python
from main import DatabaseWriterService

service = DatabaseWriterService()
health = service.health_check()
print(health)
# {
#     'running': True,
#     'redis': True,
#     'database': True,
#     'stats': {...}
# }
```

### Logs

The service outputs structured logs:

```json
{
  "event": "Processed batch",
  "rows_written": 100,
  "messages_acknowledged": 100,
  "timestamp": "2023-10-18T14:30:00Z",
  "level": "info"
}
```

### Statistics

The following runtime statistics are tracked:

- `messages_read`: Total messages consumed
- `messages_written`: Total rows inserted
- `messages_failed`: Failed messages sent to the DLQ
- `batches_processed`: Number of successful batches
- `errors`: Total errors encountered
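Counters like these are cheap to keep as a plain dict updated once per processing loop iteration. A minimal sketch (the field names match the list above; the update helper itself is illustrative, not the service's internal code):

```python
def new_stats() -> dict:
    """Fresh counter set matching the statistics listed above."""
    return {
        "messages_read": 0,
        "messages_written": 0,
        "messages_failed": 0,
        "batches_processed": 0,
        "errors": 0,
    }


def record_batch(stats: dict, read: int, written: int, failed: int) -> None:
    """Update the counters after one batch completes."""
    stats["messages_read"] += read
    stats["messages_written"] += written
    stats["messages_failed"] += failed
    stats["batches_processed"] += 1
    stats["errors"] += failed  # each DLQ'd message also counts as an error
```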
## Development

### Project Structure

```
db_write/
├── config.py          # Configuration management
├── db_writer.py       # Database operations
├── redis_reader.py    # Redis stream consumer
├── schema.py          # Data models and transformation
├── main.py            # Service entry point
├── pyproject.toml     # Dependencies
├── .env.example       # Configuration template
└── README.md          # This file
```

### Adding Dependencies

```bash
uv add package-name
```

### Running Tests

```bash
uv run pytest
```
## Troubleshooting

### Service won't start

1. **Check configuration**: Verify all required environment variables are set
2. **Test connections**: Ensure Redis and PostgreSQL are accessible
3. **Check logs**: Look for specific error messages

### No messages being processed

1. **Check that streams exist**: `redis-cli KEYS "mqtt_stream:*"`
2. **Verify the consumer group**: The service creates it automatically, but check the Redis logs
3. **Check the stream pattern**: Ensure `STREAM_PATTERN` matches your stream names

### Messages going to the dead letter queue

1. **Check the DLQ**: `redis-cli XRANGE mqtt_stream:failed - + COUNT 10`
2. **Review error messages**: Each DLQ entry contains the error reason
3. **Validate the data format**: Ensure messages match the expected schema

### High memory usage

1. **Reduce the batch size**: Lower `BATCH_SIZE` in the configuration
2. **Check the connection pool**: You may need to adjust the pool size
3. **Monitor pending messages**: Use `XPENDING` to check the backlog
## Performance Tuning

### Throughput Optimization

- **Increase batch size**: Process more messages per batch
- **Multiple workers**: Run multiple consumer instances
- **Connection pooling**: Adjust the pool size based on load
- **Processing interval**: Reduce the delay between batches

### Latency Optimization

- **Decrease batch size**: Process smaller batches more frequently
- **Reduce the timeout**: Lower `BATCH_TIMEOUT_SEC`
- **Single worker**: Avoid consumer group coordination overhead
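The trade-off can be put in rough numbers. A back-of-the-envelope model, assuming each loop iteration costs one batch read, one batch write, and the configured sleep (the per-operation times below are made-up placeholders, not measurements of this service):

```python
def batches_per_second(read_sec: float, write_sec: float, interval_sec: float) -> float:
    """One loop iteration = read + write + sleep between batches."""
    return 1.0 / (read_sec + write_sec + interval_sec)


def throughput(batch_size: int, read_sec: float, write_sec: float, interval_sec: float) -> float:
    """Messages per second for a single worker under this model."""
    return batch_size * batches_per_second(read_sec, write_sec, interval_sec)


# Placeholder costs: 50 ms to read a batch, 50 ms to write it.
small = throughput(batch_size=100, read_sec=0.05, write_sec=0.05, interval_sec=1.0)
large = throughput(batch_size=500, read_sec=0.05, write_sec=0.05, interval_sec=0.1)
```

Under these assumed costs, larger batches with a shorter processing interval raise single-worker throughput by well over an order of magnitude, which is why the production settings differ from the development defaults.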
## Production Deployment

### Recommended Settings

```bash
BATCH_SIZE=500
PROCESSING_INTERVAL_SEC=0.1
LOG_LEVEL=INFO
LOG_FORMAT=json
ENABLE_TIMESCALE=true
```

### Monitoring

- Monitor consumer lag using Redis `XPENDING`
- Track database insert latency
- Alert when the error rate exceeds 5%
- Monitor DLQ depth

### Scaling

1. **Horizontal**: Add more consumer instances, each with a unique `CONSUMER_NAME`
2. **Vertical**: Increase resources for database writes
3. **Database**: Use TimescaleDB for better time-series performance
## License

Part of the IoT Dashboard project.