Saving GTFS-RT data to Parquet
235
README.md
@@ -1,217 +1,70 @@
# Skopje Bus Tracker
# OpenJSP Bus Tracker

Real-time bus tracking for Skopje public transport. Modular system supporting any stop and route.
Real-time Skopje public transport tracking with Bun, GTFS/GTFS-RT ingestion, parquet persistence, and optional S3-compatible segment upload.

## What Is In This Repo

- `bus-tracker-json.ts`: terminal tracker for one stop + one route.
- `background-tracker.ts`: continuous collector for multiple routes/stops.
- `lib/database.ts`: parquet write layer with rolling segments and optional S3 upload.
- `lib/gtfs.ts`: GTFS CSV loading helpers.
- `config.ts`: API base URL, defaults, and tracker timing.

## Requirements

- Bun 1.x+
- Network access to the configured GTFS/JSON upstream APIs

## Quick Start

```bash
npm install
npm run setup-gtfs # Download latest GTFS data
npm run web
bun install
bun run typecheck
```

Open **http://localhost:3000**

Visit **http://localhost:3000/analytics.html** for historical data and performance analytics.

## TimescaleDB Setup

The application uses TimescaleDB for storing time-series data (vehicle positions, arrivals, delays).

### Start the database:
Run single stop/route terminal tracker:

```bash
cd infrastructure
docker compose up -d
bun run tracker
```

### Configure environment:

Create a `.env` file (or use the defaults):
Run with custom stop and route IDs:

```bash
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DB=iot_data
POSTGRES_USER=postgres
POSTGRES_PASSWORD=example
bun run tracker -- --stop 1571 --route 125
```

The database will automatically:
- Create hypertables for efficient time-series queries
- Set up compression and retention policies (90 days)
- Build continuous aggregates for hourly metrics
- Index data for fast queries

### Analytics Features:

- **Vehicle Position History**: Track individual buses over time
- **Delay Analysis**: On-time performance, average delays, patterns
- **Hourly Patterns**: See when buses are typically late/early
- **Route Statistics**: Reliability scores, service quality metrics
- **Stop Performance**: Compare delays across different stops

### Background Tracker:

For continuous data collection without keeping the web interface open:
Run background collection pipeline:

```bash
npm run track
bun run track
```

This automatically tracks these popular routes every 30 seconds:
- Routes: 2, 4, 5, 7, 15, 21, 22, 24
- Private routes: 12П, 19П, 22П, 45П, 52П, 54П, 61П, 9П
## Environment

Data is stored in TimescaleDB for historical analysis. The tracker runs indefinitely until stopped with Ctrl+C.
Copy `.env.example` to `.env` and adjust values as needed.

## Features
Key variables:

- **Fully Modular Web Interface**: Select any stop and route via UI controls or URL parameters
- **Dynamic Tracking**: Change stops/routes without restarting the server
- Interactive map with live vehicle positions
- Real-time arrivals with delays
- **Time-Series Data Storage**: Historical tracking with TimescaleDB
- **Analytics Dashboard**: Delay statistics, hourly patterns, performance metrics
- 5-second auto-refresh (web), 10-second (terminal)
- CLI arguments for terminal tracker
- Configurable defaults via [config.ts](config.ts)
- Shareable URLs with stop/route parameters
- `PARQUET_DIR`: local output directory for parquet files.
- `PARQUET_ROLL_MINUTES`: segment rotation interval.
- `SAVE_ALL_VEHICLE_SNAPSHOTS`: save full raw vehicle feed snapshots.
- `SAVE_ALL_VEHICLE_POSITIONS`: persist all vehicle positions (not only route-matched).
- `S3_ENABLED`: enable object storage upload.
- `S3_BUCKET`, `S3_REGION`, `S3_ENDPOINT`, `S3_PREFIX`: object storage target.
- `S3_ACCESS_KEY_ID`, `S3_SECRET_ACCESS_KEY`: object storage credentials.
- `S3_DELETE_LOCAL_AFTER_UPLOAD`: delete local parquet after successful upload.
- `S3_UPLOAD_RETRIES`, `S3_UPLOAD_RETRY_BASE_MS`: upload retry behavior.
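
As a rough illustration of how a few of these variables might be read, here is a minimal sketch. The variable names come from the list above; the `StorageConfig` shape, the helper names, the default values, and the exponential backoff formula are assumptions for illustration and are not taken from `lib/database.ts`.

```typescript
// Hypothetical config shape; field names and defaults are assumptions.
interface StorageConfig {
  parquetDir: string;
  rollMinutes: number;
  s3Enabled: boolean;
  s3UploadRetries: number;
  s3UploadRetryBaseMs: number;
}

type Env = Record<string, string | undefined>;

// Treat a small set of common spellings as "true"; unset or empty uses the fallback.
function parseBool(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined || value.trim() === "") return fallback;
  return ["1", "true", "yes", "on"].includes(value.trim().toLowerCase());
}

// Parse a base-10 integer that must be >= min; anything else uses the fallback.
function parseIntAtLeast(value: string | undefined, min: number, fallback: number): number {
  const n = Number.parseInt(value ?? "", 10);
  return Number.isFinite(n) && n >= min ? n : fallback;
}

function loadStorageConfig(env: Env): StorageConfig {
  return {
    parquetDir: env["PARQUET_DIR"] ?? "./data", // assumed default
    rollMinutes: parseIntAtLeast(env["PARQUET_ROLL_MINUTES"], 1, 60), // assumed default
    s3Enabled: parseBool(env["S3_ENABLED"], false),
    s3UploadRetries: parseIntAtLeast(env["S3_UPLOAD_RETRIES"], 0, 3), // assumed default
    s3UploadRetryBaseMs: parseIntAtLeast(env["S3_UPLOAD_RETRY_BASE_MS"], 1, 500), // assumed default
  };
}

// One plausible reading of the two retry variables: exponential backoff,
// waiting baseMs, 2*baseMs, 4*baseMs, ... before retry attempt 0, 1, 2, ...
function retryDelayMs(attempt: number, baseMs: number): number {
  return baseMs * 2 ** attempt;
}
```

Under Bun or Node, `loadStorageConfig(process.env)` would be the typical call. Taking the environment as a parameter keeps the parsing easy to test.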

## Commands
## Scripts

```bash
npm run setup-gtfs # Download GTFS data
npm run find -- --stop "american" # Find stop IDs by name
npm run find -- --route "7" # Find route IDs by number/name
npm run web # Web interface at http://localhost:3000
npm run tracker # Terminal interface (default)
npm run tracker -- --stop 1571 --route 125 # Custom stop/route
npm run track # Background tracker for popular routes (30s intervals)
npm start # Same as web
```
- `bun run start`: alias for the terminal tracker.
- `bun run tracker`: terminal tracker.
- `bun run track`: background collector.
- `bun run typecheck`: TypeScript no-emit check.

### Finding Stop and Route IDs
## Notes

Not sure which Stop ID or Route ID to use? Use the find command:

```bash
# Find stops by name (case-insensitive)
npm run find -- --stop "american"
npm run find -- --stop "центар"

# Find routes by number or name
npm run find -- --route "7"
npm run find -- --route "линија"
```
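
The case-insensitive name lookup described above can be sketched as a substring match over GTFS stops. The `stop_id` and `stop_name` fields are standard GTFS `stops.txt` columns. The `findStops` helper and the second sample stop are hypothetical, and nothing here claims to mirror the repository's find script.

```typescript
// Minimal subset of a GTFS stops.txt row.
interface Stop {
  stop_id: string;
  stop_name: string;
}

// Case-insensitive substring match on the stop name.
function findStops(stops: Stop[], query: string): Stop[] {
  const needle = query.toLocaleLowerCase();
  return stops.filter((s) => s.stop_name.toLocaleLowerCase().includes(needle));
}

const sampleStops: Stop[] = [
  { stop_id: "1571", stop_name: "АМЕРИКАН КОЛЕЏ-КОН ЦЕНТАР" },
  { stop_id: "9001", stop_name: "Hypothetical Stop" }, // made-up entry for the example
];

const matches = findStops(sampleStops, "центар");
```

Lowercasing both sides lets a Cyrillic query such as `центар` match the upper-case stop name.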

### Web Interface Usage

1. **Default tracking**: Open `http://localhost:3000` (loads default stop/route, can be changed in UI)
2. **Direct URL**: `http://localhost:3000?stopId=1571&routeId=125` (bookmarkable)
3. **Change tracking**: Use the controls at the top to enter different Stop ID and Route ID
4. **Share**: Copy URL after selecting a stop/route to share with others

### CLI Arguments

Terminal tracker supports custom stop and route:

```bash
npm run tracker -- --stop <stopId> --route <routeId>
npm run tracker -- --help
```
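
For illustration, flags like `--stop`, `--route`, and `--help` could be parsed roughly as follows. This is a sketch under assumptions: `parseTrackerArgs` and `TrackerArgs` are hypothetical names, and the actual parsing in `bus-tracker-json.ts` may differ.

```typescript
interface TrackerArgs {
  stopId?: string;
  routeId?: string;
  help: boolean;
}

// Walk the argument list once, consuming the value that follows --stop / --route.
function parseTrackerArgs(argv: string[]): TrackerArgs {
  const args: TrackerArgs = { help: false };
  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    if (flag === "--help") {
      args.help = true;
      continue;
    }
    if (flag === "--stop" || flag === "--route") {
      const value = argv[i + 1];
      if (value === undefined || value.startsWith("--")) {
        throw new Error(`Missing value for ${flag}`);
      }
      if (flag === "--stop") args.stopId = value;
      else args.routeId = value;
      i++; // skip the value that was just consumed
    }
    // Unknown arguments are ignored in this sketch.
  }
  return args;
}

const parsed = parseTrackerArgs(["--stop", "1571", "--route", "125"]);
```

In Bun or Node the input would typically be `process.argv.slice(2)`. When a flag is absent, the caller can fall back to the defaults in `config.ts`.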

### API Endpoints

**This Application's API:**
- Complete docs: **[API-DOCUMENTATION.md](API-DOCUMENTATION.md)**
- Interactive docs: http://localhost:3000/api-docs.html (when server is running)
- OpenAPI spec: **[openapi.yaml](openapi.yaml)**

**Upstream ModeShift GTFS API:**
- Documentation: **[UPSTREAM-API-DOCUMENTATION.md](UPSTREAM-API-DOCUMENTATION.md)**
- Provider: ModeShift (Skopje public transport data)

#### Quick Reference

Query parameters for custom tracking:

```
GET /api/config?stopId=1571&routeId=125
GET /api/arrivals?stopId=1571&routeId=125
GET /api/vehicles?routeId=125
GET /api/stops # All stops
GET /api/routes # All routes

# Historical Data APIs
GET /api/stats/db # Database statistics
GET /api/history/vehicle/:vehicleId?hours=24
GET /api/history/route/:routeId/vehicles?hours=24
GET /api/history/stop/:stopId/arrivals?routeId=125&hours=24
GET /api/stats/route/:routeId/delays?hours=24
GET /api/stats/stop/:stopId/delays?hours=24
GET /api/stats/route/:routeId/hourly?days=7
```
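
A client could assemble one of the historical endpoints above with the standard `URL` API. The path and query parameter names are taken from the quick reference. The helper name `stopArrivalsHistoryUrl` is hypothetical, and the base URL assumes the local server on port 3000.

```typescript
// Build /api/history/stop/:stopId/arrivals?routeId=...&hours=... against a base URL.
function stopArrivalsHistoryUrl(
  base: string,
  stopId: string,
  routeId: string,
  hours: number = 24,
): string {
  const url = new URL(`/api/history/stop/${encodeURIComponent(stopId)}/arrivals`, base);
  url.searchParams.set("routeId", routeId);
  url.searchParams.set("hours", String(hours));
  return url.toString();
}

const historyUrl = stopArrivalsHistoryUrl("http://localhost:3000", "1571", "125");
// historyUrl is "http://localhost:3000/api/history/stop/1571/arrivals?routeId=125&hours=24"
```

Using `searchParams.set` rather than string concatenation means route IDs containing non-ASCII characters are percent-encoded automatically.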

## Configuration

Edit [config.ts](config.ts) to set defaults:

```typescript
export const config: AppConfig = {
  defaultStop: {
    stopId: '1571',
    name: 'АМЕРИКАН КОЛЕЏ-КОН ЦЕНТАР',
    lat: 41.98057556152344,
    lon: 21.457794189453125,
  },
  defaultRoute: {
    routeId: '125',
    shortName: '7',
    name: 'ЛИНИЈА 7',
  },
  server: {
    port: 3000,
  },
  tracking: {
    refreshInterval: {
      web: 5000, // 5 seconds
      terminal: 10000, // 10 seconds
    },
    minutesAhead: 90,
  }, + analytics)
├── bus-tracker-json.ts # Terminal tracker (CLI args)
├── lib/
│   ├── gtfs.ts # GTFS loader
│   └── database.ts # TimescaleDB time-series storage
├── public/
│   ├── index.html # Live tracker UI
│   └── analytics.html # Analytics dashboard
├── infrastructure/
│   └── compose.yml # TimescaleDB Docker setup
└── gtfs/ ure

```
bus/
├── config.ts # Configuration (stops, routes, timing)
├── setup-gtfs.ts # GTFS data downloader
├── find-stops-routes.ts # Helper to find Stop/Route IDs
├── server.ts # Web server (modular API)
├── bus-tracker-json.ts # Terminal tracker (CLI args)
├── lib/gtfs.ts # GTFS loader
├── public/index.html # Frontend (modular UI)
└─**TimescaleDB (PostgreSQL)** for time-series data
- Leaflet.js + OpenStreetMap
- Chart.js for analytics visualizations
- GTFS + GTFS-RT Protocol Buffers
- Docker Compose for database

## Stack

- Node.js + Express + TypeScript
- Leaflet.js + OpenStreetMap
- GTFS + GTFS-RT Protocol Buffers

## License

MIT
- Generated parquet files are intentionally ignored by git (`data/*.parquet`).
- The background tracker rotates segments and uploads each closed segment when S3 is enabled.
- On process shutdown (`SIGINT`/`SIGTERM`), writers are flushed so the current segment is finalized.
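
The rotation and shutdown behavior in these notes can be sketched with fixed-width time windows. This is an assumption about how rolling segments might work, not a description of `lib/database.ts`: `segmentStartMs` and `SegmentTracker` are hypothetical names, and the sketch assumes timestamps arrive in non-decreasing order.

```typescript
// Start (epoch ms) of the fixed-width window that contains timestampMs.
function segmentStartMs(timestampMs: number, rollMinutes: number): number {
  const widthMs = rollMinutes * 60 * 1000;
  return Math.floor(timestampMs / widthMs) * widthMs;
}

// Tracks which segment is currently open and reports when one closes.
class SegmentTracker {
  private currentStart: number | null = null;
  private readonly rollMinutes: number;

  constructor(rollMinutes: number) {
    this.rollMinutes = rollMinutes;
  }

  // Returns the start of the segment that just closed (ready for upload),
  // or null if the record belongs to the segment that is already open.
  observe(timestampMs: number): number | null {
    const start = segmentStartMs(timestampMs, this.rollMinutes);
    if (this.currentStart === null) {
      this.currentStart = start;
      return null;
    }
    if (start === this.currentStart) return null;
    const closed = this.currentStart;
    this.currentStart = start;
    return closed;
  }

  // Finalize whatever is open, e.g. from a SIGINT/SIGTERM handler.
  flush(): number | null {
    const open = this.currentStart;
    this.currentStart = null;
    return open;
  }
}
```

A shutdown handler registered with `process.on("SIGINT", ...)` and `process.on("SIGTERM", ...)` could call `flush()` and write out the returned segment before exiting.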