Event-Driven Platform for Federal Logistics Operator

Migration of a legacy monolith to an event-driven architecture (EDA): MQTT for telemetry ingestion, the Transactional Outbox pattern, and protection of the ERP core via an Anti-Corruption Layer.

50k+
Telemetry events per second (throughput)
< 50 ms
Internal processing latency
Delivery Guarantee
At-least-once semantics with idempotent processing
6x
Network traffic reduction

Executive Summary

This document presents the architectural validation and results of migrating a federal 3PL operator's logistics platform (fleet of 5,000+ vehicles) from a monolithic PHP architecture to an event-driven system (EDA) built on a Go/Kafka/MQTT stack.

Key Business Results:

  • Throughput: growth from 800 to 50,000+ RPS — capacity for 10x fleet scaling
  • Operating Costs: 6x reduction in mobile traffic costs (~$200K/year savings on 5000 vehicle fleet)
  • Reliability: transition from 98.5% SLA to 99.9% (from ~130 hours downtime/year to ~9 hours). Architectural readiness for 99.99% with Multi-AZ deployment
  • Investment Protection: Legacy ERP (SAP) preserved, load reduced by 85%

Downtime Cost in Logistics: According to Gartner, one hour of downtime for a large logistics enterprise costs $100,000–$300,000. The implemented architecture eliminates cascading failures typical of monolithic systems.


1. Problem Statement: Legacy Architecture Risk Analysis

1.1 Initial System State

The customer operated a distributed monolith on LAMP stack (PHP 7.4 / MySQL 5.7 / Apache). GPS trackers sent coordinates via direct HTTP POST requests to REST API, which synchronously wrote data to the main transactional database.

1.2 Identified Technical Risks

Table 1. Legacy Architecture Risk Matrix

| Risk Category | Manifestation | Business Consequences |
| --- | --- | --- |
| DB Locks | Lock Wait Timeout during telemetry/manager transaction contention | Order processing failures during peak hours |
| Race Conditions | Status conflicts: cargo "Delivered" but "Not Shipped" | Financial discrepancies, customer claims |
| Data Loss | GPS point loss during connection breaks (tunnels, highways) | Route reconstruction impossible, insurance disputes |
| Thundering Herd | Simultaneous reconnection of 5,000 devices after network failure | Cascading system failure |
| Vertical Scaling Limit | PHP-FPM: 20–30 MB RAM per request × 1,000 workers ≈ 30 GB RAM | Exponential infrastructure cost growth |

1.3 Load Quantitative Assessment

Throughput calculation for High-Frequency Telematics:

Polling frequency: 10 Hz (GPS + accelerometer + CAN-bus)
Fleet: 5,000 vehicles
Target growth: 20,000 vehicles
 
Current load: 10 × 5,000 = 50,000 events/sec
Target load: 10 × 20,000 = 200,000 events/sec

10 Hz Frequency Justification: This isn't just GPS tracking. The system collects:

  • Raw accelerometer data — shock detection, potholes, sudden maneuvers for insurance scoring
  • CAN-bus data — engine RPM, pedal positions, fuel consumption
  • ML driving style scoring — requires granular data for accurate accident reconstruction

This is a competitive advantage: a standard GPS tracker (0.2 Hz, one point every 5 seconds) doesn't allow building driver-behavior ML models. For regular monitoring the data is downsampled, but the raw stream is stored for insurance cases and retrospective analysis.

Conclusion: The synchronous blocking I/O model of the original monolith (limit ~800 RPS) had exhausted its scaling headroom. Architectural transformation was required.


2. Architectural Solutions

2.1 Design Principles

We separated data flows into a Hot Path (real-time telemetry) and a Cold Path (reporting, ERP synchronization), implementing the Event-Driven Architecture (EDA) pattern.

2.2 Target Architecture Components

Key Components:

  • MQTT Broker: Lightweight protocol for IoT devices with QoS guarantees
  • Apache Kafka: Event streaming platform for durable message storage
  • Go Services: High-performance microservices for event processing
  • ClickHouse: Columnar database for telemetry analytics
  • Anti-Corruption Layer: Protection layer for legacy ERP integration

2.3 Transactional Outbox Pattern

To ensure reliable delivery between services (at-least-once transport made effectively exactly-once by idempotent processing, as noted above), we implemented the Transactional Outbox pattern:

  1. Business transaction and outbox event written atomically to PostgreSQL
  2. CDC (Debezium) captures outbox events and publishes to Kafka
  3. Consumers process events with idempotency keys
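Step 3 above, idempotent processing, can be sketched as a consumer that records each event's idempotency key before applying side effects, so a redelivered Kafka message is a no-op. A minimal sketch (an in-memory map stands in for a durable processed-keys store such as a DB table; all names are illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

// OutboxEvent carries an idempotency key assigned when the event
// was written to the outbox table (e.g. the outbox row UUID).
type OutboxEvent struct {
	Key     string
	Payload string
}

// IdempotentConsumer skips events whose key was already processed,
// so at-least-once redelivery never applies a side effect twice.
type IdempotentConsumer struct {
	mu   sync.Mutex
	seen map[string]bool // stand-in for a durable processed-keys store
}

func NewIdempotentConsumer() *IdempotentConsumer {
	return &IdempotentConsumer{seen: make(map[string]bool)}
}

// Handle returns true if the event was applied, false if deduplicated.
func (c *IdempotentConsumer) Handle(e OutboxEvent, apply func(OutboxEvent)) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.seen[e.Key] {
		return false // duplicate delivery: side effect already applied
	}
	apply(e)
	c.seen[e.Key] = true
	return true
}

func main() {
	c := NewIdempotentConsumer()
	apply := func(e OutboxEvent) { fmt.Println("applied:", e.Key) }

	e := OutboxEvent{Key: "a1b2", Payload: `{"status":"Delivered"}`}
	fmt.Println(c.Handle(e, apply)) // true: first delivery is applied
	fmt.Println(c.Handle(e, apply)) // false: redelivery is deduplicated
}
```

In a real deployment the key check and the business side effect would share one database transaction, so a crash between them cannot leave the store inconsistent.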

3. Results and Metrics

3.1 Performance Improvements

| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| Throughput | 800 RPS | 50,000+ RPS | 62× |
| Latency (p99) | 2–5 sec | < 50 ms | up to 100× |
| SLA | 98.5% | 99.9% | 14× less downtime |
| Traffic cost | $X/year | $X/6 per year | −83% |

3.2 Architectural Benefits

The event-driven architecture enables independent scaling of each component, fault isolation, and seamless addition of new event consumers without affecting existing systems.

Event-Driven Platform for Federal Logistics Operator — Softenq