CASE STUDY

340 Million Records a Day. No Operator. No Backlog.

A telemetry pipeline rebuild that turned a cascading backlog problem into a system that simply runs.

The Challenge

A technology client's analytics and alerting teams depended on near-real-time application telemetry - quality-of-experience data that had to move from source to destination within minutes of being generated, continuously, 24 hours a day.

The data volumes were unforgiving: 1.2 million rows every five minutes, across a wide source schema, totalling roughly 340 million rows per day. The pipeline had to keep pace with every single window, unattended, because missing even a few cycles would create a large backlog.

What Was Breaking

The existing pipeline treated each five-minute window as a batch: read everything into memory, transform it, write it out. On peak windows this meant holding 3-4 GB of data in RAM before a single row reached the destination - and taking 5-7 minutes to do it. Longer than the interval between triggers.

New pipeline runs were launching before previous ones had finished. Runs overlapped. A backlog formed and grew. The system meant to deliver real-time telemetry was perpetually chasing its own tail.
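The arithmetic behind the backlog can be sketched in a few lines. This is a simplified model, not the client's actual scheduler: it assumes a 6-minute run time (the midpoint of the 5-7 minute range above) and that runs execute back to back rather than overlapping. The function names are illustrative.

```python
def windows_per_day(interval_minutes: int = 5) -> int:
    """Number of trigger windows in 24 hours."""
    return 24 * 60 // interval_minutes


def lag_minutes(hours: float, run_minutes: float, interval_minutes: int = 5) -> float:
    """Minutes of data still unprocessed after `hours` of operation.

    Assumption: one run at a time, each taking `run_minutes`,
    while a new window is triggered every `interval_minutes`.
    """
    windows_triggered = hours * 60 / interval_minutes
    windows_completed = hours * 60 / run_minutes
    return max(0.0, (windows_triggered - windows_completed) * interval_minutes)


rows_per_window = 1_200_000
daily_rows = rows_per_window * windows_per_day()  # 345,600,000, in line with "roughly 340 million"
behind_after_one_day = lag_minutes(hours=24, run_minutes=6)  # 240.0 minutes
```

Under these assumptions, a run that overshoots its window by a single minute leaves the pipeline four hours behind after one day, which is why the overrun could not be tuned away at the margins.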

What We Changed

Overlapping the stages. The fundamental shift was architectural: rather than sequencing extract → transform → load, we ran them concurrently. Data moved from source in small increments while transformation and load preparation happened downstream in parallel. Extract and transform time for a full 1.2M-row window dropped to ~45 seconds on a near-fixed memory footprint.
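A minimal sketch of the overlapped-stage idea, assuming a thread-per-stage design with bounded queues. The case study does not say how the real pipeline was built, so `run_pipeline`, `chunk_size` and `max_in_flight` are hypothetical names and values. The bounded queues are what keep the memory footprint near-fixed: extraction blocks when downstream stages fall behind.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, List

_DONE = object()  # sentinel marking the end of the stream


def chunked(rows: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield rows in lists of at most `size` items."""
    batch: List[dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def run_pipeline(
    rows: Iterable[dict],
    transform: Callable[[dict], dict],
    load: Callable[[List[dict]], None],
    chunk_size: int = 10_000,
    max_in_flight: int = 4,
) -> int:
    """Run extract, transform and load concurrently; return rows loaded.

    Error handling is omitted for brevity: a real pipeline would need to
    propagate stage failures instead of letting a thread die silently.
    """
    raw_q: queue.Queue = queue.Queue(maxsize=max_in_flight)
    ready_q: queue.Queue = queue.Queue(maxsize=max_in_flight)

    def extract_stage() -> None:
        for chunk in chunked(rows, chunk_size):
            raw_q.put(chunk)  # blocks when the queue is full
        raw_q.put(_DONE)

    def transform_stage() -> None:
        while (chunk := raw_q.get()) is not _DONE:
            ready_q.put([transform(row) for row in chunk])
        ready_q.put(_DONE)

    workers = [
        threading.Thread(target=extract_stage),
        threading.Thread(target=transform_stage),
    ]
    for worker in workers:
        worker.start()

    loaded = 0
    while (chunk := ready_q.get()) is not _DONE:  # load runs on the calling thread
        load(chunk)
        loaded += len(chunk)

    for worker in workers:
        worker.join()
    return loaded
```

With these settings, at most a handful of chunks are held in memory at any moment, regardless of how many rows the window contains.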

Loading data the way the database expects it. The destination was a time-series database with a fast bulk ingestion path that the original pipeline bypassed entirely; instead, every insert fired a row-level trigger. Switching to bulk load operations with the correct routing configuration brought load time down to ~25 seconds at roughly 45,000 rows per second.
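As an illustration of a bulk load path, the sketch below streams rows through `COPY ... FROM STDIN`. It assumes a PostgreSQL-compatible destination and a psycopg2-style cursor (whose `copy_expert(sql, file)` method is used here). The case study names neither, and the table and column names are hypothetical. The routing configuration mentioned above is specific to the client's setup and is not shown.

```python
import csv
import io
from typing import Iterable, Sequence


def rows_to_copy_buffer(rows: Iterable[Sequence]) -> io.StringIO:
    """Serialise rows as CSV in memory, ready to stream into COPY."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        # csv writes None as an empty field, which COPY's csv format
        # reads as NULL by default.
        writer.writerow(row)
    buf.seek(0)
    return buf


def bulk_load(cursor, table: str, columns: Sequence[str], rows: Iterable[Sequence]) -> None:
    """Load all rows with a single COPY instead of row-by-row INSERTs.

    `table` and `columns` are assumed to come from trusted configuration,
    since they are interpolated into the SQL text.
    """
    sql = f"COPY {table} ({', '.join(columns)}) FROM STDIN WITH (FORMAT csv)"
    cursor.copy_expert(sql, rows_to_copy_buffer(rows))
```

One statement per batch means the per-row overhead described above is paid once per load rather than 1.2 million times per window.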

Running fewer, larger writes. Conventional wisdom says parallelise for throughput. For time-series hypertable loads it's the opposite - multiple parallel writers compete for internal locks. Consolidating into fewer, larger write streams consistently outperformed aggressive parallelisation.
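One way to get fewer, larger writes is to coalesce the small chunks coming out of the transform stage before handing them to a single writer. The helper below is a sketch under that assumption; `coalesce` and `target_rows` are illustrative names, and the right batch size would have to be found by measurement against the actual database.

```python
from typing import Iterable, Iterator, List


def coalesce(chunks: Iterable[List[dict]], target_rows: int) -> Iterator[List[dict]]:
    """Merge small chunks into batches of exactly `target_rows` rows.

    The final batch may be smaller. Row order is preserved.
    """
    pending: List[dict] = []
    for chunk in chunks:
        pending.extend(chunk)
        while len(pending) >= target_rows:
            yield pending[:target_rows]
            pending = pending[target_rows:]
    if pending:
        yield pending
```

Each yielded batch can then go to one write stream, rather than fanning many small chunks out to competing parallel writers.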

Built to Stay Honest

Fast isn't enough if it's fragile. The pipeline ran with a deliberate offset to account for late-arriving source rows. A lightweight background process continuously walked back the last 10 hours, comparing source and destination, and backfilling any gaps - idempotently, without intervention.
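The reconciliation loop can be sketched as follows. The 10-hour lookback and five-minute windows come from the text. The 5-minute offset, the function names, and the callback signatures are assumptions made for illustration. Idempotency here depends on `reload_window` replacing a window's contents (for example delete-then-insert or an upsert) rather than appending to them.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterator, List

WINDOW = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def floor_to_window(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its five-minute window."""
    return ts - ((ts - EPOCH) % WINDOW)


def lookback_windows(
    now: datetime,
    lookback: timedelta = timedelta(hours=10),
    offset: timedelta = timedelta(minutes=5),  # assumed allowance for late rows
) -> Iterator[datetime]:
    """Yield the start of every completed window in the lookback period."""
    end = floor_to_window(now - offset)
    current = end - lookback
    while current < end:
        yield current
        current += WINDOW


def reconcile(
    now: datetime,
    source_count: Callable[[datetime], int],
    dest_count: Callable[[datetime], int],
    reload_window: Callable[[datetime], None],
) -> List[datetime]:
    """Reload every window whose row counts differ; return the windows fixed."""
    repaired = []
    for window_start in lookback_windows(now):
        if source_count(window_start) != dest_count(window_start):
            reload_window(window_start)  # must replace, not append
            repaired.append(window_start)
    return repaired
```

Because a reload replaces the window wholesale, running `reconcile` twice in a row is safe: the second pass finds nothing left to repair.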

Every run validated itself. Row counts on both sides of each window were checked automatically - mismatches surfaced within five minutes, not in a downstream report weeks later.
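A per-run check of this kind can be very small. The sketch below assumes the run already knows how many rows it read and how many it wrote for the window; `WindowMismatch` and `validate_window` are hypothetical names, not the client's code.

```python
from datetime import datetime


class WindowMismatch(RuntimeError):
    """Raised when source and destination row counts disagree for a window."""


def validate_window(window_start: datetime, source_rows: int, dest_rows: int) -> None:
    """Fail the run immediately if the two sides of a window do not match."""
    if source_rows != dest_rows:
        raise WindowMismatch(
            f"window {window_start.isoformat()}: "
            f"source={source_rows} destination={dest_rows} "
            f"(difference {source_rows - dest_rows})"
        )
```

Raising an exception, rather than logging and continuing, is what makes a mismatch visible within the same five-minute cycle.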

The Outcome

A pipeline that had been a source of daily operational anxiety became dependable, well-managed infrastructure. It ran. It kept up. It caught its own gaps. When something went wrong, it failed immediately rather than silently falling behind.

340 million rows a day, unattended.

DATA ENGINEERING · REAL-TIME PIPELINES · LETSAI SOLUTIONS
