Data Architecture

What this chapter covers

Korido's data is organized into four planes: what the road is, what work is planned, what actually happened, and what the fleet has learned. This chapter maps those planes and their major tables, then explains how a live dashboard is composed from them — including the one denormalized row that answers "where is this truck and what is it doing right now" — and why the raw record of what happened is written once and never edited.

The picture

The four planes flow in one direction. Reference geometry is configured once and underpins everything. Plans are drawn on top of that geometry. Observations stream in from the trucks and are matched against the plans. And intelligence is distilled from the accumulated observations back onto the geometry, where it sharpens the next plan.

[Diagram: the major entities and how they relate.]

The four planes

Reference — the road, configured once. corridors are the named routes the fleet runs; each is sequenced from waypoints (ports, borders, checkpoints, depots, fuel stations, cities) and paved with road_segments (the geometry between consecutive waypoints). geocode_cache holds reverse-geocoded place names, so a coordinate becomes a human label without asking a mapping provider every time. This plane is shared operational reference: platform admins curate the corridor geometry, and owners select those corridors when they create missions rather than editing the road network themselves.
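The caching idea behind geocode_cache can be sketched as a cache-aside lookup. This is only an illustration: the in-memory dictionary stands in for the table, and the names make_cached_labeler and fake_provider, along with the choice to key on rounded coordinates, are assumptions rather than anything the schema confirms.

```python
from typing import Callable, Dict, Tuple


def make_cached_labeler(
    lookup: Callable[[float, float], str],
    precision: int = 4,
) -> Callable[[float, float], str]:
    """Wrap a reverse-geocoding function with a simple cache."""
    cache: Dict[Tuple[float, float], str] = {}

    def label(lat: float, lon: float) -> str:
        # Hypothetical keying: round so near-identical fixes share one entry.
        key = (round(lat, precision), round(lon, precision))
        if key not in cache:
            cache[key] = lookup(lat, lon)  # only reached on a cache miss
        return cache[key]

    return label


calls = []


def fake_provider(lat: float, lon: float) -> str:
    # Stand-in for a real mapping provider; records each call it receives.
    calls.append((lat, lon))
    return f"Place near {lat:.2f}, {lon:.2f}"


label = make_cached_labeler(fake_provider)
first = label(-6.81234, 39.28011)
second = label(-6.81234, 39.28011)  # served from the cache, no provider call
```

The second call returns the stored label without reaching the provider, which is the saving the cache exists for.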

Plan — the work to be done. A mission is one truck's assignment along a corridor, shipped for a client, optionally moving as part of a convoy. It can be spun up from a mission_template (a reusable shape for a recurring journey) and governed by a rule_set (the driving rules and checkpoints it must honor). A mission is decomposed into ordered mission_segments — the waypoint-to-waypoint legs that make the journey measurable.
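As a rough illustration of that decomposition, the sketch below turns an ordered list of waypoints into consecutive waypoint-to-waypoint legs. The MissionSegment fields and the decompose helper are hypothetical names chosen for the example, not the actual table columns.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MissionSegment:
    sequence: int        # 1-based position of the leg within the mission
    from_waypoint: str
    to_waypoint: str


def decompose(waypoints: List[str]) -> List[MissionSegment]:
    """Pair each waypoint with its successor to form ordered legs."""
    return [
        MissionSegment(i, start, end)
        for i, (start, end) in enumerate(zip(waypoints, waypoints[1:]), start=1)
    ]


legs = decompose(["Port", "Border", "Depot"])
```

A journey through n waypoints yields n - 1 legs, each of which can later be compared with the time its traversal actually took.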

Observed — what actually happened. This is the plane the trucks write. Raw positions stream in from the tracker; from them the engine derives the structural record of activity: trips (movement), stops (rest), data_gaps (stretches where the signal went dark), waypoint_visits (arrivals at known places), segment_traversals (a leg completed, with how long it actually took), fuel_events (fills and drains from the tank sensor), and vehicle_events (the alert and judgment layer — a drain flagged as suspicious, a device gone offline). Everything in this plane hangs off a vehicle, which in turn is the truck a device is installed on.

Intelligence — learned across the fleet. Distilled from many observations: slowdown_zones (stretches of road that are reliably slow), route_hotspots (places trucks recurringly dwell), and corridor_anomalies (active disruptions on a corridor). This plane feeds better arrival predictions back into the plan, often pooling evidence across fleets — which is why it carries its own privacy boundary: shared rows are aggregate geography, never another fleet's raw movements.

Read models: composing the answer at query time

The planes hold durable facts. The surfaces people actually look at — the fleet overview, the vehicle diary, the live map, a mission's detail, the fuel dashboard, the customer tracking view — are read models composed from those facts at the moment they are asked for. Each read model is a query that joins the open structural rows and current clocks into the shape a screen needs. Keeping durable facts normalized and composing views on demand keeps the truth in exactly one place, current by construction.

The engine's structural facts are the source; the interface never re-derives them. A screen decides whether a truck is driving by reading whether a trip is open, and judges whether the signal is stale by reading whether a data gap is open. This keeps every surface telling the same story, because they all read the same facts.
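A minimal sketch of that rule follows, assuming an open row is one without an end timestamp. StructuralRow and vehicle_status are illustrative names, and a real read model would be a database query rather than in-memory objects.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class StructuralRow:
    started_at: datetime
    ended_at: Optional[datetime] = None  # assumption: None means still open

    @property
    def is_open(self) -> bool:
        return self.ended_at is None


def vehicle_status(
    trips: List[StructuralRow], data_gaps: List[StructuralRow]
) -> dict:
    """Read the status from the engine's rows instead of re-deriving it."""
    return {
        "driving": any(t.is_open for t in trips),
        "signal_stale": any(g.is_open for g in data_gaps),
    }


status = vehicle_status(
    trips=[
        StructuralRow(datetime(2024, 1, 1, 8, 0), datetime(2024, 1, 1, 9, 0)),
        StructuralRow(datetime(2024, 1, 1, 10, 0)),  # still open
    ],
    data_gaps=[],
)
```

Nothing here inspects speeds or positions: the screen only asks whether the engine has left a trip or a gap open.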

The vehicle's last-state row

One question is asked constantly and must be cheap: where is this truck and what is it doing, right now? Answering it by scanning a truck's position history on every dashboard load would be wasteful, so the current state is denormalized onto the vehicle's own row and refreshed once at the end of each batch the engine successfully processes. That single row is the spine of the live map and the fleet overview.

Its most subtle property is that it is filled from two different clocks, on purpose:

  • Position — latitude, longitude, speed, heading — is anchored to the last trusted fix. A degraded or rejected trailing frame must never make a stale location look fresh, so the coordinate on the row is always one the engine accepted.
  • Telemetry — ignition, battery, signal strength, external voltage — is taken from the freshest frame, whether or not that frame carried a location. Trackers that report by motion send coordinate-less status frames while parked; those frames still prove the truck's ignition is off and its battery is healthy. So the visible ignition/battery/signal/voltage tiles stay live on a parked truck even as its map position deliberately holds still.

This two-clock split is why a parked truck can honestly show a fresh battery reading and a location from an hour ago at the same time, and why the system never has to choose one misleading "last seen" timestamp.
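The split can be sketched as a refresh function that updates the two halves of the row independently. Every name here (Frame, LastState, refresh, the trusted flag) is hypothetical, and the guard that refuses to move either clock backwards is an added assumption, not something the chapter states.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Frame:
    captured_at: datetime
    ignition: bool
    battery_pct: int
    lat: Optional[float] = None   # None on coordinate-less status frames
    lon: Optional[float] = None
    trusted: bool = False         # hypothetical: the engine accepted this fix


@dataclass(frozen=True)
class LastState:
    position_at: Optional[datetime] = None
    lat: Optional[float] = None
    lon: Optional[float] = None
    telemetry_at: Optional[datetime] = None
    ignition: Optional[bool] = None
    battery_pct: Optional[int] = None


def refresh(state: LastState, batch: List[Frame]) -> LastState:
    """Update the last-state row once, at the end of a processed batch."""
    if not batch:
        return state
    # Telemetry clock: the freshest frame, with or without a location.
    newest = max(batch, key=lambda f: f.captured_at)
    if state.telemetry_at is None or newest.captured_at > state.telemetry_at:
        state = replace(state, telemetry_at=newest.captured_at,
                        ignition=newest.ignition, battery_pct=newest.battery_pct)
    # Position clock: only the newest fix the engine trusted.
    fixes = [f for f in batch
             if f.trusted and f.lat is not None and f.lon is not None]
    if fixes:
        fix = max(fixes, key=lambda f: f.captured_at)
        if state.position_at is None or fix.captured_at > state.position_at:
            state = replace(state, position_at=fix.captured_at,
                            lat=fix.lat, lon=fix.lon)
    return state


state = LastState()
state = refresh(state, [Frame(datetime(2024, 1, 1, 10, 0), ignition=True,
                              battery_pct=90, lat=-6.8, lon=39.28, trusted=True)])
# An hour later the parked truck sends a status frame with no coordinates.
state = refresh(state, [Frame(datetime(2024, 1, 1, 11, 0), ignition=False,
                              battery_pct=88)])
```

After the second batch the row carries the 10:00 position next to the 11:00 ignition and battery values, so neither timestamp has to stand in for the other.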

Where the different shapes of data live

Korido spreads its data across three stores, each used for what it is best at.

Boundary

KV and R2 are not alternate sources of business truth. KV accelerates and coordinates; R2 preserves evidence, replay material, and private document objects. PostgreSQL remains the queryable record for tenant facts and read models.

Implementation note

The four planes are a way of reading the schema, not four separate databases. The current implementation keeps the durable relational record in PostgreSQL, while KV and R2 hold the hot coordination state and bulk objects that support that record.

  • PostgreSQL holds the four planes and every read model — the durable, queryable truth.
  • KV is a fast key-value store for hot, ephemeral state: the live-route store behind a customer's tracking view, one-time login codes, refresh-token hashes, rate-limit counters, and the soft locks that keep scheduled jobs from stepping on each other. It is fast and widely replicated, so it is used for coordination and caching, never as the sole home of irreversible business truth.
  • R2 is object storage for cold and bulk data: archived telemetry batches, dead-lettered messages set aside for replay or offline inspection, and private document objects such as driver photos and admin-uploaded PDFs. For telemetry archives and DLQs, R2 is the last-resort evidence store — a successful archive there is proof a message was kept, not proof the database state derived from it is correct.

Why observed facts are immutable and deduplicated

The observed plane is append-only. A raw position is written once and never edited; the structural records derived from it are opened and closed but not rewritten after the fact. Immutability is what makes the record auditable: the history a report or an investigation reads is exactly what the truck reported at the time.

That guarantee only holds if the same observation cannot land twice, and it can arrive twice: a truck buffers positions through a dead zone and flushes them later, and a message that fails processing is retried. So every raw observation is keyed by the device it came from plus the exact timestamp at which it was captured, which allows exactly one row per device per capture instant. A second copy of the same instant is silently dropped on insert. This one rule is what lets the whole pipeline be safely "at-least-once": a batch can be delivered, retried, or replayed any number of times and the record stays clean, because a duplicated instant collides with the row already there and is discarded. The dedup key is the device plus the capture instant, not the mission, the trip, or the provider's own sequence number, so the guarantee survives a device being reassigned or its data being backfilled.
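The effect of that key can be demonstrated with a small, self-contained sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, where the equivalent statement would typically be INSERT ... ON CONFLICT DO NOTHING. The table name raw_positions, its columns, and the store_batch helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE raw_positions (
        device_id   TEXT NOT NULL,
        captured_at TEXT NOT NULL,   -- ISO-8601 capture instant
        lat         REAL,            -- NULL on coordinate-less frames
        lon         REAL,
        PRIMARY KEY (device_id, captured_at)
    )
""")


def row_count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM raw_positions").fetchone()[0]


def store_batch(conn: sqlite3.Connection, batch) -> int:
    """Insert a batch and return how many rows were actually new."""
    before = row_count(conn)
    # A row whose (device_id, captured_at) already exists is skipped.
    conn.executemany(
        "INSERT OR IGNORE INTO raw_positions VALUES (?, ?, ?, ?)", batch
    )
    conn.commit()
    return row_count(conn) - before


batch = [
    ("dev-1", "2024-01-01T10:00:00Z", 1.0, 2.0),
    ("dev-1", "2024-01-01T10:00:30Z", 1.1, 2.1),
]
first = store_batch(conn, batch)     # both instants are new
replayed = store_batch(conn, batch)  # a retry or replay adds nothing

# A buffered flush: one overlap, one older fix, one coordinate-less frame.
flush = [
    ("dev-1", "2024-01-01T10:00:30Z", 1.1, 2.1),
    ("dev-1", "2024-01-01T09:59:00Z", 0.9, 1.9),
    ("dev-1", "2024-01-01T10:05:00Z", None, None),
]
flushed = store_batch(conn, flush)

ordered = [r[0] for r in conn.execute(
    "SELECT captured_at FROM raw_positions ORDER BY captured_at")]
```

Because each row keeps its capture instant, the late-arriving 09:59 fix sorts ahead of the rows stored earlier, and replaying a batch leaves the table unchanged.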

Frames that carry no location — status and voltage heartbeats — are stored the same way, in the same table, on the same one-per-instant key, with an empty position. They stay off the map while still advancing the liveness clocks and feeding the fuel and gap logic, which is how the system can tell "the truck is quietly parked" apart from "the truck has gone dark."

Edge cases

  • A buffered flush after a dead zone. Dozens of positions arrive at once, hours after they were captured. Each keeps its original capture instant, so they slot into history in the right order; any that overlap positions already stored collide on the key and are dropped.
  • A replayed batch. Re-sending a batch that was partly stored before it failed re-inserts only the missing instants and discards the rest — replay is a no-op for anything already recorded.
  • A parked truck streaming coordinate-less frames. Its map position holds at the last trusted fix while its telemetry tiles keep refreshing from the newest status frame, so the dashboard shows a fresh battery and an intentionally still location together.
  • A failed batch. When processing a batch fails, the engine-derived records roll back and the batch is retried, but liveness still advances outside the rolled-back work so a wedged-but-still-transmitting truck still looks alive. A batch with no fix advances only the message clock; a batch that carries a fix also advances the last position from its newest raw fix. The trips, stops, and gaps wait for a clean reprocess.
  • A long-silent truck's fuel gauge. The fleet overview stops showing a fuel reading once a truck has been silent for more than seven days, so the gauge cannot present a week-old value as current — even though the engine still runs fuel detection against that truck's own newest reading.
  • A shared intelligence row. Slowdown zones, hotspots, and corridor anomalies can pool evidence across fleets, but they are stored as aggregate geography. A tenant-facing query never surfaces another fleet's raw movements through them.
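The fuel-gauge rule above is simple enough to sketch directly. The function and constant names are hypothetical, and the assumption that silence is measured from the truck's last received message is an interpretation of the text rather than a confirmed detail.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

FUEL_DISPLAY_MAX_SILENCE = timedelta(days=7)


def displayed_fuel(
    fuel_level: Optional[float],
    last_message_at: Optional[datetime],
    now: datetime,
) -> Optional[float]:
    """Return the fuel reading to show, or None when it would be misleading."""
    if last_message_at is None:
        return None  # the truck has never reported
    if now - last_message_at > FUEL_DISPLAY_MAX_SILENCE:
        return None  # silent for more than seven days: hide the gauge
    return fuel_level


now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
recent = displayed_fuel(62.5, now - timedelta(days=2), now)
stale = displayed_fuel(62.5, now - timedelta(days=7, seconds=1), now)
```

The cutoff affects only what the overview displays. The stored reading is untouched, so fuel detection can still run against it.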

How it connects

  • Tenancy and Security: the tenant filter that guards every table in these planes.
  • Reliability: how at-least-once delivery, dedup, and serial-per-vehicle processing keep the observed plane correct under retries.
  • Part 2 (Telemetry) and Part 3 (The fleet engine): how raw positions become the trips, stops, gaps, and fuel events of the observed plane.