LLM Workflow · End-to-End Scenario System

Natural language to runnable traffic simulations

This project converts vague traffic requests into executable SUMO scenarios by separating structured parameter extraction from open-ended geometry reasoning, then keeping human corrections as a reusable retraining signal instead of losing them in chat history.

The project covers fine-tuned traffic-parameter extraction from natural-language requests, base-LLM geometry reasoning and XML fallback, and Seoul traffic-data grounding with statistical fallbacks. It also includes execution orchestration, held-out evaluation, correction logging, retraining export, and live admin review.

View Live System GitHub

What this project actually includes

Fine-Tuned LLM

Extract structured traffic parameters from language

The core model is fine-tuned to turn natural-language scene descriptions into speed, volume, lanes, speed limit, sigma, tau, and block-length fields.

Geometry LLM

Classify edits and handle geometry fallback

This layer owns geometry reasoning: it routes edit requests and regenerates road XML when OSM or existing geometry is not enough, while the fine-tuned model stays focused on structured parameter extraction.

Agent and Review Loop

Orchestrate execution, then export trainable corrections

An 11-tool agent runs network build, simulation, and validation, while correction-intent edits are logged with trainability metadata and exported as retraining data.

OpenAI Fine-Tuning SUMO SQLite Multi-LLM Routing Harness Engineering Cloud Run GitHub Actions

The Problem

Why is this hard?

Core Challenge

Creating traffic simulations is still too inconvenient, too slow, and too unrealistic for a natural-language workflow.

Turning a traffic scene into a simulation is still inconvenient and expensive. Building it manually takes too long because road lookup, network construction, config generation, execution, and validation all have to be handled as separate steps before the user can even inspect one result.

Trying to generate the same thing with an LLM is faster, but it often does not represent the real scene well enough. Natural-language requests mix congestion, road type, lane count, speed regime, spacing, and driver behavior, and a general model often fails to convert that into realistic traffic parameters and believable road geometry.

So the result can sound plausible as text while still feeling unrealistic once it is actually run in SUMO. Manual creation takes too long, and LLM-only generation often does not reflect the traffic scene with enough fidelity to be useful.

This project asks: can fine-tuned LLM extraction and a role-separated workflow translate natural-language traffic scenes into runnable simulations with higher fidelity, while making the overall generation process fast enough to use as an actual workflow?

Workflow Cost

Simulation creation is still too manual Road lookup, network building, config generation, simulation, and validation usually require too many separate steps for prompt-driven scenario creation to feel immediate.

Parameter Failure

General LLMs miss traffic interdependencies Speed, volume, lane count, speed limit, sigma, tau, and block spacing are connected. If one field is wrong, the resulting simulation drifts away from realistic traffic behavior.

Realism Gap

Natural-language-only generation looks convincing before execution A prompt can sound correct while still producing free-flow defaults, weak geometry, or unrealistic spacing once the scenario is actually run in SUMO.

Feedback Loss

Even good human fixes are easy to waste If a reviewer corrects the result but that edit disappears into chat history, the system stays expensive to improve and keeps repeating the same mistakes.

Approach

Teach traffic-scene context from real road data, then split generation by responsibility

Core Idea

Build the missing traffic-scene dataset first, fine-tune the language-to-parameter layer, then let specialized components assemble the final simulation.

There is no ready-made dataset that cleanly maps natural-language traffic scenes to simulation-ready parameters, so the project first builds that supervision itself. Real road data from Seoul traffic detectors and synthetic traffic-engineering scenarios are turned into prompt-target pairs, so the fine-tuned model can learn how natural-language descriptions map to structured traffic context instead of only memorizing road names or generic prompt patterns.

Once that layer is learned, generation is split by role. The fine-tuned extractor produces structured traffic parameters, the geometry LLM classifies edits and handles geometry reasoning and XML fallback, and the surrounding system rebuilds the network, runs SUMO, and validates the result. Because traffic-scene data is still scarce, the system also logs correction-intent edits and exports them as retrainable data, so the model can keep improving as more real usage accumulates.

The broader project also includes live Seoul traffic lookup, representative road-type statistics, and similar-road estimation tools that support grounding and fallback around the main generation flow.

Stage 1

Build traffic-scene supervision

Extract prompt-target data from real road observations and synthetic scenarios because off-the-shelf natural-language traffic-scene datasets do not really exist.

Stage 2

Fine-tune language to traffic parameters

Teach the model to read the context of a traffic scene from natural language and output structured fields such as speed, volume, lanes, sigma, tau, and spacing.

Stage 3

Split parameter, geometry, and execution roles

Use specialized components across the fine-tuned extractor, the geometry LLM, and the tool-calling agent so each part handles the part it is best at.

Stage 4

Keep extracting retrainable data

Log correction-intent feedback, separate it from tuning requests, and export reusable training data so the system can be tuned again over time.

Fine-Tuning

Real Seoul traffic data becomes supervised prompt–parameter pairs

Core Idea

Extract observed speed and volume from detector data, estimate driver behavior parameters, then diversify prompts so the model learns traffic situations — not road-name lookup.

Training data comes from Seoul Metropolitan Government detector records (2025.10): speed data covering 31-day hourly averages per road segment, volume data with hourly counts per collection point, and the national standard node-link SHP for speed limits and road geometry. The fine-tuned model is gpt-4.1-mini via the OpenAI Fine-Tuning API. ~70 road segments × 7 time periods × 5 prompt variants = ~2,450 total pairs, split 90/10 into train (2,205) and validation (245).

flowchart LR
    A["Speed detectors\n31-day hourly avg"] --> D["Match by\nroad name & link ID"]
    B["Volume detectors\nhourly count"] --> D
    C["Node-link SHP\nspeed limit · geometry"] --> D
    D --> E["Group by 7\ntime periods"]
    E --> F["Reverse-estimate\nsigma, tau\n(Greenshields)"]
    F --> G["Generate 5\nprompt variants"]
    G --> H["train JSONL\n2,205 (90%)"]
    G --> I["val JSONL\n245 (10%)"]

Parameter	Source	Method
`speed_kmh`	Observed	31-day hourly average from speed detectors
`volume_vph`	Observed	Hourly average from volume detectors
`lanes`	Observed	Mode of lane counts across link segments
`speed_limit_kmh`	Observed	Node-link SHP MAX_SPD; road-type heuristic fallback
`avg_block_m`	Observed	Mean link length from node-link geometry
`sigma`	Estimated	Greenshields reverse-estimation from observed speed
`tau`	Estimated	Greenshields reverse-estimation from observed speed
`reasoning`	Generated	Rule-based summary of the above values

Each (road, time period) pair produces 5 prompt variants so the model learns from traffic situations, not road name lookup.

Raw data row (originally Korean — translated)

Yangjae-daero suburban arterial 8-lane afternoon moderate

speed 27.0 km/h volume 4,460 vph limit 50 km/h sigma 0.40 tau 1.5 s block 219 m

↓ 5 prompts × 1 shared output = 5 training pairs

Simulate Yangjae-daero afternoon

Yangjae-daero afternoon

moderate suburban arterial 8-lane afternoon

8-lane arterial afternoon traffic simulation

Yangjae-daero -like arterial afternoon conditions

→ shared output

{ "speed_kmh": 27.0, "volume_vph": 4460, "lanes": 4, "speed_limit_kmh": 50, "sigma": 0.4, "tau": 1.5, "avg_block_m": 219 }

All prompts are originally in Korean — translated for display.

Style	Template	Example (translated)
Road name + time + action	`{road} {time} {action}`	"Simulate Yangjae-daero afternoon"
Road name + time	`{road} {time}`	"Yangjae-daero afternoon"
Situational (no name)	`{congestion} {area} {road_type} {lanes}-lane {time}`	"moderate suburban arterial 8-lane afternoon"
Generic type + time	`{lanes}-lane {road_type} {time} traffic simulation`	"8-lane arterial afternoon traffic simulation"
Mixed	`{road}-like {road_type} {time} conditions`	"Yangjae-daero-like arterial afternoon conditions"

Results

The strongest gain appears in the structured extraction layer

Field	Fine-tuned	Base
speed_kmh	5.1%	74.6%
volume_vph	34.8%	48.1%
lanes	8.9%	13.9%
speed_limit_kmh	1.7%	23.8%
sigma	4.5%	21.3%
tau	4.6%	11.4%
avg_block_m	14.5%	167.6%
Overall	10.6%	51.5%

Benchmark: 30 held-out prompts with labels derived from real Seoul traffic data. The qualitative shift is not just lower error, but also lower domain bias in speed and block-spacing prediction.

Headline

Overall MAPE drops from 51.5% to 10.6%

The fine-tuned extractor reduces structured prediction error by about five times on the held-out benchmark.

Largest gain

Speed bias is dramatically reduced

The base model defaults toward unrealistic free-flow speed, while the fine-tuned model brings the estimate much closer to observed traffic conditions.

Weakest field

Volume still needs more supervision

volume_vph remains the hardest field and the clearest candidate for richer future training data.

System Architecture

Prompt to runnable simulation, then back to reusable evidence

The workflow has two connected halves: an online generation path that turns requests into SUMO runs, and a review path that classifies human edits so the system can improve without contaminating its own data.

flowchart TD
    A[User Natural-Language Request] --> B[Fine-Tuned Parameter Extraction]
    B --> C[Structured Scenario Parameters]
    C --> D{Usable real location?}
    D -->|Yes| E[OSM-Based Network Retrieval]
    D -->|No / Failed| F[Geometry LLM — XML Generation]
    E --> G[SUMO Network Build]
    F --> G
    C --> H[Demand / Route / Config Generation]
    G --> I[Runnable SUMO Artifacts]
    H --> I
    I --> J[Scenario Execution]
    J --> K[Validation and Statistics]
    K --> L[User Review]
    L --> M{Intent}
    M -->|Correction| N[Trainable Signal]
    M -->|Tuning| O[Analysis Only]
    N --> P[Correction Export]
    P --> Q[Future Fine-Tuning Data]

flowchart LR
    subgraph Frontend
        UI["Web UI\nindex.html"]
        AdminUI["Admin Dashboard"]
        AboutUI["About Page"]
    end

    subgraph Backend["server.py"]
        SSE["SSE Streaming"]
        API["REST API"]
    end

    subgraph LLM["LLM Layer"]
        FT["Fine-tuned Model\ngpt-4.1-mini FT"]
        Base["Base LLM\nGPT / Gemini / Claude"]
        Agent["Tool-Calling Agent\n11 tools"]
    end

    subgraph Tools
        OSM["OSM Network"]
        SUMO["SUMO Generator"]
        TOPIS["TOPIS API"]
        Valid["Validator"]
    end

    subgraph Data
        DB[("SQLite DB")]
        JSONL["Training JSONL"]
    end

    UI -->|"POST /api/simulate"| SSE
    AdminUI -->|"GET /api/admin/*"| API
    SSE --> FT --> Base --> SUMO
    Agent --> Tools
    SUMO --> DB
    DB -->|"export"| JSONL

Input

"Create a congested morning intersection in front of a middle school."

↓

1. FT Model — Parameter Extraction

{
  "speed_kmh": 18.5,
  "volume_vph": 2400,
  "lanes": 2,
  "speed_limit_kmh": 30,
  "sigma": 0.72,
  "tau": 0.9,
  "avg_block_m": 120,
  "reasoning": "School zone, 30km/h limit. Morning drop-off congestion, V/C ~0.85."
}

↓

2. Network Build

OSM or LLM-generated XML → netconvert → .net.xml

↓

3. SUMO Execution

avg speed 16.2 km/h · 2,380 vehicles inserted

↓

4. Validation

FT predicted 18.5 km/h vs SUMO 16.2 km/h → error −12.4% → Grade B

Prompt Engineering

Constrained prompts turned format errors from 15% to zero

The fine-tuned model uses a structured system prompt that enforces strict JSON output, required fields, and value-range constraints. This is not a minor implementation detail — without these constraints the model intermittently returned prose, markdown, or partial JSON, making the downstream pipeline unreliable.

System prompt constraints (ft-v1)

Strict JSON-only output — no prose, no markdown, no commentary
All 8 numeric fields required in every response — never empty or "-"
Value ranges enforced: sigma 0–1, tau 0.5–3, lanes 1–8
Domain reasoning required in the reasoning field
Korean road names and locations supported

Domain calibration rules in prompt

School zone — speed_limit_kmh=30, sigma high (0.6+)
Highway / expressway — speed_limit_kmh=80–100, avg_block_m 500+
Side street / alley — speed_limit_kmh=30, lanes=1, avg_block_m 50–80
Rush hour — volume high, speed low
Late night — volume very low, speed high

Actual system prompt (ft-v1)

You are a traffic engineering expert and SUMO simulation engineer.
When the user describes a road/traffic situation, return only JSON
with the parameters needed for SUMO simulation.
You must fill all 8 fields below with numbers.
Never use empty values or the string '-'.

Output format:
{"speed_kmh": number, "volume_vph": number,
 "lanes": one-way lane count,
 "speed_limit_kmh": number,
 "sigma": between 0~1, "tau": between 0.5~3,
 "avg_block_m": intersection spacing (m),
 "reasoning": "rationale"}

Prompt evolution

Version	Approach	Result
`rule-v1`	Rule-based keyword matching, no LLM	Baseline; no domain reasoning
`ft-v1`	Fine-tuned with structured constraints	0% format errors, 10.6% MAPE

The critical shift was not the model change — it was adding output constraints to the system prompt. Free-form prompting with the same fine-tuned model still produced ~15% JSON failures.

Engineering Detail

How each subsystem actually works

Greenshields reverse-estimation for sigma and tau speed → V/C → driver behavior calibration

Driver imperfection (sigma) and desired headway (tau) cannot be directly measured from detector data. They are reverse-estimated from observed speed via the Greenshields model.

The observed speed is divided by free-flow speed to get a speed ratio, which is mapped to a V/C ratio. The V/C ratio determines the congestion band, and sigma and tau are calibrated accordingly.

speed_ratio = v_observed / v_free
V/C ≈ max(0.05, 1.0 − speed_ratio × 0.85)

For example: observed 15 km/h on a 50 km/h limit road → speed_ratio = 0.33 → V/C = 0.72 → congested band → sigma 0.6–0.8, tau 0.8–1.2 s.

V/C range	sigma	tau
> 0.8 (congested)	0.6 – 0.8	0.8 – 1.2 s
0.5 – 0.8 (moderate)	0.4 – 0.6	1.0 – 1.5 s
< 0.5 (free-flow)	0.2 – 0.4	1.5 – 2.5 s

Tool-calling agent — 11-tool orchestration LLM tool-use · autonomous selection

The agent autonomously selects and executes tools based on user requests, using the LLM tool-use feature.

Tool	Description
`search_location`	Geocode area names to coordinates
`build_road_network`	Build SUMO network from OSM
`get_traffic_stats`	Query local Seoul traffic statistics
`generate_simulation`	Generate SUMO config files
`run_sumo`	Execute simulation
`query_topis_speed`	Real-time Seoul traffic API
`load_csv_data`	Load external traffic data
`recommend_road`	Suggest similar roads
`find_similar_roads`	Find roads matching criteria
`validate_simulation`	Validate simulation output
`calibrate_params`	Calibrate parameters from results

Example: input "simulate a congested commute road"

1. search_location("commute road") → no location
2. get_traffic_stats("arterial","rush hour") → refs
3. generate_simulation(params) → .net/.rou/.sumocfg
4. run_sumo(config) → avg 22.3 km/h, 1850 veh
5. validate_simulation(results) → grade B, −8.2%

The agent layer uses LLM tool-use. The base LLM is configurable across Claude, GPT, and Gemini.

Role-separated LLM design — 4 components FT extractor · geometry LLM · agent · logging

Each component owns one responsibility. The fine-tuned model does not attempt geometry; the geometry LLM does not attempt structured extraction.

llm_parser.py

Fine-tuned extractor

Parses natural language into structured simulation parameters. Provides the machine-readable target for the rest of the pipeline.

base_llm.py

Geometry LLM

Classifies each user edit as parameter, geometry, or mixed. Handles road layout reasoning and generates fallback XML when OSM fails.

agent.py

Tool-calling agent

11-tool orchestration via LLM tool-use. Autonomous tool selection based on user intent with multi-turn execution.

session_db.py

Logging and export

Stores simulation runs and modification sessions. Separates trainable corrections from non-trainable tuning. Exports retraining JSONL.

Error pattern analysis — directional bias across 30 samples base overpredicts speed +68%, block spacing +165%

Directional bias reveals where each model systematically over- or under-predicts, beyond just the MAPE number.

Field	FT Bias	FT Accurate	Base Bias	Base Accurate
`speed_kmh`	+0.5% (balanced)	21/30	+68.3% (overpredict)	0/30
`volume_vph`	+25.9% (over)	13/30	+21.4% (over)	3/30
`speed_limit_kmh`	−1.7% (balanced)	29/30	+15.7% (over)	11/30
`sigma`	−0.9% (balanced)	24/30	+9.9% (over)	6/30
`tau`	+1.6% (balanced)	20/30	+0.9% (over)	3/30
`avg_block_m`	−0.6% (balanced)	20/30	+165.0% (overpredict)	1/30

Key finding: the base model defaults to free-flow speeds (+68.3% bias, 0/30 accurate) and lacks urban block-structure knowledge (+165% spacing). Fine-tuning corrects both. The remaining weak point is volume_vph (34.8% MAPE) — the most context-dependent field that would benefit from additional training data.

Modification classification — parameter, geometry, or mixed LLM classifier · keyword fallback

When a user requests a change, the geometry LLM first classifies it before routing to the correct handler.

The geometry LLM receives the user's edit request and returns one word: parameter, geometry, or mixed. If the LLM call fails, a keyword heuristic takes over.

User edit → Geometry LLM classifier
  → "parameter" → update speed/volume/sigma/tau
  → "geometry" → regenerate .nod.xml / .edg.xml
  → "mixed" → both paths, then rebuild

Type	Examples
parameter	"lower the speed", "increase volume to 3000", "make it more congested"
geometry	"add an intersection", "bend the road", "make it a 4-way crossing"
mixed	"set speed limit to 70 and make road straight", "add lane and raise volume"

Correction pipeline — human fixes stored in SQLite, exported as retraining data correction vs tuning · trainable flag · JSONL export

Every modification is stored in SQLite with before/after snapshots, modification type, edit intent, and a trainability flag. Only correction-intent records are exported for retraining.

prompt → FT prediction → simulation → user review
  → Correction: stored with trainable=1
  → Tuning: stored with trainable=0

export_corrections_for_training()
  → SELECT WHERE trainable=1 AND intent='correction'
  → sessions_corrections_openai.jsonl
  → merge with train_real_openai.jsonl → re-fine-tune

This separation prevents preference-driven edits from polluting the fine-tuning signal. The admin dashboard at /admin shows correction history, modification breakdowns, and downloadable exports.

SQLite field	Purpose
`edit_intent`	"correction" or "tuning"
`trainable`	1 = exportable, 0 = analysis only
`modification_type`	parameter / geometry / mixed
`details_json`	Before/after parameter snapshots

Data source	Samples	Ground truth
Real-data (Seoul)	~2,450	Observed speed/volume
Corrections (SQL)	grows over time	Human expert fixes on FT outputs

Runtime parameter wiring — how FT output becomes a SUMO simulation network XML · vType injection · two kinds of "speed"

The FT model predicts eight fields. Some define the physical road, some describe driver behavior, and speed_kmh serves as the validation target for calibration.

FT field	Target	SUMO mechanism
`speed_limit_kmh`	.net.xml	Rewrite lane/edge speed after netconvert
`lanes`	.net.xml	Network topology / capacity
`avg_block_m`	.net.xml	Intersection spacing (generated geometry)
`sigma`	.rou.xml vType	Krauss driver imperfection (0–1)
`tau`	.rou.xml vType	Desired headway in seconds
`volume_vph`	randomTrips.py	Trip generation rate
`max_speed`	.rou.xml vType	Capped at limit × 1.05
`speed_kmh`	Nowhere	Validation target only

Two kinds of "speed"

speed_limit_kmh is a legal/physical cap — "cars cannot go faster than this." Written into network XML.

speed_kmh is the predicted average speed under congestion — used for validation and calibration, not written into SUMO files.

Validation error

error = (sim_speed − FT_speed) / FT_speed × 100%

A ≤ 10% · B ≤ 20% · C ≤ 30% · D ≤ 50% · F > 50%

Automatic calibration loop — proportional control with bounded drift volume · sigma · tau · max 3 iterations · ±20% drift cap

When validation error exceeds ±10%, the calibration loop nudges behavioral parameters so the simulated speed converges toward the FT-predicted target.

Algorithm

1. Compute error = (sim − target) / target
2. If |error| ≤ 10% → converged, stop
3. Adjust proportionally:
  volume × (1 + 0.4 × error)
  sigma + 0.15 × error
  tau + 0.25 × error
4. Clamp to drift bounds
5. Re-run SUMO → repeat (max 3)

Gains are derived from SUMO Krauss model sensitivity analysis. If all parameters hit drift caps without converging, the loop stops early — the error signals a geometry mismatch rather than a parameter error.

Parameter	Drift limit	Effect
`volume_vph`	±20%	Primary congestion lever
`sigma`	±0.15	Driver imperfection → capacity
`tau`	±0.3 s	Headway → throughput

Fixed during calibration

speed_limit_kmh, lanes, avg_block_m, and network geometry are never modified. They define the physical road. Calibration only adjusts how vehicles behave on it.

Data separation

Calibrated values are stored in calibrated_params_json — separate from the original FT output. This prevents calibration artifacts from contaminating retraining data.

Simulation UI

Generate, correct, and tune — all in one conversation

One chat session covers the full cycle. First, pick which LLM handles geometry reasoning. Then describe a traffic scene — the fine-tuned model extracts structured parameters, the system builds the road network, and SUMO executes the scenario. If the result is wrong, open Correction mode: fix the parameter or geometry error, and the delta becomes retraining data for the next fine-tuning round. If the result is acceptable but you want a variant, use Tuning mode instead — the change is logged for analysis but kept out of the training signal so preference edits never pollute the dataset. An optional Calibrate button auto-adjusts behavioral parameters so the simulated speed converges toward the FT-predicted target.

The browser flow is also a real working interface, not just a demo shell: simulation progress streams live, the generated network is previewed in-chat, the latest SUMO artifact set can be downloaded as a ZIP, and, in local environments, completed scenarios can be reopened in sumo-gui for manual inspection.

Admin Dashboard

Monitor corrections, inspect error patterns, and export retraining data

The admin dashboard at /admin closes the feedback loop by surfacing what the model got wrong and making correction data directly exportable. Every correction is traceable end to end: user edit, SQLite log, admin panel, exported JSONL, re-fine-tune.

Why the admin surface matters

Without an inspection layer, corrections disappear into chat history. The admin dashboard makes error patterns visible — which fields drift, which geometry types fail, and how often corrections are trainable — so the next fine-tuning round targets the right gaps.

Three downloadable exports close the loop: Corrections JSONL for re-fine-tuning, Evaluation Report for grade distribution and fidelity, and LLM Evaluation Report for field-level error analysis and parameter deltas.

Admin dashboard showing correction rate, field error bars, and recent modifications

Dashboard Preview

Correction rates, error bars, and modification logs in one place

Top-level cards show correction rate and fail rate. The LLM evaluation panel breaks down per-field errors and geometry correction types.

Open admin dashboard

Overview

Top cards

Correction rate, mixed fail rate, total simulations, parameter/geometry correction counts at a glance.

Error Analysis

LLM Correction Evaluation

Per-field error bar charts, geometry correction type breakdown, and average parameter deltas.

History

Recent Simulations

Prompt, grade (A/B/C/D), and timestamp for the latest 30 runs.

Modifications

Recent Modifications

Edit intent, modification type, trainable flag, and user input for the latest 50 edits.

Lessons Learned

What worked, what stayed fragile, and what the project actually proves

The main win is not perfect scenario generation. It is that the system makes evaluation and future improvement structurally possible after deployment.

The contribution is the workflow split, not just the benchmark number.

Fine-tuning works best on constrained, structured outputs. Geometry and XML generation is handled by a general-purpose LLM because fine-tuning it was not feasible within the project's budget — this is the clearest remaining limitation. The second contribution is the review loop: the system preserves correction intent, session history, and export eligibility so human review can become clean retraining signal rather than one-off conversation debris.

Geometry generation is the clearest fine-tuning gap.

Road layout and XML generation still rely on in-context prompting with the geometry LLM. Fine-tuning this layer would require structured geometry datasets and significantly more API budget — a clear next step if resources allow.

Structured extraction converges fast even with limited data.

Parameter prediction with constrained JSON converged quickly on ~2,450 samples, while open-ended geometry remained too brittle to treat as the main supervised surface.

Correction versus tuning is a data-quality decision.

Without that split, preference-driven edits would silently pollute the exported retraining dataset and weaken future fine-tuning quality.

Fallbacks are part of the product, not backup code.

OSM lookups and external dependencies fail often enough that XML and geometry fallback must remain visible, supported workflow paths rather than hidden error handlers.