Anonymized

Implementation details have been anonymized to protect intellectual property. The methodology, metrics, and results are real. For the full technical breakdown, reach out at tomas@omnimetrix.io.

Approved · 2026-02-14 · query · algorithm

Loop 08: CTE Re-execution Elimination (Second Pass)

-22.1% (961ms → 749ms)

What changed

Applied the same CTE linearization pattern from Loop 07 one level downstream. A downstream CTE was still referenced by two consumers -- an aggregation to find a derived position and a join to filter by that position. Replaced both with a single window function that computes the position in one pass, eliminating the double-reference.
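The rewrite described above can be sketched as follows. This is an illustration under stated assumptions, not the anonymized production query: the `events` table, its columns, and the CTE names (`downstream`, `last_pos`, `positioned`) are hypothetical, the "derived position" is assumed to be a per-group maximum, and SQLite (3.25 or newer, for window functions) stands in for the real engine purely to show that both query shapes return the same rows.

```python
import sqlite3

# Hypothetical data: SQLite is only a stand-in to compare the two query shapes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (session_id INTEGER, step INTEGER, value TEXT);
    INSERT INTO events VALUES
        (1, 1, 'a'), (1, 2, 'b'), (1, 3, 'c'),
        (2, 1, 'x'), (2, 2, 'y');
""")

# Before: `downstream` has two consumers, an aggregation and a join.
BEFORE_SQL = """
WITH downstream AS (
    SELECT session_id, step, value FROM events
),
last_pos AS (
    SELECT session_id, MAX(step) AS last_step
    FROM downstream
    GROUP BY session_id
)
SELECT d.session_id, d.step, d.value
FROM downstream AS d
JOIN last_pos AS p
  ON p.session_id = d.session_id AND p.last_step = d.step
ORDER BY d.session_id
"""

# After: a single window function computes the position in one pass,
# so `downstream` is referenced exactly once.
AFTER_SQL = """
WITH downstream AS (
    SELECT session_id, step, value FROM events
),
positioned AS (
    SELECT session_id, step, value,
           MAX(step) OVER (PARTITION BY session_id) AS last_step
    FROM downstream
)
SELECT session_id, step, value
FROM positioned
WHERE step = last_step
ORDER BY session_id
"""

before_rows = conn.execute(BEFORE_SQL).fetchall()
after_rows = conn.execute(AFTER_SQL).fetchall()
print(before_rows == after_rows)
```

The sketch only demonstrates result equivalence. Whether a second CTE reference triggers a second execution depends on the engine, so the performance effect has to be confirmed on the actual pipeline, as the measurements in this loop do.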

Why we expected it to work

Pipeline analysis still showed 4 Union blocks (2 logical re-executions) after Loop 07. The downstream CTE was referenced twice, causing the full upstream chain -- including multiple upstream operations -- to execute twice. Eliminating this duplication should halve the remaining re-executions.
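The reasoning above can be expressed as a small counting model. It assumes, as the paragraph describes, that every reference to a CTE re-runs that CTE's full upstream chain; the node names (`final`, `position_agg`, `downstream`, `upstream`, `base_scan`, `windowed`) are hypothetical placeholders for the anonymized query structure.

```python
from functools import lru_cache


def executions(refs, root, target):
    """Count how often `target` runs if every CTE reference is inlined.

    `refs` maps each node to {child: number_of_references_to_child}.
    """

    @lru_cache(maxsize=None)
    def count(node):
        if node == target:
            return 1
        children = refs.get(node, {})
        return sum(n * count(child) for child, n in children.items())

    return count(root)


# Before Loop 08 (hypothetical shape): the final query reads `downstream`
# directly for the join and indirectly through the aggregation.
before = {
    "final": {"position_agg": 1, "downstream": 1},
    "position_agg": {"downstream": 1},
    "downstream": {"upstream": 1},
    "upstream": {"base_scan": 1},
}

# After Loop 08: one window-function stage is the only consumer.
after = {
    "final": {"windowed": 1},
    "windowed": {"downstream": 1},
    "downstream": {"upstream": 1},
    "upstream": {"base_scan": 1},
}

scans_before = executions(before, "final", "base_scan")
scans_after = executions(after, "final", "base_scan")
print(scans_before, scans_after)
```

Under this model the upstream chain runs twice before the change and once after, which is the halving the hypothesis predicts.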

Results

| Metric | Before | After | Delta |
| --- | --- | --- | --- |
| Mean response time (cold) | 961ms | 749ms | -22.1% |
| MergeTreeSelect operations | ~16 | 8 | -50% |
| Pipeline Union blocks | 4 | 2 | -50% |
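The Delta column is a plain relative change, which the following sketch reproduces from the Before and After values in the table. The helper name `pct_change` is illustrative, and the approximate "~16" is treated as exactly 16.

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Before/after pairs taken from the results table.
results = {
    "Mean response time (cold), ms": (961, 749),
    "MergeTreeSelect operations": (16, 8),
    "Pipeline Union blocks": (4, 2),
}

deltas = {name: round(pct_change(b, a), 1) for name, (b, a) in results.items()}

for name, delta in deltas.items():
    print(f"{name}: {delta:+.1f}%")
```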

Why it worked

Pipeline complexity dropped by half -- from 4 Union blocks to 2 (the irreducible minimum for pipeline parallelism). The GROUP BY plus JOIN was replaced by a window function that shares an existing partition, adding negligible compute while removing an entire re-execution of the upstream chain. Four of five cold runs improved by 211-281ms; the remaining run was a near-tie, which we attribute to shared infrastructure variance.

Cumulative impact

Original: 1,829ms → After Loop 08 (cold): ~550ms (70.0% total reduction)