╔═══════════════════════════════════════════════════════════╗
║ Leftmost-Longest-Common-Substring-Based Alignment with DP ║
╚═══════════════════════════════════════════════════════════╝
[2026-02-18@dgpb]
Find all hierarchical longest common substring matches between two sequences.
Hierarchical matching:
1. Find the leftmost longest common substring (the "anchor")
2. Recursively find matches before and after the anchor
3. Repeat until no matches remain
Example:
seq1: "a_bb_ccc_a_ab_dddd"
seq2: "a+bb+dddd+ccc"
Result: [(0,0,1), (2,2,2), (14,5,4)]
Match 1: 'a' @ (0,0) len=1
Match 2: 'bb' @ (2,2) len=2
Match 3: 'dddd' @ (14,5) len=4
ccc is unmatched because it conflicts with dddd, which, being longer, has priority.
This document discusses 3 algorithms for computing hierarchical longest common substring matches.
All produce identical results.
The focus is on FindMultiLayer, which reduces the recursion-depth multiplier to at most O(n^(1/4)) for small alphabets and O(1) for large alphabets.
def find_one_best_recurse(seq1, seq2, offset1=0, offset2=0):
    if not seq1 or not seq2:
        return []
    pos1, pos2, length = find_longest_one_best(seq1, seq2)
    if length == 0:
        return []
    # Recurse into the regions before and after the anchor match
    left = find_one_best_recurse(seq1[:pos1], seq2[:pos2], offset1, offset2)
    current = [(offset1 + pos1, offset2 + pos2, length)]
    right = find_one_best_recurse(
        seq1[pos1+length:], seq2[pos2+length:],
        offset1 + pos1 + length, offset2 + pos2 + length
    )
    return left + current + right
find_longest_one_best: Standard DP tracking one best match per i (leftmost in j).
Complexity: O(n₁ × n₂ × depth)
- Best: O(n₁ × n₂) - single match
- Worst: O(n₁ × n₂ × min(n₁,n₂)) - unbalanced tree
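The helper find_longest_one_best can be sketched as the standard quadratic DP described above (a minimal reconstruction from that description, not the exact implementation; the strict ">" comparison is what makes the result leftmost):

```python
def find_longest_one_best(seq1, seq2):
    # j2len[j] = length of the common substring ending at (i, j)
    best_i = best_j = best_k = 0
    j2len = {}
    for i, elt in enumerate(seq1):
        newj2len = {}
        for j, elt2 in enumerate(seq2):
            if elt == elt2:
                k = j2len.get(j - 1, 0) + 1
                newj2len[j] = k
                if k > best_k:  # strict ">" keeps the leftmost best match
                    best_i, best_j, best_k = i - k + 1, j - k + 1, k
        j2len = newj2len
    return best_i, best_j, best_k
```

On the example from the introduction, the anchor it returns is dddd at (14, 5) with length 4.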
An improved recursive approach that finds all ordered, non-overlapping best matches at once.
The enhanced DP returns all leftmost non-overlapping ordered blocks of the best length in a single pass.
These are the same blocks that Algorithm 1 would collect in as many DP passes as there are blocks of that size.
def find_best_layer(seq1, seq2):
    # seq2_positions, start1/stop1, OUT_OF_RANGE: index bookkeeping elided
    j2len = {}
    best_k = 0
    results = []
    last_i_pk = -1
    last_j_pk = -1
    i = start1
    while i < stop1:
        elt = seq1[i]
        p2 = seq2_positions[elt]
        new_best = False
        i_cur = -1
        j_cur = -1
        newj2len = {}
        j2lenpop = j2len.pop
        for j in p2:
            if OUT_OF_RANGE:
                continue
            k = j2lenpop(j - 1, 0) + 1
            newj2len[j] = k
            if k > best_k:
                best_k = k
                i_cur = i
                j_cur = j
                new_best = True
            elif i_cur == -1 and k == best_k:
                # equally long match starting past the previous pick
                if i >= last_i_pk and j >= last_j_pk:
                    i_cur = i
                    j_cur = j
        if i_cur != -1:
            if new_best:
                results.clear()  # a longer match invalidates shorter picks
            results.append((i_cur, j_cur))
            last_i_pk = i_cur + best_k
            last_j_pk = j_cur + best_k
        j2len = newj2len
        i += 1
Key difference: FindOneBest keeps only leftmost j; FindBestLayer keeps all j with max length, giving better subdivisions.
Complexity improvement: eliminates the O(n) worst case by finding better subdivisions. However, an O(√n) worst case remains when matches come in many different lengths (the algorithm must still recurse through the hierarchy).
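A runnable version of the pseudocode above, with the elided bookkeeping (seq2_positions, range checks) filled in as assumptions; it converts the collected end positions into (start1, start2, length) triples:

```python
from collections import defaultdict

def find_best_layer(seq1, seq2):
    """Single DP pass returning all leftmost, ordered, non-overlapping
    matches of the single best (longest) length. Sketch, not the exact
    implementation."""
    seq2_positions = defaultdict(list)
    for j, elt in enumerate(seq2):
        seq2_positions[elt].append(j)
    j2len = {}
    best_k = 0
    results = []           # end positions (i_end, j_end) of picked matches
    last_i_pk = last_j_pk = -1
    for i, elt in enumerate(seq1):
        newj2len = {}
        j2lenpop = j2len.pop
        new_best = False
        i_cur = j_cur = -1
        for j in seq2_positions.get(elt, ()):
            k = j2lenpop(j - 1, 0) + 1
            newj2len[j] = k
            if k > best_k:
                best_k = k
                i_cur, j_cur = i, j
                new_best = True
            elif i_cur == -1 and k == best_k:
                # equally long match starting past the previous pick
                if i >= last_i_pk and j >= last_j_pk:
                    i_cur, j_cur = i, j
        if i_cur != -1:
            if new_best:
                results.clear()  # a longer match invalidates shorter picks
            results.append((i_cur, j_cur))
            last_i_pk = i_cur + best_k
            last_j_pk = j_cur + best_k
        j2len = newj2len
    # convert end positions to (start1, start2, length) triples
    return [(ie - best_k + 1, je - best_k + 1, best_k) for ie, je in results]
```

On "a_b_c" vs "a.b.c" it returns all three length-1 matches in one pass, which is exactly what removes the O(n) recursion multiplier of FindOneBest.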
A maximal match is a match that cannot be extended in either direction.
Core idea: collect all maximal matches in a single DP pass, layer them by length, and if the memory budget is exceeded fall back to FindBestLayer mode.
Why it works - two cases:
# 1. Multi-Layer
collected = {}  # dict[length, list[tuple[i_end, j_end]]]
min_ck = 1      # minimum collected match length
j2len = {}
# Set memory limit
memleft = int((len(seq1) + len(seq2)) * memult)
if memleft < 100:
    memleft = 100
i = start1
while i < stop1:
    elt = seq1[i]
    p2 = seq2_positions[elt]
    # 1.1. DP loop
    newj2len = {}
    j2lenpop = j2len.pop
    for j in p2:
        if OUT_OF_RANGE:
            continue
        newj2len[j] = j2lenpop(j - 1, 0) + 1
    # 1.2. Collect maximal matches.
    # j2len holds entries not extended (popped) in this iteration
    # = maximal matches ending at i-1
    i_m1 = i - 1
    for j, k in j2len.items():
        if k >= min_ck:
            collected.setdefault(k, []).append((i_m1, j))
            memleft -= 1
    j2len = newj2len
    i += 1
    # 1.3. Trim if needed
    while memleft < 0:
        if len(collected) == 1:
            # Break to single layer
            break
        # Drop the shortest length bucket.
        # Matches at min_k length are discarded;
        # min_ck ensures future matches at this length are not collected.
        min_k = min(collected)
        memleft += len(collected.pop(min_k))
        min_ck = min_k + 1
    else:
        # else runs only if the inner while completed
        # without break (memory still OK)
        continue
    # Break to single layer
    break
else:
    # Finished successfully.
    # Append maximal matches from the last iteration.
    i_m1 = stop1 - 1
    for j, k in j2len.items():
        if k >= min_ck:
            collected.setdefault(k, []).append((i_m1, j))
    # Layer and return
    results = self.layer_maximal_matches(collected, min_ck)
    return results
# 2. Broke out - continue with single layer (Algorithm 2)
best_k = max(collected)
...
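Steps 1.1 and 1.2 above, stripped of the memory cap and the fallback machinery, can be sketched as a self-contained function (names assumed):

```python
from collections import defaultdict

def collect_maximal_matches(seq1, seq2):
    """Single DP pass collecting every maximal match, keyed by length.
    Each entry is the (end1, end2) position of the match. Sketch of
    step 1 above, without the memory-trimming logic."""
    seq2_positions = defaultdict(list)
    for j, elt in enumerate(seq2):
        seq2_positions[elt].append(j)
    collected = {}
    j2len = {}
    for i, elt in enumerate(seq1):
        newj2len = {}
        j2lenpop = j2len.pop
        for j in seq2_positions.get(elt, ()):
            newj2len[j] = j2lenpop(j - 1, 0) + 1
        # entries left in j2len were not extended: maximal, ending at i-1
        for j, k in j2len.items():
            collected.setdefault(k, []).append((i - 1, j))
        j2len = newj2len
    # flush matches still alive at the end of seq1
    for j, k in j2len.items():
        collected.setdefault(k, []).append((len(seq1) - 1, j))
    return collected
```

For "a_bb" vs "a+bb" this yields the length-2 match 'bb' plus three maximal length-1 matches ('a', and the two single-'b' diagonals), illustrating why small lengths dominate the memory budget.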
A match is trimmed when it partially overlaps an already-placed result.
Its start or end is adjusted to fit the adjacent gap, and it is re-queued at the reduced length.
The implementation has a few edge cases, but the logic is simple:
collected = {length: [match1, match2, ...], ...}
results = []
while collected:
    k = max(collected)
    for match in collected.pop(k):
        if inserts_cleanly(match, results):
            insert(match, results)
        else:
            new_match, new_k = trim_match(match, results)
            if new_k >= min_valid_k:
                collected.setdefault(new_k, []).append(new_match)
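The layering step can be made concrete. A self-contained sketch with inserts_cleanly and trim_match folded into explicit gap arithmetic; the exact trimming rules are assumptions based on the description above, not the actual implementation:

```python
import bisect

def layer_maximal_matches(collected, min_ck=1):
    # collected: dict[length, list[(i_end, j_end)]], as produced in step 1.2
    results = []   # placed matches, sorted by start1: (start1, start2, length)
    starts1 = []   # parallel key list for bisect
    while collected:
        k = max(collected)  # longest lengths place first
        for i_end, j_end in collected.pop(k):
            i1, j1 = i_end - k + 1, j_end - k + 1
            slot = bisect.bisect_right(starts1, i1)
            # gap imposed by the placed neighbours, in both sequences
            lo1 = lo2 = 0
            hi1 = hi2 = float('inf')
            if slot > 0:
                pi, pj, pk = results[slot - 1]
                lo1, lo2 = pi + pk, pj + pk
            if slot < len(results):
                hi1, hi2, _ = results[slot]
            cut_l = max(0, lo1 - i1, lo2 - j1)          # shift start right
            cut_r = max(0, i1 + k - hi1, j1 + k - hi2)  # shift end left
            new_k = k - cut_l - cut_r
            if new_k == k:
                # inserts cleanly into the gap
                results.insert(slot, (i1, j1, k))
                starts1.insert(slot, i1)
            elif new_k >= min_ck:
                # partial overlap: trim and re-queue at the reduced length
                ni1, nj1 = i1 + cut_l, j1 + cut_l
                collected.setdefault(new_k, []).append(
                    (ni1 + new_k - 1, nj1 + new_k - 1))
    return results
```

Crossing matches trim to a non-positive length and are silently dropped, which is how a single placed match can invalidate all conflicting shorter ones.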
Space: O(n₂ + M), where M = total maximal matches collected (bounded by memleft cap, which is O(n₁ + n₂) in practice).
The cap memleft = n₁ + n₂ can be lowered (e.g. to (n₁ + n₂) / 2) at the cost of pruning more length buckets earlier.
Quality degrades gracefully — little is lost as small lengths have disproportionately many cross-matches.
DP Time: O(n₁ × n₂)
Layering Time: O(M)
Recursion-depth multiplier: O(1) for large alphabets; worst case O(n^(1/4)) for small alphabets - empirically, on:
seq1 = 'a_aa_aaa_aaaa_aaaaa_aaaaaa_' ...
seq2 = 'a.aa.aaa.aaaa.aaaaa.aaaaaa.' ...
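The truncated sequences above can be generated programmatically; a hypothetical generator (name and trailing-separator convention are assumptions):

```python
def make_small_alphabet_case(blocks, sep1='_', sep2='.'):
    # Pathological small-alphabet inputs: runs of 'a' of increasing
    # length, joined by a different separator in each sequence.
    parts = ['a' * n for n in range(1, blocks + 1)]
    return sep1.join(parts) + sep1, sep2.join(parts) + sep2
```

With blocks=6 this reproduces the "a_aa_aaa_aaaa_aaaaa_aaaaaa_" prefix shown above.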
Performance across different sequence lengths (ms, CPython 3.13.4, macOS 11.7.11):
All benchmarks are produced with autojunk=False.
Assumption: the DP for the small-alphabet cases runs in O(n).
In reality these cases are not purely O(n), since each block consists of many repeated characters; different blocks merely use different characters.
| Units: ms | FindOneBest | FindBestLayer | FindMultiLayer |
|---|---|---|---|
| rng01 | 91 | 117 | 58 |
| aaa / aaa | 100 | 118 | 97 |
| a_a_a / a.a.a | 3,032 | 32 | 24 |
| a_aa_aaa / a.aa.aaa | 671 | 919 | 105 |
| aaa_aa_a / aaa.aa.a | 671 | 694 | 99 |
| abc / abc | 1 | 1 | 1 |
| a_b_c / a.b.c | 46 | 2 | 1 |
| a_bb_ccc / a.bb.ccc | 40 | 47 | 5 |
| aaa_bb_c / aaa.bb.c | 42 | 49 | 5 |
| ccc_bb_a / a.bb.ccc | 4 | 5 | 4 |
| a.bb.ccc / ccc_bb_a | 4 | 5 | 4 |
| global-block-shuffle-LA | 6 | 8 | 4 |
| local-block-shuffle-LA | 13 | 16 | 5 |
| parallel-block-shuffle-LA | 18 | 16 | 4 |
| diff-simulation | 32 | 31 | 11 |
| Units: ms | FindOneBest | FindBestLayer | FindMultiLayer |
|---|---|---|---|
| rng01 | 323 | 451 | 223 |
| aaa / aaa | 417 | 479 | 429 |
| a_a_a / a.a.a | 22,647 | 126 | 112 |
| a_aa_aaa / a.aa.aaa | 3,703 | 5,055 | 472 |
| aaa_aa_a / aaa.aa.a | 3,826 | 3,852 | 509 |
| abc / abc | 2 | 2 | 2 |
| a_b_c / a.b.c | 180 | 5 | 3 |
| a_bb_ccc / a.bb.ccc | 150 | 184 | 11 |
| aaa_bb_c / aaa.bb.c | 153 | 196 | 13 |
| ccc_bb_a / a.bb.ccc | 12 | 16 | 13 |
| a.bb.ccc / ccc_bb_a | 11 | 14 | 11 |
| global-block-shuffle-LA | 23 | 28 | 14 |
| local-block-shuffle-LA | 33 | 40 | 14 |
| parallel-block-shuffle-LA | 42 | 51 | 14 |
| diff-simulation | 110 | 103 | 35 |
| Units: ms | FindOneBest | FindBestLayer | FindMultiLayer |
|---|---|---|---|
| abc / abc | 18 | 16 | 12 |
| a_b_c / a.b.c | --- | --- | 18 |
| a_bb_ccc / a.bb.ccc | --- | --- | 162 |
| aaa_bb_c / aaa.bb.c | --- | --- | 168 |
| ccc_bb_a / a.bb.ccc | 183 | 306 | 173 |
| a.bb.ccc / ccc_bb_a | 174 | 230 | 175 |
| global-block-shuffle-LA | 398 | 474 | 170 |
| local-block-shuffle-LA | 1,009 | 1,115 | 180 |
| parallel-block-shuffle-LA | 1,076 | 1,194 | 164 |
| diff-simulation | 1,750 | 1,856 | 536 |
Random Binary — rng01
All algorithms ~O(n²) with similar constants.
Single Match — aaa / aaa
All algorithms ~O(n²) with similar constants.
Pathological O(n) Recursion — a_a_a / a.a.a
FindOneBest: O(n²) × O(n) = O(n³) due to O(n) recursion depth → 22,647ms at n=1500.
FindBestLayer/FindMultiLayer: O(n²) — eliminates depth multiplier entirely.
Pathological O(√n) Recursion — a_aa_aaa / a.aa.aaa and aaa_aa_a / aaa.aa.a
FindOneBest/FindBestLayer: O(n²) × O(√n) due to O(√n) recursion depth.
FindMultiLayer: O(n²) × O(n^(1/4)) - worst case.
Single Match — abc / abc
Trivial, one match.
All algorithms O(n).
Pathological O(n) Recursion — a_b_c / a.b.c
FindOneBest: O(n) × O(n) = O(n²) — n matches of length 1, each triggers O(n) DP.
FindBestLayer/FindMultiLayer: O(n) — single pass finds all matches.
Pathological O(√n) Recursion — a_bb_ccc / a.bb.ccc and aaa_bb_c / aaa.bb.c
FindOneBest/FindBestLayer: O(DP) × O(√n).
FindMultiLayer: O(DP) — eliminates depth multiplier.
Single Match — ccc_bb_a and a.bb.ccc variants
Single match invalidating all others.
All algorithms ~O(DP) × O(1).
global/local/parallel-block-shuffle
All algorithms ~O(n), small absolute times (measurement noise dominates).
FindMultiLayer competitive throughout.
diff-simulation
1 block of the largest size, 2 blocks of the 2nd-largest, 3 blocks of the next, etc.
All shuffled in parallel, with single-character separators between them.
(Real-life diffs are simpler.)
FindMultiLayer is 3x faster than FindOneBest and FindBestLayer.
| Units: kB | FindOneBest | FindBestLayer | FindMultiLayer | (n1 + n2) // 8 |
|---|---|---|---|---|
| rng01 | 43 | 49 | 116 | 63 |
| aaa / aaa | 78 | 79 | 198 | 109 |
| a_a_a / a.a.a | 75 | 83 | 135 | 92 |
| a_aa_aaa / a.aa.aaa | 69 | 77 | 153 | 111 |
| aaa_aa_a / aaa.aa.a | 69 | 74 | 149 | 89 |
| abc / abc | 85 | 86 | 86 | 88 |
| a_b_c / a.b.c | 90 | 95 | 123 | 110 |
| a_bb_ccc / a.bb.ccc | 30 | 32 | 91 | 39 |
| aaa_bb_c / aaa.bb.c | 31 | 40 | 88 | 44 |
| ccc_bb_a / a.bb.ccc | 30 | 29 | 88 | 35 |
| a.bb.ccc / ccc_bb_a | 30 | 30 | 90 | 34 |
| global-block-shuffle-LA | 30 | 30 | 89 | 36 |
| local-block-shuffle-LA | 30 | 30 | 88 | 39 |
| parallel-block-shuffle-LA | 30 | 31 | 89 | 40 |
| diff-simulation | 46 | 41 | 99 | 53 |
FindOneBest worst case is 90 kB. FindBestLayer worst case is 95 kB.
FindMultiLayer worst case is 198 kB, and 111 kB with the (n1 + n2) // 8 limit.
All methods have O(n) memory complexity.
| | Complexity | Small Alpha Recursion (*) | Large Alpha Recursion (*) |
|---|---|---|---|
| FindOneBest | very simple | O(√n) - O(n) | O(√n) - O(n) |
| FindBestLayer | simple | Some O(√n) | Some O(√n) |
| FindMultiLayer | Moderate | Rare O(n^(1/4)) | Mostly O(1) |
(*) Beyond O(n log n).
FindMultiLayer's worst-case recursion multiplier is O(n^(1/4)); for large alphabets it mostly eliminates recursion entirely.
FindMultiLayer is sensible for the improvement it delivers.
With autojunk=True, FindMultiLayer still has a total O(n²) worst case.
The worst case can be constructed as follows:
# repeat count per character, kept below the autojunk threshold
each = max(1, N // 100 - 10)
b = N // each + 1
chars = ''.join([each * chr(i + 100) for i in range(b)])
blocks = []
i = 1
s = j = 0
while s + i + 1 < N:
    blocks.append(chars[j:j+i])
    j += i
    s += i + 1
    i += 1
seq1 = '-'.join(blocks)
seq2 = '.'.join(blocks)
Runtimes for sequences constructed with code above:
| | FindOneBest | FindMultiLayer mem=(n1+n2) | FindMultiLayer mem=(n1+n2) // 8 |
|---|---|---|---|
| size 10K | 10.6 s | 0.15 s | 0.2 s |
| size 15K | 24.1 s | 0.4 s | 0.4 s |
| size 20K | 60.5 s | 0.6 s | 0.8 s |
| size 50K | ~ 13 m | 4.4 s | 6.6 s |
| size 100K | ~ 1.5 h | 20 s | 31 s |
FindMultiLayer with memory=(n1 + n2) processes 90% of the size in the 1st pass.
FindMultiLayer with memory=(n1 + n2) // 8 processes 50% of the size in the 1st pass.
difflib.SequenceMatcher with autojunk=True is used in many applications that occasionally hit pathological cases.
FindMultiLayer offers a non-trivial improvement that would remove most of the related issues.