cligen/abbrev

This module is about construction of shell-* wildcard-like abbreviations for a set of strings. The simplest variant is when the user gives us a specific maximum string length and at least one component of the head,tail slices. In that case, abbrev just wraps Nim string slicing and the set of abbreviations may or may not uniquely cover the input string set.

When the user leaves certain aspects automatic/undefined, various algorithms can be deployed to find optimized answers within various constraints. The most important constraint is likely uniqueness. Can I select a string, paste it into a shell REPL, and expect it to expand to exactly 1 item? The same concept applies to non-filename strings, but there is less help from a shell to expand them. Eg., 10-40 user names on a lightly configured single-user Unix system uses "a directory in your brain" (or you can write a small expander script to spot-check anything non-obvious in context).

The simplest automatic setting is just a fixed wildcard spot protocol such as a specific head or mid-point with binary search quickly finding a smallest max limit which makes abbreviations unique. Visually this results in columns of abbreviations where the wildcard *s line up vertically which seems easier to read. We can also ignore user head,tail specs, instead finding the location that minimizes the width of the string set. This code does that by just trying all possible locations. This starts to be slow for a computer but remains fast for a human (eg. 400 ms on a directory of 12,000 entries).

The next level of optimization/data compression is to allow the location of the wildcard to vary from string to string. After that, allowing >1 '*' can continue to shorten strings. Each optimization level removes more context making strings harder to read & gets slower to compute. Efficient algorithms for this case are a work in progress. This algo research area seems neglected.

Types

Abbrev = object
  mx*, hd, tl, sLen, m, h, t: int

Consts

parseAbbrevHelp = """a\*|M,head(M/2),tail(M-hdSep),sep(\*),?chars
a:bestPos -2:pfx -3:sfx -4:mfx -5:1\* -6:2\*
POSITIVE_NUMBER=thatWidth/head/tail"""

Procs

proc abbrev(a: Abbrev; str: string): string {.inline, ...raises: [KeyError],
    tags: [], forbids: [].}
Abbreviate str as str[0..<hd], sep, str[^tl..^1] only if str.len > mx.
proc expandFit(a: var Abbrev; strs: var seq[string];
               ab0, ab1, wids, colWs: var seq[int]; w, jP, m, nr, nc: int) {.
    ...raises: [KeyError], tags: [], forbids: [].}
Expand any a.sep in strs[m*i + jP][ab0[i] ..< ab1[i]], updating colWs[m*(i div nr) + jP] until all seps gone or colWs.sum==w. I.e. colWs include gap to right. Overall table structure is preserved. Early a.sep instances are fully expanded before later instances change.
proc isAbstract(a: Abbrev): bool {.inline, ...raises: [], tags: [], forbids: [].}
proc parseAbbrev(s: string): Abbrev {....raises: [ValueError], tags: [],
                                      forbids: [].}
Parse comma-separated abbreviation spec [mx][,[hd][,[tl][,[sep]]]] s into Abbrev abb. Non-numeric mx => -1 => caller must set mx and call update. Non-numeric|missing hd => mx-sep.len-tl Non-numeric or missing tl => mx-sep.len-hd. Non-num|missing both => hd=(mx-sep.len)/2; tl=mx-sep.len-hd (which gives tl 1 more for odd mx-sep.len). mx < 0 => various locally optimized uniqueAbbrevs.
proc realize(a: var Abbrev; strs: openArray[string]) {....raises: [KeyError],
    tags: [], forbids: [].}
Semi-efficiently find the smallest max such that strs can be uniquely abbreviated by abbrev(s, mx, hd, tl) for all s in strs.
proc realize[T](a: var Abbrev; tab: Table[T, string])
Find smallest max s.t. abbrev unique over values of tab.
proc uniqueAbbrevs(a: var Abbrev; strs: openArray[string]): seq[string] {.
    ...raises: [], tags: [], forbids: [].}
Return narrowest unique abbrevation set for strs given some number of wildcards (sep, probably *), where both location and number of wildcards can vary from string to string.
proc update(a: var Abbrev) {.inline, ...raises: [], tags: [], forbids: [].}
Call this after any change to a.mx to update derived quantities.