cligen/parseopt3

This module provides a Nim command line parser that is mostly API compatible with the Nim standard library parseopt (and the code derives from that). It supports one convenience iterator over all command line options and some lower-level features. Supported command syntax (here =|: may be any char in sepChars):

  1. short option bundles: -abx (where a, b, x are in shortNoVal)

1a. bundles with one final value: -abc:Bar, -abc=Bar, -c Bar, -abcBar (where c is not in shortNoVal)

  2. long options with values: --foo:bar, --foo=bar, --foo bar (where foo is not in longNoVal)

2a. long options without vals: --baz (where baz is in longNoVal)

  3. command parameters: everything else | anything after "--" or a stop word.

The above is a superset of usual POSIX command syntax - it should accept any POSIX-inspired input, but it also accepts more forms/styles. (Note that POSIX itself is not super strict about this part of the standard. See: http://pubs.opengroup.org/onlinepubs/009604499/basedefs/xbd_chap12.html)

When optionNormalize(key) is used, command authors provide command users additional flexibility to --spell_multi-word_options -aVarietyOfWays --as-Per_User-Preference. This is similar to Nim style-insensitive identifier syntax, but by default allows dash ('-') as well as underscore ('_') word separation.

The "separator free" forms above require appropriate shortNoVal and longNoVal lists to designate option keys that take no value (as well as requireSeparator == false). If such lists are empty, the user must use separators when providing any value.
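As a rough illustration of the syntax forms listed above, the sketch below feeds a hand-written parameter list to the getopt iterator. It assumes cligen is installed. The option names and the file name are made up for this example: -a, -b and --baz are treated as flags, while -c and --foo take values.

```nim
import cligen/parseopt3

# Hypothetical parameters exercising forms 1, 1a, 2, 2a and 3.
let args = @["-ab", "-abcBar", "--foo", "bar", "--baz", "file.txt",
             "--", "-notAnOption"]

var seen: seq[GetoptResult]
# -a, -b and --baz take no value; every other key expects one, which is
# what permits the separator-free "-abcBar" and "--foo bar" spellings.
for kind, key, val in getopt(args, shortNoVal = {'a', 'b'},
                             longNoVal = @["baz"]):
  seen.add((kind, key, val))
  echo kind, ": key=", key, " val=", val
```

Because shortNoVal and longNoVal are non-empty here, the parser can tell which keys consume the following text as a value. With both lists empty, the same values would need explicit separators.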

A notable subtlety is when the first character of an option value is one of sepChars. Even if requireSeparator is false, passing such option values requires either A) putting the value in the next command parameter, as in "-c :" or B) prefixing the value with an element of sepChars, as in -c=: or -c::. Both choices fit into common quoting styles. It seems likely a POSIX-habituated end-user's second guess (after "-c:" errored out with "argument expected") would just work as they expected. POSIX itself encourages authors & users to use the "-c :" form anyway. This small deviation lets this parser accept valid invocations with the original Nim option parser command syntax (with the same semantics), easing transition.
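The three spellings mentioned above can be tried side by side. This is a small sketch, assuming cligen is installed; -x is a hypothetical flag that exists only so that shortNoVal is non-empty and -c is therefore understood to take a value.

```nim
import cligen/parseopt3

# Each parameter list passes ":" as the value of -c in a different way:
# as the next parameter, after "=", and after ":".
var vals: seq[string]
for args in [@["-c", ":"], @["-c=:"], @["-c::"]]:
  for kind, key, val in getopt(args, shortNoVal = {'x'}):
    if kind == cmdShortOption and key == "c":
      vals.add val
      echo args, " -> val=", val
```

Per the description above, each spelling is expected to deliver the same value for -c.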

To ease "nested" command-line parsing (such as with "git" where there may be early global options, a subcommand and later subcommand options), this parser also supports a set of "stop words" - special whole command parameters that prevent subsequent parameters being interpreted as options. This feature makes it easy to fully process a command line and then re-process its tail rather than mandating breaking out at a stop word with a manual test. Stop words are basically just like a POSIX "--" (which this parser also supports - even if "--" is not in stopWords). Such stop words (or "--") can still be the values of option keys with no effect. Only usage as a non-option command parameter acts to stop possible option-treatment of later parameters.
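A minimal sketch of the stop-word behavior, assuming cligen is installed. The subcommand name "commit" and the options shown are hypothetical stand-ins for a git-like command line.

```nim
import cligen/parseopt3

# "commit" is registered as a stop word, so "-m" and "message" after it
# should be reported as plain command parameters, not as options.
let args = @["--verbose", "commit", "-m", "message"]

var seen: seq[GetoptResult]
for kind, key, val in getopt(args, longNoVal = @["verbose"],
                             stopWords = @["commit"]):
  seen.add((kind, key, val))
  echo kind, ": key=", key
```

A caller could then hand the trailing cmdArgument entries to a second parser configured with the subcommand's own option lists.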

To facilitate syntax for operations beyond simple assignment, opChars is a set of chars that may prefix an element of sepChars. The sep member of OptParser is the actual separator used for the current option, if any. E.g., a user entering "=" causes sep == "=" while entering "+=" gets sep == "+=", and "+/-+=" gets sep == "+/-+=".
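The following sketch shows how the sep field might be inspected, assuming cligen is installed. The option names "values" and "name" are invented for illustration, and only '+' is allowed as an operator character.

```nim
import cligen/parseopt3

var p = initOptParser(@["--values+=val", "--name=x"], opChars = {'+'})
var seps: seq[string]
for kind, key, val in p.getopt():
  if kind == cmdLongOption:
    seps.add p.sep          # separator actually typed for this option
    echo key, " ", p.sep, " ", val
```

An application could branch on p.sep to distinguish, say, appending to a list ("+=") from replacing it ("=").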

Types

CmdLineKind = enum
  cmdEnd,                   ## end of command line reached
  cmdArgument,              ## argument detected
  cmdLongOption,            ## a long option ``--option`` detected
  cmdShortOption,           ## a short option ``-c`` detected
  cmdError                   ## error in primary option syntax usage
the detected command line token
GetoptResult = tuple[kind: CmdLineKind, key, val: TaintedString]
OptParser = object of RootObj
  cmd*: seq[string]          ## command line being parsed
  pos*: int                  ## current command parameter to inspect
  off*: int                  ## current offset in cmd[pos] for short key block
  optsDone*: bool            ## "--" has been seen
  shortNoVal*: set[char]     ## 1-letter options not requiring optarg
  longNoVal*: CritBitTree[string] ## long options not requiring optarg
  stopWords*: CritBitTree[string] ## special literal params acting like "--"
  requireSep*: bool          ## require separator between option key & val
  sepChars*: set[char]       ## all the chars that can be valid separators
  opChars*: set[char]        ## all chars that can prefix a sepChar
  longPfxOk*: bool           ## true means unique prefix is ok for longOpts
  stopPfxOk*: bool           ## true means unique prefix is ok for stopWords
  sep*: string               ## actual string separating key & value
  message*: string           ## message to display upon cmdError
  kind*: CmdLineKind         ## the detected command line token
  key*, val*: TaintedString  ## key and value pair; ``key`` is the option
                             ## or the argument, ``val`` is not "" if
                             ## the option was given a value
object to implement the command line parser

Procs

proc initOptParser(cmdline: seq[string] = commandLineParams();
                   shortNoVal: set[char] = {}; longNoVal: seq[string] = @[];
                   requireSeparator = false; sepChars = {'=', ':'};
                   opChars: set[char] = {}; stopWords: seq[string] = @[];
                   longPfxOk = true; stopPfxOk = true): OptParser {....raises: [],
    tags: [], forbids: [].}

Initializes a parse. cmdline should not contain parameter 0, typically the program name. If cmdline is not given, it defaults to the current program parameters.

shortNoVal and longNoVal specify respectively one-letter and long option keys that do not take arguments.

If requireSeparator==true, then option keys and values must be separated by an element of sepChars (default {'=',':'}) in both short and long option contexts. If requireSeparator==false, the parser assumes that only non-NoVal options expect arguments, so users may write -aboVal, -o Val, or --opt Val (as well as the -o:Val or --opt=Val separator style, which always works).

If opChars is not empty then those characters before the :|= separator are reported in the .sep field for each parsed element. This allows "incremental" syntax like --values+=val.

If longPfxOk then unique prefix matching is done for long options. If stopPfxOk then unique prefix matching is done for stop words (usually subcommand names).

Parameters following either "--" or any literal parameter in stopWords are never interpreted as options.

proc initOptParser(cmdline: string): OptParser {....raises: [],
    tags: [ReadIOEffect], forbids: [].}
Initializes the option parser with cmdline. Splits cmdline on spaces and calls the seq[string] overload of initOptParser. Note that this should use a proper tokenizer.
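To see why splitting on spaces falls short of a proper tokenizer, the sketch below compares a naive split with parseCmdLine from the Nim standard library. The command line is made up, and parseCmdLine is shown only as one possible alternative, not as something this module uses.

```nim
import std/[os, strutils]

let line = "--msg \"hello world\" in.txt"

# Naive splitting cuts the quoted value in two and keeps the quotes.
let naive = line.split(' ')

# parseCmdLine honours the double quotes and yields one token for the value.
let tokens = parseCmdLine(line)

echo naive.len    # 4
echo tokens.len   # 3
```

A token list produced this way can be passed to the seq[string] overload of initOptParser when quoted values matter.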
proc lengthen[T](cb: CritBitTree[T]; key: string; prefixOk = false): string
Use cb to find normalized long form of key. Return empty string if ambiguous or unchanged string on no match.
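The return convention described above (expanded key, empty string when ambiguous, unchanged key on no match) can be sketched with std/critbits alone. lengthenSketch is a hypothetical stand-in written for this illustration. It is not the module's implementation, it omits key normalization, and letting an exact match win is an assumption.

```nim
import std/critbits

proc lengthenSketch(cb: CritBitTree[string]; key: string): string =
  ## Hypothetical illustration only; not the module's lengthen.
  if key in cb: return key            # assumption: exact match wins
  var matches: seq[string]
  for k in cb.keysWithPrefix(key):
    matches.add k
  if matches.len == 1: matches[0]     # unique prefix: expand it
  elif matches.len > 1: ""            # ambiguous prefix
  else: key                           # no match: return unchanged

var opts: CritBitTree[string]
for name in ["verbose", "version", "help"]:
  opts[name] = name

echo lengthenSketch(opts, "he")       # help
echo lengthenSketch(opts, "ver") == ""  # true: verbose/version are ambiguous
```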
proc next(p: var OptParser) {....raises: [], tags: [], forbids: [].}
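next carries no description here. The sketch below assumes it follows the standard library parseopt convention that this module derives from: each call advances the parser and fills kind, key and val, with kind == cmdEnd once the input is exhausted. It assumes cligen is installed, and the parameters are invented.

```nim
import cligen/parseopt3

var p = initOptParser(@["-v", "--out=x.txt", "input"], shortNoVal = {'v'})
var keys: seq[string]
while true:
  p.next()                       # advance to the next token
  if p.kind == cmdEnd: break     # nothing left to parse
  keys.add p.key
  echo p.kind, ": key=", p.key, " val=", p.val
```

The getopt iterators are the more convenient route. Driving next by hand mainly helps when parsing needs to pause or switch configuration midway.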
proc optionNormalize(s: string; wordSeparators = "_-"): string {.noSideEffect,
    ...raises: [], tags: [], forbids: [].}

Normalizes option key s to allow command syntax to be style-insensitive in a similar way to Nim identifier syntax.

Specifically this means to convert all but the first char to lower case and remove chars in wordSeparators ('_' and '-' by default). This way users can type "command --my-opt-key" or "command --myOptKey" and so on.

Example:

for kind, key, val in p.getopt():
  case kind
  of cmdLongOption, cmdShortOption:
    case optionNormalize(key)
    of "myoptkey", "m": doSomething()
    else: discard  # unrecognized option key
  else: discard    # arguments, errors and so on
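The normalization rule stated above (keep the first character's case, lower-case the rest, drop word separators) can be restated as a few lines of standalone Nim. normalizeSketch is a hypothetical name used only for this illustration. It is not the module's implementation and may differ from it in edge cases such as a leading separator.

```nim
import std/strutils

proc normalizeSketch(s: string; wordSeparators = "_-"): string =
  ## Hypothetical re-statement of the rule; not the module's code.
  for i, c in s:
    if c in wordSeparators: continue               # drop word separators
    result.add(if i == 0: c else: c.toLowerAscii)  # first char keeps its case

echo normalizeSketch("my-opt-key")   # myoptkey
echo normalizeSketch("myOptKey")     # myoptkey
echo normalizeSketch("my_Opt-key")   # myoptkey
```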
proc valsWithPfx[T](cb: CritBitTree[T]; key: string): seq[T]

Iterators

iterator getopt(cmdline = commandLineParams(); shortNoVal: set[char] = {};
                longNoVal: seq[string] = @[]; requireSeparator = false;
                sepChars = {'=', ':'}; opChars: set[char] = {};
                stopWords: seq[string] = @[]): GetoptResult {....raises: [],
    tags: [], forbids: [].}
This is a convenience iterator for iterating over the command line. Parameters here are the same as for initOptParser. Example (the getopt(p: var OptParser) iterator below has a more detailed one):
for kind, key, val in getopt():
  # this will iterate over all arguments passed to the cmdline.
  continue
iterator getopt(p: var OptParser): GetoptResult {....raises: [], tags: [],
    forbids: [].}
A convenience iterator for iterating over the given OptParser object. Example:
var filename: string = ""
var p = initOptParser("--left --debug:3 -l=4 -r:2")
for kind, key, val in p.getopt():
  case kind
  of cmdArgument:
    filename = key
  of cmdLongOption, cmdShortOption:
    case key
    of "help", "h": writeHelp()
    of "version", "v": writeVersion()
    else: discard  # other option keys are ignored in this example
  of cmdEnd: assert(false) # cannot happen
  of cmdError: quit(p.message, 2)
if filename == "":
  # no filename has been given, so we show the help:
  writeHelp()