Skip to content

Grammar

The grammar for passgen is, at first glance, similar to that of regular expressions, however it is much simplified.

Tokens

Any valid unicode codepoint is a valid input token.

abßü → a b ß ü

In addition, there are some special types of tokens. There is a unicode escape sequence, which looks like this, where E4 stands for the unicode character 0xE4 (expressed in hexadecimal).

\u{E4} → ä

In some cases, characters have to be escaped to be recognised as simple tokens and not control characters. This depends on the context.

ab\[\] → a b [ ]

Syntax

Simple characters are printed verbatim.

abcabc

Groups are delimited by parentheses and have multiple options, delimited by a pipe ('|') character.

(abc|def)abc or def

The root element is a group, so parentheses aren't necessary.

abc|defabc or def

Sets are groups of possible choices for a character, delimited by square brackets. They can also contain ranges of characters delimited by a dash ('-').

[abc]a or b or c
[0-3]0 or 1 or 2 or 3

Special characters are special escaped characters which are replaced by something.

\p[english]{10,12} → broullion

Modifiers

Modifiers can alter the syntax element they apply to. The optional modifier, denoted by a question mark ('?') after the element, makes it optional (meaning there is a fifty percent chance it doesn't appear).

abc?abc or ab
[ab][de]?ad or ae or bd or be or a or b
abc(def)?abc or abcdef

The repetition modifier allows repeating a syntax element an arbitrary amount of times. It uses curly braces and takes either one or multiple numbers. Using a single number repeats the syntax element that many times.

abc{3}abccc
[ab]{2}aa or ab or ba or bb
(word){2}wordword

Using two numbers specifies a minimum and maximum amount of repetitions (the number is chosen randomly).

abc{1,3}abc or abcc or abccc
[ab]{1,2}a or b or aa or ab or ba or bb

Examples

For example, to generate an email address with the user part consisting only of lowercase alphabetical characters, numbers and dots, and the domain consisting only of lowercase alphabetical characters and a TLD, a pattern could look something like this:

$ passgen '[a-z0-9.]{3,10}@[a-z]{3,10}.(com|net|org)'
viy4@kdptilen.org

Formal Grammar

The format grammar of the password pattern grammar is denoted here in BNF form.

tokenunicode | '\' unicode | '\' 'u' '{' hex '}'

patterngroup-inner

group-innersegment ('|' group-inner)? | empty

segmentchar segment | set segment | group segment | empty

modifiers ← '?' modifiers | '{' number '}' modifiers | '{' number ',' number '}' modifiers | empty

group ← '(' group-inner ')' modifiers

chartoken modifier

set ← '[' set-inner ']' modifiers