PARSEC expressions¤
The PARSEC expression group implements the concept of a parser combinator.
Parser combinators provide a way to combine basic parsers into more complex parsers for specific rules. In this context, a parser is a function that takes a string as input and produces a structured output that indicates successful parsing, or an error message if parsing fails.
Parsec expressions are divided into two groups: parsers and combinators.
Parsers are the fundamental building blocks. They recognize and process specific patterns or elements within the input string.
Combinators are operators or functions that combine and compose parsers.
Every expression starts with the !PARSE. prefix.
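The parser/combinator split can be illustrated with a minimal Python sketch. It is not the SP-Lang implementation: the names digit, exactly and sequence, and the convention of returning (value, new_position) or None, are assumptions made for illustration only.

```python
from typing import Any, Callable, Optional, Tuple

# A parser takes the input and a position. It returns (value, new_position)
# on success, or None on failure.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]

def digit(text: str, pos: int):
    """Parser: recognize a single decimal digit."""
    if pos < len(text) and text[pos] in "0123456789":
        return text[pos], pos + 1
    return None

def exactly(what: str) -> Parser:
    """Parser factory: recognize one fixed string."""
    def parse(text: str, pos: int):
        if text.startswith(what, pos):
            return what, pos + len(what)
        return None
    return parse

def sequence(*parsers: Parser) -> Parser:
    """Combinator: run the parsers one after another; fail if any fails."""
    def parse(text: str, pos: int):
        values = []
        for parser in parsers:
            result = parser(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return tuple(values), pos
    return parse

hour_sep = sequence(digit, digit, exactly(":"))
print(hour_sep("08:15", 0))  # (('0', '8', ':'), 3)
print(hour_sep("8:15", 0))   # None
```

Here digit and exactly play the role of parsers, while sequence is a combinator that builds a larger parser from smaller ones.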
!PARSE.DIGIT: Parse a single digit¤
Type: Parser.
Synopsis:
!PARSE.DIGIT
Example
Input string: 2
!PARSE.DIGIT
!PARSE.DIGITS: Parse a sequence of digits¤
Type: Parser.
Synopsis:
!PARSE.DIGITS
  min: <...>
  max: <...>
  exactly: <...>
Fields min, max and exactly are optional.
Warning
The exactly field can't be used together with the min or max fields. And of course, the max value can't be less than the min value.
Example
Input string: 123
!PARSE.DIGITS
  max: 4
More examples
Parse as many digits as possible:
!PARSE.DIGITS
Parse exactly 3 digits:
!PARSE.DIGITS
  exactly: 3
Parse at least 2 digits, but not more than 4:
!PARSE.DIGITS
  min: 2
  max: 4
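How min, max and exactly interact can be sketched in Python. This is an illustration under assumptions, not the SP-Lang implementation: the function name parse_digits, the parameter names minimum and maximum, the greedy matching, and returning the digits as an integer are all hypothetical choices.

```python
from typing import Optional, Tuple

DIGITS = "0123456789"

def parse_digits(text: str, pos: int = 0, minimum: Optional[int] = None,
                 maximum: Optional[int] = None, exactly: Optional[int] = None
                 ) -> Optional[Tuple[int, int]]:
    """Return (number, new_position), or None if the constraints are not met."""
    if exactly is not None:
        if minimum is not None or maximum is not None:
            raise ValueError("exactly can't be combined with min or max")
        minimum = maximum = exactly
    if minimum is not None and maximum is not None and maximum < minimum:
        raise ValueError("max can't be less than min")
    end = pos
    # Greedy: take digits until a non-digit, the end of input, or the max limit.
    while (end < len(text) and text[end] in DIGITS
           and (maximum is None or end - pos < maximum)):
        end += 1
    count = end - pos
    if count == 0 or (minimum is not None and count < minimum):
        return None
    return int(text[pos:end]), end

print(parse_digits("123", maximum=4))             # (123, 3)
print(parse_digits("12345", exactly=3))           # (123, 3)
print(parse_digits("1", minimum=2, maximum=4))    # None
```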
!PARSE.LETTER: Parse a single letter¤
Latin letters from A to Z, both uppercase and lowercase.
Type: Parser.
Synopsis:
!PARSE.LETTER
Example
Input string: A
!PARSE.LETTER
!PARSE.CHAR: Parse a single character¤
Any type of character.
Type: Parser.
Synopsis:
!PARSE.CHAR
Example
Input string: @
!PARSE.CHAR
!PARSE.CHARS: Parse a sequence of characters¤
Type: Parser.
Synopsis:
!PARSE.CHARS
  min: <...>
  max: <...>
  exactly: <...>
Fields min, max and exactly are optional.
Warning
The exactly field can't be used together with the min or max fields. And of course, the max value can't be less than the min value.
Example
Input string: name@123_
!PARSE.CHARS
  max: 8
Tip
Use !PARSE.CHARS without fields to parse until the end of the string.
More examples
Parse as many chars as possible:
!PARSE.CHARS
Parse exactly 3 chars:
!PARSE.CHARS
  exactly: 3
Parse at least 2 chars, but not more than 4:
!PARSE.CHARS
  min: 2
  max: 4
!PARSE.SPACE: Parse a single space character¤
Type: Parser.
Synopsis:
!PARSE.SPACE
!PARSE.SPACES: Parse a sequence of space characters¤
Parses as many space characters as possible.
Type: Parser.
Synopsis:
!PARSE.SPACES
!PARSE.ONEOF: Parse a single character from a set of characters¤
Type: Parser.
Synopsis:
!PARSE.ONEOF
  what: <...>
!PARSE.ONEOF <...>
Example
Input string: Wow!
!PARSE.ONEOF
  what: "!?."
!PARSE.NONEOF: Parse a single character that is not in a set of characters¤
Type: Parser.
Synopsis:
!PARSE.NONEOF
  what: <...>
!PARSE.NONEOF <...>
Example
Input string: Wow!
!PARSE.NONEOF
  what: ",;:[]()"
!PARSE.UNTIL: Parse a sequence of characters until a specific character is found¤
Type: Parser.
Synopsis:
!PARSE.UNTIL
  what: <...>
  stop: <before/after>
  eof: <true/false>
!PARSE.UNTIL <...>
- stop - indicates whether the stop character should be parsed or not. Possible values: before or after (default).
- eof - indicates whether to parse until the end of the string if the what symbol is not found. Possible values: true or false (default).
Info
The what field must be a single character. Some whitespace characters can also be used, such as tab.
Example
Input string: 60290:11
!PARSE.UNTIL
  what: ":"
More examples
Parse until : symbol and stop before it:
!PARSE.UNTIL
  what: ":"
  stop: "before"
Parse until space symbol and stop after it:
!PARSE.UNTIL ' '
Parse until , symbol or parse till the end of the string if it's not found:
!PARSE.UNTIL
  what: ","
  eof: true
Parse until tab symbol:
!PARSE.UNTIL
  what: 'tab'
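The effect of the stop and eof fields can be sketched in Python. The function parse_until is hypothetical, and the sketch assumes that the stop character is never part of the returned value and that stop only decides where the next parser starts.

```python
from typing import Optional, Tuple

def parse_until(text: str, pos: int, what: str, stop: str = "after",
                eof: bool = False) -> Optional[Tuple[str, int]]:
    """Return (matched_text, new_position), or None when `what` is missing."""
    index = text.find(what, pos)
    if index == -1:
        # Stop character not found: succeed only when eof is enabled.
        return (text[pos:], len(text)) if eof else None
    # "after" consumes the stop character, "before" leaves it in the input.
    new_pos = index + 1 if stop == "after" else index
    return text[pos:index], new_pos

print(parse_until("60290:11", 0, ":"))                  # ('60290', 6)
print(parse_until("60290:11", 0, ":", stop="before"))   # ('60290', 5)
print(parse_until("no comma here", 0, ",", eof=True))   # ('no comma here', 13)
print(parse_until("no comma here", 0, ","))             # None
```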
!PARSE.EXACTLY: Parse precisely a defined sequence of characters¤
Type: Parser.
Synopsis:
!PARSE.EXACTLY
  what: <...>
!PARSE.EXACTLY <...>
Example
Input string: Hello world!
!PARSE.EXACTLY
  what: "Hello"
!PARSE.BETWEEN: Parse a sequence of characters between two specific characters¤
Type: Parser.
Synopsis:
!PARSE.BETWEEN
  what: <...>
  start: <...>
  stop: <...>
  escape: <...>
!PARSE.BETWEEN <...>
- what - the character that delimits the sequence on both sides, when the opening and closing characters are the same.
- start, stop - the opening and closing characters, when they are different.
- escape - the escape character.
Example
Input string: [10/May/2023:08:15:54 +0000]
!PARSE.BETWEEN
  start: '['
  stop: ']'
More examples
Parse between double-quotes:
!PARSE.BETWEEN
  what: '"'
!PARSE.BETWEEN '"'
Parse between double-quotes, with inner double-quotes escaped:
Input string: "one, \"two\", three"
!PARSE.BETWEEN
  what: '"'
  escape: '\'
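A minimal Python sketch of delimiter matching with an escape character follows. The function parse_between is hypothetical, and whether SP-Lang drops the escape character from the output is an assumption made here for illustration.

```python
from typing import Optional, Tuple

def parse_between(text: str, pos: int, start: str, stop: str,
                  escape: Optional[str] = None) -> Optional[Tuple[str, int]]:
    """Return (inner_text, new_position), or None if the delimiters don't match."""
    if not text.startswith(start, pos):
        return None
    i = pos + len(start)
    chars = []
    while i < len(text):
        ch = text[i]
        if escape is not None and ch == escape and i + 1 < len(text):
            # Assumption: the escaped character is kept, the escape is dropped.
            chars.append(text[i + 1])
            i += 2
            continue
        if ch == stop:
            return "".join(chars), i + 1
        chars.append(ch)
        i += 1
    return None  # closing delimiter never found

print(parse_between("[10/May/2023:08:15:54 +0000]", 0, "[", "]"))
# ('10/May/2023:08:15:54 +0000', 28)
print(parse_between('"one, \\"two\\", three"', 0, '"', '"', escape="\\"))
# ('one, "two", three', 22)
```

The what field corresponds to calling the function with the same character for start and stop.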
!PARSE.REGEX: Parse a sequence of characters that matches a regular expression¤
Type: Parser.
Synopsis:
!PARSE.REGEX
  what: <...>
Example
Input string: FTVW23_L-C: Message...
Output: FTVW23_L-C
!PARSE.REGEX
  what: '[a-zA-Z0-9_\-0]+'
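The example pattern can be checked with Python's re module. This only shows what the pattern matches at the start of the input; whether SP-Lang uses the same regular expression dialect is an assumption, and parse_regex is a hypothetical helper.

```python
import re

# Same pattern as in the example above.
PATTERN = re.compile(r"[a-zA-Z0-9_\-0]+")

def parse_regex(text: str, pos: int = 0):
    """Match the pattern at `pos`; return (matched_text, new_position) or None."""
    match = PATTERN.match(text, pos)
    if match is None:
        return None
    return match.group(0), match.end()

print(parse_regex("FTVW23_L-C: Message..."))  # ('FTVW23_L-C', 10)
```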
!PARSE.MONTH: Parse a month name¤
Type: Parser.
Synopsis:
!PARSE.MONTH
  what: <...>
!PARSE.MONTH <...>
what - indicates the format of the month name. Possible values: number, short, full.
Tip
Use !PARSE.MONTH to parse the month name as part of !PARSE.DATETIME.
Example
Input string: 10/May/2023:08:15:54
!PARSE.MONTH
  what: 'short'
More examples
Parse month in number format:
Input string: 2003-10-11
!PARSE.MONTH 'number'
Parse month in full format:
Input string: 2003-OCTOBER-11
!PARSE.MONTH
  what: 'full'
!PARSE.FRAC: Parse a fraction¤
Type: Parser.
Synopsis:
!PARSE.FRAC
  base: <...>
  max: <...>
base - indicates the base of the fraction. Possible values: milli, micro, nano.
max - indicates the maximum number of digits, depending on the base value. Possible values: 3, 6, 9 respectively.
Tip
Use !PARSE.FRAC to parse microseconds or nanoseconds as part of !PARSE.DATETIME.
Example
Input string: Aug 22 05:40:14.264
!PARSE.FRAC
  base: "micro"
  max: 6
!PARSE.DATETIME: Parse datetime in a given format¤
Type: Parser.
Synopsis:
!PARSE.DATETIME
  - year: <...>
  - month: <...>
  - day: <...>
  - hour: <...>
  - minute: <...>
  - second: <...>
  - nanosecond: <...>
  - timezone: <...>
- Fields month, day are required.
- Field year is optional. If not specified, the smart year function will be used.
- Fields hour, minute, second, microsecond, nanosecond are optional. If not specified, the default value 0 will be used.
- Writing the microsecond field as microsecond? makes it optional: microseconds are parsed only if they are present in the input string.
- Field timezone is optional. If not specified, the default value UTC will be used. It can be specified in two different formats:
  Z, +08:00 - parsed from the input string.
  Europe/Prague - specified as a constant value.
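The timezone rules (an offset parsed from the input, otherwise UTC by default) can be illustrated with Python's standard datetime module. This is only an analogy, not how SP-Lang works internally; to_utc is a hypothetical helper, and the constant-zone case such as Europe/Prague is left out.

```python
from datetime import datetime, timezone

def to_utc(text: str) -> datetime:
    """Parse an ISO-like timestamp and normalize it to UTC."""
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # No offset in the input: fall back to the default, UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)

print(to_utc("2021-06-29T16:51:43+08:00"))   # 2021-06-29 08:51:43+00:00
print(to_utc("2022-10-13T12:34:56.987654"))  # 2022-10-13 12:34:56.987654+00:00
```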
Shortcuts¤
Shortcut forms are available (in both lower/upper variants):
!PARSE.DATETIME RFC3339
!PARSE.DATETIME iso8601
Example
Input string: 2022-10-13T12:34:56.987654
!PARSE.DATETIME
  - year: !PARSE.DIGITS
  - '-'
  - month: !PARSE.MONTH 'number'
  - '-'
  - day: !PARSE.DIGITS
  - 'T'
  - hour: !PARSE.DIGITS
  - ':'
  - minute: !PARSE.DIGITS
  - ':'
  - second: !PARSE.DIGITS
  - microsecond: !PARSE.FRAC
      base: "micro"
      max: 6
  - timezone: "Europe/Prague"
More examples
Parse datetime without year, with short month form and optional microseconds:
Input string: Aug 17 06:57:05.189
!PARSE.DATETIME
  - month: !PARSE.MONTH 'short' # Month
  - !PARSE.SPACE
  - day: !PARSE.DIGITS # Day
  - !PARSE.SPACE
  - hour: !PARSE.DIGITS # Hours
  - !PARSE.EXACTLY { what: ':' }
  - minute: !PARSE.DIGITS # Minutes
  - !PARSE.EXACTLY { what: ':' }
  - second: !PARSE.DIGITS # Seconds
  - microsecond?: !PARSE.FRAC # Microseconds
      base: "micro"
      max: 6
Parse datetime with timezone:
Input string: 2021-06-29T16:51:43+08:00
!PARSE.DATETIME
  - year: !PARSE.DIGITS
  - '-'
  - month: !PARSE.MONTH 'number'
  - '-'
  - day: !PARSE.DIGITS
  - 'T'
  - hour: !PARSE.DIGITS
  - ':'
  - minute: !PARSE.DIGITS
  - ':'
  - second: !PARSE.DIGITS
  - timezone: !PARSE.CHARS
Parse datetime using the RFC3339 shortcut:
Input string: 2021-06-29T16:51:43Z
!PARSE.DATETIME RFC3339
Parse datetime using the iso8601 shortcut:
Input string: 20201211T111721Z
!PARSE.DATETIME iso8601
Parse datetime with nanoseconds:
Input string: 2023-03-23T07:00:00.734323900
!PARSE.DATETIME
  - year: !PARSE.DIGITS
  - !PARSE.EXACTLY { what: '-' }
  - month: !PARSE.DIGITS
  - !PARSE.EXACTLY { what: '-' }
  - day: !PARSE.DIGITS
  - !PARSE.EXACTLY { what: 'T' }
  - hour: !PARSE.DIGITS
  - !PARSE.EXACTLY { what: ':' }
  - minute: !PARSE.DIGITS
  - !PARSE.EXACTLY { what: ':' }
  - second: !PARSE.DIGITS
  - nanosecond: !PARSE.FRAC
      base: "nano"
      max: 9
!PARSE.REPEAT: Parse a repeated pattern¤
Type: Combinator.
Synopsis:
!PARSE.REPEAT
  what: <...>
  min: <...>
  max: <...>
  exactly: <...>
Fields min, max and exactly are optional. If none of them is specified, what will be repeated as many times as possible.
Example¤
Input string: abc_abc
!PARSE.REPEAT
  what: !PARSE.ONEOF "abc"
  exactly: 3
Output: ['a', 'b', 'c']
More examples
Parse what pattern as many times as possible:
!PARSE.REPEAT
  what: !PARSE.EXACTLY 'hello'
Parse what pattern at least 2 times, but not more than 4:
!PARSE.REPEAT
  what: !PARSE.EXACTLY 'hello'
  min: 2
  max: 4
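Repetition with optional bounds can be sketched in Python as follows. The names one_of and repeat, the parameter names minimum and maximum, and the (value, new_position) convention are assumptions for illustration, not the SP-Lang implementation.

```python
def one_of(chars):
    """Parser factory: a single character from the given set."""
    def parse(text, pos):
        if pos < len(text) and text[pos] in chars:
            return text[pos], pos + 1
        return None
    return parse

def repeat(what, minimum=0, maximum=None, exactly=None):
    """Combinator: apply `what` repeatedly and collect the values in a list."""
    if exactly is not None:
        minimum = maximum = exactly

    def parse(text, pos):
        values = []
        while maximum is None or len(values) < maximum:
            result = what(text, pos)
            # Stop on failure, or when the parser made no progress.
            if result is None or result[1] == pos:
                break
            value, pos = result
            values.append(value)
        if len(values) < minimum:
            return None
        return values, pos
    return parse

print(repeat(one_of("abc"), exactly=3)("abc_abc", 0))         # (['a', 'b', 'c'], 3)
print(repeat(one_of("abc"))("abc_abc", 0))                    # (['a', 'b', 'c'], 3)
print(repeat(one_of("abc"), minimum=2, maximum=4)("a_", 0))   # None
```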
!PARSE.SEPARATED: Parse a sequence with a separator¤
Type: Combinator.
Synopsis:
!PARSE.SEPARATED
  what: <...>
  sep: <...>
  min: <...>
  max: <...>
  end: <...>
Fields max and end are optional.
end - indicates if a trailing separator is required. By default, it is optional.
Example¤
Input string: 0->1->2->3
Note: the trailing separator is optional, so the input string 0->1->2->3-> is also valid.
!PARSE.SEPARATED
  what: !PARSE.DIGITS
  sep: !PARSE.EXACTLY {what: "->"}
  min: 3
Output: [0, 1, 2, 3]
More examples
Parse what values separated by sep in [min;max] interval, trailing separator is required:
Input string: 11,22,33,44,55,66,
!PARSE.SEPARATED
  what: !PARSE.DIGITS
  sep: !PARSE.EXACTLY {what: ","}
  end: True
  min: 3
  max: 7
Parse what values separated by sep in [min;max] interval, trailing separator is not present:
Input string: 0..1..2..3
!PARSE.SEPARATED
  what: !PARSE.DIGITS
  sep: !PARSE.EXACTLY {what: ".."}
  end: False
  min: 3
  max: 5
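The handling of the trailing separator can be sketched in Python. The helper names are hypothetical, and the reading of end used here is an assumption: True requires a trailing separator, False leaves any trailing separator unconsumed, and None (the default) consumes it if present.

```python
def exactly(what):
    def parse(text, pos):
        return (what, pos + len(what)) if text.startswith(what, pos) else None
    return parse

def digits(text, pos):
    end = pos
    while end < len(text) and text[end] in "0123456789":
        end += 1
    return (int(text[pos:end]), end) if end > pos else None

def separated(what, sep, minimum=0, maximum=None, end=None):
    def parse(text, pos):
        values = []
        first = what(text, pos)
        if first is not None:
            value, pos = first
            values.append(value)
            while maximum is None or len(values) < maximum:
                after_sep = sep(text, pos)
                if after_sep is None:
                    break
                following = what(text, after_sep[1])
                if following is None:
                    break  # a lone separator is handled as "trailing" below
                value, pos = following
                values.append(value)
        if len(values) < minimum:
            return None
        trailing = sep(text, pos)
        if end is True and trailing is None:
            return None
        if trailing is not None and end is not False:
            pos = trailing[1]
        return values, pos
    return parse

arrow = separated(digits, exactly("->"), minimum=3)
print(arrow("0->1->2->3", 0))    # ([0, 1, 2, 3], 10)
print(arrow("0->1->2->3->", 0))  # ([0, 1, 2, 3], 12)
```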
!PARSE.TRIE: Parse using starting prefix¤
Type: Combinator.
The !PARSE.TRIE expression chooses one of the specified prefixes and parses the rest of the input string using the corresponding parser.
Synopsis:
!PARSE.TRIE
  - <prefix1>: <...>
  - <prefix2>: <...>
  ...
Tip
Use !PARSE.TRIE to parse log messages that come in multiple variants.
Example¤
Input string: Received disconnect from 10.17.248.1 port 60290:11: disconnected by user
!PARSE.TRIE
  - 'Received disconnect from ': !PARSE.KVLIST
      - CLIENT_IP: !PARSE.UNTIL ' '
      - 'port '
      - CLIENT_PORT: !PARSE.DIGITS
      - ':'
      - !PARSE.CHARS
  - 'Disconnected from user ': !PARSE.KVLIST
      - USERNAME: !PARSE.UNTIL ' '
      - CLIENT_IP: !PARSE.UNTIL ' '
      - 'port '
      - CLIENT_PORT: !PARSE.DIGITS
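Prefix dispatch can be sketched in Python as follows. For brevity the sketch scans a dictionary of prefixes, longest first, instead of building a real trie; that choice, the names trie and tagged_rest, and the stand-in branch parsers are all assumptions for illustration.

```python
def tagged_rest(label):
    """Stand-in branch parser: return the label and the remaining input."""
    def parse(text, pos):
        return (label, text[pos:]), len(text)
    return parse

def trie(branches):
    """Combinator: pick the branch whose prefix matches, then run its parser."""
    # Longest prefix first, so a longer prefix wins over a shorter one.
    prefixes = sorted(branches, key=len, reverse=True)

    def parse(text, pos):
        for prefix in prefixes:
            if text.startswith(prefix, pos):
                return branches[prefix](text, pos + len(prefix))
        return None  # no prefix matched
    return parse

parser = trie({
    "Received disconnect from ": tagged_rest("received"),
    "Disconnected from user ": tagged_rest("disconnected"),
})
line = "Received disconnect from 10.17.248.1 port 60290:11: disconnected by user"
print(parser(line, 0))
# (('received', '10.17.248.1 port 60290:11: disconnected by user'), 72)
```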
!PARSE.OPTIONAL: Parse optional pattern¤
Type: Combinator.
The !PARSE.OPTIONAL expression tries to parse the input string using the specified parser. If the parser fails, the position rolls back to where it started.
Synopsis:
!PARSE.OPTIONAL
  what: <...>
!PARSE.OPTIONAL <...>
Example¤
Input strings:
mymachine myproc[10]: DHCPACK to
mymachine myproc[10]DHCPACK to
!PARSE.KVLIST
  - HOSTNAME: !PARSE.UNTIL {what: ' '} # mymachine
  - TAG: !PARSE.UNTIL {what: '['} # myproc
  - PID: !PARSE.DIGITS # 10
  - !PARSE.EXACTLY {what: ']'}
  - !PARSE.OPTIONAL ':'
  - !PARSE.OPTIONAL
      what: !PARSE.SPACE
  - NAME: !PARSE.UNTIL {what: ' '}
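The rollback behaviour can be sketched in Python. The names exactly, optional and sequence are hypothetical, and representing a skipped optional element as None is an assumption made for illustration.

```python
def exactly(what):
    def parse(text, pos):
        return (what, pos + len(what)) if text.startswith(what, pos) else None
    return parse

def optional(what):
    """Combinator: never fails; on a miss the position stays where it was."""
    def parse(text, pos):
        result = what(text, pos)
        if result is None:
            return None, pos  # nothing consumed
        return result
    return parse

def sequence(*parsers):
    def parse(text, pos):
        values = []
        for parser in parsers:
            result = parser(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return tuple(values), pos
    return parse

after_pid = sequence(exactly("]"), optional(exactly(":")), optional(exactly(" ")))
print(after_pid("]: DHCPACK to", 0))  # ((']', ':', ' '), 3)
print(after_pid("]DHCPACK to", 0))    # ((']', None, None), 1)
```

Both input variants succeed, and in the second one the next parser continues right after the closing bracket.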
!PARSE.KV: Parse key-value pair¤
Type: Combinator.
Synopsis:
!PARSE.KV
  - key: <...>
    prefix: <...>
  - value: <...>
  - <...> # optional elements
Tip
Use a combination of !PARSE.REPEAT and !PARSE.KV to parse repeated key-value pairs (see examples).
Example¤
Input string: eventID= "1011"
!PARSE.KV
  - key: !PARSE.UNTIL {what: '='}
  - !PARSE.SPACE
  - value: !PARSE.BETWEEN {what: '"'}
Output: (eventID, 1011)
More examples
Input string: eventID= "1011"
!PARSE.KV
  - key: !PARSE.UNTIL {what: '='}
    prefix: SD.PARAM.
  - !PARSE.SPACE
  - value: !PARSE.BETWEEN {what: '"'}
Output: (SD.PARAM.eventID, 1011)
Input string: devid="FEVM020000191439" vd="root" itime=1665629867
!PARSE.REPEAT
  what: !PARSE.KV
    - !PARSE.OPTIONAL
        what: !PARSE.SPACE
    - key: !PARSE.UNTIL '='
    - value: !TRY
        - !PARSE.BETWEEN '"'
        - !PARSE.UNTIL { what: ' ', eof: true}
Output: [(devid, FEVM020000191439), (vd, root), (itime, 1665629867)]
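What the repeated key-value example does can be written out by hand in Python. The function parse_kv_pairs is hypothetical; it hard-codes the same steps (optional space, key until =, value either quoted or running to the next space or the end of input) and keeps all values as strings, which is an assumption.

```python
def parse_kv_pairs(text, prefix=""):
    """Collect (key, value) tuples from input like: a="x" b=1"""
    pairs = []
    pos = 0
    while pos < len(text):
        if text[pos] == " ":          # optional leading space
            pos += 1
        equals = text.find("=", pos)  # key runs until '='
        if equals == -1:
            break
        key = prefix + text[pos:equals]
        pos = equals + 1
        if pos < len(text) and text[pos] == '"':
            # First alternative: value between double-quotes.
            close = text.find('"', pos + 1)
            if close == -1:
                break
            value = text[pos + 1:close]
            pos = close + 1
        else:
            # Second alternative: value until a space, or the end of input.
            space = text.find(" ", pos)
            if space == -1:
                value, pos = text[pos:], len(text)
            else:
                value, pos = text[pos:space], space + 1
        pairs.append((key, value))
    return pairs

print(parse_kv_pairs('devid="FEVM020000191439" vd="root" itime=1665629867'))
# [('devid', 'FEVM020000191439'), ('vd', 'root'), ('itime', '1665629867')]
```

Passing prefix="SD.PARAM." prepends the prefix to every key, in the spirit of the prefix field.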
!PARSE.KVLIST: Parse list of key-value pairs¤
Iterating through the list of elements, the !PARSE.KVLIST expression collects key-value pairs into a list of tuples. Non-key elements are parsed, but not collected.
Nested !PARSE.KVLIST expressions are joined to the parent one.
Type: Combinator.
Synopsis:
!PARSE.KVLIST
  - <...>
  - key1: <...>
  - key2: <...>
  - <...>
  - !PARSE.KVLIST
      - key3: <...>
      - <...>
  - key4: <...>
Example¤
Input string: <141>May 9 10:00:00 VUW-DC-F5-P2R1.source-net.com notice tmm1[22731]: 01490500:5: /Common/Citrix_Receiver..
!PARSE.KVLIST
  # parse Syslog RFC3164
  - '<'
  - log.syslog.priority: !PARSE.DIGITS
  - '>'
  - '@timestamp': !PARSE.DATETIME
      - month: !PARSE.MONTH 'short'
      - !PARSE.SPACES
      - day: !PARSE.DIGITS # Day
      - !PARSE.SPACES
      - hour: !PARSE.DIGITS # Hours
      - ':'
      - minute: !PARSE.DIGITS # Minutes
      - ':'
      - second: !PARSE.DIGITS # Seconds
      - timezone: "Europe/Prague"
  - !PARSE.SPACES
  - host.hostname: !PARSE.UNTIL ' '
  - log.level: !PARSE.UNTIL ' '
  - log.syslog.appname: !PARSE.UNTIL '['
  - process.pid: !PARSE.DIGITS
  - ']: '
  - message: !PARSE.CHARS
Output: [(log.syslog.priority, 141), (@timestamp, 140994182325993472), (host.hostname, VUW-DC-F5-P2R1.source-net.com), (log.level, notice), (log.syslog.appname, tmm1), (process.pid, 22731), (message, 01490500:5: /Common/Citrix_Receiver..)]
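The collection rules (keyed elements are kept, non-key elements are dropped, nested lists are joined to the parent) can be sketched in Python. All helper names are hypothetical; representing a keyed element as a (key, parser) tuple and detecting a nested list by its list result are simplifications for illustration.

```python
def exactly(what):
    def parse(text, pos):
        return (what, pos + len(what)) if text.startswith(what, pos) else None
    return parse

def digits(text, pos):
    end = pos
    while end < len(text) and text[end] in "0123456789":
        end += 1
    return (int(text[pos:end]), end) if end > pos else None

def until(what):
    def parse(text, pos):
        index = text.find(what, pos)
        return (text[pos:index], index + 1) if index != -1 else None
    return parse

def rest(text, pos):
    return text[pos:], len(text)

def kvlist(*elements):
    def parse(text, pos):
        pairs = []
        for element in elements:
            # A keyed element is a (key, parser) tuple; anything else is non-key.
            key, parser = element if isinstance(element, tuple) else (None, element)
            result = parser(text, pos)
            if result is None:
                return None
            value, pos = result
            if key is not None:
                pairs.append((key, value))
            elif isinstance(value, list):
                pairs.extend(value)  # nested kvlist: join to the parent
        return pairs, pos
    return parse

syslog = kvlist(
    exactly("<"),
    ("log.syslog.priority", digits),
    exactly(">"),
    kvlist(("log.syslog.appname", until("[")), ("process.pid", digits)),
    exactly("]: "),
    ("message", rest),
)
print(syslog("<141>tmm1[22731]: 01490500:5: /Common/Citrix_Receiver..", 0)[0])
# [('log.syslog.priority', 141), ('log.syslog.appname', 'tmm1'),
#  ('process.pid', 22731), ('message', '01490500:5: /Common/Citrix_Receiver..')]
```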
!PARSE.TUPLE: Parse list of values to tuple¤
Iterating through the list of elements, the !PARSE.TUPLE expression collects values into a tuple.
Type: Combinator.
Synopsis:
!PARSE.TUPLE
  - <...>
  - <...>
  - <...>
Example¤
Input string: Hello world!
!PARSE.TUPLE
  - 'Hello'
  - !PARSE.SPACE
  - 'world'
  - '!'
Output: ('Hello', ' ', 'world', '!')
!PARSE.RECORD: Parse list of values to record structure¤
Iterating through the list of elements, the !PARSE.RECORD expression collects values into a record structure.
Type: Combinator.
Synopsis:
!PARSE.RECORD
  - <...>
  - element1: <...>
  - element2: <...>
  - <...>
Example¤
Input string: <165>1
!PARSE.RECORD
  - !PARSE.EXACTLY {what: '<'}
  - severity: !PARSE.DIGITS
  - !PARSE.EXACTLY {what: '>'}
  - version: !PARSE.DIGITS
  - !PARSE.EXACTLY {what: ' '}
Output: {'output.severity': 165, 'output.version': 1}