High Performance Parsing¶
High-performance parsing is parsing that is compiled directly to machine code, which ensures the highest possible speed when parsing incoming events.
All built-in preprocessors, as well as the declarative expressions !PARSE and !DATETIME.PARSE, offer high-performance parsing.
Procedural parsing¶
For the machine code to be compiled via LLVM and C, every expression must define its parsing procedure: each character, or group of characters, in the input string needs a defined length and output type.
For preprocessors, this procedure is transparent and not shown to the user. In !PARSE and !DATETIME.PARSE expressions, the exact procedure, including types and formats, needs to be defined in the format attribute:
!DATETIME.PARSE
what: "2021-06-11 17"
format:
- year: {type: ui64, format: d4}
- '-'
- month: {type: ui64, format: d2}
- '-'
- day: {type: ui64, format: d2}
- ' '
- hour: {type: ui64, format: d2}
The first item in the format attribute corresponds to the first character(s) in the incoming message. Here, year is formed from the first four characters and translated to an integer (2021).
If an item specifies only a single character, such as '-', that character is skipped and not stored in the parsed output structure.
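The left-to-right walk over the format list can be sketched in plain Python. This is only an illustration of the idea, not the compiled LLVM implementation: the function name parse_fixed and the (name, width) tuple representation are hypothetical, and only fixed-width decimal items (such as d2 and d4) are handled.

```python
def parse_fixed(text, fmt):
    """Walk `text` left to right, following the items in `fmt`.

    Each item is either a (name, width) tuple, which consumes exactly
    `width` decimal digits and stores them as an integer, or a literal
    string, which must match the input and is then skipped.
    """
    pos = 0
    out = {}
    for item in fmt:
        if isinstance(item, str):
            # Literal separator: verify it, skip it, store nothing.
            if text[pos:pos + len(item)] != item:
                raise ValueError(f"expected {item!r} at position {pos}")
            pos += len(item)
        else:
            name, width = item
            chunk = text[pos:pos + width]
            if len(chunk) != width or not (chunk.isascii() and chunk.isdigit()):
                raise ValueError(
                    f"expected {width} digits for {name!r} at position {pos}"
                )
            out[name] = int(chunk)
            pos += width
    return out


# Hypothetical Python mirror of the format list from the example above.
FORMAT = [("year", 4), "-", ("month", 2), "-", ("day", 2), " ", ("hour", 2)]

parsed = parse_fixed("2021-06-11 17", FORMAT)
print(parsed)  # {'year': 2021, 'month': 6, 'day': 11, 'hour': 17}
```

Because every item has a known width and type, the position of each field is fixed before any input is seen, which is presumably what makes the procedure suitable for compilation.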
High Performance Expressions¶
!DATETIME.PARSE¶
!DATETIME.PARSE implicitly creates a datetime from the parsed structure, which has the following attributes:
- year
- month
- day
- hour (optional)
- minute (optional)
- second (optional)
- microsecond (optional)
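The step from a parsed structure to a datetime can be sketched in Python as follows. The helper build_datetime is hypothetical and not part of the library. The sketch also assumes that missing optional attributes default to zero, which the text does not state, and that the result is in UTC.

```python
from datetime import datetime, timezone

REQUIRED = ("year", "month", "day")
OPTIONAL = ("hour", "minute", "second", "microsecond")


def build_datetime(parsed):
    """Build a UTC datetime from a dict of parsed attributes."""
    missing = [key for key in REQUIRED if key not in parsed]
    if missing:
        raise ValueError(f"missing required attributes: {missing}")
    # Assumption: absent optional attributes are treated as 0.
    optional = {key: parsed.get(key, 0) for key in OPTIONAL}
    return datetime(
        parsed["year"], parsed["month"], parsed["day"],
        tzinfo=timezone.utc, **optional,
    )


dt = build_datetime({"year": 2021, "month": 6, "day": 11, "hour": 17})
print(dt.isoformat())  # 2021-06-11T17:00:00+00:00
```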
Format - long version¶
The attributes need to be specified in the format inlet:
!DATETIME.PARSE
what: "2021-06-11 1712X000014"
format:
- year: {type: ui64, format: d4}
- '-'
- month: {type: ui64, format: d2}
- '-'
- day: {type: ui64, format: d2}
- ' '
- hour: {type: ui64, format: d2}
- minute: {type: ui64, format: d2}
- 'X'
- microsecond: {type: ui64, format: dc6}
Format - short version¶
The format can use shortened notation with %Y, %m, %d, %H, %M, %S and %u (microsecond) placeholders, which represent unsigned numbers based on the format in the example above:
!DATETIME.PARSE
what: "2021-06-16T11:17Z"
format: "%Y-%m-%dT%H:%MZ"
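One way to read the short notation is as shorthand for the long item list. The Python sketch below expands a short format string under that reading. The function expand_format and the PLACEHOLDERS table are hypothetical; the widths are taken from the long-version example, except %S, whose d2 width is an assumption because second does not appear there.

```python
# Assumed mapping from placeholders to long-form items. Widths follow the
# long-version example; the d2 width for %S is a guess.
PLACEHOLDERS = {
    "%Y": ("year", "d4"),
    "%m": ("month", "d2"),
    "%d": ("day", "d2"),
    "%H": ("hour", "d2"),
    "%M": ("minute", "d2"),
    "%S": ("second", "d2"),
    "%u": ("microsecond", "dc6"),
}


def expand_format(short):
    """Expand a short format string into a list of long-form items."""
    items = []
    i = 0
    while i < len(short):
        token = short[i:i + 2]
        if token in PLACEHOLDERS:
            name, fmt = PLACEHOLDERS[token]
            items.append({name: {"type": "ui64", "format": fmt}})
            i += 2
        else:
            # Any other character is treated as a literal separator.
            items.append(short[i])
            i += 1
    return items


for item in expand_format("%Y-%m-%dT%H:%MZ"):
    print(item)
```

For the string above this yields ten items: five numeric fields (year, month, day, hour, minute) interleaved with the literals '-', '-', 'T', ':' and 'Z'.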
The format statement can be simplified if the datetime format is standardized, such as RFC3339 or iso8601:
!DATETIME.PARSE
what: "2021-06-16T11:17Z"
format: iso8601
If the timezone is different from UTC, it also needs to be specified explicitly:
!DATETIME.PARSE
what: "2021-06-16T11:17Z"
format: iso8601
timezone: Europe/Prague
Available types¶
Integer¶
- {type: ui64, format: d2} - exactly 2 characters to unsigned integer
- {type: ui64, format: d4} - exactly 4 characters to unsigned integer
- {type: ui64, format: dc6} - 1 to 6 characters to unsigned integer
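The difference between the fixed-width codes (d2, d4) and the variable-width code (dc6) can be illustrated with a small Python reader. The function read_uint is a hypothetical helper, not the library's API. It assumes that a dc code consumes digits greedily up to its maximum and stops at the first non-digit.

```python
import re


def read_uint(text, pos, fmt):
    """Read an unsigned integer at `pos`; return (value, new_pos).

    d<n>  reads exactly n digits.
    dc<n> reads between 1 and n digits (assumed greedy).
    """
    m = re.fullmatch(r"d(c?)(\d+)", fmt)
    if m is None:
        raise ValueError(f"unsupported format code: {fmt!r}")
    variable = m.group(1) == "c"
    width = int(m.group(2))

    # Advance over digits, but never past the maximum width.
    end = pos
    while end < len(text) and end - pos < width and text[end] in "0123456789":
        end += 1

    count = end - pos
    if count == 0 or (not variable and count != width):
        raise ValueError(f"{fmt}: found {count} digit(s) at position {pos}")
    return int(text[pos:end]), end


print(read_uint("2021-06", 0, "d4"))  # (2021, 4)
print(read_uint("000014", 0, "dc6"))  # (14, 6)
print(read_uint("14Z", 0, "dc6"))     # (14, 2)
```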