High Performance Parsing

High performance parsing is parsing that is compiled directly to machine code, which ensures the highest possible speed when parsing incoming events.

All built-in preprocessors as well as declarative expressions !PARSE and !DATETIME.PARSE offer high performance parsing.

Procedural parsing

For the machine (instruction) code to be compiled via LLVM and C, every expression must define its parsing procedurally: for each character or group of characters in the input string, the output length and output type must be defined.

For preprocessors, the procedure is transparent and not shown to the user. In !PARSE and !DATETIME.PARSE expressions, the exact procedure, including types and formats, must be defined in the format attribute:

!DATETIME.PARSE
what: "2021-06-11 17"
format:
  - year: {type: ui64, format: d4}
  - '-'
  - month: {type: ui64, format: d2}
  - '-'
  - day: {type: ui64, format: d2}
  - ' '
  - hour: {type: ui64, format: d2}

The first item in the format attribute corresponds to the first character(s) of the incoming message. Here, year is formed from the first four characters and translated to an integer (2021).

If an item is only a single character (such as '-'), it is skipped and not stored in the parsed output structure.
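The procedure described above can be sketched in plain Python. This is only an illustration of the idea, not the compiled implementation: the function name parse_procedurally is hypothetical, it handles only the fixed-width dN formats of type ui64, and it assumes that a literal item must match the input exactly.

```python
import re

# Hypothetical helper for illustration only; the real parser is compiled
# to machine code and is not implemented like this.
def parse_procedurally(what, fmt):
    pos = 0
    parsed = {}
    for item in fmt:
        if isinstance(item, str):
            # Literal item: assumed to have to match the input; it is
            # skipped and not stored in the output.
            if not what.startswith(item, pos):
                raise ValueError(f"expected {item!r} at position {pos}")
            pos += len(item)
            continue
        # Named item: a single key mapping to {type, format}.
        (name, spec), = item.items()
        match = re.fullmatch(r"d(\d+)", spec["format"])
        if spec["type"] != "ui64" or match is None:
            raise ValueError(f"unsupported spec for {name}: {spec}")
        width = int(match.group(1))
        chunk = what[pos:pos + width]
        if len(chunk) != width or not (chunk.isascii() and chunk.isdigit()):
            raise ValueError(f"expected {width} digits for {name} at position {pos}")
        parsed[name] = int(chunk)
        pos += width
    return parsed


DATE_HOUR_FORMAT = [
    {"year": {"type": "ui64", "format": "d4"}},
    "-",
    {"month": {"type": "ui64", "format": "d2"}},
    "-",
    {"day": {"type": "ui64", "format": "d2"}},
    " ",
    {"hour": {"type": "ui64", "format": "d2"}},
]

print(parse_procedurally("2021-06-11 17", DATE_HOUR_FORMAT))
# {'year': 2021, 'month': 6, 'day': 11, 'hour': 17}
```

Because every item has a known length and type, the position in the input advances deterministically, which is what makes the procedure suitable for compilation.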

High Performance Expressions

!DATETIME.PARSE

!DATETIME.PARSE implicitly creates a datetime from the parsed structure, which has the following attributes:

  • year

  • month

  • day

  • hour (optional)

  • minute (optional)

  • second (optional)

  • microsecond (optional)
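As a rough illustration of how such a parsed structure maps to a datetime, the following Python sketch builds one from a dictionary of attributes. The function build_datetime is hypothetical, and it assumes that omitted optional attributes default to zero and that the result is in UTC.

```python
from datetime import datetime, timezone

# Hypothetical helper; assumes missing optional attributes default to 0
# and that the resulting datetime is in UTC.
def build_datetime(parsed):
    missing = [key for key in ("year", "month", "day") if key not in parsed]
    if missing:
        raise ValueError(f"missing required attributes: {missing}")
    return datetime(
        parsed["year"],
        parsed["month"],
        parsed["day"],
        parsed.get("hour", 0),
        parsed.get("minute", 0),
        parsed.get("second", 0),
        parsed.get("microsecond", 0),
        tzinfo=timezone.utc,
    )


print(build_datetime({"year": 2021, "month": 6, "day": 11, "hour": 17}))
# 2021-06-11 17:00:00+00:00
```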

Format - long version

The attributes need to be specified in the format inlet:

!DATETIME.PARSE
what: "2021-06-11 1712X000014"
format:
  - year: {type: ui64, format: d4}
  - '-'
  - month: {type: ui64, format: d2}
  - '-'
  - day: {type: ui64, format: d2}
  - ' '
  - hour: {type: ui64, format: d2}
  - minute: {type: ui64, format: d2}
  - 'X'
  - microsecond: {type: ui64, format: dc6}

Format - short version

The format can use shortened notation with %Y, %m, %d, %H, %M, %S and %u (microsecond) placeholders, which represent unsigned numbers based on the format in the example above:

!DATETIME.PARSE
what: "2021-06-16T11:17Z"
format: "%Y-%m-%dT%H:%MZ"
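One way to read the short notation is as a compact spelling of the long format list. The Python sketch below expands a short format string into such a list. The function expand_short_format is hypothetical, and the widths are assumptions taken from the long-version example (d4 for %Y, d2 for %m, %d, %H and %M, dc6 for %u); the d2 width for %S is a guess by analogy.

```python
# Assumed mapping from placeholder to (attribute name, format).
PLACEHOLDERS = {
    "%Y": ("year", "d4"),
    "%m": ("month", "d2"),
    "%d": ("day", "d2"),
    "%H": ("hour", "d2"),
    "%M": ("minute", "d2"),
    "%S": ("second", "d2"),  # assumed by analogy with the other fields
    "%u": ("microsecond", "dc6"),
}

# Hypothetical helper: turns the short notation into the long format list.
def expand_short_format(short):
    items = []
    i = 0
    while i < len(short):
        token = short[i:i + 2]
        if token in PLACEHOLDERS:
            name, fmt = PLACEHOLDERS[token]
            items.append({name: {"type": "ui64", "format": fmt}})
            i += 2
        else:
            # Any other character is a literal that is skipped in the input.
            items.append(short[i])
            i += 1
    return items


for item in expand_short_format("%Y-%m-%dT%H:%MZ"):
    print(item)
```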

The format statement can be simplified if the datetime format is standardized, such as RFC 3339 or ISO 8601:

!DATETIME.PARSE
what: "2021-06-16T11:17Z"
format: iso8601

If the timezone is different from UTC, it also needs to be specified explicitly:

!DATETIME.PARSE
what: "2021-06-16T11:17Z"
format: iso8601
timezone: Europe/Prague

Available types

Integer

  • {type: ui64, format: d2} - exactly 2 characters to unsigned integer

  • {type: ui64, format: d4} - exactly 4 characters to unsigned integer

  • {type: ui64, format: dc6} - 1 to 6 characters to unsigned integer
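The difference between the fixed-width (d2, d4) and variable-width (dc6) formats can be illustrated with a small Python sketch. The function read_uint is hypothetical, and it assumes that dc6 greedily consumes as many digits as are available, up to six.

```python
import re

# Hypothetical helper: reads an unsigned integer at position pos and
# returns (value, new_position).
def read_uint(text, pos, fmt):
    match = re.fullmatch(r"d(c?)(\d+)", fmt)
    if match is None:
        raise ValueError(f"unsupported format: {fmt}")
    variable = match.group(1) == "c"
    limit = int(match.group(2))
    end = pos
    # Consume ASCII digits, never more than the limit.
    while end < len(text) and end - pos < limit and text[end] in "0123456789":
        end += 1
    count = end - pos
    if count == 0 or (not variable and count != limit):
        raise ValueError(f"format {fmt} does not match at position {pos}")
    return int(text[pos:end]), end


print(read_uint("2021-06", 0, "d4"))   # (2021, 4)
print(read_uint("000014", 0, "dc6"))   # (14, 6)
print(read_uint("7X", 0, "dc6"))       # (7, 1)
```

With a fixed-width format the same input "7X" would be rejected, because d2 requires exactly two digits.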