Parser preprocessor¶
The parser preprocessor allows the input event to be preprocessed by imperative code, e.g. Python, Cython or C.
Example¶
---
define:
  name: Demo of the built-in Syslog preprocessor
  type: parser/preprocessor
  tenant: Syslog_RFC5424.STRUCTURED_DATA.soc@0.tenant  # (optional)
  count: CEF.cnt  # (optional)
  function: lmiopar.preprocessor.Syslog_RFC5424
tenant
specifies the tenant attribute to be read and passed to context['tenant'] for further distribution of parsed and unparsed events to tenant-specific indices/storages in LogMan.io Dispatcher.
count
specifies the attribute that holds the count of events, which is read and passed to context['count'].
Built-in preprocessors¶
The lmiopar.preprocessor module contains the following commonly used preprocessors.
These preprocessors are optimized for high-performance deployments.
Syslog RFC5424 built-in preprocessor¶
function: lmiopar.preprocessor.Syslog_RFC5424
This is a preprocessor for the Syslog protocol (new) according to RFC5424.
The input for this preprocessor is a valid Syslog entry, e.g.:
<165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog 10 ID47 [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"] An application event log entry.
The output is the message part of the log in the event and the parsed elements in context.Syslog_RFC5424.
event: An application event log entry.
context:
  Syslog_RFC5424:
    PRI: 165
    FACILITY: 20
    PRIORITY: 5
    VERSION: 1
    TIMESTAMP: 2003-10-11T22:14:15.003Z
    HOSTNAME: mymachine.example.com
    APP_NAME: evntslog
    PROCID: 10
    MSGID: ID47
    STRUCTURED_DATA:
      exampleSDID@32473:
        iut: 3
        eventSource: Application
        eventID: 1011
...
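The FACILITY and PRIORITY values in the output follow from PRI by the arithmetic defined in the Syslog RFCs: PRI = facility * 8 + severity. The following is a minimal Python sketch of that decomposition. The function name decode_pri is an assumption made for illustration and is not part of lmiopar.

```python
def decode_pri(pri):
    """Split a Syslog PRI value into (facility, severity).

    The severity is what the preprocessor output above calls PRIORITY.
    decode_pri is a hypothetical helper, not an lmiopar function.
    """
    # Facility is 0..23 and severity is 0..7, so PRI is at most 23 * 8 + 7.
    if not 0 <= pri <= 191:
        raise ValueError("PRI must be in the range 0..191")
    facility, severity = divmod(pri, 8)
    return facility, severity


print(decode_pri(165))  # (20, 5), matching FACILITY: 20 and PRIORITY: 5
```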
Syslog RFC3164 built-in preprocessor¶
function: lmiopar.preprocessor.Syslog_RFC3164
This is a preprocessor for the BSD syslog Protocol (old) according to RFC3164.
The Syslog RFC3164 preprocessor can be configured in the define section:
define:
  type: parser/preprocessor
  year: 1999
  timezone: Europe/Prague
year
specifies the numeric representation of the year that will be applied to the timestamp of the logs. Alternatively, you may specify smart (default) for the advanced selection of the year based on the month.
timezone
specifies the timezone of the logs; the default is UTC.
The input for this preprocessor is a valid Syslog entry, e.g.:
<34>Oct 11 22:14:15 mymachine su[10]: 'su root' failed for lonvick on /dev/pts/8
The output is the message part of the log in the event and the parsed elements in context.Syslog_RFC3164.
event: "'su root' failed for lonvick on /dev/pts/8"
context:
  Syslog_RFC3164:
    PRI: 34
    PRIORITY: 2
    FACILITY: 4
    TIMESTAMP: '2003-10-11T22:14:15.003Z'
    HOSTNAME: mymachine
    TAG: su
    PID: 10
TAG and PID are optional fields.
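RFC3164 timestamps carry no year, which is why the year option exists. The documentation does not describe how the smart mode chooses the year, so the following Python sketch shows only one plausible heuristic, stated as an assumption: a log month later than the current month is taken to belong to the previous year. The function infer_year is hypothetical and is not the lmiopar implementation.

```python
from datetime import date


def infer_year(log_month, today=None):
    """Guess the year of a log entry whose timestamp lacks one.

    Assumed heuristic (not confirmed by the documentation): a month that lies
    in the future relative to today must come from the previous year.
    """
    today = today or date.today()
    if log_month > today.month:
        return today.year - 1
    return today.year


# A December entry processed in early January is assigned to the previous year.
print(infer_year(12, today=date(2021, 1, 5)))  # 2020
```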
CEF built-in preprocessor¶
function: lmiopar.preprocessor.CEF
This is a preprocessor for CEF (Common Event Format).
define:
  type: parser/preprocessor
  year: 1999
  timezone: Europe/Prague
year
specifies the numeric representation of the year that will be applied to the timestamp of the logs. Alternatively, you may specify smart (default) for the advanced selection of the year based on the month.
timezone
specifies the timezone of the logs; the default is UTC.
The input for this preprocessor is a valid CEF entry, e.g.:
CEF:0|Vendor|Product|Version|foobar:1:2|Failed password|Medium| eventId=1234 app=ssh categorySignificance=/Informational/Warning categoryBehavior=/Authentication/Verify
The output is the message part of the log in the event and the parsed elements in context.CEF:
context:
  CEF:
    Version: 0
    DeviceVendor: Vendor
    DeviceProduct: Product
    DeviceVersion: Version
    DeviceEventClassID: 'foobar:1:2'
    Name: Failed password
    Severity: Medium
    eventId: '1234'
    app: ssh
    categorySignificance: /Informational/Warning
    categoryBehavior: /Authentication/Verify
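To illustrate how the CEF entry maps to the fields above, here is a simplified Python sketch that splits the seven pipe-separated header fields and then the key=value extension. It is not the lmiopar implementation. The function parse_cef is a hypothetical name, all values are kept as strings (so Version is "0" rather than 0), and CEF escape sequences are not decoded.

```python
import re

# Header field names as they appear in the output above.
CEF_HEADER_FIELDS = [
    "Version", "DeviceVendor", "DeviceProduct", "DeviceVersion",
    "DeviceEventClassID", "Name", "Severity",
]


def parse_cef(line):
    """Parse a CEF entry into a flat dictionary (simplified sketch)."""
    if not line.startswith("CEF:"):
        raise ValueError("not a CEF entry")
    # Split on unescaped pipes; the eighth part is the extension.
    parts = re.split(r"(?<!\\)\|", line[4:], maxsplit=7)
    if len(parts) < 8:
        raise ValueError("incomplete CEF header")
    parsed = dict(zip(CEF_HEADER_FIELDS, parts[:7]))
    # A value runs until the next " key=" or the end of the extension,
    # which allows spaces inside values.
    extension = parts[7].strip()
    for match in re.finditer(r"(\w+)=(.*?)(?=\s+\w+=|$)", extension):
        parsed[match.group(1)] = match.group(2)
    return parsed


entry = (
    "CEF:0|Vendor|Product|Version|foobar:1:2|Failed password|Medium| "
    "eventId=1234 app=ssh categorySignificance=/Informational/Warning "
    "categoryBehavior=/Authentication/Verify"
)
print(parse_cef(entry)["DeviceEventClassID"])  # foobar:1:2
```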
CEF can also contain a Syslog header. This is supported by chaining the relevant Syslog preprocessor with the CEF preprocessor. Please refer to the Chaining of preprocessors section for details.
Apache HTTP Server log formats built-in preprocessor¶
There are high performance preprocessors for common Apache HTTP server access logs.
function: lmiopar.preprocessor.Apache_Common_Log_Format
This is a preprocessor for the Apache Common Log Format.
function: lmiopar.preprocessor.Apache_Combined_Log_Format
This is a preprocessor for the Apache Combined Log Format.
Apache Common Log example¶
Input:
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
Output:
context:
  Apache_Access_Log:
    HOST: '127.0.0.1'
    IDENT: '-'
    USERID: 'frank'
    TIMESTAMP: '2000-10-10T20:55:36.000Z'
    METHOD: 'GET'
    RESOURCE: '/apache_pb.gif'
    PROTOCOL: 'HTTP/1.0'
    STATUS_CODE: 200
    DOWNLOAD_SIZE: 2326
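The mapping from the Common Log Format line to these fields, including the conversion of the timestamp to UTC, can be sketched in Python as follows. This is an illustrative regular-expression version and not the optimized lmiopar preprocessor. The name parse_common_log is an assumption, and treating a "-" size as 0 is a choice made for this sketch.

```python
import re
from datetime import datetime, timezone

# Group names follow the field names in the output above.
COMMON_LOG = re.compile(
    r'(?P<HOST>\S+) (?P<IDENT>\S+) (?P<USERID>\S+) '
    r'\[(?P<TIMESTAMP>[^\]]+)\] '
    r'"(?P<METHOD>\S+) (?P<RESOURCE>\S+) (?P<PROTOCOL>[^"]+)" '
    r'(?P<STATUS_CODE>\d{3}) (?P<DOWNLOAD_SIZE>\d+|-)'
)


def parse_common_log(line):
    """Parse one Apache Common Log Format line (simplified sketch)."""
    match = COMMON_LOG.match(line)
    if match is None:
        raise ValueError("not a Common Log Format entry")
    fields = match.groupdict()
    # %b expects English month abbreviations (C or English locale).
    local_time = datetime.strptime(fields["TIMESTAMP"], "%d/%b/%Y:%H:%M:%S %z")
    utc_time = local_time.astimezone(timezone.utc)
    fields["TIMESTAMP"] = utc_time.strftime("%Y-%m-%dT%H:%M:%S.000Z")
    fields["STATUS_CODE"] = int(fields["STATUS_CODE"])
    size = fields["DOWNLOAD_SIZE"]
    fields["DOWNLOAD_SIZE"] = 0 if size == "-" else int(size)
    return fields


line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
print(parse_common_log(line)["TIMESTAMP"])  # 2000-10-10T20:55:36.000Z
```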
Apache Combined Log example¶
Input:
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
Output:
context:
  Apache_Access_Log:
    HOST: '127.0.0.1'
    IDENT: '-'
    USERID: 'frank'
    TIMESTAMP: '2000-10-10T20:55:36.000Z'
    METHOD: 'GET'
    RESOURCE: '/apache_pb.gif'
    PROTOCOL: 'HTTP/1.0'
    STATUS_CODE: 200
    DOWNLOAD_SIZE: 2326
    REFERER: http://www.example.com/start.html
    USER_AGENT: Mozilla/4.08 [en] (Win98; I ;Nav)
Microsoft ULS built-in preprocessor¶
function: lmiopar.preprocessor.Microsoft_ULS
This is a preprocessor for Microsoft ULS logs, as described in Microsoft Docs.
For Microsoft SharePoint ULS logs, which contain neither the server name nor the correlation fields, a dedicated preprocessor is provided:
function: lmiopar.preprocessor.Microsoft_ULS_Sharepoint
The Microsoft SharePoint ULS preprocessor can be configured in the define section:
define:
  type: parser/preprocessor
  year: 1999
  timezone: Europe/Prague
year
specifies the numeric representation of the year that will be applied to the timestamp of the logs. Alternatively, you may specify smart (default) for the advanced selection of the year based on the month.
timezone
specifies the timezone of the logs; the default is UTC.
The input for this preprocessor is a valid Microsoft SharePoint ULS entry, e.g.:
04/28/2021 12:31:57.69 mssdmn.exe (0x38E0) 0x4D10 SharePoint Server Search Connectors:SharePoint dvt6 High SetSTSErrorInfo ErrorMessage = Error from SharePoint site: WebExceptionStatus: SendFailure The underlying connection was closed: An unexpected error occurred on a send. hr = 90141214 [sts3util.cxx:6994] search\native\gather\protocols\sts3\sts3util.cxx 3aeca97a-a9db-4010-970e-fe01483bfd4f
The output is the message part of the log in the event and the parsed elements in context.Microsoft_ULS.
event: Message included in the log.
context:
  Microsoft_ULS:
    TIMESTAMP: 1619613117.69
    PROCESS: mssdmn.exe (0x38E0)
    THREAD: 0x4D10
    PRODUCT: SharePoint Server Search
    CATEGORY: Connectors:SharePoint
    EVENTID: dvt6
    LEVEL: High
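The TIMESTAMP value above is the ULS date and time expressed as seconds since the Unix epoch, interpreted in UTC. A minimal Python sketch of that conversion is shown below. The function name uls_timestamp_to_epoch and its tz parameter are assumptions made for illustration and do not mirror the lmiopar API.

```python
from datetime import datetime, timezone


def uls_timestamp_to_epoch(text, tz=timezone.utc):
    """Convert a ULS timestamp such as '04/28/2021 12:31:57.69' to epoch seconds."""
    # %f accepts one to six fractional digits, so '.69' means 690 milliseconds.
    parsed = datetime.strptime(text, "%m/%d/%Y %H:%M:%S.%f")
    return parsed.replace(tzinfo=tz).timestamp()


print(uls_timestamp_to_epoch("04/28/2021 12:31:57.69"))
```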
Query String preprocessor¶
function: lmiopar.preprocessor.Query_String
This is a preprocessor for query strings (key=value&key=value...), such as the meta information from LogMan.io Collector.
Example of input:
file_name=log.log&search=true
The output is the message part of the log in the event and the parsed elements in context.QUERY_STRING.
event: Message included in the log.
context:
  QUERY_STRING:
    file_name: log.log
    search: true
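For comparison, the same key=value&key=value decomposition can be reproduced with the Python standard library. This is only a sketch of the format. The function parse_query_string is a hypothetical name, and the values stay strings here.

```python
from urllib.parse import parse_qsl


def parse_query_string(query):
    """Turn 'key=value&key=value' into a dictionary (the last duplicate key wins)."""
    return dict(parse_qsl(query, keep_blank_values=True))


print(parse_query_string("file_name=log.log&search=true"))
# {'file_name': 'log.log', 'search': 'true'}
```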
JSON built-in preprocessor¶
function: lmiopar.preprocessor.JSON
This is a preprocessor for the JSON format. It expects the input in a binary or textual format; the output dictionary is placed in the event.
Hence, the input for this preprocessor is a valid JSON entry.
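The behaviour described above, where binary or textual input goes in and a dictionary comes out, can be sketched with the Python standard library. The function parse_json_event and the sample fields are hypothetical. Rejecting non-object JSON is an assumption of this sketch, not documented behaviour.

```python
import json


def parse_json_event(raw):
    """Decode a JSON entry given as bytes or str into a dictionary."""
    # json.loads accepts str, bytes and bytearray.
    event = json.loads(raw)
    if not isinstance(event, dict):
        raise ValueError("expected a JSON object")
    return event


print(parse_json_event(b'{"user": "jack", "action": "login"}'))
# {'user': 'jack', 'action': 'login'}
```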
XML built-in preprocessor¶
function: lmiopar.preprocessor.XML
This is a preprocessor for the XML format. It expects the input in a binary or textual format; the output dictionary is placed in the event.
Hence, the input for this preprocessor is a valid XML entry, e.g.:
<?xml version="1.0" encoding="UTF-8"?>
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Schannel" Guid="{1f678132-5938-4686-9fdc-c8ff68f15c85}" />
<EventID>36884</EventID>
<Version>0</Version>
<Level>2</Level>
<Task>0</Task>
<Opcode>0</Opcode>
<Keywords>0x8000000000000000</Keywords>
<TimeCreated SystemTime="2020-06-26T07:12:01.331577900Z" />
<EventRecordID>30286</EventRecordID>
<Correlation ActivityID="{8e20742a-4b06-0002-c274-208e064bd601}" />
<Execution ProcessID="788" ThreadID="948" />
<Channel>System</Channel>
<Computer>XX</Computer>
<Security UserID="S-1-5-21-1627182167-2524376360-74743131-1001" />
</System>
<UserData>
<EventXML xmlns="LSA_NS">
<Name>localhost</Name>
</EventXML>
</UserData>
<RenderingInfo Culture="en-US">
<Message>The certificate received from the remote server does not contain the expected name. It is therefore not possible to determine whether we are connecting to the correct server. The server name we were expecting is localhost. The TLS connection request has failed. The attached data contains the server certificate.</Message>
<Level>Error</Level>
<Task />
<Opcode>Info</Opcode>
<Channel>System</Channel>
<Provider />
<Keywords />
</RenderingInfo>
</Event>
The output of the preprocessor in the event:
{
"System.EventID": "36884",
"System.Version": "0",
"System.Level": "2",
"System.Task": "0",
"System.Opcode": "0",
"System.Keywords": "0x8000000000000000",
"System.EventRecordID": "30286",
"System.Channel": "System",
"System.Computer": "XX",
"UserData.EventXML.Name": "localhost",
"RenderingInfo.Message": "The certificate received from the remote server does not contain the expected name. It is therefore not possible to determine whether we are connecting to the correct server. The server name we were expecting is localhost. The TLS connection request has failed. The attached data contains the server certificate.",
"RenderingInfo.Level": "Error",
"RenderingInfo.Opcode": "Info",
"RenderingInfo.Channel": "System"
}
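The output above suggests that element text is collected under dot-separated paths below the root element, while namespaces, attributes and empty elements are left out. The Python sketch below reproduces that flattening with the standard library, under exactly those assumptions inferred from the example. The function flatten_xml is a hypothetical name, not the lmiopar implementation.

```python
import xml.etree.ElementTree as ET


def flatten_xml(xml_input):
    """Flatten XML (bytes or str) into {'Parent.Child': 'text'} pairs."""
    root = ET.fromstring(xml_input)
    flat = {}

    def local_name(tag):
        # ElementTree renders namespaced tags as '{namespace}name'.
        return tag.rsplit("}", 1)[-1]

    def walk(element, path):
        children = list(element)
        text = (element.text or "").strip()
        # Only leaf elements with text are kept; attributes are ignored.
        if not children and text:
            flat[".".join(path)] = text
        for child in children:
            walk(child, path + [local_name(child.tag)])

    # The root element name itself is not part of the keys.
    walk(root, [])
    return flat


sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">'
    b'<System><EventID>36884</EventID><Provider Name="Schannel" /></System>'
    b'<UserData><EventXML xmlns="LSA_NS"><Name>localhost</Name></EventXML></UserData>'
    b'</Event>'
)
print(flatten_xml(sample))
# {'System.EventID': '36884', 'UserData.EventXML.Name': 'localhost'}
```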
CSV built-in preprocessor¶
function: lmiopar.preprocessor.CSV
This is a preprocessor for the CSV format. It expects the input in a binary or textual format; the parsed rows are placed in context["CSV"].
Hence, the input for this preprocessor is a valid CSV entry, e.g.:
user,last_name\njack,black\njohn,doe
The output of the preprocessor in context["CSV"]:
{
"lines": [
{"user": "jack", "last_name": "black"},
{"user": "john", "last_name": "doe"}
]
}
Parameters¶
In the define section of the CSV preprocessor, the following parameters may be set for CSV reading:
delimiter: field delimiter (default: ",")
escapechar: escape character
doublequote: allow doublequote (default: true)
lineterminator: line terminator character, either \n or \r (default is the operating system line separator)
quotechar: quote character (default: "\"")
quoting: type of quoting
skipinitialspace: skip initial space (default: false)
strict: strict mode (default: false)
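The parameter names above coincide with the formatting parameters of Python's csv module. Whether the preprocessor is built on that module is an assumption, but the module is a convenient way to illustrate the expected result. In this sketch, parse_csv is a hypothetical function, the first row is taken as the header, and binary input is assumed to be UTF-8.

```python
import csv
import io


def parse_csv(data, **format_options):
    """Read CSV given as bytes or str and return {'lines': [row dictionaries]}."""
    if isinstance(data, (bytes, bytearray)):
        data = data.decode("utf-8")  # assumed encoding
    # newline="" leaves line-ending handling to the csv reader.
    reader = csv.DictReader(io.StringIO(data, newline=""), **format_options)
    return {"lines": [dict(row) for row in reader]}


print(parse_csv("user,last_name\njack,black\njohn,doe"))
# {'lines': [{'user': 'jack', 'last_name': 'black'}, {'user': 'john', 'last_name': 'doe'}]}
```

Options such as delimiter=";" or skipinitialspace=True are passed through as keyword arguments, for example parse_csv(data, delimiter=";").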
Custom preprocessors¶
Custom preprocessors can be called from the parser. The respective code has to be accessible to the parser microservice through the standard Python import mechanism.
---
define:
  name: Demo of the custom Python preprocessor
  type: parser/preprocessor
  function: mypreprocessors.preprocessor
mypreprocessors is a module, i.e. a folder with __init__.py, that contains a function preprocessor().
The parser specifies the function to call. It uses the Python dotted notation, and the module is imported automatically.
The signature of the function:
def preprocessor(context, event):
    ...
    return event
A preprocessor may (1) modify the event (!EVENT) and/or (2) modify the context (!CONTEXT).
The output of the preprocessor function is passed to the subsequent parsers.
A preprocessor parser doesn't produce parsed events directly.
If the function returns None, the parsing of the event is silently terminated.
If the function raises an exception, the exception is logged and the event is forwarded to the unparsed output.
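As an illustration of these rules, the following is a minimal sketch of what mypreprocessors/__init__.py could contain. The "source|message" input format and the context key "source" are invented for this example. Only the function signature and the meaning of the return value, None and exceptions come from the description above.

```python
def preprocessor(context, event):
    """Split a hypothetical 'source|message' entry.

    The source is stored in the context and the message becomes the new event.
    """
    # Accept both binary and textual input.
    if isinstance(event, bytes):
        event = event.decode("utf-8", errors="replace")
    event = event.strip()

    # Returning None silently terminates the parsing of this event.
    if not event:
        return None

    source, separator, message = event.partition("|")
    if not separator:
        # Raising sends the event to the unparsed output.
        raise ValueError("missing '|' separator")

    context["source"] = source  # (2) modify the context
    return message              # (1) modify the event


context = {}
print(preprocessor(context, b"firewall|connection dropped"), context)
# connection dropped {'source': 'firewall'}
```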
Chaining of preprocessors¶
Preprocessors can be chained in order to parse more complex input formats. The output (aka event) of the first preprocessor is fed as an input of the second preprocessor (and so on).
For example, the input is in the CEF format with a Syslog RFC3164 header:
<14>Jan 28 05:51:33 connector-test CEF_PARSED_LOG: CEF:0|Vendor|Product|Version|foobar:1:2|Failed password|Medium| eventId=1234 app=ssh categorySignificance=/Informational/Warning categoryBehavior=/Authentication/Verify
The pipeline contains two preprocessors:
p01_parser.yaml:
---
define:
  name: Preprocessor for Syslog RFC3164 part of the message
  type: parser/preprocessor
  function: lmiopar.preprocessor.Syslog_RFC3164
p02_parser.yaml:
---
define:
  name: Preprocessor for CEF part of the message
  type: parser/preprocessor
  function: lmiopar.preprocessor.CEF
and final parser p03_parser.yaml:
---
define:
  name: Finalize by parsing the event into a dictionary
  type: parser/cascade
parse:
  !DICT
  set:
    Syslog_RFC3164: !ITEM CONTEXT Syslog_RFC3164
    CEF: !ITEM CONTEXT CEF
    Message: !EVENT
Output example:
context:
  CEF:
    Version: 0
    DeviceVendor: Vendor
    DeviceProduct: Product
    DeviceVersion: Version
    DeviceEventClassID: 'foobar:1:2'
    Name: Failed password
    Severity: Medium
    eventId: '1234'
    app: ssh
    categorySignificance: /Informational/Warning
    categoryBehavior: /Authentication/Verify
  Syslog_RFC3164:
    PRI: 14
    FACILITY: 1
    PRIORITY: 6
    HOSTNAME: connector-test
    TAG: CEF_PARSED_LOG
    TIMESTAMP: '2020-01-28T05:51:33.000Z'
Message: ''
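The chaining mechanism itself can be sketched in a few lines of Python: each preprocessor receives the shared context and the event returned by the previous one, and a None result stops the chain. The function run_chain and the two toy preprocessors are hypothetical and serve only as an illustration. In LogMan.io, the chain is defined by the YAML declarations above.

```python
def strip_header(context, event):
    """Toy first stage: move everything before ': ' into the context."""
    header, _, rest = event.partition(": ")
    context["header"] = header
    return rest


def measure_body(context, event):
    """Toy second stage: record the length of the remaining event."""
    context["length"] = len(event)
    return event


def run_chain(preprocessors, context, event):
    """Feed the output event of each preprocessor into the next one."""
    for preprocessor in preprocessors:
        event = preprocessor(context, event)
        if event is None:
            # A preprocessor returned None, so parsing stops silently.
            return None
    return event


context = {}
print(run_chain([strip_header, measure_body], context, "hdr: body"), context)
# body {'header': 'hdr', 'length': 4}
```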
Cisco ASA built-in preprocessor¶
function: lmiopar.preprocessor.CiscoASA
Warning
This preprocessor will be replaced by an SP-Lang based parser.