
LogMan.io Collector Inputs

Note

This chapter covers the setup of inputs that collect logs over the network, via syslog, from files, and so on. For the setup of event collection from specific log sources, see the Log sources subtopic.

Subprocess

Section: input:SubProcess

The SubProcess input runs a command as a subprocess of the LogMan.io Collector and periodically reads its output from stdout (line by line) and stderr.

The configuration options include:

command:  # The command to run as a subprocess (e.g. tail -f /data/tail.log)
output:  # Which output to send the incoming events to
line_len_limit:  # (optional) The length limit of one read line (default: 1048576)
ok_return_codes:  # (optional) Return codes that signify the normal running status of the command (default: 0)
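
A minimal configuration might look as follows (the input name MyTail, the tailed file, and the output name MyOutput are illustrative):

input:SubProcess:MyTail:
    command: tail -f /data/tail.log
    output: MyOutput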

File tailing

Section: input:SmartFile

Smart File Input collects events from multiple files whose content may be dynamically modified, or which may be deleted altogether by another process, similarly to the tail -f shell command.

Smart File Input creates a monitored file object for every file path that is specified in the path option of the configuration.

Each monitored file is periodically checked for new lines. When a new line appears, it is read as bytes and passed further to the pipeline together with meta information such as the file name and extracted parts of the file path, see the Extract parameters section.

Different protocols are available for reading different log file formats; see the protocol sections below.

Required configuration options:

input:SmartFile:MyFile:
    path: |  # File paths separated by newlines
        /first/path/to/log/files/*.log
        /second/path/to/log/files/*.log
        /another/path/*s
    protocol: # Protocol to be used for reading

Optional configuration options:

recursive:  # Recursive scanning of specified paths (default: True)
scan_period:  # File scan period in seconds (default: 3 seconds)
preserve_newline:  # Preserve new line character in the output (default: False)
last_position_storage:  # Persistent storage for the current positions in read files (default: ./var/last_position_storage)
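
For illustration, a SmartFile input combining the optional options might look like this (the input name, paths, values, and the output name are examples only):

input:SmartFile:MyFile:
    path: |
        /path/to/log/files/*.log
    protocol: line
    recursive: True
    scan_period: 5
    preserve_newline: False
    last_position_storage: ./var/last_position_storage
    output: MyOutput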

Tip

In more complex setups, such as extraction of logs from a Windows shared folder, you can use rsync to synchronize logs from the shared folder to a local folder on the collector machine. Smart File Input then reads the logs from the local folder.

Warning

Internally, the current position in each file is stored in the last position storage in the position variable. If the last position storage file is deleted or not specified, all files are read again from the beginning after the LogMan.io Collector restarts, i.e. without persistence the reading is reset on every restart.

You can configure the path of the last position storage:

last_position_storage: "./var/last_position_storage"

Warning

If the file size is smaller than the previously remembered file size, the whole file is read again and sent to the pipeline split into lines.

File paths

File path globs are separated by newlines. They can contain wildcards (such as *, **, etc.).

path: |
    /first/path/*.log
    /second/path/*.log
    /another/path/*

By default, files are read recursively. You can disable recursive reading with:

recursive: False

Line Protocol

protocol: line
line/C_separator:  # (optional) Character used as the line separator. Default: '\n'.

Line Protocol is used for reading messages from line-oriented log files.
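
For example, a SmartFile input reading plain line-oriented log files might be configured as follows (the input name and path are illustrative):

input:SmartFile:PlainLines:
    path: /plain-logs/*.log
    protocol: line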

XML Protocol

protocol: xml
tag_separator: '</msg>'  # (required) Closing tag used as the message separator.

XML Protocol is used for reading messages from XML-oriented log files. The tag_separator parameter must be included in the configuration.

Example

Example of XML log file:

/xml-logs/log.xml
...
<msg time='2024-04-16T05:47:39.814+02:00' org_id='orgid'>
    <txt>Log message 1</txt>
</msg>
<msg time='2024-04-16T05:47:42.814+02:00' org_id='orgid'>
    <txt>Log message 2</txt>
</msg>
<msg time='2024-04-16T05:47:43.018+02:00' org_id='orgid'>
    <txt>Log message 3</txt>
</msg>
...

Example configuration:

input:SmartFile:Alert:
    path: /xml-logs/*.xml
    protocol: xml
    tag_separator: "</msg>"

W3C Extended Log File Protocol

protocol: w3c_extended

W3C Extended Log File Protocol is used for collecting events from files in W3C Extended Log File Format and serializing them into JSON format.

Example of event collection from Microsoft Exchange Server

LogMan.io Collector Configuration example:

input:SmartFile:MSExchange:
    path: /MicrosoftExchangeServer/*.log
    protocol: w3c_extended
    extract_source: file_path
    extract_regex: ^(?P<file_path>.*)$

Example of log file content:

/MicrosoftExchangeServer/DNSLOG.log
#Software: Microsoft Exchange Server
#Version: 15.02.1544.004
#Log-type: DNS log
#Date: 2024-04-14T00:02:48.540Z
#Fields: Timestamp,EventId,RequestId,Data
2024-04-14T00:02:38.254Z,,9666704,"SendToServer 122.120.99.11(1), AAAA exchange.bradavice.cz, (query id:46955)"
2024-04-14T00:02:38.254Z,,7204389,"SendToServer 122.120.99.11(1), AAAA exchange.bradavice.cz, (query id:11737)"
2024-04-14T00:02:38.254Z,,43150675,"Send completed. Error=Success; Details=id=46955; query=AAAA exchange.bradavice.cz; retryCount=0"
...

W3C DHCP Server Format

protocol: w3c_dhcp

W3C DHCP Protocol is used for collecting events from DHCP Server log files. It is very similar to the W3C Extended Log File Format, differing only in the log file header.

Table of W3C DHCP event identifiers
Event ID Meaning
00 The log was started.
01 The log was stopped.
02 The log was temporarily paused due to low disk space.
10 A new IP address was leased to a client.
11 A lease was renewed by a client.
12 A lease was released by a client.
13 An IP address was found to be in use on the network.
14 A lease request could not be satisfied because the scope's address pool was exhausted.
15 A lease was denied.
16 A lease was deleted.
17 A lease was expired and DNS records for the expired lease have not been deleted.
18 A lease was expired and DNS records were deleted.
20 A BOOTP address was leased to a client.
21 A dynamic BOOTP address was leased to a client.
22 A BOOTP request could not be satisfied because the scope's address pool for BOOTP was exhausted.
23 A BOOTP IP address was deleted after checking to see it was not in use.
24 IP address cleanup operation has begun.
25 IP address cleanup statistics.
30 DNS update request to the named DNS server.
31 DNS update failed.
32 DNS update successful.
33 Packet dropped due to NAP policy.
34 DNS update request failed as the DNS update request queue limit exceeded.
35 DNS update request failed.
36 Packet dropped because the server is in failover standby role or the hash of the client ID does not match.
50+ Codes above 50 are used for Rogue Server Detection information.

Example of event collection from DHCP Server

LogMan.io Collector Configuration example:

input:SmartFile:DHCP-Server-Input:
    path: /DHCPServer/*.log
    protocol: w3c_dhcp
    extract_source: file_path
    extract_regex: ^(?P<file_path>.*)$

Example of DHCP Server log file content:

/DHCPServer/log1.log
                DHCP Service Activity Log
Event ID  Meaning
00      The log was started.
01      The log was stopped.
...
50+     Codes above 50 are used for Rogue Server Detection information.
ID,Date,Time,Description,IP Address,Host Name,MAC Address,User Name, TransactionID, ...
24,04/16/24,00:00:21,Database Cleanup Begin,,,,,0,6,,,,,,,,,0
24,04/16/24,00:00:22,Database Cleanup Begin,,,,,0,6,,,,,,,,,0
...


Extract parameters

There are also options for extracting information from the file name or file path using a regular expression. The extracted parts are then stored as metadata (which implicitly includes a unique meta ID and the file name). The configuration options start with the extract_ prefix and include the following:

extract_source:  # (optional) file_name or file_path (default: file_path)
extract_regex:  # (optional) regex to extract field names from the extract source (disabled by default)

The extract_regex must contain named groups. The group names are used as field keys for the extracted information. Unnamed groups produce no data.

Example of extracting metadata from regex

Collecting from a file /data/myserver.xyz/tenant-1.log, the following configuration:

extract_regex: ^/data/(?P<dvchost>\w+)/(?P<tenant>\w+)\.log$

will produce metadata:

{
    "meta": {
        "dvchost": "myserver.xyz",
        "tenant": "tenant-1"
    }
}

The following is a working example of a SmartFile input configuration with extraction of attributes from the file name using regex, and an associated File output:

input:SmartFile:SmartFileInput:
  path: ./etc/tail.log
  extract_source: file_name
  extract_regex: ^(?P<dvchost>\w+)\.log$
  output: FileOutput

output:File:FileOutput:
  path: /data/my_path.txt
  prepend_meta: true
  debug: true

Prepending information

prepend_meta: true

Prepends the meta information, such as the extracted fields, to the log line/event as key-value pairs separated by spaces.
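
Assuming the extracted fields from the earlier example, a prepended event might look roughly like this (the exact key-value formatting and field order may differ):

dvchost=myserver.xyz tenant=tenant-1 Original log line content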

Ignore old changes

The following configuration option makes it possible to skip files whose modification time is older than the specified limit.

ignore_older_than:  # (optional) Limit in days, hours, minutes or seconds; only files modified within the limit are read (default: "", i.e. disabled; e.g. "1d", "1h", "1m", "1s")

For instance, the limit can be set to ignore_older_than: 20d or ignore_older_than: 100s.

File

Section: input:File, input:FileBlock, input:XML

These inputs read the specified files line by line (input:File) or as a whole block (input:FileBlock, input:XML) and pass their content further to the pipeline.

Depending on the mode, the files may then be renamed to <FILE_NAME>-processed, and if more of them are specified using a wildcard, the next file is opened, read, and processed in the same way.

The available configuration options for opening, reading and processing the files include:

path:  # Specify the file path(s); wildcards can be used as well (e.g. /data/lines/*)
chilldown_period:  # If more files or a wildcard is used in the path, how often (in seconds) to check for new files (default: 5)
output:  # Which output to send the incoming events to
mode:  # (optional) The mode by which the file is going to be read (default: 'rb')
newline:  # (optional) File line separator (default is the value of os.linesep)
post:  # (optional) What should happen with the file after reading - delete (delete the file), noop (no renaming), move (rename to `<FILE_NAME>-processed`, default)
exclude:  # (optional) Path of filenames that should be excluded (has precedence over 'include')
include:  # (optional) Path of filenames that should be included
encoding:  # (optional) Charset encoding of the file's content
move_destination:  # (optional) Destination folder for post 'move'; make sure it is outside of the path specified above
lines_per_event:  # (optional) The number of lines after which the read method enters the idle state to allow other operations to perform their tasks (default: 10000)
event_idle_time:  # (optional) The time in seconds for which the read method stays in the idle state, see above (default: 0.01)
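
An illustrative configuration reading files line by line and deleting them afterwards (the input name, path, and output name are examples only):

input:File:MyFileInput:
    path: /data/lines/*
    post: delete
    output: FileOutput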

HTTP / HTTPS Server

Sections: input:WebServer

Enables input of logs or events via an HTTP/HTTPS server. This is a webhook implementation; the event is taken from the body of the incoming HTTP/HTTPS request. The webhook accepts any HTTP method (POST, PUT, GET, and so on).

The URL can optionally carry the name of the stream as the last part of the URL path (appended to the base). If the stream is not provided, the event appears in the "generic" stream.

Example of event delivery through a webhook into my-stream

$ curl -k -X PUT https://my-collector:8080/my-stream -d @event-file.json

Example of the configuration:

input:WebServer:https:
  output: ...
  listen: 8443 ssl
  base: /subpath
  cert: /conf/mycert.pem
  key: /conf/mykey.pem

listen

  • 8080 or *:8080: Listen on port 8080 on all available network interfaces, IPv4 and IPv6
  • 0.0.0.0:8080: Listen on port 8080 on all available network interfaces on IPv4
  • :::8080: Listen on port 8080 on all available network interfaces on IPv6
  • 1.2.3.4:8080: Listen on port 8080 on a specific network interface (1.2.3.4) on IPv4
  • ::1:8080: Listen on port 8080 on a specific network interface (::1) on IPv6

If you append ssl to the listen value, the server listens with TLS/SSL enabled (HTTPS), as in the listen: 8443 ssl example above.

Tip

Multiple lines are supported for listen, which allows listening on multiple network interfaces. A sketch is shown below.
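
For instance, assuming the same multi-line block syntax as the path option of SmartFile, a configuration listening on two interfaces might look like this (ports and addresses are illustrative):

input:WebServer:http:
  output: ...
  listen: |
    0.0.0.0:8080
    :::8080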

base (optional)

Specifies a base path for URL requests that will collect events. The default is /.

cert (optional)

If TLS/SSL is enabled, specifies the server SSL certificate (PEM file) to be used. If not provided, the internal CA is used to provide a certificate for SSL.

key (optional)

If TLS/SSL is enabled, specifies the server SSL private key that belongs to the SSL server certificate. If not provided, the internal CA is used to provide a private key for SSL.

Apache Kafka

Section: input:Kafka

This option is available from version v22.32.

Creates a Kafka consumer for the specified topic(s).

Configuration options related to the connection establishment:

bootstrap_servers: # Kafka nodes to read messages from (such as `kafka1:9092,kafka2:9092,kafka3:9092`)

Configuration options related to the Kafka Consumer setting:

topic:  # Name of the topic(s) to read messages from (such as `lmio-events` or `^lmio.*`)
group_id:  # Name of the consumer group (such as: `collector_kafka_consumer`)
refresh_topics:  # (optional) If more topics matching the topic name are expected to be created during consumption, this option specifies how often (in seconds) to refresh the topic subscriptions (such as: `300`)

The bootstrap_servers, topic and group_id options are always required.

topic can be a name, a list of names separated by spaces, or a simple regex (to match all available topics, use ^.*)
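
A minimal sketch of a Kafka input configuration (the input name, topic, group, and output name are illustrative):

input:Kafka:KafkaInput:
    bootstrap_servers: kafka1:9092,kafka2:9092,kafka3:9092
    topic: lmio-events
    group_id: collector_kafka_consumer
    output: KafkaOutput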

For more configuration options, please refer to https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md

Elasticsearch

Section: input:Elasticsearch

The Elasticsearch input queries Elasticsearch for new records. It is a never-ending scrolling mechanism utilizing the Elasticsearch Search API with the search_after option. For more information, see the official documentation: https://www.elastic.co/docs/reference/elasticsearch/rest-apis/paginate-search-results

The configuration options include:

url:  # Specify the URL of the Elasticsearch master or data node, e.g. http://lm1:9200
index:  # Specify the index to read the data from, may be a pattern like lmio-standard-events*
username:  # (optional) Specify the username for Elasticsearch
password:  # (optional) Specify the password for Elasticsearch
source:  # (optional) If the source field within a single hit differs from default _source, use this option
sort_by:  # (optional) Fields the events should be sorted by, defaults to _id - this is needed for the proper search after request
timestamp:  # (optional) If the timestamp field within a single hit differs from default @timestamp, use this option
start_from:  # (optional) Time from which existing logs are read (default: now-1h)
request_body:  # (optional) Use a custom request body - if used, the timestamp and start_from options will not be utilized
output:  # Which output to send the incoming events to
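
A minimal sketch of an Elasticsearch input configuration (the input name, credentials, and output name are illustrative):

input:Elasticsearch:ElasticInput:
    url: http://lm1:9200
    index: lmio-standard-events*
    username: reader
    password: secret
    output: ElasticOutput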