LogMan.io Collector Inputs

Note

This chapter covers the setup of collector inputs for logs collected over the network, syslog, files, databases, and so on. For the setup of event collection from specific log sources, see the Log sources subtopic.

Network

Sections: input:TCP, input:Stream, input:UDP, input:Datagram

These inputs listen on a given address using TCP, UDP, or a UNIX socket.

Tip

Logs should be collected over the TCP protocol. Use the UDP protocol only if TCP is not possible.

The configuration options for listening:

address:  # Specify IPv4, IPv6 or UNIX file path to listen from
output:  # Which output to send the incoming events to

Here are the possible forms of address:

  • 8080 or *:8080: Listen on port 8080 on all available network interfaces, both IPv4 and IPv6
  • 0.0.0.0:8080: Listen on port 8080 on all available IPv4 network interfaces
  • :::8080: Listen on port 8080 on all available IPv6 network interfaces
  • 1.2.3.4:8080: Listen on port 8080 on the specific IPv4 network interface 1.2.3.4
  • ::1:8080: Listen on port 8080 on the specific IPv6 network interface ::1
  • /tmp/unix.sock: Listen on the UNIX socket /tmp/unix.sock
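
For illustration, a minimal TCP listener configuration might look like this (the section name TCPInput and the output name TCPOutput are placeholders; the referenced output section must exist in your configuration):

input:TCP:TCPInput:
    address: 0.0.0.0:10008
    output: TCPOutput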

The following configuration options are available only for input:Datagram:

max_packet_size:  # (optional) Specify the maximum size of packets in bytes (default: 65536)
receiver_buffer_size:  # (optional) Limit the size of the receiver buffer in bytes (default: 0)
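
Similarly, a sketch of a UDP listener using input:Datagram (section and output names are again placeholders):

input:Datagram:UDPInput:
    address: 0.0.0.0:514
    max_packet_size: 65536
    output: UDPOutput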

Warning

LogMan.io Collector runs inside a Docker container. Propagation of network ports must be enabled like this:

docker-compose.yml
services:
  lmio-collector-tenant:
    network_mode: host

Note

TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) are both protocols used for sending data over the network.

TCP is a stream protocol (hence input:Stream): it provides reliable, ordered, and error-checked delivery of a stream of data.

In contrast, UDP is a datagram protocol (hence input:Datagram): it sends packets independently, allowing faster transmission but with less reliability and no guarantee of order, much like individual, unrelated messages.

Tip

For troubleshooting, use tcpdump to capture raw network traffic and then use Wireshark for deeper analysis.

An example of capturing traffic on TCP port 10008:

$ sudo tcpdump -i any -s 0 -v -w /tmp/capture.pcap tcp port 10008

When enough traffic is captured, press Ctrl-C and collect the file /tmp/capture.pcap that contains the traffic capture. This file can be opened in Wireshark.

Syslog

Sections: input:TCPBSDSyslogRFC6587, input:TCPBSDSyslogNoFraming

These are special cases of the TCP input for parsing SysLog messages received over TCP. For more information, see RFC 6587 and RFC 3164, section 4.1.1.

The configuration options for listening on a given path:

address:  # Specify IPv4, IPv6 or UNIX file path to listen from (e.g. 127.0.0.1:8888 or /data/mysocket)
output:  # Which output to send the incoming events to

The following configuration options are available only for input:TCPBSDSyslogRFC6587:

max_sane_msg_len:  # (optional) Maximum size in bytes of SysLog message to be received (default: 10000)

The following configuration options are available only for input:TCPBSDSyslogNoFraming:

buffer_size:  # (optional) Maximum size in bytes of SysLog message to be received (default: 64 * 1024)
variant:  # (optional) The variant of the SysLog format of incoming messages: `auto`, `nopri` (no PRI number at the beginning) or `standard` (with PRI) (default: auto)
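
As a sketch, a syslog-over-TCP listener without framing could be configured like this (section and output names are placeholders):

input:TCPBSDSyslogNoFraming:SyslogInput:
    address: 0.0.0.0:1514
    variant: auto
    output: SyslogOutput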

Subprocess

Section: input:SubProcess

The SubProcess input runs a command as a subprocess of the LogMan.io Collector, periodically checking its output on stdout (line by line) and stderr.

The configuration options include:

command:  # Specify the command to be run as a subprocess (e.g. tail -f /data/tail.log)
output:  # Which output to send the incoming events to
line_len_limit:  # (optional) The length limit of one read line (default: 1048576)
ok_return_codes:  # (optional) Which return codes signify the running status of the command (default: 0)
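
For example, tailing a file via a subprocess might be configured like this (section and output names are placeholders):

input:SubProcess:TailInput:
    command: tail -f /data/tail.log
    output: ProcessOutput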

File tailing

Section: input:SmartFile

Smart File Input collects events from multiple files whose content may be dynamically modified, or which may be deleted altogether by another process, similarly to the tail -f shell command.

Smart File Input creates a monitored file object for every file path that is specified in the path option of the configuration.

The monitored file object periodically checks for new lines in the file; when one appears, the line is read as bytes and passed further into the pipeline, together with meta information such as the file name and extracted parts of the file path (see the Extract parameters section).

Different protocols are used for reading different log file formats; see the protocol sections below.

Required configuration options:

input:SmartFile:MyFile:
    path: |  # File paths separated by newlines
        /first/path/to/log/files/*.log
        /second/path/to/log/files/*.log
        /another/path/*s
    protocol: # Protocol to be used for reading

Optional configuration options:

recursive:  # Recursive scanning of specified paths (default: True)
scan_period:  # File scan period in seconds (default: 3 seconds)
preserve_newline:  # Preserve new line character in the output (default: False)
last_position_storage:  # Persistent storage for the current positions in read files (default: ./var/last_position_storage)
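
Putting the required and optional options together, a sketch of a complete Smart File input might look like this (paths, section and output names are placeholders):

input:SmartFile:MyFile:
    path: |
        /first/path/to/log/files/*.log
        /second/path/to/log/files/*.log
    protocol: line
    scan_period: 3
    preserve_newline: False
    last_position_storage: ./var/last_position_storage
    output: FileOutput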

Tip

In a more complex setup, such as extraction of logs from a Windows shared folder, you can use rsync to synchronize logs from the shared folder to a local folder on the collector machine. Smart File Input then reads the logs from the local folder.

Warning

Internally, the current position in each file is stored in the last position storage in the position variable. If the last position storage file is deleted or not specified, all files are read again from the beginning after the LogMan.io Collector restarts, i.e. no persistence means the reading is reset on restart.

You can configure the path of the last position storage:

last_position_storage: "./var/last_position_storage"

Warning

If the file size is lower than the previously remembered file size, the whole file is read again and sent to the pipeline split into lines.

File paths

File path globs are separated by newlines. They can contain wildcards (such as *, **, etc.).

path: |
    /first/path/*.log
    /second/path/*.log
    /another/path/*

By default, files are read recursively. You can disable recursive reading with:

recursive: False

Line Protocol

protocol: line
line/C_separator:  # (optional) Character used for line separator. Default: '\n'.

Line Protocol is used for reading messages from line-oriented log files.
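
For instance, a line-oriented Smart File input with an explicit separator might look like this (the path and names are placeholders; the separator shown is the default):

input:SmartFile:LineLogs:
    path: /var/log/myapp/*.log
    protocol: line
    line/C_separator: '\n'
    output: LineOutput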

XML Protocol

protocol: xml
tag_separator: '</msg>'  # (required) Tag used as the message separator.

XML Protocol is used for reading messages from XML-oriented log files. The tag_separator parameter must be included in the configuration.

Example

Example of XML log file:

/xml-logs/log.xml
...
<msg time='2024-04-16T05:47:39.814+02:00' org_id='orgid'>
    <txt>Log message 1</txt>
</msg>
<msg time='2024-04-16T05:47:42.814+02:00' org_id='orgid'>
    <txt>Log message 2</txt>
</msg>
<msg time='2024-04-16T05:47:43.018+02:00' org_id='orgid'>
    <txt>Log message 3</txt>
</msg>
...

Example configuration:

input:SmartFile:Alert:
    path: /xml-logs/*.xml
    protocol: xml
    tag_separator: "</msg>"

W3C Extended Log File Protocol

protocol: w3c_extended

W3C Extended Log File Protocol is used for collecting events from files in W3C Extended Log File Format and serializing them into JSON format.

Example of event collection from Microsoft Exchange Server

LogMan.io Collector Configuration example:

input:SmartFile:MSExchange:
    path: /MicrosoftExchangeServer/*.log
    protocol: w3c_extended
    extract_source: file_path
    extract_regex: ^(?P<file_path>.*)$

Example of log file content:

/MicrosoftExchangeServer/DNSLOG.log
#Software: Microsoft Exchange Server
#Version: 15.02.1544.004
#Log-type: DNS log
#Date: 2024-04-14T00:02:48.540Z
#Fields: Timestamp,EventId,RequestId,Data
2024-04-14T00:02:38.254Z,,9666704,"SendToServer 122.120.99.11(1), AAAA exchange.bradavice.cz, (query id:46955)"
2024-04-14T00:02:38.254Z,,7204389,"SendToServer 122.120.99.11(1), AAAA exchange.bradavice.cz, (query id:11737)"
2024-04-14T00:02:38.254Z,,43150675,"Send completed. Error=Success; Details=id=46955; query=AAAA exchange.bradavice.cz; retryCount=0"
...

W3C DHCP Server Format

protocol: w3c_dhcp

W3C DHCP Protocol is used for collecting events from DHCP Server log files. It is very similar to the W3C Extended Log File Format, differing only in the log file header.

Table of W3C DHCP events identification
Event ID Meaning
00 The log was started.
01 The log was stopped.
02 The log was temporarily paused due to low disk space.
10 A new IP address was leased to a client.
11 A lease was renewed by a client.
12 A lease was released by a client.
13 An IP address was found to be in use on the network.
14 A lease request could not be satisfied because the scope's address pool was exhausted.
15 A lease was denied.
16 A lease was deleted.
17 A lease was expired and DNS records for an expired lease have not been deleted.
18 A lease was expired and DNS records were deleted.
20 A BOOTP address was leased to a client.
21 A dynamic BOOTP address was leased to a client.
22 A BOOTP request could not be satisfied because the scope's address pool for BOOTP was exhausted.
23 A BOOTP IP address was deleted after checking to see it was not in use.
24 IP address cleanup operation has begun.
25 IP address cleanup statistics.
30 DNS update request to the named DNS server.
31 DNS update failed.
32 DNS update successful.
33 Packet dropped due to NAP policy.
34 DNS update request failed as the DNS update request queue limit exceeded.
35 DNS update request failed.
36 Packet dropped because the server is in failover standby role or the hash of the client ID does not match.
50+ Codes above 50 are used for Rogue Server Detection information.

Example of event collection from DHCP Server

LogMan.io Collector Configuration example:

input:SmartFile:DHCP-Server-Input:
    path: /DHCPServer/*.log
    protocol: w3c_dhcp
    extract_source: file_path
    extract_regex: ^(?P<file_path>.*)$

Example of DHCP Server log file content:

/DHCPServer/log1.log
                DHCP Service Activity Log
Event ID  Meaning
00      The log was started.
01      The log was stopped.
...
50+     Codes above 50 are used for Rogue Server Detection information.
ID,Date,Time,Description,IP Address,Host Name,MAC Address,User Name, TransactionID, ...
24,04/16/24,00:00:21,Database Cleanup Begin,,,,,0,6,,,,,,,,,0
24,04/16/24,00:00:22,Database Cleanup Begin,,,,,0,6,,,,,,,,,0
...

Extract parameters

There are also options for extracting information from the file name or file path using a regular expression. The extracted parts are then stored as metadata (which implicitly include a unique meta ID and the file name). The configuration options start with the extract_ prefix and include the following:

extract_source:  # (optional) file_name or file_path (default: file_path)
extract_regex:  # (optional) regex to extract field names from the extract source (disabled by default)

The extract_regex must contain named groups. The group names are used as field keys for the extracted information. Unnamed groups produce no data.

Example of extracting metadata from regex

When collecting from the file /data/myserver.xyz/tenant-1.log, the following configuration:

extract_regex: ^/data/(?P<dvchost>\w+)/(?P<tenant>\w+)\.log$

will produce metadata:

{
    "meta": {
        "dvchost": "myserver.xyz",
        "tenant": "tenant-1"
    }
}

The following is a working example of a SmartFile input configuration with extraction of attributes from the file name using a regex, and an associated File output:

input:SmartFile:SmartFileInput:
  path: ./etc/tail.log
  extract_source: file_name
  extract_regex: ^(?P<dvchost>\w+)\.log$
  output: FileOutput

output:File:FileOutput:
  path: /data/my_path.txt
  prepend_meta: true
  debug: true

Prepending information

prepend_meta: true

Prepends the meta information, such as the extracted field names, to the log line/event as key-value pairs separated by spaces.
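
Assuming the metadata from the extraction example above (dvchost and tenant), the event sent to the output might then look roughly like this (the exact formatting may differ):

dvchost=myserver.xyz tenant=tenant-1 <original log line>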

Ignore old changes

The following configuration option enables checking that the modification time of files being read is not older than the specified limit.

ignore_older_than:  # (optional) Limit in days, hours, minutes or seconds to read only files modified after the limit (default: "", e.g. "1d", "1h", "1m", "1s")
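
For instance, the limit can be set to ignore_older_than: 20d or ignore_older_than: 100s. A sketch within a Smart File input (path and names are placeholders):

input:SmartFile:RecentLogs:
    path: /data/archive/*.log
    protocol: line
    ignore_older_than: 20d
    output: FileOutput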

File

Section: input:File, input:FileBlock, input:XML

These inputs read specified files by lines (input:File) or as a whole block (input:FileBlock, input:XML) and pass their content further to the pipeline.

Depending on the post option, the files may then be renamed to <FILE_NAME>-processed; if more files are specified using a wildcard, the next file is opened, read and processed in the same way.

The available configuration options for opening, reading and processing the files include:

path:  # Specify the file path(s), wildcards can be used as well (e.g. /data/lines/*)
chilldown_period:  # If more files or wildcard is used in the path, specify how often in seconds to check for new files (default: 5)
output:  # Which output to send the incoming events to
mode:  # (optional) The mode by which the file is going to be read (default: 'rb')
newline:  # (optional) File line separator (default is value of os.linesep)
post:  # (optional) Specifies what should happen with the file after reading - delete (delete the file), noop (no renaming), move (rename to `<FILE_NAME>-processed`, default)
exclude:  # (optional) Path of filenames that should be excluded (has precedence over 'include')
include:  # (optional) Path of filenames that should be included
encoding:  # (optional) Charset encoding of the file's content
move_destination:  # (optional) Destination folder for post 'move', make sure it is outside of the path specified above
lines_per_event:  # (optional) The number of lines after which the read method enters the idle state to allow other operations to perform their tasks (default: 10000)
event_idle_time:  # (optional) The time in seconds for which the read method enters the idle state, see above (default: 0.01)
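
A sketch of a File input that reads line-oriented files and deletes them after processing (path and names are placeholders):

input:File:FileInput:
    path: /data/lines/*
    post: delete
    output: FileOutput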

ODBC

Section: input:ODBC

Provides input via an ODBC driver connection to collect logs from various databases.

Configuration options related to the connection establishment:

host:  # Hostname of the database server
port:  # Port where the database server is running
user:  # Username to log in to the database server (usually a technical/access account)
password:  # Password for the user specified above
driver:  # Pre-installed ODBC driver (see list below)
db:  # Name of the database to access
connect_timeout:  # (optional) Connection timeout in seconds for the ODBC pool (default: 1)
reconnect_delay:  # (optional) Reconnection delay in seconds after timeout for the ODBC pool (default: 5.0)
output_queue_max_size:  # (optional) Maximum size of the output queue, i.e. the in-memory storage (default: 10)
max_bulk_size:  # (optional) Maximum size of one bulk composed of the incoming records (default: 2)
output:  # Which output to send the incoming events to

Configuration options related to querying the database:

query:  # Query to periodically call the database
chilldown_period:  # Specify in seconds how often the query above will be called (default: 5)
last_value_enabled:  # Enable last value duplicity check (true/false)
last_value_table:  # Specify table for SELECT max({}) from {};
last_value_column:  # The column in the query used to obtain the last value
last_value_storage:  # Persistent storage for the current last value (default: ./var/last_value_storage)
last_value_query:  # (optional) To specify the last value query entirely (in case this option is set, last_value_table will not be considered)
last_value_start:  # (optional) The first value to start from (default: 0)
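
A sketch of an ODBC input (host, credentials, driver name, database, table and column names are placeholders; whether the query contains a {} placeholder for the last value depends on your setup, so treat the query below as an assumption to be adapted):

input:ODBC:MyDatabase:
    host: db.example.com
    port: 1433
    user: logreader
    password: secret
    driver: "ODBC Driver 18 for SQL Server"
    db: logs
    query: SELECT * FROM events WHERE id > {} ORDER BY id;
    last_value_enabled: true
    last_value_table: events
    last_value_column: id
    output: DatabaseOutput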

Apache Kafka

Section: input:Kafka

This input is available from version v22.32.

Creates a Kafka consumer for the specified topic(s).

Configuration options related to the connection establishment:

bootstrap_servers: # Kafka nodes to read messages from (such as `kafka1:9092,kafka2:9092,kafka3:9092`)

Configuration options related to the Kafka Consumer setting:

topic:  # Name of the topics to read messages from (such as `lmio-events` or `^lmio.*`)
group_id:  # Name of the consumer group (such as: `collector_kafka_consumer`)
refresh_topics:  # (optional) If more topics matching the topic name are expected to be created during consumption, this option specifies in seconds how often to refresh the topic subscriptions (such as: `300`)

The bootstrap_servers, topic and group_id options are always required.

topic can be a name, a list of names separated by spaces, or a simple regex (to match all available topics, use ^.*).
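
Putting the required options together, a consumer configuration might look like this (section and output names are placeholders; server and topic values are taken from the examples above):

input:Kafka:KafkaInput:
    bootstrap_servers: kafka1:9092,kafka2:9092,kafka3:9092
    topic: lmio-events
    group_id: collector_kafka_consumer
    output: KafkaOutput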

For more configuration options, please refer to https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md