LogMan.io Collector Inputs

Note

For Microsoft Office 365 inputs, see the Collecting from Microsoft Office 365 chapter. For Windows inputs, see the Collecting from Windows chapter.

Network

Sections: input:TCP, input:Stream, input:UDP, input:Datagram

These inputs listen on a given address using TCP, UDP or a Unix socket.

Tip

Logs should be collected through TCP; if that is not possible, use UDP.

The configuration options for listening:

address:  # Specify IPv4, IPv6 or UNIX file path to listen from
output:  # Which output to send the incoming events to

Here are the possible forms of the address:

  • 8080 or *:8080: Listen on port 8080 on all available network interfaces, both IPv4 and IPv6
  • 0.0.0.0:8080: Listen on port 8080 on all available IPv4 network interfaces
  • :::8080: Listen on port 8080 on all available IPv6 network interfaces
  • 1.2.3.4:8080: Listen on port 8080 on a specific IPv4 network interface (1.2.3.4)
  • ::1:8080: Listen on port 8080 on a specific IPv6 network interface (::1)
  • /tmp/unix.sock: Listen on a UNIX socket /tmp/unix.sock

The following configuration options are available only for input:Datagram:

max_packet_size:  # (optional) Specify the maximum size of packets in bytes (default: 65536)
receiver_buffer_size:  # (optional) Limit the receiver size of the buffer in bytes (default: 0)
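
For illustration, a minimal sketch of a TCP input and a UDP (Datagram) input follows. The instance names (TCPInput, UDPInput) and the referenced outputs (TCPOutput, UDPOutput) are placeholders and must match outputs defined elsewhere in your configuration:

input:TCP:TCPInput:
  address: 0.0.0.0:10008
  output: TCPOutput

input:Datagram:UDPInput:
  address: 0.0.0.0:10009
  max_packet_size: 65536
  output: UDPOutput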

Danger

If using Docker, make sure the ports or socket files are also propagated outside of the Docker container.

Note

TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) are both protocols used for sending data over the network. TCP is analogous to a Stream, as it provides a reliable, ordered, and error-checked delivery of a stream of data. In contrast, UDP is likened to a datagram because it sends packets independently, allowing for faster transmission but with less reliability and no guarantee of order, much like individual, unrelated messages.

Tip

For troubleshooting, use tcpdump to capture raw network traffic and then use Wireshark for deeper analysis.

An example of capturing the traffic on TCP port 10008:

$ sudo tcpdump -i any tcp port 10008 -s 0 -w /tmp/capture.pcap -v

When enough traffic is captured, press Ctrl-C and collect the file /tmp/capture.pcap that contains the traffic capture. This file can be opened in Wireshark.

Syslog

Sections: input:TCPBSDSyslogRFC6587, input:TCPBSDSyslogNoFraming

These are special cases of the TCP input for parsing Syslog received via TCP. For more information, see https://datatracker.ietf.org/doc/html/rfc6587 and https://datatracker.ietf.org/doc/html/rfc3164#section-4.1.1

The configuration options for listening on a given address:

address:  # Specify IPv4, IPv6 or UNIX file path to listen on (e.g. 127.0.0.1:8888 or /data/mysocket)
output:  # Which output to send the incoming events to

The following configuration options are available only for input:TCPBSDSyslogRFC6587:

max_sane_msg_len:  # (optional) Maximum size in bytes of SysLog message to be received (default: 10000)

The following configuration options are available only for input:TCPBSDSyslogNoFraming:

buffer_size:  # (optional) Maximum size in bytes of SysLog message to be received (default: 64 * 1024)
variant:  # (optional) The variant of the Syslog format of the incoming message: `auto`, `nopri` (no PRI number at the beginning) or `standard` (with PRI) (default: auto)
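
For illustration, a sketch of both Syslog input variants follows. The instance names, the listening ports and the SyslogOutput output are placeholders:

input:TCPBSDSyslogRFC6587:SyslogFramedInput:
  address: 0.0.0.0:6514
  max_sane_msg_len: 10000
  output: SyslogOutput

input:TCPBSDSyslogNoFraming:SyslogPlainInput:
  address: 0.0.0.0:1514
  variant: auto
  output: SyslogOutput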

Subprocess

Section: input:SubProcess

The SubProcess input runs a command as a subprocess of the LogMan.io collector, while periodically checking for its output at stdout (lines) and stderr.

The configuration options include:

command:  # Specify the command to be run as a subprocess (e.g. tail -f /data/tail.log)
output:  # Which output to send the incoming events to
line_len_limit:  # (optional) The length limit of one read line (default: 1048576)
ok_return_codes:  # (optional) Which return codes signify the running status of the command (default: 0)
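
For illustration, a sketch of a SubProcess input that tails a file, using the example command from above; the instance name and the SubProcessOutput output are placeholders:

input:SubProcess:TailInput:
  command: tail -f /data/tail.log
  output: SubProcessOutput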

File tailing

Section: input:SmartFile

Simulates tail -f behavior on multiple files whose content may be dynamically modified, or which may be deleted altogether, by another process.

The SmartFile input creates a monitored file object for every file path that is specified in the path option of the configuration.

The monitored file object periodically checks for new lines in the file; when a new line appears, it is read as bytes and passed further to the pipeline, including meta information such as the file name and extracted parts of the file path (see the extract_ options below).

The current position in the file is stored in the last position storage under the position variable. If the last position storage file is deleted or not specified, all files are read again from the beginning after the LogMan.io Collector restarts, i.e. no persistence means the reading is reset on every restart.

Beware: If the file size is lower than the previously remembered file size, the whole file is read again and sent to the pipeline split into lines.

The available configuration options include:

path:  # File path globs separated by ';' (default: /data/smarttest/*)
last_position_storage:  # Persistent storage for the current positions in the read files (default: ./var/last_position_storage)
scan_period:  # (optional) File scan period in seconds (default: 3)
read_size:  # (optional) Size of one read cycle in bytes (default: 4096)
recursive:  # (optional) Recursive scanning of the specified paths (default: True)
newline:  # (optional) File line separator, e.g. \n (default is the OS line separator)
preserve_newline:  # (optional) Preserve the new line character in the output (default: False)

The following configuration options make it possible to check that the modification time of the files being read is not older than the specified limit. For instance, the limit can be set to ignore_older_than: 20d or ignore_older_than: 100s. A sketch using these options follows after the list.

ignore_older_than:  # (optional) Limit in days, hours, minutes or seconds to read only files modified after the limit (default: "", e.g. "1d", "1h", "1m", "1s")
read_only_increments:  # (optional) Read only lines created after the start of the application (default: True)
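
For illustration, a sketch of a SmartFile input using the time-limit options described above; the path, the instance name and the SmartFileOutput output are placeholders:

input:SmartFile:SmartFileTimeLimitedInput:
  path: /data/smarttest/*.log
  last_position_storage: ./var/last_position_storage
  ignore_older_than: 1d
  read_only_increments: true
  output: SmartFileOutput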

There are also options for extracting information from the file name or file path using a regular expression. The extracted parts are then stored as metadata (which implicitly includes a unique meta ID and the file name). These configuration options start with the extract_ prefix and include the following:

extract_source:  # (optional) file_name or file_path (default: file_path)
extract_regex:  # (optional) regex to extract field names from the extract source (disabled by default)

The extract_regex must contain named groups. The group names are used as field keys for the extracted information. Unnamed groups produce no data.

For example, consider the following configuration:

extract_regex: ^/data/(?P<dvchost>[\w.]+)/(?P<tenant>[\w-]+)\.log$

The extracted metadata for the file /data/myserver.xyz/tenant-1.log will be:

{
  "meta": {
    "dvchost": "myserver.xyz",
    "tenant": "tenant-1"
  }
}

The following is a working example of a SmartFile input configuration with extraction of attributes from the file name using a regex, and an associated File output:

input:SmartFile:SmartFileInput:
  path: ./etc/tail.log
  extract_source: file_name
  extract_regex: ^(?P<dvchost>\w+)\.log$
  output: FileOutput

output:File:FileOutput:
  path: /data/my_path.txt
  prepend_meta: true
  debug: true

prepend_meta: true prepends the meta information, such as the extracted field names, to the log line/event as key-value pairs separated by spaces.

Tip

In a more complex setup, such as extraction of logs from a Windows shared folder, you can use rsync to synchronize logs from the shared folder to a local folder on the collector machine. The SmartFile input then reads the logs from the local folder.

File

Section: input:File, input:FileBlock, input:XML

These inputs read specified files by lines (input:File) or as a whole block (input:FileBlock, input:XML) and pass their content further to the pipeline.

Depending on the mode, the files may then be renamed to <FILE_NAME>-processed, and if more of them are specified using a wildcard, the next file will be opened, read and processed in the same way.

The available configuration options for opening, reading and processing the files include:

path:  # Specify the file path(s), wildcards can be used as well (e.g. /data/lines/*)
chilldown_period:  # If more files or a wildcard are used in the path, specify how often in seconds to check for new files (default: 5)
output:  # Which output to send the incoming events to
mode:  # (optional) The mode by which the file is going to be read (default: 'rb')
newline:  # (optional) File line separator (default is value of os.linesep)
post:  # (optional) Specifies what should happen with the file after reading - delete (delete the file), noop (no renaming), move (rename to `<FILE_NAME>-processed`, default)
exclude:  # (optional) Path of filenames that should be excluded (has precedence over 'include')
include:  # (optional) Path of filenames that should be included
encoding:  # (optional) Charset encoding of the file's content
move_destination:  # (optional) Destination folder for post 'move', make sure it is outside of the path specified above
lines_per_event:  # (optional) The number of lines after which the read method enters the idle state to allow other operations to perform their tasks (default: 10000)
event_idle_time:  # (optional) The time in seconds for which the read method enters the idle state, see above (default: 0.01)
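
For illustration, a sketch of a File input that reads files line by line and moves them after processing; the paths, the instance name and the FileLinesOutput output are placeholders (the move destination must lie outside the configured path):

input:File:FileInput:
  path: /data/lines/*
  post: move
  move_destination: /data/processed/
  output: FileLinesOutput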

ODBC

Section: input:ODBC

Provides input via an ODBC driver connection to collect logs from various databases.

Configuration options related to the connection establishment:

host:  # Hostname of the database server
port:  # Port where the database server is running
user:  # Username to log in to the database server (usually a technical/access account)
password:  # Password for the user specified above
driver:  # Pre-installed ODBC driver (see list below)
db:  # Name of the database to access
connect_timeout:  # (optional) Connection timeout in seconds for the ODBC pool (default: 1)
reconnect_delay:  # (optional) Reconnection delay in seconds after timeout for the ODBC pool (default: 5.0)
output_queue_max_size:  # (optional) Maximum size of the output queue, i.e. the in-memory storage (default: 10)
max_bulk_size:  # (optional) Maximum size of one bulk composed of the incoming records (default: 2)
output:  # Which output to send the incoming events to

Configuration options related to querying the database:

query:  # Query to periodically call the database
chilldown_period:  # Specify in seconds how often the query above will be called (default: 5)
last_value_enabled:  # Enable last value duplicity check (true/false)
last_value_table:  # Specify table for SELECT max({}) from {};
last_value_column:  # The column in the query used to obtain the last value
last_value_storage:  # Persistent storage for the current last value (default: ./var/last_value_storage)
last_value_query:  # (optional) To specify the last value query entirely (in case this option is set, last_value_table will not be considered)
last_value_start:  # (optional) The first value to start from (default: 0)
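
For illustration, a sketch of an ODBC input; the connection details, the driver name, the query and the ODBCOutput output are placeholders that depend on your database and the pre-installed driver:

input:ODBC:ODBCInput:
  host: db.example.com
  port: 1433
  user: logreader
  password: secret
  driver: FreeTDS
  db: logs
  query: SELECT * FROM events;
  chilldown_period: 5
  output: ODBCOutput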

Apache Kafka

Section: input:Kafka

This input is available since version v22.32.

Creates a Kafka consumer for the specified topic(s).

Configuration options related to the connection establishment:

bootstrap_servers: # Kafka nodes to read messages from (such as `kafka1:9092,kafka2:9092,kafka3:9092`)

Configuration options related to the Kafka Consumer setting:

topic:  # Name of the topics to read messages from (such as `lmio-events` or `^lmio.*`)
group_id:  # Name of the consumer group (such as: `collector_kafka_consumer`)
refresh_topics:  # (optional) If more topics matching the topic name are expected to be created during consumption, this option specifies in seconds how often to refresh the topic subscriptions (such as: `300`)

The bootstrap_servers, topic and group_id options are always required.

The topic option can be a name, a list of names separated by spaces, or a simple regex (to match all available topics, use ^.*).
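
For illustration, a sketch of a Kafka input using the example values mentioned above; the instance name and the KafkaOutput output are placeholders:

input:Kafka:KafkaInput:
  bootstrap_servers: kafka1:9092,kafka2:9092,kafka3:9092
  topic: lmio-events
  group_id: collector_kafka_consumer
  output: KafkaOutput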

For more configuration options, please refer to https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md