LogMan.io Collector Inputs¶
Note
This chapter covers the configuration of collector inputs: network, syslog, files, databases, and so on. For the setup of event collection from specific log sources, see the Log sources subtopic.
Network¶
Sections: `input:TCP`, `input:Stream`, `input:UDP`, `input:Datagram`
These inputs listen on a given address using TCP, UDP or Unix Socket.
Tip
Collect logs over the TCP protocol whenever possible. Use the UDP protocol only if TCP is not available.
The configuration options for listening:
address: # Specify IPv4, IPv6 or UNIX file path to listen from
output: # Which output to send the incoming events to
Here are the possible forms of `address`:
- `8080` or `*:8080`: Listen on port 8080 on all available network interfaces on IPv4 and IPv6
- `0.0.0.0:8080`: Listen on port 8080 on all available network interfaces on IPv4
- `:::8080`: Listen on port 8080 on all available network interfaces on IPv6
- `1.2.3.4:8080`: Listen on port 8080 on a specific network interface (`1.2.3.4`) on IPv4
- `::1:8080`: Listen on port 8080 on a specific network interface (`::1`) on IPv6
- `/tmp/unix.sock`: Listen on the UNIX socket `/tmp/unix.sock`
The following configuration options are available only for `input:Datagram`:
max_packet_size: # (optional) Specify the maximum size of packets in bytes (default: 65536)
receiver_buffer_size: # (optional) Limit the receiver size of the buffer in bytes (default: 0)
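For illustration, a minimal sketch of a network input configuration; the instance names (`TCPInput`, `UDPInput`), the ports, and the output name (`FileOutput`) are illustrative assumptions, not prescribed values:
input:TCP:TCPInput:
  address: 0.0.0.0:10008   # listen on TCP port 10008 on all IPv4 interfaces
  output: FileOutput       # name of the output section to send events to

input:Datagram:UDPInput:
  address: 0.0.0.0:10514   # listen on UDP port 10514 on all IPv4 interfaces
  max_packet_size: 65536   # optional, shown here with its default value
  output: FileOutput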
Warning
LogMan.io Collector runs inside a Docker container. Network ports must be propagated into the container, for example by enabling the host network mode:
services:
lmio-collector-tenant:
network_mode: host
Note
TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) are both protocols for sending data over a network.
TCP is stream-oriented: it provides reliable, ordered, and error-checked delivery of a stream of data.
In contrast, UDP is datagram-oriented: it sends packets independently, which allows faster transmission but offers less reliability and no guarantee of ordering, much like individual, unrelated messages.
Tip
For troubleshooting, use tcpdump to capture raw network traffic and then use Wireshark for deeper analysis.
An example of capturing traffic on TCP port 10008:
$ sudo tcpdump -i any tcp port 10008 -s 0 -w /tmp/capture.pcap -v
When enough traffic has been captured, press Ctrl-C and collect the file /tmp/capture.pcap, which contains the traffic capture.
This file can be opened in Wireshark.
Syslog¶
Sections: `input:TCPBSDSyslogRFC6587`, `input:TCPBSDSyslogNoFraming`
These are special cases of the TCP input for parsing syslog messages received over TCP. For more information, see RFC 6587 and RFC 3164, section 4.1.1.
The configuration options for listening on a given address:
address: # Specify IPv4, IPv6 or UNIX file path to listen from (e.g. 127.0.0.1:8888 or /data/mysocket)
output: # Which output to send the incoming events to
The following configuration options are available only for `input:TCPBSDSyslogRFC6587`:
max_sane_msg_len: # (optional) Maximum size in bytes of SysLog message to be received (default: 10000)
The following configuration options are available only for `input:TCPBSDSyslogNoFraming`:
buffer_size: # (optional) Maximum size in bytes of SysLog message to be received (default: 64 * 1024)
variant: # (optional) The variant of the syslog format of incoming messages: `auto`, `nopri` (no PRI number at the beginning), or `standard` (with PRI) (default: auto)
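For illustration, a minimal sketch of a syslog input configuration; the instance names, the ports, and the output name are illustrative assumptions:
input:TCPBSDSyslogRFC6587:SyslogInput:
  address: 0.0.0.0:10514   # listen for octet-counted syslog over TCP
  output: FileOutput

input:TCPBSDSyslogNoFraming:SyslogNoFramingInput:
  address: 0.0.0.0:10515   # listen for newline-separated syslog over TCP
  variant: auto
  output: FileOutput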
Subprocess¶
Section: input:SubProcess
The SubProcess input runs a command as a subprocess of the LogMan.io Collector and periodically checks its standard output (stdout, line by line) and standard error (stderr).
The configuration options include:
command: # Specify the command to be run as a subprocess (e.g. tail -f /data/tail.log)
output: # Which output to send the incoming events to
line_len_limit: # (optional) The length limit of one read line (default: 1048576)
ok_return_codes: # (optional) Which return codes signify the running status of the command (default: 0)
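For illustration, a minimal sketch of a SubProcess input configuration using the `tail -f` command mentioned above; the instance name and the output name are illustrative assumptions:
input:SubProcess:TailInput:
  command: tail -f /data/tail.log   # subprocess whose stdout lines become events
  output: FileOutput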
File tailing¶
Section: input:SmartFile
Smart File Input collects events from multiple files whose content may be modified dynamically, or which may be deleted altogether by another process, similarly to the tail -f shell command.
Smart File Input creates a monitored file object for every file path specified in the path configuration option.
Each monitored file is periodically checked for new lines; when a new line appears, it is read as bytes and passed further down the pipeline together with meta information, such as the file name and extracted parts of the file path (see the Extract parameters section).
Various protocols are used for reading from different log file formats:
- Line Protocol for line-oriented log files
- XML Protocol for XML-oriented log files
- W3C Extended Log File Protocol for log files in W3C Extended Log File Format
- W3C DHCP Server Protocol for DHCP Server log files
Required configuration options:
input:SmartFile:MyFile:
path: | # File paths separated by newlines
/first/path/to/log/files/*.log
/second/path/to/log/files/*.log
/another/path/*
protocol: # Protocol to be used for reading
Optional configuration options:
recursive: # Recursive scanning of specified paths (default: True)
scan_period: # File scan period in seconds (default: 3 seconds)
preserve_newline: # Preserve new line character in the output (default: False)
last_position_storage: # Persistent storage for the current positions in read files (default: ./var/last_position_storage)
Tip
In more complex setups, such as extracting logs from a Windows shared folder, you can use rsync to synchronize logs from the shared folder to a local folder on the collector machine. Smart File Input then reads the logs from the local folder.
Warning
Internally, the current position in each file is stored in the last position storage under the position variable. If the last position storage file is deleted or not specified, all files are read again from the beginning after the LogMan.io Collector restarts, i.e. without persistence, reading is reset on every restart.
You can configure the path for the last position storage:
last_position_storage: "./var/last_position_storage"
Warning
If the file size becomes smaller than the previously remembered size, the whole file is read again and sent to the pipeline split into lines.
File paths¶
File path globs are separated by newlines. They can contain wildcards (such as `*`, `**`, etc.).
path: |
/first/path/*.log
/second/path/*.log
/another/path/*
By default, files are read recursively. You can disable recursive reading with:
recursive: False
Line Protocol¶
protocol: line
line/C_separator: # (optional) Character used for line separator. Default: '\n'.
Line Protocol is used for reading messages from line-oriented log files.
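For illustration, a minimal sketch of a Smart File Input configuration using the Line Protocol; the paths, the instance name, and the output name are illustrative assumptions:
input:SmartFile:LineInput:
  path: |
    /first/path/to/log/files/*.log
    /second/path/to/log/files/*.log
  protocol: line          # read the files line by line
  output: FileOutput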
XML Protocol¶
protocol: xml
tag_separator: '</msg>' # (required) Tag for separator.
XML Protocol is used for reading messages from XML-oriented log files.
The `tag_separator` parameter must be included in the configuration.
Example
Example of XML log file:
...
<msg time='2024-04-16T05:47:39.814+02:00' org_id='orgid'>
<txt>Log message 1</txt>
</msg>
<msg time='2024-04-16T05:47:42.814+02:00' org_id='orgid'>
<txt>Log message 2</txt>
</msg>
<msg time='2024-04-16T05:47:43.018+02:00' org_id='orgid'>
<txt>Log message 3</txt>
</msg>
...
Example configuration:
input:SmartFile:Alert:
path: /xml-logs/*.xml
protocol: xml
tag_separator: "</msg>"
W3C Extended Log File Protocol¶
protocol: w3c_extended
W3C Extended Log File Protocol is used for collecting events from files in W3C Extended Log File Format and serializing them into JSON format.
Example of event collection from Microsoft Exchange Server
LogMan.io Collector Configuration example:
input:SmartFile:MSExchange:
path: /MicrosoftExchangeServer/*.log
protocol: w3c_extended
extract_source: file_path
extract_regex: ^(?P<file_path>.*)$
Example of log file content:
#Software: Microsoft Exchange Server
#Version: 15.02.1544.004
#Log-type: DNS log
#Date: 2024-04-14T00:02:48.540Z
#Fields: Timestamp,EventId,RequestId,Data
2024-04-14T00:02:38.254Z,,9666704,"SendToServer 122.120.99.11(1), AAAA exchange.bradavice.cz, (query id:46955)"
2024-04-14T00:02:38.254Z,,7204389,"SendToServer 122.120.99.11(1), AAAA exchange.bradavice.cz, (query id:11737)"
2024-04-14T00:02:38.254Z,,43150675,"Send completed. Error=Success; Details=id=46955; query=AAAA exchange.bradavice.cz; retryCount=0"
...
W3C DHCP Server Protocol¶
protocol: w3c_dhcp
W3C DHCP Protocol is used for collecting events from DHCP Server log files. It is very similar to the W3C Extended Log File Format, differing only in the log file header.
Table of W3C DHCP event identification
Event ID | Meaning |
---|---|
00 | The log was started. |
01 | The log was stopped. |
02 | The log was temporarily paused due to low disk space. |
10 | A new IP address was leased to a client. |
11 | A lease was renewed by a client. |
12 | A lease was released by a client. |
13 | An IP address was found to be in use on the network. |
14 | A lease request could not be satisfied because the scope's address pool was exhausted. |
15 | A lease was denied. |
16 | A lease was deleted. |
17 | A lease expired and the DNS records for the expired lease have not been deleted. |
18 | A lease expired and the DNS records were deleted. |
20 | A BOOTP address was leased to a client. |
21 | A dynamic BOOTP address was leased to a client. |
22 | A BOOTP request could not be satisfied because the scope's address pool for BOOTP was exhausted. |
23 | A BOOTP IP address was deleted after checking to see it was not in use. |
24 | IP address cleanup operation has begun. |
25 | IP address cleanup statistics. |
30 | DNS update request to the named DNS server. |
31 | DNS update failed. |
32 | DNS update successful. |
33 | Packet dropped due to NAP policy. |
34 | DNS update request failed as the DNS update request queue limit exceeded. |
35 | DNS update request failed. |
36 | Packet dropped because the server is in failover standby role or the hash of the client ID does not match. |
50+ | Codes above 50 are used for Rogue Server Detection information. |
Example of event collection from DHCP Server
LogMan.io Collector Configuration example:
input:SmartFile:DHCP-Server-Input:
path: /DHCPServer/*.log
protocol: w3c_dhcp
extract_source: file_path
extract_regex: ^(?P<file_path>.*)$
Example of DHCP Server log file content:
DHCP Service Activity Log
Event ID Meaning
00 The log was started.
01 The log was stopped.
...
50+ Codes above 50 are used for Rogue Server Detection information.
ID,Date,Time,Description,IP Address,Host Name,MAC Address,User Name, TransactionID, ...
24,04/16/24,00:00:21,Database Cleanup Begin,,,,,0,6,,,,,,,,,0
24,04/16/24,00:00:22,Database Cleanup Begin,,,,,0,6,,,,,,,,,0
...
For instance, the ignore_older_than limit for files being read can be set to ignore_older_than: 20d or ignore_older_than: 100s (see the Ignore old changes section below).
Extract parameters¶
There are also options for the extraction of information from the file name or file path using a regular expression.
The extracted parts are then stored as meta data (which implicitly include unique meta ID and file name).
The configuration options start with the `extract_` prefix and include the following:
extract_source: # (optional) file_name or file_path (default: file_path)
extract_regex: # (optional) regex to extract field names from the extract source (disabled by default)
The `extract_regex` must contain named groups. The group names are used as field keys for the extracted information.
Unnamed groups produce no data.
Example of extracting metadata from regex
Collecting from a file /data/myserver.xyz/tenant-1.log
The following configuration:
extract_regex: ^/data/(?P<dvchost>\w+)/(?P<tenant>\w+)\.log$
will produce metadata:
{
"meta": {
"dvchost": "myserver.xyz",
"tenant": "tenant-1"
}
}
The following is a working example of a `SmartFile` input configuration with extraction of attributes from the file name using a regex, and an associated `File` output:
input:SmartFile:SmartFileInput:
path: ./etc/tail.log
extract_source: file_name
extract_regex: ^(?P<dvchost>\w+)\.log$
output: FileOutput
output:File:FileOutput:
path: /data/my_path.txt
prepend_meta: true
debug: true
Prepending information¶
prepend_meta: true
Prepends meta information, such as the extracted field names, to the log line/event as key-value pairs separated by spaces.
Ignore old changes¶
The following configuration option ensures that files are read only if their modification time is not older than the specified limit.
ignore_older_than: # (optional) Limit in days, hours, minutes or seconds to read only files modified within the limit (default: "", e.g. "1d", "1h", "1m", "1s")
File¶
Sections: `input:File`, `input:FileBlock`, `input:XML`
These inputs read specified files by lines (`input:File`) or as a whole block (`input:FileBlock`, `input:XML`) and pass their content further to the pipeline.
Depending on the mode, the files may then be renamed to `<FILE_NAME>-processed`, and if more of them are specified using a wildcard, another file will be opened, read, and processed in the same way.
The available configuration options for opening, reading and processing the files include:
path: # Specify the file path(s), wildcards can be used as well (e.g. /data/lines/*)
chilldown_period: # If more files or wildcard is used in the path, specify how often in seconds to check for new files (default: 5)
output: # Which output to send the incoming events to
mode: # (optional) The mode by which the file is going to be read (default: 'rb')
newline: # (optional) File line separator (default is value of os.linesep)
post: # (optional) Specifies what should happen with the file after reading - delete (delete the file), noop (no renaming), move (rename to `<FILE_NAME>-processed`, default)
exclude: # (optional) Path of filenames that should be excluded (has precedence over 'include')
include: # (optional) Path of filenames that should be included
encoding: # (optional) Charset encoding of the file's content
move_destination: # (optional) Destination folder for post 'move', make sure it is outside of the path specified above
lines_per_event: # (optional) The number of lines after which the read method enters the idle state to allow other operations to perform their tasks (default: 10000)
event_idle_time: # (optional) The time in seconds for which the read method enters the idle state, see above (default: 0.01)
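For illustration, a minimal sketch of a File input configuration; the instance name, the paths, and the output name are illustrative assumptions:
input:File:LinesInput:
  path: /data/lines/*
  post: move                           # rename processed files to <FILE_NAME>-processed
  move_destination: /data/processed/   # must be outside of the path specified above
  output: FileOutput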
ODBC¶
Section: input:ODBC
Provides input via an ODBC driver connection to collect logs from various databases.
Configuration options related to the connection establishment:
host: # Hostname of the database server
port: # Port where the database server is running
user: # Username to log in to the database server (usually a technical/access account)
password: # Password for the user specified above
driver: # Pre-installed ODBC driver (see list below)
db: # Name of the database to access
connect_timeout: # (optional) Connection timeout in seconds for the ODBC pool (default: 1)
reconnect_delay: # (optional) Reconnection delay in seconds after timeout for the ODBC pool (default: 5.0)
output_queue_max_size: # (optional) Maximum size of the output queue, i.e. in-memory storage (default: 10)
max_bulk_size: # (optional) Maximum size of one bulk composed of the incoming records (default: 2)
output: # Which output to send the incoming events to
Configuration options related to querying the database:
query: # Query to periodically call the database
chilldown_period: # Specify in seconds how often the query above will be called (default: 5)
last_value_enabled: # Enable last value duplicity check (true/false)
last_value_table: # Specify table for SELECT max({}) from {};
last_value_column: # The column in the query used to obtain the last value
last_value_storage: # Persistent storage for the current last value (default: ./var/last_value_storage)
last_value_query: # (optional) To specify the last value query entirely (in case this option is set, last_value_table will not be considered)
last_value_start: # (optional) The first value to start from (default: 0)
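For illustration, a minimal sketch of an ODBC input configuration; the hostname, credentials, database, query, and driver name are illustrative assumptions (use the ODBC driver that is actually pre-installed for your database):
input:ODBC:DatabaseInput:
  host: db.example.com
  port: 1433
  user: logreader
  password: secret
  driver: "ODBC Driver 18 for SQL Server"   # assumed driver name, adjust to your installation
  db: logs
  query: SELECT * FROM events;              # illustrative query, called periodically
  chilldown_period: 5
  output: FileOutput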
Apache Kafka¶
Section: input:Kafka
This option is available from version v22.32.
Creates a Kafka consumer for the specified topic(s).
Configuration options related to the connection establishment:
bootstrap_servers: # Kafka nodes to read messages from (such as `kafka1:9092,kafka2:9092,kafka3:9092`)
Configuration options related to the Kafka Consumer setting:
topic: # Name of the topics to read messages from (such as `lmio-events` or `^lmio.*`)
group_id: # Name of the consumer group (such as: `collector_kafka_consumer`)
refresh_topics: # (optional) If more topics matching the topic name are expected to be created during consumption, this option specifies in seconds how often to refresh the topic subscriptions (such as: `300`)
The `bootstrap_servers`, `topic`, and `group_id` options are always required.
`topic` can be a single name, a list of names separated by spaces, or a simple regular expression (to match all available topics, use `^.*`).
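For illustration, a minimal sketch of a Kafka input configuration using the values from the descriptions above; the instance name is illustrative, and an `output` option analogous to the other inputs is assumed:
input:Kafka:KafkaInput:
  bootstrap_servers: kafka1:9092,kafka2:9092,kafka3:9092
  topic: lmio-events
  group_id: collector_kafka_consumer
  output: FileOutput   # assumed: which output to send the consumed events to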
For more configuration options, please refer to https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md