Introduction
TeskaLabs LogMan.io documentation¶
Welcome to TeskaLabs LogMan.io documentation.
TeskaLabs LogMan.io¶
TeskaLabs LogMan.io™️ is a software product for log collection, log aggregation, log storage and retention, real-time log analysis and prompt incident response for an IT infrastructure, collectively known as log management.
TeskaLabs LogMan.io consists of a central infrastructure and log collectors that reside on monitored systems such as servers or network appliances. Log collectors collect various logs (operating system, applications, databases) and system metrics such as CPU usage, memory usage, disk space etc. Collected events are sent in real time to the central infrastructure for consolidation, orchestration and storage. Thanks to its real-time nature, LogMan.io provides alerts for anomalous situations from the perspective of system operation (e.g. is disk space running low?), availability (e.g. is the application running?), business (e.g. is the number of transactions below normal?) or security (e.g. is there any unusual access to servers?).
TeskaLabs SIEM¶
TeskaLabs SIEM is a real-time Security Information and Event Management tool. TeskaLabs SIEM provides real-time analysis and correlation of security events and alerts processed by TeskaLabs LogMan.io. We designed TeskaLabs SIEM to enhance cyber security posture and regulatory compliance.
More components
TeskaLabs SIEM and TeskaLabs LogMan.io are standalone products. Thanks to their modular architecture, these products also include other TeskaLabs technologies:
- TeskaLabs SeaCat Auth for authentication and authorization, including user roles and fine-grained access control.
- TeskaLabs SP-Lang, an expression language used in many places in the product.
Made with ❤️ by TeskaLabs
TeskaLabs LogMan.io™️ is a product of TeskaLabs.
Features¶
TeskaLabs LogMan.io is a real-time SIEM with log management.
- Multitenancy: a single instance of TeskaLabs LogMan.io can serve multiple tenants (customers, departments).
- Multiuser: TeskaLabs LogMan.io can be used by an unlimited number of users simultaneously.
Technologies¶
Cryptography¶
- Transport layer: TLS 1.2, TLS 1.3 and better
- Symmetric cryptography: AES-128, AES-256
- Asymmetric cryptography: RSA, ECC
- Hash methods: SHA-256, SHA-384, SHA-512
- MAC functions: HMAC
- HSM: PKCS#11 interface
Note
TeskaLabs LogMan.io uses only strong cryptography; that is, only ciphers, hashes and other algorithms that are recognized as secure by the cryptographic community and by organizations such as ENISA or NIST.
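As a practical check against the transport-layer requirement above, the hedged sketch below verifies which TLS version and cipher suite a server actually negotiates. The host name is a placeholder, not a real endpoint, and the script is a verification aid rather than part of LogMan.io.

```python
# Minimal sketch: check which TLS version and cipher suite a server negotiates.
# The host below is a placeholder; substitute the address of your own deployment.
import socket
import ssl

def check_tls(host: str, port: int = 443) -> None:
    context = ssl.create_default_context()
    # Refuse anything older than TLS 1.2, mirroring the requirement above.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    with socket.create_connection((host, port), timeout=5) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            print("TLS version:", tls.version())     # e.g. 'TLSv1.3'
            print("Cipher suite:", tls.cipher()[0])  # e.g. 'TLS_AES_256_GCM_SHA384'

if __name__ == "__main__":
    check_tls("logman.example.com")  # placeholder host name
```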
Supported Log Sources¶
TeskaLabs LogMan.io supports a wide variety of technologies, listed below.
Formats¶
- Syslog RFC 5424 (IETF)
- Syslog RFC 3164 (BSD)
- Syslog RFC 3195 (BEEP profile)
- Syslog RFC 6587 (Frames over TCP)
- Reliable Event Logging Protocol (RELP), including SSL
- Windows Event Log
- SNMP
- ArcSight CEF
- LEEF
- JSON
- XML
- YAML
- Avro
- Custom/raw log format
And many more.
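To make the format list more concrete, the hedged sketch below splits the header of an RFC 5424 syslog message. The sample message and regular expression are illustrative only; they ignore STRUCTURED-DATA and other corner cases and are not the parser used inside LogMan.io.

```python
# Simplified sketch: split the header fields of an RFC 5424 syslog message.
# Illustrative only; STRUCTURED-DATA and edge cases are deliberately ignored.
import re

SAMPLE = '<165>1 2023-06-07T06:00:00.000Z host01 sshd 4123 ID47 - Accepted password for admin'

RFC5424 = re.compile(
    r'^<(?P<pri>\d{1,3})>(?P<version>\d) '
    r'(?P<timestamp>\S+) (?P<hostname>\S+) (?P<appname>\S+) '
    r'(?P<procid>\S+) (?P<msgid>\S+) (?P<rest>.*)$'
)

match = RFC5424.match(SAMPLE)
if match:
    fields = match.groupdict()
    pri = int(fields.pop("pri"))
    fields["facility"], fields["severity"] = divmod(pri, 8)  # PRI = facility * 8 + severity
    print(fields)
```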
Vendors and Products¶
Cisco¶
- Cisco Firepower Threat Defense (FTD)
- Cisco Adaptive Security Appliance (ASA)
- Cisco Identity Services Engine (ISE)
- Cisco Meraki (MX, MS, MR devices)
- Cisco Catalyst Switches
- Cisco IOS
- Cisco WLC
- Cisco ACS
- Cisco SMB
- Cisco UCS
- Cisco IronPort
- Cisco Nexus
- Cisco Routers
- Cisco VPN
- Cisco Umbrella
Palo Alto Networks¶
- Palo Alto Next-Generation Firewalls
- Palo Alto Panorama (Centralized Management)
- Palo Alto Traps (Endpoint Protection)
Fortinet¶
- FortiGate (Next-Generation Firewalls)
- FortiSwitch (Switches)
- FortiAnalyzer (Log Analytics)
- FortiMail (Email Security)
- FortiWeb (Web Application Firewall)
- FortiADC
- FortiDDos
- FortiSandbox
Juniper Networks¶
- Juniper SRX Series (Firewalls)
- Juniper MX Series (Routers)
- Juniper EX Series (Switches)
Check Point Software Technologies¶
- Check Point Security Gateways
- Check Point SandBlast (Threat Prevention)
- Check Point CloudGuard (Cloud Security)
Microsoft¶
- Microsoft Windows (Operating System)
- Microsoft Azure (Cloud Platform)
- Microsoft SQL Server (Database)
- Microsoft IIS (Web Server)
- Microsoft Office 365
- Microsoft Exchange
- Microsoft Sharepoint
Linux¶
- Ubuntu (Distribution)
- CentOS (Distribution)
- Debian (Distribution)
- Red Hat Enterprise Linux (Distribution)
- IPTables
- nftables
- Bash
- Cron
- Kernel (dmesg)
Oracle¶
- Oracle Database
- Oracle WebLogic Server (Application Server)
- Oracle Cloud
Amazon Web Services (AWS)¶
- Amazon EC2 (Virtual Servers)
- Amazon RDS (Database Service)
- AWS Lambda (Serverless Computing)
- Amazon S3 (Storage Service)
VMware¶
- VMware ESXi (Hypervisor)
- VMware vCenter Server (Management Platform)
F5 Networks¶
- F5 BIG-IP (Application Delivery Controllers)
- F5 Advanced Web Application Firewall (WAF)
Barracuda Networks¶
- Barracuda CloudGen Firewall
- Barracuda Web Application Firewall
- Barracuda Email Security Gateway
Sophos¶
- Sophos XG Firewall
- Sophos UTM (Unified Threat Management)
- Sophos Intercept X (Endpoint Protection)
Aruba Networks (HPE)¶
- Aruba Switches
- Aruba Wireless Access Points
- Aruba ClearPass (Network Access Control)
- Aruba Mobility Controller
HPE¶
- iLO
- IMC
- HPE StoreOnce
- HPE Primera Storage
- HPE 3PAR StoreServ
Trend Micro¶
- Trend Micro Deep Security
- Trend Micro Deep Discovery
- Trend Micro TippingPoint (Intrusion Prevention System)
- Trend Micro Endpoint Protection Manager
- Trend Micro Apex One
Zscaler¶
- Zscaler Internet Access (Secure Web Gateway)
- Zscaler Private Access (Remote Access)
Akamai¶
- Akamai (Content Delivery Network and Security)
- Akamai Kona Site Defender (Web Application Firewall)
- Akamai Web Application Protector
Imperva¶
- Imperva Web Application Firewall (WAF)
- Imperva Database Security (Database Monitoring)
SonicWall¶
- SonicWall Next-Generation Firewalls
- SonicWall Email Security
- SonicWall Secure Mobile Access
WatchGuard Technologies¶
- WatchGuard Firebox (Firewalls)
- WatchGuard XTM (Unified Threat Management)
- WatchGuard Dimension (Network Security Visibility)
Apple¶
- macOS (Operating System)
Apache¶
- Apache Cassandra (Database)
- Apache HTTP Server
- Apache Kafka
- Apache Tomcat
- Apache Zookeeper
NGINX¶
- NGINX (Web Server and Reverse Proxy Server)
Docker¶
- Docker (Container Platform)
Kubernetes¶
- Kubernetes (Container Orchestration)
Atlassian¶
- Jira (Issue and Project Tracking)
- Confluence (Collaboration Software)
- Bitbucket (Code Collaboration and Version Control)
Cloudflare¶
- Cloudflare (Content Delivery Network and Security)
SAP¶
- SAP HANA (Database)
Balabit¶
- syslog-ng
Open-source¶
- PostgreSQL (Database)
- MySQL (Database)
- OpenSSH (Remote access)
- Dropbear SSH (Remote access)
- Jenkins (Continuous Integration and Continuous Delivery)
- rsyslog
- GenieACS
- HAProxy
- SpamAssassin
- FreeRADIUS
- BIND
- DHCP
- Postfix
- Squid Cache
- Zabbix
- FileZilla
IBM¶
- IBM Db2 (Database)
- IBM AIX (Operating System)
- IBM i (Operating System)
Brocade¶
- Brocade Switches
Google¶
- Google Cloud
- Pub/Sub & BigQuery
Elastic¶
- ElasticSearch
Citrix¶
- Citrix Virtual Apps and Desktops (Virtualization)
- Citrix Hypervisor (Virtualization)
- Citrix ADC, NetScaler
- Citrix Gateway (Remote access)
- Citrix SD-WAN
- Citrix Endpoint Management (MDM, MAM)
Dell¶
- Dell EMC Isilon (network-attached storage)
- Dell PowerConnect Switches
- Dell W-Series (Access points)
- Dell iDRAC
- Dell Force10 Switches
FlowMon¶
- Flowmon Collector
- Flowmon Probe
- Flowmon ADS
- Flowmon FPI
- Flowmon APM
GreyCortex¶
- GreyCortex Mendel
Huawei¶
- Huawei Routers
- Huawei Switches
- Huawei Unified Security Gateway (USG)
Synology¶
- Synology NAS
- Synology SAN
- Synology NVR
- Synology Wi-Fi routers
Ubiquiti¶
- UniFi
Avast¶
- Avast Antivirus
Kaspersky¶
- Kaspersky Endpoint Security
- Kaspersky Security Center
Kerio¶
- Kerio Connect
- Kerio Control
- Kerio Clear Web
Symantec¶
- Symantec Endpoint Protection Manager
- Symantec Messaging Gateway
ESET¶
- ESET Antivirus
- ESET Remote Administrator
AVG¶
- AVG Antivirus
Extreme Networks¶
- ExtremeXOS
IceWarp¶
- IceWarp Mail Server
Mikrotik¶
- Mikrotik Routers
- Mikrotik Switches
Pulse Secure¶
- Pulse Connect Secure SSL VPN
QNAP¶
- QNAP NAS
Safetica¶
- Safetica DLP
Veeam¶
- Veeam Backup and Restore
SuperMicro¶
- IPMI
Mongo¶
- MongoDB
YSoft¶
- SafeQ
Bitdefender¶
- Bitdefender GravityZone
- Bitdefender Network Traffic Security Analytics (NTSA)
- Bitdefender Advanced Threat Intelligence
This list is not exhaustive, as there are many other vendors and products that can send logs to TeskaLabs LogMan.io using standard protocols such as Syslog. Please contact us if you would like a specific technology to be integrated.
SQL log extraction¶
TeskaLabs LogMan.io can extract logs from various SQL databases using ODBC (Open Database Connectivity); a minimal sketch of this approach follows the list below.
Supported databases include:
- PostgreSQL
- Oracle Database
- IBM Db2
- MySQL
- SQLite
- MariaDB
- SAP HANA
- Sybase ASE
- Informix
- Teradata
- Amazon RDS (Relational Database Service)
- Google Cloud SQL
- Azure SQL Database
- Snowflake
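Below is a minimal sketch of the ODBC approach, assuming the pyodbc driver, a hypothetical DSN and a hypothetical audit_log table; the actual LogMan.io connector is configured declaratively and may behave differently.

```python
# Hedged sketch: pull new audit rows from a SQL database over ODBC.
# The connection string, table and column names are hypothetical placeholders.
import pyodbc  # pip install pyodbc

CONN_STR = "DSN=audit-db;UID=reader;PWD=secret"  # placeholder DSN
QUERY = "SELECT id, created_at, message FROM audit_log WHERE id > ? ORDER BY id"

def fetch_new_rows(last_seen_id: int):
    with pyodbc.connect(CONN_STR, timeout=10) as conn:
        cursor = conn.cursor()
        cursor.execute(QUERY, last_seen_id)
        for row in cursor.fetchall():
            # Each row becomes one event; downstream, LogMan.io would parse and store it.
            yield {"id": row.id, "created_at": str(row.created_at), "message": row.message}

if __name__ == "__main__":
    for event in fetch_new_rows(last_seen_id=0):
        print(event)
```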
TeskaLabs LogMan.io Architecture¶
lmio-collector¶
LogMan.io Collector receives log lines from various sources such as syslog-ng, files, Windows Event Forwarding, databases via ODBC connectors and so on. The log lines may be further processed by a declarative processor and forwarded to LogMan.io Ingestor via WebSocket.
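The collector-to-ingestor hand-off can be pictured with the following hedged sketch; the WebSocket URL and payload shape are hypothetical placeholders and do not describe the actual LogMan.io wire protocol.

```python
# Hedged sketch: forward collected log lines to an ingestor over a WebSocket.
# The URL and message format are hypothetical; the real protocol may differ.
import asyncio
import aiohttp  # pip install aiohttp

INGESTOR_URL = "wss://ingestor.example.internal/ws"  # placeholder address

async def forward(lines):
    async with aiohttp.ClientSession() as session:
        async with session.ws_connect(INGESTOR_URL) as ws:
            for line in lines:
                await ws.send_json({"raw": line})  # one collected log line per message

if __name__ == "__main__":
    asyncio.run(forward(["<165>1 2023-06-07T06:00:00Z host01 sshd - - - test event"]))
```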
lmio-ingestor¶
LogMan.io Ingestor receives events via WebSocket, transforms them into a Kafka-readable format and puts them into the collected- Kafka topic. There are multiple ingestors for different event formats, such as syslog, databases, XML and so on.
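Conceptually, the ingestion step boils down to producing events into a Kafka topic. The sketch below assumes the kafka-python client and a hypothetical topic name; it is not the LogMan.io Ingestor implementation.

```python
# Hedged sketch: publish an incoming event to a Kafka topic, as the Ingestor does conceptually.
# The bootstrap server and topic name are placeholders.
import json
from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="kafka.example.internal:9092",           # placeholder broker
    value_serializer=lambda event: json.dumps(event).encode(),  # dicts -> JSON bytes
)

def ingest(event: dict, topic: str = "collected-tenant-syslog"):  # hypothetical topic name
    producer.send(topic, value=event)
    producer.flush()

ingest({"raw": "<165>1 2023-06-07T06:00:00Z host01 sshd - - - test event"})
```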
lmio-parser¶
LogMan.io Parser runs in multiple instances to receive different formats of incoming events (different Kafka topics) and/or the same events (the instances then run in the same Kafka group to distribute events among them). LogMan.io Parser loads the LogMan.io Library via ZooKeeper or from files to load declarative parsers and enrichers from configured parsing groups.
If the events are parsed by the loaded parser, they are put into the lmio-events Kafka topic; otherwise, they enter the lmio-others Kafka topic.
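The routing rule above can be summarized in a few lines. This is a hedged sketch in which parse() merely stands in for the declarative parsers and enrichers loaded from the Library.

```python
# Hedged sketch of the routing rule: parsed events go to lmio-events,
# unparsed events go to lmio-others so nothing is silently dropped.
from typing import Optional

def parse(raw: str) -> Optional[dict]:
    # Placeholder for the declarative parsing groups; returns None when no parser matches.
    if "sshd" in raw:
        return {"event.dataset": "linux-sshd", "message": raw}
    return None

def route(raw: str):
    parsed = parse(raw)
    if parsed is not None:
        return "lmio-events", parsed        # successfully parsed event
    return "lmio-others", {"message": raw}  # unparsed event, kept for later inspection

print(route("<165>1 2023-06-07T06:00:00Z host01 sshd - - - Accepted password"))
print(route("completely unknown payload"))
```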
lmio-dispatcher¶
LogMan.io Dispatcher loads events from the lmio-events Kafka topic and sends them both to all subscribed (via ZooKeeper) LogMan.io Correlator instances and to ElasticSearch, into the appropriate index, where all events can be queried and visualized using Kibana. LogMan.io Dispatcher runs in multiple instances as well.
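As an illustration of the indexing step, the hedged sketch below writes a parsed event into an Elasticsearch index using the official Python client; the index naming scheme and connection details are placeholders, not the actual LogMan.io convention.

```python
# Hedged sketch: index a parsed event into Elasticsearch, as the Dispatcher does conceptually.
# The index name pattern and URL are placeholders; the document= keyword follows the 8.x client.
from elasticsearch import Elasticsearch  # pip install elasticsearch

es = Elasticsearch("http://elasticsearch.example.internal:9200")  # placeholder address

def dispatch(event: dict, tenant: str = "mytenant"):
    index = f"lmio-{tenant}-events"  # hypothetical index name
    es.index(index=index, document=event)

dispatch({"event.dataset": "linux-sshd", "message": "Accepted password for admin"})
```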
lmio-correlator¶
LogMan.io Correlator uses ZooKeeper to subscribe to all LogMan.io Dispatcher instances to receive parsed events (log lines etc.). Then LogMan.io Correlator loads the LogMan.io Library from ZooKeeper or from files to create correlators based on the declarative configuration. Events produced by correlators (Window Correlator, Match Correlator) are then handed down to LogMan.io Dispatcher and LogMan.io Watcher via Kafka.
lmio-watcher¶
LogMan.io Watcher observes changes in lookups used by LogMan.io Parser and LogMan.io Correlator instances. When a change occurs, all running components that use the LogMan.io Library are notified about the change via the lmio-lookups Kafka topic, and the lookup is updated in ElasticSearch, which serves as persistent storage for all lookups.
lmio-integ¶
LogMan.io Integ allows LogMan.io to be integrated with supported external systems via the expected message format and input/output protocol.
Support¶
Live help¶
Our team is available in our live support channel on Slack. You can message our internal experts directly, consult them about your plans, problems and challenges, and even get live online help over screen share, so you don't need to worry about major upgrades and similar tasks. Access is provided to customers with an active support plan.
Email support¶
Contact us at: support@teskalabs.com
Support hours¶
The 5/8 support level is available on working days according to the Czech calendar, 09:00-18:00 Central European Time (Europe/Prague).
The 24/7 support level is also available, depending on your active support plan.
User Manual
Welcome¶
What's in the User Manual?
Here, you can learn how to use the TeskaLabs LogMan.io app. For information about setup, configuration, and maintenance, visit the Administration Manual or the Reference guide. If you can't find the help you need, contact Support.
Quickstart¶
Jump to:
- Get an overview of all events in your system (Home)
- Read incoming logs, and filter logs by field and time (Discover)
- View and filter your data as charts and graphs (Dashboards)
- View and print reports (Reports)
- Run, download, and manage exports (Export)
- Change your general or account settings
Some features are only visible to administrators, so you might not see all of the features that are included in the User Manual in your own version of TeskaLabs LogMan.io.
Administrator quickstart¶
Are you an administrator? Jump to:
- Add or edit files in the library, such as dashboards, reports, and exports (Library)
- Add or edit lookups (Lookups)
- Access external components that work with TeskaLabs LogMan.io (Tools)
- Change the configuration of your interface (Configuration)
- See microservices (Services)
- Manage user permissions (Auth)
Settings¶
Use these controls in the top right corner of your screen to change settings:
Tenants¶
A tenant is one entity collecting data from a group of sources. When you're using the program, you can only see the data belonging to the selected tenant. A tenant's data is completely separated from all other tenants' data in TeskaLabs LogMan.io (learn about multitenancy). Your company might have just one tenant, or possibly multiple tenants (for different departments, for example). If you're distributing or managing TeskaLabs LogMan.io for other clients, you have multiple tenants, at least one per client.
Tenants can be accessible by multiple users, and users can have access to multiple tenants. Learn more about tenancy here.
Tips¶
If you're new to log collection, click on the tip boxes to learn why you might want to use a feature.
Why use TeskaLabs LogMan.io?
TeskaLabs LogMan.io collects logs, which are records of every single event in your network system. This information can help you:
- Understand what's happening in your network
- Troubleshoot network problems
- Investigate security issues
Managing your account¶
Your account name is at the top right corner of your screen:
Changing your password¶
- Click on your account name.
- Click Change a password.
- Enter your current password and new password.
- Click Set password.
You should see confirmation of your password change. To return to the page you were on before changing your password, click Go back.
Changing account information¶
- Click on your account name.
- Click Manage.
- Here you can:
- Change your password
- Change your email address
- Change or add your phone number
- Log out
- Click on what you want to do, and make your changes. The changes won't be visible immediately - they'll be visible when you log out and log back in.
Seeing your access permissions¶
- Click on your account name.
- Click Access control, and you'll see what permissions you have.
Logging out¶
- Click on your account name.
- Click Logout.
You can also log out from the Manage screen.
Logging out from all devices¶
- Click on your account name.
- Click Manage.
- Click Logout from all devices.
When you log out, you'll be automatically taken to the login screen.
Using the Home page¶
The Home page gives you an overview of your data sources and critical incoming events. You'll be on the Home page by default when you log in, but you can also get to the Home page from the buttons on the left.
Viewing options¶
Chart and list view¶
To switch between chart and list view, click the list button.
Getting more details¶
Clicking on any portion of a chart takes you to Discover, where you then see the list of logs that make up this portion of the chart. From there, you can examine and filter these logs.
You can see here that Discover is automatically filtering for events from the selected dataset (from the chart on the Home page), event.dataset:devolutions.
Using Discover¶
Discover gives you an overview of all logs being collected in real time. Here, you can filter the data by time and field.
Navigating Discover¶
Terms¶
Total count: The total number of logs in the timeframe being shown.
Aggregated by: In the bar chart, each bar represents the count of logs collected within a time interval. Use Aggregated by to choose the time interval. For example, Aggregated by: 30m means that each bar in the bar chart shows the count of all of the logs collected in a 30 minute timeframe. If you change to Aggregated by: hour, then each bar represents one hour of logs. The available options change based on the overall timeframe you are viewing in Discover.
Filtering data¶
Change the timeframe from which logs appear, and filter logs by field.
Tip: Why filter data?
Logs contain a lot of information, more than you need to accomplish most tasks. When you filter data, you choose which information you see. This can help you learn more about your network, identify trends, and even hunt for threats.
Examples:
- You want to see login data from just one user, so you filter the data to show logs containing their username.
- You had a security event on Wednesday night, and you want to learn more about it, so you filter the data to show logs from that time period.
- You notice you don't see any data from one of your network devices. You can filter the data to see all the logs from just that device. Now, you can see when the data stopped coming, and what the last event was that might have caused the problem.
Changing the timeframe¶
You can view logs from a specified timeframe. Set the timeframe by choosing start and end points using this tool:
Remember: Once you change the timeframe, press the blue refresh button to update your page.
Using the time setting tool¶
Setting a relative start/end point¶
To set the start or end point to a time relative to now, use the Relative tab.
Quick time settings
Use the quick now- ("now minus") options to set the timeframe to a preset with one click. Selecting one of these options affects both the start and end point. For example, if you choose now-1 week, the start point will be one week ago, and the end point will be "now." Choosing a now- option from the end point does the same thing as choosing a now- option from the start point. (You can't use the now- options to set the end point to anything besides "now.")
Drop-down options
To set a relative time (such as 15 minutes ago) for the start or end point, use the relative time options below the quick setting options. Select your unit of time from the drop-down list, and type or click to set your desired number.
To confirm your choice, click Set relative time, and view the logs by clicking on the refresh button.
Example shown: This selection will show logs collected starting from one day ago until now.
Setting an exact start/end point¶
To choose the exact day and time for the start or end point, use the Absolute tab and select a date and time on the calendar.
To confirm your choice, click Set date.
Example shown: This selection will show logs collected starting from June 7, 2023 at 6:00 until now.
Auto refresh¶
To update the view automatically at a set time interval, choose a refresh rate:
Refresh¶
To reload the view with your changes, click the blue refresh button.
Note: Don't choose "Now" as your start point. Since the program can't show data newer than "now," it's not valid, so you'll see an error message.
Using the time selector¶
To select a more specific time period within the current timeframe, click and drag on the graph.
Filtering by field¶
In Discover, you can filter data by any field in multiple ways.
Using the field list¶
Use the search bar to find the field you want, or scroll through the list.
Isolating fields¶
To choose which fields you see in the log list, click the + symbol next to the field name. You can select multiple fields.
Seeing all occurring values in one field¶
To see a percentage breakdown of all the values from one field, click the magnifying glass next to the field name (the magnifying glass appears when you hover over the field name).
Tip: What does this mean?
This list of values from the field http.response.status_code compares how often users are getting certain http response codes. 51.4% of the time, users are getting a 404 code, meaning the webpage wasn't found. 43.3% of the time, users are getting a 200 code, which means that the request succeeded. The high percentage of "not found" response codes could inform a website administrator that one or more of their frequently clicked links are broken.
Viewing and filtering log details¶
To view the details of individual logs as a table or in JSON, click the arrow next to the timestamp. You can apply filters using the field names in the table view.
Filtering from the expanded table view¶
You can use controls in the table view to filter logs:
- Filter for logs that contain the same value in the selected field (update_item in action in the example)
- Filter for logs that do NOT contain the same value in the selected field (update_item in action in the example)
- Show a percentage breakdown of values in this field (the same function as the magnifying glass in the fields list on the left)
- Add to list of displayed fields for all visible logs (the same function as in the fields list on the left)
Query bar¶
You can filter by field (not by time) using the query bar. The query bar tells you which query language to use. The query language depends on your data source. Use Lucene Query Syntax for data stored using ElasticSearch.
After you type your query, set the timeframe and click the refresh button. Your filters will be applied to the visible incoming logs.
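For orientation, here are a few hedged examples of Lucene Query Syntax. Apart from event.dataset and http.response.status_code, which appear elsewhere in this manual, the field names and values are hypothetical and depend on what your data sources actually provide:
- event.dataset:microsoft-office-365 shows only events from one dataset.
- http.response.status_code:[400 TO 599] shows client and server errors.
- user.name:"alice" AND event.action:"logon" shows a hypothetical user's logon events.
- NOT event.outcome:success excludes successful events (again, hypothetical field names).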
Investigating IP addresses¶
You can investigate IP addresses using external analysis tools. You might want to do this, for example, if you see multiple suspicious logins from one IP address.
Using external IP analysis tools
1. Click on the IP address you want to analyze.
2. Click on the tool you want to use. You'll be taken to the tool's website, where you can see the results of the IP address analysis.
Using Dashboards¶
A dashboard is a set of charts and graphs that represent data from your system. Dashboards allow you to quickly get a sense for what's going on in your network.
Your administrator sets up dashboards based on the data sources and fields that are most useful to you. For example, you might have a dashboard that shows graphs related only to email activity, or only to login attempts. You might have many dashboards for different purposes.
You can filter the data to change which data the dashboard shows within its preset constraints.
How can dashboards help me?
By having certain data arranged into a chart, table, or graph, you can get a visual overview of activity within your system and identify trends. In this example, you can see that a high volume of emails were sent and received on June 19th.
Navigating Dashboards¶
Opening a dashboard¶
To open a dashboard, click on its name.
Dashboard controls¶
Setting the timeframe¶
You can change the timeframe the dashboard represents. Find the time-setting guide here. To refresh the dashboard with your new timeframe, click on the refresh button.
Note: There is no auto-refresh rate in Dashboards.
Filtering dashboard data¶
To filter the data the dashboard shows, use the query bar. The query language you need to use depends on your data source. The query bar tells you which query language to use. Use Lucene Query Syntax for data stored using ElasticSearch.
Moving widgets¶
You can reposition and resize each widget. To move widgets, click on the dashboard menu button and select Edit.
To move a widget, click anywhere on the widget and drag. To resize a widget, click on the widget's bottom right corner and drag.
To save your changes, click the green save button. To cancel the changes, click the red cancel button.
Printing dashboards¶
To print a dashboard, click on the dashboard menu button and select Print. Your browser opens a window, and you can choose your print settings there.
Reports¶
Reports are printer-friendly visual representations of your data, like printable dashboards. Your administrator chooses what information goes into your reports based on your needs.
Find and print a report¶
- Select the report from your list, or use the search bar to find your report.
- Click Print. Your browser opens a print window where you can choose your print settings.
Using Export¶
Turn sets of logs into downloadable, sendable files in Export. You can keep these files on your computer, inspect them in another program, or send them via email.
What is an export?
An export is not a file, but a process that creates a file. The export contains and follows your instructions for which data to put in the file, what type of file to create, and what to do with the file. When you run the export, you create the file.
Why would I export logs?
Being able to see a group of logs in one file can help you inspect the data more closely. A few reasons you might want to export logs are:
- To investigate an event or attack
- To send data to an analyst
- To explore the data in a program outside TeskaLabs LogMan.io
Navigating Export¶
List of exports
The List of exports shows you all the exports that have been run.
From the list page, you can:
- See an export's details by clicking on the export's name
- Download the export by clicking on the cloud beside its name
- Delete the export by clicking on the trash can beside its name
- Search for exports using the search bar
Export status is color-coded:
- Green: Completed
- Yellow: In progress
- Blue: Scheduled
- Red: Failed
Jump to:¶
Run an export¶
Running an export adds the export to your List of exports, but it does not automatically download the export. See Download an export for instructions.
Run an export based on a preset¶
1. Click New on the List of exports page. Now you can see the preset exports:
2. To run a preset export, click the run button beside the export name.
OR
2. To edit the export before running, click on the edit button beside the export name. Make your changes, and then click Start. (Use this guide to learn about making changes.)
Once you run the export, you are automatically brought back to the list of exports, and your export appears at the top of the list.
Note
Export presets are created by administrators.
Run an export based on an export you've run before¶
You can re-run an export. Running an export again does not overwrite the original export.
1. On the List of exports page, click on the name of the export you want to run again.
2. Click Restart.
3. You can make changes here (see this guide) or run as-is.
4. Click Start.
Once you run the export, you are automatically brought back to the list of exports, and your new export appears at the top of the list.
Create a new export¶
Create an export from a blank form¶
1. In List of exports, click New, then click Custom.
2. Fill in the fields.
Note
The options in the drop down menus might change based on the selections you make.
Name
Name the export.
Data Source
Select your data source from the drop-down list.
Output
Choose the file type for your data. It can be:
- Raw: If you want to download the export and import the logs into different software, choose raw. If the data source is Elasticsearch, the raw file format is .json.
- .csv: Comma-separated values
- .xlsx: Microsoft Excel format
Compression
Choose to zip your export file, or leave it uncompressed. A zipped file is compressed, and therefore smaller, so it's easier to send, and it takes up less space on your computer.
Target
Choose the target for your file. It can be:
- Download: A file you can download to your computer.
- Email: Fill in the email fields. When you run the export, the email sends. You can still download the data file any time in the List of exports.
- Jupyter: Saves the file in the Jupyter notebook, which you can access through the Tools page. You need to have administrator permissions to access the Jupyter notebook, so only choose Jupyter as the target if you're an administrator.
Separator
If you select .csv as your output, choose what character will mark the separation between each value in each log. Even though CSV means comma-separated values, you can choose to use a different separator, such as a semicolon or space.
Schedule (optional)¶
To schedule the export, rather than running it immediately, click Add schedule.
- Schedule once: To run the export one time at a future time, type the desired date and time in YYYY-MM-DD HH:mm format, for example 2023-12-31 23:59 (December 31st, 2023, at 23:59).
- Schedule a recurring export: To set up the export to run automatically on a regular schedule, use cron syntax. You can learn more about cron from Wikipedia, and use this tool and these examples by Cronitor to help you write cron expressions. The Schedule field also supports random R usage and Vixie cron-style @ keyword expressions.
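A few hedged cron examples for the Schedule field (standard five-field syntax; verify the behaviour in your own deployment before relying on a schedule):

```
*/30 * * * *   # every 30 minutes
0 6 * * *      # every day at 06:00
0 6 * * 1      # every Monday at 06:00
0 0 1 * *      # at midnight on the first day of each month
@daily         # Vixie-style keyword, equivalent to 0 0 * * *
```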
Query
Type a query to filter for certain data. The query determines which data to export, including the timeframe of the logs.
Warning
You must include a query in every export. If you run an export without a query, all of the data stored in your program will be exported with no filter for time or content. This could create an extremely large file and put strain on data storage components, and the file likely won't be useful to you or to analysts.
If you accidentally run an export without a query, you can delete the export while it's still running in the List of exports by clicking on the trash can button.
TeskaLabs LogMan.io uses the Elasticsearch Query DSL (Domain Specific Language).
Here's the full guide to the Elasticsearch Query DSL.
Example of a query:
```json
{
  "bool": {
    "filter": [
      {
        "range": {
          "@timestamp": {
            "gte": "now-1d/d",
            "lt": "now/d"
          }
        }
      },
      {
        "prefix": {
          "event.dataset": {
            "value": "microsoft-office-365"
          }
        }
      }
    ]
  }
}
```
Query breakdown:
- bool: This tells us that the whole query is a Boolean query, which combines multiple conditions such as "must," "should," and "must not." Here, it uses filter to find characteristics the data must have to make it into the export. filter can have multiple conditions.
- range is the first filter condition. Since it refers to the field below it, which is @timestamp, it filters logs based on a range of values in the timestamp field.
- @timestamp tells us that the query is filtering by time, so it will export logs from a certain timeframe.
- gte: This means "greater than or equal to," and it is set to the value now-1d/d, meaning the earliest timestamp (the first log) will be from exactly one day ago at the moment you run the export.
- lt means "less than," and it is set to now/d, so the latest timestamp (the last log) will be the newest at the moment you run the export ("now").
- prefix is the second filter condition. It looks for logs where the value of a field, in this case event.dataset, starts with microsoft-office-365.
So, what does this query mean?
This export will show all logs from Microsoft Office 365 from the last 24 hours.
3. Add columns
For .csv and .xlsx files, you need to specify what columns you want to have in your document. Each column represents a data field. If you don't specify any columns, the resulting table will have all possible columns, so the table might be much bigger than you expect or need it to be.
You can see the list of all available data fields in Discover. To find which fields are relevant to the logs you're exporting, inspect an individual log in Discover.
- To add a column, click Add. Type the name of the column.
- To delete a column, click -.
- To reorder the columns, click and drag the arrows.
Warning
Pressing enter after typing a column name will run the export.
This example was downloaded from the export shown above as a .csv file, then separated into columns using the Microsoft Excel Convert Text to Columns Wizard. You can see that the columns here match the columns specified in the export.
4. Run the export by pressing Start.
Once you run the export, you are automatically brought back to the list of exports, and your export appears at the top of the list.
Download an export¶
1. On the List of exports page, click on the cloud button to download.
OR
1. On the List of exports page, click on the export's name.
2. Click Download.
Your browser should automatically start a download.
Delete an export¶
1. On the List of exports page, click on the trash can button.
OR
1. On the List of exports page, click on the export's name.
2. Click Remove.
The export should disappear from your list.
Add an export to your library¶
Note
This feature is only available to administrators.
If you like an export you've created or edited, you can save it to your library as a preset for future use.
1. On the List of exports page, click on the export's name.
2. Click Save to Library.
When you click on New from the List of exports page, your new export preset should be in the list.
All features
Home page¶
The Home page gives you an overview of your data sources and critical incoming events.
Viewing options¶
Chart and list view¶
To switch between chart and list view, click the list button.
Getting more details¶
Clicking on any portion of a chart takes you to Discover, where you then see the list of logs that make up this portion of the chart. From there, you can examine and filter these logs.
You can see here that Discover is automatically filtering for events from the selected dataset (from the chart on the Home page), event.dataset:devolutions.
Discover¶
Discover gives you an overview of all logs being collected in real time. Here, you can filter the data by time and field.
Navigating Discover¶
Terms¶
Total count: The total number of logs in the timeframe being shown.
Aggregated by: In the bar chart, each bar represents the count of logs collected within a time interval. Use Aggregated by to choose the time interval. For example, Aggregated by: 30m means that each bar in the bar chart shows the count of all of the logs collected in a 30 minute timeframe. If you change to Aggregated by: hour, then each bar represents one hour of logs. The available options change based on the overall timeframe you are viewing in Discover.
Filtering data¶
Change the timeframe from which logs appear, and filter logs by field.
Tip: Why filter data?
Logs contain a lot of information, more than you need to accomplish most tasks. When you filter data, you choose which information you see. This can help you learn more about your network, identify trends, and even hunt for threats.
Examples:
- You want to see login data from just one user, so you filter the data to show logs containing their username.
- You had a security event on Wednesday night, and you want to learn more about it, so you filter the data to show logs from that time period.
- You notice you don't see any data from one of your network devices. You can filter the data to see all the logs from just that device. Now, you can see when the data stopped coming, and what the last event was that might have caused the problem.
Changing the timeframe¶
You can view logs from a specified timeframe. Set the timeframe by choosing start and end points using this tool:
Remember: Once you change the timeframe, press the blue refresh button to update your page.
Using the time setting tool¶
Setting a relative start/end point¶
To set the start or end point to a time relative to now, use the Relative tab.
Quick time settings
Use the quick now- ("now minus") options to set the timeframe to a preset with one click. Selecting one of these options affects both the start and end point. For example, if you choose now-1 week, the start point will be one week ago, and the end point will be "now." Choosing a now- option from the end point does the same thing as choosing a now- option from the start point. (You can't use the now- options to set the end point to anything besides "now.")
Drop-down options
To set a relative time (such as 15 minutes ago) for the start or end point, use the relative time options below the quick setting options. Select your unit of time from the drop-down list, and type or click to set your desired number.
To confirm your choice, click Set relative time, and view the logs by clicking on the refresh button.
Example shown: This selection will show logs collected starting from one day ago until now.
Setting an exact start/end point¶
To choose the exact day and time for the start or end point, use the Absolute tab and select a date and time on the calendar.
To confirm your choice, click Set date.
Example shown: This selection will show logs collected starting from June 7, 2023 at 6:00 until now.
Auto refresh¶
To update the view automatically at a set time interval, choose a refresh rate:
Refresh¶
To reload the view with your changes, click the blue refresh button.
Note: Don't choose "Now" as your start point. Since the program can't show data newer than "now," it's not valid, so you'll see an error message.
Using the time selector¶
To select a more specific time period within the current timeframe, click and drag on the graph.
Filtering by field¶
In Discover, you can filter data by any field in multiple ways.
Using the field list¶
Use the search bar to find the field you want, or scroll through the list.
Isolating fields¶
To choose which fields you see in the log list, click the + symbol next to the field name. You can select multiple fields.
Seeing all occurring values in one field¶
To see a percentage breakdown of all the values from one field, click the magnifying glass next to the field name (the magnifying glass appears when you hover over the field name).
Tip: What does this mean?
This list of values from the field http.response.status_code compares how often users are getting certain http response codes. 51.4% of the time, users are getting a 404 code, meaning the webpage wasn't found. 43.3% of the time, users are getting a 200 code, which means that the request succeeded. The high percentage of "not found" response codes could inform a website administrator that one or more of their frequently clicked links are broken.
Viewing and filtering log details¶
To view the details of individual logs as a table or in JSON, click the arrow next to the timestamp. You can apply filters using the field names in the table view.
Filtering from the expanded table view¶
You can use controls in the table view to filter logs:
- Filter for logs that contain the same value in the selected field (update_item in action in the example)
- Filter for logs that do NOT contain the same value in the selected field (update_item in action in the example)
- Show a percentage breakdown of values in this field (the same function as the magnifying glass in the fields list on the left)
- Add to list of displayed fields for all visible logs (the same function as in the fields list on the left)
Query bar¶
You can filter by field (not by time) using the query bar. The query bar tells you which query language to use. The query language depends on your data source. Use Lucene Query Syntax for data stored using ElasticSearch.
After you type your query, set the timeframe and click the refresh button. Your filters will be applied to the visible incoming logs.
Investigating IP addresses¶
You can investigate IP addresses using external analysis tools. You might want to do this, for example, if you see multiple suspicious logins from one IP address.
Using external IP analysis tools
1. Click on the IP address you want to analyze.
2. Click on the tool you want to use. You'll be taken to the tool's website, where you can see the results of the IP address analysis.
Dashboards¶
A dashboard is a set of charts and graphs that represent data from your system. Dashboards allow you to quickly get a sense for what's going on in your network.
Your administrator sets up dashboards based on the data sources and fields that are most useful to you. For example, you might have a dashboard that shows graphs related only to email activity, or only to login attempts. You might have many dashboards for different purposes.
You can filter the data to change which data the dashboard shows within its preset constraints.
How can dashboards help me?
By having certain data arranged into a chart, table, or graph, you can get a visual overview of activity within your system and identify trends. In this example, you can see that a high volume of emails were sent and received on June 19th.
Navigating Dashboards¶
Opening a dashboard¶
To open a dashboard, click on its name.
Dashboard controls¶
Setting the timeframe¶
You can change the timeframe the dashboard represents. Find the time-setting guide here. To refresh the dashboard with your new timeframe, click on the refresh button.
Note: There is no auto-refresh rate in Dashboards.
Filtering dashboard data¶
To filter the data the dashboard shows, use the query bar. The query language you need to use depends on your data source. The query bar tells you which query language to use. Use Lucene Query Syntax for data stored using ElasticSearch.
The example above uses Lucene Query Syntax.
Moving widgets¶
You can reposition and resize each widget. To move widgets, click on the dashboard menu button and select Edit.
To move a widget, click anywhere on the widget and drag. To resize a widget, click on the widget's bottom right corner and drag.
To save your changes, click the green save button. To cancel the changes, click the red cancel button.
Printing dashboards¶
To print a dashboard, click on the dashboard menu button and select Print. Your browser opens a window, and you can choose your print settings there.
Reports¶
Reports are printer-friendly visual representations of your data, like printable dashboards. Your administrator chooses what information goes into your reports based on your needs.
Find and print a report¶
- Select the report from your list, or use the search bar to find your report.
- Click Print. Your browser opens a print window where you can choose your print settings.
Export¶
Turn sets of logs into downloadable, sendable files in Export. You can keep these files on your computer, inspect them in another program, or send them via email.
What is an export?
An export is not a file, but a process that creates a file. The export contains and follows your instructions for which data to put in the file, what type of file to create, and what to do with the file. When you run the export, you create the file.
Why would I export logs?
Being able to see a group of logs in one file can help you inspect the data more closely. A few reasons you might want to export logs are:
- To investigate an event or attack
- To send data to an analyst
- To explore the data in a program outside TeskaLabs LogMan.io
Navigating Export¶
List of exports
The List of exports shows you all the exports that have been run.
From the list page, you can:
- See an export's details by clicking on the export's name
- Download the export by clicking on the cloud beside its name
- Delete the export by clicking on the trash can beside its name
- Search for exports using the search bar
Export status is color-coded:
- Green: Completed
- Yellow: In progress
- Blue: Scheduled
- Red: Failed
Jump to:¶
Run an export¶
Running an export adds the export to your List of exports, but it does not automatically download the export. See Download an export for instructions.
Run an export based on a preset¶
1. Click New on the List of exports page. Now you can see the preset exports:
2. To run a preset export, click the run button beside the export name.
OR
2. To edit the export before running, click on the edit button beside the export name. Make your changes, and then click Start. (Use this guide to learn about making changes.)
Once you run the export, you are automatically brought back to the list of exports, and your export appears at the top of the list.
Note
Presets are created by administrators.
Run an export based on an export you've run before¶
You can re-run an export. Running an export again does not overwrite the original export.
1. On the List of exports page, click on the name of the export you want to run again.
2. Click Restart.
3. You can make changes here (see this guide) or run as-is.
4. Click Start.
Once you run the export, you are automatically brought back to the list of exports, and your new export appears at the top of the list.
Create a new export¶
Create an export from a blank form¶
1. In List of exports, click New, then click Custom.
2. Fill in the fields.
Note
The options in the drop down menus might change based on the selections you make.
Name
Name the export.
Data Source
Select your data source from the drop-down list.
Output
Choose the file type for your data. It can be:
- Raw: If you want to download the export and import the logs into different software, choose raw. If the data source is Elasticsearch, the raw file format is .json.
- .csv: Comma-separated values
- .xlsx: Microsoft Excel format
Compression
Choose to zip your export file, or leave it uncompressed. A zipped file is compressed, and therefore smaller, so it's easier to send, and it takes up less space on your computer.
Target
Choose the target for your file. It can be:
- Download: A file you can download to your computer.
- Email: Fill in the email fields. When you run the export, the email sends. You can still download the data file any time in the List of exports.
- Jupyter: Saves the file in the Jupyter notebook, which you can access through the Tools page. You need to have administrator permissions to access the Jupyter notebook, so only choose Jupyter as the target if you're an administrator.
Separator
If you select .csv as your output, choose what character will mark the separation between each value in each log. Even though CSV means comma-separated values, you can choose to use a different separator, such as a semicolon or space.
Schedule (optional)¶
To schedule the export, rather than running it immediately, click Add schedule.
- Schedule once: To run the export one time at a future time, type the desired date and time in YYYY-MM-DD HH:mm format, for example 2023-12-31 23:59 (December 31st, 2023, at 23:59).
- Schedule a recurring export: To set up the export to run automatically on a regular schedule, use cron syntax. You can learn more about cron from Wikipedia, and use this tool and these examples by Cronitor to help you write cron expressions. The Schedule field also supports random R usage and Vixie cron-style @ keyword expressions.
Query
Type a query to filter for certain data. The query determines which data to export, including the timeframe of the logs.
Warning
You must include a query in every export. If you run an export without a query, all of the data stored in your program will be exported with no filter for time or content. This could create an extremely large file and put strain on data storage components, and the file likely won't be useful to you or to analysts.
If you accidentally run an export without a query, you can delete the export while it's still running in the List of exports by clicking on the trash can button.
TeskaLabs LogMan.io uses the Elasticsearch Query DSL (Domain Specific Language).
Here's the full guide to the Elasticsearch Query DSL.
Example of a query:
```json
{
  "bool": {
    "filter": [
      {
        "range": {
          "@timestamp": {
            "gte": "now-1d/d",
            "lt": "now/d"
          }
        }
      },
      {
        "prefix": {
          "event.dataset": {
            "value": "microsoft-office-365"
          }
        }
      }
    ]
  }
}
```
Query breakdown:
- bool: This tells us that the whole query is a Boolean query, which combines multiple conditions such as "must," "should," and "must not." Here, it uses filter to find characteristics the data must have to make it into the export. filter can have multiple conditions.
- range is the first filter condition. Since it refers to the field below it, which is @timestamp, it filters logs based on a range of values in the timestamp field.
- @timestamp tells us that the query is filtering by time, so it will export logs from a certain timeframe.
- gte: This means "greater than or equal to," and it is set to the value now-1d/d, meaning the earliest timestamp (the first log) will be from exactly one day ago at the moment you run the export.
- lt means "less than," and it is set to now/d, so the latest timestamp (the last log) will be the newest at the moment you run the export ("now").
- prefix is the second filter condition. It looks for logs where the value of a field, in this case event.dataset, starts with microsoft-office-365.
So, what does this query mean?
This export will show all logs from Microsoft Office 365 from the last 24 hours.
3. Add columns
For .csv and .xlsx files, you need to specify what columns you want to have in your document. Each column represents a data field. If you don't specify any columns, the resulting table will have all possible columns, so the table might be much bigger than you expect or need it to be.
You can see the list of all available data fields in Discover. To find which fields are relevant to the logs you're exporting, inspect an individual log in Discover.
- To add a column, click Add. Type the name of the column.
- To delete a column, click -.
- To reorder the columns, click and drag the arrows.
Warning
Pressing enter after typing a column name will run the export.
This example was downloaded from the export shown above as a .csv file, then separated into columns using the Microsoft Excel Convert Text to Columns Wizard. You can see that the columns here match the columns specified in the export.
4. Run the export by pressing Start.
Once you run the export, you are automatically brought back to the list of exports, and your export appears at the top of the list.
Download an export¶
1. On the List of exports page, click on the cloud button to download.
OR
1. On the List of exports page, click on the export's name.
2. Click Download.
Your browser should automatically start a download.
Delete an export¶
1. On the List of exports page, click on the trash can button.
OR
1. On the List of exports page, click on the export's name.
2. Click Remove.
The export should disappear from your list.
Add an export to your library¶
Note
This feature is only available to administrators.
If you like an export you've created or edited, you can save it to your library as a preset for future use.
1. On the List of exports page, click on the export's name.
2. Click Save to Library.
When you click on New from the List of exports page, your new export preset should be in the list.
Library¶
Administrator feature
The Library is an administrator feature. The Library has a significant impact on the way TeskaLabs LogMan.io works. Some users don't have access to the Library.
The Library holds items (files) that determine what you see when using TeskaLabs LogMan.io. The items in the Library determine, for example, your homepage, dashboards, reports, exports, and some SIEM functions.
When you receive TeskaLabs LogMan.io, the Library is already filled with files. You can change these according to your needs.
The Library supports these file types:
- .html
- .json
- .md
- .txt
- .yaml
- .yml
Warning
Changing items in the Library impacts how TeskaLabs LogMan.io and TeskaLabs SIEM work. If you are unsure about making changes in the Library, contact Support.
Navigating the Library¶
Some items have additional options in the upper right corner of the screen:
Locating items¶
To find an item, use the search bar, or click through the folders.
If you navigate to a folder in the Library and want to return to the search bar, click Library again.
Adding items to the Library¶
Warning
Do NOT attempt to add single items to the library with the Restore function. Restore is only for importing a whole library.
Creating items in a folder¶
You can create an item directly in certain folders. If adding an item is possible, you'll see a Create new item in (folder) button when you click on the folder.
- To add an item, click Create new item in (folder).
- Name the item, select the file extension from the dropdown, and click Create.
- If the item doesn't appear immediately, refresh the page, and your item should appear in the library.
Adding an item by duplicating an existing item¶
- Click on the item you want to duplicate.
- Click on the ... button near the top.
- Click Copy.
- Rename the item, choose the file extension from the dropdown, and click Copy.
- If the item doesn't appear immediately, refresh the page, and your item should appear in the library.
Editing an item in the Library¶
- Click on the item you want to edit.
- To edit the item, click Edit.
- To save your changes, click Save, or exit the editor without saving by clicking Cancel.
- If your edits don't display immediately, refresh the page, and your changes should be saved.
Removing an item from the Library¶
- Click on the item you want to remove.
- Click on the ... button near the top.
- Click Remove and confirm Yes if your browser prompts.
- If the item doesn't disappear immediately, refresh the page, and the removed item should be gone.
Disabling items¶
You can temporarily disable an item. It stays in your library, but its effect on your system is paused.
To disable an item, click on the item and click Disable.
You can re-enable the file any time by clicking Enable.
Note
You can't read the contents of an item while it's disabled.
Backing up the Library¶
You can back up your whole Library onto your computer or other external storage by exporting the Library.
To export and download the contents of the Library, click Actions, then click Backup. Your browser will start the download.
Restoring the library from backup¶
Warning
Using Restore means importing a whole library from your computer. Restore is intended to restore your library from a backup version, so it will overwrite (delete) the existing contents of your Library in TeskaLabs LogMan.io. ONLY restore the Library if you intend to replace the entire contents of the Library with the files you're importing.
Restoring¶
- Click Actions.
- Click Restore.
- Choose the file from your computer. You can only import tar.gz files.
- Click Import.
Remember, using Restore and Import overwrites your whole library.
Lookups¶
Administrator feature
Lookups are an administrator feature. Some users don't have access to Lookups.
You can use lookups to get and store additional information from external sources. The additional information enhances your data and adds relevant context. This makes your data more valuable because you can analyze the data more deeply. For example, you can store user names, active users, active VPNs, and suspicious IP addresses.
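As a purely illustrative sketch of the idea (the keys, values and field names below are hypothetical and do not reflect the real lookup structure), a lookup of suspicious IP addresses conceptually maps keys to enrichment values:

```python
# Purely illustrative: a lookup is conceptually a keyed table used to enrich events.
# Keys, values and field names are hypothetical examples, not a real LogMan.io structure.
suspicious_ips = {
    "203.0.113.7":   {"reason": "repeated failed logins", "added": "2023-06-07"},
    "198.51.100.23": {"reason": "known scanner", "added": "2023-06-01"},
}

def enrich(event: dict) -> dict:
    # Attach lookup context to an event when its source IP is present in the lookup.
    hit = suspicious_ips.get(event.get("source.ip", ""))
    if hit:
        event["threat.note"] = hit["reason"]
    return event

print(enrich({"source.ip": "203.0.113.7", "message": "login failed"}))
```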
Tip
You can read more about Lookups here in the Reference guide.
Navigating Lookups¶
Creating a new lookup¶
To create a new lookup:
- Click Create lookup.
- Fill in the fields: Name, Short description, Detail description, and Key(s).
- To add another key, click on the +.
- Choose to add or not add an expiration.
- Click Save.
Finding a lookup¶
Use the search bar to find a specific lookup. Using the search bar does not search the contents of the lookups, only the lookup names. To view all the lookups again after using the search bar, clear the search bar and press Enter or Return.
Viewing and editing a lookup's details¶
Viewing a lookup's keys/items¶
To see a lookup's keys and values, or items, click on the ... button, and click Items.
Editing a lookup's keys/items¶
From the List of lookups, click on the ... button and click Items. This takes you to the individual lookup's page.
Adding: To add an item, click Add item.
Editing: To edit an existing item, click the ... button on the item line, and click Edit.
Deleting: To delete the item, click the ... button on the item line, and click Delete.
Remember to click Save after making changes.
Viewing a lookup's description¶
To see the detailed description of a lookup, click on the ... button on the List of lookups page, and click Info.
Editing a lookup's description¶
- Click on the ... button on the List of lookups page, and click Info. This takes you to the lookup's info page.
- Click Edit lookup at the bottom.
- After making changes, click Save, or click Cancel to exit editing mode.
Deleting a lookup¶
To delete a lookup:
- Click on the ... button on the List of lookups page, and click Info. This takes you to the lookup's info page.
- Click Delete lookup.
Tools¶
Administrator feature
Tools are an administrator feature. Changes you make when visiting external tools can have a significant impact on the way TeskaLabs LogMan.io works. Some users don't have access to the Tools page.
The Tools page gives you quick access to external programs that interact with or can be used alongside TeskaLabs LogMan.io.
Using external tools¶
To automatically log in securely to a tool, click on the tool's icon.
Warnings
- While tenants' data is separated in the TeskaLabs LogMan.io UI, tenants' data is not separated within these tools.
- Changes you make in Zookeeper, Kafka, and Kibana could damage your deployment of TeskaLabs LogMan.io.
Maintenance¶
Administrator feature
Maintenance is an administrator feature. What you do in Maintenance has a significant impact on the way TeskaLabs LogMan.io works. Some users don't have access to Maintenance.
The Maintenance section includes Configuration and Services.
Configuration¶
Configuration holds JSON files that determine some of the components you can see and use in TeskaLabs LogMan.io. For example, Configuration includes:
- The Discover page
- The sidebar
- Tenants
- The Tools page
Warning
Configuration files have a significant impact on the way TeskaLabs LogMan.io works. If you need help with your UI configuration, contact Support.
Basic and Advanced modes¶
You can switch between Basic and Advanced mode for configuration files.
Basic has fillable fields. Advanced shows the file in JSON. To choose a mode, click Basic or Advanced in the upper right corner.
Editing a configuration file¶
To edit a configuration file, click on the file name, choose your preferred mode, and make the changes. The file is always editable - you don't have to click anything to begin editing. Remember to click Save when you're finished.
Services¶
Services shows you all of the services and microservices ("mini programs") that make up the infrastructure of TeskaLabs LogMan.io.
Warning
Since TeskaLabs LogMan.io is made of microservices, interfering with the microservices could have a significant impact on the performance of the program. If you need help with microservices, contact Support.
Viewing service details¶
To view a service's details, click the arrow to the left of the service name.
Auth: Controlling user access¶
Administrator feature
Auth is an administrator feature. It has a significant impact on the people using TeskaLabs LogMan.io. Some users don't have access to the Auth pages.
The Auth (authorization) section includes all the controls administrators need to manage users and tenants.
Credentials¶
Credentials are users. From the Credentials screen, you can see:
- Name: The username that someone uses to log in
- Tenants: The tenants this user has access to
- Roles: The set of permissions this user has (see Roles)
Creating new credentials¶
1. To create a new user, click Create new credentials.
2. In the Create tab, enter a username. If you want to send the person an email inviting them to reset their password, enter their email address and check Send instructions to set password.
3. Click Create credentials.
The new credentials appear in the Credentials list. If you checked Send instructions to set password, the new user should receive an email.
Editing credentials¶
To edit a credential, click on a username, and click Edit in the section you want to change. Remember to click Save to save your changes, or click Cancel to exit the editor.
Tenants¶
A tenant is one entity collecting data from a group of sources. Each tenant has an isolated space to collect and manage its data. (Every tenant's data is completely separated from all other tenants' data in the UI.) One deployment of TeskaLabs LogMan.io can handle many tenants (multitenancy).
As a user, your company might be just one tenant, or you might have different tenants for different departments. If you're a distributor, each of your clients has at least one tenant.
One tenant can be accessible by multiple users, and users can have access to multiple tenants. You can control which users can access which tenants by assigning credentials to tenants or vice-versa.
Resources¶
Resources are the most basic unit of authorization. They are single and specific access permissions.
Examples:
- Being able to access dashboards from a certain data source
- Being able to delete tenants
- Being able to make changes in the Library
Roles¶
A role is a container for resources. You can create a role to include any combination of resources, so a role is a set of permissions.
Clients¶
Clients are additional applications that access TeskaLabs LogMan.io to support its functioning.
Warning
Removing a client could interrupt essential program functions.
Sessions¶
Sessions are active login periods currently running.
Ways to end a session:
- Click on the red X on the session's line on the Sessions page.
- Click on the session's name, then click Terminate session.
- To terminate all sessions (logging all users out), click Terminate all on the Sessions page.
Tip
The Auth module uses TeskaLabs SeaCat Auth. To learn more, you can read its documentation or take a look at its repository on GitHub.
Ended: All features
Ended: User Manual
Analyst Manual ↵
Analyst Manual¶
Cybersecurity and data analysts use the Analyst Manual to:
- Query data
- Create cybersecurity detections
- Create data visualizations
- Use and create other analytical tools
To learn how to use the TeskaLabs LogMan.io web app, visit the User Manual. For information about setup and installation, see the Administration Manual and the Reference guide.
Quickstart¶
- Queries: Writing queries to find and filter data
- Dashboards: Designing visualizations for data summaries and patterns
- Detections: Creating custom detections for activity and patterns
- Notifications: Sending messages via email from detections or alerts
Using Lucene Query Syntax¶
If you're storing data in Elasticsearch, you need to use Lucene Query Syntax to query data in TeskaLabs LogMan.io.
These are some quick tips for using Lucene Query Syntax, but you can also see the full documentation on the Elasticsearch website, or visit this tutorial.
You might use Lucene Query Syntax when creating dashboards, filtering data in dashboards, and when searching for logs in Discover.
Basic query expressions¶
Search for the field message with any value:
message:*
Search for the value delivered in the field message:
message:delivered
Search for the phrase not delivered in the field message:
message:"not delivered"
Search for any value in the field message, but NOT the value delivered:
message:* -message:delivered
Search for the text delivered anywhere in the value in the field message:
message:delivered*
This query would return logs with message values such as:
message:delivered
message:not delivered
message:delivered with delay
Note
This query would not return the same results if the specified text (delivered in this example) was only part of a word or number, not separated by spaces or periods. Therefore, the query message:eliv, for example, would not return these results.
Search for the range of values 1 to 1000 in the field user.id:
user.id:[1 TO 1000]
Search for the open range of values 1 and higher in the field user.id:
user.id:[1 TO *]
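Open ranges also work in the other direction: for example, user.id:[* TO 1000] searches for values of user.id up to and including 1000.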
Combining query expressions¶
Use boolean operators to combine expressions:
- AND: all of the combined criteria must be met
- OR: at least one of the criteria must be met
Using parentheses
Use parentheses when multiple items need to be grouped together to form an expression.
Examples of grouped expressions:
Search for logs from the dataset security, either with an IP address containing 123.456 and a message of failed login, or with an event action of deny and a delay greater than 10:
event.dataset:security AND (ip.address:123.456* AND message:"failed login") OR
(event.action:deny AND delay:[10 TO *])
Search a library's database for a book written by either Karel Čapek or Lucie Lukačovičová that has been translated to English, or a book in English that is at least 300 pages and in the genre science fiction:
language:English AND (author:"Karel Čapek" OR author:"Lucie Lukačovičová") OR
(page.count:[300 TO *] AND genre:"science fiction")
Dashboards¶
Dashboards are visualizations of incoming log data. While TeskaLabs LogMan.io comes with a library of preset dashboards, you can also create your own. View preset dashboards in the LogMan.io web app in Dashboards.
In order to create a dashboard, you need to write or copy a dashboard file in the Library.
Creating a dashboard file¶
Write dashboards in JSON.
Creating a blank dashboard
- In TeskaLabs LogMan.io, go to the Library.
- Click Dashboards.
- Click Create new item in Dashboards.
- Name the item, and click Create. If the new item doesn't appear immediately, refresh the page.
Copying an existing dashboard
- In TeskaLabs LogMan.io, go to the Library.
- Click Dashboards.
- Click on the item you want to duplicate, then click the icon near the top. Click Copy.
- Choose a new name for the item, and click Copy. If the new item doesn't appear immediately, refresh the page.
Dashboard structure¶
Write dashboards in JSON, and be aware that they're case-sensitive.
Dashboards have two parts:
- The dashboard base: A query bar, time selector, refresh button, and options button
- Widgets: The visualizations (chart, graph, list, etc.)
Dashboard base
Include this section exactly as-is to include the query bar, time selector, refresh button, and options. (The closing brace and comma end the Prompts section; the widget sections follow it.)
{
"Prompts": {
"dateRangePicker": true,
"filterInput": true,
"submitButton": true
},
Widgets¶
Widgets are made of datasource and widget pairs. When you write a widget, you need to include both a datasource section and a widget section.
JSON formatting tips:
- Separate every datasource and widget section by a brace and a comma }, except for the final widget in the dashboard, which does not need a comma (see the full example and the skeleton sketch below)
- End every line with a comma (,) except the final item in a section
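For illustration, here is a minimal sketch of how the Prompts section and two datasource/widget pairs fit together. The names (example-a, example-b) and field values are placeholders rather than a recommended dashboard; use the templates below for realistic values. Every section ends with }, except the last widget, and the final closing brace ends the dashboard file.
{
    "Prompts": {
        "dateRangePicker": true,
        "filterInput": true,
        "submitButton": true
    },
    "datasource:example-a": {
        "type": "elasticsearch",
        "datetimeField": "@timestamp",
        "specification": "lmio-{{ tenant }}-events*",
        "size": 1,
        "matchPhrase": "event.dataset:example"
    },
    "widget:example-a": {
        "datasource": "datasource:example-a",
        "title": "First widget",
        "type": "Value",
        "field": "event.dataset"
    },
    "datasource:example-b": {
        "type": "elasticsearch",
        "datetimeField": "@timestamp",
        "specification": "lmio-{{ tenant }}-events*",
        "size": 1,
        "matchPhrase": "event.dataset:example"
    },
    "widget:example-b": {
        "datasource": "datasource:example-b",
        "title": "Second widget",
        "type": "Value",
        "field": "event.dataset"
    }
}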
Widget positioning
Each widget has layout lines, which dictate the size and position of the widget. If you don't include layout lines when you write the widget, the dashboard generates them automatically.
- Include the layout lines with the suggested values from each widget template, OR don't include any layout lines. (If you don't include any layout lines, make sure the final item in each section does NOT end with a comma.)
- Go to Dashboards in LogMan.io and resize and move the widget.
- When you move the widget on the Dashboards page, the dashboard file in the Library automatically generates or adjusts the layout lines accordingly. If you're working in the dashboard file in the Library and repositioning the widgets in Dashboards at the same time, make sure to save and refresh both pages after making changes on either page.
The order of widgets in your dashboard file does not determine widget position, and the order does not change if you reposition the widgets in Dashboards.
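As a rough guide to the layout keys that appear in the templates below (interpreted from the key names and template values, not an exhaustive specification): layout:w and layout:h set the widget's width and height in grid units, layout:x and layout:y set its horizontal and vertical position (0, 0 being the top left), and layout:moved, layout:static, and layout:isResizable track whether the widget has been repositioned and whether it can be dragged or resized. Because these values are maintained automatically when you reposition widgets in Dashboards, you can usually leave the template values as they are.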
Naming
We recommend agreeing on naming conventions for dashboards and widgets within your organization to avoid confusion.
matchPhrase filter
For Elasticsearch data sources, use Lucene query syntax for the matchPhrase value.
Colors
By default, pie chart and bar chart widgets use a blue color scheme. To change the color scheme, insert "color":"(color scheme)" directly before the layout lines.
- Blue: no extra line necessary
- Purple: "color":"sunset"
- Yellow: "color":"warning"
- Red: "color":"danger"
Troubleshooting JSON
If you get an error message about JSON formatting when trying to save the file:
- Follow the recommendation of the error message specifying what the JSON is "expecting" - it might mean that you're missing a required key-value pair, or the punctuation is incorrect.
- If you can't find the error, double-check that your formatting is consistent with other functional dashboards.
If your widget does not display correctly:
- Make sure the value of datasource matches in both the data source and widget sections.
- Check for spelling errors or query structure issues in any fields referenced and in fields specified in the matchPhrase query.
- Check for any other typos or inconsistencies.
- Check that the log source you are referencing is connected.
Use these examples as guides. The numbered notes below each example explain what the correspondingly numbered lines (#(1), #(2), ...) mean.
Bar charts¶
A bar chart displays values with vertical bars on an x and y-axis. The length of each bar is proportional to the data it represents.
Bar chart JSON example:
"datasource:office365-email-aggregated": { #(1)
"type": "elasticsearch", #(2)
"datetimeField": "@timestamp", #(3)
"specification": "lmio-{{ tenant }}-events*", #(4)
"aggregateResult": true, #(5)
"matchPhrase": "event.dataset:microsoft-office-365 AND event.action:MessageTrace" #(6)
},
"widget:office365-email-aggregated": { #(7)
"datasource": "datasource:office365-email-aggregated", #(8)
"title": "Sent and received emails", #(9)
"type": "BarChart", #(10)
"xaxis": "@timestamp", #(11)
"yaxis": "o365.message.status", #(12)
"ylabel": "Count", #(13)
"table": true, #(14)
"layout:w": 6, #(15)
"layout:h": 4,
"layout:x": 0,
"layout:y": 0,
"layout:moved": false,
"layout:static": true,
"layout:isResizable": false
},
1. datasource marks the beginning of the data source section as well as the name of the data source. The name doesn't affect the dashboard's function, but you need to refer to the name correctly in the widget section.
2. The type of data source. If you're using Elasticsearch, the value is "elasticsearch".
3. Indicates which field in the logs is the date and time field. For example, in Elasticsearch logs, which are parsed by the Elastic Common Schema (ECS), the date and time field is @timestamp.
4. Refers to the index from which to get data in Elasticsearch. The value lmio-{{ tenant }}-events* fits our index naming conventions in Elasticsearch, and {{ tenant }} is a placeholder for the active tenant. The asterisk * allows unspecified additional characters in the index name following events. The result: The widget displays data from the active tenant.
5. aggregateResult set to true performs aggregation on the data before displaying it in the dashboard. In this case, the sent and received emails are being counted (sum calculated).
6. The query that filters for specific logs using Lucene query syntax. In this case, any data displayed in the dashboard must be from the Microsoft Office 365 dataset and have the value MessageTrace in the field event.action.
7. widget marks the beginning of the widget section as well as the name of the widget. The name doesn't affect the dashboard's function.
8. Refers to the data source section above which populates it. Make sure the value here matches the name of the corresponding data source exactly. (This is how the widget knows where to get data from.)
9. Title of the widget that will display in the dashboard
10. Type of widget
11. The field from the logs whose values will be represented on the x axis
12. The field from the logs whose values will be represented on the y axis
13. Label for the y axis that will display in the dashboard
14. Setting table to true enables you to switch between chart view and table view on the widget in the dashboard. Choosing false disables the chart-to-table feature.
15. See the note above about widget positioning for information about layout lines.
Bar chart widget rendered:
Bar chart template:
To create a bar chart widget, copy and paste this template into a dashboard file in the Library and fill in the values. Recommended layout values, the values specifying an Elasticsearch data source, and the value that organizes the bar chart by time are already filled in.
"datasource:Name of datasource": {
"type": "elasticsearch",
"datetimeField": "@timestamp",
"specification": "lmio-{{ tenant }}-events*",
"aggregateResult": true,
"matchPhrase": " "
},
"widget:Name of widget": {
"datasource": "datasource:Name of datasource",
"title": "Widget display title",
"type": "BarChart",
"xaxis": "@timestamp",
"yaxis": " ",
"ylabel": " ",
"table": true,
"layout:w": 6,
"layout:h": 4,
"layout:x": 0,
"layout:y": 0,
"layout:moved": false,
"layout:static": true,
"layout:isResizable": false
},
Pie charts¶
A pie chart is a circle divided into slices, in which each slice represents a percentage of the whole.
Pie chart JSON example:
"datasource:office365-email-status": { #(1)
"datetimeField": "@timestamp", #(2)
"groupBy": "o365.message.status", #(3)
"matchPhrase": "event.dataset:microsoft-office-365 AND event.action:MessageTrace", #(4)
"specification": "lmio-{{ tenant }}-events*", #(5)
"type": "elasticsearch", #(6)
"size": 20 #(7)
},
"widget:office365-email-status": { #(8)
"datasource": "datasource:office365-email-status", #(9)
"title": "Received Emails Status", #(10)
"type": "PieChart", #(11)
"tooltip": true, #(12)
"table": true, #(13)
"layout:w": 6, #(14)
"layout:h": 4,
"layout:x": 6,
"layout:y": 0,
"layout:moved": false,
"layout:static": true,
"layout:isResizable": false
},
1. datasource marks the beginning of the data source section, as well as the name of the data source. The name doesn't affect the dashboard's function, but you need to refer to the name correctly in the widget section.
2. Indicates which field in the logs is the date and time field. For example, in Elasticsearch logs, which are parsed by the Elastic Common Schema (ECS), the date and time field is @timestamp.
3. The field whose values will represent each "slice" of the pie chart. In this example, the pie chart will separate logs by their message status. There will be a separate slice for each of Delivered, Expanded, Quarantined, etc. to show the percentage occurrence of each message status.
4. The query that filters for specific logs. In this case, only data from logs from the Microsoft Office 365 dataset with the value MessageTrace in the field event.action will be displayed.
5. Refers to the index from which to get data in Elasticsearch. The value lmio-{{ tenant }}-events* fits our index naming conventions in Elasticsearch, and {{ tenant }} is a placeholder for the active tenant. The asterisk * allows unspecified additional characters in the index name following events. The result: The widget displays data from the active tenant.
6. The type of data source. If you're using Elasticsearch, the value is "elasticsearch".
7. How many values you want to display. Since this pie chart is showing the statuses of received emails, a size of 20 displays the top 20 status types. (The pie chart can have a maximum of 20 slices.)
8. widget marks the beginning of the widget section as well as the name of the widget. The name doesn't affect the dashboard's function.
9. Refers to the data source section above which populates it. Make sure the value here matches the name of the corresponding data source exactly. (This is how the widget knows where to get data from.)
10. Title of the widget that will display in the dashboard
11. Type of widget
12. If tooltip is set to true: When you hover over each slice of the pie chart in the dashboard, a small informational window with the count of values in the slice pops up at your cursor. If tooltip is set to false: The count window appears in the top left corner of the widget.
13. Setting table to true enables you to switch between chart view and table view on the widget in the dashboard. Choosing false disables the chart-to-table feature.
14. See the note above about widget positioning for information about layout lines.
Pie chart template
To create a pie chart widget, copy and paste this template into a dashboard file in the Library and fill in the values. Recommended values as well as the values specifying an Elasticsearch data source are already filled in.
"datasource:Name of data source": {
"datetimeField": "@timestamp",
"groupBy": " ",
"matchPhrase": " ",
"specification": "lmio-{{ tenant }}-events*",
"type": "elasticsearch",
"size": 20
},
"widget:Name of widget": {
"datasource": "datasource:Name of data source",
"title": "Widget display title",
"type": "PieChart",
"tooltip": true,
"table": true,
"layout:w": 6,
"layout:h": 4,
"layout:x": 0,
"layout:y": 0,
"layout:moved": false,
"layout:static": true,
"layout:isResizable": false
},
Tables¶
A table displays text and numeric values from data fields that you specify.
Table widget example
"datasource:office365-email-failed-or-quarantined": { #(1)
"type": "elasticsearch", #(2)
"datetimeField": "@timestamp", #(3)
"specification": "lmio-{{ tenant }}-events*", #(4)
"size": 100, #(5)
"matchPhrase": "event.dataset:microsoft-office-365 AND event.action:MessageTrace AND o365.message.status:(Failed OR Quarantined)" #(6)
},
"widget:office365-email-failed-or-quarantined": { #(7)
"datasource": "datasource:office365-email-failed-or-quarantined", #(8)
"field:1": "@timestamp", #(9)
"field:2": "o365.message.status",
"field:3": "sender.address",
"field:4": "recipient.address",
"field:5": "o365.message.subject",
"title": "Failed or quarantined emails", #(10)
"type": "Table", #(11)
"dataPerPage": 9, #(12)
"layout:w": 12, #(13)
"layout:h": 4,
"layout:x": 0,
"layout:y": 0,
"layout:moved": false,
"layout:static": true,
"layout:isResizable": false
}
1. datasource marks the beginning of the data source section, as well as the name of the data source. The name doesn't affect the dashboard's function, but you need to refer to the name correctly in the widget section.
2. The type of data source. If you're using Elasticsearch, the value is "elasticsearch".
3. Indicates which field in the logs is the date and time field. For example, in Elasticsearch logs, which are parsed by the Elastic Common Schema (ECS), the date and time field is @timestamp.
4. Refers to the index from which to get data in Elasticsearch. The value lmio-{{ tenant }}-events* fits our index naming conventions in Elasticsearch, and {{ tenant }} is a placeholder for the active tenant. The asterisk * allows unspecified additional characters in the index name following events. The result: The widget displays data from the active tenant.
5. How many values you want to display. This table will have a maximum of 100 rows. You can set rows per page in dataPerPage below.
6. The query that filters for specific logs using Lucene query syntax. In this case, the widget displays data only from logs from the Microsoft Office 365 dataset with the value MessageTrace in the field event.action and a message status of Failed or Quarantined.
7. widget marks the beginning of the widget section as well as the name of the widget. The name doesn't affect the dashboard's function.
8. Refers to the data source section above which populates it. Make sure the value here matches the name of the corresponding data source exactly. (This is how the widget knows where to get data from.)
9. Each field is a column that will display in the table in the dashboard. In this example table of failed or quarantined emails, the table would display the timestamp, message status, sender address, recipient address, and the email subject for each log (which represents each email). Use as many fields as you want.
10. Title of the widget that will display in the dashboard
11. Type of widget
12. The number of items displayed per page (at once) in the table
13. See the note above about widget positioning for information about layout lines.
Table widget rendered:
Table widget template:
To create a table widget, copy and paste this template into a dashboard file in the Library and fill in the values. Recommended values as well as the values specifying an Elasticsearch data source are already filled in.
"datasource:Name of datasource": {
"type": "elasticsearch",
"datetimeField": "@timestamp",
"specification": "lmio-{{ tenant }}-events*",
"size": 100,
"matchPhrase": " "
},
"widget:Name of widget": {
"datasource": "datasource:Name of datasource",
"field:1": "@timestamp",
"field:2": " ",
"field:3": " ",
"field:4": " ",
"field:5": " ",
"title": "Widget title",
"type": "Table",
"dataPerPage": 9,
"layout:w": 12,
"layout:h": 4,
"layout:x": 0,
"layout:y": 0,
"layout:moved": false,
"layout:static": true,
"layout:isResizable": false
}
Single values¶
A value widget displays the most recent single value from the data field you specify.
"datasource:microsoft-exchange1": { #(1)
"datetimeField": "@timestamp", #(2)
"matchPhrase": "event.dataset:microsoft-exchange AND email.from.address:* AND email.to.address:*", #(3)
"specification": "lmio-{{ tenant }}-events*", #(4)
"type": "elasticsearch", #(5)
"size": 1 #(6)
},
"widget:fortigate1": { #(7)
"datasource": "datasource:microsoft-exchange1", #(8)
"field": "email.from.address", #(9)
"title": "Last Active User", #(10)
"type": "Value", #(11)
"layout:w": 4, #(12)
"layout:h": 1,
"layout:x": 0,
"layout:y": 0,
"layout:moved": false,
"layout:static": true,
"layout:isResizable": false
}
1. datasource marks the beginning of the data source section, as well as the name of the data source. The name doesn't affect the dashboard's function, but you need to refer to the name correctly in the widget section.
2. Indicates which field in the logs is the date and time field. For example, in Elasticsearch logs, which are parsed by the Elastic Common Schema (ECS), the date and time field is @timestamp.
3. The query that filters for specific logs using Lucene query syntax. In this case, the widget displays data only from logs from the Microsoft Exchange dataset with ANY value (*) in the email.from.address and email.to.address fields.
4. Refers to the index from which to get data in Elasticsearch. The value lmio-{{ tenant }}-events* fits our index naming conventions in Elasticsearch, and {{ tenant }} is a placeholder for the active tenant. The asterisk * allows unspecified additional characters in the index name following events. The result: The widget displays data from the active tenant.
5. The type of data source. If you're using Elasticsearch, the value is "elasticsearch".
6. How many values you want to display. Since a value widget only displays a single value, the size is 1.
7. widget marks the beginning of the widget section as well as the name of the widget. The name doesn't affect the dashboard's function.
8. Refers to the data source section above which populates it. Make sure the value here matches the name of the corresponding data source exactly. (This is how the widget knows where to get data from.)
9. Refers to the field (from the latest log) from which the value will be displayed.
10. Title of the widget that will display in the dashboard
11. Type of widget. The value type displays a single value.
12. See the note above about widget positioning for information about layout lines.
Value widget rendered:
Value widget template:
To create a value widget, copy and paste this template into a dashboard file in the Library and fill in the values. Recommended values as well as the values specifying an Elasticsearch data source are already filled in.
"datasource:Name of datasource": {
"datetimeField": "@timestamp",
"matchPhrase": " ",
"specification": "lmio-{{ tenant }}-events*",
"type": "elasticsearch",
"size": 1
},
"widget:Name of widget": {
"datasource": "datasource:Name of datasource",
"field": " ",
"title": "Widget title",
"type": "Value",
"layout:w": 4,
"layout:h": 1,
"layout:x": 0,
"layout:y": 0,
"layout:moved": false,
"layout:static": true,
"layout:isResizable": false
}
Dashboard example¶
This example is structured correctly:
{
"Prompts": {
"dateRangePicker": true,
"filterInput": true,
"submitButton": true
},
"datasource:access-log-combined HTTP Response": {
"type": "elasticsearch",
"datetimeField": "@timestamp",
"specification": "lmio-default-events*",
"size": 20,
"groupBy": "http.response.status_code",
"matchPhrase": "event.dataset: access-log-combined AND http.response.status_code:*"
},
"widget:access-log-combined HTTP Response": {
"datasource": "datasource:access-log-combined HTTP Response",
"title": "HTTP status codes",
"type": "PieChart",
"color": "warning",
"useGradientColors": true,
"table": true,
"tooltip": true,
"layout:w": 6,
"layout:h": 5,
"layout:x": 6,
"layout:y": 0,
"layout:moved": false,
"layout:static": true,
"layout:isResizable": false
},
"datasource:access-log-combined Activity": {
"type": "elasticsearch",
"datetimeField": "@timestamp",
"specification": "lmio-default-events*",
"matchPhrase": "event.dataset:access-log-combined AND http.response.status_code:*",
"aggregateResult": true
},
"widget:access-log-combined Activity": {
"datasource": "datasource:access-log-combined Activity",
"title": "Activity",
"type": "BarChart",
"table": true,
"xaxis": "@timestamp",
"ylabel": "HTTP requests",
"yaxis": "http.response.status_code",
"color": "sunset",
"layout:w": 6,
"layout:h": 4,
"layout:x": 0,
"layout:y": 1,
"layout:moved": false,
"layout:static": true,
"layout:isResizable": false
},
"datasource:Access-log-combined Last_http": {
"datetimeField": "@timestamp",
"matchPhrase": "event.dataset:access-log-combined AND http.response.status_code:*",
"specification": "lmio-default-events*",
"type": "elasticsearch",
"size": 1000
},
"widget:Access-log-combined Last_http": {
"datasource": "datasource:Access-log-combined Last_http",
"field": "http.response.status_code",
"title": "Last HTTP status code",
"type": "Value",
"layout:w": 6,
"layout:h": 1,
"layout:x": 0,
"layout:y": 0,
"layout:moved": false,
"layout:static": true,
"layout:isResizable": false
}
}
Note: The data is arbitrary. This example is meant only to help you format your dashboards correctly.
Dashboard rendered:
Parsing ↵
Parsing¶
Parsing is the process of analyzing the original log (which is typically in single/multiple-line string, JSON, or XML format) and transforming it into a list of key-value pairs that describe the log data (such as when the original event happened, the priority and severity of the log, information about the process that created the log, etc).
Every log that enters your TeskaLabs LogMan.io system needs to be parsed. The LogMan.io Parsec microservice is responsible for parsing logs. The Parsec needs parsers, which are sets of declarations (YAML files) to know how to parse each type of log. LogMan.io comes with the LogMan.io Common Library, which has many parsers already created for many common log types. However, if you need to create your own parsers, understanding parsing key terms, learning about declarations, and using the parsing tutorial can help.
Basic parsing example
Parsing takes a raw log, such as this:
<30>2023:12:04-15:33:59 hostname3 ulogd[1620]: id="2001" severity="info" sys="SecureNet" sub="packetfilter" name="Packet dropped" action="drop" fwrule="60002" initf="eth2.3009" outitf="eth6" srcmac="e0:63:da:73:bb:3e" dstmac="7c:5a:1c:4c:da:0a" srcip="172.60.91.60" dstip="192.168.99.121" proto="17" length="168" tos="0x00" prec="0x00" ttl="63" srcport="47100" dstport="12017"
and transforms it into a structured, parsed event like this:
@timestamp: 2023-12-04 15:33:59.033
destination.ip: 192.168.99.121
destination.mac: 7c:5a:1c:4c:da:0a
destination.port: 12017
device.model.identifier: SG230
dns.answers.ttl: 63
event.action: Packet dropped
event.created: 2023-12-04 15:33:59.033
event.dataset: sophos
event.id: 2001
event.ingested: 2023-12-04 15:39:10.039
event.original: <30>2023:12:04-15:33:59 hostname3 ulogd[1620]: id="2001" severity="info" sys="SecureNet" sub="packetfilter" name="Packet dropped" action="drop" fwrule="60002" initf="eth2.3009" outitf="eth6" srcmac="e0:63:da:73:bb:3e" dstmac="7c:5a:1c:4c:da:0a" srcip="172.60.91.60" dstip="192.168.99.121" proto="17" length="168" tos="0x00" prec="0x00" ttl="63" srcport="47100" dstport="12017"
host.hostname: hostname3
lmio.event.source.id: hostname3
lmio.parsing: parsec
lmio.source: mirage
log.syslog.facility.code: 3
log.syslog.facility.name: daemon
log.syslog.priority: 30
log.syslog.severity.code: 6
log.syslog.severity.name: information
message: id="2001" severity="info" sys="SecureNet" sub="packetfilter" name="Packet dropped" action="drop" fwrule="60002" initf="eth2.3009" outitf="eth6" srcmac="e0:63:da:73:bb:3e" dstmac="7c:5a:1c:4c:da:0a" srcip="172.60.91.60" dstip="192.168.99.121" proto="17" length="168" tos="0x00" prec="0x00" ttl="63" srcport="47100" dstport="12017"
observer.egress.interface.name: eth6
observer.ingress.interface.name: eth2.3009
process.name: ulogd
process.pid: 1620
sophos.action: drop
sophos.fw.rule.id: 60002
sophos.prec: 0x00
sophos.protocol: 17
sophos.sub: packetfilter
sophos.sys: SecureNet
sophos.tos: 0x00
source.bytes: 168
source.ip: 172.60.91.60
source.mac: e0:63:da:73:bb:3e
source.port: 47100
tags: lmio-parsec:v23.47
tenant: default
_id: e1a92529bab1f20e43ac8d6caf90aff49c782b3d6585e6f63ea7c9346c85a6f7
_prev_id: 10cc320c9796d024e8a6c7e90fd3ccaf31c661cf893b6633cb2868774c743e69
_s: DKNA
Parsing key terms¶
Important terms relevant to LogMan.io Parsec and the parsing process.
Event¶
A unit of data that moves through the parsing process is referred to as an event. An original event comes to LogMan.io Parsec as an input and is then parsed by the processors. If parsing succeeds, it produces a parsed event, and if parsing fails, it produces an error event.
Original event¶
An original event is the input that LogMan.io Parsec receives - in other words, an unparsed log. It can be represented by a raw (possibly encoded) string or structure in JSON or XML format.
Parsed event¶
A parsed event is the output from successful parsing, formatted as an unordered list of key-value pairs serialized into JSON structure. A parsed event always contains a unique ID, the original event, and typically the information about when the event was created by the source and received by Apache Kafka.
Error event¶
An error event is the output from unsuccessful parsing, formatted as an unordered list of key-value pairs serialized into JSON structure. It is produced when parsing, mapping, or enrichment fails, or when another exception occurs in LogMan.io Parsec. It always contains the original event, the information about when the event was unsuccessfully parsed, and the error message describing the reason why the process of parsing failed. Despite unsuccessful parsing, the error event will always be in JSON format, key-value pairs.
Library¶
Your TeskaLabs LogMan.io Library holds all of your declaration files (as well as many other types of files). You can edit your declaration files in your Library via Zookeeper.
Declarations¶
Declarations describe how the event will be transformed. Declarations are YAML files that LogMan.io Parsec can interpret to create declarative processors. There are three types of declarations in LogMan.io Parsec: parsers, enrichers, and mappings. See Declarations for more.
Parser¶
A parser is the type of declaration that takes the original event or a specific field of a partially-parsed event as input, analyzes its individual parts, and then stores them as key-value pairs to the event.
Mapping¶
A mapping declaration is the type of declaration that takes a partially parsed event as input, renames the field names, and eventually converts the data types. It works together with a schema (ECS, CEF). It also works as a filter to leave out data that is not needed in the final parsed event.
Enricher¶
An enricher is the type of declaration that supplements a partially parsed event with additional data.
Declarations ↵
Declarations¶
Declarations describe how the event should be parsed. They are stored as YAML files in the Library. LogMan.io Parsec interprets these declarations and creates parsing processors.
There are three types of declarations:
- Parser declaration: A parser takes an original event or a specific field of a partially parsed event as input, analyzes its individual parts, and stores them as key-value pairs to the event.
- Mapping declaration: Mapping takes a partially parsed event as input, renames the field names, and eventually converts the data types. It works together with a schema (ECS, CEF).
- Enricher declaration: An enricher supplements a partially parsed event with extra data.
Data flow¶
A typical, recommended parsing sequence is a chain of declarations:
- The first main parser declaration begins the chain, and additional parsers (called sub-parsers) extract more detailed data from the fields created by the previous parser.
- Then, the (single) mapping declaration renames the keys of the parsed fields according to a schema and filters out fields that are not needed.
- Last, the enricher declaration supplements the event with additional data. While it's possible to use multiple enricher files, it's recommended to use just one.
Naming declarations¶
Important: Naming conventions
LogMan.io Parsec loads declarations alphabetically and creates the corresponding processors in the same order. Therefore, create the list of declaration files according to these rules:
- Begin all declaration file names with a numbered prefix: 10_parser.yaml, 20_parser_message.yaml, ..., 90_enricher.yaml. It is recommended to "leave some space" in your numbering for future declarations in case you want to add a new declaration between two existing ones (e.g., 25_new_parser.yaml).
- Include the type of declaration in file names: 20_parser_message.yaml rather than 10_message.yaml.
- Include the type of schema used in mapping file names: 40_mapping_ECS.yaml rather than 40_mapping.yaml.
Example:
/Parsers/MyParser/:
- 10_parser.yaml
- 20_parser_username.yaml
- 30_parser_message.yaml
- 40_mapping_ECS.yaml
- 50_enricher_lookup.yaml
- 60_enricher.yaml
Parser declarations¶
A parser declaration takes an original event or a specific field of a partially parsed event as input, analyzes its individual parts, and stores them as key-value pairs to the event.
LogMan.io Parsec currently supports three types of parser declarations:
- JSON parser
- Windows Event parser
- Parsec parser
Declaration structure¶
In order to determine the type of the declaration, you need to specify a define section.
define:
type: <declaration_type>
For a parser declaration, specify one of the parser types described below, such as parser/json, parser/windows-event, or parser/parsec.
JSON parser¶
A JSON parser is used for parsing events with a JSON structure.
define:
name: JSON parser
type: parser/json
This is a complete JSON parser and will parse events from a JSON structure, separating the fields into key-value pairs.
Warning
For now, LogMan.io Parsec does not support parsing of events with nested JSON format. For example, the event below cannot be parsed with the JSON parser:
{
"key": {
"foo": 1,
"bar": 2
}
}
Windows Event parser¶
The Windows Event parser is used for parsing events produced by Microsoft Windows. These events are in XML format.
define:
name: Windows Events Parser
type: parser/windows-event
This is a complete Windows Event parser and will parse events from Microsoft Windows, separating the fields into key-value pairs.
Parsec parser¶
A Parsec parser is used for parsing events in plain string format. It is based on SP-Lang Parsec expressions.
For parsing original events, use the following declaration:
define:
name: My Parser
type: parser/parsec
parse:
!PARSE.KVLIST
- ...
- ...
- ...
To parse a specific field of a partially parsed event instead (a sub-parser), add the field option to the declaration:
define:
name: My Parser
type: parser/parsec
field: <custom_field>
parse:
!PARSE.KVLIST
- ...
- ...
- ...
When field is specified, parsing is applied on that field; otherwise, it is applied on the original event. The field option must therefore be present in every sub-parser.
Examples of Parsec parser declarations¶
Example 1: Simple example
For the purpose of the example, let's say that we want to parse a collection of simple events:
Hello Miroslav from Prague!
Hi Kristýna from Pilsen.
We want the output in the following format:
{
"name": "Miroslav",
"city": "Prague"
}
{
"name": "Kristýna",
"city": "Pilsen"
}
The declaration will be the following:
define:
type: parser/parsec
parse:
!PARSE.KVLIST
- !PARSE.UNTIL " "
- name: !PARSE.UNTIL " "
- !PARSE.EXACTLY "from "
- city: !PARSE.LETTERS
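Reading this declaration step by step: the first !PARSE.UNTIL consumes the greeting word (Hello or Hi) up to the first space without storing it, name stores the following word, !PARSE.EXACTLY consumes the literal text "from ", and city stores the remaining letters (the trailing ! or . is not a letter, so it is left out).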
Example 2: More complex example
For the purpose of this example, let's say that we want to parse a collection of simple events:
Process cleaning[123] finished with code 0.
Process log-rotation finished with code 1.
Process cleaning[657] started.
And we want the output in the following format:
{
"process.name": "cleaning",
"process.pid": 123,
"event.action": "process-finished",
"return.code": 0
}
{
"process.name": "log-rotation",
"event.action": "process-finished",
"return.code": 1
}
{
"process.name": "cleaning",
"process.pid": 657,
"event.action": "process-started",
}
The declaration will be the following:
define:
type: parser/parsec
parse:
!PARSE.KVLIST
- !PARSE.UNTIL " "
- !TRY
- !PARSE.KVLIST
- process.name: !PARSE.UNTIL "["
- process.pid: !PARSE.UNTIL "]"
- !PARSE.SPACE
- !PARSE.KVLIST
- process.name: !PARSE.UNTIL " "
- !TRY
- !PARSE.KVLIST
- !PARSE.EXACTLY "started."
- event.action: "process-started"
- !PARSE.KVLIST
- !PARSE.EXACTLY "finished with code "
- event.action: "process-finished"
- return.code: !PARSE.DIGITS
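In this declaration, the first !TRY handles the two possible process formats: its first branch expects a PID in square brackets (as in cleaning[123]), and if no [ is found, its second branch parses the process name up to the next space (as in log-rotation). The second !TRY then distinguishes messages ending in started. from messages ending in finished with code followed by a return code.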
Example 3: Parsing syslog events
For the purpose of the example, let's say that we want to parse a simple event in syslog format:
<189> Sep 22 10:31:39 server-abc server-check[1234]: User "harry potter" logged in from 198.20.65.68
We would like the output in the following format:
{
"PRI": 189,
"timestamp": 1695421899,
"server": "server-abc",
"process.name": "server-check",
"process.pid": 1234,
"user": "harry potter",
"action": "log-in",
"ip": "198.20.65.68"
}
We will create two parsers. The first parser will parse the syslog header, and the second will parse the message.
define:
name: Syslog parser
type: parser/parsec
parse:
!PARSE.KVLIST
- !PARSE.EXACTLY "<"
- PRI: !PARSE.DIGITS
- !PARSE.EXACTLY ">"
- timestamp: ...
- server: !PARSE.UNTIL " "
- process.name: !PARSE.UNTIL "["
- process.pid: !PARSE.UNTIL "]"
- !PARSE.EXACTLY ":"
- message: !PARSE.CHARS
This parser stores the rest of the event in the message field. The second parser then parses the message field:
define:
type: parser/parsec
field: message
drop: yes
parse:
!PARSE.KVLIST
- !PARSE.UNTIL " "
- user: !PARSE.BETWEEN { what: '"' }
- !PARSE.EXACTLY " "
- !PARSE.UNTIL " "
- !PARSE.UNTIL " "
- !PARSE.UNTIL " "
- ip: !PARSE.CHARS
Mapping declarations¶
After all declared fields are obtained from parsers, the fields typically have to be renamed according to some schema (ECS, CEF) in a process called mapping.
Why is mapping necessary?
To store event data in Elasticsearch, it's essential that the field names in the logs align with the Elastic Common Schema (ECS), a standardized, open-source collection of field names that are compatible with Elasticsearch. The mapping process renames the fields of the parsed logs according to this schema. Mapping ensures that logs from various sources have unified, consistent field names, which enables Elasticsearch to interpret them accurately.
Important
By default, mapping works as a filter. Make sure to include all fields you want in the parsed output in the mapping declaration. Any field not specified in mapping will be removed from the event.
Writing a mapping declaration¶
Write mapping declarations in YAML. (Mapping declarations do not use SP-Lang expressions.)
define:
type: parser/mapping
schema: /Schemas/ECS.yaml
mapping:
<original_key>: <new_key>
<original_key>: <new_key>
...
Specify parser/mapping as the type in the define section. In the schema field, specify the filepath to the schema you're using. If you use Elasticsearch, use the Elastic Common Schema (ECS).
To rename the key and change the data type of the value:
mapping:
<original_key>:
field: <new_key>
type: <new_type>
Find available data types here.
To rename the key without changing the data type of the value:
mapping:
<original_key>: <new_key>
Example¶
Example
For the purpose of the example, let's say that we want to parse a simple event in JSON format:
{
"act": "user login",
"ip": "178.2.1.20",
"usr": "harry_potter",
"id": "6514-abb6-a5f2"
}
and we would like the final output look like this:
{
"event.action": "user login",
"source.ip": "178.2.1.20",
"user.name": "harry_potter"
}
Notice that the key names in the original event differ from the key names in the desired output.
For the initial parser declaration in this case, we can use a simple JSON parser:
define:
type: parser/json
This parser will create a list of key-value pairs that are exactly the same as the original ones.
To change the names of individual fields, we create this mapping declaration file, 20_mapping_ECS.yaml, in which we describe what fields to map and how:
---
define:
type: parser/mapping # determine the type of declaration
schema: /Schemas/ECS.yaml # which schema is applied
mapping:
act: 'event.action'
ip: 'source.ip'
usr: 'user.name'
This declaration will produce the desired output. (Data types have not been changed.)
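Notice also that the id field from the original event is not listed in the mapping, so it is filtered out of the final output.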
Enricher declarations¶
Enrichers supplement the parsed event with extra data.
An enricher can:
- Create a new field in the event.
- Transform a field's values in some way (changing a letter case, performing a calculation, etc).
Enrichers are most commonly used to:
- Specify the dataset where the logs will be stored in Elasticsearch (add the field event.dataset).
- Obtain facility and severity from the syslog priority field.
define:
type: parsec/enricher
enrich:
event.dataset: <dataset_name>
new.field: <expression>
...
- Write enrichers in YAML.
- Specify parsec/enricher as the type in the define section.
Example
The following example is an enricher used for events in syslog format. Suppose you have a parser for events of the form:
<14>1 2023-05-03 15:06:12 server pid: Username 'HarryPotter' logged in.
which produces the following parsed event:
{
"log.syslog.priority": 14,
"user.name": "HarryPotter"
}
You want to obtain syslog severity and facility, which are computed in the standard way:
(facility * 8) + severity = priority
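For example, the log above has priority 14, and 14 = (1 × 8) + 6, so the facility code is 1 (user-level messages) and the severity code is 6 (informational).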
You would also like to lowercase the name HarryPotter to harrypotter in order to unify the users across various log sources.
Therefore, you create an enricher:
define:
type: parsec/enricher
enrich:
event.dataset: 'dataset_name'
user.id: !LOWER { what: !GET {from: !ARG EVENT, what: user.name} }
# facility and severity are computed from 'syslog.pri' in the standard way
log.syslog.facility.code: !SHR
what: !GET { from: !ARG EVENT, what: log.syslog.priority }
by: 3
log.syslog.severity.code: !AND [ !GET {from: !ARG EVENT, what: log.syslog.priority}, 7 ]
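These two expressions implement the formula above: shifting log.syslog.priority right by 3 bits (!SHR with by: 3) is integer division by 8 and yields the facility code, while !AND with 7 keeps the lowest 3 bits, the remainder after dividing by 8, which is the severity code. For priority 14, this gives facility.code 1 and severity.code 6, matching the worked example above.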
Ended: Declarations
Parsing tutorial¶
The complete parsing process requires parser, mapping, and enricher declarations. This tutorial breaks down creating declarations step-by-step. Visit the LogMan.io Parsec documentation for more on the Parsec microservice.
Before you start
SP-Lang
Parsing declarations are written in TeskaLabs SP-Lang. For more details about parsing expressions, visit the SP-Lang documentation.
Declarations
For more information on specific types of declarations, see the Declarations section.
Sample logs¶
This example uses this set of logs collected from various Sophos SG230 devices:
<181>2023:01:12-13:08:45 asgmtx httpd: 212.158.149.81 - - [12/Jan/2023:13:08:45 +0100] "POST /webadmin.plx HTTP/1.1" 200 2000
<38>2023:01:12-13:09:09 asgmtx sshd[17112]: Failed password for root from 218.92.0.190 port 56745 ssh2
<38>2023:01:12-13:09:20 asgmtx sshd[16281]: Did not receive identification string from 218.92.0.190
<38>2023:01:12-13:09:20 asgmtx aua[2350]: id="3005" severity="warn" sys="System" sub="auth" name="Authentication failed" srcip="43.139.111.88" host="" user="login" caller="sshd" reason="DENIED"
These logs are using the syslog format described in RFC 5424.
Logs can be typically separated into two parts: the header and the body. The header is anything preceding the first colon after the timestamp. The body is the rest of the log.
Parsing strategy¶
The Parsec interprets each declaration alphabetically by name, so naming order matters. Within each declaration, the parsing process follows the order in which you write the expressions, step by step.
A parsing sequence can include multiple parser declarations, and also needs a mapping declaration and an enricher declaration. In this case, create these declarations:
- First parser declaration: Parse the syslog headers
- Second parser declaration: Parse the body of the logs as the message.
- Mapping declaration: Rename fields
- Enricher declaration: Add metadata (such as the dataset name) and compute syslog facility and severity from priority
As per naming conventions, name these files:
- 10_parser_header.yaml
- 20_parser_message.yaml
- 30_mapping_ECS.yaml
- 40_enricher.yaml
Remember that declarations are interpreted in alphabetical order, in this case by the increasing numeric prefix. Use prefixes such as 10, 20, 30, etc. so you can add a new declaration between existing ones later without renaming all of the files.
1. Parsing the header¶
This is the first parser declaration. The subsequent sections break down and explain each part of the declaration.
---
define:
type: parser/parsec
parse:
!PARSE.KVLIST
# PRI part
- '<'
- PRI: !PARSE.DIGITS
- '>'
# Timestamp
- TIMESTAMP: !PARSE.DATETIME
- year: !PARSE.DIGITS # year: 2023
- ':'
- month: !PARSE.MONTH { what: 'number' } # month: 01
- ':'
- day: !PARSE.DIGITS # day: 12
- '-'
- hour: !PARSE.DIGITS # hour: 13
- ':'
- minute: !PARSE.DIGITS # minute: 08
- ':'
- second: !PARSE.DIGITS # second: 45
- !PARSE.UNTIL ' '
# Hostname and process
- HOSTNAME: !PARSE.UNTIL ' ' # asgmtx
- PROCESS: !PARSE.UNTIL ':'
# Message
- !PARSE.SPACES
- MESSAGE: !PARSE.CHARS
Log headers¶
The syslog headers are in the format:
<PRI>TIMESTAMP HOSTNAME PROCESS.NAME[PROCESS.PID]:
Important: Log variance
Notice that PROCESS.PID in the square brackets is not present in the first log's header. To accommodate the discrepancy, the parser will need a way to handle the possibility of PROCESS.PID being either present or absent. This is addressed later in the tutorial.
Parsing the PRI¶
First, parse the PRI, which is enclosed by the < and > characters, with no space in between.
How to parse <PRI>, as seen in the first parser declaration:
!PARSE.KVLIST
- !PARSE.EXACTLY { what: '<' }
- PRI: !PARSE.DIGITS
- !PARSE.EXACTLY { what: '>' }
Expressions used:
- !PARSE.EXACTLY: parsing the characters < and >
- !PARSE.DIGITS: parsing the numbers (digits) of the PRI
!PARSE.EXACTLY shortcut
The !PARSE.EXACTLY expression has a syntactic shortcut because it is so commonly used. Instead of including the whole expression, !PARSE.EXACTLY { what: '(character)' } can be shortened to '(character)'.
So, the above parser declaration can be shortened to:
!PARSE.KVLIST
- '<'
- PRI: !PARSE.DIGITS
- '>'
Parsing the timestamp¶
The unparsed timestamp format is:
yyyy:mm:dd-HH:MM:SS
2023:01:12-13:08:45
Parse the timestamp with the !PARSE.DATETIME expression.
As seen in the first parser declaration:
# 2023:01:12-13:08:45
- TIMESTAMP: !PARSE.DATETIME
- year: !PARSE.DIGITS # year: 2023
- ':'
- month: !PARSE.MONTH { what: 'number' } # month: 01
- ':'
- day: !PARSE.DIGITS # day: 12
- '-'
- hour: !PARSE.DIGITS # hour: 13
- ':'
- minute: !PARSE.DIGITS # minute: 08
- ':'
- second: !PARSE.DIGITS # second: 45
- !PARSE.UNTIL { what: ' ', stop: after }
Parsing the month:
The !PARSE.MONTH expression requires you to specify the format of the month in the what parameter. The options are:
- 'number' (used in this case), which accepts numbers 01-12
- 'short' for shortened month names (JAN, FEB, etc.)
- 'full' for full month names (JANUARY, FEBRUARY, etc.)
Parsing the space:
The space at the end of the timestamp also needs to be parsed. The !PARSE.UNTIL expression parses everything until the space character (' '), stopping after the space, as defined by stop: after.
!PARSE.UNTIL shortcuts and alternatives
!PARSE.UNTIL has the syntactic shortcut:
- !PARSE.UNTIL ' '
which is equivalent to the full form:
- !PARSE.UNTIL { what: ' ', stop: after }
Alternatively, you can choose an expression that specifically parses one or multiple spaces, respectively:
- !PARSE.SPACE
or
- !PARSE.SPACES
At this point, the sequence of characters <181>2023:01:12-13:08:45 (including the space at the end) is parsed.
Parsing the hostname and process¶
Next, parse the hostname and process: asgmtx sshd[17112]:
Remember, the first log's header is different than the rest. For a solution that accommodates this difference, create a parser declaration and a subparser declaration.
As seen in the first parser declaration:
# Hostname and process
- HOSTNAME: !PARSE.UNTIL ' ' # asgmtx
- PROCESS: !PARSE.UNTIL ':'
# Message
- !PARSE.SPACES
- MESSAGE: !PARSE.CHARS
- Parse the hostname: Use the !PARSE.UNTIL expression to parse everything until the single character specified (' ', in this case a space), stopping after that character without including it in the output.
- Parse the process: Use !PARSE.UNTIL again to parse until ':'. After the colon, the header is parsed.
- Parse the message: In this declaration, use !PARSE.SPACES to parse all spaces between the header and the message. Then, store the rest of the event in the MESSAGE field using the !PARSE.CHARS expression, which in this case parses all of the rest of the characters in the log. You will use additional declarations to parse the parts of the message.
1.5. Parsing for log variance¶
To address the issue of the first log not having a process PID, you need a second parser declaration, a subparser. In the other logs, the process PID is enclosed in square brackets ([ ]).
Create a declaration called 15_parser_process.yaml. To accommodate the differences in the logs, create two "paths" or "branches" that the parser can use. The first branch will parse PROCESS.NAME, PROCESS.PID, and ':'. The second branch will parse only PROCESS.NAME.
Why do I need two branches?
For three of the logs, the process PID is enclosed in square brackets ([ ]). Thus, the expression that isolates the PID begins parsing at a square bracket [. However, in the first log, the PID field is not present. If you try to parse the first log using the same expression, the parser will try to find a square bracket in that log and will keep searching even though the character [ is not present in the header.
The result would be that whatever is inside the square brackets is parsed as PID, which in this case would be nonsensical, and would disrupt the rest of the parsing process for that log.
The second declaration:
---
define:
type: parser/parsec
field: PROCESS
error: continue
parse:
!PARSE.KVLIST
- !TRY
- !PARSE.KVLIST
- PROCESS.NAME: !PARSE.UNTIL '['
- PROCESS.PID: !PARSE.UNTIL ']'
- !PARSE.KVLIST
- PROCESS.NAME: !PARSE.CHARS
To achieve this, construct two little parsers under the combinator !PARSE.KVLIST using the !TRY expression.
The !TRY expression
The !TRY expression allows you to nest a list of expressions under it. !TRY begins by attempting to use the first expression, and if that first expression is unusable for the log, the process continues with the second nested expression, and so on, until an expression succeeds.
Under the !TRY expression:
The first branch:
1. The expression parses PROCESS.NAME and PROCESS.PID, expecting the square brackets [ and ] to be present in the event. After these are parsed, it also parses the : character.
2. If the log does not contain a [ character, the expression !PARSE.UNTIL '[' fails, and in that case the whole !PARSE.KVLIST expression in the first branch fails.
The second branch:
3. The !TRY expression will continue with the next parser, which does not require the character [ to be present in the event. It simply parses everything before : and stops after it.
4. If this second expression fails, the log goes to OTHERS.
2. Parsing the message¶
Consider again the events:
<181>2023:01:12-13:08:45 asgmtx httpd: 212.158.149.81 - - [12/Jan/2023:13:08:45 +0100] "POST /webadmin.plx HTTP/1.1" 200 2000
<38>2023:01:12-13:09:09 asgmtx sshd[17112]: Failed password for root from 218.92.0.190 port 56745 ssh2
<38>2023:01:12-13:09:20 asgmtx sshd[16281]: Did not receive identification string from 218.92.0.190
<38>2023:01:12-13:09:20 asgmtx aua[2350]: id="3005" severity="warn" sys="System" sub="auth" name="Authentication failed" srcip="43.139.111.88" host="" user="login" caller="sshd" reason="DENIED"
There are three different types of messages, depending on the process name.
- httpd: The message is in a structured format. We can extract data such as IPs and HTTP requests easily by using the standard parsing expressions.
- sshd: The message is a human-readable string. To extract data such as host IPs and ports, hardcode these messages in the parser and skip the words that are relevant to humans but not relevant for automatic parsing.
- aua: The message consists of structured data in the form of key-value pairs. Extract them as they are and rename them in the mapping according to the Elastic Common Schema (ECS).
For clarity, put each declaration into a separate YAML file and use the !INCLUDE expression for including them into one parser.
---
define:
type: parser/parsec
field: MESSAGE
error: continue
parse:
!MATCH
what: !GET { from: !ARG EVENT, what: process.name, type: str }
with:
'httpd': !INCLUDE httpd.yaml
'sshd': !INCLUDE sshd.yaml
'aua': !INCLUDE aua.yaml
else: !PARSE.KVLIST []
The !MATCH expression has three parameters. The what parameter specifies the field whose value is matched against the cases listed in the with dictionary. If a match is successful, the corresponding expression is executed, in this case one of the !INCLUDE expressions. If none of the listed cases matches, the expression in else is executed. In this case, !PARSE.KVLIST is used with an empty list, which means nothing will be parsed from the message.
Parsing the structured message¶
First, look at the message from the 'httpd' process.
212.158.149.81 - - [12/Jan/2023:13:08:45 +0100] "POST /webadmin.plx HTTP/1.1" 200 2000
Parse the IP address, the HTTP request method, the response status code, and the number of response body bytes to yield the output:
host.ip: '212.158.149.81'
http.request.method: 'POST'
http.response.status_code: '200'
http.response.body.bytes: '2000'
This is straightforward, assuming all the events will satisfy the same format as the one from the example:
!PARSE.KVLIST
- host.ip: !PARSE.UNTIL ' '
- !PARSE.UNTIL '"'
- http.request.method: !PARSE.UNTIL ' '
- !PARSE.UNTIL '"'
- !PARSE.SPACE
- http.response.status_code: !PARSE.DIGITS
- !PARSE.SPACE
- http.response.body.bytes: !PARSE.DIGITS
This case uses the ECS for naming. Alternatively, you can rename fields according to your needs in the mapping declaration.
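As a rough trace of how this declaration consumes the sample line (assuming, as described earlier, that !PARSE.UNTIL stops after the given character), the same expressions are shown again with comments indicating what each one consumes:
!PARSE.KVLIST
- host.ip: !PARSE.UNTIL ' '                 # '212.158.149.81'
- !PARSE.UNTIL '"'                          # skips '- - [12/Jan/2023:13:08:45 +0100] ' up to the first quote
- http.request.method: !PARSE.UNTIL ' '     # 'POST'
- !PARSE.UNTIL '"'                          # skips '/webadmin.plx HTTP/1.1' up to the closing quote
- !PARSE.SPACE                              # the space after the closing quote
- http.response.status_code: !PARSE.DIGITS  # '200'
- !PARSE.SPACE
- http.response.body.bytes: !PARSE.DIGITS   # '2000'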
Parsing the human-readable string¶
Let us continue with 'sshd' messages.
Failed password for root from 218.92.0.190 port 56745 ssh2
Did not receive identification string from 218.92.0.190
You can extract IP addresses from both events and the port from the first one. Additionally, you can store the condensed information about the event type in event.action
field.
event.action: 'password-failed'
user.name: 'root'
source.ip: '218.92.0.190'
source.port: '56745'
event.action: 'id-string-not-received'
source.ip: '218.92.0.190'
To differentiate between these two messages, notice that each of them starts with a different prefix. You can take advantage of this and use !PARSE.TRIE
expression.
!PARSE.TRIE
- 'Failed password for ': !PARSE.KVLIST
- event.action: 'password-failed'
- user.name: !PARSE.UNTIL ' '
- 'from '
- source.ip: !PARSE.UNTIL ' '
- 'port '
- source.port: !PARSE.DIGITS
- 'Did not receive identification string from ': !PARSE.KVLIST
- event.action: 'id-string-not-received'
- source.ip: !PARSE.CHARS
- '': !PARSE.KVLIST []
The !PARSE.TRIE expression tries to match the incoming string against the listed prefixes and performs the corresponding expression. The empty prefix '' is a fallback: if none of the listed prefixes matches, the empty one is used.
Parsing key-value pairs¶
Finally, aua
events have key-value pairs.
id="3005" severity="warn" sys="System" sub="auth" name="Authentication failed" srcip="43.139.111.88" host="" user="login" caller="sshd" reason="DENIED"
Desired output:
id: '3005'
severity: 'warn'
sys: 'System'
sub: 'auth'
name: 'Authentication failed'
srcip: '43.139.111.88'
host: ''
user: 'login'
caller: 'sshd'
reason: 'DENIED'
When encountering structured messages, you can use !PARSE.REPEAT
together with !PARSE.KV
.
The !PARSE.REPEAT expression performs the expression specified in the what parameter multiple times. In this case, you want to repeat the following steps until it is no longer possible:
- Parse everything until '=' character and use it as a key.
- Parse everything between '"' characters and assign that value to the key.
- Optionally, omit spaces before the next key begins.
For that, we create the following expression:
!PARSE.KVLIST
- !PARSE.REPEAT
what: !PARSE.KV
- !PARSE.OPTIONAL { what: !PARSE.SPACE }
- key: !PARSE.UNTIL '='
- value: !PARSE.BETWEEN '"'
KV
in !PARSE.KV
stands for key-value. This expression takes a list of parsing expressions, including the keywords key
and value
.
3. Mapping declaration¶
Mapping renames the keys so that they correspond to the ECS (Elastic Common Schema).
---
define:
type: parser/mapping
schema: /Schemas/ECS.yaml
mapping:
# 10_parser_header.yaml and 15_parser_process.yaml
'PRI': 'log.syslog.priority'
'TIMESTAMP': '@timestamp'
'HOSTNAME': 'host.hostname'
'PROCESS.NAME': 'process.name'
'PROCESS.PID': 'process.pid'
'MESSAGE': 'message'
# 20_parser_message.yaml
# httpd.yaml
'host.ip': 'host.ip'
'http.request.method': 'http.request.method'
'http.response.status_code': 'http.response.status_code'
'http.response.body.bytes': 'http.response.body.bytes'
# sshd.yaml
'event.action': 'event.action'
'user.name': 'user.name'
'source.ip': 'source.ip'
'source.port': 'source.port'
# aua.yaml
'sys': 'sophos.sys'
'host': 'sophos.host'
'user': 'sophos.user'
'caller': 'log.syslog.appname'
'reason': 'event.reason'
Mapping as a filter
Note that we must map fields from the httpd.yaml and sshd.yaml files, even though they are already in ECS format. The mapping processor also works as a filter: any key you do not include in the mapping declaration is dropped from the event. This is the case for aua.yaml, where some fields are not included in the mapping and are therefore dropped.
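For example, a simplified sketch of the effect on the aua event above: keys listed in the mapping are renamed, and everything else (such as id, severity, sub, name, and srcip) is dropped.
# before mapping (parsed from the aua message)
sys: 'System'
host: ''
user: 'login'
caller: 'sshd'
reason: 'DENIED'
id: '3005'
severity: 'warn'
# after mapping: only the declared keys survive, under their new names
sophos.sys: 'System'
sophos.host: ''
sophos.user: 'login'
log.syslog.appname: 'sshd'
event.reason: 'DENIED'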
4. Enricher declaration¶
The enricher will have this structure:
---
define:
type: parsec/enricher
enrich:
...
For the purpose of this example, the enricher will:
- Add the fields event.dataset and device.model.identifier, which will be "static" fields, always with the same value.
- Transform the field HOST.HOSTNAME to lowercase, host.hostname.
- Compute the syslog facility and severity from the syslog priority, both with numeric and human-readable values.
Note that enrichers do not modify or delete existing fields unless you explicitly specify it in the declaration. You do so by declaring a field that already exists in the event; in that case, the field is simply replaced by the new value.
Enriching simple fields¶
To enrich the event with event.dataset
supplemented by device.model.identifier
:
event.dataset: "sophos"
device.model.identifier: "SG230"
For that, specify these fields in the enricher, and the fields will be added to the event every time.
---
define:
type: parsec/enricher
enrich:
event.dataset: "sophos"
device.model.identifier: "SG230"
Editing existing fields¶
You can perform some operations with already existing fields. In this case, the goal is to change HOST.HOSTNAME
to lowercase, host.hostname
. For that, use the following expression:
host.hostname: !LOWER
what: !GET {from: !ARG EVENT, what: host.hostname}
You can also change the field name. If you do it like this,
host.id: !LOWER
what: !GET {from: !ARG EVENT, what: host.hostname}
the output would include the original field host.hostname
as well as a new lowercase field host.id
.
Computing facility and severity from priority¶
Syslog severity and facility are computed from syslog priority by the formula:
PRIORITY = FACILITY * 8 + SEVERITY
There is a shortcut for faster computation that uses the fact that the numbers are represented in binary format.
The shortcut uses low-level operations such as !SHR (right shift) and !AND.
Because 8 = 2^3, the integer quotient after dividing by 8 is obtained by performing a right shift by 3.
The integer 7 in binary representation is 111, therefore applying the !AND operation with 7 gives the remainder after dividing by 8.
The expression is the following:
log.syslog.facility.code: !SHR { what: !GET { from: !ARG EVENT, what: log.syslog.priority }, by: 3 }
log.syslog.severity.code: !AND [ !GET { from: !ARG EVENT, what: log.syslog.priority }, 7 ]
Consider the number 38 to illustrate this concept. 38 is 100 110 in binary representation. Dividing it by 8 is the same as a right shift by 3 places (3 is 11 in binary):
shr(100 110, 11) = 000 100
which is 4. So the value of FACILITY is 4, which corresponds to AUTH. Performing the !AND operation gives
and(100 110, 111) = 000 110
which is 6. So the value of SEVERITY is 6, which corresponds to INFORMATIONAL. As a check, FACILITY * 8 + SEVERITY = 4 * 8 + 6 = 38, the original priority.
You can also match the numeric values of severity and facility with human-readable names using the !MATCH
expression. The complete declaration is the following:
---
define:
type: parsec/enricher
enrich:
# New fields
event.dataset: "sophos"
device.model.identifier: "SG230"
# Lowercasing the existing field
host.hostname: !LOWER
what: !GET {from: !ARG EVENT, what: host.hostname}
# SYSLOG FACILITY
log.syslog.facility.code: !SHR { what: !GET { from: !ARG EVENT, what: log.syslog.priority }, by: 3 }
log.syslog.facility.name: !MATCH
what: !GET { from: !ARG EVENT, what: log.syslog.facility.code }
with:
0: 'kern'
1: 'user'
2: 'mail'
3: 'daemon'
4: 'auth'
5: 'syslog'
6: 'lpr'
7: 'news'
8: 'uucp'
9: 'cron'
10: 'authpriv'
11: 'ftp'
16: 'local0'
17: 'local1'
18: 'local2'
19: 'local3'
20: 'local4'
21: 'local5'
22: 'local6'
23: 'local7'
# SYSLOG SEVERITY
log.syslog.severity.code: !AND [ !GET { from: !ARG EVENT, what: log.syslog.priority }, 7 ]
log.syslog.severity.name: !MATCH
what: !GET { from: !ARG EVENT, what: log.syslog.severity.code }
with:
0: 'emergency'
1: 'alert'
2: 'critical'
3: 'error'
4: 'warning'
5: 'notice'
6: 'information'
7: 'debug'
Ended: Parsing
Detections ↵
LogMan.io Correlator¶
TeskaLabs LogMan.io Correlator is a powerful, fast, scalable component of LogMan.io and TeskaLabs SIEM. As the Correlator makes detections possible, it is essential to effective cybersecurity.
The Correlator identifies specified activity, patterns, anomalies, and threats in real time as defined by detection rules. The Correlator works in your system's data stream, rather than on disk storage, making it a fast and uniquely scalable security mechanism.
What does the Correlator do?¶
The Correlator keeps track of events and when they happen in relation to a larger pattern or activity.
- First, you identify the pattern, threat, or anomaly you want the Correlator to monitor for. You write a detection that defines the activity, including which types of events (logs) are relevant and how many times an event needs to occur in a defined timeframe in order to trigger a response.
- The Correlator identifies the relevant incoming events and organizes them first by a specific attribute in the event (dimension), such as source IP address or user ID, then sorts the events into short time intervals so the number of events can be analyzed. The time intervals are also defined by the detection rule.
Note: It's most common to use the Correlator's sum function to count events that occur in a specified timeframe. However, the Correlator can also analyze using other mathematical functions.
- The Correlator analyzes these dimensions and time intervals to see if the relevant events have happened in the desired timeframe. When the Correlator detects the activity, it triggers the response specified in the detection.
In other words, this microservice shares event statuses over time intervals and uses a sliding, or rolling, analysis window.
What is a sliding analysis window?
Using a sliding analysis window means that the Correlator can analyze multiple time intervals continuously. For example, when analyzing a period of 30 seconds, the Correlator shifts its analysis, which is a window of 30 seconds, to overlap previous analyses as time progresses.
This picture represents a single dimension, for example the analysis of events with the same source IP address. In a real detection rule, you'd have several rows of this table, one row for each IP address. More in the example below.
The sliding window makes it possible to analyze the overlapping 30-second timeframes 0:00-0:30
, 0:10-0:40
, 0:20-0:50
, and 0:30-0:60
, rather than just 0:00-0:30
and 0:30-0:60
.
Example¶
Example scenario: You create a detection to alert you when 20 login attempts are made to the same user account within 30 seconds. Since this password entry rate is higher than most people could achieve on their own, this activity could indicate a brute force attack.
In order to detect this security threat, the Correlator needs to know two things:
- Which events are relevant. In this case, that means failed login attempts to the same user account.
- When the events (login attempts) happen in relation to each other.
Note: The following logs and images are heavily simplified to better illustrate the ideas.
1. These logs occur in the system:
What do these logs mean?
Each table you see above is a log for the event of a user having a single failed login attempt.
- log.ID: The unique log identifier, as seen in the table below
- timestamp: The time the event occurred
- username: The Correlator will analyze groups of logs from the same users, because it wouldn't be effective in this case to analyze login attempts across all users combined.
- event.message: The Correlator is only looking for failed logins, as would be defined by the detection rule.
2. The Correlator begins tracking the events in rows and columns:
- Username is the dimension, as defined by the detection rule, so each user has their own row.
- Log ID (A, B, C, etc.) is here in the table so you can see which logs are being counted.
- The number in each cell is how many events occurred in that time interval per username (dimension).
3. The Correlator continues keeping track of events:
You can see that one account is experiencing a higher volume of failed login attempts now.
4. At the same time, the Correlator is analyzing 30-second time periods with an analysis window:
The analysis window moves across the time intervals to count the total number of events in 30-second timeframes. You can see that when the analysis window reaches the 01:20-01:50
timeframe for the username anna.s.ample
, it will count more than 20 events. This would trigger a response from the Correlator, as defined by the detection (more on triggers here).
A gif to illustrate the analysis window moving
The 30-second analysis window "slides" or "rolls" along the time intervals, counting how many events occurred. When it finds 20 or more events in a single analysis, an action from the detection rule is triggered.
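A hypothetical skeleton of such a detection rule could look like the following. The predicate value ("password-failed"), the aggregate function, and the resolution are illustrative assumptions; see the window correlation example later in this documentation for a complete, working rule.
define:
  name: "Brute force: 20 failed logins in 30 seconds"
  description: "20 failed login attempts to the same user account within 30 seconds"
  type: correlator/window
predicate:
  !EQ
  - !ITEM EVENT event.action
  - "password-failed"              # placeholder: use the value your parsed logs actually contain
evaluate:
  dimension: [tenant, user.name]   # one row per user account
  by: "@timestamp"
  resolution: 10                   # 10-second time intervals
  saturation: 1
analyze:
  window: hopping
  aggregate: sum                   # assumption: count the events in the window (the "sum function" mentioned above)
  span: 3                          # 3 x 10 seconds = a 30-second analysis window
  test:
    !GE
    - !ARG VALUE
    - 20
trigger:
  - event:
      !DICT
      type: "{str:any}"
      with:
        event.kind: "alert"
        event.dataset: "correlation"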
Memory and storage¶
The Correlator operates in the data stream, not in a database. This means that the Correlator is tracking events and performing analysis in real time as events occur, rather than pulling past collected events from a database to perform analysis.
In order to work in the data stream, the Correlator uses memory mapping, which allows it to function in the system's quickly accessible memory (RAM) rather than relying on disk storage.
Memory mapping provides significant benefits:
- Real-time detection: Data in RAM is more quickly accessible than data from a storage disk. This makes the Correlator very fast, allowing you to detect threats immediately.
- Simultaneous processing: Greater processing capacity allows the Correlator to run many parallel detections at once.
- Scalability: The volume of data in your log collection system will likely increase as your organization grows. The Correlator can keep up. Allocating additional RAM is faster and simpler than increasing disk storage.
- Persistence: If the system shuts down unexpectedly, the Correlator does not lose data. The Correlator's history is backed up to disk (SSD) often, so the data is available when the system restarts.
For more technical information, visit our Correlator reference documentation.
What is a detection?¶
A detection (sometimes called a correlation rule) defines and finds patterns and specific events in your data. A huge volume of event logs moves through your system, and detections help identify events and combinations of events that might be the result of a security breach or system error.
Important
- The possibilities for your detections depend on your Correlator configuration.
- All detections are written in TeskaLabs SP-Lang. There is a quick guide for SP-Lang in the window correlation example and additional detection guidelines.
What can detections do?¶
You can write detections to describe and find an endless combination of events and patterns, but these are common activities to monitor:
- Multiple failed login attempts: Numerous unsuccessful login attempts within a short period, often from the same IP address, to catch brute-force or password-spraying attacks.
- Unusual data transfer or exfiltration: Abnormal or large data transfers from inside the network to external locations.
- Port scanning: Attempts to identify open ports on network devices, which may be the precursor to an attack.
- Unusual hours of activity: User or system activities during non-business hours, which could indicate a compromised account or insider threat.
- Geographical anomalies: Logins or activities originating from unexpected geographical locations based on the user's typical behavior.
- Access to sensitive resources: Unauthorized or unusual attempts to access critical or sensitive files, databases, or services.
- Changes to critical system files: Unexpected changes to system and configuration files
- Suspicious email activity: Phishing emails, attachments with malware, or other types of malicious email content.
- Privilege escalation: Attempts to escalate privileges, such as a regular user trying to gain admin-level access.
Getting started¶
Plan your correlation rule carefully to avoid missing important events or drawing attention to irrelevant events. Answer the questions:
- What activity (events or patterns) do you want to detect?
- Which logs are relevant to this activity?
- What do you want to happen if the activity is detected?
To get started writing a detection, see this example of a window correlation and follow these additional guidelines.
Writing a window correlation-type detection rule¶
A window correlation rule is a highly versatile type of detection that can identify combinations of events over time. This example shows some of the techniques you can use when writing window correlations, but there are many more options, so this page gives you additional guidance.
Before you can write a new detection rule, you need to:
- Decide what activity you are looking for, and decide the timeframe in which this activity happening is notable.
- Identify which data source produces the logs that could trigger a positive detection, and identify what information those logs contain.
- Decide what you want to happen when the activity is detected.
Use TeskaLabs SP-Lang to write correlation rules.
Sections of a correlation rule¶
Include each of these sections in your rule:
- Define: Information that describes your rule.
- Predicate: The predicate section is a filter that identifies which logs to evaluate, and which logs to ignore.
- Evaluate: The evaluate section sorts or organizes data to be analyzed.
- Analyze: The analyze section defines and searches for the desired pattern in the data sorted by evaluate.
- Trigger: The trigger section defines what happens if there is a positive detection.
To better understand the structure of a window correlation rule, consult this example.
Comments
Include comments in your detection rules so that you and others can understand what each item in the detection rule does. Add comments on separate lines from code, and begin comments with the hash sign #.
Parentheses
Words in parentheses ()
are placeholders to show that there would normally be a value in this space. Correlation rules don't use parentheses.
Define¶
Always include in define
:
Item in the rule | How to include |
---|---|
name | Name the rule. While the name has no impact on the rule's functionality, it should still be a name that's clear and easy for you and others to understand. |
description | Describe the rule briefly and accurately. The description also has no impact on the rule's functionality, but it can help you and others understand what the rule is for. |
type: correlator/window | Include this line as-is. The type does impact the rule's functionality. The rule uses correlator/window to function as a window correlator. |
Predicate¶
The predicate
section is a filter. When you write the predicate
, you use SP-Lang expressions to structure conditions so that the filter "allows in" only logs that are relevant to the activity or pattern that the rule is detecting.
If a log meets the predicate's conditions, it gets analyzed in the next steps of the detection rule, alongside other related logs. If a log doesn't meet the predicate's conditions, the detection rule ignores the log.
See this guide to learn more about writing predicates.
Evaluate¶
Any log that passes through the filter in predicate
gets evaluated in evaluate
. The evaluate
section organizes the data so it can be analyzed. Usually, you can't spot a security threat (or other noteworthy patterns) based on just one event (for example, one failed login attempt), so you need to write detection rules to group events together to find patterns that point to security or operational issues.
The evaluate
section creates an invisible evaluation window - you can think of the window as a table. The table is what the analyze
section uses to detect the activity the detection rule is seeking.
You can see an example of the evaluate
and analyze
sections working together here.
Item in evaluate | How to include |
---|---|
dimension | dimension creates the rows in the table. In the table, the values of the specified fields are grouped into one row (see the table below). |
by | by creates the columns in the table. In most cases, @timestamp is the right choice because window correlation rules are based around time. So, each column in the table is an interval of time, which the resolution specifies. |
resolution | The resolution unit is seconds. Each time interval will be the number of seconds you specify. |
saturation | The saturation field sets how many times the trigger can be activated before the rule stops counting events in a single cell that caused the trigger (see the table below). With a recommended saturation of 1, relevant events that happen within the same specified timeframe (resolution) will stop being counted after one trigger. Setting the saturation to 1 prevents additional triggers for identical behavior in the same timeframe. |
Analyze¶
analyze
uses the table created by the evaluate
section to find out if the activity the detection rule is seeking has happened.
You can see an example of the evaluate
and analyze
sections working together here.
Item in analyze | How to include |
---|---|
window | The window analyzes a specified number of cells in the table created by the evaluate section, each of which represents logs in a specified timeframe. Hopping window: The window will count the values in cells, testing all adjacent combinations of cells to cover the specified time period, with overlap. A hopping window is recommended. Tumbling window: The window counts the values in cells, testing all adjacent combinations of cells to cover the specified time period, WITHOUT overlap. See the note below to learn more about hopping and tumbling windows. |
aggregate, dimension | The aggregate depends on the dimension. Use unique count to ensure that the rule won't count the same value of your specified field in dimension more than once. |
span | A span sets the number of cells in the table that will be analyzed at once. span multiplied by resolution is the timeframe in which the correlation rule looks for a pattern or behavior. (For example, 2*60 is a 2-minute timeframe.) |
test | The !GE expression means "greater than or equal to," and !ARG VALUE refers to the output value of the aggregate function. The value listed under !ARG VALUE is the number of unique occurrences of a value in a single analysis window that will trigger the correlation rule. |
Hopping vs. tumbling windows
This page about tumbling and hopping windows can help you understand the different types of analysis windows.
Trigger¶
After identifying the suspicious activity you specified, the rule can:
- Send the detection to Elasticsearch as a document. Then, you can see the detection as a log in TeskaLabs LogMan.io. You can create your own dashboard to display correlation rule detections, or find the logs in Discover.
- Send a notification via email
Visit the triggers page to learn about setting up triggers to create events, and go to the notifications page to learn about sending messages from detections.
Example of a window correlation detection rule¶
A window correlation rule is a type of detection that can identify combinations of events over time. Before using this example to write your own rule, visit these guidelines to better understand each part of the rule.
Like all detections, write window correlation rules in TeskaLabs SP-Lang.
Jump to: Define | Predicate | Evaluate | Analyze | Trigger
This detection rule is looking for a single external IP trying to access 25 or more unique internal IP addresses in 2 minutes. This activity could indicate an attacker trying to probe the network infrastructure for vulnerabilities.
Note
Any line beginning with a hashtag (#) is a comment, not part of the detection rule. Add notes to your detection rules to help others understand the rules' purpose and function.
The complete detection rule using a window correlation:
define:
name: "Network T1046 Network Service Discovery"
description: "External IP accessing 25+ internal IPs in 2 minutes"
type: correlator/window
predicate:
!AND
- !OR
- !EQ
- !ITEM EVENT event.dataset
- "fortigate"
- !EQ
- !ITEM EVENT event.dataset
- "sophos"
- !OR
- !EQ
- !ITEM EVENT event.action
- "deny"
- !EQ
- !ITEM EVENT event.action
- "drop"
- !IN
what: source.ip
where: !EVENT
- !NOT
what:
!STARTSWITH
what: !ITEM EVENT source.ip
prefix: "193.145"
- !NE
- !ITEM EVENT source.ip
- "8.8.8.8"
- !IN
what: destination.ip
where: !EVENT
evaluate:
dimension: [tenant, source.ip]
by: "@timestamp"
resolution: 60
saturation: 1
analyze:
window: hopping
aggregate: unique count
dimension: destination.ip
span: 2
test:
!GE
- !ARG VALUE
- 25
trigger:
- event:
!DICT
type: "{str:any}"
with:
ecs.version: "1.10.0"
lmio.correlation.depth: 1
lmio.correlation.name: "Network T1046 Network Service Discovery"
# Events
events: !ARG EVENTS
# Threat description
# https://www.elastic.co/guide/en/ecs/master/ecs-threat.html
threat.framework: "MITRE ATT&CK"
threat.software.platforms: "Network"
threat.indicator.sightings: !ARG ANALYZE_RESULT
threat.indicator.confidence: "Medium"
threat.indicator.ip: !ITEM EVENT source.ip
threat.indicator.port: !ITEM EVENT source.port
threat.indicator.type: "ipv4-addr"
threat.tactic.id: "TA0007"
threat.tactic.name: "Discovery"
threat.tactic.reference: "https://attack.mitre.org/tactics/TA0007/"
threat.technique.id: "T1046"
threat.technique.name: "Network Service Discovery"
threat.technique.reference: "https://attack.mitre.org/techniques/T1046/"
# Identification
event.kind: "alert"
event.dataset: "correlation"
source.ip: !ITEM EVENT source.ip
Define¶
define:
name: "Network T1046 Network Service Discovery"
description: "External IP accessing 25+ internal IPs in 2 minutes"
type: correlator/window
Item in the rule | What does it mean? |
---|---|
name: "Network T1046 Network Service Discovery" | This is the name of the rule. The name is for the users and has no impact on the rule itself. |
description: "External IP accessing 25+ internal IPs in 2 minutes" | The description is also for the users. It describes what the rule does, but it has no impact on the rule itself. |
type: correlator/window | The type does impact the rule. The rule uses correlator/window to function as a window correlator. |
Predicate¶
predicate
is the filter that checks if an incoming log might be related to the event that the detection rule is searching for.
The predicate is made of SP-Lang expressions. The expressions create conditions. If the expression is "true," the condition is met. The filter checks the incoming log to see if the log makes the predicate's expressions "true" and therefore meets the conditions.
If a log meets the predicate's conditions, it gets analyzed in the next steps of the detection rule, alongside other related logs. If a log doesn't meet the predicate's conditions, the detection rule ignores the log.
You can find the full SP-Lang documentation here.
SP-Lang terms, in the order they appear in the predicate
Expression | Meaning |
---|---|
!AND |
ALL of the criteria nested under !AND must be met for the !AND to be true |
!OR |
At least ONE of the criteria nested under !OR must be met for the !OR to be true |
!EQ |
"Equal" to. Must be equal to, or match the value, to be true |
!ITEM EVENT |
Gets information from the content of the incoming logs (accesses the fields and values in the incoming logs) |
!IN |
Looks for a value in a set of values (what in where ) |
!NOT |
Seeks the opposite of the expression nested under the !NOT (following what ) |
!STARTSWITH |
The value of the field (what ) must start with the specified text (prefix ) to be true |
!NE |
"Not equal" to, or doesn't equal. Must NOT equal (must not match the value) to be true |
You can see that there are several expressions nested under !AND
. A log must meet ALL of the conditions nested under !AND
to pass through the filter.
As seen in rule | What does it mean? |
---|---|
The first !OR (on event.dataset) |
This is the first !OR expression, and it has two !EQ expressions nested under it, so at least ONE !EQ condition nested under this !OR must be true. Remember, !ITEM EVENT gets the value of the field it specifies. If the incoming log has "fortigate" OR "sophos" in the field event.dataset , then the log meets the !OR condition.
This filter accepts events only from the FortiGate and Sophos data sources. FortiGate and Sophos provide security tools such as firewalls, so this rule is looking for events generated by security tools that might already be intercepting suspicious activity. |
The second !OR (on event.action) |
This condition is structured the same way as the previous one. If the incoming log has the value "deny" OR "drop" in the field event.action , then the log meets this !OR condition.
The values "deny" and "drop" in a log both signal that a security device, such as a firewall, blocked attempted access based on authorization or security policies. |
!IN on source.ip |
If the field source.ip exists in the incoming log (!EVENT ), then the log meets this !IN condition.
The field source.ip is the IP address that is trying to gain access to another IP address. Since this rule is specifically about IP addresses, the log needs to have the source IP address in it to be relevant.
|
!NOT with !STARTSWITH on source.ip |
If the value of the field source.ip DOES NOT begin with "193.145," then this !NOT expression is true. 193.145 is the beginning of this network's internal IP addresses, so the !NOT expression filters out internal IP addresses. This is because internal IPs accessing many other internal IPs in a short timeframe would not be suspicious. If internal IPs were not filtered out, the rule would return false positives.
|
!NE on source.ip |
If the incoming log DOES NOT have the value "8.8.8.8" in the field source.ip , then the log meets this !NE condition.
The rule filters out 8.8.8.8 as a source IP address because it is a well-known and trusted DNS resolver operated by Google. 8.8.8.8 is not generally associated with malicious activity, so not excluding it would trigger false positives in the rule. |
!IN on destination.ip |
If the field destination.ip exists in the incoming log, then the log meets this !IN condition.
The field destination.ip is the IP address that is being accessed. Since this rule is specifically about IP addresses, the log needs to have the destination IP address in it to be relevant.
|
If an incoming log meets EVERY condition shown above (nested under !AND
), then the log gets evaluated and analyzed in the next sections of the detection rule.
Evaluate¶
Any log that passes through the filter in predicate
gets evaluated in evaluate
. The evaluate
section organizes the data so it can be analyzed. Usually, you can't spot a security threat (or other noteworthy patterns) based on just one event (for example, one failed login attempt), so the detection rule groups events together to find patterns that point to security or operational issues.
The evaluate
section creates an invisible evaluation window - you can think of the window as a table. The table is what the analyze
section uses to detect the event the detection rule is seeking.
evaluate:
dimension: [tenant, source.ip]
by: "@timestamp"
resolution: 60
saturation: 1
As seen in rule | What does it mean? |
---|---|
dimension: [tenant, source.ip] | dimension creates the rows in the table. The rows are tenant and source.ip. In the final table, the values of tenant and source.ip are grouped into one row (see the table below). |
by: "@timestamp" | by creates the columns in the table. It refers to the field @timestamp because the values from that field enable the rule to compare the events over time. So, each column is an interval of time, which the resolution specifies. |
resolution: 60 | The resolution unit is seconds, so the value here is 60 seconds. Each time interval will be 60 seconds long. |
saturation: 1 | The saturation field sets how many times the trigger can be activated before the rule stops counting events in a single cell that caused the trigger (see the table below). Since the saturation is 1, this means that relevant events that happen within one minute of each other will stop being counted after one trigger. Setting the saturation to 1 prevents additional triggers for identical behavior in the same timeframe. In this example, the trigger would be activated only once if an external IP address tried to access any number of unique internal IPs above 25. |
This is an example of how the evaluate
section sorts logs that pass through the predicate
filter. (Click the table to enlarge.) The log data is heavily simplified for the sake of readability (for example, log IDs in the field _id
are letters rather than real log IDs, and the timestamps are shortened).
As specified by the dimension
field, the logs are grouped by tenant and source IP address, as you can see in cells A2-A5.
Since by
has the value timestamp
, and the resolution
is set to 60 seconds, the cells B1-E1 are time intervals, and the logs are sorted into the columns by their timestamp
value.
The number beside the list of log IDs in each cell (for example, 14 in cell C4) is the count of how many logs with the same source IP address passed through the filter in that timeframe. This becomes essential information in the analyze
section of the rule, since we're counting access attempts by external IPs.
Analyze¶
analyze
uses the table created by the evaluate
section to find out if the event the detection rule is seeking has happened.
analyze:
window: hopping
aggregate: unique count
dimension: destination.ip
span: 2
test:
!GE
- !ARG VALUE
- 25
As seen in rule | What does it mean? |
---|---|
window: hopping | The window type is hopping. The window analyzes a specified number of cells in the table created by the evaluate section, each of which represents logs in a timeframe of 60 seconds. Since the type is hopping, the window will count some cells twice to test any adjacent combination of a two-minute time period. Since the span is set to 2, the rule will analyze two minutes (cells) at a time, with overlap. |
aggregate: unique count, dimension: destination.ip | The aggregate depends on the dimension. Here, unique count applies to destination.ip. This ensures that the rule won't count the same destination IP address more than once. |
span: 2 | A span of 2 means that the cells in the table will be analyzed 2 at a time. |
test: !GE, !ARG VALUE, 25 | The !GE expression means "greater than or equal to," and !ARG VALUE refers to the output value of the aggregate function. The value 25 is listed under !ARG VALUE, so this whole test expression is testing for 25 or more unique destination IP addresses in a single analysis window. |
The red window around cells C4 and D4 shows that the rule has detected what it's looking for - attempted connection to 25 unique IP addresses.
Analysis with a hopping window explained in a gif
This illustrates how the window analyzes the data two cells at a time. When the window gets to cells C4 and D4, it detects 25 unique destination IP addresses.
Trigger¶
The trigger
section defines what happens if the analyze
section detects the event that the detection rule is looking for. In this case, the trigger is activated when a single external IP address attempts to connect to 25 or more different internal IP addresses.
As seen in rule | What does it mean? |
---|---|
event | In the trigger, event means that the rule will create an event based on this positive detection and send it into the data pipeline via Elasticsearch, where it is stored as a document. Then, the event comes through to TeskaLabs LogMan.io, where you can see this event in Discover and in dashboards. |
!DICT | !DICT creates a dictionary of keys (fields) and values. The type is "{str:any}", which means the keys are strings and the values can be of any type (numbers, words, etc.). with begins the list of key-value pairs, which you define. These are the fields and values that the event will be made of. |
To learn more about each field, click the icons. Since TeskaLabs LogMan.io uses Elasticsearch and the Elastic Common Schema (ECS), you can get more details about many of these fields in the ECS reference guide.
trigger:
- event:
!DICT
type: "{str:any}"
with:
ecs.version: "1.10.0" #(1)
lmio.correlation.depth: 1 #(2)
lmio.correlation.name: "Network T1046 Network Service Discovery" #(3)
# Events
events: !ARG EVENTS #(4)
# Threat description
# https://www.elastic.co/guide/en/ecs/master/ecs-threat.html
threat.framework: "MITRE ATT&CK" #(5)
threat.software.platforms: "Network" #(6)
threat.indicator.sightings: !ARG ANALYZE_RESULT #(7)
threat.indicator.confidence: "Medium" #(8)
threat.indicator.ip: !ITEM EVENT source.ip #(9)
threat.indicator.port: !ITEM EVENT source.port #(10)
threat.indicator.type: "ipv4-addr" #(11)
threat.tactic.id: "TA0007" #(12)
threat.tactic.name: "Discovery" #(13)
threat.tactic.reference: "https://attack.mitre.org/tactics/TA0007/" #(14)
threat.technique.id: "T1046" #(15)
threat.technique.name: "Network Service Discovery" #(16)
threat.technique.reference: "https://attack.mitre.org/techniques/T1046/" #(17)
# Identification
event.kind: "alert" #(18)
event.dataset: "correlation" #(19)
source.ip: !ITEM EVENT source.ip #(20)
- The version of the Elastic Common Schema that this event conforms to - required field that must exist in all events going to Elasticsearch.
- The correlation depth indicates if this rule depends on any other rules or is in a chain of rules. The value 1 means that it is either the first in a chain, or the only rule involved - it doesn't depend on any other rules.
- The name of the rule
- In SP-Lang, !ARG EVENTS accesses the original logs. So, this will list the IDs of all of the events that make up this positive detection, so that you can investigate each log individually.
- Name of the threat framework used to further categorize and classify the tactic and technique of the reported threat. See ECS reference.
- The platforms of the software used by this threat to conduct behavior commonly modeled using MITRE ATT&CK®. See ECS reference.
- Number of times this indicator was observed conducting threat activity. See ECS reference.
- Identifies the vendor-neutral confidence rating using the None/Low/Medium/High scale defined in Appendix A of the STIX 2.1 framework. See ECS reference.
- Identifies a threat indicator as an IP address (irrespective of direction). See ECS reference.
- Identifies a threat indicator as a port number (irrespective of direction). See ECS reference.
- Type of indicator as represented by Cyber Observable in STIX 2.0. See ECS reference.
- The id of tactic used by this threat. See ECS reference.
- The name of the type of the tactic used by this threat. See ECS reference.
- The reference url of tactic used by this threat. See ECS reference.
- The id of technique used by this threat. See ECS reference.
- The name of technique used by this threat. See ECS reference.
- The reference url of technique used by this threat. See ECS reference.
- The type of event
- The dataset that this event will be grouped in.
- The source IP address associated with this event (the one that tried to access 25 internal IPs in two minutes)
Ended: Detections
Predicates¶
A predicate
is a filter made of conditions formed by SP-Lang expressions.
How to write predicates¶
Before you can create a filter, you need to know the possible fields and values of the logs you are looking for. To see what fields and values your logs have, go to Discover in the TeskaLabs LogMan.io web app.
SP-Lang expressions¶
Construct conditions for the filter using SP-Lang expressions. The filter checks the incoming log to see if the log makes the expressions "true" and therefore meets the conditions.
You can find the full SP-Lang documentation here.
Common SP-Lang expressions:
Expression | Meaning |
---|---|
!AND |
ALL of the criteria nested under !AND must be met for the !AND to be true |
!OR |
At least ONE of the criteria nested under !OR must be met for the !OR to be true |
!EQ |
"Equal" to. Must be equal to, or match the value, to be true |
!NE |
"Not equal" to, or doesn't equal. Must NOT equal (must not match the value) to be true |
!IN |
Looks for a value in a set of values (what in where ) |
!STARTSWITH |
The value of the field (what ) must start with the specified text (prefix ) to be true |
!ENDSWITH |
The value of the field (what ) must end with the specified text (postfix ) to be true |
!ITEM EVENT |
Gets information from the content of the incoming logs (allows the filter to access the fields and values in the incoming logs) |
!NOT |
Seeks the opposite of the expression nested under the !NOT (following what ) |
Conditions¶
Use this guide to structure your individual conditions correctly.
Parentheses
Words in parentheses ()
are placeholders to show where values go. SP-Lang does not use parentheses.
Filter for a log that: | SP-Lang |
---|---|
Has a specified value in a specified field |
|
Has a specified field |
|
Does NOT have a specified value in a specified field |
|
Has one of multiple possible values in a field |
|
Has a specified value that begins with a specified number or text (prefix), in a specified field |
|
Has a specified value that ends with a specified number or text (postfix), in a specified field |
|
Does NOT satisfy a condition or set of conditions |
|
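Based on the expressions listed above and on the example below, the conditions in this table can be written roughly as follows; field.name, the quoted values, and the prefix/postfix texts are placeholders:
# Has a specified value in a specified field
!EQ
- !ITEM EVENT field.name
- "value"

# Has a specified field
!IN
  what: field.name
  where: !EVENT

# Does NOT have a specified value in a specified field
!NE
- !ITEM EVENT field.name
- "value"

# Has one of multiple possible values in a field
!OR
- !EQ
  - !ITEM EVENT field.name
  - "value1"
- !EQ
  - !ITEM EVENT field.name
  - "value2"

# Value in the field begins with a specified prefix
!STARTSWITH
  what: !ITEM EVENT field.name
  prefix: "text"

# Value in the field ends with a specified postfix
!ENDSWITH
  what: !ITEM EVENT field.name
  postfix: "text"

# Does NOT satisfy a condition (here: the prefix condition)
!NOT
  what:
    !STARTSWITH
      what: !ITEM EVENT field.name
      prefix: "text"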
Example¶
To learn what each expression means in the context of this example, click the icons.
!AND #(1)
- !OR #(2)
- !EQ
- !ITEM EVENT event.dataset
- "sophos"
- !EQ
- !ITEM EVENT event.dataset
- "vmware-vcenter"
- !OR #(3)
- !EQ
- !ITEM EVENT event.action
- "Authentication failed"
- !EQ
- !ITEM EVENT event.action
- "failed password"
- !EQ
- !ITEM EVENT event.action
- "unsuccessful login"
- !OR #(4)
- !IN
what: source.ip
where: !EVENT
- !IN
what: user.id
where: !EVENT
- !NOT #(5)
what:
!STARTSWITH
what: !ITEM EVENT user.id
prefix: "harry"
- Every expression nested under !AND must be true for a log to pass through this filter.
- In the log, in the field event.dataset, the value must be sophos or vmware-vcenter for this !OR to be true.
- In the log, in the field event.action, the value must be Authentication failed, failed password, or unsuccessful login for this !OR to be true.
- The log must contain the field source.ip or the field user.id for this !OR to be true.
- In the log, the field user.id must not begin with harry for this !NOT to be true.
This filters for logs that:
- Have the value sophos or vmware-vcenter in the field event.dataset AND
- Have the value Authentication failed, failed password, or unsuccessful login in the field event.action AND
- Include at least one of the fields source.ip or user.id AND
- Do not have a value that begins with harry in the field user.id
For more ideas and formatting tips, see this example in the context of a detection rule, including details about the predicate
section.
Triggers¶
A trigger, in an alert or detection, executes an action. For example, in a detection, the trigger
section can send an email when the specified activity is detected.
A trigger can:
- Trigger an event: Send an event to Elasticsearch where it is stored as a document. Then, you can see the event as a log in the TeskaLabs LogMan.io app. You can create your own dashboard to display correlation rule detections, or find the logs in Discover.
- Trigger a notification: Send a message via email
Trigger an event¶
You can trigger an event. The end result is that the trigger creates a log of the event, which you can see in TeskaLabs LogMan.io.
Item in trigger | How to include |
---|---|
event | In the trigger, event means that the rule will create an event based on this positive detection and send it into the data pipeline via Elasticsearch, where it is stored as a document. Then, the event comes through to TeskaLabs LogMan.io, where you can see this event in Discover and Dashboards. |
!DICT | !DICT creates a dictionary of keys (fields) and values. The type is "{str:any}", which means the keys are strings and the values can be of any type (numbers, words, etc.). with begins the list of key-value pairs, which you define. These are the fields and values that the event will be made of. |
Following with
, make a list of the key-value pairs, or fields and values, that you want in the event.
!DICT
type: "{str:any}"
with:
key.1: "value"
key.2: "value"
key.3: "value"
key.4: "value"
If you're using Elasticsearch and therefore the Elastic Common Schema (ECS), you can read about standard fields in the ECS reference guide.
Trigger a notification¶
Notifications send messages. Currently, you can use notifications to send emails.
Learn more about writing notifications and creating email templates.
Notifications ↵
Notifications¶
Notifications send messages. You can add a notification
section anywhere that you want the output of a trigger
to be a message, such as in an alert or detection. In a detection, the notification
section can send a message when the specified activity (such as a potential threat) is detected.
TeskaLabs LogMan.io uses TeskaLabs ASAB Iris, a TeskaLabs microservice, to send messages.
Warning
To avoid notification spam, only use notifications for highly urgent and well-tested detection rules. Some detections are better suited to be sent as events through Elasticsearch and viewed in the LogMan.io web app.
Notification types¶
Currently, you can send messages via email.
Sending notifications via email¶
Write notifications in TeskaLabs SP-Lang. If you're writing a notification for a detection, write the email notification in the trigger
section.
Important
For notifications that send emails, you need to create an email template in the Library to connect with. This template includes the actual text that the recipient will see, with blank fields that change based on what the detected activity is (using Jinja templating), including which logs are involved in the detection, and any other information you choose. The notification section in the detection rule is what populates the blank fields in the email template. You can use a single email template for multiple detection rules.
Example:
Use this example as a guide. Click the icons to learn what each line means.
trigger: #(1)
- notification: #(2)
type: email #(3)
template: "/Templates/Email/Notification.md" #(4)
to: [email@example.com] #(5)
variables: #(6)
!DICT #(7)
type: "{str:any}" #(8)
with: #(9)
name: Notification from the detection X #(10)
events: !ARG EVENTS #(11)
address: !ITEM EVENT client.address #(12)
description: Detection of X by TeskaLabs LogMan.io #(13)
-
Indicates the beginning of the
trigger
section. -
Indicates the beginning of the
notification
section. -
To send an email, write email for
type
. -
This tells the notification where to get the email template from. You need to specify the filepath (or location) of the email template in the Library. In this example, the template is in the Library, in the Templates folder, in the Email subfolder, and it’s called Notification.md.
-
Write the email address where you want the email to go.
-
Begins the section that gives directions for how to fill the blank fields from the email template.
-
An SP-Lang expression that creates a dictionary so you can use key-value pairs in the notification. (The key is the first word, and the value is what follows.) Always include
!DICT
. -
Always make type "{str:any}" so that the values in the key-value pairs can be in any format (numbers, words, arrays, etc.).
-
Always include
with
, because it begins the list of fields from the email template. Everything nested underwith
is a field from the email template. -
The name of the detection rule, which should be understandable to the recipient
-
events
is the key, or field name, and!ARG EVENTS
is an SP-Lang expression that lists the logs that caused a positive detection from the detection rule. -
address
is the key, or field name, and!ITEM EVENT client.address
gets the value of the fieldclient.address
from each log that caused a positive detection from the detection rule. -
Your description of the event, which needs to be very clear and accurate
Populating the email template
name
, events
, address
, and description
are fields in the email template in this example. Always make sure that the keys you write in the with
section match the fields in your email template.
The fields name
and description
are static text values - they stay the same in every notification.
The fields events
and address
are dynamic values - they change based on which logs caused a positive detection from the detection rule. You can write dynamic fields using TeskaLabs SP-Lang.
Refer to our directions for creating email templates to write templates that work correctly as notifications.
Creating email templates¶
An email template is a document that works with a notification
to send an email, for example as a result of a positive detection in a detection rule. Jinja template fields allow the email template to have dynamic values that change based on variables such as events involved in a positive detection. (After you learn about creating email templates, learn how to use Jinja template fields.)
The email template provides the text that the recipient sees when they get an email from the notification. You can find email templates in your Library in the Templates folder.
When you write an email template to go with a notification, make sure that the template fields in each item match.
How do the notification and email template work together?
TeskaLabs ASAB Iris is a message-sending microservice that pairs the notification and the email template to send emails with populated placeholder fields.
Creating an email template¶
Create a new blank email template
- In the Library, click Templates, then click Email.
- To the right, click Create new item in Email.
- Name your template, choose the file type, and click Create. If the new item doesn't appear immediately, refresh the page.
- Now, you can write the template.
Copy an existing email template
- In the Library, click Templates, then click Email.
- Click on the existing template you'd like to copy. The copy you create will be placed in the same folder as the original.
- Click the icon at the top of the screen, and click Copy.
- Rename the file, choose the file type, and click Copy. If the new item doesn't appear immediately, refresh the page.
- Click Edit to make changes, and click Save to save your changes.
To exit editing mode, save by clicking Save or cancel by clicking Cancel.
Writing an email template¶
You can write email templates in Markdown or in HTML. Markdown is less complex, but HTML gives you more formatting options.
When you write the text, make sure to tell the recipient:
- Who the email is from
- Why they are receiving this email
- What the email/alert means
- How to investigate or follow up on the problem - include all of the relevant and useful information, such as log IDs or direct links to view selected logs
Simple template example using Markdown:
SUBJECT: {{ name }}
TeskaLabs LogMan.io has identified a noteworthy event in your IT infrastructure which might require your immediate attention.
Please review following summary of the event:
Event: {{name}}
Event description: {{description}}
This notification has been created based on the original log/logs:
{% for event in events %}
- {{event}}
{% endfor %}
The notification was generated for this address: {{address}}
We encourage you to review this incident promptly to determine the next appropriate course of action.
Remember, the effectiveness of any security program lies in a swift response.
Thank you for your attention to this matter.
Stay safe,
TeskaLabs LogMan.io
Made with <3 by [TeskaLabs](https://teskalabs.com)
The words in double braces (such as {{address}}
) are template fields, or placeholders. These are the Jinja template fields that pull information from the notification
section in a detection rule. Learn about Jinja templates here.
Testing an email template¶
You can test an email template using the Test template feature. Testing an email template means sending a real email to see if the format and fields are displaying correctly. This test does not interact with the detection rule at all.
Fill out the From, To, CC, BCC, and Subject fields the same way you would for any email (but it's best practice to send the email to yourself). You must always fill in, at minimum, the From and To fields.
Test parameters¶
You can populate the Jinja template fields for testing purposes using the Parameters tool. Write the parameters in JSON. JSON uses key-value pairs. Keys are the fields in the template, and values are what populate the fields.
In this example, the keys and values are highlighted to show that the keys in Parameters need to match the fields in the template, and the values will populate the fields in the resulting email:
Parameters has two editing modes: the text editor and the clickable JSON editor. To switch between modes, click the <···> icon or the text-editor icon beside Parameters. You can switch between modes without losing your work.
Clickable editor¶
To switch to the clickable JSON editor, click the <···> icon beside Parameters. The clickable editor formats your parameters for you and tells you the value type for each item.
How to use the clickable editor:
In the clickable editor, edit, delete, and add icons appear when you hover over lines and items.
1. Add a key: When you hover over the top line (it says the number of keys you have, for example 0 items), an add icon appears. To add a parameter, click the add icon. It prompts you for the key name. Type the key name (the field name you want to test) and save. Don't use quotation marks - the editor adds the quotation marks for you. The key name appears with the value NULL beside it.
2. Add a value: To edit the value, click the edit icon that appears when you hover beside NULL. Type the value (what you want to appear in place of the field/placeholder in the email you send), and save.
3. To add more key-value pairs, click the add icon that appears when you hover over the top line.
4. To delete an item, click the delete icon that appears when you hover over the item. To edit an item, click the edit icon that appears when you hover over the item.
Text editor¶
To switch to the text editor, click the icon beside Parameters.
Example of parameter formatting:
{
"name":"Detection rule 1",
"description":"Description of Detection rule 1",
"events":["log-ID-1", "log-ID-2", "log-ID-3"],
"address":"Example address"
}
Quick JSON tips
- Begin and end the parameters with braces (curly brackets)
{}
- Write every item, both keys and values, in quotation marks
""
- Link keys to their values with a colon
:
(for example:"key":"value"
) - Separate key-value pairs with commas
,
. You can also use spaces and line breaks for your own readability - they'll be ignored in terms of function. - Type arrays in brackets
[]
and separate items with commas (the keyevents
might have multiple values, as the Jinjafor
expression allows for, so here it's written as an array)
The testing box tells you if the parameters are not in a valid JSON format.
Switching modes¶
You can switch modes and continue editing your parameters. The Parameters tool will automatically convert your work for the new mode.
Note about arrays
An array is a list of multiple values. To edit an array value in the clickable editor, you need to type at least two values manually in the text editor in the correct array format (see Quick JSON tips above). Then, you can switch to the clickable editor and add more items to the array.
Sending the test email¶
When you're ready to test the email, click Send. You should receive the email in the inbox of the addressee in the To field, where you can check the formatting of the template. If you don't see the email, check your spam folder.
Jinja templating¶
The notification
section of a detection rule works with an email template to send a message when the detection rule is triggered. The email template has placeholder fields, and the notification determines what fills those placeholder fields in the actual email that the recipient gets. This is possible because of Jinja templating. (Learn about writing email templates before you learn about Jinja fields.)
Format¶
Format all Jinja template fields with two braces (curly brackets) on each side of the field name in both Markdown and HTML email templates. You can use or not use a space on either side of the field name.
{{fieldname}}
OR {{ fieldname }}
For a more in-depth explanation of Jinja templating, visit this tutorial.
if
expression¶
You might want to use the same email template for multiple detection rules. Since different detection rules might have different data included, some parts of your email might only be relevant for some detection rules. You can use if
to include a section only if a certain key in the notification template has a value. This helps you avoid unpopulated template fields or nonsensical text in an email.
In this example, anything between if
and endif
is only included in the email if the key sender
has a value in the notification section of the detection rule. (If there is no value for sender
, this section won't appear in the email.)
{% if sender %}
The email address {{ sender }} has sent a suspicious number of emails.
{% endif %}
For more details, visit this tutorial.
for
expression¶
Use for
when you might have multiple values from the same category that you want to appear as a list in your email.
In this example, events
is the actual template field that you'd see in the notification, and it might contain multiple values (in this case, multiple log IDs). Here, log
is just a temporary variable used only in this for
expression to represent one value that the notification sends from the field events
. (This temporary variable could be any word, as it refers only to itself in the email template.) The for
expression allows the template to display these multiple values as a bulleted list (multiple instances).
{% for log in events %}
- {{ log }}
{% endfor %}
For more details, visit this tutorial.
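For illustration, below is a minimal sketch of a Markdown email template that combines plain placeholder fields, the if expression, and the for expression. It assumes the parameter keys used in the testing example above (name, description, events, address); adjust the field names to match your own notification parameters.
# {{ name }}
{{ description }}
{% if address %}
Reported address: {{ address }}
{% endif %}
Related logs:
{% for log in events %}
- {{ log }}
{% endfor %}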
Link templating¶
Thanks to TeskaLabs ASAB Iris, you can include links in your emails that change based on tenant or events detected by the rule.
Link to a tenant's home page:
{{lmio_url}}/?tenant={{tenant}}#/
The key tenant must be available in your detection rule notification section for the link to work.
Link to a specific log:
[{{event}}]({{lmio_url}}/?tenant={{tenant}}#/discover/lmio-{{tenant}}-events?aggby=minute&filter={{event}}&ts=now-2d&te=now&refresh=off&size=40)
The keys tenant and lmio_url must be available in your detection rule notification section for the link to work.
Using Base64 images in HTML email templates¶
To hardcode an image into an email template written in HTML, use Base64. Converting an image to Base64 makes the image into a long string of text.
- Use an image converting tool (such as this one by Atatus) to convert your image to Base64.
- Using the image <img> tag with alt text (the alt attribute), copy and paste the Base64 string into your template like this:
<img alt="ALT TEXT HERE" src="PASTE HERE"/>
Note
The alt text is optional, but it is recommended in case your image doesn't load for any reason.
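If you prefer the command line over an online converter, the standard base64 utility on Linux (GNU coreutils) produces the same string; this is a sketch only, and logo.png is just an example file name:
base64 -w 0 logo.png
Paste the resulting string into the src attribute as a data URI, for example src="data:image/png;base64,<output>".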
Administration Manual ↵
TeskaLabs LogMan.io Administration Manual¶
Welcome to the Administration Manual. Use this guide to set up and configure LogMan.io for yourself or clients.
Installation¶
TeskaLabs LogMan.io can be installed manually on compute resources. Compute resources include physical servers, virtual servers, private and public cloud compute/VM instances, and so on.
Danger
TeskaLabs LogMan.io CANNOT BE operated under the root user (superuser). Violation of this rule may lead to significant cybersecurity risks.
Prerequisites¶
- Hardware (physical or virtualized server)
- OS Linux: Ubuntu 22.04 LTS and 20.04 LTS, RedHat 8 and 7, CentOS 7 and 8 (for others, kindly contact our support)
- Network connectivity with enabled outgoing access to the Internet (could be restricted after the installation); details are described here
- Credentials to SMTP server for outgoing emails
- DNS domain, even internal (needed for HTTPS setup)
- Credentials to "docker.teskalabs.com" (contact our support if you don't have one)
From Bare Metal server to the Operating system¶
Note
Skip this section if you are installing on a virtual machine, or on a host with the operating system already installed.
Prerequisites¶
- A server that conforms to the prescribed data storage organisation.
- Bootable USB stick with Ubuntu Server 22.04 LTS; the most recent release.
- Access to the server equipped with a monitor and a keyboard; alternatively over IPMI or equivalent Out-of-band management.
- Network connectivity with enabled outgoing access to the Internet.
Note
These are additional prerequisites on top of the general prerequisites from above.
Steps¶
1) Boot the server using a bootable USB stick with Ubuntu Server.
Insert the bootable USB stick into the USB port of the server, then power on the server.
Use UEFI partition on the USB stick as a boot device.
Select "Try or Install Ubuntu Server" in a boot menu.
2) Select "English" as the language
3) Update to the new installer if needed
4) Select the English keyboard layout
5) Select the "Ubuntu Server" installation type
6) Configure the network connection
This is the network configuration for installation purposes; the final network configuration can be different.
If you are using a DHCP server, the network configuration is automatic.
IMPORTANT: Internet connectivity must be available.
Note the IP address of the server for a future use.
7) Skip or configure the proxy server
Skip (press "Done") the proxy server configuration.
8) Confirm selected mirror address
Confirm the selected mirror address by pressing "Done".
9) Select "Custom storage layout"
The custom storage layout of the system storage is as follows:
Mount | Size | FS | Part. | RAID / Part. | VG / LV |
---|---|---|---|---|---|
/boot/efi | 1G | fat32 | 1 | | |
SWAP | 64G | swap | 2 | | |
/boot | 2G | ext4 | 3 | md0 / 1 | |
/ | 50G | ext4 | 3 | md0 / 2 | systemvg / rootlv |
/var/log | 50G | ext4 | 3 | md0 / 2 | systemvg / loglv |
Unused | >100G | | 3 | md0 / 2 | systemvg |
Legend:
- FS: Filesystem
- Part.: GUID Partition
- RAID / Part.: MD RAID volume and a partition on the given RAID volume
- VG: LVM Volume Group
- LV: LVM Logical Volume
Note
Unused space will be used later in the installation, e.g. for Docker containers.
10) Identify two system storage drives
The two system storage drives are structured symmetrically to provide redundancy in case of one system drive failure.
Note
The fast and slow storage is NOT configured here during the OS installation but later from the installed OS.
11) Set the first system storage as a primary boot device
This step will create the first GPT partition with UEFI, which is mounted at /boot/efi
.
The size of this partition is approximately 1GB.
12) Set the second system storage as a secondary boot device
Another UEFI partition is created on the second system storage.
13) Create SWAP partitions on both system storage drives
On each of two drives, add a GPT partition with size 64G and format swap
.
Select "free space" on respective system storage drive and then "Add GPT Partition"
Resulting layout is as follows:
14) Create the GPT partition for RAID1 on both system storage drives
On each of the two drives, add a GPT partition using all the remaining free space. The format is "Leave unformatted" because this partition will be added to the RAID1 array. You can leave "Size" blank to use all the remaining space on the device.
The result is a "partition" entry instead of the "free space" entry on the respective drives.
15) Create software RAID1
Select "Create software RAID (md)".
The name of the array is md0
(default).
RAID level is "1 (mirrored)".
Select two partitions from the above step, keep them marked as "active", and press "Create".
The layout of the system storage drives is then as follows:
16) Create a BOOT partition of the RAID1
Add a GPT partition onto the md0
RAID1 from the step above.
The size is 2G, format is ext4
and the mount is /boot
.
17) Setup LVM partition on the RAID1
The remaining space on the RAID1 will be managed by LVM.
Add a GPT partition onto the md0
RAID1, using "free space" entry under md0
device.
Use the maximum available space and set the format to "Leave unformatted". You can leave “Size” blank to use all the remaining space on the device.
18) Setup LVM system volume group
Select "Create volume group (LVM)".
The name of the volume group is systemvg
.
Select the available partition on the md0
that has been created above.
19) Create a root logical volume
Add a logical volume named rootlv
on the systemvg
(in "free space" entry), the size is 50G, format is ext4
and mount is /
.
20) Add a dedicated logical volume for system logs
Add a logical volume named loglv
on the systemvg
, the size is 50G, format is ext4
and mount is "Other" and /var/log
.
21) Confirm the layout of the system storage drives
Press "Done" on the bottom of the screen and eventually "Continue" to confirm application of actions on the system storage drives.
22) Profile setup
Your name: TeskaLabs Admin
Your server's name: lm01
(for example)
Pick a username: tladmin
Select a temporary password; it will be removed at the end of the installation.
23) SSH Setup
Select "Install OpenSSH server"
24) Skip the server snaps
Press "Done", no server snaps will be installed from this screen.
25) Wait till the server is installed
It takes approximately 10 minutes.
When the installation is finished, including security updates, select "Reboot Now".
26) When prompted, remove USB stick from the server
Press "Enter" to continue reboot process.
Note
You can skip this step if you are installing over IPMI.
27) Boot the server into the installed OS
Select "Ubuntu" in the GRUB screen or just wait 30 seconds.
28) Login as tladmin
29) Update the operating system
sudo apt update
sudo apt upgrade
sudo apt autoremove
30) Configure the slow data storage
Slow data storage (HDD) is mounted at /data/hdd
.
Assuming the server provides the following disk devices: /dev/sdc, /dev/sdd, /dev/sde, /dev/sdf, /dev/sdg and /dev/sdh.
Create software RAID5 array at /dev/md1
with ext4
filesystem, mounted at /data/hdd
.
sudo mdadm --create /dev/md1 --level=5 --raid-devices=6 /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
Note
For the RAID6 array, use --level=6
.
Create an EXT4 filesystem and the mount point:
sudo mkfs.ext4 -L data-hdd /dev/md1
sudo mkdir -p /data/hdd
Add the following line to /etc/fstab
:
/dev/disk/by-label/data-hdd /data/hdd ext4 defaults,noatime 0 1
Danger
The noatime
flag is important for optimal storage performance.
Mount the drive:
sudo mount /data/hdd
Note
The RAID array construction can take a substantial amount of time. You can monitor the progress with cat /proc/mdstat
. Server reboots are safe during RAID array construction.
You can speed up the construction by increasing speed limits:
sudo sysctl -w dev.raid.speed_limit_min=5000000
sudo sysctl -w dev.raid.speed_limit_max=50000000
These speed limit settings will last till the next reboot.
31) Configure the fast data storage
Fast data storage (SSD) is mounted at /data/ssd
.
Assuming the server provides the following disk devices: /dev/nvme0n1 and /dev/nvme1n1.
Create software RAID1 array at /dev/md2
with ext4
filesystem, mounted at /data/ssd
.
sudo mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1
sudo mkfs.ext4 -L data-ssd /dev/md2
sudo mkdir -p /data/ssd
Add the following line to /etc/fstab
:
/dev/disk/by-label/data-ssd /data/ssd ext4 defaults,noatime 0 1
Danger
The noatime
flag is important for optimal storage performance.
Mount the drive:
sudo mount /data/ssd
32) Persist the RAID array configuration
Run:
sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf
The example of the output:
ARRAY /dev/md/2 metadata=1.2 name=lmd01:2 UUID=5ac64642:51677d00:20c5b5f9:7de93474
ARRAY /dev/md/1 metadata=1.2 name=lmd01:1 UUID=8b0c0872:b8c08564:1815e508:a3753449
Update the init ramdisk:
sudo update-initramfs -u
33) Disable periodic check of RAID
sudo systemctl disable mdcheck_continue
sudo systemctl disable mdcheck_start
34) Installation of the OS is completed
Reboot the server to verify the correctness of the OS installation.
sudo reboot
From the Operating system to the Docker¶
Prerequisites¶
- Running server with installed operating system.
- Access to the server over SSH; the user is tladmin with permission to execute sudo.
- Slow storage mounted at /data/hdd.
- Fast storage mounted at /data/ssd.
Steps¶
1) Log in to the server over SSH as the user tladmin
ssh tladmin@<ip-of-the-server>
2) Configure SSH access
Install public SSH key(s) for tladmin
user:
cat > /home/tladmin/.ssh/authorized_keys
Restrict the access:
sudo vi /etc/ssh/sshd_config
Changes in sshd_config:
PermitRootLogin no
PubkeyAuthentication yes
PasswordAuthentication no
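A sketch of the follow-up steps, assuming a standard Ubuntu OpenSSH installation: make sure the authorized_keys file has restrictive permissions and restart the SSH daemon for the configuration changes to take effect. Keep your current session open and verify the key-based login from a second terminal before logging out.
chmod 600 /home/tladmin/.ssh/authorized_keys
sudo systemctl restart ssh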
3) Configure Linux kernel parameters
Write the following content into the file /etc/sysctl.d/01-logman-io.conf:
vm.max_map_count=262144
net.ipv4.ip_unprivileged_port_start=80
The parameter vm.max_map_count increases the maximum number of mmaps in the Virtual Memory subsystem of Linux. It is needed for ElasticSearch.
The parameter net.ipv4.ip_unprivileged_port_start enables unprivileged processes to listen on port 80 (and higher). This allows NGINX to listen on this port without requiring elevated privileges.
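The reboot in step 7 applies these parameters. If you want to apply them immediately without waiting for the reboot, you can load the file with sysctl (standard procps behaviour):
sudo sysctl -p /etc/sysctl.d/01-logman-io.conf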
4) Install Docker
Docker is necessary for the deployment of all LogMan.io microservices in containers, namely Apache Kafka, ElasticSearch, NGINX, individual streaming pumps, and so on.
Create dockerlv
logical volume with EXT4 filesystem:
sudo lvcreate -L100G -n dockerlv systemvg
sudo mkfs.ext4 /dev/systemvg/dockerlv
sudo mkdir /var/lib/docker
Add the following line to /etc/fstab
:
/dev/systemvg/dockerlv /var/lib/docker ext4 defaults,noatime 0 1
Mount the volume:
sudo mount /var/lib/docker
Install the Docker package:
sudo apt-get install ca-certificates curl gnupg lsb-release
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-compose-plugin
sudo usermod -aG docker tladmin
Re-login to the server to apply the group change.
5) Install git
sudo apt install git
6) Configure hostname resolution (optional)
TeskaLabs LogMan.io cluster requires that each node can resolve IP address of any other cluster node from its hostname.
If the configured DNS server doesn't provide this ability, node names and their IP addresses have to be inserted into /etc/hosts
.
sudo vi /etc/hosts
Example:
192.168.108.101 lma1
192.168.108.111 lmb1
192.168.108.121 lmx1
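You can verify the resolution with standard tools, for example (the host names below are illustrative):
getent hosts lma1
ping -c 1 lmb1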
7) Reboot the server
sudo reboot
This is important to apply all of the above configuration.
From Docker to a running LogMan.io¶
Steps¶
1) Create a folder structure
sudo mkdir -p \
/data/ssd/zookeeper/data \
/data/ssd/zookeeper/log \
/data/ssd/kafka/kafka-1/data \
/data/ssd/elasticsearch/es-master/data \
/data/ssd/elasticsearch/es-hot01/data \
/data/ssd/elasticsearch/es-warm01/data \
/data/hdd/elasticsearch/es-cold01/data \
/data/ssd/influxdb/data \
/data/hdd/nginx/log
Change the ownership of the ElasticSearch data folders:
sudo chown -R 1000:0 /data/ssd/elasticsearch
sudo chown -R 1000:0 /data/hdd/elasticsearch
2) Clone the site configuration files into the /opt
folder:
cd /opt
git clone https://gitlab.com/TeskaLabs/<PARTNER_GROUP>/<MY_CONFIG_REPO_PATH>
Log in to docker.teskalabs.com:
cd <MY_CONFIG_REPO_PATH>
docker login docker.teskalabs.com
Enter the repository and deploy the server specific Docker Compose file:
docker compose -f docker-compose-<SERVER_ID>.yml pull
docker compose -f docker-compose-<SERVER_ID>.yml build
docker compose -f docker-compose-<SERVER_ID>.yml up -d
Check that all containers are running:
docker ps
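If a container is missing or keeps restarting, you can inspect its output with standard Docker commands; <container-name> and <service-name> below are placeholders for the names reported by docker ps and defined in the Compose file:
docker logs --tail 100 <container-name>
docker compose -f docker-compose-<SERVER_ID>.yml logs -f <service-name>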
Hardware for TeskaLabs LogMan.io¶
This is a hardware specification designed for vertical scalability. It is optimised for those who plan to build an initial TeskaLabs LogMan.io cluster at the lowest possible cost, yet with the possibility to add more hardware gradually as the cluster grows. This specification is also fully compatible with the horizontal scalability strategy, which means adding one or more new server nodes to the cluster.
Specifications¶
- Chassis: 2U
- Front HDD trays: 12 drive bays, 3.5", for Data HDDs, hot-swap
- Rear HDD trays: 2 drive bays, 2.5", for OS HDDs, hot-swap
- CPU: 1x AMD EPYC 32 Cores
- RAM: 256GB DDR4 3200, using 64GB modules
- Data SSD: 2x 4TB SSD NVMe, PCIe 3.0+
- Data SSD controller: NVMe PCIe 3.0+ riser card, no RAID; or use motherboard NVMe slots
- Data HDD: 3x 20TB SATA 2/3+ or SAS 1/2/3+, 6+ Gb/s, 7200 rpm
- Data HDD controller: HBA or IT mode card, SATA or SAS, JBOD, no RAID, hot-swap
- OS HDD: 2x 256GB+ SSD SATA 2/3+, HBA, no RAID, directly attached to motherboard SATA
- Network: 2x 1Gbps+ Ethernet NIC; or 1x dual port
- Power supply: Redundant 920W
- IPMI or equivalent
Note
RAID is implemented in software/OS.
Vertical scalability¶
- Add one more CPU (2 CPUs in total), a motherboard with 2 CPU slots is required for this option
- Add RAM up to 512GB
- Add up to 9 additional Data HDDs, maximum 220 TB space using 12x 20 TB HDDs in RAID5
Note
3U and 4U variants are also available, with 16 and 24 drive bays respectively.
Last update: Dec 2023
Data Storage¶
TeskaLabs LogMan.io operates with several different storage tiers in order to deliver optimal data isolation, performance, and cost.
Data storage structure¶
Schema: Recommended structure of the data storage.
Fast data storage¶
Fast data storage (also known as the 'hot' tier) contains the most recent logs and other events received into TeskaLabs LogMan.io. We recommend using the fastest possible storage class for the best throughput and search performance. The real-time component (Apache Kafka) also uses the fast data storage for stream persistency.
- Recommended time span: one day to one week
- Recommended size: 2TB - 4TB
- Recommended redundancy: RAID 1, additional redundancy is provided by the application layer
- Recommended hardware: NVMe SSD PCIe 4.0 and better
- Fast data storage physical devices MUST BE managed by mdadm
- Mount point:
/data/ssd
- Filesystem: EXT4; setting the noatime flag is recommended for optimum performance
Backup strategy¶
Incoming events (logs) are copied into the archive storage once they enter TeskaLabs LogMan.io. It means that there is always a way to "replay" events into TeskaLabs LogMan.io in case of need. Also, data are replicated to other nodes of the cluster immediately after arrival to the cluster. For this reason, traditional backup is not recommended but possible.
The restoration is handled by the cluster components by replicating the data from other nodes of the cluster.
Example
/data/ssd/kafka-1
/data/ssd/elasticsearch/es-master
/data/ssd/elasticsearch/es-hot1
/data/ssd/zookeeper-1
/data/ssd/influxdb-2
...
Slow data storage¶
The slow storage contains data that do not have to be accessed quickly, usually older logs and events, such as warm and cold indices for ElasticSearch.
- Recommended redundancy: software RAID 6 or RAID 5; RAID 0 for virtualized/cloud instances with underlying storage redundancy
- Recommended hardware: Cost-effective hard drives, SATA 2/3+, SAS 1/2/3+
- Typical size: tens of TB, e.g. 18TB
- Controller card: SATA or HBA SAS (IT Mode)
- Slow data storage physical devices MUST BE managed by software RAID (mdadm)
- Mount point:
/data/hdd
- Filesystem: EXT4; setting the noatime flag is recommended for optimum performance
Calculation of the cluster capacity¶
This is the formula for calculating the total available cluster capacity on the slow data storage.
total = (disks-raid) * capacity * servers / replica
- disks is the number of slow data storage disks per server
- raid is the RAID overhead: 1 for RAID5 and 2 for RAID6
- capacity is the capacity of a slow data storage disk
- servers is the number of servers
- replica is the replication factor in ElasticSearch
Example
(6[disks]-2[raid6]) * 18TB[capacity] * 3[servers] / 2[replica] = 108TB
Backup strategy¶
The data stored on the slow data storage are ALWAYS replicated to other nodes of the cluster and also stored in the archive. For this reason, traditional backup is not recommended but possible (consider the huge size of the slow storage).
The restoration is handled by the cluster components by replicating the data from other nodes of the cluster.
Example
/data/hdd/elasticsearch/es-warm01
/data/hdd/elasticsearch/es-warm02
/data/hdd/elasticsearch/es-cold01
/data/hdd/mongo-2
/data/hdd/nginx-1
...
Large slow data storage strategy¶
If your slow data storage will be larger than 50 TB, we recommend employing HBA SAS controllers, SAS expanders and JBOD as the optimal strategy for scaling the slow data storage. SAS storage connectivity can be daisy-chained to enable a large number of drives to be connected. External JBOD chassis can also be connected using SAS to provide housing for additional drives.
RAID 6 vs RAID 5¶
RAID 6 and RAID 5 are both types of RAID (redundant array of independent disks) that use data striping and parity to provide data redundancy and increased performance.
RAID 5 uses striping across multiple disks, with a single parity block calculated across all the disks. If one disk fails, the data can still be reconstructed using the parity information. However, the data is lost if a second disk fails before the first one has been replaced.
RAID 6, on the other hand, uses striping and two independent parity blocks, which are stored on separate disks. If two disks fail, the data can still be reconstructed using the parity information. RAID 6 provides an additional level of data protection compared to RAID 5. However, RAID 6 also increases the overhead and reduces the storage capacity because of the two parity blocks.
Regarding slow data storage, RAID 5 is generally considered less secure than RAID 6 because the log data is usually vital, and two disk failures could cause data loss. RAID 6 is best in this scenario as it can survive two disk failures and provide more data protection.
In RAID 5, the usable capacity equals that of (N-1) disks, where N is the number of disks in the array. This is because one disk's worth of capacity is used for parity information, which is used to reconstruct the data in case of a single disk failure. For example, if you want to create a RAID 5 array with 54 TB of storage, you would need at least four (4) disks with a capacity of at least 18 TB each.
In RAID 6, the usable capacity equals that of (N-2) disks. This is because it uses two sets of parity information stored on separate disks. As a result, RAID 6 can survive the failure of up to two disks before data is lost. For example, if you want to create a RAID 6 array with 54 TB of storage, you would need at least five (5) disks with a capacity of at least 18 TB each.
It's important to note that RAID 6 requires more disk space as it uses two parity blocks, while RAID5 uses only one. That's why RAID 6 requires additional disks as compared to RAID 5. However, RAID 6 provides extra protection and can survive two disk failures.
It is worth mentioning that the data in slow data storage are replicated across the cluster (if applicable) to provide additional data redundancy.
Tip
Use an online RAID calculator to calculate storage requirements.
System storage¶
The system storage is dedicated to the operating system, software installations, and configurations. No operational data are stored on the system storage. Installations on virtualization platforms commonly use the available locally redundant disk space.
- Recommended size: 250 GB and more
- Recommend hardware: two (2) local SSD disks in software RAID 1 (mirror), SATA 2/3+, SAS 1/2/3+
If applicable, the following storage partitioning is recommended:
- EFI partition, mounted at /boot/efi, size 1 GB
- Swap partition, 64 GB
- Software RAID1 (mdadm) over the rest of the space
- Boot partition on RAID1, mounted at /boot, size 512 MB, ext4 filesystem
- LVM partition on RAID1, the rest of the available space, with volume group systemvg
- LVM logical volume rootlv, mounted at /, size 50 GB, ext4 filesystem
- LVM logical volume loglv, mounted at /var/log, size 50 GB, ext4 filesystem
- LVM logical volume dockerlv, mounted at /var/lib/docker, size 100 GB, ext4 filesystem (if applicable)
Backup strategy for the system storage¶
It is recommended to periodically backup all filesystems on the system storage so that they could be used for restoring the installation when needed. The backup strategy is compatible with most common backup technologies in the market.
- Recovery Point Objective (RPO): full backup once per week or after major maintenance work, incremental backup once per day.
- Recovery Time Objective (RTO): 12 hours.
Note
RPO and RTO are recommended, assuming a highly available setup of the LogMan.io cluster. It means three or more nodes, so that the complete downtime of a single node doesn't impact service availability.
Archive data storage¶
Data archive storage is recommended but optional. It serves for very long data retention periods and redundancy purposes, and it represents an economical way of long-term data storage. Data are not available online in the cluster; they have to be restored back when needed, which is connected with a certain "time-to-data" interval.
Data are compressed when copied into the archive; the typical compression ratio is in the range from 1:10 to 1:2, depending on the nature of the logs.
Data are replicated into the archive storage after the initial consolidation on the fast data storage, practically immediately after ingestion into the cluster.
- Recommended technologies: SAN / NAS / Cloud cold storage (AWS S3, MS Azure Storage)
- Mount point:
/data/archive
(if applicable)
Note
Public clouds can be used as a data archive storage. Data encryption has to be enabled in such a case to protect data from unauthorised access.
Dedicated archive nodes¶
For large archives, dedicated archive nodes (servers) are recommended. These nodes should use HBA SAS drive connectivity and storage-oriented OS distributions such as Unraid or TrueNAS.
Data Storage DON'Ts¶
- We DON'T recommend use of NAS / SAN storage for data storages
- We DON'T recommend use of hardware RAID controllers etc. for data storages
The storage administration¶
This chapter provides a practical example of the configuration of the storage for TeskaLabs LogMan.io. You don't need to configure or manage the LogMan.io storage unless you have a specific reason for it; LogMan.io is delivered in a fully configured state.
Assuming the following hardware configuration:
- SSD drives for the fast data storage: /dev/nvme0n1, /dev/nvme1n1
- HDD drives for the slow data storage: /dev/sde, /dev/sdf, /dev/sdg
Tip
Use the lsblk command to monitor the actual status of the storage devices.
Create a software RAID1 for a fast data storage¶
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1
mkfs.ext4 /dev/md2
mkdir -p /data/ssd
Add mount points into /etc/fstab
:
/dev/md2 /data/ssd ext4 defaults,noatime 0 2
Mount data storage filesystems:
mount /data/ssd
Tip
Use cat /proc/mdstat
to check the state of the software RAID.
Create a software RAID5 for a slow data storage¶
mdadm --create /dev/md1 --level=5 --raid-devices=3 /dev/sde /dev/sdf /dev/sdg
mkfs.ext4 /dev/md1
mkdir -p /data/hdd
Note
For RAID6 use --level=6
.
Add mount points into /etc/fstab
:
/dev/md1 /data/hdd ext4 defaults,noatime 0 2
Mount data storage filesystems:
mount /data/hdd
Grow the size of a data storage¶
With ever increasing data volumes, it is highly likely that you will need to grow (i.e. extend) the data storage, either the fast or the slow data storage. It is done by adding a new data volume (e.g. a physical disk or a virtual volume) to the machine, or - on some virtualized solutions - by growing an existing volume.
Note
The data storage could be extended without any downtime.
Slow data storage grow example¶
Assuming that you want to add a new disk /dev/sdh
to a slow data storage /dev/md1
:
mdadm --add /dev/md1 /dev/sdh
The new disk is added as a spare device.
You can check the state of the RAID array by:
cat /proc/mdstat
The (S) behind the device means spare device.
Then grow the RAID onto the spare device(s):
mdadm --grow --raid-devices=4 /dev/md1
The number 4 needs to be adjusted to reflect the actual RAID setup.
Grow the filesystem:
resize2fs /dev/md1
Networking¶
This documentation section is designed to guide you through the process of setting up and managing the networking of TeskaLabs LogMan.io. To ensure seamless functionality, it is important to follow the prescribed network configuration described below.
Schema: Network overview of the LogMan.io cluster.
Fronting network¶
The fronting network is a private L2 or L3 segment that serves for log collection. For that reason, it has to be accessible from all log sources.
Each node (server) has a dedicated IPv4 address on a fronting network. IPv6 is also supported.
Fronting network must be available at all locations of the LogMan.io cluster.
User network¶
The user network is a private L2 or L3 segment that serves for user access to the Web User Interface. For that reason, it has to be accessible to all users.
Each node (server) has a dedicated IPv4 address on a user network. IPv6 is also supported.
User network must be available at all locations of the LogMan.io cluster.
Internal network¶
The internal network is a private L2 or L3 segment that is used for private cluster communication. It MUST BE dedicated to TeskaLabs LogMan.io, with no external access, to maintain the security envelope of the cluster. The internal network must provide encryption if it is operated in a shared environment (i.e. as a VLAN). This is a critical requirement for the security of the cluster.
Each node (server) has a dedicated IPv4 address on an internal network. IPv6 is also supported.
Internal network must be available at all locations of the LogMan.io cluster.
Containers running on the node use the "network mode" set to "host" on the internal network. It means that the container's network stack is not isolated from the node (host), and the container does not get its own IP address.
Connectivity¶
Each node (aka server) has the following connectivity requirements:
Fronting network¶
- Minimal: 1Gbit NIC
- Recommended: 2x bonded 10Gbit NIC
User network¶
- Minimal: shared with the fronting network
- Recommended: 1Gbit NIC
Internal network¶
- Minimal: no dedicated NIC (internal only) for single node installations, 1Gbit
- Recommended: 2x bonded 10Gbit NIC
- IPMI if available at the server level
Internet connectivity (NAT, firewalled, or behind a proxy server) using the fronting network OR the internal network.
SSL Server Certificate¶
The fronting network and the user network expose web interfaces over HTTPS on port TCP/443. For this reason, LogMan.io needs an SSL server certificate.
It could be either:
- self-signed SSL server certificate
- SSL server certificate issued by the Certificate Authority operated internally by the user
- SSL server certificate issued by a public (commercial) Certificate Authority
Tip
You can use XCA tool to generate or verify your SSL certificates.
Self-signed certificate¶
This option is suitable for very small deployments.
Users will get warnings from their browsers when accessing the LogMan.io Web interface.
Also, the insecure flag needs to be used in collectors.
Create a self-signed SSL certificate using OpenSSL command-line
openssl req -x509 -newkey ec -pkeyopt ec_paramgen_curve:prime256v1 \
-keyout key.pem -out cert.pem -sha256 -days 3650 -nodes \
-subj "/CN=logman.int"
This command will create key.pem
(a private key) and cert.pem
(a certificate), for internal domain name logman.int
.
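You can inspect the generated certificate with OpenSSL to verify its subject and validity period:
openssl x509 -in cert.pem -noout -text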
Certificate from Certificate Authority¶
Parameters for the SSL Server certificate:
- Private key: EC 384 bit, curve secp384r1 (minimum), alternatively RSA 2048 (minimum)
- Subject Common name
CN
: Fully Qualified Domain Name of the LogMan.io user Web UI - X509v3 Subject Alternative Name: Fully Qualified Domain Name of the LogMan.io user Web UI set to "DNS"
- Type: End Entity, critical
- X509v3 Subject Key Identifier set
- X509v3 Authority Key Identifier set
- X509v3 Key Usage: Digital Signature, Non Repudiation, Key Encipherment, Key Agreement
- X509v3 Extended Key Usage: TLS Web Server Authentication
Example of an SSL Server certificate for https://logman.example.com/
Certificate:
Data:
Version: 3 (0x2)
Serial Number: 6227131463912672678 (0x566b3712dc2c4da6)
Signature Algorithm: ecdsa-with-SHA256
Issuer: CN = logman.example.com
Validity
Not Before: Nov 16 11:17:00 2023 GMT
Not After : Nov 15 11:17:00 2024 GMT
Subject: CN = logman.example.com
Subject Public Key Info:
Public Key Algorithm: id-ecPublicKey
Public-Key: (384 bit)
pub:
04:79:e2:9f:69:cb:ac:f5:3f:93:43:56:a5:ac:d7:
cf:97:f9:ba:44:ee:9b:53:89:19:fd:91:02:0d:bd:
59:41:d6:ec:c6:2b:01:33:03:b6:3e:4a:1d:f4:e9:
2c:3f:af:49:92:79:9c:00:0b:0b:e3:28:7b:13:33:
b4:ac:88:d7:9c:0a:7b:95:90:09:a2:f7:aa:ce:7c:
51:3e:3a:94:af:a8:4b:65:4f:82:90:6a:2f:a9:57:
25:6f:5f:80:09:4c:cb
ASN1 OID: secp384r1
NIST CURVE: P-384
X509v3 extensions:
X509v3 Basic Constraints: critical
CA:FALSE
X509v3 Subject Key Identifier:
49:7A:34:F8:A6:EB:6D:8E:92:42:57:BB:EB:2D:B3:82:F4:98:9D:17
X509v3 Authority Key Identifier:
49:7A:34:F8:A6:EB:6D:8E:92:42:57:BB:EB:2D:B3:82:F4:98:9D:17
X509v3 Key Usage:
Digital Signature, Non Repudiation, Key Encipherment, Key Agreement
X509v3 Extended Key Usage:
TLS Web Server Authentication
X509v3 Subject Alternative Name:
DNS:logman.example.com
Signature Algorithm: ecdsa-with-SHA256
Signature Value:
30:64:02:30:16:09:95:f4:04:1b:99:f4:06:ef:1e:63:4e:aa:
1d:21:b0:b1:31:c1:84:9a:a9:55:c6:14:bd:a1:62:c5:14:14:
35:73:da:8b:a8:7b:f2:f6:4c:8c:b0:6b:72:79:5f:4c:02:30:
49:6f:ef:05:0f:dd:28:fb:26:f8:76:71:01:f3:e4:da:63:72:
17:db:96:fb:5c:09:43:f8:7b:3b:a1:b6:dc:23:31:66:5d:23:
18:94:0b:e4:af:8b:57:1e:c3:3d:93:6f
Generate a CSR¶
If the Certificate Authority requires a CSR to be submitted to receive an SSL certificate, follow this procedure:
1. Generate a private key:
openssl genpkey -algorithm EC -pkeyopt ec_paramgen_curve:prime256v1 -out key.pem
This command will create key.pem
with the private key.
2. Create CSR using generated private key:
openssl req -new -key key.pem -out csr.pem -subj "/CN=logman.example.com"
This command will produce csr.pem
file with the Certificate Signing Request.
Replace logman.example.com
with the FQDN (domain name) of the LogMan.io deployment.
3. Submit the CSR to a Certificate Authority
The Certificate Authority will generate a certificate; store it in cert.pem in PEM format.
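Optionally, you can verify that the issued certificate matches your private key by comparing the public keys extracted from both files; the two commands should produce identical output:
openssl x509 -in cert.pem -noout -pubkey
openssl pkey -in key.pem -pubout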
Cluster¶
TeskaLabs LogMan.io can be deployed on a single server (aka "node") or in a cluster setup. TeskaLabs LogMan.io also supports geo-clustering.
Geo-clustering¶
Geo-clustering is a technique used to provide redundancy against failures by replicating data, and services across multiple geographic locations. This approach aims to minimize the impact of any unforeseen failures, disasters, or disruptions that may occur in one location, by ensuring that the system can continue to operate without interruption from another location.
Geo-clustering involves deploying multiple instances of the LogMan.io across different geographic regions or data centers, and configuring them to work together as a single logical entity. These instances are linked together using a dedicated network connection, which enables them to communicate and coordinate their actions in real-time.
One of the main benefits of geo-clustering is that it provides a high level of redundancy against failures. In the event of a failure in one location, the remaining instances of the system take over and continue to operate without disruption. This not only helps to ensure high availability (HA) and uptime, but also reduces the risk of data loss and downtime.
Another advantage of geo-clustering is that it can provide better performance and scalability by enabling load balancing and resource sharing across multiple locations. This means that resources can be dynamically allocated and adjusted to meet changing demands, ensuring that the system is always optimized for performance and efficiency.
Overall, geo-clustering is a powerful technique that helps to ensure high availability, resilience, and scalability for critical applications and services. By replicating resources across multiple geographic locations, organizations can minimize the impact of failures and disruptions, while also improving performance and efficiency.
Locations¶
Location "A"¶
Location "A" is the first location to be build. In the single node setup, it is also the only location.
Node lma1
is the first server to built of the cluster.
Nodes in this location are named "Node lmaX". X is a sequence number of the server (e.g. 1, 2, 3, 4 and so on). If you run out of numbers, continue with small letters (e.g. a, b, c and so on).
Please refer to the recommended hardware specification for details about nodes.
Location B, C, D and so on¶
Location B (and C, D and so on) are next locations of the cluster.
Nodes in these locations are named "Node lmLX". L is a small letter that represents the location in alphabetical order (e.g. a, b, c). X is a sequence number of the server (e.g. 1, 2, 3, 4 and so on). If you run out of numbers, continue with small letters (e.g. a, b, c and so on).
Please refer to the recommended hardware specification for details about nodes.
Coordinating location "X"¶
The cluster MUST have an odd number of locations to avoid the split-brain problem.
For that reason, we recommend building a small, coordinating location with one node (Node lmx1).
We recommend using a virtualisation platform for Node lmx1, not physical hardware.
No data (logs, events) are stored at this location.
Types of nodes¶
Core node¶
The first three nodes in the cluster are called core nodes. Core nodes form the consensus within the cluster, ensuring consistency and coordinating activities across the cluster.
Peripheral nodes¶
Peripheral nodes are the nodes that don't participate in the consensus of the cluster.
Cluster layouts¶
Schema: Example of the cluster layout.
Single node "cluster"¶
Node: lma1
(Location a, Server 1).
Two big and one small node¶
Nodes: lma1
, lmb1
and lmx1
.
Three nodes, three locations¶
Nodes: lma1
, lmb1
and lmc1
.
Four big and one small node¶
Nodes: lma1
, lma2
, lmb1
, lmb2
and lmx1
.
Six nodes, three locations¶
Nodes: lma1
, lma2
, lmb1
, lmb2
, lmc1
and lmc2
.
Bigger clusters¶
Bigger clusters typically introduce a specialization of nodes.
Data Lifecycle¶
Data (e.g. logs, events, metrics) are stored in several availability stages, basically in chronological order. It means that the recent logs are stored in the fastest data storage and, as they age, they are moved to slower and cheaper data storage and eventually into the offline archive, or they are deleted.
Schema: Data life cycle in the TeskaLabs LogMan.io.
The lifecycle is controlled by an ElasticSearch feature called Index Lifecycle Management (ILM).
Index Lifecycle Management¶
Index Lifecycle Management (ILM) in ElasticSearch serves to automatically close or delete old indices (e.g. with data older than three months), so search performance is maintained and the data storage is able to store current data. The settings reside in the so-called ILM policy.
The ILM should be set before the data are pumped into ElasticSearch, so the new index finds and associates itself with the proper ILM policy. For more information, please refer to the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started-index-lifecycle-management.html
LogMan.io components such as Dispatcher then use a specified ILM alias (lm_), and ElasticSearch automatically puts the data into the proper index assigned to the ILM policy.
Hot-Warm-Cold architecture (HWC)¶
HWC is an extension of the standard index rotation provided by the ElasticSearch ILM and it is a good tool for managing time series data. HWC architecture enables us to allocate specific nodes to one of the phases. When used correctly, along with the cluster architecture, this will allow for maximum performance, using available hardware to its fullest potential.
Hot stage¶
There is usually some period of time (week, month, etc.), where we want to query the indexes heavily, aiming for speed, rather than memory (and other resources) conservation. That is where the “Hot” phase comes in handy, by allowing us to have the index with more replicas, spread out and accessible on more nodes for optimal user experience.
Hot nodes¶
Hot nodes should use the fast parts of the available hardware, with the most CPUs and faster I/O.
Warm stage¶
Once this period is over, and the indexes are no longer queried as often, we will benefit by moving them to the “Warm” phase, which allows us to reduce the number of nodes (or move to nodes with less resources available) and index replicas, lessening the hardware load, while still retaining the option to search the data reasonably fast.
Warm nodes¶
Warm nodes, as the name suggests, stand on the crossroads, between being solely for the storage purposes, while still retaining some CPU power to handle the occasional queries.
Cold stage¶
Sometimes, there are reasons to store data for extended periods of time (dictated by law, or some internal rule). The data are not expected to be queried, but at the same time, they cannot be deleted just yet.
Cold nodes¶
This is where the Cold nodes come in, there may be few, with only little CPU resources, they have no need to use SSD drives, being perfectly fine with slower (and optionally larger) storage.
Archive stage¶
The archive stage is optional in the design. It is an offline long-term storage. The oldest data from a cold stage could be moved periodically to the archive stage instead of their deletion.
The standard archiving policy of the SIEM operating organization is applied. The archived data need to be encrypted.
It is also possible to forward certain logs directly from a warm stage into the archive stage.
Create the ILM policy¶
Kibana¶
Kibana version 7.x can be used to create ILM policy in ElasticSearch.
1.) Open Kibana
2.) Click Management in the left menu
3.) In the ElasticSearch section, click on Index Lifecycle Policies
4.) Click the blue Create policy button
5.) Enter its name, which should be the same as the index prefix, e.g. lm_
6.) Set the maximum index size to the desired rollover size, e.g. 25 GB (size rollover)
7.) Set the maximum age of the index, e.g. 10 days (time rollover)
8.) Click the switch at the bottom of the screen at Delete phase, and enter the time after which the index should be deleted, e.g. 120 days from rollover
9.) Click the green Save policy button
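The same ILM policy can also be created through the ElasticSearch API instead of Kibana. This is a minimal sketch matching the values above (25 GB / 10 days rollover, deletion 120 days after rollover); adjust the policy name and thresholds to your deployment:
PUT _ilm/policy/lm_
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "25gb",
            "max_age": "10d"
          }
        }
      },
      "delete": {
        "min_age": "120d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}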
Use the policy in index template¶
Modify index template(s)¶
Add the following lines to the JSON index template:
"settings": {
"index": {
"lifecycle": {
"name": "lm_",
"rollover_alias": "lm_"
}
}
},
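Alternatively, the same settings can be applied through the ElasticSearch legacy template API. This is a sketch only; the template name and the index pattern lm_* are assumptions, and deployments using composable index templates would use PUT _index_template instead:
PUT _template/lm_
{
  "index_patterns": ["lm_*"],
  "settings": {
    "index": {
      "lifecycle": {
        "name": "lm_",
        "rollover_alias": "lm_"
      }
    }
  }
}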
Kibana¶
Kibana version 7.x can be used to link ILM policy with ES index template.
1.) Open Kibana
2.) Click Management in the left menu
3.) In the ElasticSearch section, click on Index Management
4.) At the top, select Index Template
5.) Select your desired index template, e.g. lm_
6.) Click on Edit
7.) On the Settings screen, add:
{
"index": {
"lifecycle": {
"name": "lm_",
"rollover_alias": "lm_"
}
}
}
8.) Click on Save
Create a new index which will utilize the latest index template¶
Through Postman or Kibana, send the following HTTP request to the instance of ElasticSearch you are using:
PUT lm_tenant-000001
{
"aliases": {
"lm_": {
"is_write_index": true
}
}
}
The alias is then going to be used by the ILM policy to distribute data to the proper ElasticSearch index, so pumps do not have to care about the number of the index.
Warning
The index prefix and the rollover number must be separated with a dash (-000001), not an underscore (_000001)!
Note
Make sure there is no index prefix configuration in the source, like in ElasticSearchSink in the pipeline. The code configuration would replace the file configuration.
Elasticsearch backup and restore¶
Snapshots¶
Located under Stack Management -> Snapshot and Restore
. The snapshots are stored in the repository location. The structure is as follows. The snapshot itself is just a pointer to the indices that it contains. The indices themselves are stored in a separate directory, and they are stored incrementally. This basically means, that if you create a snapshot every day, the older indices are just referenced again in the snapshot, while only the new indices are actually copied to the backup directory.
Repositories¶
First, the snapshot repository needs to be set up. Specify the location where the snapshot repository resides, /backup/elasticsearch for instance. This path needs to be accessible from all nodes in the cluster. With Elasticsearch running in Docker, this includes mounting the space inside the Docker containers and restarting them.
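Besides Kibana, a shared filesystem repository can be registered through the ElasticSearch API. A sketch, assuming the repository name backup_repo and that the path is listed under path.repo in elasticsearch.yml:
PUT _snapshot/backup_repo
{
  "type": "fs",
  "settings": {
    "location": "/backup/elasticsearch"
  }
}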
Policies¶
To begin taking snapshots, a policy needs to be created. The policy determines the naming prefix of the snapshots it creates and specifies the repository it will use for creating snapshots. It requires a schedule setting and indices (defined using patterns or specific index names, lmio-mpsv-events-* for instance).
Furthermore, the policy is able to specify whether to ignore unavailable indices, allow partial indices, and include the global state. Use of these depends on the specific case in which the snapshot policy will be used, and they are not recommended by default. There is also a setting available to automatically delete snapshots and define their expiration. These also depend on the specific policy; the snapshots themselves, however, are very small (memory wise) when they do not include the global state, which is to be expected since they are just pointers to a different place where the actual index data are stored.
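For reference, a snapshot lifecycle policy can also be created through the ElasticSearch API. This is a sketch only; the policy name, the repository name backup_repo, the schedule, and the retention value are assumptions to be adapted to your deployment:
PUT _slm/policy/daily-snapshots
{
  "schedule": "0 30 1 * * ?",
  "name": "<daily-snap-{now/d}>",
  "repository": "backup_repo",
  "config": {
    "indices": ["lmio-mpsv-events-*"],
    "ignore_unavailable": false,
    "include_global_state": false
  },
  "retention": {
    "expire_after": "90d"
  }
}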
Restoring a snapshot¶
To restore a snapshot, simply select the snapshot containing the index or indices you wish to bring back and select "Restore". You then need to specify whether you want to restore all indices contained in the snapshot, or just a portion. You are able to rename the restored indices; you can also restore partially snapshotted indices and modify the index settings while restoring them, or reset them to defaults. The indices are then restored as specified back into the cluster.
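The same restore can be performed through the ElasticSearch API. A sketch, assuming the repository backup_repo, a snapshot named daily-snap-2023.11.16, and renaming the restored indices with a restored- prefix:
POST _snapshot/backup_repo/daily-snap-2023.11.16/_restore
{
  "indices": "lmio-mpsv-events-*",
  "rename_pattern": "(.+)",
  "rename_replacement": "restored-$1"
}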
Caveats¶
When deleting snapshots, bear in mind that you need to have the backed up indices covered by a snapshot to be able to restore them. This means that when you, for example, clear some of the indices from the cluster and then delete the snapshot that contained the reference to those indices, you will be unable to restore them.
Continuity Plan¶
Risk matrix¶
The risk matrix defines the level of risk by considering the category of "Likelihood" of an incident occurring against the category of "Impact". Both categories are given a score between 1 and 5. By multiplying the scores for "Likelihood" and "Impact" together, a total risk score is produced.
Likelihood¶
Likelihood | Score |
---|---|
Rare | 1 |
Unlikely | 2 |
Possible | 3 |
Likely | 4 |
Almost certain | 5 |
Impact¶
Impact | Score | Description |
---|---|---|
Insignificant | 1 | The functionality is not impacted, performance is not reduced, downtime is not needed. |
Minor | 2 | The functionality is not impacted, the performance is not reduced, downtime of the impacted cluster node is needed. |
Moderate | 3 | The functionality is not impacted, the performance is reduced, downtime of the impacted cluster node is needed. |
Major | 4 | The functionality is impacted, the performance is significantly reduced, downtime of the cluster is needed. |
Catastrophic | 5 | Total loss of functionality. |
Incident scenarios¶
Complete system failure¶
Impact: Catastrophic (5)
Likelihood: Rare (1)
Risk level: medium-high
Risk mitigation:
- Geographically distributed cluster
- Active use of monitoring and alerting
- Prophylactic maintenance
- Strong cyber-security posture
Recovery:
- Contact the support and/or vendor and consult the strategy.
- Restore the hardware functionality.
- Restore the system from the backup of the site configuration.
- Restore the data from the offline backup (start with the most fresh data and continue to the history).
Loss of the node in the cluster¶
Impact: Moderate (3)
Likelihood: Unlikely (2)
Risk level: medium-low
Risk mitigation:
- Geographically distributed cluster
- Active use of monitoring and alerting
- Prophylactic maintenance
Recovery:
- Contact the support and/or vendor and consult the strategy.
- Restore the hardware functionality.
- Restore the system from the backup of the site configuration.
- Restore the data from the offline backup (start with the most fresh data and continue to the history).
Loss of the fast storage drive in one node of the cluster¶
Impact: Minor (2)
Likelihood: Possible (3)
Risk level: medium-low
Fast drives are in RAID 1 array so the loss of one drive is non-critical. Ensure quick replacement of the failed drive to prevent a second fast drive failure. A second fast drive failure will escalate to a "Loss of the node in the cluster".
Risk mitigation:
- Active use of monitoring and alerting
- Prophylactic maintenance
- Timely replacement of the failed drive
Recovery:
- Turn off the impacted cluster node
- Replace failed fast storage drive ASAP
- Turn on the impacted cluster node
- Verify correct RAID1 array reconstruction
Note
Hot swap of the fast storage drive is supported on a specific customer request.
Fast storage space shortage¶
Impact: Moderate (3)
Likelihood: Possible (3)
Risk level: medium-high
This situation is problematic if it happens on multiple nodes of the cluster simultaneously. Use monitoring tools to identify this situation ahead of escalation.
Risk mitigation:
- Active use of monitoring and alerting
- Prophylactic maintenance
Recovery:
- Remove unnecessary data from the fast storage space.
- Adjust the life cycle configuration so that the data are moved to slow storage space sooner.
Loss of the slow storage drive in one node of the cluster¶
Impact: Insignificant (1)
Likelihood: Likely (4)
Risk level: medium-low
Slow drives are in RAID 5 or RAID 6 array so the loss of one drive is non-critical. Ensure quick replacement of the failed drive to prevent another drive failure. A second drive failure in RAID 5 or third drive failure in RAID 6 will escalate to a "Loss of the node in the cluster".
Risk mitigation:
- Active use of monitoring and alerting
- Prophylactic maintenance
- Timely replacement of the failed drive
Recovery:
- Replace failed slow storage drive ASAP (hot swap)
- Verify a correct slow storage RAID reconstruction
Slow storage space shortage¶
Impact: Moderate (3)
Likelihood: Likely (4)
Risk level: medium-high
This situation is problematic if it happens on multiple nodes of the cluster simultaneously. Use monitoring tools to identify this situation ahead of escalation.
Risk mitigation:
- Active use of monitoring and alerting
- Prophylactic maintenance
- Timely extension of the slow data storage size
Recovery:
- Remove unnecessary data from the slow storage space.
- Adjust the life cycle configuration so that the data are removed from slow storage space sooner.
Loss of the system drive in one node of the cluster¶
Impact: Minor (2)
Likelihood: Possible (3)
Risk level: medium-low
System drives are in a RAID 1 array so the loss of one drive is non-critical. Ensure quick replacement of the failed drive to prevent a second system drive failure. A second system drive failure will escalate to a "Loss of the node in the cluster".
Risk mitigation:
- Active use of monitoring and alerting
- Prophylactic maintenance
- Timely replacement of the failed drive
Recovery:
- Replace the failed system storage drive ASAP (hot swap)
- Verify correct RAID1 array reconstruction
System storage space shortage¶
Impact: Moderate (3)
Likelihood: Rare (1)
Risk level: low
Use monitoring tools to identify this situation ahead of escalation.
Risk mitigation:
- Active use of monitoring and alerting
- Prophylactic maintenance
Recovery:
- Remove unnecessary data from the system storage space.
- Contact the support or the vendor.
Loss of the network connectivity in one node of the cluster¶
Impact: Minor (2)
Likelihood: Possible (3)
Risk level: medium-low
Risk mitigation:
- Active use of monitoring and alerting
- Prophylactic maintenance
- Redundant network connectivity
Recovery:
- Restore the network connectivity
- Verify the proper cluster operational condition
Failure of the ElasticSearch cluster¶
Impact: Major (4)
Likelihood: Possible (3)
Risk level: medium-high
Risk mitigation:
- Active use of monitoring and alerting
- Prophylactic maintenance
- Timely reaction to the deteriorating ElasticSearch cluster health
Recovery:
- Contact the support and/or vendor and consult the strategy.
Failure of the ElasticSearch node¶
Impact: Minor (2)
Likelihood: Likely (4)
Risk level: medium-low
Risk mitigation:
- Active use of monitoring and alerting
- Prophylactic maintenance
- Timely reaction to the deteriorating ElasticSearch cluster health
Recovery:
- Monitor an automatic ElasticSearch node rejoining to the cluster
- Contact the support / the vendor if the failure persists over several hours.
Failure of the Apache Kafka cluster¶
Impact: Major (4)
Likelihood: Rare (1)
Risk level: medium-low
Risk mitigation:
- Active use of monitoring and alerting
- Prophylactic maintenance
- Timely reaction to the deteriorating Apache Kafka cluster health
Recovery:
- Contact the support and/or vendor and consult the strategy.
Failure of the Apache Kafka node¶
Impact: Minor (2)
Likelihood: Rare (1)
Risk level: low
Risk mitigation:
- Active use of monitoring and alerting
- Prophylactic maintenance
- Timely reaction to the deteriorating Apache Kafka cluster
Recovery:
- Monitor an automatic Apache Kafka node rejoining to the cluster
- Contact the support / the vendor if the failure persists over several hours.
Failure of the Apache ZooKeeper cluster¶
Impact: Major (4)
Likelihood: Rare (1)
Risk level: medium-low
Risk mitigation:
- Active use of monitoring and alerting
- Prophylactic maintenance
- Timely reaction to the deteriorating Apache ZooKeeper cluster
Recovery:
- Contact the support and/or vendor and consult the strategy.
Failure of the Apache ZooKeeper node¶
Impact: Insignificant (1)
Likelihood: Rare (1)
Risk level: low
Risk mitigation:
- Active use of monitoring and alerting
- Prophylactic maintenance
- Timely reaction to the deteriorating Apache ZooKeeper cluster
Recovery:
- Monitor an automatic Apache ZooKeeper node rejoining to the cluster
- Contact the support / the vendor if the failure persists over several hours.
Failure of the stateless data path microservice (collector, parser, dispatcher, correlator, watcher)¶
Impact: Minor (2)
Likelihood: Possible (3)
Risk level: medium-low
Risk mitigation:
- Active use of monitoring and alerting
- Prophylactic maintenance
Recovery:
- Restart the failed microservice.
Failure of the stateless support microservice (all others)¶
Impact: Insignificant (1)
Likelihood: Possible (3)
Risk level: medium-low
Risk mitigation:
- Active use of monitoring and alerting
- Prophylactic maintenance
Recovery:
- Restart the failed microservice.
Significant reduction of the system performance¶
Impact: Moderate (3)
Likelihood: Possible (3)
Risk level: medium-high
Risk mitigation:
- Active use of monitoring and alerting
- Prophylactic maintenance
Recovery:
- Identify and remove the root cause of the reduction of the performance
- Contact the vendor or the support if help is needed
Backup and recovery strategy¶
Offline backup for the incoming logs¶
Incoming logs are duplicated to the offline backup storage that is not part of the active cluster of LogMan.io (hence is "offline"). Offline backup provides an option to restore logs to the LogMan.io after critical failure etc.
Backup strategy for the fast data storage¶
Incoming events (logs) are copied into the archive storage once they enter LogMan.io. It means that there is always a way to "replay" events into TeskaLabs LogMan.io in case of need. Also, data are replicated to other nodes of the cluster immediately after arrival to the cluster. For this reason, traditional backup is not recommended but possible.
The restoration is handled by the cluster components by replicating the data from other nodes of the cluster.
Backup strategy for the slow data storage¶
The data stored on the slow data storage are ALWAYS replicated to other nodes of the cluster and also stored in the archive. For this reason, traditional backup is not recommended but possible (consider the huge size of the slow storage).
The restoration is handled by the cluster components by replicating the data from other nodes of the cluster.
Backup strategy for the system storage¶
It is recommended to periodically backup all filesystems on the system storage so that they could be used for restoring the installation when needed. The backup strategy is compatible with most common backup technologies in the market.
- Recovery Point Objective (RPO): full backup once per week or after major maintenance work, incremental backup once per day.
- Recovery Time Objective (RTO): 12 hours.
Note
RPO and RTO are recommended, assuming a highly available setup of the LogMan.io cluster. It means three or more nodes, so that the complete downtime of a single node doesn't impact service availability.
Generic backup and recovery rules¶
- Data Backup: Regularly back up to a secure location, such as a cloud-based storage service or backup tapes, to minimize data loss in case of failures.
- Backup Scheduling: Establish a backup schedule that meets the needs of the organization, such as daily, weekly, or monthly backups.
- Backup Verification: Verify the integrity of backup data regularly to ensure that it can be used for disaster recovery.
- Restoration Testing: Test the restoration of backup data regularly to ensure that the backup and recovery process is working correctly and to identify and resolve any issues before they become critical.
- Backup Retention: Establish a backup retention policy that balances the need for long-term data preservation with the cost of storing backup data.
Monitoring and alerting¶
Monitoring is an important component of a Continuity Plan as it helps to detect potential failures early, identify the cause of failures, and support decision-making during the recovery process.
LogMan.io microservices provide an OpenMetrics API and/or ship their telemetry into InfluxDB; Grafana is used as the monitoring tool.
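For a quick manual check that a microservice exposes its telemetry, the OpenMetrics endpoint can be queried directly; a minimal sketch (the host, port and path are placeholders, substitute the web API address of the concrete service):
curl -s http://<service-host>:<service-port>/metrics | head -n 20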
-
Monitoring Strategy: The OpenMetrics API is used to collect telemetry from all microservices in the cluster, the operating system, and the hardware. Telemetry is collected once per minute. InfluxDB is used to store the telemetry data. Grafana is used as the web-based user interface for telemetry inspection.
-
Alerting and Notification: The monitoring system is configured to generate alerts and notifications in case of potential failures, such as low disk space, high resource utilization, or increased error rates.
-
Monitoring Dashboards: Monitoring dashboards are provided in Grafana that display the most important metrics for the system, such as resource utilization, error rates, and response times.
-
Monitoring Configuration: The monitoring configuration is regularly reviewed and updated to ensure that it remains effective and reflects changes in the system.
-
Monitoring Training: Training is provided for the monitoring team and other relevant parties on the monitoring system and the monitoring dashboards in Grafana.
High availability architecture¶
TeskaLabs LogMan.io is deployed in a highly available architecture (HA) with multiple nodes to reduce the risk of single points of failure.
High availability architecture is a design pattern that aims to ensure that a system remains operational and available, even in the event of failures or disruptions.
In a LogMan.io cluster, a high availability architecture includes the following components:
-
Load Balancing: Distribution of incoming traffic among multiple instances of microservices, thereby improving the resilience of the system and reducing the impact of failures.
-
Redundant Storage: Storing of data redundantly across multiple storage nodes to prevent data loss in the event of a storage failure.
-
Multiple Brokers: Use multiple brokers in Apache Kafka to improve the resilience of the messaging system and reduce the impact of broker failures.
-
Automatic Failover: Automatic failover mechanisms, such as leader election in Apache Kafka, to ensure that the system continues to function in the event of a cluster node failure.
-
Monitoring and Alerting: Usage of monitoring and alerting components to detect potential failures and trigger automatic failover mechanisms when necessary.
-
Rolling Upgrades: Upgrades to the system without disrupting its normal operation, by upgrading nodes one at a time, without downtime.
-
Data Replication: Replication of logs across multiple cluster nodes to ensure that the system continues to function even if one or more nodes fail.
Communication plan¶
A clear and well-communicated plan for responding to failures and communicating with stakeholders helps to minimize the impact of failures and ensure that everyone is on the same page.
-
Stakeholder Identification: Identify all stakeholders who may need to be informed during and after a disaster, such as employees, customers, vendors, and partners.
-
Participating organisations: The LogMan.io operator, the integrating party and the vendor (TeskaLabs).
-
Communication Channels: Communication channels that will be used during and after a disaster are Slack, email, phone and SMS.
-
Escalation Plan: Specify an escalation plan to ensure that the right people are informed at the right time during a disaster, and that communication is coordinated and effective.
-
Update and Maintenance: Regularly update and maintain the communication plan to ensure that it reflects changes in the organization, such as new stakeholders or communication channels.
Log Collector ↵
TeskaLabs LogMan.io Collector¶
This is the administration manual for the TeskaLabs LogMan.io Collector. It describes how to install the collector.
For more details about how to collect logs, continue to the reference manual.
Installation of TeskaLabs LogMan.io Collector¶
This short tutorial explains how to connect a new log collector running as a virtual machine.
Tip
If you are using a hardware TeskaLabs LogMan.io Collector, connect the monitor via HDMI and go straight to step 5.
-
Download the virtual machine image.
Here's the download link.
-
Import the downloaded image to your virtualization platform.
-
Configure network settings of a new virtual machine
Requirements:
- The virtual machine must be able to reach the TeskaLabs LogMan.io installation.
- The virtual machine must be reachable from devices that will ship logs into TeskaLabs LogMan.io.
-
Launch the virtual machine.
-
Determine the identity of TeskaLabs LogMan.io Collector.
The identity consists of 16 letters and digits. Please save this for the following steps.
-
Open the LogMan.io web application in your browser.
Follow this link or navigate to "Collectors" and click on the "Provisioning" button.
-
Enter the collector identity from step 4 in the box.
Then, click Provision to connect the collector and start collecting logs.
-
TeskaLabs LogMan.io Collector is successfully connected and collects logs.
Tip
The green circle on the left indicates that the log collector is online. The blue line indicates how many logs the collector has received in the last 24 hours.
Administration inside the VM¶
Administrative actions in the Virtual Machine of TeskaLabs LogMan.io Collector are available in the menu. Press "M" to access it. Use arrow keys and Enter to navigate and select actions.
Available options are:
- Power down
- Reboot
- Network configuration
Tip
We recommend using the Power down feature to safely turn off the collector's virtual machine.
Additional notes¶
You can connect an unlimited number of log collectors, e.g. to collect from different sources or to collect different types of logs.
Supported virtualization technologies¶
The TeskaLabs LogMan.io Collector supports the following virtualization technologies:
- VMWare
- Oracle VirtualBox
- Microsoft Hyper-V
- Qemu
Virtual Machine¶
TeskaLabs LogMan.io Collector can be manually installed into a virtual machine.
Specifications¶
- 1 vCPU
- OS Linux, preferably Ubuntu Server 22.04.4 LTS, other mainstream distributions are also supported
- 4 GB RAM
- 500 GB disk (50 GB for OS; the rest is a buffer for collected logs)
- 1x NIC, preferably 1Gbps
The collector must be able to connect to a TeskaLabs LogMan.io installation over HTTPS (WebSocket) using its URL.
Note
For environments with higher loads, the virtual machine should be scaled up accordingly.
Network¶
We recommend assigning a static IP address to the collector virtual machine, because the address is referenced in many log source configurations.
Ended: Log Collector
ElasticSearch Setting¶
Index Templates¶
Before the data are loaded into ElasticSearch, an index template should be present, so proper data types are assigned to every field.
This is especially needed for time-based fields, which would not work without an index template and could not be used for sorting and for creating index patterns in Kibana.
The ElasticSearch index template should be present in the site- repository under the name es_index_template.json.
To insert the index template through PostMan or Kibana, create the following HTTP request to the instance of ElasticSearch you are using:
PUT _template/lmio-
{
//Deploy to <SPECIFY_WHERE_TO_DEPLOY_THE_TEMPLATE>
"index_patterns" : ["lmio-*"],
"version": 200721, // Increase this with every release
"order" : 9999998, // Decrease this with every release
"settings": {
"index": {
"lifecycle": {
"name": "lmio-",
"rollover_alias": "lmio-"
}
}
},
"mappings": {
"properties": {
"@timestamp": { "type": "date", "format": "strict_date_optional_time||epoch_millis" },
"rt": { "type": "date", "format": "strict_date_optional_time||epoch_second" },
...
}
}
The body of the request is the content of the es_index_template.json.
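The same request can be issued with curl instead of PostMan or Kibana; a minimal sketch (the host, port, scheme and credentials are illustrative and depend on your ElasticSearch setup):
curl -u elastic:$ELASTIC_PASSWORD -X PUT "http://lm11:9201/_template/lmio-" \
     -H 'Content-Type: application/json' \
     -d @es_index_template.json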
Index Lifecycle Management¶
Index Lifecycle Management (ILM) in ElasticSearch serves to automatically close or delete old indices (e.g. with data older than three months), so search performance is maintained and the data storage has room for current data. The setting is defined in a so-called ILM policy.
The ILM policy should be set up before the data are pumped into ElasticSearch, so that a new index finds and associates itself with the proper ILM policy. For more information, please refer to the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started-index-lifecycle-management.html
LogMan.io components such as Dispatcher then use the specified ILM alias (lm_) and ElasticSearch automatically puts the data to the proper index assigned to the ILM policy.
The setting should be done in the following way:
Create the ILM policy¶
Kibana¶
Kibana version 7.x can be used to create ILM policy in ElasticSearch.
1.) Open Kibana
2.) Click Management in the left menu
3.) In the ElasticSearch section, click on Index Lifecycle Policies
4.) Click Create policy blue button
5.) Enter its name, which should be the same as the index prefix, e.g. lm_
6.) Set the maximum index size to the desired rollover size, e.g. 25 GB (size rollover)
7.) Set the maximum age of the index, e.g. 10 days (time rollover)
8.) Click the switch at the bottom of the screen at the Delete phase, and enter the time after which the index should be deleted, e.g. 120 days from rollover
9.) Click on the Save policy green button
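The same ILM policy can also be created via the ElasticSearch API; a minimal sketch mirroring the values from the steps above (the host, port and credentials are illustrative, and the policy name must match your index prefix):
curl -u elastic:$ELASTIC_PASSWORD -X PUT "http://lm11:9201/_ilm/policy/lm_" \
     -H 'Content-Type: application/json' -d '
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "25gb", "max_age": "10d" }
        }
      },
      "delete": {
        "min_age": "120d",
        "actions": { "delete": {} }
      }
    }
  }
}'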
Use the policy in index template¶
Modify index template(s)¶
Add the following lines to the JSON index template:
"settings": {
"index": {
"lifecycle": {
"name": "lmio-",
"rollover_alias": "lmio-"
}
}
},
Kibana¶
Kibana version 7.x can be used to link ILM policy with ES index template.
1.) Open Kibana
2.) Click Management in the left menu
3.) In the ElasticSearch section, click on Index Management
4.) At the top, select Index Template
5.) Select your desired index template, e.g. lmio-
6.) Click on Edit
7.) On the Settings screen, add:
{
"index": {
"lifecycle": {
"name": "lmio-",
"rollover_alias": "lmio-"
}
}
}
8.) Click on Save
Create a new index which will utilize the latest index template¶
Through PostMan or Kibana, create the following HTTP request to the instance of ElasticSearch you are using:
PUT lmio-tenant-events-000001
{
"aliases": {
"lmio-tenant-events": {
"is_write_index": true
}
}
}
The alias is then going to be used by the ILM policy to distribute data to the proper ElasticSearch index, so the pumps do not have to track the current index number.
Note
The prefix and the number of the index for ILM rollover must be separated with a dash (-000001), not an underscore (_000001)!
Configure other LogMan.io components¶
The pumps may now use the ILM policy through the created alias, which in the case above is lm_tenant. The configuration file should then look like this:
[pipeline:<PIPELINE>:ElasticSearchSink]
index_prefix=lm_tenant
doctype=_doc
The pump will always put data to the lm_tenant alias, where ILM will take care of the proper assignment to the index, e.g. lm_tenant-000001.
Note
Make sure there is no index prefix configuration in the source code, like in ElasticSearchSink in the pipeline. The code configuration would replace the file configuration.
Hot-Warm-Cold architecture (HWC)¶
HWC is an extension of the standard index rotation provided by the ElasticSearch ILM and it is a good tool for managing time series data. HWC architecture enables us to allocate specific nodes to one of the phases. When used correctly, along with the cluster architecture, this will allow for maximum performance, using available hardware to its fullest potential.
Hot¶
There is usually some period of time (week, month, etc.), where we want to query the indexes heavily, aiming for speed, rather than memory (and other resources) conservation. That is where the “Hot” phase comes in handy, by allowing us to have the index with more replicas, spread out and accessible on more nodes for optimal user experience.
Hot nodes¶
Hot nodes should use the fastest parts of the available hardware: the most CPU cores and the fastest I/O.
Warm¶
Once this period is over, and the indexes are no longer queried as often, we will benefit by moving them to the “Warm” phase, which allows us to reduce the number of nodes (or move to nodes with less resources available) and index replicas, lessening the hardware load, while still retaining the option to search the data reasonably fast.
Warm nodes¶
Warm nodes, as the name suggests, stand at the crossroads between being solely for storage purposes and retaining some CPU power to handle the occasional queries.
Cold¶
Sometimes, there are reasons to store data for extended periods of time (dictated by law, or some internal rule). The data are not expected to be queried, but at the same time, they cannot be deleted just yet.
Cold nodes¶
This is where the Cold nodes come in. There may be few of them, with only little CPU resources, and they have no need for SSD drives, being perfectly fine with slower (and optionally larger) storage.
Conclusion¶
Using the HWC ILM feature to its full effect requires some preparation, and it should be considered when building the production ElasticSearch cluster. The added value, however, can be very high, depending on the specific use cases.
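A minimal sketch of an ILM policy that walks an index through the hot, warm and cold tiers; the timings and the policy name (lmio-hwc) are illustrative assumptions. With data_hot/data_warm/data_cold node roles, ElasticSearch migrates the shards between tiers automatically when the phase changes:
curl -u elastic:$ELASTIC_PASSWORD -X PUT "http://lm11:9201/_ilm/policy/lmio-hwc" \
     -H 'Content-Type: application/json' -d '
{
  "policy": {
    "phases": {
      "hot":    { "actions": { "rollover": { "max_size": "25gb", "max_age": "10d" } } },
      "warm":   { "min_age": "10d",  "actions": { "set_priority": { "priority": 50 } } },
      "cold":   { "min_age": "30d",  "actions": { "set_priority": { "priority": 0 } } },
      "delete": { "min_age": "120d", "actions": { "delete": {} } }
    }
  }
}'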
InfluxDB Setting¶
Docker-compose.yaml configuration for Influx v1.x¶
influxdb:
restart: on-failure:3
image: influxdb:1.8
ports:
- "8083:8083"
- "8086:8086"
- "8090:8090"
volumes:
- /<path_on_host>/<where_you_want_data>:/var/lib/influxdb
environment:
- INFLUXDB_DB=<your_db>
- INFLUXDB_USER=telegraf
- INFLUXDB_ADMIN_ENABLED=true
- INFLUXDB_ADMIN_USER=<your_user>
- INFLUXDB_ADMIN_PASSWORD=<your_password>
logging:
options:
max-size: 10m
Docker-compose.yaml configuration for Influx v2.x¶
influxdb:
image: influxdb:2.0.4
restart: 'always'
ports:
- "8086:8086"
volumes:
- /data/influxdb/data:/var/lib/influxdb2
environment:
- DOCKER_INFLUXDB_INIT_MODE=setup
- DOCKER_INFLUXDB_INIT_USERNAME=telegraf
- DOCKER_INFLUXDB_INIT_PASSWORD=my-password
- DOCKER_INFLUXDB_INIT_ORG=my-org
- DOCKER_INFLUXDB_INIT_BUCKET=my-bucket
- DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=my-super-secret-auth-token
Run InfluxDB container¶
docker-compose up -d
Use UI interface on:¶
http://localhost:8086/
How to write/delete data using CLI influx:¶
docker exec -it <influx-container> bash
influx write \
-b my-bucket \
-o my-org \
-p s \
'myMeasurement,host=myHost testField="testData" 1556896326' \
-t ${your-token}
influx delete \
-bucket my-bucket \
--org my-org \
--start 2001-03-01T00:00:00Z \
--stop 2021-04-14T00:00:00Z \
--token ${your-token}
Setting up retention policy¶
The retention policy controls how long you want to keep data in InfluxDB. You set up a name for your policy, which database is affected, how long the data will be kept, the replication factor, and finally the group (DEFAULT in the case below). DEFAULT is used for all sources that do not specify the group when inserting data into InfluxDB.
docker exec <container_name> influx -execute 'CREATE RETENTION POLICY "<name_your_policy>" ON "<your_db>" DURATION 47h60m REPLICATION 1 DEFAULT'
Altering an existing policy¶
docker exec <container_name> influx -execute 'ALTER RETENTION POLICY "autogen" ON "<affected_db>" DURATION 100d'
Deleting old data¶
Mind the quotation marks
delete from "<measurement>" where "<tag_key>" = '<tag_value>'
Deleting old data in a specific field¶
When reconfiguring your sources, you may want to get rid of some old values in specific fields, so they do not clog your visualizations. You may do so using the following command:
docker exec <container_name> influx -execute "DROP SERIES WHERE \"<tag_key>\" = '<tag_value>'"
Downsampling¶
See https://docs.influxdata.com/influxdb/v1.8/guides/downsample_and_retain/ for details. If you want to use multiple rules for different data sources, use a group name other than DEFAULT and configure your sources accordingly, for example in telegraf use:
Specific retention policies example (telegraf)¶
Used when you want to set different retention on different sources.
[[outputs.influxdb]]
  ## Name of existing retention policy to write to. Empty string writes to
  ## the default retention policy. Only takes effect when using HTTP.
  retention_policy = "telegraf1"
docker exec <container_name> influx -execute 'CREATE RETENTION POLICY "telegraf1" ON "<your_db>" DURATION 47h60m REPLICATION 1'
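A minimal downsampling sketch for InfluxDB 1.x: a continuous query that aggregates a raw telegraf measurement into a longer-retention policy once per hour. The database, retention policy and measurement names are illustrative:
docker exec <container_name> influx -execute 'CREATE CONTINUOUS QUERY "cq_cpu_1h" ON "<your_db>" BEGIN SELECT mean("usage_idle") AS "mean_usage_idle" INTO "<long_retention_policy>"."cpu_1h" FROM "cpu" GROUP BY time(1h), * END'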
Guide to deploying TeskaLabs LogMan.io for partners¶
Preimplementation analysis¶
Every delivery should begin with a preimplementation analysis, which lists all the log sources that should be connected to LogMan.io. The outcome of the analysis is a spreadsheet where each row describes one log source, the way the logs are gathered (reading files, log forwarding to a destination port etc.), who is responsible for the log source from the customer's perspective, and the estimation of when the log source should be connected. See the following picture:
In the picture, there are two more columns that are not part of the preimplementation analysis and that are filled in later when the implementation takes place (Kafka topic & dataset). For more information, see the Event lanes section below.
It MUST BE defined which domain (URL) will be used to host LogMan.io.
The customer or the partner themselves SHOULD provide appropriate HTTPS SSL certificates (see nginx below), e.g. using Let's Encrypt or another certification authority.
LogMan.io cluster and collector servers¶
Servers¶
By the end of the preimplementation analysis, it should be clear how large the volume of gathered logs (in events or log messages per second, EPS for short) will be. The logs are always gathered from the customer's infrastructure with at least one server dedicated to collecting logs (aka log collector).
When it comes to the LogMan.io cluster, there are the following options:
- LogMan.io cluster is deployed to the customer's infrastructure on physical and/or virtual machines (on-premise)
- LogMan.io cluster is deployed at the partner's infrastructure and available to more customers, where each customer is assigned a single tenant (SoC, SaaS etc.)
See Hardware specification section for more information about the physical servers' configuration.
Cluster architecture¶
In either case, there SHOULD be at least one server (for PoCs) or at least three servers (for production deployment) available for the LogMan.io cluster. If the cluster is deployed to the customer's infrastructure, the servers may also act as the collector servers, so there is no need for a dedicated collector server in this case. The three-server architecture may consist of three similar physical servers, or of two physical servers and one small arbiter virtual machine.
Smaller or non-critical deployments are possible in a single-machine configuration.
For more information about the LogMan.io cluster organization, see the Cluster architecture section.
Data storage¶
Every physical or non-arbiter server in LogMan.io cluster should have enough available disk storage to hold the data for the requested time period from the preimplementation analysis.
There should be at least one fast data storage (for current or one-day log messages and Kafka topics) and one slower data storage (for older data, metadata and configurations), mapped to /data/ssd and /data/hdd.
Since all LogMan.io services run as Docker containers, the /var/lib/docker folder should also be mapped to one of these storages.
For detailed information about the disk storage organization, mount etc. please see Data Storage section.
Installation¶
The RECOMMENDED operating system is Linux Ubuntu 22.04 LTS or newer. Alternatives are Linux RedHat 8 and 7, CentOS 7.
The hostnames of the LogMan.io servers in the LogMan.io cluster should follow the notation lm01, lm11 etc.
If separate collector servers are used (see above), there is no requirement for their hostname naming.
If TeskaLabs is part of the delivery, there should be a tladmin user created with sudoer permissions.
On every server (both LogMan.io cluster and Collector), there should be git, docker and docker-compose installed.
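A minimal preparation sketch for an Ubuntu 22.04 LTS node covering the two steps above (package names follow the Ubuntu repositories; adapt for RedHat/CentOS accordingly):
# Create the tladmin user with sudo permissions
sudo adduser --disabled-password --gecos "TeskaLabs admin" tladmin
sudo usermod -aG sudo tladmin
# Install git, Docker and Docker Compose from the Ubuntu repositories
sudo apt-get update
sudo apt-get install -y git docker.io docker-compose
sudo usermod -aG docker tladmin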
Please refer to Manual installation for a comprehensive guide.
All services are then created and started via the docker-compose up -d command from the folder the site repository is cloned to (see the following section):
$ cd /opt/site-tenant-siterepository/lm11
$ docker-compose up -d
The Docker credentials are provided to the partner by TeskaLabs' team.
Site repository and configuration¶
Every partner is given access to TeskaLabs GitLab to manage the configurations for their deployments there, which is the recommended way to store configurations for future consultations with TeskaLabs. However, every partner may also use their own GitLab or any other Git repository and provide TeskaLabs' team with appropriate (at least read-only) access.
Every deployment to every customer should have a separate site repository, regardless of whether the entire LogMan.io cluster is installed or only collector servers are deployed. The structure of the site repository should look as follows:
Each server node should have a separate subfolder at the top of the GitLab repository.
Next, there should be a library folder that contains declarations for parsing, correlation etc. groups, a config folder that contains the configuration of the Discover screen in the UI and the dashboards, and an ecs folder with index templates for ElasticSearch.
Every partner is given access to a reference site repository with all the configurations including parsers and discover settings ready.
ElasticSearch¶
Each node in the LogMan.io Cluster should contain at least one ElasticSearch master node, one ElasticSearch data_hot node, one ElasticSearch data_warm node and one ElasticSearch data_cold node.
All the ElasticSearch nodes are deployed via Docker Compose and are part of the site/configuration repository.
Arbiter nodes in the cluster contain only one ElasticSearch master node.
If the one-server architecture is used, the replicas in ElasticSearch should be set to zero (this will also be provided after consultation with TeskaLabs). For illustration, see the following snippet from the Docker Compose file to see how an ElasticSearch hot node is deployed:
lm21-es-hot01:
network_mode: host
image: docker.elastic.co/elasticsearch/elasticsearch:7.17.2
depends_on:
- lm21-es-master
environment:
- network.host=lm21
- node.attr.rack_id=lm21 # Or datacenter name. This is meant for ES to effectively and safely manage replicas
# For smaller installations -> a hostname is fine
- node.attr.data=hot
- node.name=lm21-es-hot01
- node.roles=data_hot,data_content,ingest
- cluster.name=lmio-es # Basically "name of the database"
- cluster.initial_master_nodes=lm01-es-master,lm11-es-master,lm21-es-master
- discovery.seed_hosts=lm01:9300,lm11:9300,lm21:9300
- http.port=9201
- transport.port=9301 # Internal communication among nodes
- "ES_JAVA_OPTS=-Xms16g -Xmx16g -Dlog4j2.formatMsgNoLookups=true"
# - path.repo=/usr/share/elasticsearch/repo # This option is enabled on demand after the installation; it is not part of the initial setup
- ELASTIC_PASSWORD=$ELASTIC_PASSWORD
- xpack.security.enabled=true
- xpack.security.transport.ssl.enabled=true
...
For more information about ElasticSearch including the explanation of hot (recent, one-day data on SSD), warm (older) and cold nodes, please refer to ElasticSearch Setting section.
ZooKeeper & Kafka¶
Each server node in the LogMan.io Cluster should contain at least one ZooKeeper and one Kafka node. ZooKeeper is a metadata storage available in the entire cluster, where Kafka stores information about topic consumers, topic names etc., and where LogMan.io stores the current library and config files (see below).
The Kafka and ZooKeeper setting can be copied from the reference site repository and consulted with TeskaLabs developers.
Services¶
The following services should be available on at least one of the LogMan.io nodes and they include:
- nginx (webserver with the HTTPS certificate, see the reference site repository)
- influxdb (metric storage, see InfluxDB Setting)
- mongo (database for credentials of users, sessions etc.)
- telegraf (gathers telemetry metrics from the infrastructure, Burrow and ElasticSearch and sends them to InfluxDB; it should be installed on every server)
- burrow (gathers telemetry metrics from Kafka and sends them to InfluxDB)
- seacat-auth (TeskaLabs SeaCat Auth is an OAuth service that stores its data in mongo)
- asab-library (manages the library with declarations)
- asab-config (manages the config section)
- lmio-remote-control (monitors other microservices like asab-config)
- lmio-commander (uploads the library to ZooKeeper)
- lmio-dispatcher (dispatches data from the lmio-events and lmio-others Kafka topics to ElasticSearch; it should run in at least three instances on every server)
For more information about SeaCat Auth and its management part in LogMan.io UI, see TeskaLabs SeaCat Auth documentation.
For information on how to upload the library
from the site repository to ZooKeeper, refer to LogMan.io Commander guide.
UI¶
The following UIs should be deployed and made available via nginx. First implementation should always be discussed with TeskaLabs' developers.
- LogMan.io UI (see LogMan.io User Interface)
- Kibana (discover screen, visualizations, dashboards and monitoring on top of ElasticSearch)
- Grafana (telemetry dashboards on top of data from InfluxDB)
- ZooKeeper UI (management of data stored in ZooKeeper)
The following picture shows the Parsers from the library imported to ZooKeeper in ZooKeeper UI:
LogMan.io UI Deployment¶
Deployment of LogMan.io UI is a semi-automatic process when set up correctly. There are several steps to ensure a safe UI deployment:
- The deployment artifact of the UI should be pulled via the azure site repository provided to the partner by TeskaLabs' developers. Information about where the particular UI application is stored can be obtained from the CI/CD image of the application repository.
- It is recommended to use tagged versions, but there can be situations when the master version is desired. Information on how to set this up can be found in the docker-compose.yaml file of the reference site repository.
- The UI application has to be aligned with the services to ensure the best performance (usually the latest tag versions). If uncertain, contact TeskaLabs' developers.
Creating the tenant¶
Each customer is assigned one or more tenants
.
Tenants are lowercase ASCII names that tag the data/logs belonging to the user; each tenant's data are stored in a separate ElasticSearch index.
All event lanes (see below) are also tenant specific.
Create the tenant in SeaCat Auth using LogMan.io UI¶
In order to create the tenant, log into the LogMan.io UI with the superuser role, which can be done through the provisioning. For more information about provisioning, please refer to Provisioning mode section of the SeaCat Auth documentation.
In LogMan.io UI, navigate to the Auth section in the left menu and select Tenants.
Once there, click on the Create tenant option and write the name of the tenant there.
Then click on the blue button and the tenant should be created:
After that, go to Credentials and assign the newly created tenant to all relevant users.
ElasticSearch indices¶
In Kibana, every tenant should have index templates for the lmio-tenant-events and lmio-tenant-others indices, where tenant is the name of the tenant (refer to the reference site repository provided by TeskaLabs).
The index templates can be inserted via Kibana's Dev Tools from the left menu.
After the insertion of the index templates, the ILM (index lifecycle management) policy and the first indices should be manually created, exactly as specified in the ElasticSearch Setting guide.
Kafka¶
There is no specific tenant creation setting in Kafka, except the event lanes below.
However, always make sure the lmio-events and lmio-others topics are created properly.
The following commands should be run in the Kafka container (e.g. docker exec -it lm11_kafka_1 bash):
# LogMan.io
/usr/bin/kafka-topics --zookeeper lm11:2181 --create --topic lmio-events --replication-factor 1 --partitions 6
/usr/bin/kafka-topics --zookeeper lm11:2181 --create --topic lmio-others --replication-factor 1 --partitions 6
/usr/bin/kafka-topics --zookeeper lm11:2181 --alter --topic lmio-events --config retention.ms=86400000
/usr/bin/kafka-topics --zookeeper lm11:2181 --alter --topic lmio-others --config retention.ms=86400000
# LogMan.io+ & SIEM
/usr/bin/kafka-topics --zookeeper lm11:2181 --create --topic lmio-events-complex --replication-factor 1 --partitions 6
/usr/bin/kafka-topics --zookeeper lm11:2181 --create --topic lmio-lookups --replication-factor 1 --partitions 6
Each Kafka topic should have at least 6 partitions (that can be automatically used for parallel consuming), which is the appropriate number for most of the deployments.
Important note¶
The following section describes the connection of event lanes to LogMan.io. The knowledge about LogMan.io architecture from the documentation is mandatory.
Event lanes¶
Event lanes in LogMan.io define how logs are sent to the cluster. Each event lane is specific to the collected source, hence one row in the preimplementation analysis table should correspond to one event lane. Each event lane consists of one lmio-collector service, one lmio-ingestor service and one or more instances of the lmio-parser service.
Collector¶
LogMan.io Collector should run on the collector server or on one or more LogMan.io servers, if they are part of the same internal network. The configuration sample is part of the reference site repository.
LogMan.io Collector is able to, via YAML configuration, open a TCP/UDP port to obtain logs from, read files, open a WEC server, read from Kafka topics, Azure accounts and so on. The comprehensive documentation is available here: LogMan.io Collector
The following configuration sample opens the 12009/UDP port on the server the collector is installed on and redirects the collected data via WebSocket to the lm11 server on port 8600, where lmio-ingestor should be running:
input:Datagram:UDPInput:
address: 0.0.0.0:12009
output: WebSocketOutput
output:WebSocket:WebSocketOutput:
url: http://lm11:8600/ws
tenant: mytenant
debug: false
prepend_meta: false
The url is either the hostname of the server and the port of the Ingestor, if the Collector and the Ingestor are deployed to the same server, or a URL with https://, if a collector server outside of the internal network is used. In the latter case it is necessary to specify HTTPS certificates; please see the output:WebSocket section in the LogMan.io Collector Outputs guide for more information.
The tenant is the name of the tenant the logs belong to. The tenant name is then automatically propagated to Ingestor and Parser.
Ingestor¶
LogMan.io Ingestor takes the log messages from the Collector along with metadata and stores them in Kafka in a topic named collected-tenant-technology, where tenant is the name of the tenant the logs belong to and technology is the name of the technology the data are gathered from, like microsoft-windows.
The following sections in the CONF files always need to be set up differently for each event lane:
# Output
[pipeline:WSPipeline:KafkaSink]
topic=collected-tenant-technology
# Web API
[web]
listen=0.0.0.0 8600
The port in the listen section should match the port in the Collector YAML configuration (if the Collector is deployed to the same server) or the setting in nginx (if the data are collected from a collector server outside of the internal network). Please refer to the reference site repository provided by TeskaLabs' developers.
Parser¶
The parser should be deployed in multiple instances to scale the performance. It parses the data from the original bytes or strings into a dictionary in the specified schema, like ECS (Elastic Common Schema) or CEF (Common Event Format), while using a parsing group from the library loaded in ZooKeeper. It is important to specify the Kafka topic to read from, which is the same topic as specified in the Ingestor configuration:
[declarations]
library=zk://lm11:2181/lmio/library.lib
groups=Parsers/parsing-group
raw_event=log.original
# Pipeline
[pipeline:ParsersPipeline:KafkaSource]
topic=collected-tenant-technology
group_id=lmio_parser_collected
auto.offset.reset=smallest
Parsers/parsing-group is the location of the parsing group from the library loaded into ZooKeeper through LogMan.io Commander. It does not have to exist on the first try, because all data are then automatically sent to the lmio-tenant-others index. When the parsing group is ready, the parsing takes place and the data can be seen in document format in the lmio-tenant-events index.
Kafka topics¶
Before all three services are started via the docker-compose up -d command, it is important to check the state of the specific collected-tenant-technology Kafka topic (where tenant is the name of the tenant and technology is the name of the connected technology/device type). In the Kafka container (e.g. docker exec -it lm11_kafka_1 bash), the following commands should be run:
/usr/bin/kafka-topics --zookeeper lm11:2181 --create --topic collected-tenant-technology --replication-factor 1 --partitions 6
/usr/bin/kafka-topics --zookeeper lm11:2181 --alter --topic collected-tenant-technology --config retention.ms=86400000
Parsing groups¶
For most common technologies, TeskaLabs has already prepared parsing groups for the ECS schema. Please get in touch with TeskaLabs developers. Since all parsers are written in the declarative language, all parsing groups in the library can be easily adjusted. The name of the group should be the same as the name of the dataset attribute written in the parsing group's declaration.
For more information about our declarative language, please refer to the official documentation: SP-Lang
After the parsing group is deployed via LogMan.io Commander, the appropriate Parser(s) should be restarted.
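The restart can be done with Docker Compose on the server where the parser runs; a minimal sketch (the service name lmio-parser is illustrative and must match your docker-compose.yaml):
docker-compose restart lmio-parser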
Deployment¶
On the LogMan.io servers, simply run the following command in the folder the site- repository is cloned to:
docker-compose up -d
The collection of logs can then be checked in the Kafka Docker container via Kafka's console consumer:
/usr/bin/kafka-console-consumer --bootstrap-server lm11:9092 --topic collected-tenant-technology --from-beginning
The data are pumped by the Parser from the collected-tenant-technology topic to the lmio-events or lmio-others topic, and then by the Dispatcher (lmio-dispatcher, see above) to the lmio-tenant-events or lmio-tenant-others index in ElasticSearch.
SIEM¶
The SIEM part should always be discussed with TeskaLabs' developers, who will provide the first correlation rules and the entries for the configuration files and Docker Compose. The SIEM part consists mainly of different lmio-correlator instances and lmio-watcher.
For more information, see the LogMan.io Correlator section.
Connecting a new log source to LogMan.io¶
Prerequisites¶
Tenant¶
Each customer is assigned one or more tenants.
The name of the tenant must be a lowercase ASCII name that tags the data/logs belonging to the user; each tenant's data are stored in a separate ElasticSearch index. All Event Lanes (see below) are also tenant specific.
Create the tenant in SeaCat Auth using LogMan.io UI¶
In order to create the tenant, log into the LogMan.io UI with the superuser role, which can be done through the provisioning. For more information about provisioning, please refer to Provisioning mode section of the SeaCat Auth documentation.
In LogMan.io UI, navigate to the Auth section in the left menu and select Tenants.
Once there, click on the Create tenant option and write the name of the tenant there.
Then click on the blue button and the tenant should be created:
After that, go to Credentials and assign the newly created tenant to all relevant users.
ElasticSearch index templates¶
In Kibana, every tenant should have index templates for the lmio-tenant-events and lmio-tenant-others indices, where tenant is the name of the tenant (refer to the reference site- repository provided by TeskaLabs), so proper data types are assigned to every field.
This is especially needed for time-based fields, which would not work without an index template and could not be used for sorting and for creating index patterns in Kibana.
The ElasticSearch index template should be present in the site- repository under the name es_index_template.json.
The index templates can be inserted via Kibana's Dev Tools from the left menu.
ElasticSearch index lifecycle policy¶
Index Lifecycle Management (ILM) in ElasticSearch serves to automatically close or delete old indices (e.g. with data older than three months), so search performance is maintained and the data storage has room for current data. The setting is defined in a so-called ILM policy.
The ILM policy should be set up before the data are pumped into ElasticSearch, so that a new index finds and associates itself with the proper ILM policy. For more information, please refer to the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started-index-lifecycle-management.html
LogMan.io components such as Dispatcher then use the specified ILM alias (lmio-) and ElasticSearch automatically puts the data to the proper index assigned to the ILM policy.
The setting should be done in the following way:
Create the ILM policy¶
Kibana version 7.x can be used to create ILM policy in ElasticSearch.
1.) Open Kibana
2.) Click Management in the left menu
3.) In the ElasticSearch section, click on Index Lifecycle Policies
4.) Click Create policy blue button
5.) Enter its name, which should be the same as the index prefix, e.g. lmio-
6.) Set the maximum index size to the desired rollover size, e.g. 25 GB (size rollover)
7.) Set the maximum age of the index, e.g. 10 days (time rollover)
8.) Click the switch at the bottom of the screen at the Delete phase, and enter the time after which the index should be deleted, e.g. 120 days from rollover
9.) Click on the Save policy green button
Use the policy in index template¶
Add the following lines to the JSON index template:
"settings": {
"index": {
"lifecycle": {
"name": "lmio-",
"rollover_alias": "lmio-"
}
}
},
ElasticSearch indices¶
Through PostMan or Kibana, create the following HTTP requests to the instance of ElasticSearch you are using.
1.) Create an index for parsed events/logs:
PUT lmio-tenant-events-000001
{
"aliases": {
"lmio-tenant-events": {
"is_write_index": true
}
}
}
2.) Create an index for unparsed and error events/logs:
PUT lmio-tenant-others-000001
{
"aliases": {
"lmio-tenant-others": {
"is_write_index": true
}
}
}
The alias is then going to be used by the ILM policy to distribute data to the proper ElasticSearch index, so the pumps do not have to track the current index number.
Note
The prefix and the number of the index for ILM rollover must be separated with a dash (-000001), not an underscore (_000001)!
Event Lane¶
An Event Lane in LogMan.io defines how logs from a specific data source for a given tenant are sent to the cluster. Each event lane is specific to the collected source. Each event lane consists of one lmio-collector service, one lmio-ingestor service and one or more instances of the lmio-parser service.
Collector¶
LogMan.io Collector should run on the collector server or on one or more LogMan.io servers, if they are part of the same internal network. The configuration sample is part of the reference site- repository.
LogMan.io Collector is able to, via YAML configuration, open a TCP/UDP port to obtain logs from, read files, open a WEC server, read from Kafka topics, Azure accounts and so on. The comprehensive documentation is available here: LogMan.io Collector
The following configuration sample opens the 12009/UDP port on the server the collector is installed on and redirects the collected data via WebSocket to the lm11 server on port 8600, where lmio-ingestor should be running:
input:Datagram:UDPInput:
address: 0.0.0.0:12009
output: WebSocketOutput
output:WebSocket:WebSocketOutput:
url: http://lm11:8600/ws
tenant: mytenant
debug: false
prepend_meta: false
The url is either the hostname of the server and the port of the Ingestor, if the Collector and the Ingestor are deployed to the same server, or a URL with https://, if a collector server outside of the internal network is used. In the latter case it is necessary to specify HTTPS certificates; please see the output:WebSocket section in the LogMan.io Collector Outputs guide for more information.
The tenant is the name of the tenant the logs belong to. The tenant name is then automatically propagated to Ingestor and Parser.
Ingestor¶
LogMan.io Ingestor takes the log messages from the Collector along with metadata and stores them in Kafka in a topic named collected-tenant-technology, where tenant is the name of the tenant the logs belong to and technology is the name of the technology the data are gathered from, like microsoft-windows.
The following sections in the CONF files always need to be set up differently for each event lane:
# Output
[pipeline:WSPipeline:KafkaSink]
topic=collected-tenant-technology
# Web API
[web]
listen=0.0.0.0 8600
The port in the listen section should match the port in the Collector YAML configuration (if the Collector is deployed to the same server) or the setting in nginx (if the data are collected from a collector server outside of the internal network). Please refer to the reference site- repository provided by TeskaLabs' developers.
Parser¶
The parser should be deployed in multiple instances to scale the performance. It parses the data from the original bytes or strings into a dictionary in the specified schema, like ECS (Elastic Common Schema) or CEF (Common Event Format), while using a parsing group from the library loaded in ZooKeeper. It is important to specify the Kafka topic to read from, which is the same topic as specified in the Ingestor configuration:
[declarations]
library=zk://lm11:2181/lmio/library.lib
groups=Parsers/parsing-group
raw_event=log.original
# Pipeline
[pipeline:ParsersPipeline:KafkaSource]
topic=collected-tenant-technology
group_id=lmio_parser_collected
auto.offset.reset=smallest
Parsers/parsing-group is the location of the parsing group from the library loaded into ZooKeeper through LogMan.io Commander. It does not have to exist on the first try, because all data are then automatically sent to the lmio-tenant-others index. When the parsing group is ready, the parsing takes place and the data can be seen in document format in the lmio-tenant-events index.
Kafka topics¶
Before all three services are started via the docker-compose up -d command, it is important to check the state of the specific collected-tenant-technology Kafka topic (where tenant is the name of the tenant and technology is the name of the connected technology/device type). In the Kafka container (e.g. docker exec -it lm11_kafka_1 bash), the following commands should be run:
/usr/bin/kafka-topics --zookeeper lm11:2181 --create --topic collected-tenant-technology --replication-factor 1 --partitions 6
/usr/bin/kafka-topics --zookeeper lm11:2181 --alter --topic collected-tenant-technology --config retention.ms=86400000
Parsing groups¶
For most common technologies, TeskaLabs has already prepared parsing groups for the ECS schema. Please get in touch with TeskaLabs developers. Since all parsers are written in the declarative language, all parsing groups in the library can be easily adjusted. The name of the group should be the same as the name of the dataset attribute written in the parsing group's declaration.
For more information about our declarative language, please refer to the official documentation: SP-Lang
After the parsing group is deployed via LogMan.io Commander, the appropriate Parser(s) should be restarted.
Deployment¶
On the LogMan.io servers, simply run the following command in the folder the site- repository is cloned to:
docker-compose up -d
The collection of logs can then be checked in the Kafka Docker container via Kafka's console consumer:
/usr/bin/kafka-console-consumer --bootstrap-server lm11:9092 --topic collected-tenant-technology --from-beginning
The data are pumped by the Parser from the collected-tenant-technology topic to the lmio-events or lmio-others topic, and then by the Dispatcher (lmio-dispatcher) to the lmio-tenant-events or lmio-tenant-others index in ElasticSearch.
Kafka ↵
Kafka¶
Apache Kafka serves as a queue that temporarily stores events passed among the LogMan.io microservices. For more information, see Architecture.
Kafka within LogMan.io¶
Topic naming in event lanes¶
Each event lane has received, events and others topics specified.
Each topic name contains the name of the tenant and the event lane's stream in the following manner:
received.tenant.stream
events.tenant.stream
others.tenant
received.tenant.stream¶
The received topic stores the incoming logs for the given tenant and the event lane's stream.
events.tenant.stream¶
The events topic stores the parsed events for the given event lane defined by tenant and stream.
others.tenant¶
The others topic stores the unparsed events for the given tenant.
Internal topics¶
There are the following internal topics for LogMan.io:
lmio-alerts¶
This topic stores the triggered alerts and is read by LogMan.io Alerts microservice.
lmio-notifications¶
This topic stores the triggered notifications and is read by ASAB IRIS microservice.
lmio-lookups¶
This topic stores the requested changes in lookups and is read by LogMan.io Watcher microservice.
Recommended setup for 3-node cluster¶
There are three instances of Apache Kafka, one on each node.
The number of partitions for each topic must be at least the same as the number of consumers (3) and divisible by 2, hence the recommended number of partitions is always 6.
The recommended replica count is 1.
Each topic must have a reasonable retention set based on the available size of SSD drives.
In the LogMan.io cluster environment, where average EPS is above 1000 events per second and SSD disk space is below 2 TB, the retention is usually 1 day (86400000 milliseconds). See the Commands section.
Hint
When the EPS is lower or there is more SSD space, it is recommended to set the retention for Kafka topics to higher values, like 2 or more days, in order to give the administrators more time to solve potential issues.
To create the partitions, replicas and retention properly, see the Commands section.
Commands¶
The following commands serve to create, alter and delete Kafka topics within the LogMan.io environment. All Kafka topics managed by LogMan.io, next to the internal ones, are specified in event lane *.yaml declarations inside the /EventLanes folder in the library.
Prerequisites¶
All commands should be run from the Kafka Docker container, which can be accessed via the following command:
docker exec -it kafka_container bash
The command utilizes Kafka Command Line interface, which is documented here: Kafka Command-Line Interface (CLI) Tools
Create a topic¶
In order to create a topic, specify the topic name, number of partitions and replication factor. The replication factor should be set to 1 and partitions to 6, which is the default for LogMan.io Kafka topics.
/usr/bin/kafka-topics --zookeeper localhost:2181 --create --topic "events.tenant.fortigate" --replication-factor 1 --partitions 6
Replace events.tenant.fortigate with your topic name.
Configure a topic¶
Retention¶
The following command changes the retention of data for the Kafka topic to 86400000 milliseconds, that is 1 day. This means that data older than 1 day will be deleted from Kafka to spare storage space:
/usr/bin/kafka-configs --bootstrap-server localhost:9092 --entity-type topics --entity-name "events\.tenant\.fortigate" --alter --add-config retention.ms=86400000
Replace events\.tenant\.fortigate with your topic name.
Info
All Kafka topics in LogMan.io should have a retention for data set.
Info
When editing a topic setting in Kafka, special characters like the dot (.) should be escaped with a backslash (\).
Reseting a consumer group offset for a given topic¶
In order to reset the reading position, or the offset, for the given group ID (consumer group), use the following command:
/usr/bin/kafka-consumer-groups --bootstrap-server localhost:9092 --group "my-console-client" --topic "events\.tenant\.fortigate" --reset-offsets --to-datetime 2020-12-20T00:00:00.000 --execute
Replace events\.tenant\.fortigate with your topic name.
Replace my-console-client with the given group ID.
Replace 2020-12-20T00:00:00.000 with the time to reset the reading offset to.
Hint
To reset the group to the current offset, use --to-current instead of --to-datetime 2020-12-20T00:00:00.000.
Deleting a consumer group offset for a given topic¶
The offset for the given topic can be deleted from the consumer group, hence the consumer group would be effectively disconnected from the topic itself. Use the following command:
/usr/bin/kafka-consumer-groups --bootstrap-server localhost:9092 --group "my-console-client" --topic "events\.tenant\.fortigate" --delete-offsets
Replace events\.tenant\.fortigate with your topic name.
Replace my-console-client with the given group ID.
Deleting the consumer group¶
A consumer group for ALL topics can be deleted with its offset information using the following command:
/usr/bin/kafka-consumer-groups --bootstrap-server localhost:9092 --delete --group my-console-client
Replace my-console-client with the given group ID.
Alter a topic¶
Change the number of partitions¶
The following command increases the number of partitions within the given topic.
/usr/bin/kafka-topics --zookeeper localhost:2181 --alter --partitions 6 --topic "events\.tenant\.fortigate"
Replace events\.tenant\.fortigate with your topic name.
Specify ZooKeeper node
Kafka reads and alters data stored in ZooKeeper. In case you've configured Kafka so that its files are stored in a specific ZooKeeper node, you will get this error:
Error while executing topic command : Topic 'events.tenant.fortigate' does not exist as expected
[2024-05-06 10:16:36,207] ERROR java.lang.IllegalArgumentException: Topic 'events.tenant.fortigate' does not exist as expected
at kafka.admin.TopicCommand$.kafka$admin$TopicCommand$$ensureTopicExists(TopicCommand.scala:539)
at kafka.admin.TopicCommand$ZookeeperTopicService.alterTopic(TopicCommand.scala:408)
at kafka.admin.TopicCommand$.main(TopicCommand.scala:66)
at kafka.admin.TopicCommand.main(TopicCommand.scala)
(kafka.admin.TopicCommand$)
Adjust the --zookeeper argument accordingly, e.g. when the Kafka data is stored in the kafka node of ZooKeeper:
/usr/bin/kafka-topics --zookeeper lm11:2181/kafka --alter --partitions 6 --topic 'events\.tenant\.fortigate'
Try removing the escape characters (\) if the topic name is still not recognized.
Delete a topic¶
The topic can be deleted using the following command. Please keep in mind that Kafka topics are automatically created again if new data are produced/sent to them by any service.
/usr/bin/kafka-topics --zookeeper localhost:2181 --delete --topic "events\.tenant\.fortigate"
Replace events\.tenant\.fortigate with your topic name.
Troubleshooting¶
There are many logs in others and I cannot find the ones with "interface" attribute inside¶
Kafka Console Consumer can be used to obtain events from multiple topics, here from all topics whose names start with the events. prefix.
Next, it is possible to grep the field in double quotes:
/usr/bin/kafka-console-consumer --bootstrap-server localhost:9092 --whitelist "events.*" | grep '"interface"'
This command gives you all incoming logs with the "interface" attribute from all events topics.
Kafka Partition Reassignment¶
When a new Kafka node is added, Kafka does not automatically perform partition reassignment. The following steps are used to perform a manual reassignment of Kafka partitions for the specified topic(s):
1.) Go to Kafka container
docker exec -it kafka_container bash
2.) Create /tmp/topics.json with the topics whose partitions should be reassigned, in the following format:
cat << EOF | tee /tmp/topics.json
{
  "topics": [
    {"topic": "events.tenant.stream"}
  ],
  "version": 1
}
EOF
3.) Generate reassignment JSON output from list of topics to be migrated, specify the broker IDs in the broker list:
/usr/bin/kafka-reassign-partitions --zookeeper localhost:2181 --broker-list "121,122,221,222" --generate --topics-to-move-json-file /tmp/topics.json
The result should be stored in /tmp/reassign.json and should look as follows, with all topics and partitions having their new assignment specified:
[appuser@lm11 data]$ cat /tmp/reassign.json
{"version":1,"partitions":[{"topic":"events.tenant.stream","partition":0,"replicas":[122],"log_dirs":["any"]},{"topic":"events.tenant.stream","partition":1,"replicas":[221],"log_dirs":["any"]},{"topic":"events.tenant.stream","partition":2,"replicas":[222],"log_dirs":["any"]},{"topic":"events.tenant.stream","partition":3,"replicas":[121],"log_dirs":["any"]},{"topic":"events.tenant.stream","partition":4,"replicas":[122],"log_dirs":["any"]},{"topic":"events.tenant.stream","partition":5,"replicas":[221],"log_dirs":["any"]}]}
4.) Use the output from the previous command as input to the execution of the reassignment/rebalance:
/usr/bin/kafka-reassign-partitions --zookeeper localhost:2181 --execute --reassignment-json-file /tmp/reassign.json --additional --bootstrap-server localhost:9092
That's it! Kafka should now perform the partition reassignment within the following hours.
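The progress of the reassignment can be checked with the --verify option of the same tool; a minimal sketch using the file generated above:
/usr/bin/kafka-reassign-partitions --zookeeper localhost:2181 --verify --reassignment-json-file /tmp/reassign.json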
For more information, see Reassigning partitions in Apache Kafka Cluster .
Ended: Kafka
System monitoring ↵
System monitoring¶
The following tools and techniques can help you understand how your TeskaLabs LogMan.io system is performing and investigate any issues that arise.
Preset dashboards¶
LogMan.io includes preset diagnostic dashboards that give you insight into your system performance. This is the best place to start monitoring.
Prophylactic checks¶
Prophylactic checks are preventative checkups on your LogMan.io app and system performance. Visit our prophylactic check manual to learn how to perform regular prophylactic checks.
Metrics¶
Metrics are measurements regarding system performance. Investigating metrics can be useful if you already know what area of your system you need insight into, which you can discover through analyzing your preset dashboards or performing a prophylactic check.
Grafana dashboards for system diagnostics¶
Through TeskaLabs LogMan.io, you can access dashboards in Grafana that monitor your data pipelines. Use these dashboards for diagnostic purposes.
The first few months of your deployment of TeskaLabs LogMan.io are a stabilization period, in which you might see extreme values produced by these metrics. These dashboards are especially useful during stabilization and can help with system optimization. Once your system is stable, extreme values, in general, indicate a problem.
To access the dashboards:
1. In LogMan.io, go to Tools.
2. Click on Grafana. You are now securely logged in to Grafana with your LogMan.io user credentials.
3. Click the menu button, and go to Dashboards.
4. Select the dashboard you want to see.
Tips
- Hover over any graph to see details at specific time points.
- You can change the timeframe of any dashboard with the timeframe tools in the top right corner of the screen.
LogMan.io dashboard¶
The LogMan.io dashboard monitors all data pipelines in your installation of TeskaLabs LogMan.io. This dashboard can help you investigate if, for example, you're seeing fewer logs than expected in LogMan.io. See Pipeline metrics for deeper explanations.
Metrics included:
-
Event In/Out: The volume of events passing through each data pipeline measured in in/out operations per second (io/s). If the pipeline is running smoothly, the In and Out quantities are equal, and the Drop line is zero. This means that the same amount of events are entering and leaving the pipeline, and none are dropped. If you can see in the graph that the In quantity is greater than the Out quantity, and that the Drop line is greater than zero, then some events have been dropped, and there might be an issue.
-
Duty cycle: Displays the percentage of data being processed as compared to data waiting to be processed. If the pipeline is working as expected, the duty cycle is at 100%. If the duty cycle is lower than 100%, it means that somewhere in the pipeline, there is a delay or a throttle causing events to queue.
-
Time drift: Shows you the delay or lag in event processing, meaning how long after an event's arrival it is actually processed. A significant or increased delay impacts your cybersecurity because it inhibits your ability to respond to threats immediately. Time drift and duty cycle are related metrics. There is a greater time drift when the duty cycle is below 100%.
System-level overview dashboard¶
The System-level overview dashboard monitors the servers involved in your TeskaLabs LogMan.io installation. Each node of the installation has its own section in the dashboard. When you encounter a problem in your system, this dashboard helps you perform an initial assessment on your server by showing you if the issue is related to input/output, CPU usage, network, or disk space or usage. However, for a more specific analysis, pursue exploring specific metrics in Grafana or InfluxDB.
Metrics included:
- IOWait: Percentage of time the CPU remains idle while waiting for disk I/O (input/output) requests. In other words, IOWait tells you how much processing time is being wasted waiting for data. A high IOWait, especially if it's around or exceeds 20% (depending on your system), signals that the disk read/write speed is becoming a system bottleneck. A rising IOWait indicates that the disk's performance is limiting the system's ability to receive and store more logs, impacting overall system throughput and efficiency.
- Uptime: The amount of time the server has been running since it was last shut down or restarted.
- Load: Represents the average number of processes waiting in the queue for CPU time over the last 5 minutes. It's a direct indicator of how busy your system is. In systems with multiple CPU cores, this metric should be considered in relation to the total number of available cores. For instance, a load of 64 on a 64-core system might be acceptable, but above 100 indicates severe stress and unresponsiveness. The ideal load varies based on the specific configuration and use case but generally should not exceed 80% of the total number of CPU cores. Consistently high load values indicate that the system is struggling to process the incoming stream of logs efficiently.
- RAM usage: The percentage of the total memory currently being used by the system. Keeping RAM usage between 60-80% is generally optimal. Usage above 80% often leads to increased swap usage, which in turn can slow down the system and lead to instability. Monitoring RAM usage is crucial for ensuring that the system has enough memory to handle the workload efficiently without resorting to swap, which is significantly slower.
- CPU usage: An overview of the percentage of CPU capacity currently in use. It averages the utilization across all CPU cores, which means individual cores could be under or over-utilized. High CPU usage, particularly over 95%, suggests the system is facing CPU-bound challenges, where the CPU's processing capacity is the primary limitation. This dashboard metric helps differentiate between I/O-bound issues (where the bottleneck is data transfer) and CPU-bound issues. It's a critical tool for identifying processing bottlenecks, although it's important to interpret this metric alongside other system indicators for a more accurate diagnosis.
- Swap usage: How much of the swap space is being used. A swap partition is dedicated space on the disk used as a temporary substitute for RAM ("data overflow"). When RAM is full, the system temporarily stores data in swap space. High swap usage, above approximately 5-10%, indicates that the system is running low on memory, which can lead to degraded performance and instability. Persistent high swap usage is a sign that the system requires more RAM, as relying heavily on swap space can become a major performance bottleneck.
- Disk usage: Measures how much of the storage capacity is currently being used. In your log management system, it's crucial to keep disk usage below 90% and take action if it reaches 80%. Inadequate disk space is a common cause of system failures. Monitoring disk usage helps in proactive management of storage resources, ensuring that there is enough space for incoming data and system operations. Since most systems are configured to delete data after 18 months of storage, disk space usage can begin to stabilize after the system has been running for 18 months. Read more about the data lifecycle.
Elasticsearch metrics dashboard¶
The Elasticsearch metrics dashboard monitors the health of the Elastic pipeline. (Most TeskaLabs LogMan.io users use the Elasticsearch database to store log data.)
Metrics included:
- Cluster health: Green is good; yellow and red indicate a problem.
- Number of nodes: A node is a single instance of Elasticsearch. The number of nodes is how many nodes are part of your LogMan.io Elasticsearch cluster.
- Shards
- Active shards: Number of total shards active. A shard is the unit at which Elasticsearch distributes data around a cluster.
- Unassigned shards: Number of shards that are not available. They might be in a node which is turned off.
- Relocating shards: Number of shards that are in the process of being moved to a different node. (You might want to turn off a node for maintenance, but you still want all of your data to be available, so you can move a shard to a different node. This metric tells you if any shards are actively in this process and therefore can't provide data yet.)
- Used mem: Memory used. Used memory at 100% would mean that Elasticsearch is overloaded and requires investigation.
- Output queue: The number of tasks waiting to be processed in the output queue. A high number could indicate a significant backlog or bottleneck.
- Stored GB: The amount of disk space being used for storing data in the Elasticsearch cluster. Monitoring disk usage helps ensure that there's sufficient space available and to plan for capacity scaling as necessary.
- Docs count: The total number of documents stored within the Elasticsearch indices. Monitoring the document count can provide insights into data growth and index management requirements.
- Task max waiting in queue: The maximum time a task has waited in a queue to be processed. It’s useful for identifying delays in task processing which could impact system performance and throughput.
- Open file descriptors: File descriptors are handles that allow the system to manage and access files and network connections. Monitoring the number of open file descriptors is important to ensure that system resources are being managed effectively and to prevent potential file handle leaks, which could lead to system instability.
- Used cpu %: The percentage of CPU resources currently being used by Elasticsearch. Monitoring CPU usage helps you understand the system's performance and identify potential CPU bottlenecks.
- Indexing: The rate at which new documents are being indexed into Elasticsearch. A higher rate means your system can index more information more efficiently.
- Inserts: The number of new documents being added to the Elasticsearch indices. This line follows a regular pattern if you have a consistent number of inputs. If the line spikes or dips irregularly, there could be an issue in your data pipeline keeping events from reaching Elasticsearch.
Burrow consumer lag dashboard¶
The Burrow dashboard monitors the consumers and partitions of Apache Kafka. Learn more about Burrow here.
Apache Kafka terms:
- Consumers: Consumers read data. They subscribe to one or more topics and read the data in the order in which it was produced.
- Consumer groups: Consumers are typically organized into consumer groups. Each consumer within a group reads from exclusive partitions of the topics they subscribe to, ensuring that each record is processed only once by the group, even if multiple consumers are reading.
- Partitions: Topics are split into partitions. This allows the data to be distributed across the cluster, allowing for concurrent read and write operations.
Metrics included:
- Group status: The overall health status of the consumer group. A status of OK means that the group is functioning normally, while a warning or error could indicate issues like connectivity problems, failed consumers, or misconfigurations.
- Total lag: In this case, lag can be thought of as a queue of tasks waiting to be processed by a microservice. The total lag metric represents the count of messages that have been produced to the topic but not yet consumed by a specific consumer or consumer group. If the lag is 0, everything is dispatched properly, and there is no queue. Because Apache Kafka tends to group data into batches, some amount of lag is often normal. However, an increasing lag, or a lag above approximately 300,000 (this number is dependent on your system capacity, configuration, and sensitivity) is cause for investigation.
- Partitions lag: The lag for individual partitions within a topic. Being able to see partitions' lags separated tells you if some partitions have a larger queue, or higher delay, than others, which might indicate uneven data distribution or other partition-specific issues.
- Partition status: The status of individual partitions. An OK status indicates the partition is operating normally. Warnings or errors can signify problems like a stalled consumer, which is not reading from the partition. This metric helps identify specific partition-level issues that might not be apparent when looking at the overall group status.
Prophylactic check manual¶
A prophylactic check is a systematic preventative assessment to verify that a system is working properly, and to identify and mitigate potential issues before they escalate into more severe or critical problems. By performing regular prophylactic checks, you can proactively maintain the integrity, reliability, and efficiency of your TeskaLabs LogMan.io system, minimizing the risk of unexpected failures or disruptions that could arise if left unaddressed.
Support
If you need further information or support beyond what you see here, reach out to your TeskaLabs LogMan.io support Slack channel, or send an e-mail to support@teskalabs.com. We will assist you promptly.
Performing prophylactic checks¶
Important
Conduct prophylactic checks at consistent intervals, ideally on the same day of the week and around the same time. Remember that the volume and timing of incoming events can fluctuate depending on the day of the week, working hours, and holidays.
During prophylactic checks, make sure to conduct a comprehensive review of all available tenants.
Examine each of the following components of your TeskaLabs LogMan.io installation according to our recommendations, and report issues as needed.
TeskaLabs LogMan.io functionalities¶
Location: TeskaLabs LogMan.io sidebar
Goal: Ensuring that every functionality of the TeskaLabs LogMan.io app works properly
Within the assigned tenant, thoroughly examine each component featured in the sidebar (Discover, Dashboards, Exports, Lookups, Reports, etc.) to ensure their proper operation. Issues identified in this section should be reported to your TeskaLabs support channel. This can include issues such as pop-up errors when opening a section from the sidebar, lost availability of some of the tools, or not being able to open Dashboards.
Issue reporting: Utilize the support Slack channel for general reporting.
Log source monitoring¶
Location: TeskaLabs LogMan.io Discover screen or dedicated dashboard
Goal: Ensuring that each log source is active and works as expected, and that no anomalies are found (for example, a dropout, a peak, or anything unusual). This is also crucial for your log source visibility.
Note: Consider incorporating Baselines as another option for log source checks.
Log source monitoring can be achieved by individually reviewing each log source, or by creating an overview dashboard equipped with widgets for monitoring each log source's activity visually. We recommend creating a dashboard with line charts.
The examination should always cover a sample of data between each prophylactic check.
Issue reporting: In case of an inactive log source, conduct further investigation and report to your TeskaLabs LogMan.io Slack support channel.
Log time zones¶
Location: TeskaLabs LogMan.io Discover screen
Goal: Ensuring that there are no discrepancies between your time zone and the time zone present in the logs
Investigate whether any logs have a @timestamp value in the future. You can do so by filtering the time range from now to 2 or more hours from now (a hedged query sketch follows).
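If you prefer to run this check outside the Discover screen, the same filter can be expressed as an Elasticsearch range query. The following is a hedged sketch only: the endpoint, credentials, and the lmio-* index pattern are placeholders that you should replace with the values from your own deployment.

```python
# Sketch: find events whose @timestamp lies in the future (possible time zone issue).
import requests

query = {
    "size": 10,
    "query": {"range": {"@timestamp": {"gt": "now"}}},  # events dated after "now"
    "sort": [{"@timestamp": {"order": "desc"}}],
}

resp = requests.get(
    "https://localhost:9200/lmio-*/_search",  # placeholder endpoint and index pattern
    json=query,
    auth=("elastic", "changeme"),             # placeholder credentials
    verify=False,                             # adjust TLS verification for your environment
)
resp.raise_for_status()
for hit in resp.json()["hits"]["hits"]:
    print(hit["_index"], hit["_source"].get("@timestamp"))
```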
Issue reporting: Utilize the project support Slack for general reporting.
If the issue appears to be linked to the logging device settings, please investigate this further within your own network.
Other events¶
Location: TeskaLabs LogMan.io Discover screen, lmio-others-events index
Goal: Ensuring all the events are parsed correctly using either Parsec or Parser.
In most installations, we collect error logs from the following areas:
- Parser
- Parsec
- Dispatcher
- Depositor
- Unstructured logs
Logs that are not parsed correctly go to the others index. Ideally, no logs should be present in the others index.
Issue reporting: If a few logs are found in the others index, such as those indicating incorrect parsing, it's generally not a severe problem requiring immediate attention. Investigate these logs further and report to your TeskaLabs LogMan.io support Slack channel.
System logs¶
Location: TeskaLabs LogMan.io - System tenant, index Events & Others.
Goal: Ensuring the system is working properly and there are no unusual or critical system logs that could signal any internal issue
Issue reporting: A multitude of log types may be found in this section. Reporting can be done either via your TeskaLabs LogMan.io Slack channel, or within your infrastructure.
Baseliner¶
Note
Baseliner is included only in advanced deployments of LogMan.io. If you would like to upgrade LogMan.io, contact support, and we'll be happy to assist you.
Location: TeskaLabs LogMan.io Discover screen filtering for event.dataset:baseliner
Goal: Ensuring that the Baseliner functionality is working properly and is detecting deviations from a calculated activity baseline.
Issue reporting: If the Baseliner is not active, report it to your TeskaLabs LogMan.io support Slack channel.
Elasticsearch¶
Location: Grafana, dedicated Elasticsearch dashboard
Goal: Ensuring that there are no malfunctions linked to Elasticsearch and services associated with it.
The assessment should always be based on a sample of data from the past 24 hours. This operational dashboard provides an indication of the proper functioning of Elasticsearch.
Key Indicators:
- Inactive Nodes should be at zero.
- System Health should be green. Any indication of yellow or red should be escalated to the TeskaLabs LogMan.io Slack support channel immediately.
- Unassigned Shards should be at zero and marked as green. Any value in yellow or above warrants monitoring and reporting. (A minimal cluster health check is sketched after this list.)
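The same indicators can also be read directly from the Elasticsearch cluster health API. The snippet below is a minimal sketch, assuming direct access to the cluster; the endpoint and credentials are placeholders, and only the status colour and unassigned shards are checked.

```python
# Sketch: mirror the "System Health" and "Unassigned Shards" indicators via the API.
import requests

resp = requests.get(
    "https://localhost:9200/_cluster/health",
    auth=("elastic", "changeme"),  # placeholder credentials
    verify=False,
)
resp.raise_for_status()
health = resp.json()

print("status:", health["status"], "| nodes:", health["number_of_nodes"],
      "| unassigned shards:", health["unassigned_shards"])
if health["status"] != "green" or health["unassigned_shards"] > 0:
    print("escalate to the TeskaLabs LogMan.io Slack support channel")
```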
Issue reporting: If there are any issues detected, ensure prompt escalation. Further investigation of the Elastic cluster can be conducted in Kibana/Stack monitoring.
Nodes¶
Detailed information about node health can be found in Elasticsearch. JVM Heap monitors memory usage.
Overview¶
The current EPS (events per second) of the entire Elastic cluster is visible.
Index sizing & lifecycle monitoring¶
Location: Kibana, Stack monitoring or Stack management
Follow these steps to analyze indices for abnormal size:
- Access the "Indices" section.
- Proceed to filter the "Data" column, arranging it from largest to smallest.
- Examine the indexes to identify any that exhibit a significantly larger size compared to the others.
The acceptable index size range is a topic for discussion, but generally, indices up to 200 GB are considered acceptable.
Any indices exceeding 200 GB in size should be reported.
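If you prefer a scripted check, the list of oversized indices can be pulled from the Elasticsearch _cat API. This is a hedged sketch; the endpoint and credentials are placeholders for your deployment.

```python
# Sketch: list indices whose store size exceeds 200 GB.
import requests

LIMIT = 200 * 1024**3  # 200 GB in bytes

resp = requests.get(
    "https://localhost:9200/_cat/indices",
    params={"format": "json", "bytes": "b"},
    auth=("elastic", "changeme"),  # placeholder credentials
    verify=False,
)
resp.raise_for_status()
for row in resp.json():
    size = int(row.get("store.size") or 0)
    if size > LIMIT:
        print(f"{row['index']}: {size / 1024**3:.1f} GB - report this index")
```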
In the case of indexes associated with ILM (index lifecycle management), it's crucial to verify the index status. If an index lacks a string of numbers at the end of its name, it indicates it is not linked to an ILM policy and may grow without automatic rollover. To confirm this, review the index's properties to check whether it falls under the hot, warm, or cold category. When indices are not connected to ILM, they tend to remain in a hot state or exhibit irregular shifts between hot, cold, and warm.
Please note that lookups do not have ILM and should always be considered in the hot state.
Issue reporting: Report to the dedicated project support Slack channel. Such reports should be treated with the utmost seriousness and escalated promptly.
System-Level Overview¶
Location: Grafana, dedicated System Level Overview dashboard
The assessment should always be based on a sample of data from the past 24 hours.
Key metrics to monitor:
- Disk usage: All values must not exceed 80%, except for /boot, which should not exceed 95%. (A minimal local check is sketched below.)
- Load: Values must not exceed 40%, and the maximum load should align with the number of cores.
- IOWait: Indicates data processing and should only register as a small percentage, signifying that the device is waiting for data to load from the disk.
- RAM usage: Further considerations should be made regarding the establishment of high-value thresholds.
In the case of multiple servers, ensure values are checked for each.
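If you have shell access to a node, the disk and load thresholds above can also be spot-checked locally. This is a minimal sketch using only the Python standard library; it assumes it runs on the monitored node and only covers the root filesystem.

```python
# Sketch: compare disk usage and 5-minute load against the thresholds above.
import os
import shutil

usage = shutil.disk_usage("/")           # repeat for other mount points as needed
used_pct = usage.used / usage.total * 100
print(f"/ disk usage: {used_pct:.1f}%", "- OK" if used_pct < 80 else "- investigate")

load1, load5, load15 = os.getloadavg()   # load averages over 1, 5, and 15 minutes
cores = os.cpu_count() or 1
print(f"load5: {load5:.2f} on {cores} cores",
      "- OK" if load5 < cores * 0.8 else "- system under stress")
```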
Issue reporting: Report to the dedicated project support Slack channel.
Burrow Consumer Lag¶
Location: Grafana, dedicated Burrow Consumer Lag dashboard
For Kafka Monitoring, scrutinize this dashboard for consumerGroup, with a specific focus on:
- lmio dispatcher
- lmio depositor
- lmio baseliner
- lmio correlator
- lmio watcher
A lag value exhibiting an increasing trend over time indicates a problem that needs to be addressed immediately (a minimal comparison sketch follows).
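Below is a minimal sketch (plain Python, with made-up numbers) of the comparison between two prophylactic checks; read the real lag values from the Burrow Consumer Lag dashboard and adjust the threshold to your system.

```python
# Sketch: compare consumer-group lag between two prophylactic checks.
LAG_THRESHOLD = 300_000  # adjust to your system capacity, configuration, and sensitivity

def lag_verdict(previous_lag: int, current_lag: int) -> str:
    if current_lag > LAG_THRESHOLD:
        return "lag above threshold - report immediately"
    if current_lag > previous_lag:
        return "lag is growing since the last check - report on the support Slack channel"
    return "lag is stable or shrinking - OK"

print(lag_verdict(previous_lag=120_000, current_lag=450_000))
```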
Issue reporting: If lag increases compared to the previous week's prophylaxis, promptly report this on the support Slack channel.
Depositor Monitoring¶
Location: Grafana, dedicated Depositor dashboard.
Key metrics to monitor:
- Failed bulks - Must be green and equal to zero
- Output Queue Size of Bulks
- Duty Cycle
- EPS IN & OUT
- Successful Bulks
Issue reporting: Report to the dedicated project support Slack channel.
Metrics ↵
System monitoring metrics¶
When logs and events pass through TeskaLabs LogMan.io, they are processed by several TeskaLabs microservices as well as Apache Kafka, and most deployments store data in Elasticsearch. Since the microservices and other technologies handle a huge volume of events, it is not practical to monitor them with logs. Instead, metrics, or measurements, monitor the status and health of each microservice and other parts of your system.
You can access the metrics in Grafana and/or InfluxDB with preset or custom visualizations. Each metric for each microservice updates approximately once per minute.
Viewing metrics¶
To access system monitoring metrics, you can use Grafana and/or InfluxDB through the TeskaLabs LogMan.io web app Tools page.
Using Grafana to view metrics¶
Preset dashboards¶
We deploy TeskaLabs LogMan.io with a prepared set of monitoring and diagnostic dashboards - details and instructions for access here. These dashboards give you a broader overview of what's going on in your system. We recommend consulting these dashboards first if you don't know what specific metrics you want to investigate.
Using Grafana's Explore tool¶
1. In Grafana, click the menu button, and go to Explore.
2. Set data source to InfluxDB.
3. Use the clickable query builder:
Grafana query builder
FROM:
1. Measurement: Click on select measurement to choose a group of metrics. In this case, the metrics group is bspump.pipeline.
2. Tag: Click the plus sign beside WHERE to select a tag. Since this example shows metrics from a microservice, appclass::tag is selected.
3. Tag value: Click select tag value, and select a value. In this example, the query will show metrics from the Parsec microservice.
Optionally, you can add additional filters in the FROM section, such as pipeline and host.
SELECT:
4. Fields: Add fields to add specific metrics to the query.
5. Aggregation: You can choose the aggregation method for each metric. Be aware that Grafana cannot display a graph in which some values are aggregated and others are non-aggregated.
GROUP BY:
6. Fill: You can choose fill(null) or fill(none) to decide how to fill gaps between data points. fill(null) does not fill the gaps, so your resulting graph will be data points with space between. fill(none) connects data points with a line, so you can more easily see trends.
4. Adjust the timeframe as needed, and click Run query.
For more information about Grafana's Explore function, visit Grafana's documentation.
Using InfluxDB to view metrics¶
If you have access to InfluxDB, you can use it to explore data. InfluxDB provides a query builder that allows you to filter out which metrics you want to see, and get visualizations (graphs) of those metrics.
To access InfluxDB:
- In the LogMan.io web app, go to Tools.
- Click on InfluxDB, and log in.
Using the query builder:
This example guides you through investigating a metric that is specific to a microservice, such as a pipeline monitoring metric. If you're seeking a metric that does not involve a microservice, begin with the _measurement tag, then filter with additional relevant tags.
- In InfluxDB, in the left sidebar, click the icon to go to the Data Explorer. Now, you can see InfluxDB's visual query builder.
- In the first box, select a bucket. (Your metrics bucket is most likely either named metrics or named after your organization.)
- In the next filter, select appclass from the drop-down menu to see the list of microservices that produce metrics. Click on the microservice from which you want to see metrics.
- In the next filter, select _measurement from the drop-down menu to see the list of metrics groups. Select the group you want to see.
- In the next filter, select _field from the drop-down menu to see the list of metrics available. Select the metrics you want to see.
- A microservice can have multiple pipelines. To narrow your results to a specific pipeline, use an additional filter. Select pipeline from the drop-down menu, and select the pipeline(s) you want represented.
- Optionally, you can also select a host in the next filter. Without filtering, InfluxDB displays the data from all hosts available, but you likely have only one host. To select a host, choose host in the drop-down menu, and select a host.
- Change the timeframe if desired.
- To load the visualization, click Submit.
Visualization produced in this example:
For more information about InfluxDB's Data explorer function, visit InfluxDB's documentation.
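If you prefer scripting over the web UI, the same query can be run with the influxdb-client Python library. This is a hedged sketch only: the URL, token, organization, and the appclass value are placeholders, while the bucket and measurement names follow the examples above.

```python
# Sketch: query event.in / event.out for one microservice over the last hour.
from influxdb_client import InfluxDBClient

flux = '''
from(bucket: "metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "bspump.pipeline")
  |> filter(fn: (r) => r.appclass == "LMIOParsecApplication")
  |> filter(fn: (r) => r._field == "event.in" or r._field == "event.out")
'''

with InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org") as client:
    for table in client.query_api().query(flux):
        for record in table.records:
            print(record.get_time(), record.get_field(), record.get_value())
```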
Pipeline metrics¶
Pipeline metrics, or measurements, monitor the throughput of logs and events in the microservices' pipelines. You can use these pipeline metrics to understand the status and health of each microservice.
The data that moves through microservices is broken down to and measured in events. (Each event is one message in Kafka and will result in one entry in Elasticsearch.) Since events are countable, the metrics quantify the throughput, allowing you to assess pipeline status and health.
BSPump
Several TeskaLabs microservices are built on the technology BSPump, so the names of the metrics include bspump.
Microservices built on BSPump:
Microservice architecture
The internal architecture of each microservice differs and might affect your analysis of the metrics. Visit our Architecture page.
The microservices most likely to produce uneven event.in and event.out counter metrics without actually having an error are:
- Parser/Parsec - This is due to its internal architecture; the parser sends events into a different pipeline (Enricher), where the events are then not counted in event.out.
- Correlator - Since the correlator assesses events as they are involved in patterns, it often has a lower event.out count than event.in.
Metrics¶
Naming and tags in Grafana and InfluxDB
- Pipeline metrics groups are under the measurement tag.
- Pipeline metrics are produced for microservices (tag appclass) and can be further filtered with the additional tags host and pipeline.
- Each individual metric (for example, event.in) is a value in the field tag.
All metrics update automatically once per minute by default.
bspump.pipeline¶
event.in¶
Description: Counts the number of events entering the pipeline
Unit: Number (of events)
Interpretation: Observing event.in over time can show you patterns, spikes, and trends in how many events have been received by the microservice. If no events are coming in, event.in is a line at 0. If you are expecting throughput, and event.in is 0, there is a problem in the data pipeline.
event.out¶
Description: Counts the number of events leaving the pipeline successfully
Unit: Number (of events)
Interpretation: event.out should typically be the same as event.in, but there are exceptions. Some microservices are constructed to have either multiple outputs per input, or to divert data in such a way that the output is not detected by this metric.
event.drop¶
Description: Counts the number of events that have been dropped, or messages that have been lost, by a microservice.
Unit: Number (of events)
Interpretation: Since the microservices built on BSPump are generally not designed to drop messages, any drop is most likely an error.
When you hover over a graph in InfluxDB, you can see the values of each line at any point in time. In this graph, you can see that event.out is equal to event.in, and event.drop equals 0, which is the expected behavior of the microservice. The same number of events are leaving as are entering the pipeline, and no events are being dropped.
warning¶
Description: Counts the number of warnings produced in a pipeline.
Unit: Number (of warnings)
Interpretation: Warnings tell you that there is an issue with the data, but the pipeline was still able to process it. A warning is less severe than an error.
error¶
Description: Counts the number of errors in a pipeline.
Unit: Number (of errors)
Interpretation: Microservices might trigger errors for different reasons. The main reason for an error is that the data does not match the microservice's expectation, and the pipeline has failed to process that data.
bspump.pipeline.eps¶
EPS means events per second.
eps.in¶
Description: "Events per second in" - Rate of events successfully entering the pipeline
Unit: Events per second (rate)
Interpretation: eps.in should stay consistent over time. If a microservice's eps.in slows over time unexpectedly, there might be a problem in the data pipeline before the microservice.
eps.out¶
Description: "Events per second out" - Rate of events successfully leaving the pipeline
Unit: Events per second (rate)
Interpretation: Similar to event.in and event.out, eps.in and eps.out should typically be the same, but they could differ depending on the microservice. If events are entering the microservice much faster than they are leaving, and this is not the expected behavior of that pipeline, you might need to address an error causing a bottleneck in the microservice.
eps.drop¶
Description: "Events per second dropped" - rate of events being dropped in the pipeline
Unit: Events per second (rate)
Interpretation: See event.drop. If eps.drop rapidly increases, and it is not the expected behavior of the microservice, that indicates that events are being dropped, and there is a problem in the pipeline.
Similar to graphing event.in and event.out, the expected behavior of most microservices is for eps.out to equal eps.in, with drop being equal to 0.
warning¶
Description: Counts the number of warnings produced in a pipeline in the specified timeframe.
Unit: Number (of warnings)
Interpretation: Warnings tell you that there is an issue with the data, but the pipeline was still able to process it. A warning is less severe than an error.
error¶
Description: Counts the number of errors in a pipeline in the specified timeframe.
Unit: Number (of errors)
Interpretation: Microservices might trigger errors for different reasons. The main reason for an error is that the data does not match the microservice's expectation, and the pipeline has failed to process that data.
bspump.pipeline.gauge¶
A gauge metric is a percentage expressed as a number from 0 to 1.
warning.ratio¶
Description: Ratio of events that generated warnings compared to the total number of successfully processed events.
Interpretation: If the warning ratio increases unexpectedly, investigate the pipeline for problems.
error.ratio¶
Description: Ratio of events that failed to process compared to the total number of successfully processed events.
Interpretation: If the error ratio increases unexpectedly, investigate the pipeline for problems. You could create a trigger to notify you when error.ratio exceeds, for example, 5%.
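As an illustration only, the ratio and a simple 5% trigger could look like the sketch below; the counter values are hypothetical and would normally come from the error and event.out metrics.

```python
# Sketch: error.ratio and a simple notification threshold.
def error_ratio(errors: int, processed: int) -> float:
    """Ratio of failed events to successfully processed events."""
    return errors / processed if processed else 0.0

ratio = error_ratio(errors=40, processed=1000)
if ratio > 0.05:
    print(f"error.ratio at {ratio:.1%} - investigate the pipeline")
else:
    print(f"error.ratio at {ratio:.1%} - OK")
```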
bspump.pipeline.dutycycle¶
The duty cycle (also called power cycle) describes if a pipeline is waiting for messages (ready, value 1) or unable to process new messages (busy, value 0).
In general:
- A value of 1 is acceptable because the pipeline can process new messages.
- A value of 0 indicates a problem, because the pipeline cannot process new messages.
Understanding the idea of duty cycle
We can use human productivity to explain the concept of the duty cycle. If a person is not busy at all and has nothing to do, they are just waiting for a task. Their duty cycle reading is at 100% - they are spending all of their time waiting and can take on more work. If a person is busy doing something and cannot take on any more tasks, their duty cycle is at 0%.
The above example (not taken from InfluxDB) shows what a change in duty cycle looks like on a very short time scale. In this example, the pipeline had two instances of being at 0, meaning not ready and unable to process new incoming events. Keep in mind that your system's duty cycle can fluctuate between 1 and 0 thousands of times per second; the duty cycle ready graphs you'll see in Grafana or InfluxDB will already be aggregated (more below).
ready¶
Description: ready aggregates (averages) the duty cycle values once per minute. While the duty cycle itself is expressed as 0 (false, busy) or 1 (true, waiting), the ready metric represents the percentage of time the duty cycle spent at 1. Therefore, the value of ready is a percentage anywhere between 0 and 1, so the graph does not look like a typical duty cycle graph.
Unit: Percentage expressed as a number, 0 to 1
Interpretation: Monitoring the duty cycle is critical to understanding your system's capacity. While every system is different, in general, ready should stay above 70%. If ready goes below 70%, that means the duty cycle has dropped to 0 (busy) more than 30% of the time in that interval, indicating that the system is quite busy and requires some attention or adjustment.
The above graph shows that the duty cycle was ready more than 90% of the time over the course of these two days. However, there are two points at which it dropped near and below 70%.
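Conceptually, the once-per-minute ready value is just the average of the raw 0/1 duty-cycle samples in that minute. The sketch below uses a made-up sample list to illustrate the aggregation.

```python
# Sketch: aggregate raw duty-cycle samples into a "ready" percentage.
duty_cycle_samples = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]  # 1 = waiting/ready, 0 = busy

ready = sum(duty_cycle_samples) / len(duty_cycle_samples)
print(f"ready = {ready:.0%}")
if ready < 0.70:
    print("the pipeline was busy more than 30% of the interval - needs attention")
```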
timedrift¶
The timedrift metric serves as a way to understand how much the timing of events' origins (usually @timestamp) varies from what the system considers to be the "current" time. This can be helpful for identifying issues like delays or inaccuracies in a microservice.
Each value is calculated once per minute by default:
avg¶
Average. This calculates the average time difference between when an event actually happened and when your system recorded it. If this number is high, it may indicate a consistent delay.
median¶
Median. This tells you the middle value of all timedrifts for a set interval, offering a more "typical" view of your system's timing accuracy. The median is less sensitive to outliers than the average, since it is a single observed value rather than a calculation over every value.
stddev¶
Standard deviation. This gives you an idea of how much the timedrift varies. A high standard deviation might mean that your timing is inconsistent, which could be problematic.
min¶
Minimum. This shows the smallest timedrift in your set of data. It's useful for understanding the best-case scenario in your system's timing accuracy.
max¶
Maximum. This indicates the largest time difference. This helps you understand the worst-case scenario, which is crucial for identifying the upper bounds of potential issues.
In this graph of time drift, you can see a spike in lag before the pipeline returns to normal.
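The aggregations above are standard statistics over the per-event time drifts in each interval. The sketch below computes them for a handful of made-up drift values (seconds between an event's @timestamp and its processing time), using only the Python standard library.

```python
# Sketch: the five timedrift aggregations over one interval of hypothetical values.
import statistics

timedrifts = [0.8, 1.1, 0.9, 1.0, 4.5, 0.7]  # seconds, made-up values

print("avg:   ", statistics.mean(timedrifts))
print("median:", statistics.median(timedrifts))
print("stddev:", statistics.stdev(timedrifts))
print("min:   ", min(timedrifts))
print("max:   ", max(timedrifts))
```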
commlink¶
The commlink is the communication link between LogMan.io Collector and LogMan.io Receiver. These metrics are specific to data sent from the Collector microservice to the Receiver microservice.
Tags: ActivityState, appclass (LogMan.io Receiver only), host, identity, tenant
- bytes.in: bytes that enter LogMan.io Receiver
- event.in: events that enter LogMan.io Receiver
logs¶
Count of logs that pass through microservices.
Tags: appclass, host, identity, instance_id, node_id, service_id, tenant
- critical: Count of critical logs
- errors: Count of error logs
- warnings: Count of warning logs
Disk usage metrics¶
Monitor your disk usage carefully to avoid a common cause of system failure.
disk¶
Metrics to monitor disk usage. See the InfluxDB Telegraf plugin documentation for more.
Tags: device, fstype (file system type), mode, node_id, path
- free: Total amount of free disk space available on the storage device, measured in bytes
- inodes_free: The number of free inodes, which corresponds to the number of free file descriptors available on the file system.
- inodes_total: The total number of inodes or file descriptors that the file system supports.
- inodes_used: The number of inodes or file descriptors currently being used on the file system.
- total: Total capacity of the disk or storage device, measured in bytes.
- used: The amount of disk space currently in use, calculated in bytes.
- used_percent: The percentage of the disk space that is currently being used in relation to the total capacity.
diskio¶
Metrics to monitor disk traffic and timing. Consult the InfluxDB Telegraf plugin documentation for the definition of each metric.
Tags: name, node_id, wwid
- io_time
- iops_in_progress
- merged_reads
- merged_writes
- read_bytes
- read_time
- reads
- weighted_io_time
- write_bytes
- write_time
- writes
System performance metrics¶
cpu¶
Metrics to monitor system CPUs. See the InfluxDB Telegraf plugin documentation for more.
Tags: ActivityState, cpu, node_id
- time_active: Total time the CPU has been active, performing tasks excluding idle time.
- time_guest: Time spent running a virtual CPU for guest operating systems.
- time_guest_nice: Time the CPU spent running a niced guest (a guest with a positive niceness value).
- time_idle: Total time the CPU was not in use (idle).
- time_iowait: Time the CPU was idle while waiting for I/O operations to complete.
- time_irq: Time spent handling hardware interrupts.
- time_nice: Time the CPU spent processing user processes with a positive niceness value.
- time_softirq: Time spent handling software interrupts.
- time_steal: Time that a virtual CPU waited for a real CPU while the hypervisor was servicing another virtual processor.
- time_system: Time the CPU spent running system (kernel) processes.
- time_user: Time spent on executing user processes.
- usage_active: Percentage of time the CPU was active, performing tasks.
- usage_guest: Percentage of CPU time spent running virtual CPUs for guest OSes.
- usage_guest_nice: Percentage of CPU time spent running niced guests.
- usage_idle: Percentage of time the CPU was idle.
- usage_iowait: Percentage of time the CPU was idle due to waiting for I/O operations.
- usage_irq: Percentage of time spent handling hardware interrupts.
- usage_nice: Percentage of CPU time spent on processes with a positive niceness.
- usage_softirq: Percentage of time spent handling software interrupts.
- usage_steal: Percentage of time a virtual CPU waited for a real CPU while the hypervisor serviced another processor.
- usage_system: Percentage of CPU time spent on system (kernel) processes.
- usage_user: Percentage of CPU time spent executing user processes.
mdstat¶
Statistics about Linux MD RAID arrays configured on the host. RAID (redundant array of inexpensive or independent disks) combines multiple physical disks into one unit for the purpose of data redundancy (and therefore safety or protection against loss in the case of disk failure) as well as system performance (faster data access). Visit the InfluxDB Telegraf plugin documentation for more.
Tags: ActivityState (active or inactive), Devices, Name, _field, node_id
- BlocksSynced: The count of blocks that have been scanned if the array is rebuilding/checking
- BlocksSyncedFinishTime: Minutes remaining in the expected finish time of the rebuild scan
- BlocksSyncedPct: Percentage remaining of the rebuild scan
- BlocksSyncedSpeed: The current speed the rebuild is running at, listed in K/sec
- BlocksTotal: The count of total blocks in the array
- DisksActive: Number of disks in the array that are currently considered healthy
- DisksDown: Number of disks in the array that are currently down, or non-operational
- DisksFailed: Count of currently failed disks in the array
- DisksSpare: Count of spare disks in the array
- DisksTotal: Count of total disks in the array
processes¶
All processes, grouped by status. Find the InfluxDB Telegraf plugin documentation here.
Tags: node_id
- blocked: Number of processes in a blocked state, waiting for resource or event to become available.
- dead: Number of processes that have finished execution but still have an entry in the process table.
- idle: Number of processes in an idle state, typically indicating they are not actively doing any work.
- paging: Number of processes that are waiting for paging, either swapping in or out from disk.
- running: Number of processes that are currently executing or ready to execute.
- sleeping: Number of processes that are in a sleep state, inactive until certain conditions are met or events occur.
- stopped: Number of processes that are stopped, typically due to receiving a signal or being in debug.
- total: Total number of processes currently existing in the system.
- total_threads: The total number of threads across all processes, as processes can have multiple threads.
- unknown: Number of processes in an unknown state, where their state can't be determined.
- zombies: Number of zombie processes, which have completed execution but still have an entry in the process table due to the parent process not reading its exit status.
system¶
These metrics provide general information about the system load, uptime, and number of users logged in. Visit the InfluxDB Telegraf plugin for details.
Tags: node_id
- load1: The average system load over the last one minute, indicating the number of processes in the system's run queue.
- load15: The average system load over the last 15 minutes, providing a longer-term view of the recent system load.
- load5: The average system load over the last 5 minutes, offering a shorter-term perspective of the recent system load.
- n_cpus: The number of CPU cores available in the system.
- uptime: The total time in seconds that the system has been running since its last startup or reboot.
temp¶
Temperature readings as collected by system sensors. Visit the InfluxDB Telegraf plugin documentation for details.
Tags: node_id, sensor
- temp: Temperature
Network-specific metrics¶
net¶
Metrics for network interface and protocol usage for Linux systems. Monitoring the volume of data transfer and potential errors is important to understanding the network health and performance. Visit the InfluxDB Telegraf plugin documentation for details.
Tags: interface, node_id
bytes fields: Monitoring the volume of data transfer, which is important to bandwidth management and network capacity planning.
- bytes_recv: The total number of bytes received by the interface
- bytes_sent: The total number of bytes sent by the interface
drop fields: Dropped packets are often a sign of network congestion, hardware issues, or incorrect configurations. Dropped packets can lead to performance degradation.
- drop_in: The total number of received packets dropped by the interface
- drop_out: The total number of transmitted packets dropped by the interface
error fields: High error rates can signal issues with the network hardware, interference, or configuration problems.
- err_in: The total number of receive errors detected by the interface
- err_out: The total number of transmit errors detected by the interface
packet fields: The number of packets sent and received gives an indication of network traffic and can help identify if the network is under heavy load or if there are issues with packet transmission.
- packets_recv: The total number of packets received by the interface
- packets_sent: The total number of packets sent by the interface
nstat¶
Network metrics. Visit the InfluxDB Telegraf plugin documentation for more.
Tags: name, node_id
ICMP fields¶
ICMP (internet control message protocol) metrics are used for network diagnostics and control messages, like error reporting and operational queries. Visit this page for additional field definitions.
Key terms:
- Echo requests/replies (ping): Used to test reachability and round-trip time.
- Destination unreachable: Indicates that a destination is unreachable.
- Parameter problems: Signals issues with IP header parameters.
- Redirect messages: Instructs to use a different route.
- Time exceeded messages: Indicates that the time to live (TTL) for a packet has expired.
IP fields¶
IP (internet protocol) metrics monitor the core protocol for routing packets across the internet and local networks.
Visit this page for additional field definitions.
Key terms:
- Address errors: Errors related to incorrect or unreachable IP addresses.
- Header errors: Problems in the IP header, such as incorrect checksums or formatting issues.
- Delivered packets: Packets successfully delivered to their destination.
- Discarded packets: Packets discarded due to errors or lack of buffer space.
- Forwarded datagrams: Packets routed to their next hop towards the destination.
- Reassembly failures: Failure in reassembling fragmented IP packets.
- IPv6 multicast/broadcast packets: Packets sent to multiple destinations or all nodes in a network segment in IPv6.
TCP fields¶
These metrics monitor the TCP, or transmission control protocol, which provides reliable, ordered, and error-checked delivery of data between applications. Visit this page for additional field definitions.
Key terms:
- Connection opens: Initiating a new TCP connection.
- Segments: Units of data transmission in TCP.
- Reset segments (RST): Used to abruptly close a connection.
- Retransmissions: Resending data that was not successfully received.
- Active/passive connection openings: Connections initiated actively (outgoing) or passively (incoming).
- Checksum errors: Errors detected in the TCP segment checksum.
- Timeout retransmissions: Resending data after a timeout, indicating potential packet loss.
UDP fields¶
These metrics monitor the UDP, or user datagram protocol, which facilitates low-latency (low-delay) but less reliable data transmission compared to TCP. Visit this page for additional field definitions.
- Datagrams: Basic transfer units in UDP.
- Receive/send buffer errors: Errors due to insufficient buffer space for incoming/outgoing data.
- No ports: Datagrams sent to a port with no listener.
- Checksum errors: Errors in the checksum field of UDP datagrams.
All nstat fields
- Icmp6InCsumErrors
- Icmp6InDestUnreachs
- Icmp6InEchoReplies
- Icmp6InEchos
- Icmp6InErrors
- Icmp6InGroupMembQueries
- Icmp6InGroupMembReductions
- Icmp6InGroupMembResponses
- Icmp6InMLDv2Reports
- Icmp6InMsgs
- Icmp6InNeighborAdvertisements
- Icmp6InNeighborSolicits
- Icmp6InParmProblems
- Icmp6InPktTooBigs
- Icmp6InRedirects
- Icmp6InRouterAdvertisements
- Icmp6InRouterSolicits
- Icmp6InTimeExcds
- Icmp6OutDestUnreachs
- Icmp6OutEchoReplies
- Icmp6OutEchos
- Icmp6OutErrors
- Icmp6OutGroupMembQueries
- Icmp6OutGroupMembReductions
- Icmp6OutGroupMembResponses
- Icmp6OutMLDv2Reports
- Icmp6OutMsgs
- Icmp6OutNeighborAdvertisements
- Icmp6OutNeighborSolicits
- Icmp6OutParmProblems
- Icmp6OutPktTooBigs
- Icmp6OutRedirects
- Icmp6OutRouterAdvertisements
- Icmp6OutRouterSolicits
- Icmp6OutTimeExcds
- Icmp6OutType133
- Icmp6OutType135
- Icmp6OutType143
- IcmpInAddrMaskReps
- IcmpInAddrMasks
- IcmpInCsumErrors
- IcmpInDestUnreachs
- IcmpInEchoReps
- IcmpInEchos
- IcmpInErrors
- IcmpInMsgs
- IcmpInParmProbs
- IcmpInRedirects
- IcmpInSrcQuenchs
- IcmpInTimeExcds
- IcmpInTimestampReps
- IcmpInTimestamps
- IcmpMsgInType3
- IcmpMsgOutType3
- IcmpOutAddrMaskReps
- IcmpOutAddrMasks
- IcmpOutDestUnreachs
- IcmpOutEchoReps
- IcmpOutEchos
- IcmpOutErrors
- IcmpOutMsgs
- IcmpOutParmProbs
- IcmpOutRedirects
- IcmpOutSrcQuenchs
- IcmpOutTimeExcds
- IcmpOutTimestampReps
- IcmpOutTimestamps
- Ip6FragCreates
- Ip6FragFails
- Ip6FragOKs
- Ip6InAddrErrors
- Ip6InBcastOctets
- Ip6InCEPkts
- Ip6InDelivers
- Ip6InDiscards
- Ip6InECT0Pkts
- Ip6InECT1Pkts
- Ip6InHdrErrors
- Ip6InMcastOctets
- Ip6InMcastPkts
- Ip6InNoECTPkts
- Ip6InNoRoutes
- Ip6InOctets
- Ip6InReceives
- Ip6InTooBigErrors
- Ip6InTruncatedPkts
- Ip6InUnknownProtos
- Ip6OutBcastOctets
- Ip6OutDiscards
- Ip6OutForwDatagrams
- Ip6OutMcastOctets
- Ip6OutMcastPkts
- Ip6OutNoRoutes
- Ip6OutOctets
- Ip6OutRequests
- Ip6ReasmFails
- Ip6ReasmOKs
- Ip6ReasmReqds
- Ip6ReasmTimeout
- IpDefaultTTL
- IpExtInBcastOctets
- IpExtInBcastPkts
- IpExtInCEPkts
- IpExtInCsumErrors
- IpExtInECT0Pkts
- IpExtInECT1Pkts
- IpExtInMcastOctets
- IpExtInMcastPkts
- IpExtInNoECTPkts
- IpExtInNoRoutes
- IpExtInOctets
- IpExtInTruncatedPkts
- IpExtOutBcastOctets
- IpExtOutBcastPkts
- IpExtOutMcastOctets
- IpExtOutMcastPkts
- IpExtOutOctets
- IpForwDatagrams
- IpForwarding
- IpFragCreates
- IpFragFails
- IpFragOKs
- IpInAddrErrors
- IpInDelivers
- IpInDiscards
- IpInHdrErrors
- IpInReceives
- IpInUnknownProtos
- IpOutDiscards
- IpOutNoRoutes
- IpOutRequests
- IpReasmFails
- IpReasmOKs
- IpReasmReqds
- IpReasmTimeout
- TcpActiveOpens
- TcpAttemptFails
- TcpCurrEstab
- TcpEstabResets
- TcpExtArpFilter
- TcpExtBusyPollRxPackets
- TcpExtDelayedACKLocked
- TcpExtDelayedACKLost
- TcpExtDelayedACKs
- TcpExtEmbryonicRsts
- TcpExtIPReversePathFilter
- TcpExtListenDrops
- TcpExtListenOverflows
- TcpExtLockDroppedIcmps
- TcpExtOfoPruned
- TcpExtOutOfWindowIcmps
- TcpExtPAWSActive
- TcpExtPAWSEstab
- TcpExtPAWSPassive
- TcpExtPruneCalled
- TcpExtRcvPruned
- TcpExtSyncookiesFailed
- TcpExtSyncookiesRecv
- TcpExtSyncookiesSent
- TcpExtTCPACKSkippedChallenge
- TcpExtTCPACKSkippedFinWait2
- TcpExtTCPACKSkippedPAWS
- TcpExtTCPACKSkippedSeq
- TcpExtTCPACKSkippedSynRecv
- TcpExtTCPACKSkippedTimeWait
- TcpExtTCPAbortFailed
- TcpExtTCPAbortOnClose
- TcpExtTCPAbortOnData
- TcpExtTCPAbortOnLinger
- TcpExtTCPAbortOnMemory
- TcpExtTCPAbortOnTimeout
- TcpExtTCPAutoCorking
- TcpExtTCPBacklogDrop
- TcpExtTCPChallengeACK
- TcpExtTCPDSACKIgnoredNoUndo
- TcpExtTCPDSACKIgnoredOld
- TcpExtTCPDSACKOfoRecv
- TcpExtTCPDSACKOfoSent
- TcpExtTCPDSACKOldSent
- TcpExtTCPDSACKRecv
- TcpExtTCPDSACKUndo
- TcpExtTCPDeferAcceptDrop
- TcpExtTCPDirectCopyFromBacklog
- TcpExtTCPDirectCopyFromPrequeue
- TcpExtTCPFACKReorder
- TcpExtTCPFastOpenActive
- TcpExtTCPFastOpenActiveFail
- TcpExtTCPFastOpenCookieReqd
- TcpExtTCPFastOpenListenOverflow
- TcpExtTCPFastOpenPassive
- TcpExtTCPFastOpenPassiveFail
- TcpExtTCPFastRetrans
- TcpExtTCPForwardRetrans
- TcpExtTCPFromZeroWindowAdv
- TcpExtTCPFullUndo
- TcpExtTCPHPAcks
- TcpExtTCPHPHits
- TcpExtTCPHPHitsToUser
- TcpExtTCPHystartDelayCwnd
- TcpExtTCPHystartDelayDetect
- TcpExtTCPHystartTrainCwnd
- TcpExtTCPHystartTrainDetect
- TcpExtTCPKeepAlive
- TcpExtTCPLossFailures
- TcpExtTCPLossProbeRecovery
- TcpExtTCPLossProbes
- TcpExtTCPLossUndo
- TcpExtTCPLostRetransmit
- TcpExtTCPMD5NotFound
- TcpExtTCPMD5Unexpected
- TcpExtTCPMTUPFail
- TcpExtTCPMTUPSuccess
- TcpExtTCPMemoryPressures
- TcpExtTCPMinTTLDrop
- TcpExtTCPOFODrop
- TcpExtTCPOFOMerge
- TcpExtTCPOFOQueue
- TcpExtTCPOrigDataSent
- TcpExtTCPPartialUndo
- TcpExtTCPPrequeueDropped
- TcpExtTCPPrequeued
- TcpExtTCPPureAcks
- TcpExtTCPRcvCoalesce
- TcpExtTCPRcvCollapsed
- TcpExtTCPRenoFailures
- TcpExtTCPRenoRecovery
- TcpExtTCPRenoRecoveryFail
- TcpExtTCPRenoReorder
- TcpExtTCPReqQFullDoCookies
- TcpExtTCPReqQFullDrop
- TcpExtTCPRetransFail
- TcpExtTCPSACKDiscard
- TcpExtTCPSACKReneging
- TcpExtTCPSACKReorder
- TcpExtTCPSYNChallenge
- TcpExtTCPSackFailures
- TcpExtTCPSackMerged
- TcpExtTCPSackRecovery
- TcpExtTCPSackRecoveryFail
- TcpExtTCPSackShiftFallback
- TcpExtTCPSackShifted
- TcpExtTCPSchedulerFailed
- TcpExtTCPSlowStartRetrans
- TcpExtTCPSpuriousRTOs
- TcpExtTCPSpuriousRtxHostQueues
- TcpExtTCPSynRetrans
- TcpExtTCPTSReorder
- TcpExtTCPTimeWaitOverflow
- TcpExtTCPTimeouts
- TcpExtTCPToZeroWindowAdv
- TcpExtTCPWantZeroWindowAdv
- TcpExtTCPWinProbe
- TcpExtTW
- TcpExtTWKilled
- TcpExtTWRecycled
- TcpInCsumErrors
- TcpInErrs
- TcpInSegs
- TcpMaxConn
- TcpOutRsts
- TcpOutSegs
- TcpPassiveOpens
- TcpRetransSegs
- TcpRtoAlgorithm
- TcpRtoMax
- TcpRtoMin
- Udp6IgnoredMulti
- Udp6InCsumErrors
- Udp6InDatagrams
- Udp6InErrors
- Udp6NoPorts
- Udp6OutDatagrams
- Udp6RcvbufErrors
- Udp6SndbufErrors
- UdpIgnoredMulti
- UdpInCsumErrors
- UdpInDatagrams
- UdpInErrors
- UdpLite6InCsumErrors
- UdpLite6InDatagrams
- UdpLite6InErrors
- UdpLite6NoPorts
- UdpLite6OutDatagrams
- UdpLite6RcvbufErrors
- UdpLite6SndbufErrors
- UdpLiteIgnoredMulti
- UdpLiteInCsumErrors
- UdpLiteInDatagrams
- UdpLiteInErrors
- UdpLiteNoPorts
- UdpLiteOutDatagrams
- UdpLiteRcvbufErrors
- UdpLiteSndbufErrors
- UdpNoPorts
- UdpOutDatagrams
- UdpRcvbufErrors
- UdpSndbufErrors
Authorization-specific metrics¶
TeskaLabs SeaCat Auth (as seen in tag appclass) handles all LogMan.io authorization, including credentials, logins, and sessions.
credentials¶
Tags: appclass (SeaCat Auth only), host, instance_id, node_id, service_id
- default: The number of credentials (user accounts) existing in your deployment of TeskaLabs LogMan.io.
logins¶
Count of failed and successful logins via TeskaLabs SeaCat Auth.
Tags: appclass (SeaCat Auth only), host, instance_id, node_id, service_id
- failed: Counts failed login attempts. Reports at the time of the login.
- successful: Counts successful logins. Reports at the time of the login.
sessions¶
A session begins any time a user logs in to LogMan.io, so the sessions metric counts open sessions.
Tags: appclass (SeaCat Auth only), host, instance_id, node_id, service_id
- sessions: Number of sessions open at the time
Memory metrics¶
By monitoring memory usage metrics, you can understand how memory resources are being used. This, in turn, can provide insights into areas that may need optimization or adjustment.
memory and os.stat¶
Tags: appclass, host, identity, instance_id, node_id, service_id, tenant
VmPeak¶
Meaning: Peak virtual memory size. This is the peak total of virtual memory used by the microservice. Virtual memory includes both physical RAM and disk swap space (the sum of all virtual memory areas involved in the process).
Interpretation: Monitoring the peak can help you identify if a service is using more memory than expected, potentially indicating a memory leak or a requirement for optimization.
VmLck¶
Meaning: Locked memory size. This indicates the portion of memory that is locked in RAM and can't be swapped out to disk.
Interpretation: A high amount of locked memory could potentially reduce the system's flexibility in managing memory, which might lead to performance issues.
VmPin¶
Meaning: Pinned memory size. This is the portion of memory that is "pinned" in place; a memory page's physical location can't be changed within RAM automatically or swapped out to disk.
Interpretation: Like locked memory, pinned memory can't be moved, so a high value could also limit system flexibility.
VmHWM¶
Meaning: Peak resident set size ("high water mark"). This is the maximum amount of physical RAM that the microservice has used.
Interpretation: If this value is consistently high, it might indicate that the service needs optimization or that you need to allocate more physical RAM.
VmRSS¶
Meaning: Resident set size. This shows the portion of the microservice's memory that is held in RAM.
Interpretation: A high RSS value could mean your service is using a lot of RAM, potentially leading to performance issues if it starts to swap.
VmData, VmStk, VmExe¶
Meaning: Size of data, stack, and text segments. These values represent the sizes of different memory segments: data, stack, and executable code.
Interpretation: Monitoring these can help you understand the memory footprint of your service and can be useful for debugging or optimizing your code.
VmLib¶
Meaning: Shared library code size. This counts executable pages excluding those counted in VmExe, and shows the amount of memory used by shared libraries in the process.
Interpretation: If this is high, you may want to check whether all the libraries are necessary, as they add to the memory footprint.
VmPTE¶
Meaning: Page table entries size. This indicates the size of the page table, which maps virtual memory to physical memory.
Interpretation: A large size might signify that a lot of memory is being used, which could be an issue if it grows too much.
VmSize¶
Meaning: Virtual memory size. This is the current total of virtual memory used by the microservice; VmPeak is the peak value of VmSize.
Interpretation: Like VmPeak, monitoring this value helps in identifying potential memory issues.
VmSwap¶
Meaning: Swapped-out virtual memory size. This indicates the amount of virtual memory that has been swapped out to disk. shmem swap is not included.
Interpretation: Frequent swapping is generally bad for performance; thus, if this metric is high, you may need to allocate more RAM or optimize your services.
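These Vm* values come from the Linux /proc/<pid>/status file. The sketch below (standard library, Linux only) reads them for the current process; in practice the metrics are collected per microservice, so treat this purely as an illustration of where the numbers originate.

```python
# Sketch: read the Vm* lines from /proc/<pid>/status for the current process.
def read_vm_status(pid: str = "self") -> dict:
    values = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("Vm"):
                key, value = line.split(":", 1)
                values[key] = value.strip()  # e.g. "123456 kB"
    return values

for key, value in read_vm_status().items():
    print(key, value)
```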
mem¶
Additional measurements regarding memory. Visit the InfluxDB Telegraf plugin documentation for details.
Tags: node_id
- active: Memory currently in use or very recently used, and thus not immediately available for eviction.
- available: The amount of memory that is readily available for new processes without swapping.
- available_percent: The percentage of total memory that is readily available for new processes.
- buffered: Memory used by the kernel for things like file system metadata, distinct from caching.
- cached: Memory used to store recently used data for quick access, not immediately freed when processes no longer require it.
- commit_limit: The total amount of memory that can be allocated to processes, including both RAM and swap space.
- committed_as: The total amount of memory currently allocated by processes, even if not used.
- dirty: Memory pages that have been modified but not yet written to their respective data location in storage.
- free: The amount of memory that is currently unoccupied and available for use.
- high_free: The amount of free memory in the system's high memory area (memory beyond direct kernel access).
- high_total: The total amount of system memory in the high memory area.
- huge_page_size: The size of each huge page (larger-than-standard memory pages used by the system).
- huge_pages_free: The number of huge pages that are not currently being used.
- huge_pages_total: The total number of huge pages available in the system.
- inactive: Memory that has not been used recently and can be made available for other processes or disk caching.
- low_free: The amount of free memory in the system's low memory area (memory directly accessible by the kernel).
- low_total: The total amount of system memory in the low memory area.
- mapped: Memory used for mapped files, such as libraries and executable files in memory.
- page_tables: Memory used by the kernel to keep track of virtual memory to physical memory mappings.
- shared: Memory used by multiple processes, or shared between processes and the kernel.
- slab: Memory used by the kernel for caching data structures.
- sreclaimable: Part of the slab memory that can be reclaimed, such as caches that can be freed if necessary.
- sunreclaim: Part of the slab memory that cannot be reclaimed under memory pressure.
- swap_cached: Memory that has been swapped out to disk but is still in RAM.
- swap_free: The amount of swap space currently not being used.
- swap_total: The total amount of swap space available.
- total: The total amount of physical RAM available in the system.
- used: The amount of memory that is currently being used by processes.
- used_percent: The percentage of total memory that is currently being used.
- vmalloc_chunk: The largest contiguous block of memory available in the kernel's vmalloc space.
- vmalloc_total: The total amount of memory available in the kernel's vmalloc space.
- vmalloc_used: The amount of memory currently used in the kernel's vmalloc space.
- write_back: Memory which is currently being written back to the disk.
- write_back_tmp: Temporary memory used during write-back operations.
Kernel-specific metrics¶
kernel
¶
Metrics to monitor the Linux kernel. Visit the InfluxDB Telegraf plugin documentation for more details.
Tags: node_id
- boot_time: The time when the system was last booted, measured in seconds since the Unix epoch (January 1, 1970). This tells you the time of the last restart, from which you can derive system uptime. You can convert this number to a date using a Unix epoch time converter.
- context_switches: The number (count, integer) of context switches the kernel has performed. A context switch occurs when the CPU switches from one process or thread to another. A high number of context switches can indicate that many processes are competing for CPU time, which can be a sign of high system load.
- entropy_avail: The amount (integer) of available entropy (randomness that can be generated) in the system, which is essential for secure random number generation. Low entropy can affect cryptographic functions and secure communications. Entropy is consumed by various operations and replenished over time, so monitoring this metric is important for maintaining security.
- interrupts: The total number (count, integer) of interrupts processed since boot. An interrupt is a signal to the processor emitted by hardware or software indicating an event that needs immediate attention. High numbers of interrupts can indicate a busy or possibly overloaded system.
- processes_forked: The total number (count, integer) of processes that have been forked (created) since the system was booted. Tracking the rate of process creation can help in diagnosing system performance issues, especially in environments where processes are frequently started and stopped.
kernel_vmstat
¶
Kernel virtual memory statistics gathered via /proc/vmstat. Visit the InfluxDB Telegraf plugin documentation for more details.
Relevant terms
- Active pages: Pages currently in use or recently used.
- Inactive pages: Pages not recently used, and therefore more likely to be moved to swap space or reclaimed.
- Anonymous pages: Memory pages not backed by a file on disk; typically used for data that does not need to be persisted, such as program stacks.
- Bounce buffer: Temporary memory used to facilitate data transfers between devices that cannot directly address each other’s memory.
- Compaction: The process of rearranging pages in memory to create larger contiguous free spaces, often useful for allocating huge pages.
- Dirty pages: Pages that have been modified in memory but have not yet been written back to disk.
- Evict: The process of removing pages from physical memory, either by moving them to disk (swapping out) or discarding them if they are no longer needed.
- File-backed pages: Memory pages that are associated with files on the disk, such as executable files or data files.
- Free pages: Memory pages that are available for use and not currently allocated to any process or data.
- Huge pages: Large memory pages that can be used by processes, reducing the overhead of page tables.
- Interleave: The process of distributing memory pages across different memory nodes or zones, typically to optimize performance in systems with non-uniform memory access (NUMA).
- NUMA (non-uniform memory access): A memory design where a processor accesses its own local memory faster than non-local memory.
- Page allocation: The process of assigning free memory pages to fulfill a request by a process or the kernel.
- Page fault: An event that occurs when a program tries to access a page that is not in physical memory, requiring the OS to handle this by allocating a page or retrieving it from disk.
- Page table: Data structure used by the operating system to store the mapping between virtual addresses and physical memory addresses.
- Shared memory (shmem): Memory that can be accessed by multiple processes.
- Slab pages: Memory pages used by the kernel to store objects of fixed sizes, such as file structures or inode caches.
- Swap space: A space on the disk used to store memory pages that have been evicted from physical memory.
- THP (transparent huge pages): A feature that automatically manages the allocation of huge pages to improve performance without requiring changes to applications.
- Vmscan: A kernel process that scans memory pages and decides which pages to evict or swap out based on their usage.
- Writeback: The process of writing dirty pages back to disk.
Tags: node_id
- nr_free_pages: Number of free pages in the system.
- nr_inactive_anon: Number of inactive anonymous pages.
- nr_active_anon: Number of active anonymous pages.
- nr_inactive_file: Number of inactive file-backed pages.
- nr_active_file: Number of active file-backed pages.
- nr_unevictable: Number of pages that cannot be evicted from memory.
- nr_mlock: Number of pages locked into memory (mlock).
- nr_anon_pages: Number of anonymous pages.
- nr_mapped: Number of pages mapped into userspace.
- nr_file_pages: Number of file-backed pages.
- nr_dirty: Number of pages currently dirty.
- nr_writeback: Number of pages under writeback.
- nr_slab_reclaimable: Number of reclaimable slab pages.
- nr_slab_unreclaimable: Number of unreclaimable slab pages.
- nr_page_table_pages: Number of pages used for page tables.
- nr_kernel_stack: Amount of kernel stack pages.
- nr_unstable: Number of unstable pages.
- nr_bounce: Number of bounce buffer pages.
- nr_vmscan_write: Number of pages written by vmscan.
- nr_writeback_temp: Number of temporary writeback pages.
- nr_isolated_anon: Number of isolated anonymous pages.
- nr_isolated_file: Number of isolated file pages.
- nr_shmem: Number of shared memory pages.
- numa_hit: Number of pages allocated in the preferred node.
- numa_miss: Number of pages allocated in a non-preferred node.
- numa_foreign: Number of pages intended for another node.
- numa_interleave: Number of interleaved hit pages.
- numa_local: Number of pages allocated on the local node.
- numa_other: Number of pages allocated on other nodes.
- nr_anon_transparent_hugepages: Number of anonymous transparent huge pages.
- pgpgin: Number of kilobytes read from disk.
- pgpgout: Number of kilobytes written to disk.
- pswpin: Number of pages swapped in.
- pswpout: Number of pages swapped out.
- pgalloc_dma: Number of DMA zone pages allocated.
- pgalloc_dma32: Number of DMA32 zone pages allocated.
- pgalloc_normal: Number of normal zone pages allocated.
- pgalloc_movable: Number of movable zone pages allocated.
- pgfree: Number of pages freed.
- pgactivate: Number of inactive pages activated.
- pgdeactivate: Number of active pages deactivated.
- pgfault: Number of page faults.
- pgmajfault: Number of major page faults.
- pgrefill_dma: Number of DMA zone pages refilled.
- pgrefill_dma32: Number of DMA32 zone pages refilled.
- pgrefill_normal: Number of normal zone pages refilled.
- pgrefill_movable: Number of movable zone pages refilled.
- pgsteal_dma: Number of DMA zone pages reclaimed.
- pgsteal_dma32: Number of DMA32 zone pages reclaimed.
- pgsteal_normal: Number of normal zone pages reclaimed.
- pgsteal_movable: Number of movable zone pages reclaimed.
- pgscan_kswapd_dma: Number of DMA zone pages scanned by kswapd.
- pgscan_kswapd_dma32: Number of DMA32 zone pages scanned by kswapd.
- pgscan_kswapd_normal: Number of normal zone pages scanned by kswapd.
- pgscan_kswapd_movable: Number of movable zone pages scanned by kswapd.
- pgscan_direct_dma: Number of DMA zone pages directly scanned.
- pgscan_direct_dma32: Number of DMA32 zone pages directly scanned.
- pgscan_direct_normal: Number of normal zone pages directly scanned.
- pgscan_direct_movable: Number of movable zone pages directly scanned.
- zone_reclaim_failed: Number of failed zone reclaim attempts.
- pginodesteal: Number of inode pages reclaimed.
- slabs_scanned: Number of slab pages scanned.
- kswapd_steal: Number of pages reclaimed by kswapd.
- kswapd_inodesteal: Number of inode pages reclaimed by kswapd.
- kswapd_low_wmark_hit_quickly: Frequency of kswapd hitting low watermark quickly.
- kswapd_high_wmark_hit_quickly: Frequency of kswapd hitting high watermark quickly.
- kswapd_skip_congestion_wait: Number of times kswapd skipped wait due to congestion.
- pageoutrun: Number of pageout pages processed.
- allocstall: Number of times page allocation stalls.
- pgrotated: Number of pages rotated.
- compact_blocks_moved: Number of blocks moved during compaction.
- compact_pages_moved: Number of pages moved during compaction.
- compact_pagemigrate_failed: Number of page migrations failed during compaction.
- compact_stall: Number of stalls during compaction.
- compact_fail: Number of compaction failures.
- compact_success: Number of successful compactions.
- htlb_buddy_alloc_success: Number of successful HTLB buddy allocations.
- htlb_buddy_alloc_fail: Number of failed HTLB buddy allocations.
- unevictable_pgs_culled: Number of unevictable pages culled.
- unevictable_pgs_scanned: Number of unevictable pages scanned.
- unevictable_pgs_rescued: Number of unevictable pages rescued.
- unevictable_pgs_mlocked: Number of unevictable pages mlocked.
- unevictable_pgs_munlocked: Number of unevictable pages munlocked.
- unevictable_pgs_cleared: Number of unevictable pages cleared.
- unevictable_pgs_stranded: Number of unevictable pages stranded.
- unevictable_pgs_mlockfreed: Number of mlock-freed unevictable pages.
- thp_fault_alloc: Number of times a fault caused THP allocation.
- thp_fault_fallback: Number of times a fault fell back from THP.
- thp_collapse_alloc: Number of THP collapses allocated.
- thp_collapse_alloc_failed: Number of failed THP collapse allocations.
- thp_split: Number of THP splits.
Tenant metrics¶
You can investigate the health and status of microservices on a tenant-specific basis if you have multiple LogMan.io tenants in your system. Tenant metrics are specific to LogMan.io Parser, Dispatcher, Correlator, and Watcher microservices.
Naming and tags in Grafana and InfluxDB
- Tenant metrics groups are under the `measurement` tag.
- Tenant metrics are produced for select microservices (tag `appclass`) and can be further filtered with the additional tags `host` and `pipeline`.
- Each individual metric (for example, `eps.in`) is a value in the `field` tag.
The tags are `pipeline` (ID of the pipeline), `host` (hostname of the microservice) and `tenant` (the lowercase name of the tenant). Visit the Pipeline metrics page for more in-depth explanations and guides for interpreting each metric.
bspump.pipeline.tenant.eps
¶
A counter metric with the following values, updated once per minute:
- `eps.in`: The tenant's events per second entering the pipeline.
- `eps.aggr`: The tenant's aggregated events per second entering the pipeline (the number is multiplied by the `cnt` attribute in events).
- `eps.drop`: The tenant's events per second dropped in the pipeline.
- `eps.out`: The tenant's events per second successfully leaving the pipeline.
- `warning`: The tenant's number of warnings produced in the pipeline in the specified time interval.
- `error`: The tenant's number of errors produced in the pipeline in the specified time interval.
In LogMan.io Parser, the most relevant metrics come from `ParsersPipeline` (where the data first enters the Parser and gets parsed via preprocessors and parsers) and `EnrichersPipeline`. In LogMan.io Dispatcher, the most relevant metrics come from `EventsPipeline` and `OthersPipeline`.
bspump.pipeline.tenant.load
¶
A counter metric with the following values, updated once per minute:
- `load.in`: The tenant's byte size of all events entering the pipeline in the specified time interval.
- `load.out`: The tenant's byte size of all events leaving the pipeline in the specified time interval.
Correlator metrics¶
The following metrics are specific to LogMan.io Correlator. Detections (also known as correlation rules) are based on the Correlator microservice.
Naming and tags in Grafana and InfluxDB
- Correlator metrics groups are under the `measurement` tag.
- Correlator metrics are only produced for the Correlator microservice (tag `appclass`) and can be further filtered with the additional tags `correlator` (to isolate a single correlator) and `host`.
- Each individual metric (for example, `in`) is a value in the `field` tag.
correlator.predicate
¶
A counter metric that counts how many events went through the `predicate` section, or filter, of a detection. Each metric updates once per minute, so the time interval refers to a period of about one minute.
- `in`: Number of events entering the predicate in the time interval.
- `hit`: Number of events successfully matching the predicate (fulfilling the conditions of the filter) in the time interval.
- `miss`: Number of events missing the predicate (not fulfilling the conditions of the filter) in the time interval and thus leaving the Correlator.
- `error`: Number of errors in the predicate in the time interval.
correlator.trigger
¶
A counter metric that counts how many events went through the `trigger` section of the correlator. The trigger defines and carries out an action. Each metric updates once per minute, so the time interval refers to a period of about one minute.
- `in`: Number of events entering the trigger in the time interval.
- `out`: Number of events leaving the trigger in the time interval.
- `error`: Number of errors in the trigger in the time interval; it should equal `in` minus `out`.
Reference ↵
TeskaLabs LogMan.io Reference¶
Welcome to the Reference Guide. You can find definitions and details of every LogMan.io component here.
Collector ↵
LogMan.io Collector¶
TeskaLabs LogMan.io Collector is a microservice responsible for collecting logs and other events from various inputs and sending them to LogMan.io Receiver.
- Before you proceed, see Configuration for setup instructions.
- For the setup of event collection from various log sources, see the Log sources subtopic.
- For the detailed configuration options, see Inputs, Transformations and Outputs.
- To mock logs, see Mirage.
- For the communication details between Collector and Receiver, see LogMan.io Receiver documentation.
LogMan.io Collector configuration¶
LogMan.io Collector configuration typically consists of two files.
- Collector configuration (`/conf/lmio-collector.conf`, INI format) specifies the path for pipeline configuration(s) and possibly other application-level configuration options.
- Pipeline configuration (`/conf/lmio-collector.yaml`, YAML format) specifies from which inputs the data is collected (inputs), how the data is transformed (transforms) and how the data is sent further (outputs).
Collector configuration¶
[config]
path=/conf/lmio-collector.yaml
Pipeline configuration¶
Pipeline configuration is in a YAML format. Multiple pipelines can be configured in the same pipeline configuration file.
Every section represents one component of the pipeline. It always starts with either `input:`, `transform:`, `output:` or `connection:` and has the form:
input|transform|output:<TYPE>:<ID>
where `<TYPE>` determines the component type. `<ID>` is used for reference and can be chosen in any way.
- Input specifies a source/input of logs.
- Output specifies where to ship logs.
- Connection specifies a connection that can be used by an output.
- Transform specifies a transformation action to be applied to logs (optional).
Typical pipeline configuration for LogMan.io Receiver:
# Connection to LogMan.io (central part)
connection:CommLink:commlink:
url: https://recv.logman.example.com/
# Input
input:Datagram:udp-10002-src:
address: 0.0.0.0 10002
output: udp-10002
# Output
output:CommLink:udp-10002: {}
For the detailed configuration options of each component, see Inputs, Transformations and Outputs chapters. See LogMan.io Receiver documentation for the CommLink connection details.
Docker Compose¶
version: '3'
services:
  lmio-collector:
    image: docker.teskalabs.com/lmio/lmio-collector
    container_name: lmio-collector
    volumes:
      - ./lmio-collector/conf:/conf
      - ./lmio-collector/var:/app/lmio-collector/var
    network_mode: host
    restart: always
LogMan.io Collector Inputs¶
Note
This chapter concerns setup for log sources collected over network, syslog, files, databases, etc. For the setup of event collection from various log sources, see the Log sources subtopic.
Network¶
Sections: `input:TCP`, `input:Stream`, `input:UDP`, `input:Datagram`
These inputs listen on a given address using TCP, UDP or a Unix socket.
Tip
Logs should be collected over the TCP protocol. Use UDP only when TCP is not possible.
The configuration options for listening:
address: # Specify IPv4, IPv6 or UNIX file path to listen from
output: # Which output to send the incoming events to
Here are the possible forms of `address`:
- `8080` or `*:8080`: Listen on port 8080 on all available network interfaces, on both IPv4 and IPv6
- `0.0.0.0:8080`: Listen on port 8080 on all available network interfaces on IPv4
- `:::8080`: Listen on port 8080 on all available network interfaces on IPv6
- `1.2.3.4:8080`: Listen on port 8080 on a specific network interface (`1.2.3.4`) on IPv4
- `::1:8080`: Listen on port 8080 on a specific network interface (`::1`) on IPv6
- `/tmp/unix.sock`: Listen on the UNIX socket `/tmp/unix.sock`
The following configuration options are available only for `input:Datagram`:
max_packet_size: # (optional) Specify the maximum size of packets in bytes (default: 65536)
receiver_buffer_size: # (optional) Limit the receiver size of the buffer in bytes (default: 0)
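For illustration, a minimal `input:Datagram` sketch that combines these options; the port, the values shown and the output name are placeholders, not recommended settings:
input:Datagram:udp-10514:
  address: 0.0.0.0:10514
  max_packet_size: 65536
  receiver_buffer_size: 0
  output: udp-10514-output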
Warning
LogMan.io Collector runs inside a Docker container. Propagation of network ports must be enabled like this:
services:
  lmio-collector-tenant:
    network_mode: host
Note
TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) are both protocols used for sending data over the network.
TCP is a stream protocol, as it provides reliable, ordered, and error-checked delivery of a stream of data.
In contrast, UDP is a datagram protocol that sends packets independently, allowing faster transmission but with less reliability and no guarantee of order, much like individual, unrelated messages.
Tip
For troubleshooting, use tcpdump
to capture raw network traffic and then use Wireshark for deeper analysis.
An example of capturing the traffic on TCP port 10008:
$ sudo tcpdump -i any tcp port 10008 -s 0 -w /tmp/capture.pcap -v
When enough traffic is captured, press Ctrl-C and collect the file `/tmp/capture.pcap` that contains the traffic capture. This file can be opened in Wireshark.
Syslog¶
Sections: `input:TCPBSDSyslogRFC6587`, `input:TCPBSDSyslogNoFraming`
Special cases of TCP input for parsing SysLog via TCP. For more information, see RFC 6587 and RFC 3164, section 4.1.1.
The configuration options for listening on a given path:
address: # Specify IPv4, IPv6 or UNIX file path to listen from (f. e. 127.0.0.1:8888 or /data/mysocket)
output: # Which output to send the incoming events to
The following configuration options are available only for `input:TCPBSDSyslogRFC6587`:
max_sane_msg_len: # (optional) Maximum size in bytes of SysLog message to be received (default: 10000)
The following configuration options are available only for `input:TCPBSDSyslogNoFraming`:
buffer_size: # (optional) Maximum size in bytes of SysLog message to be received (default: 64 * 1024)
variant: # (optional) The variant of SysLog format of the incoming message, can be `auto`, `nopri` with no PRI number in the beginning and `standard` with PRI (default: auto)
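An illustrative sketch of a syslog listener without framing (the port and output name are placeholders):
input:TCPBSDSyslogNoFraming:syslog-noframing-10010:
  address: 0.0.0.0:10010
  variant: auto
  output: syslog-noframing-10010-output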
Subprocess¶
Section: input:SubProcess
The SubProcess input runs a command as a subprocess of the LogMan.io Collector, while periodically checking its output at `stdout` (lines) and `stderr`.
The configuration options include:
command: # Specify the command to be run as subprocess (f. e. tail -f /data/tail.log)
output: # Which output to send the incoming events to
line_len_limit: # (optional) The length limit of one read line (default: 1048576)
ok_return_codes: # (optional) Which return codes signify the running status of the command (default: 0)
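For example, a hedged sketch that tails a file through a subprocess (the command, file path and output name are illustrative):
input:SubProcess:TailSyslog:
  command: tail -f /var/log/syslog
  output: tail-syslog-output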
File tailing¶
Section: input:SmartFile
Smart File Input is used for collecting events from multiple files whose content may be dynamically modified, or which may be deleted altogether by another process, similarly to the `tail -f` shell command.
Smart File Input creates a monitored file object for every file path specified in the configuration in the `path` option.
The monitored file is periodically checked for new lines; when a new line appears, it is read in bytes and passed further to the pipeline, including meta information such as the file name and extracted parts of the file path (see the Extract parameters section).
Various protocols are used for reading from different log file formats:
- Line Protocol for line-oriented log files
- XML Protocol for XML-oriented log files
- W3C Extended Log File Protocol for log files in W3C Extended Log File Format
- W3C DHCP Server Protocol for DHCP Server log files
Required configuration options:
input:SmartFile:MyFile:
path: | # File paths separated by newlines
/first/path/to/log/files/*.log
/second/path/to/log/files/*.log
/another/path/*s
protocol: # Protocol to be used for reading
Optional configuration options:
recursive: # Recursive scanning of specified paths (default: True)
scan_period: # File scan period in seconds (default: 3 seconds)
preserve_newline: # Preserve new line character in the output (default: False)
last_position_storage: # Persistent storage for the current positions in read files (default: ./var/last_position_storage)
Tip
In a more complex setup, such as extraction of logs from a Windows shared folder, you can use `rsync` to synchronize logs from the shared folder to a local folder on the collector machine. Smart File Input then reads logs from the local folder.
Warning
Internally, the current position in each file is stored in the last position storage. If the last position storage file is deleted or not specified, all files are read again from the beginning after the LogMan.io Collector restarts, i.e. without persistence the reading is reset on every restart.
You can configure the path for the last position storage:
last_position_storage: "./var/last_position_storage"
Warning
If the file size is smaller than the previously remembered file size, the whole file is read again and sent to the pipeline, split into lines.
File paths¶
File path globs are separated by newlines. They can contain wildcards (such as `*`, `**`, etc.).
path: |
/first/path/*.log
/second/path/*.log
/another/path/*
By default, files are read recursively. You can disable recursive reading with:
recursive: False
Line Protocol¶
protocol: line
line/C_separator: # (optional) Character used for line separator. Default: '\n'.
Line Protocol is used for reading messages from line-oriented log files.
XML Protocol¶
protocol: xml
tag_separator: '</msg>' # (required) Tag for separator.
XML Protocol is used for reading messages from XML-oriented log files.
Parameter `tag_separator` must be included in the configuration.
Example
Example of XML log file:
...
<msg time='2024-04-16T05:47:39.814+02:00' org_id='orgid'>
<txt>Log message 1</txt>
</msg>
<msg time='2024-04-16T05:47:42.814+02:00' org_id='orgid'>
<txt>Log message 2</txt>
</msg>
<msg time='2024-04-16T05:47:43.018+02:00' org_id='orgid'>
<txt>Log message 3</txt>
</msg>
...
Example configuration:
input:SmartFile:Alert:
path: /xml-logs/*.xml
protocol: xml
tag_separator: "</msg>"
W3C Extended Log File Protocol¶
protocol: w3c_extended
W3C Extended Log File Protocol is used for collecting events from files in W3C Extended Log File Format and serializing them into JSON format.
Example of event collection from Microsoft Exchange Server
LogMan.io Collector Configuration example:
input:SmartFile:MSExchange:
path: /MicrosoftExchangeServer/*.log
protocol: w3c_extended
extract_source: file_path
extract_regex: ^(?P<file_path>.*)$
Example of log file content:
#Software: Microsoft Exchange Server
#Version: 15.02.1544.004
#Log-type: DNS log
#Date: 2024-04-14T00:02:48.540Z
#Fields: Timestamp,EventId,RequestId,Data
2024-04-14T00:02:38.254Z,,9666704,"SendToServer 122.120.99.11(1), AAAA exchange.bradavice.cz, (query id:46955)"
2024-04-14T00:02:38.254Z,,7204389,"SendToServer 122.120.99.11(1), AAAA exchange.bradavice.cz, (query id:11737)"
2024-04-14T00:02:38.254Z,,43150675,"Send completed. Error=Success; Details=id=46955; query=AAAA exchange.bradavice.cz; retryCount=0"
...
W3C DHCP Server Format¶
protocol: w3c_dhcp
W3C DHCP Protocol is used for collecting events from DHCP Server log files. It is very similar to the W3C Extended Log File Format, differing only in the log file header.
Table of W3C DHCP events identification
Event ID | Meaning |
---|---|
00 | The log was started. |
01 | The log was stopped. |
02 | The log was temporarily paused due to low disk space. |
10 | A new IP address was leased to a client. |
11 | A lease was renewed by a client. |
12 | A lease was released by a client. |
13 | An IP address was found to be in use on the network. |
14 | A lease request could not be satisfied because the scope's address pool was exhausted. |
15 | A lease was denied. |
16 | A lease was deleted. |
17 | A lease expired and DNS records for the expired lease have not been deleted. |
18 | A lease was expired and DNS records were deleted. |
20 | A BOOTP address was leased to a client. |
21 | A dynamic BOOTP address was leased to a client. |
22 | A BOOTP request could not be satisfied because the scope's address pool for BOOTP was exhausted. |
23 | A BOOTP IP address was deleted after checking to see it was not in use. |
24 | IP address cleanup operation has begun. |
25 | IP address cleanup statistics. |
30 | DNS update request to the named DNS server. |
31 | DNS update failed. |
32 | DNS update successful. |
33 | Packet dropped due to NAP policy. |
34 | DNS update request failed as the DNS update request queue limit exceeded. |
35 | DNS update request failed. |
36 | Packet dropped because the server is in failover standby role or the hash of the client ID does not match. |
50+ | Codes above 50 are used for Rogue Server Detection information. |
Example of event collection from DHCP Server
LogMan.io Collector Configuration example:
input:SmartFile:DHCP-Server-Input:
path: /DHCPServer/*.log
protocol: w3c_dhcp
extract_source: file_path
extract_regex: ^(?P<file_path>.*)$
Example of DHCP Server log file content:
DHCP Service Activity Log
Event ID Meaning
00 The log was started.
01 The log was stopped.
...
50+ Codes above 50 are used for Rogue Server Detection information.
ID,Date,Time,Description,IP Address,Host Name,MAC Address,User Name, TransactionID, ...
24,04/16/24,00:00:21,Database Cleanup Begin,,,,,0,6,,,,,,,,,0
24,04/16/24,00:00:22,Database Cleanup Begin,,,,,0,6,,,,,,,,,0
...
For instance, the `ignore_older_than` limit for files being read (see Ignore old changes below) can be set to `ignore_older_than: 20d` or `ignore_older_than: 100s`.
Extract parameters¶
There are also options for extracting information from the file name or file path using a regular expression.
The extracted parts are then stored as metadata (which implicitly includes a unique meta ID and the file name).
The configuration options start with the `extract_` prefix and include the following:
extract_source: # (optional) file_name or file_path (default: file_path)
extract_regex: # (optional) regex to extract field names from the extract source (disabled by default)
The `extract_regex` must contain named groups. The group names are used as field keys for the extracted information.
Unnamed groups produce no data.
Example of extracting metadata from regex
Collecting from a file /data/myserver.xyz/tenant-1.log
The following configuration:
extract_regex: ^/data/(?P<dvchost>\w+)/(?P<tenant>\w+)\.log$
will produce metadata:
{
"meta": {
"dvchost": "myserver.xyz",
"tenant": "tenant-1"
}
}
The following is a working example of a `SmartFile` input configuration with extraction of attributes from the file name using regex, and an associated `File` output:
input:SmartFile:SmartFileInput:
path: ./etc/tail.log
extract_source: file_name
extract_regex: ^(?P<dvchost>\w+).log$
output: FileOutput
output:File:FileOutput:
path: /data/my_path.txt
prepend_meta: true
debug: true
Prepending information¶
prepend_meta: true
Prepends the meta information, such as the extracted field names, to the log line/event as key-value pairs separated by spaces.
Ignore old changes¶
The following configuration option checks that the modification time of the files being read is not older than the specified limit.
ignore_older_than: # (optional) Limit in days, hours, minutes or seconds to read only files modified after the limit (default: "", f. e. "1d", "1h", "1m", "1s")
File¶
Sections: `input:File`, `input:FileBlock`, `input:XML`
These inputs read the specified files by lines (`input:File`) or as a whole block (`input:FileBlock`, `input:XML`) and pass their content further to the pipeline.
Depending on the mode, a file may then be renamed to `<FILE_NAME>-processed`, and if more files are specified using a wildcard, the next file is opened, read and processed in the same way.
The available configuration options for opening, reading and processing the files include:
path: # Specify the file path(s), wildcards can be used as well (f. e. /data/lines/*)
chilldown_period: # If more files or wildcard is used in the path, specify how often in seconds to check for new files (default: 5)
output: # Which output to send the incoming events to
mode: # (optional) The mode by which the file is going to be read (default: 'rb')
newline: # (optional) File line separator (default is value of os.linesep)
post: # (optional) Specifies what should happen with the file after reading - delete (delete the file), noop (no renaming), move (rename to `<FILE_NAME>-processed`, default)
exclude: # (optional) Path of filenames that should be excluded (has precedence over 'include')
include: # (optional) Path of filenames that should be included
encoding: # (optional) Charset encoding of the file's content
move_destination: # (optional) Destination folder for post 'move', make sure it is outside of the path specified above
lines_per_event: # (optional) The number of lines after which the read method enters the idle state to allow other operations to perform their tasks (default: 10000)
event_idle_time: # (optional) The time in seconds for which the read method enters the idle state, see above (default: 0.01)
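As an illustrative sketch (paths and the output name are placeholders), an `input:File` configuration that reads line-oriented files and deletes them afterwards might look like this:
input:File:ImportedLogs:
  path: /data/lines/*.log
  post: delete
  output: file-input-output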
ODBC¶
Section: input:ODBC
Provides input via an ODBC driver connection to collect logs from various databases.
Configuration options related to the connection establishment:
host: # Hostname of the database server
port: # Port where the database server is running
user: # Username to log in to the database server (usually a technical/access account)
password: # Password for the user specified above
driver: # Pre-installed ODBC driver (see list below)
db: # Name of the database to access
connect_timeout: # (optional) Connection timeout in seconds for the ODBC pool (default: 1)
reconnect_delay: # (optional) Reconnection delay in seconds after timeout for the ODBC pool (default: 5.0)
output_queue_max_size: # (optional) Maximum size of the output queue, i. e. in-memory storage (default: 10)
max_bulk_size: # (optional) Maximum size of one bulk composed of the incoming records (default 2)
output: # Which output to send the incoming events to
Configuration options related to querying the database:
query: # Query to periodically call the database
chilldown_period: # Specify in seconds how often the query above will be called (default: 5)
last_value_enabled: # Enable last value duplicity check (true/false)
last_value_table: # Specify table for SELECT max({}) from {};
last_value_column: # The column in the query used to obtain the last value
last_value_storage: # Persistent storage for the current last value (default: ./var/last_value_storage)
last_value_query: # (optional) To specify the last value query entirely (in case this option is set, last_value_table will not be considered)
last_value_start: # (optional) The first value to start from (default: 0)
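A hedged sketch of an ODBC input; the hostname, credentials, driver, database, query and interval are purely illustrative and must be adapted to your environment and pre-installed ODBC driver:
input:ODBC:AuditDB:
  host: db.example.com
  port: 1433
  user: logreader
  password: <PASSWORD>
  driver: <PRE_INSTALLED_ODBC_DRIVER>
  db: audit
  query: SELECT * FROM audit_log;
  chilldown_period: 60
  output: odbc-output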
Apache Kafka¶
Section: input:Kafka
This option is available from version v22.32 onwards.
Creates a Kafka consumer for the specified topic(s).
Configuration options related to the connection establishment:
bootstrap_servers: # Kafka nodes to read messages from (such as `kafka1:9092,kafka2:9092,kafka3:9092`)
Configuration options related to the Kafka Consumer setting:
topic: # Name of the topics to read messages from (such as `lmio-events` or `^lmio.*`)
group_id: # Name of the consumer group (such as: `collector_kafka_consumer`)
refresh_topics: # (optional) If more topics matching the topic name are expected to be created during consumption, this options specifies in seconds how often to refresh the topics' subscriptions (such as: `300`)
The `bootstrap_servers`, `topic` and `group_id` options are always required.
`topic` can be a name, a list of names separated by spaces, or a simple regex (to match all available topics, use `^.*`).
For more configuration options, please refer to https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md
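As an illustrative sketch (server names, the topic regex and the group ID are placeholders), a consumer subscribing to all topics with the `lmio` prefix could be configured like this:
input:Kafka:KafkaInput:
  bootstrap_servers: kafka1:9092,kafka2:9092,kafka3:9092
  topic: ^lmio.*
  group_id: collector_kafka_consumer
  refresh_topics: 300
  output: kafka-input-output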
LogMan.io Collector Outputs¶
The collector output is specified as follows:
output:<output-type>:<output-name>:
debug: false
...
Common output options¶
In every output, meta information can be specified as a dictionary in the `meta` attribute.
meta:
my_meta_tag: my_meta_tag_value # (optional) Custom meta information, that will be later available in LogMan.io Parser in event's context
The `tenant` meta information can be specified in the output's config directly.
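For illustration, a sketch of an output carrying both `tenant` and custom `meta` information (the output type, URL, tenant name and meta values are placeholders):
output:WebSocket:WebSocketOutput:
  url: http://recv.logman.example.com/ws
  tenant: mytenant
  meta:
    my_meta_tag: my_meta_tag_value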
Debugging¶
debug
(optional)
Specify whether to also write the output to the log for debugging.
Default: false
Prepend the meta information¶
prepend_meta
(optional)
Prepend the meta information to the incoming event as key-value pairs separated by spaces.
Default: false
Note
Meta information includes the file name or information extracted from it (in the case of Smart File input), custom defined fields (see below), etc.
TCP Output¶
Outputs events over TCP to a server specified by an IP address and port.
output:TCP:<output-name>:
address: <IP address>:<Port>
...
Address¶
address
The server address consists of the IP address and the port.
Hint
IPv4 and IPv6 addresses are supported.
Maximum size of packets¶
max_packet_size
(optional)
Specify the maximum size of packets in bytes.
Default: 65536
Receiver size of the buffer¶
receiver_buffer_size
(optional)
Limit the receiver size of the buffer in bytes.
Default: 0
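For example, a minimal sketch that forwards events to a remote server over TCP (the address and output name are illustrative):
output:TCP:remote-tcp-output:
  address: 192.0.2.10:10008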
UDP Output¶
Outputs events over UDP to a server specified by an IP address and port.
output:UDP:<output-name>:
address: <IP address>:<Port>
...
Address¶
address
The server address consists of the IP address and the port.
Hint
IPv4 and IPv6 addresses are supported.
Maximum size of packets¶
max_packet_size
(optional)
Specify the maximum size of packets in bytes.
Default: 65536
Receiver size of the buffer¶
receiver_buffer_size
(optional)
Limit the receiver size of the buffer in bytes.
Default: 0
WebSocket Output¶
Outputs events over WebSocket to a specified URL.
output:WebSocket:<output-name>:
url: <Server URL>
...
URL¶
url
Specify WebSocket destination URL. For example http://example.com/ws
Tenant¶
tenant
Name of the tenant for which the LogMan.io Collector collects; the tenant name is forwarded to LogMan.io Parser and added to the event.
Inactive time¶
inactive_time
(optional)
Specify inactive time in seconds, after which idle Web Sockets will be closed.
Default: 60
Output queue size¶
output_queue_max_size
(optional)
Specify the in-memory outgoing queue size for every WebSocket.
Path to store persistent files¶
buffer
(optional)
Path to store persistent files in when the WebSocket connection is offline.
SSL configuration options¶
The following configuration options specify the SSL (HTTPS) connection:
- `cert`: Path to the client SSL certificate
- `key`: Path to the private key of the client SSL certificate
- `password`: Private key file password (optional, default: none)
- `cafile`: Path to a PEM file with CA certificate(s) to verify the SSL server (optional, default: none)
- `capath`: Path to a directory with CA certificate(s) to verify the SSL server (optional, default: none)
- `ciphers`: SSL ciphers (optional, default: none)
- `dh_params`: Diffie–Hellman (D-H) key exchange (TLS) parameters (optional, default: none)
- `verify_mode`: One of CERT_NONE, CERT_OPTIONAL or CERT_REQUIRED (optional); for more information, see github.com/TeskaLabs/asab
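A hedged sketch of a WebSocket output using these SSL options; the URL, tenant and certificate paths are illustrative:
output:WebSocket:SecureWebSocketOutput:
  url: https://recv.logman.example.com/ws
  tenant: mytenant
  cert: /conf/ssl/collector-cert.pem
  key: /conf/ssl/collector-key.pem
  cafile: /conf/ssl/ca-cert.pem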
File Output¶
Outputs events into a specified file.
output:File:<output-name>:
path: /data/output.log
...
Path¶
path
Path of the output file.
Hint
Make sure the location of the output file is accessible within the Docker container when using Docker.
Flags¶
flags
(optional)
One of `O_CREAT` and `O_EXCL`, where the first one tells the output to create the file if it does not exist.
Default: O_CREAT
Mode¶
mode
(optional)
The mode by which the file is going to be written to.
Default: `ab` (append bytes).
Unix Socket (datagram)¶
Outputs events into a datagram-oriented Unix Domain Socket.
output:UnixSocket:<output-name>:
address: <path>
...
Address¶
address
The Unix socket file path, e.g. `/data/myunix.socket`.
Maximum size of packets¶
max_packet_size
(optional)
Specify the maximum size of packets in bytes.
Default: 65536
Unix Socket (stream)¶
Outputs events into a stream-oriented Unix Domain Socket.
output:UnixStreamSocket:<output-name>:
address: <path>
...
Address¶
address
The Unix socket file path, e.g. `/data/myunix.socket`.
Maximum size of packets¶
max_packet_size
(optional)
Specify the maximum size of packets in bytes.
Default: 65536
Print Output¶
A helper output that prints events to the terminal.
output:Print:<output-name>:
...
Null Output¶
A helper output that discards events.
output:Null:<output-name>:
...
Log sources ↵
Collecting events from Apache Kafka¶
TeskaLabs LogMan.io Collector is able to collect events from Apache Kafka, namely its topics. The events stored in Kafka may contain any data encoded in bytes, such as logs about various user, admin, system, device and policy actions.
Prerequisites¶
In order to create a Kafka consumer, the `bootstrap_servers` (that is, the location of the Kafka nodes) need to be known, as well as the `topic` to read the data from.
LogMan.io Collector Configuration¶
The LogMan.io Collector provides the `input:Kafka:` input section, which needs to be specified in the YAML configuration. The configuration looks as follows:
input:Kafka:KafkaInput:
bootstrap_servers: <BOOTSTRAP_SERVERS>
topic: <TOPIC>
group_id: <GROUP_ID>
...
The input creates a Kafka consumer for the specific topic(s).
Configuration options related to the connection establishment:
bootstrap_servers: # Kafka nodes to read messages from (such as `kafka1:9092,kafka2:9092,kafka3:9092`)
Configuration options related to the Kafka Consumer setting:
topic: # Name of the topics to read messages from (such as `lmio-events` or `^lmio.*`)
group_id: # Name of the consumer group (such as: `collector_kafka_consumer`)
refresh_topics: # (optional) If more topics matching the topic name are expected to be created during consumption, this options specifies in seconds how often to refresh the topics' subscriptions (such as: `300`)
Options `bootstrap_servers`, `topic` and `group_id` are always required!
`topic` can be a name, a list of names separated by spaces, or a simple regex (to match all available topics, use `^.*`).
For more configuration options, please refer to librdkafka configuration guide.
Collecting events from Google Cloud PubSub¶
Info
This option is available from version v23.27 onwards.
TeskaLabs LogMan.io Collector can collect events from Google Cloud PubSub using a native asynchronous consumer.
Google Cloud PubSub Documentation
Google Cloud Pull Subscription Explanation
Prerequisites¶
In Pub/Sub, the following information needs to be gathered:
1.) The name of the project the messages are to be consumed from
How to create a topic in a project
2.) The subscription name created in the topic the messages are to be consumed from
How to create a PubSub subscription
3.) A service account file with a private key to authorize against the given topic and subscription
How to create a service account
LogMan.io Collector Input setup¶
Google Cloud PubSub Input¶
The input named `input:GoogleCloudPubSub:` needs to be provided in the LogMan.io Collector YAML configuration:
input:GoogleCloudPubSub:GoogleCloudPubSub:
subscription_name: <NAME_OF_THE_SUBSCRIPTION_IN_THE_GIVEN_TOPIC>
project_name: <NAME_OF_THE_PROJECT_TO_CONSUME_FROM>
service_account_file: <PATH_TO_THE_SERVICE_ACCOUNT_FILE>
output: <OUTPUT>
`<NAME_OF_THE_SUBSCRIPTION_IN_THE_GIVEN_TOPIC>`, `<NAME_OF_THE_PROJECT_TO_CONSUME_FROM>` and `<PATH_TO_THE_SERVICE_ACCOUNT_FILE>` must be provided from Google Cloud Pub/Sub.
The output is events as a byte stream with the following meta information: `publish_time`, `message_id`, `project_name` and `subscription_name`.
Commit¶
The commit/acknowledgement is done automatically after each individual bulk of messages is processed, so the same messages are not sent by PubSub repeatedly.
The default bulk is 5,000 messages and can be changed in the input configuration via the `max_messages` option:
max_messages: 10000
Collecting from Bitdefender¶
TeskaLabs LogMan.io can collect Bitdefender logs from requests made by Bitdefender as specified by the server API documentation.
LogMan.io Collector Configuration¶
On the LogMan.io server, where the logs are being forwarded to, run a LogMan.io Collector instance with the following configuration.
In the `listen` section, set the appropriate port configured in the Log Forwarding in Bitdefender.
Bitdefender Server Configuration¶
input:Bitdefender:BitdefenderAPI:
listen: 0.0.0.0 <PORT_SET_IN_FORWARDING> ssl
cert: <PATH_TO_PEM_CERT>
key: <PATH_TO_PEM_KEY_CERT>
cafile: <PATH_TO_PEM_CA_CERT>
encoding: utf-8
output: <OUTPUT_ID>
output:xxxxxx:<OUTPUT_ID>:
...
Collecting from Cisco IOS based devices¶
This collecting method is designed to collect logs from Cisco products that operate IOS, such as the Cisco Catalyst 2960 switch or the Cisco ASR 9200 router.
Log configuration¶
Configure the remote address of a collector and the logging level:
CATALYST(config)# logging host <hostname or IP of the LogMan.io collector> transport tcp port <port-number>
CATALYST(config)# logging trap informational
CATALYST(config)# service timestamps log datetime year msec show-timezone
CATALYST(config)# logging origin-id <hostname>
The log format contains the following fields:
- timestamp in the UTC format with:
  - year, month, day
  - hour, minute, and second
  - millisecond
- hostname of the device
- log level (set to informational)
Example of the output
<189>36: CATALYST: Aug 22 2022 10:11:25.873 UTC: %SYS-5-CONFIG_I: Configured from console by admin on vty0 (10.0.0.44)
Time synchronization¶
It is important that the Cisco device time is synchronized using NTP.
Prerequisites are:
- Internet connection (if you are using a public NTP server)
- Configured name-server option (for DNS query resolution)
LAB-CATALYST(config)# no clock timezone
LAB-CATALYST(config)# no ntp
LAB-CATALYST(config)# ntp server <hostname or IP of NTP server>
Example of the configuration with Google NTP server:
CATALYST(config)# no clock timezone
CATALYST(config)# no ntp
CATALYST(config)# do show ntp associations
%NTP is not enabled.
CATALYST(config)# ntp server time.google.com
CATALYST(config)# do show ntp associations
address ref clock st when poll reach delay offset disp
*~216.239.35.4 .GOOG. 1 58 64 377 15.2 0.58 0.4
* master (synced), # master (unsynced), + selected, - candidate, ~ configured
CATALYST(config)# do show clock
10:57:39.110 UTC Mon Aug 22 2022
Collecting from Citrix¶
TeskaLabs LogMan.io can collect Citrix logs using Syslog via log forwarding over TCP (recommended) or UDP communication.
Citrix ADC¶
If Citrix devices are connected through ADC, follow this guide on how to enable Syslog over TCP. Make sure you select the proper LogMan.io server and port to forward logs to.
F5 BIG-IP¶
If Citrix devices are connected to F5 BIG-IP, use the following guide. Make sure you select the proper LogMan.io server and port to forward logs to.
Configuring LogMan.io Collector¶
On the LogMan.io server, where the logs are being forwarded to, run a LogMan.io Collector instance with the following configuration.
Log Forwarding Via TCP¶
input:TCPBSDSyslogRFC6587:Citrix:
address: 0.0.0.0:<PORT_SET_IN_FORWARDING>
output: WebSocketOutput
output:WebSocket:WebSocketOutput:
url: http://<LMIO_SERVER>:<YOUR_PORT>/ws
tenant: <YOUR_TENANT>
debug: false
prepend_meta: false
Log Forwarding Via UDP¶
input:Datagram:Citrix:
address: 0.0.0.0:<PORT_SET_IN_FORWARDING>
output: WebSocketOutput
output:WebSocket:WebSocketOutput:
url: http://<LMIO_SERVER>:<YOUR_PORT>/ws
tenant: <YOUR_TENANT>
debug: false
prepend_meta: false
Collecting from Fortinet FortiGate¶
TeskaLabs LogMan.io can collect Fortinet FortiGate logs directly or through FortiAnalyzer via log forwarding over TCP (recommended) or UDP communication.
Forwarding logs to LogMan.io¶
Both in FortiGate and FortiAnalyzer, the `Syslog` type must be selected along with the appropriate port.
For precise guides, see the following link:
LogMan.io Collector Configuration¶
On the LogMan.io server, where the logs are being forwarded to, run a LogMan.io Collector instance with the following configuration.
In the `address` section, set the appropriate port configured in the Log Forwarding in FortiAnalyzer.
Log Forwarding Via TCP¶
input:TCPBSDSyslogRFC6587:Fortigate:
address: 0.0.0.0:<PORT_SET_IN_FORWARDING>
output: <OUTPUT_ID>
output:xxxxxxx:<OUTPUT_ID>:
...
Log Forwarding Via UDP¶
input:Datagram:Fortigate:
address: 0.0.0.0:<PORT_SET_IN_FORWARDING>
output: <OUTPUT_ID>
output:xxxxxxx:<OUTPUT_ID>:
...
Collecting events from Microsoft Azure Event Hub¶
This option is available from version v22.45 onwards.
TeskaLabs LogMan.io Collector can collect events from Microsoft Azure Event Hub through a native client or Kafka. The events stored in Azure Event Hub may contain any data encoded in bytes, such as logs about various user, admin, system, device, and policy actions.
Microsoft Azure Event Hub Setting¶
The following credentials need to be obtained for LogMan.io Collector to read the events: `connection string`, `event hub name` and `consumer group`.
Obtain connection string from Microsoft Azure Event Hub¶
1) Sign in to the Azure portal with admin privileges to the respective Azure Event Hubs Namespace.
The Azure Event Hubs Namespace is available in the Resources section.
2) In the selected Azure Event Hubs Namespace, click on Shared access policies in the Settings section in the left menu.
Click on the Add button, enter the name of the policy (the recommended name is: LogMan.io Collector), and a popup window with the policy details should appear on the right.
3) In the popup window, select the Listen option to allow the policy to read from event hubs associated with the given namespace.
See the following picture.
4) Copy the Connection string-primary key and click on Save.
The policy should be visible in the table in the middle of the screen.
The connection string starts with the `Endpoint=sb://` prefix.
Obtain consumer group¶
5) In the Azure Event Hubs Namespace, select the Event Hubs option from the left menu.
6) Click on the event hub that contains events to be collected.
7) When in the event hub, click on the + Consumer group button in the middle of the screen.
8) In the right popup window, enter the name of the consumer group (the recommended value is `lmio_collector`) and click on the Create button.
9) Repeat this procedure for all event hubs meant to be consumed.
10) Write down the consumer group's name and all event hubs for the eventual LogMan.io Collector configuration.
LogMan.io Collector Input setup¶
Azure Event Hub Input¶
The input named `input:AzureEventHub:` needs to be provided in the LogMan.io Collector YAML configuration:
input:AzureEventHub:AzureEventHub:
connection_string: <CONNECTION_STRING>
eventhub_name: <EVENT_HUB_NAME>
consumer_group: <CONSUMER_GROUP>
output: <OUTPUT>
`<CONNECTION_STRING>`, `<EVENT_HUB_NAME>` and `<CONSUMER_GROUP>` are provided through the guide above.
The following meta options are available for the parser: `azure_event_hub_offset`, `azure_event_hub_sequence_number`, `azure_event_hub_enqueued_time`, `azure_event_hub_partition_id`, `azure_event_hub_consumer_group` and `azure_event_hub_eventhub_name`.
The output is events as a byte stream, similar to Kafka input.
Azure Monitor Through Event Hub Input¶
The Azure Monitor Through Event Hub Input loads events from Azure Event Hub, reads the Azure Monitor JSON log, and breaks individual records into log lines, which are then sent to the defined output.
The input named `input:AzureMonitorEventHub:` needs to be provided in the LogMan.io Collector YAML configuration:
input:AzureMonitorEventHub:AzureMonitorEventHub:
connection_string: <CONNECTION_STRING>
eventhub_name: <EVENT_HUB_NAME>
consumer_group: <CONSUMER_GROUP>
encoding: # default: utf-8
output: <OUTPUT>
`<CONNECTION_STRING>`, `<EVENT_HUB_NAME>` and `<CONSUMER_GROUP>` are provided through the guide above.
The following meta options are available for the parser: `azure_event_hub_offset`, `azure_event_hub_sequence_number`, `azure_event_hub_enqueued_time`, `azure_event_hub_partition_id`, `azure_event_hub_consumer_group` and `azure_event_hub_eventhub_name`.
The output is events as a byte stream, similar to Kafka input.
Alternative: Kafka Input¶
Azure Event Hub also provides (except for basic tier users) a Kafka interface, so the standard LogMan.io Collector Kafka input can be used.
There are multiple authentication options in Kafka, including OAuth, etc.
However, for the purposes of this documentation and to reuse the connection string, plain SASL authentication using the connection string from the guide above is preferred.
input:Kafka:KafkaInput:
bootstrap_servers: <NAMESPACE>.servicebus.windows.net:9093
topic: <EVENT_HUB_NAME>
group_id: <CONSUMER_GROUP>
security.protocol: SASL_SSL
sasl.mechanisms: PLAIN
sasl.username: "$ConnectionString"
sasl.password: <CONNECTION_STRING>
output: <OUTPUT>
`<CONNECTION_STRING>`, `<EVENT_HUB_NAME>` and `<CONSUMER_GROUP>` are provided through the guide above; `<NAMESPACE>` is the name of the Azure Event Hub resource (also mentioned in the guide above).
The following meta options are available for the parser: `kafka_key`, `kafka_headers`, `_kafka_topic`, `_kafka_partition` and `_kafka_offset`.
The output is events as a byte stream.
Collecting logs from Microsoft 365¶
TeskaLabs LogMan.io can collect logs from Microsoft 365, formerly Microsoft Office 365.
There are the following classes of Microsoft 365 logs:
- Audit logs: They contain information about various user, admin, system, and policy actions and events from Azure Active Directory, Exchange and SharePoint.
- Message Trace: It provides the ability to gain insight into the e-mail traffic passing through the Microsoft Office 365 Exchange mail server.
Enable auditing of Microsoft 365¶
By default, audit logging is enabled for Microsoft 365 and Office 365 enterprise organizations. However, when setting up logging of a Microsoft 365 or Office 365 organization, you should verify the auditing status of Microsoft Office 365.
1) Go to https://compliance.microsoft.com/ and sign in
2) In the left navigation pane of the Microsoft 365 compliance center, click Audit
3) Click the Start recording user and admin activity banner
It may take up to 60 minutes for the change to take effect.
For more details, see Turn auditing on or off.
Configuration of Microsoft 365¶
Before you can collect logs from Microsoft 365, you must configure Microsoft 365. Be aware that configuration takes a significant amount of time.
1) Setup a subscription to Microsoft 365 and a subscription to Azure
You need a subscription to Microsoft 365 and a subscription to Azure that has been associated with your Microsoft 365 subscription.
You can use trial subscriptions to both Microsoft 365 and Azure to get started.
For more details, see Welcome to the Office 365 Developer Program.
2) Register your TeskaLabs LogMan.io collector in Azure AD
It allows you to establish an identity for TeskaLabs LogMan.io and assign specific permissions it needs to collect logs from Microsoft 365 API.
Sign in to the Azure portal, using the credential from your subscription to Microsoft 365 you wish to use.
3) Navigate to Azure Active Directory
4) On the Azure Active Directory page, select "App registrations" (1), and then select "New registration" (2)
5) Fill the registration form for TeskaLabs LogMan.io application
- Name: "TeskaLabs LogMan.io"
- Supported account types: "Account in this organizational directory only"
- Redirect URL: None
Press "Register" to complete the process.
6) Collect essential information
Store the following information from the registered application page at the Azure Portal:
- Application (client) ID, aka `client_id`
- Directory (tenant) ID, aka `tenant_id`
7) Create a client secret
The client secret is used for the safe authorization and access of TeskaLabs LogMan.io.
After the page for your app is displayed, select Certificates & secrets (1) in the left pane. Then select "Client secrets" tab (2). On this tab, create new client secrets (3).
8) Fill in the information about a new client secret
- Description: "TeskaLabs LogMan.io Client Secret"
- Expires: 24 months
Press "Add" to continue.
9) Click the clipboard icon to copy the client secret value to the clipboard
Store the Value (not the Secret ID) for the configuration of TeskaLabs LogMan.io; it will be used as `client_secret`.
10) Specify the permissions for TeskaLabs LogMan.io to access the Microsoft 365 Management APIs
Go to App registrations > All applications in the Azure Portal and select "TeskaLabs LogMan.io".
11) Select API Permissions (1) in the left pane and then click Add a permission (2)
12) On the Microsoft APIs tab, select Microsoft 365 Management APIs
13) On the flyout page, select all types of permissions
- Delegated permissions
  - ActivityFeed.Read
  - ActivityFeed.ReadDlp
  - ServiceHealth.Read
- Application permissions
  - ActivityFeed.Read
  - ActivityFeed.ReadDlp
  - ServiceHealth.Read
Click "Add permissions" to finish.
14) Add "Microsoft Graph" permissions
- Delegated permissions
  - AuditLog.Read.All
- Application permissions
  - AuditLog.Read.All
Select "Microsoft Graph", "Delegated permissions", then seek and select "AuditLog.Read.All" in "Audit Log".
Then select again "Microsoft Graph", "Application permissions" then seek and select "AuditLog.Read.All" in "Audit Log".
15) Add "Office 365 Exchange online" permissions for collecting Message Trace reports
Click on "Add a permission" again.
Then go to "APIs my organization uses".
Type "Office 365 Exchange Online" to search bar.
Finally select "Office 365 Exchange Online" entry.
Select "Application permissions".
Type "ReportingWebService" into a search bar.
Check the "ReportingWebService.Read.All" select box.
Finally click on "Add permissions" button.
16) Grant admin consent
17) Navigate to Azure Active Directory
18) Navigate to Roles and administrators
19) Assign TeskaLabs LogMan.io to Global Reader role
Type "Global Reader" into a search bar.
Then click on "Global Reader" entry.
Select "Add assignments".
Type "TeskaLabs LogMan.io" into a search bar. Alternatively use "Application (client) ID" from previous steps.
Select "TeskaLabs LogMan.io" entry, the entry will appear in "Selected items".
Hit "Add" button.
Congratulations! Your Microsoft 365 is now ready for log collection.
Configuration of TeskaLabs LogMan.io¶
Example¶
connection:MSOffice365:MSOffice365Connection:
client_id: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
tenant_id: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
client_secret: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
# Collect Microsoft 365 Audit.General
input:MSOffice365:MSOffice365Source1:
connection: MSOffice365Connection
content_type: Audit.General
output: ms-office365-01
# Collect Microsoft 365 Audit.SharePoint
input:MSOffice365:MSOffice365Source2:
connection: MSOffice365Connection
content_type: Audit.SharePoint
output: ms-office365-01
# Collect Microsoft 365 Audit.Exchange
input:MSOffice365:MSOffice365Source3:
connection: MSOffice365Connection
content_type: Audit.Exchange
output: ms-office365-01
# Collect Microsoft 365 Audit.AzureActiveDirectory
input:MSOffice365:MSOffice365Source4:
connection: MSOffice365Connection
content_type: Audit.AzureActiveDirectory
output: ms-office365-01
# Collect Microsoft 365 DLP.All
input:MSOffice365:MSOffice365Source5:
connection: MSOffice365Connection
content_type: DLP.All
output: ms-office365-01
output:XXXXXX:ms-office365-01: {}
# Collect Microsoft 365 Message Trace logs
input:MSOffice365MessageTraceSource:MSOffice365MTSource1:
connection: MSOffice365Connection
output: ms-office365-message-trace-01
output:XXXXXX:ms-office365-message-trace-01: {}
Connection¶
The connection to Microsoft 365 must be configured first in the `connection:MSOffice365:...` section.
connection:MSOffice365:MSOffice365Connection:
client_id: # Application (client) ID from Azure Portal
tenant_id: # Directory (tenant) ID from Azure Portal
client_secret: # Client secret value from Azure Portal
resources: # (optional) resource to get data from separated by comma (,) (default: https://manage.office.com,https://outlook.office365.com)
Danger
Fields client_id, tenant_id and client_secret MUST be specified for a successful connection to Microsoft 365.
Collecting from Microsoft 365 activity logs¶
Configuration options to set up the collection for the Auditing logs (Audit.AzureActiveDirectory, Audit.SharePoint, Audit.Exchange, Audit.General and DLP.All):
input:MSOffice365:MSOffice365Source1:
connection: # ID of the MSOffice365 connection
output: # Which output to send the incoming events to
content_type: # (optional but advised) Content type of obtained logs (default: Audit.AzureActiveDirectory Audit.SharePoint Audit.Exchange Audit.General DLP.All)
refresh: # (optional) The refresh interval in seconds to obtain messages from the API (default: 600)
last_value_storage: # (optional) Persistent storage for the current last value (default: ./var/last_value_storage)
Collecting from Microsoft 365 Message Trace¶
Configuration options to set up the source of data of Microsoft 365 Message Trace:
input:MSOffice365MessageTraceSource:MSOffice365MessageTraceSource1:
connection: # ID of the MSOffice365 connection
output: # Which output to send the incoming events to
refresh: # (optional) The refresh interval in seconds to obtain messages from the API (default: 600)
last_value_storage: # (optional) Persistent storage for the current last value (default: ./var/last_value_storage)
Refresh of the client secret¶
The client secret expires after 24 months, so it has to be recreated periodically.
1) Navigate to Azure Active Directory.
2) Go to "App registrations" and select "TeskaLabs LogMan.io".
3) Create a new client secret.
Go to "Certificates & secrets".
Hit "New client secret" in "Client secrets" tab.
Fill "TeskaLabs LogMan.io Client Secret 2" in the Description. Use increasing numbers for new client secrets.
Select "730 days (24 mothns)" expiration.
Hit "Add" button.
4) Reconfigure TeskaLabs LogMan.io to use the new client secret (see the example below).
5) Delete the old client secret.
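For illustration, reconfiguring means updating the client_secret option of the existing Microsoft 365 connection in the collector configuration; a sketch with placeholder IDs and secret value:
connection:MSOffice365:MSOffice365Connection:
  client_id: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  tenant_id: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  client_secret: <new client secret value>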
Microsoft 365 Attributes explained¶
Attribute | Description | Values as an example | Notes | Full list (ext) | |
---|---|---|---|---|---|
o365.audit.ActorContextId | ID of the user or service account that performed the action. | 571c8d2c-1ae2-486d-a17c-81bf54cbaa15 | |||
o365.audit.ApplicationId | Application identifier (unique letter+number string) | 89bee1f7-5e6e-4d8a-9f3d-ecd601259da7 | |||
o365.audit.AzureActiveDirectoryEventType | The type of Azure Active Directory event. The following values indicate the type of event. | 0 - Indicates an account login event. 1 - Indicates an Azure application security event. | |||
o365.audit.DeviceProperties | Source device properties such as OS, browser type etc. | Name:"OS" Value:"Linux", Name:"BrowserType" Value:"Firefox", Name:"IsCompliantAndManaged" Value:"False", Name:"SessionId" Value:"e94ad17c-354f-4009-a9ee-34900770e997" | Parsing of these properties is still in progress | ||
o365.audit.ErrorNumber | An error code string that can be used to classify types of errors that occur, and should be used to react to errors. | 0, 50140, 501314 ... | https://learn.microsoft.com/en-us/azure/active-directory/develop/reference-aadsts-error-codes | ||
o365.audit.ExtraProperties | Not defined yet | // | |||
o365.audit.FileSizeBytes | File size in bytes | 23301 | |||
o365.audit.InterSystemsId | Unique inter system ID string | acc33436-ee63-4d81-b6ee-544998a1c7d9 | |||
o365.audit.IntraSystemId | Unique intra system ID string | 01dd20c0-edb9-4aaa-a51b-2bf38e1a8900 | |||
o365.audit.ItemName | Unique item name | b1379a75-ce5e-4fa3-80c6-89bb39bf646c | |||
o365.audit.LogonError | Error message displayed after failed login | InvalidUserNameOrPassword, TriggerBrowserCapabilitiesInterrupt, InvalidPasswordExpiredPassword | |||
o365.audit.ObjectId | URL path to the accessed file | https://telescopetest.sharepoint.com/sites/Shared Documents/Docs/o365 - logs.xlsx | |||
o365.audit.RecordType | The type of operation indicated by the record. This property indicates the service or feature that the operation was triggered in. | 6 | https://learn.microsoft.com/en-us/office/office-365-management-api/office-365-management-activity-api-schema#auditlogrecordtype | ||
o365.audit.ResultStatus | Triggered response | Success, Fail | |||
o365.audit.SourceFileExtension | Accessed file extension (format type). | .xlsx, .pdf, .doc etc. | |||
o365.audit.SourceFileName | Name of the file the user accessed | "o365.attributesexplained.xlsx" | |||
o365.audit.SupportTicketId | ID of the potential support ticket, after the user opened a support request in Azure Active Directory. | // | The customer support ticket ID for the action in "act-on-behalf-of" situations. | ||
o365.audit.TargetContextId | The GUID of the organization that the targeted user belongs to. | 571c8d2c-1ae2-486d-a17c-81bf54cbaa15 | |||
o365.audit.UserKey | An alternative ID for the user identified in the UserID property. For example, this property is populated with the passport unique ID (PUID) for events performed by users in SharePoint. This property also might specify the same value as the UserID property for events occurring in other services and events performed by system accounts. | i:0h.f|membership|1003200224fe6604@live.com | |||
o365.audit.UserType | The type of user that performed the operation. The following values indicate the user type. | 0 - A regular user. 2 - An administrator in your Microsoft 365 organization. 3 - A Microsoft datacenter administrator or datacenter system account. 4 - A system account. 5 - An application. 6 - A service principal. 7 - A custom policy. 8 - A system policy. | |||
o365.audit.Version | Indicates the version number of the activity (identified by the Operation property) that's logged. | 1 | |||
o365.audit.Workload | The Microsoft 365 service where the activity occurred. | AzureActiveDirectory | |||
o365.message.id | This is the Internet message ID (also known as the Client ID) found in the message header in the Message-ID: header field. | 08f1e0f6806a47b4ac103961109ae6ef@server.domain | This ID should be unique; however, not all sending mail systems behave the same way. As a result, there's a possibility that you may get results for multiple messages when querying upon a single Message ID. | ||
o365.message.index | Value of MessageTrace Index | 1, 2, 3 ... | |||
o365.message.size | Size of the sent/received message in bytes. | 33489 | |||
o365.message.status | Following action after sending the message. | Delivered, FilteredAsSpam, Expanded | https://learn.microsoft.com/en-us/exchange/monitoring/trace-an-email-message/run-a-message-trace-and-view-results | ||
o365.message.subject | Message subject; can be written uniquely. | "Binding Offer Letter for Ms. Smith" | |||
Microsoft Windows ↵
Collecting logs from Microsoft Windows¶
There are multiple ways of collecting logs or Windows Events from Microsoft Windows.
Windows Event Collector (WEC/WEF)¶
The agent-less Windows Event Collector (WEC) sends logs from Windows computers via the Windows Event Forwarding (WEF) service to the TeskaLabs LogMan.io Collector. The TeskaLabs LogMan.io Collector then acts as the Windows Event Collector (WEC). The WEF configuration can be deployed using Group Policy, either centrally managed by the Active Directory server or using Local Group Policy. With Active Directory in place, there are no additional configuration requirements on individual Windows machines.
Tip
We recommend this method for collecting Windows Events.
Windows Remote Management¶
Agent-less remote control connects to a desired Windows computer over Windows Remote Management (aka WinRM) and runs the collection command there as a separate process to collect its standard output.
Agent on the Windows computer¶
In this method, TeskaLabs LogMan.io Collector runs as an agent on the desired Windows computer(s) and collects Windows Events.
Collecting from Microsoft Windows using WEC/WEF¶
The agent-less Windows Event Collector (WEC) sends logs from Windows computers via the Windows Event Forwarding (WEF) service to the TeskaLabs LogMan.io Collector. The TeskaLabs LogMan.io Collector then acts as the Windows Event Collector (WEC). The WEF configuration can be deployed using Group Policy, either centrally managed by the Active Directory server or using Local Group Policy. With Active Directory in place, there are no additional configuration requirements on individual Windows machines.
Schema: Event flow of WEC/WEF collection in TeskaLabs LogMan.io.
Prerequisites¶
- Microsoft Active Directory Domain Controller, in this example providing domain name domain.int / DOMAIN.int
- TeskaLabs LogMan.io Collector, in this example with IP address 10.0.2.101 and hostname lmio-collector, running in the same network as the Windows computers, including Active Directory
- The IP address of the TeskaLabs LogMan.io Collector MUST be fixed (i.e. reserved by a DHCP server)
- Date and time of the TeskaLabs LogMan.io Collector MUST be NTP-synchronized
- TeskaLabs LogMan.io Collector SHOULD use the DNS server of the Active Directory
- TeskaLabs LogMan.io Collector MUST be able to resolve the hostnames of the Domain Controller servers of the Active Directory
- TeskaLabs LogMan.io Collector MUST be able to reach the udp/88, tcp/88, udp/750 and tcp/750 ports (Kerberos authentication)
- All Windows servers sending logs MUST be able to reach the TeskaLabs LogMan.io Collector's tcp/5985 port (WEF) and the udp/88, tcp/88, udp/750 and tcp/750 ports (Kerberos authentication)
Tip
This setup utilizes Kerberos authentication. Kerberos authentication uses Active Directory domain-specific Kerberos tickets issued by the domain controller for authentication and encryption of the log forwarding. It is the optimal choice for Windows computers that are managed through a domain.
Active Directory¶
1.1. Create a new user in Active Directory
Navigate to Windows Administrative Tools > Active Directory Users and Computers > DOMAIN.int
> Users
Right-click and choose New > User
Enter following information:
- Full name:
TeskaLabs LogMan.io
- User logon name:
lmio-collector
Warning
The user logon name must be the same as the computer name of the TeskaLabs LogMan.io Collector. You can find it in the TeskaLabs LogMan.io collector setup screen.
Select "Next".
Set a password for the user.
This example uses Password01!
.
Warning
Use a strong password according to your policy. This password will be used in a later step of this procedure.
Uncheck "User must change password at next logon".
Check "Password never expires".
Hit Next and then Finish button to create the user.
Finally, right-click on the new user, click Properties, and open the Account tab.
- Check "This account supports Kerberos AES 128 bit encryption".
- Check "This account supports Kerberos AES 256 bit encryption".
The new user lmio-collector
is now ready.
1.2. Create an A record in the DNS server for TeskaLabs LogMan.io Collector
Use DHCP to reserve an IP address of the collector
A fixed IP address MUST be assigned to the TeskaLabs LogMan.io Collector. This can be done by reserving the IP address in the Active Directory DHCP server.
Navigate to Windows Administrative Tools > DNS > Forward Lookup Zones > DOMAIN.int
Right-click and choose "New Host (A or AAAA)…"
Add a record with name lmio-collector
and IP address 10.0.2.101
.
Adjust this according to the IP address of your TeskaLabs LogMan.io Collector.
Hit Add Host button to finish.
1.3. Create a host principal name
Create a host principal name and the associated keytab file for the host of the TeskaLabs LogMan.io Collector. Execute following command on the Active Directory Domain Controller Server's command prompt (cmd.exe
):
ktpass /princ host/lmio-collector.domain.int@DOMAIN.INT /pass Password01! /mapuser DOMAIN\lmio-collector -pType KRB5_NT_PRINCIPAL /out host-lmio-collector.keytab /crypto AES256-SHA1
Process is case-sensitive
Make sure to CAPITALIZE anything you see capitalized in our examples (such as host/lmio-collector.domain.int@DOMAIN.INT
).
It has to be CAPITALIZED even if your domain contains lowercase letters.
The keytab file host-lmio-collector.keytab
is created.
1.4. Create an http principal name
Create a service principal name and the associated keytab file for a service:
ktpass /princ http/lmio-collector.domain.int@DOMAIN.INT /pass Password01! /mapuser DOMAIN\lmio-collector -pType KRB5_NT_PRINCIPAL /out http-lmio-collector.keytab /crypto AES256-SHA1
The keytab file http-lmio-collector.keytab
is created.
1.5. Collect keytab files from the Windows Server
Collect the two keytab files created above. You'll upload them into TeskaLabs LogMan.io in a later step.
Group Policy¶
2.1. Open the Group Policy Management Console
Navigate to Windows Administrative Tools > Group Policy Management, select your domain, DOMAIN.int
in this example.
2.2. Create Group Policy Object
In the Group Policy Management console, select your domain, such as DOMAIN.int
.
Right-click the domain and choose "Create a GPO in this domain, and Link it here...".
Specify a name for the new GPO, "TeskaLabs LogMan.io Windows Event Forwarding", then select OK.
2.3. Configure Group Policy Object
The new GPO is created and linked to your domain. To configure the policy settings, right-select the created GPO and choose "Edit...".
The "Group Policy Management Editor" opens to let you customize the GPO.
2.4. Configure Event Forwarding Policy under Computer Configuration section
In the "Group Policy Management Editor", navigate to Computer Configuration > Policies > Administative Templates > Windows Compontents and select Event Forwarding.
Select "Configure target Subscription Manager".
Enable the setting and select Show.
Fill in the location of the TeskaLabs LogMan.io Collector:
Server=http://lmio-collector.domain.int:5985/wsman/SubscriptionManager/WEC,Refresh=60
Press OK to apply the settings.
2.5. Apply
Execute gpupdate /force
in cmd.exe
on the Windows Server.
Security log¶
WEF can't access the Windows Security log by default.
To enable forwarding of the Security log, grant read access to the Network Service
account, as described in the steps below.
Tip
Windows Security log is the most important source of cyber security information and must be configured.
3.1. Open the Group Policy Management Console
Navigate to Windows Administrative Tools > Group Policy Management, select your domain; DOMAIN.int
in this example.
Right-click and select "Edit...".
Navigate to Computer Configuration > Administrative Templates > Windows Components and select Event Log Service.
Then select Security.
Select Configure log access.
3.2. Configure the log access
In "Log Access" field, enter:
O:BAG:SYD:(A;;0xf0005;;;SY)(A;;0x5;;;BA)(A;;0x1;;;S-1-5-32-573)(A;;0x1;;;S-1-5-20)
Explanation
- O:BA: Specifies that the owner of the object is the Built-in Administrators group.
- G:SY: Specifies that the primary group is SYSTEM.
- D:: Indicates that the following part defines the Discretionary Access Control List (DACL).
- Built-in Administrators (BA): Read and write permissions.
- SYSTEM (SY): Full control with read and write permissions and special permissions for managing the event logs.
- Builtin\Event Log Readers (S-1-5-32-573): Read-only permissions.
- Network Service (S-1-5-20): Read-only permissions.
Press OK.
3.3. Apply
Execute gpupdate /force
in cmd.exe
on the Windows Server.
TeskaLabs LogMan.io¶
4.1. Configure Microsoft Events collection
In TeskaLabs LogMan.io, navigate to Collectors > Your Collector > Microsoft Windows.
Fill in the Realm and FQDN of the Domain Controller, add the keytab files for host and http, and press Apply.
4.2. The log collection is configured
Advanced topics¶
Alternatives¶
- Use of SSL certificates instead of Active Directory and Kerberos
- Use a local group policy instead of Active Directory Group Policy
Forwarding Event Log¶
The Eventlog-forwardingPlugin/Operational
event channel logs relevant information about machines that are set up to forward logs to the collector.
It also contains information about possible issues with the WEF subscription.
Use the Event Viewer application to investigate.
Manual configuration¶
Collecting from Microsoft Windows by Windows Remote Management¶
Agent-less remote control connects to a desired Windows computer over Windows Remote Management (aka WinRM) and runs the collection command there as a separate process to collect its standard output.
Input specification: input:WinRM:
WinRM input connects to a remote Windows Server machine, where it calls a specified command.
It then periodically checks for new output at stdout
and stderr
, so it behaves in a similar manner to input:SubProcess
.
LogMan.io Collector WinRM configuration options¶
endpoint: # Endpoint URL of the Windows Management API of the remote Windows machine (f. e. http://MyMachine:5985/wsman)
transport: ntlm # Authentication type
server_cert_validation: # Specify the certificate validation (default: ignore)
cert_pem: # (optional) Specify path to the certificate (if using HTTPS)
cert_key_pem: # (optional) Specify path to the private key
username: # (optional) When using username authentication (like over ntlm), specify username in format <DOMAIN>\<USER>
password: # Password of the authenticated user above
output: # Which output to send the incoming events to
The following configuration options specify the command that should be remotely called:
# Read 1000 system logs once per 2 seconds
command: # Specify the command that should be remotely called (f. e. wevtutil qe system /c:1000 /rd:true)
chilldown_period: # How often in seconds the remote command should be called after it has ended (default: 5)
duplicity_check: # Specify whether to check for duplicate events based on time (true/false)
duplicity_reverse_order: # Specify whether to check for duplicates in reverse order (f. e. logs come in descending order)
last_value_storage: # Persistent storage for the current last value in the duplicity check (default: ./var/last_value_storage)
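For illustration, a minimal sketch of a complete input:WinRM: section combining the options above (the endpoint, credentials, command and output name are placeholders, not defaults):
input:WinRM:WinRMInput:
  endpoint: http://MyMachine:5985/wsman
  transport: ntlm
  username: DOMAIN\MyUser
  password: MyPassword
  command: wevtutil qe system /c:1000 /rd:true
  chilldown_period: 5
  output: <output_id>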
Collecting by the agent on the Windows machine¶
TeskaLabs LogMan.io Collector runs as an agent on a desired Windows machine and collects Windows Events.
Input specification: input:WinEvent
Note: input:WinEvent
only works on Windows-based machines.
This input periodically reads Windows Events from the specified event type.
LogMan.io Collector WinEvent configuration options¶
server: # (optional) Specify source of the events (default: localhost, i. e. the entire local machine)
event_type: # (optional) Specify the event type to be read (default: System)
buffer_size: # (optional) Specify how many events should be read in one query (default: 1024)
event_block_size: # (optional) Specify the amount of events after which an idle time will be executed for other operations to take place (default: 100)
event_idle_time: # (optional) Specify the idle time in seconds mentioned above (default: 0.01)
last_value_storage: # Persistent storage for the current last value (default: ./var/last_value_storage)
output: # Which output to send the incoming events to
The event type can be specified for every Windows Event log type, including:
- Application for application logs
- System for system logs
- Security for security logs etc.
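For illustration, a minimal sketch of an input:WinEvent section using the options above (the section name and output are placeholders; omitted options keep their defaults):
input:WinEvent:WinEventInput:
  event_type: Security
  output: <output_id>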
Ended: Microsoft Windows
Collect logs from database using ODBC¶
Introduction¶
The recommended way of extracting logs and other events from databases is to use ODBC. ODBC provides a unified way to connect LogMan.io to various database systems.
Tip
Examples of ODBC connection strings can be found here.
ODBC driver and configuration¶
You need to provide the ODBC driver for the database system you want to integrate with LogMan.io. The relevant ODBC driver must be compatible with Ubuntu 20.04 LTS, 64bit.
ODBC drivers need to be deployed into the LogMan.io Collector, specifically into the /odbc
directory.
Alternatively, our support will help you deploy the correct ODBC driver for your database system or provide a LogMan.io Collector with a bundled ODBC driver.
Note
ODBC drivers are exposed to LogMan.io collector software via Docker volumes.
The ODBC configuration is done in /odbc/odbcinst.ini
and odbc.ini
files.
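For illustration, a sketch of a named DSN entry that could live in /odbc/odbc.ini (the DSN name, server and database are placeholders; the examples below use DSN-less connection strings, so this file may also remain empty):
[MyDatabase]
Driver = FreeTDS
Server = myserver.example.com
Port = 1433
Database = MyDatabase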
Collector input configuration¶
The input source specification is input:ODBC:
.
Example of the ODBC collector configuration:
input:ODBC:ODBCInput:
dsn: Driver={FreeTDS};Server=MyServer;Port=1433;Database=MyDatabase;TDS_Version=7.3;UID=MyUser;PWD=MyPassword
query: SELECT * FROM viewAlerts WHERE {increment_where_clause} ORDER BY Cas Time;
increment_strategy: date
increment_first_value: "2020-10-01 00:00:00.000"
increment_column_name: "Time"
chilldown_period: 30
output: WebSocket
last_value_storage: /data/var/last_value_storage
MySQL ODBC configuration¶
MySQL ODBC drivers can be obtained on the following link.
The driver package needs to be extracted into /odbc/mysql
directory.
Entries in the /odbc/odbcinst.ini
:
[MySQL ODBC 8.0 Unicode Driver]
Driver=/odbc/mysql/libmyodbc8w.so
UsageCount=1
[MySQL ODBC 8.0 ANSI Driver]
Driver=/odbc/mysql/libmyodbc8a.so
UsageCount=1
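Example of a connection string (a sketch with placeholder values, using the Unicode driver registered above; 3306 is the default MySQL port):
Driver={MySQL ODBC 8.0 Unicode Driver};Server=<server_name>;Port=3306;Database=<database>;UID=<username>;PWD=<password>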
Microsoft SQL Server ODBC configuration¶
Microsoft SQL Server ODBC drivers can be obtained on the following link.
Entries in the /odbc/odbcinst.ini
:
[ODBC Driver 17 for SQL Server]
Description=Microsoft ODBC Driver 17 for SQL Server
Driver=/odbc/microsoft/msodbcsql17/lib64/libmsodbcsql-17.6.so.1.1
UsageCount=1
Example of the connection string:
Driver={ODBC Driver 17 for SQL Server};Server=<server_name>;Authentication=ActiveDirectoryPassword;UID=<username>;PWD=<password>;Database=<database>;TrustServerCertificate=Yes
Microsoft SQL Server ODBC alternative configuration¶
Alternative connectivity to Microsoft SQL Server is provided by the FreeTDS project, specifically its ODBC driver.
Entries in the /odbc/odbcinst.ini
:
[FreeTDS]
Description=FreeTDS Driver for Linux & MSSQL
Driver=/odbc/freetds/libtdsodbc.so
Setup=/odbc/freetds/libtdsodbc.so
UsageCount=1
Example of the connection string:
Driver={FreeTDS};Server=<server_name>;Port=<server_port>;Database=<database>;UID=<username>;PWD=<password>;TDS_Version=7.3
MariaDB ODBC configuration¶
MariaDB ODBC drivers can be obtained on the following link.
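For illustration, a sketch of a possible /odbc/odbcinst.ini entry, assuming the driver package is extracted into the /odbc/mariadb directory (the shared library name libmaodbc.so may differ between driver versions):
[MariaDB ODBC Driver]
Description=MariaDB Connector/ODBC
Driver=/odbc/mariadb/libmaodbc.so
UsageCount=1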
SAP IQ, Sybase IQ, Sybase ASE ODBC configuration¶
SAP IQ / Sybase IQ / Sybase ASE / SQL Anywhere ODBC drivers need to be downloaded from the SAP support page.
The driver package needs to be extracted into /odbc/sybase
directory.
Entry in the /odbc/odbcinst.ini
:
[ODBC Driver for Sybase IQ]
Description=Sybase IQ
Driver=/odbc/sybase/IQ-16_1/lib64/libdbodbc17.so
UsageCount=1
Example of the connection string:
Driver={ODBC Driver for Sybase IQ};UID=<username>;PWD=<password>;Server=<server_name>;DBN=<database_name>;CommLinks=TCPIP{host=<host>;port=<port>};DriverUnicodeType=1
Troubleshooting¶
Add an ODBC Trace¶
If you need greater insight into the ODBC connectivity, you can enable ODBC tracing.
Add this section to the /odbc/odbcinst.ini
:
[ODBC]
Trace = yes
TraceFile = /odbc/trace.log
When the collector is started, the trace output of the ODBC system is stored in the file /odbc/trace.log
.
This file is also available outside of the container.
Verify the ODBC configuration¶
The command odbcinst -j
(launched within the container) can be used to verify ODBC readiness:
# odbcinst -j
unixODBC 2.3.6
DRIVERS............: /etc/odbcinst.ini
SYSTEM DATA SOURCES: /etc/odbc.ini
FILE DATA SOURCES..: /etc/ODBCDataSources
USER DATA SOURCES..: /root/.odbc.ini
SQLULEN Size.......: 8
SQLLEN Size........: 8
SQLSETPOSIROW Size.: 8
Docker-compose¶
The LogMan.io collector can be started using docker-compose
.
This is an extract of relevant entries from docker-compose.yaml
:
lmio-collector:
image: docker.teskalabs.com/lmio/lmio-collector
...
volumes:
- /odbc/odbcinst.ini:/etc/odbcinst.ini
- /odbc/odbc.ini:/etc/odbc.ini
- /odbc:/odbc
...
network_mode: host
Afterwards, the LogMan.io Collector needs to be recreated with:
docker-compose up -d
Collecting logs from Oracle Cloud¶
You can collect logs from Oracle Cloud Infrastructure (OCI).
More about OCI Logging can be found here.
For LogMan.io Collector configuration, you will need to:
- Generate a new API key together with new public and private keys in the OCI console
- Create a new search query for the API requests
Generating API key in OCI Console¶
- Log in to your OCI Console.
- Select User settings from the menu on the top right corner.
- Go to Resources, select API keys and click Add API key.
- Make sure Generate API Key Pair is selected. Click Download Private Key and then Download Public Key. Then, click Add.
- A configuration file with all the credentials you need will be created. Paste the contents of the text box into the data/oci/config file (see the sketch below).
- Fill out key_file with the path to your private key file.
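For illustration, the generated data/oci/config file typically looks like the sketch below; all values are placeholders, the OCI Console produces the real contents for you, and key_file must point to the downloaded private key:
[DEFAULT]
user=ocid1.user.oc1..<unique_user_ID>
fingerprint=<public_key_fingerprint>
tenancy=ocid1.tenancy.oc1..<unique_tenancy_ID>
region=<region, f. e. eu-frankfurt-1>
key_file=./data/oci/<private_key_file>.pem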
Important
Never share your private key with anyone else. Keep it secret in its own file. You might need to change the permissions of the public and private keys after downloading them.
How does the requesting process work?
The private and public keys are part of an asymmetric encryption system. The public key is stored with Oracle Cloud Infrastructure and is used to verify the identity of the client. The private key, which you keep secure and never share, is used to sign your API requests.
When you make a request to the OCI API, the client software uses the private key to create a digital signature for the request. This signature is unique for each request and is generated based on the request data. The client then sends the API request along with this digital signature to Oracle Cloud Infrastructure.
Upon receiving the request, Oracle uses the public key (which it already has) to verify the digital signature. If the signature is valid, it confirms that the request was indeed sent by the holder of the corresponding private key (you) and has not been tampered with during transmission. This process is crucial for maintaining the integrity and authenticity of the communication.
The private key itself is never sent over the network. It stays on your client-side. The security of the entire process depends on the private key remaining confidential. If it were compromised, others could impersonate your service and send requests to OCI on your behalf.
Logging Query Language Specification¶
Please refer to the official documentation for creating new queries.
You will need the following information:
- Compartment OCID
- Log group OCID
- Log OCID
Logging Query Example
search "<compartment_OCID>/<log_group_OCID>/<log_OCID>" | where level = 'ERROR' | sort by datetime desc
Configuration¶
Below is the required configuration for LogMan.io Collector:
input:Oracle:OCIInput:
oci_config_path: ./data/oci/config # Path to your OCI configuration file (required)
search_query: search '<compartment_OCID>/<log_group_OCID>/<log_OCID>' | where level = 'ERROR' # (required)
encoding: utf-8 # Encoding of the events (optional)
interval: 10 # Number of seconds between requests to OCI (optional)
output: <output_id> # (required)
Development¶
Warning
LogMan.io Collector Oracle Source is built on the OCI API integration, which uses synchronous requests. There might be some problems when there is heavy TCP input.
Collecting events from Zabbix¶
TeskaLabs LogMan.io Collector can collect events from Zabbix through Zabbix API.
- Zabbix Metrics Source collects history and events.
- Zabbix Security Source collects alerts and events.
Zabbix Metrics Source¶
Zabbix Metrics Source periodically sends event.get
and history.get
requests.
The event.get
request is used to retrieve event data from the Zabbix server. Events in Zabbix represent significant occurrences within the monitored environment, such as triggers firing, discovery actions, or internal Zabbix events.
The history.get
request is used to retrieve historical data from Zabbix, which includes various types of monitoring data, such as numeric values, text logs, and more.
Configuration¶
Example of minimal required configuration:
input:ZabbixMetrics:<SOURCE_ID>:
url: https://192.0.0.5/api_jsonrpc.php # URL for Zabbix API
auth: b03.......6f # Authorization token for Zabbix API
output: <output_id>
output:<type>:<output_id>:
...
Optionally, you can configure properties of requests:
interval: 60 # (optional, default: 60) Time interval between requests in seconds
max_requests: 100 # (optional, default: 50) Number of concurrent requests
request timeout: 10 # (optional, default: 10) Timeout for requests in seconds
sleep_on_error: 10 # (optional, default: 10) When error occurs, LMIO Collector waits for some time and then sends the requests again
You can also change the encoding of incoming events:
encoding: utf-8 # (optional) Encoding of incoming events
History types¶
In Zabbix, a history object represents a recorded piece of data associated with a metric item over time. These history objects are fundamental for analyzing the performance and status of monitored entities, as they store the actual collected data points. Each history object is associated with a specific item and includes a timestamp indicating when the data was collected. The history objects are used to track and analyze trends, generate graphs, and trigger alerts based on historical data.
Multiple different history object types can be returned in events. See the official documentation for more info.
History object type | Name | Usage |
---|---|---|
0 | numeric float | metrics like CPU load, temperature, etc. |
1 | character | log entries, service statuses, etc. |
2 | log | system and application logs |
3 | (default) numeric unsigned | free disk space, network traffic, etc. |
4 | text | descriptions, messages, etc. |
5 | binary | binary messages |
History types are configured in the following way:
histories_to_return: "0,1,3" # (optional, default: '0,3') List of history types
Metric items¶
A metric item in Zabbix specifies the type of data to be gathered from a monitored host. Each item is associated with a key that uniquely identifies the data to be collected, as well as other attributes such as the data type, collection frequency, and units of measurement. Items can represent various types of data, including numerical values, text, log entries, and more.
The Zabbix server typically contains a large number of hosts from which histories will be collected. To filter for specific metric items, follow these steps:
- Create a CSV file with the list of metric types, each on separate line:
Uptime
Number of processes
Number of threads
FortiGate: System uptime
VMware: Uptime
CPU utilization
CPU user time
...
- Configure the path in LogMan.io Collector Zabbix Metrics Source configuration:
items_list_filename: conf/items.csv
Tip
We recommend filtering for a small subset of metric types to prevent overloading the Zabbix server.
Zabbix Security Source¶
Zabbix Security Source periodically sends event.get
and alert.get
requests.
The event.get
request is used to retrieve event data from the Zabbix server. Events in Zabbix represent significant occurrences within the monitored environment, such as triggers firing, discovery actions, or internal Zabbix events.
The alert.get
request is used to retrieve alert data from the Zabbix server. Alerts in Zabbix are notifications generated in response to certain conditions or events, such as trigger status changes, discovery actions, or internal system events. These alerts can be configured to notify administrators or take automated actions to address issues.
Required configuration¶
Example of minimal required configuration:
input:ZabbixSecurity:<SOURCE_ID>:
url: https://192.0.0.5/api_jsonrpc.php # URL for Zabbix API
auth: b03.......6f # Authorization token for Zabbix API
output: <output_id>
output:<type>:<output_id>:
...
Optionally, you can configure properties of requests:
interval: 60 # (optional, default: 60) Time interval between requests in seconds
request timeout: 10 # (optional, default: 10) Timeout for requests in seconds
sleep_on_error: 10 # (optional, default: 10) When error occurs, LMIO Collector waits for some time and then sends the requests again
You can also change the encoding of incoming events:
encoding: utf-8 # (optional) Encoding of incoming events
ESET Connect API Source¶
ESET Connect is an API gateway between a client and a collection of ESET backend services. It acts as a reverse proxy to accept all application programming interface (API) calls, aggregate the various services required to fulfill them and return the appropriate result.
Creating new API Client¶
Warning
In order to create a new user, you need to have superuser permission in ESET Business Account.
- Log in as superuser (Administrator) into ESET Business Account.
- Open User Management and click the New user button at the bottom.
- Create the account with read-only permission and enable the Integrations slider.
- The new user account should now be created with permission to read from the ESET Connect API.
Collecting detections from connected devices¶
Connected devices are organized in device groups. Each group can have its subgroups. One device can be a member of different subgroups. It is possible to monitor detections from selected devices or selected device groups. If these are not specified, all devices are monitored.
Configuration of LogMan.io Collector¶
'input:ESET:EsetSource':
client_id: john.doe@domain.com # (required) E-mail of the API Client
client_secret: client_secret # (required) Password for the API Client
interval: 10 # (optional, default: 10) Interval between requests in seconds.
Ended: Log sources
LogMan.io Collector Transformations¶
Transformations are used for pre-processing incoming events with a user-defined declaration before they are passed to the specified output.
Available transforms¶
- transform:Declarative: provides a declarative processor that is configured via a declarative configuration (YAML)
- transform:XMLToDict: is typically used for XML files and Windows Events from WinRM
LogMan.io Collector Mirage¶
LogMan.io Collector has the ability to create mock logs and send them through the data pipeline, mainly for testing and demonstration purposes. The source that produces mock logs is called Mirage.
Mirage uses LogMan.io Collector Library as a repository for collected logs from various providers. The logs in this library are derived from real logs.
Mirage input configuration¶
# Configuration YAML file for inputs
[config]
path=./etc/lmio-collector.yaml
# Connection to the Library where Mirage logs are stored
[library]
providers=git+http://user:password@gitlab.example.com/lmio/lmio-collector-library.git
[web]
listen=0.0.0.0 8088
input:Mirage:MirageInput:
path: /Mirage/<path>/
eps: <number of logs sent per second>
output: <output_id>
Throughput of logs¶
You can define the number of logs produced every second (EPS - events per second).
The configuration below will produce 20 logs every second:
input:Mirage:MirageInput:
eps: 20
This configuration will also add deviation, so each second the amount will be between 10 and 40 logs:
input:Mirage:MirageInput:
eps: 20
deviation: 0.5
In order to create a more realistic log source and change EPS during the day, you can use scenarios.
input:Mirage:MirageInput:
eps: dayshift
deviation: 0.5
Available options are:
- normal
- gaussian
- dayshift
- nightshift
- tiny
- peak
Finally, you can create a custom scenario. You can set EPS every minute:
input:Mirage:MirageInput:
eps:
"10:00": 10
"12:00": 20
"15:10": 10
"15:11": 12
"16:00": 5
"23:00": 0
deviation: 0.5
Adding new log sources to the Library¶
- Create a new repository for log collection or clone the existing one.
- Create a new directory. Name the directory after the log source.
- Create a new file for each log in the source directory. Mirage can use the same log multiple times when sending logs through the pipeline. (You don't need 100 separate log files to send 100 logs - Mirage will repeat the same logs.)
By default, when sending logs through the pipeline, Mirage chooses from your log files randomly and approximately evenly, but you can add weight to logs to change that.
Templating logs¶
To get more unique logs without having to create more log files, template your logs. You can choose which fields in the log will have variable values, then make a list of values you want to populate that field, and Mirage will randomly choose the values.
- In your log, choose which field you would like to have variable values. Replace the field values with ${fieldname}, where fieldname is what you will call the field in your list:
  "user.name":${user.name}, "id":${id}, "msg":${msg}
- Make a file called values.yaml in your log source directory.
- In the new file values.yaml, list possible values for each templated field. Match the field names in the values.yaml file to the field names in your log files:
  values:
    user.name:
      - Albus Dumbledore
      - Harry Potter
      - Hermione Granger
    id:
      - 171
      - 182
      - 193
    msg:
      - Connection ended
      - Connection interrupted
      - Connection success
      - Connection failure
Datetime formats¶
Mirage can generate a current timestamp for each log. For that, you need to choose the format the timestamp will be in. To add a timestamp to your mock log, add ${datetime: <format>}
to the text in the file.
Example
${datetime: %y-%m-%d %H:%M:%S}
generates the timestamp 23-06-07 12:55:26
${datetime: %Y %b %d %H:%M:%S:%f}
generates the timestamp 2023 Jun 07 12:55:26:002651
Datetime directives¶
Datetime directives are derived from the Python datetime
module.
Directive | Meaning | Example |
---|---|---|
%a |
Weekday as locale’s abbreviated name. | Sun, Mon, ..., Sat (en_US); |
%A |
Weekday as locale’s full name. | Sunday, Monday, ..., Saturday (en_US); |
%w |
Weekday as a decimal number, where 0 is Sunday and 6 is Saturday. | 0, 1, ..., 6 |
%d |
Day of the month as a zero-padded decimal number. | 01, 02, ..., 31 |
%b |
Month as locale’s abbreviated name. | Jan, Feb, ..., Dec (en_US); |
%B |
Month as locale’s full name. | January, February, ..., December (en_US); |
%m |
Month as a zero-padded decimal number. | 01, 02, ..., 12 |
%y |
Year without century as a zero-padded decimal number. | 00, 01, ..., 99 |
%Y |
Year with century as a decimal number. | 0001, 0002, ..., 2013, 2014, ..., 9998, 9999 |
%H |
Hour (24-hour clock) as a zero-padded decimal number. | 00, 01, ..., 23 |
%I |
Hour (12-hour clock) as a zero-padded decimal number. | 01, 02, ..., 12 |
%p |
Locale’s equivalent of either AM or PM. | AM, PM (en_US); am, pm (de_DE) |
%M |
Minute as a zero-padded decimal number. | 00, 01, ..., 59 |
%S |
Second as a zero-padded decimal number. | 00, 01, ..., 59 |
%f |
Microsecond as a decimal number, zero-padded to 6 digits. | 000000, 000001, ..., 999999 |
%z |
UTC offset in the form ±HHMM[SS[.ffffff]] (empty string if the object is naive). | (empty), +0000, -0400, +1030, +063415, -030712.345216 |
%Z |
Time zone name (empty string if the object is naive). | (empty), UTC, GMT |
%j |
Day of the year as a zero-padded decimal number. | 001, 002, ..., 366 |
%U |
Week number of the year (Sunday as the first day of the week) as a zero-padded decimal number. All days in a new year preceding the first Sunday are considered to be in week 0. | 00, 01, ..., 53 |
%W |
Week number of the year (Monday as the first day of the week) as a zero-padded decimal number. All days in a new year preceding the first Monday are considered to be in week 0. | 00, 01, ..., 53 |
%c |
Locale’s appropriate date and time representation. | Tue Aug 16 21:30:00 1988 (en_US); Di 16 Aug 21:30:00 1988 (de_DE) |
%x |
Locale’s appropriate date representation. | 08/16/88 (None); 08/16/1988 (en_US); 16.08.1988 (de_DE) |
%X |
Locale’s appropriate time representation. | 21:30:00 (en_US); 21:30:00 (de_DE) |
%% |
A literal '%' character. | % |
Log weight¶
If you want Mirage to select some logs more often and some less, you can give certain log files more weight. Weight means how much more or less a log file is selected, compared to others.
To change the weight, add a number at the beginning of the log file name. This number creates a ratio with the other log files. If a file does not begin with a number, Mirage considers it to be a 1.
Example
5-example1.log
2-example2.log
3-example3.log
example4.log
Mirage would send these logs in a 5:2:3:1 ratio.
Collecting into lookups¶
Files¶
To periodically collect lookups from files such as CSV, use the input:FileBlock:
input with the following configuration:
path: # Specify the lookup folder, where the file lookup will be stored (f. e. /data/lookups/mylookup/*)
chilldown_period: # Specify how often in seconds to check for new files (default: 5)
FileBlock reads all files in one block (one event is the entire file content) and passes it to the configured output, which is usually output:WebSocket
.
In this way, the lookup is passed to LogMan.io Receiver and, eventually, to LogMan.io Parser, where the lookup can be processed and stored in Elasticsearch. See Parsing lookups for more information.
Sample configuration¶
input:FileBlock:MyLookupFileInput:
path: /data/lookups/mylookup/*
chilldown_period: 10
output: LookupOutput
output:WebSocket:LookupOutput:
url: https://lm1/files-ingestor-ws
...
Virtual Machine¶
TeskaLabs LogMan.io Collector can be deployed into a dedicated virtual machine.
Specifications¶
- 1 vCPU
- OS Linux, preferably Ubuntu Server 22.04.4 LTS, other mainstream distributions are also supported
- 4 GB RAM
- 500 GB disk (50 GB for OS; the rest is a buffer for collected logs)
- 1x NIC, preferably 1Gbps
The collector must be able to connect to a TeskaLabs LogMan.io installation over HTTPS (WebSocket) using its URL.
Note
For environments with higher loads, the virtual machine should be scaled up accordingly.
Network¶
We recommend assigning a static IP address to the collector virtual machine because it will be used in many log source configurations.
Default Configuration of LogMan.io Collector¶
The TeskaLabs LogMan.io collector is equipped with a default configuration designed for quick and efficient integration with typical log sources, optimizing initial setup times and providing robust connectivity out-of-the-box.
Default Network Ports for Log Sources¶
Below is a table outlining the default network ports used by various technologies when connecting to the LogMan.io Collector. Both UDP (User Datagram Protocol) and TCP (Transmission Control Protocol) ports are available to support different network communication needs.
Vendor Technology |
Product Variant |
Port range | Stream name | Note |
---|---|---|---|---|
Linux | Syslog RFC 3164 | 10000 10009 |
linux-syslog-rfc3164 |
BSD Syslog Protocol |
Linux | Syslog RFC 5424 | 10020 10029 |
linux-syslog-rfc5424 |
IETF Syslog Protocol |
Linux | rsyslog | 10010 10019 |
linux-rsyslog |
|
Linux | syslog-ng | 10030 10039 |
linux-syslogng |
|
Linux | Auditd | 10040 10059 |
linux-auditd |
|
Fortinet | FortiGate | 10100 10109 |
fortinet-fortigate |
RFC6587 Framing on TCP |
Fortinet | FortiGate | 10110 10119 |
fortinet-fortigate |
|
Fortinet | FortiSwitch | 10120 10129 |
fortinet-fortiswitch |
No Framing on TCP |
Fortinet | FortiSwitch | 10130 10139 |
fortinet-fortiswitch |
|
Fortinet | FortiMail | 10140 10159 |
fortinet-fortimail |
|
Fortinet | FortiClient | 10160 10179 |
fortinet-forticlient |
|
Fortinet | FortiAnalyzer | 10180 10199 |
fortinet-fortianalyzer |
|
Cisco | ASA | 10300 10319 |
cisco-asa |
|
Cisco | FTD | 10320 10339 |
cisco-ftd |
|
Cisco | IOS | 10340 10359 |
cisco-ios |
|
Cisco | ISE | 10360 10379 |
cisco-ise |
|
Cisco | Switch Nexus | 10380 10399 |
cisco-switch-nexus |
|
Cisco | WLC | 10400 10419 |
cisco-wlc |
|
Dell | Switch | 10500 10519 |
dell-switch |
|
Dell | PowerVault | 10520 10539 |
dell-powervault |
|
Dell | iDRAC | 10540 10559 |
dell-idrac |
|
HPE | Aruba Clearpass | 10600 10619 |
hpe-aruba-clearpass |
|
HPE | Aruba IAP | 10620 10639 |
hpe-aruba-iap |
|
HPE | Aruba Switch | 10640 10659 |
hpe-aruba-switch |
|
HPE | Integrated Lights-Out (iLO) | 10660 10679 |
hpe-ilo |
|
HPE | Primera | 10680 10699 |
hpe-primera |
|
HPE | StoreOnce | 10700 10719 |
hpe-storeonce |
|
Bitdefender | Gravity Zone | 10740 10759 |
bitdefender-gravityzone |
|
Broadcom | Brocade Switch | 10760 10779 |
broadcom-brocade-switch |
|
Devolutions | 10800 10819 |
devolutions |
||
ESET | Protect | 10840 10859 |
eset-protect |
|
F5 | 10860 10879 |
f5 |
||
FileZilla | 10880 10899 |
filezilla |
||
Gordic | Ginis | 10900 10919 |
gordic-ginis |
|
IceWarp | Mail Center | 10920 10939 |
icewarp |
|
Kubernetes | 10940 10959 |
kubernetes |
||
McAfee WebWasher | 10960 10979 |
mcafee-webwasher |
||
MikroTik | 10980 10999 |
mikrotik |
||
Oracle | Listener | 11000 11019 |
oracle-listener |
|
Oracle | Spark | 11020 11039 |
oracle-spark |
|
Ntopng | 11060 11079 |
ntopng |
||
OpenVPN | 11080 11099 |
openvpn |
||
SentinelOne | 11100 11119 |
sentinelone |
||
Squid | Proxy | 11120 11139 |
squid-proxy |
|
Synology | NAS | 11140 11159 |
synology-nas |
|
Veeam | Backup & Replication | 11160 11179 |
veeam-backup-replication |
|
ySoft | SafeQ | 11180 11199 |
ysoft-safeq |
|
Ubiquiti | UniFi Controller | 11200 11219 |
ubiquiti-unifi-controller |
|
Ubiquiti | UniFi Cloud Key | 11240 11259 |
ubiquiti-unifi-cloud-key |
|
Ubiquiti | UniFi Switch | 11220 11239 |
ubiquiti-unifi-switch |
|
VMware | vCenter | 11300 11319 |
vmware-vcenter |
|
VMware | vCloud Director | 11320 11339 |
vmware-vcloud-director |
|
VMware | ESXi | 11340 11359 |
vmware-esxi |
|
ZyXEL | CEF | 11440 11459 |
zyxel-cef |
|
ZyXEL | GS2210 | 11460 11479 |
zyxel-gs2210 |
|
Sophos | Standard Syslog Protocol | 11500 11519 |
sophos-standard-syslog-protocol |
|
Sophos | Syslog Device Standard Format | 11520 11539 |
sophos-device-standard-format |
|
Sophos | Unstructured Format | 11540 11559 |
sophos-unstructured |
|
Custom | 14000 14099 |
custom |
Ended: Collector
Receiver ↵
LogMan.io Receiver¶
TeskaLabs LogMan.io Receiver is a microservice responsible for receiving logs and other events from the LogMan.io collector, forwarding these logs into the central LogMan.io system, and archiving raw logs. LogMan.io Receiver forwards incoming logs into proper tenants.
Note
LogMan.io Receiver replaces LogMan.io Ingestor.
Communication link¶
The communication between lmio-collector
and lmio-receiver
is named "commlink", short for the Communication Link.
The websocket is used as the primary communication protocol, but HTTPS calls from the collector are also utilized. The collector keeps the websocket connection to the receiver open for a long period of time. When the communication link from the collector is terminated, the collector tries to reconnect periodically.
Note
Websocket connection utilizes server-side generated PING packets to keep the websocket open.
The Communication Link is protected by mutual SSL authorization.
This means that each lmio-collector
is equipped with a private key and a client SSL certificate.
The private key and the client SSL certificate are generated automatically during provisioning of a new collector.
The private key and the client SSL certificate are used to authenticate the collector.
This mechanism also provides strong encryption of the traffic between the collector and the central part of LogMan.io.
Production setup¶
The production setup is that LogMan.io Collector (lmio-collector
) connects over HTTPS via NGINX server to LogMan.io Receiver (lmio-receiver
).
graph LR
lmio-collector -- "websocket & SSL" --> n[NGINX]
n[NGINX] --> lmio-receiver
Diagram: Production setup
For more info, continue to the NGINX section.
Non-production setup¶
The direct connection from lmio-collector
to lmio-receiver
is also supported.
It is suitable for non-production setups such as testing or development.
graph LR
lmio-collector -- "websocket & SSL" --> lmio-receiver
Diagram: Non-production setup
High availability¶
TeskaLabs LogMan.io Receiver is designed to run in multiple instances, independently of LogMan.io tenants. The recommended setup is to operate one TeskaLabs LogMan.io Receiver on each node of the central LogMan.io cluster with the deployed NGINX.
TeskaLabs LogMan.io Collector uses DNS round-robin balancing to connect to one of the NGINX servers. The NGINX forwards the incoming communication links to the receiver instance, with a preference for the receiver running on the same node as the NGINX.
More than one lmio-receiver
instance can be operated on a cluster node, for example if the performance of a single instance of lmio-receiver
becomes the bottleneck.
Example of the high availability configuration
graph LR
c1[lmio-collector] -.-> n1[NGINX]
c1[lmio-collector] --> n2[NGINX]
c1[lmio-collector] -.-> n3[NGINX]
subgraph Node 3
n1[NGINX] --> r1[lmio-receiver]
end
subgraph Node 2
n2[NGINX] --> r2[lmio-receiver]
end
n2[NGINX] -.-> r1[lmio-receiver]
n2[NGINX] -.-> r3[lmio-receiver]
subgraph Node 1
n3[NGINX] --> r3[lmio-receiver]
end
Failure recovery scenarios¶
- The lmio-receiver
instance is terminated: NGINX rebalances commlinks to other instances of the receiver on other nodes.
- The NGINX is terminated: the collector reconnects to another NGINX in the cluster.
- The whole cluster node is terminated: the collector reconnects to another NGINX in the cluster.
Receiver configuration¶
The receiver requires the following dependencies:
- Apache ZooKeeper
- NGINX (for production deployments)
- Apache Kafka
Example¶
This is the minimalistic example of the LogMan.io receiver configuration:
[zookeeper]
servers=zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
[lifecycle]
hot=/data/ssd/receiver
warm=/data/hdd/receiver
cold=/data/nas/receiver
Zookeeper¶
Specify the locations of the Zookeeper servers in the cluster.
[zookeeper]
servers=zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
Hint
For non-production deployments, the use of a single Zookeeper server is possible.
Lifecycle¶
Each lifecycle phase needs to be specified, typically by its filesystem path.
[lifecycle]
hot=/data/ssd/receiver
warm=/data/hdd/receiver
cold=/data/nas/receiver
task_limit=20
A lifecycle phase name (i.e. 'hot', 'warm', 'cold') must not contain the "_" character.
See the Lifecycle chapter for more details.
The task_limit
option specifies the maximum number of concurrently running lifecycle tasks on this node. The default value is 20.
Warning
Do not change lifecycle paths after the receiver has already stored data. Changing the paths will not apply retrospectively, and the receiver will be unable to locate data for the lifecycle phase.
Web APIs¶
The receiver provides two Web APIs: public and private.
Public Web API¶
Public Web API is designed for the communication with collectors.
[web:public]
listen=3080
The default port of the public web API is tcp/3080.
This port is designed to serve as the NGINX upstream for connections from collectors. It is the recommended production setup.
Standalone Public Web API¶
You can operate lmio-receiver
without NGINX in a stand-alone non-production setup.
Warning
Don't use this mode for production deployments.
This is ideal for development and testing environments, because you don't need NGINX.
This is the configuration example:
[web:public]
listen=3443 ssl:web:public
[ssl:web:public]
key=${THIS_DIR}/server-key.pem
cert=${THIS_DIR}/server-cert.pem
verify_mode=CERT_OPTIONAL
This is how to generate a self-signed server certificate for the above config:
$ openssl ecparam -genkey -name secp384r1 -out server-key.pem
$ openssl req -new -x509 -subj "/OU=LogMan.io Receiver" -key server-key.pem -out server-cert.pem -days 365
Private Web API¶
[web]
listen=0.0.0.0 8950
The default port of the private web API is tcp/8950.
Certificate Authority¶
The receiver automatically creates a Certificate Authority used for collector provisioning.
The CA artefacts are stored in ZooKeeper in the /lmio/receiver/ca
folder.
./lmio-receiver.py -c ./etc/lmio-receiver.conf
29-Jun-2023 19:43:50.651408 NOTICE asab.application is ready.
29-Jun-2023 19:43:51.716978 NOTICE lmioreceiver.ca.service Certificate Authority created
...
The default CA configuration:
[ca]
curve=secp384r1
auto_approve=no
The auto_approve
option automates the collector enrollment process: every received CSR is automatically approved when set to yes
.
Websocket¶
The default configuration of the websocket (to collectors)
[websocket]
timeout=30
compress=yes
max_msg_size=4M
Apache Kafka¶
The connection to Apache Kafka can be configured:
[kafka]
bootstrap_servers=kafka-1:9092,kafka-2:9092,kafka-3:9092
If the configuration is not present, then events are not forwarded to Apache Kafka.
Archive¶
The archive is enabled by default. Set the archive option to no
to disable archive functionality.
[general]
archive=yes
Metrics¶
The receiver produces its own telemetry and also forwards the telemetry from collectors to the configured telemetry data storage, such as InfluxDB.
[asab:metrics]
...
Signing keys¶
The signing key is used to digitally sign raw log archives.
You can specify the EC curve to be used for signing keys.
The default is prime256v1
, also known as secp256r1
.
[signing]
curve=prime256v1
Collector¶
Collector provisioning¶
The collector instance needs to be provisioned before it is authorized to send logs to TeskaLabs LogMan.io. The provisioning is done exactly once during the collector life cycle.
Note
TeskaLabs LogMan.io Receiver operates a Certificate Authority. The provisioning process is the approval of the CSR, finished by the issuance of the client SSL certificate for the collector. This client certificate is used by the collector for its authentication.
The provisioning starts at the collector. The minimal collector YAML configuration specifies the URL of the LogMan.io entry endpoint for commlinks.
connection:CommLink:commlink:
url: https://recv.logman.example.com/
When the collector is started, it submits its enrollment request to the receiver. The collector also prints output similar to this one:
...
Waiting for an approval, the identity: 'ABCDEF1234567890'
Waiting for an approval, the identity: 'ABCDEF1234567890'
It means that the collector has a unique identity ABCDEF1234567890
and that the receiver awaits approval of this collector.
On the receiver side, the approval is granted by the following call:
curl -X 'PUT' \
http://lmio-receiver/provision/ABCDEF1234567890 \
-H 'Content-Type: application/json' \
-d '{"tenant": "mytenant"}'
Warning
Specify the correct tenant in the request, instead of the mytenant
value.
Hint
Approval can also be granted using a web browser, via "Approve a CSR received from the collector" at http://lmio-receiver/doc
Mind that ABCDEF1234567890
needs to be replaced by the real identity from the output of the collector.
The tenant has to be specified in the request as well.
When this call is executed, the collector informs that it is provisioned and ready:
Waiting for an approval, the identity: 'ABCDEF1234567890'
29-Jun-2023 02:05:35.276253 NOTICE lmiocollector.commlink.identity.service The certificate received!
29-Jun-2023 02:05:35.277731 NOTICE lmiocollector.commlink.identity.service [sd identity="ABCDEF1234567890"] Ready.
29-Jun-2023 02:05:35.436872 NOTICE lmiocollector.commlink.service [sd url="https://recv.logman.example.com/commlink/v2301"] Connected.
Certificates of provisioned clients are stored in the ZooKeeper at /lmio/receiver/clients
.
Info
The tenant name is stored in the generated SSL client certificate.
CSRs that are not provisioned within 2 days are removed. The provisioning procedure can be restarted once the collector submits a new CSR.
Removing the collector¶
For removal of a provisioned collector at the receiver side, delete the relevant entry from the ZooKeeper folder /lmio/receiver/clients
.
This means that you have revoked the collector's permission to connect to the receiver.
Warning
The deletion will not affect currently connected collectors. The automated disconnection is on the product roadmap.
For removal at the collector side, delete ssl-cert.pem
and ssl-key.pem
when the collector is stopped.
The collector will start a new enrollment under a new identity when started.
This action is called a reset of the collector identity.
Collector configuration¶
connection:CommLink:commlink:
url: https://recv.logman.example.com/
input:..:LogSource1:
output: logsource-1
output:CommLink:logsource-1: {}
...
Section connection:CommLink:commlink:
This section configures a communication link to the central part of the TeskaLabs LogMan.io.
The configuration can also be provided by the application configuration file.
If the section [commlink]
is present, items from there are loaded before applying values from YAML.
Example
Empty YAML specification:
connection:CommLink:commlink: {}
...
URL is used from the application configuration:
[commlink]
url=https://recv.logman.example.com/
...
Option url
Mandatory value with URL of the central part of LogMan.io.
It must use https://
protocol, not http://
.
Typical values are:
https://recv.logman.example.com/
- for a dedicated NGINX server for receiving logshttps://logman.example.com/lmio-receiver
- for a single DNS domain on NGINX server
Can be also provided in the environment variable LMIO_COMMLINK_URL
.
Option insecure
Optional (default: no
) boolean value that allows insecure server connections if set to yes
.
This option allows a use of self-signed server SSL certificates.
Danger
Don't use the insecure option in production setups.
Advanced SSL configuration options¶
The following configuration options specify the SSL (HTTPS) connection:
- cert: Path to the client SSL certificate
- key: Path to the private key of the client SSL certificate
- password: Private key file password (optional, default: none)
- cafile: Path to a PEM file with CA certificate(s) to verify the SSL server (optional, default: none)
- capath: Path to a directory with CA certificate(s) to verify the SSL server (optional, default: none)
- ciphers: SSL ciphers (optional, default: none)
- dh_params: Diffie–Hellman (D-H) key exchange (TLS) parameters (optional, default: none)
- verify_mode: One of CERT_NONE, CERT_OPTIONAL or CERT_REQUIRED (optional); for more information, see github.com/TeskaLabs/asab
Section output:CommLink:<stream>:
<stream> is the stream name in the archive and in the Apache Kafka topics.
Logs will be fed into the stream named received.<tenant>.<stream>.
The {} at the end means that there are no options for this output.
Note
Generic options for output: sections apply as well, such as debug: true for troubleshooting.
Multiple sources¶
The collector can handle multiple log sources (event lanes) from one instance.
For each source, add an input:... and an output:CommLink:... section to the configuration.
Example
connection:CommLink:commlink:
url: https://recv.logman.example.com/
# First (TCP) log source
input:Stream:LogSource1:
address: 8888 # Listen on TCP/8888
output: tcp-8888
output:CommLink:tcp-8888: {}
# Second (UDP) log source
input:Datagram:LogSource2:
address: 8889 # Listen on UDP/8889
output: udp-8889
output:CommLink:udp-8889: {}
# Third (UDP + TCP) log source
input:Stream:LogSource3s:
address: 8890 # Listen on TCP/8890
output: p-8890
input:Datagram:LogSource3d:
address: 8890 # Listen on UDP/8890
output: p-8890
output:CommLink:p-8890: {}
Warning
Log sources collected by one instance of the collector must share one tenant.
Delivery methods¶
When a collector is online, logs and other events are delivered instantly over the WebSocket.
When a collector is offline, logs are stored in the offline buffer, and once the collector becomes online again, the buffered logs are synced back. This delivery method is called syncback. Buffered logs are uploaded using an HTTP PUT request.
Offline buffer¶
When the collector is not connected to a receiver, logs are stored in the collector's local buffer and uploaded to the receiver as soon as connectivity is restored.
Buffered logs are compressed using xz
when stored in the offline buffer.
The local buffer is a directory on the filesystem; its location can be configured:
[general]
buffer_dir=/var/lib/lmio-receiver/buffer
Warning
The collector monitors the available disk capacity in this folder and stops buffering logs when less than 5% of the disk space is free.
Reconnection during housekeeping¶
The collector reconnects every day during housekeeping, typically at 4:00 in the morning. This restores a balanced distribution of connected collectors across the cluster.
Archive¶
The LogMan.io Receiver archive is an immutable, column-oriented, append-only data storage of the received raw logs.
Each commlink feeds data into a stream.
The stream is an infinite table with fields.
The stream name is composed of the received. prefix, the tenant name, and the commlink name (i.e. received.mytenant.udp-8889).
The archive stream contains the following fields for each log entry:
- raw: Raw log (string, digitally signed)
- row_id: Primary identifier of the row, unique across all streams (64-bit unsigned integer)
- collected_at: Date & time of the log collection at the collector
- received_at: Date & time of the log receipt at the receiver
- source: Description of the log source (string)
The source field contains:
- for TCP inputs: <ip address> <port> S (S is for a stream)
- for UDP inputs: <ip address> <port> D (D is for a datagram)
- for file inputs: a filename
- for other inputs: an optional specification of the source

Example of the source field for a log delivered over UDP:
192.168.100.1 61562 D
The log was collected from IP address 192.168.100.1 and port UDP/61562.
Partition¶
Every stream is divided into partitions. Partitions of the same stream can be located on different receiver instances.
Info
Partitions can share identical periods of time. This means that data entries from the same span of time could be found in more than one partition.
Each partition has a number (part_no), starting from 0.
This number increases monotonically for new partitions in the archive, across streams.
The partition number is globally unique within the cluster.
The partition number is encoded into the partition name.
The partition name is a 6-character name that starts with aaaaaa (partition #0) and continues with aaaaab (partition #1) and so on.
The partition can be investigated in Zookeeper:
/lmio/receiver/db/received.mytenant.udp-8889/aaaaaa.part
partno: 0 # The partition number, translates to aaaaaa
count: 4307 # Number of rows in this partition
size: 142138 # Size of the partition in bytes (uncompressed)
created_at:
iso: '2023-07-01T15:22:53.265267'
unix_ms: 1688224973265267
closed_at:
iso: '2023-07-01T15:22:53.283168'
unix_ms: 1688224973283167
extra:
address: 192.168.100.1 49542 # Address of the collector
identity: ABCDEF1234567890 # Identity of the collector
stream: udp-8889
tenant: mytenant
columns:
raw:
type: string
collected_at:
summary:
max:
iso: '2023-06-29T20:33:18.220173'
unix_ms: 1688070798220173
min:
iso: '2023-06-29T18:25:03.363870'
unix_ms: 1688063103363870
type: timestamp
received_at:
summary:
max:
iso: '2023-06-29T20:33:18.549359'
unix_ms: 1688070798549359
min:
iso: '2023-06-29T18:25:03.433202'
unix_ms: 1688063103433202
type: timestamp
source:
summary:
token:
count: 2
type: token:rle
Tip
Because the partition name is globally unique, it is possible to move partitions from different nodes of the cluster to a shared storage, i.e. a NAS or a cloud storage. The lifecycle is designed so that partition names never collide, so data will not be overwritten by different receivers but reassembled correctly on the "shared" storage.
Lifecycle¶
The partition lifecycle is defined by phases.
Ingest partitions are partitions that receive the data. Once the ingest is completed, i.e. rotated to a new partition, the former partition is closed. The partition cannot be reopened.
When the partition is closed, the partition lifecycle starts. Each phase is configured to point to a specific directory on the filesystem.
The lifecycle is defined on the stream level, in the /lmio/receiver/db/received... entry in ZooKeeper.
Tip
Partitions can also be moved manually into a desired phase by an API call.
Default lifecycle¶
The default lifecycle consists of three phases: hot, warm and cold.
graph LR
I(Ingest) --> H[Hot];
H --1 week--> W[Warm];
W --6 months--> D(Delete);
H --immediately-->C[Cold];
C --18 months--> CD(Delete);
The ingest is done into the hot phase. Once the ingest is completed and the partition is closed, the partition is copied into the cold phase. After a week, the partition is moved to the warm phase. It means that the partition is duplicated - one copy is in the cold phase storage, the second copy is in the warm phase storage.
The partition on the warm phase storage is deleted after 6 months.
The partition on the cold phase storage is compressed using xz/LZMA. The partition is deleted from the cold phase after 18 months.
Default lifecycle definition
define:
type: jizera/stream
ingest: # (1)
phase: hot
rotate_size: 30G
rotate_time: daily
lifecycle:
hot:
- move: # (2)
age: 1w
phase: warm
- copy: # (3)
phase: cold
warm:
- delete: # (4)
age: 6M
cold:
- compress: # (5)
type: xz
preset: 6
threads: 4
- delete: # (6)
age: 18M
- Ingest new logs into the hot phase.
- After one week, move the partition from a hot to a warm phase.
- Copy the partition into a cold phase immediately after closing of ingest.
- Delete the partition after 6 months.
- Compress the partition immediately on arrival to the cold phase.
- Delete the partition after 18 months from the cold phase.
Recommended storage tiers for the phases:
- Hot phase should be located on SSDs
- Warm phase should be located on HDDs
- Cold phase is an archive; it can be located on a NAS or slow HDDs.
Note
For more information, visit the Administration manual, chapter about Disk storage.
Lifecycle rules¶
- move: Move the partition at the specified age to the specified phase.
- copy: Copy the partition at the specified age to the specified phase.
- delete: Delete the partition at the specified age.
The age can be e.g. "3h" (three hours), "5M" (five months), "1y" (one year) and so on.
Supported age postfixes:
- y: year (365 days)
- M: month (31 days)
- w: week
- d: day
- h: hour
- m: minute
Note
If age is not specified, it defaults to 0, which means that the lifecycle action is taken immediately.
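For illustration only, the age postfix semantics above can be expressed as a small helper; this is a sketch with hypothetical function names, not part of the product:

```python
# Illustrative sketch: convert a lifecycle `age` string (e.g. "3h", "5M", "1y")
# into seconds, using the postfix semantics listed above (y = 365 days, M = 31 days).
from typing import Optional

AGE_UNITS_SECONDS = {
    "y": 365 * 24 * 3600,  # year
    "M": 31 * 24 * 3600,   # month
    "w": 7 * 24 * 3600,    # week
    "d": 24 * 3600,        # day
    "h": 3600,             # hour
    "m": 60,               # minute
}

def age_to_seconds(age: Optional[str]) -> int:
    if not age:
        return 0  # missing age means the action is taken immediately
    value, unit = int(age[:-1]), age[-1]
    return value * AGE_UNITS_SECONDS[unit]

assert age_to_seconds("1w") == 604800
assert age_to_seconds(None) == 0
```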
Compression rule¶
compress: Compress the data on arrival to the phase.
Currently type: xz is supported with the following options:
preset: The xz compression preset.
The compression preset levels can be roughly categorised into three groups:
- 0 ... 2: Fast presets with relatively low memory usage. 1 and 2 should give compression speed and ratios comparable to bzip2 1 and bzip2 9, respectively.
- 3 ... 5: Good compression ratio with low to medium memory usage. These are significantly slower than levels 0-2.
- 6 ... 9: Excellent compression with medium to high memory usage. These are also slower than the lower preset levels.
The default is 6.
Unless you want to maximize the compression ratio, you probably don't want a higher preset level than 7 due to speed and memory usage.
threads: Maximum number of CPU threads used for compression.
The default is 1.
Set to 0 to use as many threads as there are processor cores.
Manual decompression
You can use xz --decompress or unxz from XZ Utils.
On Windows, you can use 7-Zip to decompress archive files.
Always work on a copy of the files in the archive; copy all files out of the archive first, and don't modify (decompress) files inside the archive.
Replication rule¶
replica: Specifies the number of data copies (replicas) that should be present in the phase.
Replicas are stored on different receiver instances, so the number of replicas should NOT be greater than the number of receivers in the cluster that operate the given phase. Otherwise, the "excess" replica will not be created because no available receiver instance is found.
Replication in the hot phase
define:
type: jizera/stream
lifecycle:
hot:
- replica:
factor: 2
...
factor: The number of copies of the data in the phase; the default value is 1.
Rotation¶
Partition rotation is a mechanism that closes ingest partitions under specific conditions. When the ingest partition is closed, new data is stored in a newly created ingest partition. This ensures a more or less even slicing of the infinite stream of data.
The rotation is configured on the stream level by:
- rotate_time: the period (i.e. daily) for which the partition can be in the ingest mode
- rotate_size: the maximum size of the partition; T, G, M and k postfixes are supported, using base 10

Both options can be applied simultaneously.
The default stream rotation is daily and 30G.
Roadmap
Only the daily option is currently available for rotate_time.
Data vending¶
The data can be extracted from the archive (e.g. for third-party processing, migration and so on) by copying out the data directories of the partitions in scope.
Use Zookeeper to identify which partitions are in the scope of the vending and where they are physically located on the storages.
The raw column can be directly processed by third-party tools.
When the data is compressed by the lifecycle configuration, decompression may be needed.
Note
This means that you don't need to move partitions from e.g. the cold phase into the warm or hot phase.
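As a minimal, illustrative sketch of the decompression step during vending (the paths below are hypothetical; the file names follow the archive layout shown later in this chapter):

```python
# Illustrative sketch: decompress a raw column file copied out of a cold-phase
# partition during data vending. The paths are hypothetical examples.
import lzma
import shutil

src = "/tmp/vending/aadcpc.part/col-raw.data.xz"   # copy taken out of the archive
dst = "/tmp/vending/aadcpc.part/col-raw.data"

with lzma.open(src, "rb") as fin, open(dst, "wb") as fout:
    shutil.copyfileobj(fin, fout)  # stream-decompress without loading everything into memory
```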
Replay of the data¶
The archived logs can be replayed to subsequent central components.
Non-repudiation¶
The archive is cryptographically secured and designed for traceability and non-repudiation. Digital signatures are used to verify the authenticity and integrity of the data, providing assurance that the logs have not been tampered with and were indeed generated by the stated log source.
This digital signature-based approach to maintaining logs is an essential aspect of secure logging practices and a cornerstone of a robust information security management system. These logs are vital tools for forensic analysis during an incident response, detecting anomalies or malicious activities, auditing, and regulatory compliance.
We use the following cryptographic algorithms to ensure the security of logs: SHA-256 and ECDSA.
The hash function, SHA256, is applied to each raw log entry. This function takes the input raw log entry and produces a fixed-size string of bytes. The output (or hash) is unique to the input data; a slight alteration in the input will produce a dramatically different output, a characteristic known as the "avalanche effect".
This unique hash is then signed using a private signing key through the ECDSA algorithm, which generates a digital signature that is unique to both the data and the key. This digital signature is stored alongside the raw log data, certifying that the log data originated from the specified log source and has not been tampered with during storage.
Digital signatures of raw columns are stored in ZooKeeper (the canonical location) and in the filesystem, under the filename col-raw.sig.
Each partition is also equipped with a unique SSL signing certificate, named signing-cert.der.
This certificate, in conjunction with the digital signature, can be used to verify that col-raw.data (the original raw logs) has not been altered, thus ensuring data integrity.
Important
Please note that the associated private signing key is not stored anywhere but in the process memory for security purposes. The private key is removed as soon as the partition has finished its data ingest.
The signing certificate is issued by an internal Certificate Authority (CA).
The CA's certificate is available in ZooKeeper at /lmio/receiver/ca/cert.der
.
Digital signature verification
You can verify the digital signature by using the following OpenSSL commands:
$ openssl x509 -inform der -in signing-cert.der -pubkey -noout > signing-publickey.pem
$ openssl dgst -sha256 -verify signing-publickey.pem -signature col-raw.sig col-raw.data
Verified OK
These commands extract the public key from the certificate (signing-cert.der
), and then use that public key to verify the signature (col-raw.sig
) against the data file (col-raw.data
). If the data file matches the signature, you'll see a Verified OK
message.
Additionally, also verify signing-cert.der itself; this certificate has to be issued by the internal CA.
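The same check can be done programmatically. The following is a minimal sketch using the third-party cryptography package (file names as above); it is an illustration, not an official tool:

```python
# Minimal sketch: verify the ECDSA/SHA-256 signature of col-raw.data against
# the public key embedded in the partition's signing-cert.der.
# Requires the third-party "cryptography" package.
from cryptography import x509
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

with open("signing-cert.der", "rb") as f:
    cert = x509.load_der_x509_certificate(f.read())
with open("col-raw.sig", "rb") as f:
    signature = f.read()
with open("col-raw.data", "rb") as f:
    data = f.read()

# Raises cryptography.exceptions.InvalidSignature if the data was altered.
cert.public_key().verify(signature, data, ec.ECDSA(hashes.SHA256()))
print("Verified OK")
```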
Practical example¶
A practical example of the archive applied to the log stream from Microsoft 365.
The "cold" phase is stored on a NAS, mounted to /data/nas, with XZ compression enabled.
Statistics¶
- Date range: 3 months
- Rotation: daily (typically one partition is created per day)
- Total size: 8.3M compressed, compression ratio: 92%
- Total file count: 1062
Content of directories¶
tladmin@lm01:/data/nas/receiver/received.default.o365-01$ ls -l
total 0
drwxr-x--- Jul 25 20:59 aaaebd.part
drwxr-x--- Jul 25 21:02 aaaebe.part
drwxr-x--- Jul 26 21:02 aaaebg.part
drwxr-x--- Jul 27 21:03 aaaeph.part
drwxr-x--- Jul 28 21:03 aaagaf.part
drwxr-x--- Jul 29 21:04 aaagfn.part
drwxr-x--- Jul 30 21:05 aaagjm.part
drwxr-x--- Jul 31 21:05 aaagog.part
drwxr-x--- Aug 1 21:05 aaahik.part
drwxr-x--- Aug 2 21:05 aaahmb.part
drwxr-x--- Aug 3 12:49 aaaifj.part
drwxr-x--- Aug 3 17:50 aaaima.part
drwxr-x--- Aug 3 18:46 aaaiok.part
drwxr-x--- Aug 4 18:46 aaajaf.part
drwxr-x--- Aug 5 18:46 aaajbk.part
drwxr-x--- Aug 6 18:47 aaajcj.part
drwxr-x--- Aug 7 11:33 aaajde.part
drwxr-x--- Aug 7 11:34 aaajeg.part
drwxr-x--- Aug 7 12:22 aaajeh.part
drwxr-x--- Aug 7 13:51 aaajem.part
drwxr-x--- Aug 8 09:50 aaajen.part
drwxr-x--- Aug 8 09:59 aaajfk.part
drwxr-x--- Aug 8 10:06 aaajfo.part
....
drwxr-x--- Oct 25 15:44 aadcne.part
drwxr-x--- Oct 26 06:23 aadcnp.part
drwxr-x--- Oct 26 09:54 aadcof.part
drwxr-x--- Oct 27 09:54 aadcpc.part
tladmin@lm01:/data/nas/receiver/received.default.o365-01/aadcpc.part$ ls -l
total 104
-r-------- 1824 Oct 27 09:54 col-collected_at.data.xz
-r-------- 66892 Oct 27 09:54 col-raw.data.xz
-r-------- 2076 Oct 27 09:54 col-raw.pos.xz
-r-------- 72 Oct 27 09:54 col-raw.sig
-r-------- 1864 Oct 27 09:54 col-received_at.data.xz
-r-------- 32 Oct 27 09:54 col-source-token.data.xz
-r-------- 68 Oct 27 09:54 col-source-token.pos.xz
-r-------- 68 Oct 27 09:54 col-source.data.xz
-r-------- 496 Oct 27 09:54 signing-cert.der.xz
-r-------- 1299 Oct 27 09:54 summary.yaml
Forwarding to Kafka¶
If Apache Kafka is configured, every received log event is forwarded by LogMan.io Receiver to a Kafka topic. The topic is created automatically when the first message is forwarded.
The name of the Apache Kafka topic is derived from the stream name, received.<tenant>.<stream>; it is the same as the stream name in the archive.
Example of Kafka topics
received.mytenant.udp-8889
received.mytenant.tcp-7781
The raw log event is sent in the Kafka message body.
The following information is added to the message headers:
- row_id: Row Id, the globally unique identifier of the event in the archive. Not present if the archive is disabled. 64-bit big-endian unsigned integer (binary).
- collected_at: The event collection date & time, Unix timestamp in microseconds as a string.
- received_at: The event receipt date & time, Unix timestamp in microseconds as a string.
- source: The event source, string.
- tenant: The name of the tenant which received this message, string. (OBSOLETE)
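To illustrate how a downstream consumer can read these headers, here is a minimal sketch using the third-party confluent-kafka client; the bootstrap server, group id and topic name are assumptions for the example:

```python
# Minimal consumer sketch (assumed broker/topic/group names) showing that the
# message body carries the raw log and the headers carry the metadata above.
import struct
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "kafka-1:9092",
    "group.id": "example-archive-reader",   # assumed group id for this example
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["received.mytenant.udp-8889"])

msg = consumer.poll(10.0)
if msg is not None and msg.error() is None:
    headers = dict(msg.headers() or [])
    raw_log = msg.value()                                # the raw log event
    row_id = struct.unpack(">Q", headers["row_id"])[0]   # 64-bit big-endian unsigned integer
    collected_at = int(headers["collected_at"])          # microseconds, sent as a string
    print(row_id, collected_at, raw_log[:80])
consumer.close()
```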
Roadmap
Automated setup of the Kafka topic (such as number of Kafka partitions) will be implemented in the future releases.
NGINX configuration¶
We recommend using a dedicated virtual server in NGINX for LogMan.io Receiver, i.e. for the communication links from LogMan.io Collector to LogMan.io Receiver.
This server shares the NGINX server process and the IP address, and it is operated on a dedicated DNS domain, different from the LogMan.io Web UI.
For example, the LogMan.io Web UI runs on http://logman.example.com/ and the receiver is available at https://recv.logman.example.com/.
In this example logman.example.com
and recv.logman.example.com
can resolve to the same IP address(es).
Multiple NGINX servers can be configured on different cluster nodes to handle incoming connections from collectors, sharing the same DNS name. We recommend implementing this option for high-availability clusters.
upstream lmio-receiver-upstream {
server 127.0.0.1:3080; # (1)
server node-2:3080 backup; # (2)
server node-3:3080 backup;
}
server {
listen 443 ssl; # (3)
server_name recv.logman.example.com;
ssl_certificate recv-cert.pem; # (4)
ssl_certificate_key recv-key.pem;
ssl_client_certificate conf.d/receiver/client-ca-cert.pem; # (5)
ssl_verify_client optional;
ssl_session_timeout 1d;
ssl_session_cache shared:SSL:50m;
ssl_session_tickets off;
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers 'EECDH+AESGCM:EECDH+AES:AES256+EECDH:AES256+EDH';
ssl_prefer_server_ciphers on;
ssl_stapling on;
ssl_stapling_verify on;
server_tokens off;
add_header Strict-Transport-Security "max-age=15768000; includeSubdomains; preload";
add_header X-Frame-Options DENY;
add_header X-Content-Type-Options nosniff;
location / { # (8)
proxy_pass http://lmio-receiver-upstream;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "Upgrade";
proxy_set_header Host $host;
proxy_set_header X-SSL-Verify $ssl_client_verify; # (6)
proxy_set_header X-SSL-Cert $ssl_client_escaped_cert;
client_max_body_size 500M; # (7)
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
}
- Points to a locally running lmio-receiver, public Web API port. This is the primary destination since it saves network traffic.
- Backup links to receivers running on other cluster nodes that run lmio-receiver, node-2 and node-3 in this example. Backups are used when the locally running instance is not available. In a single-node installation, skip these entries completely.
- This is a dedicated HTTPS server running on https://recv.logman.example.com.
- You need to provide an SSL server key and certificate. You can use a self-signed certificate or a certificate provided by a Certificate Authority.
- The certificate client-ca-cert.pem is automatically created by lmio-receiver. See the "Client CA certificate" section.
- This verifies the SSL certificate of the client (lmio-collector) and passes that info to lmio-receiver.
- lmio-collector may upload chunks of buffered logs.
- A URL location path where the lmio-receiver API is exposed.
Verify the SSL web server
After the NGINX configuration is completed, always verify the SSL configuration quality, e.g. using the Qualys SSL Server Test. You should get an "A+" overall rating.
OpenSSL command for generating self-signed server certificate
openssl req -x509 -newkey ec -pkeyopt ec_paramgen_curve:secp384r1 \
-keyout recv-key.pem -out recv-cert.pem -sha256 -days 380 -nodes \
-subj "/CN=recv.logman.example.com" \
-addext "subjectAltName=DNS:recv.logman.example.com"
This command generates a self-signed certificate using elliptic curve cryptography with the secp384r1
curve.
The certificate is valid for 380 days and includes a SAN extension to specify the hostname recv.logman.example.com
.
The private key and the certificate are saved to recv-key.pem
and recv-cert.pem
, respectively.
Client CA certificate¶
NGINX needs the client-ca-cert.pem file for the ssl_client_certificate option.
This file is generated by lmio-receiver during its first launch; it is an export of the client CA certificate from Zookeeper, from lmio/receiver/ca/cert.der.
For this reason, lmio-receiver needs to be started before this NGINX virtual server configuration is created.
lmio-receiver generates this file as ./var/ca/client-ca-cert.pem.
docker-compose.yaml
lmio-receiver:
image: docker.teskalabs.com/lmio/lmio-receiver
volumes:
- ./nginx/conf.d/receiver:/app/lmio-receiver/var/ca
...
nginx:
volumes:
- ./nginx/conf.d:/etc/nginx/conf.d
...
Single DNS domain¶
The lmio-receiver can alternatively be collocated on the same domain and port as the LogMan.io Web UI.
In this case, the lmio-receiver API is exposed on the subpath: http://logman.example.com/lmio-receiver
Snippet from the NGINX configuration for the "logman.example.com" HTTPS server:
upstream lmio-receiver-upstream {
server 127.0.0.1:3080;
server node-2:3080 backup;
server node-3:3080 backup;
}
...
server {
listen 443 ssl;
server_name logman.example.com;
...
ssl_client_certificate conf.d/receiver/client-ca-cert.pem;
ssl_verify_client optional;
...
location /lmio-receiver {
rewrite ^/lmio-receiver/(.*) /$1 break;
proxy_pass http://lmio-receiver-upstream;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "Upgrade";
proxy_set_header Host $host;
proxy_set_header X-SSL-Verify $ssl_client_verify;
proxy_set_header X-SSL-Cert $ssl_client_escaped_cert;
client_max_body_size 500M;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
}
In this case, the lmio-collector
CommLink setup must be:
connection:CommLink:commlink:
url: https://logman.example.com/lmio-receiver/
...
Load balancing and high availability¶
Load balancing is configured by the upstream
section of the NGINX configuration.
upstream lmio-receiver-upstream {
server node-1:3080;
server node-2:3080 backup;
server node-3:3080 backup;
}
The collector connects to a receiver via NGINX using a long-lasting WebSocket connection (Commlink). The NGINX will first try to forward the incoming connection to "node-1". If that fails, it tries to forward to one of the backups: "node-2" or "node-3". The "node-1" is preferably "localhost" so the network traffic is limited, but it can be reconfigured otherwise.
Because the WebSocket connection is persistent, the collector stays connected to the "backup" server even after the primary server comes back online. The collector reconnects "on housekeeping" (daily, during the night) to resume proper balancing.
This mechanism also provides the high availability feature of the installation. When the NGINX or receiver instance is down, collectors will connect to another NGINX instance, and that instance will forward these connections to available receivers.
DNS round-robin balancing is recommended for distributing the incoming web traffic across available NGINX instances. Ensure that the DNS TTL value of related entries (A, AAAA, CNAME) is set to a low value, such as 1 minute.
Receiver internals¶
Architecture¶
Commlink¶
MessagePack is used for log/event delivery, in both instant and syncback delivery.
The CommLink is also used for bi-directional delivery of JSON messages, such as collector metrics that are shipped into an InfluxDB at the central cluster.
Archive structure¶
- Stream: identified by the Stream Name
  - Partition: identified by the Partition Number
    - Row: identified by the Row Number, respectively by the Row Id
      - Column: identified by the name
The stream is an infinite data structure divided into partitions. The partition is created and populated through the ingesting of rows. Once the partition reaches a specific size or age, it is rotated, which means the partition is closed, and a new ingesting partition is created. Once the partition is closed, it cannot be reopened and remains read-only for the rest of its lifecycle.
The partition contains rows that are further divided into columns. Column structure is fixed in one partition but can be different in other partitions in the same stream.
Archive filesystem¶
The stream, respectively partition, data are stored on the filesystem in the directory specified by the lifecycle phase.
Content of the /data/ssd/receiver
(hot phase)
+ received.mytenant.udp-8889
+ aaaaaa.part
+ summary.yaml
+ signing-cert.der
+ col-raw.data
+ col-raw.pos
+ col-raw.sig
+ col-collected_at.data
+ col-received_at.data
+ col-source.data
+ col-source-token.data
+ col-source-token.pos
+ aaaaab.part
+ summary.yaml
+ ...
+ aaaaac.part
+ summary.yaml
+ ...
+ received.mytenant.tcp-7781
...
The partition directory aaaaaa.part
contains the whole data content of the partition.
The structure is the same for every phase of the lifecycle.
The file summary.yaml
contains non-canonical information about the partition.
The canonical version of the information is in the Zookeeper.
The file signing-cert.der
contains the SSL certificate for verification of col-*.sig
digital signatures.
The signing certificate is unique for the partition.
col-*.data
files contain the data for the given field.
Partition Number / part_no
¶
Unsigned 24-bit integer, range 0..16,777,215 (max. 0xFF_FF_FF).
The partition number is provided by the shared counter in Zookeeper, located at /lmio/receiver/db/part.counter
.
It means each partition has a unique number regardless of which stream it belongs to.
Reasoning
10[years] * 365[days-in-year] * 24[hours-in-day] * 10[safety] = 876000 partitions (5% of the max.)
The partition number is frequently displayed as a string of 6 characters, such as aaaaaa.
It is a Base-16 encoded version of this integer, using the characters abcdefghijklmnop.
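A minimal illustrative sketch of this encoding (the alphabet and 6-character width come from the description above; the helper names are ours):

```python
# Illustrative sketch: encode/decode a partition number to/from its 6-character
# name, using base-16 with the digit alphabet "abcdefghijklmnop".
ALPHABET = "abcdefghijklmnop"  # 'a' = 0x0 ... 'p' = 0xF

def part_no_to_name(part_no: int) -> str:
    name = ""
    for shift in (20, 16, 12, 8, 4, 0):        # 6 hex digits = 24 bits
        name += ALPHABET[(part_no >> shift) & 0xF]
    return name

def name_to_part_no(name: str) -> int:
    part_no = 0
    for char in name:
        part_no = (part_no << 4) | ALPHABET.index(char)
    return part_no

assert part_no_to_name(0) == "aaaaaa"
assert part_no_to_name(1) == "aaaaab"
assert name_to_part_no("aaaaab") == 1
```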
Row Number / row_no
¶
Position within the partition.
Unsigned 40-bit integer, range 0..1,099,511,627,775 (max. 0xFF_FF_FF_FF_FF).
Reasoning
1,000,000 EPS * 86,400 seconds per day * 10 [safety] = 864,000,000,000 rows (well under the max.)
Row Id / row_id
¶
Globally unique 64-bit identifier of the row.
Because the row_id is composed of the part_no, its global uniqueness is guaranteed not just within a single stream, but across all streams.
Calculation
row_id = (part_no << 40) | row_no
row_id:
+------------------------+--------------------------+
| part_no | row_no |
+------------------------+--------------------------+
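A small illustrative sketch of this composition, assuming the 24-bit part_no occupies the top bits and the 40-bit row_no the bottom bits, as in the layout above (helper names are ours):

```python
# Illustrative sketch: compose and decompose a 64-bit row_id from the 24-bit
# part_no (high bits) and the 40-bit row_no (low bits).
ROW_NO_BITS = 40

def make_row_id(part_no: int, row_no: int) -> int:
    assert 0 <= part_no <= 0xFF_FF_FF
    assert 0 <= row_no <= 0xFF_FF_FF_FF_FF
    return (part_no << ROW_NO_BITS) | row_no

def split_row_id(row_id: int):
    return row_id >> ROW_NO_BITS, row_id & 0xFF_FF_FF_FF_FF

part_no, row_no = split_row_id(make_row_id(1043, 4306))
assert (part_no, row_no) == (1043, 4306)
```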
Column type string
¶
The column type string uses two types of files:
- col-<column name>.data: The raw byte data of the entries in the column. Each entry is sequentially added to the end of the file, hence making it append-only. As such, entries are stored without any explicit delimiters or record markers. Instead, the location of each entry is tracked in a companion col-<column name>.pos file.
- col-<column name>.pos: The starting byte positions of each entry in the corresponding col-<column name>.data file. Each position is stored as a 32-bit unsigned integer. The positions are stored sequentially in the order the entries are added to col-<column name>.data. The Nth integer in the col-<column name>.pos file indicates the starting position of the Nth entry in the col-<column name>.data file. The length of the Nth entry is the difference between the Nth integer and the (N+1)th integer in the col-<column name>.pos file.
Sequential data access¶
Sequential access of data involves reading data in the order it is stored in the column files.
Below are the steps to sequentially access data:
- Open both the col-<column name>.pos and col-<column name>.data files. Read the first 32-bit integer from col-<column name>.pos. Initialize a current position with a value of 0.
- Read the next 32-bit integer, referred to here as the position value, from col-<column name>.pos. Calculate the length of the data entry by subtracting the current position from the position value. After that, update the current position to the newly read position value.
- Read the data from col-<column name>.data using the length calculated in step 2. This length of data corresponds to the actual content of the database row.
- Repeat steps 2 and 3 to read subsequent rows. Continue this process until you reach the end of the col-<column name>.pos file, which would indicate that all data rows have been read.
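A minimal Python sketch of the sequential walk described above; the byte order of the 32-bit position integers is not specified here, so little-endian is an assumption of this example:

```python
# Sketch of sequential access for a string column, following the steps above.
# Assumption: the 32-bit position integers are little-endian ("<I"); adjust if needed.
import struct

def read_string_column(column: str):
    with open(f"col-{column}.pos", "rb") as pos_f, open(f"col-{column}.data", "rb") as data_f:
        current = 0
        while True:
            raw = pos_f.read(4)
            if len(raw) < 4:
                break                                   # end of .pos file: all rows read
            position = struct.unpack("<I", raw)[0]
            entry = data_f.read(position - current)     # length = position value - current
            current = position
            yield entry

for raw_log in read_string_column("raw"):
    print(raw_log[:80])
```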
Random data access¶
To access the Nth row in a column, you would follow these steps:
- Seek in the col-<column name>.pos file to the (N-1)th position, or to the beginning of the file if N == 0. Each position corresponds to a 32-bit integer, so the Nth position corresponds to the Nth 32-bit integer. For example, to get to the 6th row, you would seek to the 5th integer (20 bytes from the start, because each integer is 4 bytes).
- Read one or two 32-bit integers from the col-<column name>.pos file. If N == 0, read only one integer and assume the start position to be 0. For N > 0, read two integers: the first indicates the start position of the desired entry in the col-<column name>.data file, and the second indicates the start position of the next entry. The difference between the two integers gives the length of the desired entry.
- Seek in the col-<column name>.data file to the start position determined in the previous step.
- Read the entry from the col-<column name>.data file using the length calculated in step 2.
Column type timestamp
¶
The column type timestamp uses one type of file: col-<column name>.data.
Each entry in this column is a 64-bit Unix timestamp representing a date and time in microsecond precision.
Info
The Unix timestamp is a way to track time as a running total of seconds that have elapsed since 1970-01-01 00:00:00 UTC, not counting leap seconds.
The microsecond precision allows tracking time even more accurately.
The timestamp column summarizes the minimum and maximum timestamps in each partition.
The summary is stored in Zookeeper and in the summary.yaml file on the filesystem.
Sequential Data Access¶
Sequential access to a timestamp column involves reading each timestamp in the order they're stored:
- Open the
col-<column name>.data
file. - Read a 64-bit integer from the
col-<column name>.data
file. This integer is your Unix timestamp in microseconds. - Repeat step 2 until you reach the end of the file, which means you've read all timestamps.
- Close the file.
Random Data Access¶
To access a timestamp at a specific row (Nth position):
- Seek to the Nth position in the
col-<column name>.data
file. As each timestamp is a 64-bit (or 8-byte) integer, to get to the Nth timestamp, you would seek to the N * 8th byte. - Read a 64-bit integer from the
col-<column name>.data
file. This is your Unix timestamp in microseconds for the Nth row.
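A minimal sketch of this random access; the byte order of the 64-bit timestamps is not stated above, so little-endian is an assumption of this example:

```python
# Sketch: read the Nth timestamp (microsecond Unix time) from a timestamp column.
# Assumption: 64-bit little-endian integers ("<q"); adjust the format if needed.
import struct
from datetime import datetime, timezone

def read_timestamp(column: str, n: int) -> datetime:
    with open(f"col-{column}.data", "rb") as f:
        f.seek(n * 8)                         # each timestamp is 8 bytes
        micros = struct.unpack("<q", f.read(8))[0]
        return datetime.fromtimestamp(micros / 1_000_000, tz=timezone.utc)

print(read_timestamp("collected_at", 0))      # first row of the partition
```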
Column type token
¶
A column of the type token is designed to store string data in an optimized format. It is particularly suited for scenarios where a column contains a relatively small set of distinct, repetitive values.
Instead of directly storing the actual string data, the token column type encodes each string into an integer. This encoding process is accomplished via an index that is constructed based on all unique string values within the column partition. Each unique string is assigned a unique integer identifier in this index, and these identifiers replace the original strings in the token column.
The index itself is represented by the position of the string in a pair of associated files: col-<column name>-token.data
and col-<column name>-token.pos
.
See string column type for more details.
This approach provides significant storage space savings and boosts query efficiency for columns with a limited set of frequently repeated values. Moreover, it allows faster comparison operations as integer comparisons are typically quicker than string comparisons.
Danger
Please note that this approach may not yield benefits if the number of unique string values is large or if the string values are mostly unique. The overhead of maintaining the index could outweigh the storage and performance advantages of using compact integer storage.
The column type token uses three types of files:
col-<column name>.data
: The index of the column values, each represented as a 16-bit unsigned integer.col-<column name>-token.data
&col-<column name>-token.pos
: The index, using the same structure as the string column type. The position of a string in these files represents the encoded value of the string in the token column.
Sequential Data Access¶
Sequential access of a token column involves reading each token in the order they are stored.
This is accomplished using the indices stored in the col-<column name>.data
file and translating them into the actual string values using the col-<column name>-token.data
and col-<column name>-token.pos
files.
Here are the steps to sequentially access data:
- Open the col-<column name>.data file. This file contains 16-bit unsigned integers that serve as indices into the token string list.
- Read a 16-bit unsigned integer from the col-<column name>.data file. This is the index of your token string.
- Apply the "Random data access" from the string column type on the col-<column name>-token.data and col-<column name>-token.pos files to fetch the string value.
- Repeat steps 2 and 3 until you reach the end of the col-<column name>.data file, which indicates that all tokens have been read.
- Close all files.
Random Data Access¶
Random access in a token column allows you to retrieve any entry without needing to traverse the previous entries. This can be particularly beneficial in scenarios where you only need to fetch certain specific entries and not the entire data set.
To access a token at a specific row (Nth position):
- Seek to the Nth position in the
col-<column name>.data
file. As each index entry is a 16-bit (or 2-byte) integer, to get to the Nth entry, you would seek to the N * 2nd byte. - Read a 16-bit integer from the
col-<column name>.data
file. This is your index entry for the Nth row. - Apply the "Random data access" from string column type on
col-<column name>-token.data
andcol-<column name>-token.pos
files to fetch the string value.
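A minimal sketch combining both lookups; the byte order of the on-disk integers is again an assumption of this example, and the helper names are ours:

```python
# Sketch: fetch the string value of the Nth row in a token column.
# Assumption: little-endian integers on disk; the token files use the string column layout.
import struct

def read_string_entry(prefix: str, n: int) -> bytes:
    """Random access into <prefix>.data / <prefix>.pos (string column layout)."""
    with open(f"{prefix}.pos", "rb") as pos_f:
        if n == 0:
            start, end = 0, struct.unpack("<I", pos_f.read(4))[0]
        else:
            pos_f.seek((n - 1) * 4)
            start, end = struct.unpack("<II", pos_f.read(8))
    with open(f"{prefix}.data", "rb") as data_f:
        data_f.seek(start)
        return data_f.read(end - start)

def read_token(column: str, n: int) -> bytes:
    with open(f"col-{column}.data", "rb") as f:
        f.seek(n * 2)                                   # one 16-bit index per row
        token_index = struct.unpack("<H", f.read(2))[0]
    return read_string_entry(f"col-{column}-token", token_index)

print(read_token("source", 0))                          # e.g. b"192.168.100.1 61562 D"
```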
Column type token:rle
¶
A column of the type token:rle extends the token column type by adding Run-Length Encoding (RLE) to further optimize storage. This type is particularly suitable for columns that have many sequences of repeated values.
Like the token type, token:rle encodes string values into integer tokens. However, instead of storing each of these integer tokens separately, it utilizes Run-Length Encoding to compress sequences of repeated tokens into a pair of values: the token itself and the number of times it repeats consecutively.
The RLE compression is applied to the col-<column name>.data
file, turning a sequence of identical token indices into a pair: (token index, repeat count).
This approach provides significant storage savings for columns where values repeat in long sequences. It also allows faster data access and query execution by reducing the amount of data to be read from disk.
Danger
Keep in mind that this approach will not yield benefits if the data doesn't contain long sequences of repeated values. In fact, it may lead to increased storage usage and slower query execution as the overhead of maintaining and processing RLE pairs might outweigh the compression benefits.
The token:rle column type uses the same three types of files as the token type:
- col-<column name>.data: RLE-compressed indices of the column values, each entry stored as a pair: (16-bit unsigned integer token index, 16-bit unsigned integer repeat count).
- col-<column name>-token.data & col-<column name>-token.pos: The token index, using the same structure as the string column type. The position of a string in these files represents the encoded value of the string in the token column.
Sequential data access¶
Sequential access of a token:rle column involves reading each RLE-compressed token pair in the order they are stored.
This is accomplished using the indices stored in the col-<column name>.data
file and translating them into actual string values using col-<column name>-token.data
and col-<column name>-token.pos
files.
Here are the steps to sequentially access data:
- Open the
col-<column name>.data
file. This file contains pairs of 16-bit unsigned integers representing the token index and the run length. - Read a pair of 16-bit unsigned integers from the
col-<column name>.data
file. The first integer is the index of your token string, and the second integer indicates how many times this token repeats consecutively (run length). - Use the token index to locate the string in the
col-<column name>-token.data
andcol-<column name>-token.pos
files, following the process described for the string column type (Random Data Access). - Repeat the value from step 3 as many times as indicated by the run length.
- Repeat steps 2 to 4 until you reach the end of the
col-<column name>.data
file, which indicates that all tokens have been read. - Close all files.
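A minimal sketch of this RLE walk (endianness again assumed little-endian; helper names are ours):

```python
# Sketch: sequentially expand a token:rle column into its string values.
# Assumption: all on-disk integers are little-endian; adjust formats if needed.
import struct

def lookup_token(column: str, token_index: int) -> bytes:
    # Random access into the token string list (string column layout).
    with open(f"col-{column}-token.pos", "rb") as pos_f:
        if token_index == 0:
            start, end = 0, struct.unpack("<I", pos_f.read(4))[0]
        else:
            pos_f.seek((token_index - 1) * 4)
            start, end = struct.unpack("<II", pos_f.read(8))
    with open(f"col-{column}-token.data", "rb") as data_f:
        data_f.seek(start)
        return data_f.read(end - start)

def read_token_rle_column(column: str):
    with open(f"col-{column}.data", "rb") as f:
        while True:
            pair = f.read(4)                 # (token index, repeat count), 2 bytes each
            if len(pair) < 4:
                break
            token_index, repeat = struct.unpack("<HH", pair)
            value = lookup_token(column, token_index)
            for _ in range(repeat):          # expand the run
                yield value

for source in read_token_rle_column("source"):
    print(source)
```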
Random data access¶
To access a token at a specific row (Nth position):
- Open the
col-<column name>.data
file. This file contains pairs of 16-bit unsigned integers that serve as indices into the token string list and their corresponding run lengths. - Traverse the
col-<column name>.data
file pair by pair, summing the run lengths until the sum equals or exceeds N. The pair at which this occurs corresponds to the token that you are looking for. - Read the 16-bit integer token index from this pair.
- Apply the "Random data access" from string column type on
col-<column name>-token.data
andcol-<column name>-token.pos
files to fetch the string value.
Ended: Receiver
Parsec ↵
LogMan.io Parsec¶
TeskaLabs LogMan.io Parsec is a microservice responsible for parsing logs from different Kafka topics. LogMan.io Parsec puts logs into a single EVENTS
Kafka topic if parsing succeeds, and into an OTHERS
Kafka topic if parsing fails.
Parsing is the process of analyzing the original log (which is typically in single/multiple-line string, JSON, or XML format) and transforming it into a list of key-value pairs that describe the log data (such as when the original event happened, the priority and severity of the log, information about the process that created the log, etc).
Note
LogMan.io Parsec replaces LogMan.io Parser.
A simple parsing example
Parsing takes a raw log, such as this:
<30>2023:12:04-15:33:59 hostname3 ulogd[1620]: id="2001" severity="info" sys="SecureNet" sub="packetfilter" name="Packet dropped" action="drop" fwrule="60002" initf="eth2.3009" outitf="eth6" srcmac="e0:63:da:73:bb:3e" dstmac="7c:5a:1c:4c:da:0a" srcip="172.60.91.60" dstip="192.168.99.121" proto="17" length="168" tos="0x00" prec="0x00" ttl="63" srcport="47100" dstport="12017"
@timestamp: 2023-12-04 15:33:59.033
destination.ip: 192.168.99.121
destination.mac: 7c:5a:1c:4c:da:0a
destination.port: 12017
device.model.identifier: SG230
dns.answers.ttl: 63
event.action: Packet dropped
event.created: 2023-12-04 15:33:59.033
event.dataset: sophos
event.id: 2001
event.ingested: 2023-12-04 15:39:10.039
event.original: <30>2023:12:04-15:33:59 hostname3 ulogd[1620]: id="2001" severity="info" sys="SecureNet" sub="packetfilter" name="Packet dropped" action="drop" fwrule="60002" initf="eth2.3009" outitf="eth6" srcmac="e0:63:da:73:bb:3e" dstmac="7c:5a:1c:4c:da:0a" srcip="172.60.91.60" dstip="192.168.99.121" proto="17" length="168" tos="0x00" prec="0x00" ttl="63" srcport="47100" dstport="12017"
host.hostname: hostname3
lmio.event.source.id: hostname3
lmio.parsing: parsec
lmio.source: mirage
log.syslog.facility.code: 3
log.syslog.facility.name: daemon
log.syslog.priority: 30
log.syslog.severity.code: 6
log.syslog.severity.name: information
message: id="2001" severity="info" sys="SecureNet" sub="packetfilter" name="Packet dropped" action="drop" fwrule="60002" initf="eth2.3009" outitf="eth6" srcmac="e0:63:da:73:bb:3e" dstmac="7c:5a:1c:4c:da:0a" srcip="172.60.91.60" dstip="192.168.99.121" proto="17" length="168" tos="0x00" prec="0x00" ttl="63" srcport="47100" dstport="12017"
observer.egress.interface.name: eth6
observer.ingress.interface.name: eth2.3009
process.name: ulogd
process.pid: 1620
sophos.action: drop
sophos.fw.rule.id: 60002
sophos.prec: 0x00
sophos.protocol: 17
sophos.sub: packetfilter
sophos.sys: SecureNet
sophos.tos: 0x00
source.bytes: 168
source.ip: 172.60.91.60
source.mac: e0:63:da:73:bb:3e
source.port: 47100
tags: lmio-parsec:v23.47
tenant: default
_id: e1a92529bab1f20e43ac8d6caf90aff49c782b3d6585e6f63ea7c9346c85a6f7
_prev_id: 10cc320c9796d024e8a6c7e90fd3ccaf31c661cf893b6633cb2868774c743e69
_s: DKNA
LogMan.io Parsec Configuration¶
LogMan.io Parsec dependencies:
- Apache Kafka: The source of input unparsed events and the destination of parsed events.
- Apache Zookeeper: The library content, mainly parsing rules but also other shared cluster information.
Minimal configuration with event lane¶
LogMan.io Parsec can be configured either with or without an event lane. We recommend using the first option.
When an event lane is used, LogMan.io Parsec reads the Kafka topics, the path to the parsing rules, and optionally the charset, schema and timezone from it.
This is the minimal configuration for LogMan.io Parsec with event lane:
[tenant]
name=<tenant> # (1)
[eventlane]
name=/EventLanes/<tenant>/<eventlane>.yaml #(2)
[library]
providers=
zk:///library
...
[kafka]
bootstrap_servers=kafka-1:9092,kafka-2:9092,kafka-3:9092 # (3)
[zookeeper]
servers=zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181 # (4)
- Name of the tenant under which the service is running.
- Name of the event lane used for Kafka topics, path for parsing rules and optionally charset, schema and timezone.
- Addresses of Kafka servers in the cluster
- Addresses of Zookeeper servers in the cluster
Apache Zookeeper¶
Every LogMan.io microservice should advertise itself into Zookeeper.
[zookeeper]
servers=zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
Library¶
The library configuration specifies from where the Parsec declarations (definitions) are loaded.
The library can consist of one or multiple providers, typically Zookeeper or git repositories.
[library]
providers=
zk:///library
# other library layers can be included
Note
The order of layers is important. Higher layers overwrite the layers beneath them. If one file is present in multiple layers, only the one included in the highest layer is loaded.
Apache Kafka¶
The connection to Apache Kafka has to be configured so that events can be received from and sent to Apache Kafka:
[kafka]
bootstrap_servers=kafka-1:9092,kafka-2:9092,kafka-3:9092
Without this configuration, connection to Apache Kafka can't be properly established.
Minimal configuration without event lane¶
When event lane is NOT used, parsing rules, timezone and schema must be included in the configuration.
This is the configuration required for LogMan.io Parsec when event lane is not used:
[pipeline:ParsecPipeline:KafkaSource]
topic=received.<tenant>.<stream> # (1)
[pipeline:ParsecPipeline:KafkaSink]
topic=events.<tenant>.<stream> # (2)
[pipeline:ErrorPipeline:KafkaSink]
topic=others.<tenant> # (3)
[tenant]
name=<tenant> # (5)
schema=/Schemas/ECS.yaml # (6)
[parser]
name=/Parsers/<parsing rule> # (4)
[library]
providers=
zk:///library
...
[kafka]
bootstrap_servers=kafka-1:9092,kafka-2:9092,kafka-3:9092 # (7)
[zookeeper]
servers=zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181 # (8)
- Name of the received topic from which events are consumed.
- Name of the events topic to which successfully parsed events are committed.
- Name of the others topic to which unsuccessfully parsed events are committed.
- Specify the parsing rule to apply.
- Name of the tenant under which this instance of Parsec is running.
- Schema should be stored in
/Schemas/
folder in Library. - Addresses of Kafka servers in the cluster
- Addresses of Zookeeper servers in the cluster
Parsing rule¶
Each Parsec instance must know which parsing rule to apply.
[parser]
name=/Parsers/<parsing rule>
The name of the parser specifies the path from which the parsing rule declarations are loaded.
It MUST be stored in the /Parsers/ directory.
Parsing rules are YAML files.
The standard path format is <vendor>/<type>
, e.g. Microsoft/IIS
or Oracle/Listener
, but in case only one technology is used, only the name of the provider can be used, e.g. Zabbix
or Devolutions
.
Event lane configuration¶
This section optionally specifies significant attributes of the parsed events.
[eventlane]
timezone=Europe/Prague
charset=iso8859_2
timezone: If the log source produces logs in a specific timezone, different from the tenant's default timezone, it has to be specified here.
The name of the timezone must be compliant with the IANA Time Zone Database. Internally, all timestamps are converted into UTC.
charset: If the log source produces logs in a charset (or encoding) different from UTF-8, the charset must be specified here.
The list of supported charsets is here.
Internally, every text is encoded in UTF-8.
Kafka topics¶
The specification of the topic from which original logs come, and of the topics to which successfully parsed and unsuccessfully parsed logs are sent.
The recommended way of choosing topics is to create one 'received' and one 'events' topic for each event lane, and one 'others' topic for each tenant.
[pipeline:ParsecPipeline:KafkaSource]
topic=received.<tenant>.<stream>
[pipeline:ParsecPipeline:KafkaSink]
topic=events.<tenant>.<stream>
[pipeline:ErrorPipeline:KafkaSink]
topic=others.<tenant>
Warning
The pipeline name ParsecPipeline
was introduced in Parsec version v23.37
. The name KafkaParserPipeline
used in previous versions is deprecated. End of service life is 30 January, 2024.
Kafka Consumer Group¶
LogMan.io Parsec often runs in multiple instances in a cluster. The set of instances which consume from the same received topic is called a consumer group. This group is identified by a unique group.id. Each event is consumed by one and only one member of the group.
LogMan.io Parsec creates group.id
automatically as follows:
- When an event lane is used, group.id has the form lmio-parsec-<tenant>-<eventlane>.
- When an event lane is not used, group.id has the form lmio-parsec-<tenant>-<parser name>.
- group.id can be overwritten in the ParsecPipeline configuration as follows:
[pipeline:ParsecPipeline:KafkaSource]
group_id=lmio-parsec-<stream>
Warning
By changing group.id, a new consumer group will be created and it will begin to read events from the start. (This depends on the auto.offset.reset parameter, which is earliest by default.)
Metrics¶
Parsec produces its own telemetry for monitoring and also forwards the telemetry from collectors to the configured telemetry data storage, such as InfluxDB. Read more about metrics.
Include in configuration:
[asab:metrics]
...
Event Lanes¶
Relation to LogMan.io Parsec¶
TeskaLabs LogMan.io Parsec reads an important part of its configuration from the event lane. This configuration covers:
- Kafka topics from which events are taken and to which topics parsed and error events are sent
- parsing rules (declarations)
- (optionally) timezone, charset and schema
- group.id for consuming from the received topic
Therefore, each instance of LogMan.io Parsec runs under exactly one event lane (and exactly one tenant).
Note
Reading the configuration from event lanes was introduced in version v24.14
.
Declaration¶
This is the minimal required event lane definition, located in the /EventLanes/<tenant>
directory in the Library:
---
define:
type: lmio/event-lane
parsec:
name: /Parsers/path/to/parser # (1)
kafka:
received:
topic: received.tenant.stream
events:
topic: events.tenant.stream
others:
topic: others.tenant
- Path for the parsing rule. It must start with
/Parsers
. The standard path format is<vendor>/<type>
, e.g.Microsoft/IIS
orOracle/Listener
, but in case only one technology is used, only the name of the provider can be used, e.g.Zabbix
orDevolutions
.
When Parsec is started and the event lane is loaded, two pipelines are created:
- ParsecPipeline between the received and events topics
- ErrorPipeline targeting the others topic

The group.id used for consuming from the received topic has the form lmio-parsec-<tenant>-<eventlane>.
Timezone, schema, charset¶
Timezone, schema and charset are read from the tenant configuration by default, but these properties can be overwritten in event lane:
---
define:
type: lmio/event-lane
timezone: UTC
charset: utf-16
schema: /Schemas/CEF.yaml
timezone: If the log source produces logs in a specific timezone, different from the tenant's default timezone, it has to be specified here.
The name of the timezone must be compliant with the IANA Time Zone Database. Internally, all timestamps are converted into UTC.
charset: If the log source produces logs in a charset (or encoding) different from UTF-8, the charset must be specified here.
The list of supported charsets is here.
Internally, every text is encoded in UTF-8.
Output of parsing¶
When you use LogMan.io Parsec to analyze logs, the result of this process is what we refer to as the "parsed event." This output is an essential aspect of log management, as it transforms raw log data into a structured format that is easier to understand, analyze, and act upon.
A parsed event is not just any collection of data; it is a meticulously structured output that presents the information in a flat list format. This means that each piece of information from the original log is extracted and presented as key-value pairs. These pairs are straightforward, making it easy to identify what each piece of data represents.
Key-Value Pairs¶
-
Key: This is a unique identifier that describes the type of information contained in the value. Keys are predefined labels that represent specific aspects of the log data, such as time stamps, error codes, user IDs, and so on. Keys are defined by the schema.
-
Value: This is the actual data or information associated with the key. Values can vary widely, from numerical codes and timestamps to textual descriptions or user inputs. The type of the value is defined in the schema.
The output event is typically serialized as JSON object.
Example of parsed event
{
"@timestamp": 12345678901,
"event.created": 12345678902,
"event.ingested": 12345678903,
"event.original": "<1> 2023-03-01 myhost myapp: Hello world!",
"key1": "value1",
"key2": "value2",
}
Common fields of parsed events¶
Warning
This chapter uses ECS schema!
From the parsec (implicit):
- @timestamp: If the timestamp is not parsed, this field is automatically created with the time of parsing.

From the collector:
- event.original: The original event in its raw format.
- event.created: The time when the event was collected by LogMan.io Collector.
- lmio.source: The name of the log source (created by LogMan.io Collector).

From the receiver:
- event.ingested: The time when the event was ingested by LogMan.io Receiver.
- tenant: The name of the LogMan.io tenant in which Parsec processes the event, as specified in the configuration.
- _id: Unique identifier of the event.
Tags¶
Roadmap
There will be an option to add arbitrary tags to the event which will enable custom filtering.
At this time, the only tag that is automatically added to the tags
field is the version of the LogMan.io Parsec.
Error events¶
When parsing fails or an unexpected error occurs, the event is sent to the others topic via the ErrorPipeline,
where it is enriched with information about when and why the failure happened.
Every error event contains:
- @timestamp: The time when the event failed processing, as a UNIX timestamp (number of seconds from the epoch).
- event.original: The original event in its raw format.
- error.message: The error message.
- error.stack_trace: The data about where in the code the exception happened.
- event.dataset: The name of the dataset specified in the mapping, or the path of the parser in the Library.
- event.created: The time when the event was created by LogMan.io Collector.
- event.ingested: The time when the event was ingested by LogMan.io Receiver.
- tenant: The name of the tenant this event was aimed for.
Parsing rules ↵
Declarations¶
Declarations describe how the event should be parsed. They are stored as YAML files in the Library. LogMan.io Parsec interprets these declarations and creates parsing processors.
There are three types of declarations:
- Parser declaration: A parser takes an original event or a specific field of a partially parsed event as input, analyzes its individual parts, and stores them as key-value pairs to the event.
- Mapping declaration: Mapping takes a partially parsed event as input, renames the field names, and eventually converts the data types. It works together with a schema (ECS, CEF).
- Enricher declaration: An enricher supplements a partially parsed event with extra data.
Data flow¶
A typical, recommended parsing sequence is a chain of declarations:
- The first main parser declaration begins the chain, and additional parsers (called sub-parsers) extract more detailed data from the fields created by the previous parser.
- Then, the (single) mapping declaration renames the keys of the parsed fields according to a schema and filters out fields that are not needed.
- Last, the enricher declaration supplements the event with additional data. While it's possible to use multiple enricher files, it's recommended to use just one.
Important: Naming conventions
LogMan.io Parsec loads declarations alphabetically and creates the corresponding processors in the same order. Therefore, create the list of declaration files according to these rules:
-
Begin all declaration file names with a numbered prefix:
10_parser.yaml
,20_parser_message.yaml
, ...,90_enricher.yaml
.It is recommended to "leave some space" in your numbering for future declarations in case you want to add a new declaration between two existing ones (e.g.,
25_new_parser.yaml
). -
Include the type of declaration in file names:
20_parser_message.yaml
rather than10_message.yaml
. - Include the type of schema used in mapping file names:
40_mapping_ECS.yaml
rather than40_mapping.yaml
.
Example:
/Parsers/MyParser/:
- 10_parser.yaml
- 20_parser_username.yaml
- 30_parser_message.yaml
- 40_mapping_ECS.yaml
- 50_enricher_lookup.yaml
- 60_enricher.yaml
Parsers¶
A parser declaration takes an original event or a specific field of a partially parsed event as input, analyzes its individual parts, and stores them as key-value pairs to the event.
LogMan.io Parsec currently supports three types of parser declarations:
- JSON parser
- Windows Event parser
- Parsec parser
Declaration structure¶
In order to determine the type of the declaration, you need to specify a define
section.
define:
type: <declaration_type>
For a parser declaration, specify the type
as parser
.
JSON parser¶
JSON parser is used for parsing events (or events parts) with a JSON structure.
define:
name: JSON parser
type: parser/json
field: <custom_field>
target: <custom_target>
When field
is specified, parsing is applied to that field; by default, it is applied to the original event.
When target
is specified, the parsed object is stored in the designated target field; by default, it is stored with json
key. The custom target field must adhere to the regular expression json[0-9a-zA-Z]
(beginning with "json" followed by any alphanumeric character).
Example
-
The following original event with a JSON structure is parsed by the JSON parser with the default settings:
{ "key": { "foo": 1, "bar": 2 } }
10_parser.yaml
define:
  name: JSON parser
  type: parser/json
The result event will be:
{ "json": <JSON object>, }
-
The following event includes a JSON part, so the JSON parser can be applied to this pre-parsed field and the result will be stored in the custom jsonMessage field:
<14>1 2023-05-03 15:06:12 {"key": {"foo": 1, "bar": 2}}
20_parser_message.yaml
define:
  name: JSON parser
  type: parser/json
  field: message
  target: jsonMessage
The result event will be:
```json
{
  "log.syslog.priority": 14,
  "@timestamp": 140994182325993472,
  "message": "{"key": {"foo": 1, "bar": 2}}",
  "jsonMessage": <JSON object>
}
```
Windows Event parser¶
The Windows Event parser is used for parsing events produced by Microsoft Windows. These events are in XML format.
define:
name: Windows Events Parser
type: parser/windows-event
This is a complete Windows Event parser and will parse events from Microsoft Windows, separating the fields into key-value pairs.
Parsec parser¶
A Parsec parser is used for parsing events in plain string format. It is based on SP-Lang Parsec expressions.
For parsing original events, use the following declaration:
define:
name: My Parser
type: parser/parsec
parse:
!PARSE.KVLIST
- ...
- ...
- ...
define:
name: My Parser
type: parser/parsec
field: <custom_field>
parse:
!PARSE.KVLIST
- ...
- ...
- ...
When field
is specified, parsing is applied to that field; otherwise, it is applied to the original event. Because sub-parsers operate on fields created by previous parsers, the field option must be present in every sub-parser.
Types of field
specification:
- field: <custom_field> - a regular field pre-parsed by the previous parser.
- field: json /key/foo - the JSON key /key/foo from the pre-parsed JSON object json. The name of the JSON object and the JSON key must be separated by a space. The JSON key always starts with /, and every next level is separated by /.
- A JSON key with a specified type; by default, the string type is assumed:
field:
json: /key/foo
type: int
Examples of Parsec parser declarations¶
Example 1: Simple example
For the purpose of the example, let's say that we want to parse a collection of simple events:
Hello Miroslav from Prague!
Hi Kristýna from Pilsen.
{
"name": "Miroslav",
"city": "Prague"
}
{
"name": "Kristýna",
"city": "Pilsen"
}
define:
type: parser/parsec
parse:
!PARSE.KVLIST
- !PARSE.UNTIL " "
- name: !PARSE.UNTIL " "
- !PARSE.EXACTLY "from "
- city: !PARSE.LETTERS
Example 2: More complex example
For the purpose of this example, let's say that we want to parse a collection of simple events:
Process cleaning[123] finished with code 0.
Process log-rotation finished with code 1.
Process cleaning[657] started.
And we want the output in the following format:
{
"process.name": "cleaning",
"process.pid": 123,
"event.action": "process-finished",
"return.code": 0
}
{
"process.name": "log-rotation",
"event.action": "process-finished",
"return.code": 1
}
{
"process.name": "cleaning",
"process.pid": 657,
"event.action": "process-started",
}
Declaration will be the following:
define:
type: parser/parsec
parse:
!PARSE.KVLIST
- !PARSE.UNTIL " "
- !TRY
- !PARSE.KVLIST
- process.name: !PARSE.UNTIL "["
- process.pid: !PARSE.UNTIL "]"
- !PARSE.SPACE
- !PARSE.KVLIST
- process.name: !PARSE.UNTIL " "
- !TRY
- !PARSE.KVLIST
- !PARSE.EXACTLY "started."
- event.action: "process-started"
- !PARSE.KVLIST
- !PARSE.EXACTLY "finished with code "
- event.action: "process-finished"
- return.code: !PARSE.DIGITS
Example 3: Parsing syslog events
For the purpose of the example, let's say that we want to parse a simple event in syslog format:
<189> Sep 22 10:31:39 server-abc server-check[1234]: User "harry potter" logged in from 198.20.65.68
We would like the output in the following format:
{
"PRI": 189,
"timestamp": 1695421899,
"server": "server-abc",
"process.name": "server-check",
"process.pid": 1234,
"user": "harry potter",
"action": "log-in",
"ip": "198.20.65.68"
}
We will create two parsers. First parser will parse the syslog header and the second will parse the message.
define:
name: Syslog parser
type: parser/parsec
parse:
!PARSE.KVLIST
- !PARSE.EXACTLY "<"
- PRI: !PARSE.DIGITS
- !PARSE.EXACTLY ">"
- timestamp: ...
- server: !PARSE.UNTIL " "
- process.name: !PARSE.UNTIL "["
- process.pid: !PARSE.UNTIL "]"
- !PARSE.EXACTLY ":"
- message: !PARSE.CHARS
This parser extracts the syslog header fields and stores the rest of the line in the message field. The second parser then parses the message field:
define:
type: parser/parsec
field: message
drop: yes
parse:
!PARSE.KVLIST
- !PARSE.UNTIL " "
- user: !PARSE.BETWEEN { what: '"' }
- !PARSE.EXACTLY " "
- !PARSE.UNTIL " "
- !PARSE.UNTIL " "
- !PARSE.UNTIL " "
- ip: !PARSE.CHARS
Example 4: Parsing JSON events
For the purpose of the example, let's say that we want to parse a JSON event:
{
"data": {
"action": "allow",
"backendStatusCode": "200",
"clientAddr": "89.183.114.162",
"countryCode": "cz",
"host": "www.praha.cz",
"response": {
"backendTime": "0.043",
"code": "200",
},
},
"time": "2024-03-03T01:15:03.480Z",
}
We will create two parsers. First parser will parse the original event and store it as JSON object in json
field.
define:
type: parser/json
{
"json": <JSON object>
}
Next parser will parse /time
field from the JSON object.
define:
type: parser/parsec
field: json /time
parse:
!PARSE.KVLIST
- "@timestamp": !PARSE.DATETIME
- year: !PARSE.DIGITS
- '-'
- month: !PARSE.MONTH "number"
- '-'
- day: !PARSE.DIGITS
- 'T'
- hour: !PARSE.DIGITS
- ':'
- minute: !PARSE.DIGITS
- ':'
- second: !PARSE.DIGITS
- microsecond: !PARSE.FRAC
base: "micro"
Result event will be:
{
"json": <JSON object>,
"@timestamp": 140994182325993472,
}
Mapping¶
After all declared fields are obtained from parsers, the fields typically have to be renamed according to some schema (ECS, CEF) in a process called mapping.
Why is mapping necessary?
To store event data in Elasticsearch, it's essential that the field names in the logs align with the Elastic Common Schema (ECS), a standardized, open-source collection of field names that are compatible with Elasticsearch. The mapping process renames the fields of the parsed logs according to this schema. Mapping ensures that logs from various sources have unified, consistent field names, which enables Elasticsearch to interpret them accurately.
Important
By default, mapping works as a filter. Make sure to include all fields you want in the parsed output in the mapping declaration. Any field not specified in mapping will be removed from the event.
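For illustration, a minimal sketch of this filtering behavior (the field names here are hypothetical; the mapping syntax itself is described below):
mapping:
  act: 'event.action'    # kept and renamed
  ip: 'source.ip'        # kept and renamed
  # any other field produced by the parsers (for example a 'debug' field) is not listed,
  # so it is removed from the parsed event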
Writing a mapping declaration¶
Write mapping declarations in YAML. (Mapping declarations do not use SP-Lang expressions.)
define:
type: parser/mapping
schema: /Schemas/ECS.yaml
mapping:
<original_key>: <new_key>
<original_key>: <new_key>
...
Specify parser/mapping
as the type
in the define
section. In the schema
field, specify the filepath to the schema you're using. If you use Elasticsearch, use the Elastic Common Schema (ECS).
To rename the key and change the data type of the value:
Warning
The new data type must correspond to the data type specified in the schema for this field.
By specifying type: auto
, the data type will be automatically determined from the schema based on the field name.
mapping:
<original_key>:
field: <new_key>
type: <new_type>
Find available data types here.
To rename the key without changing the data type of the value:
mapping:
<original_key>: <new_key>
To rename the key stored in JSON object:
mapping:
<jsonObject> /<jsonKey>: <new_key>
The name of the JSON object and the JSON key must be separated by a space. The JSON key always starts with /, and every next level is separated by /. As before, it is possible to change the data type by specifying the type field.
Example¶
Example
For the purpose of the example, let's say that we want to parse a simple event in JSON format:
{
"act": "user login",
"ip": "178.2.1.20",
"usr": "harry_potter",
"id": "6514-abb6-a5f2"
}
and we would like the final output look like this:
{
"event.action": "user login",
"source.ip": "178.2.1.20",
"user.name": "harry_potter"
}
Notice that the key names in the original event differ from the key names in the desired output.
For the initial parser declaration in this case, we can use a simple JSON parser:
define:
type: parser/json
This parser will create a JSON object and will store it in json
field.
To change the names of individual fields, we create this mapping declaration file, 20_mapping_ECS.yaml
, in which we describe what fields to map and how:
---
define:
type: parser/mapping # determine the type of declaration
schema: /Schemas/ECS.yaml # which schema is applied
mapping:
json /act: 'event.action'
json /ip:
  field: 'source.ip'
  type: auto
json /usr: 'user.name'
This declaration will produce the desired output. Data type for the source.ip
field will be determined automatically based on the schema and changed accordingly.
Enrichers¶
Enrichers supplement the parsed event with extra data.
An enricher can:
- Create a new field in the event.
- Transform a field's values in some way (changing a letter case, performing a calculation, etc).
Enrichers are most commonly used to:
- Specify the dataset where the logs will be stored in ElasticSearch (add the field
event.dataset
). - Obtain facility and severity from the syslog priority field.
define:
type: parsec/enricher
enrich:
event.dataset: <dataset_name>
new.field: <expression>
...
- Write enrichers in YAML.
- Specify
parsec/enricher
in thedefine
field.
Example
The following example is an enricher used for events in syslog format. Suppose you have a parser for events of the form:
<14>1 2023-05-03 15:06:12 server pid: Username 'HarryPotter' logged in.
{
"log.syslog.priority": 14,
"user.name": "HarryPotter"
}
You want to obtain syslog severity and facility, which are computed in the standard way:
(facility * 8) + severity = priority
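For instance, the example event above has priority 14, so facility = 14 >> 3 = 1 and severity = 14 & 7 = 6 (and indeed 1 * 8 + 6 = 14); this is exactly what the !SHR and !AND expressions in the enricher below compute.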
You would also like to lower the name HarryPotter
to harrypotter
in order to unify the users across various log sources.
Therefore, you create an enricher:
define:
type: parsec/enricher
enrich:
event.dataset: 'dataset_name'
user.id: !LOWER { what: !GET {from: !ARG EVENT, what: user.name} }
# facility and severity are computed from 'syslog.pri' in the standard way
log.syslog.facility.code: !SHR
what: !GET { from: !ARG EVENT, what: log.syslog.priority }
by: 3
log.syslog.severity.code: !AND [ !GET {from: !ARG EVENT, what: log.syslog.priority}, 7 ]
Ended: Parsing rules
Date/time fields¶
Handling dates and times (timestamps) is crucial when parsing events.
In order for events to be displayed in the LogMan.io application, the events must contain the @timestamp
field with proper datetime and timezone.
Datetime fields, in accordance with ECS:
Field | Meaning |
---|---|
@timestamp | The time when the original event occurred. Must be included in declarations. |
event.created | The time when the original event was collected by LogMan.io Collector. |
event.ingested | The time when the original event was received by LogMan.io Receiver. |
In normal conditions, assuming no tampering, the timestamp values should be chronological: @timestamp
< event.created
< event.ingested
.
Useful links and tools¶
- UNIX time converter
- SP-Lang date/time format: this is the output format of all parsed timestamps produced by the Parsec.
LogMan.io Parsec Key terms¶
Important terms relevant to LogMan.io Parsec.
Event¶
A unit of data that moves through the parsing process is referred to as an event. An original event comes to LogMan.io Parsec as an input and is then parsed by the processors. If parsing succeeds, it produces a parsed event, and if parsing fails, it produces an error event.
Original event¶
An original event is the input that LogMan.io Parsec receives - in other words, an unparsed log. It can be represented by a raw (possibly encoded) string or a structure in JSON or XML format.
Parsed event¶
A parsed event is the output from successful parsing, formatted as an unordered list of key-value pairs serialized into JSON structure. A parsed event always contains a unique ID, the original event, and typically the information about when the event was created by the source and received by Apache Kafka.
Error event¶
An error event is the output from unsuccessful parsing, formatted as an unordered list of key-value pairs serialized into JSON structure. It is produced when parsing, mapping, or enrichment fails, or when another exception occurs in LogMan.io Parsec. It always contains the original event, the information about when the event was unsuccessfully parsed, and the error message describing the reason why the process of parsing failed. Despite unsuccessful parsing, the error event will always be in JSON format, key-value pairs.
Library¶
Your TeskaLabs LogMan.io Library holds all of your declaration files (as well as many other types of files). You can edit your declaration files in your Library via Zookeeper.
Declarations¶
Declarations describe how the event will be transformed. Declarations are YAML files that LogMan.io Parsec can interpret to create declarative processors. There are three types of declarations in LogMan.io Parsec: parsers, enrichers, and mappings. See Declarations for more.
Parser¶
A parser is the type of declaration that takes the original event or a specific field of a partially-parsed event as input, analyzes its individual parts, and then stores them as key-value pairs to the event.
Mapping¶
A mapping declaration is the type of declaration that takes a partially parsed event as input, renames the field names, and optionally converts the data types. It works together with a schema (ECS, CEF). It also works as a filter to leave out data that is not needed in the final parsed event.
Enricher¶
An enricher is the type of declaration that supplements a partially parsed event with additional data.
Migration to Parsec with event lanes¶
For migrating from LogMan.io Parser or from a LogMan.io Parsec version earlier than v24.14
, which does not use event lanes, use the following migration guide.
Prerequisites¶
- LogMan.io Depositor >
v24.11-beta
- LogMan.io Baseliner >
v24.11-beta
- LogMan.io Correlator >
v24.11-beta
Migration steps¶
-
Create new event lane YAML file
/EventLanes/<tenant>/<eventlane>.yaml
in the Library. -
Add the following properties to the eventlane:
/EventLanes/tenant/eventlane.yaml
---
define:
  type: lmio/event-lane
parsec:
  name: /Parsers/path/to/parser
kafka:
  received:
    topic: received.<tenant>.<stream>
  events:
    topic: events.<tenant>.<stream>
  others:
    topic: others.<tenant>
elasticsearch:
  events:
    index: lmio-<tenant>-events-<eventlane>
  others:
    index: lmio-<tenant>-others
(Replace
<tenant>
,<stream>
,<eventlane>
and /path/to/parser
with the specific values.) -
Create configuration for LogMan.io Parsec:
lmio-parsec.conf
[tenant]
name=<tenant>

[eventlane]
name=/EventLanes/<tenant>/<eventlane>.yaml

[library]
providers=
  zk:///library
  ...

[kafka]
bootstrap_servers=kafka-1:9092,kafka-2:9092,kafka-3:9092

[zookeeper]
servers=zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
-
Use either the old Consumer group or create a new one. First, open kafdrop, search for the corresponding
received
topic, see Consumers, and find the Group ID of the old LMIO Parser/LMIO Parsec. Decide whether to keep the old group.id
or create a new one.
Warning
By creating a new group.id, a new consumer group will be created and will begin to read events from the start. (This depends on the auto.offset.reset parameter of the Kafka cluster, which is earliest by default.)
In case you want to keep the old group.id, add the following section to the configuration:
lmio-parsec.conf
[pipeline:ParsecPipeline:KafkaSource]
group_id=<your group id>
Otherwise,
group.id
will be automatically created based on the event lane name: lmio-parsec-<tenant>-<eventlane>.
-
Start the service. Ensure it is running by looking at its logs.
You should see no error logs. If so, see troubleshooting. You should also see notice logs similar to these:
NOTICE lmioparsec.app Event Lane /EventLanes/default/linux-syslog-rfc3164-10001.yaml loaded successfully.
NOTICE lmioparsec.app [sd timezone="Europe/Prague" charset="utf-8" schema="/Schemas/ECS.yaml" parser="/Parsers/Linux/Common"] Configuration loaded.
NOTICE lmioparsec.declaration_loader [sd parsers="3" mappings="1" enrichers="1"] Declarations loaded.
NOTICE lmioparsec.parser.pipeline [sd source_topic="received.default.linux-syslog-rfc3164-10001" events_topic="events.default.linux-syslog-rfc3164-10001" others_topic="others.default" group.id="custom-group-id"] ParsecPipeline is ready.
There you should see the correct Kafka topics,
group.id
, charset
, schema
and timezone
. -
Ensure the service is consuming from the right topic with the correct group id. Open kafdrop once again and find the received topic. Check whether the new consumer group was created or whether the Combined Lag of the old group starts decreasing.
-
Check if new messages are incoming into Kafka events topic.
events topic is not created
Check the others topic. If new messages are coming there, the parsing rule is not correct. Check once again that you are using the proper parser name in the event lane:
eventlane.yaml
parsec:
  name: /Parsers/path/to/parser
If the parser name is correct, then the parsing rules themselves are incorrect and should be fixed.
-
Check if LogMan.io Depositor is running. Open events topic and check if Depositor is consuming from it (in Combined Lag).
-
Check if messages are visible in Discover screen on LogMan.io UI.
Messages are not visible at all
If you cannot find data on the Discover screen, wait for a while (approximately 1-2 minutes); the process might take some time. Then, check if the proper events topic exists and whether the event lane is properly configured. If so, check whether messages are incoming with an incorrect timezone by setting the time range to (-1 day, +1 day).
Messages are visible in incorrect timezone
If timezone or used schema is incorrect, you can overwrite it inside event lane:
```yaml title="/EventLanes/<tenant>/<eventlane>.yaml"
define:
type: lmio/event-lane
timezone: UTC
schema: /Schemas/ECS.yaml
```
Troubleshooting for LogMan.io Parsec¶
Timezone¶
Critical Logs¶
CRITICAL lmio-parsec.app Missing 'zookeeper' section in configuration.
Add the [zookeeper]
section in configuration:
[zookeeper]
servers=zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
CRITICAL lmioparsec.app Missing configuration option '[tenant] name'.
Add tenant name to the configuration:
[tenant]
name=my-tenant
CRITICAL lmioparsec.app Configuration option '[eventlane] name' must start with '/EventLanes/'.
Modify event lane name so it starts with /EventLanes
:
[eventlane]
name=/EventLanes/<tenant>/<eventlane>.yaml
CRITICAL lmioparsec.app Configuration option '[parser] name' must start with '/Parsers/'.
Make sure that path for parsing rules starts with /Parsers
.
define:
type: lmio/event-lane
parsec:
name: /Parsers/path/to/parsers
CRITICAL lmioparsec.app Cannot find file '/EventLanes/tenant/eventlane.yaml' in Library, exiting.
Make sure that the file /EventLanes/tenant/eventlane.yaml
exists and that it is enabled for the given tenant.
CRITICAL lmioparsec.app Cannot read '/EventLanes/tenant/eventlane.yaml' declaration: <error description>, exiting.
There is a syntax error in /EventLanes/tenant/eventlane.yaml
file. Correct it according to the error description.
Error Logs¶
ERROR: lmioparsec.declaration_loader Cannot construct pipeline: no declarations found.
Make sure you have configured the proper path for parsing rules in event lane file.
Warning Logs¶
WARNING lmioparsec.declaration_loader Missing 'schema' section in /Parsers/Linux/Common/50_enricher.yaml
Every enricher rule that comes after mapping should contain the schema in its define
section:
define:
type: parser/enricher
schema: /Schemas/ECS.yaml
Ended: Parsec
Parser ↵
Cascade Parser¶
Example¶
---
define:
name: Syslog RFC5424
type: parser/cascade
field_alias: field_alias.default
encoding: utf-8 # none, ascii, utf-8 ... (default: utf-8)
target: parsed # optional, specify the target of the parsed event (default: parsed)
predicate:
!AND
- !CONTAINS
what: !EVENT
substring: 'ASA'
- !INCLUDE predicate_filter
parse:
!REGEX.PARSE
what: !EVENT
regex: '^(\w{1,3}\s+\d+\s\d+:\d+:\d+)\s(?:(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})|([^\s]+))\s%ASA-\d+-(.*)$'
items:
- rt:
!DATETIME.PARSE
value: !ARG
format: '%b %d %H:%M:%S'
flags: Y
- dvchost
Section define
¶
This section contains the common definition and meta data.
Item name
¶
Shorter human-readable name of this declaration.
Item type
¶
The type of this declaration, must be parser/cascade
.
Item field_alias
¶
Name of the field alias lookup to be loaded, so that alias names of event attributes can be used in the declaration alongside their canonical names.
Item encoding
¶
Encoding of the incoming event.
Item target
(optional)¶
Default target pipeline of the parsed event, unless specified differently in context
.
The options include: parsed
, lookup
, unparsed
Item description
(optional)¶
Longer, possibly multiline, human-readable description of the declaration.
Section predicate
(optional)¶
The predicate
filters incoming events using an expression.
If the expression returns True
, the event will enter parse
section.
If the expression returns False
, then the event is skipped.
Other returned values are undefined.
This section can be used to speed up parsing by skipping lines with obviously non-relevant content.
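For instance, a minimal predicate that only lets events containing the substring 'ASA' into the parse section (mirroring the example declaration above) could look like this:
predicate:
  !CONTAINS
  what: !EVENT
  substring: 'ASA'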
Include of nested predicate filters¶
Predicate filters are expressions located in a dedicated file that can be included as parts of many different predicates.
If you want to include an external predicate filter, located either in the include
or the filters
folder
(the latter is a global folder located at the top of the LogMan.io library hierarchy),
use the !INCLUDE
statement:
!INCLUDE predicate_filter
where predicate_filter
is the file name without the .yaml
extension (the included file itself is named predicate_filter.yaml).
The content of predicate_filter.yaml
is an expression to be included, like:
---
!EQ
- !ITEM EVENT category
- "MyEventCategory"
Section parse
¶
This section specifies the actual parsing mechanism.
It expects a dictionary to be returned or None
, which means that the parsing was not successful.
Typical statements in parse
section¶
!FIRST
statement allows you to specify a list of parsing declarations, which are evaluated in order (top-down); the first declaration that returns a non-None
value stops the iteration, and this value is returned.
!REGEX.PARSE
statement allows you to transform the log line into a dictionary structure. It also allows you to attach sub-parsers to further decompose substrings.
Output routing¶
To indicate that the parser will not
parse the event it received so far,
an attribute target
needs to be set to unparsed
within the context
.
Then, other parsers in the pipeline may receive and parse the event.
In the same way, the target can be set to different destination groups,
such as parsed
.
To set the target
in the context
, the !CONTEXT.SET
is used:
- !CONTEXT.SET
what: <... expression ...>
set:
target: unparsed
Example of use in the parser. If no regex matches the incoming event,
the event is posted to the unparsed
target, so other parsers in the pipeline
may process it.
!FIRST
- !REGEX.PARSE
what: !EVENT
regex: '^(one)\s(two)\s(three)$'
items:
- one
- two
- three
- !REGEX.PARSE
what: !EVENT
regex: '^(uno)\s(duo)\s(tres)$'
items:
- one
- two
- three
# This is where the handling of partially parsed event starts
- !CONTEXT.SET
set:
target: unparsed
- !DICT
set:
unparsed: !EVENT
Complex Event Parser¶
Complex Event Parser parses incoming complex events such as lookup events
(i. e. create, update, delete of a lookup) and puts them into lmio-output
topic
in Kafka.
From there, the parsed complex events are also posted to input
topic
by LogMan.io Watcher instances, so that correlators and dispatchers
may react to the events as well.
Sample declaration¶
The sample YAML declaration for lookup events in Complex Event Parser may look as follows:
p00_json_preprocessor.yaml¶
---
define:
name: Preprocessor for JSON with tenant extraction
type: parser/preprocessor
tenant: JSON.tenant
function: lmiopar.preprocessor.JSON
p01_lookup_event_parser.yaml¶
---
define:
name: Lookup Event Parser
type: parser/cascade
predicate:
!AND
- !ISNOT
- !ITEM CONTEXT JSON.lookup_id
- !!null
- !ISNOT
- !ITEM CONTEXT JSON.action
- !!null
parse:
!DICT
set:
"@timestamp": !ITEM CONTEXT "JSON.@timestamp"
end: !ITEM CONTEXT "JSON.@timestamp"
deviceVendor: TeskaLabs
deviceProduct: LogMan.io
dvc: 172.22.0.12
dvchost: lm1
deviceEventClassId: lookup:001
name: !ITEM CONTEXT JSON.action
fname: !ITEM CONTEXT JSON.lookup_id
fileType: lookup
categoryObject: /Host/Application
categoryBehavior: /Modify/Configuration
categoryOutcome: /Success
categoryDeviceGroup: /Application
type: Base
tenant: !ITEM CONTEXT JSON.tenant
customerName: !ITEM CONTEXT JSON.tenant
The declarations should always be part of LogMan.io Library.
LogMan.io Parser Configuration¶
First, it is necessary to specify which library to load the declarations from; this can be either ZooKeeper or a file.
Also, every running instance of the parser must know which groups to load from the libraries; see below:
# Declarations
[declarations]
library=zk://zookeeper:12181/lmio/library.lib ./data/declarations
groups=cisco-asa@syslog
include_search_path=filters/parser;filters/parser/syslog
raw_event=event.original
count=count
tenant=tenant
timestamp=end
groups
- names of groups to be used from the library separated by spaces; if the group
is located within a folder's subfolder, use a slash as a separator, e.g. parsers/cisco-asa@syslog.
If the library is empty or the groups are not specified,
all events, including their context items, are dumped
into the lmio-others
Kafka topic and processed by LogMan.io Dispatcher
as if they were not parsed.
include_search_path
- specifies folders to search for YAML files to be later used in !INCLUDE expression
statement (such as !INCLUDE myFilterYAMLfromFiltersCommonSubfolder) in declarations, separated by ;.
By specifying asterisk *
after a slash in the path, all subdirectories will be recursively included.
!INCLUDE expression expects file name without path and without extension as input.
The behavior is similar to -I
include attribute when building C/C++ code.
raw_event
- field name of the input event log message (aka raw)
tenant
- field name of tenant/client is stored to
count
- field name the count of events is stored to, defaults to 1
timestamp
- field name of timestamp attribute
Next, it is necessary to specify which Kafka topics to use for the input and for the output, depending on whether the parsing was successful or unsuccessful. The Kafka connection also needs to be configured so that the parser knows which Kafka servers to connect to.
# Kafka connection
[connection:KafkaConnection]
bootstrap_servers=lm1:19092;lm2:29092;lm3:39092
[pipeline:ParsersPipeline:KafkaSource]
topic=collected
# group_id=lmioparser
# Kafka sinks
[pipeline:EnrichersPipeline:KafkaSink]
topic=parsed
[pipeline:ParsersPipeline:KafkaSink]
topic=unparsed
The last mandatory section specifies which Kafka topic to use for the information about changes in lookups (i. e. reference lists) and which ElasticSearch instance to load them from.
# Lookup persistent storage
[asab:storage] # this section is used by lookups
type=elasticsearch
[elasticsearch]
url=http://elasticsearch:9200
# Update lookups pipelines
[pipeline:LookupChangeStreamPipeline:KafkaSource]
topic=lookups
[pipeline:LookupModificationPipeline:KafkaSink]
topic=lookups
Installation¶
Docker Compose¶
lmio-parser:
image: docker.teskalabs.com/lmio/lmio-parser
volumes:
- ./lmio-parser:/data
Configuring new LogMan.io Parser instance¶
To create a new parser instance for a new data source (files, lookups, SysLog etc.) the following three steps must be taken:
- Creation of a new Kafka topic to load the collected data to
- Configuration of associated LogMan.io Parser instances in site-repository
- Deployment
Creation of a new Kafka topic to load the collected data to¶
First, create a new collected events topic.
Collected events topics are specific for every data source type and tenant (customer). The standard for naming such Kafka topics is as follows:
collected-<tenant>-<type>
where tenant is the lowercase tenant name and type is the data source type. Examples include:
collected-railway-syslog
collected-ministry-files
collected-johnandson-databases
collected-marksandmax-lookups
Collected topic for all tenants can have the following format:
collected-default-lookups
To create a new Kafka topic:
1.) Enter any Kafka container via docker exec -it o2czsec-central_kafka_1 bash
2.) Use the following command to create the topic using /usr/bin/kafka-topics
:
/usr/bin/kafka-topics --zookeeper lm1:12181,lm2:22181,lm3:32181 --create --topic collected-company-type --partitions 6 --replication-factor 1
The number of partitions depends on the expected amount of data and the number of LogMan.io Parser instances. Since most deployments have three running servers in the cluster, it is recommended to use at least three partitions.
Configuration of associated LogMan.io Parser instances in site-repository¶
Enter the site repository with configurations for LogMan.io cluster. To learn more about the site repository, please refer to the Naming Standards in Reference section.
Then in every server folder (such as lm1, lm2, lm3) create the following entry in
docker-compose.yml
file:
<tenant>-<type>-lmio-parser:
restart: on-failure:3
image: docker.teskalabs.com/lmio/lmio-parser
network_mode: host
depends_on:
- kafka
- elasticsearch-master
volumes:
- ./<tenant>-<type>/lmio-parser:/data
- ../lookups:/lookups
- /data/hdd/log/<tenant>-<type>/lmio-parser:/log
- /var/run/docker.sock:/var/run/docker.sock
replace <tenant>
with tenant/customer name (such as railway) and <type>
with data type (such as lookups),
examples include:
railway-lookups-lmio-parser
default-lookups-lmio-parser
hbbank-syslog-lmio-parser
When the Docker Compose entry is included in docker-compose.yml
, follow these steps:
1.) In every server folder (lm1, lm2, lm3), create <tenant>-<type>
folder
2.) In <tenant>-<type>
folders, create lmio-parser
folder
3.) In the created lmio-parser
folders, create lmio-parser.conf
file
4.) Modify the lmio-parser.conf
and enter the following configuration:
[asab:docker]
name_prefix=<server_name>-
socket=/var/run/docker.sock
# Declarations
[declarations]
library=zk://lm1:12181,lm2:22181,lm3:32181/lmio/library.lib ./data/declarations
groups=<group>
raw_event=raw_event
count=count
tenant=tenant
timestamp=@timestamp
# API
[asab:web]
listen=0.0.0.0 0
[lmioparser:web]
listen=0.0.0.0 0
# Logging
[logging:file]
path=/log/log.log
backup_count=3
rotate_every=1d
# Kafka connection
[connection:KafkaConnection]
bootstrap_servers=lm1:19092,lm2:29092,lm3:39092
[pipeline:ParsersPipeline:KafkaSource]
topic=collected-<tenant>-<type>
group_id=lmio_parser_<tenant>_<type>
# Kafka sinks
[pipeline:EnrichersPipeline:KafkaSink]
topic=lmio-events
[pipeline:ParsersPipeline:KafkaSink]
topic=lmio-others
[pipeline:ErrorPipeline:KafkaSink]
topic=lmio-others
[asab:zookeeper]
servers=lm1:12181,lm2:22181,lm3:32181
path=/lmio/library.lib
[zookeeper]
urls=lm1:12181,lm2:22181,lm3:32181
servers=lm1:12181,lm2:22181,lm3:32181
path=/lmio/library.lib
# Lookup persistent storage
[asab:storage] # this section is used by lookups
type=elasticsearch
[elasticsearch]
url=http://<server_name>:9200/
username=<secret_username>
password=<secret_password>
# Update lookups pipelines
[pipeline:LookupChangeStreamPipeline:KafkaSource]
topic=lmio-lookups
group_id=lmio_parser_<tenant>_<type>_<server_name>
[pipeline:LookupModificationPipeline:KafkaSink]
topic=lmio-lookups
# Metrics
[asab:metrics]
target=influxdb
[asab:metrics:influxdb]
url=http://lm4:8086/
db=db0
username=<secret_username>
password=<secret_password>
Replace every occurrence of:
<group>
with parser declaration group loaded in ZooKeeper;
for more information refer to Library in Reference section of this documentation
<server_name>
with root server folder name such as lm1, lm2, lm3
<tenant>
with your tenant name such as hbbank, default, railway etc.
<type>
with your data source type such as lookups, syslog, files, databases etc.
<secret_username>
and <secret_password>
with ElasticSearch and InfluxDB technical account credentials,
which can be seen in other configurations in the site repository
For more information about what each of the configuration section means, please refer to Configuration section in the side menu.
Deployment¶
To deploy the new parser, please:
1.) Go to each of the LogMan.io servers (lm1, lm2, lm3)
2.) Do git pull
in the site repository folder, which should be located in /opt
directory
3.) Run docker-compose up -d <tenant>-<type>-lmio-parser
to start the LogMan.io Parser instance
4.) Deploy and configure SyslogNG, LogMan.io Ingestor etc. to send the collected data to collected-<tenant>-<type>
Kafka topic
5.) See logs in /data/hdd/log/<tenant>-<type>/lmio-parser
folder for any errors to debug
(replace <tenant>
and <type>
accordingly)
Notes¶
To create a data stream for lookups, please use lookups
as type and refer to Lookups section in the side menu
to properly create the parsing declaration group.
DNS Enricher¶
DNS Enricher enriches the event with information loaded from DNS server(s), such as hostnames.
Example¶
Declaration¶
---
define:
name: DNSEnricher
type: enricher/dns
dns_server: 8.8.8.8,5.5.4.8 # optional
attributes:
device.ip:
hostname: host.hostname
source.ip:
hostname:
- host.hostname
- source.hostname
Input¶
{
"source.ip": "142.251.37.110",
}
Output¶
{
"source.ip": "142.251.37.110",
"host.hostname": "prg03s13-in-f14.1e100.net",
"source.hostname": "prg03s13-in-f14.1e100.net"
}
Section define
¶
This section defines the name and the type of the enricher,
which in the case of DNS Enricher is always enricher/dns
.
Item name
¶
Shorter human-readable name of this declaration.
Item type
¶
The type of this declaration, must be enricher/dns
.
Item dns_server
¶
The list of DNS servers to ask for information, separated by comma ,
.
Section attributes
¶
Specify a dictionary with the attributes to load the IP address or other DNS-lookup information from.
Each attribute should be followed by another dictionary with the list of keys to extract from the DNS server.
Then the value of every key is either a string with the name of the event attribute to store the looked-up value in, or a list, if the value should be inserted into more than one event attribute.
Field Alias Lookup & Enricher¶
Lookup¶
Field Alias lookup contains information about canonical names of event attributes, together with their possible aliases (like short names etc.).
Field Alias lookup ID must contain the following substring: field_alias
The lookup record has the following structure:
key: canonical_name
value: {
"aliases": [
alias1, # f. e. short name
alias2, # f. e. long name
...
]
}
The field aliases can be specified in parsers', standard enrichers' and correlators'
define section, so that alias names used in the declarative file (like !ITEM EVENT alias
)
are translated to canonical names, when accessing an existing element
(i. e. !ITEM EVENT alias
or !ITEM EVENT canonical_name
).
Also, the lookup should be used in Field Alias enricher to transform all aliases into canonical names after successful parsing in LogMan.io Parser.
Enricher¶
Field Alias enriches the event with the canonical names of existing attributes that are named by one of the specified aliases, while deleting the alias attributes from the event.
Declaration¶
---
define:
name: FieldAliasEnricher
type: enricher/fieldalias
lookup: field_alias.default
Section define
¶
This section defines the name and the type of the enricher,
which in the case of Field Alias is always enricher/fieldalias
.
Item name
¶
Shorter human-readable name of this declaration.
Item type
¶
The type of this declaration, must be enricher/fieldalias
.
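A minimal sketch of the effect, using hypothetical attribute names (the real aliases come from the field_alias lookup records):
# Lookup record (in a lookup whose ID contains "field_alias"):
#   key: source.ip
#   value: { "aliases": ["src", "src_ip"] }
#
# Event before enrichment:  { "src": "192.168.1.1", "message": "test" }
# Event after enrichment:   { "source.ip": "192.168.1.1", "message": "test" }   # the alias attribute "src" is deleted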
IP2Location Enricher (OBSOLETE)¶
This enricher is obsolete; please use the IP Enricher instead.
IP2Location enriches the event with specified location attributes based on IPV4 or IPV6 value.
Example¶
Declaration¶
---
define:
name: IP2Location
type: enricher/ip2location
zones:
- myLocalZone
- ip2location
- ...
attributes:
ip_addr1:
country_short: firstCountry
city: firstCity
L: firstL
ip_addr2:
country_short: secondCountry
city: secondCity
L: secondL
...
Input¶
Feb 5 10:50:01 0:0:0:0:0:ffff:1f1f:e001 %ASA-1-105043 test
Output¶
{
'rt': 1580899801.0,
'msg': 'test',
'ip_addr1': '0:0:0:0:0:ffff:1f1f:e001',
'firstCountry': 'CZ',
'firstCity': 'Brno',
'firstL': {
'lat': 49.195220947265625,
'lon': 16.607959747314453
}
}
Section define
¶
This section defines the name and the type of the enricher,
which in the case of IP2Location is always enricher/ip2location
.
Item name
¶
Shorter human-readable name of this declaration.
Item type
¶
The type of this declaration, must be enricher/ip2location
.
Value enricher/geoip
is an obsolete equivalent.
Section zones
¶
Specify a list of zones (database files or streams) which are going to be used by the enricher. The first zone that successfully performs the lookup is used, so order them by priority.
Section attributes
¶
Specify a dictionary with the event's IPv6 attributes to search the lookup for, such as dvchost
.
Inside the dictionary, mention fields/attributes from the lookup that are going to be loaded
plus the attribute name in the event after it. For example:
ip_addr1:
country_short: firstCountry
city: firstCity
L: firstL
will search the IP-to-GEO lookup for the IP stored in event["ip_addr1"]
,
load country_short
, city
, L
from the lookup (if present) and map them to
event["firstCountry"]
, event["firstCity"]
, event["firstL"]
Lookup attributes¶
The following lookup attributes, if present in the lookup's zone, may be used for further mapping:
country_short: string
country_long: string
region: string
city: string
isp: string
L: dictionary (includes: lat: float, lon: float)
domain: string
zipcode: string
timezone: string
netspeed: string
idd_code: string
area_code: string
weather_code: string
weather_name: string
mcc: string
mnc: string
mobile_brand: string
elevation: float
usage_type: string
High Performance Parsing¶
High performance parsing is parsing that is compiled directly to machine code, thus ensuring the highest possible speed of parsing incoming events.
All built-in preprocessors as well as declarative expressions !PARSE
and !DATETIME.PARSE
offer high performance parsing.
Procedural parsing¶
In order for the machine/instruction code to be compiled via LLVM and C, all expressions need to provide a definition of the procedural parsing, meaning that each character (or group of characters) in the parsing input string needs to have a defined output length and output type.
While for preprocessors the procedure is transparent and not shown to the user,
in !PARSE
and !DATETIME.PARSE
expressions, the exact procedure, with its types and formats, needs to be defined in the format
attribute:
!DATETIME.PARSE
what: "2021-06-11 17"
format:
- year: {type: ui64, format: d4}
- '-'
- month: {type: ui64, format: d2}
- '-'
- day: {type: ui64, format: d2}
- ' '
- hour: {type: ui64, format: d2}
First item in the format
attribute corresponds to the first character(s) in the incoming message,
here year
is formed from the first four characters and translated to an integer (2021
).
If only a single character is specified, it is skipped and not stored in the output parsed structure.
High Performance Expressions¶
!DATETIME.PARSE
¶
!DATETIME.PARSE
implicitly creates a datetime from the parsed structure,
which has following attributes:
-
year
-
month
-
day
-
hour
(optional) -
minute
(optional) -
second
(optional) -
microsecond
(optional)
Format - long version¶
The attributes need to be specified in the format
inlet:
!DATETIME.PARSE
what: "2021-06-11 1712X000014"
format:
- year: {type: ui64, format: d4}
- '-'
- month: {type: ui64, format: d2}
- '-'
- day: {type: ui64, format: d2}
- ' '
- hour: {type: ui64, format: d2}
- minute: {type: ui64, format: d2}
- 'X'
- microsecond: {type: ui64, format: dc6}
Format - short version¶
The format
can use shortened notation with %Y
, %m
, %d
, %H
, %M
, %S
and %u
(microsecond) placeholders,
which represent unsigned numbers based on the format in the example above:
!DATETIME.PARSE
what: "2021-06-16T11:17Z"
format: "%Y-%m-%dT%H:%MZ"
The format
statement can be simplified, if the datetime format is standardized, such as RFC3339
or iso8601
:
!DATETIME.PARSE
what: "2021-06-16T11:17Z"
format: iso8601
If the timezone is different from UTC, it also needs to be specified explicitly:
!DATETIME.PARSE
what: "2021-06-16T11:17Z"
format: iso8601
timezone: Europe/Prague
Available types¶
Integer¶
-
{type: ui64, format: d2}
- exactly 2 characters to unsigned integer -
{type: ui64, format: d4}
- exactly 4 characters to unsigned integer -
{type: ui64, format: dc6}
- 1 to 6 characters to unsigned integer
IP Enricher¶
IP Enricher extends events with geographic and other data associated with the given IPv6 or IPv4 address. This module replaces the IP2Location Enricher and the GeoIP Enricher, which are now obsolete.
Declaration¶
Section define
¶
This section defines the name and the type of the enricher.
Item name
¶
Short human-readable name of this declaration.
Item type
¶
There are four types of IP Enricher available, differing in IP version (IPv4 or IPv6) and IP address representation (integer or string). Using the integer input is the faster, preferred option.
enricher/ipv6
processes IPv6 addresses in 128-bit decimal integer format (such as 281473902969579).enricher/ipv4
processes IPv4 addresses in 32-bit decimal integer format (such as 3221226219).enricher/ipv6str
processes IPv6 addresses in colon-separated hexadecimal string format (such as 2001:db8:0:0:1:0:0:1) as defined in RFC 5952. It can also convert and process IPv4 string addresses (likeenricher/ipv4str
).enricher/ipv4str
processes IPv4 addresses in dotted decimal string format (such as 192.168.16.0) as defined in RFC 4001.
Item base_path
¶
Specifies the base URL path which contains lookup zone files.
It can point to:
* local filesystem directory, e.g. /path/to/files/
* a location in zookeeper, e.g. zk://zookeeper-server:2181/path/to/files/
* an HTTP location, e.g. http://localhost:3000/path/to/files/
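For example, assuming base_path sits alongside name and type in the define section (as the item ordering above suggests), with an illustrative ZooKeeper path:
define:
  name: IPEnricher
  type: enricher/ipv6
  base_path: zk://zookeeper-server:2181/path/to/files/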
Section tenants
¶
IP Enricher can be configured for multitenant settings. This section lists tenants which should be considered by the enricher in creating tenant-specific lookups. Events annotated with a tenant that is not listed in this section will only get global enrichment (see below).
Section zones
¶
Specifies a list of lookup zones to be used by the enricher, plus the information whether they are global or tenant-specific.
Global lookups can enrich any event, regardless of its tenant. Tenant-specific lookups can only enrich events with matching tenant context.
Lookup zones should be ordered by their priority, highest to lowest, as the lookup iterates over the zones sequentially and stops as soon as the first match is found.
zones:
- tenant: lookup-zone-1.pkl.gz
- tenant: lookup-zone-2.pkl.gz
- global: lookup-zone-glob.pkl.gz
The zone names must match the corresponding file names including the extension.
The global
lookup files need to exist directly in the base_path
directory.
The tenant
lookup files need to be organized under base_path
into folders
by their respective tenant. For example, assuming we have declared
first_tenant
and second_tenant
, the zones declaration above expects
the following file structure:
/base_path/
- first_tenant/
- lookup-zone-1.pkl.gz
- lookup-zone-2.pkl.gz
- second_tenant/
- lookup-zone-1.pkl.gz
- lookup-zone-2.pkl.gz
- lookup-zone-glob.pkl.gz
An incoming event whose context is second_tenant
will first try to match
in lookup second_tenant/lookup-zone-1.pkl.gz
, then in
second_tenant/lookup-zone-2.pkl.gz
and finally in lookup-zone-glob.pkl.gz
Section attributes
¶
This section specifies which event attributes contain the IP address and what attributes shall be added to the event if a match is found. It has the following dictionary structure:
ip_address:
lookup_attribute_name: event_attribute_name
The IP address is extracted from the event attribute ip_address
.
If it matches any lookup zone, the value of lookup_attribute_name
is saved
into event_attribute_name
. For example:
source_ip:
country_code: source_country
This will try to match the event based on its source_ip
attribute and
store the corresponding country_code
value into source_country
event field.
Zone name enrichment¶
It may be useful to record the name of the lookup zone where the match happened.
To add the zone name into the event, use zone
as the lookup attribute name, e.g.:
source_ip:
zone: source_zone_name
This will add the name of the matched lookup into the event field source_zone_name
.
Note that if there is a field called "zone" in the lookup, its value will be used instead of the lookup name.
The lookup name is set at the creation of the lookup file.
It defaults to the name of the source file, but it can be configured to some other value.
See the lmiocmd ipzone from-csv
command in LogMan.io Commander for details.
Example usage¶
Declaration file¶
---
define:
name: IPEnricher
type: enricher/ipv6
tenants:
- some-tenant
- another-tenant
zones:
- tenant: lookup-zone-1.pkl
- global: ip2location.pkl.gz
attributes:
ip_addr1:
country_code: sourceCountry
city_name: sourceCity
L: sourceLocation
ip_addr2:
country_code: destinationCountry
city_name: destinationCity
L: destinationL
...
Here, the enricher reads the IP address from the event attribute ip_addr1
.
Then it tries to find the address in its lookup objects: first in
lookup-zone-1.pkl
, then in ip2location.pkl.gz
.
If it finds a match, it retrieves the lookup values country_code
,
city_name
and L
, and saves them in their respective event fields
sourceCountry
, sourceCity
and sourceLocation
.
It proceeds analogically for the second address ip_addr2
.
The result can be seen below.
Input¶
Feb 5 10:50:01 0:0:0:0:0:ffff:1f1f:e001 %ASA-1-105043 test
The line above may be parsed into the following dictionary.
{
'rt': 1580899801.0,
'msg': 'test',
'ip_addr1': '0:0:0:0:0:ffff:1f1f:e001'
}
This is passed to the IP Enricher we declared above.
Output¶
{
'rt': 1580899801.0,
'msg': 'test',
'ip_addr1': '0:0:0:0:0:ffff:1f1f:e001',
'sourceCountry': 'CZ',
'sourceCity': 'Brno',
'sourceLocation': (49.195220947265625, 16.607959747314453)
}
IP zone lookup file¶
The IP lookup file is a pickled Python dictionary.
It can be simply created from CSV file using the lmiocmd ipzone from-csv
command found in LogMan.io Commander.
The CSV needs to contain a header row with column names.
There needs to be an ip_from
and an ip_to
column, and at least one other
column with desired lookup values.
For example:
ip_from,ip_to,zone_info,latitude,longitude
127.61.100.0,127.61.111.255,my secret base,48.224673,-75.711505
127.61.112.0,127.61.112.255,my submarine,22.917923,267.490378
NOTE
The zones defined in one lookup file must not overlap.
NOTE
The IP zones in the CSV file are treated as closed intervals, i.e. both ip_from
and ip_to
fields are
included in the zone they delimit.
IP2Location¶
This command is also able to create a lookup file from IP2Location™ CSV databases. Note that these files don't include column names, so the header row needs to be added to the CSV file manually before creating the lookup.
IP Resolve Enricher & Expression¶
IP Resolve enriches the event with canonical hostname and/or IP based on either IP address AND network/space, or any hostname AND network connected to the IP address in the lookup.
IP Resolve lookup ID must contain the following substring: ip_resolve
The lookup record has the following structure:
key: [IP, network]
value: {
"hostnames": [
canonical_hostname,
hostname2,
hostname3
...
]
}
Example¶
Declaration #1 - Enricher¶
---
define:
name: IPResolve
type: enricher/ipresolve
lookup: lmio_ip_resolve # optional
source:
- ip_addr_and_network_try1
- ip_addr_and_network_try2
- hostname_and_network_try3
- [!IP.PARSE ip4, !ITEM EVENT network4]
...
ip: ip_addr_try1
hostname: host_name
Declaration #2 - Expression¶
!IP.RESOLVE
source:
- ip_addr_and_network_try1
- ip_addr_and_network_try2
- hostname_and_network_try3
- [!IP.PARSE ip4, !ITEM EVENT network4]
...
ip: ip_addr_try1
hostname: host_name
with: !EVENT
lookup: lmio_ip_resolve # optional
Input¶
Feb 5 10:50:01 0:0:0:0:0:ffff:1f1f:e001 %ASA-1-105043 test
Output¶
{
'rt': 1580899801.0,
'msg': 'test',
'ip_addr_try1': 281471203926017,
'host_name': 'my_hostname'
}
Section define
¶
This section defines the name and the type of the enricher,
which in the case of IP Resolve is always enricher/ipresolve
.
Item name
¶
Shorter human-readable name of this declaration.
Item type
¶
The type of this declaration, must be enricher/ipresolve
.
Section source
¶
Specify a list of attributes to look up. Every attribute should be in the following format:
[IP, network]
[hostname, network]
If network is not specified, global
will be used.
The first successful lookup returns the output values (ip
, hostname
).
Section ip
¶
Specify the attribute to store the looked-up IP address in.
Section hostname
¶
Specify the attribute to store the looked-up canonical hostname in.
The canonical hostname is the first item in the lookup value's hostnames
.
Loading the lookup from a file¶
The IP Resolve lookup data can be loaded from a file using LogMan.io Collector
input:FileBlock
.
Hence, the data are available in the LogMan.io Parser, where they should be
posted to lookup
target. Thus the lookup will not enter the input
topic,
but the lookups
topic, from where it is going to be processed
by LogMan.io Watcher to update data in ElasticSearch.
The LogMan.io Watcher expects the following event format:
{
'action': 'full',
'data': {
'items': [{
'_id': [!IP.PARSE 'MyIP', 'MyNetwork'],
'hostnames': ['canonical_hostname', 'short_hostname', 'another_short_hostname']
}
]
},
'lookup_id': 'customer_ip_resolve'
}
where action
equals full
signifies that the existing lookup content should be
replaced with the items
in data.
To create this structure, use the following declarative example of Cascade Parser:
---
define:
name: Demo of IPResolve parser
type: parser/cascade
target: lookup
parse:
!DICT
set:
action: full
lookup_id:
!JOIN
items:
- !ITEM CONTEXT filename
- ip_resolve
delimiter: '_'
data:
!DICT
set:
items:
!FOR
each:
!REGEX.SPLIT
what: !EVENT
regex: '\n'
do:
!FIRST
- !CONTEXT.SET
set:
_temp:
!REGEX.SPLIT
what: !ARG
regex: ';'
- !DICT
set:
_id:
- !IP.PARSE
value: !ITEM CONTEXT _temp.0
- MyNetworkOrSpace
hostnames:
!LIST
append:
- !ITEM CONTEXT _temp.1
- !ITEM CONTEXT _temp.2
Parsing lookups¶
When a lookup is received from LogMan.io Collector via LogMan.io Ingestor, it can either be a whole lookup content (full frame), or just one record (delta frame).
Preprocessing¶
Based on the input lookup file format, a preprocessor should be used in order to simplify following declarations and optimize the speed of lookup loading. Usually, either JSON, XML or CSV preprocessor will be used:
---
define:
name: Preprocessor for CSV
type: parser/preprocessor
function: lmiopar.preprocessor.CSV
Thus, the parsed file content is stored in CONTEXT
, where it can be accessed from.
Full frame¶
In order to store the entire lookup in ElasticSearch through LogMan.io Watcher,
and notify other instances of LogMan.io Parser and LogMan.io Correlator about the change
in the entire lookup, a Cascade Parser declaration should be used with target: lookup
configuration.
Thus, the lookup will not enter the input
topic,
but the lookups
topic, from where it is going to be processed
by LogMan.io Watcher to update data in ElasticSearch.
The LogMan.io Watcher expects the following event format:
{
'action': 'full',
'data': {
'items': [{
'_id': 'myId',
...
}
]
},
'lookup_id': 'myLookup'
}
where action
equals full
signifies that the existing lookup content should be
replaced with the items
in data
.
To create this structure, use the following declarative example of Cascade Parser.
Sample declaration¶
---
define:
name: Demo of lookup loading parser
type: parser/cascade
target: lookup
parse:
!DICT
set:
action: full
lookup_id: myLookup
data:
!DICT
set:
items:
!FOR
each: !ITEM CONTEXT CSV
do:
!DICT
set:
_id: !ITEM ARG myId
...
When the lookup content enters the LogMan.io Parser, the parsed lookup is sent to LogMan.io Watcher, which stores it in ElasticSearch.
Delta frame¶
In order to update ONE item in an existing lookup in ElasticSearch through LogMan.io Watcher,
and notify other instances of LogMan.io Parser and LogMan.io Correlator about the change
in the lookup, a Cascade Parser declaration should be used with target: lookup
configuration.
Thus, the lookup item will not enter the input
topic,
but the lookups
topic, from where it is going to be processed
by LogMan.io Watcher to update data in ElasticSearch.
The LogMan.io Watcher expects the following event format:
{
'action': 'update_item',
'data': {
'_id': 'existingOrNewItemId',
...
},
'lookup_id': 'myLookup'
}
where action
equals update_item
signifies that the existing lookup item content should be
replaced with the items in data
, or a new lookup item should be created.
To create this structure, use the following declarative example of Cascade Parser.
Sample declaration¶
---
define:
name: Demo of lookup item loading parser
type: parser/cascade
target: lookup
parse:
!DICT
set:
action: update_item
lookup_id: myLookup
data:
!DICT
set:
_id: !ITEM CONTEXT CSV.0.myID
...
When the lookup content enters the LogMan.io Parser, the parsed lookup is sent to LogMan.io Watcher, which stores it in ElasticSearch.
MAC Vendor Enricher¶
MAC Vendor enriches the event with specified vendor attributes based on a MAC address value (only the first 6 characters are considered to detect the vendor).
Example¶
Declaration¶
---
define:
name: MACVendor
type: enricher/macvendor
lookup: lmio_mac_vendor # optional
attributes:
MAC1: detectedVendor1
MAC2: detectedVendor2
...
Input¶
Feb 5 10:50:01 0:0:0:0:0:ffff:1f1f:e001 %ASA-1-105043 5885E9001183
Output¶
{
'rt': 1580899801.0,
'MAC1': '5885E9001183',
'detectedVendor1': 'Realme Chongqing Mobile Telecommunications Corp Ltd',
}
Section define
¶
This section defines the name and the type of the enricher,
which in the case of Mac Vendor is always enricher/macvendor
.
Item name
¶
Shorter human-readable name of this declaration.
Item type
¶
The type of this declaration, must be enricher/macvendor
.
Section attributes
¶
Specify a dictionary with the event's MAC attributes to search the lookup for, such as MAC1.
Inside the dictionary, mention the attribute name in the event for the detected vendor to be stored in.
For example:
MAC1:
detectedVendor1
will search the MAC Vendor lookup for the MAC stored in event["MAC1"]
,
load the vendor into event["detectedVendor1"]
, if successfully looked up.
Lookup files¶
MAC Vendor enricher lookup files are based on OUI standard: standards-oui.ieee.org/oui.txt
The files are stored in the default path directory (/lookups/macvendor
),
which can be overridden in configuration:
[lookup:lmio_mac_vendor]
path=...
lmio_mac_vendor
is the provided lookup ID in the enricher definition, which defaults to lmio_mac_vendor
Parser Builder¶
The builder is a tool for easy creation of parser/enricher declarations.
To start a builder, run:
python3 builder.py -w :8081 ./example/asa-parser
The path argument(s) specify the folder (or folders) with parsers and enrichers declarations (aka YAML files). It is recommended to point into a YAML library.
YAML files are loaded in the order specified on the command line and then by sorting the *.yaml files found in the respective directory in alphabetical order.
-I
argument allows you to specify folders that will be used as a base for the !INCLUDE
directive. Multiple entries are allowed.
-w
argument specifies the HTTP port.
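For example, combining both arguments with illustrative include folders:
python3 builder.py -w :8081 -I ./filters/parser -I ./filters/parser/syslog ./example/asa-parser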
Parser preprocessor¶
The parser preprocessor allows you to preprocess the input event with imperative code, e.g. Python, Cython, C etc.
Example¶
---
define:
name: Demo of the build-in Syslog preprocessor
type: parser/preprocessor
tenant: Syslog_RFC5424.STRUCTURED_DATA.soc@0.tenant # (optional)
count: CEF.cnt # (optional)
function: lmiopar.preprocessor.Syslog_RFC5424
tenant
specifies the tenant attribute to be read and passed to context['tenant']
for further distribution of parsed and unparsed events to tenant specific
indices/storages in LogMan.io Dispatcher
count
specifies the count attribute
with count of events to be read and passed to context['count']
Built-in preprocessors¶
lmiopar.preprocessor
module contains following commonly used preprocessors.
These preprocessors are optimized for high-performance deployments.
Syslog RFC5424 built-in preprocessor¶
function: lmiopar.preprocessor.Syslog_RFC5424
This is a preprocessor for the Syslog protocol (new) according to RFC 5424.
The input for this preprocessor is a valid Syslog entry, e.g.:
<165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog 10 ID47 [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"] An application event log entry.
The output is the message part of the log in the event and the parsed elements in context.Syslog_RFC5424.
event: An application event log entry.
context:
Syslog_RFC5424:
PRI: 165
FACILITY: 20
PRIORITY: 5
VERSION: 1
TIMESTAMP: 2003-10-11T22:14:15.003Z
HOSTNAME: mymachine.example.com
APP_NAME: evntslog
PROCID: 10
MSGID: ID47
STRUCTURED_DATA:
exampleSDID@32473:
iut: 3
eventSource: Application
eventID: 1011
...
Syslog RFC3164 built-in preprocessor¶
function: lmiopar.preprocessor.Syslog_RFC3164
This is a preprocessor for the BSD syslog Protocol (old) according to RFC3164.
The Syslog RFC3164 preprocessor can be configured in the define section:
define:
type: parser/preprocessor
year: 1999
timezone: Europe/Prague
year specifies the numeric representation of the year that will be applied to the timestamp of the logs. Alternatively, you may specify smart (the default) for advanced selection of the year based on the month.
timezone specifies the timezone of the logs; the default is UTC.
The input for this preprocessor is a valid Syslog entry, e.g.:
<34>Oct 11 22:14:15 mymachine su[10]: 'su root' failed for lonvick on /dev/pts/8
The output is the message part of the log in the event and the parsed elements in context.Syslog_RFC3164.
event: "'su root' failed for lonvick on /dev/pts/8"
context:
Syslog_RFC3164:
PRI: 34
PRIORITY: 2
FACILITY: 4
TIMESTAMP: '2003-10-11T22:14:15.003Z'
HOSTNAME: mymachine
TAG: su
PID: 10
TAG and PID are optional parameters.
CEF built-in preprocessor¶
function: lmiopar.preprocessor.CEF
This is a preprocessor for CEF, the Common Event Format.
define:
type: parser/preprocessor
year: 1999
timezone: Europe/Prague
year specifies the numeric representation of the year that will be applied to the timestamp of the logs. Alternatively, you may specify smart (the default) for advanced selection of the year based on the month.
timezone specifies the timezone of the logs; the default is UTC.
The input for this preprocessor is a valid CEF entry, e.g.:
CEF:0|Vendor|Product|Version|foobar:1:2|Failed password|Medium| eventId=1234 app=ssh categorySignificance=/Informational/Warning categoryBehavior=/Authentication/Verify
The output is the message part of the log in the event and the parsed elements in context.CEF:
context:
CEF:
Version: 0
DeviceVendor: Vendor
DeviceProduct: Product
DeviceVersion: Version
DeviceEventClassID: 'foobar:1:2'
Name: Failed password
Severity: Medium
eventId: '1234'
app: ssh
categorySignificance: /Informational/Warning
categoryBehavior: /Authentication/Verify
CEF can also contain a Syslog header. This is supported by chaining the relevant Syslog preprocessor with the CEF preprocessor. Please refer to the chapter on chaining of preprocessors below for details.
Apache HTTP Server log formats built-in preprocessor¶
There are high-performance preprocessors for common Apache HTTP server access logs.
function: lmiopar.preprocessor.Apache_Common_Log_Format
This is a preprocessor for the Apache Common Log Format.
function: lmiopar.preprocessor.Apache_Combined_Log_Format
This is a preprocessor for the Apache Combined Log Format.
Apache Common Log example¶
Input:
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
Output:
context:
Apache_Access_Log:
HOST: '127.0.0.1'
IDENT: '-'
USERID: 'frank'
TIMESTAMP: '2000-10-10T20:55:36.000Z'
METHOD: 'GET'
RESOURCE: '/apache_pb.gif'
PROTOCOL: 'HTTP/1.0'
STATUS_CODE: 200
DOWNLOAD_SIZE: 2326
Apache Combined Log example¶
Input:
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
Output:
context:
Apache_Access_Log:
HOST: '127.0.0.1'
IDENT: '-'
USERID: 'frank'
TIMESTAMP: '2000-10-10T20:55:36.000Z'
METHOD: 'GET'
RESOURCE: '/apache_pb.gif'
PROTOCOL: 'HTTP/1.0'
STATUS_CODE: 200
DOWNLOAD_SIZE: 2326
REFERER: http://www.example.com/start.html
USER_AGENT: Mozilla/4.08 [en] (Win98; I ;Nav)
Microsoft ULS built-in preprocessor¶
function: lmiopar.preprocessor.Microsoft_ULS
This is a preprocessor for the Microsoft ULS format according to Microsoft Docs.
For Microsoft SharePoint ULS logs, which do not contain the server name or correlation fields, a dedicated preprocessor is provided:
function: lmiopar.preprocessor.Microsoft_ULS_Sharepoint
The Microsoft SharePoint ULS preprocessor can be configured in the define section:
define:
type: parser/preprocessor
year: 1999
timezone: Europe/Prague
year specifies the numeric representation of the year that will be applied to the timestamp of the logs. Alternatively, you may specify smart (the default) for advanced selection of the year based on the month.
timezone specifies the timezone of the logs; the default is UTC.
The input for this preprocessor is a valid Microsoft ULS Sharepoint entry, e.g.:
04/28/2021 12:31:57.69 mssdmn.exe (0x38E0) 0x4D10 SharePoint Server Search Connectors:SharePoint dvt6 High SetSTSErrorInfo ErrorMessage = Error from SharePoint site: WebExceptionStatus: SendFailure The underlying connection was closed: An unexpected error occurred on a send. hr = 90141214 [sts3util.cxx:6994] search\native\gather\protocols\sts3\sts3util.cxx 3aeca97a-a9db-4010-970e-fe01483bfd4f
The output is the message part of the log in the event and the parsed elements in context.Microsoft_ULS.
event: Message included in the log.
context:
Microsoft_ULS:
TIMESTAMP: 1619613117.69
PROCESS: mssdmn.exe (0x38E0)
THREAD: 0x4D10
PRODUCT: SharePoint Server Search
CATEGORY: Connectors:SharePoint
EVENTID: dvt6
LEVEL: High
Query String preprocessor¶
function: lmiopar.preprocessor.Query_String
This is a preprocessor for the query string format (key=value&key=value...), such as meta information from the LogMan.io Collector.
Example of input:
file_name=log.log&search=true
The output is the message part of the log in the event and the parsed elements in context.QUERY_STRING.
event: Message included in the log.
context:
QUERY_STRING:
file_name: log.log
search: true
JSON built-in preprocessor¶
function: lmiopar.preprocessor.JSON
This is a preprocessor for the JSON format. It expects the input in a binary or textual format; the output dictionary is placed in the event.
Hence, the input for this preprocessor is a valid JSON entry.
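An illustrative example (the input values are made up for this sketch):
Example of input:
{"user": "jack", "action": "login", "success": true}
The output of the preprocessor in the event:
{
    "user": "jack",
    "action": "login",
    "success": true
}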
XML built-in preprocessor¶
function: lmiopar.preprocessor.XML
This is a preprocessor for the XML format. It expects the input in a binary or textual format; the output dictionary is placed in the event.
Hence, the input for this preprocessor is a valid XML entry, e.g.:
<?xml version="1.0" encoding="UTF-8"?>
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Schannel" Guid="{1f678132-5938-4686-9fdc-c8ff68f15c85}" />
<EventID>36884</EventID>
<Version>0</Version>
<Level>2</Level>
<Task>0</Task>
<Opcode>0</Opcode>
<Keywords>0x8000000000000000</Keywords>
<TimeCreated SystemTime="2020-06-26T07:12:01.331577900Z" />
<EventRecordID>30286</EventRecordID>
<Correlation ActivityID="{8e20742a-4b06-0002-c274-208e064bd601}" />
<Execution ProcessID="788" ThreadID="948" />
<Channel>System</Channel>
<Computer>XX</Computer>
<Security UserID="S-1-5-21-1627182167-2524376360-74743131-1001" />
</System>
<UserData>
<EventXML xmlns="LSA_NS">
<Name>localhost</Name>
</EventXML>
</UserData>
<RenderingInfo Culture="en-US">
<Message>The certificate received from the remote server does not contain the expected name. It is therefore not possible to determine whether we are connecting to the correct server. The server name we were expecting is localhost. The TLS connection request has failed. The attached data contains the server certificate.</Message>
<Level>Error</Level>
<Task />
<Opcode>Info</Opcode>
<Channel>System</Channel>
<Provider />
<Keywords />
</RenderingInfo>
</Event>
The output of the preprocessor in the event:
{
"System.EventID": "36884",
"System.Version": "0",
"System.Level": "2",
"System.Task": "0",
"System.Opcode": "0",
"System.Keywords": "0x8000000000000000",
"System.EventRecordID": "30286",
"System.Channel": "System",
"System.Computer": "XX",
"UserData.EventXML.Name": "localhost",
"RenderingInfo.Message": "The certificate received from the remote server does not contain the expected name. It is therefore not possible to determine whether we are connecting to the correct server. The server name we were expecting is localhost. The TLS connection request has failed. The attached data contains the server certificate.",
"RenderingInfo.Level": "Error",
"RenderingInfo.Opcode": "Info",
"RenderingInfo.Channel": "System"
}
CSV built-in preprocessor¶
function: lmiopar.preprocessor.CSV
This is a preprocessor for the CSV format. It expects the input in a binary or textual format; the output is placed in context["CSV"] (see below).
Hence, the input for this preprocessor is a valid CSV entry, e.g.:
user,last_name\njack,black\njohn,doe
The output of the preprocessor in context["CSV"]:
{
"lines": [
{"user": "jack", "last_name": "black"},
{"user": "john", "last_name": "doe"}
]
}
Parameters¶
In the define section of the CSV preprocessor, the following parameters may be set for CSV reading:
delimiter: (default: ",")
escapechar: escape character
doublequote: allow doublequote (default: true)
lineterminator: line terminator character, either \n or \r (default is the operating system line separator)
quotechar: default quote character (default: "\"")
quoting: type of quoting
skipinitialspace: skip initial space (default: false)
strict: strict mode (default: false)
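A minimal sketch of a CSV preprocessor declaration that sets some of these parameters (assuming, as described above, that they are placed directly in the define section; the semicolon delimiter is only an example):
---
define:
  name: CSV preprocessor with a custom delimiter
  type: parser/preprocessor
  function: lmiopar.preprocessor.CSV
  delimiter: ";"
  skipinitialspace: true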
Custom preprocessors¶
Custom preprocessors can be called from the parser; the respective code has to be accessible to the parser microservice through the standard Python import mechanism.
---
define:
name: Demo of the custom Python preprocessor
type: parser/preprocessor
function: mypreprocessors.preprocessor
mypreprocessors is a module (i.e. a folder with __init__.py) that contains a function preprocessor().
The parser specifies a function to call. It uses Python notation, and the module will be imported automatically.
The signature of the function:
def preprocessor(context, event):
...
return event
The preprocessor may (1) modify the event (!EVENT) and/or (2) modify the context (!CONTEXT).
The output of the preprocessor function will be passed to subsequent parsers.
A preprocessor parser doesn't produce parsed events directly.
If the function returns None, the parsing of the event is silently terminated.
If the function raises an exception, the exception will be logged and the event will be forwarded into the unparsed output.
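A minimal sketch of such a custom preprocessor module (mypreprocessors/__init__.py), assuming a hypothetical input where the event arrives as a string in the form "device|message"; adapt accordingly if your events arrive as bytes:
def preprocessor(context, event):
    # Hypothetical example: split a "device|message" payload, keep the device
    # name in the context, and pass only the message part to subsequent parsers.
    device, sep, message = event.partition("|")
    if not sep:
        # No delimiter found: return None to silently terminate parsing of this event.
        return None
    context["device"] = device
    return message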
Chaining of preprocessors¶
Preprocessors can be chained in order to parse more complex input formats. The output (i.e. the event) of the first preprocessor is fed as the input of the second preprocessor (and so on).
For example, the input is a CEF format with Syslog RFC3164 header:
<14>Jan 28 05:51:33 connector-test CEF_PARSED_LOG: CEF:0|Vendor|Product|Version|foobar:1:2|Failed password|Medium| eventId=1234 app=ssh categorySignificance=/Informational/Warning categoryBehavior=/Authentication/Verify
The pipeline contains two preprocessors:
p01_parser.yaml
:
---
define:
name: Preprocessor for Syslog RFC3164 part of the message
type: parser/preprocessor
function: lmiopar.preprocessor.Syslog_RFC3164
p02_parser.yaml
:
---
define:
name: Preprocessor for CEF part of the message
type: parser/preprocessor
function: lmiopar.preprocessor.CEF
and final parser p03_parser.yaml
:
---
define:
name: Finalize by parsing the event into a dictionary
type: parser/cascade
parse:
!DICT
set:
Syslog_RFC3164: !ITEM CONTEXT Syslog_RFC3164
CEF: !ITEM CONTEXT CEF
Message: !EVENT
Output example:
context:
CEF:
Version: 0
DeviceVendor: Vendor
DeviceProduct: Product
DeviceVersion: Version
DeviceEventClassID: 'foobar:1:2'
Name: Failed password
Severity: Medium
eventId: '1234'
app: ssh
categorySignificance: /Informational/Warning
categoryBehavior: /Authentication/Verify
Syslog_RFC3164:
PRI: 14
FACILITY: 1
PRIORITY: 6
HOSTNAME: connector-test
TAG: CEF_PARSED_LOG
TIMESTAMP: '2020-01-28T05:51:33.000Z'
Message: ''
Cisco ASA built-in preprocessor¶
function: lmiopar.preprocessor.CiscoASA
Warning
This preprocessor will be replaced by an SP-Lang-based parser.
Standard Enricher¶
Standard Enricher enriches the parsed event with additional fields. The enrichment takes place after one of the cascade parsers within the parsing group successfully matched and parsed the original input.
Example¶
---
define:
name: Example of standard enricher
type: enricher/standard
field_alias: field_alias.default
enrich:
- !DICT
with: !EVENT
set:
myEnrichedField:
!LOWER
what: "You Have Been Enriched"
Section define¶
This section contains the common definition and meta data.
Item name¶
Shorter human-readable name of the enricher's declaration.
Item type¶
The type of this declaration, must be enricher/standard.
Item field_alias¶
Name of the field alias lookup to be loaded, so that alias names of event attributes can be used in the declaration alongside their canonical names.
Item description (optional)¶
Longer, possibly multiline, human-readable description of the declaration.
Section enrich¶
This section specifies the actual enrichment of the incoming event. It expects a dictionary to be returned.
Typical statements in the enrich section¶
The !DICT statement allows adding fields/attributes to the already parsed event.
Testing of parsers¶
It is important to test parsers to verify their functionality with various inputs. LogMan.io offers tools for manual and automated testing of parsers.
LogMan.io Parse Utility¶
This utility is meant for manual execution of parsers from the command line. It is useful for testing since it applies selected parser groups to the input and stores unprocessed events in a dedicated file, so that the parser can be improved until this "unparsed" output is empty. It is designed for parsing very large inputs.
The parse utility is a command-line program. It is started by the following command:
python3 ./parse.py -i input-syslog.txt -u unparsed-syslog.txt ./example/syslog_rfc5424-parser
-i, --input-file specifies the file with input lines for parsing.
-u, --unparsed-file specifies the file in which to store the unparsed events from the input.
These options are followed by the parser group(s) from a library, from which the declarative parsers are loaded.
The utility runs the parsing on the given input file with records separated by new lines, such as:
Feb 5 10:50:01 192.168.1.1 %ASA-1-105043 test1
Feb 5 10:55:10 192.168.1.1 %ASA-1-105043 test2
Feb 10 8:25:00 192.168.A1.1 X %ASA-1-105044 test3
and produces a file with only the unparsed events, which has the same structure:
Feb 10 8:25:00 192.168.A1.1 X %ASA-1-105044 test3
Parser Unit test¶
The LogMan.io Parser provides a tool for unit test execution over the library of parser and enricher declarations.
To start:
python3 ./test.py ./example [--config ./config.json]
The tool searches for tests in the library, loads them, and then executes them in order.
Format of unit tests¶
A unit test file has to be placed in the test directory and its name has to comply with the test*.yaml template. One YAML test file can contain one or more YAML documents with a test specification.
---
input: |
line 1
line 2
...
groups:
# This means that everything from input will be parsed
unparsed: []
parsed:
- msg: line
num: 1
- msg: line
num: 2
Extending a parser's pipeline¶
We ship LogMan.io Library with standard parsers organized into pre-defined groups. However, sometimes you will want to extend the parsing process with custom parsers or enrichers.
Consider the following input event, to be parsed with parsers from the LogMan.io Library with the group ID lmio_parser_default_syslog_rfc3164:
<163>Feb 22 14:12:56 vmhost01 2135: ERR042: Something went wrong.
Such an event will be parsed into a structured event that looks like this:
{
"@timestamp": 1614003176,
"ecs.version": "1.6.0",
"event.kind": "event",
"event.dataset": "syslog.rfc3164",
"message": "ERR042: Something went wrong.\n",
"host.name": "vmhost01",
"tenant": "default",
"log.syslog.priority": 163,
"log.syslog.facility.code": 20,
"log.syslog.severity.code": 3,
"event.ingested": 1614004510.4724128,
"_s": "SzOe",
"_id": "[ID]",
"log.original": "<163>Feb 22 14:12:56 vmhost01 2135: ERR042: Something went wrong.\n"
}
The input event, however, contains another keyword of interest, the error code "ERR042", which is not part of the structured event. We can extract this value into a custom field of the structured event by adding an enricher (a type of parser) that slices the "message" part of the event and picks up the error code.
Locate The Parsers Group To Extend¶
In the example above, we use parsers with the group ID lmio_parser_default_syslog_rfc3164. So let's navigate to this group's folder in the LogMan.io Library:
$ cd /opt/lmio-ecs # ... or your other location of lmio-ecs
$ cd syslog_rfc3164-parser
Create A New Declaration File¶
By default, with no extensions, there are these files in the parsers group's folder:
$ ls -l
p01-parser.yaml p02-parser.yaml
These files contain parsers' declarations.
For a declaration of the new enricher, create the file e01-enricher.yaml.
- The "e" stands for "enricher"
- The "01" stands for the priority this enricher will be given
- The "-enricher" can be replaced with anything meaningful to you
- "yaml" is the mandatory extension
Add Contents To The Declaration File¶
Define¶
The Declaration is a YAML file with a YAML header (empty in our case) and a mandatory definition block. We are adding a standard enricher with the name "Error Code Enricher".
Append the following to the declaration file:
---
define:
name: Error Code Enricher
type: enricher/standard
Predicate¶
We want our enricher to be applied to selected messages only, so we need to declare a Predicate using the declarative language.
Let's apply the enrichment to messages from host vmhost01
.
Append the following to the declaration file:
predicate:
!EQ
- !ITEM EVENT host.name
- "vmhost01"
Enrich¶
Looking at the "message" of the example event, we want to split the message by colons, take the value of the first item of results and store it as "error.code" (or another ECS field).
We can achieve that again with declarative language.
Append the following to the declaration file:
enrich:
!DICT
with: !EVENT
set:
error.code: !CUT
what: !ITEM EVENT message
delimiter: ':'
field: 0
The resulting event passed to the parsers pipeline will consist of all fields from the original event plus one additional field, "error.code", the value of which is the result of !CUTting the "message" field from the original event (!ITEM EVENT message) using : as the delimiter and picking up the item at index 0.
This is what the contents of e01-enricher.yaml look like as a result:
---
define:
name: Error Code Enricher
type: enricher/standard
predicate:
!EQ
- !ITEM EVENT host.name
- "vmhost01"
enrich:
!DICT
with: !EVENT
set:
error.code: !CUT
what: !ITEM EVENT message
delimiter: ':'
field: 0
Apply changes¶
The new declaration should be kept in version control. The lmio-parser instance that uses the parsers' group ID must be restarted.
Conclusion¶
We added a new enricher into the parsers pipeline of the lmio_parser_default_syslog_rfc3164 group.
New events from the host vmhost01 will now be parsed and enriched, resulting in this output event:
{
"@timestamp": 1614003176,
"ecs.version": "1.6.0",
"event.kind": "event",
"event.dataset": "syslog.rfc3164",
"message": "ERR042: Something went wrong.\n",
"host.name": "vmhost01",
"tenant": "default",
"log.syslog.priority": 163,
"log.syslog.facility.code": 20,
"log.syslog.severity.code": 3,
"event.ingested": 1614004510.4724128,
"_s": "SzOe",
"_id": "[ID]",
"log.original": "<163>Feb 22 14:12:56 vmhost01 2135: ERR042: Something went wrong.\n",
"error.code": "ERR042"
}
Ended: Parser
Depositor ↵
LogMan.io Depositor¶
TeskaLabs LogMan.io Depositor is a microservice responsible for storing events in Elasticsearch and setting up Elasticsearch artifacts (like index templates and ILM policies) based on event lane declarations. LogMan.io Depositor stores the successfully parsed or correlated events and other events in their proper Elasticsearch indices.
Note
LogMan.io Depositor replaces LogMan.io Dispatcher.
Important notes¶
Prerequisites and configuration¶
- Depositor requires a specific Elasticsearch setting with node roles provided, see Prerequisites
- Depositor's default lifecycle policy requires node roles to be set in Elasticsearch's configuration, see Prerequisites
- Depositor by default stops sending data to Elasticsearch if cluster health is below 50 %, see Configuration
- Depositor considers all event lane files regardless of whether they are disabled for the given tenant in the UI
Index management¶
- Depositor creates its own index template and lifecycle policy (ILM) for each index specified in the events and others sections within the event lane declaration, see Event Lane
- Depositor's default index template has 6 shards and 1 replica
- The field mapping (the types of the fields) in the index template is based on the schema, which by default is /Schemas/ECS.yaml, unless specified in the configuration or event lane, see Event Lane
Lifecycle details¶
- Depositor's default lifecycle policy has a limit of 16 GB per primary shard per index (the default maximum index size is thus 6 shards * 16 GB * 2 for the replica = 192 GB)
- Depositor's default lifecycle policy has shrinking enabled when entering the warm phase
- Depositor's default lifecycle policy deletes data after 180 days
Migration¶
- When migrating from LogMan.io Dispatcher to LogMan.io Depositor, see the Migration section
Depositor prerequisites¶
LogMan.io Depositor has the following dependencies:
- Elasticsearch
- Apache ZooKeeper
- Apache Kafka
- LogMan.io Library with an /EventLanes folder and a schema in the /Schemas folder
Elasticsearch configuration¶
The Elasticsearch cluster needs to be configured in the following way in order for LogMan.io Depositor to work properly.
The following is a Docker Compose entry of Elasticsearch nodes when using a 3-node cluster architecture with lm1, lm2, and lm3 server nodes.
Note
Please note that, in the Docker Compose file, the proper node roles are assigned to Elasticsearch nodes based on the ILM. For example, hot nodes for the ILM hot phase must contain the node roles data_hot and data_content.
When creating Docker Compose records for Elasticsearch nodes, the following attributes must be changed:
- NODE_ID: The name of the server where the Elasticsearch instance is running
- INSTANCE_ID: The name of the Elasticsearch instance. Make sure its postfix -1 is changed to -2 at the second instance of this service, etc. INSTANCE_ID is thus a unique identifier for each of the instances.
- network.host: The name of the server where the Elasticsearch instance is running
- node.attr.rack_id: The name of the server rack (for large deployments) or the name of the server where the Elasticsearch instance is running
- discovery.seed_hosts: The server host names and ports of all Elasticsearch master nodes
- xpack.security.transport.ssl.certificate: The path to the certificate specific for the given Elasticsearch instance
- xpack.security.transport.ssl.key: The path to the certificate key specific for the given Elasticsearch instance
- volumes: The path to the given Elasticsearch instance's data
elasticsearch-master-1:
network_mode: host
user: "1000:1000"
image: docker.elastic.co/elasticsearch/elasticsearch:7.17.9
environment:
- NODE_ID=lm1
- SERVICE_ID=elasticsearch
- INSTANCE_ID=elasticsearch-master-1
- network.host=lm1 # (1)
- node.attr.rack_id=lm1 # (2)
- node.name=elasticsearch-master-1
- node.roles=master,ingest
- cluster.name=lmio-es # (3)
- cluster.initial_master_nodes=elasticsearch-master-1,elasticsearch-master-2,elasticsearch-master-3 # (6)
- discovery.seed_hosts=lm1:9300,lm2:9300,lm3:9300
- http.port=9200
- transport.port=9300 # (4)
- "ES_JAVA_OPTS=-Xms4g -Xmx4g" # (5)
- ELASTIC_PASSWORD=$ELASTIC_PASSWORD
- xpack.security.enabled=true
- xpack.security.transport.ssl.enabled=true
- xpack.security.transport.ssl.verification_mode=certificate
- xpack.security.transport.ssl.certificate_authorities=certs/ca/ca.crt
- xpack.security.transport.ssl.certificate=certs/elasticsearch-master-1/elasticsearch-master-1.crt
- xpack.security.transport.ssl.key=certs/elasticsearch-master-1/elasticsearch-master-1.key
volumes:
- /data/ssd/elasticsearch/elasticsearch-master-1/data:/usr/share/elasticsearch/data
- ./elasticsearch/certs:/usr/share/elasticsearch/config/certs
restart: always
1. The node will bind to the public address and will also use it as its publish address.
2. Rack ID or datacenter name. This is meant for ES to effectively and safely manage replicas. For smaller installations, a hostname is fine.
3. The name of the Elasticsearch cluster. There is only one Elasticsearch cluster in LogMan.io.
4. Ports for internal communication among nodes.
5. Memory allocated by this Elasticsearch instance. 31 GB is the maximum recommended value, and the server node must have adequate memory available (if there are three Elasticsearch nodes with 31 GB and one master with 4 GB, there must be at least 128 GB available).
6. Initial master nodes are the instance IDs of all Elasticsearch master nodes available in the LogMan.io cluster. The master nodes' names must be aligned with node.name. In LogMan.io (as defined by Maestro), it is the same as INSTANCE_ID.
elasticsearch-hot-1:
network_mode: host
user: "1000:1000"
image: docker.elastic.co/elasticsearch/elasticsearch:7.17.9
depends_on:
- es-master
environment:
- NODE_ID=lm1
- SERVICE_ID=elasticsearch
- INSTANCE_ID=elasticsearch-hot-1
- network.host=lm1 # (1)
- node.attr.rack_id=lm1 # (2)
- node.attr.data=hot # (3)
- node.name=elasticsearch-hot-1
- node.roles=data_hot,data_content # (6)
- cluster.name=lmio-es # (4)
- cluster.initial_master_nodes=elasticsearch-master-1,elasticsearch-master-2,elasticsearch-master-3 # (8)
- discovery.seed_hosts=lm1:9300,lm2:9300,lm3:9300
- http.port=9201
- transport.port=9301 # (5)
- "ES_JAVA_OPTS=-Xms31g -Xmx31g" # (7)
- ELASTIC_PASSWORD=$ELASTIC_PASSWORD
- xpack.security.enabled=true
- xpack.security.transport.ssl.enabled=true
- xpack.security.transport.ssl.verification_mode=certificate
- xpack.security.transport.ssl.certificate_authorities=certs/ca/ca.crt
- xpack.security.transport.ssl.certificate=certs/elasticsearch-hot-1/elasticsearch-hot-1.crt
- xpack.security.transport.ssl.key=certs/elasticsearch-hot-1/elasticsearch-hot-1.key
volumes:
- /data/ssd/elasticsearch/elasticsearch-hot-1/data:/usr/share/elasticsearch/data
- ./elasticsearch/certs:/usr/share/elasticsearch/config/certs
1. The node will bind to the public address and will also use it as its publish address.
2. Rack ID or datacenter name. This is meant for ES to effectively and safely manage replicas. For smaller installations, a hostname is fine.
3. The node.attr.data attribute is in the configuration for backward compatibility with legacy ILM, where custom allocation by node.attr.data is used. This applies to installations of LogMan.io before 01/2024.
4. The name of the Elasticsearch cluster. There is only one Elasticsearch cluster in LogMan.io.
5. Ports for internal communication among nodes.
6. Node roles are here for ILM default allocation to work properly.
7. Memory allocated by this Elasticsearch instance. 31 GB is the maximum recommended value, and the server node must have adequate memory available (if there are three Elasticsearch nodes with 31 GB and one master with 4 GB, there must be at least 128 GB available).
8. Initial master nodes are the instance IDs of all Elasticsearch master nodes available in the LogMan.io cluster. The master nodes' names must be aligned with node.name. In LogMan.io (as defined by Maestro), it is the same as INSTANCE_ID.
elasticsearch-warm-1:
network_mode: host
user: "1000:1000"
image: docker.elastic.co/elasticsearch/elasticsearch:7.17.9
depends_on:
- es-master
environment:
- NODE_ID=lm1
- SERVICE_ID=elasticsearch
- INSTANCE_ID=elasticsearch-warm-1
- network.host=lm1 # (1)
- node.attr.rack_id=lm1 # (2)
- node.attr.data=warm # (3)
- node.name=elasticsearch-warm-1
- node.roles=data_warm # (6)
- cluster.name=lmio-es # (4)
- cluster.initial_master_nodes=elasticsearch-master-1,elasticsearch-master-2,elasticsearch-master-3 # (8)
- discovery.seed_hosts=lm1:9300,lm2:9300,lm3:9300
- http.port=9202
- transport.port=9302 # (5)
- "ES_JAVA_OPTS=-Xms31g -Xmx31g" # (7)
- ELASTIC_PASSWORD=$ELASTIC_PASSWORD
- xpack.security.enabled=true
- xpack.security.transport.ssl.enabled=true
- xpack.security.transport.ssl.verification_mode=certificate
- xpack.security.transport.ssl.certificate_authorities=certs/ca/ca.crt
- xpack.security.transport.ssl.certificate=certs/elasticsearch-warm-1/elasticsearch-warm-1.crt
- xpack.security.transport.ssl.key=certs/elasticsearch-warm-1/elasticsearch-warm-1.key
volumes:
- /data/hdd/elasticsearch/elasticsearch-warm-1/data:/usr/share/elasticsearch/data
- ./elasticsearch/certs:/usr/share/elasticsearch/config/certs
1. The node will bind to the public address and will also use it as its publish address.
2. Rack ID or datacenter name. This is meant for ES to effectively and safely manage replicas. For smaller installations, a hostname is fine.
3. The node.attr.data attribute is in the configuration for backward compatibility with legacy ILM, where custom allocation by node.attr.data is used. This applies to installations of LogMan.io before 01/2024.
4. The name of the Elasticsearch cluster. There is only one Elasticsearch cluster in LogMan.io.
5. Ports for internal communication among nodes.
6. Node roles are here for ILM default allocation to work properly.
7. Memory allocated by this Elasticsearch instance. 31 GB is the maximum recommended value, and the server node must have adequate memory available (if there are three Elasticsearch nodes with 31 GB and one master with 4 GB, there must be at least 128 GB available).
8. Initial master nodes are the instance IDs of all Elasticsearch master nodes available in the LogMan.io cluster. The master nodes' names must be aligned with node.name. In LogMan.io (as defined by Maestro), it is the same as INSTANCE_ID.
elasticsearch-cold-1:
network_mode: host
user: "1000:1000"
image: docker.elastic.co/elasticsearch/elasticsearch:7.17.9
depends_on:
- es-master
environment:
- NODE_ID=lm1
- SERVICE_ID=elasticsearch
- INSTANCE_ID=elasticsearch-cold-1
- network.host=lm1
- node.attr.rack_id=lm1 # (2)
- node.attr.data=cold # (3)
- node.name=elasticsearch-cold-1
- node.roles=data_cold # (6)
- cluster.name=lmio-es # (4)
- cluster.initial_master_nodes=elasticsearch-master-1,elasticsearch-master-2,elasticsearch-master-3 # (8)
- discovery.seed_hosts=lm1:9300,lm2:9300,lm3:9300
- http.port=9203
- transport.port=9303 # (5)
- "ES_JAVA_OPTS=-Xms31g -Xmx31g" # (7)
- ELASTIC_PASSWORD=$ELASTIC_PASSWORD
- xpack.security.enabled=true
- xpack.security.transport.ssl.enabled=true
- xpack.security.transport.ssl.verification_mode=certificate
- xpack.security.transport.ssl.certificate_authorities=certs/ca/ca.crt
- xpack.security.transport.ssl.certificate=certs/elasticsearch-cold-1/elasticsearch-cold-1.crt
- xpack.security.transport.ssl.key=certs/elasticsearch-cold-1/elasticsearch-cold-1.key
volumes:
- /data/hdd/elasticsearch/elasticsearch-cold-1/data:/usr/share/elasticsearch/data
- ./elasticsearch/certs:/usr/share/elasticsearch/config/certs
1. The node will bind to the public address and will also use it as its publish address.
2. Rack ID or datacenter name. This is meant for ES to effectively and safely manage replicas. For smaller installations, a hostname is fine.
3. The node.attr.data attribute is in the configuration for backward compatibility with legacy ILM, where custom allocation by node.attr.data is used. This applies to installations of LogMan.io before 01/2024.
4. The name of the Elasticsearch cluster. There is only one Elasticsearch cluster in LogMan.io.
5. Ports for internal communication among nodes.
6. Node roles are here for ILM default allocation to work properly.
7. Memory allocated by this Elasticsearch instance. 31 GB is the maximum recommended value, and the server must have adequate memory available (if there are three Elasticsearch nodes with 31 GB and one master with 4 GB, there must be at least 128 GB available).
8. Initial master nodes are the instance IDs of all Elasticsearch master nodes available in the LogMan.io cluster. The master nodes' names must be aligned with node.name. In LogMan.io (as defined by Maestro), it is the same as INSTANCE_ID.
Index templates¶
LogMan.io Depositor creates its own index templates for the events index from the event lane's elasticsearch configuration, adding the postfix -template. All previous index templates, if present, must have a different name and their priority set to 0.
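For example, an events index alias named lmio-tenant-events-eventlane (the naming used in the Migration section below) results in an index template named lmio-tenant-events-eventlane-template.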
Depositor configuration¶
Configuration sample¶
This is the most basic configuration required for LogMan.io Depositor:
[zookeeper]
servers=zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
[library]
providers=zk:///library
[kafka]
bootstrap_servers=kafka-1:9092,kafka-2:9092,kafka-3:9092
[elasticsearch]
url=http://es01:9200
Zookeeper¶
Specify the locations of the ZooKeeper servers in the cluster.
[zookeeper]
servers=zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
Hint
For non-production deployments, the use of a single Zookeeper server is possible.
Library¶
Specify the path(s) to the library to load declarations from.
[library]
providers=zk:///library
Hint
Since the ECS.yaml schema in /Schemas is utilized by default, consider using the LogMan.io Common Library.
Kafka¶
Specify bootstrap servers of the Kafka cluster.
[kafka]
bootstrap_servers=kafka-1:9092,kafka-2:9092,kafka-3:9092
Hint
For non-production deployments, the use of a single Kafka server is possible.
Elasticsearch¶
Specify URLs of Elasticsearch master nodes.
The ESConnection section is used for setting advanced parameters of the connection (see below).
The elasticsearch section is used for storing URLs and authorization items.
The asab:storage section is used to explicitly allow the storage initialization.
[asab:storage]
type=elasticsearch
[connection:ESConnection]
precise_error_handling=true
bulk_out_max_size=1582912
output_queue_max_size=5
loader_per_url=1
cluster_status_throttle=red
cluster_status_unthrottle=green
active_shards_percent_throttle=50
retry_errors=unavailable_shards_exception
throttle_errors=circuit_breaking_exception
[elasticsearch]
url=http://es01:9201
username=MYUSERNAME
password=MYPASSWORD
Hint
The URL should point to the Elasticsearch hot node on the same server the Depositor is deployed to.
Elasticsearch Connection Advanced Settings¶
precise_error_handling¶
Specifies that Elasticsearch should return, together with the error, information about which events caused the issue.
bulk_out_max_size¶
Size of a single bulk being sent to Elasticsearch in bytes.
Events are grouped into bulks to lower the number of requests sent to Elasticsearch.
output_queue_max_size¶
Maximum queue size for Elasticsearch bulks.
If the number is exceeded, the given pipeline is throttled.
loader_per_url¶
Number of tasks/loaders per URL. It specifies the number of requests which can be sent simultaneously to every URL specified in the URL attribute.
cluster_status_throttle¶
The state the cluster must enter in order to stop/throttle the Depositor. Can be set to none.
Default: red
Options: red, yellow, none
cluster_status_unthrottle¶
The state the cluster must enter in order to resume/unthrottle the Depositor, if throttled.
Default: green
Options: red, yellow, green
active_shards_percent_throttle¶
The minimum percentage of total shards that must be active/available in order for Depositor to send events.
The value, which is a percentage, should be set to 100 / (number of replicas + 1).
Default: 50
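For example, with the default of 1 replica, the recommended value is 100 / (1 + 1) = 50, which matches the default above; with 2 replicas the threshold would be roughly 33.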
retry_errors¶
A comma-separated list of retryable errors which cause the event to be retried, i.e. sent again to the events index specified in the event lane.
Note
This configuration option is unnecessary in most cases, so it is recommended to exclude it from your configuration file.
throttle_errors¶
A comma-separated list of errors that cause the Depositor to be throttled until they are resolved.
Note
This configuration option is unnecessary in most cases, so it is recommended to exclude it from your configuration file.
Declarations¶
Optional section to specify where to load the declarations of event lanes from and which schema is going to be used by default (if it is not specified in the given event lane declaration).
[declarations]
path=/EventLanes/
schema=/Schemas/ECS.yaml
Hint
Make sure to change the schema if you are using a schema other than ECS in your deployment as your default. Changing the path for event lanes is discouraged.
Event Lanes¶
Relation to LogMan.io Depositor¶
TeskaLabs LogMan.io Depositor reads all event lanes from the library and creates Kafka-to-Elasticsearch pipelines based on the kafka and elasticsearch sections.
Note
All deployed instances of TeskaLabs LogMan.io Depositor share the same Group ID within Kafka. This means that all depositors reading all event lanes will distribute the Kafka partitions among themselves and thus provide scalability natively.
Declaration¶
This example is the most basic event lane definition possible, located in the /EventLanes folder in the library:
---
define:
type: lmio/event-lane
kafka:
events:
topic: events-default
others:
topic: others-default
elasticsearch:
events:
index: lmio-default-events
others:
index: lmio-default-others
When Depositor is started and the event lane is loaded, Depositor creates two pipelines, one for events and the other for others. The input is specified in the kafka section, while the output index alias is specified in the elasticsearch section. Elasticsearch then automatically maps the alias name to the proper index name ending with a number such as -000001.
Warning
Complex event lanes need custom declarations. Unlike Depositor's predecessor Dispatcher, Depositor does not natively read from the events-complex Kafka topic.
Note
Depositor considers ALL event lane files regardless of whether they are disabled for the given tenant in the UI. Depositor is not a tenant-specific service.
Index template¶
When Depositor is started, and then periodically every ten minutes, it creates an index template in Elasticsearch for the given event lane. The mappings in the index template are based on the default schema, which is /Schemas/ECS.yaml or another schema specified in the Depositor's configuration.
The default schema path can be overridden in the event lane by specifying the schema attribute in the define section:
---
define:
type: lmio/event-lane
schema: /Schemas/CEF.yaml
kafka:
...
elasticsearch:
...
It is also possible to specify number_of_shards and number_of_replicas in the settings section within elasticsearch:
---
define:
type: lmio/event-lane
schema: /Schemas/CEF.yaml
kafka:
...
elasticsearch:
...
events:
...
settings:
number_of_shards: 6
number_of_replicas: 1
The default number_of_shards is 6 and the default number_of_replicas is 1.
Note
Please consider carefully before changing the default settings and schema. Changing the defaults usually causes issues, such as detection rules not matching for an event lane that uses a different schema.
Warning
Changes to the index template will only take effect after the next index rollover if an index already exists in Elasticsearch.
Lifecycle Policy¶
When Depositor is started, and then periodically every ten minutes, it refreshes the Index Lifecycle Policy in Elasticsearch for the given event lane.
Default¶
The default lifecycle policy contains four phases: hot, warm, cold, and delete.
The default hot phase for the given index ends (the index rolls over) when the primary shard size exceeds 16 GB or the index is older than 7 days.
The default warm phase for the given index starts when the hot phase ends (see the min_age setting in the policy below) and turns on shrinking.
The default cold phase for the given index starts after 14 days.
The delete phase deletes the index after 180 days.
---
define:
type: lmio/event-lane
schema: /Schemas/CEF.yaml
kafka:
...
elasticsearch:
...
events:
...
lifecycle:
hot:
min_age: "0ms"
actions:
rollover:
max_primary_shard_size: "16gb"
max_age: "7d"
set_priority:
priority: 100
warm:
min_age: "3d"
actions:
shrink:
number_of_shards: 1
set_priority:
priority: 50
cold:
min_age: "14d"
actions:
set_priority:
priority: 0
delete:
min_age: "180d"
actions:
delete:
delete_searchable_snapshot: true
Custom¶
The default ILM can be changed, although it is not recommended in most cases. You can do so by specifying the lifecycle section within the event lane's elasticsearch section:
---
define:
type: lmio/event-lane
schema: /Schemas/CEF.yaml
kafka:
...
elasticsearch:
...
events:
...
lifecycle:
hot:
min_age: "0ms"
actions:
rollover:
max_primary_shard_size: "25gb" # We want bigger primary shards than default
max_age: "7d"
set_priority:
priority: 100
warm:
min_age: "7d"
actions:
shrink:
number_of_shards: 1
set_priority:
priority: 50
cold:
min_age: "14d"
actions:
set_priority:
priority: 0
# There is no delete phase
Index¶
When Depositor is started, and then periodically every ten minutes, Depositor checks whether the indices for the given aliases from the events and others sections within elasticsearch exist.
If these indices are absent, Depositor creates a new index ending with -000001, enables writing to it, and assigns the alias.
If the indices already exist, Depositor takes no action.
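For example, for the alias lmio-default-events from the basic declaration above, Depositor would create the index lmio-default-events-000001, enable writing to it, and assign the alias to it.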
Dispatcher migration to Depositor¶
The migration from LogMan.io Dispatcher to LogMan.io Depositor needs to be done one event lane at a time following the steps mentioned below.
Warning
Before starting the migration, you must follow the Prerequisites, making sure to properly configure node roles for Elasticsearch nodes in the cluster.
Migration steps¶
Select one event lane to be migrated and follow this guide:
1. In Kibana, go to Management > Stack Management, then Index Management. Click on Index Templates and find the index template associated with the event lane being migrated. Usually, the name is in the format lmio-tenant-events-eventlane-template. In the Actions column (three dots) on the right, click on Clone.
2. In Clone, change the Name to backup-lmio-tenant-events-eventlane-template and set the Priority to 0.
3. Go to Review template and click Create template.
4. Check that the backup-lmio-tenant-events-eventlane-template template exists in the Index Templates tab.
5. Delete the original lmio-tenant-events-eventlane-template and keep only the backup you just created.
6. Go to the LogMan.io UI, to the Library section and the /EventLanes folder.
7. If the event lane file does not exist already, create the new event lane file with the name fortigate.yaml (replace "fortigate" with your event lane name) in the /EventLanes/tenant folder (replace "tenant" with the name of your tenant). If the /EventLanes/tenant folder does not exist already, you need to create it in the ZooKeeper UI.
8. Create the kafka and elasticsearch sections for the given event lane with both events and others sections specified (see Event Lane). The default schema for field mapping is /Schemas/ECS.yaml, unless specified in the event lane.
9. If not deployed already, deploy LogMan.io Depositor with the kafka, elasticsearch, zookeeper, and library sections specified (see Configuration).
10. Check LogMan.io Depositor logs for warnings. Please check both Docker logs and file logs (if file logs are configured). The Docker logs can be accessed via the following command:
docker logs -f -n 1000 <lmio-depositor>
Replace <lmio-depositor> with the LogMan.io Depositor Docker container name in your deployment.
11. In Kibana, go to Management > Stack Management, then Index Management, and check that the new lmio-tenant-events-eventlane-template and lmio-tenant-others-template index templates were created by Depositor. Click on the index template and check its settings and mappings. The default settings include 6 shards and 1 replica (see Event Lane).
12. In Kibana, go to Management > Stack Management, then Index Lifecycle Policies, and check that lmio-tenant-events-eventlane-ilm and lmio-tenant-others-ilm were created. Click on their names to check the hot, warm, cold, and delete phase settings.
13. If LogMan.io is not deployed or configured for this purpose already, deploy or configure Parsec to send data to the Kafka event topic specified in the event lane declaration (here: fortigate.yaml). Please see the Parsec Configuration section.
14. In Kibana, go to Management > Dev Tools and run an index rollover, replacing tenant and eventlane with the name of your tenant and your event lane:
POST /lmio-tenant-eventlane/_rollover
15. Check that the new index written in the response (in the box on the right side of the screen) was created. Go to Management > Stack Management, then Index Management, to the Indices tab, and find the index lmio-tenant-events-eventlane-0000x.
16. Click on lmio-tenant-events-eventlane-0000x, check that it is connected to the proper lifecycle policy, which should be lmio-tenant-events-eventlane-ilm, and also check that the Current phase is hot. Then, click on Settings and Mappings to check the number of shards (default is 6) and the field mapping that is loaded from the schema. The default schema is /Schemas/ECS.yaml, unless specified in the event lane.
17. In Kibana, go to Analysis > Discover and check that the data is coming to the given event lane.
18. In LogMan.io UI, go to Discover and check that the data is coming to the given event lane.
19. Repeat steps 1 to 18 for all remaining event lanes (their events index). Only then can you finish the migration by doing the same procedure for the others indices.
Hint
In the following days, periodically check that all indices are connected to the lifecycle policy (step 16). Also, make sure the indices in the hot phase are allocated to the hot Elasticsearch nodes, which can be seen in Kibana in Management > Stack Monitoring > Indices.
Note
When you can confirm that everything is working properly after a week, you can delete the backup index template backup-lmio-tenant-events-eventlane-template.
Troubleshooting¶
Advice for addressing the most commonly encountered issues:
After index rollover, the data is not coming through¶
After a rollover, Elasticsearch usually takes a few minutes to display the data.
Once a few minutes have passed, check the lmio-tenant-others data in Kibana's Discover or LogMan.io UI's Discover. If there is no related data, check the others.tenant topic in Kafka UI. The error messages in the logs should be specific enough to describe why the data could not be stored in Elasticsearch. It usually means that the wrong schema is being used. See the Event Lane section or step 8 in the Migration section.
For advanced users: When you set the index template priority in backup-lmio-tenant-events-eventlane-template (from step 4 in Migration) from 0 to 2 or more and do an index rollover, this old backup index template will be used to create new indices, while the new index template created by Depositor will be disregarded. This should give you more time to investigate the issue in production environments. Do not forget to lower the priority in backup-lmio-tenant-events-eventlane-template back to 0 afterwards.
The SSD storage is becoming full due to Elasticsearch¶
In this case, you need to adjust the lifecycle policy in the given event lane declaration (for example, fortigate.yaml) to move data from the hot to the warm phase sooner. By default, the data becomes warm in 3 days. For more information on how to set a custom lifecycle policy, see the Event Lane section.
What is the maximum index size, and how can I change it?¶
Depositor's default lifecycle policy has a limit of 16 GB per primary shard per index, so the default maximum index size is 6 shards * 16 GB * 2 for the replica = 192 GB per index.
To change the maximum index size, you need to specify a custom lifecycle policy within the event lane declaration (for example, fortigate.yaml), where you set the max_primary_shard_size attribute. For more information on how to set a custom lifecycle policy, see the Event Lane section.
There is only an lmio-tenant-events index with no number postfix, and it is not linked to any lifecycle policy¶
Every index managed by Depositor must end with -00000x, for instance lmio-tenant-events-eventlane-000001.
If it is not the case, please check both Docker logs and file logs (if file logs are configured). The Docker logs can be accessed via the following command:
docker logs -f -n 1000 <lmio-depositor>
Warning
There is no simple way to rename a preexisting index called lmio-tenant-events with no lifecycle policy to lmio-tenant-events-eventlane-000001. Hence, stop Depositor, delete the index lmio-tenant-events, and then restart Depositor. Always check the logs every time you restart Depositor.
There is an index with "Index lifecycle error" and "no_node_available_exception", what happened?¶
This issue usually happens when Elasticsearch is restarted during the index's shrinking phase. The exact message is then: NoNodeAvailableException[could not find any nodes to allocate index [myindex] onto prior to shrink]
In order to resolve the issue:
1. In Kibana, go to Management > Stack Management > Index management, then Indices, and select your index by clicking on its name.
2. In the index detail, click on Edit settings.
3. Inside, set the following values to null:
index.routing.allocation.require._name: null
index.routing.allocation.require._id: null
4. Click Save.
5. Go to Management > Dev Tools and run the following command, replacing myindex with the name of your index:
POST /myindex/_ilm/retry
6. Go back to Stack Management > Index management, Indices and check that the ILM error has disappeared for the given index.
There are many logs in others, and I cannot find the ones with the interface attribute¶
Kafka Console Consumer can be used to obtain events from multiple topics, here from all topics starting with events.
Next, you can grep the field in quotation marks:
/usr/bin/kafka-console-consumer --bootstrap-server localhost:9092 --whitelist "events.*" | grep '"interface"'
This command gives you all incoming logs with the interface attribute from all events topics.
Others Schema¶
The Others schema specifies the schema for error events that occurred during the parsing or storage process. It is derived from the ECS schema naming.
---
define:
name: Others Schema
type: lmio/schema
fields:
_id:
type: "str"
representation: "base85"
docs: Unique identifier for the event, encoded in base85 format
'@timestamp':
type: "datetime"
docs: Timestamp when the others event occurred (this should be the current time)
event.ingested:
type: "datetime"
docs: Timestamp when the event was ingested
event.created:
type: "datetime"
docs: Timestamp when the event was created
event.original:
type: "str"
elasticsearch:
type: "text"
docs: Original unparsed event message
event.dataset:
type: "str"
docs: Dataset name for the event
error.code:
type: "str"
docs: https://www.elastic.co/guide/en/ecs/current/ecs-error.html#field-error-code
error.id:
type: "str"
docs: Unique identifier for the error
error.message:
type: "text"
elasticsearch:
type: "text"
docs: Error message details
error.stack_trace:
type: "text"
elasticsearch:
type: "text"
docs: Stack trace information for the error
error.type:
type: "str"
docs: Type of error encountered
tenant:
type: "str"
docs: Identifier for the tenant
Depositor internals¶
Translation table¶
The translation table from SP-Lang types to Elasticsearch types is part of the Depositor repository and container in lmiodepositor/elasticsearchartifacts/translation_table.json.
.
Datetime conversion¶
Depositor checks for fields in the events that have the type datetime in the schema.
If the value of such a field is an SP-Lang datetime integer, the date is transformed into an ISO format suitable for Elasticsearch.
However, before deployment, make sure the translation table mentioned above contains the proper format definition.
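A minimal sketch of such a conversion, assuming for illustration only that the SP-Lang datetime value is an epoch-based integer in nanoseconds (the actual representation and target format are defined by SP-Lang and the translation table):
import datetime

def to_elasticsearch_iso(value_ns: int) -> str:
    # Illustrative helper: convert an epoch-based integer (assumed to be in
    # nanoseconds here) into an ISO 8601 string accepted by Elasticsearch.
    dt = datetime.datetime.fromtimestamp(value_ns / 1e9, tz=datetime.timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.%f") + "Z"

print(to_elasticsearch_iso(1614003176000000000))  # 2021-02-22T14:12:56.000000Z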
Ended: Depositor
Baseliner ↵
LogMan.io Baseliner¶
TeskaLabs LogMan.io Baseliner is a microservice that, based on declarations, detects deviation from a calculated activity baseline.
A baseline is a set of calculated statistical metrics about the activity of a given entity for a given time period.
Examples of entities include: user, device, host server, dataset etc.
Note
If you upgrade Baseliner to v24.07-beta or newer, the Baseliner will start the learning phase again. This is required due to the changed approach to the database. Subsequent updates from v24.07-beta will not require the learning phase to start again.
LogMan.io Baseliner configuration¶
LogMan.io Baseliner requires the following dependencies:
- Apache ZooKeeper
- NGINX (for production deployments)
- Apache Kafka
- MongoDB with the /data/db folder mapped to SSD (/data/ssd/mongo/data)
- Elasticsearch
- SeaCat Auth
- LogMan.io Library with a /Baselines folder and a schema in the /Schemas folder
MongoDB data folder location
When using Baseliner, MongoDB MUST have its data folder located on SSD/fast drive, not HDD. Having the MongoDB data folder located on HDD will result in all services using MongoDB slowing down.
Example¶
This is the most basic configuration required for each instance of LogMan.io Baseliner:
[declarations]
# The /Baselines is a default path
groups=/Baselines
[tenants]
ids=default
[pipeline:BaselinerPipeline:KafkaSource]
topic=^events.tenant.*
[pipeline:OutputPipeline:KafkaSink]
topic=complex.tenant
[zookeeper]
servers=zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
[library]
providers=zk:///library
[kafka]
bootstrap_servers=kafka-1:9092,kafka-2:9092,kafka-3:9092
[elasticsearch]
url=http://es01:9200/
[mongodb.storage]
mongodb_uri=mongodb://mongodb1,mongodb2,mongodb3/?replicaSet=rs0
mongodb_database=baseliners
[auth]
multitenancy=yes
public_keys_url=http://localhost:8081/openidconnect/public_keys
Zookeeper¶
Specify the locations of the ZooKeeper servers in the cluster:
[zookeeper]
servers=zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
Hint
For non-production deployments, the use of a single Zookeeper server is possible.
Library¶
Specify the path(s) to the Library to load declarations from:
[library]
providers=zk:///library
Hint
Since the ECS.yaml schema in /Schemas is utilized by default, consider using the LogMan.io Common Library.
Kafka¶
Define the Kafka cluster's bootstrap servers:
[kafka]
bootstrap_servers=kafka-1:9092,kafka-2:9092,kafka-3:9092
Hint
For non-production deployments, the use of a single Kafka server is possible.
ElasticSearch¶
Specify URLs of Elasticsearch master nodes.
Elasticsearch is necessary for using lookups, e.g. as a !LOOKUP expression or a lookup trigger.
[elasticsearch]
url=http://es01:9200
username=MYUSERNAME
password=MYPASSWORD
MongoDB¶
Specify the URL of the MongoDB cluster with a replica set.
MongoDB stores the baselines and counters of incoming events.
[mongodb.storage]
mongodb_uri=mongodb://mongodb1,mongodb2,mongodb3/?replicaSet=rs0
mongodb_database=baseliners
Auth¶
The Auth section enables multitenancy, restricting baseline access to only users with access to the specified tenant:
[auth]
multitenancy=yes
public_keys_url=http://localhost:8081/openidconnect/public_keys
Input¶
The events for the baselines are read from the Kafka topics:
[pipeline:BaselinerPipeline:KafkaSource]
topic=^events.tenant.*
Declarations (optional)¶
Define the path for baseline declarations. The default path is /Baselines, and the default fallback schema is /Schemas/ECS.yaml.
If you are using a schema other than ECS (the Elastic Common Schema), you can customize the schema path.
[declarations]
groups=/Baselines
schema=/Schemas/ECS.yaml
Tenants¶
Specify the tenant for which to create the baseline. You can list multiple tenants, separating IDs with a comma, but it is recommended to have just one tenant per baseline.
[tenants]
ids=tenant1
tenant_url=http://localhost:8080/tenant
It is recommended to run at least one instance of Baseliner per tenant. In most cases, a single instance per tenant is appropriate.
Output¶
If triggers are utilized, you can change the default topic for the output pipeline:
[pipeline:OutputPipeline:KafkaSink]
topic=complex.tenant
Web APIs¶
The Baseliner provides one web API.
The Web API is designed for communication with the UI.
[web]
listen=0.0.0.0 8999
The default port of the public web API is tcp/8999.
This port is designed to serve as the NGINX upstream for connections from Collectors.
Declarations for defining baselines¶
The declarations for baselines are loaded from the Library, from the folder specified in the configuration, such as /Baselines.
Note
The Baseliner uses /Schemas/ECS.yaml by default, so /Schemas/ECS.yaml must also be present in the Library.
Declaration¶
This is an example of a baseline definition, located in the /Baselines folder in the Library:
---
define:
name: Dataset
description: Creates baseline for each dataset and trigger alarms if the actual number deviates
type: baseliner
baseline:
region: Czech Republic
period: day
learning: 4
classes: [workdays, weekends, holidays]
evaluate:
key: event.dataset
timestamp: "@timestamp"
analyze:
test:
!AND
- !LT
- !ARG VALUE
- 1
- !GT
- !ARG MEAN
- 10.0
trigger:
- event:
# Threat description
# https://www.elastic.co/guide/en/ecs/master/ecs-threat.html
threat.framework: "MITRE ATT&CK"
threat.indicator.sightings: !ITEM EVENT value
threat.indicator.confidence: "High"
threat.indicator.name: !ITEM EVENT dimension
- notification:
type: email
to: ["myemail@example.co"]
template: "/Templates/Email/Notification_baseliner_dimension.md"
variables:
name: "Logs are not coming to the dataset within the given UTC hour."
dimension: !ITEM EVENT dimension
hour: !ITEM EVENT hour
Sections¶
baseline¶
baseline: # (1)
region: Czech Republic #(2)
period: day # (3)
learning: 4 # (4)
classes: [workdays, weekends, holidays] # (5)
- Defines how the given baseline is built.
- Defines in which region the activity is happening (for calculating holidays and so on).
- Defines the timespan for the baseline. The period can be either day or week.
- Defines the number of periods (here, days) that pass from when the baseliner begins to receive input until the user can see the baseline analysis. Additional details below.
- Defines which days of the week to monitor. Classes can include any or all of: workdays, weekends, and holidays.
learning
The learning
field defines the learning phase.
The learning phase is the time from the first occurrence of the dimension value in the input of the Baseliner instance until the point when the baseline is shown to the user and the analysis takes place. In the declaration, learning
is the number of periods. The learning phase is calculated separately for holidays, weekends and working days. Baselines are rebuilt overnight (housekeeping).
In this example, the period
is day
, so learning
is 4 days. Considering the calendar, a learning phase of 4 days beginning on Friday means 4 working days, and thus ends on Wednesday night.
predicate (optional)¶
The predicate
section filters incoming events to be considered as activity in the baseline.
Write filters with TeskaLabs SP-Lang. Visit Predicates or the SP-Lang documentation for details.
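For example, a predicate that restricts the baseline to a single dataset might look like this (a minimal sketch; the dataset value is illustrative):
predicate:
  !EQ
  - !ITEM EVENT event.dataset
  - "microsoft-exchange"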
evaluate¶
This section specifies which attributes from the event are going to be used in the baseline build.
evaluate:
key: event.dataset # (1)
timestamp: "@timestamp" # (2)
- Specifies the attribute/entity to monitor.
- Specifies the attribute in which the time dimension of the event activity is stored.
analyze¶
The test
section in analyze
specifies when to run the trigger, if the actual activity (!ARG VALUE
) deviates from the baseline. Write tests in SP-Lang.
analyze:
test:
!AND #(1)
- !LT #(2)
- !ARG VALUE # (3)
- 1
- !GT # (4)
- !ARG MEAN # (5)
- 10.0
- All expressions nested under AND must be true for the test to pass. Here, if the value is less than 1 and the mean is greater than 10, the trigger is run.
- "Less than"
- Get (!ARG) the value (VALUE). If the value is less than 1 as specified, the !LT expression is true.
- "Greater than"
- Get (!ARG) the mean (MEAN). If the mean is greater than 10.0 as specified, the !GT expression is true.
The following attributes are available, used in SP-Lang notation:
TENANT: "str",
VALUE: "ui64",
STDEV: "fp64",
MEAN: "fp64",
MEDIAN: "fp64",
VARIANCE: "fp64",
MIN: "ui64",
MAX: "ui64",
SUM: "ui64",
HOUR: "ui64",
KEY: "str",
CLASS: "str",
trigger¶
The trigger
section defines the activity that is triggered to run after a successful analysis. (More about triggers.)
Baseliner creates events
Upon every analysis (every hour), Baseliner creates an event to summarize its analysis. These Baseliner-created events are available to use (as EVENT
) with expressions such as !ARG
and !ITEM
, meaning you can pull values from the events for your trigger activities.
These Baseliner-created events include the fields:
- tenant: The name of the tenant the baseline belongs to.
- dimension: The dimension the baseline belongs to, as specified in evaluate.
- class: The class the baseline was calculated from. Options include: workdays, weekends, and holidays.
- hour: The number of the UTC hour the analysis happened in.
- value: The value of the current counter of events for the given UTC hour.
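For illustration, a Baseliner-created event carrying these fields might look roughly like this (a sketch; all values are hypothetical):
tenant: mytenant
dimension: mydataset
class: workdays
hour: 14
value: 0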
Notification trigger¶
A notification trigger sends a message, such as an email. See Email notifications for more details about sending email notifications and using email templates.
An example of a notification trigger:
trigger:
- notification:
type: email #(1)
to: ["myemail@example.co"] # (2)
template: "/Templates/Email/Notification_baseliner_dimension.md" # (3)
variables: # (4)
name: "Logs are not coming to the dataset within the given UTC hour."
dimension: !ITEM EVENT dimension # (5)
hour: !ITEM EVENT hour
- Specifies an email notification
- Recipient address
- Filepath to the email template in the LogMan.io Library
- Begins the section that gives directions for how to fill the blank fields from the email template. The blank fields in the template being used in this example are name, dimension, and hour.
- Uses SP-Lang to get information (!ITEM) from the Baseliner-created EVENT (detailed below). In this case, the template field dimension will be filled with the value of dimension taken from the Baseliner-created event.
Event trigger¶
You can use an event trigger to create a log or event, which you'll be able to see in the TeskaLabs LogMan.io UI.
Example of an event trigger:
- event: # (1)
threat.framework: "MITRE ATT&CK"
threat.indicator.sightings: !ITEM EVENT value
threat.indicator.confidence: "High"
threat.indicator.name: !ITEM EVENT dimension
- This new event is a threat description using threat fields from Elasticsearch
Analysis in UI¶
By default, the LogMan.io UI provides displays of analyses for user
and host
.
Specify the analysis in the schema (default: /Schemas/ECS.yaml
) like this:
host.id:
type: "str"
analysis: host
user.id:
type: "str"
analysis: user
...
If the tenant is then configured to use this schema (ECS by default), the host.id and user.id fields in Discover will show a link to the given baseline.
Analysis host
uses the baseline named Host
by default:
---
define:
name: Host
Analysis user
uses the baseline named User
by default:
---
define:
name: User
If a specific analysis cannot locate its associated baseline, the UI will display an empty screen for that analysis.
Note
Both baselines needed for analysis are distributed as part of the LogMan.io Common Library.
Ended: Baseliner
Correlator ↵
LogMan.io Correlator¶
TeskaLabs LogMan.io Correlator is a microservice responsible for performing detections and finding patterns in data based on correlation rules.
LogMan.io Correlator is always deployed for a given tenant.
Important notes¶
-
Each correlator has mandatory sections in the configuration files, see Configuration section.
-
Correlator cannot work without correlation rules. See the Window Correlator section for more information on how to create correlation rules.
LogMan.io Correlator Configuration¶
First, it is necessary to specify which library to load the declarations from; the provider can be either ZooKeeper or file-based.
[library]
providers=zk://library
ZooKeeper Library layer requires zookeeper
configuration section.
[zookeeper]
servers=zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
Also, every running instance of the correlator must know which groups to load from the library and which tenant it belongs to, see below:
# Tenant
[tenant]
ids=mytenant
# Declarations
[declarations]
groups=Firewall Common Authentication
# Complex event lane (optional)
[eventlane]
path=/EventLanes/mytenant/complex.yaml
groups
- names of groups to be used from the library, separated by spaces; if the group is located within a folder's subfolder, use a slash as a separator, e.g. /Correlators/Firewall
Next, it is necessary to specify which Kafka topics to use as the default fallback input and output (unless specified in the correlation's logsource section and the complex event lane).
The Kafka connection also needs to be configured so the correlator knows which Kafka servers to connect to.
# Kafka connection
[kafka]
bootstrap_servers=lm1:19092;lm2:29092;lm3:39092
# The default Kafka topic to read from when no logsource is specified in the correlation rule (optional)
[pipeline:CorrelatorsPipeline:KafkaSource]
topic=lmio-events
group_id=lmio_correlator_firewall
# The default kafka topic for event trigger unless there is a complex event specified (optional)
[pipeline:OutputPipeline:KafkaSink]
topic=lmio-output
The last mandatory section specifies the Elasticsearch settings that allow the correlator to work with lookups. For more information, see the Lookups section.
# Lookup persistent storage
[elasticsearch]
url=http://elasticsearch:9200
Installation¶
Docker Compose¶
lmio-correlator:
image: docker.teskalabs.com/lmio/lmio-correlator:VERSION
volumes:
- ./lmio-correlator:/conf
- /data/ssd/lookups:/lookups
- /data/hdd/log/lmio-correlator:/log
- /data/ssd/correlators/lmio-correlator:/data
Replace lmio-correlator
with the name of the correlator's instance.
The correlator needs to know its configuration path, the path to lookups (the folder can be empty, depending on whether lookups are used), the logging path, and the path to store its data.
Warning
The data path is mandatory and must be located on a fast drive, i.e. an SSD.
Window Correlator¶
The Window correlator detects incoming events based on the predicate section and stores them in data structures based on the evaluate section. If the test rule in the analyze section then matches, the trigger section is called.
The following sample correlation detects more than or equal to 5 error connections between two IP addresses:
Sample¶
---
define:
name: "Network T1046 Network Service Discovery"
description: "Detects more than or equal to 5 error connections between two IP addresses"
type: correlator/window
logsource:
type: "Network"
mitre:
technique: "T1046"
tactic: "TA0007"
predicate:
!OR
- !EQ
- !ITEM EVENT log.level
- "error"
- !EQ
- !ITEM EVENT log.level
- "critical"
- !EQ
- !ITEM EVENT log.level
- "emergency"
evaluate:
dimension: [source.ip, destination.ip]
by: "@timestamp"
resolution: 60
analyze:
window: hopping
aggregate: sum
span: 10
test:
!GE
- !ARG
- 5
trigger:
- event:
threat.indicator.confidence: "Medium"
threat.indicator.ip: !ITEM EVENT source.ip
threat.indicator.port: !ITEM EVENT source.port
threat.indicator.type: "ipv4-addr"
Section define
¶
This section contains the common definition and meta data.
Item name
¶
Shorter human-readable name of this declaration.
Item type
¶
The type of this declaration, must be correlator/window
.
Item description
(optional)¶
Longer, possibly multiline, human-readable description of the declaration.
Section logsource
¶
Specifies the types of event lanes that the incoming events should be read from.
Section predicate
¶
The predicate
filters incoming events using an expression.
If the expression returns True
, the event will enter the evaluate section.
If the expression returns False
, then the event is skipped.
Other returned values are undefined.
Include of nested predicate filters¶
Predicate filters are expressions located in a dedicated file that can be included in many different predicates as their parts.
If you want to include an external predicate filter located in the library, use the !INCLUDE statement:
!INCLUDE /predicate_filter.yaml
where /predicate_filter.yaml is the path of the file in the library.
The content of predicate_filter.yaml
is an expression to be included, like:
---
!EQ
- !ITEM EVENT category
- "MyEventCategory"
Section evaluate
¶
The evaluate section specifies the primary key, resolution, and other attributes that are applied to the incoming event.
The function of evaluate is to add the event to a two-dimensional structure defined by time and a primary key.
Item dimension
¶
Specifies simple or compound primary key (or dimension) for the event.
The dimension
is defined by names of the input event fields.
Example of the simple primary key:
evaluate:
dimension: [source.ip]
Note
Tenant is added automatically to the dimension list.
Example of the compound primary key:
evaluate:
dimension: [source.ip, destination.ip]
If exactly one dimension, such as DestinationHostname, is a list in the original event and the correlation should happen for each of the dimension values, wrap that dimension in [ ]:
evaluate:
dimension: [source.ip, destination.ip, [DestinationHostname] ]
Item by
¶
Specifies the name of the input event field that contains date/time information, which will be used for evaluation. Default: @timestamp.
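For example, to evaluate events by a different timestamp field (a sketch; the field name event.end is illustrative):
evaluate:
  by: "event.end"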
Item event_count
(optional)¶
Name of the attribute that specifies the count for correlation within one event, hence influencing the "sum of events" in the analysis. Defaults to 1.
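A minimal sketch, assuming the input events carry a pre-aggregated count in a hypothetical field event.count:
evaluate:
  dimension: [source.ip]
  by: "@timestamp"
  event_count: event.count
  resolution: 60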
Item resolution
¶
Specifies the resolution of the time aggregation of the correlator. The unit is seconds.
evaluate:
resolution: 3600 # 1 hour
Default value: 3600
Item saturation
(optional)¶
Specifies the duration of the silent time interval after the trigger is fired.
It is specific to the dimension.
The unit is the resolution.
Default value: 3
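For example, to silence a dimension for 10 resolution intervals (here 10 minutes) after the trigger fires, mirroring the correlation example later in this documentation:
evaluate:
  dimension: [source.ip]
  by: "@timestamp"
  resolution: 60 # unit is second
  saturation: 10 # unit is resolution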
Section analyze
(optional)¶
The section analyze
contains the configuration of the time window that is applied on the input events.
The result of the time window analysis is subjected to the configurable test. When the test is successful (aka returns True
), the trigger
is fired.
Note: The section is optional; the default behavior is to fire the trigger when there is at least one event in a tumbling window with a span equal to 2.
Item when
(optional)¶
Specifies when the analysis of the events in the windows should happen.
Options:
- event (default): Analysis happens after an event comes and is evaluated; usually useful for match and arithmetic correlation.
- periodic/...: Analysis happens after a specified interval in seconds, such as periodic/10 (every 10 seconds), periodic/1h (every 3600 seconds / one hour), etc. Usually useful for UEBA evaluation.
Periodic analysis requires the time window resolution and span to be set properly, so the analysis does not happen too often.
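A sketch of a periodic analysis that runs once per hour over a one-day window (the interval, span, and threshold values are illustrative; resolution: 3600 is assumed in the evaluate section):
analyze:
  when: periodic/1h
  window: hopping
  aggregate: sum
  span: 24
  test:
    !GE
    - !ARG
    - 100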
Item window
(optional)¶
Specifies what kind of time window to use.
Options:
- tumbling: Fixed span (duration); non-overlapping, gap-less, contiguous time intervals
- hopping: Fixed span (duration); overlapping, contiguous time windows
Default value: hopping
Item span
¶
Specifies the width of the window.
The unit is resolution
.
Item aggregate
(optional)¶
Specifies which aggregation function is to be applied to events in the window.
Aggregate functions¶
- sum: Summation
- median: Median
- average: The (weighted) average
- mean: The arithmetic mean
- std: The standard deviation
- var: The variance
- mean spike: For spike detection. The baseline is the mean value; returns the percentage.
- median spike: For spike detection. The baseline is the median value; returns the percentage.
- unique count: For a unique count of the event attribute(s); dimension has to be provided.
Default value: sum
Example of the unique count:
analyze:
window: hopping
aggregate: unique count
dimension: client.ip
span: 6
test:
!GE
- !ARG
- 5
Triggers when 5 or more unique client.ip values are observed.
Item test
(optional)¶
The test
is an expression that is applied on the output of the aggregate
calculation.
If the expression returns True
, the trigger
will be fired if a dimension is not already saturated.
If the expression returns False
, then no action is taken.
Other returned values are undefined.
Section trigger
¶
The trigger section specifies what kinds of actions are to be taken when the trigger is fired by the test in the analyze section.
See correlator triggers chapter for details.
Stashing Correlator¶
In the field of cybersecurity, a stashing correlator is a tool that collects and temporarily stores related events based on specific identifiers, such as message IDs. This process helps track and analyze sequences of events over time, allowing for the identification of patterns or anomalies that may indicate security threats. By grouping related events, the stashing correlator enhances the ability to monitor and respond to potential cyber incidents, providing a comprehensive view of activities that may not be apparent when examining individual events alone.
Analogy for Better Understanding¶
Think of the stashing correlator as a detective piecing together clues from different sources. Imagine a detective gets two notes: one with a person's name and another with their location. Individually, these notes don't give much information, but the detective notices both notes have the same case number (ID). The detective combines them to understand who is where. Similarly, the stashing correlator joins related events using a common attribute, like an ID, to create a complete picture for further analysis.
Example Scenario¶
Imagine you receive two pieces of mail: one telling you who sent a letter and the other telling you who received it. Individually, these pieces of mail don't provide the full picture. The stashing correlator is like a smart assistant that notices both pieces have the same tracking number (message ID) and combines them into one comprehensive record, showing both the sender and the recipient. This helps in understanding the full context of the communication.
Consider the following logs from a Postfix event lane for a given tenant:
June 14 09:19:21 alice postfix/qmgr[59833]: F3710A248D: from=<alice@example.com>, size=304, nrcpt=1 (queue active)
June 14 09:19:21 alice postfix/local[60446]: F3710A248D: to=<bob@example.com>, orig_to=<alice>, relay=local, delay=0.04, delay>
These logs are processed separately by LogMan.io Parsec. The first log indicates who sent the email (alice@example.com
), and the second log shows the recipient (bob@example.com
). The parsed logs look like this:
# Log #1
{
"email.message_id": "F3710A248D",
"email.from.address": ["alice@example.com"],
...
}
# Log #2
{
"email.message_id": "F3710A248D",
"email.to.address": ["bob@example.com"],
...
}
To connect all the information for future analysis, it is necessary to consolidate the events into a single log that contains both the sender and recipient information. The stashing correlator performs this task by joining the parsed events using a common attribute, such as the message ID (F3710A248D
). If no other event with the same message ID arrives within the send_after_seconds period, a new event is created that includes all the gathered information:
# Stashed log
{
"email.message_id": "F3710A248D",
"email.from.address": ["alice@example.com"],
"email.to.address": ["bob@example.com"],
...
}
This consolidated log can then be analyzed by other detection correlators, such as the Window Correlator, to further investigate and respond to potential security incidents.
Sample¶
The following sample stashes events by their message IDs and sends them after 10 seconds of no activity:
---
define:
name: "Stashing events by message ID"
description: "Example for stashing events by message ID"
type: correlator/stashing
logsource:
vendor: [WietseVenama]
predicate:
!IN
what: message.id
where: !EVENT
stash:
dimension: message.id
send_after_seconds: 10 # Send the stashed message after seconds with no activity
Section define
¶
This section contains the common definition and metadata.
Item name
¶
Shorter human-readable name of this declaration.
Item type
¶
The type of this declaration, must be correlator/stashing
.
Item description
(optional)¶
Longer, possibly multiline, human-readable description of the declaration.
Section logsource
¶
Specifies the sources of the logs, indicating the vendors or products whose events coming from event lanes will be processed.
Section predicate
¶
The predicate
filters incoming events using an expression. If the expression returns !!true
, the event will enter the stash
section. If !!false
, the event is skipped.
Include of nested predicate filters¶
Predicate filters are expressions located in a dedicated file that can be included in many different predicates as their parts.
To include an external predicate filter:
!INCLUDE /predicate_filter.yaml
where /predicate_filter.yaml is the path of the file in the library.
Section stash
¶
Specifies the stashing behavior, including the dimension for grouping events and the timeout for sending stashed events after inactivity.
Item dimension
¶
Specifies the primary key (or dimension) for the event. Defined by names of the input event fields.
Example of a primary key:
stash:
dimension: [message.id]
Item send_after_seconds
¶
Specifies the timeout for sending stashed events after a period of inactivity. The unit is seconds.
stash:
send_after_seconds: 10 # 10 seconds
Summary¶
In summary, the stashing correlator acts as a smart organizer, collecting and joining related events to create a full picture for further analysis. This helps in identifying patterns and anomalies in cybersecurity, providing a more effective way to monitor and respond to potential incidents.
Entity Correlator¶
The Entity correlator detects incoming events based on the predicate section and stores them in data structures based on the evaluate section. If the dimension detected from the event does not produce any data, lost in the triggers section is called; otherwise, seen in the triggers section is called.
Example¶
define:
name: Detection of user entity behavior
description: Detection of user entity behavior
type: correlator/entity
span: 5
delay: 5m # analysis time = delay + resolution
logsource:
vendor: "Microsoft"
predicate:
!AND
- !EQ
- !ITEM EVENT message
- "FAIL"
- !EQ
- !ITEM EVENT device.vendor
- "Microsoft"
- !EQ
- !ITEM EVENT device.product
- "Exchange Server"
evaluate:
dimension: [source.user]
by: "@timestamp" # Name of event field with an event time
resolution: 60 # unit is second
lookup_seen: active_users
lookup_lost: inactive_users
triggers:
lost:
- event:
severity: "Low"
seen:
- event:
severity: "Low"
Section define
¶
This section contains the common definition and meta data.
Item name
¶
Shorter human-readable name of this declaration.
Item type
¶
The type of this declaration, must be correlator/entity
.
Item span
¶
Specifies the width of the window.
The unit is resolution
.
Item delay
(optional)¶
Analysis happens after a specified period in seconds, and this period is based on resolution.
If there is a need to prolong the period and hence delay the analysis, the delay option can be specified, such as 300 (300 seconds), 1h (3600 seconds / one hour), etc.
Item aggregation_count_field
¶
Name of the attribute that specifies the number of events within one aggregated event, hence influencing the sum of events in the analysis. Defaults to 1.
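A minimal sketch following the item ordering of this section, assuming a hypothetical field event.count that carries a pre-aggregated count:
define:
  name: Detection of user entity behavior
  type: correlator/entity
  span: 5
  aggregation_count_field: event.count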
Item description
(optional)¶
Longer, possibly multiline, human-readable description of the declaration.
Section logsource
¶
Specifies the types of event lanes that the incoming events should be read from.
Section predicate
(optional)¶
The predicate
filters incoming events using an expression.
If the expression returns True
, the event will enter the evaluate section.
If the expression returns False
, then the event is skipped.
Other returned values are undefined.
Section evaluate
¶
The evaluate section specifies the primary key, resolution, and other attributes that are applied to the incoming event.
The function of evaluate is to add the event to a two-dimensional structure defined by time and a primary key.
Item dimension
¶
Specifies simple or compound primary key (or dimension) for the event.
The dimension
is defined by names of the input event fields.
Example of the simple primary key:
evaluate:
dimension: [source.ip]
Note
Tenant is added automatically to the dimension list.
Example of the compound primary key:
evaluate:
dimension: [source.ip, destination.ip]
If exactly one dimension, such as DestinationHostname, is a list in the original event and the correlation should happen for each of the dimension values, wrap that dimension in [ ]:
evaluate:
dimension: [source.ip, destination.ip, [DestinationHostname] ]
Item by
¶
Specifies the name of the input event field that contains date/time information, which will be used for evaluation. Default: @timestamp.
Item resolution
(optional)¶
Specifies the resolution of the time aggregation of the correlator. The unit is seconds.
evaluate:
resolution: 3600 # 1 hour
Default value: 3600
Item lookup_seen
¶
lookup_seen
specifies the ID of the lookup to which seen entities are written, along with the last seen time.
Item lookup_lost
¶
lookup_lost
specifies the ID of the lookup to which lost entities are written, along with the last analysis time.
Section triggers
¶
The triggers section specifies the kinds of actions to be taken when the periodic analysis happens. The supported actions are lost and seen.
See correlator triggers chapter for details.
seen
triggers¶
Seen triggers are executed when the analysis finds events that entered the window in the analyzed time.
The dimension (entity name) can be obtained via !ITEM EVENT dimension
.
The timestamp of the last event that came to the window in the specified dimension can be obtained via !ITEM EVENT last_event_timestamp
(entity updated).
Example:
seen:
- lookup: user_inventory
key: !ITEM EVENT dimension
set:
last_seen: !ITEM EVENT last_event_timestamp
lost
triggers¶
Lost triggers are executed when the analysis discovers that no events came to the specified dimension in the analyzed time (entity drop).
The dimension (entity name) can be obtained via !ITEM EVENT dimension
.
Example:
lost:
- event:
severity: "Low"
dimension: !ITEM EVENT dimension
Match Correlator¶
The Match correlator detects incoming events based on the predicate section. If the event matches the predicate filter, the trigger section is called.
Hint
Always consider using the Window Correlator instead of the Match Correlator, as the Match Correlator produces one output event per input event and so does not do any time-based grouping of incoming events.
Sample¶
---
define:
name: "Network T1046 Network Service Discovery"
description: "Detects a connection between two IP addresses"
type: correlator/match
logsource:
type: "Network"
mitre:
technique: "T1046"
tactic: "TA0007"
predicate:
!OR
- !EQ
- !ITEM EVENT log.level
- "error"
- !EQ
- !ITEM EVENT log.level
- "critical"
- !EQ
- !ITEM EVENT log.level
- "emergency"
trigger:
- event:
threat.indicator.confidence: "Medium"
threat.indicator.ip: !ITEM EVENT source.ip
threat.indicator.port: !ITEM EVENT source.port
threat.indicator.type: "ipv4-addr"
Section define
¶
This section contains the common definition and meta data.
Item name
¶
Shorter human-readable name of this declaration.
Item type
¶
The type of this declaration, must be correlator/match
.
Item description
(optional)¶
Longer, possibly multiline, human-readable description of the declaration.
Section logsource
¶
Specifies the types of event lanes that the incoming events should be read from.
Section predicate
¶
The predicate
filters incoming events using an expression.
If the expression returns True
, the event will enter the trigger section.
If the expression returns False
, then the event is skipped.
Other returned values are undefined.
Section trigger
¶
The trigger section specifies what kinds of actions are to be taken when the trigger is fired by a successful result in the predicate section.
See correlator triggers chapter for details.
Triggers¶
Triggers define the output of correlators, baseliners, etc.
They live in the trigger section of the correlator.
Each rule in the library can define many triggers (it is a list).
A trigger can access the original event with the !EVENT statement; this is the last event that passed the evaluation test.
The value from the aggregation function is available via !ARG.
event
trigger¶
This trigger inserts a new event
into the complex event lane.
Example of the event trigger:
trigger:
- event:
threat.indicator.confidence: "Medium"
threat.indicator.ip: !ITEM EVENT source.ip
threat.indicator.port: !ITEM EVENT source.port
threat.indicator.type: "ipv4-addr"
There may be up to 5 results, as with the mean spike aggregator:
trigger:
- event:
events: !ARG EVENTS
MeanSpike:
!GET
from: !ARG RESULTS
what: 0
MeanSpikeLastCount:
!GET
from: !ARG RESULTS
what: 1
MeanSpikeMean:
!GET
from: !ARG RESULTS
what: 2
lookup
trigger¶
The lookup trigger manipulates the content of the lookup. It means that it can add (set), increment (add), decrement (sub), and remove (delete) an entry in the lookup.
The entry is identified by a key
, which is a unique primary key.
Example of the trigger that adds an entry to the lookup user_list
:
trigger:
- lookup: user_list
key: !ITEM EVENT user.name
set:
score: 1
Example of the trigger that removes an entry from the lookup user_list
:
trigger:
- lookup: user_list
delete: !ITEM EVENT user.name
Example of the trigger that increments a counter (field my_counter
) in the entry of the lookup user_list
:
trigger:
- lookup: user_list
key: !ITEM EVENT user.name
add: my_counter
Example of the trigger that decrements a counter (field my_counter
) in the entry of the lookup user_list
:
trigger:
- lookup: user_list
key: !ITEM EVENT user.name
sub: my_counter
If the counter field does not exist, it is created with the default value of 0.
notification
trigger¶
This trigger inserts a new notification into the primary data path, which is read by asab-iris.
Example of the notification trigger:
- notification:
type: email
template: "/Templates/Email/notification_4728.md"
to: eliska.novotna@teskalabs.com
variables:
name: "brute-force"
events: !ARG
Ended: Correlator
Alerts ↵
LogMan.io Alerts¶
TeskaLabs LogMan.io Alerts is a microservice responsible for managing alert/incident tickets, providing an API and Kafka handler through the lmio-alerts
topic.
LogMan.io Alerts configuration¶
LogMan.io Alerts has the following dependencies:
- Apache ZooKeeper
- NGINX (for production deployments)
- Apache Kafka
- MongoDB
- Elasticsearch
- TeskaLabs SeaCat Auth
- LogMan.io Library with an
/Alerts
folder and a schema in the/Schemas
folder
Example¶
This example is the most basic configuration required for LogMan.io Alerts:
[zookeeper]
servers=zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
[library]
providers=zk:///library
[kafka]
bootstrap_servers=kafka-1:9092,kafka-2:9092,kafka-3:9092
[elasticsearch]
url=http://es01:9200/
[asab:storage]
mongodb_uri=mongodb://mongodb1,mongodb2,mongodb3/?replicaSet=rs0
[auth]
multitenancy=yes
public_keys_url=http://localhost:8081/openidconnect/public_keys
Zookeeper¶
Specify locations of Zookeeper servers in the cluster.
[zookeeper]
servers=zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
Hint
For non-production deployments, the use of a single Zookeeper server is possible.
Library¶
Specify the path(s) to the Library from which to load declarations.
[library]
providers=zk:///library
Hint
Since the ECS.yaml
schema in /Schemas
is utilized by default, consider using the LogMan.io Common Library.
Kafka¶
Specify bootstrap servers of the Kafka cluster.
[kafka]
bootstrap_servers=kafka-1:9092,kafka-2:9092,kafka-3:9092
Hint
For non-production deployments, the use of a single Kafka server is possible.
ElasticSearch¶
Specify URLs of Elasticsearch master nodes.
Elasticsearch is used to load events associated with a ticket.
[elasticsearch]
url=http://es01:9200/
username=MYUSERNAME
password=MYPASSWORD
MongoDB¶
Specify the URL of the MongoDB cluster with replica set.
Tickets are stored to MongoDB.
[asab:storage]
type=mongodb
mongodb_uri=mongodb://mongodb1,mongodb2,mongodb3/?replicaSet=rs0
Auth¶
The Auth section ensures that users can access only their own assigned tenants to set up alerts, supporting multitenancy.
It also checks for the resources mentioned in the alert/incident workflow declaration.
[auth]
multitenancy=yes
public_keys_url=http://localhost:8081/openidconnect/public_keys
Input¶
The Alerts microservice contains a Kafka interface that reads incoming alerts from the lmio-alerts
topic. The topic name or the group ID can be changed using:
[pipeline:TicketPipeline:KafkaSource]
topic=lmio-alerts
group_id=lmio-alerts
Note
Changing the input topic for alerts is discouraged to avoid unnecessary complications.
Output for event trigger¶
[pipeline:OutputPipeline:KafkaSink]
topic=lmio-events-complex
Warning
The event
trigger should not be used in any alert's workflow declarations. Use notifications instead.
Workflow¶
The location of workflows for alerts and incidents is always /Alerts/Workflow
.
Web APIs¶
Alerts provides one web API.
The web API is designed for communication with the UI.
[web]
listen=0.0.0.0 8953
The default port of the public web API is tcp/8953
.
This port is designed to serve as the NGINX upstream for connections from the UI.
Workflow¶
Workflow in Alerts¶
TeskaLabs LogMan.io Alerts reads the following workflows:
/Alerts/Workflow/alert.yaml
: Workflow used for tickets with the typealert
/Alerts/Workflow/incident.yaml
: Workflow used for tickets with the typeincident
Declaration¶
This is the most basic possible example of a workflow definition, located in the /Alerts/Workflow
folder in the Library:
---
define:
type: alerts/workflow
workflow:
open:
label: "Open"
transitions:
triaged:
resources: lmio:alert:triaged-to-new
closed: {}
triaged:
label: "Triaged"
transitions:
closed: {}
closed:
label: "Closed"
trigger:
...
The workflow specifies the states the given ticket may enter, with the first one being the state assigned to a newly created ticket.
Each state (here open
, triaged
, and closed
) contains the following attributes:
Label¶
The label
attribute is a string shown to the user in the UI.
Transitions¶
Defines possible transitions to other states. The target states are listed with either an empty dictionary {} as the value, or the name (or list) of the resource(s) the given user must be assigned in order to move the ticket to the state specified by the transition.
When a ticket changes its state, a trigger
(if specified) section is called.
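For illustration, the following sketch shows a transition guarded by a resource and a notification fired when a ticket enters the closed state (the state names, resource ID, recipient, and template path are illustrative):
workflow:
  open:
    label: "Open"
    transitions:
      closed:
        resources: lmio:alert:close
  closed:
    label: "Closed"
    trigger:
      - notification:
          type: email
          to: ["soc@example.co"]
          template: "/Templates/Email/Notification.md"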
Taxonomy of alerts¶
TeskaLabs LogMan.io provides a taxonomy for organizing and managing the various artefacts generated within the system, making it easier for cybersecurity analysts to prioritize their workload and respond effectively to security threats.
The taxonomy is organized into the following tree:
- Event
- Log
- Complex
- Ticket
- Alert
- Incident
Here is an explanation of each category and its subcategories:
Event¶
Events are records of activities that occur within an organization's network, systems, or applications.
Events can be further classified into:
Log¶
Logs are basic records generated by various devices, systems, or applications that store information about their activity. Examples include firewall logs, server logs, or application logs. These logs help analysts understand what is happening within the organization's environment and can be used for detecting security threats and anomalies.
Complex¶
Complex events refer to correlated or aggregated events that may indicate a security incident or require further analysis. They are generated by correlators and other detectors that gather events from various sources, analyze them, and create alerts based on predefined rules or machine learning algorithms.
Ticket¶
Tickets are created by cyber security analysts or automated correlators and detectors to track and manage security events that require attention. The ticket can refer to zero, one or more events.
Tickets can be further classified into:
Alert¶
Alerts are generated when a specific event, series of events, or anomaly is detected that may indicate a potential security threat. Alerts typically require immediate attention from cybersecurity analysts to triage, investigate, and determine if the ticket is a genuine security incident.
Incident¶
Incidents are confirmed security events that have been investigated and classified as genuine threats. They represent a higher level of severity than alerts and often involve a coordinated response from multiple teams, such as incident response or network administration, to contain, remediate, and recover from the threat.
Ended: Alerts
Lookups ↵
Lookups¶
Lookups are dictionaries of entities with attributes that are relevant either for parsing or for detection of cybersecurity incidents.
Lookups can be:
- A simple list of suspicious IP addresses, active VPN connections, etc.
- Dictionaries of user names with user attributes like user.id, user.email, etc.
- Dictionaries of compound keys like IP address and user name combinations for monitoring user activity.
What do lookups do?
Lookups, being like dictionaries, contain additional useful information about the data you already have that can make your logs more informative and valuable.
A simple example:
Your organization has logs about sent emails, which include the email address of the sender.
However, you want the logs in your LogMan.io UI to include the sender’s name, not just their email address.
So, you have a lookup in which each item is an employee's email address with the employee's name associated.
If you use this lookup in the enrichment part of the parsing process, the parser “looks up” the employee’s name based on their email address in this dictionary-like lookup and includes the employee’s name in the log.
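A minimal sketch of that enrichment, assuming a hypothetical lookup named employees whose entries carry a sender_name field keyed by email address (the full, step-by-step walkthrough follows later in this chapter):
define:
  type: enricher/standard
enrich:
  sender_name: !GET
    from: !LOOKUP
      what: employees
    what: !GET
      from: !EVENT
      what: email.from.address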
Quickstart¶
In order to set up lookups:
- Create a lookup declaration in the LogMan.io Library (the lookup description)
- Create the lookup and its content in the Lookups section in the UI (the lookup content)
- Add the lookup to the relevant parsing and/or correlation rules in the Library (the lookup application)
Note
Make sure all relevant components are deployed, see Deployment.
Declarations¶
All lookups are defined by their declarations stored in the /Lookups
folder.
The naming convention for declarations is lookupname.yaml
, for instance myuserlookup.yaml
:
---
define:
type: lookup
name: myuserlookup
keys:
- name: userid
type: str
fields:
username:
type: str
In define, specify the lookup type, the lookup name (tenant information will be added automatically), the keys with their names (optional) and types, and the fields in the output record structure. The record structure is NOT based on a schema and should NOT contain periods.
Note
Names of keys and fields cannot contain special characters like a period, etc.
Lookup types¶
Generic lookups¶
Generic lookups serve to create lists of keys or key-value pairs. The type
in the declaration in the define
section is just lookup
:
---
define:
type: lookup
...
When it comes to parsing, generic lookups can be used only in the standard enricher with the !LOOKUP
expression.
For more information about generic Lookups, see Generic Lookups.
IP address lookups¶
IP address Range Lookup¶
IP address Range Lookup uses the IP address ranges, such as 192.168.1.1
to 192.168.1.10
, as keys.
The declaration of an IP address range lookup must contain type lookup/ipaddressrange
in the define
section and two keys with type ip
in the keys
section:
define:
type: lookup/ipaddressrange
name: mylookup
group: mygroup
keys:
- name: range1
type: ip
- name: range2
type: ip
fields:
...
Single IP address lookup¶
A single IP address lookup is a lookup that has exactly one IP address key with type ip
that can be associated with an optional and variable number of attributes, defined by none or multiple values under fields
.
In order to use single IP lookups together with the following enrichers, the type of the lookup in the define
section must always be lookup/ipaddress
.
---
define:
type: lookup/ipaddress
name: mylookup
group: mygroup
keys:
- name: sourceip
type: ip
fields:
...
For more information about IP address lookups, see IP Address Lookups.
Deployment¶
Design¶
lmio-watcher
¶
LogMan.io Watcher manages the content of lookups in Elasticsearch. Watcher reads the lookup events from HTTP(S) API and Kafka.
lmio-lookupbuilder
¶
LogMan.io Lookup Builder takes generic lookup contents from Elasticsearch and lookup declarations from the Library and builds lookup binary files. The lookup binary files are then used by other microservices such as LogMan.io Parsec, LogMan.io Correlator, etc.
lmio-ipaddrproc
¶
LogMan.io IP Address Processor takes IP address lookup contents from Elasticsearch and lookup declarations from the Library and builds IP lookup binary files. The IP lookup binary files are then used by other microservices such as LogMan.io Parsec, LogMan.io Correlator, etc. It also downloads built-in lookups from Azure storage over the internet.
Step-by-step guide¶
In order to work with lookups, follow these deployment steps. For more information about what lookups are and what they are used for, go to Lookups.
-
At every machine within the LogMan.io cluster, deploy one instance of LogMan.io Watcher.
-
At every machine within the LogMan.io cluster, deploy one instance of
lmio-lookupbuilder
. The information about configuration and records in Docker Compose is located in the Configuration section.
-
At every machine within the LogMan.io cluster, deploy one instance of
lmio-ipaddrproc
The information about configuration and records in Docker Compose is located in the Configuration section.
-
Add the path to the
/lookups
folder to the Docker Compose volumes section of every instance of LogMan.io Parsec, LogMan.io Correlator, LogMan.io Alerts, and LogMan.io Baseliner. The path is by default:volumes: - /data/ssd/lookups:/lookups
-
Include LogMan.io Watcher in the configuration file of every NGINX instance as a location record to
/api/lmio-lookups
:location /api/lmio-lookup { auth_request /_oauth2_introspect; rewrite ^/api/lmio-lookup/(.*) /$1 break; proxy_pass http://lmio-watcher; }
Notice the proxy_pass that points to
lmio-watcher
upstream, which should be defined at the top of each NGINX configuration file:upstream lmio-watcher { server HOSTNAME_OF_FIRST_SERVER_IN_THE_CLUSTER:8952 max_fails=0 fail_timeout=30s; server HOSTNAME_OF_SECOND_SERVER_IN_THE_CLUSTER:8952 max_fails=0 fail_timeout=30s; server HOSTNAME_OF_THIRD_SERVER_IN_THE_CLUSTER:8952 max_fails=0 fail_timeout=30s; }
Replace
HOSTNAME_OF_FIRST_SERVER_IN_THE_CLUSTER
,HOSTNAME_OF_SECOND_SERVER_IN_THE_CLUSTER
,HOSTNAME_OF_THIRD_SERVER_IN_THE_CLUSTER
with the hostnames of the servers that LogMan.io Watcher is deployed to in the LogMan.io cluster environment. That's it! Now you are ready to create lookup declarations and lookup content. Go back to Lookups for next steps.
Configuration¶
lmio-lookupbuilder
¶
LogMan.io Lookup Builder takes generic lookup contents from Elasticsearch and lookup declarations from Library and builds lookup binary files. The lookup binary files are then used by other microservices such as LogMan.io Parsec, LogMan.io Correlator, etc.
LogMan.io Lookup Builder has the following dependencies:
- Elasticsearch
- Zookeeper
- Library
- Tenants to build lookups for
Docker Compose¶
lmio-lookupbuilder:
network_mode: host
image: docker.teskalabs.com/lmio/lmio-lookupbuilder:VERSION
volumes:
- ./lmio-lookupbuilder:/conf
- /data/ssd/lookups:/lookups
restart: always
logging:
options:
max-size: 10m
Configuration file¶
This is the most basic required configuration:
[tenants]
ids=mytenant
[elasticsearch]
url=http://es01:9200/
username=MYUSERNAME
password=MYPASSWORD
[zookeeper]
servers=zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
[library]
providers=zk:///library
Alternatively, instead of specifying tenant ids directly you can add all tenants from the LogMan.io cluster with the following configuration:
[tenants]
tenant_url=http://<SEACAT_AUTH_NODE>:3081/tenant
Replace <SEACAT_AUTH_NODE>
with the hostname where SeaCat Auth service runs.
lmio-ipaddrproc
¶
LogMan.io IP Address Processor takes IP address lookup contents from Elasticsearch and lookup declarations from the Library and builds IP lookup binary files. The IP lookup binary files are then used by other microservices such as LogMan.io Parsec, LogMan.io Correlator, etc. It also downloads built-in lookups from Azure storage over the internet.
LogMan.io IP Address Processor has the following dependencies:
- ElasticSearch
- Zookeeper
- Library
- Tenants to build lookups for
Docker Compose¶
lmio-ipaddrproc:
network_mode: host
image: docker.teskalabs.com/lmio/lmio-ipaddrproc:VERSION
volumes:
- ./lmio-ipaddrproc:/conf
- /data/ssd/lookups:/lookups
restart: always
logging:
options:
max-size: 10m
Configuration file¶
This is the most basic required configuration:
[tenants]
ids=mytenant
[elasticsearch]
url=http://es01:9200/
username=MYUSERNAME
password=MYPASSWORD
[zookeeper]
servers=zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
[library]
providers=zk:///library
Alternatively, instead of specifying tenant ids directly you can add all tenants from the LogMan.io cluster with the following configuration:
[tenants]
tenant_url=http://<SEACAT_AUTH_NODE>:3081/tenant
Replace <SEACAT_AUTH_NODE>
with the hostname where SeaCat Auth service runs.
Generic lookups¶
TeskaLabs LogMan.io generic lookups serve to create lists of keys or key-value pairs. The type
in the declaration in the define
section is just lookup
:
---
define:
type: lookup
...
When it comes to parsing, generic lookups can be used only in the standard enricher with the !LOOKUP
expression.
Creating a generic lookup¶
There are always three steps to enable lookups:
- Create a lookup declaration in the LogMan.io Library (the lookup description)
- Create the lookup and its content in the Lookups section in the UI (the lookup content)
- Add the lookup to the relevant parsing and/or correlation rules in the Library (the lookup application)
Use case: User lookup¶
A user lookup is used to get user information such as username and email by the user ID.
-
In LogMan.io, go to the Library.
-
In the Library, go to the folder
/Lookups
. -
Create a new lookup declaration for your lookup, such as "userlookup.yaml", making sure the file has a YAML extension
-
Add the following declaration:
define: type: lookup name: userlookup group: user keys: - name: userid type: str fields: user_name: type: str email: type: str
Make sure the type is always lookup.
Change the name in the define section to your lookup name.
The group is used then in the enrichment process to locate all lookups that share the same group. The value is a unique identifier of the group (use case), here: user
To fields, add names and types of the lookup attributes. This example uses user_name and email as strings.
Currently, these types are supported: str, fp64, si32, geopoint, and ip.
Save the declaration.
-
In LogMan.io, go to Lookups.
-
Create a new lookup with the same name as above, i.e. "userlookup". Specify the user ID as the key.
-
Create records in the lookup with the user ID as the key and fields as specified above.
-
Add the following enricher to the LogMan.io Parsec rule that should utilize the lookup:
define: type: enricher/standard enrich: user_name: !GET from: !LOOKUP what: userlookup what: !GET from: !EVENT what: user.id
This sample enricher obtains
user_name
from theuserlookup
based on theuser.id
attribute from the parsed event.
IP address lookups¶
TeskaLabs LogMan.io offers an optimized set of lookups for IP addresses, called IP Lookups.
There are always three steps to enable IP Lookups:
- Create a lookup declaration in the LogMan.io Library (the lookup description)
- Create the lookup and its content in the Lookups section in the UI (the lookup content)
- Add the lookup to the relevant parsing and/or correlation rules in the Library (the lookup application)
IP address to geographical location lookup¶
IP Geo Location
is when, based on IP address range such as 192.168.1.1
to 192.168.1.10
, you want to obtain the geographic location of the IP address such as city name, latitude, longitude etc.
Built-in IP address to geographical location lookup
When the IP address from the event does not match any of the provided geo
lookups, the default public IP lookup provided by TeskaLabs LogMan.io will be used.
-
In LogMan.io, go to the Library.
-
In the Library, go to the folder
/Lookups
. -
Create a new lookup declaration for your lookup, like "ipgeolookup.yaml" with a YAML extension
-
Add the following declaration:
define: type: lookup/ipaddressrange name: ipgeolookup group: geo keys: - name: range1 type: ip - name: range2 type: ip fields: location: type: geopoint value: lat: 50.0643081 lon: 14.443938 city_name: type: str
Make sure the type is always lookup/ipaddressrange.
Change the name in the define section to your lookup name.
The group is used then in the enrichment process to locate all lookups that share the same group. The value is a unique identifier of the group (use case), here: geo
Keep the keys as they are in order to specify ranges.
To fields, add names and types of the lookup attributes. Here in the example there is only the city, but there can also be a location (geolocation latitude and longitude), etc.: fields: name: type: str continent_name: type: str city_name: type: str location: type: geopoint
When using the Elastic Common Schema (ECS), all available geo fields that can be used are specified in the documentation: https://www.elastic.co/guide/en/ecs/current/ecs-geo.html
The value attribute will be used as the default.
Currently, these types are supported: str, fp64, si32, geopoint, and ip
-
Save
-
In LogMan.io, go to Lookups.
-
Create a new lookup with the same name as above, i.e. "ipgeolookup". Specify two keys with the names:
range1
,range2
. -
Create records in the lookup with the ranges as keys and fields as specified above (in the example, there is only city in the value dictionary stored in the lookup).
-
Add the following enricher to the LogMan.io Parsec rule that should utilize the lookup:
define: type: enricher/ip group: geo schema: ecs: postfix: geo.
Specify the group of the lookups to be used in the
group
attribute. It should be the same as the group mentioned above in the lookup declaration. Tenants are resolved automatically.
The enrichment is done on every field that has the type ip in the schema.
Postfix specifies the postfix for the attribute:
If the input is source.ip, then the output is source.geo.<NAME_OF_THE_ATTRIBUTE>
When it comes to default public GEO lookup (see above), the following items are filled by default:
city_name: type: str country_iso_code: type: str location: type: geopoint region_name: type: str
IP address range lookup¶
The IP address range lookup uses the IP address ranges, such as 192.168.1.1 to 192.168.1.10, as keys.
The declaration of an IP address range lookup must contain type lookup/ipaddressrange
in the define
section and two keys with type ip
in the keys
section:
define:
type: lookup/ipaddressrange
name: mylookup
group: mygroup
keys:
- name: range1
type: ip
- name: range2
type: ip
fields:
...
Use case: Private IP address to zone enrichment¶
You can use the IP-to-zone lookup when, based on IP address range such as 192.168.1.1
to 192.168.1.10
, you want to obtain the zone name, floor name, and other information (like a company's building, whether it is a private or public zone, etc.).
Hint
Use IP-to-zone lookups for private IP address enrichment.
-
In LogMan.io, go to the Library.
-
In Library, go to the folder
/Lookups
. -
Create a new lookup declaration for your lookup, like "ipzonelookup.yaml" with a YAML file extension
-
Add the following declaration:
define: type: lookup/ipaddressrange name: ipzonelookup group: zone keys: - name: range1 type: ip - name: range2 type: ip fields: location: type: geopoint value: lat: 50.0643081 lon: 14.443938 zone_name: type: str value: myzone floor_name: type: str
Make sure the type is always lookup/ipaddressrange.
Change the name in the define section to your lookup name.
The group is used then in the enrichment process to locate all lookups that share the same group. The value is a unique identifier of the group (use case), here: zone
Keep the keys as they are in order to specify ranges.
To fields, add names and types of the lookup attributes. Here in the example there is only the floor name, but there can also be a room name, company name, etc.: fields: floor_name: type: str
The value attribute will be used as the default.
Currently, these types are supported: str, fp64, si32, geopoint, and ip
-
Save
-
In LogMan.io, go to Lookups.
-
Create a new lookup with the same name as above, i.e. "ipzonelookup". Specify two keys with the names:
range1
,range2
. -
Create records in the lookup with the ranges as keys and fields as specified above (in the example, there is only floor in the value dictionary stored in the lookup).
-
Add the following enricher to the LogMan.io Parsec rule that should utilize the lookup:
define: type: enricher/ip group: floor schema: ecs: prefix: lmio.ipenricher. postfix: zone.
Specify the group of the lookups to be used in the
group
attribute. It should be the same as the group mentioned above in the lookup declaration. Tenants are resolved automatically.
The enrichment is done on every field that has the type ip in the schema.
Prefix specifies the prefix, and postfix specifies the postfix for the attribute:
If the input is source.ip, then the output is lmio.ipenricher.source.zone.<NAME_OF_THE_ATTRIBUTE>
Single IP address lookup¶
The single IP address lookup is a lookup that has exactly one IP address key with type ip
that can be associated with an optional and variable number of attributes, defined by none or multiple values under fields
.
In order to use single IP lookups together with the following enrichers, the type of the lookup in the define
section must always be lookup/ipaddress
.
---
define:
type: lookup/ipaddress
name: mylookup
group: mygroup
keys:
- name: sourceip
type: ip
fields:
...
Use case: Bad IP addresses lookup¶
You can use bad IP enrichment when, based on a single IP address such as 192.168.1.1
, you want to obtain the information about the IP's risk score, etc.
-
In LogMan.io, go to the Library.
-
In the Library, go to the folder
/Lookups
. -
Create a new lookup declaration for your lookup, like "badips.yaml" with a YAML file extension.
-
Add the following declaration:
--- define: type: lookup/ipaddress name: badips group: bad keys: - name: sourceip type: ip fields: base: type: si32
Make sure the type is always lookup/ipaddress.
Change the name in the define section to your lookup name.
The group is used then in the enrichment process to locate all lookups that share the same group. The value is a unique identifier of the group (use case), here: bad
Keep one key in the keys section with the type ip. The name should not contain periods or any other special characters.
To fields, add names and types of the lookup attribute. Here in the example there is base as an integer, but there can also be other security-related fields from https://www.elastic.co/guide/en/ecs/current/ecs-vulnerability.html: fields: base: type: si32
Currently, these types are supported: str, fp64, si32, geopoint, and ip
-
Save
-
In LogMan.io, go to "Lookups".
-
Create a new lookup with the same name as above, i.e. "badips". Specify the IP address as the key.
-
Create records in the lookup with the IP address as the key and fields as specified above (in the example, there is only
base
in the value dictionary stored in the lookup). -
Add the following enricher to the LogMan.io Parsec rule that should utilize the lookup:
define: type: enricher/ip group: bad schema: ecs: # https://www.elastic.co/guide/en/ecs/current/ecs-vulnerability.html prefix: lmio.vulnerability. postfix: score.
Specify the group of the lookups to be used in the
group
attribute. It should be the same as the group mentioned above in the lookup declaration. Tenants are resolved automatically.
The enrichment is done on every field that has the type ip in the schema.
A prefix is added to the field with the resolved attributes to be used for further mapping:
If the input is source.ip, then the output is lmio.vulnerability.source.score.<NAME_OF_THE_ATTRIBUTE>
-
Based on the attribute and the subsequent mapping, a correlation with a notification trigger can be added to
/Correlators
to notify about the bad IP with score's base being higher than a threshold:--- define: name: Bad IP Notification description: Bad IP Notification type: correlator/window predicate: !AND - !IN what: source.ip where: !EVENT - !GT - !ITEM EVENT lmio.vulnerability.source.score.base - 2 evaluate: dimension: [tenant, source.ip] by: "@timestamp" # Name of event field with an event time resolution: 60 # unit is second saturation: 10 # unit is resolution analyze: window: hopping # that is default aggregate: sum # that is default span: 2 # 2 * resolution from evaluate = my time window test: !GE - !ARG - 1 trigger: - event: !DICT type: "{str:any}" with: message: "Bad IP Notification" events: !ARG EVENTS source.ip: !ITEM EVENT source.ip event.dataset: correlation - notification: type: email to: [logman@example.co] template: "/Templates/Email/Notification.md" variables: !DICT type: "{str:any}" with: name: Bad IP Notification events: !ARG EVENTS dimension: !ITEM EVENT source.ip
Use case: IP address to asset enrichment¶
Use IP-to-asset enrichment when, based on a single IP address such as 192.168.1.1
, you want to obtain the information from the prepared lookup about asset information, device, host etc.
- In LogMan.io, go to the Library.
- In the Library, go to the folder /Lookups.
- Create a new lookup declaration for your lookup, like "ipassetlookup.yaml" with a YAML file extension.
- Add the following declaration:

---
define:
  type: lookup/ipaddress
  name: ipassetlookup
  group: asset
keys:
  - name: sourceip
    type: ip
fields:
  asset:
    type: str
Make sure the type is always lookup/ipaddress.

Change the name in the define section to your lookup name.

The group is then used in the enrichment process to locate all lookups that share the same group. The value is a unique identifier of the group (use case), here: asset.

Keep one key in the keys section with the type ip. The name should not contain dots or any other special characters.

To fields, add the names and types of the lookup attributes. In this example there are the asset and the hostname:

fields:
  asset:
    type: str
  hostname:
    type: str

Currently, these types are supported: str, fp64, si32, geopoint, and ip.
- Save.
- In LogMan.io, go to Lookups.
- Create a new lookup with the same name as above, i.e. "ipassetlookup". Specify the IP address as the key.
- Create records in the lookup with the IP address as the key and fields as specified above.
- Add the following enricher to the LogMan.io Parsec rule that should utilize the lookup:

---
define:
  type: enricher/ip
  group: asset
schema:
  ecs:
    prefix: lmio.ipenricher.
Specify the group of the lookups to be used in the group attribute. It should be the same as the group mentioned above in the lookup declaration. Tenants are resolved automatically.

The enrichment is done on every field that has the type ip in the schema.

The prefix is added to the field with the resolved attributes to be used for further mapping (see the sketch below):

If the input is source.ip, then the output is lmio.ipenricher.source.<NAME_OF_THE_ATTRIBUTE>.
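For illustration, a hedged sketch of a possible enrichment result for an event whose source.ip matches a lookup entry (the IP, asset, and hostname values are made up; the field names follow the prefix rule above):

{
  "source.ip": "10.0.0.15",
  "lmio.ipenricher.source.asset": "WKS-042",
  "lmio.ipenricher.source.hostname": "workstation-42.example.local"
}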
Adaptive lookups¶
Adaptive lookups empower TeskaLabs LogMan.io event processing components such as LogMan.io Parsec, LogMan.io Correlator, and LogMan.io Alerts with the capability to automatically update lookups for real-time data enrichment using rules.
Custom rules can dynamically add or remove entries from these lookups based on the insights gleaned from incoming logs or other events. This ensures that your threat detection and response strategies remain agile, accurate, and aligned with the ever-changing cyber threat landscape, providing an essential layer of intelligence to your security operations.
Triggers¶
The lookup content is manipulated by a specific entry in the trigger section of the declaration.
It means that it can create (set), increment (add), decrement (sub), and remove (delete) an entry in the lookup.
The entry is identified by a key, which is a unique primary key.
Example of a trigger that adds an entry to the lookup user_list:
trigger:
- lookup: user_list
key: !ITEM EVENT user.name
set:
event.created: !NOW
foo: bar
Example of a trigger that removes an entry from the lookup user_list:
trigger:
- lookup: user_list
delete: !ITEM EVENT user.name
Example of a trigger that increments a counter (field my_counter) in an entry of the lookup user_list:
trigger:
- lookup: user_list
key: !ITEM EVENT user.name
add: my_counter
Example of a trigger that decrements a counter (field my_counter) in an entry of the lookup user_list:
trigger:
- lookup: user_list
key: !ITEM EVENT user.name
sub: my_counter
For both add and sub, the counter field name can be omitted, in which case the default attribute _counter is used implicitly:
trigger:
- lookup: user_list
key: !ITEM EVENT user.name
sub:
If the counter field does not exist, it is created with the default value of 0.
Note
Lookup entries can be accessed from declarative expressions with the !GET and !IN expressions.
Lookups API¶
Lookup changes can include creating or deleting the entire lookup structure, as well as adding, updating, or removing specific items within a lookup. Additionally, items can be automatically deleted when they expire. These changes can be made through the system's UI (HTTPS API) or through Apache Kafka.
Each component that produces lookup events sends them to lmio-lookups topics.
Lookup event structure¶
A lookup event has a JSON structure with three mandatory attributes: action, lookup_id, and data. The @timestamp and tenant attributes are added automatically, as well as other configured meta attributes.
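Putting this together, a generic lookup event skeleton (assembled from the concrete actions shown below; the values in angle brackets are placeholders) looks like:

{
  '@timestamp': <UNIX_TIMESTAMP>,
  'tenant': <TENANT>,
  'action': '<ACTION>',
  'lookup_id': '<LOOKUP_NAME>.<TENANT>',
  'data': { ... }
}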
action¶
Specifies the action the lookup event caused. The action can be taken on the entire lookup or just one of its items. Refer to the list below to see all available actions and their associated events.
lookup_id¶
ID or name of the lookup.
The lookup ID in lookup events also contains the tenant name after the period (.) character, so every component knows which tenant the lookup is specific to.
data¶
Specification of the lookup data (i.e. the lookup item) to be created or updated, as well as meta information (in the case of item deletion).
Lookup items contain their ID in the _id attribute of the data structure. The _id is a string based on:
Single key¶
If the lookup has only one key (e.g. userName), the _id is the value itself for a string value.
'data': {
'_id': 'JohnDoe'
}
...
If the value is in bytes, _id is the UTF-8 decoded string representation of the value.
If the value is neither a string nor bytes, it is handled the same way as the ID when using compound keys.
Compound key¶
If the lookup consists of multiple keys (e.g. [userName, location]), the _id is a hash representation of the value.
The original values are then stored in the _keys attribute inside the data structure:
'data': {
'_id': '<INTERNAL_HASH>',
'_keys': ['JohnDoe', 'Company 1']
}
...
Create lookup¶
When a lookup is created, the following action is produced:
{
'@timestamp': <UNIX_TIMESTAMP>,
'tenant': <TENANT>,
'action': 'create_lookup',
'data': {},
'metadata': {
'_keys': ['key1name', 'key2name' ...]
...
},
'lookup_id': 'myLookup.tenant'
}
Metadata contains information about the lookup creation, such as the names of the individual keys (e.g. [userName, location]) in the case of compound keys.
Delete lookup¶
When a lookup is deleted, the following action is produced:
{
'@timestamp': <UNIX_TIMESTAMP>,
'tenant': <TENANT>,
'action': 'delete_lookup',
'data': {},
'lookup_id': 'myLookup.tenant'
}
Create item¶
When an item is created, the following action is produced:
{
'@timestamp': <UNIX_TIMESTAMP>,
'tenant': <TENANT>,
'action': 'create_item',
'data': {
'_id': 'newItemId',
'_keys': [],
...
},
'lookup_id': 'myLookup.tenant'
}
Update item¶
When an item is updated, the following action is produced:
{
'@timestamp': <UNIX_TIMESTAMP>,
'tenant': <TENANT>,
'action': 'update_item',
'data': {
'_id': 'existingOrNewItemId',
'_keys': [],
...
},
'lookup_id': 'myLookup.tenant'
}
Delete item¶
When an item is deleted, the following action is produced.
Expiration¶
In case of deletion due to an expiration:
{
'@timestamp': <UNIX_TIMESTAMP>,
'tenant': <TENANT>,
'action': 'delete_item',
'data': {
'_id': 'existingItemId',
'reason': 'expiration'
},
'lookup_id': 'myLookup.tenant'
}
Please note: Unless the option use_default_expiration_when_update is disabled (set to false) in the Lookup meta information, the expiration is refreshed with each lookup item update (current time + default expiration). Thus, deletion due to expiration happens only if the item was not updated for the whole duration of the expiration period. For example, with a default expiration of 7 days, an item updated on day 5 does not expire until day 12.
Delete¶
For other reasons:
{
'@timestamp': <UNIX_TIMESTAMP>,
'tenant': <TENANT>,
'action': 'delete_item',
'data': {
'_id': 'existingItemId',
'reason': 'delete'
},
'lookup_id': 'myLookup.tenant'
}
Lookup internals¶
Lookup Service¶
The lookup management service handles:
- Read-only lookup objects that can run both in asynchronous and synchronous mode (there is an on-tick refresh)
- Write-only lookup modifier objects
- Builder objects for synchronous lookup files to be used by builder services
The service provides an interface both to BSPump and to SP-Lang.
Lookup Types¶
Synchronous lookups¶
Synchronous lookups are lookups loaded from files, which include:
- IP address lookups
- Elasticsearch lookups serialized to file
The creation of synchronous lookups is handled by the LogMan.io Lookup Builder (see configuration), whose output is stored in the /lookups folder.
Asynchronous lookups¶
Lookups directly loaded from Elasticsearch.
If the synchronous lookup file is missing or corrupted, the processing automatically falls back to asynchronous lookups.
Asynchronous lookups require less configuration, but are less optimal than synchronous lookups.
Ended: Lookups
Warden ↵
LogMan.io Warden¶
TeskaLabs LogMan.io Warden is a microservice that periodically performs predefined detections on parsed events stored in Elasticsearch. The Elasticsearch indices to load events from are obtained through the event lane declarations for the given tenant, which are stored in the /EventLanes/ folder in the library. The detections create alerts in the LogMan.io Alerts microservice.
The following detections are available:
- IP detection that detects IP addresses stored in a lookup
LogMan.io Warden configuration¶
LogMan.io Warden requires the following dependencies:
- Apache ZooKeeper
- Apache Kafka
- Elasticsearch
- SeaCat Auth
- LogMan.io Alerts
- LogMan.io Library with the /EventLanes and /Lookups/ folders and a schema in the /Schemas folder
Example¶
This is the most basic configuration required for each instance of LogMan.io Warden:
[tenant]
name=default
[ip]
lookup=ipbad
[zookeeper]
servers=zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
[library]
providers=zk:///library
[kafka]
bootstrap_servers=kafka-1:9092,kafka-2:9092,kafka-3:9092
[elasticsearch]
url=http://es01:9200/
[auth]
public_keys_url=http://localhost:8081/openidconnect/public_keys
Tenant¶
Specify the tenant for which LogMan.io Warden is deployed and will run detections.
[tenant]
name=mytenant
It is recommended to run one instance of LogMan.io Warden per tenant.
Detections¶
IP¶
Specify the lookup that lists the IP addresses that should be detected.
[ip]
lookup=ipbad
The lookup's key MUST be of the ip type in the lookup declaration, which is stored in the /Lookups/ folder in the library.
---
define:
type: lookup/ipaddress
name: ipbad
group: bad
keys:
- name: sourceip
type: ip # The type of the key must be an IP
Zookeeper¶
Specify the locations of the Zookeeper servers in the cluster:
[zookeeper]
servers=zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
Hint
For non-production deployments, the use of a single Zookeeper server is possible.
Library¶
Specify the path(s) to the Library to load declarations from:
[library]
providers=zk:///library
Hint
Since ECS.yaml
schema in /Schemas
is utilized by default, consider using the LogMan.io Common Library.
Kafka¶
Define the Kafka cluster's bootstrap servers:
[kafka]
bootstrap_servers=kafka-1:9092,kafka-2:9092,kafka-3:9092
Hint
For non-production deployments, the use of a single Kafka server is possible.
ElasticSearch¶
Specify URLs of Elasticsearch master nodes.
Elasticsearch is necessary for using lookups, e.g. as a !LOOKUP
expression or a lookup trigger.
[elasticsearch]
url=http://es01:9200
username=MYUSERNAME
password=MYPASSWORD
Ended: Warden
Exports ↵
Exports¶
Export features are provided by the BS-Query microservice, enabling clients to effortlessly create, customize, and manage database-to-file exports, with options such as format selection, data download, email delivery, and automated scheduling.
To learn how to create and monitor exports, please visit the corresponding user manual section.
Terminology¶
Export¶
You will encounter the word "export" in various contexts.
- Exported file: First of all, "export" is the data extracted from the data source (typically a database). Let's call it an "exported file".
- Export in the UI: An export in the LogMan.io Web Application is a record informing about the state of the export and providing additional information. It also offers an interface for downloading the content, the "exported file".
- Export declaration: An export in the context of the Library (YAML files in the Exports section of the Library) is a "declaration of an export", a blueprint saying how to create a new export.
- Export object: Last but not least, "Export" is an object in the BS-Query application and its representation on disk. The BS-Query Export object stores minimal information in memory. Instead, it serves as a link to the data storage.
Data Storage¶
The data storage is organized as follows:
.
└── data
└── <export_id>.exp
├── content
│ └── <export_id>.json
├── declaration.yaml
└── export.json
└── schedule.json
- The content directory stores the exported data. This directory can contain at most one file.
- declaration.yaml stores all variables needed for this export. You will find the complete structure of the declaration in this chapter.
- export.json stores metadata of this particular export. It might look like this: {"state": "finished", "_c": 1692975120.0054746, "_m": 1692975120.0054772, "export_size": 181610228, "_f": 1692975170.0347197}
- schedule.json is present only for scheduled exports. It stores the timestamp of the next run.
Data Source¶
You can see "data source" or even "datasource" in various contexts.
- Data source: The source of the data. This is the original database or other technology that we extract data from.
- Data source declaration: A YAML file in the DataSources section of the Library. It is a blueprint/manual specifying the connection to the external source of the data.
- Datasource object: An object in the BS-Query application responsible for connecting to the data source/database and extracting the data.
Declaration¶
A declaration is a manual or blueprint prescribing to the system how a task should be executed. The declaration of an export simply prescribes what should be in the resulting exported file. You will find these declarations in YAML format in the Library. Learn how to read or write the export and data source declarations in this chapter.
Export Life Cycle¶
Each export has a state. Each state transition triggers an action.
Created¶
First, the export is created based on the input. Every export must be created based on a declaration. There are multiple ways to provide a declaration: as a JSON or YAML file, or as a reference to the Library.
An export ID is generated and the <export_id>.exp
folder appears in the data storage.
Then, the export is moved to the "extracting" state.
Extracting¶
Each export declaration contains a "datasource" item. This is a reference to the /DataSources Library section.
In the processing state, BS-Query reads the declaration of the data source from the Library and creates a datasource object: a specific connection to the database of choice. Each export object creates and uses one datasource object.
Query is a string variable from the export declaration saying which data to collect.
With document databases, each document is processed one by one, creating a stream, which lowers memory usage and enhances performance.
Each document/record goes through transformation functions. Learn more about how to transform the exported data based on the schema.
Then, it is stored in a format selected in the output section of the export declaration.
Not all datasources support all output formats.
Compressing¶
The compressing step is optional, based on the export declaration.
Postprocessing¶
In this stage, the export content is no longer edited; it is shipped to the selected targets, e-mail being the primary choice. BS-Query uses ASAB Iris to send exports via e-mail.
Finished¶
There are two options: either the export finished successfully and is ready for download, or there was an exception in the export life cycle and the export "finished with error".
There are known errors that can be prevented by providing better input. Such errors are designed to provide enough information in the UI. An export object that finished with an exception is stored together with the information about the error. Unknown errors are labeled with the GENERAL_ERROR code.
Scheduled Exports¶
Scheduled exports do not follow the standard export life cycle. Instead, they end up in the scheduled state when created.
There are two types of scheduled exports.
One-off scheduled exports¶
These exports are planned for one specific moment in the future. When created, the future timestamp is stored in the schedule.json file. When this time comes, BS-Query creates a new export that inherits the declaration of the scheduled export except for the schedule item. The original scheduled export is not needed anymore and is deleted.
Repeated scheduled exports¶
These exports contain a cron format string in the schedule option of the declaration. This cron schedule is used to calculate the time of the next trigger, at which a new export inheriting the scheduled export's declaration is created in the same way as for one-off exports. The scheduled export does not get deleted; it calculates a new future timestamp and the cycle repeats.
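For illustration, a hedged sketch of how the schedule item can appear in the define section of an export declaration; the export name and the schedule values are made-up examples, and the accepted formats (datetime, timestamp, cron) are those listed in the Exports section below:

define:
  name: Weekly export
  datasource: elasticsearch
  output: csv
  schedule: "0 6 * * 1"             # repeated: cron, every Monday at 06:00
  # schedule: "2023-01-01 00:00"    # one-off: a specific datetime instead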
Warning
Mind the scheduled export behavior when troubleshooting. A scheduled export always has at least two IDs: that of the original scheduled export and that of the descendant running export. It can be even more complicated: when a scheduled export is edited, it gets deleted and a new export with a new ID is created. This way, a scheduled export (as seen from the UI) has multiple IDs (being multiple export objects in the BS-Query application).
Exports and Library¶
There are three types of Library artifacts used by the BS-Query application.
DataSources¶
Declarations of data sources are vital for BS-Query functionality. There are no exports without a data source.
BS-Query application supports the following types of data source declarations:
datasource/elasticsearch
datasource/pyppeteer
/DataSources/elasticsearch.yaml
define:
type: datasource/elasticsearch
specification:
index: lmio-{{tenant}}-events*
request:
<key>: <value>
<key>: <value>
query_params:
<key>: <value>
<key>: <value>
define¶
type: a technical name that helps to find the DataSource declaration in the Library.
specification¶
index: a collection of JSON documents in Elasticsearch. Each document is a set of fields that contain data presented as key-value pairs. For a more detailed explanation, refer to this article.
There is also a number of other items that can be configured in a DataSource declaration. These are standard Elasticsearch API parameters through which you can fine-tune your declaration template to determine the specific content of the requested data and/or actions performed on it. One such parameter is size, the number of matching documents to be returned in one request.
request¶
For more details, please refer to Elastic documentation.
query_params¶
For more details, please refer to Elastic documentation.
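As a hedged sketch only: assuming request maps to the Elasticsearch search request body and query_params to URL query parameters of the Search API, a DataSource declaration could look like the following. The specific keys shown (size, sort, ignore_unavailable) are illustrative assumptions, not a definitive list:

define:
  type: datasource/elasticsearch
specification:
  index: lmio-{{tenant}}-events*
  request:
    size: 1000
    sort: ["@timestamp"]
  query_params:
    ignore_unavailable: true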
Exports¶
Export declarations specify how to retrieve data from a data source. The YAML file contains the following sections:
define¶
The define section includes the following parameters:
- name: The name of the export.
- datasource: The name of the DataSource declaration in the library, specified as an absolute path within the library.
- output: The output format for the export. Available options are "raw", "csv", and "xlsx" for ES DataSources, and "raw" for Kafka DataSources.
- header: When using "csv" or "xlsx" output, you must specify the header of the resulting table as an array of column names. These will appear in the same order as they are listed.
- schedule: There are three options for how to schedule an export:
  - datetime in the format "%Y-%m-%d %H:%M" (e.g. 2023-01-01 00:00)
  - timestamp as an integer (e.g. 1674482460)
  - cron (refer to http://en.wikipedia.org/wiki/Cron for more details)
- schema: The schema in which the export should be run.
Schema
There is always a schema configured for each tenant (see the Tenants section of the configuration). The export declaration can state a schema to which it belongs.
If the schema of the export declaration does not match the tenant schema configuration, the export stops executing.
target¶
The target section includes the following parameters:
- type: An array of target types for the export. Possible options are "download", "email", and "jupyter". "download" is always selected if the target section is missing.
- email: For the email target type, you must specify at least the to field, which is an array of recipient email addresses. Other optional fields include:
  - cc: an array of CC recipients
  - bcc: an array of BCC recipients
  - from: the sender's email address (string)
  - subject: the subject of the email (string)
  - body: a file name (with suffix) stored in the Template folder of the library, used as the email body template. You can also add special parameters to be used in the template. Otherwise, use any keyword from the define section of your export as a template parameter (for any export these are: name, datasource, output; for specific exports, you can also use the parameters compression, header, schedule, timezone, tenant).
query¶
The query field must be a string.
Tip
In addition to these parameters, you can use keywords specific to the data source declaration in your export declaration. If there are any conflicts, the data source declaration will take precedence.
schema¶
You can add a partial schema that overrides the configured common schema.
This feature allows schema-based transformations on exported data. This comprises:
- Conversion from timestamp to a human-readable date format, where the schema specifies the datetime type
- Deidentification
/Exports/example_export.yaml
define:
name: Export e-mail test
datasource: elasticsearch
output: csv
header: ["@timestamp", "event.dataset", "http.response.status_code", "host.hostname", "http.request.method", "url.original"]
target:
type: ["email"]
email:
to:
- john.doe@teskalabs.com
query: >
{
"bool": {
"filter": [{
"prefix": {
"http.version": {
"value": "HTTP/1.1"
}
}
}]
}
}
schema:
fields:
user.name:
deidentification:
method: hash
source.address:
deidentification:
method: ip
Templates¶
The Templates section of the Library is used when sending Exports by e-mail. The e-mail body must be based on a template. Place a custom template into the Templates/Email directory. You can use jinja templating in these files; see the jinja docs for more info. All keys from the export declaration can be used as jinja variables.
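A minimal sketch of such a template, assuming a hypothetical file /Templates/Email/export.md and using only keywords that every export declaration provides (name, datasource, output):

Hello,

the export "{{ name }}" from the "{{ datasource }}" data source is attached in {{ output }} format.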
Deidentification¶
LogMan.io enables you to export any data for custom purposes, e.g. external analysis. Logs very often contain sensitive personal data about users. Use deidentification during exports whenever providing the data to third parties. Deidentification algorithms keep the granularity and uniqueness of the data, allowing analysis of security incidents without exposing users' personal information.
Use schema to apply deidentification methods on exported data.
Deidentification methods¶
- hash: Uses the SHA256 algorithm to hash the value.
- email: Searches for email addresses using the regular expression ^(.*)@(.*)(\..*)$ (john.doe@company.com) and SHA256 to hash the name and domain separately. Returns "not email" if the value does not fit the regular expression.
- username: A combination of the hash and email methods. It applies the email method but returns a hashed value if the value doesn't fit the regular expression.
- filepath: Hashes the filename but keeps the extension. Allows further analysis based on the file types.
- ip: Randomizes the last part of an IPv4 address.
- drop: Erases the sensitive data entirely from the export.
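For example, a hedged sketch of a partial export schema combining several of these methods; the field names (user.email, file.path, user.roles) are illustrative, and the structure follows the example export declaration above:

schema:
  fields:
    user.email:
      deidentification:
        method: email
    file.path:
      deidentification:
        method: filepath
    user.roles:
      deidentification:
        method: drop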
Configuration of BS-Query¶
Common cluster configuration¶
This configuration depends on other cluster services and must align with the cluster architecture.
Locations of the Zookeeper server in the cluster.
[zookeeper]
servers=zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
Layers of the Library.
[library]
providers=zk:///library
Telemetry.
[asab:metrics]
Data sources¶
Here are some example configuration sections needed for data source connection.
[mongodb]
mongodb_uri=mongodb://mongo-1/?replicaSet=rs0
[elasticsearch]
url=
http://es01:9200
http://es02:9200
http://es03:9200
[kafka]
bootstrap_servers=kafka-1:9092,kafka-2:9092,kafka-3:9092
[pyppeteer]
url=http://<example_url>
username=<pyppeteer_username>
password=<pyppeteer_password>
Authorization¶
[auth]
BS-Query specific configuration¶
Directory to store all exports.
[storage]
path=/data/bsquery
Provide the URL to the ASAB Iris microservice and the default email template present in the /Templates/Email folder of the Library.
The file size limit is in bytes; the default setting is 50 MB.
[target:email]
url=http://<example_url>
template=bsquery.md
file_size_limit=52000000
Name of a directory shared with Jupyter Notebook. To configure jupyter target, Jupyter Notebook and BS-Query must share a common directory on the filesystem.
[target:jupyter]
path=/data/jupyter
There is a default limit of 50 000 rows for exports in Excel format. Be aware that this limit protects BS-Query from loading too much data into memory.
[xlsx]
limit=60000
BS-Query reads custom tenant configuration from ASAB Config. If the tenant configuration is not available, it falls back to this configuration of the time zone and the used Schema. It defaults to UTC and the ECS Schema. Expiration says in how many days a finished export gets deleted. The default is 30 days.
[default]
schema=/Schemas/ECS.yaml
timezone=UTC
expiration=30
Provide a secret key and expiration of a download link in seconds. The key is a random string. Keep the expiration as short as possible. A default expiration configuration is recommended.
[download]
secret_key=<custom_secret_key>
expiration=5
Ended: Exports
Reports ↵
ASAB Pyppeteer¶
The ASAB Pyppeteer microservice facilitates the creation of scheduled reports.
ASAB Pyppeteer operates a headless chromium browser that can open and generate reports. It allows you to use the scheduling function of Exports to define when a report will be created and dispatched via email.
It's important to note that each user may have different access levels to various sections of LogMan.io, including reports. Consequently, scheduled reports must be verified and permitted in the same manner as the original scheduling action. For information on setting up authorization, please proceed to the auth section.
Authorization of scheduled reports¶
A scheduled report contains information regarding its author. When it's time for the report to be printed and sent, the ASAB Pyppeteer microservice impersonates the author, ensuring the report is created from the specific user's perspective and access level.
To configure BS-Query (Exports), SeaCat Auth, and ASAB Pyppeteer correctly to allow complete communication between services, follow these steps:
1. ASAB Pyppeteer configuration¶
Make sure the ASAB Pyppeteer instance can access SeaCat Auth.
[seacat_auth]
url=http://localhost:3081
2. SeaCat Auth configuration¶
Make sure SeaCat Auth configuration allows creating machine-to-machine credentials.
[seacatauth:credentials:m2m:machine]
mongodb_uri=mongodb://localhost:27017
mongodb_database=auth
3. Create ASAB Pyppeteer Credentials¶
Refer to the user manual for instructions on creating and assigning credentials, resources, roles, and tenants.
First, create a resource authz:impersonate
and a global role with this resource (named e.g. "impersonator").
Then, create new machine
credentials with <pyppeteer_username>
and <pyppeteer_password>
and assign it the "impersonator" role and relevant tenants.
4. Enter pyppeteer credentials to BS-Query configuration¶
[pyppeteer]
url=http://localhost:8895
username=<pyppeteer_username>
password=<pyppeteer_password>
Warning
Be aware that ASAB Pyppeteer cannot impersonate a superuser. Therefore, a user with a superuser role cannot create scheduled reports unless they are explicitly assigned a role with the bitswan:report:access resource.
Ended: Reports
Configuration ↵
Branding¶
Customer-specified branding can be set within the LogMan.io WebUI application.
Static branding¶
Example:
let ConfigDefaults = {
title: "App Title",
brand_image: {
full: "media/logo/header-full.svg",
minimized: "media/logo/header-minimized.svg",
}
};
Dynamic branding¶
The branding can be configured using a dynamic configuration.
The dynamic configuration is injected with the use of ngx_http_sub_module, since it replaces the pre-defined content of index.html (in our case).
More about ngx_http_sub_module
There are 3 options for dynamic branding: header logo, title, and custom CSS styles.
Header logo¶
To replace the default header logo, the nginx sub_filter configuration has to follow the <meta name="header-logo-full"> and <meta name="header-logo-minimized"> replacement rules with the particular name. The replacement must have a content prop, otherwise the content of the replacement will not be propagated. content has to include a string with the path to the logo.
The size of the branding images can be found here.
Full¶
Example of importing full size logo (when sidebar of the application is not collapsed)
sub_filter '<meta name="header-logo-full">' '<meta name="header-logo-full" content="/<location>/<path>/<to>/<custom_branding>/<logo-full>.svg">';
Minimized¶
Example of importing minimized size logo (when sidebar of the application is collapsed)
sub_filter '<meta name="header-logo-minimized">' '<meta name="header-logo-minimized" content="/<location>/<path>/<to>/<custom_branding>/<logo-minimized>.svg">';
Title¶
Example of replacing the application title: the configuration has to follow the <meta name="title"> replacement rule with the particular name. The replacement must have a content prop, otherwise the content of the replacement will not be propagated. content has to include a string with the application title.
sub_filter '<meta name="title">' '<meta name="title" content="Custom app title">';
Custom CSS styles¶
Example of importing custom CSS styles: the configuration has to follow the <meta name="custom-css-file"> replacement rule with the particular name. The replacement must have a content prop, otherwise the content of the replacement will not be propagated. content has to include a string with the path to the CSS file.
sub_filter '<meta name="custom-css-file">' '<meta name="custom-css-file" content="/<location>/<path>/<to>/<custom_branding>/<custom-file>.css">';
Custom CSS file example¶
.card .card-header-login .card-header-title h2 {
color: violet !important;
}
.text-primary {
color: yellowgreen !important;
}
Define the nginx path to dynamic branding content¶
To allow the location of the dynamic (custom) branding content, it has to be defined in the nginx setup.
# Path to location (directory) with the custom content
location /<location>/<path>/<to>/<custom_branding> {
alias /<path>/<to>/<custom_branding>;
}
Full example¶
Full example of nginx configuration with custom branding
...
location /<location> {
root /<path>/<to>/<build>;
index index.html;
sub_filter '<meta name="header-logo-full">' '<meta name="header-logo-full" content="/<location>/<path>/<to>/<custom_branding>/<logo-full>.svg">';
sub_filter '<meta name="header-logo-minimized">' '<meta name="header-logo-minimized" content="/<location>/<path>/<to>/<custom_branding>/<logo-minimized>.svg">';
sub_filter '<meta name="title">' '<meta name="title" content="Custom app title">';
sub_filter '<meta name="custom-css-file">' '<meta name="custom-css-file" content="/<location>/<path>/<to>/<custom_branding>/<custom-file>.css">';
sub_filter_once on;
}
# Path to location (directory) with the custom content
location /<location>/<path>/<to>/<custom_branding> {
alias /<path>/<to>/<custom_branding>;
}
...
Styling guide¶
Every image HAS TO be provided in SVG (vectorized). The use of pixel formats (PNG, JPG, ...) is strongly discouraged. When creating the branding images, use the full width/height of the canvas (ratio 3:1 on the full and 1:1 on the minimized version). No padding is required for an optimal viewing experience.
Branding images¶
format: SVG
Full:
* rendered ratio: 3:1
* rendered size: 150x50 px
Minimized:
* rendered ratio: 1:1
* rendered size: 50x50 px
Branding is located in the top-left corner on large screens. The full-size branding image is used when the sidebar is uncollapsed and is substituted by the minimized version upon collapsing. On smaller screens (<768px), the branding in the sidebar disappears and only the full-size branding image appears in the top-center position of the page.
Logo should be suitable for use in both light & dark mode.
The SidebarLogo is always located at the bottom of the sidebar. The minimized version appears when the sidebar is collapsed.
Full:
* rendered size: 90x30 px
Minimized:
* rendered size: 30x30 px
Note: The full image is also used on the splash screen, at 30% of the width of the screen.
Discover configuration¶
Discover screen setup¶
The Discover screen is used for displaying and exploring the data (not only) in ElasticSearch.
The configuration of the Discover screen can be loaded from the Library or from a static file in the public folder, in the same way as in the case of Dashboards.
The type of filtered data relies on the specification, which must be defined together with datetimeField. These are crucial values without which filtering is not possible.
Discover configuration¶
Library configuration¶
Library configuration is stored in the Library node. It must be of JSON type.
To get the configuration from the Library, the asab_config service must be running with its configuration pointing to the main node of the Library. For more info, please refer here: http://gitlab.teskalabs.int/lmio/asab-config
The configuration from the Library is editable.
In the Discover Library node, there can be multiple configuration files, and within each of them only one Discover configuration screen can be set. Another Discover screen has to be configured in a new Library configuration node.
All configuration files from the Discover Library node are loaded in one API call.
Library configuration structure¶
Config structure in Library
- main Library node
  - config
    - Discover
      - **config**.json
  - type
    - Discover.json (schema)

config is the name of the particular Discover configuration; it must be of json type.
In Library, the path to the config file looks like:
/<main Library node>/config/Discover/<discoverConfig>.json
The schema path will be as follows:
/<main Library node>/type/Discover.json
Example of above described Library structure for multiple Discover config file case:
- logman
- config
- Discover
- declarative.json
- default.json
- speed.json
- type
- Discover.json
IMPORTANT NOTE
The schema (type) and the config file (config) must be set in the Library, otherwise Discover will not be loaded correctly.
Example of the configuration:¶
{
"Discover:datasource": {
"specification": "declarative*",
"datetimeField": "@timestamp",
"type": "elasticsearch"
}
}
Where
- The object key serves the purpose of naming the object. It must be named Discover:datasource.
- type is the type of the search engine.
- specification is the URL with the ElasticSearch index pattern. If one would like to search for all the data, the URL must end with an asterisk (*). This is a mandatory parameter.
- datetimeField is the index of the item's datetime. It is a mandatory parameter since it is needed for searching/scrolling with ElasticSearch.
Schema (optional setup)¶
Don't confuse with the Library schema
Set up the name to obtain the schema from the library (if present), which is then applied to values defined within the schema. With the schema, we can apply actions to values corresponding to the defined type, e.g. using the ASAB-WebUI DateTime component for time values.
{
...
"Discover:schema": {
"name": "ECS"
}
...
}
Example of schemas structure in the library:
- library
- Schemas
- Discover.yaml
- ECS.yaml
...
Example of the schema in the library:
---
define:
name: Elastic Common Schema
type: common/schema
description: https://www.elastic.co/guide/en/ecs/current/index.html
fields:
'@timestamp':
type: datetime
label: "Datetime"
unit: seconds
docs: https://www.elastic.co/guide/en/ecs/current/ecs-base.html#field-timestamp
Authorization (optional setup)¶
Discover configuration access can be limited to specific tenant(s). This means that users without the particular tenant(s) are not able to access the Discover configuration and its data source. This is convenient e.g. when an administrator wants to limit access to a Discover configuration with sensitive data to particular group(s) of users.
If the configuration is being set directly in the Library (and not via the Configuration tool), it is recommended to add the Authorization section and leave the tenants key as an empty string (if no limitation is required). This helps to keep the same structure across the Discover configurations:
{
...
"Authorization": {
"tenants": ""
}
...
}
Example of the Authorization settings within a configuration where limited access is required:
{
...
"Authorization": {
"tenants": "tenant one, tenant two"
}
...
}
The key tenants serves the purpose of displaying and using the configuration only by specific tenant(s). Multiple tenants can be specified, separated by commas. The type of the tenants key is string.
Prompt settings (optional setup)¶
The Prompt settings section provides an additional option to set up the Discover prompt or change its defaults.
Example of the Discover:prompts section within the configuration:
{
...
"Discover:prompts": {
"dateRangePicker:datetimeStart": "now-15m",
"dateRangePicker:datetimeEnd": "now+15s"
...
},
...
}
Setup custom datetime range periods¶
Sometimes it is desired to set up a custom datetime period for data display, because the data lie e.g. outside of the default period set for Discover. The default period is now-1H, which seeks for data between now and 1 hour back. For example, this could be set in the Discover:prompts section as follows:
{
...
"Discover:prompts": {
"dateRangePicker:datetimeStart": "now-1H",
"dateRangePicker:datetimeEnd": "now"
},
...
}
Where dateRangePicker:datetimeStart and dateRangePicker:datetimeEnd are the periods which set up the range to the starting period (initial) and to the ending period (final).
The setup possibilities for both periods are:
- now-ns
- now-nm
- now-nH
- now-nd
- now-nw
- now-nM
- now-nY
- now
- now+ns
- now+nm
- now+nH
- now+nd
- now+nw
- now+nM
- now+nY
Where
- n is the number, e.g. 2,
- s indicates seconds,
- m indicates minutes,
- H indicates hours,
- d indicates days,
- w indicates weeks,
- M indicates months,
- Y indicates years.
Other values will be ignored.
It is also possible to set up only one period, as in this example; the second period will remain at its default:
{
...
"Discover:prompts": {
"dateRangePicker:datetimeStart": "now-2H"
},
...
}
Another datetime range setup example, where data are displayed 15 hours into the past and sought 10 minutes into the future:
{
...
"Discover:prompts": {
"dateRangePicker:datetimeStart": "now-15H",
"dateRangePicker:datetimeEnd": "now+10m"
},
...
}
Library schema¶
To set up a Discover screen manually in the Library, one must set the Discover schema in valid JSON format.
The schema must be provided and stored in /<main Library node>/type/<discoverType>.json
The schema can look as follows:
{
"$id": "Discover schema",
"type": "object",
"title": "Discover schema",
"description": "The Discover schema",
"default": {},
"examples": [
{
"Discover:datasource": {
"specification": "declarative*",
"datetimeField": "@timestamp",
"type": "elasticsearch"
}
}
],
"required": [],
"properties": {
"Discover:datasource": {
"type": "string",
"title": "Discover source",
"description": "The data specification for Discover screen",
"default": {},
"examples": [
{
"specification": "declarative*",
"datetimeField": "@timestamp",
"type": "elasticsearch"
}
],
"required": [
"specification",
"datetimeField",
"type"
],
"properties": {
"specification": {
"type": "string",
"title": "Specification",
"description": "Specify the source of the data",
"default": "",
"examples": [
"declarative*"
]
},
"datetimeField": {
"type": "string",
"title": "Datetime",
"description": "Specify the datetime value for data source",
"default": "",
"examples": [
"@timestamp"
]
},
"type": {
"type": "string",
"title": "Type",
"description": "Select the type of the source",
"default": [
"elasticsearch",
"sentinel"
],
"$defs": {
"select": {
"type": "select"
}
},
"examples": [
"elasticsearch*"
]
}
}
},
"Discover:prompts": {
"type": "string",
"title": "Discover prompts",
"description": "Update Discover prompt configuration",
"default": {},
"examples": [],
"required": [],
"properties": {
"dateRangePicker:datetimeStart": {
"type": "string",
"title": "Starting date time period",
"description": "Setup the prompt's starting date time period",
"default": "now-1H",
"examples": [
"now-1H"
]
},
"dateRangePicker:datetimeEnd": {
"type": "string",
"title": "Ending date time period",
"description": "Setup the prompt's ending date time period",
"default": "now",
"examples": [
"now"
]
}
}
},
"Discover:schema": {
"type": "string",
"title": "Discover schema name",
"description": "Apply schema over discover values",
"default": {},
"properties": {
"name": {
"type": "string",
"title": "Schema name",
"description": "Set up the schema name for configuration (without file extension)",
"default": ""
}
}
},
"Authorization": {
"type": "string",
"title": "Discover authorization",
"description": "Limit access to discover configuration by tenant settings",
"default": {},
"examples": [],
"required": [],
"properties": {
"tenants": {
"type": "string",
"title": "Tenants",
"description": "Specify the tenant(s) separated by comma to restrict the usage of this configuration (optional)",
"default": "",
"examples": [
"tenant1, tenant2"
]
}
}
}
},
"additionalProperties": false
}
Example of passing config props¶
Example of passing config props to the DiscoverContainer:
...
this.App.Router.addRoute({
path: "/discover",
exact: true,
name: 'Discover',
component: DiscoverContainer,
props: {
type: "Discover"
}
});
...
this.App.Navigation.addItem({
name: "Discover",
url: "/discover",
icon: 'cil-compass'
});
When using DiscoverContainer as a component in your container, the props can be passed as follows:
<DiscoverContainer type="Discover" />
The static application config file remains empty:
module.exports = {
app: {
},
webpackDevServer: {
port: 3000,
proxy: {
'/api/elasticsearch': {
target: "http://es-url:9200",
pathRewrite: {'^/api/elasticsearch': ''}
},
'/api/asab_print': {
target: "http://asab_print-url:8083",
pathRewrite: {'^/api/asab_print': ''}
},
'/api/asab_config': {
target: "http://asab_config-url:8082",
pathRewrite: {'^/api/asab_config': ''}
}
}
}
}
Static configuration¶
The Discover screen does not have to be obtained only from the Library. Another option is to configure it directly in a JSON file and save it in the project's public folder.
Example of static configuration¶
In index.js, the developer has to specify the path to the configuration file (see the example of passing config props below).
The JSON file with the configuration can be stored anywhere in the public folder, but it is strongly recommended to store it in the /public/discover/ folder to distinguish it from the other publicly accessible components.
Config structure in the public folder:
- public
  - discover
    - JSON config file
  - dashboards
  - locales
  - media
  - index.html
  - manifest.json
The URL of the static config stored in the public folder can look like:
https://my-project-url/discover/Discover-config.json
Example of Discover-config.json:
[
{
"Config name 1": {
"Declarative": {
"specification": "declarative*",
"datetimeField": "last_inform",
"type": "elasticsearch"
}
}
},
{
"Config name 2": {
"Default": {
"specification": "default*",
"datetimeField": "@timestamp",
"type": "elasticsearch"
}
}
}
]
Example of passing config props¶
Passing config props to the App:
this.App.Router.addRoute({
path: "/discover",
exact: true,
name: 'Discover',
component: DiscoverContainer,
props: {
type: "https://my-project-url/discover/Discover-config.json"
}
});
this.App.Navigation.addItem({
name: "Discover",
url: "/discover",
icon: 'cil-compass'
});
When using DiscoverContainer as a component in your container, the props can be passed as follows:
<DiscoverContainer type="https://my-project-url/discover/Discover-config.json" />
The static application config file remains empty:
module.exports = {
app: {
},
webpackDevServer: {
port: 3000,
proxy: {
'/api/elasticsearch': {
target: "http://es-url:9200",
pathRewrite: {'^/api/elasticsearch': ''}
},
'/api/asab_print': {
target: "http://asab_print-url:8083",
pathRewrite: {'^/api/asab_print': ''}
},
'/api/asab_config': {
target: "http://asab_config-url:8082",
pathRewrite: {'^/api/asab_config': ''}
}
}
}
}
Language localizations¶
LogMan.io WebUI provides customization of language localizations. It uses the i18n internationalization library. For details, refer to: https://react.i18next.com
Import and set custom localisation¶
LogMan.io WebUI allows you to redefine the text of application components and messages for every section of the application. The language localizations are stored in JSON files called translate.json.
Custom locales can be loaded into the LogMan.io WebUI application via the config file.
The files are loaded from e.g. an external folder served by nginx, where one can store them among CSS styling and other site configuration.
Example of definition in static config file of LogMan.io WebUI:
module.exports = {
app: {
i18n: {
fallbackLng: 'en',
supportedLngs: ['en', 'cs'],
debug: false,
backend: {
{% raw %}loadPath: 'path/to/external_folder/locales/{{lng}}/{{ns}}.json',{% endraw %}
{% raw %}addPath: 'path/to/external_folder/locales/add/{{lng}}/{{ns}}',{% endraw %}
}
}
}
}
Where
* fallbackLng is the fallback language
* supportedLngs are the supported languages
* debug, if set to true, displays debug messages in the browser console
* backend is the backend plugin for loading resources from the server
The path/to/external_folder/ is a path to the external folder with the locales folder served by nginx. There have to be 2 folders referencing the supported languages. Those folders are en and cs, in which the translate.json files are stored, as you can see in the folder structure below:
* external_folder
* locales
* cs
* translation.json
* en
* translation.json
Custom translate.json file example¶
en
{
"i18n": {
"language": {
"en": "English",
"cs": "Česky"
}
},
"LogConsole": {
"Connection lost": "Connection lost, will reconnect ...",
"Mark": "Mark",
"Clear": "Clear"
},
...
}
cs
{
"i18n": {
"language": {
"en": "English",
"cs": "Česky"
}
},
"LogConsole": {
"Connection lost": "Spojení ztraceno, připojuji se ...",
"Mark": "Označit",
"Clear": "Smazat"
},
...
}
Dashboards ↵
Dashboard and Widget configuration¶
Dashboard setup¶
To edit the layout or dashboard configuration, the user needs to have the dashboards:admin resource assigned.
Setting dashboard manually from Library¶
Library configuration¶
Library configuration is stored in the Library node. It must be of JSON type.
To get the configuration from the Library, the asab_library service must be running with its configuration pointing to the main node of the Library.
The configuration from Library is editable - the position and size of the widgets can be saved to the Library directly from the BS-WebUI.
Set up dashboard with Library configuration¶
- Dashboard configuration files must be stored in the dashboard/Dashboards node of the Library, as in the following structure example.
- The Library config file (config) defines your Library node with the configuration in the Library Dashboards node. The config name is also the name of the Dashboard displayed in the sidebar of the application. It is a mandatory parameter.
- IMPORTANT NOTE: the config file MUST have a .json extension, otherwise it won't be possible to display the config file in the Library module and thus trigger features such as Edit or Disable.
Config structure in Library
- main Library node (`library`)
  - Dashboards
    - **config**.json

- main Library node: usually it should be named library
- config is the name of the particular Dashboard configuration, e.g. My Dashboard.json
In Library, the path to the config file looks like:
/<main Library node>/Dashboards/<dashboardConfig>.json
Take a look at the dashboard configuration example.
Dashboard configuration¶
Datasource¶
It is primarily used for setting up a data source for widgets. There can be an unlimited number of datasources.
Example of data source setup
{
...
"Dashboard:datasource:elastic": {
"type": "elasticsearch", // Source of the data
"datetimeField": "@timestamp", // Type of the datetime
"specification": "es-pattern*" // Index pattern or URL pattern
},
...
}
Advanced Elasticsearch setup
{
...
"Dashboard:datasource:elastic": {
// Basic setup
"type": "elasticsearch",
"datetimeField": "@timestamp",
"specification": "es-pattern*",
// Advanced setup
"sortData": "asc", // Sort "asc" or "desc" data during the processing (optional)
// Advanced elasticsearch setup
"size": 100, // Max size of the hits in the response (default 20) (optional)
"aggregateResult": true, // Charts only (not PieChart) - it will ask ES for aggregated values (optional)
"groupBy": "user.id", // Charts only - it will ask ES for aggregation by term defined in the "groupBy" key (optional)
"matchPhrase": "event.dataset: microsoft-office-365" // For default displaying particular match of a datasource
},
...
}
Connect datasource with the widget¶
If a datasource is not assigned to a widget, no data are processed for display in the widget.
{
...
"Dashboard:widget:somewidget": {
"datasource": "Dashboard:datasource:elastic",
...
},
...
}
Prompts¶
The Prompt settings section provides an additional option to set up the Dashboard prompt or change its defaults.
Usage within the config file:
{
...
"Dashboard:prompts": {
"dateRangePicker": true, // Enable Date range picker prompt
"filterInput": true, // Enable filter input prompt
"submitButton": true // Enable submit button prompt
},
...
}
Setup custom datetime range periods¶
Sometimes it is desired to set up a custom datetime period for data display, because the data lie e.g. outside of the default period set for the Dashboard. The default period is now-1H, which seeks for data between now and 1 hour back. For example, this could be set in the Dashboard:prompts section as follows:
{
...
"Dashboard:prompts": {
...
"dateRangePicker": true,
"dateRangePicker:datetimeStart": "now-1H",
"dateRangePicker:datetimeEnd": "now",
...
},
...
}
Where dateRangePicker:datetimeStart and dateRangePicker:datetimeEnd are the periods which set up the range to the starting period (initial) and to the ending period (final).
The setup possibilities for both periods are:
- now-ns
- now-nm
- now-nH
- now-nd
- now-nw
- now-nM
- now-nY
- now
- now+ns
- now+nm
- now+nH
- now+nd
- now+nw
- now+nM
- now+nY
Where
- n is the number, e.g. 2,
- s indicates seconds,
- m indicates minutes,
- H indicates hours,
- d indicates days,
- w indicates weeks,
- M indicates months,
- Y indicates years.
Other values will be ignored.
It is also possible to set up only one period, as in this example; the second period will remain at its default:
{
...
"Dashobard:prompts": {
...
"dateRangePicker": true,
"dateRangePicker:datetimeStart": "now-1H",
...
},
...
}
Another datetime range setup example, where data are displayed 15 hours into the past and sought 10 minutes into the future:
{
...
"Dashboard:prompts": {
...
"dateRangePicker": true,
"dateRangePicker:datetimeStart": "now-15H",
"dateRangePicker:datetimeEnd": "now+10m",
...
},
...
}
Grid system (optional setup)¶
The grid can be configured uniquely for various dashboards. Thus, the grid configuration can be implemented in the dashboard's configuration, as seen in the example. If not specified in the configuration, the default grid setup is used.
"Dashboard:grid": {
"preventCollision": false // If set to true, it prevents widgets from collision on the grid
},
"Dashboard:grid:breakpoints": {
"lg": 1200,
"md": 996,
"sm": 768,
"xs": 480,
"xxs": 0
},
"Dashboard:grid:cols": {
"lg": 12,
"md": 10,
"sm": 6,
"xs": 4,
"xxs": 2
},
...
} ``` Above setup is also the default dashboard setup.
Authorization / Disable configuration¶
Dashboard access can be limited to specific tenant(s). This means that users without the particular tenant(s) are not able to access the dashboard. This is convenient e.g. when an administrator wants to limit access to dashboards with sensitive data to particular group(s) of users.
To disable a configuration for a specific tenant, navigate to the Library section of the application and Disable the particular file by clicking on the switcher in the file.
The file name of the disabled Dashboard configuration will then be added to the .disabled.yaml file in the Library node, together with the affected tenant(s).
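The exact structure of .disabled.yaml is not documented here; as a rough, hypothetical sketch only, a record in it could associate the disabled file with the affected tenants, for example:

/Dashboards/My Dashboard.json:
  - tenant1
  - tenant2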
Humanize¶
A component used for converting number values to a human-readable form.
It displays values in a human-readable form, like:
0.000001 => 1 µ
0.00001 => 10 µ
0.0001 => 100 µ
0.001 => 1 m
0.01 => 10 m
0.1 => 100 m
1 => 1
10 => 10
100 => 100
1000 => 1 k
10000 => 10 k
100000 => 100 k
1000000 => 1 M
etc
It can be used for value and multiple value widgets.
To enable the Humanize component in a widget, one has to set:
- "humanize": true in the widget configuration
- "base": <number> defines the base for the conversion (recalculation); optional parameter, default is 1000
- "decimals": <number> defines how many decimals should be displayed; optional parameter
- "displayUnits": true displays the prefix (i.e. µ, m, k, M, G) of the units; optional parameter, default false
- "units": <string> displays a user-defined suffix for the units (e.g. B, Hz, ...)
displayUnits and units are put together in the widget, so the result can look like MHz, where M is the prefix and Hz is the user-defined suffix.
{
  ...
  "Dashboard:widget:valuewidget": {
    ...
    "humanize": true,
    "base": 1024,
    "decimals": 3,
    "displayUnits": true,
    "units": "B",
    ...
  },
  ...
}
Hint (optional setup)¶
A hint can be added to any widget except the Tools widget. This way, the hint will be embedded as a tooltip beside the widget title.
After adding the hint, an info icon will appear in the widget's header (beside the title). Hovering over the icon displays the inserted hint.
The hint can only be of string type.
Example of how to add a Hint:
{
  ...
  "Dashboard:widget:somewidget": {
    ...
    "hint": "Some hint",
    ...
  },
  ...
}
Widget layout¶
Widget size and position on the grid can be defined for every widget in the configuration. If not set, the widget has its predefined values for layout and position and will be rendered accordingly.
Widgets can be moved and resized within the grid through Dashboard setting prompt >> Edit. This is available to users with the dashboards:admin resource.
Example of basic layout settings:
{
...
"Dashboard:widget:somewidget": {
...
// Basic setup
"layout:w": 4, // Widget width in grid units
"layout:h": 2, // Widget height in grid units
"layout:x": 0, // Position of the widget on x axis in grid units
"layout:y": 0, // Position of the widget on y axis in grid units
// Custom setup (optional)
"layout:minW": 2, // Minimal width of the widget
"layout:minH": 2, // Minimal height of the widget
"layout:maxW": 6, // Maximal width of the widget
"layout:maxH": 6, // Maximal height of the widget
...
},
...
}
Example of advanced layout settings:
{
...
"Dashboard:widget:somewidget": {
...
// Advanced setup (optional)
"layout:isBounded": false, // If true and draggable, item will be moved only within grid
"layout:resizeHandles": ?Array<'s' | 'w' | 'e' | 'n' | 'sw' | 'nw' | 'se' | 'ne'> = ['se'], // By default, a handle is only shown on the bottom-right (southeast) corner
"layout:static": ?boolean = false, // Fix widget on static position (cannot be moved nor resized). If true, equal to `isDraggable: false, isResizable: false`
"layout:isDraggable": ?boolean = true, // If false, will not be draggable. Overrides `static`
"layout:isResizable": ?boolean = true, // If false, will not be resizable. Overrides `static`
...
},
...
}
Widgets¶
Example of a widget within the config file:

{
  ...
"Dashboard:widget:somewidget": {
"datasource": "Dashboard:datasource:elastic",
"type": "Value",
"field": "some.variable",
"title": "Some title",
"layout:w": 2,
"layout:h": 1,
"layout:x": 0,
"layout:y": 1
},
...
} ```
Value widget¶
Commonly used to display a single value, although it can also display the datetime and the filtered value at once. ``` { ...
"Dashboard:widget:valuewidget": {
// Basic setup
"datasource": "Dashboard:datasource:elastic",
"type": "Value", // Type of the widget
"field": "some.variable", // Field (value) displayed in the widget
"title": "Some title", // Title of the widget
// Advanced setup (optional)
"onlyDateResult": true, // Display just date with time
"units": "GB", // Units of the field value
"displayWidgetDateTime": true, // Display date time in the widget underneath the value
"hint": "Some hint", // Display hint of the widget
// Humanize value (can be used for transforming values to human readable form e.g. bytes to GB) (optional)
"humanize": true, // Enable Humanize component
"base": 1024, // Base for the value recalculation for Humanize component
"decimals": 3, // Round value to n digits in Humanize component
"displayUnits": true, // Display prefix of the unit size (like k, M, T,...) in Humanize component
// Layout setup
"layout:w": 2,
"layout:h": 1,
"layout:x": 0,
"layout:y": 1
},
...
} ```
Multiple value widget¶
Used to display multiple values in one widget ``` { ...
"Dashboard:widget:mutliplevaluewidget": {
// Basic setup
"datasource": "Dashboard:datasource:elastic",
"type": "MultipleValue", // Type of the widget
"field:1": "some.variable1", // Fields (values) displayed in the widget
"field:2": "some.variable2", // Number of fields is unlimited
"field:3": "date.time",
"title": "Some title", // Title of the widget
// Advanced setup (optional)
"units": "GB", // Units of the fields value
"displayWidgetDateTime": true, // Display date time in the widget underneath the value
"hint": "Some hint", // Display hint of the widget
// Humanize value (can be used for transforming values to human readable form e.g. bytes to GB) (optional)
"humanize": true, // Enable Humanize component
"base": 1024, // Base for the value recalculation for Humanize component
"decimals": 3, // Round value to n digits in Humanize component
"displayUnits": true, // Display prefix of the unit size (like k, M, T,...) in Humanize component
// Layout setup
"layout:w": 2,
"layout:h": 1,
"layout:x": 0,
"layout:y": 1
},
...
} ```
Status indicator widget¶
Used to display a value and a status color based on whether the value exceeds or falls below the defined limits. Its core is similar to the Value widget. There are two types of settings: one displays colors based on a number range, the other displays values based on a string. Number range ``` { ...
"Dashboard:widget:indicatorwidget": {
// Basic setup
"datasource": "Dashboard:datasource:elastic",
"type": "StatusIndicator", // Type of the widget
"field": "some.variable", // Field (value) displayed in the widget
"title": "Some title", // Title of the widget
"lowerBound": 4000, // Lower limit bound
"upperBound": 5000, // Upper limit bound
// Advanced setup (optional)
"lowerBoundColor": "#a9f75f", // Lower bound color
"betweenBoundColor": "#ffc433", // Midst bound color
"upperBoundColor": "#C70039 ", // Upper bound color
"nodataBoundColor": "#cfcfcf", // No data color
"units": "GB", // Units of the field value
"displayWidgetDateTime": true, // Display date time in the widget underneath the value
"hint": "Some hint", // Display hint of the widget
// Layout setup
"layout:w": 2,
"layout:h": 1,
"layout:x": 0,
"layout:y": 1
},
...
} ```
Table widget¶
Used to display multiple values in a form of table. ``` { ...
"Dashboard:widget:tablewidget": {
// Basic setup
"datasource": "Dashboard:datasource:elastic",
"type": "Table", // Type of the widget
"field:1": "@timestamp", // Fields (values) displayed in the widget
"field:2": "event.dataset", // Number of fields is unlimited
"field:3": "host.hostname", // Fields also indicates items displayed in the table header
"title": "Some title", // Title of the widget
// Advanced setup (optional)
"dataPerPage": 5, // Number of data per page
"disablePagination": true, // Disable pagination
"units": "GB", // Units of the field value
"hint": "Some hint", // Display hint of the widget
// Layout setup
"layout:w": 3,
"layout:h": 3,
"layout:x": 0,
"layout:y": 1
},
...
} ```
Tools widget¶
It is similar to the Tools module of ASAB WebUI, but transformed into a widget for use in the dashboard. ``` { ...
"Dashboard:widget:toolswidget": {
// Basic setup
"type": "Tools", // Type of the widget
"title": "BitSwan", // Title of the widget
"redirectUrl": "http://www.teskalabs.com", // Redirect URL
"image": "tools/bitswan.svg", // Location of the Tools image (can be also base64 image string instead of path to the location)
// Layout setup
"layout:w": 1,
"layout:h": 1,
"layout:x": 0,
"layout:y": 0
},
...
} ```
Markdown widget¶
Commonly used to edit and display a written description in Markdown format. This widget allows the user to edit and save a description, e.g. of a particular dashboard, for broader explanation.
Editing requires at least the dashboards:admin resource.
```
{
...
"Dashboard:widget:mdwidget": {
// Basic setup
"type": "Markdown", // Type of the widget
"title": "Some title", // Title of the widget
// Advanced setup (optional)
"description": "Some description", // Display description in markdown
"hint": "Some hint", // Display hint of the widget
// Layout setup
"layout:w": 2,
"layout:h": 1,
"layout:x": 0,
"layout:y": 1
},
...
} ```
Chart widgets¶
Used to display values in a chart form. For more info about the chart library used (Recharts), please follow this link
Redirection to Discover screen¶
PieChart and BarChart widgets offer redirection to the Discover screen by default.
This feature filters by the selected value in the chart and redirects to the Discover screen. It is available only for grouped data (groupBy setup in the datasource settings). It can be disabled by setting "disableRedirection": true in the widget.
When multiple Discover configurations are used in one application, it is recommended to specify the configuration name within the widget, so that the redirection filters into the correct configuration and datasource in Discover. It can be set up with the widget property "configName": "<discover-config-name>", where <discover-config-name> is the name of the Discover configuration file without the extension, e.g. for some-config.json use "configName": "some-config".
Note: Redirection removes all previously filtered items and modifies the date-time range stored in local storage for the Discover screen according to the selection in the Dashboard chart.
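For illustration, a minimal sketch of these properties on a PieChart widget; the widget and datasource identifiers are placeholders, and the same keys appear in the full widget examples below.
```
{
  ...
  "Dashboard:widget:piechartwidget": {
    "datasource": "Dashboard:datasource:elastic-groupby",
    "type": "PieChart",
    "title": "Some title",
    "configName": "some-config",   // redirect to the Discover configuration stored in some-config.json
    // "disableRedirection": true, // alternatively, switch the redirection off entirely
    ...
  },
  ...
}
```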
Chart widget colors¶
All charts offer the possibility to be displayed with one of the pre-defined color schemes.
The color spectrum differs based on the chart type.
If no color or an invalid color is specified, the default color spectrum is used.
The color is specified by the color variable in the widget settings:
```
"Dashboard:widget:barchartwidget": {
"datasource": "Dashboard:datasource:elastic",
"type": "BarChart",
"title": "Some title",
...
"color": "sunset",
...
},
```
- PieChart
  - gradient color spectra: sunset, secondary, safe, warning, danger, default
  - mixed color spectra: cold, rainbow, default
- BarChart and others
  - single color spectra: sunset, safe, warning, danger, default
BarChart widget¶
``` { ...
"Dashboard:widget:barchartwidget": {
// Basic setup
"datasource": "Dashboard:datasource:elastic",
"type": "BarChart", // Type of the widget
"title": "Some title", // Title of the widget
"xaxis": "@timestamp", // Values displayed on x-axis
"yaxis": "request.bytes", // Values displayed on y-axis
// Advanced setup (optional)
"table": true, // Allows to display table instead of chart (on button click)
"xlabel": "timestamp", // x-axis label, default is datetime
"ylabel": "bytes", // y-axis label
"xaxisUnit": "ts", // x-axis units
"yaxisUnit": "byte", // y-axis units
"xaxisDomain": ['auto', 'auto'], // Range to display on x-axis (default [0, 'auto'])
"yaxisDomain": ['auto', 'auto'], // Range to display on y-axis (default [0, 'auto'])
"horizontal": true, // Allows the chart to be displayed horizontally
"width": "50%", // Width of the chart in the widget
"height": "50%", // Height of the chart in the widget
"convertBy": 1000, // Chart values will be divided by this number. It serves the purpose and need of data conversion to e.g. MHz, GB, etc.
"hint": "Some hint", // Display hint of the widget
"disableRedirection": true, // Disable redirection to Discover screen (only for BarChart)
"configName": "config-name", // Name of the particular Discover configuration (only for BarChart)
"color": "safe", // Color specification of the widget
// Layout setup
"layout:w": 6,
"layout:h": 3,
"layout:x": 0,
"layout:y": 0
},
...
} ```
To display aggregation in the chart, one has to set it in the data source setup:
```
{
...
"Dashboard:datasource:elastic": {
// Basic setup
"type": "elasticsearch",
"datetimeField": "@timestamp",
"specification": "es-pattern*",
"aggregateResult": true // Charts only - it will ask ES for aggregated values (optional)
},
...
} ```
ScatterChart widget¶
Setting of this widget is the same as BarChart widget's. ``` { ...
"Dashboard:widget:scatterchartwidget": {
// Basic setup
...
"type": "ScatterChart", // Type of the widget
...
},
...
} ```
AreaChart widget¶
Setting of this widget is the same as BarChart widget's. ``` { ...
"Dashboard:widget:areachartwidget": {
// Basic setup
...
"type": "AreaChart", // Type of the widget
...
},
...
} ```
LineChart widget¶
Setting of this widget is the same as BarChart widget's. ``` { ...
"Dashboard:widget:linechartwidget": {
// Basic setup
...
"type": "LineChart", // Type of the widget
...
},
...
} ```
Stacked BarChart widget¶
Grouped chart
Datasource configuration:
```
{
...
"Dashboard:datasource:elastic-stacked": {
// Basic setup
"type": "elasticsearch",
"datetimeField": "@timestamp",
"specification": "pattern*",
"groupBy": [
"sender.address",
"recipient.address"
],
"size": 100, // Define the max size of the group (default is top 20)
"stackSize": 100 // Define the max size of the stacked events (default is top 50)
// Advanced setup
"matchPhrase": "event.dataset: microsoft-office-365" // For default displaying particular match of a datasource
},
...
} ```
Chart configuration:
```
{
...
"Dashboard:widget:stackedbarchartwidget": {
"datasource": "Dashboard:datasource:elastic-stacked",
"title": "Some stacked barchart title",
"type": "StackedBarChart",
// Advanced setup (optional)
"table": true, // Allows to display table instead of chart (on button click)
"xlabel": "Sender x recipient", // x-axis label, default is datetime
"ylabel": "Count", // y-axis label
},
...
} ```
Aggregated chart on timescale
Datasource configuration:
```
{
...
"Dashboard:datasource:elastic-aggregation-stacked": {
// Basic setup
"type": "elasticsearch",
"datetimeField": "@timestamp",
"specification": "pattern*",
"aggregateResult": true, // Set aggregation to true
"groupBy": "o365.audit.Workload", // Value to group aggregation by
"aggregateEvents": [
"device.properties.OS",
"organization.id"
], // Additional events for aggregation (optional but recommended)
"size": 100 // Define the max size of the group (default is top 20)
// Advanced setup
"matchPhrase": "event.dataset: microsoft-office-365" // For default displaying particular match of a datasource
},
...
} ```
Chart configuration:
```
{
...
"Dashboard:widget:stackedbarchartwidget": {
"datasource": "Dashboard:datasource:elastic-aggregation-stacked",
"title": "Some stacked barchart title",
"type": "StackedBarChart",
// Advanced setup (optional)
"table": true, // Allows to display table instead of chart (on button click)
"xlabel": "Sender x recipient", // x-axis label, default is datetime
"ylabel": "Count", // y-axis label
},
...
} ```
PieChart widget¶
To display values in the chart, one has to set groupBy
in the data source setup.
groupBy
will ask ES for aggregation by term defined in the "groupBy" key:
```
{
...
"Dashboard:datasource:elastic-groupby": {
// Basic setup
"type": "elasticsearch",
"datetimeField": "@timestamp",
"specification": "es-pattern*",
"groupBy": "user.id",
"size": 100, // Define the max size of the group (default is top 20)
// Advanced setup
"matchPhrase": "event.dataset: microsoft-office-365" // For default displaying particular match of a datasource
},
...
} ```
``` { ...
"Dashboard:widget:piechartwidget": {
// Basic setup
"datasource": "Dashboard:datasource:elastic-groupby",
"type": "PieChart", // Type of the widget
"title": "Some title", // Title of the widget
// Advanced setup (optional)
"table": true, // Allows to display table instead of chart (on button click)
"tooltip": true, // Display either Tooltip or text (by default) in the widget. Alternatively, `tooltip: "both"` can be set to display both possiblilities
"useGradientColors": true, // PieChart will use pre-default gradient colors - by default set to false
"displayUnassigned": true, // Display unassigned values (empty string keys or keys with dash) within the PieChart - by default set to false
"field": "timestamp", // Field for displaying the values in the Pie chart (if groupBy not defined in datasource)
"width": "50%", // Width of the chart in the widget
"height": "50%", // Height of the chart in the widget
"hint": "Some hint", // Display hint of the widget
"disableRedirection": true, // Disable redirection to Discover screen
"configName": "config-name", // Name of the particular Discover configuration,
"color": "safe", // Color specification of the widget - spectre and names can differ if gradient colors are used
// Layout setup
"layout:w": 6,
"layout:h": 3,
"layout:x": 0,
"layout:y": 0
},
...
} ```
FlowChart widgets¶
Used to display flowcharts in SVG format.
FlowChart widget¶
The FlowChart widget is based on mermaid.js flowcharts. For more info, please follow this link: https://mermaid-js.github.io
``` { ...
"Dashboard:widget:flowchart": {
// Basic setup
"type": "FlowChart", // Type of the widget
"title": "Gantt chart", // Title of the widget
"content": "gantt\ntitle A Gantt Diagram\ndateFormat YYYY-MM-DD\nsection Section\nA task:a1, 2014-01-01, 30d\nAnother task:after a1,20d\nsection Another\nTask in sec:2014-01-12,12d\nanother task: 24d", // Content of the flowchart
// Advanced setup (optional)
"hint": "Some hint", // Display hint of the widget
// Layout setup
"layout:w": 6,
"layout:h": 3,
"layout:x": 0,
"layout:y": 0
},
...
} ```
Content of the flowchart
The content of the flowchart must be of string type. As it is a part of JSON, newlines must be separated by \n.
There are plans to implement obtaining the flowchart content from an API or URL in future iterations.
Example of converting a mermaid flowchart string to be compatible with the JSON settings in the Library:
Original string:
```
gantt
title A Gantt Diagram
dateFormat YYYY-MM-DD
section Section
A task :a1, 2014-01-01, 30d
Another task :after a1 , 20d
section Another
Task in sec :2014-01-12 , 12d
another task : 24d
```
Modified string:
```
gantt\ntitle A Gantt Diagram\ndateFormat YYYY-MM-DD\nsection Section\nA task:a1, 2014-01-01, 30d\nAnother task:after a1,20d\nsection Another\nTask in sec:2014-01-12,12d\nanother task: 24d
```
Dashboard configuration example¶
Example:
```
{
"Dashboard:datasource:elastic": {
"type": "elasticsearch",
"datetimeField": "@timestamp",
"specification": "default-events*"
},
"Dashboard:datasource:elastic-aggregation": {
"type": "elasticsearch",
"datetimeField": "@timestamp",
"specification": "default-events*",
"aggregateResult": true
},
"Dashboard:datasource:elastic-size100": {
"type": "elasticsearch",
"datetimeField": "@timestamp",
"specification": "default-events*",
"size": 100
},
"Dashboard:datasource:elastic-stacked": {
"type": "elasticsearch",
"datetimeField": "@timestamp",
"specification": "default-events*",
"groupBy": [
"sender.address",
"recipient.address"
],
"matchPhrase": "event.dataset:microsoft-office-365",
"size": 50,
"stackSize": 100
},
"Dashboard:grid": {
"preventCollision": false
},
"Dashboard:grid:breakpoints": {
"lg": 1200,
"md": 996,
"sm": 768,
"xs": 480,
"xxs": 0
},
"Dashboard:grid:cols": {
"lg": 12,
"md": 10,
"sm": 6,
"xs": 4,
"xxs": 2
},
"Dashboard:prompts": {
"dateRangePicker": true,
"dateRangePicker:datetimeStart": "now-15H",
"dateRangePicker:datetimeEnd": "now+10s",
"filterInput": true,
"submitButton": true
},
"Dashboard:widget:table": {
"datasource": "Dashboard:datasource:elastic-size100",
"field:1": "@timestamp",
"field:2": "event.dataset",
"field:3": "host.hostname",
"title": "Table",
"type": "Table",
"layout:w": 6,
"layout:h": 4,
"layout:x": 0,
"layout:y": 9,
"layout:minW": 2,
"layout:minH": 3
},
"Dashboard:widget:hostname": {
"datasource": "Dashboard:datasource:elastic",
"field": "host.hostname",
"title": "Hostname",
"type": "Value",
"layout:w": 2,
"layout:h": 1,
"layout:x": 10,
"layout:y": 12
},
"Dashboard:widget:lastboot": {
"datasource": "Dashboard:datasource:elastic",
"field": "@timestamp",
"units": "ts",
"title": "Last boot",
"type": "Value",
"layout:w": 2,
"layout:h": 1,
"layout:x": 8,
"layout:y": 12,
"layout:minH": 1
},
"Dashboard:widget:justdate": {
"datasource": "Dashboard:datasource:elastic",
"field": "@timestamp",
"onlyDateResult": true,
"title": "Just date",
"type": "Value",
"layout:w": 4,
"layout:h": 2,
"layout:x": 8,
"layout:y": 9
},
"Dashboard:widget:displaytenant": {
"datasource": "Dashboard:datasource:elastic",
"field": "tenant",
"title": "Tenant",
"type": "Value",
"layout:w": 2,
"layout:h": 2,
"layout:x": 6,
"layout:y": 9
},
"Dashboard:widget:baraggregationchart": {
"datasource": "Dashboard:datasource:elastic-aggregation",
"title": "Request body bytes aggregation",
"type": "BarChart",
"xaxis": "@timestamp",
"yaxis": "http.request.body.bytes",
"yaxisDomain": [
"auto",
0
],
"ylabel": "bytes",
"layout:w": 6,
"layout:h": 3,
"layout:x": 0,
"layout:y": 6,
"layout:minW": 4,
"layout:minH": 3,
"layout:isBounded": true
},
"Dashboard:widget:barchart": {
"datasource": "Dashboard:datasource:elastic",
"title": "Request body bytes",
"type": "BarChart",
"hint": "Some hint",
"width": "95%",
"xaxis": "@timestamp",
"yaxis": "http.request.body.bytes",
"ylabel": "bytes",
"layout:w": 6,
"layout:h": 3,
"layout:x": 6,
"layout:y": 6
},
"Dashboard:widget:scatterchart": {
"datasource": "Dashboard:datasource:elastic-size100",
"title": "Request body bytes scatter size 100",
"type": "ScatterChart",
"xaxis": "@timestamp",
"xlabel": "datetime",
"yaxis": "http.request.body.bytes",
"ylabel": "http.request.body.bytes",
"layout:w": 6,
"layout:h": 3,
"layout:x": 6,
"layout:y": 0,
"layout:minH": 2,
"layout:maxH": 6
},
"Dashboard:widget:scatteraggregationchart": {
"datasource": "Dashboard:datasource:elastic-aggregation",
"title": "Request body bytes scatter aggregation",
"type": "ScatterChart",
"xaxis": "@timestamp",
"yaxis": "http.request.body.bytes",
"xlabel": "datetime",
"ylabel": "count",
"layout:w": 6,
"layout:h": 3,
"layout:x": 0,
"layout:y": 0
},
"Dashboard:widget:areachart": {
"datasource": "Dashboard:datasource:elastic",
"height": "100%",
"title": "Request body bytes area",
"type": "AreaChart",
"width": "95%",
"xaxis": "@timestamp",
"yaxis": "http.request.body.bytes",
"ylabel": "area bytes",
"layout:w": 6,
"layout:h": 3,
"layout:x": 6,
"layout:y": 3,
"layout:minH": 2,
"layout:maxH": 6,
"layout:resizeHandles": [
"sw"
]
},
"Dashboard:widget:areaaggregationchart": {
"datasource": "Dashboard:datasource:elastic-aggregation",
"title": "Request body bytes area aggregation",
"type": "AreaChart",
"xaxis": "@timestamp",
"xlabel": "datetime",
"yaxis": "http.request.body.bytes",
"ylabel": "count",
"layout:w": 6,
"layout:h": 3,
"layout:x": 0,
"layout:y": 3
},
"Dashboard:widget:multiplevalwidget": {
"datasource": "Dashboard:datasource:elastic",
"type": "MultipleValue",
"title": "Multiple values",
"field:1": "event.dataset",
"field:2": "http.response.status_code",
"field:3": "url.orignal",
"layout:w": 2,
"layout:h": 2,
"layout:x": 6,
"layout:y": 11
},
"Dashboard:widget:statusindicatorwidget": {
"datasource": "Dashboard:datasource:elastic",
"type": "StatusIndicator",
"title": "Bytes exceedance",
"field": "http.request.body.bytes",
"units": "bytes",
"lowerBound": 20000,
"upperBound": 40000,
"lowerBoundColor": "#a9f75f",
"betweenBoundColor": "#ffc433",
"upperBoundColor": "#C70039 ",
"nodataBoundColor": "#cfcfcf",
"layout:w": 2,
"layout:h": 1,
"layout:x": 10,
"layout:y": 11
},
"Dashboard:widget:toolswidget": {
"type": "Tools",
"title": "Grafana",
"redirectUrl": "http://www.grafana.com",
"image": "tools/grafana.svg",
"layout:w": 2,
"layout:h": 1,
"layout:x": 8,
"layout:y": 11
},
"Dashboard:widget:flowchart": {
"title": "Gantt chart",
"type": "FlowChart",
"content": "gantt\ntitle A Gantt Diagram\ndateFormat YYYY-MM-DD\nsection Section\nA task:a1, 2014-01-01, 30d\nAnother task:after a1,20d\nsection Another\nTask in sec:2014-01-12,12d\nanother task: 24d",
"layout:w": 12,
"layout:h": 2,
"layout:x": 0,
"layout:y": 13
},
"Dashboard:widget:markdown": {
"title": "Markdown description",
"type": "Markdown",
"description": "## Markdown content",
"layout:w": 12,
"layout:h": 2,
"layout:x": 0,
"layout:y": 15
},
"Dashboard:widget:barchart-stacked": {
"datasource": "Dashboard:datasource:elastic-stacked",
"title": "Grouped sender X recipient address",
"type": "StackedBarChart",
"xlabel": "Sender x Recipient",
"ylabel": "Count",
"layout:w": 12,
"layout:h": 4,
"layout:x": 0,
"layout:y": 17
}
}
```
Office 365 dashboard¶
Example:
```
{
"Dashboard:prompts": {
"dateRangePicker": true,
"filterInput": true,
"submitButton": true
},
"Dashboard:datasource:elastic-office365-userid": {
"datetimeField": "@timestamp",
"groupBy": "user.id",
"matchPhrase": "event.dataset:microsoft-office-365",
"specification": "lmio-default-events*",
"type": "elasticsearch",
"size": 100
},
"Dashboard:datasource:elastic-office365-clientip": {
"datetimeField": "@timestamp",
"groupBy": "client.ip",
"matchPhrase": "event.dataset:microsoft-office-365",
"specification": "lmio-default-events*",
"type": "elasticsearch",
"size": 100
},
"Dashboard:datasource:elastic-office365-activity": {
"datetimeField": "@timestamp",
"groupBy": "o365.audit.Workload",
"matchPhrase": "event.dataset:microsoft-office-365",
"specification": "lmio-default-events*",
"type": "elasticsearch",
"size": 100
},
"Dashboard:datasource:elastic-office365-actions": {
"datetimeField": "@timestamp",
"groupBy": "event.action",
"matchPhrase": "event.dataset:microsoft-office-365",
"specification": "lmio-default-events*",
"type": "elasticsearch",
"size": 50
},
"Dashboard:widget:piechart": {
"datasource": "Dashboard:datasource:elastic-office365-clientip",
"title": "Client IP",
"type": "PieChart",
"table": true,
"layout:w": 6,
"layout:h": 4,
"layout:x": 6,
"layout:y": 0
},
"Dashboard:widget:piechart2": {
"datasource": "Dashboard:datasource:elastic-office365-userid",
"title": "User ID's",
"type": "PieChart",
"useGradientColors": true,
"table": true,
"tooltip": true,
"layout:w": 6,
"layout:h": 4,
"layout:x": 0,
"layout:y": 0
},
"Dashboard:widget:piechart3": {
"datasource": "Dashboard:datasource:elastic-office365-activity",
"title": "Activity by apps",
"type": "PieChart",
"table": true,
"tooltip": true,
"layout:w": 6,
"layout:h": 4,
"layout:x": 6,
"layout:y": 4
},
"Dashboard:widget:barchart": {
"datasource": "Dashboard:datasource:elastic-office365-actions",
"title": "Actions",
"type": "BarChart",
"table": true,
"xaxis": "event.action",
"xlabel": "Actions",
"ylabel": "Count",
"layout:w": 6,
"layout:h": 4,
"layout:x": 0,
"layout:y": 4
}
}
```
Widgets catalogue¶
A closer view of the widgets' settings.
For Dashboard and widget setup, please refer here.
Value widgets¶
Value widgets are used when data needs to be displayed in text form.
Single value widget¶
Serves the purpose of displaying just a single value.
The value can be of type: string, boolean, number.
A single value widget can also display just a date (converting e.g. a Unix timestamp to a human-readable date), if needed.
Multiple value widget¶
Serves the purpose of displaying n values in one widget.
Apart from displaying n values in one widget, the Multiple value widget contains the same features as the Single value widget.
Values can be of type: string, boolean, number.
Status indicator widget¶
Status indicator widget displays a number value and a color indication based on defined boundaries.
The value can be of type: number.
Table widget¶
Table widget can display multiple values in the table form.
Values can be of type: string, boolean, number.
Tool widgets¶
Tool widgets serve the purpose of a "button" that, when clicked, opens a new tab in the browser with a link (URL address) specified in the configuration of the widget. Images for a tool widget can be loaded directly from the application's public folder or provided as a base64 string image.
Chart widgets¶
Chart widgets are used when data needs to be displayed in graphical form.
Recharts is used as the library for rendering data in chart form.
BarChart widget¶
The BarChart widget has 4 ways of displaying data:
BarChart widget with sample data¶
Displaying the data based on some key
on the timeline.
BarChart widget with aggregated data¶
Displaying count of the data based on some key
on the timeline.
BarChart widget with grouped data¶
Displaying the data based on key
by the total count.
BarChart widget with data table¶
Displaying the chart data in the table. This feature can be enabled in the chart settings and activated by clicking the button in top right corner of the widget.
AreaChart widget¶
Contains the same features as BarChart widget, but it displays the data in the widget in area form.
ScatterChart widget¶
Contains the same features as BarChart widget, but it displays the data in the widget in scatter form.
PieChart widget¶
PieChart displays only grouped data, in the form of a pie chart and a table.
PieChart with gradient colors¶
In this example, the tooltip is used to display detailed information about the particular part (the tooltip is not displayed in this image).
PieChart with multiple colors¶
In this example, tooltip is replaced by indication of an active part in the upper left corner of the widget.
FlowChart widget¶
Mermaid is used as a library for rendering flowcharts.
Mermaid playground can be found here.
Ended: Dashboards
Ended: Configuration
Maestro ↵
ASAB Maestro¶
ASAB Maestro is a technology for cluster management.
It is responsible for:
- Installation and updating of TeskaLabs LogMan.io
- Management of cluster services
- Monitoring of the cluster
ASAB Maestro was developed to overcome the challenges of labor-intensive manual cluster configuration.
It brings several advantages:
- Fast installation of TeskaLabs LogMan.io
- Human errors reduction
- Consistency across all deployment sites
- Monitoring across all layers - hardware, containerization, application
- Easy updates of TeskaLabs LogMan.io
Overview of ASAB Maestro functionality¶
Automation¶
TeskaLabs LogMan.io and our other applications are typically deployed on-premises into customer environments, termed sites. ASAB Maestro ensures consistent and rapid deployment across multiple sites through extensive automation. Support teams can assist more customers using fewer resources, as automation makes them highly efficient and all sites have a unified setup.
The system guarantees consistent configurations across all applications, cluster technologies (like Apache Kafka and Elasticsearch), and the API gateway (NGINX). ASAB Maestro also streamlines the deployment of web applications to the cluster and handles the deployment of content such as database schemas, initial data loads, and more.
Cluster management¶
Management of cluster services is done from the TeskaLabs LogMan.io Web UI.
ASAB Maestro enforces a global version, representing a comprehensive release version that delineates the versions of all deployed components and confirms their compatibility. It directly results in an easy upgrade procedure when a new product version is released.
Monitoring¶
ASAB Maestro also includes centralized cluster monitoring. This monitoring encompasses logging and telemetry from all components that run in the cluster.
Main components of ASAB Maestro¶
Diagram: Example of a 5-node cluster managed by ASAB Maestro.
Containerization¶
Beneath the surface, ASAB Maestro employs Docker, and more specifically, Docker Compose, to manage containers. Alternatively, it is also compatible with Podman, providing additional flexibility and security.
ASAB Maestro extends beyond the capabilities of Docker Compose without the intricacies and overhead that may come with systems like Kubernetes.
ASAB Remote Control¶
ASAB Remote Control (asab-remote-control) is a microservice that is responsible for central cluster management.
It must run in at least one instance in the cluster.
The recommended setup is three instances, next to each ZooKeeper instance.
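For illustration, a minimal model excerpt (node names are placeholders) that schedules three ASAB Remote Control instances, one next to each ZooKeeper instance, using the model syntax described in the Library setup chapter:
```
services:
  zookeeper:
    instances: ["node1", "node2", "node3"]
  asab-remote-control:
    instances: ["node1", "node2", "node3"]
```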
ASAB Governator¶
ASAB Governator (asab-governator) is a microservice that interacts locally with the Docker technology.
ASAB Governator must run on each node of the cluster.
The ASAB Governator connects to an ASAB Remote Control.
ASAB Maestro Library¶
ASAB Maestro Library is an open-source repository managed by TeskaLabs with the description of microservices that can be launched in the cluster. The library lives at github.com/TeskaLabs/asab-maestro-library.
Note: Other libraries can be added on top of ASAB Maestro Library to extend the set of managed microservices in the cluster.
Terminology¶
Cluster¶
Cluster is a set of computer nodes working together to achieve high availability and fault tolerance, acting as a single system. The nodes in a cluster can host services, and resources that are distributed among them. Clustering is a common approach used to enhance the performance, availability, and scalability of software applications and services.
Clusters can be geographically distributed across different data centers, cloud providers, regions, or even continents. This geographic distribution helps in reducing latency, increasing redundancy, and ensuring data locality for improved application performance and user experience.
A single cluster is called a site and it is identified by site_id.
Node¶
Node serves as the fundamental building unit of a cluster, representing either a physical or a virtual machine (server) that runs Linux and supports containerization through Docker or Podman.
Each node within a cluster is assigned a unique identifier, node_id.
This node_id not only distinguishes each node but also doubles as the hostname of the node, aiding in network communications and node management within the cluster.
The node is managed by the ASAB Governator microservice.
Tip
node_id always resolves to the IP address of the node at each location of the cluster.
Core node¶
Core node contributes to the shared consensus of the cluster. This consensus is crucial for maintaining the integrity, consistency, and synchronization of data and operations across the cluster.
Core nodes are instrumental in achieving consensus, a process that ensures all nodes in the cluster agree on the state of data and the sequence of transactions. This agreement is vital for Data Integrity, Transaction Order and Fault Tolerance.
Typically, the first three nodes added to the cluster are designated as core nodes. This number is chosen to balance the need for redundancy and the desire to avoid the overhead of coordinating a large number of nodes. Having three core nodes means that the system can tolerate the failure of one node while still achieving consensus.
Warning
In larger deployments, the number of core nodes MUST BE configured as an odd number, such as 5, 7, 9 and so on, to prevent split-brain scenarios and ensure a majority can be reached for consensus.
Peripheral node¶
A peripheral node, as opposed to a core node, does not participate in forming the shared consensus of the cluster. Its primary function is to provide data services, execute computations, host services, or fulfil other supplementary roles, extending the capabilities and capacity of the cluster.
While peripheral nodes do not contribute to the consensus-building of the cluster, they maintain a symbiotic relationship with core nodes.
Service¶
A service is a collection of microservices that provides a cluster-wide functionality. Examples are ASAB Library, ZooKeeper, Apache Kafka, Elasticsearch, and so on.
Each service has its unique identifier within the cluster, called service_id.
Instance¶
An instance represents a single container that runs a given service. If a service is scheduled to run on multiple nodes of the cluster, each running entity is an instance of the given service.
Each instance has its unique identifier within the cluster, called instance_id.
Each instance also has its number, called instance_no.
Each instance is also tagged by the respective service_id and node_id.
Tip
instance_no can also be a string!
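For illustration, a minimal model excerpt (hypothetical node names) showing how these identifiers relate; the instance_id form, such as mongo-2, follows the examples used in the descriptor templating chapter:
```
services:
  mongo:                      # service_id: "mongo"
    instances:
      1: {node: "node1"}      # instance_no: 1, node_id: "node1", instance_id: "mongo-1"
      2: {node: "node2"}      # instance_no: 2, node_id: "node2", instance_id: "mongo-2"
```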
Sherpa¶
Sherpa is an auxiliary container launched next to the instances of some specific services. It runs only for a brief time, till the sherpa task is completed.
Typical use case for sherpa is an initial setup of the database.
Model¶
The model represents the desired layout of the cluster.
It is one or multiple YAML files stored in the Library.
ASAB Maestro manages the cluster by applying the model. All services listed in the model are run or updated by ASAB Remote Control and ASAB Governator services.
The model file model.yaml is meant for manual editing. Other files are created automatically by other LogMan.io components.
Descriptor¶
A descriptor is a YAML file living in the Library that says how a service, more precisely an instance of the service, should be created and launched.
Application¶
An application is a cluster-level set of descriptors and their versions.
Global Version¶
Each application can have multiple global versions. Each version is specified in a version file stored in the Library, which records the version of each service of the application.
Tech / Technology¶
Tech stands for a technology. A technology is a specific type of service (e.g. NGINX, Mongo) that provides resources to other services.
Get started ↵
Get Started¶
Install TeskaLabs LogMan.io in a few steps.
Single node installation¶
The following steps describe a single-node installation of TeskaLabs LogMan.io.
Prerequisites¶
To set up the hardware, please follow these instructions. You should start with a fresh Linux installation and the Docker service running.
If you install LogMan.io together with other services on one machine, be aware that LogMan.io controls these directories by default:
/opt/site/
/data/hdd/
/data/ssd/
Make sure you empty these directories before the installation.
sudo rm -rf /opt/site/* /data/hdd/* /data/ssd/*
If possible, stop all Docker containers and prune all Docker data
docker system prune -af
To run the installation in Podman instead of Docker, please follow these instructions.
Make sure you've configured the required Linux kernel parameters.
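For illustration, one commonly required kernel parameter for Elasticsearch-based deployments (shown here only as an example; follow the linked instructions for the authoritative list):
```
# Elasticsearch requires a higher memory map count limit
sudo sysctl -w vm.max_map_count=262144
# Persist the setting across reboots
echo "vm.max_map_count=262144" | sudo tee /etc/sysctl.d/99-elasticsearch.conf
```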
Bootstrap¶
Run this command to start LogMan.io bootstrap.
docker run -it --rm --pull always \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /opt/site:/opt/site \
-v /opt/site/docker/conf:/root/.docker/ \
-v /data:/data \
-e NODE_ID=`hostname -s` \
--network=host \
pcr.teskalabs.com/asab/asab-governator:stable \
python3 -m governator.bootstrap
A simple GUI starts in the terminal.
- Choose to install the first cluster node.
- You'll be asked for credentials for the TeskaLabs Docker registry. Authorization to download the software is a critical step.
- Check the node_id. It is the hostname of the server and should resolve to the IP address listed.
At the end of the Bootstrap process, you should have 6 services running in 6 Docker containers:
- ZooKeeper
- ASAB Remote Control
- ASAB Governator
- ASAB Library
- ASAB Config
- Zoonavigator
Note
To see all running Docker containers, simply use:
docker ps
To see all installed Docker images, use:
docker images
Meet the model¶
In your browser, visit http://<node_id>:9001/zoonavigator. (Replace <node_id> with the actual node ID, the hostname of the server, or the IP address.)
Go to /library/Site/model.yaml.
The model will look similar to this one:
define:
  type: rc/model
services:
  zoonavigator:
    instances: {1: {node: "<node_id>"} }
params:
  PUBLIC_URL: "https://localhost" # change this default to custom domain
applications:
  - name: "ASAB Maestro"
    version: v24.13
Adding a new service to the deployment is as easy as adding two lines to the YAML file. Try adding one instance of the Mongo service.
define:
  type: rc/model
services:
  zoonavigator:
    instances: {1: {node: "<node_id>"} }
  mongo:
    - <node_id>
params:
  PUBLIC_URL: "https://localhost" # change this default to custom domain
applications:
  - name: "ASAB Maestro"
    version: v24.13
When you are happy with the model, you need to apply it to the deployment. This way you'll "synchronize the model with reality".
The ASAB Remote Control service is responsible for applying the model. You can use the ASAB Remote Control API to apply the changes.
curl -X 'POST' 'http://<node_id>:8891/node/<node_id>' -H 'Content-Type: application/json' -d '{"command": "up"}'
You will see an asab-governator-up container start. This container will install Mongo for you and will be removed when finished.
Install WebUI¶
The dependencies for LogMan.io WebUI are:
- nginx
- seacat-auth
Extend the model:
define:
  type: rc/model
services:
  zoonavigator:
    instances: {1: {node: "<node_id>"} }
  mongo:
    - <node_id>
  seacat-auth:
    - <node_id>
  nginx:
    - <node_id>
params:
  PUBLIC_URL: "https://<node_id>"
applications:
  - name: "ASAB Maestro"
    version: v24.13
webapps:
  /: LogMan.io WebUI
  /auth: SeaCat Auth WebUI
In the webapps section, specify the web applications and the locations where they should be accessible. You'll need both the LogMan.io and SeaCat Auth web applications.
webapps:
  /: LogMan.io WebUI
  /auth: SeaCat Auth WebUI
Remember to specify public URL. You can use hostname (node ID) in case you don't have any domain prepared.
params:
  PUBLIC_URL: "https://<node_id>"
Finally, apply the model:
curl -X 'POST' 'http://<node_id>:8891/node/<node_id>' -H 'Content-Type: application/json' -d '{"command": "up"}'
Access WebUI¶
The LogMan.io WebUI is now accessible from the Public URL set in the model. Ask for the default login credentials.
Set LMIO Common Library and SMTP¶
In the UI, go to Maintenance > Configuration. To proceed with the installation of LogMan.io services, enter the distribution point of the LogMan.io Common Library and set it as the second layer of the Library.
libsreg+https://libsreg.z6.web.core.windows.net,libsreg-secondary.z6.web.core.windows.net/lmio-common-library
Configure the SMTP server. It is crucial for creating new credentials and allowing people to access the LogMan.io WebUI.
To propagate both of these configurations to all the services, apply the changes again.
curl -X 'POST' 'http://<node_id>:8891/node/<node_id>' -H 'Content-Type: application/json' -d '{"command": "up"}'
Now you can create user credentials and suspend the default admin.
Install 3rd party services¶
Extend the model to install all 3rd party services required by the LogMan.io applications.
Elasticsearch can be very demanding in terms of memory allocation. Set how much memory to allocate to Elasticsearch if needed. The default is 2 GB for the master node and 28 GB for each data node.
define:
  type: rc/model
services:
  zoonavigator:
    instances: {1: {node: "<node_id>"} }
  mongo:
    - <node_id>
  seacat-auth:
    - <node_id>
  nginx:
    - <node_id>
  influxdb:
    - <node_id>
  grafana:
    - <node_id>
  telegraf:
    - <node_id>
  jupyter:
    - <node_id>
  kafka:
    - <node_id>
  kafdrop:
    - <node_id>
  kibana:
    - <node_id>
  elasticsearch:
    instances:
      master-1:
        node: <node_id>
      hot-1:
        node: <node_id>
        descriptor:
          environment:
            ES_JAVA_OPTS: "-Xms2g -Xmx2g"
      warm-1:
        node: <node_id>
        descriptor:
          environment:
            ES_JAVA_OPTS: "-Xms2g -Xmx2g"
      cold-1:
        node: <node_id>
        descriptor:
          environment:
            ES_JAVA_OPTS: "-Xms2g -Xmx2g"
params:
  PUBLIC_URL: "https://<node_id>"
applications:
  - name: "ASAB Maestro"
    version: v24.13
  - name: "LogMan.io"
    version: v24.13
webapps:
  /: LogMan.io WebUI
  /auth: SeaCat Auth WebUI
Apply the changes!
curl -X 'POST' 'http://<node_id>:8891/node/<node_id>' -H 'Content-Type: application/json' -d '{"command": "up"}'
Install ASAB and LogMan.io services¶
To enable descriptors from the LogMan.io application (stored in LogMan.io Common Library) you need to specify the application and its global version in the model. Also, add more services to the model.
define:
  type: rc/model
services:
  zoonavigator:
    instances: {1: {node: "<node_id>"} }
  mongo:
    - <node_id>
  seacat-auth:
    - <node_id>
  nginx:
    - <node_id>
  influxdb:
    - <node_id>
  grafana:
    - <node_id>
  telegraf:
    - <node_id>
  jupyter:
    - <node_id>
  kafka:
    - <node_id>
  kafdrop:
    - <node_id>
  kibana:
    - <node_id>
  elasticsearch:
    instances:
      master-1:
        node: <node_id>
      hot-1:
        node: <node_id>
        descriptor:
          environment:
            ES_JAVA_OPTS: "-Xms2g -Xmx2g"
      warm-1:
        node: <node_id>
        descriptor:
          environment:
            ES_JAVA_OPTS: "-Xms2g -Xmx2g"
      cold-1:
        node: <node_id>
        descriptor:
          environment:
            ES_JAVA_OPTS: "-Xms2g -Xmx2g"
  asab-pyppeteer:
    - <node_id>
  bs-query:
    - <node_id>
  asab-iris:
    - <node_id>
  lmio-installer:
    - <node_id>
  lmio-receiver:
    - <node_id>
  lmio-depositor:
    - <node_id>
  lmio-elman:
    - <node_id>
  lmio-alerts:
    - <node_id>
params:
  PUBLIC_URL: "https://<node_id>"
applications:
  - name: "ASAB Maestro"
    version: v24.13
  - name: "LogMan.io"
    version: v24.13
webapps:
  /: LogMan.io WebUI
  /auth: SeaCat Auth WebUI
Apply the changes!
curl -X 'POST' 'http://<node_id>:8891/node/<node_id>' -H 'Content-Type: application/json' -d '{"command": "up"}'
Congratulations! TeskaLabs LogMan.io is installed!
You can continue your installation by adding a new tenant and connecting log sources. See how to connect a log simulator.
Under Construction
Install log simulator¶
To install log simulator, you'll need a running TeskaLabs LogMan.io installation.
The log simulator is a part of LogMan.io Collector. The default configuration of LogMan.io Collector provides you with simulated logs of the Microsoft 365 and Microsoft Windows Events technologies, and Linux sample logs in RFC 3164 format.
Create a tenant¶
Create a tenant in which you want to simulate logs.
- Create new tenant in the UI (Auth&Roles > Tenants > New tenant)
- Assign your credentials to the new tenant
- Go to Maintenance > Configuration and create a new configuration in the Tenants folder with the name of your tenant. In the new configuration, select the ECS schema and your timezone
- Log out and log in to the new tenant
Add library with simulated log sources¶
In the UI, go to Maintenance > Configuration.
Add the next layer of the Library.
libsreg+https://libsreg.z6.web.core.windows.net/lmio-collector-library
Add collector service to model¶
Add the lmio-collector service to the services section of the model.yaml file.
services:
  ...
  lmio-collector:
    - <node_id>
Apply the changes!
curl -X 'POST' 'http://<node_id>:8891/node/<node_id>' -H 'Content-Type: application/json' -d '{"command": "up"}'
In the Web UI, go to the Collectors screen and provision a new collector.
Create event lane and start parsing¶
Simply use Event Lane Manager:
curl -X 'PUT' 'http://<node_id>:8954/create-eventlane' -H 'Content-Type: application/json' -d '{"tenant": "<your tenant>", "stream": "microsoft-365-mirage", "node_id": "<node_id>" }'
ASAB Maestro bootstrap¶
The bootstrap is the process of deploying ASAB Maestro on a new cluster node.
Bootstrap using Docker¶
$ docker run -it --rm --pull always \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /opt/site:/opt/site \
-v /opt/site/docker/conf:/root/.docker/ \
-v /data:/data \
-e NODE_ID=`hostname -s` \
--network=host \
pcr.teskalabs.com/asab/asab-governator:stable \
python3 -m governator.bootstrap
Bootstrap using Podman¶
The Podman deployment is inherently root-less, adding an extra layer of security to the deployment.
Note
Version 4+ of Podman is strongly recommended.
This is how to install Podman to Ubuntu 22.04 LTS:
$ export ubuntu_version='22.04'
$ export key_url="https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/unstable/xUbuntu_${ubuntu_version}/Release.key"
$ export sources_url="https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/unstable/xUbuntu_${ubuntu_version}"
$ echo "deb $sources_url/ /" | sudo tee /etc/apt/sources.list.d/devel:kubic:libcontainers:unstable.list
$ curl -fsSL $key_url | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/devel_kubic_libcontainers_unstable.gpg > /dev/null
$ sudo apt update
$ sudo apt install podman
Configure Podman:
$ systemctl --user start podman.socket
$ systemctl --user enable podman.socket
$ sudo ln -s /run/user/${UID}/podman/podman.sock /var/run/docker.sock
$ loginctl enable-linger ${USER}
Prepare the OS filesystem layout:
$ sudo mkdir /opt/site
$ sudo chown ${USER} /opt/site
$ mkdir -p /opt/site/docker/conf
$ sudo mkdir /data
$ sudo chown ${USER} /data
Launch the bootstrap:
$ podman run -it --rm --pull always \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /opt/site:/opt/site \
-v /opt/site/docker/conf:/root/.docker/ \
-v /data:/data \
-e NODE_ID=`hostname -s` \
--network=host \
pcr.teskalabs.com/asab/asab-governator:stable \
python3 -m governator.bootstrap
Connect new log source and parse data¶
1. Install LogMan.io Collector¶
LogMan.io Collector is a component of LogMan.io that works outside of the LogMan.io cluster. Configure LogMan.io Collector with CommLink protocol to communicate with LogMan.io Receiver, which is a component within the LogMan.io cluster responsible for ingesting messages and storing them in Archive.
Configure collector to use CommLink.
connection:CommLink:commlink:
  url: https://<your-domain>/lmio-receiver
Default configuration of LogMan.io Collector provides multiple TCP and UDP ports open for common log sources.
2. Provision LogMan.io Collector¶
In the LogMan.io WebUI, go to Collectors screen and click Provision. Fill in the identity of your LogMan.io Collector. LogMan.io Collector is provisioned into the active tenant.
Warning
Be sure you provision the LogMan.io Collector into the right tenant if you can access multiple tenants. The Collector is provisioned into the active tenant.
Once provisioned, the Collector sends logs to LogMan.io.
You can find the new stream in the Archive and see data flowing in.
3. Create event lane¶
To parse the data from the Archive, an event lane needs to be created. It is a declaration that specifies how the data flows from the Archive through the crucial components and which parser rule is applied to the stream.
In the Library, create a new file in the /EventLanes folder. For a single-node installation, use this template:
define:
  type: lmio/event-lane
parsec:
  name: /Parsers/<path to parser>
kafka:
  received:
    topic: received.<tenant>.<stream>
  events:
    topic: events.<tenant>.<stream>
  others:
    topic: others.<tenant>
elasticsearch:
  events:
    index: lmio-<tenant>-events-<stream>
    settings:
      number_of_replicas: 0
  others:
    index: lmio-<tenant>-others
    settings:
      number_of_replicas: 0
Example
In this example, let's assume we have a new stream in the Archive called linux-rsyslog-10010, in the tenant example.
You can use the Linux/Common parser from the LMIO Common Library.
Create the file /EventLanes/example/linux-rsyslog-10010.yaml:
define:
  type: lmio/event-lane
parsec:
  name: /Parsers/Linux/Common
kafka:
  events:
    topic: events.example.linux-rsyslog-10010
  others:
    topic: others.example
  received:
    topic: received.example.linux-rsyslog-10010
elasticsearch:
  events:
    index: lmio-example-events-linux-rsyslog-10010
    settings:
      number_of_replicas: 0
  others:
    index: lmio-example-others
    settings:
      number_of_replicas: 0
Number of replicas in Elasticsearch
This example is for a single-node installation. A single node cannot carry replicas, thus number_of_replicas is zero. The default setup is a 3-node installation, where the default Elasticsearch index setting is number_of_replicas: 1 and does not need to be specified in the event lane declaration.
4. Add LogMan.io Parsec to model¶
Each event lane requires its own LogMan.io Parsec instance. Adjust the model to add a LogMan.io Parsec instance. Use this template:
services:
  ...
  lmio-parsec:
    instances:
      <tenant>-<stream>-<instance_no>:
        asab:
          config:
            eventlane:
              name: /EventLanes/<eventlane>.yaml
            tenant:
              name: <tenant>
        node: <node_id>
Example
In this example, the stream is named linux-rsyslog-10010 inside the example tenant. The node ID is specific to your installation; let's assume it is example_node. The instance number (instance_no) must be unique for each LogMan.io Parsec instance of this tenant and stream.
services:
  ...
  lmio-parsec:
    instances:
      example-linux-rsyslog-10010-1:
        asab:
          config:
            eventlane:
              name: /EventLanes/example/linux-rsyslog-10010.yaml
            tenant:
              name: example
        node: example_node
Warning
Make sure you use absolute paths when referencing a file or directory in the Library.
For example: /Parsers/Linux/Common
5. Apply changes¶
Apply changes in the Library to the installation.
In the terminal, inside the /opt/site directory, run the command:
./gov.sh up <node_id>
An instance of LogMan.io Parsec will be created and will start parsing data from the selected stream.
Eventually, the parsed data appear in the Discover screen.
Ended: Get started
Library setup ↵
The Library is a home for various declarations that describe the components required for LogMan.io functionality and their specific behaviour.
Maestro functionality depends on, and is managed through, several types of Library files:
- The integral part is the model. It states which components to deploy.
- Descriptors provide instructions on how to install each service.
- Version files comprise versions of all software components, creating global version of each application.
All declarations and files used within ASAB Maestro live in the /Site directory. You will always find model files and application directories at the top level of the /Site directory. All descriptors and version files are specific to the application. Descriptors, Versions and Web Applications are discussed in further chapters.
There is one more folder in each application, called simply Files. Files in these directories are sorted by the services they belong to. They can be referenced in the descriptors.
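As an illustration, the Library layout described above might look like this (the application name "ASAB Maestro" and the version file name are examples taken from the following chapters):
```
/Site/
├── model.yaml            # user-level model, edited manually
├── model-*.yaml          # automatically generated model files
└── ASAB Maestro/         # one directory per application
    ├── Descriptors/      # how to install each service
    ├── Versions/         # global version files, e.g. v23.32.yaml
    └── Files/            # additional files, sorted by service, referenced from descriptors
```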
ASAB Maestro Model¶
The model is one or more YAML files describing the desired layout of the cluster. It is the main point of interaction between the user and ASAB Maestro. The model is the place to customize the LogMan.io installation.
Model file(s)¶
Model files are stored in the Library in the /Site/ folder.
The user-level model file is /Site/model.yaml.
The administrator edits this file manually, i.e. in the Library.
Multiple model files can exist; in particular, automatically managed model pieces are separated into specific files. All model files are merged into a single model data structure when the model is applied.
Warning
Don't edit model files labeled as automatically generated.
Structure of the model¶
Example of the /Site/model.yaml
:
define:
  type: rc/model
services:
  nginx:
    instances:
      1: {node: "node1"}
      2: {node: "node2"}
      3: {node: "node3"}
  mongo:
    - {node: "node1"}
    - {node: "node2"}
    - {node: "node3"}
  myservice:
    instances:
      id1: {node: "node1"}
webapps:
  /: My Web application
  /auth: SeaCat Auth WebUI
  /influxdb: InfluxDB UI
applications:
  - name: "ASAB Maestro"
    version: v23.32
  - name: "My Application"
    version: v1.0
params:
  PUBLIC_URL: https://ateska-lmio
Section define¶
define:
  type: rc/model
It specifies the type of this YAML file by stating type: rc/model.
Section services¶
This section lists all services in the cluster. In the example above, services are "nginx", "mongo" and "myservice".
Each service name must correspond with the respective descriptor in the Library.
The services section is prescribed as follows:
services:
  <service_id>:
    instances:
      <instance_no>:
        node: <node_id>
        <instance-level overrides>
      ...
    <service-level overrides>
Add new instance¶
The instances section of the service entry in services must specify on which node to run each instance.
This is a canonical, fully expanded form:
myservice:
  instances:
    1:
      node: node1
    2:
      node: node2
    3:
      node: node2
Service myservice is scheduled to run in three instances (instance numbers 1, 2 and 3) at nodes node1 and node2.
The following forms are available for brevity:
myservice:
  instances: {1: node1, 2: node2, 3: node2}
myservice:
  instances: ["node1", "node2", "node2"]
The last example defines only one instance (number 1) of the service myservice, which will be scheduled to the node node1:
myservice:
  instances: node1
Removed instances¶
Some services need a fixed instance number for a whole lifecycle of the cluster, especially if some instances are removed.
Renaming and moving instances
Whenever renaming or moving the instances from one node to another, keep in mind there's no reference between "old" and "new" instance. It means that one instance is being deleted and second one created. If you move an instance of a service from one node to another, be aware that data stored on that node and managed or used by the service are not moved.
ZooKeeper
The instance number in the ZooKeeper service is used by the ZooKeeper technology to identify the instance within the cluster. Thus, changing the instance number means removing one ZooKeeper node from the cluster and adding a new one.
The removed instance is number two:
myservice:
  instances:
    1: {node: "node1"}
    # There used to be another instance here but it is removed now
    3: {node: "node2"}
In the reduced form, null has to be used:
myservice: ["node1", null, "node2"]
Overriding the descriptor values¶
To override values from the descriptor, you can enter these values at the <instance-level overrides> or <service-level overrides> marks, respectively.
In the following example, the number of cpus is set to 2 in Docker Compose, and the asab section from the descriptor of asab-governator is overridden on the instance level:
services:
  ...
  asab-governator:
    instances:
      1:
        node: node1
        descriptor:
          cpus: 2
        asab:
          config:
            remote_control:
              url:
                - http://nodeX:8891/rc
The same override, but on the service level:
services:
  ...
  asab-governator:
    instances: [node1, node2]
    descriptor:
      cpus: 2
    asab:
      config:
        remote_control:
          url:
            - http://nodeX:8891/rc
Section webapps¶
The webapps section describes what web applications to install into the cluster.
See the NGINX chapter for more details.
Section applications¶
The application section lists the applications from the Library to be included.
applications:
  - name: <application name>
    version: <application version>
  ...
The application lives in the Library in the /Site/<application name>/ folder.
The version is specified in the version file at /Site/<application name>/Versions/<application version>.yaml.
Multiple applications can be deployed together in the same cluster if there are multiple application entries in the applications section of the model.
Version file¶
Example of the version file /Site/ASAB Maestro/Versions/v23.32.yaml:
define:
  type: rc/version
  product: ASAB Maestro
  version: v23.32
versions:
  zookeeper: '3.9'
  nginx: '1.25.2'
  mongo: '7.0.1'
  asab-remote-control: latest
  asab-governator: stable
  asab-library: v23.15
  asab-config: v23.31
  seacat-auth: v23.37-beta
  asab-iris: v23.31
Section params¶
This section contains key/value cluster-level (global) parametrization of the site.
Model-level extensions¶
Some technologies allow the model to specify extensions to their configuration.
Example of the NGINX model-level extension:
define:
  type: rc/model
...
nginx:
  https:
    location /:
      - gzip_static on
      - alias /webroot/lmio-webui/dist
Multiple model files¶
Besides the user-level model file (/Site/model.yaml), you can also find generated model files named after this pattern: /Site/model-*.yaml.
Model files are merged into one big model just before processing by ASAB Remote Control.
ASAB Maestro descriptor¶
Descriptors are YAML files living in the Library. Each application consists of a group of descriptors in /Site/<application name>/Descriptors/.
A descriptor provides detailed information about the service and/or technology. Descriptors serve as specific extensions of the model.
Note
Descriptors are provided by the authors of each application.
Structure of the descriptor¶
Example of /Site/ASAB Maestro/Descriptors/mongo.yaml:
define:
  type: rc/descriptor
  name: MongoDB document database
  url: https://github.com/mongodb/mongo
descriptor:
  image: library/mongo
  volumes:
    - "{{SLOW_STORAGE}}/{{INSTANCE_ID}}/data:/data/db"
    - "{{SITE}}/{{INSTANCE_ID}}/conf:/etc/mongo:ro"
  command: mongod --config /etc/mongo/mongod.conf --directoryperdb
  healthcheck:
    test: ["CMD-SHELL", 'echo "db.runCommand(\"ping\").ok" | mongosh 127.0.0.1:27017/rs0 --quiet']
    interval: 60s
    timeout: 10s
    retries: 5
    start_period: 30s
sherpas:
  init:
    image: library/mongo
    entrypoint: ["mongosh", "--nodb", "--file", "/script/mongo-init.js"]
    command: ["echo", "DONE"]
    volumes:
      - "{{SITE}}/{{INSTANCE_ID}}/script:/script:ro"
      - "{{SITE}}/{{INSTANCE_ID}}/conf:/etc/mongo:ro"
    depends_on: ["{{INSTANCE_ID}}"]
    environment:
      MONGO_HOSTNAMES: "{{MONGO_HOSTNAMES}}"
files:
  - "conf/mongod.conf": |
      net:
        bindIp: 0.0.0.0
        port: 27017
      replication:
        replSetName: rs0
  - "script/mongo-init.js"
Templating¶
Descriptors utilize Jinja2 templates that are expanded when a descriptor is applied (see the expansion sketch below).
Common parameters:
{{NODE_ID}}
: Node identification / hostname of the host machine (node1
).{{SERVICE_ID}}
: Service identification (i.e.mongo
).{{INSTANCE_ID}}
: Instance identification (i.e.mongo-2
).{{INSTANCE_NO}}
: Instance number (i.e,2
).{{SITE}}
: Directory with the site config on the host machine (i.e./opt/site
).{{FAST_STORAGE}}
: Directory with the fast storage on the host machine (i.e./data/ssd
).{{SLOW_STORAGE}}
: Directory with the slow storage on the host machine (i.e./data/hdd
).
Note
Other parameters can be specified within descriptors, in the model or provided by technologies.
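As a sketch of the expansion, assume the instance mongo-2 and the example directories above (SLOW_STORAGE being /data/hdd):
# Descriptor template:
volumes:
  - "{{SLOW_STORAGE}}/{{INSTANCE_ID}}/data:/data/db"
# After expansion for the instance mongo-2:
volumes:
  - "/data/hdd/mongo-2/data:/data/db"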
Technologies¶
It is not only multiple Library files that are composed into the final configuration; technologies ("techs") also play their part. Techs are part of the ASAB Remote Control microservice and provide extra cluster configuration.
Some of them also introduce specific sections to the descriptors.
Learn more about Techs
Composability¶
Descriptors can be overridden in the deployment via specific configuration options or through model.
- The /Site/<application name>/Descriptors/__commons__.yaml file of the Library is a common base for all descriptors of the application. It specifically contains entries for network mode, restart policy, logging and others.
- The specific descriptor of the service (e.g. /Site/<application name>/Descriptors/nginx.yaml) is layered on top of the content of __commons__.yaml.
- The model can override the descriptor.
Merge algorithm
This composability is implemented through a merge algorithm. You'll find the same algorithm used in multiple places where chunks from various sources are combined into a functional site configuration.
Library layering
To get a full picture of the Library within ASAB Maestro, learn also about ASAB Library layering.
Sections¶
Section define
¶
define:
type: rc/descriptor
name: <human-readable name>
url: <URL with relevant info>
type
must be rc/descriptor
.
Items name
and url
provide information about the service and/or technology.
Section params
¶
Specify parameters for templating of this and all other descriptors. Any parameter specified in this section can be used in double curly brackets for Jinja2 templating.
define:
type: rc/descriptor
params:
MY_PARAMETER: "ABCDEFGH"
descriptor:
environment: "{{MY_PARAMETER}}"
Section secrets
¶
Similar to params
, secrets can also be used as parameters for templating. However, their value is not specified in the descriptor but generated and stored in the Vault. You can customize the secret by specifying type and length. The default is a "token" of 64 bytes.
define:
type: rc/descriptor
secrets:
MY_SECRET:
type: token
length: 32
descriptor:
environment: "{{MY_SECRET}}"
Warning
Parts of the descriptor are used directly to prepare docker-compose.yaml
. The section secrets
can be specified in the docker-compose.yaml
as well. However, this functionality of Docker Compose is omitted within ASAB Maestro and fully replaced by secrets
section of the descriptor.
Section descriptor
¶
The descriptor
section is a template for a service
section of the docker-compose.yaml
.
The following transformations are done:
- The Jinja2 variables are expanded.
- The version from ../Versions/... is appended to image, if not present (see the sketch after this list).
- Specific techs perform custom transformations; these are typically marked by null.
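A sketch of the version appending, assuming the mongo descriptor from this chapter and a version file containing mongo: '7.0.1'; the version is presumably appended as the image tag:
# In the descriptor:
descriptor:
  image: library/mongo
# After the version from ../Versions/... is applied:
descriptor:
  image: library/mongo:7.0.1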
Details about volumes
¶
The service has the following three storage locations available for its persistent data:
{{SITE}}
: the site directory (i.e./opt/site/...
){{SLOW_STORAGE}}
: the slow storage (i.e./data/hdd/...
){{FAST_STORAGE}}
: the fast storage (i.e./data/ssd/...
)
Each instance can create a sub-directory, named after its instance_id, in any of the above locations.
Section files
¶
This section specifies which files are to be copied into the instance sub-directory of the site directory (i.e. /opt/site/...
).
Subsequently, this content can be made available to the instance container by relevant volumes
entry.
List files using the following scheme:
files:
- destination:
source: file_name.txt
OR
files:
- destination:
content: |
Multiline plain text
that will be written into
the destination path.
Destination¶
There are three possible destinations:
- This service
- Other service
- ZooKeeper
1. This service¶
The specified file path is relative to the instance sub-directory in the site directory.
For example, this record in the descriptor...
files:
- script/mongo-init.js:
source: some_source_dir/mongo-init.js
...will create /opt/site/mongo-1/script/mongo-init.js
file if the INSTANCE ID of the mongo instance is mongo-1
.
2. Other service¶
Use a URL with the service
scheme to target the file at another service.
For example, this record in ANY descriptor of a service in the model...
files:
- service://mongo/script/mongo-init.js:
source: some_source_dir/mongo-init.js
...will create /opt/site/mongo-1/script/mongo-init.js
file if mongo instance is present in the model too and its INSTANCE ID is mongo-1
.
3. ZooKeeper¶
Use zk
URL scheme to specify the path in ZooKeeper to which the file is uploaded. The file is in "managed" mode, which means it is always updated according to the current state of the Library.
files:
- zk:///asab/library/settings/lmio-library-settings.json:
source: asab-library/setup.json
In this example, a ZooKeeper node with path /asab/library/settings/lmio-library-settings.json
will be created, or updated if it already exists.
Source¶
Source is a path relative to the Library folder assigned to the service, i.e. /Site/<application>/Files/<service>/. E.g. for the mongo service it refers to /Site/ASAB Maestro/Files/mongo/.
Source can be either a file or a folder. A folder path must end with a trailing slash.
File source syntax abbreviations
If the source is missing in the declaration, it shares the same path as the destination.
This entry copies the /Site/ASAB Maestro/Files/mongo/script/mongo-init.js
into /opt/site/mongo-1/script/mongo-init.js
, assuming the instance
identification is mongo-1
:
files:
- "script/mongo-init.js"
A similar entry with a trailing / copies the whole directory /Site/ASAB Maestro/Files/mongo/conf/ into the /opt/site/mongo-1/conf/ directory.
files:
- "conf/"
Files are not templated
Unlike descriptors and the model, files stored in the /Site/<application name>/Files/<service_id>/
directory are not templated. That means that curly brackets with parameters are not replaced by the respective values. If you need to use templating within the file, enter the file into the descriptor directly, using the multiline string operator ("|").
Content¶
Declare the content of the file directly in the descriptor. This is especially convenient for short files and/or files that require parameters provided by maestro.
The literal style using the pipe (|
) in the YAML file enables writing multiline strings (block scalars).
files:
- "conf/mongod.conf":
content: |
net:
bindIp: 0.0.0.0
port: 27017
replication:
replSetName: rs0
The content
keyword can be omitted for brevity.
files:
- "conf/mongod.conf": |
net:
bindIp: 0.0.0.0
port: 27017
replication:
replSetName: rs0
Section sherpas
¶
Sherpas are auxiliary containers that are launched together with the main instance containers. Sherpa containers are expected to finish relatively quickly and they are not restarted. Sherpas that exited successfully (with exit code 0) are promptly deleted.
Example:
sherpas:
init:
image: library/mongo
entrypoint: ["mongosh", "--nodb", "--file", "/script/mongo-init.js"]
command: ["echo", "DONE"]
volumes:
- "{{SITE}}/{{INSTANCE_ID}}/script:/script:ro"
- "{{SITE}}/{{INSTANCE_ID}}/conf:/etc/mongo:ro"
depends_on: ["{{INSTANCE_ID}}"]
environment:
MONGO_HOSTNAMES: "{{MONGO_HOSTNAMES}}"
This defines an init
sherpa container.
The container name of the sherpa would be mongo-1-init
, the INSTANCE_ID parameter remains mongo-1
.
The content of the sherpa is a template for a relevant part of the docker-compose.yaml
.
If a sherpa does not specify the image
, the service image including the version is used.
Alternatively, we recommend using docker.teskalabs.com/asab/asab-governator:stable
as the image for a sherpa, since this image is always present and doesn't need to be downloaded.
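A minimal sketch of such a sherpa; the sherpa name and its command are hypothetical, only the recommended image is taken from the paragraph above:
sherpas:
  cleanup:                          # hypothetical sherpa name
    image: docker.teskalabs.com/asab/asab-governator:stable
    command: ["echo", "DONE"]       # placeholder command
    depends_on: ["{{INSTANCE_ID}}"]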
Versions in ASAB Maestro¶
The global version of an application is specified in the applications
section of the model:
define:
type: rc/model
services:
zoonavigator:
instances: {1: {node: "lmc01"} }
params:
PUBLIC_URL: "https://maestro.logman.io"
applications:
- name: "ASAB Maestro"
version: v23.47
- name: "LogMan.io"
version: v24.01
Version files describing global versions live in /Site/<application name>/Versions
directory of the Library. E.g. there is a /Site/ASAB Maestro/Versions
directory for ASAB Maestro application.
The version file v24.01.yaml
might look like this:
define:
type: rc/version
product: ASAB Maestro
version: v24.01
versions:
zookeeper: '3.9'
asab-remote-control: latest
asab-governator: stable
asab-library: v23.15-beta
asab-config: v23.45
seacat-auth: v23.47
asab-iris: v23.31-alpha
nginx: '1.25.2'
elasticsearch: '7.17.12'
mongo: '7.0.1'
kibana: '7.17.2'
influxdb: '2.7.1'
telegraf: '1.28.2'
grafana: '10.0.8'
kafdrop: '4.0.0'
kafka: '7.5.1'
jupyter: "lab-4.0.9"
webapp seacat-auth: v23.29-beta
The define
section specifies the file type and provides more information about it. It can also serve to store comments and notes.
The versions
section takes names of the services as keys and their versions as values. These are the versions of the respective Docker images. Special records in this list are web applications. Use the webapp
keyword to assign a version to a specific web application. If the version is not specified, the latest version is used.
Web Applications¶
To install a web application you need:
- Web application stated in the model
- Nginx (and SeaCat Auth)
- Respective web application file in the Library
Model¶
Use webapps
section to state which web applications should be installed and to choose the nginx location from which each web app is served.
Example of /Site/model.yaml
define:
type: rc/model
services:
zoonavigator:
instances:
1:
node: "lmc01"
nginx:
instances:
1:
node: "lmc01"
mongo:
instances:
1:
node: "lmc01"
seacat-auth:
instances:
1:
node: "lmc01"
applications:
- name: "ASAB Maestro"
version: v23.47
params:
PUBLIC_URL: "https://maestro.logman.io"
webapps:
/: LogMan.io WebUI
/auth: SeaCat Auth WebUI
Dependencies¶
The web apps can only be served through the Nginx proxy server.
Make sure the public URL in the params
section of your model is correct.
Most of the web applications require an authorization server. To run the LogMan.io Web UI successfully, also install SeaCat Auth and Mongo as its dependency.
Web Application file¶
The web app declaration contains the distribution point, Nginx specification and list of the web apps.
- Choose between
mfe
andspa
- Choose server ("https", "http", "internal")
- Specify nginx location where to serve the web application
- Specify the name of the web application
Note
mfe
stands for "micro-frontend" application. LogMan.io Web UI consist of many microfrontend applications.
spa
stands for "single-packed" application.
The version of each application is stated in the version file. Applications not listed in the version files are used in their latest
version.
Web application descriptor for MFE¶
define:
type: rc/webapp
name: TeskaLabs LogMan.io WebUI
url: https://teskalabs.com
webapp:
distribution: https://asabwebui.z16.web.core.windows.net/
mfe:
https:
/: lmio_webui
/asab_config_webui: asab_config_webui
/asab_library_webui: asab_library_webui
/asab_maestro_webui: asab_maestro_webui
/asab_tools_webui: asab_tools_webui
/bs_query_webui: bs_query_webui
/lmio_analysis_webui: lmio_analysis_webui
/lmio_lookup_webui: lmio_lookup_webui
The section webapp
and the key distribution
specify the base URL from which the application is distributed.
The section mfe
contains the specification of the server (https
, http
or internal
) to which the installation will be performed.
Inside of the server, there is a dictionary of the location "subpath" (/
) and the MFE component name (lmio_webui
).
One location should be /
, it is the entry point into the MFE application.
Web application descriptor for SPA¶
define:
type: rc/webapp
name: TeskaLabs SeaCat Auth WebUI
url: https://teskalabs.com
webapp:
distribution: https://asabwebui.z16.web.core.windows.net/
spa:
https: seacat-auth
The section webapp
and the key distribution
specify the base URL from which the application is distributed.
The section spa
contains the specification of the server (https
, http
or internal
) to which the installation will be performed.
The value seacat-auth
specifies the name of (singular) SPA component to be installed.
Versioning¶
Versions of web application components are specified in the respective /Site/<application name>/Versions/v<application version>.yaml
file:
define:
type: rc/version
product: ASAB Maestro
version: v23.32
versions:
...
webapp seacat-auth: 'v23.13-beta'
webapp lmio_webui: 'v23.43'
The web application has a prefix webapp
with a trailing space.
If the version is not specified, the "master" version is assumed.
This file provides the compatible version combination of the web application components and respective microservices.
Running web application distribution sherpa manually¶
You may need to execute the web application distribution sherpa manually, e.g. to upgrade to a recent version of the web application.
This is how it is done:
$ cd /opt/site
$ ./gov.sh compose up nginx-1-webapp-dist
[+] Running 1/1
✔ Container nginx-1-webapp-dist Created 0.1s
Attaching to nginx-1-webapp-dist
nginx-1-webapp-dist | Installing lmio_webui (mfe) ...
nginx-1-webapp-dist | lmio_webui already installed and up-to-date.
nginx-1-webapp-dist | Installing asab_config_webui (mfe) ...
nginx-1-webapp-dist | asab_config_webui already installed and up-to-date.
nginx-1-webapp-dist | Installing asab_maestro_webui (mfe) ...
nginx-1-webapp-dist | asab_maestro_webui already installed and up-to-date.
nginx-1-webapp-dist | Installing asab_tools_webui (mfe) ...
nginx-1-webapp-dist | asab_tools_webui already installed and up-to-date.
nginx-1-webapp-dist | Installing bs_query_webui (mfe) ...
nginx-1-webapp-dist | bs_query_webui already installed and up-to-date.
nginx-1-webapp-dist | Installing lmio_analysis_webui (mfe) ...
nginx-1-webapp-dist | lmio_analysis_webui already installed and up-to-date.
nginx-1-webapp-dist | Installing lmio_lookup_webui (mfe) ...
nginx-1-webapp-dist | lmio_lookup_webui already installed and up-to-date.
nginx-1-webapp-dist | Installing lmio_observability_webui (mfe) ...
nginx-1-webapp-dist | lmio_observability_webui already installed and up-to-date.
nginx-1-webapp-dist | Installing lmio_parser_builder_webui (mfe) ...
nginx-1-webapp-dist | lmio_parser_builder_webui already installed and up-to-date.
nginx-1-webapp-dist | Installing seacat-auth (spa) ...
nginx-1-webapp-dist | seacat-auth already installed and up-to-date.
nginx-1-webapp-dist exited with code 0
$
Ended: Library setup
Techs ↵
Technologies in ASAB Maestro¶
A technology is a specific type of service (e.g. NGINX, Mongo) that provides resources to other services.
Besides their main functionality as services, the technologies are extended by the ASAB Remote Control microservice to increase their impact on the cluster configuration.
Some configuration options require up-to-date knowledge of the cluster components. For example, if a microservice needs configuration of Kafka servers, ASAB Remote Control's Kafka tech checks where Kafka is running and provides the configuration.
Each technology is designed to provide one or more of the following features:
Parameters¶
Technology provides parameters that can be used in the model and descriptor during templating.
Descriptor section¶
A technology may utilize its specific section of the descriptors. For example, see Nginx tech.
Configuration of ASAB Services¶
ASAB service in this context is recognized by having asab
section in the descriptor.
See ASAB tech to know how configuration of an ASAB service is built.
Techs can expand configuration of ASAB services (e.g. Elasticsearch or Kafka techs).
Example
define:
type: rc/descriptor
name: ASAB Remote Control
descriptor:
image: docker.teskalabs.com/asab/asab-remote-control
volumes:
- "{{SITE}}/{{INSTANCE_ID}}/conf:/conf:ro"
asab:
configname: conf/asab-remote-control.conf
config: {}
nginx:
api: 8891
Adjusting configuration of the service¶
Some techs adjust their own configuration based on the current cluster layout.
ASAB Services within ASAB Maestro¶
Configuration of ASAB Services¶
This technology provides every ASAB service with its specific configuration.
asab
section must be specified in the descriptor.
The asab
section requires:
configname
- Name of the configuration file; it corresponds with the Dockerfile of the service and the mapping of the volumes (Dockerfiles are not covered by ASAB Maestro at all).
config
- Specific configuration required on top of the general and generated configuration, written in YAML format.
define:
type: rc/descriptor
name: ASAB Remote Control
descriptor:
image: docker.teskalabs.com/asab/asab-remote-control
volumes:
- "{{SITE}}/{{INSTANCE_ID}}/conf:/conf:ro"
asab:
configname: conf/asab-remote-control.conf
config: {}
The configuration is being composed in this order:
- The most important is the generated configuration and it overrides any other. This is the configuration provided from the cluster technologies.
- Second is the service configuration governed by ASAB Config, editable from the Web UI.
- General configuration is also inside ASAB Config and reachable from Web UI. This configuration is common for all ASAB Services. It consists of Library and SMTP Server configuration.
- Configuration present in the model. Instance configuration overrides service configuration.
- Configuration from the descriptor of the service.
- Last but not least is the default configuration. It ensures that the service gets connected to Library.
{ "library": { "providers": [ "zk:///library", "git+https://github.com/TeskaLabs/asab-maestro-library.git", ], } }
ASAB Governator technology¶
Extends configuration of ASAB Governator instances to provide up-to-date connection urls of all ASAB Remote Control instances in the cluster. It does not affect other services.
Part of the ASAB Governator configuration created by the tech:
[remote_control]
url=http://asab-remote-control-1:8891/rc
http://asab-remote-control-2:8891/rc
http://asab-remote-control-3:8891/rc
ASAB Config technology¶
Descriptors can specify default cluster configurations thanks to ASAB Config technology.
Descriptor section asab-config
¶
This technology reads asab-config
section from all descriptors and creates cluster configuration.
E.g. Kafdrop descriptor specifies configuration file Kafdrop.json
for configuration type Tools
:
define:
type: rc/descriptor
name: Kafdrop
url: https://github.com/obsidiandynamics/kafdrop
descriptor:
image: obsidiandynamics/kafdrop
environment:
SERVER_PORT: 9000 # this is the only working port
SERVER_SERVLET_CONTEXTPATH: /kafdrop
KAFKA_BROKERCONNECT: '{{KAFKA_BOOTSTRAP_SERVERS}}'
asab-config:
Tools:
Kafdrop:
file:
{
"Tool": {
"image": "media/tools/kafka.svg",
"name": "Kafdrop",
"url": "/kafdrop"
},
"Authorization": {
"tenants": "system"
}
}
if_not_exists: true
The instruction in the descriptor is to create configuration Kafdrop
of type Tools
. Inside the Kafdrop
configuration, you can see two sections: file
and if_not_exists
.
file
- section expects a configuration file: a JSON file directly inserted into the descriptor (as in the example here). The other option is to specify a file from the /Site/Files/<service_id>/ directory of the Library, similarly as in the files section of the descriptor.
if_not_exists
- allows only two options: true or false. The default is false: the configuration is uploaded and updated according to the descriptor every time the model is applied. When true, the configuration is created only if it does not yet exist. That means that such a configuration can be changed manually and won't be overwritten by the automatic actions of ASAB Remote Control. On the other hand, such a configuration is not updated with new versions of the descriptor.
Elasticsearch in ASAB Maestro¶
Elasticsearch technology extends Elasticsearch service configuration and connects Elasticsearch to other services.
Parameters¶
List of parameters created by Elasticsearch tech, available to the model and descriptors whenever Elasticsearch is present in the installation.
- ELASTIC_HOSTS_KIBANA
-
A string from URLs to all ElasticSearch master nodes.
"[http://lmc01:9200, http://lmc02:9201, http://lmc03:9202]"
Kibana descriptor
define:
  type: rc/descriptor
  name: Kibana
  url: https://www.elastic.co/kibana
descriptor:
  image: docker.elastic.co/kibana/kibana
  volumes:
    - "{{SITE}}/{{INSTANCE_ID}}/config:/usr/share/kibana/config"
files:
  - "config/kibana.yml": |
      # https://github.com/elastic/kibana/blob/main/config/kibana.yml
      server.host: {{NODE_ID}}
      elasticsearch.hosts: {{ELASTIC_HOSTS_KIBANA}}
      elasticsearch.username: "elastic"
      elasticsearch.password: {{ELASTIC_PASSWORD}}
      xpack.monitoring.ui.container.elasticsearch.enabled: true
      server.publicBaseUrl: {{PUBLIC_URL}}/kibana
      server.basePath: "/kibana"
      server.rewriteBasePath: true
- ELASTIC_PASSWORD
-
A secret allowing access and write operations to Elasticsearch.
Configuration of ASAB Services¶
Every ASAB Service obtains elasticsearch
configuration section.
[elasticsearch]
url=http://lmc01:9200
http://lmc02:9201
http://lmc03:9202
username=elastic
password=<ELASTIC_PASSWORD>
Elasticsearch configuration¶
Environment variables¶
There are several environment variables in the Elasticsearch descriptor set to null
and replaced by the tech.
- node.roles
-
Roles are read from the model for each instance. The names specified for each instance (master, hot, warm, cold) are translated into node.roles as follows (see the sketch after this list):
- master ->
node.roles=master,ingest
- hot ->
node.roles=data_content,data_hot
- warm ->
node.roles=data_warm
- cold ->
node.roles=data_cold
No other names and tiers are supported by Elasticsearch, nor ASAB Maestro.
- master ->
- http.port
-
Every Elasticsearch instance gets assigned a unique port based on its role. All master instances start at 9200, all hot instances start at 9250, all warm instances start at 9300, all cold instances start at 9350.
- transport.port
-
Every Elasticsearch instance gets assigned a unique port for inner (inter-elastic) communication. All master instances start at 9400, all hot instances start at 9450, all warm instances start at 9500, all cold instances start at 9550.
- cluster.initial_master_nodes
-
All master instances of Elasticsearch.
- discovery.seed_hosts
-
All master nodes excluding the one being configured.
- ES_JAVA_OPTS
-
If -Xms is not set yet, it is set to -Xms2g for master nodes and -Xms28g for other nodes. If -Xmx is not set yet, it is set to -Xmx2g for master nodes and -Xmx28g for other nodes.
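For illustration only, the resulting environment of the first hot data instance might look roughly like this, assuming the defaults described above (the actual generated values may differ):
environment:
  node.roles: "data_content,data_hot"   # translated from the "hot" role
  http.port: 9250                       # hot instances start at 9250
  transport.port: 9450                  # hot instances start at 9450
  ES_JAVA_OPTS: "-Xms28g -Xmx28g"       # default heap options for non-master nodes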
Certificates¶
Communication of the Elasticsearch instances is secured by certificates. Certificates are generated by the ASAB Remote Control, using its certificate authority.
Nginx configuration¶
Ports assigned to the master nodes are propagated to Nginx configuration to create an upstream
record for the elasticsearch service.
InfluxDB in ASAB Maestro¶
Extends InfluxDB service configuration and enables other services to connect to the InfluxDB.
Parameters¶
Organization
and bucket
parameters of InfluxDB configuration are set strictly as follows:
- INFLUXDB_ORG
-
system
- INFLUXDB_BUCKET
-
metrics
Configuration of ASAB Services¶
Every ASAB Service obtains asab:metrics
and asab:metrics:influxdb
configuration sections.
[asab:metrics]
target=influxdb
[asab:metrics:influxdb]
url=http://influxdb:8086/
token=I1x1URqoTP31o6lmnZO1gbm_FkskGoIkRnsVKoJLmLSOd8YQQNoLBkRpDzSxVJR17JoFQ3DvMXcmJn9ItjLoTQ
bucket=metrics
org=system
Environment variables¶
- DOCKER_INFLUXDB_INIT_USERNAME and DOCKER_INFLUXDB_INIT_PASSWORD
-
ASAB Maestro creates and stores (in Vault) admin access into the InfluxDB.
Kafka in ASAB Maestro ¶
Provides connection to all services through dynamic ASAB configuration or the KAFKA_BOOTSTRAP_SERVERS
connection string.
Parameters¶
- KAFKA_BOOTSTRAP_SERVERS
-
Comma-separated URLs to Kafka instances
lmc01:9092,lmc02:9092,lmc03:9092
Configuration of ASAB Services¶
Every ASAB Service obtains kafka
configuration section.
[kafka]
bootstrap_servers=lmc01:9092,lmc02:9092,lmc03:9092
Mongo DB in ASAB Maestro¶
Enables other services to connect to Mongo database. Parameters are also used by Mongo and SeaCat Auth sherpas.
Parameters¶
- MONGO_HOSTNAMES
-
comma-separated instance ids of all Mongo instances
mongo-1,mongo-2,mongo-3
- MONGO_URI
-
URI to all Mongo instances
mongodb://mongo-1,mongo-2,mongo-3/?replicaSet=rs0
- MONGO_REPLICASET
- is set to
rs0
Configuration of ASAB Services¶
Every ASAB Service obtains mongo
configuration section.
[mongo]
url=mongodb://mongo-1,mongo-2,mongo-3/?replicaSet=rs0
NGINX in ASAB Maestro¶
NGINX technology provides:
- Application gateway capabilities
- Load balancing
- Service discovery
- Authorization for other services in the cluster
Servers¶
ASAB Maestro organizes NGINX configuration into following structure:
- HTTP server:
http
, see the config - HTTPS server:
https
, see the config - Internal HTTP server on a port tcp/8890:
internal
, see the config - Upstreams
NGINX configuration in descriptors¶
nginx
section of a descriptor provides information about how the respective service expects the NGINX to be configured.
It means that it specifies proxy forwarding rules that expose the microservice API.
The example of a descriptor /Site/.../Descriptors/foobar.yaml
:
define:
type: rc/descriptor
...
nginx:
api:
port: 5678
upstreams:
upstream-foobar-extra: 1234
https:
location = /_introspect:
- internal
- proxy_method POST
- proxy_pass http://{{UPSTREAM}}/introspect
- proxy_ignore_headers Cache-Control Expires Set-Cookie
location /subpath/api/foobar:
- rewrite ^/subpath/api/(.*) /$1 break
- proxy_pass http://upstream-foobar-extra
server:
- ssl_client_certificate shared/custom-certificate.pem
- ssl_verify_client optional
NGINX configuration to YAML conversion
NGINX configuration in ASAB Maestro is translated into YAML, so it can be included in the model or descriptors.
The following NGINX configuration snippet:
location /api/myapp {
rewrite ^/api/myapp/(.*) /myapp/$1 break;
proxy_pass http://my-server:8080;
}
becomes in ASAB Maestro YAML files:
location /api/myapp:
- rewrite ^/api/myapp/(.*) /myapp/$1 break
- proxy_pass http://my-server:8080
Similarly, you can add configuration to server block:
server:
- ssl_client_certificate shared/lmio-receiver/client-ca-cert.pem
- ssl_verify_client optional
Section api
¶
api
section allows quick specification of the "main" API of the service.
The key port
specifies TCP port on which the API is exposed by the service.
This entry will generate the respective location
and upstream
entries, as sketched below.
Full automation
The api
section can be easily the only section in the nginx
part of the service descriptor.
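A minimal sketch: in the ASAB Library descriptor (shown in the Service Discovery chapter), a single api entry is enough to expose the service API:
nginx:
  api: 8893
From this entry, ASAB Maestro generates location entries such as /api/<service_id> and /api/<instance_id>, together with the corresponding upstream record.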
Section upstream
¶
A specific upstream of a service. Every instance of the service will be added to the upstreams record of the NGINX configuration.
An upstream refers to all available instances of a service.
nginx:
upstreams:
upstream-foobar-extra: 1234
Service foobar
defines additional API "extra" that is available on the port tcp/1234
.
It becomes available as an upstream
with the name upstream-foobar-extra
and can be used as http://upstream-foobar-extra
in the proxy_pass
commands in locations.
Resulting upstream
configuration, assuming three instances of foobar
service are located on three nodes of the cluster:
upstream upstream-foobar-extra {
keepalive 32;
server server1:1234;
server server2:1234;
server server3:1234;
}
Server configuration¶
Other possibilities are implemented for each server separately (http
, https
, internal
).
Additional locations can be specified for the server.
Section location
¶
Typically, a proxy configuration of the particular component or the location of the statically served content.
Each additional location is added to the nginx configuration once per service, unless the INSTANCE_ID parameter is used in the header of the location; in that case, the location is introduced for each instance, as sketched below.
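A hypothetical sketch of a per-instance location (the path and the port are illustrative only, not an actual LogMan.io endpoint):
nginx:
  https:
    location /metrics/{{INSTANCE_ID}}:              # hypothetical location path
      - proxy_pass http://{{INSTANCE_ID}}:8080      # hypothetical port
Because the location header contains the INSTANCE_ID parameter, this location is generated once for every instance of the service.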
Section server
¶
Server-block configuration.
Model-level NGINX configuration¶
You can specify custom NGINX configuration on the model-level like so:
define:
type: rc/model
...
nginx:
https:
location /my-special-location:
- gzip_static on
- alias /webroot/lmio-webui/dist
This adds location
"/my-special-location" to https
server.
Web applications distribution¶
NGINX technology serves web applications.
webapp-dist
sherpa downloads and installs web applications defined in the model.
Web applications are deployed (if needed) every time the model is applied (i.e. the "up" command is issued).
Example of model.yaml
:
define:
type: rc/model
...
webapps:
/: My Web application
/auth: SeaCat Auth WebUI
/influxdb: InfluxDB UI
...
The section webapps
in the model prescribes deployment of three web applications:
- "My Web application" will be deployed to
/
location of the HTTPS server - "SeaCat Auth WebUI" will be deployed to
/auth
location of the HTTPS server - "InfluxDB UI" will be deployed to
/influxdb
location of the HTTPS server
Supported web application types are mfe and spa, as described in the Web Application file section above.
SeaCat Auth in ASAB Maestro¶
SeaCat Auth is an open-source access control technology developed by TeskaLabs.
Integration into ASAB Maestro ensures automatic introspection for all api
locations on the https server.
"Mongo content" sherpa of SeaCat Auth service creates records in the auth
database of Mongo. It helps to integrate authorization of 3rd party services.
Other services can add extra configuration of SeaCat Auth instances if needed.
Descriptor section seacat-auth
¶
Every descriptor can use the seacat-auth
section to add a configuration section to the SeaCat Auth service and required data to the auth
Mongo database.
seacat-auth:
config:
"batman:elk":
url: "http://{{NODE_ID}}:9200"
username: elastic
password: "{{ELASTIC_PASSWORD}}"
content:
- "cl.json": |
[{
"_id": "kibana",
"application_type": "web",
"authorize_anonymous_users": false,
"client_name": "Kibana",
"code_challenge_method": "none",
"grant_types": [
"authorization_code"
],
"redirect_uri_validation_method": "prefix_match",
"redirect_uris": [
"{{PUBLIC_URL}}/kibana"
],
"response_types": [
"code"
],
"token_endpoint_auth_method": "none",
"cookie_entry_uri": "{{PUBLIC_URL}}/api/cookie-entry",
"client_uri": "{{PUBLIC_URL}}/kibana"
}]
Section seacat-auth:config
¶
Add configuration to all SeaCat Auth instances in YAML format.
Section seacat-auth:content
¶
Similarly as in the files
section of the descriptor, this is a list of records.
Each record is either a name of a file inside the /Site/Files/<service_id>/
directory, or a key:value record with the key being the file name and the value being the file content written as a string.
The name of the file must correspond to the target Mongo collection name.
Interaction with NGINX configuration¶
The presence of SeaCat Auth in the cluster adds introspection to the NGINX configuration. An introspection endpoint is added and all NGINX locations originating from nginx:api
adopt this introspection.
With SeaCat Auth in the cluster, only authorized requests can pass to the back-end services.
For internal communication among the services, use the internal
HTTP nginx server.
For example, the asab-governator
location requires the introspection endpoint (handled by SeaCat Auth):
# GENERATED FILE!
location /api/asab-governator {
auth_request /_oauth2_introspect;
auth_request_set $authorization $upstream_http_authorization;
proxy_set_header Authorization $authorization;
rewrite ^/api/asab-governator/(.*) /$1 break;
proxy_pass http://upstream-asab-governator-8892;
}
# GENERATED FILE!
location = /_oauth2_introspect {
internal;
proxy_method POST;
proxy_set_body "$http_authorization";
proxy_pass http://upstream-seacat-auth-private/nginx/introspect/openidconnect;
proxy_set_header X-Request-URI "$scheme://$host$request_uri";
proxy_ignore_headers Cache-Control Expires Set-Cookie;
proxy_cache oauth2_introspect;
proxy_cache_key "$http_authorization $http_sec_websocket_protocol";
proxy_cache_lock on;
proxy_cache_valid 200 30s;
}
ZooKeeper in ASAB Maestro¶
ZooKeeper is the consensus technology for ASAB Maestro. All other services need to communicate with ZooKeeper to access cluster-level data. Thus, the ZooKeeper server string is provided as a parameter to all services, and ASAB services get the [zookeeper] configuration section from the ZooKeeper tech.
Parameters¶
- ZOOKEEPER_SERVERS
-
Comma-separated addresses of all ZooKeeper instances. In a three-node cluster (with nodes named lm1, lm2, lm3) the
ZOOKEEPER_SERVERS
parameter would be replaced with the lm1:2181,lm2:2181,lm3:2181
string.
Example
define:
  type: rc/descriptor
  name: Web-based ZooKeeper UI
  url: https://zoonavigator.elkozmon.com/
descriptor:
  image: elkozmon/zoonavigator
  volumes:
    - "{{SLOW_STORAGE}}/{{INSTANCE_ID}}/logs:/app/logs"
  environment:
    HTTP_PORT: "9001"
    CONNECTION_ZK_NAME: Local ZooKeeper
    CONNECTION_ZK_CONN: "{{ZOOKEEPER_SERVERS}}"
    AUTO_CONNECT_CONNECTION_ID: ZK
    BASE_HREF: /zoonavigator
Configuration of ASAB Services¶
Every ASAB Service obtains zookeeper
configuration section.
[zookeeper]
servers=lmc01:2181,lmc02:2181,lmc03:2181
Environment variables¶
Available for the respective ZooKeeper instance only.
- ZOO_MY_ID
-
The instance number of each ZooKeeper instance becomes the
ZOO_MY_ID
environment variable of the ZooKeeper (Docker) container. That's why renaming ZooKeeper instances in the model could be problematic.
Ended: Techs
Consensus ↵
Consensus in ASAB Maestro¶
LogMan.io is a cluster technology. This fact brings high availability and security to the product. However, it also brings higher complexity to the system. Many services and microservices need to communicate across the cluster and share data. We use Apache ZooKeeper as the consensus technology in the distributed system. In ZooKeeper, all services have access to a "common truth", wherever in the cluster they are.
The core of the "common truth" is stored in the /asab
node of the ZooKeeper.
/asab
content¶
/asab/ca
- Certificate Authority/asab/config
- Cluster Configuration/asab/docker
- stores Docker configuration shared among the cluster, including credentials for Docker registry./asab/nodes
- Connected cluster nodes/asab/run
- data advertised by running ASAB microservices/asab/vault
- storage of secrets
You might notice that some pieces of information about the cluster overlap. To provide reliable data about the cluster, we use multiple data sources, multiple service discovery strategies and multi-level monitoring.
Certificate Authority¶
ASAB Remote Control creates and operates internal Certificate Authority (CA).
Technologies using CA¶
Elasticsearch¶
Communication among Elasticsearch instances is secured by custom certificates, automatically generated by ASAB Remote Control during ElasticSearch installation.
Nginx¶
Nginx is supplied by default with SSL certificates from local CA.
Cluster Configuration¶
The cluster configuration stores custom configuration options specific to the deployment, very often common to multiple services in the cluster.
ASAB Config¶
The ASAB Config microservice is probably the smallest microservice in the LogMan.io ecosystem, although this does not lessen its importance. It provides REST API to the content of the cluster configuration, mainly used by the Web UI.
The configuration is accessible and editable from the Web UI.
Organization of cluster configuration¶
Cluster configuration is organized by configuration types. Each type (e.g. Discover, Tenants) provides a JSON schema describing the nature of the configuration files.
Each configuration file must match the JSON schema of its type.
/asab/config
ZooKeeper node structure:
- /asab/config/
- Discover/
- lmio-system-events.json
- lmio-system-others.json
- Tenants/
- system.json
Example of asab/config/Tenants
JSON schema:
{
"$id": "Tenants schema",
"type": "object",
"title": "Tenants",
"description": "Configure tenant data",
"default": {},
"examples": [
{
"General": {
"schema": "/Schemas/ECS.yaml",
"timezone": "Europe/Prague"
}
}
],
"required": [],
"properties": {
"General": {
"type": "object",
"title": "General tenant configuration",
"description": "Tenant-specific data",
"default": {},
"required": [
"schema",
"timezone"
],
"properties": {
"schema": {
"type": "string",
"title": "Schema",
"description": "Absolute path to schema in the Library",
"default": [
"/Schemas/ECS.yaml",
"/Schemas/CEF.yaml"
],
"$defs": {
"select": {
"type": "select"
}
},
"examples": [
"/Schemas/ECS.yaml"
]
},
"timezone": {
"type": "string",
"title": "Timezone",
"description": "Timezone identifier, e.g. Europe/Prague",
"default": "",
"examples": [
"Europe/Prague"
]
}
}
}
},
"additionalProperties": false
}
Example of /asab/config/Tenants/system.json
:
{
"General": {
"schema": "/Schemas/ECS.yaml",
"timezone": "Europe/Prague"
}
}
Cluster Nodes¶
Example of /asab/nodes
organization:
- /asab/nodes/
- cluster_node_1/
- governator-<uuid>.json
- info.json
- mailbox/
- cluster_node2/
- cluster_node3/
Warning
There is a terminological contradiction that is particularly unfortunate in this chapter. In ASAB Maestro, the word "node" is reserved for a "cluster node" - typically an isolated server connected via a network to the other servers that make up the cluster.
However, ZooKeeper technology uses the term "node" for its file structure. ZooKeeper nodes can contain both data and child nodes. Compared to a file system, each node behaves as both a file and a directory. Where possible, we refer to ZooKeeper nodes as files and directories. When a node is used both to store data and child nodes, this simplification does not work. Then, the terms "cluster node" and "ZooKeeper node" are used for clarification.
Cluster node¶
Each ZooKeeper node inside /asab/nodes
(e.g. cluster_node_1/
) contains the IP address of the connected cluster node.
ip:
- 10.25.128.81
ASAB Governator connection¶
governator-<uuid>.json
file contains information about the websocket connection from the ASAB Governator instance present on that cluster node.
Each connection is labeled with a UUID to prevent two connection attempts from overlapping. However, one cluster node should create only one Remote Control-Governator connection.
{"address": "10.25.128.81", "rc_node_id": "cluster_node_1"}
Cluster node detailed info¶
info.json
contains data about the node gathered by the ASAB Governator. It is periodically updated, but no history is collected, so historical data are not present. Instead, follow ASAB metrics, InfluxDB and Grafana dashboards to gain information about the cluster health over time.
Mailbox¶
The mailbox/
directory helps manage tasks between two types of services.
ASAB Remote Control instances are identical across the cluster and can substitute for each other.
In contrast, ASAB Governator microservices are unique to each node and cannot be replaced by others.
The mailbox acts as a link between these services, enabling communication and task coordination.
When a user sends instructions from the Web UI, the mailbox helps ASAB Remote Control instances find and communicate with the correct ASAB Governator to execute the task.
ASAB Remote Control instance with the instructions from the user "sends a message" to ASAB Remote Control instance with the target ASAB Governator connection.
Running ASAB microservices¶
Each instance of an ASAB microservice can actively advertise data about itself to the consensus.
Ephemeral and sequential ZooKeeper nodes are used. It means that:
- Each running instance creates a ZooKeeper node.
- Each node is connected to the running microservice. When microservice stops, the ZooKeeper node disappears.
What you can learn from this data about each instance of ASAB microservices:
- The instance is up and running
- Cluster node on which it is running
- Containerization technology (Docker, Podman or none)
- Image version and when and how it was built
- Time when the container was created and launched
- Port open for web requests
- Health if such microservice can indicate that
The data advertised by running ASAB microservices are among the inputs of cluster monitoring and serve in service discovery.
Example of /asab/run
ZooKeeper node:
- /asab/run/
- ASABConfigApplication.0000000108
- ASABConfigApplication.0000000115
- ASABConfigApplication.0000000144
- ASABGovernatorApp.0000000095
- ASABGovernatorApp.0000000098
- ASABGovernatorApp.0000000104
- ASABLibraryApplication.0000000113
- ASABLibraryApplication.0000000114
- ASABLibraryApplication.0000000147
- ASABRemoteControlApp.0000000109
- ASABRemoteControlApp.0000000110
- ASABRemoteControlApp.0000000107
Example of /asab/run/ASABConfigApplication.0000000108
:
{
"host": "asab-config-3",
"appclass": "ASABConfigApplication",
"launch_time": "2023-12-19T09:52:46.468038Z",
"process_id": 1,
"instance_id": "asab-config-3",
"node_id": "lmc03",
"service_id": "asab-config",
"containerized": true,
"created_at": "2023-11-06T17:18:54.206827Z",
"version": "v23.45",
"CI_COMMIT_TAG": "v23.45",
"CI_COMMIT_REF_NAME": "v23.45",
"CI_COMMIT_SHA": "cf3be4570f363b8e9ed400ffbaea8babac957688",
"CI_COMMIT_TIMESTAMP": "2023-11-06T17:15:54+00:00",
"CI_JOB_ID": "55305",
"CI_PIPELINE_CREATED_AT": "2023-11-06T17:16:58Z",
"CI_RUNNER_ID": "74",
"CI_RUNNER_EXECUTABLE_ARCH": "linux/amd64",
"web": [
[
"0.0.0.0",
8894
]
]
}
Ended: Consensus
Containers managed by ASAB Maestro¶
Hostname and container name¶
The hostname
and the container_name
are set to INSTANCE_ID
(i.e. mongo-1
).
/etc/hosts
¶
/etc/hosts
is provided by the ASAB Maestro with the names and IP addresses of all instances in the cluster.
This is used for service discovery purposes.
Example of the /etc/hosts
:
# This file is generated by ASAB Remote Control
# WARNING: DON'T MODIFY IT MANUALLY !!!
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
# Nodes
192.168.64.1 node1
192.168.64.2 node2
# Instances
192.168.64.1 zoonavigator-1
192.168.64.1 nginx-1
192.168.64.1 mongo-1
192.168.64.1 seacat-auth-1
192.168.64.2 zookeeper-1
192.168.64.2 asab-governator-1
192.168.64.2 asab-library-1
192.168.64.2 asab-config-1
Note
The hosts
file is located at /opt/site/hosts
and mounted into containers.
Environment variables¶
Following environment variables are made available to each instance:
NODE_ID
SERVICE_ID
INSTANCE_ID
Note
Other environment variables can be provided by technologies.
Service Discovery¶
Service discovery in ASAB Maestro is a group of techniques for finding and reaching a specific service in the cluster network.
Identity¶
Each instance is provided with three identifiers:
- NODE_ID - identifier of the cluster node
- SERVICE_ID - name of the service
- INSTANCE_ID - identifier of the instance
This allows the services to be looked up by the same names in various situations.
NGINX proxy server¶
nginx:api
option in the descriptors creates standardized Nginx configuration.
Example of ASAB Library descriptor:
define:
type: rc/descriptor
name: ASAB Library
descriptor:
image: docker.teskalabs.com/asab/asab-library
volumes:
- "{{SITE}}/{{INSTANCE_ID}}/conf:/conf:ro"
- "{{SLOW_STORAGE}}/{{INSTANCE_ID}}:/var/lib/asab-library"
asab:
configname: conf/asab-library.conf
config: {}
nginx:
api: 8893
All instances are supplied with a location inside the Nginx configuration.
location /api/<instance_id> {
...
}
Moreover, each service has its location and respective upstreams record. Your request is proxied to a random instance of the service.
location /api/<service_id> {
...
}
These locations are accessible in the:
- HTTPS server behind OAuth2 introspection (when authorization server like SeaCat Auth is installed) to be used mainly by Web UI,
- and the
internal
server. This one is accessible from within the cluster, must not be open to the internet and serves for internal communication of backend services.
PUBLIC_URL
It is crucial for the HTTPS server functionality and successful authorization to set PUBLIC_URL parameter in the model.
define:
type: rc/model
services:
zoonavigator:
instances: {1: {node: "lmc01"} }
params:
PUBLIC_URL: "https://maestro.logman.io"
Custom /etc/hosts
¶
Each running container is supplied with a custom and automatically updated configuration inside /etc/hosts
. The IP address of each service or instance is resolved using the respective INSTANCE_ID or SERVICE_ID.
Example of /etc/hosts
inside the container:
# This file is generated by ASAB Remote Control
# WARNING: DON'T MODIFY IT MANUALLY !!!
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
# Nodes
10.35.58.41 lmc02
10.35.58.194 lmc03
10.35.58.88 lmc01
# Instances
10.35.58.88 zoonavigator-1
10.35.58.88 mongo-1
10.35.58.41 mongo-2
10.35.58.194 mongo-3
10.35.58.88 seacat-auth-1
10.35.58.88 nginx-1
10.35.58.88 asab-config-1
10.35.58.41 asab-config-2
10.35.58.194 asab-config-3
10.35.58.88 asab-governator-1
10.35.58.41 asab-governator-2
10.35.58.194 asab-governator-3
10.35.58.88 asab-library-1
10.35.58.41 asab-library-2
10.35.58.194 asab-library-3
10.35.58.88 asab-remote-control-1
10.35.58.41 asab-remote-control-2
10.35.58.194 asab-remote-control-3
10.35.58.88 zookeeper-1
10.35.58.41 zookeeper-2
10.35.58.194 zookeeper-3
# Services
10.35.58.88 zoonavigator
10.35.58.88 mongo
10.35.58.41 mongo
10.35.58.194 mongo
10.35.58.88 seacat-auth
10.35.58.88 nginx
10.35.58.88 asab-config
10.35.58.41 asab-config
10.35.58.194 asab-config
10.35.58.88 asab-governator
10.35.58.41 asab-governator
10.35.58.194 asab-governator
10.35.58.88 asab-library
10.35.58.41 asab-library
10.35.58.194 asab-library
10.35.58.88 asab-remote-control
10.35.58.41 asab-remote-control
10.35.58.194 asab-remote-control
10.35.58.88 zookeeper
10.35.58.41 zookeeper
10.35.58.194 zookeeper
Consensus and data advertised by running ASAB microservices¶
Each ASAB microservice advertises data about itself to the consensus. These data contain the NODE_ID, SERVICE_ID and INSTANCE_ID (resolvable thanks to the custom /etc/hosts
) and the port. The ASAB framework also offers boilerplate code to make aiohttp
client requests with only the SERVICE_ID or INSTANCE_ID of the target service. Thus, each ASAB microservice has the tools to access any other ASAB microservice in the cluster.
Example of python code within ASAB application using Discovery Service
import asab.api.discovery  # provides the NotDiscoveredError exception

# A method of an ASAB application/service class; self.App is the running ASAB Application.
async def proxy_to_iris(self, json_data):
    # The Discovery Service session resolves URLs like "http://<service_id>.service_id.asab/..."
    async with self.App.DiscoveryService.session() as session:
        try:
            async with session.put("http://asab-iris.service_id.asab/send_mail", json=json_data) as resp:
                response = await resp.json()
        except asab.api.discovery.NotDiscoveredError:
            raise RuntimeError("ASAB Iris could not be reached.")
    return response
ASAB Governator¶
The ASAB Governator - or asab-governator
- is a microservice that has to run on each cluster node.
Connection to ASAB Remote Control¶
The ASAB Governator is connected to the ASAB Remote Control.
The local (i.e. localhost
) connection is preferred for core nodes, but the ASAB Governator will connect to other ASAB Remote Control instances if the local one is not present.
The ASAB Governator will eventually reconnect (if possible) to the local instance.
Housekeeping¶
The ASAB Governator regularly schedules the docker system prune
command to remove old and unused container images and other data.
Merge Algorithm¶
This algorithm is an integral piece of composability in ASAB Maestro.
ASAB Maestro composes various artifacts into a site configuration. For example:
- You can override a descriptor in the model.
- The generated configuration of Elasticsearch is provided to each configuration file of each ASAB microservice.
Every declaration or configuration can be transformed into an object (or Python dictionary) or an array (or Python list) or their combination.
The merge algorithm takes two objects (dictionaries), one of which is marked as "more important". When the objects are merged, their content is compared. If their content is completely different, the resulting object contains all the information from both of them. When there is a conflict, i.e. both objects contain the same key, only the value from the "more important" object is taken into the result. If two arrays are compared, the result is the sum (concatenation) of the arrays (lists).
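A minimal sketch of the described behaviour on two hypothetical YAML fragments (the order of the concatenated array items is illustrative):
# Less important object:
asab:
  config:
    logging: INFO
  volumes: ["a:/a"]
# More important object:
asab:
  config:
    logging: DEBUG
  volumes: ["b:/b"]
# Merge result:
asab:
  config:
    logging: DEBUG                # conflicting key: the "more important" value wins
  volumes: ["a:/a", "b:/b"]       # arrays: the sum of both lists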
Troubleshooting ASAB Maestro¶
Apply changes from command line¶
There is ./gov.sh
script in the /opt/site
folder on each node.
Use it to apply changes on any node.
This command applies changes in the model to the current node:
$ cd /opt/site
$ ./gov.sh up
Apply newest changes on a different node. (Replace <node_id>
with the actual node ID.)
$ ./gov.sh up <node_id>
Interacting with the Docker or Podman manually¶
The ./gov.sh
script also works exactly the same as a docker
command but in the correct cluster setup.
This includes the docker compose
bit.
This is useful when ASAB Maestro components are not working as expected and their UI or API is not available.
Example:
$ cd /opt/site
$ ./gov.sh compose up -d
[+] Running 6/6
✔ Container asab-config-1 Started 0.1s
✔ Container asab-remote-control-1 Started 0.1s
✔ Container zookeeper-1 Started 0.1s
✔ Container zoonavigator-1 Started 0.1s
✔ Container asab-library-1 Started 0.1s
✔ Container asab-governator-1 Started 0.1s
$
Manual update to the recent ASAB Governator¶
If you need to manually update the asab-governator
on the particular node, this is a proper procedure:
$ cd /opt/site
$ ./gov.sh image pull docker.teskalabs.com/asab/asab-governator:stable
$ ./gov.sh compose up -d asab-governator-1
Replace asab-governator-1
with a proper instance_id
on the asab-governator
on the given node.
Use ./gov.sh ps -a
to identify the instance_id
.
Nginx can't bind to port 80¶
Nginx log
nginx: [emerg] bind() to 0.0.0.0:80 failed (13: Permission denied)
Solution
$ sudo sysctl -w net.ipv4.ip_unprivileged_port_start=80
Elasticsearch does not start due to virtual memory allocation¶
Solution
$ sudo sysctl vm.max_map_count=262144
Ended: Maestro
Event Lane Manager ↵
LogMan.io Event Lane Manager¶
TeskaLabs LogMan.io Event Lane Manager (or LogMan.io Elman for short) is a microservice responsible for event lane management.
- Read more about event lanes and their declarations in Event Lane.
- Read more about the functionality of LogMan.io Elman and automatic creation of event lanes in Streams.
- For the deployment of LogMan.io Elman microservice, see Configuration.
Note
LogMan.io Elman is a preview feature flag that will become a standard part of TeskaLabs LogMan.io in the future.
Configuration¶
Dependencies¶
- Apache Zookeeper is used for reading the tenant configuration.
- Apache Kafka is used for management of Kafka topics and automatic detection of new streams.
- ASAB Remote Control is needed for creation of LMIO Parsecs.
- ASAB Library is needed for uploading event lanes to Library.
Required configuration¶
[kafka]
bootstrap_servers=kafka-1:9092,kafka-2:9092,kafka-3:9092
[zookeeper]
servers=zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
[library]
providers=
zk:///library
# (more layers if needed...)
Optional configuration¶
Optionally, the number of partitions and the retention policy can be configured in the [kafkaservice]
section.
[kafkaservice]
enabled=true # Can be disabled
partitions=6 # (default: 6) The default number of partitions of the Kafka received, events and others topics
retention.ms_max=604800000 # (default: 7 days) The maximal retention of Kafka topics, decreased to optimal if higher
retention.ms_min=172800000 # (default: 2 days) The minimal retention of Kafka topics, increased to optimal if lower
retention.ms_optimal=259200000 # (default: 3 days) The optimal retention of Kafka topics
Event Lane¶
When you connect a new log source in LogMan.io Collector provisioned to one tenant, events start being sent to the LogMan.io Archive. LogMan.io Receiver creates a new stream for these events to store them in logical order. Therefore, every tenant owns multiple streams. Events from a particular stream can be pulled (immediately or later) from the Archive for parsing and storing in the Elasticsearch database for further analysis.
To create a logical data stream in LogMan.io (from Archive through Kafka to Elasticsearch) and connect the adjacent content in the Library (dashboards, reports, correlations, etc.), the concept of an event lane is used. The event lane is represented by a declaration in the Library which determines:
- what parsing rules will be applied for the data stream
- what Library content is assigned to the data stream
- categorization of the data stream (e.g. categorization of the technology that is producing such events)
- what schema is used for the final structured data
- what additional enrichment will be applied to the stream
- configuration entries of the technologies operating with the data (Kafka, Elasticsearch)
The following example illustrates the process when a new log source
(Fortinet FortiGate) is connected (under operating tenant mytenant
):
- LogMan.io Receiver collects the incoming events and stores the data in Archive in the stream
fortinet-fortigate-10110
for tenantmytenant
- These events can be pulled back from the Archive for parsing. They are sent to the Kafka topic
received.mytenant.fortinet-fortigate-10110
. - LogMan.io Elman detects this topic, assigns a new event lane and creates single or multiple instances of LogMan.io Parsec.
- LogMan.io Parsec consumes raw events from that topic and sends the successfully parsed events and unparsed events to Kafka topics
events.mytenant.fortinet-fortigate-10110
andothers.mytenant
, respectively. - LogMan.io Depositor consumes events from those topics and stores them in Elasticsearch indexes
lmio-mytenant-events-fortinet-fortigate-10110
andlmio-mytenant-others
.
Event Lane declaration¶
Event Lane declaration is specified in a YAML file in Library.
The following example is event lane declaration for log source Microsoft 365 and tenant mytenant
:
define:
name: Microsoft 365
type: lmio/event-lane
timezone: UTC
parsec:
name: /Parsers/Microsoft/365
content:
reports: /Reports/Microsoft/365
dashboards: /Dashboards/Microsoft/365
kafka:
received:
topic: received.mytenant.microsoft-365-v1
events:
topic: events.mytenant.microsoft-365-v1
others:
topic: others.mytenant
elasticsearch:
events:
index: lmio-mytenant-events-microsoft-365-v1
others:
index: lmio-mytenant-others
Define¶
The define
part specifies the type of declaration and
the properties of event lane used for parsing and analyzing the data, such as used schema, timezone and charset.
define:
name: Microsoft 365
type: lmio/event-lane
schema: /Schemas/ECS.yaml # (optional, default: /Schemas/ECS.yaml)
timezone: Europe/Prague # (optional, default is obtained from the tenant configuration)
charset: utf-8 # (optional, default: utf-8)
Parsec¶
Section parsec
refers to the microservice LogMan.io Parsec. In particular, parsec/name
is the directory for parsing rules in Library. It has to always start with /Parsers/
:
parsec:
name: /Parsers/Microsoft/365
Content¶
Section content
refers to the event lane content in the Library, such as dashboards, reports, correlations, etc.
When a new event lane is created, LogMan.io Elman automatically enables the content described in this section.
content:
# Entire directory, described as a single string
dashboards: /Dashboards/Microsoft/365
# Multiple items described as a list
reports:
- /Reports/Microsoft/365/Daily Report.json
- /Reports/Microsoft/365/Weekly Report.json
- /Reports/Microsoft/365/Monthly Report.json
Kafka, Elasticsearch¶
Sections kafka
and elasticsearch
specify properties of Kafka topics and Elasticsearch indexes which belong to that event lane. These are important for LogMan.io Parsec and LogMan.io Depositor.
The most important property is the name of received
, events
, and others
topics and lmio-events
and lmio-others
indexes.
Kafka topics follow the naming convention:
<type>.<tenant>.<stream>
Elasticsearch indexes follow the naming convention:
lmio-<tenant>-<type>-<stream>
where:
type
can bereceived
,events
orothers
tenant
is the name of the tenantstream
is the name of the log stream
kafka:
received:
topic: received.mytenant.microsoft-365-v1
events:
topic: events.mytenant.microsoft-365-v1
others:
topic: others.mytenant
elasticsearch:
events:
index: lmio-mytenant-events-microsoft-365-v1
others:
index: lmio-mytenant-others
Note
Every tenant has only one others
topic, therefore, there is no specification of the stream in others
topic and index.
Furthermore, additional properties of Elasticsearch (such as number of shards, index lifecycle etc) are configured in elasticsearch
section. Read more in LogMan.io Depositor documentation.
Event Lane template declaration¶
For automatic assignment of parsing rules and Library content, event lane templates are used. When a new stream is found, LogMan.io Elman seeks a suitable event lane template. When one is found, the new event lane is automatically filled in with the properties of that template.
The following example illustrates the event lane template for the log source Microsoft 365:
---
define:
  type: lmio/event-lane-template
  name: Microsoft 365
  stream: microsoft-365-v1
  timezone: UTC

logsource:
  vendor:
    - microsoft
  product:
    - m365
  service:
    - audit
    - activitylogs

parsec:
  name: /Parsers/Microsoft/365

content:
  dashboards: /Dashboards/Microsoft/365
  reports: /Reports/Microsoft/365
Define¶
define:
  type: lmio/event-lane-template
  name: Microsoft 365
  stream: microsoft-365-v1
  timezone: UTC
- name: Human-readable name of the event lane, derived from the technology of the log source. It is used e.g. for the configuration of Discover.
- stream: The name that will be matched with the actual stream. There can be an exact match (such as microsoft-365-v1), but wildcards (such as *) are allowed to match a wide range of streams (e.g. fortinet-fortigate-*); see the sketch below.
- timezone (optional): Various log sources send events in a firmly established timezone (e.g. Microsoft 365 always uses UTC). To reflect that, the timezone can be prescribed here. Otherwise, each event lane has to be handled manually.
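For illustration, a hypothetical event lane template matching all FortiGate streams with a wildcard could look as follows (the stream name and parser path are examples, not a shipped template):
define:
  type: lmio/event-lane-template
  name: Fortinet FortiGate
  stream: fortinet-fortigate-*

parsec:
  name: /Parsers/Fortinet/FortiGate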
Categorization¶
Section logsource
is used for categorization of the log source connected to the event lane.
It is derived from Sigma rules.
logsource:
  vendor:
    - microsoft
  product:
    - m365
  service:
    - audit
    - activitylogs
Parsec¶
Option parsec/name is the directory of parsing rules in the Library. It must always start with /Parsers/:
parsec:
  name: /Parsers/Microsoft/365
Content¶
Section content
refers to the event lane content in the Library, such as dashboards, reports, correlations, etc.
LogMan.io Elman automatically disables all content referenced by event lane templates. When a new event lane is created, its content is enabled.
content:
  # Entire directory, described as a single string
  dashboards: /Dashboards/Microsoft/365
  # Multiple items described as a list
  reports:
    - /Reports/Microsoft/365/Daily Report.json
    - /Reports/Microsoft/365/Weekly Report.json
    - /Reports/Microsoft/365/Monthly Report.json
Streams in LogMan.io Elman¶
Detection of new streams in LogMan.io Elman¶
LogMan.io Elman periodically detects received.* topics in the Kafka cluster.
If a new received
topic without an existing event lane is found, a new event lane is created using event lane templates.
For known sources, there is an event lane template, which is used primarily for choosing parsing rules for that event lane.
LogMan.io Elman seeks an appropriate template in /Templates/EventLanes/
. One of the following cases can happen:
- The event lane template is found in the Library. Parsing rules and other specific content are then copied to the event lane. After that, a model for LogMan.io Parsec is created in /Site/ and LogMan.io Elman sends the UP command to ASAB Remote Control. Finally, all the content listed in the event lane is enabled in the Library.
- The event lane template is not found in the Library. Then an event lane is created, but the path for parsing rules is not filled in, nor is other event lane specific content. The model is not created automatically and user action is required: a user can fill in suitable parsing rules manually in the event lane declaration. Finally, it is possible to update all models manually from the terminal using curl:

curl --location --request PUT 'localhost:8954/model'
Finally, one or more instances of LogMan.io Parsec are created, and the process of parsing starts.
Kafka topics¶
LogMan.io Elman updates properties of Kafka received.*
, events.*
, and others.*
topics.
In particular, the number of partitions for each topic is increased and the retention policy is set. See Configuration for more details.
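To verify these properties, the standard Apache Kafka command-line tools can be used. This is a generic Kafka sketch, not part of LogMan.io, and the bootstrap server address is an assumption:
# Generic Apache Kafka tooling (assumption: a broker at localhost:9092)
kafka-topics.sh --bootstrap-server localhost:9092 \
    --describe --topic received.mytenant.microsoft-365-v1

kafka-configs.sh --bootstrap-server localhost:9092 \
    --describe --entity-type topics --entity-name received.mytenant.microsoft-365-v1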
Library content¶
LogMan.io Elman disables all content referenced by event lane templates in the Library. When a new event lane is created, its content is automatically enabled for the event lane tenant.
Authorization & Authentication¶
TeskaLabs LogMan.io uses TeskaLabs SeaCat Auth as the component that handles user authorization, authentication, access control, roles, tenants, multi-factor authentication, integrations with other identity providers, and so on.
TeskaLabs SeaCat Auth has its dedicated documentation site.
Features¶
- Authentication
  - Second-factor Authentication (2FA) / Multi-factor Authentication (MFA)
  - Supported factors:
    - Password
    - FIDO2 / WebAuthn
    - Time-based One-Time Password (TOTP)
    - SMS code
    - Request header (X-Header)
  - Machine-to-Machine
    - API keys
- Authorization
  - Roles
  - Role-based access control (RBAC)
- Identity management
  - Federation of identities
  - Supported providers:
    - File (htpasswd)
    - In-memory dictionary
    - MongoDB
    - ElasticSearch
    - LDAP and Active Directory
    - Custom provider
- Multitenancy including tenant management for other services and applications
- Session management
- Single sign-on
- OpenID Connect / OAuth2
- Proof Key for Code Exchange (PKCE) for OAuth 2.0 public clients
- Authorization/authentication introspection backend for NGINX
- Audit trail
- Provisioning mode
- Structured logging (Syslog) and telemetry
LogMan.io Commander¶
LogMan.io Commander allows running the following utility commands via the command line or the API.
encpwd
command¶
Passwords used in configurations can be protected by encryption.
The encpwd (Encrypt Password) command encrypts password(s) into the LogMan.io password format using the AES cipher.
The encrypted passwords are then used in the LogMan.io Collector declarative configuration in the following format:
!encpwd "<LMIO_PASSWORD>"
Configuration¶
The default AES key path can be configured in the following way:
[pwdencryptor]
key=/data/aes.key
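If the key file does not exist yet, it can be generated as random bytes. This is only a sketch that assumes a raw 256-bit AES key; the exact key format required by your deployment may differ:
# Sketch only: generate 32 random bytes (256 bits) as the AES key
# (assumption: a raw binary key file at the configured path)
openssl rand -out /data/aes.key 32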
Usage¶
Docker container¶
Command Line¶
docker exec -it lmio-commander lmiocmd encpwd MyPassword
API¶
LogMan.io Commander also serves an API endpoint, so the encpwd
command
can be reached via HTTP call:
curl -X POST -d "MyPassword" http://lmio-commander:8989/encpwd
library
command¶
The library command inserts the library folder structure with all YAML declarations into ZooKeeper, from where other components such as LogMan.io Parser and Correlator can dynamically download it.
The folder structure can be located in the filesystem (mounted to the Docker container) or at a Git URL.
This is how to initiate loading of the library into the ZooKeeper cluster:
Configuration¶
It is necessary to properly configure the source folder and ZooKeeper output.
[source]
path=/library
[destination]
urls=zookeeper:12181
path=/lmio
The source path can be a GIT repository path prefixed with git://
:
[source]
path=git://<username>:<deploy_token>@<git_url_path>.git
In this way, the library is automatically cloned from Git into a temporary folder and uploaded to ZooKeeper, and the temporary folder is then deleted.
Usage¶
Docker container¶
Command Line¶
docker exec -it lmio-commander lmiocmd library load
Using explicitly defined configuration:
docker exec -it lmio-commander lmiocmd -c /data/lmio-commander.conf library load
API¶
LogMan.io Commander also serves an API endpoint, so the library
command
can be reached via HTTP call:
curl -X PUT http://lmio-commander:8989/library/load
See Docker Compose section below.
iplookup
command¶
The iplookup
command processes IP range databases and generates IP lookup files ready for use with lmio-parser IP Enricher.
It has two subcommands: iplookup from-csv
for processing general CSV files and iplookup from-ip2location
for processing IP2LOCATION CSV files.
Configuration¶
The source and destination directories can be set in a config file:
[iplookup]
source_path=/data
destination_path=/data
iplookup from-csv
¶
Reads a generic CSV file and produces an IP Enricher lookup file.
The first row of the file is expected to be the header containing the column names.
The first two columns need to be ip_from
and ip_to
.
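For example, a minimal input file might look like this (the columns after ip_to are illustrative; only ip_from and ip_to are required to come first):
ip_from,ip_to,country_code,country_name
::ffff:c000:0200,::ffff:c000:02ff,US,United States
::ffff:5db8:0000,::ffff:5db8:00ff,CZ,Czech Republic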
Command line interface¶
lmiocmd.py iplookup from-csv [-h] [--separator SEPARATOR] [--zone-name ZONE_NAME] [--gzip] [--include-ip-range] file_name
Positional arguments:

- file_name: Input CSV file

Optional arguments:

- -h, --help: Show this help message and exit.
- -g, --gzip: Compress the output file with gzip.
- -i INPUT_IP_FORMAT, --input-ip-format INPUT_IP_FORMAT: Format of input IP addresses. Defaults to 'ipv6'. Possible values:
  - ipv6: IPv6 address represented as a string, e.g. ::ffff:c000:02eb
  - ipv4: standard quad-dotted IPv4 address string, e.g. 192.0.2.235
  - ipv6-int: IPv6 address as a 128-bit decimal integer, e.g. 281473902969579
  - ipv4-int: IPv4 address as a 32-bit decimal integer, e.g. 3221226219
- -s SEPARATOR, --separator SEPARATOR: CSV column separator.
- -o LOOKUP_NAME, --lookup-name LOOKUP_NAME: Name of the output lookup. It is used as the lookup zone name. By default, it is derived from the input file name.
- --include-ip-range: Include the ip_from and ip_to fields in the lookup values.
- --force-ipv4: Prevent mapping IPv4 addresses to IPv6. This is incompatible with IPv6 input formats.
Example usage¶
lmiocmd iplookup from-csv \
--input-ip-format ipv6 \
--lookup-name ip2country \
--gzip \
my-ipv6-zones.CSV
iplookup from-ip2location
¶
This command is similar to the iplookup from-csv
command above, but is tailored specifically for processing IP2Location™ CSV databases.
In case of IP2LOCATION LITE databases, the command can infer the input IP format and the column names from the file name.
However, it is possible to specify the column names explicitly.
Command line interface¶
lmiocmd.py iplookup from-ip2location [-h] [--separator SEPARATOR] [--zone-name ZONE_NAME] [--gzip] [--include-ip-range] file_name
Positional arguments:

- file_name: Input CSV file

Optional arguments:

- -h, --help: Show this help message and exit.
- -g, --gzip: Compress the output file with gzip.
- -s SEPARATOR, --separator SEPARATOR: CSV column separator. Defaults to ','.
- -c COLUMN_NAMES, --column-names COLUMN_NAMES: Space-separated list of column names to use. By default, it is inferred from the IP2LOCATION file name.
- -i INPUT_IP_FORMAT, --input-ip-format INPUT_IP_FORMAT: Format of input IP addresses. By default, it is inferred from the IP2LOCATION file name. Possible values:
  - ipv6-int: IPv6 address as a 128-bit decimal integer, e.g. 281473902969579
  - ipv4-int: IPv4 address as a 32-bit decimal integer, e.g. 3221226219
- -o LOOKUP_NAME, --lookup-name LOOKUP_NAME: Name of the output lookup. It is used as the lookup zone name. By default, it is derived from the input file name.
- -e, --keep-empty-rows: Do not exclude rows with empty values (indicated by '-').
- --include-ip-range: Include the ip_from and ip_to fields in the lookup values.
- --force-ipv4: Prevent mapping IPv4 addresses to IPv6.
Example usage¶
With automatic column names and input IP format:
lmiocmd iplookup from-ip2location \
--lookup-name ip2country \
--gzip \
IP2LOCATION-LITE-DB1.IPV6.CSV
With explicit column names and input IP format (the result will be equivalent to the example above):
lmiocmd iplookup from-ip2location \
--lookup-name ip2country \
--gzip \
--column-names "ip_from ip_to country_code country_name" \
--input-ip-format ipv6-int \
IP2LOCATION-LITE-DB1.IPV6.CSV
Docker Compose¶
File¶
The following docker-compose.yml file pulls the LogMan.io Commander image from TeskaLabs' Docker Registry and expects the configuration file in the ./lmio-commander folder.
version: '3'
services:
  lmio-commander:
    image: docker.teskalabs.com/lmio/lmio-commander
    container_name: lmio-commander
    volumes:
      - ./lmio-commander:/data
      - /opt/lmio-library:/library
    ports:
      - "8989:8080"
The /opt/lmio-library path leads to the LogMan.io Library repository.
Run the container¶
docker-compose pull
docker-compose up -d
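To verify that the container is running and to inspect its logs, the standard Docker commands can be used (shown for convenience only):
docker-compose ps
docker logs lmio-commander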
Declarative language SP-Lang¶
TeskaLabs LogMan.io uses SP-Lang as its declarative language for parsers, enrichers, correlators and other components.
SP-Lang has its dedicated documentation site.
Elastic Common Schema (ECS)¶
For more details, see the official documentation.
ECS Generic attributes table¶
Attribute | Description | Values as an example |
---|---|---|
@timestamp | Date/time when the event originated. | 2022-05-23T08:05:34.853Z |
client.ip | The IP address of the device that was used when the activity was logged. The IP address is displayed in either an IPv4 or IPv6 address format. | 52.108.224.1 |
cnt | Count of events. | 1 |
destination.ip | The original destination IP address of the device that was used when the activity was logged. | 85.162.11.26 |
ecs.version | ECS version this event conforms to. | 1.0.0 |
event.action | Description of the original event that triggered creating of the particular log. | UserLoggedIn, MessageTrace, FilePreviewed |
event.original | Full and unmodified log for auditing. | 10.42.42.42 - - [07/Dec ... |
http.request.method | HTTP request is an action to be performed on a resource identified by a given Request-URL. | get |
http.response.body.bytes | Size of the HTTP response body in bytes. | 2571 |
http.response.status_code | HTTP response status codes indicate whether a specific HTTP request has been successfully completed. | 200 |
http.version | Current version of the Hypertext Transfer Protocol. | 1.1 |
host.hostname | Hostname of the host. | webserver-blog-prod |
message | Text representation of the significant information from the event for succinct display in a log viewer. | "GET /blog HTTP/1.1" 200 2571 |
service.name | Your custom name for this service. | Company blog |
service.type | Type of the service used with this instance. | apache |
source.geo.* | Fields for geo-location. | |
url.original | Original url path. | /blog |
user.name | Name of the user. | Albus Dumbledore |
user_agent.* | Fields describing the user agent. | |
event.dataset | Name of the dataset. | microsoft-office-365 |
event.id | Unique identification value. | b4b4c44c-ff30-4ddd-bfbe-44e082570800 |
event.ingested | Timestamp when an event arrived in the central data store. | 2022-05-23T08:05:34.853Z |
event.kind | Value of this field can be used to inform how these kinds of events should be handled. | alert, enrichment, event, metric, state, pipeline_error, signal |
log.original | Raw log format that is received before the parsing process takes place. | <165>Jan 17 12:20:25 hostname %ASA-5-111010: User 'harry', running 'N/A' from IP 192.68.0.2, executed 'write memory' |
organization.id | ID of the original source organization of an event. | TeskaLabsCom.onmicrosoft.com |
recipient.address | E-mail address of original recipient of the message. | accounting@teskalabs.com |
sender.address | E-mail address of original sender of the message. | support@teskalabs.com |
source.ip | IP address of the source device or system. | 149.72.113.167 |
tenant | Tenant identification in each event. | default |
user.id | User identification of each event. | automata@teskalabs.com |
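For illustration, a parsed event using these attributes might look as follows (the values are taken from the examples above and shown with dotted field names for brevity; the actual set of fields depends on the log source and parser):
{
  "@timestamp": "2022-05-23T08:05:34.853Z",
  "event.dataset": "microsoft-office-365",
  "event.action": "UserLoggedIn",
  "source.ip": "149.72.113.167",
  "user.id": "automata@teskalabs.com",
  "tenant": "default"
}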
TeskaLabs LogMan.io Integration Service¶
LogMan.io Integ allows TeskaLabs LogMan.io to be integrated with supported external systems via the expected message format and output/input protocol.
LogMan.io Integ utilizes declarative Transformers specified in YAML files and stored in a group (such as integ) in the LogMan.io Library.
Configuration¶
To connect LogMan.io with an external system like ArcSight,
configure an instance of LogMan.io Integ,
while specifying the transformer group,
Kafka input and Kafka or TCP output(s) in custom sections like MyTCPOutput
:
# Set transformers
[declarations]
library=/library
groups=integ
include_search_path=filters/*
# Set input
[connection:KafkaConnection]
bootstrap_servers=lm1:19092,lm2:29092,lm3:39092
[pipeline:TransformersPipeline:KafkaSource]
topic=lmio-events
# Set output(s) - they are selected by individual transformers
[MyTCPOutput]
address=lm1:9999
[MyKafkaOutput]
topic=lmio-output
Transformers¶
Transformers are declarative processors that create their own dedicated pipeline with a KafkaSource and an output sink.
Example¶
---
define:
  name: ArcSight Transformer for Correlations
  type: transformer/default
  format: string

output:
  - type: kafka/tcp/unix-stream/null
    config: MyTCPOutput
    format: string/json
    latch: 0

predicate:
  !EQ
  - !ITEM EVENT type
  - Correlation

transform:
  !JOIN
  delimiter: ""
  items:
    - !JOIN
      delimiter: "|"
      items:
        - CEF:0
        - TeskaLabs
        - LogMan.io
        - 1.2.3
        - !ITEM EVENT name
    - !DICT.FORMAT
      what: !EVENT
      type: cef
Section define
¶
This section contains the common definition and meta data.
Item name
¶
Shorter human-readable name of this declaration.
Item type
¶
The type of this declaration, must be transformer/default
.
Item description
(optional)¶
Longer, possibly multiline, human-readable description of the declaration.
Item field_alias
(optional)¶
Name of the field alias lookup to be loaded, so that alias names of event attributes can be used in the declaration alongside their canonical names.
Section output
¶
Section output specifies the list of outputs.
Item type
¶
Type of the output, e.g. kafka, unix-stream, tcp, null.
Item format
¶
Format of the event produced by the transformator, either string
or json
(default: string
).
Item config
¶
Config section for the given output stored in .conf
file,
such as MyKafkaOutput
or MyTCPOutput
(see configuration section above).
Specify topic
for Kafka output and address
for TCP/Unix Stream.
Item latch
¶
If set, output events are stored in a latch queue so that they are accessible via a periodic API call to /latch. The number of stored events is passed as the value, e.g. latch: 30 will keep the last 30 events for each transformer. Default: !!null.
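For example, a hypothetical output keeping the last 30 events in the latch (the values are illustrative):
output:
  - type: tcp
    config: MyTCPOutput
    format: string
    latch: 30

The latched events could then be fetched from the /latch endpoint, e.g. curl 'http://&lt;lmio-integ-host&gt;:&lt;port&gt;/latch', where the host and port depend on your deployment.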
Section predicate
(optional)¶
The predicate
filters incoming events using an expression.
If the expression returns True
, the event will enter transform
section.
If the expression returns False
, then the event is skipped.
Other returned values are undefined.
This section can be used to speed up the integration by skipping lines with obviously non-relevant content.
Include of nested predicate filters¶
Predicate filters are expressions located in dedicated files that can be included in many different predicates as their parts.
If you want to include an external predicate filter, located either in include
or filters
folder
(this one is a global folder located at the top hierarchy of the LogMan.io library),
use !INCLUDE
statement:
!INCLUDE predicate_filter
where predicate_filter is the name of the included file without the .yaml extension (the file itself is named predicate_filter.yaml).
The content of predicate_filter.yaml
is an expression to be included, like:
---
!EQ
- !ITEM EVENT category
- "MyEventCategory"
Section transform
¶
This section specifies the actual transforming mechanism.
It expects a dictionary to be returned or None
, which means that the transforming was not successful.
Kafka Topics¶
LogMan.io default topics¶
The following topics are default LogMan.io topics used to pass processed events among individual components.
Kafka topic | Producer | Consumer(s) | Type of stored messages |
---|---|---|---|
received.<tenant>.<stream> | LogMan.io Receiver | LogMan.io Parsec | raw logs / events |
events.<tenant>.<stream> | LogMan.io Parsec | LogMan.io Depositor, LogMan.io Baseliner | successfully parsed events |
others.<tenant> | LogMan.io Parsec | LogMan.io Depositor | unsuccessfully parsed events |
complex.<tenant> | LogMan.io Correlator | LogMan.io Watcher | |
lmio-alerts | LogMan.io Alerts | | |
lmio-lookups | LogMan.io Lookups | | lookup events (information about update of a lookup item) |
lmio-notifications | | | |
lmio-stores | | | |
LogMan.io topics within Event Lane¶
After the provisioning of LogMan.io Collector, LogMan.io Receiver automatically creates RECEIVED
Kafka topic based on the tenant and stream.
This name is of the form received.<tenant>.<stream>
. This topic is unique for every Event Lane.
LogMan.io Parsec consumes from the RECEIVED topic and parses the messages. When parsing succeeds, the message is sent to the EVENTS topic; when it fails, the message is sent to the OTHERS topic.
EVENTS
topic has the form events.<tenant>.<stream>
and is unique for every Event Lane.
OTHERS topic has the form others.<tenant> and stores all incorrectly parsed messages regardless of the Event Lane; it is unique for every tenant.
LogMan.io Depositor consumes from both EVENTS
and OTHERS
topics and sends the messages to the corresponding Elasticsearch indexes.
Example
Suppose the name of the tenant is hogwarts
and the name of the stream is microsoft-365
. Then the following topics are created:
- received.hogwarts.microsoft-365: The topic where raw (unparsed) events are stored.
- events.hogwarts.microsoft-365: The topic where successfully parsed events are stored.
- others.hogwarts: If some events are parsed unsuccessfully, they are stored in this topic.
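To list the topics that exist for this tenant, the standard Apache Kafka tooling can be used. This is a generic sketch; the broker address is an assumption:
# Generic Apache Kafka tooling (assumption: a broker at localhost:9092)
kafka-topics.sh --bootstrap-server localhost:9092 --list | grep hogwarts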
TeskaLabs LogMan.io Library¶
A library of declarations is a folder on the filesystem that holds declarations for parsers, enrichers, correlators and other YAML elements such as !INCLUDE files.
A library has a prescribed structure:
library/
  <parser group 1>/
    p01_<parser>.yaml
    p02_<parser>.yaml
    e01_<enricher>.yaml
    e02_<enricher>.yaml
    include/
      head_parser.yaml
      spec_parser.yaml
      ...
    test/
      test01.yaml
      ...
  <parser group 2>/
  <correlator group 1>/
  <correlator group 2>/
  ...
  include/
A parser group is a set of parser and enricher declarations that operate within the same parser type.
Naming pattern¶
The naming pattern, e.g. p01_<...>.yaml, is recommended because it provides control over the order of execution and a visual differentiation between parsers and enrichers.
Files are loaded into the pipeline in alphabetical order, thus a parser named p01_<...>.yaml will be loaded into the pipeline before the p02_<...>.yaml parser.
Including declarations in the library¶
Declarations such as declarations of parsers can include other declarations from library include directories using the !INCLUDE expression.
The include directories are specified in include_search_path
configuration option for LogMan.io Parser, Correlator etc.:
[declarations]
include_search_path=filters;filters/firewall;filters/common;filters/authentication
By specifying an asterisk * after a slash, all subdirectories will be recursively included, so that the user does not have to specify each of them in the include_search_path option:
[declarations]
include_search_path=filters/*
By default, the following include search paths are always implicitly included:

- library/<group>/include is the implicit location of the !INCLUDE YAML files used within a parser group.
- library/include is the location of the !INCLUDE YAML files used globally.

A declaration named predicate_filter.yaml located in one of the include search path directories can then be included in the following way:
predicate:
  !AND
  - !EQ
    - !ITEM EVENT Type
    - UseIt
  - !INCLUDE predicate_filter
For more information, see Cascade Parser and Window Correlator sections.
Unit tests¶
library/<group>/test is the location of the unit tests for the given group; see lmio-parser and lmio-correlator for more details about how to approach unit tests of the library.
The library is designed to be easily manageable by a version control system such as Git or Subversion.
Naming Standards¶
Product repositories¶
Product repositories contain the source code of all LogMan.io components (data pumps, services, UI, etc.).
Names of product repositories, which are always developed and maintained by TeskaLabs, always start with the lmio- prefix.
Product repositories are not to be modified by customers or partners.
Site config repositories¶
Site config repositories contain the site configurations of every deployed LogMan.io component and base technology, as well as the Docker Compose file(s) for the given deployment on a given server.
Site config repositories can also be maintained by a partner or a customer.
Their names always start with the site- prefix.
Partners' repositories¶
Repositories maintained by a partner or a customer always contain their lowercase name after the prefix, separated by dashes.
Hence, for site repositories, the format looks as follows:
site-<partner_name>-<location>
where location is the deployment location (the name of the company that manages the server); for example, a hypothetical repository could be named site-acme-example.
The location can also be described in another explicit manner, such as build-env to signify the build environment.
Notes¶
Keyword environment
is always shortened to env
.
Keyword encrypt
is always shortened to enc
.
Multi-tenancy¶
TeskaLabs LogMan.io is a multi-tenant product, which requires specifying the tenant (customer) identification in each log event; this can be done automatically using LogMan.io Collector.
Each LogMan.io Parser instance is tenant-specific, as are the indices stored in ElasticSearch. Thus, only logs belonging to the given tenant are displayed to that tenant in Kibana or the LogMan.io UI.
Also, flow monitoring metrics in Grafana are tenant-based and thus allow monitoring the flow of logs for each individual customer.
Naming standard¶
Because of the TeskaLabs LogMan.io technology base, all tenant identifications must be lowercase without any special characters (such as #, *, etc.); for example, hogwarts is a valid tenant name, while Hogwarts#1 is not.
Tools¶
A list of 3rd party open source tools that can be integrated with TeskaLabs LogMan.io. Once enabled, they are available in the "Tools" section of LogMan.io.
Cyber security¶
Kibana¶
Kibana is open software for visualization of incoming logs. It is one of the Elastic tools. Cyber security teams use Kibana mainly for analytical tasks.
More info at Kibana Guide
TheHive¶
TheHive is a security incident response platform.
More info at thehive-project.org
MISP¶
MISP is open software for correlating, storing and sharing cyber security indicators, malware analysis, etc.
More info at misp-project.org
Data science¶
Jupyter Notebook¶
JupyterLab is the web-based interactive development environment for notebooks, code, and data. Its flexible interface allows users to configure and arrange workflows in data science, scientific computing, computational journalism, and machine learning. A modular design invites extensions to expand and enrich functionality.
More info at Jupyter Notebook
Technical support¶
ZooNavigator¶
ZooKeeper browser and editor.
https://github.com/elkozmon/zoonavigator
Grafana¶
Observability and data visualization platform. For metrics visualization.
https://github.com/grafana/grafana
KafDrop¶
Kafka Web UI
Overview of network ports¶
This chapter contains an overview of the network ports used by LogMan.io. The majority of ports are internal, accessible only from the internal network of the cluster.
Internal network¶
Port | Protocol | Component |
---|---|---|
tcp/8890 | HTTP | NGINX (Internal API gateway) |
tcp/8891 | HTTP | asab-remote-control |
tcp/8892 | HTTP | asab-governator |
tcp/8893 | HTTP | asab-library |
tcp/8894 | HTTP | asab-config |
tcp/8895 | HTTP | asab-pyppeteer |
tcp/8896 | HTTP | asab-iris |
tcp/8900 | HTTP | seacat-auth (Private API) |
tcp/8950 | HTTP | lmio-receiver (Private API) |
tcp/8951 | HTTP | lmio-ipaddrproc |
tcp/8952 | HTTP | lmio-watcher |
tcp/8953 | HTTP | lmio-alerts |
tcp/8954 | HTTP | lmio-elman |
tcp/8955 | HTTP | lmio-lookupbuilder |
tcp/8956 | HTTP | lmio-ipaddrproc |
tcp/8957 | HTTP | lmio-correlator-builder |
tcp/8958 | HTTP | lmio-charts |
tcp/3443 | HTTPS | lmio-receiver (Public API) |
tcp/3080 | HTTP | lmio-receiver (Public API) |
tcp/3081 | HTTP | seacat-auth (Public API) |
tcp/8790 | HTTP | bs-query |
tcp/8810 | HTTP | ACME.sh |
tcp/9092 | Kafka | Apache Kafka |
tcp/9000 | HTTP | Kafdrop |
tcp/2181 | ZAB | Apache Zookeeper |
tcp/9001 | HTTP | Zoonavigator |
tcp/8086 | HTTP | InfluxDB |
tcp/8888 | HTTP | Jupyter Notebook |
tcp/5601 | HTTP | Kibana |
tcp/3000 | HTTP | Grafana |
tcp/27017 | proprietary | MongoDB |
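To check which service is listening on a given internal port from within the cluster, a standard socket statistics command can be used (illustrative only; tcp/8954 is taken from the table above):
# Illustrative only: verify that lmio-elman listens on tcp/8954
ss -tlnp | grep ':8954'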