Exports and Library

There are three types of Library artifacts used by the BS-Query application.

DataSources

Declarations of data sources are vital for BS-Query functionality. There are no exports without a data source.

The BS-Query application supports the following types of data source declarations:

  • datasource/elasticsearch
  • datasource/pyppeteer

/DataSources/elasticsearch.yaml

define:
    type: datasource/elasticsearch

specification:
    index: lmio-{{tenant}}-events*

request:
    <key>: <value>
    <key>: <value>

query_params:
    <key>: <value>
    <key>: <value>

define

  • type: a technical name that helps to find the DataSource declaration in the Library.

specification

  • index: a collection of JSON documents in Elasticsearch. Each document is a set of fields that contain data presented as key-value pairs. For a more detailed explanation, refer to this article.

A number of other items can also be configured in a DataSource declaration. These are standard Elasticsearch API parameters through which you can fine-tune your declaration template to determine the specific content of the requested data and/or the actions performed on it. One such parameter is size, the number of matching documents to be returned in one request.

request

For more details, please refer to the Elastic documentation.

query_params

For more details, please refer to the Elastic documentation.
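
For illustration only, a filled-in DataSource declaration might look like the sketch below. The size and filter_path entries are standard Elasticsearch API parameters used here as example values; their placement under request and query_params follows the template above and is not a required configuration.

define:
    type: datasource/elasticsearch

specification:
    index: lmio-{{tenant}}-events*

request:
    # example value: limit each request to 10000 matching documents
    size: 10000

query_params:
    # example value: return only the document sources in the response
    filter_path: hits.hits._source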

Exports

Export declarations specify how to retrieve data from a data source. The YAML file contains the following sections:

define

The define section includes the following parameters:

  • name: The name of the export.
  • datasource: The name of the DataSource declaration in the library, specified as an absolute path to the library.
  • output: The output format for the export. Available options are "raw", "csv", and "xlsx" for ES DataSources, and "raw" for Kafka DataSources.
  • header: When using "csv" or "xlsx" output, you must specify the header of the resulting table as an array of column names. These will appear in the same order as they are listed.
  • schedule: There are three ways to schedule an export (see the sketch after this list):
      • datetime in the format "%Y-%m-%d %H:%M" (e.g. 2023-01-01 00:00)
      • timestamp as an integer (e.g. 1674482460)
      • cron expression - refer to http://en.wikipedia.org/wiki/Cron for more details
  • schema: Schema in which the export should be run.
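
To make the scheduling options concrete, a define section could look like the following sketch; the name, header, and schedule values are illustrative placeholders, not required settings.

define:
    name: Nightly HTTP export
    datasource: elasticsearch
    output: csv
    header: ["@timestamp", "url.original"]
    schedule: "0 2 * * *"            # cron: every day at 02:00
    # schedule: "2023-01-01 00:00"   # alternatively, a one-off datetime
    # schedule: 1674482460           # or an integer timestamp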

Schema

There is always a schema configured for each tenant (see the Tenants section of the configuration). The export declaration can state the schema to which it belongs. If the schema of the export declaration does not match the tenant schema configuration, the export stops executing.

target

The target section includes the following parameters:

  • type: An array of target types for the export. Possible options are "download", "email", and "jupyter". "download" is always selected if the target section is missing.
  • email: For the email target type, you must specify at least the to field, which is an array of recipient email addresses. Other optional fields include:
      • cc: an array of CC recipients
      • bcc: an array of BCC recipients
      • from: the sender's email address (string)
      • subject: the subject of the email (string)
      • body: a file name (with suffix) stored in the Templates folder of the Library, used as the email body template. You can also add special parameters to be used in the template. Otherwise, use any keyword from the define section of your export as a template parameter (for any export: name, datasource, output; for specific exports you can also use compression, header, schedule, timezone, tenant). See the sketch after this list.
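
A target section using the e-mail fields above might look like the sketch below; all addresses, the subject, and the body file name are placeholder values.

target:
    type: ["email"]
    email:
        to:
        - john.doe@teskalabs.com
        cc:
        - jane.doe@teskalabs.com
        from: exports@teskalabs.com
        subject: "Scheduled export"
        body: email_body.md    # placeholder file name stored in the Templates folder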

query

The query field must be a string.

Tip

In addition to these parameters, you can use keywords specific to the data source declaration in your export declaration. If there are any conflicts, the data source declaration will take precedence.
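
As a heavily hedged sketch, adding a data-source keyword such as request to an export declaration might look as follows; placing such keys at the top level of the export file (alongside query) is an assumption, not documented behaviour, and per the tip above a conflicting value in the DataSource declaration wins.

# assumption: data-source keys may be added at the top level of the export declaration
request:
    size: 500    # example value; a conflicting value in the DataSource declaration takes precedence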

schema

You can add a partial schema that overrides the common configured schema.

This feature allows schema-based transformations on exported data. This comprises:

  • Conversion from timestamp to a human-readable date format, where the schema specifies the datetime type
  • Deidentification

/Exports/example_export.yaml

define:
    name: Export e-mail test
    datasource: elasticsearch
    output: csv
    header: ["@timestamp", "event.dataset", "http.response.status_code", "host.hostname", "http.request.method", "url.original"]

target: 
    type: ["email"]
    email: 
        to:
        - john.doe@teskalabs.com

query: >
    {
        "bool": {
            "filter": [{
                "prefix": {
                    "http.version": {
                        "value": "HTTP/1.1"
                    }
                }
            }]
        }
    }

schema:
    fields:
        user.name:
            deidentification:
                method: hash

        source.address:
            deidentification:
                method: ip

Templates

The Templates section of the Library is used when sending Exports by e-mail. The e-mail body must be based on a template. Place a custom template in the Templates/Email directory. You can use Jinja templating in these files; see the Jinja docs for more info. All keys from the export declaration can be used as Jinja variables.
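
For example, a minimal e-mail body template (a hypothetical file such as Templates/Email/body_example.md) could use export keys as Jinja variables:

Hello,

the export "{{ name }}" from data source "{{ datasource }}" has finished in {{ output }} format.

Best regards,
BS-Query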