Exports and Library¶
There are three types of Library artifacts used by the BS-Query application.
DataSources¶
Data source declarations are essential for BS-Query functionality: no export can run without a data source.
The BS-Query application supports the following types of data source declarations:
datasource/elasticsearch
datasource/pyppeteer
/DataSources/elasticsearch.yaml

define:
  type: datasource/elasticsearch
specification:
  index: lmio-{{tenant}}-events*
request:
  <key>: <value>
  <key>: <value>
query_params:
  <key>: <value>
  <key>: <value>
define¶
type
: a technical name that helps to find the DataSource declaration in the Library.
specification¶
index
: a collection of JSON documents in Elasticsearch. Each document is a set of fields that contain data presented as key-value pairs. For a more detailed explanation, refer to this article.
A number of other items can also be configured in a DataSource declaration. These are standard Elasticsearch API parameters through which you can fine-tune your declaration template to determine the specific content of the requested data and/or the actions performed on it. One such parameter is size, the number of matching documents to be returned in one request.
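As an illustrative sketch, a declaration that limits each request to 100 matching documents might look like the following; note that placing size inside the request section follows the standard Elasticsearch search API body and is an assumption here, not confirmed by this page:

```yaml
define:
  type: datasource/elasticsearch
specification:
  index: lmio-{{tenant}}-events*
request:
  size: 100   # return at most 100 matching documents per request (assumed placement)
```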
request¶
For more details, please refer to Elastic documentation.
query_params¶
For more details, please refer to Elastic documentation.
Exports¶
Export declarations specify how to retrieve data from a data source. The YAML file contains the following sections:
define¶
The define section includes the following parameters:
name
: The name of the export.

datasource
: The name of the DataSource declaration in the library, specified as an absolute path to the library.

output
: The output format for the export. Available options are "raw", "csv", and "xlsx" for ES DataSources, and "raw" for Kafka DataSources.

header
: When using "csv" or "xlsx" output, you must specify the header of the resulting table as an array of column names. These will appear in the same order as they are listed.

schedule
: There are three options for scheduling an export:
- datetime in the format "%Y-%m-%d %H:%M" (e.g. 2023-01-01 00:00)
- timestamp as an integer (e.g. 1674482460)
- cron expression; refer to http://en.wikipedia.org/wiki/Cron for more details

schema
: Schema in which the export should be run.
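For illustration, the three scheduling forms could appear in the define section as follows; the values are examples only, and an export uses exactly one of them:

```yaml
# datetime in "%Y-%m-%d %H:%M" format
schedule: "2023-01-01 00:00"

# integer timestamp
schedule: 1674482460

# cron expression (here: every day at midnight)
schedule: "0 0 * * *"
```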
Schema
There is always a schema configured for each tenant (see the Tenants section of the configuration). The export declaration can state the schema to which it belongs.
If the schema of the export declaration does not match the tenant schema configuration, the export stops executing.
target¶
The target section includes the following parameters:
type
: An array of target types for the export. Possible options are "download", "email", and "jupyter". "download" is always selected if the target section is missing.

email
: For the email target type, you must specify at least the to field, which is an array of recipient email addresses. Other optional fields include:
- cc: an array of CC recipients
- bcc: an array of BCC recipients
- from: the sender's email address (string)
- subject: the subject of the email (string)
- body: a file name (with suffix) stored in the Templates folder of the library, used as the email body template. You can also add special parameters to be used in the template. Otherwise, use any keyword from the define section of your export as a template parameter (for any export: name, datasource, output; for specific exports, you can also use compression, header, schedule, timezone, tenant).
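A sketch of an email target that uses a custom body template; the file name, subject, and the custom parameter are illustrative assumptions, not values mandated by this page:

```yaml
target:
  type: ["email"]
  email:
    to:
      - analyst@example.com
    subject: "Weekly export"            # optional, illustrative
    body: "export_report.md"            # template file stored in Templates/Email
    parameters:
      department: "SOC"                 # custom template parameter (illustrative)
```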
query¶
The query field must be a string.
Tip
In addition to these parameters, you can use keywords specific to the data source declaration in your export declaration. If there are any conflicts, the data source declaration will take precedence.
schema¶
You can add a partial schema that overrides the common configured schema.
This feature allows schema-based transformations on exported data. This comprises:
- conversion from timestamp to a human-readable date format, where the schema specifies the datetime type
- deidentification
/Exports/example_export.yaml

define:
  name: Export e-mail test
  datasource: elasticsearch
  output: csv
  header: ["@timestamp", "event.dataset", "http.response.status_code", "host.hostname", "http.request.method", "url.original"]
target:
  type: ["email"]
  email:
    to:
      - john.doe@teskalabs.com
query: >
  {
    "bool": {
      "filter": [{
        "prefix": {
          "http.version": {
            "value": "HTTP/1.1"
          }
        }
      }]
    }
  }
schema:
  fields:
    user.name:
      deidentification:
        method: hash
    source.address:
      deidentification:
        method: ip
Templates¶
The Templates
section of the Library is used when sending Exports by e-mail. The e-mail body must be based on a template. Place a custom template in the Templates/Email
directory. You can use Jinja templating in these files; see the Jinja documentation for more info. All keys from the export declaration can be used as Jinja variables.
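As a minimal sketch, an e-mail body template stored in Templates/Email could reference export keys as Jinja variables; the file layout and wording below are illustrative, only the variable names (name, datasource, output) come from the export's define section:

```jinja
Hello,

the export "{{ name }}" has finished.

Data source: {{ datasource }}
Output format: {{ output }}

Best regards,
BS-Query
```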