awk

Executes an AWK program on messages. The processor offers a range of custom functions for querying and mutating message contents and metadata.

awk:
  codec: text
  program: BEGIN { x = 0 } { print $0, x; x++ }

The processor feeds the message contents into the program as input, in a form determined by the codec, and replaces the contents with whatever the program prints. If the result is empty (nothing is printed by the program) then the original message contents remain unchanged.
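As a rough illustration of the replace-or-keep behaviour described above, the following sketch only prints for input that matches a pattern. It assumes single-line messages and the text codec; the pattern /error/ is made up for the example.

```yaml
awk:
  codec: text
  program: |
    # Matching lines are printed in upper case and replace the message.
    # For anything else nothing is printed, so the contents stay as they were.
    /error/ { print toupper($0) }
```

With this config a message such as "disk error" would be replaced by its upper-case form, while a message without "error" should pass through unchanged.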

The processor comes with a wide range of custom functions for accessing message metadata, reading and writing JSON fields, printing logs, etc. These functions can be overridden by functions defined within the program.

Fields

codec

Type: string. A codec defines how messages should be inserted into the AWK program as variables. The codec does not change which custom Benthos functions are available. The text codec is the closest to a typical AWK use case.

Options are: none, text, json.

program

Type: string. An AWK program to execute.

parts

Type: array. An optional array of indexes selecting which messages of a batch the processor should apply to. If left empty, all messages are processed. This field is only applicable when batching messages at the input level.

Indexes can be negative, in which case the part is selected by counting backwards from the end of the batch, starting from -1. For example, -1 selects the last message and -2 selects the one before it.
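A minimal sketch of the parts field, assuming a batch of at least two messages. The program itself is only a placeholder to show which messages would be touched.

```yaml
awk:
  codec: text
  # 0 selects the first message of the batch, -1 selects the last one.
  parts: [ 0, -1 ]
  program: |
    { print toupper($0) }
```

Messages at any other position in the batch would be left as they are.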

Codecs

A codec can be specified that determines how the contents of the message are fed into the program. This does not change the custom functions.

none

An empty string is fed into the program. Functions can still be used in order to extract and mutate metadata and message contents. This is useful for when your program only uses functions and doesn't need the full text of the message to be parsed by the program.
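The following sketch shows a function-only program of the kind the none codec is suited to. The JSON path user.id and the metadata key user_id are hypothetical. The work is placed in a BEGIN block on the assumption that BEGIN runs once regardless of what input is fed in.

```yaml
awk:
  codec: none
  program: |
    BEGIN {
      # Nothing is read from input here, everything goes through functions.
      id = json_get("user.id")
      metadata_set("user_id", id)
    }
```

Since nothing is printed, the message contents should remain as they were, with only the metadata changed.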

text

The full contents of the message are fed into the program as a string, allowing you to reference tokenised segments of the message with variables ($0, $1, etc). Custom functions can still be used with this codec.

This is the default codec, as it behaves most similarly to typical usage of the awk command line tool.
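As a small example of field variables under the text codec, this sketch swaps the first two fields of each line. It assumes the default AWK field splitting on whitespace.

```yaml
awk:
  codec: text
  program: |
    # $1 and $2 are the first two whitespace-separated fields of the line.
    { print $2, $1 }
```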

json

No contents are fed into the program. Instead, variables are extracted from the message by walking the flattened JSON structure. Each value is converted into a variable named after its full path, with the path segments joined by underscores. For example, the object:

{
  "foo": {
    "bar": {
      "value": 10
    },
    "created_at": "2018-12-18T11:57:32"
  }
}

Would result in the following variable declarations:

foo_bar_value = 10
foo_created_at = "2018-12-18T11:57:32"

Custom functions can also still be used with this codec.
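A sketch of how the variables declared for the example object might be used. The threshold of 5 and the metadata keys are made up for illustration, and the BEGIN block is used on the assumption that the variables are set before the program starts.

```yaml
awk:
  codec: json
  program: |
    BEGIN {
      # foo_bar_value and foo_created_at come from the example object above.
      # Adding 0 forces a numeric comparison.
      if (foo_bar_value + 0 > 5) {
        metadata_set("high_value", "true")
      }
      metadata_set("created_at", foo_created_at)
    }
```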

AWK Functions

json_get(path)

Attempts to find a JSON value in the input message payload by a dot-separated path and returns it as a string. This function is always available, even when the json codec is not used.
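For instance, a program along these lines would replace the message with a single extracted value. The path foo.bar.value is borrowed from the json codec example and is only illustrative.

```yaml
awk:
  codec: none
  program: |
    BEGIN {
      # Print the value found at foo.bar.value as the new message contents.
      print json_get("foo.bar.value")
    }
```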

json_set(path, value)

Attempts to set a JSON value in the input message payload identified by a dot-separated path. The value argument is interpreted as a string. This function is always available, even when the json codec is not used.

To set non-string values, use one of the following typed variants:

  • json_set_int(path, value)
  • json_set_float(path, value)
  • json_set_bool(path, value)
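The setters listed above could be combined as in the sketch below. All paths and values are hypothetical, and passing 1 to json_set_bool assumes that a non-zero number is treated as true, as is usual in AWK.

```yaml
awk:
  codec: none
  program: |
    BEGIN {
      json_set("user.name", "alice")      # intended as a JSON string
      json_set_int("user.age", 30)        # intended as a JSON integer
      json_set_float("user.score", 9.5)   # intended as a JSON float
      json_set_bool("user.active", 1)     # intended as a JSON boolean
    }
```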

json_delete(path)

Attempts to delete a JSON field from the input message payload identified by a dot-separated path. This function is always available, even when the json codec is not used.
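A minimal sketch of removing a field before a message is passed on. The path user.password is a hypothetical example.

```yaml
awk:
  codec: none
  program: |
    BEGIN {
      # Drop a sensitive field from the payload.
      json_delete("user.password")
    }
```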

create_json_object(key1, val1, key2, val2, ...)

Generates a valid JSON object from key/value pair arguments. The arguments are variadic, meaning any number of pairs can be listed. Each value will always resolve to a string regardless of its type. E.g. the following call:

create_json_object("a", "1", "b", 2, "c", "3")

Would result in this string:

{"a":"1","b":"2","c":"3"}
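Within a processor config the function might be used to turn plain text into JSON, as in this sketch. It assumes lines of the form "LEVEL message" split on whitespace; the key names are made up.

```yaml
awk:
  codec: text
  program: |
    {
      # Build an object from the first two fields of the line and print it
      # so that it replaces the message contents.
      print create_json_object("level", $1, "message", $2)
    }
```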

create_json_array(val1, val2, ...)

Generates a valid JSON array from its value arguments. The arguments are variadic, meaning any number of values can be listed. Each value will always resolve to a string regardless of its type. E.g. the following call:

create_json_array("1", 2, "3")

Would result in this string:

["1","2","3"]

metadata_set(key, value)

Set a metadata key for the message to a value. The value will always resolve to a string regardless of the value type.

metadata_get(key) string

Get the value of a metadata key from the message.
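The two metadata functions could be used together as sketched here. The key source is assumed to have been set earlier in the pipeline, and line_length is a made-up key.

```yaml
awk:
  codec: text
  program: |
    {
      # Record the length of the line as metadata (stored as a string).
      metadata_set("line_length", length($0))
      # Prefix the line with the value of an existing metadata key.
      print metadata_get("source") ": " $0
    }
```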

timestamp_unix() int

Returns the current unix timestamp (the number of seconds since 01-01-1970).

timestamp_unix(date) int

Attempts to parse a date string by detecting its format and returns the equivalent unix timestamp (the number of seconds since 01-01-1970).

timestamp_unix(date, format) int

Attempts to parse a date string according to a format and returns the equivalent unix timestamp (the number of seconds since 01-01-1970).

The format is defined by showing how the reference time, defined to be Mon Jan 2 15:04:05 -0700 MST 2006, would be displayed if it were the value.
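To make the format argument concrete, this sketch parses a date shaped like the created_at value from the json codec example. The layout string is the reference time written in that same shape; the paths are illustrative.

```yaml
awk:
  codec: none
  program: |
    BEGIN {
      # "2006-01-02T15:04:05" is the reference time laid out like the input,
      # e.g. 2018-12-18T11:57:32.
      ts = timestamp_unix(json_get("foo.created_at"), "2006-01-02T15:04:05")
      json_set_int("foo.created_at_unix", ts)
    }
```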

timestamp_unix_nano() int

Returns the current unix timestamp in nanoseconds (the number of nanoseconds since 01-01-1970).

timestamp_unix_nano(date) int

Attempts to parse a date string by detecting its format and returns the equivalent unix timestamp in nanoseconds (the number of nanoseconds since 01-01-1970).

timestamp_unix_nano(date, format) int

Attempts to parse a date string according to a format and returns the equivalent unix timestamp in nanoseconds (the number of nanoseconds since 01-01-1970).

The format is defined by showing how the reference time, defined to be Mon Jan 2 15:04:05 -0700 MST 2006, would be displayed if it were the value.

timestamp_format(unix, format) string

Formats a unix timestamp. The format is defined by showing how the reference time, defined to be Mon Jan 2 15:04:05 -0700 MST 2006, would be displayed if it were the value.

The format is optional, and if omitted RFC3339 (2006-01-02T15:04:05Z07:00) will be used.
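A short sketch of both call forms, with and without the optional format. The field names processed_at and processed_day are hypothetical.

```yaml
awk:
  codec: none
  program: |
    BEGIN {
      now = timestamp_unix()
      # No format given, so RFC3339 should be used.
      json_set("processed_at", timestamp_format(now))
      # "2006-01-02" is the reference date written as year-month-day.
      json_set("processed_day", timestamp_format(now, "2006-01-02"))
    }
```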

timestamp_format_nano(unixNano, format) string

Formats a unix timestamp in nanoseconds. The format is defined by showing how the reference time, defined to be Mon Jan 2 15:04:05 -0700 MST 2006, would be displayed if it were the value.

The format is optional, and if omitted RFC3339 (2006-01-02T15:04:05Z07:00) will be used.

print_log(message, level)

Prints a Benthos log message at a particular log level. The log level is optional, and if omitted the level INFO will be used.
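Logging from a program might look like the sketch below. The level name WARN is an assumption modelled on the INFO level named above, and the three-field threshold is arbitrary.

```yaml
awk:
  codec: text
  program: |
    {
      if (NF < 3) {
        # Explicit level as the second argument.
        print_log("short line: " $0, "WARN")
      }
      # Level omitted, so INFO should be used.
      print_log("processed one line")
    }
```

Because this program never calls print, the message contents should be left unchanged.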