parquet_encode

EXPERIMENTAL

This component is experimental and therefore subject to change or removal outside of major version releases.

Encodes Parquet files from a batch of structured messages.

Introduced in version 4.4.0.

# Config fields, showing default values
label: ""
parquet_encode:
  schema: []
  default_compression: uncompressed

This processor uses https://github.com/segmentio/parquet-go, which is itself experimental, and therefore changes could be made to how this processor functions outside of major version releases.

Examples

In this example we use the batching mechanism of an aws_s3 output to collect a batch of messages in memory, which is then encoded as a Parquet file and uploaded.

output:
  aws_s3:
    bucket: TODO
    path: 'stuff/${! timestamp_unix() }-${! uuid_v4() }.parquet'
    batching:
      count: 1000
      period: 10s
      processors:
        - parquet_encode:
            schema:
              - name: id
                type: INT64
              - name: weight
                type: DOUBLE
              - name: content
                type: BYTE_ARRAY
            default_compression: zstd
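
Each message in the batch is expected to be a structured document whose fields match the schema. As a rough sketch, a batch of messages shaped like the following (one JSON document per message) would each encode into one row of the resulting Parquet file:

{"id": 1, "weight": 2.5, "content": "first"}
{"id": 2, "weight": 6.1, "content": "second"}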

Fields

schema

The Parquet schema, expressed as a list of column definitions.

Type: array

schema[].name

The name of the column.

Type: string

schema[].type

The type of the column. This is only applicable to leaf columns with no child fields.

Type: string
Options: BOOLEAN, INT32, INT64, FLOAT, DOUBLE, BYTE_ARRAY.

schema[].repeated

Whether the field is repeated.

Type: bool
Default: false

schema[].optional

Whether the field is optional.

Type: bool
Default: false
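
As a minimal sketch (the column names here are illustrative, not part of the component's documentation), a repeated column receives an array value from each message, while an optional column tolerates missing or null values:

schema:
  - name: id
    type: INT64
  - name: tags
    type: BYTE_ARRAY
    repeated: true
  - name: note
    type: BYTE_ARRAY
    optional: true

Under these assumptions a message such as {"id": 1, "tags": ["a", "b"], "note": null} would be expected to encode cleanly.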

schema[].fields

A list of child fields.

Type: array

# Examples

fields:
  - name: foo
    type: INT64
  - name: bar
    type: BYTE_ARRAY
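
Child fields define a nested group, in which case the parent column takes no type of its own. A minimal sketch of a full schema entry with nested columns (names are illustrative):

schema:
  - name: nested_stuff
    fields:
      - name: foo
        type: INT64
      - name: bar
        type: BYTE_ARRAY

Messages would then be expected to carry a matching object, e.g. {"nested_stuff": {"foo": 1, "bar": "hello"}}.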

default_compression

The default compression type to use for fields.

Type: string
Default: "uncompressed"
Options: uncompressed, snappy, gzip, brotli, zstd, lz4raw.