s3

Downloads objects within an Amazon S3 bucket, optionally filtered by a prefix. If an SQS queue has been configured then only object keys read from the queue will be downloaded.

# Common config fields, showing default values
input:
  s3:
    bucket: ""
    prefix: ""
    sqs_url: ""
    sqs_body_path: Records.*.s3.object.key
    sqs_bucket_path: ""
    sqs_envelope_path: ""
    region: eu-west-1

If an SQS queue is not specified, the entire list of objects found when this input starts will be consumed. Note that the prefix field is only used when SQS is not configured.
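For illustration, here is a minimal sketch of a config that consumes every object under a prefix when the input starts, with no SQS queue involved. The bucket name my-bucket and the prefix logs/ are hypothetical placeholders.

```yaml
# Hypothetical bucket and prefix, for illustration only.
input:
  s3:
    bucket: my-bucket   # placeholder bucket name
    prefix: logs/       # only keys starting with "logs/" are consumed
    region: eu-west-1
```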

If your bucket is configured to send events directly to an SQS queue, set the sqs_body_path field to the dot path at which the object key is found in the payload. However, it is also common practice to send bucket events to an SNS topic, which then delivers enveloped events to SQS; in that case you must also set the sqs_envelope_path field to the path at which the payload can be found.
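A sketch of the SNS-to-SQS case follows. The queue URL and bucket name are hypothetical placeholders, and the envelope path Message is an assumption based on the standard SNS notification format, in which the original S3 event is carried as a JSON string in the Message field (this does not apply if raw message delivery is enabled on the SNS subscription). If your bucket sends events straight to SQS, omit sqs_envelope_path.

```yaml
# Hypothetical queue URL and bucket; assumes standard (non-raw) SNS delivery.
input:
  s3:
    bucket: my-bucket
    sqs_url: https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue
    sqs_envelope_path: Message              # assumed: SNS field holding the S3 event
    sqs_body_path: Records.*.s3.object.key  # object keys within the S3 event
    region: eu-west-1
```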

When using SQS events it's also possible to extract target bucket names from the events by specifying a path in the sqs_bucket_path field. For each SQS event, if that path exists and contains a string, it will be used as the bucket of the download instead of the bucket field.
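S3 event notifications record the bucket name of each record under s3.bucket.name, so a config that downloads from whichever bucket each event names might look like the sketch below. The queue URL and bucket name are hypothetical; bucket is still set because it acts as the fallback.

```yaml
# Hypothetical queue URL and fallback bucket, for illustration only.
input:
  s3:
    bucket: fallback-bucket                    # used when the path below is absent
    sqs_url: https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue
    sqs_body_path: Records.*.s3.object.key
    sqs_bucket_path: Records.*.s3.bucket.name  # bucket name from each S3 event record
    region: eu-west-1
```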

Here is a guide for setting up an SQS queue that receives events for new S3 bucket objects:

https://docs.aws.amazon.com/AmazonS3/latest/dev/ways-to-add-notification-config-to-bucket.html

WARNING: When using SQS, make sure you have sensible values for sqs_max_messages and for the visibility timeout of the queue itself.

When Benthos consumes an S3 object as a result of receiving an SQS message, the message is not deleted until the S3 object has been sent onwards. This ensures at-least-once crash resiliency, but it also means that if an object takes longer to process than the visibility timeout of your queue, the same objects might be processed multiple times.
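As a hedged sketch, one way to reduce the risk of duplicate processing is to lower sqs_max_messages so that fewer object keys are held in flight from a single receive request, assuming objects from one request are processed one after another. The visibility timeout itself is configured on the queue in AWS, not in this input. The queue URL and bucket name are hypothetical placeholders.

```yaml
# Hypothetical queue URL and bucket; the queue's visibility timeout is set in AWS.
input:
  s3:
    bucket: my-bucket
    sqs_url: https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue
    sqs_max_messages: 1   # fetch one SQS message per request instead of the default 10
    region: eu-west-1
```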

Credentials

By default Benthos will use a shared credentials file when connecting to AWS services. It's also possible to set them explicitly at the component level, allowing you to transfer data across accounts. You can find out more in this document.
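A sketch of cross-account access by assuming a role. The bucket name and role ARN are hypothetical placeholders; the remaining credentials fields are described under Fields.

```yaml
# Hypothetical bucket and role ARN, for illustration only.
input:
  s3:
    bucket: other-account-bucket
    region: eu-west-1
    credentials:
      role: arn:aws:iam::123456789012:role/example-reader  # role to assume
      role_external_id: ""                                 # set if the role requires one
```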

Metadata

This input adds the following metadata fields to each message:

- s3_key
- s3_bucket
- s3_last_modified_unix*
- s3_last_modified (RFC3339)*
- s3_content_type*
- s3_content_encoding*
- All user defined metadata*
* Only added when the download manager is disabled (download_manager.enabled set to false)
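To receive the fields marked with an asterisk, the download manager must be disabled, as in this sketch (the bucket name is a hypothetical placeholder). The trade-off is losing whatever download speed-up the download manager can provide.

```yaml
# Hypothetical bucket; disabling the download manager keeps object metadata.
input:
  s3:
    bucket: my-bucket
    download_manager:
      enabled: false
```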

You can access these metadata fields using function interpolation.

Fields

bucket

The bucket to consume from. If sqs_bucket_path is set this field is still required as a fallback.

Type: string
Default: ""

prefix

An optional path prefix; if set, only objects with the prefix are consumed. This field is ignored when SQS is used.

Type: string
Default: ""

sqs_url

An optional SQS URL to connect to. When specified this queue will control which objects are downloaded from the target bucket.

Type: string
Default: ""

sqs_body_path

A dot path at which object keys are found in SQS messages. This field is only required when sqs_url is specified.

Type: string
Default: "Records.*.s3.object.key"

sqs_bucket_path

An optional dot path at which the bucket of an object can be found in consumed SQS messages.

Type: string
Default: ""

sqs_envelope_path

An optional dot path of enveloped payloads to extract from SQS messages. This is required when pushing events from S3 to SNS to SQS.

Type: string
Default: ""

sqs_max_messages

The maximum number of SQS messages to consume from each request.

Type: number
Default: 10

sqs_endpoint

A custom endpoint to use when connecting to SQS.

Type: string
Default: ""

region

The AWS region to target.

Type: string
Default: "eu-west-1"

endpoint

Allows you to specify a custom endpoint for the AWS API.

Type: string
Default: ""

credentials

Optional manual configuration of AWS credentials to use. More information can be found in this document.

Type: object
Default: {"id":"","profile":"","role":"","role_external_id":"","secret":"","token":""}

credentials.profile

A profile from ~/.aws/credentials to use.

Type: string
Default: ""

credentials.id

The ID of credentials to use.

Type: string
Default: ""

credentials.secret

The secret for the credentials being used.

Type: string
Default: ""

credentials.token

The token for the credentials being used, required when using short-term credentials.

Type: string
Default: ""

credentials.role

A role ARN to assume.

Type: string
Default: ""

credentials.role_external_id

An external ID to provide when assuming a role.

Type: string
Default: ""

retries

The maximum number of times to attempt an object download.

Type: number
Default: 3

force_path_style_urls

Forces the client API to use path style URLs, which helps when connecting to custom endpoints.

Type: bool
Default: false
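As a sketch, pointing the input at an S3-compatible service on a custom endpoint typically pairs endpoint with force_path_style_urls. The address http://localhost:9000 and the bucket name are hypothetical placeholders for such a service.

```yaml
# Hypothetical endpoint and bucket for an S3-compatible service.
input:
  s3:
    bucket: my-bucket
    endpoint: http://localhost:9000
    force_path_style_urls: true   # path-style URLs for custom endpoints
    region: eu-west-1
```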

delete_objects

Whether to delete downloaded objects from the bucket.

Type: bool
Default: false

download_manager

Controls if and how to use the download manager API. This can help speed up file downloads, but results in file metadata not being copied.

Type: object
Default: {"enabled":true}

download_manager.enabled

Whether to use the download manager API.

Type: bool
Default: true

timeout

The period of time to wait before abandoning a request and trying again.

Type: string
Default: "5s"
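A sketch combining the download behaviour fields: up to 5 download attempts per object, a 10 second request timeout, and deletion of downloaded objects from the bucket. The values and bucket name are illustrative only; note that delete_objects removes data from the bucket.

```yaml
# Illustrative values; delete_objects: true removes downloaded objects.
input:
  s3:
    bucket: my-bucket
    retries: 5            # maximum download attempts per object
    timeout: "10s"        # wait before abandoning a request and trying again
    delete_objects: true  # delete objects from the bucket once downloaded
```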