airflow.providers.amazon.aws.operators.bedrock¶

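Classes¶

`BedrockInvokeModelOperator`	Invoke the specified Bedrock model to run inference using the input provided.
`BedrockCustomizeModelOperator`	Create a fine-tuning job to customize a base model.
`BedrockCreateProvisionedModelThroughputOperator`	Create provisioned throughput for an existing Amazon Bedrock model.
`BedrockCreateKnowledgeBaseOperator`	Create a knowledge base that contains data sources used by Amazon Bedrock LLMs and Agents.
`BedrockCreateDataSourceOperator`	Set up an Amazon Bedrock data source to be added to an Amazon Bedrock knowledge base.
`BedrockIngestDataOperator`	Begin an ingestion job, in which an Amazon Bedrock data source is added to an Amazon Bedrock knowledge base.
`BedrockRaGOperator`	Query a knowledge base and generate responses based on the retrieved results, with source citations.
`BedrockRetrieveOperator`	Query a knowledge base and retrieve results, with source citations.
`BedrockBatchInferenceOperator`	Create a batch inference job to invoke a model on multiple prompts.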

Module Contents¶

class airflow.providers.amazon.aws.operators.bedrock.BedrockInvokeModelOperator(model_id, input_data, content_type=None, accept_type=None, **kwargs)[source]¶

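Bases: airflow.providers.amazon.aws.operators.base_aws.AwsBaseOperator[airflow.providers.amazon.aws.hooks.bedrock.BedrockRuntimeHook]

Invoke the specified Bedrock model to run inference using the input provided.

Use InvokeModel to run inference for text models, image models, and embedding models. For the format and content of the input_data field for different models, see the Inference parameters documentation.

See also

For more information on how to use this operator, take a look at the guide: Invoke an existing Amazon Bedrock Model

Parameters:

model_id (str) – The ID of the Bedrock model. (templated)
input_data (dict[str, Any]) – Input data in the format specified by the content-type request header. (templated)
content_type (str | None) – The MIME type of the input data in the request. (templated) Default: application/json
accept_type – The desired MIME type of the inference body in the response. (templated) Default: application/json
aws_conn_id – The Airflow connection used for AWS credentials. If this is None or empty, the default boto3 behaviour is used. If running Airflow in a distributed manner and aws_conn_id is None or empty, the default boto3 configuration is used (and must be maintained on each worker node).
region_name – AWS region_name. If not specified, the default boto3 behaviour is used.
verify – Whether or not to verify SSL certificates. See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html
botocore_config – Configuration dictionary (key-values) for the botocore client. See: https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html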
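
As a minimal sketch of how these parameters fit together, the snippet below collects them into keyword arguments and builds the operator lazily. The model ID, the `inputText` payload shape (an Amazon Titan Text style request), and the task name are assumptions for illustration; the right `input_data` layout depends on the model you call.

```python
from typing import Any

# Assumed model ID; substitute one that is enabled in your account.
TITAN_MODEL_ID = "amazon.titan-text-express-v1"


def build_invoke_kwargs(prompt: str) -> dict[str, Any]:
    """Collect the documented BedrockInvokeModelOperator arguments in one dict."""
    return {
        "task_id": "invoke_model",  # hypothetical task name
        "model_id": TITAN_MODEL_ID,
        # Assumed payload shape; other model families expect different fields.
        "input_data": {"inputText": prompt},
        "content_type": "application/json",
        "accept_type": "application/json",
    }


def make_invoke_task(prompt: str):
    """Build the operator; call this inside a DAG definition."""
    # Imported lazily so the snippet still loads where the Amazon provider is not installed.
    from airflow.providers.amazon.aws.operators.bedrock import BedrockInvokeModelOperator

    return BedrockInvokeModelOperator(**build_invoke_kwargs(prompt))
```

Keeping the arguments in a plain dict makes them easy to inspect or unit test before any AWS call is made.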

aws_hook_class[source]¶

template_fields: collections.abc.Sequence[str][source]¶

model_id[source]¶

input_data[source]¶

content_type = None[source]¶

accept_type = None[source]¶

execute(context)[source]¶

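The main method to derive when creating an operator.

Context is the same dictionary that is used when rendering Jinja templates.

Refer to get_template_context for more context.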

class airflow.providers.amazon.aws.operators.bedrock.BedrockCustomizeModelOperator(job_name, custom_model_name, role_arn, base_model_id, training_data_uri, output_data_uri, hyperparameters, ensure_unique_job_name=True, customization_job_kwargs=None, wait_for_completion=True, waiter_delay=120, waiter_max_attempts=75, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)[source]¶

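Bases: airflow.providers.amazon.aws.operators.base_aws.AwsBaseOperator[airflow.providers.amazon.aws.hooks.bedrock.BedrockHook]

Create a fine-tuning job to customize a base model.

See also

For more information on how to use this operator, take a look at the guide: Customize an existing Amazon Bedrock Model

Parameters:

job_name (str) – A unique name for the fine-tuning job.
custom_model_name (str) – A name for the custom model being created.
role_arn (str) – The Amazon Resource Name (ARN) of an IAM role that Amazon Bedrock can assume to perform tasks on your behalf.
base_model_id (str) – Name of the base model.
training_data_uri (str) – The S3 URI where the training data is stored.
output_data_uri (str) – The S3 URI where the output data is stored.
hyperparameters (dict[str, str]) – Parameters related to tuning the model.
ensure_unique_job_name (bool) – If set to True, the operator checks whether a model customization job already exists for the name in the config and appends the current timestamp if there is a name conflict. (default: True)
customization_job_kwargs (dict[str, Any] | None) – Any optional parameters to pass to the API.
wait_for_completion (bool) – Whether to wait for the job to complete. (default: True)
waiter_delay (int) – Time in seconds to wait between status checks. (default: 120)
waiter_max_attempts (int) – Maximum number of attempts to check for job completion. (default: 75)
deferrable (bool) – If True, the operator waits asynchronously for the job to complete. This implies waiting for completion. This mode requires the aiobotocore module to be installed. (default: False)
aws_conn_id – The Airflow connection used for AWS credentials. If this is None or empty, the default boto3 behaviour is used. If running Airflow in a distributed manner and aws_conn_id is None or empty, the default boto3 configuration is used (and must be maintained on each worker node).
region_name – AWS region_name. If not specified, the default boto3 behaviour is used.
verify – Whether or not to verify SSL certificates. See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html
botocore_config – Configuration dictionary (key-values) for the botocore client. See: https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html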
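
The ensure_unique_job_name behaviour described above can be pictured with the sketch below. It is not the operator's own code: the `unique_job_name` helper, the set of existing names, and the `-<unix timestamp>` suffix format are assumptions chosen to illustrate "append the current timestamp on conflict".

```python
from datetime import datetime, timezone
from typing import Optional


def unique_job_name(job_name: str, existing_names: set, now: Optional[datetime] = None) -> str:
    """Return job_name unchanged unless it clashes, then append a Unix timestamp."""
    if job_name not in existing_names:
        return job_name
    # Assumed suffix format: whole seconds since the epoch, in UTC.
    now = now or datetime.now(timezone.utc)
    return f"{job_name}-{int(now.timestamp())}"
```

For example, `unique_job_name("tune-titan", {"tune-titan"})` would return the name with a timestamp suffix, while a name that is not yet taken passes through untouched.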

aws_hook_class[source]¶

template_fields: collections.abc.Sequence[str][source]¶

wait_for_completion = True[source]¶

waiter_delay = 120[source]¶

waiter_max_attempts = 75[source]¶

deferrable = True[source]¶

job_name[source]¶

custom_model_name[source]¶

role_arn[source]¶

base_model_id[source]¶

training_data_config[source]¶

output_data_config[source]¶

hyperparameters[source]¶

ensure_unique_job_name = True[source]¶

customization_job_kwargs[source]¶

valid_action_if_job_exists: set[str][source]¶

execute_complete(context, event=None)[source]¶

execute(context)[source]¶

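The main method to derive when creating an operator.

Context is the same dictionary that is used when rendering Jinja templates.

Refer to get_template_context for more context.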

class airflow.providers.amazon.aws.operators.bedrock.BedrockCreateProvisionedModelThroughputOperator(model_units, provisioned_model_name, model_id, create_throughput_kwargs=None, wait_for_completion=True, waiter_delay=60, waiter_max_attempts=20, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)[source]¶

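Bases: airflow.providers.amazon.aws.operators.base_aws.AwsBaseOperator[airflow.providers.amazon.aws.hooks.bedrock.BedrockHook]

Create provisioned throughput for an existing Amazon Bedrock model.

See also

For more information on how to use this operator, take a look at the guide: Provision Throughput for an existing Amazon Bedrock Model

Parameters:

model_units (int) – Number of model units to allocate. (templated)
provisioned_model_name (str) – Unique name for this provisioned throughput. (templated)
model_id (str) – Name or ARN of the model to associate with this provisioned throughput. (templated)
create_throughput_kwargs (dict[str, Any] | None) – Any optional parameters to pass to the API.
wait_for_completion (bool) – Whether to wait for provisioning to complete. (default: True)
waiter_delay (int) – Time in seconds to wait between status checks. (default: 60)
waiter_max_attempts (int) – Maximum number of attempts to check for job completion. (default: 20)
deferrable (bool) – If True, the operator waits asynchronously for provisioning to complete. This implies waiting for completion. This mode requires the aiobotocore module to be installed. (default: False)
aws_conn_id – The Airflow connection used for AWS credentials. If this is None or empty, the default boto3 behaviour is used. If running Airflow in a distributed manner and aws_conn_id is None or empty, the default boto3 configuration is used (and must be maintained on each worker node).
region_name – AWS region_name. If not specified, the default boto3 behaviour is used.
verify – Whether or not to verify SSL certificates. See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html
botocore_config – Configuration dictionary (key-values) for the botocore client. See: https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html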
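
A rough sketch of the arguments above, together with the longest time the operator could spend polling under the default waiter settings (delay multiplied by attempts). The task name and the helper names are hypothetical; the waiter values are the documented defaults.

```python
from datetime import timedelta
from typing import Any


def provisioned_throughput_kwargs(model_id: str, name: str, model_units: int = 1) -> dict[str, Any]:
    """Collect the documented arguments for BedrockCreateProvisionedModelThroughputOperator."""
    return {
        "task_id": "provision_throughput",  # hypothetical task name
        "model_units": model_units,
        "provisioned_model_name": name,
        "model_id": model_id,
        "wait_for_completion": True,
        "waiter_delay": 60,  # documented default, in seconds
        "waiter_max_attempts": 20,  # documented default
    }


def polling_budget(kwargs: dict[str, Any]) -> timedelta:
    """Upper bound on the time spent between status checks."""
    return timedelta(seconds=kwargs["waiter_delay"] * kwargs["waiter_max_attempts"])


def make_provision_task(model_id: str, name: str):
    """Build the operator; call this inside a DAG definition."""
    # Imported lazily so the snippet still loads where the Amazon provider is not installed.
    from airflow.providers.amazon.aws.operators.bedrock import (
        BedrockCreateProvisionedModelThroughputOperator,
    )

    return BedrockCreateProvisionedModelThroughputOperator(
        **provisioned_throughput_kwargs(model_id, name)
    )
```

With the defaults, the budget works out to 60 s x 20 attempts, which is 20 minutes; raise either value if provisioning is expected to take longer.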

aws_hook_class[source]¶

template_fields: collections.abc.Sequence[str][source]¶

model_units[source]¶

provisioned_model_name[source]¶

model_id[source]¶

create_throughput_kwargs[source]¶

wait_for_completion = True[source]¶

waiter_delay = 60[source]¶

waiter_max_attempts = 20[source]¶

deferrable = True[source]¶

execute(context)[source]¶

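The main method to derive when creating an operator.

Context is the same dictionary that is used when rendering Jinja templates.

Refer to get_template_context for more context.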

execute_complete(context, event=None)[source]¶

class airflow.providers.amazon.aws.operators.bedrock.BedrockCreateKnowledgeBaseOperator(name, embedding_model_arn, role_arn, storage_config, create_knowledge_base_kwargs=None, wait_for_indexing=True, indexing_error_retry_delay=5, indexing_error_max_attempts=20, wait_for_completion=True, waiter_delay=60, waiter_max_attempts=20, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)[source]¶

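Bases: airflow.providers.amazon.aws.operators.base_aws.AwsBaseOperator[airflow.providers.amazon.aws.hooks.bedrock.BedrockAgentHook]

Create a knowledge base that contains data sources used by Amazon Bedrock LLMs and Agents.

To create a knowledge base, you must first set up your data sources and configure a supported vector store.

See also

For more information on how to use this operator, take a look at the guide: Create an Amazon Bedrock Knowledge Base

Parameters:

name (str) – The name of the knowledge base. (templated)
embedding_model_arn (str) – ARN of the model used to create vector embeddings for the knowledge base. (templated)
role_arn (str) – The ARN of the IAM role with permissions to create the knowledge base. (templated)
storage_config (dict[str, Any]) – Configuration details of the vector database used for the knowledge base. (templated)
wait_for_indexing (bool) – Vector indexing can take some time, and there is no explicit way to check its status before attempting to create the knowledge base. If this is True and creation fails because the index is not available, the operator waits and retries. (default: True) (templated)
indexing_error_retry_delay (int) – Seconds between retries if an index error is encountered. (default: 5) (templated)
indexing_error_max_attempts (int) – Maximum number of times to retry when encountering an index error. (default: 20) (templated)
create_knowledge_base_kwargs (dict[str, Any] | None) – Any additional optional parameters to pass to the API call. (templated)
wait_for_completion (bool) – Whether to wait for the knowledge base to be created. (default: True)
waiter_delay (int) – Time in seconds to wait between status checks. (default: 60)
waiter_max_attempts (int) – Maximum number of attempts to check for job completion. (default: 20)
deferrable (bool) – If True, the operator waits asynchronously for the knowledge base to be created. This implies waiting for completion. This mode requires the aiobotocore module to be installed. (default: False)
aws_conn_id – The Airflow connection used for AWS credentials. If this is None or empty, the default boto3 behaviour is used. If running Airflow in a distributed manner and aws_conn_id is None or empty, the default boto3 configuration is used (and must be maintained on each worker node).
region_name – AWS region_name. If not specified, the default boto3 behaviour is used.
verify – Whether or not to verify SSL certificates. See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html
botocore_config – Configuration dictionary (key-values) for the botocore client. See: https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html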
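
The wait_for_indexing, indexing_error_retry_delay, and indexing_error_max_attempts parameters describe a retry loop. The sketch below illustrates that idea in plain Python; it is not the operator's implementation, and the `create`, `is_index_error`, and `sleep` callables are hypothetical stand-ins for the API call, the error check, and the delay.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def create_with_retry(
    create: Callable[[], T],
    is_index_error: Callable[[Exception], bool],
    retry_delay: int = 5,
    max_retries: int = 20,
    sleep: Callable[[float], object] = time.sleep,
) -> T:
    """Call create(), retrying only while the failure looks like an index error."""
    retries_left = max_retries
    while True:
        try:
            return create()
        except Exception as error:
            # Give up on unrelated errors or once the retry budget is spent.
            if not is_index_error(error) or retries_left <= 0:
                raise
            retries_left -= 1
            sleep(retry_delay)
```

Passing `sleep` as a parameter keeps the loop testable without real delays; any error that is not an index error propagates immediately.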

aws_hook_class[source]¶

template_fields: collections.abc.Sequence[str][source]¶

name[source]¶

role_arn[source]¶

storage_config[source]¶

create_knowledge_base_kwargs[source]¶

embedding_model_arn[source]¶

knowledge_base_config[source]¶

wait_for_indexing = True[source]¶

indexing_error_retry_delay = 5[source]¶

indexing_error_max_attempts = 20[source]¶

wait_for_completion = True[source]¶

waiter_delay = 60[source]¶

waiter_max_attempts = 20[source]¶

deferrable = True[source]¶

execute_complete(context, event=None)[source]¶

execute(context)[source]¶

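The main method to derive when creating an operator.

Context is the same dictionary that is used when rendering Jinja templates.

Refer to get_template_context for more context.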

class airflow.providers.amazon.aws.operators.bedrock.BedrockCreateDataSourceOperator(name, knowledge_base_id, bucket_name=None, create_data_source_kwargs=None, **kwargs)[source]¶

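Bases: airflow.providers.amazon.aws.operators.base_aws.AwsBaseOperator[airflow.providers.amazon.aws.hooks.bedrock.BedrockAgentHook]

Set up an Amazon Bedrock data source to be added to an Amazon Bedrock knowledge base.

See also

For more information on how to use this operator, take a look at the guide: Create an Amazon Bedrock Data Source

Parameters:

name (str) – Name for the Amazon Bedrock data source being created. (templated)
bucket_name (str | None) – The name of the Amazon S3 bucket to use for data source storage. (templated)
knowledge_base_id (str) – The unique identifier of the knowledge base to which the data source is added. (templated)
create_data_source_kwargs (dict[str, Any] | None) – Any additional optional parameters to pass to the API call. (templated)
aws_conn_id – The Airflow connection used for AWS credentials. If this is None or empty, the default boto3 behaviour is used. If running Airflow in a distributed manner and aws_conn_id is None or empty, the default boto3 configuration is used (and must be maintained on each worker node).
region_name – AWS region_name. If not specified, the default boto3 behaviour is used.
verify – Whether or not to verify SSL certificates. See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html
botocore_config – Configuration dictionary (key-values) for the botocore client. See: https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html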
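
Because knowledge_base_id is a templated field, it can be filled in at run time from an upstream task. The sketch below assumes a hypothetical upstream task called `create_knowledge_base` whose XCom return value is the knowledge base ID; the task names and the bucket are illustrative only.

```python
from typing import Any


def data_source_kwargs(
    name: str, bucket_name: str, kb_task_id: str = "create_knowledge_base"
) -> dict[str, Any]:
    """Collect the documented BedrockCreateDataSourceOperator arguments."""
    return {
        "task_id": "create_data_source",  # hypothetical task name
        "name": name,
        "bucket_name": bucket_name,
        # Jinja template rendered by Airflow at run time; assumes the upstream
        # task pushed the knowledge base ID as its XCom return value.
        "knowledge_base_id": "{{ ti.xcom_pull(task_ids='" + kb_task_id + "') }}",
    }


def make_data_source_task(name: str, bucket_name: str):
    """Build the operator; call this inside a DAG definition."""
    # Imported lazily so the snippet still loads where the Amazon provider is not installed.
    from airflow.providers.amazon.aws.operators.bedrock import BedrockCreateDataSourceOperator

    return BedrockCreateDataSourceOperator(**data_source_kwargs(name, bucket_name))
```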

aws_hook_class[source]¶

template_fields: collections.abc.Sequence[str][source]¶

name[source]¶

knowledge_base_id[source]¶

bucket_name = None[source]¶

create_data_source_kwargs[source]¶

execute(context)[source]¶

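The main method to derive when creating an operator.

Context is the same dictionary that is used when rendering Jinja templates.

Refer to get_template_context for more context.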

class airflow.providers.amazon.aws.operators.bedrock.BedrockIngestDataOperator(knowledge_base_id, data_source_id, ingest_data_kwargs=None, wait_for_completion=True, waiter_delay=60, waiter_max_attempts=10, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)[source]¶

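Bases: airflow.providers.amazon.aws.operators.base_aws.AwsBaseOperator[airflow.providers.amazon.aws.hooks.bedrock.BedrockAgentHook]

Begin an ingestion job, in which an Amazon Bedrock data source is added to an Amazon Bedrock knowledge base.

See also

For more information on how to use this operator, take a look at the guide: Ingest data into an Amazon Bedrock Data Source

Parameters:

knowledge_base_id (str) – The unique identifier of the knowledge base to which the data source is added. (templated)
data_source_id (str) – The unique identifier of the data source to ingest. (templated)
ingest_data_kwargs (dict[str, Any] | None) – Any additional optional parameters to pass to the API call. (templated)
wait_for_completion (bool) – Whether to wait for the ingestion job to complete. (default: True)
waiter_delay (int) – Time in seconds to wait between status checks. (default: 60)
waiter_max_attempts (int) – Maximum number of attempts to check for job completion. (default: 10)
deferrable (bool) – If True, the operator waits asynchronously for the ingestion job to complete. This implies waiting for completion. This mode requires the aiobotocore module to be installed. (default: False)
aws_conn_id – The Airflow connection used for AWS credentials. If this is None or empty, the default boto3 behaviour is used. If running Airflow in a distributed manner and aws_conn_id is None or empty, the default boto3 configuration is used (and must be maintained on each worker node).
region_name – AWS region_name. If not specified, the default boto3 behaviour is used.
verify – Whether or not to verify SSL certificates. See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html
botocore_config – Configuration dictionary (key-values) for the botocore client. See: https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html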
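
To make the meaning of wait_for_completion, waiter_delay, and waiter_max_attempts concrete, here is a plain-Python polling loop. It only illustrates the semantics and is not how the operator waits internally; the `get_status` callable is hypothetical, and the status strings `COMPLETE` and `FAILED` are assumed ingestion job states.

```python
import time
from typing import Callable


def wait_until_complete(
    get_status: Callable[[], str],
    waiter_delay: int = 60,
    waiter_max_attempts: int = 10,
    sleep: Callable[[float], object] = time.sleep,
) -> int:
    """Poll get_status() and return the attempt number on which the job completed."""
    for attempt in range(1, waiter_max_attempts + 1):
        status = get_status()
        if status == "COMPLETE":
            return attempt
        if status == "FAILED":
            raise RuntimeError("ingestion job failed")
        # Still running: wait before the next status check.
        sleep(waiter_delay)
    raise TimeoutError(f"job not complete after {waiter_max_attempts} status checks")
```

With the documented defaults this loop gives up after roughly 60 s x 10 attempts, about ten minutes, which is why long ingestion jobs may need larger values or deferrable mode.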

aws_hook_class[source]¶

template_fields: collections.abc.Sequence[str][source]¶

knowledge_base_id[source]¶

data_source_id[source]¶

ingest_data_kwargs[source]¶

wait_for_completion = True[source]¶

waiter_delay = 60[source]¶

waiter_max_attempts = 10[source]¶

deferrable = True[source]¶

execute_complete(context, event=None)[source]¶

execute(context)[source]¶

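The main method to derive when creating an operator.

Context is the same dictionary that is used when rendering Jinja templates.

Refer to get_template_context for more context.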

class airflow.providers.amazon.aws.operators.bedrock.BedrockRaGOperator(input, source_type, model_arn, prompt_template=None, knowledge_base_id=None, vector_search_config=None, sources=None, rag_kwargs=None, **kwargs)[source]¶

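Bases: airflow.providers.amazon.aws.operators.base_aws.AwsBaseOperator[airflow.providers.amazon.aws.hooks.bedrock.BedrockAgentRuntimeHook]

Query a knowledge base and generate responses based on the retrieved results, with source citations.

NOTE: Support for EXTERNAL_SOURCES was added in botocore 1.34.90.

See also

For more information on how to use this operator, take a look at the guide: Amazon Bedrock Retrieve and Generate (RaG)

Parameters:

input (str) – The query to be made to the knowledge base. (templated)
source_type (str) – The type of resource that is queried by the request. (templated) Must be one of "KNOWLEDGE_BASE" or "EXTERNAL_SOURCES", and the appropriate config values must also be provided. If set to "KNOWLEDGE_BASE", then knowledge_base_id must be provided and vector_search_config may be. If set to "EXTERNAL_SOURCES", then sources must also be provided. NOTE: Support for EXTERNAL_SOURCES was added in botocore 1.34.90.
model_arn (str) – The ARN of the foundation model used to generate a response. (templated)
prompt_template (str | None) – The template for the prompt that is sent to the model for response generation. You can include prompt placeholders, which are replaced before the prompt is sent to the model, to provide instructions and context to the model. In addition, you can include XML tags to delineate meaningful sections of the prompt template. (templated)
knowledge_base_id (str | None) – The unique identifier of the knowledge base that is queried. (templated) Can only be specified if source_type='KNOWLEDGE_BASE'.
vector_search_config (dict[str, Any] | None) – How the results from the vector search should be returned. (templated) Can only be specified if source_type='KNOWLEDGE_BASE'. For more information, see https://docs.aws.amazon.com/bedrock/latest/userguide/kb-test-config.html.
sources (list[dict[str, Any]] | None) – The documents used as reference for the response. (templated) Can only be specified if source_type='EXTERNAL_SOURCES'. NOTE: Support for EXTERNAL_SOURCES was added in botocore 1.34.90.
rag_kwargs (dict[str, Any] | None) – Additional keyword arguments to pass to the API call. (templated)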
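
The source_type rules above amount to a small decision table. The function below is a sketch of those documented rules only; it is a hypothetical helper, not the operator's own validate_inputs() method, and the error messages are invented for illustration.

```python
from typing import Any, Optional


def validate_rag_inputs(
    source_type: str,
    knowledge_base_id: Optional[str] = None,
    vector_search_config: Optional[dict[str, Any]] = None,
    sources: Optional[list[dict[str, Any]]] = None,
) -> None:
    """Raise ValueError when the arguments break the documented source_type rules."""
    if source_type == "KNOWLEDGE_BASE":
        if knowledge_base_id is None:
            raise ValueError("knowledge_base_id is required for KNOWLEDGE_BASE")
        if sources is not None:
            raise ValueError("sources is only valid for EXTERNAL_SOURCES")
    elif source_type == "EXTERNAL_SOURCES":
        if sources is None:
            raise ValueError("sources is required for EXTERNAL_SOURCES")
        if knowledge_base_id is not None or vector_search_config is not None:
            raise ValueError(
                "knowledge_base_id and vector_search_config are only valid for KNOWLEDGE_BASE"
            )
    else:
        raise ValueError(f"unsupported source_type: {source_type!r}")
```

Running a check like this before building the task surfaces a mismatched combination at DAG authoring time instead of at execution time.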

aws_hook_class[source]¶

template_fields: collections.abc.Sequence[str][source]¶

input[source]¶

prompt_template = None[source]¶

source_type[source]¶

knowledge_base_id = None[source]¶

model_arn[source]¶

vector_search_config = None[source]¶

sources = None[source]¶

rag_kwargs[source]¶

validate_inputs()[source]¶

build_rag_config()[source]¶

execute(context)[source]¶

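The main method to derive when creating an operator.

Context is the same dictionary that is used when rendering Jinja templates.

Refer to get_template_context for more context.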

class airflow.providers.amazon.aws.operators.bedrock.BedrockRetrieveOperator(retrieval_query, knowledge_base_id, vector_search_config=None, retrieve_kwargs=None, **kwargs)[source]¶

Bases: airflow.providers.amazon.aws.operators.base_aws.AwsBaseOperator[airflow.providers.amazon.aws.hooks.bedrock.BedrockAgentRuntimeHook]

Query a knowledge base and retrieve results, with source citations.

See also

For more information on how to use this operator, take a look at the guide: Amazon Bedrock Retrieve

Parameters:

retrieval_query (str) – The query to be made to the knowledge base. (templated)
knowledge_base_id (str) – The unique identifier of the knowledge base that is queried. (templated)
vector_search_config (dict[str, Any] | None) – How the results from the vector search should be returned. (templated) For more information, see https://docs.aws.amazon.com/bedrock/latest/userguide/kb-test-config.html.
retrieve_kwargs (dict[str, Any] | None) – Additional keyword arguments to pass to the API call. (templated)
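
A small sketch of assembling these arguments, adding vector_search_config only when a result limit is requested. The `numberOfResults` key is an assumption about the vector search configuration accepted by the Bedrock Retrieve API, and the task name is hypothetical; consult the linked AWS page for the supported keys.

```python
from typing import Any, Optional


def retrieve_kwargs_for(
    query: str, knowledge_base_id: str, max_results: Optional[int] = None
) -> dict[str, Any]:
    """Collect the documented BedrockRetrieveOperator arguments."""
    kwargs: dict[str, Any] = {
        "task_id": "retrieve",  # hypothetical task name
        "retrieval_query": query,
        "knowledge_base_id": knowledge_base_id,
    }
    if max_results is not None:
        # Assumed key name; leave vector_search_config unset to use the service defaults.
        kwargs["vector_search_config"] = {"numberOfResults": max_results}
    return kwargs


def make_retrieve_task(query: str, knowledge_base_id: str, max_results: Optional[int] = None):
    """Build the operator; call this inside a DAG definition."""
    # Imported lazily so the snippet still loads where the Amazon provider is not installed.
    from airflow.providers.amazon.aws.operators.bedrock import BedrockRetrieveOperator

    return BedrockRetrieveOperator(**retrieve_kwargs_for(query, knowledge_base_id, max_results))
```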

aws_hook_class[source]¶

template_fields: collections.abc.Sequence[str][source]¶

retrieval_query[source]¶

knowledge_base_id[source]¶

vector_search_config = None[source]¶

retrieve_kwargs[source]¶

execute(context)[source]¶

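The main method to derive when creating an operator.

Context is the same dictionary that is used when rendering Jinja templates.

Refer to get_template_context for more context.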

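class airflow.providers.amazon.aws.operators.bedrock.BedrockBatchInferenceOperator(job_name, role_arn, model_id, input_uri, output_uri, invoke_kwargs=None, wait_for_completion=True, waiter_delay=60, waiter_max_attempts=20, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)[source]¶

Bases: airflow.providers.amazon.aws.operators.base_aws.AwsBaseOperator[airflow.providers.amazon.aws.hooks.bedrock.BedrockHook]

Create a batch inference job to invoke a model on multiple prompts.

See also

For more information on how to use this operator, take a look at the guide: Create an Amazon Bedrock Batch Inference Job

Parameters:

job_name (str) – A name to give the batch inference job. (templated)
role_arn (str) – The ARN of the IAM role with permissions to run the batch inference job. (templated)
model_id (str) – Name or ARN of the model to use for the batch inference job. (templated)
input_uri (str) – The S3 location of the input data. (templated)
output_uri (str) – The S3 location of the output data. (templated)
invoke_kwargs (dict[str, Any] | None) – Additional keyword arguments to pass to the API call. (templated)
wait_for_completion (bool) – Whether to wait for the job to stop. (default: True) NOTE: Batch inference jobs are added to a queue and completed "eventually", so using deferrable mode is much more practical than using wait_for_completion.
waiter_delay (int) – Time in seconds to wait between status checks. (default: 60)
waiter_max_attempts (int) – Maximum number of attempts to check for job completion. (default: 20)
deferrable (bool) – If True, the operator waits asynchronously for the job to stop. This implies waiting for completion. This mode requires the aiobotocore module to be installed. (default: False)
aws_conn_id – The Airflow connection used for AWS credentials. If this is None or empty, the default boto3 behaviour is used. If running Airflow in a distributed manner and aws_conn_id is None or empty, the default boto3 configuration is used (and must be maintained on each worker node).
region_name – AWS region_name. If not specified, the default boto3 behaviour is used.
verify – Whether or not to verify SSL certificates. See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html
botocore_config – Configuration dictionary (key-values) for the botocore client. See: https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html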
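
The file at input_uri holds one prompt per line. The sketch below shows one way such a file might be prepared, assuming the JSON Lines layout with `recordId` and `modelInput` fields and an Amazon Titan Text style `inputText` payload; both are assumptions to verify against the Bedrock batch inference documentation for your model. It also sets deferrable=True, following the note on wait_for_completion above.

```python
import json
from typing import Any


def to_jsonl(prompts: list[str]) -> str:
    """Serialize prompts as JSON Lines, one record per prompt."""
    lines = []
    for index, prompt in enumerate(prompts):
        record = {
            "recordId": f"REC{index:08d}",  # assumed 11-character record identifier
            "modelInput": {"inputText": prompt},  # assumed payload shape
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)


def batch_inference_kwargs(
    job_name: str, role_arn: str, model_id: str, input_uri: str, output_uri: str
) -> dict[str, Any]:
    """Collect the documented BedrockBatchInferenceOperator arguments."""
    return {
        "task_id": "batch_inference",  # hypothetical task name
        "job_name": job_name,
        "role_arn": role_arn,
        "model_id": model_id,
        "input_uri": input_uri,
        "output_uri": output_uri,
        "deferrable": True,  # queued jobs can take a long time to finish
    }
```

The resulting string would be uploaded to the S3 location named by input_uri before the task runs; the upload step is left out here.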