airflow.providers.pinecone.hooks.pinecone

Hook for Pinecone.

Classes

PineconeHook

Interact with Pinecone. This hook uses the Pinecone conn_id.

Module Contents

class airflow.providers.pinecone.hooks.pinecone.PineconeHook(conn_id=default_conn_name, environment=None, region=None)

Bases: airflow.hooks.base.BaseHook

Interact with Pinecone. This hook uses the Pinecone conn_id.

Parameters: conn_id (str) – Optional. The connection id to use when connecting to Pinecone. Defaults to pinecone_default.

conn_name_attr = 'conn_id'

default_conn_name = 'pinecone_default'

conn_type = 'pinecone'

hook_name = 'Pinecone'

classmethod get_connection_form_widgets()

Return the connection widgets to add to the connection form.

classmethod get_ui_field_behaviour()

Return custom field behaviour.

conn_id = 'pinecone_default'

property api_key: str

property environment: str

property region: str

property pinecone_client: pinecone.Pinecone

The Pinecone object used to interact with Pinecone.

property conn: airflow.models.connection.Connection

test_connection()

list_indexes()

Retrieve a list of all indexes in your project.

upsert(index_name, vectors, namespace='', batch_size=None, show_progress=True, **kwargs)

Write vectors into a namespace.

If a new value is upserted for an existing vector id, it overwrites the previous value.

See also

https://docs.pinecone.io/reference/upsert

To upsert in parallel, see also

https://docs.pinecone.io/docs/insert-data#sending-upserts-in-parallel

Parameters:

index_name (str) – The name of the index to upsert into.
vectors (list[pinecone.Vector] | list[tuple] | list[dict]) – A list of vectors to upsert.
namespace (str) – The namespace to write to. If not specified, the default namespace ('') is used.
batch_size (int | None) – The number of vectors to upsert in each batch.
show_progress (bool) – Whether to show a progress bar using tqdm. Applies only when batch_size is provided.
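
As a rough illustration of the parameters above, the sketch below builds dict-style vectors and passes them to upsert in batches. The helper name upsert_embeddings, the embeddings mapping, and the metadata value are hypothetical. hook is assumed to be a PineconeHook instance (any object exposing the same upsert method would work), and the id/values/metadata keys are assumed to follow Pinecone's dict vector format.

```python
from __future__ import annotations

from typing import Any


def upsert_embeddings(
    hook: Any,
    index_name: str,
    embeddings: dict[str, list[float]],
    namespace: str = "",
    batch_size: int | None = 100,
) -> Any:
    """Upsert {vector_id: values} pairs through hook.upsert (hypothetical helper)."""
    # Dict-style vectors; upsert also accepts pinecone.Vector objects or tuples.
    vectors = [
        {"id": vector_id, "values": values, "metadata": {"source": "example"}}
        for vector_id, values in embeddings.items()
    ]
    # Re-using an existing id overwrites the values stored for that id.
    return hook.upsert(
        index_name=index_name,
        vectors=vectors,
        namespace=namespace,
        batch_size=batch_size,
        show_progress=False,  # only has an effect when batch_size is set
    )
```

With a real connection this might be called as upsert_embeddings(PineconeHook(conn_id="pinecone_default"), "my-index", {"doc-1": [0.1, 0.2, 0.3]}), where "my-index" is a placeholder name.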

get_pod_spec_obj(*, replicas=None, shards=None, pods=None, pod_type='p1.x1', metadata_config=None, source_collection=None, environment=None)

Get a PodSpec object.

Parameters:

replicas (int | None) – The number of replicas.
shards (int | None) – The number of shards.
pods (int | None) – The number of pods.
pod_type (str | None) – The type of pod.
metadata_config (dict | None) – The metadata configuration.
source_collection (str | None) – The source collection.
environment (str | None) – The environment to use when creating the index.

get_serverless_spec_obj(*, cloud, region=None)

Get a ServerlessSpec object.

Parameters:

cloud (str) – The cloud provider.
region (str | None) – The region to use when creating the index.

create_index(index_name, dimension, spec, metric='cosine', timeout=None)

Create a new index.

Parameters:

index_name (str) – The name of the index.
dimension (int) – The dimension of the vectors to be indexed.
spec (pinecone.ServerlessSpec | pinecone.PodSpec) – Pass a ServerlessSpec object to create a serverless index, or a PodSpec object to create a pod index. Spec objects can be created with get_serverless_spec_obj and get_pod_spec_obj.
metric (str | None) – The metric to use. Defaults to cosine.
timeout (int | None) – The timeout to use.
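
The spec helpers and create_index are meant to be used together, which the following sketch tries to show for the serverless case. The helper name create_serverless_index is hypothetical, and the "aws" / "us-east-1" defaults are placeholder values chosen for illustration rather than values taken from this reference. hook is assumed to be a PineconeHook instance.

```python
from __future__ import annotations

from typing import Any


def create_serverless_index(
    hook: Any,
    index_name: str,
    dimension: int,
    cloud: str = "aws",  # placeholder cloud provider
    region: str | None = "us-east-1",  # placeholder region
    metric: str = "cosine",
) -> Any:
    """Build a ServerlessSpec via the hook and create an index with it."""
    # get_serverless_spec_obj takes keyword-only arguments.
    spec = hook.get_serverless_spec_obj(cloud=cloud, region=region)
    hook.create_index(
        index_name=index_name,
        dimension=dimension,  # should match the length of the vectors to upsert
        spec=spec,
        metric=metric,
    )
    return spec
```

For a pod index, the same pattern should apply with hook.get_pod_spec_obj(...) supplying the spec instead.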

describe_index(index_name)

Retrieve information about a specific index.

Parameters: index_name (str) – The name of the index to describe.

delete_index(index_name, timeout=None)

Delete a specific index.

Parameters:

index_name (str) – The name of the index.
timeout (int | None) – Timeout to wait for the index deletion to complete.

configure_index(index_name, replicas=None, pod_type='')

Change the current configuration of the index.

Parameters:

index_name (str) – The name of the index to configure.
replicas (int | None) – The new number of replicas.
pod_type (str | None) – The new pod_type for the index.

create_collection(collection_name, index_name)

Create a new collection from a specified index.

Parameters:

collection_name (str) – The name of the collection to create.
index_name (str) – The name of the source index.

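delete_collection(collection_name)

Delete a specific collection.

Parameters: collection_name (str) – The name of the collection to delete.

describe_collection(collection_name)

Retrieve information about a specific collection.

Parameters: collection_name (str) – The name of the collection to describe.

list_collections()

Retrieve a list of all collections in the current project.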
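
The collection methods above can be chained. This is a minimal sketch, assuming hook is a PineconeHook instance; the helper name create_and_describe_collection is hypothetical, and the shape of the value returned by describe_collection is not specified in this reference, so it is passed through unchanged.

```python
from typing import Any


def create_and_describe_collection(hook: Any, index_name: str, collection_name: str) -> Any:
    """Create a collection from an index, then return its description."""
    hook.create_collection(collection_name=collection_name, index_name=index_name)
    # The new collection may not be ready yet when it is described here.
    return hook.describe_collection(collection_name=collection_name)
```

A collection that is no longer needed can be removed with hook.delete_collection(collection_name).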

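query_vector(index_name, vector, query_id=None, top_k=10, namespace=None, query_filter=None, include_values=None, include_metadata=None, sparse_vector=None)

Search a namespace using a query vector.

It retrieves the ids of the most similar items in a namespace, along with their similarity scores. API reference: https://docs.pinecone.io/reference/query

Parameters:

index_name (str) – The name of the index to query.
vector (list[Any]) – The query vector.
query_id (str | None) – The unique ID of the vector to use as the query vector.
top_k (int) – The number of results to return.
namespace (str | None) – The namespace to fetch vectors from. If not specified, the default namespace is used.
query_filter (dict[str, str | float | int | bool | list[Any] | dict[Any, Any]] | None) – The filter to apply. See https://www.pinecone.io/docs/metadata-filtering/
include_values (bool | None) – Whether to include the vector values in the result.
include_metadata (bool | None) – Whether to include metadata in the response along with the ids.
sparse_vector (pinecone.core.client.model.sparse_values.SparseValues | dict[str, list[float] | list[int]] | None) – The sparse values of the query vector. Expected to be either a SparseValues object or a dict of the form {'indices': list[int], 'values': list[float]}, where both lists have the same length.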
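
The dict form of sparse_vector requires both lists to have the same length, which is easy to check before querying. The sketch below does that and forwards a query to query_vector. Both helper names are hypothetical, hook is assumed to be a PineconeHook instance, and the result is returned as-is because its structure is not described in this reference.

```python
from __future__ import annotations

from typing import Any


def build_sparse_vector(indices: list[int], values: list[float]) -> dict[str, list]:
    """Return the dict form of a sparse vector after checking the list lengths."""
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    return {"indices": list(indices), "values": list(values)}


def query_top_matches(
    hook: Any,
    index_name: str,
    vector: list[float],
    top_k: int = 5,
    query_filter: dict[str, Any] | None = None,
    sparse_vector: dict[str, list] | None = None,
) -> Any:
    """Run a similarity query through hook.query_vector (hypothetical helper)."""
    return hook.query_vector(
        index_name=index_name,
        vector=vector,
        top_k=top_k,
        query_filter=query_filter,
        include_metadata=True,  # ask for metadata alongside ids and scores
        sparse_vector=sparse_vector,
    )
```

A filter such as {"genre": {"$eq": "news"}} could be passed as query_filter, where genre is a made-up metadata field.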

upsert_data_async(index_name, data, async_req=False, pool_threads=None)

Upsert (insert/update) data into a Pinecone index.

Parameters:

index_name (str) – The name of the index.
data (list[tuple[Any]]) – A list of tuples to upsert. Each tuple has the form (id, vector, metadata). Metadata is optional.
async_req (bool) – If True, the upsert operations are asynchronous.
pool_threads (int | None) – The number of threads to use for parallel upserting. Must be provided if async_req is True.
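
To make the tuple format and the async_req / pool_threads rule concrete, here is a small sketch. The helper name upsert_records and the record dict layout are hypothetical; hook is assumed to be a PineconeHook instance. The early ValueError is a convenience of this sketch, not documented behaviour of the hook.

```python
from __future__ import annotations

from typing import Any


def upsert_records(
    hook: Any,
    index_name: str,
    records: list[dict[str, Any]],
    parallel: bool = False,
    pool_threads: int | None = None,
) -> Any:
    """Convert records to (id, vector[, metadata]) tuples and upsert them."""
    # pool_threads must be provided whenever async_req is True.
    if parallel and pool_threads is None:
        raise ValueError("pool_threads is required when async_req is True")

    data = []
    for record in records:
        if "metadata" in record:
            data.append((record["id"], record["values"], record["metadata"]))
        else:
            # Metadata is optional, so a two-element tuple is used here.
            data.append((record["id"], record["values"]))

    return hook.upsert_data_async(
        index_name=index_name,
        data=data,
        async_req=parallel,
        pool_threads=pool_threads,
    )
```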

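describe_index_stats(index_name, stats_filter=None, **kwargs)

Describe the index statistics.

Returns statistics about the index's contents, for example the vector count per namespace and the number of dimensions. API reference: https://docs.pinecone.io/reference/describe_index_stats_post

Parameters:

index_name (str) – The name of the index.
stats_filter (dict[str, str | float | int | bool | list[Any] | dict[Any, Any]] | None) – If this parameter is present, the operation only returns statistics for vectors that satisfy the filter. See https://www.pinecone.io/docs/metadata-filtering/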
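
A filtered statistics call might look like the sketch below. The helper name stats_where_equal is hypothetical, hook is assumed to be a PineconeHook instance, and the $eq operator is assumed from Pinecone's metadata filtering syntax. The return value is passed through untouched because its structure is not described in this reference.

```python
from typing import Any


def stats_where_equal(hook: Any, index_name: str, field: str, value: Any) -> Any:
    """Return index statistics restricted to vectors whose metadata field equals value."""
    # Equality filter in Pinecone's metadata filter syntax (assumed).
    stats_filter = {field: {"$eq": value}}
    return hook.describe_index_stats(index_name=index_name, stats_filter=stats_filter)
```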