airflow.providers.google.cloud.transfers.cassandra_to_gcs

This module contains an operator for copying data from Cassandra to Google Cloud Storage in JSON format.

Attributes

`NotSetType`
`NOT_SET`

Classes

CassandraToGCSOperator

Copy data from Cassandra to Google Cloud Storage in JSON format.

Module Contents

airflow.providers.google.cloud.transfers.cassandra_to_gcs.NotSetType

airflow.providers.google.cloud.transfers.cassandra_to_gcs.NOT_SET
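
The `query_timeout` parameter of `CassandraToGCSOperator` accepts `None` as a meaningful value (no timeout), so a separate sentinel is needed to express "not provided". The following is a minimal sketch of that sentinel pattern, not the module's actual code: the local `NotSetType` and `NOT_SET` are stand-ins, and `build_execute_kwargs` is a hypothetical helper.

```python
class NotSetType:
    """Stand-in sentinel type (illustration only)."""


NOT_SET = NotSetType()


def build_execute_kwargs(query_timeout=NOT_SET):
    """Build keyword arguments for a driver call (hypothetical helper)."""
    kwargs = {}
    # Forward the timeout only when the caller supplied one; None is a
    # valid value meaning "no timeout", so it cannot be the default.
    if query_timeout is not NOT_SET:
        kwargs["timeout"] = query_timeout
    return kwargs


print(build_execute_kwargs())      # driver default applies
print(build_execute_kwargs(None))  # explicit "no timeout"
print(build_execute_kwargs(2.5))   # 2.5 second timeout
```

Comparing with `is not` keeps the check an identity test against the single sentinel instance.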

class airflow.providers.google.cloud.transfers.cassandra_to_gcs.CassandraToGCSOperator(*, cql, bucket, filename, schema_filename=None, approx_max_file_size_bytes=1900000000, gzip=False, cassandra_conn_id='cassandra_default', gcp_conn_id='google_cloud_default', impersonation_chain=None, query_timeout=NOT_SET, encode_uuid=True, **kwargs)

Bases: airflow.models.BaseOperator

Copy data from Cassandra to Google Cloud Storage in JSON format.

Note: Arrays of arrays are not supported.

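Parameters:

cql (str) – The CQL to execute on the Cassandra table.
bucket (str) – The bucket to upload to.
filename (str) – The filename to use as the object name when uploading to Google Cloud Storage. A {} should be specified in the filename so that the operator can inject file numbers when the output is split into multiple files because of its size.
schema_filename (str | None) – If set, the filename to use as the object name when uploading a .json file containing the BigQuery schema fields for the table that was exported from Cassandra.
approx_max_file_size_bytes (int) – This operator supports splitting large table exports into multiple files (see the note in the filename parameter documentation above). This parameter lets developers specify the approximate maximum size of each split file. Check https://cloud.google.com/storage/quotas to see the maximum allowed size for a single object.
cassandra_conn_id (str) – Reference to a specific Cassandra hook.
gzip (bool) – Option to compress the file for upload.
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – (Optional) The service account to impersonate using short-term credentials, or the chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, each identity in the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account in the list granting this role to the originating account (templated).
query_timeout (float | None | NotSetType) – (Optional) The amount of time, in seconds, allowed for executing the Cassandra query. If not set, the timeout value is determined by the Cassandra driver in Session.execute(). If set to None, there is no timeout.
encode_uuid (bool) – (Optional) Whether to encode UUIDs when uploading from Cassandra to GCS. UUIDs are encoded by default.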
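
To illustrate how the `{}` placeholder in `filename` interacts with `approx_max_file_size_bytes`, here is a minimal sketch of size-based splitting. It is not the operator's implementation: `split_rows` is a hypothetical helper, and numbering files from 0 and writing one JSON document per line are assumptions made for the example.

```python
import json


def split_rows(rows, filename_template, approx_max_bytes):
    """Group rows into numbered objects by approximate size (hypothetical)."""
    files = {}
    file_no = 0
    current, size = [], 0
    for row in rows:
        line = json.dumps(row, sort_keys=True) + "\n"
        # Start a new file once the current one has reached the limit.
        # The limit is approximate: it is checked between rows only.
        if current and size >= approx_max_bytes:
            files[filename_template.format(file_no)] = "".join(current)
            file_no += 1
            current, size = [], 0
        current.append(line)
        size += len(line.encode("utf-8"))
    if current:
        files[filename_template.format(file_no)] = "".join(current)
    return files


rows = [{"id": i} for i in range(5)]
files = split_rows(rows, "export/users_{}.json", approx_max_bytes=20)
for name in sorted(files):
    print(name, len(files[name].splitlines()), "rows")
```

Without the `{}` placeholder, every split would format to the same object name, which is why the parameter documentation asks for it.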

template_fields: collections.abc.Sequence[str] = ('cql', 'bucket', 'filename', 'schema_filename', 'impersonation_chain')

template_ext: collections.abc.Sequence[str] = ('.cql',)

ui_color = '#a0e08c'

cql

bucket

filename

schema_filename = None

approx_max_file_size_bytes = 1900000000

cassandra_conn_id = 'cassandra_default'

gcp_conn_id = 'google_cloud_default'

gzip = False

impersonation_chain = None

query_timeout

encode_uuid = True

CQL_TYPE_MAP

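execute(context)

The main method to derive when creating an operator.

The context is the same dictionary that is used when rendering jinja templates.

Refer to get_template_context for more context.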

generate_data_dict(names, values)

Generate the data structure that will be stored as a file in GCS.
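
As a rough illustration, the structure can be thought of as a mapping from column names to converted values. The sketch below is a standalone function under that assumption, not the operator's method; the `convert` argument is a hypothetical hook standing in for per-value conversion.

```python
import json


def generate_data_dict(names, values, convert=lambda v: v):
    """Pair column names with (optionally converted) values."""
    return {name: convert(value) for name, value in zip(names, values)}


row = generate_data_dict(["id", "name"], [1, "ada"])
print(json.dumps(row))
```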

convert_value(value)

Convert a value to its BQ type.
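
The sketch below shows one plausible shape for such a conversion, including the effect of the `encode_uuid` option. It is a standalone approximation, not the operator's code: the set of handled types and the use of base64 for UUIDs and bytes are assumptions made for illustration.

```python
import base64
import uuid


def convert_value(value, encode_uuid=True):
    """Convert a Python value to something JSON-serializable (sketch)."""
    if value is None or isinstance(value, (str, int, float, bool)):
        return value
    if isinstance(value, uuid.UUID):
        # Assumption: "encoding" means base64 of the 16 raw UUID bytes;
        # otherwise the canonical hyphenated string is used.
        if encode_uuid:
            return base64.b64encode(value.bytes).decode("ascii")
        return str(value)
    if isinstance(value, (bytes, bytearray)):
        return base64.b64encode(bytes(value)).decode("ascii")
    if isinstance(value, (list, tuple)):
        return [convert_value(v, encode_uuid) for v in value]
    raise TypeError(f"Unsupported type in this sketch: {type(value).__name__}")


u = uuid.UUID(int=1)
print(convert_value(u))
print(convert_value(u, encode_uuid=False))
```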

convert_array_types(value)

Map convert_value over an array.

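convert_user_type(value)

Convert a user type to a RECORD that contains n fields, where n is the number of attributes.

Each element in the user type class is converted to its corresponding data type in BQ.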
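
A minimal sketch of this idea, assuming the user-defined type arrives as a namedtuple-like object that exposes `_fields`. The `Address` type is hypothetical, and nested value conversion is omitted for brevity.

```python
from collections import namedtuple

# Hypothetical user-defined type with two attributes.
Address = namedtuple("Address", ["street", "zip_code"])


def convert_user_type(value):
    """Turn a namedtuple-like value into a RECORD-shaped dict (sketch)."""
    return {name: getattr(value, name) for name in value._fields}


record = convert_user_type(Address("1 Main St", 12345))
print(record)
```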

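convert_tuple_type(values)

Convert a tuple to a RECORD that contains n fields.

Each field is converted to its corresponding data type in BQ and named 'field_<index>', where index is determined by the order of the tuple elements as defined in Cassandra.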
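
The naming scheme can be sketched as follows. This is a standalone illustration rather than the operator's method; zero-based indices are an assumption, and per-field value conversion is left out.

```python
def convert_tuple_type(values):
    """Name each tuple element 'field_<index>' in positional order (sketch)."""
    return {f"field_{index}": value for index, value in enumerate(values)}


record = convert_tuple_type(("a", 1))
print(record)
```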

convert_map_type(value)

Convert a map to a repeated RECORD that contains two fields: 'key' and 'value'.

Each is converted to its corresponding data type in BQ.
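
In other words, each map entry becomes one record in a list. A minimal standalone sketch, using a plain `dict` in place of the driver's map type and skipping value conversion:

```python
def convert_map_type(value):
    """Turn a mapping into a list of {'key', 'value'} records (sketch)."""
    return [{"key": k, "value": v} for k, v in value.items()]


records = convert_map_type({"a": 1, "b": 2})
print(records)
```

A list of records is used because BigQuery has no native map type, so a repeated RECORD is the closest fit.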

classmethod generate_schema_dict(name, type_)

Generate the BQ schema.

classmethod get_bq_fields(type_)

Convert a non-simple type value to its BQ representation.

static is_simple_type(type_)

Check whether the type is a simple type.

static is_array_type(type_)

Check whether the type is an array type.

static is_record_type(type_)

Check whether the type is a RECORD type.

classmethod get_bq_type(type_)

Convert the type to the equivalent BQ type.

classmethod get_bq_mode(type_)

Convert the type to the equivalent BQ mode.
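
To show how the schema helpers above could fit together, here is a simplified, standalone sketch. It does not use the Cassandra driver's type classes: types are modeled as plain strings or tuples such as `("list", "text")`, `SIMPLE_TYPE_MAP` is an illustrative subset with assumed names, and `generate_schema_field` is a hypothetical counterpart to `generate_schema_dict`.

```python
# Illustrative subset only; the real CQL_TYPE_MAP is defined by the operator.
SIMPLE_TYPE_MAP = {"text": "STRING", "int": "INTEGER", "double": "FLOAT"}
ARRAY_KINDS = ("list", "set")


def get_bq_type(cql_type):
    if isinstance(cql_type, str):
        return SIMPLE_TYPE_MAP[cql_type]
    kind = cql_type[0]
    if kind in ARRAY_KINDS:
        element = cql_type[1]
        # Mirrors the documented limitation: no array of arrays.
        if not isinstance(element, str) and element[0] in ARRAY_KINDS:
            raise ValueError("Array of arrays is not supported")
        return get_bq_type(element)
    return "RECORD"  # maps and tuples become records


def get_bq_mode(cql_type):
    if not isinstance(cql_type, str) and cql_type[0] in ARRAY_KINDS + ("map",):
        return "REPEATED"
    return "NULLABLE"


def get_bq_fields(cql_type):
    if isinstance(cql_type, str):
        return []
    kind, *subtypes = cql_type
    if kind in ARRAY_KINDS:
        return get_bq_fields(subtypes[0])
    if kind == "map":
        names = ["key", "value"]
    else:  # tuple
        names = [f"field_{i}" for i in range(len(subtypes))]
    return [generate_schema_field(n, t) for n, t in zip(names, subtypes)]


def generate_schema_field(name, cql_type):
    field = {"name": name, "type": get_bq_type(cql_type), "mode": get_bq_mode(cql_type)}
    fields = get_bq_fields(cql_type)
    if fields:
        field["fields"] = fields
    return field


print(generate_schema_field("tags", ("list", "text")))
print(generate_schema_field("attrs", ("map", "text", "int")))
```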