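Google Cloud Life Sciences Operators
Google Cloud Life Sciences is a service that executes a series of Compute Engine containers on Google Cloud. It is used to process, analyze, and annotate genomics and biomedical data at scale.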
Warning
Cloud Life Sciences will be discontinued on July 8, 2025. Please use Google Cloud Batch instead.
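Prerequisite Tasks
To use these operators, you must do a few things:
Select or create a Cloud Platform project using the Cloud Console.
Enable billing for your project, as described in the Google Cloud documentation.
Enable the API, as described in the Cloud Console documentation.
Install the API libraries via pip:
pip install 'apache-airflow[google]'
For details, see Installation.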
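After installing, one quick way to confirm that the environment is ready is to check that the relevant Python modules can be found. The sketch below is not part of the provider: `module_available`, `missing_modules`, and `REQUIRED_MODULES` are hypothetical names, and the module list is an assumption about what the `google` extra installs.

```python
import importlib.util

# Assumed to be provided by `pip install 'apache-airflow[google]'`.
REQUIRED_MODULES = [
    "airflow",
    "airflow.providers.google",
    "google.cloud.batch_v1",
]


def module_available(name: str) -> bool:
    """Return True if `name` can be located on the import path.

    For dotted names, find_spec imports the parent packages (but not the
    module itself) and raises ModuleNotFoundError if a parent is missing.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        return False


def missing_modules(names: list[str]) -> list[str]:
    """Return the subset of `names` that cannot be located."""
    return [name for name in names if not module_available(name)]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

An empty result only means the modules are importable; it does not check that the API is enabled or that credentials are configured.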
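Running a pipeline
Use the LifeSciencesRunPipelineOperator to execute pipelines.
This operator is deprecated and will be removed after July 8, 2025. All of its functionality, as well as new features, is available on the Google Cloud Batch platform. Please use CloudBatchSubmitJobOperator instead, with a job definition such as the following: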
tests/system/google/cloud/cloud_batch/example_cloud_batch.py
from google.cloud import batch_v1


def _create_job():
    # Runnable: run a shell command inside a busybox container.
    runnable = batch_v1.Runnable()
    runnable.container = batch_v1.Runnable.Container()
    runnable.container.image_uri = "gcr.io/google-containers/busybox"
    runnable.container.entrypoint = "/bin/sh"
    runnable.container.commands = [
        "-c",
        "echo Hello world! This is task ${BATCH_TASK_INDEX}. "
        "This job has a total of ${BATCH_TASK_COUNT} tasks.",
    ]

    # Task spec: what each task runs, its resources, and its retry budget.
    task = batch_v1.TaskSpec()
    task.runnables = [runnable]
    resources = batch_v1.ComputeResource()
    resources.cpu_milli = 2000  # 2 vCPUs, expressed in milli-CPUs
    resources.memory_mib = 16
    task.compute_resource = resources
    task.max_retry_count = 2

    # Task group: run two tasks from the same task spec.
    group = batch_v1.TaskGroup()
    group.task_count = 2
    group.task_spec = task

    # Allocation policy: provision e2-standard-4 VMs for the tasks.
    policy = batch_v1.AllocationPolicy.InstancePolicy()
    policy.machine_type = "e2-standard-4"
    instances = batch_v1.AllocationPolicy.InstancePolicyOrTemplate()
    instances.policy = policy
    allocation_policy = batch_v1.AllocationPolicy()
    allocation_policy.instances = [instances]

    # Assemble the job and send task logs to Cloud Logging.
    job = batch_v1.Job()
    job.task_groups = [group]
    job.allocation_policy = allocation_policy
    job.labels = {"env": "testing", "type": "container"}
    job.logs_policy = batch_v1.LogsPolicy()
    job.logs_policy.destination = batch_v1.LogsPolicy.Destination.CLOUD_LOGGING
    return job
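For reference, the same job can be written as a plain JSON-style dictionary, which is handy for inspecting the spec without the client library. The sketch below assumes the camelCase field names of the Batch v1 REST `Job` resource; `job_as_json_dict`, `preview_task_output`, `MESSAGE`, and `COMMAND` are hypothetical helpers, not part of Airflow or the Batch client. `preview_task_output` shows what each task would print, assuming Batch sets `BATCH_TASK_INDEX` (0-based) and `BATCH_TASK_COUNT` for every task; Python's `string.Template` happens to use the same `${NAME}` syntax as the shell command.

```python
import json
from string import Template

# Text echoed by each task; the shell expands the ${...} variables at run time.
MESSAGE = (
    "Hello world! This is task ${BATCH_TASK_INDEX}. "
    "This job has a total of ${BATCH_TASK_COUNT} tasks."
)
COMMAND = "echo " + MESSAGE


def job_as_json_dict(task_count: int = 2) -> dict:
    """Hypothetical JSON form of the job built by _create_job() (assumed REST field names)."""
    return {
        "taskGroups": [
            {
                "taskCount": task_count,
                "taskSpec": {
                    "runnables": [
                        {
                            "container": {
                                "imageUri": "gcr.io/google-containers/busybox",
                                "entrypoint": "/bin/sh",
                                "commands": ["-c", COMMAND],
                            }
                        }
                    ],
                    "computeResource": {"cpuMilli": 2000, "memoryMib": 16},
                    "maxRetryCount": 2,
                },
            }
        ],
        "allocationPolicy": {
            "instances": [{"policy": {"machineType": "e2-standard-4"}}]
        },
        "labels": {"env": "testing", "type": "container"},
        "logsPolicy": {"destination": "CLOUD_LOGGING"},
    }


def preview_task_output(task_count: int) -> list[str]:
    """Simulate the line each task would print, one entry per task index."""
    template = Template(MESSAGE)
    return [
        template.substitute(BATCH_TASK_INDEX=i, BATCH_TASK_COUNT=task_count)
        for i in range(task_count)
    ]


if __name__ == "__main__":
    print(json.dumps(job_as_json_dict(), indent=2))
    for line in preview_task_output(2):
        print(line)
```

The dictionary could be saved as a JSON file and submitted outside Airflow, for example with `gcloud batch jobs submit` and its `--config` flag; check the Batch documentation for the exact invocation.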
Reference
For further information, see: