
Overview

This document describes an automated pipeline that computes daily coral bleaching heat stress products for the Great Barrier Reef and exports them to Google Cloud Storage (GCS) as Cloud Optimized GeoTIFFs (COGs), with summary statistics logged to BigQuery.

The pipeline implements the same methodology as the interactive GEE scripts (see companion document), following @skirving2020, but runs unattended on a daily schedule via Google Cloud infrastructure.

Architecture

flowchart TD
    SCHED[Cloud Scheduler<br/>Daily 12:00 UTC] -->|Pub/Sub| CF[Cloud Function<br/>pipeline/main.py]
    CF -->|ee.batch.Export| GCS[(GCS Bucket<br/>COGs)]
    CF -->|reduceRegion + getInfo| BQ[(BigQuery<br/>daily_summary)]
    CF -->|reads| ASSETS[(EE Assets<br/>MM, MMM, DC)]
    CF -->|reads| OISST[(NOAA OISST<br/>Daily SST)]

    subgraph "Pre-computed once"
        ASSETS
    end

    subgraph "Updated daily"
        OISST
    end

    subgraph "Outputs"
        GCS
        BQ
    end
Figure 1: Pipeline architecture. Cloud Scheduler triggers the Cloud Function daily; the function reads pre-computed assets and daily OISST data, then writes COGs to GCS and summary rows to BigQuery.

Pipeline Components

The pipeline consists of four Python scripts and one shell deployment script. No credentials, project IDs, or login details are stored in any script — all configuration is read from environment variables at runtime.

Table 1: Pipeline files and their roles.
| File | Purpose | Runs |
|------|---------|------|
| precompute_climatology.py | Export MM, MMM, DC as EE assets | Once |
| pipeline/main.py | Cloud Function: daily SST, anomaly, DHW | Daily (automated) |
| pipeline/requirements.txt | Python dependencies for the function | — |
| backfill.py | Process historical date ranges locally | On demand |
| deploy.sh | Create all GCP resources and deploy | Once |

Environment Variables

All scripts read configuration from environment variables. No defaults contain real project IDs or bucket names.

Table 2: Environment variables used by the pipeline.
| Variable | Used by | Description |
|----------|---------|-------------|
| GEE_PROJECT | All | Google Cloud project ID registered with Earth Engine |
| GCS_BUCKET | main.py, backfill.py, deploy.sh | GCS bucket name for COG exports |
| BQ_TABLE | main.py | BigQuery table (auto-set by deploy.sh) |
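
A fail-fast configuration loader makes the "no stored credentials" rule concrete. This is a hypothetical sketch (the scripts may simply read os.environ inline); the variable names match Table 2:

```python
import os

def load_config(env=None):
    """Read required pipeline configuration from environment variables,
    failing fast with a clear error if a required variable is missing."""
    env = os.environ if env is None else env
    missing = [k for k in ("GEE_PROJECT", "GCS_BUCKET") if k not in env]
    if missing:
        raise RuntimeError(f"Missing environment variables: {missing}")
    return {
        "project": env["GEE_PROJECT"],
        "bucket": env["GCS_BUCKET"],
        "bq_table": env.get("BQ_TABLE"),  # set automatically by deploy.sh
    }
```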

Pre-computed Climatology

The MM, MMM, and 366-day daily climatology are static products derived from the 1985–2012 OISST record. They are computed once by precompute_climatology.py and exported as Earth Engine assets.

export GEE_PROJECT="<your-project-id>"
python precompute_climatology.py

This creates three assets:

Table 3: Pre-computed Earth Engine assets.
| Asset | Bands | Description |
|-------|-------|-------------|
| coral_dhw/mm_climatology | 12 | Monthly mean SST (mm_01 … mm_12) |
| coral_dhw/mmm_climatology | 1 | Maximum monthly mean SST |
| coral_dhw/daily_climatology | 366 | Interpolated daily SST (dc_001 … dc_366) |

The script polls the EE task queue and reports completion. Typical runtime is 5–15 minutes.

Why Pre-compute?

Loading climatology from assets is effectively instantaneous. Without pre-computation, each daily run would re-derive the climatology from 28 years of data — processing 12 × 28 = 336 monthly means and performing 12 regressions — adding several minutes and significant computation to every invocation.

Cloud Function: Daily Processing

File: pipeline/main.py

Trigger

The function is triggered by a Pub/Sub message sent by Cloud Scheduler at 12:00 UTC daily. NOAA OISST data typically becomes available by ~09:00 UTC for the previous day, so the function processes yesterday’s date by default.

If yesterday's data is not yet available, the function falls back to the day before that.
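
The date-selection logic amounts to a simple fallback. This is a sketch with a hypothetical helper name; the actual availability check in main.py queries the OISST image collection:

```python
from datetime import date, timedelta

def date_to_process(today, available_dates):
    """Pick the date to process: yesterday if its OISST granule is
    available, otherwise fall back one more day."""
    yesterday = today - timedelta(days=1)
    if yesterday in available_dates:
        return yesterday
    return yesterday - timedelta(days=1)
```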

Products Computed

For each day, three gridded products are computed and exported:

1. Raw SST

The observed OISST sea surface temperature, scaled from raw integer values to °C and clipped to the ROI:

\[\text{SST} = \text{OISST}_{\text{raw}} \times 0.01\]

2. SST Anomaly

\[\text{SST Anomaly}_i = \text{SST}_i - \text{DC}_d\]

where \(d\) is the day-of-year of date \(i\), and \(\text{DC}_d\) is the corresponding band from the pre-computed daily climatology asset.
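
The per-pixel arithmetic for products 1 and 2 is straightforward. Below is a numpy sketch of the same math; main.py performs it server-side with ee.Image operations rather than arrays:

```python
import numpy as np

def sst_anomaly(oisst_raw, daily_climatology, day_of_year):
    """Scale raw OISST integers to °C (x 0.01) and subtract the daily
    climatology slice for that day of year (dc_001 ... dc_366, 1-indexed)."""
    sst = np.asarray(oisst_raw, dtype=float) * 0.01
    return sst - daily_climatology[day_of_year - 1]
```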

3. Degree Heating Weeks

\[\text{DHW}_i = \frac{1}{7}\sum_{n=i-83}^{i} \text{HS}_n \cdot \mathbb{1}\left[\text{HS}_n \geq 1\,°\text{C}\right]\]

The function loads 84 days of OISST data, computes the HotSpot for each day (\(\text{HS} = \max(\text{SST} - \text{MMM}, 0)\)), thresholds at 1°C, sums, and divides by 7.
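
The DHW accumulation can be expressed compactly over a stacked array. This numpy sketch mirrors the Earth Engine logic described above (stack of 84 daily grids, HotSpot threshold at 1 °C, division by 7 to convert °C-days to °C-weeks):

```python
import numpy as np

def degree_heating_weeks(sst_stack, mmm):
    """DHW from an 84-day SST stack (°C) and the MMM grid.
    HotSpot = max(SST - MMM, 0); only hotspots >= 1 °C accumulate."""
    hs = np.maximum(np.asarray(sst_stack, dtype=float) - mmm, 0.0)
    hs = np.where(hs >= 1.0, hs, 0.0)   # sub-threshold stress is ignored
    return hs.sum(axis=0) / 7.0         # °C-days -> °C-weeks
```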

Export to GCS

Each product is exported as a Cloud Optimized GeoTIFF via ee.batch.Export.image.toCloudStorage() with cloudOptimized: True.

These exports are asynchronous — the Cloud Function starts the EE export tasks (~2 seconds) and returns. GEE’s servers then perform the actual computation and write the files to GCS (typically 1–5 minutes).

GCS file structure:

gs://<bucket>/
  sst/
    2024/
      20240101.tif
      20240102.tif
      ...
  sst_anomaly/
    2024/
      20240101.tif
      ...
  dhw/
    2024/
      20240101.tif
      ...
  annual_max_dhw/
    2024/
      20241231.tif

Summary Statistics to BigQuery

The function also computes GBR-wide spatial statistics synchronously via ee.Image.reduceRegion() and inserts a single row into BigQuery (~10 seconds).

For each product \(X \in \{\text{SST}, \text{Anomaly}, \text{DHW}\}\):

  • Spatial mean: \(\bar{X} = \frac{1}{n}\sum_{p=1}^{n} X_p\)
  • Standard deviation: \(s_X\)
  • 95% confidence interval on the spatial mean: \(\bar{X} \pm 1.96 \cdot \frac{s_X}{\sqrt{n}}\)
  • Pixel count: \(n\)
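
The statistics above reduce to a few lines of arithmetic. This numpy sketch computes the same quantities client-side; main.py computes them server-side with ee.Reducer and only retrieves the scalar results:

```python
import numpy as np

def spatial_summary(values):
    """Spatial mean, sample standard deviation, and normal-approximation
    95% CI on the mean over valid (non-NaN) pixels."""
    v = np.asarray(values, dtype=float)
    v = v[~np.isnan(v)]                 # drop masked/invalid pixels
    n = v.size
    mean = float(v.mean())
    std = float(v.std(ddof=1))          # sample standard deviation
    half = 1.96 * std / np.sqrt(n)      # half-width of the 95% CI
    return {"mean": mean, "std": std, "n_pixels": int(n),
            "ci95_lower": mean - half, "ci95_upper": mean + half}
```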

BigQuery Table Schema

Table 4: BigQuery daily_summary table schema.
| Column | Type | Description |
|--------|------|-------------|
| date | DATE | Observation date |
| sst_mean | FLOAT | GBR spatial mean SST (°C) |
| sst_std | FLOAT | Spatial standard deviation |
| sst_ci95_lower | FLOAT | Lower bound of 95% CI |
| sst_ci95_upper | FLOAT | Upper bound of 95% CI |
| sst_n_pixels | INTEGER | Number of valid pixels |
| anomaly_mean | FLOAT | GBR spatial mean anomaly (°C) |
| anomaly_std | FLOAT | Spatial standard deviation |
| anomaly_ci95_lower | FLOAT | Lower bound of 95% CI |
| anomaly_ci95_upper | FLOAT | Upper bound of 95% CI |
| anomaly_n_pixels | INTEGER | Number of valid pixels |
| dhw_mean | FLOAT | GBR spatial mean DHW (°C-weeks) |
| dhw_std | FLOAT | Spatial standard deviation |
| dhw_ci95_lower | FLOAT | Lower bound of 95% CI |
| dhw_ci95_upper | FLOAT | Upper bound of 95% CI |
| dhw_n_pixels | INTEGER | Number of valid pixels |

Function Execution Timeline

gantt
    title Daily Pipeline Execution
    dateFormat  ss
    axisFormat  %S s

    section Cloud Function
    Init EE + load assets     :a1, 00, 3s
    Compute SST + Anomaly     :a2, after a1, 2s
    Compute DHW (84-day)      :a3, after a2, 3s
    Start 3 export tasks      :a4, after a3, 2s
    Compute + save summary    :a5, after a4, 10s
    Function returns          :milestone, after a5, 0s

    section GEE Backend (async)
    Export SST COG            :b1, after a4, 120s
    Export Anomaly COG        :b2, after a4, 120s
    Export DHW COG            :b3, after a4, 180s
Figure 2: Execution timeline for a single daily run.

Backfill: Historical Processing

File: backfill.py

The backfill script processes a range of historical dates, submitting export tasks to GEE in bulk:

python backfill.py --start 2024-01-01 --end 2024-12-31 --dest gcs

It also supports computing the annual maximum DHW (per-pixel):

python backfill.py --annual-max 2024 --dest gcs

Rate Limiting

GEE allows up to ~3,000 queued export tasks at a time. The backfill script pauses briefly after every 50 dates to avoid hitting API rate limits. A full year (365 days × 3 products = 1,095 tasks) takes a few minutes to submit; GEE then processes the exports over 1–2 hours.
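
The throttling pattern is a counter and a sleep. A sketch under assumed defaults (backfill.py's actual batch size and pause may differ; submit_task stands in for the per-date export submission):

```python
import time

def submit_throttled(dates, submit_task, batch_size=50, pause_s=30.0):
    """Submit one batch of export tasks per date, sleeping after every
    `batch_size` dates to stay clear of EE's task-submission rate limits."""
    for i, d in enumerate(dates, start=1):
        submit_task(d)
        if i % batch_size == 0:
            time.sleep(pause_s)
```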

Deployment

File: deploy.sh

The deployment script creates all required GCP resources:

  1. Enables APIs — Earth Engine, Cloud Functions, Cloud Scheduler, Pub/Sub, BigQuery, Cloud Storage, Cloud Build, Cloud Run
  2. Creates a service account with roles:
    • roles/earthengine.admin
    • roles/storage.objectAdmin
    • roles/bigquery.dataEditor
    • roles/bigquery.jobUser
  3. Creates the GCS bucket in australia-southeast1
  4. Creates the BigQuery dataset and table
  5. Creates a Pub/Sub topic for the scheduler trigger
  6. Deploys the Cloud Function (gen2, Python 3.11, 512 MB, 300s timeout)
  7. Creates the Cloud Scheduler job (daily at 12:00 UTC)
export GEE_PROJECT="<your-project-id>"
export GCS_BUCKET="<your-bucket-name>"
chmod +x deploy.sh
./deploy.sh

Prerequisites

  • Google Cloud project with billing enabled
  • gcloud CLI installed and authenticated
  • Earth Engine API enabled for the project
  • Service account registered with Earth Engine

Cost Estimate

Data Volume

The GBR ROI at 0.25° resolution contains approximately:

\[\frac{12.1°}{0.25°} \times \frac{15.8°}{0.25°} \approx 48 \times 63 = 3{,}024 \text{ pixels}\]

A single-band float32 COG with 3,024 pixels is approximately 15 KB.
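
This estimate is easy to verify with back-of-envelope arithmetic (grid dimensions and the float32 sample size are taken from the figures above; COG headers and overviews add a few KB on top of the raw samples):

```python
# Pixel count and per-file size estimate for the GBR ROI at 0.25 degrees.
nx = round(12.1 / 0.25)       # 48 columns (longitude extent)
ny = round(15.8 / 0.25)       # 63 rows (latitude extent)
pixels = nx * ny              # 3,024 pixels
raw_kb = pixels * 4 / 1024    # float32 sample data, ~11.8 KB before COG overhead
```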

Table 5: Annual data volume for the GBR ROI.
| Item | Calculation | Annual total |
|------|-------------|--------------|
| Daily COGs | 3 products × 15 KB × 365 days | ~16 MB |
| BigQuery rows | 365 rows × ~0.5 KB | ~180 KB |
| Annual max DHW | 1 COG | ~15 KB |

Monthly Cost

Table 6: Estimated monthly costs (all within free tiers).
| Service | Free tier | Pipeline usage | Cost |
|---------|-----------|----------------|------|
| GCS storage | 5 GB | ~16 MB/year | $0 |
| GCS operations | 5,000 Class A/month | ~90 writes/month | $0 |
| BigQuery storage | 10 GB | ~180 KB/year | $0 |
| BigQuery queries | 1 TB/month | Minimal | $0 |
| Cloud Functions | 2M invocations/month | 30 runs | $0 |
| Cloud Scheduler | 3 free jobs | 1 job | $0 |
| Pub/Sub | 10 GB/month | ~30 KB/month | $0 |
| Earth Engine | Free (non-commercial) | 30 computations/month | $0 |
| **Total** | | | **$0/month** |

Serving Outputs

The COGs stored in GCS can be consumed by mapping applications:

Cesium / Web Maps

Use a tile server such as TiTiler to serve XYZ tiles from the COGs:

# Example TiTiler URL
GET /cog/tiles/{z}/{x}/{y}?url=gs://bucket/dhw/2024/20240315.tif

GEE Tile API

For live-computed layers (e.g., current DHW), use ee.data.getMapId() to generate tile URLs that can be fed to Cesium.UrlTemplateImageryProvider.

Direct Access

COGs in GCS support HTTP range requests, so desktop GIS applications (QGIS, ArcGIS) can open them directly via the gs:// or https://storage.googleapis.com/ URL without downloading the entire file.

Monitoring

Check Pipeline Status

# Recent logs
gcloud functions logs read daily-dhw-pipeline \
  --region=australia-southeast1 --limit=20

# Exported files
gsutil ls gs://<bucket>/dhw/2024/

# Latest BigQuery entry
bq query --nouse_legacy_sql \
  "SELECT * FROM coral_dhw.daily_summary ORDER BY date DESC LIMIT 1"

Sample BigQuery Queries

Annual maximum DHW (GBR-wide average):

SELECT
  EXTRACT(YEAR FROM date) AS year,
  MAX(dhw_mean) AS max_dhw,
  MAX(dhw_ci95_upper) AS max_dhw_upper
FROM `<project>.coral_dhw.daily_summary`
GROUP BY year ORDER BY year;

Days above bleaching thresholds per year:

SELECT
  EXTRACT(YEAR FROM date) AS year,
  COUNTIF(dhw_mean >= 4) AS days_warning,
  COUNTIF(dhw_mean >= 8) AS days_alert2
FROM `<project>.coral_dhw.daily_summary`
GROUP BY year ORDER BY year;