Raster Chef

(originally presented on Nov. 11th, 2022 as a part of NCSU CGA’s lunch & learn series)

all code examples can be found at github.com/ocsmit/raster-chef

Goals & Concepts

Overview & goals

  • Understand historical context of access
  • Know where to search for data
  • Gain familiarity and resources to ingest data

Concepts covered

  • Raster data model
  • Cloud Optimized GeoTIFF (COG)
  • Spatio Temporal Asset Catalog (STAC)
  • API access

Platforms & APIs

  • NASA EarthData
  • Planetary Computer
  • Planet

Key concepts

Raster data model

Cloud Optimized GeoTIFF

Spatio Temporal Asset Catalog

Extends the GeoJSON object and consists of catalogs, and collections, items

The spec additionally defines a RESTful API

STAC - Catalog

Top Level object which wraps other catalogs, collections, and items

{
    "stac_version": "1.0.0",
    "type": "Catalog",
    "id": "20201211_223832_CS2",
    "description": "A simple catalog example",
    "links": []
}

STAC - Collection

Defines a set of common fields to describe a group of Items that share properties and metadata.

{
    "stac_version": "1.0.0",
    "type": "Collection",
    "license": "ISC",
    "id": "20201211_223832_CS2",
    "description": "A simple collection example",
    "links": [],
    "extent": {},
    "summaries": {}
}

STAC - Item

Most important part of the spec, contains relevant metadata for a scene and links to the assets

{
    "stac_version": "1.0.0",
    "type": "Feature",
    "id": "20201211_223832_CS2",
    "bbox": [],
    "geometry": {},
    "properties": {},
    "collection": "simple-collection",
    "links": [],
    "assets": {}
}

STAC - API

Defined as REST interface which will generate a GeoJSON of items based on a user request.

Endpoint is `/stacname/search`. The user requests a date range and a bounding box, e.g.

{
  "bbox": [5.5, 46, 8, 47.4],
  "time": "2018-02-12T00:00:00Z/2018-03-18T12:31:12Z"
}

and receives all items in the catalog which intersect the bounding box and time range.

STAC - API

Evolving landscape, but primarily accessed with `pystac-client`

Key libraries

  • GDAL
  • Rasterio
  • NumPy
  • PyPROJ

Platforms 🛰️

NASA EarthData

EarthData

NASA’s premier data repository. A part of Earth Observing System Data and Information System (EOSDIS).

Contains 59 PB of data. Expected to grow to 250+ PB by 2025.

EarthData

Built around Common Metadata Repository (CMR), a high performance metadata cataloging system.

Exposes `CMR-STAC` which provides access to entire catalog with STAC compliant API. Has 54 seperate clients containing different data.

Accessing EarthData API for HLS

Harmonized Landsat Sentinel is accesed via the `LPCLOUD` client.

STAC_URL = "https://cmr.earthdata.nasa.gov/stac"
catalog = Client.open(f'{STAC_URL}/LPCLOUD/')

and is split between two collections for the gridded Sentinel 2 and Landsat 8

Parse geojson

with open(GEOJSON_PATH, "r") as fp:
    geojson = json.load(fp)
bbox = geojson["features"][0]["geometry"]

create search query

search = catalog.search(
    collections = ['HLSL30.v2.0', 'HLSS30.v2.0'],
    intersects = bbox,
    datetime = "2022-06/2022-06",
    limit = 100
)

The search query returns a STAC json that contains links to each band’s COG in the cloud, e.g.

...
{
  "B05": {
    "href": "https://data.lpdaac.earthdatacloud.nasa.gov/...
             /HLS.S30.T17TMF.2022154T161829.v2.0.B05.tif"
  }
},
...

COG memory layout allows us to request only our extent

bbox = bounds(geometry)
coord_transformer = Transformer.from_crs("epsg:4326", geo_fp.crs)
# calculate pixels to be streamed in cog
coord_upper_left = coord_transformer.transform(bbox[3], bbox[0])
coord_lower_right = coord_transformer.transform(bbox[1], bbox[2])
pixel_upper_left = geo_fp.index(coord_upper_left[0], coord_upper_left[1])
pixel_lower_right = geo_fp.index(coord_lower_right[0], coord_lower_right[1])
# make http range request only for bytes in window
window = rasterio.windows.Window.from_slices(
    (pixel_upper_left[0], pixel_lower_right[0]),
    (pixel_upper_left[1], pixel_lower_right[1]),
)
subset = geo_fp.read(1, window = window)

Utilizing multiple cores to download data (found here)


def downloader(item, geometry, outpath):
    ... # subset and write to disk

if __name__ == "__main__":

    ... # STAC query to get assets

    import multiprocessing as mp
    with mp.Pool(5) as pool:
        pool.starmap(
            downloader, [(item, geojson, outpath) for item in items]
        )

While we are downloading to disk, with COG’s we can keep the data in memory instead

subset = geo_fp.read(1, window = window) # This is just a NumPy array

and perform our calculations without writing to disk

Planetary Computer

Built around STAC API and uses open standards and software.

Contains 24 PB of data, growing rapidly. Full analysis ready Sentinel 2 catalog.

Offers “Hubs” for cloud computing

Querying data is the almost the same as with EarthData

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)


time_range = "2009-01/2020-01"

search = catalog.search(
                        collections=["landsat-c2-l2"],
                        bbox=bbox,
                        datetime=time_range,
                       )
items = search.get_all_items()

stackstac

“Easier cloud native geoprocessing” - Generates a 4D datacube from a collection of STAC items

import stackstac
stack = stackstac.stack(items, bounds_latlon=bbox) * Can clip on load
>>> xarray.DataArray'stackstac-1b418a2d8beee2539c17d7af4d7720cb'
    time: 8
    band: 22
    y: 7972
    x: 12372
>>> dask.array chunksize=(1, 1, 1024, 1024), meta=np.ndarray
>>> Coordinates: (31)
>>> Attributes: (4)

Masking

mask_bitfields = [1, 2, 3, 4]  * dilated cloud, cirrus, cloud, cloud shadow
bitmask = 0
for field in mask_bitfields:
    bitmask |= 1 << field * Fun bit manipulation \>>

bin(bitmask)

qa = stack.sel(band="qa_pixel").astype("uint16")
bad = qa & bitmask  * just look at those 4 bits

good = stack.where(bad == 0)  * mask pixels where any one of those bits are set

NDVI TimeSeries

\[ NDVI = \frac{(nir - red)}{(nir + red)} \]

nir, red = stack.sel(band="nir08"), stack.sel(band="red")
ndvi = (nir - red) / (nir + red)
ndvi = ndvi.persist() * Dask compute

NDVI TimeSeries

xcen = -423850
ycen = 5100220
# select nearest point and plot timeserie
s = ndvi.sel(x=xcen, y=ycen, method="nearest").to_series()
s.plot(title='mean NDVI', figsize=(8,4), style='ko')

Planet API

Only way to bulk order Planet data.

Access to PlanetScope, SkySat, Rapid Eye, Sentinel 2, and LandSat

Orders API

Orders V2 api allows user to offload a lot of compute to Planet.

Provides band math, clipping, compositing, coregistering, harmonization, reprojecting, tiling

Orders API

Tools can be chained together

{"name": "toar and reproject",
  "source_type": "scenes",
    "products": [{
      "item_ids": [
        "20170614_113217_3163208_RapidEye-5"
      ],
      "item_type": "REOrthoTile",
      "product_bundle": "analytic"
    }],
  "tools": [{
      "toar": {
        "scale_factor": 10000
      }},
    {
      "reproject": {
        "projection": "WGS84",
        "kernel": "cubic"
}}}

Orders API

Need to write own JSON queries or use third party library

POrder is a 3rd party library to interact with their API

Questions?