(originally presented on Nov. 11th, 2022 as a part of NCSU CGA’s lunch & learn series)
all code examples can be found at github.com/ocsmit/raster-chef
Extends GeoJSON and consists of catalogs, collections, and items
The spec additionally defines a RESTful API
Top-level object which wraps other catalogs, collections, and items
{
  "stac_version": "1.0.0",
  "type": "Catalog",
  "id": "20201211_223832_CS2",
  "description": "A simple catalog example",
  "links": []
}
Defines a set of common fields to describe a group of Items that share properties and metadata.
{
  "stac_version": "1.0.0",
  "type": "Collection",
  "license": "ISC",
  "id": "20201211_223832_CS2",
  "description": "A simple collection example",
  "links": [],
  "extent": {},
  "summaries": {}
}
The most important part of the spec; contains the relevant metadata for a scene and links to its assets
{
  "stac_version": "1.0.0",
  "type": "Feature",
  "id": "20201211_223832_CS2",
  "bbox": [],
  "geometry": {},
  "properties": {},
  "collection": "simple-collection",
  "links": [],
  "assets": {}
}
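These objects can also be read and navigated programmatically; a minimal sketch with the pystac library (not covered in the talk), assuming an Item like the one above saved locally as item.json:

import pystac

# load a single Item and inspect its metadata and assets
item = pystac.Item.from_file("item.json")
print(item.id, item.bbox)
for name, asset in item.assets.items():
    print(name, asset.href)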
Defined as a REST interface which generates GeoJSON of items matching a user request.
The search endpoint is `{catalog root}/search`. The user requests a date range and a bounding box, e.g.
{
  "bbox": [5.5, 46, 8, 47.4],
  "datetime": "2018-02-12T00:00:00Z/2018-03-18T12:31:12Z"
}
and receives all items in the catalog which intersect the bounding box and time range.
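A raw search is just an HTTP POST of that body; a sketch with the requests library (the API root below is hypothetical, and real services may also require authentication):

import requests

STAC_API = "https://example.com/stac"  # hypothetical catalog root
body = {
    "bbox": [5.5, 46, 8, 47.4],
    "datetime": "2018-02-12T00:00:00Z/2018-03-18T12:31:12Z",
}
resp = requests.post(f"{STAC_API}/search", json=body)
items = resp.json()["features"]  # a GeoJSON FeatureCollection of Items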
Evolving landscape, but primarily accessed with `pystac-client`
NASA EarthData is NASA’s premier data repository and a part of the Earth Observing System Data and Information System (EOSDIS).
Contains 59 PB of data. Expected to grow to 250+ PB by 2025.
Built around the Common Metadata Repository (CMR), a high-performance metadata cataloging system.
Exposes `CMR-STAC`, which provides access to the entire catalog through a STAC-compliant API. It has 54 separate clients containing different data.
Harmonized Landsat Sentinel-2 (HLS) is accessed via the `LPCLOUD` client
from pystac_client import Client

STAC_URL = "https://cmr.earthdata.nasa.gov/stac"
catalog = Client.open(f"{STAC_URL}/LPCLOUD/")
and is split between two collections for the gridded Sentinel 2 and Landsat 8
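The collection ids can be discovered from the client itself; a small sketch (iterating the full listing may take a moment):

# list the HLS collections exposed by LPCLOUD
for collection in catalog.get_collections():
    if collection.id.startswith("HLS"):
        print(collection.id)  # HLSL30.v2.0, HLSS30.v2.0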
Parse the GeoJSON describing our area of interest
import json

with open(GEOJSON_PATH, "r") as fp:
    geojson = json.load(fp)
geometry = geojson["features"][0]["geometry"]
Create the search query
search = catalog.search(
    collections=["HLSL30.v2.0", "HLSS30.v2.0"],
    intersects=geometry,
    datetime="2022-06/2022-06",
    limit=100,
)
The search query returns STAC JSON whose assets link to each band’s COG in the cloud, e.g.
...
{
  "B05": {
    "href": "https://data.lpdaac.earthdatacloud.nasa.gov/.../HLS.S30.T17TMF.2022154T161829.v2.0.B05.tif"
  }
},
...
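The asset hrefs can be pulled straight out of the returned items; a small sketch using the B05 key from the example above:

# pystac-client resolves paging for us
items = list(search.items())
b05_hrefs = [item.assets["B05"].href for item in items if "B05" in item.assets]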
The COG memory layout allows us to request only the bytes covering our extent
import rasterio
import rasterio.windows
from rasterio.features import bounds
from pyproj import Transformer

# cog_href: URL of a single COG asset, e.g. one of the B05 hrefs above
geo_fp = rasterio.open(cog_href)

bbox = bounds(geometry)
coord_transformer = Transformer.from_crs("epsg:4326", geo_fp.crs)
# calculate pixels to be streamed in cog
coord_upper_left = coord_transformer.transform(bbox[3], bbox[0])
coord_lower_right = coord_transformer.transform(bbox[1], bbox[2])
pixel_upper_left = geo_fp.index(coord_upper_left[0], coord_upper_left[1])
pixel_lower_right = geo_fp.index(coord_lower_right[0], coord_lower_right[1])
# make http range request only for bytes in window
window = rasterio.windows.Window.from_slices(
    (pixel_upper_left[0], pixel_lower_right[0]),
    (pixel_upper_left[1], pixel_lower_right[1]),
)
subset = geo_fp.read(1, window=window)
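The subset can then be written back out as its own georeferenced GeoTIFF, which is what the downloader’s “write to disk” step below amounts to; a sketch (out_path is a hypothetical output filename):

# carry the source profile over, shifted and resized to the window
profile = geo_fp.profile.copy()
profile.update(
    height=subset.shape[0],
    width=subset.shape[1],
    transform=geo_fp.window_transform(window),
    count=1,
)
with rasterio.open(out_path, "w", **profile) as dst:
    dst.write(subset, 1)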
Utilizing multiple cores to download data (full script in the raster-chef repo linked above)
import multiprocessing as mp

def downloader(item, geometry, outpath):
    ...  # subset and write to disk

if __name__ == "__main__":
    ...  # STAC query to get assets
    with mp.Pool(5) as pool:
        pool.starmap(
            downloader, [(item, geometry, outpath) for item in items]
        )
While the example above downloads data to disk, with COGs we can keep the data in memory instead
subset = geo_fp.read(1, window = window) # This is just a NumPy array
and perform our calculations without writing to disk
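For example, red and NIR subsets streamed through the same window can be combined into NDVI entirely in memory; a sketch (red_fp and nir_fp are hypothetical datasets opened from the red and NIR COG hrefs):

# windowed reads return plain NumPy arrays
red = red_fp.read(1, window=window).astype("float32")
nir = nir_fp.read(1, window=window).astype("float32")
ndvi = (nir - red) / (nir + red)  # computed without writing anything to disk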
Microsoft’s Planetary Computer is built around the STAC API and uses open standards and software.
Contains 24 PB of data and is growing rapidly. Hosts the full analysis-ready Sentinel-2 catalog.
Offers “Hubs” for cloud computing
Querying data is almost the same as with EarthData
import planetary_computer
import pystac_client

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)
time_range = "2009-01/2020-01"
search = catalog.search(
    collections=["landsat-c2-l2"],
    bbox=bbox,
    datetime=time_range,
)
items = search.get_all_items()
stackstac: “easier cloud native geoprocessing” - generates a 4D datacube (time, band, y, x) from a collection of STAC items
import stackstac

stack = stackstac.stack(items, bounds_latlon=bbox)  # can clip on load
<xarray.DataArray 'stackstac-1b418a2d8beee2539c17d7af4d7720cb' (time: 8, band: 22, y: 7972, x: 12372)>
dask.array<chunksize=(1, 1, 1024, 1024), meta=np.ndarray>
Coordinates: (31)
Attributes: (4)
mask_bitfields = [1, 2, 3, 4]  # dilated cloud, cirrus, cloud, cloud shadow
bitmask = 0
for field in mask_bitfields:
    bitmask |= 1 << field  # fun bit manipulation
bin(bitmask)  # '0b11110'
qa = stack.sel(band="qa_pixel").astype("uint16")
bad = qa & bitmask  # just look at those 4 bits
good = stack.where(bad == 0)  # mask pixels where any one of those bits is set
\[ NDVI = \frac{(nir - red)}{(nir + red)} \]
nir, red = good.sel(band="nir08"), good.sel(band="red")
ndvi = (nir - red) / (nir + red)
ndvi = ndvi.persist()  # Dask compute
xcen = -423850
ycen = 5100220
# select the nearest point and plot its time series
s = ndvi.sel(x=xcen, y=ycen, method="nearest").to_series()
s.plot(title='NDVI time series', figsize=(8, 4), style='ko')
Planet’s Orders API is the only way to bulk order Planet data.
Access to PlanetScope, SkySat, RapidEye, Sentinel-2, and Landsat
The Orders v2 API allows users to offload a lot of compute to Planet.
Provides band math, clipping, compositing, coregistering, harmonization, reprojecting, tiling
Tools can be chained together, e.g. a top-of-atmosphere reflectance (TOAR) correction followed by a reprojection
{"name": "toar and reproject",
"source_type": "scenes",
"products": [{
"item_ids": [
"20170614_113217_3163208_RapidEye-5"
],
"item_type": "REOrthoTile",
"product_bundle": "analytic"
}],
"tools": [{
"toar": {
"scale_factor": 10000
}},
{
"reproject": {
"projection": "WGS84",
"kernel": "cubic"
}}}
You need to write your own JSON queries or use a third-party library
porder is a third-party library for interacting with the Orders API
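Writing your own query amounts to POSTing the order JSON to the Orders v2 endpoint; a sketch (the endpoint URL, basic-auth scheme, and order_request.json file are assumptions, check Planet’s documentation):

import json
import os
import requests

ORDERS_URL = "https://api.planet.com/compute/ops/orders/v2"  # assumed endpoint
auth = (os.environ["PL_API_KEY"], "")  # API key as the basic-auth username

# the "toar and reproject" request from above, saved to a hypothetical local file
with open("order_request.json") as fp:
    order_request = json.load(fp)

resp = requests.post(ORDERS_URL, json=order_request, auth=auth)
order = resp.json()
print(order["id"], order["state"])  # poll this id until the order is ready to download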
Questions?