Wombling

Tags :: Spatial Statistics

Wombling is the general term for the class of boundary analysis methods that seek to discover important boundaries or barriers, that is, places that reveal sharp changes in spatially oriented variables and the underlying influences responsible (1).

These methods tend to be regarded as “distribution-free”, using approximate randomization to generate empirical distributions for test statistics and foregoing any distributional assumptions about the underlying model.

Approaches depend on whether the data are point-level (geostatistical) or areal (lattice). For example, with point-level data the boundaries can naturally be obtained by locating the points of steepest ascent/descent on the fitted spatial surface.

Lattice wombling

Let \(\mathcal{D} = \{Y(s_1), \dots, Y(s_{h \times k})\}\), with \(h, k \in \mathbb{Z}^+\), be the set of observations of the function \(Y:\mathbb{R}^2 \to \mathbb{R}^n\) at locations \(s_i\) regularly spaced along \(h\) columns and \(k\) rows. The Womble algorithm calculates the rate of change \(m\) of the variable \(Z\) from the partial derivatives in the \(x\) and \(y\) directions. \(Z\) is a set of 4 neighboring points from \(\mathcal{D}\) which form a square. The coordinates of \(Z\) are rescaled to lie between 0 and 1, so the four corners sit at \(\{0,1\}\times\{0,1\}\).

The rate of change is defined as \[ m = \sqrt{\left(\frac{\partial f(x,y)}{\partial x}\right)^2 + \left(\frac{\partial f(x,y)}{\partial y}\right)^2} \] where \(f(x,y)\) is the bilinear function \[ f(x,y) = Z_1(1-x)(1-y) + Z_2 x(1-y) + Z_3xy + Z_4(1-x)y \]

The partial derivatives are straightforward to find:

\begin{aligned} f_x & = Z_1(1-y) \cdot (-1) + Z_2(1-y) \cdot 1 + Z_3y \cdot 1 + Z_4y \cdot (-1) \\ & = -Z_1(1-y) + Z_2(1-y) + Z_3y - Z_4y \\ & = (Z_1 - Z_2 + Z_3 - Z_4)y + Z_2 - Z_1 \end{aligned}

\begin{aligned} f_y & = Z_1(1-x) \cdot (-1) + Z_2x \cdot (-1) + Z_3x \cdot 1 + Z_4(1-x) \cdot 1 \\ & = -Z_1(1-x) - Z_2x + Z_3x + Z_4(1-x) \\ & = (Z_1 - Z_2 + Z_3 - Z_4)x + Z_4 - Z_1 \end{aligned}

The function \(m\) is evaluated at the centroid of the 4 points, which is \(x=y=0.5\) since the coordinates are scaled to \([0,1]\). The result is an \((h-1) \times (k-1)\) grid (one fewer column and one fewer row), because each value is interpolated between four neighboring points.
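As a rough illustration of the centroid calculation, the sketch below evaluates \(m\) for every 2×2 cell of a regular grid using NumPy. Differentiating the bilinear function \(f(x,y)\) and setting \(x=y=0.5\) gives \(f_x = \tfrac{1}{2}\left[(Z_2 - Z_1) + (Z_3 - Z_4)\right]\) and \(f_y = \tfrac{1}{2}\left[(Z_4 - Z_1) + (Z_3 - Z_2)\right]\), which is what the code computes. The function name `womble_lattice` and the convention that array rows index \(y\) and columns index \(x\) are assumptions made for this example, not part of any published implementation.

```python
import numpy as np

def womble_lattice(values):
    """Rate of change m at the centroid of every 2x2 cell of a regular grid.

    Assumption: axis 0 of `values` indexes y (rows), axis 1 indexes x (columns).
    """
    v = np.asarray(values, dtype=float)
    # Corner labels follow f(x, y): Z1 at (0,0), Z2 at (1,0), Z3 at (1,1), Z4 at (0,1).
    z1 = v[:-1, :-1]
    z2 = v[:-1, 1:]
    z3 = v[1:, 1:]
    z4 = v[1:, :-1]
    # Partial derivatives of the bilinear interpolant at x = y = 0.5.
    fx = 0.5 * ((z2 - z1) + (z3 - z4))
    fy = 0.5 * ((z4 - z1) + (z3 - z2))
    return np.hypot(fx, fy)  # sqrt(fx**2 + fy**2)

# Hypothetical 3x4 grid lying on the plane Y = x + 2y.
grid = np.add.outer(2 * np.arange(3), np.arange(4))
m = womble_lattice(grid)
```

For an input with `k` rows and `h` columns the output has shape `(k-1, h-1)`. On a plane such as the one above the rate of change is constant; candidate boundary cells in real data would be those where `m` is unusually large.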

Areal wombling w/ adjacency modeling

Lu et al. (2007) introduce a model which allows the data to determine the degree and nature of spatial smoothing for areal wombling through a randomly weighted hierarchical model. Rather than being fixed in advance, the neighborhood structure is estimated from each region's process and from variables that describe how similar the regions are.

The core of the method is the randomly weighted Conditional Autoregressive Model, \(CAR(\tau, W)\) or simply CARw, where instead of being fixed the weights are modeled as \[ w_{ij}|p_{ij} \stackrel{\text{indep}}{\sim} \text{Bernoulli}(p_{ij}), \text{ where } \log \left(\frac{p_{ij}}{1 - p_{ij}}\right) = \mathbf{z}^{\prime}_{ij}\mathcal{y} \] where \(\mathbf{z}_{ij}\) is a set of known features of the \(\{i, j\}\)th region pair, and \(\mathcal{y}\) is the corresponding parameter vector. Regions \(i\) and \(j\) are neighbors with probability \(p_{ij}\) provided they share a common boundary. \(w_{ij} = 0\) for all nonadjacent regions.
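To make the weight model concrete, the following sketch draws one set of weights \(w_{ij}\) from the Bernoulli/logit specification for a handful of region pairs. It only illustrates the weight distribution for a given parameter vector; it is not the Gibbs sampler from the paper. The function name `sample_adjacency_weights`, the argument names, and the toy covariates (an intercept and a centroid distance) are all hypothetical.

```python
import numpy as np

def sample_adjacency_weights(z, coef, adjacent, rng):
    """Draw w_ij ~ Bernoulli(p_ij) with logit(p_ij) = z_ij' coef.

    z        : (n_pairs, q) array of pair-level covariates z_ij
    coef     : (q,) parameter vector of the logit model
    adjacent : (n_pairs,) boolean, True if the pair shares a boundary
    """
    eta = np.asarray(z, dtype=float) @ np.asarray(coef, dtype=float)
    p = 1.0 / (1.0 + np.exp(-eta))        # inverse logit
    p = np.where(adjacent, p, 0.0)        # nonadjacent pairs are never neighbors
    w = (rng.random(p.shape) < p).astype(int)
    return p, w

rng = np.random.default_rng(0)
# Toy covariates: column 0 is an intercept, column 1 a centroid distance.
z = np.array([[1.0, 0.5], [1.0, 2.0], [1.0, 0.0]])
adjacent = np.array([True, False, True])
p, w = sample_adjacency_weights(z, coef=[0.0, 0.0], adjacent=adjacent, rng=rng)
```

With a zero parameter vector every adjacent pair has \(p_{ij} = 0.5\), while the nonadjacent pair is fixed at \(w_{ij} = 0\). In the full model the parameter vector would be updated within the sampler, not supplied by hand.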

The covariates in \(\mathbf{z}_{ij}\) can be just about anything that carries information about the differences between regions \(i\) and \(j\), e.g. the distance between centroids, auxiliary topological information such as the presence of a river or farmland, or sociodemographic information in the case of health-related studies.

Following a Bayesian approach, the CARw model is used with a Gibbs sampler to draw posterior estimates for the parameters. More details on the estimated parameters and their forms can be found in the paper.

The real-data and simulation studies show the method is advantageous because it allows the data and observed covariate information to help determine the degree of spatial smoothing, with the CARw approach generally producing better agreement between true and wombled boundaries. However, no existing software could fit the model at the time of writing, and the link provided to the authors' code is dead.

One limitation is that the CARw model cannot distinguish between a “continuation” and a “cross” boundary segment. Additionally, in the setting of an irregular lattice, it is not clear how neighbor groups should be defined, nor how this could be managed computationally.

References

[1] Lu, Haolan and Reilly, Cavan S. and Banerjee, Sudipto and Carlin, Bradley P., Bayesian areal wombling via adjacency modeling, Springer Science and Business Media LLC, 2007.

