Through sampling experiments on high-resolution LiDAR snow depth observations at six separate 1.17-km^{2} sites in the Colorado Rocky Mountains, we provide new perspectives on several issues affecting the regression estimation of snow depth from sparse observations. We measure the effects of observation count, random selection of observations, quality of predictor variables, and cross-validation procedures using three skill metrics: percent error in total snow volume, root mean squared error (*RMSE*), and *R*^{2}. We use extremes of predictor quality to bound its effect: how do predictors downloaded from the internet perform against more accurate predictors measured by LiDAR? Although cross validation remains the only option for validating inference from sparse observations in practice, in our experiments the full set of LiDAR-measured snow depths can be treated as the ‘true’ spatial distribution and used to quantify cross-validation bias at the spatial scale of inference. We model at 30-m resolution, the resolution of readily available predictors and a common choice in the literature. We also compare three regression models and briefly examine how sampling design affects model skill.
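
The three skill metrics can be sketched as follows. This is a minimal illustration, not the study's code: it assumes predicted and true depths are given per 30-m grid cell of equal area (so the area factor cancels and the volume ratio reduces to a ratio of summed depths), and it assumes *R*^{2} is the coefficient of determination against the true depths; the abstract does not say which *R*^{2} definition is used (the squared Pearson correlation is another common choice). The function names and toy depths are hypothetical.

```python
import math
from typing import Sequence


def percent_volume_error(pred: Sequence[float], true: Sequence[float]) -> float:
    """Percent error in total snow volume of `pred` relative to `true`.

    Assumes every grid cell has the same area (e.g. 30 m x 30 m), so the
    area factor cancels and the volume ratio equals the ratio of summed depths.
    """
    pred_total = sum(pred)
    true_total = sum(true)
    return 100.0 * (pred_total - true_total) / true_total


def rmse(pred: Sequence[float], true: Sequence[float]) -> float:
    """Root mean squared error between predicted and true cell depths."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def r_squared(pred: Sequence[float], true: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot, against the true depths."""
    mean_true = sum(true) / len(true)
    ss_res = sum((t - p) ** 2 for p, t in zip(pred, true))
    ss_tot = sum((t - mean_true) ** 2 for t in true)
    return 1.0 - ss_res / ss_tot


# Toy depths in metres (made-up numbers, not data from the study).
true_depths = [1.0, 2.0, 3.0, 4.0]
pred_depths = [1.5, 2.5, 3.5, 4.5]
print(percent_volume_error(pred_depths, true_depths))  # 20.0
print(rmse(pred_depths, true_depths))                  # 0.5
print(r_squared(pred_depths, true_depths))
```

A uniform 0.5-m overestimate gives a 20 % volume error and an *RMSE* of 0.5 m while still explaining most of the spatial variance, which shows why total volume and spatial distribution are scored by separate metrics.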

The results quantify the primary dependence of each skill metric on observation count, which spans tens to thousands of observations, doubling at each step from 25 up to 3200. Whereas the uncertainty in percent error of true total snow volume that results from random selection of observations is typically well constrained by 100–200 observations, the inferred spatial distribution (*R*^{2}) remains considerably uncertain even at medium observation counts (200–800). We show that percent error in total snow volume is not sensitive to predictor quality, whereas *RMSE* and *R*^{2} (measures of spatial distribution) often depend critically on it. Inaccuracies in downloaded predictors (most often the vegetation predictors) can easily require a fourfold increase in observation count to match the *RMSE* and *R*^{2} scores obtained with LiDAR-measured predictors.
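
The structure of the observation-count experiment (counts doubling from 25 to 3200, with repeated random selection at each count to measure the spread of the volume error) can be sketched as below. This is a hypothetical illustration under strong simplifying assumptions: the depth population is synthetic (gamma-distributed), not LiDAR data; the number of repeats (50) is arbitrary; and the sample mean stands in for the regression models actually used in the study. Because the estimated total is the mean depth times a fixed cell count, the percent error of the mean equals the percent error of the total.

```python
import random
import statistics
from typing import List, Sequence


def observation_counts(start: int = 25, stop: int = 3200) -> List[int]:
    """Observation counts that double at each step from `start` up to `stop`."""
    counts = []
    n = start
    while n <= stop:
        counts.append(n)
        n *= 2
    return counts


def volume_error_spread(population: Sequence[float], n: int, repeats: int,
                        rng: random.Random) -> float:
    """Standard deviation, across `repeats` random draws of `n` observations,
    of the percent error in mean (hence total) depth.

    Stand-in estimator: the sample mean (the study itself uses regression models).
    """
    true_mean = statistics.fmean(population)
    errors = []
    for _ in range(repeats):
        sample = rng.sample(population, n)  # random selection without replacement
        errors.append(100.0 * (statistics.fmean(sample) - true_mean) / true_mean)
    return statistics.stdev(errors)


rng = random.Random(42)
# Hypothetical synthetic depths (gamma-distributed, mean 1 m); not LiDAR data.
population = [rng.gammavariate(2.0, 0.5) for _ in range(20000)]
for n in observation_counts():
    print(n, round(volume_error_spread(population, n, repeats=50, rng=rng), 2))
```

Repeating each draw many times is what separates the uncertainty due to random selection of observations from the systematic dependence on observation count; with a regression model in place of the sample mean, the same loop would also track the spread of *RMSE* and *R*^{2}.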

Under cross validation, the *RMSE* and *R*^{2} skill measures are consistently biased towards poorer scores than those obtained from true validation. This bias arises primarily because snow depth varies more at the spatial scale of the point observations used for cross validation than at the 30-m resolution of the model. The magnitude of this bias depends on individual site characteristics, observation count (for our experimental design), and sampling design. Sampling designs that maximize independent information maximize cross-validation bias, but they also maximize true *R*^{2}. The bagging tree model generally outperforms the other two regression models on several criteria.

Finally, we discuss and recommend the use of LiDAR in conjunction with regression modelling to advance understanding of the spatial distribution of snow depth at scales of thousands of square kilometres.

Project HOTSPOT has completed three drill holes. (1) The Kimama site is located along the central volcanic axis of the Snake River Plain (SRP); our goal here was to sample a long-term record of basaltic volcanism in the wake of the SRP hotspot. (2) The Kimberly site is located near the margin of the plain; our goal here was to sample a record of high-temperature rhyolite volcanism associated with the underlying plume. This site was chosen to form a nominally continuous record of volcanism when paired with the Kimama site. (3) The Mountain Home site is located in the western plain; our goal here was to sample the Pliocene-Pleistocene transition in lake sediments at this site and to sample the older basalts that underlie the sediments.

We report here our initial results for each site, along with some of the geophysical logging studies carried out as part of this project.
