sgsPy
structurally guided sampling
Loading...
Searching...
No Matches

Classes

struct  sgs::strat::OptimAllocationDataManager
class  sgs::strat::IndexStorageVectors
struct  sgs::strat::FocalWindow

Functions

std::vector< int64_t > sgs::strat::calculateAllocation (int64_t numSamples, std::string allocation, std::vector< int64_t > strataCounts, std::vector< double > weights, int64_t numPixels)
template<typename T>
std::vector< int64_t > sgs::strat::processBlocksStratRandom (int numSamples, int numStrata, helper::RasterBandMetaData &band, access::Access &access, existing::Existing &existing, IndexStorageVectors &indices, std::vector< std::vector< OGRPoint > > &existingSamples, uint64_t multiplier, xso::xoshiro_4x64_plus &rng, std::string allocation, OptimAllocationDataManager &optim, std::vector< double > weights, int width, int height)
template<typename T>
std::vector< int64_t > sgs::strat::processBlocksStratQueinnec (int numSamples, int numStrata, helper::RasterBandMetaData &band, access::Access &access, existing::Existing &existing, IndexStorageVectors &indices, IndexStorageVectors &queinnecIndices, FocalWindow &fw, std::vector< std::vector< OGRPoint > > &existingSamples, uint64_t multiplier, uint64_t queinnecMultiplier, xso::xoshiro_4x64_plus &rng, std::string allocation, OptimAllocationDataManager &optim, std::vector< double > weights, int width, int height)
std::tuple< std::vector< std::vector< double > >, vector::GDALVectorWrapper *, size_t > sgs::strat::strat (raster::GDALRasterWrapper *p_raster, int bandNum, int64_t numSamples, int64_t numStrata, std::string allocation, std::vector< double > weights, raster::GDALRasterWrapper *p_mraster, int mrastBandNum, std::string method, int wrow, int wcol, double mindist, vector::GDALVectorWrapper *p_existing, bool force, vector::GDALVectorWrapper *p_access, std::string layerName, double buffInner, double buffOuter, std::vector< std::pair< std::string, int > > mapStratMapping, bool plot, std::string filename, std::string tempFolder)

Detailed Description

Function Documentation

◆ calculateAllocation()

std::vector< int64_t > sgs::strat::calculateAllocation ( int64_t numSamples,
std::string allocation,
std::vector< int64_t > strataCounts,
std::vector< double > weights,
int64_t numPixels )

Helper function which calculates the count of values for each strata depending on the allocation method.

Parameters
std::stringallocation method
std::vector<size_t>&sizes of each strata
std::vector<double>weights of each strata
size_ttotal number of pixels
Returns
std::vector<size_t> counts of each stratum

◆ processBlocksStratQueinnec()

template<typename T>
std::vector< int64_t > sgs::strat::processBlocksStratQueinnec ( int numSamples,
int numStrata,
helper::RasterBandMetaData & band,
access::Access & access,
existing::Existing & existing,
IndexStorageVectors & indices,
IndexStorageVectors & queinnecIndices,
FocalWindow & fw,
std::vector< std::vector< OGRPoint > > & existingSamples,
uint64_t multiplier,
uint64_t queinnecMultiplier,
xso::xoshiro_4x64_plus & rng,
std::string allocation,
OptimAllocationDataManager & optim,
std::vector< double > weights,
int width,
int height )

This function processes the strat raster in blocks using the 'Queinnec' method. In the Queinnec method, pixels which are surrounded by pixels of the same strata are prioritized for sampling over pixels which aren't.

First, the block size is adjusted to be scanlines with a height of either the original y block size, or 128. This is done because the raster needs to be read as scanlines for the FocalWindow struct to work well and not get any more complicated than it already is. However, we want the raster IO to still be as efficient as possible, so chunks of the raster are still read on block boundaries, just containing a lot more than just 1 block.

Next, memory is allocated and structs are created for random value calculation and optim allocation. More memory is allocated than just 1 of the chunks (xBlockSize * yBlockSize), this is because usign the focal window struct method, we may have to read in some of the final few pixels of the previous chunk, to the start of the new chunk.

Next, iterate thorugh the blocks within the raster. For each block:

  • read block from the strat raster
  • calculate rand values (both normal and queinnec) for the new block
  • iterate through the pixels in the block

The newBlockStart value is used due to the aforementioned padding which may be placed on the top of the new block. This newBlockStart value is the index of the block which is the start of the new block. This value will be 0 in the case of the first block, but different otherwise.

Rather than using a typical nested for loop, one for the vertical and one for the horizontal, we use one for the vertical and three for the horizontal. This is because there are areas on the left and right of the raster which will never be eligible as queinnec pixels because their focal window includes pixels which go off the edge of the raster. And, critically, trying to calculate whether they are horizontally eligible as a queinnec pixel would result in checking a pixel which either doesn't exist or is on the opposite side of the raster.

For each pixel horizontally not eligible to be a queinnec pixel:

  • ignore the pixel if it is a nan value
  • update the optim variances if optim allocation is used
  • update total sample counts
  • add index to existing vector if the pixel is part of an existing sample network
  • if the pixel is both accessible and not already sampled, update the index storage vectors.

For pixel which is horizontally eligible to be a queinnec pixel:

  • do all of the same as those non-eligible pixels in addition to...
  • set focal window valid vector for the current pixel
  • set focal window matrix vector for the current pixel by checking horizontally adjacent pixels
  • check the focal window matrix for the pixel which will have just had all of it's vertical pixels horizontally checked. Calling this check on a pixel which would have a negative y will always result in a false due to the focal window matrix being automatically set to false. If this check succeeds – check vertical pixels to see if they are the same and if so update the queinnec index storage vectors.

Once all blocks have been processed, free any allocated memory and calculate the sample allocation per strata, and return this allocation.

Parameters
intnumSamples
intnumStreata
RasterBandMetaData&band
Access&access
Existing&existing
IndexStorageVectors&indices
IndexStorageVectors&queinnecIndices
FocalWindow&fw
std::vector<std::vector<OGRPoint>>&existingSamples
uint64_tmultiplier
uint64_tqueinnecMultiplier
xso::xoshiro_4x64_plus&rng
std::stringallocation
OptimAllocatoinDataManager&optim
std::vector<double>weights
intwidth
intheight
Returns
std::vector<int64_t>

◆ processBlocksStratRandom()

template<typename T>
std::vector< int64_t > sgs::strat::processBlocksStratRandom ( int numSamples,
int numStrata,
helper::RasterBandMetaData & band,
access::Access & access,
existing::Existing & existing,
IndexStorageVectors & indices,
std::vector< std::vector< OGRPoint > > & existingSamples,
uint64_t multiplier,
xso::xoshiro_4x64_plus & rng,
std::string allocation,
OptimAllocationDataManager & optim,
std::vector< double > weights,
int width,
int height )

This function processes the strat raster in blocks using the 'random' method. In the random method, every pixel in a particular strata has the same priority of being added as any other pixel in that strata.

First, memory is allocated and structures are initialized for random value calculation and optim allocation.

Next, iterate through the blocks within the raster. For each block:

  • read the strat raster (and potentially access & optim rasters) into memory
  • calculate rand values for the new block
  • iterate through the pixels in the block

For each pixel:

  • ignore the pixel if it is a nan value
  • update the optim variances if optim allocation used
  • update total sample counts
  • add index to existing vector if the pixel is part of an existing sample network
  • If the pixel is both accessible and not already sampled, update the index storage vectors

Once all blocks have been processed, free any allocated memory and calculate the sample allocation per strata, and return this allocation.

Parameters
intnumSamples
intnumStrata
RasterBandMetaData&band
Access&access
Existing&existing
IndexStorageVectors&indices
std::vector<std::vector<OGRPoint>>&existingSamples
uint64_tmultiplier
xoshiro_4x64_plus&rng
std::stringallocation
OptimAllocationDataManager&optim
std::vector<double>weights
intwidth
intheight
Returns
std::vector<int64_t>

◆ strat()

std::tuple< std::vector< std::vector< double > >, vector::GDALVectorWrapper *, size_t > sgs::strat::strat ( raster::GDALRasterWrapper * p_raster,
int bandNum,
int64_t numSamples,
int64_t numStrata,
std::string allocation,
std::vector< double > weights,
raster::GDALRasterWrapper * p_mraster,
int mrastBandNum,
std::string method,
int wrow,
int wcol,
double mindist,
vector::GDALVectorWrapper * p_existing,
bool force,
vector::GDALVectorWrapper * p_access,
std::string layerName,
double buffInner,
double buffOuter,
std::vector< std::pair< std::string, int > > mapStratMapping,
bool plot,
std::string filename,
std::string tempFolder )

This function conducts stratified random sampling on the provided stratified raster.

First, metadata is acquired in the strat raster band, which is to be read to determine sample strata and to ensure samples don't occur on nan values.

Next, the output vector dataset is creates as an in-memory dataset. If the user specifies a filename, this in-memory dataset will be written to disk in a different format after all points have been added.

An Access struct is created, which creates a raster dataset containing a rasterized version of access buffers. This raster will be 1 over accessible areas. In the case where there is no access vector given, the structs 'used' member will be false and no processing or rasterization will be done.

An Existing struct is created, which retains information on already existing sample points passed in the form of a vector dataset. The points are iterated through and added to the output dataset. The points are also added to a set, and during iteration the indexes of every pixel will be checked against this set to ensure there are no duplicate pixels. In the case whre there is no existing vector given, the structs 'used' member will be false and no processing will be done.

Next, a rng() function is created usign the xoshiro library, the specific randm number generator is the xoshrio256++ https://vigna.di.unimi.it/ftp/papers/ScrambledLinear.pdf

The impetus behind usign the rng() function to determine which pixels should be added DURING iteration, rather than afterwards, is it removes the necessity of storing every available pixel, which quickly becomes extrordinarily inefficient for large rasters. Rather, for pixels which are accessible, not nan, and not already existing, there is a pre-determined percentage chance to be stored which uses this random number generator. An over-estimation for the percentage chance is made, because it is better to have too many than not enough possible options to sample from. This over-estimation might result in the storage of 2x-3x extra pixels rather than the many orders of magnitude extra storage of adding all pixels. The calculation for this percentage is done and explained in detail in the getProbabilityMultiplier() function.

Next, the raster is processed in blocks either using the 'random' or 'Queinnec' methods, and the return of those functions contains the allocation of samples per strata.

Strata are iterated through, with samples being added according to their total allocation. First, existing pixels are added, all of which are added in the case where the force parameter is true. Next, queinnec pixels are added if the queinnec method is used. Finally, remaining random pixels are added.

Parameters
GDALRasterWrapper*p_raster
intbandNum
int64_tnumSamples
int64_tnumStrata
std::stringallocation
std::vector<double>weights
GDALRasterWrapper*p_mraster
intmrasterBandNum
std::stringmethod
intwrow
intwcol
doublemindist
GDALVectorWrapper*p_existing
boolforce
GDALVectorWrapper*p_access
std::stringlayerName
doublebuffInner
doublebuffOuter
std::vector<std::pair<std:string,int>>mapStratMapping
boolplot
std::stringfilename
std::stringtempFolder
Returns
std::tuple< std::vector<std::vector<double>>, GDALVectorWrapper *, size_t >

existing.used && mapped