sgsPy
structurally guided sampling

Functions

template<typename T>
bool sgs::srs::getRandomIndices (helper::RasterBandMetaData &band, int width, int height, int numSamples, access::Access &access, existing::Existing &existing, std::vector< helper::Index > &indices, xso::xoshiro_4x64_plus rng)
template<typename T>
void sgs::srs::processBlock (helper::RasterBandMetaData &band, access::Access &access, existing::Existing &existing, std::vector< helper::Index > &indices, helper::RandValController &rand, int xBlock, int yBlock, int xValid, int yValid)
std::tuple< std::vector< std::vector< double > >, vector::GDALVectorWrapper *, size_t > sgs::srs::srs (raster::GDALRasterWrapper *p_raster, size_t numSamples, double mindist, vector::GDALVectorWrapper *p_existing, vector::GDALVectorWrapper *p_access, std::string layerName, double buffInner, double buffOuter, bool plot, std::string tempFolder, std::string filename)

Detailed Description

Function Documentation

◆ getRandomIndices()

template<typename T>
bool sgs::srs::getRandomIndices ( helper::RasterBandMetaData & band,
int width,
int height,
int numSamples,
access::Access & access,
existing::Existing & existing,
std::vector< helper::Index > & indices,
xso::xoshiro_4x64_plus rng )
inline

This function generates random index values to sample. This method is fast in many circumstances, because it does not require the entire raster to be read.

However, in cases where there are a very large number of samples, or a low percentage of the raster is sampleable, it may be slower than reading through the entire raster. This is because calling the RasterIO function incurs significant overhead, especially in random access patterns.
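
The random-access idea described above can be illustrated with a minimal sketch. This is not the sgsPy implementation: `PixelIndex`, `drawRandomIndices`, `isValid`, and `maxAttempts` are hypothetical names introduced for illustration, `isValid` stands in for the nodata, access, and existing-sample checks, and `std::mt19937_64` is used in place of the xoshiro generator.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <random>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for helper::Index: a pixel position in the raster.
struct PixelIndex {
    int x;
    int y;
};

// Draws random pixel positions until numSamples valid, distinct pixels are
// found or maxAttempts draws have been made. Returns true on success.
// isValid is an assumed callback replacing the real per-pixel checks.
inline bool drawRandomIndices(int width, int height, int numSamples,
                              const std::function<bool(int, int)>& isValid,
                              std::mt19937_64& rng, int maxAttempts,
                              std::vector<PixelIndex>& indices) {
    assert(width > 0 && height > 0);
    std::uniform_int_distribution<std::int64_t> dist(
        0, static_cast<std::int64_t>(width) * height - 1);
    std::unordered_set<std::int64_t> seen;  // flat indices already drawn
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        if (static_cast<int>(indices.size()) >= numSamples) break;
        const std::int64_t flat = dist(rng);
        if (!seen.insert(flat).second) continue;  // drawn before, skip
        const int x = static_cast<int>(flat % width);
        const int y = static_cast<int>(flat / width);
        if (isValid(x, y)) indices.push_back({x, y});
    }
    return static_cast<int>(indices.size()) >= numSamples;
}
```

In this sketch a `false` return signals that too few valid pixels were found within the attempt budget, which is the situation where reading the whole raster may be the better strategy.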

Parameters
helper::RasterBandMetaData& band
int width
int height
int numSamples
access::Access& access
existing::Existing& existing
std::vector<helper::Index>& indices
xso::xoshiro_4x64_plus rng
Returns
bool

◆ processBlock()

template<typename T>
void sgs::srs::processBlock ( helper::RasterBandMetaData & band,
access::Access & access,
existing::Existing & existing,
std::vector< helper::Index > & indices,
helper::RandValController & rand,
int xBlock,
int yBlock,
int xValid,
int yValid )
inline

This is a helper function for processing a block of the raster. For each pixel in the block:
1. The value is checked, and the pixel is not added if it is a NaN value.
2. The pixel is checked to ensure it is within an accessible area.
3. The pixel is checked to ensure it hasn't already been added as a pre-existing sample point.
4. The next rng value is checked to see whether the pixel is one of those chosen to be added.
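
The per-pixel checks listed above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `BlockView`, `BlockPixel`, `filterBlock`, and `keepProbability` are hypothetical names, the block is assumed to be held in memory in row-major order, and `std::mt19937_64` replaces the xoshiro generator.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for helper::Index.
struct BlockPixel {
    int x;
    int y;
};

// Hypothetical view of one raster block held in memory, row-major.
struct BlockView {
    std::vector<double> values;        // pixel values, blockWidth per row
    std::vector<std::uint8_t> access;  // 1 = accessible; empty = no access layer
    int blockWidth;                    // full width of a block row in memory
    int xOff;                          // raster x of the block's first column
    int yOff;                          // raster y of the block's first row
    int xValid;                        // valid columns (edge blocks may be partial)
    int yValid;                        // valid rows
};

// Applies the four checks to every valid pixel of the block and appends the
// raster coordinates of the pixels that pass to `indices`.
inline void filterBlock(const BlockView& block,
                        const std::set<std::pair<int, int>>& existing,
                        double keepProbability, std::mt19937_64& rng,
                        std::vector<BlockPixel>& indices) {
    assert(block.xValid <= block.blockWidth);
    assert(keepProbability >= 0.0 && keepProbability <= 1.0);
    std::bernoulli_distribution keep(keepProbability);
    for (int y = 0; y < block.yValid; ++y) {
        for (int x = 0; x < block.xValid; ++x) {
            const std::size_t i =
                static_cast<std::size_t>(y) * block.blockWidth + x;
            if (std::isnan(block.values[i])) continue;  // 1. NaN value
            if (!block.access.empty() && block.access[i] == 0) continue;  // 2. inaccessible
            const int gx = block.xOff + x;
            const int gy = block.yOff + y;
            if (existing.count(std::make_pair(gx, gy)) > 0) continue;  // 3. existing sample
            if (keep(rng)) indices.push_back({gx, gy});  // 4. random draw
        }
    }
}
```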

Parameters
helper::RasterBandMetaData& band
access::Access& access
existing::Existing& existing
std::vector<helper::Index>& indices
helper::RandValController& rand
int xBlock
int yBlock
int xValid
int yValid

◆ srs()

std::tuple< std::vector< std::vector< double > >, vector::GDALVectorWrapper *, size_t > sgs::srs::srs ( raster::GDALRasterWrapper * p_raster,
size_t numSamples,
double mindist,
vector::GDALVectorWrapper * p_existing,
vector::GDALVectorWrapper * p_access,
std::string layerName,
double buffInner,
double buffOuter,
bool plot,
std::string tempFolder,
std::string filename )

This function uses random sampling to determine the location of sample plots given a raster image.

First, metadata is acquired for the first raster band, which is read to ensure that samples don't occur over nodata pixels.

Next, an output vector dataset is created as an in-memory dataset. If the user specifies a filename, this in-memory dataset will be written to disk in a different format after all points have been added.

An Access struct is created, which creates a raster dataset containing a rasterized version of the access buffers. This raster will be 1 over accessible areas. If no access vector is given, the struct's 'used' member will be false and no processing or rasterization will be done.

An Existing struct is created, which retains information on already existing sample points passed in the form of a vector dataset. The points are iterated through and added to the output dataset. The points are also added to a set, and during iteration the index of every pixel is checked against this set to ensure there are no duplicate pixels. If no existing vector is given, the struct's 'used' member will be false and no processing will be done.

Next, an rng() function is created using the xoshiro library. The specific random number generator is xoshiro256++: https://vigna.di.unimi.it/ftp/papers/ScrambledLinear.pdf

The reason for using the rng() function to decide which pixels are added DURING iteration, rather than afterwards, is that it removes the need to store every available pixel, which quickly becomes extraordinarily inefficient for large rasters. Instead, each pixel that is accessible, not NaN, and not already existing has a predetermined percentage chance of being stored, decided with this random number generator. The percentage chance is over-estimated, because it is better to have too many than too few possible options to sample from. This over-estimation might result in storing 2x-3x extra pixels, rather than the many orders of magnitude of extra storage needed to add all pixels. The calculation of this percentage is done and explained in detail in the getProbabilityMultiplier() function.
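
To make the storage argument concrete, here is a rough sketch of selecting pixels with an over-estimated keep probability. The formula below is an assumption chosen for illustration and is not the one used by getProbabilityMultiplier(); `keepProbability`, `countStored`, and `multiplier` are hypothetical names, and `std::mt19937_64` replaces the xoshiro generator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>

// Assumed formula: the exact fraction of candidate pixels needed, inflated by
// `multiplier` so that too many pixels are kept rather than too few.
inline double keepProbability(std::size_t numSamples,
                              std::size_t candidatePixels, double multiplier) {
    assert(candidatePixels > 0 && multiplier >= 1.0);
    const double exact = static_cast<double>(numSamples) /
                         static_cast<double>(candidatePixels);
    return std::min(1.0, multiplier * exact);
}

// Counts how many of candidatePixels would be stored when each one is kept
// independently with probability p during a single pass.
inline std::size_t countStored(std::size_t candidatePixels, double p,
                               std::mt19937_64& rng) {
    assert(p >= 0.0 && p <= 1.0);
    std::bernoulli_distribution keep(p);
    std::size_t stored = 0;
    for (std::size_t i = 0; i < candidatePixels; ++i) {
        if (keep(rng)) ++stored;
    }
    return stored;
}
```

Under these assumptions, asking for 1,000 samples from 1,000,000 candidate pixels with a multiplier of 3 stores on the order of 3,000 pixels instead of all 1,000,000.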

Then, the raster is processed in one of two ways. The first uses a random access strategy, in which random indexes are calculated and checked for validity; the second reads the entire raster. The choice between the two is made to minimize the number of blocks read into memory, as this is the main bottleneck in processing time.

Once all possible pixels have been selected, there may be extra indices in the indices vector. Because simply taking the first few needed might NOT give them in a random order, the indices are first shuffled. After being shuffled, the indices are added to the output dataset as samples if they don't occur within mindist of an already existing pixel.
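
The shuffle-then-filter step can be sketched as below. This is a brute-force illustration, not the sgsPy code: `SamplePoint`, `farEnough`, and `pickSamples` are hypothetical names, coordinates are assumed to be in the same units as mindist, and the sketch assumes that newly accepted samples also count as existing points for the distance check.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical point type holding map coordinates.
struct SamplePoint {
    double x;
    double y;
};

// True if p is at least mindist away from every point in others.
inline bool farEnough(const SamplePoint& p,
                      const std::vector<SamplePoint>& others, double mindist) {
    const double minSq = mindist * mindist;
    for (const SamplePoint& o : others) {
        const double dx = p.x - o.x;
        const double dy = p.y - o.y;
        if (dx * dx + dy * dy < minSq) return false;
    }
    return true;
}

// Shuffles the candidates, then accepts them in order until numSamples have
// been chosen, skipping any candidate too close to a placed point.
inline std::vector<SamplePoint> pickSamples(
    std::vector<SamplePoint> candidates,
    const std::vector<SamplePoint>& existing, std::size_t numSamples,
    double mindist, std::mt19937_64& rng) {
    assert(mindist >= 0.0);
    std::shuffle(candidates.begin(), candidates.end(), rng);
    std::vector<SamplePoint> placed = existing;  // points new samples must avoid
    std::vector<SamplePoint> chosen;
    for (const SamplePoint& c : candidates) {
        if (chosen.size() >= numSamples) break;
        if (!farEnough(c, placed, mindist)) continue;
        placed.push_back(c);
        chosen.push_back(c);
    }
    return chosen;
}
```

Shuffling first matters because the candidates are collected in raster scan order, so taking the first few without shuffling would favor the top of the raster.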

Parameters
GDALRasterWrapper* p_raster
size_t numSamples
double mindist
GDALVectorWrapper* p_existing
GDALVectorWrapper* p_access
std::string layerName
double buffInner
double buffOuter
bool plot
std::string tempFolder
std::string filename
Returns
std::tuple<std::vector<std::vector<double>>, GDALVectorWrapper *, size_t>