sgsPy
structurally guided sampling
Functions
| Return type | Function |
|---|---|
| void | sgs::breaks::processMapPixel(size_t index, helper::RasterBandMetaData &dataBand, void *p_dataBuffer, helper::RasterBandMetaData &stratBand, void *p_stratBuffer, std::vector<double> &bandBreaks, size_t multiplier, bool &mapNan, size_t &mapStrat) |
| void | sgs::breaks::processPixel(size_t index, void *p_data, helper::RasterBandMetaData *p_dataBand, void *p_strat, helper::RasterBandMetaData *p_stratBand, std::vector<double> &bandBreaks) |
| raster::GDALRasterWrapper * | sgs::breaks::breaks(raster::GDALRasterWrapper *p_raster, std::map<int, std::vector<double>> breaks, bool map, std::string filename, bool largeRaster, int threads, std::string tempFolder, std::map<std::string, std::string> driverOptions) |
raster::GDALRasterWrapper * sgs::breaks::breaks(
    raster::GDALRasterWrapper *p_raster,
    std::map<int, std::vector<double>> breaks,
    bool map,
    std::string filename,
    bool largeRaster,
    int threads,
    std::string tempFolder,
    std::map<std::string, std::string> driverOptions)
This function stratifies a given raster using user-defined breaks. The breaks are provided as a vector of doubles for each band to be stratified, keyed by band index in the breaks map.
The function can stratify a single raster band or multiple raster bands, and the user may set map to true to additionally combine the stratifications of multiple raster bands into one mapped band.
The function can be thought of in three sections: setup, processing, and cleanup (finish/return). During setup, metadata is acquired for the input raster, and an output dataset is created whose form depends on the user-given parameters and the input raster. During processing, the input raster is iterated through, either block by block or with entire bands in memory; the stratum of each pixel is determined and then written to the output dataset. During cleanup, a GDALRasterWrapper object is created from the output dataset and returned.
SETUP: the data structures holding metadata are initialized, and it is determined whether the output dataset is virtual and, if it is, whether it is held fully in memory or must be stored on disk.
If the user provides an output filename, the output dataset is not virtual; instead, it is associated with that filename. If the user does not provide an output filename, a virtual dataset driver is used. For a large raster (whether a raster is large enough is calculated by the Python side of the application and passed in as largeRaster), the dataset is a VRT dataset. If the entire raster fits comfortably in memory, an in-memory (MEM) dataset is used.
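The driver choice described above can be summarized as a small decision function. This is a minimal C++ sketch, not the sgsPy source: the enum OutputKind and the functions chooseOutputKind and virtualDriverName are hypothetical, and treating an empty filename as "no filename provided" is an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical enum: the kind of output dataset the setup step creates.
enum class OutputKind { File, VRT, MEM };

// Sketch of the selection rule. Assumption: an empty filename means the user
// did not provide one. largeRaster is computed on the Python side.
OutputKind chooseOutputKind(const std::string &filename, bool largeRaster) {
    if (!filename.empty()) {
        return OutputKind::File;  // associated with the user's file, not virtual
    }
    if (largeRaster) {
        return OutputKind::VRT;   // too large for memory: virtual, backed by disk
    }
    return OutputKind::MEM;       // fits comfortably: fully in-memory dataset
}

// GDAL short driver names for the two virtual cases. File outputs would use a
// format-specific driver (not modeled here).
std::string virtualDriverName(OutputKind kind) {
    assert(kind != OutputKind::File && "file outputs use a format-specific driver");
    return kind == OutputKind::VRT ? "VRT" : "MEM";
}
```

Under these rules an explicit filename always takes precedence, so a large raster written to a named file is still a non-virtual output.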
The input raster bands are then iterated through: their metadata is recorded, and bands are created for the output dataset. In the case of a VRT dataset, each output band is a complete dataset itself, which can only be added to the VRT after it has been written to. In the case of a MEM dataset, the bands must be acquired from the input raster. Both MEM and VRT support the AddBand function and bands with different data types, so bands are added dynamically while iterating through the input raster bands. Non-virtual formats require the band data types to be known when the dataset is created and don't support the AddBand function, so for these formats the dataset is created after iterating through the input bands.
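The ordering constraint above (grow a dataset band by band versus create it once all band types are known) can be shown with a toy model in plain C++, without any GDAL calls. Everything here is hypothetical: OutputDataset, setupOutput, and representing band types as strings are illustrative assumptions, and VRT outputs are left out because their per-band datasets are only added once they have been written.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a band data type name such as "Byte" or "Int16".
using BandType = std::string;

// Hypothetical model of an output dataset; this is not the GDAL API.
struct OutputDataset {
    bool supportsAddBand = false;  // true for MEM in this model
    bool created = false;
    std::vector<BandType> bandTypes;

    // Mirrors AddBand: only allowed once the dataset exists and the format supports it.
    void addBand(const BandType &type) {
        assert(created && supportsAddBand);
        bandTypes.push_back(type);
    }
};

// outTypes holds the output type chosen for each input band, in band order.
OutputDataset setupOutput(bool supportsAddBand, const std::vector<BandType> &outTypes) {
    OutputDataset ds;
    ds.supportsAddBand = supportsAddBand;
    if (supportsAddBand) {
        ds.created = true;                  // create an empty dataset first...
        for (const BandType &t : outTypes) {
            ds.addBand(t);                  // ...then grow it while iterating bands
        }
    } else {
        ds.bandTypes = outTypes;            // all types are known only after the loop,
        ds.created = true;                  // so the dataset is created in one step
    }
    return ds;
}
```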
PROCESSING: this section iterates through every pixel in every input band, and calculates and writes the stratum to the corresponding output band.
There are four cases, depending on two choices: whether the entire raster band is allocated in memory (the largeRaster variable is false), and whether the values of each band should additionally be mapped to an extra output raster band.
If the raster is large, it is processed in blocks, and the blocks are split into groups that are processed by multiple threads. If the raster bands are in memory, the entire raster is processed at once. A mapped stratification stores its result in an extra output raster band, whose values are determined as a function of all other output raster bands; the multipliers vector stores the information needed for this.
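One plausible way for the multipliers to encode the mapped band is as a mixed-radix number in which each band's stratum is one digit. The sketch below assumes that scheme: a band with k breaks has k + 1 strata, and each band's multiplier is the product of the strata counts of the bands before it. The text above does not say how the multipliers are computed, and computeMultipliers and mapStrata are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed scheme: multiplier[i] = product of strata counts of bands 0..i-1,
// where a band with k breaks has k + 1 strata.
std::vector<std::size_t> computeMultipliers(const std::vector<std::vector<double>> &breaksPerBand) {
    std::vector<std::size_t> multipliers;
    std::size_t running = 1;
    for (const auto &breaks : breaksPerBand) {
        multipliers.push_back(running);
        running *= breaks.size() + 1;  // number of strata for this band
    }
    return multipliers;
}

// Combine per-band strata into a single mapped stratum (a mixed-radix number).
std::size_t mapStrata(const std::vector<std::size_t> &strata,
                      const std::vector<std::size_t> &multipliers) {
    assert(strata.size() == multipliers.size());
    std::size_t mapped = 0;
    for (std::size_t i = 0; i < strata.size(); ++i) {
        mapped += strata[i] * multipliers[i];
    }
    return mapped;
}
```

For example, breaks [2, 4, 6] (4 strata) and [10] (2 strata) give multipliers [1, 4], so strata 1 and 1 map to 1*1 + 1*4 = 5, and every one of the 8 combinations lands on a distinct value from 0 to 7.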
For large rasters, processing starts by splitting the raster into chunks depending on the number of threads, and a thread is created for each chunk. Within each thread, the blocks in its designated chunk are iterated through: each block is read from the input bands, processed, then written to the output bands. For a mapped raster, all of the bands are iterated alongside each other so that the intermediate mapping calculations don't have to be written and then read again. For a non-mapped raster, each band is processed sequentially.
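The chunk-per-thread scheme can be sketched with std::thread. This minimal sketch assumes that blocks are numbered linearly and that the per-block work (read, stratify, write) is supplied as a callback; splitIntoChunks, processInParallel, and the callback are hypothetical, and a real implementation would also need the raster reads and writes to be safe to call from several threads.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Split blockCount blocks into at most `threads` contiguous chunks [begin, end).
// The first (blockCount % threads) chunks receive one extra block.
std::vector<std::pair<std::size_t, std::size_t>> splitIntoChunks(std::size_t blockCount,
                                                                 std::size_t threads) {
    assert(threads > 0);
    std::vector<std::pair<std::size_t, std::size_t>> chunks;
    std::size_t base = blockCount / threads;
    std::size_t extra = blockCount % threads;
    std::size_t begin = 0;
    for (std::size_t t = 0; t < threads && begin < blockCount; ++t) {
        std::size_t size = base + (t < extra ? 1 : 0);
        chunks.emplace_back(begin, begin + size);
        begin += size;
    }
    return chunks;
}

// Create one thread per chunk; each thread runs processBlock on its own blocks.
// processBlock must be safe to call concurrently for different block indices.
template <typename ProcessBlock>
void processInParallel(std::size_t blockCount, std::size_t threads, ProcessBlock processBlock) {
    std::vector<std::thread> workers;
    for (const auto &chunk : splitIntoChunks(blockCount, threads)) {
        workers.emplace_back([chunk, &processBlock] {
            for (std::size_t b = chunk.first; b < chunk.second; ++b) {
                processBlock(b);
            }
        });
    }
    for (auto &w : workers) {
        w.join();  // wait for every chunk before moving on to cleanup
    }
}
```

Contiguous chunks keep each thread on neighbouring blocks, and when there are fewer blocks than threads, only as many threads as blocks are created.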
CLEANUP: if the output dataset is a VRT dataset, the datasets which represent its bands (and have not yet been added as bands) are now populated with data, so they can now be added as bands.
If the output bands are fully in memory, they are moved from their metadata objects into a vector that is passed as a parameter to the GDALRasterWrapper constructor; if the bands are not in memory, no such vector is passed. The resulting GDALRasterWrapper is then returned.
Parameters
| Type | Name |
|---|---|
| GDALRasterWrapper * | p_raster |
| std::map<int, std::vector<double>> | breaks |
| bool | map |
| std::string | filename |
| bool | largeRaster |
| int | threads |
| std::string | tempFolder |
| std::map<std::string, std::string> | driverOptions |
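inline void sgs::breaks::processMapPixel(size_t index, helper::RasterBandMetaData &dataBand, void *p_dataBuffer, helper::RasterBandMetaData &stratBand, void *p_stratBuffer, std::vector<double> &bandBreaks, size_t multiplier, bool &mapNan, size_t &mapStrat)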
This is a helper function for processing a pixel of data when a mapped stratification is being created.
First, the value is read in as a double, and it is determined whether the pixel is a nan pixel. The mapNan boolean is updated in addition to the isNan boolean, so that if any band within the raster is nan at a certain pixel, the mapped raster (but not necessarily every output raster) is also nan at that pixel.
Then, if it isn't a nan pixel, the lower bound of the value within the vector of break values is found. For example, if the value is 3 and the breaks vector is [2, 4, 6], the lower bound is 1: the index of 4, the first value in the breaks vector that is not less than 3. This index is the stratum. The stratum (or the nan value) is then written with the appropriate data type to the strat raster band.
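A simplified C++ sketch of the steps above, using std::lower_bound for the stratum lookup. Assumptions: the input band is already a std::vector<double>, the strat band is a std::vector<int>, a pixel counts as nan when it is NaN or equals a nodata value, and the mapped stratum is accumulated as stratum * multiplier (the description above does not say exactly how multiplier and mapStrat are combined). processMapPixelSketch is a hypothetical name, not the sgsPy function.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical simplified form of processMapPixel.
inline void processMapPixelSketch(std::size_t index,
                                  const std::vector<double> &data, double dataNoData,
                                  std::vector<int> &strat, int stratNoData,
                                  const std::vector<double> &bandBreaks,
                                  std::size_t multiplier, bool &mapNan, std::size_t &mapStrat) {
    assert(index < data.size() && index < strat.size());
    double val = data[index];
    // Assumption: a pixel is nan if it is NaN or equals the band's nodata value.
    bool isNan = std::isnan(val) || val == dataNoData;
    // Once any band is nan at this pixel, the mapped pixel stays nan.
    mapNan = mapNan || isNan;
    if (isNan) {
        strat[index] = stratNoData;
        return;
    }
    // Index of the first break that is not less than val.
    auto it = std::lower_bound(bandBreaks.begin(), bandBreaks.end(), val);
    std::size_t s = static_cast<std::size_t>(it - bandBreaks.begin());
    strat[index] = static_cast<int>(s);
    // Assumed mixed-radix accumulation of the mapped stratum.
    mapStrat += s * multiplier;
}
```

Values above the last break get the index one past the end of the breaks vector, so k breaks yield k + 1 strata.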
Parameters
| Type | Name |
|---|---|
| size_t | index |
| helper::RasterBandMetaData & | dataBand |
| void * | p_dataBuffer |
| helper::RasterBandMetaData & | stratBand |
| void * | p_stratBuffer |
| std::vector<double> & | bandBreaks |
| size_t | multiplier |
| bool & | mapNan |
| size_t & | mapStrat |
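inline void sgs::breaks::processPixel(size_t index, void *p_data, helper::RasterBandMetaData *p_dataBand, void *p_strat, helper::RasterBandMetaData *p_stratBand, std::vector<double> &bandBreaks)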
This is a helper function for processing a pixel of data.
First, the value is read in as a double, and it is determined whether the pixel is a nan pixel or not.
Then, if it isn't a nan pixel, the lower bound of the value within the vector of break values is found. For example, if the value is 3 and the breaks vector is [2, 4, 6], the lower bound is 1: the index of 4, the first value in the breaks vector that is not less than 3. This index is the stratum. The stratum (or the nan value) is then written with the appropriate data type to the strat raster band.
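Writing "with the appropriate type" through untyped void * buffers can be sketched by switching on the band's data type. The DataType enum, the BandMeta struct, and the helper names below are hypothetical stand-ins for helper::RasterBandMetaData and GDAL's data types; only a few types are covered, and the output band's nodata value is assumed to be representable in its type.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical subset of raster data types (GDAL supports many more).
enum class DataType { UInt8, Int16, Int32, Float32, Float64 };

// Hypothetical stand-in for helper::RasterBandMetaData: element type and nodata value.
struct BandMeta {
    DataType type;
    double noData;
};

// Read element `index` of an untyped buffer and widen it to double.
inline double readAsDouble(const void *p, DataType type, std::size_t index) {
    switch (type) {
        case DataType::UInt8:   return static_cast<const std::uint8_t *>(p)[index];
        case DataType::Int16:   return static_cast<const std::int16_t *>(p)[index];
        case DataType::Int32:   return static_cast<const std::int32_t *>(p)[index];
        case DataType::Float32: return static_cast<const float *>(p)[index];
        case DataType::Float64: return static_cast<const double *>(p)[index];
    }
    assert(false && "unsupported data type");
    return NAN;
}

// Convert `value` to the buffer's element type and store it at `index`.
inline void writeFromDouble(void *p, DataType type, std::size_t index, double value) {
    switch (type) {
        case DataType::UInt8:   static_cast<std::uint8_t *>(p)[index] = static_cast<std::uint8_t>(value); break;
        case DataType::Int16:   static_cast<std::int16_t *>(p)[index] = static_cast<std::int16_t>(value); break;
        case DataType::Int32:   static_cast<std::int32_t *>(p)[index] = static_cast<std::int32_t>(value); break;
        case DataType::Float32: static_cast<float *>(p)[index] = static_cast<float>(value); break;
        case DataType::Float64: static_cast<double *>(p)[index] = value; break;
    }
}

// Stratify one pixel: nan or nodata maps to the strat band's nodata value,
// otherwise to the lower-bound index of the value in bandBreaks.
inline void processPixelSketch(std::size_t index, const void *p_data, const BandMeta *p_dataBand,
                               void *p_strat, const BandMeta *p_stratBand,
                               const std::vector<double> &bandBreaks) {
    double val = readAsDouble(p_data, p_dataBand->type, index);
    bool isNan = std::isnan(val) || val == p_dataBand->noData;
    double out = p_stratBand->noData;
    if (!isNan) {
        auto it = std::lower_bound(bandBreaks.begin(), bandBreaks.end(), val);
        out = static_cast<double>(it - bandBreaks.begin());
    }
    writeFromDouble(p_strat, p_stratBand->type, index, out);
}
```

Reading everything as double keeps the break comparison in one type, and the switch confines all type-specific code to the two conversion helpers.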
Parameters
| Type | Name |
|---|---|
| size_t | index |
| void * | p_data |
| helper::RasterBandMetaData * | p_dataBand |
| void * | p_strat |
| helper::RasterBandMetaData * | p_stratBand |
| std::vector<double> & | bandBreaks |