template<typename T>
class sgs::clhs::CLHSDataManager< T >
This class manages the data for the cLHS (conditioned Latin hypercube sampling) method.
It stores the feature values of every added pixel in a vector, along with the x and y coordinates of those pixels, and can return a randomly chosen pixel on request. It also stores the correlation matrix of the raster.
Constructor. Sets nFeat, nSamp, and the random number generator, and initially sizes the vectors to hold 1,000,000 points. The vectors grow as required and are sized down once raster reading is complete.
If required, it also sets the sizes and count values for existing pixels (see existingCount).
Parameters:

| Type | Name |
| --- | --- |
| int | nFeat |
| int | nSamp |
| xso::xoshiro_4x64_plus * | p_rng |
| int | existingCount |
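The storage behaviour described above (vectors pre-sized for 1,000,000 points, grown as pixels arrive, trimmed when reading ends) can be sketched as follows. This is a minimal sketch, not the sgs implementation: the class name PointStore, the flat feature layout (nFeat values per point), and the doubling growth policy are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified storage for a cLHS data manager (not the sgs API).
// Assumption: features live in one flat vector, nFeat values per point,
// and the buffers double in size whenever they fill up.
template <typename T>
class PointStore {
public:
    explicit PointStore(std::size_t nFeat, std::size_t initialCapacity = 1000000)
        : nFeat_(nFeat), capacity_(initialCapacity),
          x_(initialCapacity), y_(initialCapacity),
          features_(initialCapacity * nFeat) {}

    // Append one pixel: its nFeat feature values and its x/y coordinates.
    void addPoint(const std::vector<T>& feats, double x, double y) {
        assert(feats.size() == nFeat_);
        if (count_ == capacity_) {
            // Grow geometrically so repeated appends stay amortized O(1).
            capacity_ = capacity_ == 0 ? 1 : capacity_ * 2;
            x_.resize(capacity_);
            y_.resize(capacity_);
            features_.resize(capacity_ * nFeat_);
        }
        x_[count_] = x;
        y_[count_] = y;
        for (std::size_t f = 0; f < nFeat_; ++f) {
            features_[count_ * nFeat_ + f] = feats[f];
        }
        ++count_;
    }

    // Called once raster reading is finished: trim buffers to the points used.
    void shrinkToCount() {
        capacity_ = count_;
        x_.resize(count_);
        y_.resize(count_);
        features_.resize(count_ * nFeat_);
        x_.shrink_to_fit();
        y_.shrink_to_fit();
        features_.shrink_to_fit();
    }

    std::size_t size() const { return count_; }
    std::size_t capacity() const { return capacity_; }
    T feature(std::size_t i, std::size_t f) const { return features_[i * nFeat_ + f]; }
    double x(std::size_t i) const { return x_[i]; }
    double y(std::size_t i) const { return y_[i]; }

private:
    std::size_t nFeat_;
    std::size_t capacity_;
    std::size_t count_ = 0;
    std::vector<double> x_, y_;
    std::vector<T> features_;
};
```

A flat vector keeps each pixel's features contiguous, which suits random access by index, and geometric growth keeps appends amortized O(1) while the final trim releases the unused space.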
This function is called once the raster has been read, meaning no more points will be added to the data manager. The correlation matrix, which is calculated just after raster reading finishes, is passed as a parameter so that it can be saved.
The x, y, and features vectors are resized.
A mask is generated, which is used along with the random number generator to produce random indices within the saved points. The mask has every bit set from the most significant set bit of the largest index down to bit 0 (for example, 1,000 saved points have a largest index of 999, giving a mask of 1023).
ANDing a new random number with the mask yields a value between 0 and the mask, so any valid index can be produced. Because the mask is less than twice the number of saved points, more than half of the possible values are valid indices; if the result is not a valid index, a new random number is simply drawn.
Parameters:

| Type | Name |
| --- | --- |
| std::vector<std::vector<T>>& | corr |
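The mask construction can be sketched with a bit-smearing helper. makeIndexMask is a hypothetical name; the sketch only assumes that the mask is derived from the largest valid index (count - 1), as described above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not the sgs API): builds the mask for `count` stored
// points, i.e. every bit set from the highest set bit of the largest valid
// index (count - 1) down to bit 0.
std::uint64_t makeIndexMask(std::uint64_t count) {
    assert(count > 0);
    std::uint64_t mask = count - 1;
    // Smear the highest set bit into every lower bit position.
    mask |= mask >> 1;
    mask |= mask >> 2;
    mask |= mask >> 4;
    mask |= mask >> 8;
    mask |= mask >> 16;
    mask |= mask >> 32;
    return mask;
}
```

For example, 1,000,000 saved points give a largest index of 999,999, so the mask is 2^20 - 1 = 1048575.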
This function calculates the value of the continuous objective function for the cLHS method. The goal of cLHS is a sample that forms a Latin hypercube: for every feature, exactly one sample falls within each quantile interval. The function returns 0 if the sample is a Latin hypercube, and otherwise returns how many samples are off.
The output of this function, together with the output of the correlation objective function, is used to decide whether a sample should be kept. Lower values are better.
Parameters:

| Type | Name |
| --- | --- |
| std::vector<std::vector<int>>& | sampleCountPerQuantile |

Returns: T
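A minimal sketch of this objective, assuming it is the usual cLHS continuous term: the sum of |count - 1| over every quantile of every feature. quantileObjective is a hypothetical name, and whether sgs weights or normalizes the sum differently is not stated here.

```cpp
#include <cstdlib>
#include <vector>

// Sketch of the continuous cLHS objective term. Assumption: the value is the
// sum of |count - 1| over every quantile of every feature, so 0 means each
// quantile of each feature holds exactly one sample.
template <typename T>
T quantileObjective(const std::vector<std::vector<int>>& sampleCountPerQuantile) {
    T total = 0;
    for (const auto& quantileCounts : sampleCountPerQuantile) {  // one row per feature
        for (int count : quantileCounts) {
            total += static_cast<T>(std::abs(count - 1));
        }
    }
    return total;
}
```

With counts {{2, 0, 1}, {1, 1, 1}}, the result is 2: one quantile of the first feature holds an extra sample and another is empty.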
This function is for generating a random index among the points saved.
Because the generator output is masked directly (which is faster than using something like std::uniform_int_distribution), some values may be larger than the largest occupied index. Indices are therefore generated repeatedly until a valid one is produced.
The generator output is first shifted right by 11 bits because the least significant bits of this type of generator are less random than its higher bits.
Returns: uint64_t
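The rejection loop can be sketched as below. Since xso::xoshiro_4x64_plus is not reproduced here, the sketch uses a scalar xoshiro256+ (seeded with splitmix64) as a stand-in; Xoshiro256Plus and randomIndex are hypothetical names, and the mask is assumed to have been built from the number of saved points as described for the finalizing function.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in generator (assumption): scalar xoshiro256+, not the 4x64 SIMD
// variant used by sgs. The state is expanded from a seed with splitmix64.
class Xoshiro256Plus {
public:
    explicit Xoshiro256Plus(std::uint64_t seed) {
        for (auto& word : s_) {
            seed += 0x9e3779b97f4a7c15ULL;
            std::uint64_t z = seed;
            z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
            z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
            word = z ^ (z >> 31);
        }
    }

    std::uint64_t next() {
        const std::uint64_t result = s_[0] + s_[3];
        const std::uint64_t t = s_[1] << 17;
        s_[2] ^= s_[0];
        s_[3] ^= s_[1];
        s_[1] ^= s_[2];
        s_[0] ^= s_[3];
        s_[2] ^= t;
        s_[3] = rotl(s_[3], 45);
        return result;
    }

private:
    static std::uint64_t rotl(std::uint64_t x, int k) {
        return (x << k) | (x >> (64 - k));
    }
    std::uint64_t s_[4];
};

// Hypothetical helper: draws a uniform index in [0, count) by masking and
// rejecting out-of-range values. The shift drops the 11 weakest low bits,
// leaving 53 bits, so the mask must fit in 53 bits.
std::uint64_t randomIndex(Xoshiro256Plus& rng, std::uint64_t mask, std::uint64_t count) {
    assert(count > 0 && mask >= count - 1);
    std::uint64_t index;
    do {
        index = (rng.next() >> 11) & mask;
    } while (index >= count);  // outside the occupied range: draw again
    return index;
}
```

Because the mask is less than twice the number of points, each draw is accepted with probability greater than one half, so the loop runs fewer than two iterations on average.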