Classes
struct	sgs::pca::PCAResult< T >

Functions
template<typename T>
PCAResult< T >	sgs::pca::calculatePCA (std::vector< helper::RasterBandMetaData > &bands, GDALDataType type, size_t size, int width, int height, int nComp)
template<typename T>
PCAResult< T >	sgs::pca::calculatePCA (std::vector< helper::RasterBandMetaData > &bands, GDALDataType type, size_t size, int xBlockSize, int yBlockSize, int xBlocks, int yBlocks, int nComp)
template<typename T>
void	sgs::pca::writePCA (std::vector< helper::RasterBandMetaData > &bands, std::vector< helper::RasterBandMetaData > &PCABands, PCAResult< T > &result, GDALDataType type, size_t size, int height, int width)
template<typename T>
void	sgs::pca::writePCA (std::vector< helper::RasterBandMetaData > &bands, std::vector< helper::RasterBandMetaData > &PCABands, PCAResult< T > &result, GDALDataType type, size_t size, int xBlockSize, int yBlockSize, int xBlocks, int yBlocks)
std::tuple< raster::GDALRasterWrapper *, std::vector< std::vector< double > >, std::vector< double >, std::vector< double >, std::vector< double > >	sgs::pca::pca (raster::GDALRasterWrapper *p_raster, int nComp, bool largeRaster, std::string tempFolder, std::string filename, std::map< std::string, std::string > driverOptions)

Detailed Description

Function Documentation

◆ calculatePCA() [1/2]

template<typename T>

PCAResult< T > sgs::pca::calculatePCA	(	std::vector< helper::RasterBandMetaData > &	bands,
		GDALDataType	type,
		size_t	size,
		int	width,
		int	height,
		int	nComp )

This function is used by the pca() function to calculate the principal component eigenvectors and eigenvalues, along with the mean and standard deviation of each input raster band. This function is used in the case where the input raster is small, and can reasonably be expected to fit entirely into memory.

First, the input raster bands are read into memory using the GDALRasterBand RasterIO function. Bands are read in a row-wise manner, such that each row represents a single pixel and each column represents a raster band. This means that between one pixel and the next, a gap must be left for the remaining band values of that pixel to be written to. This is done using the nPixelSpace and nLineSpace arguments of RasterIO.
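The stride arithmetic can be checked without GDAL. A consistent choice of arguments for band b is a buffer pointer offset by b * size bytes, nPixelSpace = size * nBands and nLineSpace = nPixelSpace * width, for example band->RasterIO(GF_Read, 0, 0, width, height, buffer + b * size, width, height, type, nPixelSpace, nLineSpace). The hypothetical helper below performs the same byte placement with std::memcpy, assuming the band data is already in memory. It is a sketch of the layout, not the library's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copies band-sequential source data into a
// pixel-interleaved destination buffer, placing each value at the byte
// offset RasterIO would use for band b when given pData = dst + b * size
// and the nPixelSpace / nLineSpace values computed below.
// bandData[b] points to width * height values of `size` bytes each.
void interleaveBands(const std::vector<const unsigned char*>& bandData,
                     unsigned char* dst, std::size_t size, int width, int height) {
    assert(size > 0 && width > 0 && height > 0);
    const std::size_t nBands = bandData.size();
    const std::size_t nPixelSpace = size * nBands;  // bytes from one pixel to the next
    const std::size_t nLineSpace = nPixelSpace * static_cast<std::size_t>(width);  // bytes per row
    for (std::size_t b = 0; b < nBands; ++b) {
        unsigned char* bandStart = dst + b * size;  // band b's slot within each pixel
        for (int y = 0; y < height; ++y) {
            for (int x = 0; x < width; ++x) {
                const std::size_t srcIndex = static_cast<std::size_t>(y) * width + x;
                std::memcpy(bandStart + y * nLineSpace + x * nPixelSpace,
                            bandData[b] + srcIndex * size, size);
            }
        }
    }
}
```

With two 2 x 2 float bands {1, 2, 3, 4} and {10, 20, 30, 40}, the destination becomes {1, 10, 2, 20, 3, 30, 4, 40}.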

Second, each pixel is checked for NaN values. Any pixel containing a NaN value in any band is overwritten completely with the next NaN-free pixel, and the total number of NaN-free pixels is stored as the number of features.
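A minimal in-place version of this compaction might look like the following, assuming a pixel-interleaved buffer of nBands values per pixel. The helper name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: compacts a pixel-interleaved buffer (one row per
// pixel, nBands values per row) in place so that every row containing a
// NaN in any band is overwritten by the next NaN-free row. Returns the
// number of NaN-free rows, which then occupy the front of the buffer.
template <typename T>
std::size_t compactValidPixels(std::vector<T>& data, std::size_t nBands) {
    assert(nBands > 0 && data.size() % nBands == 0);
    const std::size_t nPixels = data.size() / nBands;
    std::size_t nValid = 0;  // write cursor, counted in rows
    for (std::size_t p = 0; p < nPixels; ++p) {
        bool hasNan = false;
        for (std::size_t b = 0; b < nBands; ++b) {
            if (std::isnan(data[p * nBands + b])) { hasNan = true; break; }
        }
        if (hasNan) continue;  // skip this pixel; a later valid pixel takes its place
        if (nValid != p) {
            for (std::size_t b = 0; b < nBands; ++b)
                data[nValid * nBands + b] = data[p * nBands + b];
        }
        ++nValid;
    }
    return nValid;
}
```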

The mean and standard deviation are then calculated using Welford's method, and the PCA eigenvectors and eigenvalues are calculated using the principal components functionality of the oneDAL library.
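Welford's method keeps, per band, a running count, mean, and sum of squared deviations (M2), updating all three as each pixel arrives; this avoids the cancellation problems of the naive sum-of-squares formula. A minimal sketch follows. The struct name is hypothetical, and whether the real code divides M2 by n (population) or n - 1 (sample) is not stated, so the population form used here is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-band running statistics using Welford's online algorithm.
struct WelfordStats {
    std::size_t count = 0;
    std::vector<double> mean, m2;  // m2: sum of squared deviations from the mean

    explicit WelfordStats(std::size_t nBands) : mean(nBands, 0.0), m2(nBands, 0.0) {}

    // Adds one NaN-free pixel given as nBands consecutive values.
    template <typename T>
    void update(const T* pixel) {
        ++count;
        for (std::size_t b = 0; b < mean.size(); ++b) {
            const double x = static_cast<double>(pixel[b]);
            const double delta = x - mean[b];
            mean[b] += delta / static_cast<double>(count);
            m2[b] += delta * (x - mean[b]);  // uses the updated mean
        }
    }

    // Population standard deviation of band b (assumption: divides by count).
    double stddev(std::size_t b) const {
        assert(count > 0);
        return std::sqrt(m2[b] / static_cast<double>(count));
    }
};
```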

A result containing the eigenvectors, eigenvalues, mean per band, and standard deviation per band is returned.

Parameters

std::vector<RasterBandMetaData>&	bands
GDALDataType	type
size_t	size
int	width
int	height
int	nComp

Returns: PCAResult<T>

◆ calculatePCA() [2/2]

template<typename T>

PCAResult< T > sgs::pca::calculatePCA	(	std::vector< helper::RasterBandMetaData > &	bands,
		GDALDataType	type,
		size_t	size,
		int	xBlockSize,
		int	yBlockSize,
		int	xBlocks,
		int	yBlocks,
		int	nComp )

This function is used by the pca() function to calculate the principal component eigenvectors and eigenvalues, along with the mean and standard deviation of each input raster band. This function is used in the case where the input raster is large and will be processed in blocks.

All of the blocks are iterated through, and within each iteration the following is done:

First, the input raster band blocks are read into memory using the GDALRasterBand RasterIO function. Bands are read in a row-wise manner, such that each row represents a single pixel and each column represents a raster band. This means that between one pixel and the next, a gap must be left for the remaining band values of that pixel to be written to. This is done using the nPixelSpace and nLineSpace arguments of RasterIO.

Second, each pixel is checked for NaN values. Any pixel containing a NaN value in any band is overwritten completely with the next NaN-free pixel, and the total number of NaN-free pixels is stored as the number of features.

The mean and standard deviation are then updated using Welford's method, and the partial result for the PCA eigenvectors and eigenvalues is updated using the principal components functionality of the oneDAL library.
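Carrying one Welford accumulator from block to block is sufficient for the mean and standard deviation. If per-block statistics were instead computed independently, they could be combined with the standard pairwise update (the parallel form of Welford's method) sketched below. This is an illustrative alternative, not necessarily what the code does, and BlockStats and merge are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-band partial statistics for one block (or a union of blocks).
struct BlockStats {
    std::size_t count = 0;
    std::vector<double> mean, m2;  // m2: sum of squared deviations from the mean
};

// Merges partial statistics `b` into `a` using the pairwise combination
// formula for means and sums of squared deviations.
void merge(BlockStats& a, const BlockStats& b) {
    if (b.count == 0) return;
    if (a.count == 0) { a = b; return; }
    assert(a.mean.size() == b.mean.size());
    const double na = static_cast<double>(a.count);
    const double nb = static_cast<double>(b.count);
    const double n = na + nb;
    for (std::size_t i = 0; i < a.mean.size(); ++i) {
        const double delta = b.mean[i] - a.mean[i];
        a.mean[i] += delta * nb / n;
        a.m2[i] += b.m2[i] + delta * delta * na * nb / n;
    }
    a.count += b.count;
}
```

Merging the blocks {2, 4, 4, 4} (mean 3.5, M2 3) and {5, 5, 7, 9} (mean 6.5, M2 11) gives mean 5 and M2 32, the same as processing all eight values at once.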

Once all blocks have been iterated through, the final mean per band, standard deviation per band, eigenvectors, and eigenvalues are calculated and returned.
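The outer loop over blocks can be sketched as follows. This overload receives only the block sizes and counts, so the full raster width and height used here to clip the right-most and bottom-most blocks are hypothetical inputs; in GDAL, the valid extent of a partial block can also be obtained with GDALRasterBand::GetActualBlockSize().

```cpp
#include <algorithm>
#include <cassert>
#include <functional>

// Hypothetical block loop: visits every block of a width x height raster,
// clipping the right-most and bottom-most blocks to the raster extent.
// `process` receives the block's pixel offset and its valid size.
void forEachBlock(int width, int height, int xBlockSize, int yBlockSize,
                  const std::function<void(int xOff, int yOff, int xValid, int yValid)>& process) {
    assert(xBlockSize > 0 && yBlockSize > 0);
    const int xBlocks = (width + xBlockSize - 1) / xBlockSize;  // ceiling division
    const int yBlocks = (height + yBlockSize - 1) / yBlockSize;
    for (int yb = 0; yb < yBlocks; ++yb) {
        for (int xb = 0; xb < xBlocks; ++xb) {
            const int xOff = xb * xBlockSize;
            const int yOff = yb * yBlockSize;
            const int xValid = std::min(xBlockSize, width - xOff);
            const int yValid = std::min(yBlockSize, height - yOff);
            process(xOff, yOff, xValid, yValid);  // read, filter, and accumulate this block
        }
    }
}
```

For a 5 x 3 raster with 2 x 2 blocks, this visits 6 blocks covering all 15 pixels.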

Parameters

std::vector<RasterBandMetaData>&	bands
GDALDataType	type
size_t	size
int	xBlockSize
int	yBlockSize
int	xBlocks
int	yBlocks
int	nComp

Returns: PCAResult<T>

◆ pca()

std::tuple< raster::GDALRasterWrapper *, std::vector< std::vector< double > >, std::vector< double >, std::vector< double >, std::vector< double > > sgs::pca::pca	(	raster::GDALRasterWrapper *	p_raster,
		int	nComp,
		bool	largeRaster,
		std::string	tempFolder,
		std::string	filename,
		std::map< std::string, std::string >	driverOptions )

This function conducts principal component analysis on the input raster, writing the output bands to a new GDALRasterWrapper and returning the eigenvectors and eigenvalues calculated from the raster bands. The input values are both centered and scaled before being projected onto the PCA eigenvectors.
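Read literally, the centering, scaling, and projection described above give the following value for output component c at pixel p, where x_{p,b} is the input value of band b, μ_b and σ_b are that band's mean and standard deviation, B is the number of bands, and v_c is the c-th eigenvector (the symbols are illustrative):

```latex
\mathrm{PC}_c(p) = \sum_{b=1}^{B} \frac{x_{p,b} - \mu_b}{\sigma_b}\, v_{c,b},
\qquad c = 1, \dots, n_{\mathrm{comp}}
```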

First, an output dataset is created to store the results. Its type depends on whether the raster is large (and should be processed in blocks) and whether an output filename is given. For a small raster without a filename, an in-memory raster is created. For a large raster without a filename, a VRT dataset is created in which each band is a GTiff raster. When a filename is given, the driver corresponding to that filename is used.
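The three cases above amount to a small decision, sketched below with a hypothetical OutputKind enum and function name. The in-memory case presumably corresponds to GDAL's MEM driver, but that is also an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical classification of the output dataset described above.
enum class OutputKind {
    InMemory,     // small raster, no filename: in-memory dataset (presumably GDAL "MEM")
    VrtOfGTiffs,  // large raster, no filename: "VRT" dataset, one GTiff per band
    FromFilename  // filename given: driver inferred from the filename
};

OutputKind chooseOutput(bool largeRaster, const std::string& filename) {
    if (!filename.empty()) return OutputKind::FromFilename;  // filename wins regardless of size
    return largeRaster ? OutputKind::VrtOfGTiffs : OutputKind::InMemory;
}
```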

Then, the calculatePCA() function is called, with template parameters chosen according to the data type, and with the overload chosen according to whether the raster should be processed in blocks. This function calculates the principal component eigenvectors, eigenvalues, mean per band, and standard deviation per band. The writePCA() function is then called (again with the matching template parameters and overload) to center, scale, and project the input raster values onto the output PCA bands, which are written to the output dataset.
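Choosing a template parameter from a runtime GDALDataType is typically done with a switch that instantiates the template once per supported type. The sketch below uses a hypothetical PixelType enum in place of GDALDataType and a hypothetical runPCA<T>() in place of calculatePCA()/writePCA(). Which pixel types the real code supports is not stated here.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for GDALDataType, limited to two cases.
enum class PixelType { Float32, Float64 };

// Hypothetical templated worker standing in for calculatePCA<T>() / writePCA<T>().
// The whole-raster vs. block-wise overload would be selected inside.
template <typename T>
std::string runPCA(bool largeRaster) {
    return std::to_string(sizeof(T)) + (largeRaster ? "-blocked" : "-whole");
}

// Maps the runtime pixel type to a compile-time template argument.
std::string dispatchPCA(PixelType type, bool largeRaster) {
    switch (type) {
        case PixelType::Float32: return runPCA<float>(largeRaster);
        case PixelType::Float64: return runPCA<double>(largeRaster);
    }
    throw std::invalid_argument("unsupported pixel type");
}
```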

Finally, a GDALRasterWrapper is created using the output dataset, and returned in a tuple alongside the eigenvectors and eigenvalues.

Parameters

GDALRasterWrapper	*p_raster
int	nComp
bool	largeRaster
std::string	tempFolder
std::string	filename
std::map<std::string,std::string>	driverOptions

Returns: std::tuple< GDALRasterWrapper *, std::vector<std::vector<double>>, std::vector<double>, std::vector<double>, std::vector<double> >

◆ writePCA() [1/2]

template<typename T>

void sgs::pca::writePCA	(	std::vector< helper::RasterBandMetaData > &	bands,
		std::vector< helper::RasterBandMetaData > &	PCABands,
		PCAResult< T > &	result,
		GDALDataType	type,
		size_t	size,
		int	height,
		int	width )

This function is used to write the output principal components to a raster dataset, after the eigenvectors and eigenvalues have already been calculated for the input raster. This function is used in the case where the raster is small and can reasonably be expected to fit entirely into memory.

First, the input raster bands are read into memory using the GDALRasterBand RasterIO function. Bands are read in a row-wise manner, such that each row represents a single pixel and each column represents a raster band. This means that between one pixel and the next, a gap must be left for the remaining band values of that pixel to be written to. This is done using the nPixelSpace and nLineSpace arguments of RasterIO. The data pixels are then iterated over: each value is centered (shifted) and scaled, and set to NaN if the pixel is a nodata pixel.
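The per-pixel transformation can be sketched as below, computing (x - mean) / stddev per band. The helper name, where the nodata values come from (presumably the band metadata), and the rule that a nodata value in any band marks the whole pixel as NaN are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: standardizes a pixel-interleaved buffer in place.
// Each value becomes (x - mean[b]) / stddev[b]; if any band of a pixel
// equals that band's nodata value, every band of the pixel is set to NaN.
// Assumes T is a floating-point type.
template <typename T>
void standardize(std::vector<T>& data, const std::vector<double>& mean,
                 const std::vector<double>& stddev,
                 const std::vector<double>& noData) {
    const std::size_t nBands = mean.size();
    assert(nBands > 0 && data.size() % nBands == 0);
    const T nan = std::numeric_limits<T>::quiet_NaN();
    for (std::size_t p = 0; p < data.size() / nBands; ++p) {
        T* px = &data[p * nBands];
        bool isNoData = false;
        for (std::size_t b = 0; b < nBands; ++b)
            if (static_cast<double>(px[b]) == noData[b]) isNoData = true;
        for (std::size_t b = 0; b < nBands; ++b)
            px[b] = isNoData ? nan : static_cast<T>((px[b] - mean[b]) / stddev[b]);
    }
}
```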

Next, a matrix of PCA eigenvectors is allocated and the eigenvectors are copied into it.
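Copying the eigenvectors into one contiguous, row-major buffer (one row per component, one column per band) gives memory that a oneDAL homogen_table can wrap, for example through homogen_table::wrap with a row-major layout. A minimal sketch, with a hypothetical helper name and the eigenvectors assumed to be stored as a vector of per-component vectors:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copies nComp eigenvectors (each of length nBands)
// into one contiguous row-major buffer of nComp x nBands values.
template <typename T>
std::vector<T> flattenEigenvectors(const std::vector<std::vector<T>>& eigenvectors) {
    assert(!eigenvectors.empty());
    const std::size_t nBands = eigenvectors.front().size();
    std::vector<T> flat;
    flat.reserve(eigenvectors.size() * nBands);
    for (const auto& v : eigenvectors) {
        assert(v.size() == nBands);  // every eigenvector has one entry per band
        flat.insert(flat.end(), v.begin(), v.end());
    }
    return flat;
}
```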

Both the data matrix and the eigenvector matrix are wrapped as oneDAL homogen_table objects, and the result of a linear kernel computation on them is written to the output.

A linear kernel is used because the result is simply a set of dot products. These dot products could be computed one at a time for each output pixel and component, but the linear kernel, although originally intended for fast machine learning use, computes exactly what is needed.
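oneDAL's linear kernel evaluates k(x, y) = scale * x^T y + shift for every pair of rows from its two input tables. With a scale of 1, a shift of 0, and the eigenvector table as the second input, each entry is exactly the dot product described above. The naive loop below is a reference sketch of that result (not the oneDAL call itself), with the data laid out as nPixels x nBands and the eigenvectors as nComp x nBands, both row-major.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes out[p * nComp + c] = sum_b data[p * nBands + b] * eig[c * nBands + b],
// i.e. data (nPixels x nBands) times the transpose of eig (nComp x nBands).
template <typename T>
std::vector<T> project(const std::vector<T>& data, const std::vector<T>& eig,
                       std::size_t nBands, std::size_t nComp) {
    assert(nBands > 0 && data.size() % nBands == 0 && eig.size() == nComp * nBands);
    const std::size_t nPixels = data.size() / nBands;
    std::vector<T> out(nPixels * nComp, T(0));
    for (std::size_t p = 0; p < nPixels; ++p) {
        for (std::size_t c = 0; c < nComp; ++c) {
            T dot = 0;  // one kernel entry: pixel p against eigenvector c
            for (std::size_t b = 0; b < nBands; ++b)
                dot += data[p * nBands + b] * eig[c * nBands + b];
            out[p * nComp + c] = dot;
        }
    }
    return out;
}
```

The triple loop costs O(nPixels * nComp * nBands), the same work a matrix multiply does; a kernel or BLAS-backed routine mainly wins through vectorization and cache blocking.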

Parameters

std::vector<RasterBandMetaData>&	bands
std::vector<RasterBandMetaData>&	PCABands
PCAResult<T>&	result
GDALDataType	type
size_t	size
int	height
int	width

The result for each output principal component pixel is the dot product of that pixel's data values with the corresponding principal component eigenvector.

oneDAL has a fast way to calculate dot products which was originally intended for machine learning, but it does exactly what is needed here: multiplying large matrices.

◆ writePCA() [2/2]

template<typename T>

void sgs::pca::writePCA	(	std::vector< helper::RasterBandMetaData > &	bands,
		std::vector< helper::RasterBandMetaData > &	PCABands,
		PCAResult< T > &	result,
		GDALDataType	type,
		size_t	size,
		int	xBlockSize,
		int	yBlockSize,
		int	xBlocks,
		int	yBlocks )

This function is used to write the output principal components to a raster dataset, after the eigenvectors and eigenvalues have already been calculated for the input raster. This function is used in the case where the raster is large, and should be processed in blocks.

For each block:

First, the input raster bands are read into memory using the GDALRasterBand RasterIO function. Bands are read in a row-wise manner, such that each row represents a single pixel and each column represents a raster band. This means that between one pixel and the next, a gap must be left for the remaining band values of that pixel to be written to. This is done using the nPixelSpace and nLineSpace arguments of RasterIO. The data pixels are then iterated over: each value is centered (shifted) and scaled, and set to NaN if the pixel is a nodata pixel.

Next, a matrix of PCA eigenvectors is allocated and the eigenvectors are copied into it.

Both the data matrix and the eigenvector matrix are wrapped as oneDAL homogen_table objects, and the result of a linear kernel computation on them is written to the output.

A linear kernel is used because the result is simply a set of dot products. These dot products could be computed one at a time for each output pixel and component, but the linear kernel, although originally intended for fast machine learning use, computes exactly what is needed.

Parameters

std::vector<RasterBandMetaData>&	bands
std::vector<RasterBandMetaData>&	PCABands
PCAResult<T>&	result
GDALDataType	type
size_t	size
int	xBlockSize
int	yBlockSize
int	xBlocks
int	yBlocks

The result for each output principal component pixel is the dot product of that pixel's data values with the corresponding principal component eigenvector.

oneDAL has a fast way to calculate dot products which (as I understand it) was originally intended for machine learning, but it does exactly what is needed here: multiplying large matrices.
