Clustering Classification
Combining the input variables with techniques beyond multiple linear regression, such as clustering-based classification, can lead to more accurate prediction methods.
06 K-Means Clustering
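K-Means is one of the most popular clustering algorithms. Given a pre-defined number of clusters, it iterates two steps. First, it assigns each datapoint to the nearest cluster centroid. Then it recomputes each centroid as the mean of the datapoints assigned to it. These iterations minimise the sum of squared distances between the datapoints and the centroids of the clusters they belong to. The algorithm always converges, but only to a local minimum that can depend on the initial centroids. In this case, the results were consistent across different initializations. Here the algorithm takes a multi-dimensional approach, using the datapoints of backscatter, emissivities, temperature amplitude and NDVI as input.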
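The assign-and-update loop described above can be sketched in a few lines of NumPy. This is a minimal from-scratch version of Lloyd's algorithm, the standard K-Means iteration. It is written for illustration only. `kmeans_lloyd` is a hypothetical helper, and the toy array stands in for the real input variables. Each run converges only to a local minimum, so the usage keeps the run with the lowest inertia (within-cluster sum of squares) out of several random initializations. scikit-learn's `n_init` parameter does the same thing when it is greater than 1.

```python
import numpy as np


def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's algorithm; returns (labels, centroids, inertia)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct datapoints chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance to every centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each centroid to the mean of its points (keep it if empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment and objective value for the converged centroids
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = d2[np.arange(len(X)), labels].sum()
    return labels, centroids, inertia


# Toy data (hypothetical): two well-separated groups in 2-D
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(5.0, 0.1, (50, 2))])

# Several initializations; keep the run with the lowest within-cluster sum of squares
runs = [kmeans_lloyd(X, k=2, seed=s) for s in range(5)]
labels, centroids, inertia = min(runs, key=lambda r: r[2])
```

Keeping the lowest-inertia run is a common safeguard against poor initializations. It does not guarantee the global optimum.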
K-Means Build
The input to the K-Means algorithm is the combination of all five input variables. Once each gridpoint has been assigned to a cluster, the mean value of the LMD soil moisture data product over that cluster's gridpoints is associated with the cluster and with every gridpoint in it.
sklearn.cluster KMeans
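```python
from sklearn.cluster import KMeans

# `data` is a pandas DataFrame (loaded earlier) with one row per gridpoint
# Input data: the five input variables
features = ['ts_amplitude', 'ndvi', 'backscatter', 'emissivity_v', 'emissivity_h']
X = data[features]

# K-Means parameters: 10 clusters. With the default k-means++ initialization,
# n_init="auto" (scikit-learn >= 1.2) performs a single run
n_clusters = 10
data['clusters'] = KMeans(n_clusters=n_clusters, random_state=0, n_init="auto").fit_predict(X)

# Assign values to clusters: mean LMD soil moisture of each cluster,
# broadcast back to every gridpoint belonging to it
data['Kmeans_SM'] = data.groupby('clusters')['lmd_soilWetness'].transform('mean')
```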
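K-Means relies on Euclidean distances, so any variable with a much larger numeric range than the others can dominate the clustering. For example, NDVI is bounded between -1 and 1, while other inputs may span tens of units. The build above clusters the raw values. The sketch below shows a possible variant, not the method used here, that standardises each variable to zero mean and unit variance with scikit-learn's `StandardScaler` before clustering. The DataFrame is a hypothetical random stand-in with the same column names, and its value ranges are arbitrary placeholders rather than the real units.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic stand-in for `data` (random values, arbitrary ranges)
rng = np.random.default_rng(0)
n = 500
data = pd.DataFrame({
    'ts_amplitude': rng.uniform(0, 30, n),
    'ndvi': rng.uniform(0, 1, n),
    'backscatter': rng.uniform(-25, -5, n),
    'emissivity_v': rng.uniform(0.8, 1.0, n),
    'emissivity_h': rng.uniform(0.6, 1.0, n),
    'lmd_soilWetness': rng.uniform(0, 0.5, n),
})

features = ['ts_amplitude', 'ndvi', 'backscatter', 'emissivity_v', 'emissivity_h']
# Rescale every variable to zero mean and unit variance before clustering
X_scaled = StandardScaler().fit_transform(data[features])

# Explicit n_init=10 behaves the same across scikit-learn versions
n_clusters = 10
data['clusters_scaled'] = KMeans(n_clusters=n_clusters, random_state=0, n_init=10).fit_predict(X_scaled)
data['Kmeans_SM_scaled'] = data.groupby('clusters_scaled')['lmd_soilWetness'].transform('mean')
```

Comparing `clusters_scaled` with the unscaled `clusters` on the real dataset would show how sensitive the result is to the variables' units.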
Cluster Distribution
The sizes of the ten pre-defined clusters are shown here. Five main clusters contain more than 4000 datapoints each, while the smallest cluster contains around 1400 datapoints. Overall, the datapoints are distributed fairly evenly across the whole range of values provided by the LMD soil moisture data product.
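A per-cluster summary is one way to reproduce these sizes and to check how the cluster means span the soil moisture range. This is a minimal sketch that assumes the `data` DataFrame and column names from the build step. `cluster_summary` is a hypothetical helper, and the toy frame only illustrates the shape of the output.

```python
import pandas as pd


def cluster_summary(df, label_col='clusters', value_col='lmd_soilWetness'):
    """Number of datapoints and mean/min/max of value_col per cluster, sorted by mean."""
    summary = df.groupby(label_col)[value_col].agg(['size', 'mean', 'min', 'max'])
    return summary.sort_values('mean')


# Toy example (hypothetical values, not results from this dataset)
toy = pd.DataFrame({
    'clusters': [0, 0, 0, 1, 1, 2],
    'lmd_soilWetness': [0.10, 0.12, 0.14, 0.30, 0.34, 0.45],
})
summary = cluster_summary(toy)
```

Applied as `cluster_summary(data)`, the `size` column gives the number of datapoints per cluster. The `mean` column gives the value assigned as `Kmeans_SM`.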
Spatial Distribution
The K-Means clustering method captures the main patterns and regions of interest of the original LMD soil moisture data product. This suggests that the input variables considered contain the information required to construct a similar data product. Although this method does not reveal the relative weight of each variable in that construction, it sets the path for further analysis based on this dataset.
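Summary statistics could complement the visual agreement between the two maps. This is a minimal sketch that assumes the `data` DataFrame with the `lmd_soilWetness` and `Kmeans_SM` columns from the build step. `agreement` is a hypothetical helper, and the four-value example is a toy, not a result from this dataset.

```python
import numpy as np


def agreement(reference, estimate):
    """Pearson correlation, bias and RMSE of estimate against reference (NaNs dropped)."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    # Keep only pairs where both values are present
    ok = ~(np.isnan(reference) | np.isnan(estimate))
    r, e = reference[ok], estimate[ok]
    diff = e - r
    return {
        'r': float(np.corrcoef(r, e)[0, 1]),
        'bias': float(diff.mean()),
        'rmse': float(np.sqrt((diff ** 2).mean())),
    }


# Toy example (hypothetical values)
stats = agreement([0.1, 0.2, 0.3, 0.4], [0.15, 0.15, 0.35, 0.35])
```

On the real data this would be called as `agreement(data['lmd_soilWetness'], data['Kmeans_SM'])`. `Kmeans_SM` takes only ten distinct values, so the RMSE also reflects the soil moisture variability within each cluster.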