Implement the Soft k-Means Clustering Algorithm solved by 60

Sept. 13, 2015, 10:41 p.m. by Rosalind Team

The soft k-means clustering algorithm starts from randomly chosen centers and iterates the following two steps:

We begin with the “Centers to Soft Clusters” step. If we think about the centers as stars and the data points as planets, then the closer a point is to a center, the stronger that center’s “pull” should be on the point. Given k centers Centers = (x1, ..., xk) and n points Data = (Data1, ... , Datan), we therefore need to construct a k × n responsibility matrix HiddenMatrix for which HiddenMatrixi,j is the pull of center i on data point j. This pull can be computed according to the Newtonian inverse-square law of gravitation,

$$ \textit{HiddenMatrix}_{i, j} = \dfrac{1/d(\textit{Data}_j, x_i)^2}{\sum_{\text{all centers }x_i} 1/d(\textit{Data}_j , x_i)^2} $$

Unfortunately for Newton fans, the following partition function from statistical physics often works better in practice:

$$ \textit{HiddenMatrix}_{i,j} = \dfrac{e^{-\beta \cdot d(Data_j,\, x_i)}}{\sum_{\text{all centers }x_i} e^{-\beta \cdot d(Data_{j},\, x_i)}} $$

In this formula, e is the base of the natural logarithm (e ≈ 2.72), and β is a parameter reflecting the amount of flexibility in our soft assignment and called — appropriately enough — the stiffness parameter.

In soft k-means clustering, if we let HiddenMatrixi denote the i-th row of HiddenMatrix, then we can update center xi using an analogue of the above formulas. Specifically, we will define the j-th coordinate of center xi, denoted xi, j, as

$$ x_{i, j} = \dfrac{\textit{HiddenMatrix}_i \cdot \textit{Data}^j}{\textit{HiddenMatrix}_i\cdot\overrightarrow{1}} $$

Here, Dataj is the n-dimensional vector holding the j-th coordinates of the n points in Data.

The updated center xi is called a weighted center of gravity of the points Data.

Implement the Soft k-Means Clustering Algorithm

Given: Integers k and m, followed by a stiffness parameter β, followed by a set of points Data in m-dimensional space.

Return: A set Centers consisting of k points (centers) resulting from applying the soft k-means clustering algorithm. Select the first k points from Data as the first centers for the algorithm and run the algorithm for 100 steps. Results should be accurate up to three decimal places.

Sample Dataset

2 2
1.3 1.1
1.3 0.2
0.6 2.8
3.0 3.2
1.2 0.7
1.4 1.6
1.2 1.0
1.2 1.1
0.6 1.5
1.8 2.6
1.2 1.3
1.2 1.0
0.0 1.9

Sample Output

1.662 2.623
1.075 1.148

Extra Dataset

Please login to solve this problem.