Implement the Soft k-Means Clustering Algorithm

Sept. 13, 2015, 10:41 p.m. by Rosalind Team

The soft k-means clustering algorithm starts from randomly chosen centers and iterates the following two steps:

Centers to Soft Clusters (E-step): after centers have been selected, assign each data point a "responsibility" value for each cluster, where higher values correspond to stronger cluster membership.

Soft Clusters to Centers (M-step): after data points have been assigned to soft clusters, compute new centers.

We begin with the "Centers to Soft Clusters" step. If we think about the centers as stars and the data points as planets, then the closer a point is to a center, the stronger that center's "pull" should be on the point. Given k centers Centers = (x_1, ..., x_k) and n points Data = (Data_1, ..., Data_n), we therefore need to construct a k × n responsibility matrix HiddenMatrix for which HiddenMatrix_{i,j} is the pull of center i on data point j. This pull can be computed according to the Newtonian inverse-square law of gravitation,

$$\text{HiddenMatrix}_{i,j} = \frac{1/d(\text{Data}_j, x_i)^2}{\sum_{\text{all centers } x_i} 1/d(\text{Data}_j, x_i)^2}$$
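
As a concrete illustration of the inverse-square rule above, here is a minimal Python sketch; the function name inverse_square_responsibilities and the use of Euclidean distance via math.dist are assumptions not fixed by the text, and the sketch presumes that no point coincides exactly with a center (which would make its pull infinite).

    import math

    def inverse_square_responsibilities(data, centers):
        """k x n matrix: entry (i, j) is center i's inverse-square pull on point j,
        normalized over all centers."""
        k, n = len(centers), len(data)
        hidden = [[0.0] * n for _ in range(k)]
        for j, point in enumerate(data):
            # Raw pulls fall off with the square of the distance (assumed Euclidean here).
            pulls = [1.0 / math.dist(point, center) ** 2 for center in centers]
            total = sum(pulls)
            for i in range(k):
                hidden[i][j] = pulls[i] / total  # normalize so column j sums to 1
        return hidden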

Unfortunately for Newton fans, the following partition function from statistical physics often works better in practice:

$$\text{HiddenMatrix}_{i,j} = \frac{e^{-\beta \cdot d(\text{Data}_j, x_i)}}{\sum_{\text{all centers } x_i} e^{-\beta \cdot d(\text{Data}_j, x_i)}}$$

In this formula, e is the base of the natural logarithm (e ≈ 2.72), and β is a parameter reflecting the amount of flexibility in our soft assignment and called — appropriately enough — the stiffness parameter.
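
A sketch of the same step using the partition function is below, again assuming Euclidean distance; soft_responsibilities is an illustrative name. As β grows, the assignment approaches hard k-means; as β shrinks, responsibility is spread more evenly across clusters.

    import math

    def soft_responsibilities(data, centers, beta):
        """k x n HiddenMatrix: entry (i, j) is proportional to e^(-beta * d(Data_j, x_i))."""
        k, n = len(centers), len(data)
        hidden = [[0.0] * n for _ in range(k)]
        for j, point in enumerate(data):
            weights = [math.exp(-beta * math.dist(point, center)) for center in centers]
            total = sum(weights)
            for i in range(k):
                hidden[i][j] = weights[i] / total  # column j sums to 1
        return hidden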

In soft k-means clustering, if we let HiddenMatrix_i denote the i-th row of HiddenMatrix, then we can update center x_i using an analogue of the above formulas. Specifically, we will define the j-th coordinate of center x_i, denoted x_{i,j}, as

$$x_{i,j} = \frac{\text{HiddenMatrix}_i \cdot \text{Data}^j}{\text{HiddenMatrix}_i \cdot \vec{1}}$$

Here, Data^j is the n-dimensional vector holding the j-th coordinates of the n points in Data, \vec{1} = (1, ..., 1) is the n-dimensional all-ones vector, and · denotes the dot product.

The updated center x_i is called a weighted center of gravity of the points Data.
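
A minimal sketch of this update follows, assuming HiddenMatrix is stored as a k × n list of lists as in the sketches above; update_centers and the dimension argument m are illustrative names.

    def update_centers(data, hidden_matrix, m):
        """Replace each center with the weighted center of gravity of the points,
        using the corresponding row of HiddenMatrix as the weights."""
        new_centers = []
        for row in hidden_matrix:        # row = HiddenMatrix_i
            denom = sum(row)             # HiddenMatrix_i . (1, ..., 1)
            new_centers.append([
                sum(weight * point[j] for weight, point in zip(row, data)) / denom
                for j in range(m)        # numerator = HiddenMatrix_i . Data^j
            ])
        return new_centers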

Implement the Soft k-Means Clustering Algorithm

Given: Integers k and m, followed by a stiffness parameter β, followed by a set of points Data in m-dimensional space.

Return: A set Centers consisting of k points (centers) resulting from applying the soft k-means clustering algorithm. Use the first k points from Data as the initial centers and run the algorithm for 100 steps. Results should be accurate up to three decimal places.

Sample Dataset

2 2
2.7
1.3 1.1
1.3 0.2
0.6 2.8
3.0 3.2
1.2 0.7
1.4 1.6
1.2 1.0
1.2 1.1
0.6 1.5
1.8 2.6
1.2 1.3
1.2 1.0
0.0 1.9

Sample Output

1.662 2.623
1.075 1.148
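
Putting the two steps together, the following is one possible end-to-end sketch for this problem. It assumes the input arrives on standard input exactly in the sample format above; the function and variable names are illustrative rather than prescribed.

    import math
    import sys

    def soft_kmeans(data, k, beta, m, steps=100):
        # Per the problem statement, the first k points serve as the initial centers.
        centers = [list(point) for point in data[:k]]
        for _ in range(steps):
            # Centers to Soft Clusters: responsibilities from the partition function.
            hidden = [[0.0] * len(data) for _ in range(k)]
            for j, point in enumerate(data):
                weights = [math.exp(-beta * math.dist(point, center)) for center in centers]
                total = sum(weights)
                for i in range(k):
                    hidden[i][j] = weights[i] / total
            # Soft Clusters to Centers: each center becomes a weighted center of gravity.
            centers = [
                [sum(hidden[i][t] * data[t][j] for t in range(len(data))) / sum(hidden[i])
                 for j in range(m)]
                for i in range(k)
            ]
        return centers

    def main():
        tokens = sys.stdin.read().split()
        k, m = int(tokens[0]), int(tokens[1])
        beta = float(tokens[2])
        values = [float(x) for x in tokens[3:]]
        data = [values[i:i + m] for i in range(0, len(values), m)]
        for center in soft_kmeans(data, k, beta, m):
            print(" ".join(f"{coordinate:.3f}" for coordinate in center))

    if __name__ == "__main__":
        main()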

