# Implement the Soft k-Means Clustering Algorithm solved by 163

Sept. 13, 2015, 10:41 p.m. by Rosalind Team

The soft k-means clustering algorithm starts from randomly chosen centers and iterates the following two steps:

• Centers to Soft Clusters (E-step): After centers have been selected, assign each data point a “responsibility” value for each cluster, where higher values correspond to stronger cluster membership.
• Soft Clusters to Centers (M-step): After data points have been assigned to soft clusters, compute new centers.

We begin with the “Centers to Soft Clusters” step. If we think about the centers as stars and the data points as planets, then the closer a point is to a center, the stronger that center’s “pull” should be on the point. Given k centers Centers = (x1, ..., xk) and n points Data = (Data1, ... , Datan), we therefore need to construct a k × n responsibility matrix HiddenMatrix for which HiddenMatrixi,j is the pull of center i on data point j. This pull can be computed according to the Newtonian inverse-square law of gravitation,

$$\textit{HiddenMatrix}_{i, j} = \dfrac{1/d(\textit{Data}_j, x_i)^2}{\sum_{\text{all centers }x_i} 1/d(\textit{Data}_j , x_i)^2}$$

Unfortunately for Newton fans, the following partition function from statistical physics often works better in practice:

$$\textit{HiddenMatrix}_{i,j} = \dfrac{e^{-\beta \cdot d(Data_j,\, x_i)}}{\sum_{\text{all centers }x_i} e^{-\beta \cdot d(Data_{j},\, x_i)}}$$

In this formula, e is the base of the natural logarithm (e ≈ 2.72), and β is a parameter reflecting the amount of flexibility in our soft assignment and called — appropriately enough — the stiffness parameter.

In soft k-means clustering, if we let HiddenMatrixi denote the i-th row of HiddenMatrix, then we can update center xi using an analogue of the above formulas. Specifically, we will define the j-th coordinate of center xi, denoted xi, j, as

$$x_{i, j} = \dfrac{\textit{HiddenMatrix}_i \cdot \textit{Data}^j}{\textit{HiddenMatrix}_i\cdot\overrightarrow{1}}$$

Here, Dataj is the n-dimensional vector holding the j-th coordinates of the n points in Data.

The updated center xi is called a weighted center of gravity of the points Data.

## Implement the Soft k-Means Clustering Algorithm

Given: Integers k and m, followed by a stiffness parameter β, followed by a set of points Data in m-dimensional space.

Return: A set Centers consisting of k points (centers) resulting from applying the soft k-means clustering algorithm. Select the first k points from Data as the first centers for the algorithm and run the algorithm for 100 steps. Results should be accurate up to three decimal places.

## Sample Dataset

2 2
2.7
1.3 1.1
1.3 0.2
0.6 2.8
3.0 3.2
1.2 0.7
1.4 1.6
1.2 1.0
1.2 1.1
0.6 1.5
1.8 2.6
1.2 1.3
1.2 1.0
0.0 1.9


## Sample Output

1.662 2.623
1.075 1.148