The soft k-means clustering algorithm starts from randomly chosen centers and iterates the following two steps:

1. Centers to Soft Clusters: assign each data point a “responsibility” (soft membership) value for each cluster.
2. Soft Clusters to Centers: use these responsibilities to compute updated centers.
We begin with the “Centers to Soft Clusters” step. If we think about the centers as stars and the data points as planets, then the closer a point is to a center, the stronger that center’s “pull” should be on the point. Given k centers Centers = (x_1, ..., x_k) and n points Data = (Data_1, ..., Data_n), we therefore need to construct a k × n responsibility matrix HiddenMatrix for which HiddenMatrix_{i,j} is the pull of center i on data point j. This pull can be computed according to the Newtonian inverse-square law of gravitation,

HiddenMatrix_{i,j} = \frac{1 / d(Data_j, x_i)^2}{\sum_{\text{all centers } x} 1 / d(Data_j, x)^2}
Unfortunately for Newton fans, the following partition function from statistical physics often works better in practice:

HiddenMatrix_{i,j} = \frac{e^{-\beta \cdot d(Data_j, x_i)}}{\sum_{\text{all centers } x} e^{-\beta \cdot d(Data_j, x)}}
In this formula, e is the base of the natural logarithm (e ≈ 2.72), and β is a parameter reflecting the amount of flexibility in our soft assignment and called — appropriately enough — the stiffness parameter.
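To make the “Centers to Soft Clusters” step concrete, here is a minimal Python sketch that builds the responsibility matrix from the partition function above; the function and variable names (centers_to_soft_clusters, hidden, dist) are illustrative choices, not taken from the original text:

```python
import math

def centers_to_soft_clusters(data, centers, beta):
    """Build a k x n responsibility matrix: entry [i][j] is the pull of
    center i on point j under e^(-beta * d), normalized per column."""
    def dist(p, q):
        # Euclidean distance between two m-dimensional points.
        return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

    hidden = [[math.exp(-beta * dist(point, x)) for point in data]
              for x in centers]
    # Normalize each column so the pulls on every data point sum to 1.
    for j in range(len(data)):
        column_sum = sum(row[j] for row in hidden)
        for row in hidden:
            row[j] /= column_sum
    return hidden
```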
In soft k-means clustering, if we let HiddenMatrix_i denote the i-th row of HiddenMatrix, then we can update center x_i using an analogue of the above formulas. Specifically, we will define the j-th coordinate of center x_i, denoted x_{i,j}, as

x_{i,j} = \frac{HiddenMatrix_i \cdot Data^j}{HiddenMatrix_i \cdot \vec{1}}
Here, Data^j is the n-dimensional vector holding the j-th coordinates of the n points in Data, and \vec{1} is the n-dimensional vector of all 1's.
The updated center x_i is called a weighted center of gravity of the points of Data.
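A corresponding sketch of the “Soft Clusters to Centers” step, again with assumed names, computes each new center as the weighted center of gravity defined above:

```python
def soft_clusters_to_centers(data, hidden):
    """Recompute every center x_i: its j-th coordinate is
    (HiddenMatrix_i . Data^j) / (HiddenMatrix_i . 1-vector)."""
    m = len(data[0])
    centers = []
    for row in hidden:               # row plays the role of HiddenMatrix_i
        total_weight = sum(row)      # HiddenMatrix_i . (1, ..., 1)
        centers.append([
            sum(w * point[j] for w, point in zip(row, data)) / total_weight
            for j in range(m)
        ])
    return centers
```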
Given: Integers k and m, followed by a stiffness parameter β, followed by a set of points Data in m-dimensional space.
Return: A set Centers consisting of k points (centers) resulting from applying the soft k-means clustering algorithm. Select the first k points from Data as the first centers for the algorithm and run the algorithm for 100 steps. Results should be accurate up to three decimal places.
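One way to tie everything together, assuming the two helper functions sketched above and input given in the format described here, is a short driver that initializes the centers with the first k points of Data and alternates the two steps for 100 iterations:

```python
import sys

def soft_k_means(data, k, beta, steps=100):
    # Initialize with the first k points, then iterate both steps.
    centers = [list(point) for point in data[:k]]
    for _ in range(steps):
        hidden = centers_to_soft_clusters(data, centers, beta)
        centers = soft_clusters_to_centers(data, hidden)
    return centers

if __name__ == "__main__":
    lines = [line for line in sys.stdin.read().splitlines() if line.strip()]
    k, m = map(int, lines[0].split())
    beta = float(lines[1])
    data = [list(map(float, line.split())) for line in lines[2:]]
    for center in soft_k_means(data, k, beta):
        print(" ".join(f"{coordinate:.3f}" for coordinate in center))
```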
Sample Dataset
2 2
2.7
1.3 1.1
1.3 0.2
0.6 2.8
3.0 3.2
1.2 0.7
1.4 1.6
1.2 1.0
1.2 1.1
0.6 1.5
1.8 2.6
1.2 1.3
1.2 1.0
0.0 1.9

Sample Output
1.662 2.623
1.075 1.148