The linguistic complexity of a string $s$ of length $n$ formed over an alphabet of size $a$
(denoted $\textrm{lc}(s)$) is equal to the total number of distinct substrings
appearing in $s$ (denoted $\textrm{sub}(s)$) divided by the maximum substring count
(denoted $m(a, n)$); the maximum substring count is the
total number of distinct substrings that could theoretically appear in a string of length
$n$ formed over an alphabet of size $a$.

Note that we have the bounds $0 < \textrm{lc}(s) \leq 1$, with smaller values of $\textrm{lc}(s)$ indicating that $s$ is more repetitive.
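
The definition above can be sketched in a few lines of Python. This is a minimal, brute-force illustration rather than a reference implementation: the function name `linguistic_complexity` is hypothetical, and it assumes that the number of possible length-$k$ substrings is $\min(a^k, n - k + 1)$, that is, the smaller of the number of distinct length-$k$ strings over the alphabet and the number of length-$k$ windows in $s$.

```python
def linguistic_complexity(s: str, a: int) -> float:
    """Return sub(s) / m(a, n) for a non-empty string s over an alphabet of size a."""
    n = len(s)
    if n == 0:
        raise ValueError("s must be non-empty")

    observed = 0  # sub(s): distinct substrings actually present in s
    possible = 0  # m(a, n): maximum substring count
    for k in range(1, n + 1):
        # Distinct length-k substrings found in s.
        observed += len({s[i:i + k] for i in range(n - k + 1)})
        # Assumption: at most a**k distinct length-k strings exist, and s
        # contains only n - k + 1 windows of length k.
        possible += min(a ** k, n - k + 1)
    return observed / possible


print(linguistic_complexity("ACGT", 4))  # 1.0 (every substring is distinct)
print(linguistic_complexity("AAAA", 4))  # 0.4 (highly repetitive)
```

Enumerating every substring takes quadratic time and more memory than necessary, which is acceptable for short strings; long sequences would normally call for a suffix tree or suffix array instead.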

As an example, consider the DNA string (alphabet size $a = 4$) given by $s = \textrm{ATTTGGATT}$.
In the following table, we demonstrate that $\textrm{lc}(s) = \frac{35}{40} = 0.875$ by considering
the number of observed and possible length $k$ substrings of $s$ for each $k$, which are denoted by
$\textrm{sub}_{k}(s)$ and $m(a, k, n)$, respectively. Accordingly, $\textrm{sub}(s) = \sum_{k=1}^{n}\textrm{sub}_{k}(s) = 35$
and $m(a, n) = \sum_{k=1}^{n}{m(a,k,n)} = 40$.