Demerits Of Using Machine Learning in DNA Recognition

June 4, 2020, 7:35 a.m. by arpittrainer

Biological Motivation

A single error can cause havoc in a machine learning system, because every result produced after the error may be flawed, skewed, or simply undesirable. Errors do occur, and developers have so far been unable to anticipate and prevent them consistently. They take many forms, depending on how the machine learning technology is being used. For instance, a faulty sensor might generate a flawed data set. If that inaccurate data is fed into the machine learning program and used as the basis for an algorithm update, the algorithm’s output becomes skewed. In real life, the result could be product recommendations that are not actually related or similar: dog bowls, beach towels, and footwear might all appear in the same batch of “related” products. A computer cannot tell that these items are unrelated; this is where human intelligence is required.

Errors are especially problematic in machine learning because of the technology’s autonomous nature. You run a machine learning program precisely because you don’t want a human to babysit the project. However, this means an error may not be discovered immediately. Once the problem is identified, it can take considerable time and effort to trace it to its source. Finally, you must correct the error and remedy any damage that resulted.

Machine learning proponents argue that, even with a sometimes time-consuming diagnosis and correction process, this technology is far more productive and efficient than the alternatives. In many situations, this claim can be supported by reviewing historical data.

On a related note, machine learning deals in theoretical and statistical truths, which can differ from literal, real-life truths. It is essential to account for this difference when using machine learning.

Problem

A string is simply an ordered collection of symbols selected from some alphabet and formed into a word; the length of a string is the number of symbols that it contains.

An example of a DNA string (whose alphabet contains the symbols A, C, G, and T) is ATGCTTCAGAAAGGTCTTACG.
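To make the definitions above concrete, here is a minimal Python sketch that measures the length of the example string and checks that every symbol comes from the DNA alphabet. The names `DNA_ALPHABET` and `is_dna_string` are hypothetical helpers introduced only for illustration; they are not part of the problem statement.

```python
# Hypothetical helper names, chosen for this illustration only.
DNA_ALPHABET = frozenset("ACGT")


def is_dna_string(s: str) -> bool:
    """Return True if every symbol of s belongs to the DNA alphabet.

    Note: the empty string passes this check, since it contains
    no symbols outside the alphabet.
    """
    return all(symbol in DNA_ALPHABET for symbol in s)


example = "ATGCTTCAGAAAGGTCTTACG"

# The length of a string is the number of symbols it contains.
print(len(example))

# Every symbol of the example is one of A, C, G, T.
print(is_dna_string(example))

# A string containing a symbol outside the alphabet is rejected.
print(is_dna_string("ATGX"))
```

A check like this is one simple way to catch malformed input before it reaches any later processing step.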

Given: A DNA string $s$ of length at most 1000 nucleotides.

Return: Four integers corresponding to the number of times that the symbols A, C, G, and T occur in $s$.

Sample Dataset

AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC

Sample Output

20 12 17 21
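One possible way to approach the problem, sketched in Python under the assumption that the input contains only the symbols A, C, G, and T, is to tally the symbols with `collections.Counter` from the standard library. The function name `count_nucleotides` is a hypothetical choice made for this sketch; it is not a reference solution.

```python
from collections import Counter


def count_nucleotides(s: str) -> tuple:
    """Return the counts of A, C, G, and T in s, in that order.

    Assumes s is a DNA string; any other symbols are tallied by
    Counter but ignored in the returned tuple.
    """
    counts = Counter(s)
    # Counter returns 0 for symbols that never occur, so a missing
    # nucleotide does not raise a KeyError.
    return counts["A"], counts["C"], counts["G"], counts["T"]


sample = "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC"

# Print the four counts separated by spaces, matching the output format.
print(*count_nucleotides(sample))
```

Run on the sample dataset, this should print the four integers given in the Sample Output. Calling `s.count("A")` and so on for each symbol would also work; `Counter` simply makes a single pass over the string instead of four.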