Problem 79. DNA N-Gram Distribution
Given a string s and a number n, find the most frequently occurring n-gram in the string, where the n-grams can begin at any point in the string. This comes up in DNA analysis, where the 3-base reading frame for a codon can begin at any point in the sequence.
So for
s = 'AACTGAACG'
and
n = 3
we get the following n-grams (trigrams):
AAC, ACT, CTG, TGA, GAA, AAC, ACG
Since AAC appears twice, the answer, hifreq, is AAC. There will always be exactly one highest-frequency n-gram.
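The example above can be reproduced with a short sliding-window count. This is only a sketch, written in Python rather than MATLAB for illustration; the function name mirrors the `hifreq` output named in the problem, and the approach shown is one possible way to solve it, not a reference solution.

```python
from collections import Counter

def hifreq(s, n):
    """Return the most frequent n-gram over all overlapping windows of s."""
    # Slide a window of width n across s, advancing one character at a time,
    # so an n-gram may begin at any position in the string.
    grams = [s[i:i + n] for i in range(len(s) - n + 1)]
    # most_common(1) yields a one-element list [(ngram, count)].
    # Assumption: s has at least n characters; otherwise this raises IndexError.
    return Counter(grams).most_common(1)[0][0]

print(hifreq('AACTGAACG', 3))  # prints AAC
```

Because the problem guarantees exactly one highest-frequency n-gram, no tie-breaking rule is needed here.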
This problem was originally inspired by a MATLAB Newsgroup discussion.
Problem Comments
Note that spaces in the input should be ignored; otherwise test suites 3 and 5 fail.
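One way to read this comment is that spaces are deleted from the string before any n-grams are formed. The sketch below assumes that interpretation (the comment does not say whether n-grams containing a space should instead be skipped), and the function name `hifreq_ignoring_spaces` is hypothetical. It is written in Python for illustration only.

```python
from collections import Counter

def hifreq_ignoring_spaces(s, n):
    """Most frequent n-gram of s after deleting all space characters."""
    # Assumption: "ignored" means spaces are removed before windowing,
    # so 'AAC TGA' is treated exactly like 'AACTGA'.
    compact = s.replace(' ', '')
    counts = Counter(compact[i:i + n] for i in range(len(compact) - n + 1))
    return counts.most_common(1)[0][0]

print(hifreq_ignoring_spaces('AAC TGA ACG', 3))  # prints AAC
```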