Problem 47658. Calculate the similarity between DNA sequences

A DNA sequence contains only the letters A, C, G and T. (Each letter represents a small molecule, and a DNA sequence is a 'macromolecular' chain of them.) Each letter in a DNA sequence is called a base, base pair, or nucleotide. Normally, DNA occurs as a double strand where each A is paired with a T and vice versa, and each C is paired with a G and vice versa. The reverse complement of a DNA sequence is formed by reversing the letters, interchanging A and T, and interchanging C and G. Thus the reverse complement of ACCTGAG is CTCAGGT. Rapid DNA sequencing machinery produces short strands of DNA, and these can be arbitrarily oriented, i.e., the DNA strand is 'read' from left to right or from right to left (depending on which end is 'grabbed' when reading starts), and it comes from either the forward or the reverse complement strand that has been uncoiled to facilitate sequencing.
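The reverse complement rule described above can be sketched in a few lines. This is a minimal Python illustration, not part of the original problem; the name `reverse_complement` is a hypothetical helper, and the input is assumed to contain only the uppercase letters A, C, G and T.

```python
# Translation table that swaps A<->T and C<->G (assumes uppercase input).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence.

    Hypothetical helper: complement each base, then reverse the string.
    """
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ACCTGAG"))  # CTCAGGT
```

Applying the function twice returns the original sequence, which is a quick sanity check for any implementation.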
Here we are interested in computing the similarity between two strand sequences (called 'reads'). We define the similarity in terms of the length of the longest common subsequence, LCS, and the lengths of the two sequences, LS1 and LS2 respectively.
SimSeq(S1,S2) = 2*LCS / (LS1+LS2)
When S1 = S2 the similarity score is 1.0 (likewise when S1 is equal to the reverse complement of S2).
When the sequences are dissimilar, or when one sequence is much longer than the other, the similarity score approaches zero. (Alternative definitions that correct for the length difference are possible, but are of no concern here.)
Note that the longest common subsequence LCS may be found in either the forward or the reverse complement direction. For instance, LCS('TACCTGAGA','GACCTGAGC') is equal to 7, since the common substring 'ACCTGAG' in S1 matches 'ACCTGAG' in S2.
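One way to read the definition is sketched below in Python. It rests on several assumptions that the problem statement does not settle: LCS is taken as the longest common subsequence (not necessarily contiguous) and computed with the standard dynamic-programming recurrence; the score uses the larger of the forward and reverse complement LCS values; and the names `lcs_length`, `reverse_complement` and `sim_seq` are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Complement each base (A<->T, C<->G), then reverse. Assumes uppercase ACGT."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Standard DP over prefixes, keeping only the previous row,
    so it runs in O(len(a) * len(b)) time and O(len(b)) memory.
    """
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def sim_seq(s1: str, s2: str) -> float:
    """SimSeq(S1,S2) = 2*LCS / (LS1+LS2).

    Assumption: LCS is the better of the forward and the
    reverse complement comparison.
    """
    if not s1 and not s2:
        raise ValueError("similarity is undefined for two empty sequences")
    best = max(lcs_length(s1, s2), lcs_length(s1, reverse_complement(s2)))
    return 2 * best / (len(s1) + len(s2))


print(round(sim_seq("TACCTGAGA", "GACCTGAGC"), 3))  # 0.778
print(sim_seq("ACCTGAG", "CTCAGGT"))                # 1.0
```

For the example pair, LCS = 7 and LS1 = LS2 = 9, so the score is 2*7 / (9+9) = 14/18, about 0.778. The second call compares a sequence with its own reverse complement and gives 1.0, as the definition requires.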

Solution Stats

33.33% Correct | 66.67% Incorrect
Last Solution submitted on Jul 01, 2021
