Title: A Simple, Space-Efficient Algorithm for Detecting Start and Stop Codons in DNA
Via webex:
Abstract: The brute force algorithm for
string matching has a worst-case execution complexity of O(MN), where N is the length of the
entire string and M
is the length of the substring to be matched. The efficiency of brute-force
matching is insufficient for biological applications due to the sheer size of
bioinformatics datasets.
This thesis describes an
algorithm that uses bit compression techniques to represent multiple codons in
a single Python integer. It was proven that with exact matching steps, the
search algorithm is as accurate as brute-force matching with significantly
higher efficiency, as this algorithm runs in O(N) time and space, with
minimal memory overhead. While the bit-based algorithm is not more
space-efficient in a theoretical sense, it is more efficient practically as we
have observed significant memory savings compared to the size of the original
string.