187. Repeated DNA Sequences

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return: ["AAAAACCCCC", "CCCCCAAAAA"].

 public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> res = new ArrayList<>();
        Map<Integer, Integer> keys = new HashMap<>();// use hashmap to track if the sequence showed, and whether is duplicated.

        for(int i=10; i<= s.length(); i++){
            String seq = s.substring(i-10, i);
            int key = hashKey(seq);
            if(keys.containsKey(key)){
                if(keys.get(key) == 1){
                    res.add(seq);
                }
                keys.put(key, 2);
            }else{
                keys.put(key, 1);
            }
        }
        return res;
    }

    private int hashKey(String s){
        int res = 0;
        for(int i = s.length()-1; i>=0; i--){
            char ch = s.charAt(i);
            res <<= 2;
            switch (ch){
                case 'A':
                    res |= 0b00; 
                    break;
                case 'C':
                    res |= 0b01;
                    break;
                case 'G':
                    res |= 0b10;
                    break;
                case 'T':
                    res |= 0b11; //0x11 means hexdecimal, so it is 0b00010001.. which is not 3. in binary.
                    break;
            }

        }
        return res;
    }
}

results matching ""

    No results matching ""