187. Repeated DNA Sequences
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
Return: ["AAAAACCCCC", "CCCCCAAAAA"].
public class Solution {
public List<String> findRepeatedDnaSequences(String s) {
List<String> res = new ArrayList<>();
Map<Integer, Integer> keys = new HashMap<>();// use hashmap to track if the sequence showed, and whether is duplicated.
for(int i=10; i<= s.length(); i++){
String seq = s.substring(i-10, i);
int key = hashKey(seq);
if(keys.containsKey(key)){
if(keys.get(key) == 1){
res.add(seq);
}
keys.put(key, 2);
}else{
keys.put(key, 1);
}
}
return res;
}
private int hashKey(String s){
int res = 0;
for(int i = s.length()-1; i>=0; i--){
char ch = s.charAt(i);
res <<= 2;
switch (ch){
case 'A':
res |= 0b00;
break;
case 'C':
res |= 0b01;
break;
case 'G':
res |= 0b10;
break;
case 'T':
res |= 0b11; //0x11 means hexdecimal, so it is 0b00010001.. which is not 3. in binary.
break;
}
}
return res;
}
}