Given a string S of length N, find the longest substring without repeating characters.
Example:
Input:
Here are my JavaScript and C++ implementations, with detailed explanations: https://algorithm.pingzhang.io/String/longest_substring_without_repeating_characters.html
We want to find the longest substring without repeating characters. The first thing that comes to mind is that we need a hash table storing every character in the current substring, mapped to the index where it last appeared, so that when a new character comes in we can quickly tell whether it is already in the substring. I call it valueIdxHash. A substring also has a startIdx and an endIdx, so we need a variable to track the starting index of the current substring; I call it startIdx. Let's assume we are at index i and already have a substring (startIdx, i - 1). Now we want to check whether this substring can keep growing.
If valueIdxHash contains str[i], it is a repeated character. But we still need to check whether this repeated character lies inside the substring (startIdx, i - 1). So we retrieve valueIdxHash[str[i]], the index where str[i] last appeared, and compare it with startIdx:
If startIdx is larger, the last occurrence of str[i] is outside the substring, so the substring can keep growing.
If startIdx is smaller than or equal to it, the last occurrence of str[i] is within the substring, so the substring cannot grow any more. startIdx is updated to valueIdxHash[str[i]] + 1, and the new substring (valueIdxHash[str[i]] + 1, i) has the potential to keep growing.
If valueIdxHash does not contain str[i], the substring can keep growing.
In every case, valueIdxHash[str[i]] is then set to i, and the current length i - startIdx + 1 is compared with the best length found so far.
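The steps above can be sketched in Python as follows. This is a minimal illustration under my own assumptions, not the linked JavaScript/C++ code; the function name longest_substring_without_repeats is hypothetical, and value_idx_hash / start_idx simply mirror valueIdxHash / startIdx from the description.

```python
def longest_substring_without_repeats(s: str) -> str:
    """Hypothetical helper mirroring the valueIdxHash/startIdx description."""
    value_idx_hash = {}   # character -> index where it last appeared
    start_idx = 0         # current substring is s[start_idx .. i]
    best_start, best_len = 0, 0
    for i, ch in enumerate(s):
        # A repeat only matters if its last occurrence is inside the current substring.
        if ch in value_idx_hash and value_idx_hash[ch] >= start_idx:
            start_idx = value_idx_hash[ch] + 1
        value_idx_hash[ch] = i
        # Record the substring if it is the longest seen so far.
        if i - start_idx + 1 > best_len:
            best_start, best_len = start_idx, i - start_idx + 1
    return s[best_start:best_start + best_len]


print(longest_substring_without_repeats("stackoverflow"))
```

Each character is visited once and each hash lookup is expected O(1), so the whole scan is linear in N.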
// Returns the LENGTH of the longest substring without duplicate characters.
function longestSubstringWithoutDuplication(str) {
    var max = 0;
    // empty string
    if (str.length === 0) {
        return 0;
    } else if (str.length === 1) { // a single character is trivially unique
        return 1;
    }
    // loop over all the chars in the string
    var currentChar,
        map = {}; // char -> index of its occurrence in the current substring
    for (var i = 0; i < str.length; i++) {
        currentChar = str.charAt(i);
        // if the current char is not in the map
        if (map[currentChar] === undefined) {
            // add currentChar to the map and update the max
            map[currentChar] = i;
            if (Object.keys(map).length > max) {
                max = Object.keys(map).length;
            }
        } else { // there is a duplicate
            // restart just after the earlier occurrence of this char
            // (the loop's i++ moves i past map[currentChar])
            i = map[currentChar];
            map = {}; // clear the map
        }
    }
    return max;
}
import java.util.HashMap;

public class SubString {
    public static String subString(String input) {
        String longestTillNow = "";   // current run of distinct characters
        String longestOverall = "";   // best run found so far
        HashMap<Character, Integer> chars = new HashMap<>(); // char -> index in input
        char[] array = input.toCharArray();
        for (int i = 0; i < array.length; i++) {
            char character = array[i];
            if (chars.containsKey(character)) {
                // Restart just after the earlier occurrence of this character;
                // the loop's i++ moves i from that occurrence to the next index.
                i = chars.get(character);
                chars.clear();
                longestTillNow = "";
            } else {
                chars.put(character, i);
                longestTillNow = longestTillNow + character;
                if (longestTillNow.length() > longestOverall.length()) {
                    longestOverall = longestTillNow;
                }
            }
        }
        return longestOverall;
    }

    public static void main(String[] args) {
        String input = "stackoverflowabcdefghijklmn";
        System.out.println(subString(input));
    }
}
Algorithm:
1) Initialise an empty dictionary dct to check whether a character already exists in the current substring.
2) cnt keeps the length of the current substring without repeating characters.
3) l and r are two pointers, both initialised to the first index of the string; r marks the left end of the window and l the right end.
4) Loop through each character of the string.
5) If the character is not present in dct, add it and increase cnt.
6) If it is already present, check whether cnt is greater than resStrLen, and if so record it.
7) Remove the leftmost character from dct, shift the left pointer by 1 and decrease cnt.
8) Repeat steps 5, 6 and 7 until l or r reaches the length of the input string.
9) Do one more check at the end to handle cases where the longest substring runs to the end of the string, such as an input with no repeating characters.
Here is a simple Python program to find the longest substring without repeating characters:
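a = "stackoverflow"
strLength = len(a)
dct = {}        # characters currently in the window a[r:l]
resStrLen = 0   # length of the best window found so far
cnt = 0         # length of the current window
l = 0           # right end of the window (exclusive)
r = 0           # left end of the window
strb = l
stre = l
while l < strLength and r < strLength:
    if a[l] in dct:
        # a[l] repeats: record the current window if it is the longest so far
        if cnt > resStrLen:
            resStrLen = cnt
            strb = r
            stre = l
        # shrink the window from the left by one character
        dct.pop(a[r])
        cnt = cnt - 1
        r += 1
    else:
        # a[l] is new: extend the window to the right
        cnt += 1
        dct[a[l]] = 1
        l += 1
# the longest window may run to the end of the string
if cnt > resStrLen:
    resStrLen = cnt
    strb = r
    stre = l
print("Result String Length : " + str(resStrLen))
print("Result String : " + a[strb:stre])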
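The same two-pointer idea can be packaged as a reusable function that returns the substring instead of printing it. This is a minimal sketch, not the answer's own code; the name longest_window_substring is hypothetical, and it uses a set in place of the dct dictionary since only membership is needed.

```python
def longest_window_substring(s: str) -> str:
    """Hypothetical wrapper: extend the window on the right, shrink it from the left on a repeat."""
    seen = set()          # characters currently in the window s[left:right]
    left = right = 0
    best_start, best_len = 0, 0
    while right < len(s):
        if s[right] in seen:
            # Drop the leftmost character until the repeat leaves the window.
            seen.remove(s[left])
            left += 1
        else:
            seen.add(s[right])
            right += 1
            # Record the window whenever it grows past the best length.
            if right - left > best_len:
                best_start, best_len = left, right - left
    return s[best_start:best_start + best_len]


print(longest_window_substring("stackoverflow"))
```

Because the best length is updated every time the window grows, no separate check after the loop is needed. Each pointer moves at most N times, so the loop runs at most 2N iterations.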
The solution in C.
#include <stdio.h>
#include <string.h>

/* Finds the longest substring of a without repeated characters.
   On return, the substring is a[*start .. *last - 1] (*last is exclusive). */
void longstr(const char *a, int *start, int *last)
{
    int visited[256];               /* last index at which each byte was seen, -1 if never */
    for (int i = 0; i < 256; i++)
    {
        visited[i] = -1;
    }
    int n = (int)strlen(a);         /* compute the length once, not on every iteration */
    int win_start = 0;              /* start of the current window */
    int max_len = 0;
    *start = *last = 0;
    for (int i = 0; i < n; i++)
    {
        unsigned char c = (unsigned char)a[i];  /* avoid negative indices for non-ASCII bytes */
        /* If c was seen inside the current window, move the window past it. */
        if (visited[c] >= win_start)
        {
            win_start = visited[c] + 1;
        }
        visited[c] = i;
        if (i - win_start + 1 > max_len)
        {
            max_len = i - win_start + 1;
            *start = win_start;
            *last = i + 1;
        }
    }
}

int main(void)
{
    char str[] = "ABDEFGABEF";
    printf("The input string is %s\n", str);
    int start, last;
    longstr(str, &start, &last);
    /* Shift the substring to the front of the buffer and terminate it. */
    memmove(str, str + start, last - start);
    str[last - start] = '\0';
    printf("The longest non-repeating character substring is %s\n", str);
    return 0;
}
You keep an array indicating the position at which each character last occurred. For convenience, every character initially occurred at position -1. You iterate over the string keeping a window; if a character is repeated within that window, you chop off the prefix that ends with the earlier occurrence of this character. Throughout, you keep track of the longest length. Here's a Python implementation:
def longest_unique_substr(S):
    # This should be replaced by an array (size = alphabet size).
    last_occurrence = {}
    longest_len_so_far = 0
    longest_pos_so_far = 0
    curr_starting_pos = 0
    curr_length = 0
    for k, c in enumerate(S):
        l = last_occurrence.get(c, -1)
        # If no repetition within window, no problems.
        if l < curr_starting_pos:
            curr_length += 1
        else:
            # Check if it is the longest so far
            if curr_length > longest_len_so_far:
                longest_pos_so_far = curr_starting_pos
                longest_len_so_far = curr_length
            # Cut the prefix that has repetition
            curr_length -= l - curr_starting_pos
            curr_starting_pos = l + 1
        # In any case, update last_occurrence
        last_occurrence[c] = k
    # Maybe the longest substring is a suffix
    if curr_length > longest_len_so_far:
        longest_pos_so_far = curr_starting_pos
        longest_len_so_far = curr_length
    return S[longest_pos_so_far:longest_pos_so_far + longest_len_so_far]
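As the comment in longest_unique_substr suggests, the dictionary can be replaced by an array indexed by character code. A minimal sketch, assuming the input is a bytes object so every code falls in 0..255; the name longest_unique_bytes is hypothetical, and for arbitrary Unicode text the dictionary version is the simpler choice.

```python
def longest_unique_bytes(data: bytes) -> bytes:
    """Hypothetical array-based variant; assumes byte input (values 0..255)."""
    last_occurrence = [-1] * 256   # last index of each byte value, -1 = not seen yet
    best_pos, best_len = 0, 0
    start = 0                      # start of the current window
    for k, b in enumerate(data):   # iterating over bytes yields ints 0..255
        if last_occurrence[b] >= start:
            start = last_occurrence[b] + 1   # cut the prefix that has the repetition
        last_occurrence[b] = k
        if k - start + 1 > best_len:
            best_pos, best_len = start, k - start + 1
    return data[best_pos:best_pos + best_len]


print(longest_unique_bytes(b"stackoverflow"))  # b'stackoverfl'
```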