How to find the minimal-length subsequence that contains all elements of a sequence

无人共我 2021-02-01 07:29

Given a sequence such as S = {1,8,2,1,4,1,2,9,1,8,4}, I need to find the minimal-length subsequence that contains all elements of S (no duplicates, order does n

7 answers

深忆病人 (original poster)

2021-02-01 08:15
This can be solved by dynamic programming.

At each step k, we'll compute the shortest subsequence that ends at the k-th position of S and that satisfies the requirement of containing all the unique elements of S.

Given the solution to step k (hereinafter "the sequence"), computing the solution to step k+1 is easy: append the (k+1)-th element of S to the sequence and then remove, one by one, all elements at the start of the sequence that are contained in the extended sequence more than once.
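A single step of this kind can be sketched as a small helper. The function name `extend` and the use of `Counter` are my own choices for illustration; the window is represented as the half-open slice S[start:end], and it is assumed to already contain every symbol before the call.

```python
from collections import Counter

def extend(S, start, end, count):
    """Advance a valid window S[start:end] by one element (hypothetical helper).

    `count` maps each symbol to its number of occurrences inside the window
    and is updated in place. Returns the new (start, end).
    """
    count[S[end]] += 1              # append the next element of S
    end += 1
    while count[S[start]] > 1:      # the leading element occurs again later in the window
        count[S[start]] -= 1        # so it can be dropped without losing a symbol
        start += 1
    return start, end

S = [1, 2, 1, 3, 2, 1]
start, end = 0, 4                   # S[0:4] == [1, 2, 1, 3] contains every symbol
count = Counter(S[start:end])
start, end = extend(S, start, end, count)
print(S[start:end])                 # [1, 3, 2]
```

Appending the 2 makes both the leading 1 and the leading 2 redundant, so the window shrinks from four elements to three.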

The solution to the overall problem is the shortest sequence found in any of the steps.

The initialization of the algorithm consists of two stages:
1. Scan S once, building the alphabet of unique values.
2. Find the shortest valid sequence whose first element is the first element of S; the last position of this sequence will be the initial value of k.
All of the above can be done in O(n log n) worst-case time (let me know if this requires clarification).

Here is a complete implementation of the above algorithm in Python:
```
import collections

S = [1, 8, 2, 1, 4, 1, 2, 9, 1, 8, 4, 2, 4]

# initialization: stage 1
alphabet = set(S)                           # the unique values ("symbols") in S
count = collections.defaultdict(int)        # how many times each symbol appears in the current window

# initialization: stage 2
start = 0
for end in range(len(S)):
    count[S[end]] += 1
    if len(count) == len(alphabet):         # seen all the symbols yet?
        break
end += 1                                    # from here on the window is S[start:end]

# trim leading duplicates so that the initial window is itself minimal;
# without this, an input such as [1, 1, 2] would yield [1, 1, 2]
while count[S[start]] > 1:
    count[S[start]] -= 1
    start += 1

best_start = start
best_end = end

# the induction
while end < len(S):
    count[S[end]] += 1                      # append the next element
    while count[S[start]] > 1:              # drop leading elements that occur again later
        count[S[start]] -= 1
        start += 1
    end += 1
    if end - start < best_end - best_start: # new shortest sequence?
        best_start = start
        best_end = end

print(S[best_start:best_end])
```
Notes:
1. the data structures I use (dictionaries and sets) are based on hash tables; each operation is O(1) on average but can degrade to O(n) in the worst case. If it's the worst case that you care about, replacing them with tree-based structures will give the overall O(n log n) I've promised above;
2. as pointed out by @biziclop, the first scan of S can be eliminated, making the algorithm suitable for streaming data;
3. if the elements of S are small non-negative integers, as your comments indicate, then count can be flattened out into an integer array, bringing the overall complexity down to O(n).
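Note 2 can be sketched as follows. This is one possible single-pass variant, not necessarily what @biziclop had in mind, and the name `shortest_cover_stream` is made up for this example. It assumes the elements are hashable and relies on one observation: whenever a symbol appears for the first time, every earlier candidate is missing that symbol, so the best window is simply reset to the current one.

```python
from collections import deque

def shortest_cover_stream(stream):
    """Hypothetical single-pass variant: the alphabet is not known in advance."""
    window = deque()    # current window: shortest one ending at the latest element
    count = {}          # occurrences of each symbol inside the window
    best = None
    for x in stream:
        is_new = x not in count             # first time this symbol is seen?
        count[x] = count.get(x, 0) + 1
        window.append(x)
        while count[window[0]] > 1:         # drop leading elements that occur again later
            count[window[0]] -= 1
            window.popleft()
        # a new symbol invalidates every earlier candidate; otherwise keep the shorter one
        if is_new or len(window) < len(best):
            best = list(window)
    return best if best is not None else []

S = [1, 8, 2, 1, 4, 1, 2, 9, 1, 8, 4, 2, 4]
print(shortest_cover_stream(iter(S)))       # [2, 9, 1, 8, 4]
```

Only the current window is buffered, so the input can be any iterator. Copying the window on each improvement keeps the sketch short; storing positions instead would avoid the copy if the stream can be indexed later.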