Suppose you have a dictionary that contains valid words.
Given an input string with all spaces removed, determine whether the string is composed of valid words or not.
I'd go for a recursive algorithm with implicit backtracking. Function signature: `f: input -> result`, with `input` being the string and `result` being either `true` or `false`, depending on whether the entire string can be tokenized correctly. It works like this (a short sketch follows the steps):
1. If `input` is the empty string, return `true`.
2. Take a prefix of `input` (i.e., start with the first character). If it is in the dictionary, run `f` on the remaining suffix of `input`. If that returns `true`, return `true` as well.
3. If `f` in the previous step returned `false`, make the prefix longer by one character and repeat at step 2. If the prefix cannot be made any longer (we are already at the end of the string), return `false`.
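Here is a minimal Python sketch of that recursion; the names (`can_tokenize`, `words`) and the choice to represent the dictionary as a set of strings are my own assumptions, not part of the answer above.

```python
def can_tokenize(s: str, words: set[str]) -> bool:
    """Return True if s can be split entirely into words from the dictionary."""
    # Step 1: the empty string is trivially tokenizable.
    if s == "":
        return True
    # Steps 2 and 3: try prefixes of increasing length, backtracking on failure.
    for end in range(1, len(s) + 1):
        prefix = s[:end]
        if prefix in words and can_tokenize(s[end:], words):
            return True
    # No prefix leads to a full tokenization of the rest.
    return False


# Example usage (hypothetical dictionary):
words = {"cat", "cats", "and", "sand", "dog"}
print(can_tokenize("catsanddog", words))  # True: "cats and dog" (or "cat sand dog")
print(can_tokenize("catsandog", words))   # False
```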
For dictionaries with a low to moderate number of ambiguous prefixes, this should yield a pretty good running time in practice (O(n) in the average case, I'd say), though in theory, pathological cases with O(2^n) complexity can probably be constructed. However, I doubt we can do any better, since we need backtracking anyway, so the "instinctive" O(n) approach using a conventional pre-computed lexer is out of the question. ...I think.
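To make the pathological case concrete, here is a small experiment of my own (not part of the original answer) that counts how many recursive calls the naive tokenizer sketched above performs on an input of n a's followed by a character that matches nothing:

```python
def count_calls(s: str, words: set[str]) -> int:
    """Count the recursive invocations the naive tokenizer makes on s."""
    calls = 1
    for end in range(1, len(s) + 1):
        if s[:end] in words:
            # For this input no branch ever tokenizes fully, so counting every
            # matching prefix reflects exactly the work of the naive recursion.
            calls += count_calls(s[end:], words)
    return calls


words = {"a", "aa", "aaa"}
for n in (5, 10, 15, 20):
    print(n, count_calls("a" * n + "b", words))
# The call counts grow geometrically with n, i.e. exponentially.
```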
EDIT: the estimate for the average-case complexity is likely incorrect, see my comment.
Space complexity would be only stack space, so O(n) even in the worst case.