I have an unsorted list of noisy X, Y points. They do, however, form a path through the world. I would like an algorithm to draw an approximation of this data using line segments.
How many points do you have?
A Bezier curve, as mentioned, is a good idea if you have comparatively few points. If you have many points, build clusters as suggested by dmckee.
However, you also need another constraint to define the order of the points. There have been many good suggestions for how to choose the points, but unless you introduce another constraint, any ordering is a possible solution.
Possible constraints I can think of:
In all cases, to meet the constraint you probably need to test all permutations of the sequence. If you start with a "good guess", you can terminate the others quickly.
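A rough sketch of that "test all permutations, but bail out early" idea. The constraint used here (shortest total path length) is only an assumption for illustration, and `best_order` and `path_length` are made-up names. The initial guess sets the bar, and any partial ordering that is already at least as long gets abandoned. It is still exponential in the worst case, so it is only realistic for a handful of points.

```python
import math

def path_length(points, order):
    """Total length of the polyline visiting points in the given index order."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(order, order[1:]))

def best_order(points, initial_guess=None):
    """Exhaustive search over orderings, pruned by the best length found so far.

    Assumed constraint: minimise total path length.
    """
    n = len(points)
    best = list(initial_guess) if initial_guess is not None else list(range(n))
    best_len = path_length(points, best)

    def extend(order, used, length):
        nonlocal best, best_len
        if length >= best_len:
            return  # this branch cannot beat the current best: terminate it
        if len(order) == n:
            best, best_len = order[:], length
            return
        for i in range(n):
            if i in used:
                continue
            step = math.dist(points[order[-1]], points[i]) if order else 0.0
            order.append(i)
            used.add(i)
            extend(order, used, length + step)
            order.pop()
            used.remove(i)

    extend([], set(), 0.0)
    return best, best_len

if __name__ == "__main__":
    pts = [(3, 0), (0, 0), (2, 0), (1, 0)]
    order, length = best_order(pts)
    print([pts[i] for i in order], length)
```

The better the initial guess, the sooner branches get cut, which is the point made above.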
My approach would be to first sort your list of points, then use a Bezier curve.
The trick is of course the sorting. Start with one random point and find the nearest point. Assume these two are connected. With those two endpoints, find the nearest points to them. Assume that the one with the smaller distance to its endpoint is connected to that point. Repeat until all points are connected.
I assume that there are still some problems with this approach, but maybe you can use it as a starting point (pun intended).
Edit: You can do it several times with different starting points, and then see where the results differ. That at least gives you some confidence about which points are connected to each other.
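The sorting step described above could look roughly like this in Python. This is only a sketch: `greedy_chain` is a made-up name, distances are plain Euclidean, and the O(n^2) nearest-point search is the naive one.

```python
import math
import random
from collections import deque

def greedy_chain(points, start=None, rng=random):
    """Order points by growing a chain from both of its ends.

    At every step, find the nearest unused point to each end of the chain and
    attach whichever of the two is closer to its end.
    """
    if not points:
        return []
    remaining = set(range(len(points)))
    first = rng.randrange(len(points)) if start is None else start
    remaining.remove(first)
    chain = deque([first])
    while remaining:
        head, tail = chain[0], chain[-1]
        near_head = min(remaining, key=lambda i: math.dist(points[i], points[head]))
        near_tail = min(remaining, key=lambda i: math.dist(points[i], points[tail]))
        d_head = math.dist(points[near_head], points[head])
        d_tail = math.dist(points[near_tail], points[tail])
        if d_head < d_tail:
            chain.appendleft(near_head)
            remaining.remove(near_head)
        else:
            chain.append(near_tail)
            remaining.remove(near_tail)
    return [points[i] for i in chain]

if __name__ == "__main__":
    pts = [(3.3, 0.1), (0, 0), (6, 0.2), (1, 0.2), (4.6, 0), (2.1, 0)]
    print(greedy_chain(pts, start=0))
```

Running it with several different `start` values and comparing the resulting chains is the confidence check from the edit above.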
I take it that "unsorted list" means that while your set of points is complete, you don't know what order they were travelled through?
The Gaussian noise basically has to be ignored. We're given absolutely no information that allows us to make any attempt to reconstruct the original, un-noisy path. So I think the best we can do is assume the points are correct.
At this point, the task consists of "find the best path through a set of points", with "best" left vague. I whipped up some code that attempts to order a set of points in Euclidean space, preferring orderings that result in straighter lines. While the metric was easy to implement, I couldn't think of a good way to improve the ordering based on that, so I just randomly swap points looking for a better arrangement.
So, here is some PLT Scheme code that does that.
#lang scheme
(require (only-in srfi/1 iota))
; a bunch of trig
(define (deg->rad d)
  (* pi (/ d 180)))
(define (rad->deg r)
  (* 180 (/ r pi)))
(define (euclidean-length v)
  (sqrt (apply + (map (lambda (x) (expt x 2)) v))))
(define (dot a b)
  (apply + (map * a b)))
(define (angle-ratio a b)
  (/ (dot a b)
     (* (euclidean-length a) (euclidean-length b))))
; given a list of 3 points, calculate the likelihood of the
; angle they represent. straight is better.
(define (probability-triple a b c)
  (let ([av (map - a b)]
        [bv (map - c b)])
    (cos (/ (- pi (abs (acos (angle-ratio av bv)))) 2))))
; makes a random 2d point. uncomment the bit for a 3d point
(define (random-point . x)
  (list (/ (random 1000) 100)
        (/ (random 1000) 100)
        #;(/ (random 1000) 100)))
; calculate the likelihood of an entire list of points
(define (point-order-likelihood lst)
  (if (null? (cddr lst))
      1
      (* (probability-triple (car lst)
                             (cadr lst)
                             (caddr lst))
         (point-order-likelihood (cdr lst)))))
; just print a list of points
(define (print-points lst)
  (for ([p (in-list lst)])
    (printf "~a~n"
            (string-join (map number->string
                              (map exact->inexact p))
                         " "))))
; attempts to improve upon a list
(define (find-better-arrangement start
                                 ; by default, try only 10 times to find something better
                                 [tries 10]
                                 ; if we find an arrangement that is as good as one where
                                 ; every segment bends by 45 degrees (each triple then
                                 ; scores cos(22.5 degrees), which would be reasonably
                                 ; gentle) then call it good enough. higher
                                 ; cut offs are more demanding.
                                 [cut-off (expt (cos (/ pi 8))
                                                (- (length start) 2))])
  (let ([vec (list->vector start)]
        ; evaluate what we've started with
        [eval (point-order-likelihood start)])
    (let/ec done
      ; if the current list exceeds the cut off, we're done
      (when (> eval cut-off)
        (done start))
      ; otherwise, try no more than 'tries' times...
      (for ([x (in-range tries)])
        ; pick two random points in the list
        (let ([ai (random (vector-length vec))]
              [bi (random (vector-length vec))])
          ; if they're the same...
          (when (= ai bi)
            ; increment the second by 1, wrapping around the list if necessary
            (set! bi (modulo (add1 bi) (vector-length vec))))
          ; take the values from the two positions...
          (let ([a (vector-ref vec ai)]
                [b (vector-ref vec bi)])
            ; swap them
            (vector-set! vec bi a)
            (vector-set! vec ai b)
            ; make a list out of the vector
            (let ([new (vector->list vec)])
              ; if it evaluates to better
              (when (> (point-order-likelihood new) eval)
                ; start over with it
                (done (find-better-arrangement new tries cut-off)))))))
      ; we fell out the bottom of the search. just give back what we started with
      start)))
; evaluate and display a list of points, then improve it four times with more tries each time
(define points (map random-point (iota 10)))
(define tmp points)
(printf "~a~n" (point-order-likelihood tmp))
(print-points tmp)
(set! tmp (find-better-arrangement tmp 10))
(printf "~a~n" (point-order-likelihood tmp))
(print-points tmp)
(set! tmp (find-better-arrangement tmp 100))
(printf "~a~n" (point-order-likelihood tmp))
(print-points tmp)
(set! tmp (find-better-arrangement tmp 1000))
(printf "~a~n" (point-order-likelihood tmp))
(print-points tmp)
(set! tmp (find-better-arrangement tmp 10000))
(printf "~a~n" (point-order-likelihood tmp))
(print-points tmp)
With an unsorted list, you won't really know which points to include in each segment, so I guess you could just go with the closest point.
One way could be to pick a start point at random, and pick the closest point as the next point in each step. Add the first two points to a set S.
Fit a line to the points in S until the RMS exceeds some value, then clear S and start a new line.
The intersection of consecutive lines would be the end-points of the segments.
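Here is a rough Python sketch of the fit-until-RMS-breaks idea. It assumes the points have already been put in order (for example by the closest-point walk above). It uses an orthogonal least-squares fit, and it starts each new set with the last point of the previous one so that every set has at least two points; both are my own choices, not part of the description. All function names are made up.

```python
import math

def fit_line(pts):
    """Orthogonal least-squares fit: returns (centroid, unit direction, rms)."""
    n = len(pts)
    cx = sum(p[0] for p in pts) / n
    cy = sum(p[1] for p in pts) / n
    sxx = sum((p[0] - cx) ** 2 for p in pts)
    syy = sum((p[1] - cy) ** 2 for p in pts)
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in pts)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)  # principal axis angle
    dx, dy = math.cos(theta), math.sin(theta)
    # RMS of the perpendicular distances to the fitted line
    rms = math.sqrt(sum(((p[0] - cx) * dy - (p[1] - cy) * dx) ** 2 for p in pts) / n)
    return (cx, cy), (dx, dy), rms

def intersect(l1, l2):
    """Intersection of two (point, direction) lines, or None if parallel."""
    (x1, y1), (dx1, dy1) = l1
    (x2, y2), (dx2, dy2) = l2
    det = dx1 * dy2 - dy1 * dx2
    if abs(det) < 1e-12:
        return None
    t = ((x2 - x1) * dy2 - (y2 - y1) * dx2) / det
    return (x1 + t * dx1, y1 + t * dy1)

def project(line, p):
    """Closest point to p on a (point, direction) line."""
    (cx, cy), (dx, dy) = line
    t = (p[0] - cx) * dx + (p[1] - cy) * dy
    return (cx + t * dx, cy + t * dy)

def approximate(ordered, max_rms):
    """Split ordered points into fitted lines and return the polyline vertices."""
    if len(ordered) < 2:
        return list(ordered)
    lines, breaks, current = [], [], list(ordered[:2])
    for p in ordered[2:]:
        if fit_line(current + [p])[2] > max_rms:
            lines.append(fit_line(current)[:2])
            breaks.append(current[-1])
            current = [current[-1], p]  # new set S, sharing the break point
        else:
            current.append(p)
    lines.append(fit_line(current)[:2])
    vertices = [project(lines[0], ordered[0])]
    for l1, l2, b in zip(lines, lines[1:], breaks):
        corner = intersect(l1, l2)
        vertices.append(corner if corner is not None else project(l1, b))
    vertices.append(project(lines[-1], ordered[-1]))
    return vertices

if __name__ == "__main__":
    path = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
    print(approximate(path, max_rms=0.1))
```

The `max_rms` threshold has to be tuned to the noise level: too small and every wiggle becomes a segment, too large and corners get smoothed away.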
Here is a heuristic hack that might address the ordering problem for the data, if
Proceed like this:
Now you have a new list of points q1..qn that are ordered.
Off the top of my head, very rough, and only works under pretty good conditions...
Self-crossing behavior can probably be improved by requiring in step (5) that the new projected line lie within some maximum angle of the previous one.
Here is a completely different approach that does not require another constraint, although the details may depend on your application. It should work best if you have a "dense cloud of points" around the path.
Use a "cost" function that defines the difference between the curve and the cloud of points. Use a parametrized curve, and a standard optimization algorithm. - OR - Start with a straight curve from start to end, then use a genetic algorithm to modify it.
The typical cost function would be to take the smallest distance between each point and the curve, and sum the squares.
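For a curve represented as a polyline, that cost function might be sketched like this. Representing the curve as a list of vertices is an assumption; a parametrized curve would need its own distance routine. The function names are made up.

```python
import math

def point_segment_distance(p, a, b):
    """Smallest distance from point p to the segment from a to b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)  # degenerate segment
    # parameter of the projection of p onto the line, clamped to the segment
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def cost(points, polyline):
    """Sum over all points of the squared distance to the nearest segment.

    polyline must contain at least two vertices.
    """
    segments = list(zip(polyline, polyline[1:]))
    return sum(
        min(point_segment_distance(p, a, b) for a, b in segments) ** 2
        for p in points
    )

if __name__ == "__main__":
    cloud = [(1, 1), (5, -2), (12, 0)]
    print(cost(cloud, [(0, 0), (10, 0)]))
```

Any optimizer can then minimise `cost` over the polyline's vertices.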
I don't have enough experience to suggest an optimization or genetic algorithm, but I am sure it can be done :)
I could imagine a genetic algorithm as follows: The path will be built from waypoints. Start by putting N waypoints in a straight line from start to end. (N can be chosen depending on the problem.) Mutations could be:
You will need to include the total length in the cost function. Splitting might not be needed, or maybe x (the "split chance") might need to decrease as more waypoints are introduced. You may or may not want to apply (2) to the start- and endpoint.
Would be fun to try that...
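A very reduced sketch of how this could be tried in Python. It is not a full genetic algorithm, just a mutate-and-keep-if-better loop (a (1+1) evolution strategy), and its only mutation is "nudge one interior waypoint by a random offset", which is a hypothetical stand-in. The weight on total length, the step size `sigma` and all names are assumptions too.

```python
import math
import random

def seg_dist(p, a, b):
    """Smallest distance from point p to the segment from a to b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    len_sq = dx * dx + dy * dy
    if len_sq == 0.0:
        return math.dist(p, a)
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / len_sq))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))

def total_cost(points, waypoints, length_weight):
    """Squared point-to-path distances plus a penalty on total path length."""
    segs = list(zip(waypoints, waypoints[1:]))
    fit = sum(min(seg_dist(p, a, b) for a, b in segs) ** 2 for p in points)
    length = sum(math.dist(a, b) for a, b in segs)
    return fit + length_weight * length

def evolve(points, start, end, n_waypoints=8, generations=2000,
           sigma=0.5, length_weight=0.1, seed=0):
    """Start from a straight line of waypoints and keep improving mutations.

    n_waypoints must be at least 3; start and end stay fixed in this sketch.
    """
    rng = random.Random(seed)
    way = []
    for i in range(n_waypoints):
        t = i / (n_waypoints - 1)
        way.append((start[0] + (end[0] - start[0]) * t,
                    start[1] + (end[1] - start[1]) * t))
    best = total_cost(points, way, length_weight)
    for _ in range(generations):
        i = rng.randrange(1, n_waypoints - 1)  # interior waypoints only
        candidate = list(way)
        candidate[i] = (way[i][0] + rng.gauss(0, sigma),
                        way[i][1] + rng.gauss(0, sigma))
        c = total_cost(points, candidate, length_weight)
        if c < best:  # keep the mutation only if it lowers the cost
            way, best = candidate, c
    return way, best

if __name__ == "__main__":
    cloud = [(x / 2, math.sin(x / 2)) for x in range(13)]
    path, final_cost = evolve(cloud, cloud[0], cloud[-1])
    print(final_cost)
    print(path)
```

A real genetic algorithm would keep a population and add crossover and the other mutations; this version only shows the cost function with the length term driving a random search.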