The k-means++ algorithm improves on the original k-means algorithm in two respects: the quality of the solution it finds, and, in practice, the speed of finding it.
Nobody claims k-means++ runs in O(lg k) time; its solution quality is O(lg k)-competitive with the optimal solution. Both k-means++ and the common method, Lloyd's algorithm, are approximations to an NP-hard optimization problem.
I'm not sure what the worst-case running time of k-means++ is; note that in Arthur & Vassilvitskii's original description, steps 2-4 of the algorithm refer to Lloyd's algorithm. They do claim that it works both better and faster in practice because it starts from a better position.
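For concreteness, here is a minimal numpy sketch of the D^2 seeding step that k-means++ adds (illustrative only, not the paper's reference code; Lloyd's iterations would then run from the centres it returns):

```python
import numpy as np

def kmeanspp_init(X, k, seed=0):
    """Pick k initial centres a la k-means++: the first uniformly at random,
    each later one with probability proportional to its squared distance
    from the nearest centre chosen so far."""
    rng = np.random.default_rng(seed)
    centres = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # squared distance of every point to its nearest centre so far
        d2 = ((X[:, None, :] - np.asarray(centres)[None, :, :]) ** 2).sum(-1).min(axis=1)
        # sample the next centre with probability proportional to d2
        centres.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.asarray(centres)
```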
The drawback of k-means++ is the extra cost of its seeding step: picking each new centre takes a pass over the data, and those k passes are inherently sequential.
That said, if your k-means library supports k-means++, then by all means try it out.
Not your question, but an easy way to speed up any k-means method for large N:
1) first do k-means on a random sample of say sqrt(N) of the points
2) then run full k-means from those centres.
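A minimal sketch of those two steps, assuming scikit-learn's KMeans (the kmeanssample() code mentioned at the end is the poster's own, separate implementation):

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_on_sample(X, k, seed=0):
    """Step 1: k-means on a random sample of ~sqrt(N) points;
    step 2: full k-means on all of X, started from those centres."""
    N = len(X)
    rng = np.random.default_rng(seed)
    # sample at least k points, so the sample run can place k centres
    sample = X[rng.choice(N, size=max(k, int(np.sqrt(N))), replace=False)]
    rough = KMeans(n_clusters=k, n_init=1, random_state=seed).fit(sample)
    # start the full run from the rough centres instead of a fresh init
    full = KMeans(n_clusters=k, init=rough.cluster_centers_, n_init=1,
                  random_state=seed).fit(X)
    return full.cluster_centers_, full.labels_
```

For N = 10000 the sample has only ~100 points, so step 1 costs almost nothing compared with the full run.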
I've found this 5-10 times faster than kmeans++ for N ≈ 10000, k = 20, with similar results.
How well it works for you will depend on how well a sqrt(N) sample approximates the whole, as well as on N, dim, k, ninit, delta ...
What are your N (number of data points), dim (number of features), and k? The huge range in users' N, dim, k, data noise, metrics ... not to mention the lack of public benchmarks, makes it tough to compare methods.
Added: Python code for kmeans() and kmeanssample() is here on SO; comments are welcome.