What is a fast and efficient way to implement the server-side component for an autocomplete feature in an HTML input box?
I am writing a service to autocomplete use
For those who stumble upon this question...
I just posted a server-side autocomplete implementation on Google Code. The project includes a Java library that can be integrated into existing applications and a standalone HTTP AJAX autocomplete server.
My hope is that it enables people to incorporate efficient autocomplete into their applications. Kick the tires!
With a set that large, I would try something like a Lucene index to find the terms you want, and set a timer task that is reset after every keystroke, with a delay of a fraction of a second (350 ms in the code below). That way, if a user types several characters quickly, the index is not queried on every keystroke, only when the user pauses. Usability testing will tell you how long that pause should be.
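import java.util.Timer;
import java.util.TimerTask;

Timer findQuery = new Timer();
...
public void keyStrokeDetected(..) {
    // Drop the lookup scheduled by the previous keystroke, if it has not run yet.
    findQuery.cancel();
    findQuery = new Timer();
    // final so the anonymous TimerTask can capture it
    final String text = widget.getEnteredText();
    final TimerTask task = new TimerTask() {
        public void run() {
            // ...query Lucene index for matches of `text`
        }
    };
    findQuery.schedule(task, 350); // 350 ms delay
}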
Some pseudocode there, but that's the idea. Also, if the set of query terms is fixed, the Lucene index can be created ahead of time and optimized.
I had a similar requirement.
I used a relational database with a single well-indexed synthetic table (avoiding joins and views to speed up lookups), and an in-memory cache (Ehcache) to store the most used entries.
By using an MRU cache you'll get near-instant response times for most lookups, and there is probably nothing that can beat a relational database at accessing an indexed column in a big table stored on disk.
This is a solution for big datasets that you can't store on the client, and it works pretty fast (in my case a non-cached lookup always returned in under 0.5 seconds). It's also horizontally scalable: you can always add more application servers and database servers.
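To make the cache-in-front-of-the-database idea concrete, here is a minimal sketch in plain Java. It is not Ehcache's API and not the code I actually ran: `CachedSuggester`, `suggest`, `isCached` and the `backend` function (which stands in for the indexed database query) are hypothetical names, and the cache is just a `LinkedHashMap` that keeps the most recently used prefixes and evicts the one used longest ago.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical read-through cache that keeps the most recently used prefixes in memory. */
class CachedSuggester {
    // Stands in for the indexed database query (an assumption for this sketch).
    private final Function<String, List<String>> backend;
    private final Map<String, List<String>> cache;

    CachedSuggester(final int capacity, Function<String, List<String>> backend) {
        this.backend = backend;
        // accessOrder = true orders entries from least to most recently used,
        // so removeEldestEntry drops the entry that was used longest ago.
        this.cache = new LinkedHashMap<String, List<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<String>> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns cached suggestions, falling back to the backend on a miss. */
    synchronized List<String> suggest(String prefix) {
        List<String> hit = cache.get(prefix);
        if (hit != null) {
            return hit;
        }
        List<String> fresh = backend.apply(prefix);
        cache.put(prefix, fresh);
        return fresh;
    }

    /** True if the prefix is currently cached; does not change the access order. */
    synchronized boolean isCached(String prefix) {
        return cache.containsKey(prefix);
    }
}
```

With a capacity of a few thousand prefixes, the hot lookups would be served from memory and only the rare ones would reach the database. A real deployment would probably also want expiry so that stale suggestions get refreshed, which this sketch leaves out.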
You could also experiment with caching only the most used results on the client, especially if you've already implemented client-side caching. In my case, the server-side solution is fast enough, and client load times are slow enough as it is, so it's not warranted.
P.S. Having the client query only after the user pauses for a certain amount of time, as suggested, is a good way to avoid repeated lookups. On my client, I query the database only after the first three characters are entered, since anything shorter returns too many results in all instances.
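The three-character rule can be sketched as a small guard in front of the lookup. This is only an illustration: `MinLengthGate`, `lookup` and the `backend` function are hypothetical names, and trimming whitespace before counting characters is my assumption, not something the setup above requires.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

/** Hypothetical guard that skips the lookup until enough characters are typed. */
class MinLengthGate {
    static final int MIN_CHARS = 3; // threshold from the text above

    /** Calls the backend only when the trimmed input has at least MIN_CHARS characters. */
    static List<String> lookup(String typed, Function<String, List<String>> backend) {
        String prefix = typed == null ? "" : typed.trim();
        if (prefix.length() < MIN_CHARS) {
            return Collections.emptyList(); // too short: would match too many rows
        }
        return backend.apply(prefix);
    }
}
```

The same check could be repeated on the server so that short prefixes never reach the database, even if a client forgets to apply it.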
Use a trie data structure. Here is the Wikipedia article: http://en.wikipedia.org/wiki/Trie
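For anyone who has not used one, here is a minimal sketch of a trie doing prefix completion in Java. The class and method names (`PrefixTrie`, `insert`, `complete`) are made up for this example, and the `limit` parameter is an assumption added so that a short prefix cannot return the whole dictionary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory trie that returns completions in alphabetical order. */
class PrefixTrie {
    private static final class Node {
        // TreeMap keeps children sorted, so results come out alphabetically.
        final Map<Character, Node> children = new TreeMap<>();
        boolean isWord;
    }

    private final Node root = new Node();

    /** Adds a word, creating one node per character as needed. */
    void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    /** Returns up to `limit` stored words that start with `prefix`. */
    List<String> complete(String prefix, int limit) {
        List<String> out = new ArrayList<>();
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return out; // no stored word has this prefix
            }
        }
        collect(node, new StringBuilder(prefix), limit, out);
        return out;
    }

    // Depth-first walk below the prefix node, stopping once `limit` words are found.
    private void collect(Node node, StringBuilder path, int limit, List<String> out) {
        if (out.size() >= limit) {
            return;
        }
        if (node.isWord) {
            out.add(path.toString());
        }
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            if (out.size() >= limit) {
                return;
            }
            char c = e.getKey();
            path.append(c);
            collect(e.getValue(), path, limit, out);
            path.deleteCharAt(path.length() - 1);
        }
    }
}
```

Finding the prefix node costs time proportional to the prefix length, regardless of how many words are stored; the rest of the work depends on how many completions are collected. For example, after inserting "car", "card", "care" and "cat", `complete("car", 10)` returns car, card, care.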