Autocomplete performance and private “magic search”

可紊 submitted on 2019-12-25 02:37:25

Question


Poor performance of autocomplete fields reduces their usefulness. If the client-side implementation has to call an endpoint that does a heavy database lookup, the response time can easily become frustrating.

One neat approach comes from the AWS Case Study: IMDb. It used to come with a diagram (no longer available), but in a nutshell, a prediction tree is generated and stored for every combination that can resolve in a meaningful way. For example, the resolutions for sta would include Star Wars, Star Trek, and Sylvester Stallone, and these are stored, whereas stb does not resolve to anything meaningful and is not stored.

To get the lowest possible latency, all possible results are pre-calculated with a document for every combination of letters in search. Each document is pushed to Amazon Simple Storage Service (Amazon S3) and thereby to Amazon CloudFront, putting the documents physically close to the users. The theoretical number of possible searches to calculate is mind-boggling (a 20-character search has 23 x 10^30 combinations), but in practice, using IMDb's authority on movie and celebrity data can reduce the search space to about 150,000 documents, which Amazon S3 and Amazon CloudFront can distribute in just a few hours. IMDb creates indexes in several languages with daily updates for datasets of over 100,000 movie and TV titles and celebrity names.
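
A minimal sketch of the precomputation idea described above, not IMDb's actual implementation: every prefix that starts at a word boundary maps to a small, ready-to-serve list of suggestions, and a prefix with no matches simply has no entry. The function name `buildPrefixIndex`, the `maxResults` cap, and the sample titles are assumptions made for illustration.

```javascript
// Hypothetical helper: build one "document" (an array of suggestions) per prefix.
function buildPrefixIndex(titles, maxResults = 10) {
  const index = new Map();
  for (const title of titles) {
    const key = title.toLowerCase();
    // Start a prefix run at the beginning of every word,
    // so that "sta" also finds "Sylvester Stallone".
    const starts = [0];
    for (let i = 1; i < key.length; i++) {
      if (key[i - 1] === " " && key[i] !== " ") starts.push(i);
    }
    for (const start of starts) {
      for (let end = start + 1; end <= key.length; end++) {
        const prefix = key.slice(start, end);
        const bucket = index.get(prefix) || [];
        // Store each title at most once per prefix and respect the cap.
        if (!bucket.includes(title) && bucket.length < maxResults) {
          bucket.push(title);
        }
        index.set(prefix, bucket);
      }
    }
  }
  return index;
}

const demoIndex = buildPrefixIndex(["Star Wars", "Star Trek", "Sylvester Stallone"]);
console.log(demoIndex.get("sta")); // all three sample titles
console.log(demoIndex.has("stb")); // false: nothing is stored for "stb"
```

In a setup like the one quoted, each map entry would presumably be serialized to its own static file and served from a CDN, so a lookup becomes a plain file fetch rather than a database query.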

How would one achieve a similarly performant experience with private data? E.g. autocompleting client names, job ids, invoice numbers... Storing different documents/decision trees for separate users sounds expensive, especially if some of the data (client names?) could be available to multiple users.


Answer 1:


You are right that such a workload requires some special optimizations.

You can use a ready-made search engine such as Apache Lucene or Solr (which is a REST API wrapper for Lucene).

These engines are optimized for full-text search and can work with private data.

Work steps:

  1. Install Solr (or Lucene).
  2. Design a schema for storing the information (which fields and which types of searches you need).
  3. Load the data into it (via batch operations or on an update basis).
  4. Run searches using Solr's query language (similar to Google search). At this point you can add restrictions based on user_id or any other parameter on top of the original user query, so that private data is not mixed between users.
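
Step 4 can be sketched as follows. This is only an illustration under assumptions: the core name `clients`, the field name `name`, the host and port, and the helper names are hypothetical, while `q`, `fq`, `rows`, and `wt` are standard Solr query parameters. The filter query (`fq`) carries the per-user restriction, and the typed text is escaped so that it cannot inject query operators.

```javascript
// Lucene/Solr query syntax treats these characters as operators,
// so escape them (and whitespace) in anything the user typed.
function escapeSolrTerm(text) {
  return text.replace(/[+\-&|!(){}\[\]^"~*?:\\\/\s]/g, "\\$&");
}

// Hypothetical helper: build a prefix query limited to one user's records.
function buildAutocompleteUrl(baseUrl, userId, typed, rows = 10) {
  const params = new URLSearchParams({
    q: "name:" + escapeSolrTerm(typed.trim()) + "*", // prefix match on the typed text
    fq: "user_id:" + escapeSolrTerm(String(userId)), // restrict results to this user
    rows: String(rows),
    wt: "json",
  });
  return baseUrl + "/select?" + params.toString();
}

console.log(buildAutocompleteUrl("http://localhost:8983/solr/clients", 42, "sta"));
```

A URL like this would normally be built on the server, so that the user_id comes from the authenticated session rather than from anything the browser sends.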



Answer 2:


I actually agree with CGI. The best solution is a third-party search engine. Anything else amounts to building your own search engine. From your post I'm not sure what hardware you have at your disposal, so I'll give a possible lowbrow solution in case all you have is LAMP hosting.

So in your PHP code you would make a query string like:

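// Use a placeholder and bind "%".$search."%" through a prepared statement (PDO or mysqli) so user input is never concatenated into the SQL.
$qstr = "SELECT * FROM Clients WHERE `name` LIKE ? ORDER BY popularity DESC LIMIT 0,100";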

Then increment the popularity column for every record that is found via the "search engine." On the front end (let's say you're using Dojo) you could do something like...

<script>
   require(["dojo/on", "dojo/dom", "dojo/request/xhr", "dojo/domReady!"], function (on, dom, xhr) {
       var searchCheck; // handle of the pending debounce timer
       on(dom.byId('txtSearch'), "change", function(evt) {
           clearTimeout(searchCheck); // harmless when no timer is pending
           searchCheck = setTimeout(function() { // wait 500 ms to keep from flooding XHR
               xhr("fetch-json-results.php", {
                  query: { search: dom.byId('txtSearch').value }, // "search" is an assumed parameter name
                  handleAs: "json"
               }).then(function(data){
                  //update txtSearch combo store
               });
           }, 500);
       });
   });
</script>
<input id="txtSearch" type="text" data-dojo-type="dijit/form/ComboBox" data-dojo-props="intermediateChanges:true">

This would be the low-tech, low-budget (LAMP) equivalent answer.



Source: https://stackoverflow.com/questions/25414620/autocomplete-performance-and-private-magic-search
