Loading large amount of data into memory - most efficient way to do this?

悲&欢浪女 2020-12-30 01:08

I have a web-based documentation searching/viewing system that I'm developing for a client. Part of this system is a search system that allows the client to search for a t

4 Answers
  • 2020-12-30 01:09

    Fetch all the data as a string, and use split(). This is the fastest way to build an array in Javascript.

    There's an excellent article on a very similar problem, from the people who built the flickr search: http://code.flickr.com/blog/2009/03/18/building-fast-client-side-searches/
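    A minimal sketch of the split() approach, assuming a hypothetical wire format that is not given in the question: one row per line, comma-separated fields, with the first field as the key. The function name parseRows and the [key, values] output shape are illustrative only.

```javascript
// Assumed format (hypothetical): "key,v0,v1,...\n" per row.
// Returns an array of [key, values] pairs.
function parseRows(text) {
    var lines = text.split("\n");
    var rows = [];
    for (var i = 0; i < lines.length; i++) {
        if (lines[i] === "") { continue; } // skip blank or trailing lines
        var fields = lines[i].split(",");
        var values = [];
        for (var j = 1; j < fields.length; j++) {
            values.push(Number(fields[j]));
        }
        rows.push([Number(fields[0]), values]);
    }
    return rows;
}
```

    The whole payload arrives as one string, so a single request plus two levels of split() replaces many small function calls. Whether it is actually faster for your data is something to measure.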

  • 2020-12-30 01:18

    Instead of using $.getScript to load JavaScript files containing function calls, consider using $.getJSON. This may boost performance. The files would now look like this:

    {
        "key" : 0,
        "values" : [0,1,2,3,4,5,6,7,8]
    }
    

    After receiving the JSON response, you could then call AddToBookData on it, like this:

    function AddToBookData(json) {
         BookData[BookIndex].push([json.key,json.values]);
    }
    

    If your files have multiple sets of calls to AddToBookData, you could structure them like this:

    [
        {
            "key" : 0,
            "values" : [0,1,2,3,4,5,6,7,8]
        },
        {
            "key" : 1,
            "values" : [0,1,2,3,4,5,6,7,8]
        },
        {
            "key" : 2,
            "values" : [0,1,2,3,4,5,6,7,8]
        }
    ]
    

    And then change the AddToBookData function to compensate for the new structure:

    function AddToBookData(json) {
        $.each(json, function(index, data) {
            BookData[BookIndex].push([data.key,data.values]);
        });
    }  
    

    Addendum
    I suspect that regardless of which method you use to transport the data from the files to the BookData array, the true bottleneck is the sheer number of requests. Must the data be fragmented into 40-100 files? If you change to JSON format, you could load a single file that looks like this:

    {
        "file1" : [
            {
                "key" : 0,
                "values" : [0,1,2,3,4,5,6,7,8]
            },
            // all the rest...
        ],
        "file2" : [
            {
                "key" : 1,
                "values" : [0,1,2,3,4,5,6,7,8]
            },
            // yadda yadda
        ]
    }
    

    Then you could make one request, load all the data you need, and move on. The browser may lock up initially (or maybe not), but it would probably be much faster this way.
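    As a rough sketch, the combined object above could be flattened into [key, values] pairs in one pass. The helper name flattenCombined and the file name in the usage comment are hypothetical. Note that a real JSON file cannot contain the // comments used as placeholders in the listing above.

```javascript
// `combined` is assumed to have the shape shown above:
// { "file1": [ { key: ..., values: [...] }, ... ], "file2": [ ... ] }
function flattenCombined(combined) {
    var rows = [];
    for (var name in combined) {
        if (!Object.prototype.hasOwnProperty.call(combined, name)) { continue; }
        var entries = combined[name];
        for (var i = 0; i < entries.length; i++) {
            rows.push([entries[i].key, entries[i].values]);
        }
    }
    return rows;
}

// Possible usage with jQuery ("bookdata.json" is a made-up file name):
// $.getJSON("bookdata.json", function (combined) {
//     BookData[BookIndex] = flattenCombined(combined);
// });
```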

    Here is a nice JSON tutorial, if you're not familiar: http://www.webmonkey.com/2010/02/get_started_with_json/

  • 2020-12-30 01:25

    I tested three methods of loading the same 9,000,000 point dataset into Firefox 3.64.

    1) Stephen's getJSON method
    2) My function-based push method
    3) My pre-processed array-appending method
    

    I ran my tests two ways. In the first iteration I imported 100 files, each containing 10,000 rows of data, with each row holding 9 data elements [0,1,2,3,4,5,6,7,8].

    In the second iteration I combined the files, so that I was importing 1 file with 9 million data points.

    This was a lot larger than the dataset I'll be using, but it helps demonstrate the speed of the various import methods.

    Method         Separate files (s)   Combined file (s)

    JSON           34                   34
    FUNC-BASED     17.5                 24
    ARRAY-BASED    23                   46
    

    Interesting results, to say the least. I closed the browser after loading each webpage and ran each test 4 times to minimize the effect of network traffic and variation (the files were served across a network from a file server). The numbers shown are averages, although the individual runs differed by only a second or two at most.

  • 2020-12-30 01:31

    Looks like there are two basic areas for optimising the data loading, that can be considered and tackled separately:

    1. Downloading the data from the server. Rather than one large file, you should gain wins from parallel loads of multiple smaller files. Experiment with the number of simultaneous loads, bearing in mind browser limits and the diminishing returns of having too many parallel connections. See my parallel vs sequential experiments on jsfiddle, but note that the results will vary due to the vagaries of pulling the test data from github; you're best off testing with your own data under more tightly controlled conditions.
    2. Building your data structure as efficiently as possible. Your result looks like a multi-dimensional array; this interesting article on JavaScript array performance may give you some ideas for experimentation in this area.
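    To experiment with the number of simultaneous loads from point 1, something like the following sketch could be used. It is not the code from the jsfiddle experiments; loadWithLimit and its loadOne(url, callback) parameter are hypothetical names, and error handling is left out for brevity.

```javascript
// Load every URL in `urls`, keeping at most `limit` requests in flight.
// `loadOne(url, callback)` is a caller-supplied loader that calls
// callback(data) once the file has arrived.
// `onAllDone(results)` receives the data in the same order as `urls`.
function loadWithLimit(urls, limit, loadOne, onAllDone) {
    var results = new Array(urls.length);
    var next = 0;      // index of the next URL to start
    var finished = 0;  // number of completed loads

    if (urls.length === 0) { onAllDone(results); return; }

    function startNext() {
        if (next >= urls.length) { return; }
        var index = next++;
        loadOne(urls[index], function (data) {
            results[index] = data;
            finished++;
            if (finished === urls.length) {
                onAllDone(results);
            } else {
                startNext(); // a slot freed up, start another load
            }
        });
    }

    var initial = Math.min(limit, urls.length);
    for (var i = 0; i < initial; i++) { startNext(); }
}

// Possible usage with jQuery (fileUrls is a made-up variable):
// loadWithLimit(fileUrls, 4, function (url, cb) { $.getJSON(url, cb); },
//               function (all) { /* build BookData from `all` */ });
```

    Running it with limit set to 1 gives the sequential case, so the same function can be timed at several limits against your own files.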

    But I'm not sure how far you'll really be able to go by optimising the data loading alone. To solve the actual problem with your application (the browser locking up for too long), have you considered options such as the following?

    Using Web Workers

    Web Workers might not be supported by all your target browsers, but should prevent the main browser thread from locking up while it processes the data.

    For browsers without workers, you could consider increasing the setTimeout interval slightly to give the browser time to service the user as well as your JS. This will make things actually slightly slower but may increase user happiness when combined with the next point.
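    A minimal sketch of the setTimeout approach, assuming the data can be processed item by item. The names processInChunks, chunkSize and delayMs are illustrative, and the optional schedule parameter exists only so the scheduling can be swapped out (for example in tests). chunkSize must be at least 1.

```javascript
// Process `items` in slices of `chunkSize`, pausing `delayMs` between
// slices so the browser can repaint and respond to the user.
function processInChunks(items, chunkSize, delayMs, handleItem, onDone, schedule) {
    // Default scheduler: yield to the browser with setTimeout.
    schedule = schedule || function (fn) { setTimeout(fn, delayMs); };
    var index = 0;

    function runChunk() {
        var end = Math.min(index + chunkSize, items.length);
        for (; index < end; index++) {
            handleItem(items[index], index);
        }
        if (index < items.length) {
            schedule(runChunk); // more work left, continue after a pause
        } else if (onDone) {
            onDone();
        }
    }

    runChunk();
}
```

    Smaller chunks and longer delays keep the page more responsive at the cost of total load time, which is the trade-off described above.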

    Providing feedback of progress

    For both worker-capable and worker-deficient browsers, take some time to update the DOM with a progress bar. You know how many files you have left to load so progress should be fairly consistent and although things may actually be slightly slower, users will feel better if they get the feedback and don't think the browser has locked up on them.
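    Since the number of files is known up front, the progress figure is simple arithmetic. A small sketch follows; describeProgress and the element id in the usage comment are hypothetical.

```javascript
// Turn "files loaded so far" into a percentage and a status message.
function describeProgress(loaded, total) {
    var percent = total > 0 ? Math.round((loaded / total) * 100) : 100;
    return {
        percent: percent,
        text: "Loaded " + loaded + " of " + total + " files (" + percent + "%)"
    };
}

// Possible usage after each file arrives ("progress" is a made-up id):
// var p = describeProgress(filesLoaded, fileCount);
// document.getElementById("progress").style.width = p.percent + "%";
```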

    Lazy Loading

    As suggested by jira in his comment: if Google Instant can search the entire web as we type, is it really not possible to have the server return a file with all locations of the search keyword within the current book? This file should be much smaller and faster to load than the locations of all words within the book, which is what I assume you are currently trying to load as quickly as you can.
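    On the client side, lazy loading could look roughly like this: request the locations for one keyword only when the user searches for it, and remember the answer. This is a sketch under assumptions; createLocationLookup, the fetchLocations(keyword, callback) parameter and the "/search" endpoint in the usage comment are hypothetical and depend on what your server can provide.

```javascript
// Wrap a per-keyword fetch function with an in-memory cache so each
// keyword is requested from the server at most once.
function createLocationLookup(fetchLocations) {
    var cache = Object.create(null); // no inherited keys to collide with

    return function lookup(keyword, callback) {
        var key = keyword.toLowerCase();
        if (key in cache) {
            callback(cache[key]);
            return;
        }
        fetchLocations(key, function (locations) {
            cache[key] = locations;
            callback(locations);
        });
    };
}

// Possible usage with jQuery ("/search" is a made-up endpoint):
// var lookup = createLocationLookup(function (kw, cb) {
//     $.getJSON("/search", { q: kw }, cb);
// });
// lookup("widget", function (locations) { /* highlight matches */ });
```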
