Retrieving an entire data collection from RavenDB

囚心锁ツ 2021-02-01 21:55

I have a requirement where I need to fetch the entire Users collection from RavenDB and compare the retrieved result set with another set of data. There are cl

6 Answers
  •  走了就别回头了
    2021-02-01 22:27

    With a slight twist on @capaj's post, here is a generic way of getting all the document IDs as a list of strings. Note the use of Advanced.LuceneQuery<T>(), SelectFields<T>(idPropertyName) and GetProperty(idPropertyName) to make things generic. The default assumes "Id" is a valid property on the given type T (which should be the case 99.999% of the time). If your documents use some other property as their Id, you can pass its name in instead.

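    using System.Collections.Generic;
    using System.Linq;
    using Raven.Client.Document;

    public static List<string> getAllIds<T>(DocumentStore docDB, string idPropertyName = "Id") {
       return getAllIdsFrom<T>(0, new List<string>(), docDB, idPropertyName);
    }
    
    public static List<string> getAllIdsFrom<T>(int startFrom, List<string> list, DocumentStore docDB, string idPropertyName ) {
        var allUsers = list;
    
        using (var session = docDB.OpenSession())
        {
            int queryCount = 0;
            int start = startFrom;
            while (true)
            {
                // Fetch one page (the server caps a page at 1024 results by default),
                // projecting only the Id field instead of loading whole documents.
                var current = session.Advanced.LuceneQuery<T>().Take(1024).Skip(start).SelectFields<T>(idPropertyName).ToList();
                queryCount += 1;
                if (current.Count == 0)
                    break;
    
                start += current.Count;
                // Read the Id value by name via reflection so this works for any T.
                allUsers.AddRange(current.Select(t => (t.GetType().GetProperty(idPropertyName).GetValue(t, null)).ToString()));
    
                // A session allows only a limited number of requests (30 by default),
                // so continue in a fresh session before reaching that limit.
                if (queryCount >= 28)
                {
                    return getAllIdsFrom<T>(start, allUsers, docDB, idPropertyName);
                }
            }
        }
        return allUsers;
    }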
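    The recursion in getAllIdsFrom exists only to start a new session every 28 queries. The sketch below shows the same paging idea written as a plain loop. It is a hypothetical illustration, not RavenDB code: FakeSession, PagingSketch and GetAllIds are made-up names, and the in-memory session merely imitates a per-session cap of 30 requests so the example runs without a server. In real code the fetchPage delegate would run the LuceneQuery<T>/SelectFields<T> query from the answer against the session it is given.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a document session: it refuses more than
// MaxRequests queries, imitating a per-session request limit of 30.
public sealed class FakeSession : IDisposable
{
    public const int MaxRequests = 30;
    private int _requests;

    public IReadOnlyList<string> Query(IReadOnlyList<string> data, int skip, int take)
    {
        if (++_requests > MaxRequests)
            throw new InvalidOperationException("Too many requests in one session");
        return data.Skip(skip).Take(take).ToList();
    }

    public void Dispose() { }
}

public static class PagingSketch
{
    // Same constants as the answer: pages of 1024 and a new session every 28 queries.
    public const int PageSize = 1024;
    public const int QueriesPerSession = 28;

    // Collects every ID by paging with skip/take, opening a fresh session
    // after QueriesPerSession queries instead of recursing.
    public static List<string> GetAllIds<TSession>(
        Func<TSession> openSession,
        Func<TSession, int, int, IReadOnlyList<string>> fetchPage)
        where TSession : IDisposable
    {
        var ids = new List<string>();
        int start = 0;
        bool done = false;

        while (!done)
        {
            using (var session = openSession())
            {
                for (int query = 0; query < QueriesPerSession; query++)
                {
                    var page = fetchPage(session, start, PageSize);
                    if (page.Count == 0)
                    {
                        done = true; // an empty page means everything has been read
                        break;
                    }
                    ids.AddRange(page);
                    start += page.Count;
                }
            }
        }
        return ids;
    }
}

public static class Program
{
    public static void Main()
    {
        var data = Enumerable.Range(0, 100_000).Select(i => "users/" + i).ToList();
        int sessions = 0;

        var ids = PagingSketch.GetAllIds(
            () => { sessions++; return new FakeSession(); },
            (session, skip, take) => session.Query(data, skip, take));

        Console.WriteLine($"{ids.Count} ids via {sessions} sessions"); // prints "100000 ids via 4 sessions"
    }
}
```

    Written this way, the call depth no longer grows with the size of the collection, and the session rollover is an explicit inner loop rather than a recursive call.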
    

    An example of where and how I use this is making a PatchRequest in RavenDB through the BulkInsert session. In some cases I have hundreds of thousands of documents and can't afford to load them all into memory just to iterate over them again for the patch operation, so I load only their string IDs and pass those to the Patch command.

    using System;
    using Raven.Abstractions.Data;
    using Raven.Client.Document;

    void PatchRavenDocs()
    {
        using (var store = new DocumentStore
        {
            Url = "http://localhost:8080",
            DefaultDatabase = "SoMeDaTaBaSeNaMe"
        })
        {
            store.Initialize();
    
            // >>>here is where I get all the doc IDs for a given type<<<
            // (MyDocType is a placeholder; substitute your own document class)
            var allIds = getAllIds<MyDocType>(store);
    
            // create a new patch to ADD a new int property to my documents
            var patches = new[] { new PatchRequest { Type = PatchCommandType.Set, Name = "SoMeNeWPrOpeRtY", Value = 0 } };
    
            using (var s = store.BulkInsert())
            {
                int cntr = 0;
                Console.WriteLine("ID Count " + allIds.Count);
                foreach (string id in allIds)
                {
                    // apply the patch to my document
                    s.DatabaseCommands.Patch(id, patches);
    
                    // print a progress line every 2048 IDs as a basic sanity check
                    if ((cntr++ % 2048) == 0)
                        Console.WriteLine(cntr + " " + id);
                }
            }
        }
    }
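    With hundreds of thousands of documents, it can be worth estimating how many round trips getAllIds will make before running it. The helper below is a hypothetical back-of-the-envelope calculation (RoundTripEstimate and Estimate are made-up names), assuming the same constants as the answer: 1024 IDs per page, a new session every 28 queries, plus the one final query that returns an empty page and ends the loop.

```csharp
using System;

public static class RoundTripEstimate
{
    // Hypothetical estimate for the paging in getAllIds: one query per page of
    // pageSize IDs, one final query that comes back empty, and a new session
    // every queriesPerSession queries.
    public static (int Queries, int Sessions) Estimate(
        long documentCount, int pageSize = 1024, int queriesPerSession = 28)
    {
        long pages = (documentCount + pageSize - 1) / pageSize;               // ceiling division
        long queries = pages + 1;                                              // + the empty page that ends the loop
        long sessions = (queries + queriesPerSession - 1) / queriesPerSession; // ceiling division
        return ((int)queries, (int)sessions);
    }
}

public static class Program
{
    public static void Main()
    {
        foreach (long n in new long[] { 100_000, 300_000 })
        {
            var (queries, sessions) = RoundTripEstimate.Estimate(n);
            Console.WriteLine($"{n} documents -> {queries} queries in {sessions} sessions");
        }
        // prints:
        // 100000 documents -> 99 queries in 4 sessions
        // 300000 documents -> 294 queries in 11 sessions
    }
}
```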
    

    Hope it helps. :)
