When to definitely use SOLR over Lucene in a Sitecore 7 build?

不知归路 2021-02-07 01:22

My client does not have the budget to set up and maintain a SOLR server to use in their production environment. If I understand the Sitecore 7 Content Search API correctly, it i

3 answers
  • 2021-02-07 02:07

    Stephen pretty much covered the question, but I just wanted to add another scenario. You need to take into account the server setup in your production environment. If you are going to be using multiple content delivery servers behind a load balancer, I would consider Solr from the start, as trying to make sure that the Lucene index on each delivery server is synchronized 100% of the time can be painful.

  • 2021-02-07 02:20

    I think if you are dealing with a customer on a limited budget then Lucene will work perfectly well and perform excellently for the scale of things you are doing. All the things you mention are fully supported by the implementation in Lucene.

    In a Sitecore scenario I would begin to consider Solr if:

    • You need to index a large number of items - I'd say 50 thousand upwards. Lucene is happy with these sorts of numbers, but Solr has improved query caching and is designed for item counts of this size.
    • The resilience of the search tier is of maximum business importance (i.e. the site is purely driven by search). Solr provides a more robust replication, sharding and failover system with SolrCloud.
    • Re-purposing the search tier in other (non-Sitecore) applications is important. Solr is a search application, so it can be accessed over HTTP with XML, JSON etc., which makes integration with external systems easier.
    • You need some specific additional feature of Solr that Lucene doesn't have.
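
To illustrate the point about HTTP access, here is a minimal Python sketch that builds a request URL for Solr's standard `/select` handler with a JSON response. The host, the core name `sitecore_web_index` and the field `_name` are assumptions chosen for illustration; use whatever your own Solr instance and schema expose. The sketch only builds the URL and does not send a request.

```python
from urllib.parse import urlencode


def build_solr_select_url(base_url, core, query, rows=10):
    """Build a URL for Solr's standard /select request handler."""
    params = {
        "q": query,    # main query string
        "rows": rows,  # maximum number of documents to return
        "wt": "json",  # response writer: ask for JSON rather than XML
    }
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"


# Hypothetical host, core and field names, for illustration only.
url = build_solr_select_url("http://localhost:8983", "sitecore_web_index", "_name:home")
print(url)
```

Any client that can issue an HTTP GET and parse JSON could consume such a URL, which is what makes the search tier reusable outside Sitecore.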

    .. but as you say, if you want to swap out Lucene for Solr at a later phase, we have worked hard to make sure that the process is as simple as possible. Worth noting a few points here:

    • While your LINQ queries will stay the same, your configuration will be slightly different and will need attention to port across.
    • Understanding how Solr works as an application and how its schema works is important, but there are some great books and a wealth of knowledge out there.
    • Solr has slightly different (newer) analyzers and scoring mechanisms, so your search results may be slightly different (sometimes customers can get alarmed by this :P).

    .. but I think these are things you can build up to over time and assess with the customer. I'm sure there are more points here and others can chime in if they think of them. Hope this helps :)

  • 2021-02-07 02:23

    I would recommend planning an escape route from Lucene as soon as you start thinking about multiple content delivery (CD) servers, and here is why:

    A) Each server has to maintain its own index copy:

    1. Any unexpected restart might cause a few documents not to be added to the index on one box, making the indexes differ from server to server. That would lead to the same page being rendered differently by different CDs.
    2. Each server must perform index updates itself, using CPU and disk space; the response rate drops after a publish operation is over, while the index is being updated =/
    3. According to the security guide, CDs should have the Sitecore Shell UI removed, so the index cannot be easily rebuilt from the Control Panel =\
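
As a rough illustration of the drift described in point 1, the following Python sketch compares the document IDs held by each server's index and reports what each one is missing. The server names and IDs are made up, and how you would actually collect the IDs from each CD is left out; this is not a Sitecore API.

```python
def find_index_drift(indexes):
    """Map each server to the document IDs it lacks relative to all servers combined."""
    all_ids = set().union(*indexes.values())
    drift = {}
    for server, ids in indexes.items():
        missing = all_ids - ids
        if missing:
            drift[server] = sorted(missing)
    return drift


# Hypothetical snapshot: cd2 restarted mid-update and never indexed document "b".
indexes = {
    "cd1": {"a", "b", "c"},
    "cd2": {"a", "c"},
}
drift = find_index_drift(indexes)
print(drift)
```

With a single shared Solr index there is only one copy to keep consistent, so this kind of cross-server comparison is not needed.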

    B) Lucene is not designed for large volumes of content. Each search operation roughly does the following:

    1. Create an array with a size equal to the total number of documents in the index
    2. If a document matches the search, set its flag in the array

    While this works like a charm for small indexes (~10K elements), performance degrades heavily once the volume of content grows.
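
The two steps listed above can be sketched in Python as follows. This is a simplified model of the idea and not Lucene's actual code; `search`, `documents` and `predicate` are illustrative names.

```python
def search(documents, predicate):
    """Return matching document IDs using one flag per document in the index."""
    flags = bytearray(len(documents))  # step 1: array sized to the whole index
    for doc_id, doc in enumerate(documents):
        if predicate(doc):
            flags[doc_id] = 1          # step 2: flag the matching document
    return [doc_id for doc_id, hit in enumerate(flags) if hit]


docs = ["home page", "news article", "home office", "contact"]
hits = search(docs, lambda text: "home" in text)
print(hits)
```

Note that the size of `flags` depends on the size of the index, not on the number of hits, so every search pays an allocation cost proportional to the total document count.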

    The allocated array ends up in the Large Object Heap (LOH), which is not compacted by default and therefore fragments quickly.

    Scenario:

    1. Perform a search over 100K documents -> a huge array is created in memory

    2. Perform one more search in another thread -> one more huge array is created

    3. Update the index -> it now holds 100K + 10 documents

    4. The first operation completes; the LOH now has a free hole the size of a 100K array

    5. A search is triggered again -> a 100K+10 array has to be created; the freed memory 'hole' is not large enough, so more RAM is requested

    6. The w3wp.exe process keeps on consuming more and more RAM

    This is the common case for Analytics Aggregation as an index is being populated by multiple threads at once. You'll see a lot of RAM used after a while on the processing instance.
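
The scenario above can be reproduced with a toy model of a heap that never moves live blocks. This Python sketch is only an illustration under simplifying assumptions (first-fit placement, no merging of adjacent holes, sizes counted in array slots rather than bytes); it is not how the CLR allocator is implemented.

```python
class NonCompactingHeap:
    """Toy first-fit allocator that never relocates live blocks."""

    def __init__(self):
        # Blocks in address order; each is {"size": int, "used": bool}.
        self.blocks = []

    def allocate(self, size):
        # Reuse the first free hole that is large enough, splitting off any remainder.
        for i, block in enumerate(self.blocks):
            if not block["used"] and block["size"] >= size:
                leftover = block["size"] - size
                block["size"], block["used"] = size, True
                if leftover:
                    self.blocks.insert(i + 1, {"size": leftover, "used": False})
                return block
        # No hole fits: grow the heap by appending a new block.
        block = {"size": size, "used": True}
        self.blocks.append(block)
        return block

    def free(self, block):
        block["used"] = False  # leaves a hole; nothing is moved or compacted

    def total_size(self):
        return sum(b["size"] for b in self.blocks)

    def used_size(self):
        return sum(b["size"] for b in self.blocks if b["used"])


heap = NonCompactingHeap()
first = heap.allocate(100_000)   # steps 1: first search
second = heap.allocate(100_000)  # step 2: second search in another thread
heap.free(first)                 # step 4: first search done, 100K hole appears
third = heap.allocate(100_010)   # step 5: index grew by 10, hole is too small
print(heap.total_size(), heap.used_size())
```

In this model the heap ends up reserving 300,010 slots while only 200,010 are in use: the 100K hole cannot serve the slightly larger request, so the heap grows instead.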

    C) The last Lucene.NET release came out 5 years ago, whereas SOLR is actively being developed.

    The sooner you make the switch to SOLR, the easier it will be.
