I'll use a university's library system to explain my use case. Students register in the library system and provide their profile: gender, age, department, previously completed …
My experience with Drools (or a rules engine in general) is that it is a good fit if user visibility into the rules is important, if the rules need to change quickly without that becoming a coding project, or if the set of rules is so large that it is hard to manage, reason about, and analyze in code (so you end up with business people asking technical people to go read the code and tell them what happens in situation X).
That being said, rules engines can be a bottleneck. They don't run anywhere near the performance of plain code, so you need to manage that up front architecturally. In this specific case there is certainly a database behind the system, which adds another performance consideration: the database will answer a query far faster than you can analyze the whole data set in code.
I would absolutely not implement this by creating a million rule objects. Instead, I would define a book type that multiple books can be assigned to, run the rules against the book types, and then only show books that belong to an allowed type. This way you can load the types, pass them through the rules engine, and then push the allowed types into a query on the database end that pulls the list of books in those types.
Types get a bit more complicated because, in practice, a book will likely belong to more than one type (allowed if you are taking a certain course, or in general if you are part of the department), but the approach should still hold.
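As a rough illustration of that approach, here is a minimal Java sketch: the rules decide which book types are allowed for the student, and the database query does the heavy lifting of fetching the books. The domain classes and DAOs (Student, BookType, Book, BookTypeDao, BookDao) and the allowedTypes global are hypothetical names, and the KIE API calls assume the rules are already compiled into a KieContainer.

import java.util.HashSet;
import java.util.List;
import java.util.Set;
import org.kie.api.runtime.KieContainer;
import org.kie.api.runtime.KieSession;

// Hypothetical service: run the (few) book types through the rules,
// then let the database return the (many) books in the allowed types.
public class BookVisibilityService {

    private final KieContainer kieContainer;
    private final BookTypeDao bookTypeDao;
    private final BookDao bookDao;

    public BookVisibilityService(KieContainer kieContainer, BookTypeDao bookTypeDao, BookDao bookDao) {
        this.kieContainer = kieContainer;
        this.bookTypeDao = bookTypeDao;
        this.bookDao = bookDao;
    }

    public List<Book> visibleBooksFor(Student student) {
        KieSession session = kieContainer.newKieSession();
        try {
            // The rules add the names of allowed types to this global.
            Set<String> allowedTypes = new HashSet<>();
            session.setGlobal("allowedTypes", allowedTypes);

            session.insert(student);
            for (BookType type : bookTypeDao.findAll()) {   // dozens of types, not millions of books
                session.insert(type);
            }
            session.fireAllRules();

            // Push the final filtering down to the database, e.g.
            // SELECT * FROM book WHERE type IN (:allowedTypes)
            return bookDao.findByTypeIn(allowedTypes);
        } finally {
            session.dispose();
        }
    }
}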
I would be worried about the number of rules being a function of the number of students; that could really make things tricky, and it sounds like the biggest problem here.
First, don't make rules for every book. Make rules for the restrictions; there are far fewer restrictions defined than books. This will have a huge impact on running time and memory usage.
Running a ton of books through the rule engine is going to be expensive, especially since you won't show all the results to the user anyway: only 10-50 per page. One idea that comes to mind is to use the rule engine to build a set of query criteria (I wouldn't actually do this; see below).
Here's what I have in mind:
rule "Only two books for networking"
when
    Student($checkedOutBooks : checkedOutBooks),
    Book(subjects contains "networking", $book1 : id) from $checkedOutBooks,
    Book(subjects contains "networking", id != $book1) from $checkedOutBooks
then
    criteria.add("subject is not 'networking'", PRIORITY.LOW);
end
rule "Books allowed for course"
when
    $course : Course($textbooks : textbooks),
    Student(enrolledCourses contains $course),
    Book($book : id) from $textbooks
then
    criteria.add("book_id = " + $book, PRIORITY.HIGH);
end
But I wouldn't actually do that!
Here is how I would change the problem instead: not showing the books to the user is a poor experience, since a user may want to browse the catalog to see which books to get next time. Show the books, but disallow the checkout of restricted books. This way you only have 1-50 books to run through the rules at a time per user, which will be pretty zippy. The rules above would become:
rule "Allowed for course"
    activation-group "Only one rule is fired"
    salience 10000
when
    // This book is about to be displayed on the page, hence inserted into working memory
    $book : Book(),
    $course : Course(textbooks contains $book),
    Student(enrolledCourses contains $course)
then
    // Do nothing; allow the book
end
rule "Only two books for networking"
    activation-group "Only one rule is fired"
    salience 100
when
    Student($checkedOutBooks : checkedOutBooks),
    Book(subjects contains "networking", $book1 : id) from $checkedOutBooks,
    Book(subjects contains "networking", id != $book1) from $checkedOutBooks,
    // This book is about to be displayed on the page, hence inserted into working memory.
    $book : Book(subjects contains "networking")
then
    disallowedForCheckout.put($book, "Cannot have more than two networking books");
end
Here I am using activation-group to make sure only one rule fires, and salience to make sure the rules are considered in the order I want.
Finally, keep the rules cached. Drools allows (and suggests) that you load the rules only once into a knowledge base and then create sessions from it. Knowledge bases are expensive; sessions are cheap.
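For example, with the Drools 6+ KIE API this pattern is just a sketch like the following; it assumes the rules are packaged on the classpath with a kmodule.xml (older Drools versions would use KnowledgeBase/StatefulKnowledgeSession, but the idea is the same): build the expensive container once, create a cheap session per request.

import org.kie.api.KieServices;
import org.kie.api.runtime.KieContainer;
import org.kie.api.runtime.KieSession;

public class RuleEngine {

    // Built once at startup: holds the compiled rules (the knowledge base).
    private static final KieContainer KIE_CONTAINER =
            KieServices.Factory.get().getKieClasspathContainer();

    public void run(Iterable<Object> facts) {
        // Sessions are cheap: create one per request and dispose of it when done.
        KieSession session = KIE_CONTAINER.newKieSession();
        try {
            for (Object fact : facts) {
                session.insert(fact);
            }
            session.fireAllRules();
        } finally {
            session.dispose();
        }
    }
}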
Any time we are looking at large data sets (which is what this question is about: whether or not Drools is a good fit for a large data set), think outside the box (see below). Any time we are talking about "millions of objects" or similar large-scale lookup problems, I don't think the tool in question is necessarily the problem. So yes, Drools (or JBoss Rules) can be used, but it only makes sense in a certain context...
When you are cross-referencing large data sets against inputs, I would recommend a more novel approach such as a database-backed Bloom filter. It can be implemented as a Java object and referenced from Drools for the fact lookup (that part requires some custom coding, however).
Bloom filters are tiny in-memory structures with only basic insert()/contains() operations, and they have one drawback: roughly a 1% false-positive rate. So the filter serves as a primary cache. If you construct the Drools question so that the answer is usually "no", a Bloom-filter-backed fact-table lookup will be lightning fast with a tiny memory footprint (about 1.1 bytes per record in my implementation, so roughly 1 MB of RAM for this case). When contains() returns true (which might be a false positive), use the database-backed fact table to confirm. Again, if the lookup comes back false 80% of the time, the Bloom filter will be a huge savings in memory and time. Otherwise, doing the full 1M-record lookup every time (whether against Drools facts, the database, or anything else) will be very expensive in both memory and speed.
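A minimal sketch of such a primary cache, using Guava's BloomFilter rather than the custom implementation described above; the sizing numbers and the RestrictionDao fact table are illustrative assumptions, and the object would be exposed to the rules (e.g. as a global) for the lookup.

import java.nio.charset.StandardCharsets;
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

// Bloom-filter-backed lookup: fast definite "no", database confirmation on "maybe".
public class RestrictedBookLookup {

    private final BloomFilter<String> filter =
            BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8),
                               1_000_000,   // expected number of records
                               0.01);       // ~1% false-positive rate

    private final RestrictionDao restrictionDao;   // hypothetical database-backed fact table

    public RestrictedBookLookup(RestrictionDao restrictionDao) {
        this.restrictionDao = restrictionDao;
        // Populate the filter once from the database.
        for (String bookId : restrictionDao.allRestrictedBookIds()) {
            filter.put(bookId);
        }
    }

    public boolean isRestricted(String bookId) {
        if (!filter.mightContain(bookId)) {
            return false;                            // definite "no": no database hit
        }
        return restrictionDao.isRestricted(bookId);  // possible false positive: confirm in the DB
    }
}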
My questions are: how much memory will Drools take if loaded with a million book rules? How fast will it be for all those million rules to fire?
How fast is your computer and how much memory have you got? In one sense you can only find out by building a proof of concept and filling it with the right quantity of (randomly-generated) test data. My experience is that Drools is faster than you expect, and that you have to have very good knowledge of what's under the hood to be able to predict what is going to make it slow.
Note that you are talking about a million rule session facts (i.e. Book objects), not a million rules. There are only a handful of rules, which won't take long to fire. The potentially slow part is inserting the million objects, because Drools needs to decide which rules to put on the Agenda for each new fact.
It's a shame that none of us has an answer for some particular set-up with a million facts.
As for the implementation, my approach would be to insert a Book object for each book that the student wants to check out, retract the ones that are not allowed, and then run one query to get the remaining (allowed) Book objects and another query to get the list of reasons. Alternatively, use RequestedBook objects that have additional boolean allowed and String reasonDisallowed properties that you can set in your rules.
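Here is a minimal Java sketch of the RequestedBook variant. It assumes the rules set allowed to false and fill in reasonDisallowed in their consequences; the Student class, the CheckoutService wrapper, and the KieContainer wiring are hypothetical names for illustration.

import java.util.ArrayList;
import java.util.List;
import org.kie.api.runtime.KieContainer;
import org.kie.api.runtime.KieSession;

// Fact class for the second variant: rules flip 'allowed' and record why.
public class RequestedBook {
    private final String bookId;
    private boolean allowed = true;
    private String reasonDisallowed;

    public RequestedBook(String bookId) { this.bookId = bookId; }

    public String getBookId() { return bookId; }
    public boolean isAllowed() { return allowed; }
    public void setAllowed(boolean allowed) { this.allowed = allowed; }
    public String getReasonDisallowed() { return reasonDisallowed; }
    public void setReasonDisallowed(String reason) { this.reasonDisallowed = reason; }
}

// Hypothetical service that inserts the student and the requested books,
// fires the rules, and reads the verdict back off the same fact objects.
class CheckoutService {
    private final KieContainer kieContainer;

    CheckoutService(KieContainer kieContainer) { this.kieContainer = kieContainer; }

    List<RequestedBook> check(Student student, List<String> bookIds) {
        KieSession session = kieContainer.newKieSession();
        try {
            session.insert(student);
            List<RequestedBook> requests = new ArrayList<>();
            for (String bookId : bookIds) {
                RequestedBook request = new RequestedBook(bookId);
                requests.add(request);
                session.insert(request);
            }
            session.fireAllRules();
            return requests;   // each entry now carries allowed / reasonDisallowed
        } finally {
            session.dispose();
        }
    }
}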