For example I have Documents A, B, C. User 1 must only be able to see Documents A, B. User 2 must only be able to see Document C. Is it possible to do it in SOLR without filteri
Might want to check the Document level Security patches.
https://issues.apache.org/jira/browse/SOLR-1872
https://issues.apache.org/jira/browse/SOLR-1834
Keeping in mind that solr is pure text based search engine,indexing system,to facilitate fast searching, you should not expect RDMS style capabilities from it. solr does not provide security for documents being indexed, you have to write such an implementation if you want. In that case you have two options. 1)Just index documents into solr and keep authorization details into RDBMS.Now query solr for your search and collect the results returned.Now fire another query to DB for the doc ids returned by solr to see if the user has an access to them or not.Filter out those documents on which user in action has no access.You are done ! But not really, your problem starts from here only.Assume, what if all results returned by solr gets filtered out ? (Assuming you are not accessing all the documents at a time,means you are retrieving top 1000 results only from solr result set,otherwise you can not get fast search) You have to query solr again for next bunch of result set and have to iterate these steps until you get enough results to display. 2)Second approach to this is to index authorization meta data along with document in solr.Same as aitchnyu has explained.But to answer your query for document sharing to an external user,along with usergroup and role detail, you index these external user's userid into access_roles field or you can just add an another field to your schema 'access_user' too. Now you can modify search queries for external user's sharing to include access_user field into your filter query. e.g
q=mainquery
&fq=access_roles:group_1
&fq=access_user:externaluserid
Now the most important thing, update to an indexed documents.Well its off course tedious task, but with careful design and async processing along with solrs partial document update feature(solr 4.0=>), you can achieve reasonably good TPS with solr. If you are using solr <4.0 you can have separate systems for both searching and updates and with care full use of load balancer and master slave replication strategies you will have smile on your face !
You can implement your security model using Solr's PostFilter. For more information see http://searchhub.org/2012/02/22/custom-security-filtering-in-solr/
Note: you should probably cache your access rights otherwise performance will be terrible.
I think my approach would be similar to @aitchnyu's answer. I would however NOT use individual users in the meta data. If you create groups for each document, then you will have to reindex for security reason less often.
For a given document, you might have access_roles: group_1, group_3
In this way, the group_1 and group_3 always retain rights to the document. However, I can vary what groups each user belongs to and adjust the query accordingly.
When the query then is generated, it always passes as a part of the query the user's groups. If I belong to group_1 and group_2, my query will look like this:
q=mainquery
&fq=access_roles:group_1
&fq=access_roles:group_2
Since the groups are dynamically generated in the query, I simply remove a user from the group, and when a new query is issued, they will no longer include the removed group in the query. So removing the user from group_1 would new create a query like this:
q=mainquery
&fq=access_roles:group_2
All documents that require group 1 will no longer be accessible to the user.
This allows most changes to be done in real-time w/out the need to reindex the documents. The only reason you would have to reindex for security reasons is if you decided that a particular group should no longer have access to a document.
In many real-world scenarios, that should be a relatively uncommon occurrence. It seems much more likely that HR documents will always be available to the HR department, however a specific user may not always be part of the HR group.
Hope that helps.
There are no built in mechanisms for Solr that I am aware of that will allow you to control access to documents without maintaining the rights in the metadata. The approach outlined by aitchnyu seems reasonable if you keep it a true role level and not assign user specific permissions to a document. That way you can assign roles to users and this will grant them the ability to see documents in the index. Granted you will still need to reindex documents when the roles change, but hopefully you can identify most of the needed roles ahead if time and reduce the need for frequent reindexing.
I would suggest storing the access roles (yes, its plural) as document metadata. Here the required field access_roles
is a facet-able multi-valued string field.
Doc1: access_roles:[user_jane, manager_vienna] // Jane and the Vienna branch manager may see it
Doc2: access_roles:[user_john, manager_vienna, special_team] // Jane, the Vienna branch manager and a member of special team may see it
The user owning the document is a default access role for that document.
To change the access roles of a document, you edit access_roles
.
When Jane searches, the access roles she belongs to will be part of the query. Solr will retrieve only the documents that match the user's access role.
When Jane (user_jane
), manager at vienna office (manager_vienna
) searches, her searches go like:
q=mainquery
&fq=access_roles:user_jane
&fq=access_roles:manager_vienna
&facet=on
&facet.field=access_roles
which fetches all documents which contains user_jane
OR manager_vienna
in access_roles
; Doc1
and Doc2
.
When Bob, (user_bob
), member of a special team (specia_team
) searches,
q=mainquery
&fq=access_roles:user_bob
&fq=access_roles:special_team
&facet=on
&facet.field=access_roles
which fetches Doc2
for him.
Queries adapted from http://wiki.apache.org/solr/SimpleFacetParameters#Multi-Select_Faceting_and_LocalParams