I'm new to data governance, so forgive me if the question lacks some information.
We're building a data lake and enterprise data warehouse from scratch for
I am currently exploring Immuta and Privacera, so I can't yet comment in detail on the differences between the two. So far, Immuta has made the better impression on me with its elegant policy-based setup.
Still, there are ways to solve some of the issues you mentioned above without buying an external component:
1. Security
For row-level security (RLS), consider using Table ACLs and granting access only to certain Hive views.
For getting access to data inside ADLS, look at enabling credential passthrough on clusters. Unfortunately, as far as I know, on shared (high-concurrency) clusters that limits you to Python and SQL, so you lose Scala.
You still need to set up permissions on Azure Data Lake Gen 2, which is an awful experience when you have to apply permissions to existing child items.
Please avoid creating dataset copies with column or row subsets, as data duplication is never a good idea.
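To make the view-based approach to RLS a bit more concrete, here is a minimal sketch that only builds the SQL text for a filtered view. The view, table, group and predicate names are made up for illustration, and it assumes your workspace supports the `is_member()` function in views; you would run the resulting statement yourself (for example with `spark.sql`) on a cluster with Table ACLs enabled.

```python
def row_filtered_view_sql(view: str, table: str,
                          privileged_group: str, predicate: str) -> str:
    """Build SQL for a view that shows all rows to one group
    and only rows matching `predicate` to everyone else."""
    return (
        f"CREATE OR REPLACE VIEW {view} AS\n"
        f"SELECT * FROM {table}\n"
        f"WHERE CASE WHEN is_member('{privileged_group}') "
        f"THEN TRUE ELSE {predicate} END"
    )


# Hypothetical names, for illustration only.
view_sql = row_filtered_view_sql(
    view="sales.orders_restricted",
    table="sales.orders",
    privileged_group="sales_managers",
    predicate="region = 'EU'",
)
print(view_sql)
```

Users then get SELECT on the view only, not on the underlying table, so no filtered copy of the data is needed.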
2. Lineage
3. Data quality
4. Data life cycle management
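One option is to use the native lifecycle management of the data lake storage. That is not really viable for data stored in Delta or Parquet format, though: storage rules act on whole files by age, not on rows, and removing files from under a Delta table bypasses its transaction log.
If you use the Delta format, you can apply retention or pseudonymization more easily.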
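As a rough illustration of retention on a Delta table, the sketch below only generates the SQL statements. The table and column names are hypothetical, and it assumes the table has a date column that tells you how old a row is. `DELETE` and `VACUUM` are standard Delta SQL commands, but check the retention defaults in your own environment before relying on this.

```python
from typing import List


def delta_retention_statements(table: str, date_column: str,
                               retention_days: int) -> List[str]:
    """Return SQL statements that drop rows older than `retention_days`."""
    return [
        # Logically delete rows that are past their retention period.
        f"DELETE FROM {table} "
        f"WHERE {date_column} < date_sub(current_date(), {retention_days})",
        # Physically remove data files that are no longer referenced and are
        # older than the table's VACUUM retention threshold.
        f"VACUUM {table}",
    ]


# Hypothetical table and column, for illustration only.
statements = delta_retention_statements("sales.customers", "created_at", 365)
for statement in statements:
    print(statement)
```

Keep in mind that deleted rows stay readable through time travel until VACUUM has actually removed the old files.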
As a second option, imagine that you have a table with information about all datasets (dataset_friendly_name, path, retention time, zone, sensitive_columns, owner, etc.). Your Databricks users then use a small wrapper to read and write:
DataWrapper.Read("dataset_friendly_name")
DataWrapper.Write("destination_dataset_friendly_name")
It's then up to you to implement the logging and the data loading behind the scenes. In addition, you can skip sensitive_columns and act on retention time (both are available in the dataset info table). This requires quite some effort.
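A minimal sketch of such a wrapper, just to show the idea, could look like the following. Everything here is an assumption for illustration: the `DatasetInfo` fields, the `date_column` used to judge row age, the example path, and the in-memory loader standing in for the real Spark read. A real implementation would work on DataFrames and persist the access log somewhere.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]


@dataclass
class DatasetInfo:
    """One entry of the (hypothetical) dataset info table."""
    path: str
    retention_days: int
    date_column: str  # assumed: column that tells how old a row is
    sensitive_columns: List[str] = field(default_factory=list)


class DataWrapper:
    def __init__(self, catalog: Dict[str, DatasetInfo],
                 loader: Callable[[str], List[Row]],
                 today: Optional[date] = None):
        self.catalog = catalog
        self.loader = loader  # stands in for the real storage read
        self.today = today or date.today()
        self.access_log: List[Tuple[str, str, str]] = []

    def read(self, name: str) -> List[Row]:
        info = self.catalog[name]
        # Log every access by friendly name and resolved path.
        self.access_log.append(("read", name, info.path))
        cutoff = self.today - timedelta(days=info.retention_days)
        result = []
        for row in self.loader(info.path):
            if row[info.date_column] < cutoff:
                continue  # row is past its retention time
            # Drop the columns marked as sensitive.
            result.append({k: v for k, v in row.items()
                           if k not in info.sensitive_columns})
        return result


# Hypothetical catalog entry and fake storage, for illustration only.
catalog = {"customers": DatasetInfo(path="/lake/curated/customers",
                                    retention_days=30,
                                    date_column="created",
                                    sensitive_columns=["email"])}
fake_storage = {"/lake/curated/customers": [
    {"id": 1, "email": "a@example.com", "created": date(2024, 1, 1)},
    {"id": 2, "email": "b@example.com", "created": date(2024, 3, 1)},
]}
wrapper = DataWrapper(catalog, loader=lambda path: fake_storage[path],
                      today=date(2024, 3, 15))
rows = wrapper.read("customers")
```

The point of the design is that users only ever know the friendly name, so paths, retention and masking rules can change in the info table without touching their notebooks.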
Hopefully you find something useful in my answer. It would be interesting to know which path you took.