I have some experience with Apache Spark and Spark-SQL. Recently I've found the Apache Drill project. Could you describe what are the most significant advantages/differences bet
Apache Spark-SQL:
Apache Drill:
Drill lets you query different kinds of datasets with ANSI SQL. This makes it great for ad hoc data exploration, and for connecting BI tools to datasets via ODBC. You can even use Drill to SQL JOIN different kinds of datasets. For example, you could join records in a MySQL table with rows in a JSON file, or a CSV file, or OpenTSDB, or MapR-DB... the list goes on. Drill can connect to lots of different types of data.
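To make the cross-source join a bit more concrete, here is a rough sketch that sends such a query to Drill's REST endpoint from Python. The storage plugin names (`mysql`, `dfs`), the `shop.customers` table, the `/data/orders.json` path, and the column names are all hypothetical, and the URL assumes a Drill instance running locally on its default web port (8047).

```python
import json
import urllib.request

# Hypothetical names: a MySQL storage plugin called `mysql` exposing a
# `shop.customers` table, and a JSON file reachable through the `dfs` plugin.
JOIN_QUERY = """
SELECT c.name, o.total
FROM mysql.shop.customers AS c
JOIN dfs.`/data/orders.json` AS o
  ON c.id = o.customer_id
"""


def build_payload(sql: str) -> bytes:
    """Encode a SQL statement as the JSON body for Drill's REST query endpoint."""
    return json.dumps({"queryType": "SQL", "query": sql.strip()}).encode("utf-8")


def run_query(sql: str, base_url: str = "http://localhost:8047") -> dict:
    """POST the query to Drill and return the decoded JSON response.

    Assumes a Drill instance is listening at base_url.
    """
    request = urllib.request.Request(
        f"{base_url}/query.json",
        data=build_payload(sql),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With Drill running and both storage plugins configured, `run_query(JOIN_QUERY)` would return the joined rows as JSON. The same SQL works unchanged from the Drill shell or over ODBC/JDBC.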
When I reach for Spark, it's typically because I want RDDs (resilient distributed datasets). RDDs make it easy to process a lot of data quickly. Spark also has a bunch of libraries for ML and streaming. Drill isn't a general-purpose processing framework like that; it's a SQL query engine that gets you access to said data. You could use Drill to pull data into Spark, or TensorFlow, or PySpark, or Tableau, etc.
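A minimal sketch of that hand-off, assuming the result arrives shaped like the JSON from Drill's REST API (a `columns` list plus a list of `rows`). The sample data and the helper names `rows_as_tuples` and `total_per_name` are made up for illustration, and the values are assumed to arrive as strings. Plain Python stands in here for whatever tool does the downstream processing.

```python
from collections import defaultdict

# Hypothetical response, shaped like Drill's REST output; the data is made up.
sample_response = {
    "columns": ["name", "total"],
    "rows": [
        {"name": "alice", "total": "10.5"},
        {"name": "bob", "total": "4.0"},
        {"name": "alice", "total": "2.5"},
    ],
}


def rows_as_tuples(response: dict) -> list:
    """Flatten Drill-style rows into tuples ordered by the response's column list."""
    columns = response["columns"]
    return [tuple(row[col] for col in columns) for row in response["rows"]]


def total_per_name(records: list) -> dict:
    """Downstream processing step, done outside Drill: sum totals per name."""
    totals = defaultdict(float)
    for name, total in records:
        totals[name] += float(total)  # values assumed to arrive as strings
    return dict(totals)
```

In a real pipeline the list from `rows_as_tuples` is what you would pass on to the processing tool, for example to build a Spark RDD or DataFrame from it.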
Here's an article I came across that discusses some of the SQL technologies: http://www.zdnet.com/article/sql-and-hadoop-its-complicated/
Drill is fundamentally different in both the user's experience and the architecture. For example:
Drill 1.0 was just released (May 19, 2015). You can easily download it onto your laptop and play with it without any infrastructure (Hadoop, NoSQL, etc.).