I want to create a system where I can read logs in real time and use Apache Spark to process them. I am confused about whether I should use something like Kafka or Flume to pass the logs to Spark.
Although this is an old question, I'm posting a link from Databricks, which has a great step-by-step article on log analysis with Spark that covers many areas.
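
In case it helps to see the shape of such a pipeline, below is a minimal sketch of reading log lines from Kafka with Spark Structured Streaming. The broker address (`localhost:9092`), topic name (`app-logs`), and the ERROR filter are placeholder assumptions, and you would need the `spark-sql-kafka` connector package on the classpath for the Kafka source to be available.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (SparkSession.builder
         .appName("log-stream-sketch")
         .getOrCreate())

# Read raw log lines from a Kafka topic as a streaming DataFrame.
# "localhost:9092" and "app-logs" are placeholders for your setup.
logs = (spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option("subscribe", "app-logs")
        .load())

# Kafka delivers values as bytes; cast to string to get the log line.
lines = logs.select(col("value").cast("string").alias("line"))

# Example processing step: keep only lines containing "ERROR".
errors = lines.filter(col("line").contains("ERROR"))

# Write matching lines to the console for each micro-batch.
query = (errors.writeStream
         .outputMode("append")
         .format("console")
         .start())

query.awaitTermination()
```

Kafka is generally the more common choice for this kind of setup today, since Structured Streaming ships a Kafka source and Kafka can buffer the logs independently of how fast Spark consumes them; Flume can still feed Kafka (or Spark) if you already use it for log collection.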