Currently I am sniffing packets from my local wlan interface like this:
sudo tshark > sampleData.pcap
However, I need to feed this data to Kafka.
Currently, I have a Kafka producer script, producer.sh:
../bin/kafka-console-producer.sh --broker-list localhost:9092 --topic 'spark-kafka'
and I feed data to Kafka like this:
producer.sh < sampleData.pcap
where sampleData.pcap contains pre-captured IP packet information.
However, I want to automate the process so that it works something like this:
sudo tshark > http://localhost:9091
producer.sh < http://localhost:9091
This is obviously just pseudocode. What I want to do is send the sniffed data to a port and have Kafka continuously read from it. I don't want Kafka to read from a file continuously, because that would mean a tremendous number of read/write operations on a single file, which is inefficient.
I searched the internet and came across Kafka Connect, but I can't find any useful documentation for implementing something like this.
What's the best way to implement something like this?
Thanks!
With netcat
No need to write a server; you can use netcat (and have your script read from standard input):
shell1> nc -l 8888 | ./producer.sh
shell2> sudo tshark -l | nc 127.1 8888
The -l flag of tshark prevents it from buffering the output too much (it flushes after each packet).
With a named pipe
You could also use a named pipe to transmit tshark output to your second process:
shell1> mkfifo /tmp/tsharkpipe
shell1> tail -f -c +0 /tmp/tsharkpipe | ./producer.sh
shell2> sudo tshark -l > /tmp/tsharkpipe
I think you can either:
- create a tiny server that connects to Kafka and listens on a port, or
- use the Kafka Connect file source connector and append all your data to the file it watches (see http://kafka.apache.org/documentation.html#quickstart_kafkaconnect and the sketch below).
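For the Kafka Connect route, a minimal standalone file source configuration might look like the following sketch (assuming Kafka 0.9+ in standalone mode; the connector name and file path are illustrative, and the topic matches the one in the question):
# connect-file-source.properties (names here are assumptions)
name=tshark-file-source
connector.class=FileStreamSource
tasks.max=1
file=/tmp/tshark-output.txt
topic=spark-kafka
started from the Kafka distribution directory with:
bin/connect-standalone.sh config/connect-standalone.properties connect-file-source.properties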
If you use Node, you can use child_process and kafka-node to do it. Something like this:
var kafka = require('kafka-node');
var spawn = require('child_process').spawn;
var client = new kafka.Client('localhost:2181');
var producer = new kafka.Producer(client);

// wait until the producer has connected before spawning tshark
producer.on('ready', () => {
  // -l keeps tshark from buffering its output
  var tshark = spawn('sudo', ['/usr/sbin/tshark', '-l']);
  tshark.stdout.on('data', (data) => {
    producer.send([
      // data is a Buffer; convert it to text and split into lines
      {topic: 'spark-kafka', messages: data.toString().split('\n')}
    ], (err, result) => { if (err) console.error(err); else console.log('sent to kafka'); });
  });
});
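One caveat with the snippet above: a 'data' chunk from tshark.stdout can end in the middle of a line, so splitting each chunk on newlines may produce partial packets. If that matters, a small variant using Node's built-in readline module could replace the 'data' handler; this is a sketch that assumes the same producer and tshark objects as above and sends one complete tshark line per message:
var readline = require('readline');
// read tshark's stdout line by line instead of in raw chunks
var rl = readline.createInterface({ input: tshark.stdout });
rl.on('line', (line) => {
  // one complete tshark summary line per Kafka message
  producer.send([{topic: 'spark-kafka', messages: line}], (err) => {
    if (err) console.error(err);
  });
});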
Another option would be to use Apache NiFi. With NiFi you can execute commands and pass the output to other blocks for further processing. Here you could have NiFi execute a tshark command on the local host and then pass the output to Kafka.
There is an example here which should demonstrate this type of approach in slightly more detail.
Source: https://stackoverflow.com/questions/35872663/how-to-continuously-feed-sniffed-packets-to-kafka