HDFS file watcher

末鹿安然 submitted on 2019-11-30 09:45:09

Hadoop 2.6 introduced DFSInotifyEventInputStream, which you can use for this. Get an instance from HdfsAdmin, then call .take() (blocks until events arrive) or .poll() (returns without blocking indefinitely) to read the next events. Event types include create, append and delete (reported as UNLINK), which should cover what you're looking for.

Here's a basic example. Make sure you run it as the hdfs user, because the inotify admin interface requires HDFS superuser privileges.

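import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSInotifyEventInputStream;
import org.apache.hadoop.hdfs.client.HdfsAdmin;
import org.apache.hadoop.hdfs.inotify.Event;
import org.apache.hadoop.hdfs.inotify.Event.CreateEvent;
import org.apache.hadoop.hdfs.inotify.EventBatch;
import org.apache.hadoop.hdfs.inotify.MissingEventsException;

// The class name is a placeholder; args[0] is the NameNode URI, e.g. hdfs://namenode:port
public class HdfsWatcher {
    public static void main( String[] args ) throws IOException, InterruptedException, MissingEventsException
    {
        HdfsAdmin admin = new HdfsAdmin( URI.create( args[0] ), new Configuration() );
        DFSInotifyEventInputStream eventStream = admin.getInotifyEventStream();
        while( true ) {
            // take() blocks until the next batch of events is available
            EventBatch events = eventStream.take();
            for( Event event : events.getEvents() ) {
                System.out.println( "event type = " + event.getEventType() );
                switch( event.getEventType() ) {
                    case CREATE:
                        // Only CREATE is unpacked here; add cases for APPEND, UNLINK, etc. as needed
                        CreateEvent createEvent = (CreateEvent) event;
                        System.out.println( "  path = " + createEvent.getPath() );
                        break;
                    default:
                        break;
                }
            }
        }
    }
}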
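
The inotify stream reports events for the whole namespace rather than for one directory, so a watcher usually has to filter event paths itself. Here is a minimal sketch of such a filter in plain Java. The class WatchedPathFilter and the method isUnder are hypothetical helpers, not part of Hadoop, and the sketch assumes event paths arrive as absolute strings such as /data/in/file.txt.

```java
public class WatchedPathFilter {

    // Hypothetical helper (not a Hadoop API): returns true if eventPath is the
    // watched directory itself or lies somewhere beneath it.
    public static boolean isUnder(String watchedDir, String eventPath) {
        // Normalise "/data/in/" to "/data/in" so both spellings behave the same.
        String dir = watchedDir.length() > 1 && watchedDir.endsWith("/")
                ? watchedDir.substring(0, watchedDir.length() - 1)
                : watchedDir;
        if (dir.equals("/")) {
            // Watching the root matches every absolute path.
            return eventPath.startsWith("/");
        }
        // Require a "/" after the prefix so "/data/in2" does not match "/data/in".
        return eventPath.equals(dir) || eventPath.startsWith(dir + "/");
    }

    public static void main(String[] args) {
        // Example paths are made up for illustration.
        String watched = "/data/in";
        String[] paths = { "/data/in/part-0000", "/data/in2/part-0000", "/tmp/x" };
        for (String p : paths) {
            System.out.println(p + " watched = " + isUnder(watched, p));
        }
    }
}
```

In the event loop above you would call isUnder(watchedDir, createEvent.getPath()) before acting on an event, and do the same for the other event types you care about.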

Here's a blog post that covers it in more detail:

http://johnjianfang.blogspot.com/2015/03/hdfs-6634-inotify-in-hdfs.html?m=1

sunitha

An Oozie coordinator can do this. Coordinator actions can be triggered by data availability, so write a data-triggered coordinator. The actions fire once the dataset's done-flag appears, and the done-flag is nothing but an empty file. So when your threshold is reached, write an empty file into the directory.
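
As a rough sketch of what such a data-triggered coordinator might look like: every name, path, date and frequency below is a made-up placeholder, and the done-flag name _READY is an assumption chosen for illustration (if the done-flag element is omitted, Oozie looks for _SUCCESS).

```xml
<!-- Hypothetical coordinator: all names, URIs, dates and frequencies are placeholders -->
<coordinator-app name="watch-incoming" frequency="${coord:minutes(15)}"
                 start="2019-12-01T00:00Z" end="2020-12-01T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <datasets>
    <dataset name="incoming" frequency="${coord:minutes(15)}"
             initial-instance="2019-12-01T00:00Z" timezone="UTC">
      <uri-template>hdfs://namenode:port/data/incoming/${YEAR}${MONTH}${DAY}${HOUR}${MINUTE}</uri-template>
      <!-- The action waits until this empty file exists in the dataset directory -->
      <done-flag>_READY</done-flag>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="input" dataset="incoming">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>hdfs://namenode:port/apps/process-incoming</app-path>
    </workflow>
  </action>
</coordinator-app>
```

With a layout like this, the job that fills the directory writes the empty _READY file last, and the coordinator action for that instance starts only after the file shows up.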

Old thread, but in case someone wants to do this in Scala:

import java.net.URI
import java.util.concurrent.TimeUnit

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hdfs.client.HdfsAdmin
import org.apache.hadoop.hdfs.inotify.Event.{AppendEvent, CreateEvent, RenameEvent}


object HDFSTest extends App {
  val admin = new HdfsAdmin( URI.create( "hdfs://namenode:port" ), new Configuration() )
  val eventStream = admin.getInotifyEventStream()

  while( true ) {
    // poll returns null when no events arrive within the timeout, so guard against it
    val batch = eventStream.poll(2L, TimeUnit.SECONDS)
    val events = Option(batch).map(_.getEvents.toList).getOrElse(Nil)
    events.foreach { event =>
      println(s"event type = ${event.getEventType}")
      event match {
        case create: CreateEvent =>
          println("CREATE: " + create.getPath)

        case rename: RenameEvent =>
          println("RENAME: " + rename.getSrcPath + " => " + rename.getDstPath)

        case append: AppendEvent =>
          println("APPEND: " + append.getPath)

        case other =>
          println("other: " + other)
      }
    }
  }
}

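If you want to run as a different (impersonated) user, set the environment variable HADOOP_USER_NAME=user-name. Note that this only takes effect on clusters using simple authentication, not Kerberos.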
