I have to implement file watcher functionality in Erlang: there should be a process that lists the files in a specific directory and does something when files appear.
I take
I have written such a library, based on polling. (It would be nice to extend it to use inotify on platforms where this is supported.) It was originally meant to be used in EUnit, but I turned it into a separate project instead. You can find it here:
https://github.com/richcarl/file_monitor
If you are using Linux, you can use inotify. It is a kernel service that lets you subscribe to file system events. Don't poll the filesystem; let the filesystem call you.
You can try https://github.com/massemanet/inotify for observing your directory.
Ulf
In Erlang it is very cheap to create processes (orders of magnitude cheaper than in other systems).
Therefore I recommend creating a new ProcessFileServer each time a new file to process appears. When it is done, just terminate the process with exit reason normal.
I would suggest the following structure:
                                         top_supervisor
                                                |
                         +----------------------+----------------------+
                         |                                             |
               directory_supervisor                          processing_supervisor
                         |                                    simple_one_for_one
          +--------------+-----...------+                              |
          |              |              |                  starts children transient
          |              |              |                              |
    dir_watcher_1  dir_watcher_2  dir_watcher_n          +-------------+-----...-----+
                                                         |             |             |
                                                    proc_file_1   proc_file_2   proc_file_n
When a dir_watcher notices that a new file has appeared, it calls supervisor:start_child/2 on the processing_supervisor, passing the file path as an extra argument.
The processing_supervisor should start its children with the transient restart policy.
So if one of the proc_file servers crashes, it will be restarted, but when a server terminates with exit reason normal it is not restarted. So you just exit with reason normal when done, and crash when anything else happens.
If you don't overdo it, cyclic polling for files is OK. If the system becomes loaded because of this polling, you can look into kernel notification systems (e.g. FreeBSD kqueue, or the higher-level services built on it on Mac OS X) that send you a message when a file appears in a directory. These services add complexity, however, because they have to throw up their hands when too many events happen (otherwise they would be a performance penalty rather than an improvement). So you will need a robust polling solution as a fallback anyway.
So don't optimize prematurely: start with polling, and add improvements (which would be isolated in the dir_watcher servers) when they become necessary.
Regarding the comment about which behaviour to use for the dir_watcher process, since it doesn't use much of gen_server's functionality:
There is no problem with using only part of gen_server's possibilities; in fact it is very common not to use all of them. In your case you only set up a timer in init and use handle_info to do your work. The rest of the gen_server is just the unchanged template.
If you later want to change parameters such as the poll frequency, it is easy to add that.
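The timer-in-init plus handle_info approach could look roughly like the sketch below. The start_link/3 signature, the state record and the OnNewFile callback are assumptions made for this example; in the structure suggested above, the callback would typically call supervisor:start_child/2 on the processing_supervisor.

```erlang
-module(dir_watcher).
-behaviour(gen_server).

-export([start_link/3]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-record(state, {dir, interval, on_new_file, seen = []}).

%% OnNewFile is a fun/1 that receives the full path of each new file
%% (hypothetical hook for this sketch).
start_link(Dir, IntervalMs, OnNewFile) ->
    gen_server:start_link(?MODULE, {Dir, IntervalMs, OnNewFile}, []).

init({Dir, IntervalMs, OnNewFile}) ->
    %% Schedule the first poll right away.
    erlang:send_after(0, self(), poll),
    {ok, #state{dir = Dir, interval = IntervalMs, on_new_file = OnNewFile}}.

%% Unused parts of the gen_server template.
handle_call(_Request, _From, State) ->
    {reply, {error, not_supported}, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(poll, #state{dir = Dir, seen = Seen} = State) ->
    Files = case file:list_dir(Dir) of
                {ok, Names} -> Names;
                {error, _Reason} -> Seen   % keep old view, retry next poll
            end,
    %% Everything listed now but not on the previous poll is new.
    New = Files -- Seen,
    OnNewFile = State#state.on_new_file,
    lists:foreach(fun(Name) -> OnNewFile(filename:join(Dir, Name)) end, New),
    %% Re-arm the timer for the next poll.
    erlang:send_after(State#state.interval, self(), poll),
    {noreply, State#state{seen = Files}};
handle_info(_Other, State) ->
    {noreply, State}.
```

This sketch reports a file as soon as its name shows up, so a file that is still being written may be picked up too early. Making the poll interval changeable at runtime would only need one extra handle_call or handle_cast clause that updates the interval field.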
gen_fsm is much less used, since it only fits a quite limited model and is not very flexible. I use it only when it fits the requirement 100% (which it almost never does).
In a case where you just want a simple plain Erlang server you can use the spawn functions in proc_lib to get just the minimal functionality to run under a supervisor.
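For comparison, a plain Erlang polling server started through proc_lib might look like the following sketch. The module name plain_watcher, the start_link/3 signature and the {new_file, Path} message format are assumptions for this example. It does not handle system messages, so it is not a full OTP special process, but proc_lib:start_link/3 and proc_lib:init_ack/2 give it the synchronous start that a supervisor expects.

```erlang
-module(plain_watcher).

-export([start_link/3, init/4]).

%% NotifyPid receives {new_file, Path} messages (assumed protocol).
start_link(Dir, IntervalMs, NotifyPid) ->
    proc_lib:start_link(?MODULE, init, [self(), Dir, IntervalMs, NotifyPid]).

init(Parent, Dir, IntervalMs, NotifyPid) ->
    %% Tell the starter (e.g. a supervisor) that we are up.
    proc_lib:init_ack(Parent, {ok, self()}),
    loop(Dir, IntervalMs, NotifyPid, []).

loop(Dir, IntervalMs, NotifyPid, Seen) ->
    receive
        stop ->
            ok   % returning ends the process with reason normal
    after IntervalMs ->
        Files = case file:list_dir(Dir) of
                    {ok, Names} -> Names;
                    {error, _Reason} -> Seen   % retry on the next round
                end,
        %% Report every file that was not there on the previous round.
        lists:foreach(
          fun(Name) -> NotifyPid ! {new_file, filename:join(Dir, Name)} end,
          Files -- Seen),
        loop(Dir, IntervalMs, NotifyPid, Files)
    end.
```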
An interesting way to write more natural Erlang code and still have the OTP advantages is plain_fsm. It gives you selective receive and flexible message handling, which are needed especially when handling protocols, paired with the nice features of OTP.
Having said all this: if I were to write a dir_watcher, I'd just use a gen_server and use only what I need. The unused functionality doesn't really cost you anything, and everybody understands what it does.