Question
I want to apply a function to every file in a directory and its subdirectories, as follows:
import java.io.File

def applyRecursively(dir: String, fn: (File) => Any) {
  // Visit the entries of dir in name order, descending into real (non-symlink) directories.
  def listAndProcess(dir: File) {
    dir.listFiles match {
      // listFiles returns null when the path is not a directory or cannot be read
      case null => println("exception: dir cannot be listed: " + dir.getPath)
      case files => files.toList.sortBy(_.getName).foreach(file => {
        fn(file)
        if (!java.nio.file.Files.isSymbolicLink(file.toPath) && file.isDirectory) listAndProcess(file)
      })
    }
  }
  listAndProcess(new File(dir))
}

def exampleFn(file: File) { println(s"processing $file") }
applyRecursively(dir, exampleFn)
This works. My question is how I could refactor this code using Scala Iteratees, something like this:
val en = Enumerator.generateM(...) // ???
val it: Iteratee[File, Unit] = Iteratee.foreach(exampleFn)
val res = en.run(it)
res.onSuccess { case x => println("DONE") }
Answer 1:
This does not cover all of your requirements, but it can get you started:
import java.io.File
import play.api.libs.iteratee._
import scala.concurrent.Await

object ExampleEnumerator {
  import scala.concurrent.ExecutionContext.Implicits.global

  def exampleFn(file: File) { println(s"processing $file") }

  // Emit dir itself, then recurse into its entries in name order.
  // Option(...) guards against listFiles returning null for non-directories.
  def listFiles(dir: File): Enumerator[File] = {
    val files = Option(dir.listFiles).toList.flatten.sortBy(_.getName)
    Enumerator(dir) andThen Enumerator(files :_*).flatMap(listFiles)
  }

  def main(args: Array[String]) {
    import scala.concurrent.duration._
    val dir = "."
    val en: Enumerator[File] = listFiles(new File(dir))
    val it: Iteratee[File, Unit] = Iteratee.foreach(exampleFn)
    val res = en.run(it)
    res.onSuccess { case x => println("DONE") }
    Await.result(res, 10.seconds)
  }
}
Answer 2:
You can use Enumerator.unfold for this. The signature is:
def unfold[S, E](s: S)(f: (S) => Option[(S, E)])(implicit ec: ExecutionContext): Enumerator[E]
The idea is that you start with a value of type S and apply a function to it that returns an Option[(S, E)]. A None means the Enumerator has reached EOF. A Some contains the next S to unfold, together with the next value that the Enumerator[E] will generate. In your example you can start with an Array[File] (containing the initial directory), take the first value from the Array, and check whether it is a file or a directory. If it is just a file, you return the tail of the Array tupled together with the File. If the File is a directory, you get its file listing and prepend it to the tail of the Array. The next steps of unfold then continue to process the contained files.
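To see the unfold contract on its own, here is a minimal sketch that uses only the Scala standard library, not Play. The unfold helper is a hand-written, strict stand-in that builds a List instead of an Enumerator, and Node is a hypothetical in-memory substitute for java.io.File. Both are assumptions made purely for illustration.

```scala
import scala.annotation.tailrec

object UnfoldSketch {
  // Hypothetical in-memory stand-in for java.io.File (illustration only).
  final case class Node(name: String, children: List[Node] = Nil)

  // Strict analogue of the unfold contract: None stops the sequence,
  // Some((nextState, element)) emits element and continues with nextState.
  def unfold[S, E](start: S)(f: S => Option[(S, E)]): List[E] = {
    @tailrec
    def loop(state: S, acc: List[E]): List[E] = f(state) match {
      case None               => acc.reverse
      case Some((next, elem)) => loop(next, elem :: acc)
    }
    loop(start, Nil)
  }

  // The state is the list of nodes still to visit. Each step emits the head
  // and prepends the head's children (empty for a leaf) to the tail.
  def list(root: Node): List[String] =
    unfold[List[Node], String](List(root)) {
      case Nil          => None
      case head :: tail => Some((head.children ++ tail, head.name))
    }

  val sampleTree: Node =
    Node("root", List(Node("a"), Node("d", List(Node("d/x"))), Node("z")))

  def main(args: Array[String]): Unit =
    println(list(sampleTree)) // prints List(root, a, d, d/x, z)
}
```

Applied to real files with Enumerator.unfold, the step function has the same shape.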
You end up with something like this:
def list(dir: File)(implicit ec: ExecutionContext): Enumerator[File] = {
  Enumerator.unfold(Array(dir)) { listing =>
    listing.headOption.map { file =>
      if (!java.nio.file.Files.isSymbolicLink(file.toPath) && file.isDirectory) {
        // listFiles returns null if the directory cannot be read; treat that as empty
        val children = Option(file.listFiles).getOrElse(Array.empty[File])
        (children.sortBy(f => (f.isDirectory, f.getName)) ++ listing.tail) -> file
      } else
        listing.tail -> file
    }
  }
}
I added an extra sort by isDirectory to prioritize non-directories first. This means that if directory contents are added to the Array to unfold, the files will be consumed first before adding more contents. This will prevent the memory footprint from quickly expanding due to the recursive nature.
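As a small illustration of why that sort key puts plain files first, here is a sketch using only the standard library. Entry is a hypothetical stand-in for java.io.File with just the two fields the key uses. Tuples are compared element by element, and the Boolean ordering places false before true, so every non-directory sorts ahead of every directory.

```scala
object SortSketch {
  // Hypothetical stand-in for java.io.File (illustration only).
  final case class Entry(name: String, isDirectory: Boolean)

  // Same key shape as in the answer: (isDirectory, name).
  def filesFirst(entries: List[Entry]): List[Entry] =
    entries.sortBy(e => (e.isDirectory, e.name))

  val sample: List[Entry] = List(
    Entry("b-dir", isDirectory = true),
    Entry("z.txt", isDirectory = false),
    Entry("a-dir", isDirectory = true),
    Entry("a.txt", isDirectory = false)
  )

  def main(args: Array[String]): Unit =
    // prints a.txt, z.txt, a-dir, b-dir (one per line)
    filesFirst(sample).map(_.name).foreach(println)
}
```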
If you want the directories to be removed from the final Enumerator, you can use Enumeratee.filter to do that. You'll end up with something like:
list(dir) &> Enumeratee.filter(!_.isDirectory) |>> Iteratee.foreach(fn)
Answer 3:
This just complements the great answer of m-w with some logging to help understand it.
$ cd /david/test
$ find .
.
./file1
./file2
./file3d
./file3d/file1
./file3d/file2
./file4
scala:
import play.api.libs.iteratee._
import java.io.File
import scala.concurrent.Await
import scala.concurrent.duration.Duration

object ExampleEnumerator3 {
  import scala.concurrent.ExecutionContext.Implicits.global

  def exampleFn(file: File) { println(s"processing $file") }

  def list(dir: File): Enumerator[File] = {
    println(s"list $dir")
    val initialInput: List[File] = List(dir)
    Enumerator.unfold(initialInput) { (input: List[File]) =>
      val next: Option[(List[File], File)] = input.headOption.map { file =>
        if (file.isDirectory) {
          (file.listFiles.toList.sortBy(_.getName) ++ input.tail) -> file
        } else {
          input.tail -> file
        }
      }
      next match {
        case Some(dn) => print(s"value to unfold: $input\n next value to unfold: ${dn._1}\n next input: ${dn._2}\n")
        case None => print(s"value to unfold: $input\n finished unfold\n")
      }
      next
    }
  }

  def main(args: Array[String]) {
    val dir = new File("/david/test")
    val res = list(dir).run(Iteratee.foreach(exampleFn))
    Await.result(res, Duration.Inf)
  }
}
log:
list /david/test
value to unfold: List(/david/test)
next value to unfold: List(/david/test/file1, /david/test/file2, /david/test/file3d, /david/test/file4)
next input: /david/test
processing /david/test
value to unfold: List(/david/test/file1, /david/test/file2, /david/test/file3d, /david/test/file4)
next value to unfold: List(/david/test/file2, /david/test/file3d, /david/test/file4)
next input: /david/test/file1
processing /david/test/file1
value to unfold: List(/david/test/file2, /david/test/file3d, /david/test/file4)
next value to unfold: List(/david/test/file3d, /david/test/file4)
next input: /david/test/file2
processing /david/test/file2
value to unfold: List(/david/test/file3d, /david/test/file4)
next value to unfold: List(/david/test/file3d/file1, /david/test/file3d/file2, /david/test/file4)
next input: /david/test/file3d
processing /david/test/file3d
value to unfold: List(/david/test/file3d/file1, /david/test/file3d/file2, /david/test/file4)
next value to unfold: List(/david/test/file3d/file2, /david/test/file4)
next input: /david/test/file3d/file1
processing /david/test/file3d/file1
value to unfold: List(/david/test/file3d/file2, /david/test/file4)
next value to unfold: List(/david/test/file4)
next input: /david/test/file3d/file2
processing /david/test/file3d/file2
value to unfold: List(/david/test/file4)
next value to unfold: List()
next input: /david/test/file4
processing /david/test/file4
value to unfold: List()
finished unfold
Answer 4:
This just complements the great answer of @JonasAnso with some logging to help understand it.
$ cd /david/test
$ find .
.
./file1
./file2
./file3d
./file3d/file1
./file3d/file2
./file4
scala:
import play.api.libs.iteratee._
import java.io.File
import scala.concurrent.Await
import scala.concurrent.duration.Duration

object ExampleEnumerator2b {
  import scala.concurrent.ExecutionContext.Implicits.global

  def exampleFn(file: File) { println(s"processing $file") }

  def listFiles(dir: File): Enumerator[File] = {
    println(s"listFiles. START: $dir")
    if (dir.isDirectory) {
      val files = dir.listFiles.toList.sortBy(_.getName)
      Enumerator(dir) andThen Enumerator(files :_*).flatMap(listFiles)
    } else {
      Enumerator(dir)
    }
  }

  def main(args: Array[String]) {
    // Path matches the directory shown in the find output and in the log
    val dir = new File("/david/test")
    val res = listFiles(dir).run(Iteratee.foreach(exampleFn))
    Await.result(res, Duration.Inf)
  }
}
log:
listFiles. START: /david/test
processing /david/test
listFiles. START: /david/test/file1
processing /david/test/file1
listFiles. START: /david/test/file2
processing /david/test/file2
listFiles. START: /david/test/file3d
processing /david/test/file3d
listFiles. START: /david/test/file3d/file1
processing /david/test/file3d/file1
listFiles. START: /david/test/file3d/file2
processing /david/test/file3d/file2
listFiles. START: /david/test/file4
processing /david/test/file4
Source: https://stackoverflow.com/questions/36267374/scala-iteratee-to-recursively-process-files-and-subdirectories