Why does my MapReduce implementation (Real World Haskell) using iteratee IO also fail with “Too many open files”?

Posted by 送分小仙女 on 2019-11-30 13:11:39

As the error says, there are too many open files. I expected Haskell to run most of the program sequentially, with only some 'sparks' running in parallel. However, as sclv mentioned, Haskell always sparks the evaluations.

This is usually not a problem in a purely functional program, but it is when the sparks deal with IO resources. I scaled the parallelism described in the Real World Haskell book up too far. My conclusion is to use parallelism only on a limited scale when the sparks handle IO resources. In the purely functional part, excessive parallelism may still succeed.
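One way to keep IO parallelism on a limited scale is sketched below. It is not the code from the question and it swaps sparks for lightweight threads guarded by a semaphore, using only the base library. The names mapBounded and forkWithSlot are made up for this illustration.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.QSem (QSem, newQSem, signalQSem, waitQSem)
import Control.Exception (SomeException, bracket_, throwIO, try)

-- | Start one thread for an action, but make it wait for a free slot first.
-- The slot is released even if the action throws.
-- (Hypothetical helper, not part of the original program.)
forkWithSlot :: QSem -> IO b -> IO (MVar (Either SomeException b))
forkWithSlot sem act = do
  var <- newEmptyMVar
  _ <- forkIO $ do
    result <- try (bracket_ (waitQSem sem) (signalQSem sem) act)
    putMVar var result
  return var

-- | Apply an IO action to every input with at most 'limit' actions running
-- at once. One thread per input is created, but only 'limit' of them hold a
-- slot (and hence a resource such as a file handle) at any moment.
-- Results come back in input order; the first failure is rethrown.
mapBounded :: Int -> (a -> IO b) -> [a] -> IO [b]
mapBounded limit action inputs = do
  sem  <- newQSem (max 1 limit)
  vars <- mapM (forkWithSlot sem . action) inputs
  mapM (\var -> takeMVar var >>= either throwIO return) vars
```

If the action opens and closes its own handle (for example with withFile), then something like mapBounded 8 processChunk chunks would keep at most eight handles open, however many chunks there are. Here processChunk and chunks stand for whatever per-chunk action and chunk list the program uses.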

Thus the answer to my post is not to use MapReduce on the whole program, but only within an inner, purely functional part.
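A minimal sketch of that idea, assuming the text is already in memory so that the parallel part touches no handles at all. It uses par and pseq from GHC.Conc in base rather than the strategies from the book, and chunkTotal is only a placeholder for the real per-chunk computation. All names here are hypothetical.

```haskell
import Data.List (foldl')
import GHC.Conc (par, pseq)

-- | Split a list into pieces of at most n elements (n is clamped to >= 1).
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = piece : chunksOf n rest
  where (piece, rest) = splitAt (max 1 n) xs

-- | Map over a list, sparking each result to weak head normal form.
parMapWHNF :: (a -> b) -> [a] -> [b]
parMapWHNF _ []     = []
parMapWHNF f (x:xs) = y `par` (ys `pseq` (y : ys))
  where
    y  = f x
    ys = parMapWHNF f xs

-- | Placeholder for the real work: total length of the lines in one chunk.
chunkTotal :: [String] -> Int
chunkTotal = foldl' (+) 0 . map length

-- | Pure map/reduce: map chunkTotal over the chunks in parallel, then sum.
totalLineLength :: Int -> String -> Int
totalLineLength linesPerChunk =
  foldl' (+) 0 . parMapWHNF chunkTotal . chunksOf linesPerChunk . lines
```

The IO stays outside and uses a single handle, for example totalLineLength 100 <$> readFile path. The sparks only run in parallel when the program is built with -threaded and started with +RTS -N; otherwise the result is the same but computed sequentially.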

To show where the program actually failed, I configured it with --enable-executable-profiling -p, built it, and ran it with +RTS -p -hc -L30. Because the executable fails immediately, there is no memory allocation profile. The time and allocation profile in the resulting .prof file starts with the following:

                                                                                               individual    inherited
COST CENTRE              MODULE                                               no.    entries  %time %alloc   %time %alloc

MAIN                     MAIN                                                   1            0   0.0    0.3   100.0  100.0
  main                    Main                                                1648           2   0.0    0.0    50.0   98.9
    sumOfDistancesOnFileWithIt MapReduceTest                                  1649           1   0.0    0.0    50.0   98.9
      chunkedFileEnum       MapReduceTest                                     1650           1   0.0    0.0    50.0   98.9
        chunkedEnum          MapReduceTest                                    1651         495   0.0   24.2    50.0   98.9
          lineOffsets         MapReduceTest                                   1652           1  50.0   74.6    50.0   74.6

chunkedEnum returns IO ([Enumerator Text m b], [Handle]), and apparently receives 495 entries. The input file had 2,000 lines, so the single entry on lineOffsets returned a list of 2000 offsets. There is not a single entry for distancesUsingMapReduceIt, so the actual work did not even start!
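The return type, with its list of handles, suggests that a handle per chunk is opened before any chunk is consumed. The code of chunkedEnum is not shown here, so the following is only a sketch of the alternative: open a handle for one chunk, read it, and close the handle before moving on. readChunk and processChunks are hypothetical names, and the (offset, length) pairs are assumed to lie within the file.

```haskell
import Control.Monad (replicateM)
import System.IO
  ( IOMode (ReadMode), SeekMode (AbsoluteSeek)
  , hGetChar, hSeek, hSetBinaryMode, withFile )

-- | Read 'len' bytes (as Chars) starting at byte 'offset'. The handle is
-- opened here and closed by withFile before the function returns.
-- Throws an IO error if the chunk runs past the end of the file.
readChunk :: FilePath -> Integer -> Int -> IO String
readChunk path offset len =
  withFile path ReadMode $ \h -> do
    hSetBinaryMode h True
    hSeek h AbsoluteSeek offset
    replicateM len (hGetChar h)

-- | Process the chunks one after another, so at most one handle is open at
-- any time. Each result is forced before the next chunk is read.
processChunks :: FilePath -> [(Integer, Int)] -> (String -> a) -> IO [a]
processChunks path chunks f = mapM step chunks
  where
    step (offset, len) = do
      s <- readChunk path offset len
      let r = f s
      r `seq` return r
```

With this shape the number of open handles no longer grows with the number of chunks, at the cost of reopening the file for each chunk.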
