Traditional IO vs memory-mapped

北城以北 submitted on 2019-12-05 12:40:55

Here is what I see in the "Stream Read/Write" benchmark:

  • It does not really do stream I/O; it seeks to a specific location in the file. That access is not buffered, so every I/O must be completed from disk. (The other stream tests use buffered I/O, so they actually read and write in large blocks, and the ints are then read from or written to the memory area.)
  • It seeks to the end minus 4 bytes, so it reads the last int and then writes a new int. The file therefore grows by one int on every iteration. This does not add much to the time cost, but it does show that the author of that benchmark either misunderstood something or was not careful.

This explains the very high cost of that particular benchmark.
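The file growth described above can be reproduced with a small sketch. The class and method names (SeekAppendDemo, growthAfter) are made up for illustration and are not taken from the benchmark; the loop body only mirrors the seek/read/write pattern described in the second bullet.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class SeekAppendDemo {
    // Repeats "seek to length - 4, read an int, write an int" and
    // returns how many bytes the file grew by.
    static long growthAfter(File file, int iterations) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(0);
            raf.writeInt(42);               // the file now holds one int (4 bytes)
            long before = raf.length();
            for (int i = 0; i < iterations; i++) {
                raf.seek(raf.length() - 4); // position on the last int
                int val = raf.readInt();    // the read moves the pointer to end of file
                raf.writeInt(val);          // so this write appends 4 new bytes
            }
            return raf.length() - before;
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("seek-append", ".tmp");
        tmp.deleteOnExit();
        System.out.println(growthAfter(tmp, 10)); // prints 40
    }
}
```

Ten iterations add ten ints, so the file ends up 40 bytes longer than it started.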

You asked:

Wouldn't it be better to read ints from the previously written file and just read and write ints in the same place?

I think this is what the author was trying to do with the last two benchmarks, but it is not what they got. To read and write the same place in the file with RandomAccessFile, you need a seek before both the read and the write:

raf.seek(raf.length() - 4);
int val = raf.readInt();
raf.seek(raf.length() - 4);
raf.writeInt(val);

This does demonstrate one advantage of memory-mapped I/O: you can use the same memory address to access the same bits of the file, instead of doing an additional seek before every call.
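For comparison, here is a minimal sketch of the same "read and write the last int in place" operation through a mapping. The names MappedInPlaceDemo and bumpLastInt are hypothetical, and the sketch assumes the file holds at least 4 bytes and is small enough to map in one piece (under 2 GB).

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

class MappedInPlaceDemo {
    // Increments the last int of the file `times` times through a mapping
    // and returns the final value. No seek is needed between read and write.
    static int bumpLastInt(File file, int times) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
             FileChannel fc = raf.getChannel()) {
            MappedByteBuffer buf = fc.map(FileChannel.MapMode.READ_WRITE, 0, fc.size());
            int offset = (int) (fc.size() - 4);  // byte offset of the last int
            for (int i = 0; i < times; i++) {
                int val = buf.getInt(offset);    // absolute read at a fixed offset
                buf.putInt(offset, val + 1);     // absolute write at the same offset
            }
            buf.force();                         // ask the OS to flush the changes
            return buf.getInt(offset);
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("mapped", ".tmp");
        tmp.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
            raf.writeInt(7);
        }
        System.out.println(bumpLastInt(tmp, 3)); // prints 10
    }
}
```

The absolute getInt/putInt calls take the offset as an argument, so the file does not grow and there is no file pointer to reposition.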

By the way, your first benchmark example class may have issues too, since CHUNK_SIZE is not an even multiple of the file system block size. It is usually good to use a multiple of 1024, and 8192 has been shown to be a sweet spot for most applications (which is why Java's BufferedInputStream and BufferedOutputStream use that value as their default buffer size). For read requests that do not fall on block boundaries, the OS has to read one or more extra blocks. Subsequent reads (of a stream) will reread the same block, possibly some full blocks, and then an extra block again. Memory-mapped I/O always physically reads and writes in blocks, because the actual I/O is handled by the OS memory manager, which works in units of its page size. The page size generally maps well onto file system blocks.

In that example, the memory-mapped test reads everything into a memory buffer and then writes it all back out. The two tests are not written in a way that compares like with like: memmoryMappedCopy should read and write in the same chunk size as customBufferedCopy.
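A rough sketch of what a chunk-by-chunk mapped copy could look like follows. It is not the original test class: the class name ChunkedMappedCopy and the 8192-byte CHUNK_SIZE are assumptions chosen for illustration. Mapping one small region per chunk is expensive in itself, so this only shows how to make the two copies move data in the same unit, not how to make the mapped copy fast.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

class ChunkedMappedCopy {
    static final int CHUNK_SIZE = 8192; // assumed value, a multiple of common block sizes

    // Copies src to dst by mapping and copying one CHUNK_SIZE region at a time.
    static void copy(File src, File dst) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(src, "r");
             RandomAccessFile out = new RandomAccessFile(dst, "rw")) {
            FileChannel inCh = in.getChannel();
            FileChannel outCh = out.getChannel();
            long size = inCh.size();
            out.setLength(size);                 // size the target before mapping it
            for (long pos = 0; pos < size; pos += CHUNK_SIZE) {
                long len = Math.min(CHUNK_SIZE, size - pos);
                MappedByteBuffer from = inCh.map(FileChannel.MapMode.READ_ONLY, pos, len);
                MappedByteBuffer to = outCh.map(FileChannel.MapMode.READ_WRITE, pos, len);
                to.put(from);                    // copy this chunk through memory
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File src = File.createTempFile("src", ".bin");
        File dst = File.createTempFile("dst", ".bin");
        src.deleteOnExit();
        dst.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(src, "rw")) {
            raf.write(new byte[20000]);
        }
        copy(src, dst);
        System.out.println(dst.length());        // prints 20000
    }
}
```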

EDIT: There may even be more things wrong with these test classes. Because of your comment to the other answer I looked more carefully at the first class again.

Method customBufferedCopy is static and uses a static buffer. For this kind of test, the buffer should be defined within the method. Then the method would not need to be synchronized (though it doesn't need that in this context and for these tests anyway). This static method is also called through an instance, as if it were a normal method, which is bad programming practice (i.e. use FileCopy.customBufferedCopy(...) instead of new FileCopy().customBufferedCopy(...)).

If you actually did run this test from multiple threads, the threads would contend for that shared buffer and the benchmark would no longer be only about file I/O, so it would not be fair to compare the results of the two test methods.
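As an illustration of those points, a buffered copy with a method-local buffer might look like the sketch below. FileCopySketch, bufferedCopy and the 8192-byte CHUNK_SIZE are stand-ins invented here, not the code from the question.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class FileCopySketch {
    static final int CHUNK_SIZE = 8192; // assumed value

    // Copies src to dst and returns the number of bytes copied.
    // The buffer is local, so concurrent calls share no state and
    // the method does not need to be synchronized.
    static long bufferedCopy(File src, File dst) throws IOException {
        byte[] buffer = new byte[CHUNK_SIZE];
        long total = 0;
        try (InputStream in = new FileInputStream(src);
             OutputStream out = new FileOutputStream(dst)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);  // write only the bytes actually read
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        File src = File.createTempFile("src", ".bin");
        File dst = File.createTempFile("dst", ".bin");
        src.deleteOnExit();
        dst.deleteOnExit();
        try (OutputStream out = new FileOutputStream(src)) {
            out.write(new byte[20000]);
        }
        // Static method, so it is called through the class name.
        System.out.println(FileCopySketch.bufferedCopy(src, dst)); // prints 20000
    }
}
```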

1) These sound like questions your students should be asking - not the other way around?

2) The reason the two methods are used is to demonstrate the different ways that you can copy a file. I would hazard a guess that the first method (RandomAccessFile) creates a version of the file in RAM, and then copies to a new version on the disk, and that the second method (customBufferedCopy) reads directly from the drive.

3) I'm not sure, but I think synchronized is used to ensure that multiple instances of the same class do not write at the same time.

4) As for the last question, I've got to go - so I hope someone else can help you with that.

Seriously though, these sound like just the questions a tutor should be teaching to their students. If you don't have the ability to research simple things like this yourself, what kind of example are you setting your students? </rant>

Thanks for looking into this. I will look at the first examples later; for now, my professor asked me to rewrite the two tests (Stream and Mapped Read/Write).
They generate random ints, read the value at the index given by the generated int, and check whether the int at that index equals the generated int. If it does not, the generated int is written at its index. He thought this could result in a better test that makes more use of RandomAccessFile. Does this make sense?

However, I have some issues. First of all, I don't know how to use a buffer with the stream read/write when I'm using RandomAccessFile. I found a lot about byte[] buffers, but I'm not sure how to use one correctly.
My code so far for this test:

    new Tester("Stream Read/Write") {
        public void test() throws IOException {
            RandomAccessFile raf = new RandomAccessFile(new File("temp.tmp"), "rw");
            raf.seek(numOfUbuffInts*4);
            raf.writeInt(numOfUbuffInts);
            for (int i = 0; i < numOfUbuffInts; i++) {
                int getal = (int) (1 + Math.random() * numOfUbuffInts);
                raf.seek(getal*4);
                if (raf.readInt() != getal) {
                    raf.seek(getal*4);
                    raf.writeInt(getal);
                }
            }
            raf.close();
        }
    },

So this is still unbuffered.

I wrote the second test as follows:
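One possible way to put a byte[] buffer between the code and RandomAccessFile is to move a whole block of ints per I/O call and decode it with java.nio.ByteBuffer, as sketched below. The helper names (BlockIntIO, readInts, writeInts) are invented for this example. ByteBuffer defaults to big-endian, which matches the byte order used by readInt and writeInt. Note that this only pays off when the ints being accessed sit close together; for purely random indexes, each access still touches a different block.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.util.Arrays;

class BlockIntIO {
    // Reads `count` consecutive ints starting at int index `firstIndex`
    // with a single read call into a byte[] buffer.
    static int[] readInts(RandomAccessFile raf, long firstIndex, int count) throws IOException {
        byte[] block = new byte[count * 4];
        raf.seek(firstIndex * 4L);
        raf.readFully(block);                          // one I/O call for the block
        int[] result = new int[count];
        ByteBuffer.wrap(block).asIntBuffer().get(result);
        return result;
    }

    // Writes `values` as consecutive ints starting at int index `firstIndex`
    // with a single write call.
    static void writeInts(RandomAccessFile raf, long firstIndex, int[] values) throws IOException {
        ByteBuffer block = ByteBuffer.allocate(values.length * 4);
        block.asIntBuffer().put(values);               // encode the ints into the byte[]
        raf.seek(firstIndex * 4L);
        raf.write(block.array());
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("block-int", ".tmp");
        tmp.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
            int[] values = new int[100];
            for (int i = 0; i < values.length; i++) values[i] = i;
            writeInts(raf, 0, values);
            System.out.println(Arrays.toString(readInts(raf, 10, 5))); // prints [10, 11, 12, 13, 14]
        }
    }
}
```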

    new Tester("Mapped Read/Write") {
        public void test() throws IOException {
            RandomAccessFile raf = new RandomAccessFile(new File("temp.tmp"), "rw");
            raf.seek(numOfUbuffInts*4);
            raf.writeInt(numOfUbuffInts);
            FileChannel fc = raf.getChannel();
            IntBuffer ib = fc.map(FileChannel.MapMode.READ_WRITE, 0, fc.size()).asIntBuffer();

            for(int i = 1; i < numOfUbuffInts; i++) {
                int getal = (int) (1 + Math.random() * numOfUbuffInts);
                if (ib.get(getal) != getal) {
                    ib.put(getal, getal);
                }
            }
            fc.close();
        }
    }

For small values of numOfUbuffInts it seems to go fast; for large values (20 000 000+) it takes ages. I just tried some things, but I'm not sure if I'm on the right track.
