Access File through multiple threads

天涯浪人 2021-01-31 10:52

I want to access a large file (between 30 MB and 1 GB in size) with 10 threads, process each line in the file, and then write the results to another file, also using 10 threads.

  • 2021-01-31 11:42

    Since order needs to be maintained, the problem itself implies that reading and writing cannot be done in parallel: both are sequential processes. The only thing you can do in parallel is process the records, and even that doesn't gain much when there is only one writer.

    Here is a design proposal:

    1. Use one thread, t1, to read the file and store the data in a LinkedBlockingQueue, Q1.
    2. Use another thread, t2, to take data from Q1, process it, and put the result into another LinkedBlockingQueue, Q2.
    3. Thread t3 takes data from Q2 and writes it to a file.
    4. To make sure you don't encounter an OutOfMemoryError, create the queues with an appropriate capacity (bound).
    5. You can use a CyclicBarrier to ensure all threads have completed their work.
    6. Additionally, you can give the CyclicBarrier a barrier action in which you do your post-processing tasks.
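    A minimal sketch of this design, assuming a line-oriented text file. The class name `LinePipeline`, the `process` placeholder (it just uppercases each line), the `EOF` sentinel that tells the next stage input is finished, and the queue capacity of 1000 are all assumptions for illustration. Because each stage is a single thread and the queues are FIFO, output order matches input order without extra bookkeeping.

```java
import java.io.*;
import java.util.Locale;
import java.util.concurrent.*;

// Sketch of the three-stage design: reader (t1) -> processor (t2) -> writer (t3).
class LinePipeline {
    // Hypothetical end-of-input marker. It is compared by reference, so a
    // real line containing the text "<EOF>" cannot be mistaken for it.
    private static final String EOF = new String("<EOF>");
    // Assumed queue bound; limits memory use (item 4).
    private static final int CAPACITY = 1000;

    interface Body { void run() throws Exception; }

    // Placeholder for the real per-line work.
    static String process(String line) {
        return line.toUpperCase(Locale.ROOT);
    }

    // Runs one stage, then waits at the barrier for the other stages.
    // Error handling is minimal: if one stage fails, the others may block.
    private static void stage(CyclicBarrier barrier, Body body) {
        try {
            body.run();
            barrier.await();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    static void run(BufferedReader in, BufferedWriter out) throws InterruptedException {
        BlockingQueue<String> q1 = new LinkedBlockingQueue<>(CAPACITY);
        BlockingQueue<String> q2 = new LinkedBlockingQueue<>(CAPACITY);
        // The barrier action runs once, after all three stages arrive (item 6).
        CyclicBarrier barrier = new CyclicBarrier(3,
                () -> System.out.println("All stages done; post-processing goes here"));

        Thread t1 = new Thread(() -> stage(barrier, () -> {
            String line;
            while ((line = in.readLine()) != null)
                q1.put(line);               // blocks while Q1 is full
            q1.put(EOF);
        }));
        Thread t2 = new Thread(() -> stage(barrier, () -> {
            String line;
            while ((line = q1.take()) != EOF)
                q2.put(process(line));      // blocks while Q2 is full
            q2.put(EOF);
        }));
        Thread t3 = new Thread(() -> stage(barrier, () -> {
            String line;
            while ((line = q2.take()) != EOF) {
                out.write(line);
                out.newLine();
            }
            out.flush();
        }));
        t1.start(); t2.start(); t3.start();
        t1.join(); t2.join(); t3.join();
    }

    public static void main(String[] args) throws Exception {
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]));
             BufferedWriter out = new BufferedWriter(new FileWriter(args[1]))) {
            run(in, out);
        }
    }
}
```

    With only one processing thread, t2 becomes the bottleneck when processing is expensive, which is the limitation this answer points out at the start.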

    Good luck; I hope you arrive at the best design.

    Cheers!

  • 2021-01-31 11:43

    Spring Batch comes to mind.

    Maintaining the order would require a post-processing step, i.e. store the read index/key, in order, in the processing context. The processing logic should store the processed information in the context as well. Once processing is done, you can post-process the list and write it to the file.

    Beware of OOM issues though.
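    The ordering idea does not depend on Spring Batch. Here is a plain-Java sketch of it (not Spring Batch's API), assuming the whole input fits in memory, which is exactly where the OOM risk comes from. Each line is tagged with its read index, processed on a thread pool, and a post-processing step sorts the results by index before writing. `IndexedPostProcess`, `Result` and `process` are hypothetical names, and `process` is only a placeholder that reverses the line.

```java
import java.nio.file.*;
import java.util.*;
import java.util.concurrent.*;
import java.util.stream.Collectors;

// Plain-Java sketch (not Spring Batch): tag, process in parallel, sort by index.
class IndexedPostProcess {
    // A processed line tagged with the index at which it was read.
    static final class Result {
        final int index;
        final String text;
        Result(int index, String text) { this.index = index; this.text = text; }
    }

    // Placeholder for the real per-line work.
    static String process(String line) {
        return new StringBuilder(line).reverse().toString();
    }

    static List<String> run(List<String> lines, int threads) throws InterruptedException {
        // Workers add results in whatever order they happen to finish.
        Queue<Result> results = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < lines.size(); i++) {
            final int index = i;
            pool.execute(() -> results.add(new Result(index, process(lines.get(index)))));
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.HOURS))
            throw new IllegalStateException("processing timed out");
        // Post-processing step: restore read order using the stored index.
        return results.stream()
                .sorted(Comparator.comparingInt((Result r) -> r.index))
                .map(r -> r.text)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws Exception {
        // Reads the whole file into memory: this is where the OOM risk lies.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), run(lines, 10));
    }
}
```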

  • 2021-01-31 11:47

    I faced a similar problem in the past: I had to read data from a single file, process it, and write the results to another file. Because the processing part was very heavy, I used multiple threads. Here is the design I followed to solve the problem:

    • Use the main program as the master: read the whole file in one go (but don't start processing yet), and create one data object per line that records its sequence number.
    • In main, create one PriorityBlockingQueue (say `queue`) and add these data objects to it. Pass a reference to this queue to the constructor of every worker thread.
    • Create several processing units, i.e. threads, that listen on this queue. When data objects are added to the queue, call notifyAll() to wake them; each thread then processes objects independently.
    • After processing, put every result into a single thread-safe map, keyed by its sequence number.
    • When the queue is empty and all threads are idle, processing is done. Stop the threads, then iterate over the map in sequence-number order and write the results to a file.
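    A minimal sketch of the steps above, under a few assumptions: `DataObject`, `PriorityQueueWorkers` and the placeholder `process` are hypothetical names, and results go into a `ConcurrentHashMap`. Instead of waking workers with `notifyAll()`, this version relies on the whole file being queued before the workers start, so each worker just calls `poll()` until the queue is empty. That is a simplification of the described design, not the author's exact code.

```java
import java.nio.file.*;
import java.util.*;
import java.util.concurrent.*;

// Sketch of the master/worker design with a PriorityBlockingQueue.
class PriorityQueueWorkers {
    // One line of input plus its sequence number (hypothetical name).
    static final class DataObject implements Comparable<DataObject> {
        final int seq;
        final String line;
        DataObject(int seq, String line) { this.seq = seq; this.line = line; }
        @Override
        public int compareTo(DataObject other) { return Integer.compare(seq, other.seq); }
    }

    // Placeholder for the real (heavy) per-line work.
    static String process(String line) {
        return line.length() + ":" + line;
    }

    static List<String> run(List<String> lines, int threads) throws InterruptedException {
        // Master: load every line, with its sequence number, before any worker starts.
        PriorityBlockingQueue<DataObject> queue = new PriorityBlockingQueue<>();
        for (int i = 0; i < lines.size(); i++)
            queue.add(new DataObject(i, lines.get(i)));

        // Thread-safe map shared by all workers, keyed by sequence number.
        Map<Integer, String> results = new ConcurrentHashMap<>();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                DataObject d;
                // The queue was filled up front, so an empty poll() means no work is left.
                while ((d = queue.poll()) != null)
                    results.put(d.seq, process(d.line));
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers)
            worker.join();

        // Copy into a TreeMap to iterate in sequence-number order.
        return new ArrayList<>(new TreeMap<>(results).values());
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), run(lines, 10));
    }
}
```

    Copying into a `TreeMap` at the end is what restores file order; the priority ordering only makes workers pick up lines roughly in file order.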
  • 2021-01-31 11:48

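    You can do this with Java's FileChannel, which is safe for use by multiple concurrent threads. Its positional read(ByteBuffer, long) and write(ByteBuffer, long) methods read and write starting at a given file position without changing the channel's own position, so threads working on different byte ranges do not interfere. See the sample code below: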

    import java.io.*;
    import java.nio.*;
    import java.nio.channels.*;
    
    public class OpenFile implements Runnable
    {
        private final FileChannel _channel;
        private final FileChannel _writeChannel;
        private final long _startLocation;  // long, so offsets past 2 GB work
        private final int _size;
    
        public OpenFile(long loc, int sz, FileChannel chnl, FileChannel write)
        {
            _startLocation = loc;
            _size = sz;
            _channel = chnl;
            _writeChannel = write;
        }
    
        public void run()
        {
            try
            {
                System.out.println("Reading the channel: " + _startLocation + ":" + _size);
                ByteBuffer buff = ByteBuffer.allocate(_size);
                // Delay the first chunk so it finishes last; the output is still
                // correct because every read and write uses an explicit position.
                if (_startLocation == 0)
                    Thread.sleep(100);
                // A positional read may return fewer bytes than requested, so loop
                // until the buffer is full or the end of the file is reached.
                while (buff.hasRemaining())
                {
                    if (_channel.read(buff, _startLocation + buff.position()) < 0)
                        break;
                }
                buff.flip();  // write only the bytes that were actually read
                int written = 0;
                while (buff.hasRemaining())
                    written += _writeChannel.write(buff, _startLocation + written);
                System.out.println("Read the channel: " + _startLocation + ":Written:" + written);
            }
            catch (Exception e)
            {
                e.printStackTrace();
            }
        }
    
        public static void main(String[] args)
            throws Exception
        {
            // try-with-resources closes both streams (and their channels).
            try (FileInputStream str = new FileInputStream("BigFile.dat");
                 FileOutputStream ostr = new FileOutputStream("OutBigFile.dat"))
            {
                FileChannel chnl = str.getChannel();
                FileChannel write = ostr.getChannel();
                Thread t1 = new Thread(new OpenFile(0, 10000, chnl, write));
                Thread t2 = new Thread(new OpenFile(10000, 10000, chnl, write));
                Thread t3 = new Thread(new OpenFile(20000, 10000, chnl, write));
                t1.start();
                t2.start();
                t3.start();
                t1.join();
                t2.join();
                t3.join();
                write.force(false);
            }
        }
    }
    

    In this sample, three threads read from the same file and write to the same output file without conflicting, because each thread works on its own byte range. The sample does not account for the fact that a chunk boundary need not fall at the end of a line; you will have to find the right splitting logic for your data.
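    One way to handle the line-boundary problem, sketched under the assumption that lines are terminated by a '\n' byte (which also covers "\r\n" endings in ASCII or UTF-8 text): pick evenly spaced split points, then move each one forward to just past the next newline. `LineAlignedChunks` and `chunkBoundaries` are hypothetical names; the scan uses the same positional `FileChannel.read(ByteBuffer, long)` call as the sample.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.*;
import java.util.Arrays;

// Sketch: split a file into byte ranges whose boundaries fall just after a '\n'.
class LineAlignedChunks {
    // Returns chunks + 1 offsets; chunk i covers [bounds[i], bounds[i + 1]).
    // Each interior boundary starts at an even split point and is moved
    // forward to the byte after the next '\n', so no line spans two chunks.
    static long[] chunkBoundaries(FileChannel ch, int chunks) throws IOException {
        long size = ch.size();
        long[] bounds = new long[chunks + 1];
        bounds[chunks] = size;
        ByteBuffer one = ByteBuffer.allocate(1);
        for (int i = 1; i < chunks; i++) {
            long pos = Math.max(size * i / chunks, bounds[i - 1]);
            // Scan one byte at a time with positional reads (simple, not fast).
            while (pos < size) {
                one.clear();
                if (ch.read(one, pos) < 1) {   // unexpected end of file
                    pos = size;
                    break;
                }
                pos++;
                if (one.get(0) == '\n')
                    break;
            }
            bounds[i] = pos;
        }
        return bounds;
    }

    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Paths.get(args[0]), StandardOpenOption.READ)) {
            System.out.println(Arrays.toString(chunkBoundaries(ch, 10)));
        }
    }
}
```

    Each thread in the `OpenFile` sample could then be handed the range from `bounds[i]` to `bounds[i + 1]` instead of a fixed 10000-byte slice.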
