Multithreaded file I/O program behaves unpredictably when the number of threads is increased

Posted by 人走茶凉 on 2019-12-13 03:37:31

Question


I am trying to create a 1 MB (1048576-byte) file by writing it in various chunk sizes with different numbers of threads. With int NUM_THREADS = 1 or int NUM_THREADS = 2, the created file has the expected size, i.e. 1 MB.

However, when I increase the thread count to 4, the created file is around 400 MB. Why this anomaly?

#include <pthread.h>
#include <string>
#include <iostream>
#define TenGBtoByte 1048576
#define fileToWrite "/tmp/schatterjee.txt"

using namespace std;
pthread_mutex_t mutexsum;
struct workDetails {
    int threadcount;
    int chunkSize;
    char *data;
};

void *SPWork(void *threadarg) {
    struct workDetails *thisWork;
    thisWork = (struct workDetails *) threadarg;
    int threadcount = thisWork->threadcount;
    int chunkSize = thisWork->chunkSize;
    char *data = thisWork->data;
    long noOfWrites = (TenGBtoByte / (threadcount * chunkSize));
    FILE *f = fopen(fileToWrite, "a+");
    for (long i = 0; i < noOfWrites; ++i) {
        pthread_mutex_lock(&mutexsum);
        fprintf(f, "%s", data);
        fflush (f);
        pthread_mutex_unlock(&mutexsum);
    }
    fclose(f);
    pthread_exit((void *) NULL);
}

int main(int argc, char *argv[]) {
    int blocksize[] = {1024};
    int NUM_THREADS = 2;
    for (int BLOCKSIZE: blocksize) {
        char *data = new char[BLOCKSIZE];
        fill_n(data, BLOCKSIZE, 'x');

        pthread_t thread[NUM_THREADS];
        workDetails detail[NUM_THREADS];
        pthread_attr_t attr;
        int rc;
        long threadNo;
        void *status;

        /* Initialize the mutex and create the threads as joinable */
        pthread_mutex_init(&mutexsum, NULL);
        pthread_attr_init(&attr);
        pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
        for (threadNo = 0; threadNo < NUM_THREADS; threadNo++) {
            detail[threadNo].threadcount = NUM_THREADS;
            detail[threadNo].chunkSize = BLOCKSIZE;
            detail[threadNo].data = data;
            rc = pthread_create(&thread[threadNo], &attr, SPWork, (void *) &detail[threadNo]);
            if (rc) exit(-1);
        }
        pthread_attr_destroy(&attr);
        for (threadNo = 0; threadNo < NUM_THREADS; threadNo++) {
            rc = pthread_join(thread[threadNo], &status);
            if (rc) exit(-1);
        }
        pthread_mutex_destroy(&mutexsum);
        delete[] data;
    }
    pthread_exit(NULL);
}

N.B.: 1) This is a benchmarking task, so I am doing what the requirements ask. 2) long noOfWrites = (TenGBtoByte / (threadcount * chunkSize)); computes how many times each thread should write so that the combined size is 1 MB. 3) I tried placing the mutex lock at various positions. All yield the same result.

Suggestions about other changes to the program are also welcome.


Answer 1:


You are allocating and initializing your data array like this:

char *data = new char[BLOCKSIZE];
fill_n(data, BLOCKSIZE, 'x');

Then you are writing it to file using fprintf:

fprintf(f, "%s", data);

With "%s", fprintf expects data to be a null-terminated string, but the buffer has no terminating null byte, so fprintf reads past the end of the array. This is undefined behavior. If it worked with a low number of threads, that is only because the memory right after the buffer happened to contain a zero byte.
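
One way to sidestep the terminator issue is to write the buffer with an explicit length instead of "%s". The following is a minimal sketch, not the asker's code: appendChunk and fileSize are hypothetical helper names introduced here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: append `size` bytes of `data` to the file at `path`.
// Returns the number of bytes handed to stdio (0 if the file cannot be opened).
std::size_t appendChunk(const char *path, const char *data, std::size_t size) {
    assert(data != nullptr);
    FILE *f = std::fopen(path, "a");
    if (!f) return 0;
    // fwrite takes an explicit length, so the buffer needs no terminating '\0'.
    std::size_t written = std::fwrite(data, 1, size, f);
    std::fclose(f);
    return written;
}

// Hypothetical helper: size of the file in bytes, or -1 if it cannot be opened.
long fileSize(const char *path) {
    FILE *f = std::fopen(path, "rb");
    if (!f) return -1;
    std::fseek(f, 0, SEEK_END);
    long size = std::ftell(f);
    std::fclose(f);
    return size;
}
```

Because the length is passed explicitly, the number of bytes written no longer depends on whatever happens to follow the buffer in memory.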

Other than that, the mutex in your program serves no purpose and can be removed. The internal locking that stdio performs on each FILE is also redundant, because every thread uses its own FILE object, so you can write your data with fwrite_unlocked and fflush_unlocked (both glibc extensions). Essentially, all synchronization in your program is performed in the kernel, not in userspace.

Even after removing the mutex and switching to the _unlocked functions, your program reliably creates 1 MB files regardless of the number of threads. So the invalid file writing seems to be the only issue you have.
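
The approach described in this answer might look roughly like the sketch below: every thread opens its own FILE in append mode and writes fixed-size chunks without any mutex. This is an illustration under assumptions, not the answerer's actual test program: AppendJob, appendWorker and writeWithThreads are hypothetical names, plain fwrite and fflush are used instead of the _unlocked variants for portability, and it assumes each fflush reaches the kernel as a single append write (as on Linux with a chunk no larger than the stdio buffer).

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical job description shared (read-only) by all writer threads.
struct AppendJob {
    const char *path;
    const char *data;
    std::size_t chunkSize;
    long writes;  // number of chunks each thread appends
};

// Each thread opens its own FILE in append mode; no mutex is taken.
void *appendWorker(void *arg) {
    AppendJob *job = static_cast<AppendJob *>(arg);
    FILE *f = std::fopen(job->path, "a");
    if (!f) return nullptr;
    for (long i = 0; i < job->writes; ++i) {
        std::fwrite(job->data, 1, job->chunkSize, f);  // explicit length, no "%s"
        std::fflush(f);  // hand the chunk to the kernel right away
    }
    std::fclose(f);
    return nullptr;
}

// Recreates `path`, runs `threads` writers, and returns the final file size
// in bytes, or -1 on error.
long writeWithThreads(const char *path, int threads, std::size_t chunkSize, long totalBytes) {
    assert(threads > 0 && chunkSize > 0);
    std::remove(path);
    std::vector<char> data(chunkSize, 'x');
    AppendJob job = {path, data.data(), chunkSize,
                     totalBytes / static_cast<long>(threads * chunkSize)};
    std::vector<pthread_t> ids(threads);
    int started = 0;
    for (; started < threads; ++started)
        if (pthread_create(&ids[started], nullptr, appendWorker, &job) != 0) break;
    for (int i = 0; i < started; ++i) pthread_join(ids[i], nullptr);
    if (started != threads) return -1;
    FILE *f = std::fopen(path, "rb");
    if (!f) return -1;
    std::fseek(f, 0, SEEK_END);
    long size = std::ftell(f);
    std::fclose(f);
    return size;
}
```

Calling writeWithThreads("/tmp/demo.bin", 4, 1024, 1048576) and comparing the result with 1048576 is one way to check the file size for a given thread count.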




Answer 2:


@Ivan Yes! Yes! Yes! You are absolutely right, my friend, except for one small fact: the mutex is necessary. This is the final code. Try removing the mutex and the file size will be different.

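#include <pthread.h>
#include <algorithm>  // fill_n
#include <cstdio>     // fopen, fprintf, fflush, fclose
#include <cstdlib>    // exit
#include <string>
#include <iostream>
#define TenGBtoByte 1048576
#define fileToWrite "/tmp/schatterjee.txt"

using namespace std;
pthread_mutex_t mutexsum;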
struct workDetails {
    int threadcount;
    int chunkSize;
    char *data;
};

void *SPWork(void *threadarg) {

    struct workDetails *thisWork;
    thisWork = (struct workDetails *) threadarg;
    int threadcount = thisWork->threadcount;
    int chunkSize = thisWork->chunkSize;
    char *data = thisWork->data;
    long noOfWrites = (TenGBtoByte / (threadcount * chunkSize));
    FILE *f = fopen(fileToWrite, "a+");

    for (long i = 0; i < noOfWrites; ++i) {
        pthread_mutex_lock(&mutexsum);
        fprintf(f, "%s", data);
        fflush (f);
        pthread_mutex_unlock(&mutexsum);
    }
    fclose(f);
    pthread_exit((void *) NULL);
}

int main(int argc, char *argv[]) {
    int blocksize[] = {1024};
    int NUM_THREADS = 128;
    for (int BLOCKSIZE: blocksize) {
        char *data = new char[BLOCKSIZE+1];
        fill_n(data, BLOCKSIZE, 'x');
        data[BLOCKSIZE] = '\0';  // terminate the string so "%s" stops after BLOCKSIZE bytes

        pthread_t thread[NUM_THREADS];
        workDetails detail[NUM_THREADS];
        pthread_attr_t attr;
        int rc;
        long threadNo;
        void *status;
        pthread_mutex_init(&mutexsum, NULL);
        pthread_attr_init(&attr);
        pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
        for (threadNo = 0; threadNo < NUM_THREADS; threadNo++) {
            detail[threadNo].threadcount = NUM_THREADS;
            detail[threadNo].chunkSize = BLOCKSIZE;
            detail[threadNo].data = data;
            rc = pthread_create(&thread[threadNo], &attr, SPWork, (void *) &detail[threadNo]);
            if (rc) exit(-1);
        }
        pthread_attr_destroy(&attr);
        for (threadNo = 0; threadNo < NUM_THREADS; threadNo++) {
            rc = pthread_join(thread[threadNo], &status);
            if (rc) exit(-1);
        }
        pthread_mutex_destroy(&mutexsum);
        delete[] data;
    }
    pthread_exit(NULL);
} 
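
One detail of the code above is worth checking separately: noOfWrites is computed with integer division, so the combined size is exactly 1048576 bytes only when threadcount * chunkSize divides it evenly. With 128 threads and 1024-byte chunks each thread performs 8 writes, for 1048576 bytes in total; with 3 threads each would perform 341 writes, for 1047552 bytes. A small sketch of that arithmetic (plannedBytes is a hypothetical helper, not part of the original program):

```cpp
#include <cassert>

// Hypothetical helper: total bytes written when each of `threads` threads
// performs total / (threads * chunk) writes of `chunk` bytes each.
long plannedBytes(long total, long threads, long chunk) {
    assert(threads > 0 && chunk > 0);
    long writesPerThread = total / (threads * chunk);  // integer division truncates
    return writesPerThread * threads * chunk;
}
```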


Source: https://stackoverflow.com/questions/49248431/multithreading-file-io-program-behaves-unpredictably-when-number-of-thread-is-in
