C - Get random words from a text file

Posted by 邮差的信 on 2020-05-08 15:30:23

Question


I have a text file that contains a list of words in a fixed order. I'm trying to write a function that returns an array of words from this file. I managed to retrieve the words in the same order as they appear in the file, like this:

char *readDict(char *fileName) {

    int i;

    char * lines[100];
    FILE *pf = fopen ("francais.txt", "r");

    if (pf == NULL) {
        printf("Unable to open the file");
    } else {

        for (i = 0; i < 100; i++) {

            lines[i] = malloc(128);

            fscanf(pf, "%s", lines[i]);

            printf("%d: %s\n", i, lines[i]);
        }


        fclose(pf);

        return *lines;
    }

    return "NULL";
}

My question is: how can I return an array of words picked at random from the text file, rather than in the order they appear in the file?

The file looks like this:

exemple1
exemple2
exemple3
exemple4

Answer 1:


Reservoir sampling lets you select a fixed number of elements uniformly at random from a stream whose size is not known in advance. Something like this could work (although untested):

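#include <limits.h>   /* LINE_MAX (POSIX) */
#include <stdio.h>
#include <stdlib.h>   /* calloc, free, random (POSIX) */
#include <string.h>   /* strdup (POSIX) */

/* Returns an array of count pointers, or NULL on failure. Entries stay NULL
   if the file has fewer than count lines. The caller frees each string and
   the array itself. */
char **reservoir_sample(const char *filename, int count) {
    FILE *file;
    char **lines;
    char buf[LINE_MAX];
    int i, n;

    file = fopen(filename, "r");
    if (file == NULL) {
        return NULL;
    }
    lines = calloc(count, sizeof(char *));
    if (lines == NULL) {
        fclose(file);
        return NULL;
    }
    for (n = 1; fgets(buf, LINE_MAX, file); n++) {
        if (n <= count) {
            lines[n - 1] = strdup(buf);   /* fill the reservoir first */
        } else {
            i = random() % n;             /* index in [0, n) */
            if (i < count) {              /* true with probability count / n */
                free(lines[i]);
                lines[i] = strdup(buf);
            }
        }
    }
    fclose(file);

    return lines;
}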

This is "Algorithm R":

  • Read the first count lines into the sample array.
  • For each subsequent line, replace a random element of the sample array with probability count / n, where n is the line number.
  • At the end, the sample contains a set of random lines. (The order is not uniformly random, but you can fix that with a shuffle.)
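The shuffle mentioned in the last step could be a Fisher-Yates shuffle over the sample array. The following is only a sketch: `shuffle_lines` is a hypothetical helper name, and it uses standard `rand()` instead of the POSIX `random()` used above, ignoring the slight modulo bias.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: Fisher-Yates shuffle of an array of count string
   pointers. Only the pointers move; the strings themselves are untouched. */
void shuffle_lines(char **lines, int count)
{
    assert(lines != NULL || count == 0);
    for (int i = count - 1; i > 0; i--) {
        int j = rand() % (i + 1);   /* index in [0, i]; modulo bias ignored */
        char *tmp = lines[i];
        lines[i] = lines[j];
        lines[j] = tmp;
    }
}
```

Calling `shuffle_lines(lines, count)` on the array returned by `reservoir_sample()` would randomize the order of the sample, provided the file had at least `count` lines so that no entry is NULL.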



Answer 2:


If each line of the file contains one word, one possibility is to open the file and count the number of lines first. Then rewind() the file stream and pick a random line number, sel, between 1 and the number of lines in the file. Next, call fgets() in a loop sel times, reading each line into the same buffer. The last line read is the selected word, which can be copied into an array that stores the results. Rewind and repeat for each word desired.

Here is a program that uses the /usr/share/dict/words file that is typical on Linux systems. Note that rand() never returns more than RAND_MAX, so if the file has more lines than that, lines numbered above RAND_MAX + 1 can never be selected. RAND_MAX is only guaranteed to be at least 32767; in the GNU C Library it is 2147483647.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define MAX_WORD   100
#define NUM_WORDS  10

int main(void)
{
    /* Open words file */
    FILE *fp = fopen("/usr/share/dict/words", "r");

    if (fp == NULL) {
        perror("Unable to locate word list");
        exit(EXIT_FAILURE);
    }

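    /* Count words in file */
    char word[MAX_WORD];
    long wc = 0;
    while (fgets(word, sizeof word, fp) != NULL) {
        ++wc;
    }
    if (wc == 0) {      /* avoid rand() % 0 below */
        fprintf(stderr, "Word list is empty\n");
        exit(EXIT_FAILURE);
    }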

    /* Store random words in array */
    char randwords[NUM_WORDS][MAX_WORD];
    srand((unsigned) time(NULL));
    for (size_t i = 0; i < NUM_WORDS; i++) {
        rewind(fp);
        int sel = rand() % wc + 1;
        for (int j = 0; j < sel; j++) {
            if (fgets(word, sizeof word, fp) == NULL) {
                perror("Error in fgets()");
            }
        }
        strcpy(randwords[i], word);
    }

    if (fclose(fp) != 0) {
        perror("Unable to close file");
    }

    /* Display results */
    for (size_t i = 0; i < NUM_WORDS; i++) {
        printf("%s", randwords[i]);
    }

    return 0;
}

Program output:

biology's
lists
revamping
slitter
loftiness's
concur
solemnity's
memories
winch's
boosting
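
If the word list might have more lines than RAND_MAX, one possible workaround for the limitation noted before the program is to build a wider random value from two rand() calls. This is only a sketch: `rand30` and `random_line` are hypothetical helper names, the result covers files of up to 2^30 lines, and the quality of the low bits of rand() varies between C libraries.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: 30 random bits built from two rand() calls.
   RAND_MAX is at least 32767, so the low 15 bits of each call are usable. */
static unsigned long rand30(void)
{
    unsigned long hi = (unsigned long) rand() & 0x7FFFUL;
    unsigned long lo = (unsigned long) rand() & 0x7FFFUL;
    return (hi << 15) | lo;
}

/* Hypothetical helper: line number in [1, wc], for 1 <= wc <= 2^30. */
static long random_line(long wc)
{
    const unsigned long range = 1UL << 30;
    assert(wc >= 1 && (unsigned long) wc <= range);

    unsigned long n = (unsigned long) wc;
    unsigned long limit = range - range % n;   /* largest multiple of n <= range */
    unsigned long r;
    do {
        r = rand30();
    } while (r >= limit);   /* reject values that would cause modulo bias */

    return (long) (r % n) + 1;
}
```

In the program above, `long sel = random_line(wc);` would then take the place of `int sel = rand() % wc + 1;`, with the loop counter `j` widened to `long` as well.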

If blank lines in input are a concern, the selection loop can test for them and reset to select another word when they occur:

/* Store random words in array */
char randwords[NUM_WORDS][MAX_WORD];
srand((unsigned) time(NULL));
for (size_t i = 0; i < NUM_WORDS; i++) {
    rewind(fp);
    int sel = rand() % wc + 1;
    for (int j = 0; j < sel; j++) {
        if (fgets(word, sizeof word, fp) == NULL) {
            perror("Error in fgets()");
        }
    }
    if (word[0] == '\n') {      // if line is blank
        --i;                    // reset counter
        continue;               // and select another one
    }

    strcpy(randwords[i], word);
}

Note that if a file contains only blank lines, the program would loop forever with the above modification. It may be safer to count the number of blank lines selected in a row and give up once some reasonable threshold is reached. Better yet, verify during the initial line count that at least one line of the input file is not blank:

/* Count words in file */
char word[MAX_WORD];
long wc = 0;
long nonblanks = 0;
while (fgets(word, sizeof word, fp) != NULL) {
    ++wc;
    if (word[0] != '\n') {
        ++nonblanks;
    }
}
if (nonblanks == 0) {
    fprintf(stderr, "Input file contains only blank lines\n");
    exit(EXIT_FAILURE);
}
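
The retry-threshold alternative mentioned above might look like the sketch below. `pick_nonblank_line` and `MAX_RETRIES` are hypothetical names, and the threshold of 100 is an arbitrary assumption that should be tuned to the input.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_RETRIES 100   /* assumed threshold; tune for the input */

/* Hypothetical helper: reads a randomly selected non-blank line into word.
   Returns 1 on success, 0 if MAX_RETRIES attempts in a row hit blank lines
   or the file turned out to have fewer than wc lines. */
int pick_nonblank_line(FILE *fp, long wc, char *word, int size)
{
    assert(fp != NULL && word != NULL && wc > 0 && size > 1);
    for (int tries = 0; tries < MAX_RETRIES; tries++) {
        rewind(fp);
        long sel = rand() % wc + 1;
        for (long j = 0; j < sel; j++) {
            if (fgets(word, size, fp) == NULL) {
                return 0;           /* fewer lines than expected */
            }
        }
        if (word[0] != '\n') {
            return 1;               /* found a non-blank line */
        }
    }
    return 0;                       /* give up instead of looping forever */
}
```

The selection loop could then call `pick_nonblank_line(fp, wc, word, sizeof word)` once per desired word and stop with an error message when it returns 0.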


Source: https://stackoverflow.com/questions/43214157/c-get-random-words-from-text-a-file
