Question
I have a text file that contains a list of words in a precise order. I'm trying to write a function that returns an array of words from this file. I managed to retrieve the words in the same order as in the file like this:
char *readDict(char *fileName) {
    int i;
    char * lines[100];
    FILE *pf = fopen ("francais.txt", "r");
    if (pf == NULL) {
        printf("Unable to open the file");
    } else {
        for (i = 0; i < 100; i++) {
            lines[i] = malloc(128);
            fscanf(pf, "%s", lines[i]);
            printf("%d: %s\n", i, lines[i]);
        }
        fclose(pf);
        return *lines;
    }
    return "NULL";
}
My question is: how can I return an array of random words from the text file, rather than the words in file order?
The file looks like this:
exemple1
exemple2
exemple3
exemple4
Answer 1:
Reservoir sampling lets you select a fixed number of random elements from a stream of indeterminate size. Something like this could work (although untested):
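#include <limits.h>   /* LINE_MAX (POSIX) */
#include <stdio.h>
#include <stdlib.h>   /* calloc(), free(), random() (POSIX) */
#include <string.h>   /* strdup() (POSIX) */

char **reservoir_sample(const char *filename, int count) {
    FILE *file;
    char **lines;
    char buf[LINE_MAX];
    int i, n;

    file = fopen(filename, "r");
    if (file == NULL)
        return NULL;
    /* calloc() zeroes the array, so unused slots stay NULL
       if the file has fewer than count lines. */
    lines = calloc(count, sizeof(char *));
    if (lines == NULL) {
        fclose(file);
        return NULL;
    }
    for (n = 1; fgets(buf, LINE_MAX, file); n++) {
        if (n <= count) {
            /* Fill the sample with the first count lines. */
            lines[n - 1] = strdup(buf);
        } else {
            /* Keep line n with probability count / n. */
            i = random() % n;
            if (i < count) {
                free(lines[i]);
                lines[i] = strdup(buf);
            }
        }
    }
    fclose(file);
    return lines;   /* each line keeps its trailing newline from fgets() */
}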
This is "Algorithm R":
- Read the first count lines into the sample array.
- For each subsequent line, replace a random element of the sample array with probability count / n, where n is the line number.
- At the end, the sample contains a set of random lines. (The order is not uniformly random, but you can fix that with a shuffle.)
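The shuffle mentioned in the last step could be a Fisher-Yates shuffle over the sample array. The following is only a sketch: shuffle_lines is a hypothetical helper name, it uses standard rand() rather than the POSIX random() used above, and it assumes the caller has already seeded the generator with srand().

```c
#include <stdlib.h>

/* Hypothetical helper: shuffle an array of n string pointers in place
 * using the Fisher-Yates algorithm.
 * Assumes srand() has been called and that n - 1 <= RAND_MAX. */
void shuffle_lines(char **lines, size_t n)
{
    if (n < 2) {
        return;                 /* nothing to shuffle */
    }
    for (size_t i = n - 1; i > 0; i--) {
        /* Pick j in [0, i]; the modulo introduces a slight bias. */
        size_t j = (size_t) rand() % (i + 1);
        char *tmp = lines[i];
        lines[i] = lines[j];
        lines[j] = tmp;
    }
}
```

It would be called on the returned array, for example shuffle_lines(lines, count), provided the file had at least count lines so that no entry is NULL.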
Answer 2:
If each line of the file contains one word, one possibility would be to open the file and count the number of lines first. Then rewind() the file stream and select a random number, sel, in the range of the number of words in the file. Next, call fgets() in a loop to read sel words into a buffer. The last word read can be copied into an array that stores the results. Rewind and repeat for each word desired.
Here is a program that uses the /usr/share/dict/words file that is typical on Linux systems. Note that if the number of lines in the file is greater than RAND_MAX (the largest number that can be returned by rand()), words with greater line numbers will be ignored. This number can be as small as 32767. In the GNU C Library RAND_MAX is 2147483647.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define MAX_WORD 100
#define NUM_WORDS 10

int main(void)
{
    /* Open words file */
    FILE *fp = fopen("/usr/share/dict/words", "r");
    if (fp == NULL) {
        perror("Unable to locate word list");
        exit(EXIT_FAILURE);
    }

    /* Count words in file */
    char word[MAX_WORD];
    long wc = 0;
    while (fgets(word, sizeof word, fp) != NULL) {
        ++wc;
    }
    if (wc == 0) {  /* avoid division by zero in rand() % wc */
        fprintf(stderr, "Word list is empty\n");
        exit(EXIT_FAILURE);
    }

    /* Store random words in array */
    char randwords[NUM_WORDS][MAX_WORD];
    srand((unsigned) time(NULL));
    for (size_t i = 0; i < NUM_WORDS; i++) {
        rewind(fp);
        long sel = rand() % wc + 1;     /* line number in [1, wc] */
        for (long j = 0; j < sel; j++) {
            if (fgets(word, sizeof word, fp) == NULL) {
                perror("Error in fgets()");
            }
        }
        strcpy(randwords[i], word);
    }
    if (fclose(fp) != 0) {
        perror("Unable to close file");
    }

    /* Display results (each word still ends with its newline) */
    for (size_t i = 0; i < NUM_WORDS; i++) {
        printf("%s", randwords[i]);
    }
    return 0;
}
Program output:
biology's
lists
revamping
slitter
loftiness's
concur
solemnity's
memories
winch's
boosting
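The RAND_MAX limitation noted before the program could be eased by combining two rand() results into one wider value before reducing it to a line number. This is only a sketch: random_line_number is a hypothetical helper, it assumes wc > 0 and a prior call to srand(), and the final modulo still leaves a slight bias.

```c
#include <stdlib.h>

/* Hypothetical helper: return a line number in [1, wc].
 * Two rand() calls are combined so that line numbers above RAND_MAX
 * become reachable. Assumes wc > 0 and that srand() has been called. */
long random_line_number(long wc)
{
    unsigned long long r = (unsigned long long) rand();

    /* Treat the two results as digits in base RAND_MAX + 1. */
    r = r * ((unsigned long long) RAND_MAX + 1) + (unsigned long long) rand();

    return (long) (r % (unsigned long long) wc) + 1;
}
```

In the program above, the selection line would then read long sel = random_line_number(wc); in place of the rand() % wc + 1 expression.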
If blank lines in input are a concern, the selection loop can test for them and reset to select another word when they occur:
    /* Store random words in array */
    char randwords[NUM_WORDS][MAX_WORD];
    srand((unsigned) time(NULL));
    for (size_t i = 0; i < NUM_WORDS; i++) {
        rewind(fp);
        long sel = rand() % wc + 1;     /* line number in [1, wc] */
        for (long j = 0; j < sel; j++) {
            if (fgets(word, sizeof word, fp) == NULL) {
                perror("Error in fgets()");
            }
        }
        if (word[0] == '\n') {      // if line is blank
            --i;                    // reset counter
            continue;               // and select another one
        }
        strcpy(randwords[i], word);
    }
Note that if a file contains only blank lines, the program would loop forever with the above modification; it may be safer to count the number of blank lines selected in a row and give up once some reasonable threshold is reached. Better yet, verify during the initial line count that at least one line of the input file is not blank:
    /* Count words in file */
    char word[MAX_WORD];
    long wc = 0;
    long nonblanks = 0;
    while (fgets(word, sizeof word, fp) != NULL) {
        ++wc;
        if (word[0] != '\n') {
            ++nonblanks;
        }
    }
    if (nonblanks == 0) {
        fprintf(stderr, "Input file contains only blank lines\n");
        exit(EXIT_FAILURE);
    }
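The other option mentioned above, giving up after too many blank lines are selected in a row, might look like the sketch below. pick_nonblank_line and MAX_BLANK_RETRIES are hypothetical names, and the threshold of 100 is arbitrary; the helper assumes wc comes from the counting pass and that srand() has been called.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_BLANK_RETRIES 100   /* arbitrary threshold, for illustration */

/* Hypothetical helper: copy one randomly selected non-blank line of fp
 * into out (at most outsz bytes, newline included).
 * Returns 0 on success, or -1 if the arguments are unusable, a read
 * fails, or MAX_BLANK_RETRIES blank lines are selected in a row. */
int pick_nonblank_line(FILE *fp, long wc, char *out, int outsz)
{
    if (wc <= 0 || outsz < 2) {
        return -1;
    }
    for (int tries = 0; tries < MAX_BLANK_RETRIES; tries++) {
        rewind(fp);
        long sel = rand() % wc + 1;         /* line number in [1, wc] */
        for (long j = 0; j < sel; j++) {
            if (fgets(out, outsz, fp) == NULL) {
                return -1;                  /* read error or short file */
            }
        }
        if (out[0] != '\n') {
            return 0;                       /* found a non-blank line */
        }
    }
    return -1;                              /* too many blanks in a row */
}
```

The selection loop could then call pick_nonblank_line(fp, wc, randwords[i], MAX_WORD) for each i and stop with an error message when it returns -1.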
Source: https://stackoverflow.com/questions/43214157/c-get-random-words-from-text-a-file