Question
I am trying to count the files in a folder, but the readdir() function skips files whose names contain Unicode characters. I am using dirent in C.
int filecount(char* path)
{
    int file_Count=0;
    DIR* dirp;
    struct dirent * entry;
    dirp = opendir(path);
    while((entry=readdir(dirp)) !=NULL)
    {
        if(entry->d_type==DT_REG)
        {
            ++file_Count;
        }
    }
    closedir(dirp);
    return file_Count;
}
Answer 1:
Testing on Mac OS X 10.9.1 Mavericks, I adapted your code into the following complete program:
#include <dirent.h>
#include <stdio.h>

static
int filecount(char *path)
{
    int file_Count = 0;
    DIR *dirp;
    struct dirent *entry;
    dirp = opendir(path);
    if (dirp == NULL)
    {
        /* opendir() failed: report why and signal the error to the caller */
        perror(path);
        return -1;
    }
    while ((entry = readdir(dirp)) != NULL)
    {
        /* Print every entry as it is returned: inode, type and name */
        printf("Found (%llu)(%d): %s\n", entry->d_ino, entry->d_type, entry->d_name);
        if (entry->d_type == DT_REG)
        {
            ++file_Count;
        }
    }
    closedir(dirp);
    return file_Count;
}

static void proc_dir(char *dir)
{
    printf("Processing %s:\n", dir);
    printf("File count = %d\n", filecount(dir));
}

int main(int argc, char **argv)
{
    if (argc > 1)
    {
        for (int i = 1; i < argc; i++)
            proc_dir(argv[i]);
    }
    else
        proc_dir(".");
    return 0;
}
Notably, it lists each entry as it is returned: inode, type and name. On Mac OS X, I was told that the inode type was __uint64_t, aka unsigned long long, hence the use of %llu for the format; YMMV on that.
I also created a folder utf8 and created these files in it:
total 32
-rw-r--r-- 1 jleffler eng 6 Jan 7 12:14 ÿ-y-umlaut
-rw-r--r-- 1 jleffler eng 6 Jan 7 12:15 £
-rw-r--r-- 1 jleffler eng 6 Jan 7 12:14 €
-rw-r--r-- 1 jleffler eng 6 Jan 7 12:15 ™
Each file contained Hello plus a newline. When I run the command (I called it fc), it gives:
$ ./fc utf8
Processing utf8:
Found (8138036)(4): .
Found (377579)(4): ..
Found (8138046)(8): ÿ-y-umlaut
Found (8138067)(8): £
Found (8138054)(8): €
Found (8138078)(8): ™
File count = 4
$
The Euro symbol € is U+20AC EURO SIGN, which is way outside the range of ordinary single-byte code sets. The pound symbol £ is U+00A3 POUND SIGN, so that's in the range of the Latin 1 alphabet (ISO 8859-1, 8859-15). The trademark symbol ™ is U+2122 TRADE MARK SIGN, also outside the range of ordinary single-byte code sets.
This shows that on at least some platforms, readdir() works perfectly well with UTF-8 encoded file names using Unicode characters that are not in the Latin1 character set. It also demonstrates how I'd go about debugging the problem, and illustrates what I'd like you to run (the program above) and the sort of directory you should run it on to make your case that readdir() on your platform does not like Unicode file names.
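If the names come back but look wrong on screen, it can help to print the raw bytes of d_name as well, since that separates an encoding or terminal problem from readdir() really skipping an entry. The following is only a sketch; name_to_hex is a hypothetical helper, not something from the question or from dirent.h.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of dirent.h): writes the bytes of `name`
 * into `out` as space-separated uppercase hex, e.g. "E2 82 AC".
 * Stops early if `out` is too small. Returns the number of bytes formatted.
 */
size_t name_to_hex(const char *name, char *out, size_t outsz)
{
    size_t len = strlen(name);
    size_t used = 0;
    size_t i;

    if (outsz == 0)
        return 0;
    out[0] = '\0';

    for (i = 0; i < len; i++)
    {
        /* Each byte needs at most 3 characters (separator + "XX") plus the NUL */
        if (used + 4 > outsz)
            break;
        used += (size_t)snprintf(out + used, outsz - used, "%s%02X",
                                 i ? " " : "",
                                 (unsigned)(unsigned char)name[i]);
    }
    return i;
}
```

Calling name_to_hex(entry->d_name, buf, sizeof buf) inside the readdir() loop and printing buf next to the name shows exactly what the file system returned. A UTF-8 encoded € should appear as E2 82 AC.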
Answer 2:
Try changing
if(entry->d_type==DT_REG)
to
if((entry->d_type==DT_REG || entry->d_type==DT_UNKNOWN)
&& strcmp(entry->d_name,".")!=0 && strcmp(entry->d_name,"..")!=0)
which also counts entries whose type the file system does not report (DT_UNKNOWN). This needs #include <string.h> for strcmp.
Note that strcmp(entry->d_name,".")!=0 and strcmp(entry->d_name,"..")!=0 exclude the "." and ".." directory entries. Other entries of unknown type, including subdirectories, are still counted, so this check may overcount.
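Some file systems report DT_UNKNOWN for every entry, in which case d_type alone cannot tell a regular file from a directory. A minimal sketch, assuming a POSIX system, falls back to lstat() for those entries. The name count_regular_files and the 4096-byte path buffer are assumptions made for illustration, not part of the original code.

```c
#define _DEFAULT_SOURCE /* exposes d_type and DT_* on glibc; harmless elsewhere */
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: counts regular files directly inside `path`.
 * Returns the count, or -1 if the directory cannot be opened.
 */
int count_regular_files(const char *path)
{
    DIR *dirp = opendir(path);
    if (dirp == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dirp)) != NULL)
    {
        /* Skip the "." and ".." entries */
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;

        if (entry->d_type == DT_REG)
        {
            ++count;
        }
        else if (entry->d_type == DT_UNKNOWN)
        {
            /* Type not reported by the file system: ask lstat() instead */
            char full[4096]; /* assumed to be large enough for path + name */
            struct stat sb;
            int n = snprintf(full, sizeof(full), "%s/%s", path, entry->d_name);
            if (n > 0 && (size_t)n < sizeof(full) &&
                lstat(full, &sb) == 0 && S_ISREG(sb.st_mode))
                ++count;
        }
    }
    closedir(dirp);
    return count;
}
```

Using lstat() rather than stat() keeps the behaviour in line with DT_REG: a symbolic link that points at a regular file is not counted. Swap in stat() if links should be followed.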
Source: https://stackoverflow.com/questions/20979727/dirent-not-working-with-unicode