How do I extract the list of supported Unicode characters from a TrueType or embedded OpenType font on Linux?
Is there a tool or a library I can use to process a .ttf file?
Here is a method using the FontTools module (which you can install with something like pip install fonttools):
#!/usr/bin/env python
from itertools import chain
import sys
from fontTools.ttLib import TTFont
from fontTools.unicode import Unicode
ttf = TTFont(sys.argv[1], 0, ignoreDecompileErrors=True, fontNumber=-1)

chars = chain.from_iterable(
    [y + (Unicode[y[0]],) for y in x.cmap.items()] for x in ttf["cmap"].tables
)
print(list(chars))
# Use this for just checking if the font contains the codepoint given as
# second argument:
#char = int(sys.argv[2], 0)
#print(Unicode[char])
#print(char in (x[0] for x in chars))
ttf.close()
The script takes the font path as an argument:
python checkfont.py /path/to/font.ttf
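If you only need the membership test from the commented-out block, here is a minimal standalone sketch (the script name and the example arguments are hypothetical); it relies on the cmap table's getBestCmap() helper, which merges the preferred Unicode subtables into a single dict keyed by code point.
#!/usr/bin/env python
import sys

from fontTools.ttLib import TTFont

# Hypothetical usage: python hascodepoint.py /path/to/font.ttf 0x20AC
font_path, codepoint = sys.argv[1], int(sys.argv[2], 0)

with TTFont(font_path, fontNumber=-1, ignoreDecompileErrors=True) as ttf:
    # getBestCmap() returns {code point: glyph name} for the best Unicode cmap.
    print(hex(codepoint), codepoint in ttf["cmap"].getBestCmap())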
The answer by Janus above (https://stackoverflow.com/a/19438403/431528) works, but Python is too slow, especially for Asian fonts: it takes minutes for a 40 MB font on my E5 machine.
So I wrote a small C++ program to do this. It depends on FreeType2 (https://www.freetype.org/). It is a VS2015 project, but since it is a console application it is easy to port to Linux.
The code can be found here: https://github.com/zhk/AllCodePoints. For the same 40 MB Asian font it takes about 30 ms on my E5 machine.
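For reference, the same first/next-char iteration that FreeType2 exposes can be sketched in Python via the freetype-py binding; this is a minimal sketch, not the linked C++ code, and the command-line argument handling is just a placeholder.
import sys

import freetype  # pip install freetype-py

# Walk the font's active charmap with FreeType's FT_Get_First_Char /
# FT_Get_Next_Char iteration, collecting every mapped code point.
face = freetype.Face(sys.argv[1])
codepoints = []
charcode, glyph_index = face.get_first_char()
while glyph_index:
    codepoints.append(charcode)
    charcode, glyph_index = face.get_next_char(charcode, glyph_index)
print(len(codepoints), "code points")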
You can do this on Linux in Perl using the Font::TTF module.
If you want to get all characters supported by a font, you may use the following (based on Janus's answer):
from fontTools.ttLib import TTFont

def get_font_characters(font_path):
    with TTFont(font_path) as font:
        characters = {chr(y[0]) for x in font["cmap"].tables for y in x.cmap.items()}
    return characters
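A quick usage sketch (the font path here is just a placeholder):
chars = get_font_characters("/path/to/font.ttf")
print(len(chars), "characters")
print("€" in chars)  # True if the font maps U+20AC EURO SIGN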
Here is a POSIX[1] shell script that prints the code point and the character in a nice and easy way, with the help of fc-match, which is mentioned in Neil Mayhew's answer (it can even handle Unicode code points up to 8 hex digits):
#!/bin/sh
for range in $(fc-match --format='%{charset}\n' "$1"); do
    for n in $(seq "0x${range%-*}" "0x${range#*-}"); do
        n_hex=$(printf "%04x" "$n")
        # use \U so code points with more than 4 hex digits also work
        printf "%-5s\U$n_hex\t" "$n_hex"
        count=$((count + 1))
        if [ $((count % 10)) = 0 ]; then
            printf "\n"
        fi
    done
done
printf "\n"
You can pass the font name or anything that fc-match accepts:
$ ls-chars "DejaVu Sans"
Updated content:
I learned that command substitution is very time-consuming (the printf subshell inside the loop of my script), so I managed to write an improved version that is 5-10 times faster:
#!/bin/sh
for range in $(fc-match --format='%{charset}\n' "$1"); do
    for n in $(seq "0x${range%-*}" "0x${range#*-}"); do
        printf "%04x\n" "$n"
    done
done | while read -r n_hex; do
    count=$((count + 1))
    printf "%-5s\U$n_hex\t" "$n_hex"
    [ $((count % 10)) = 0 ] && printf "\n"
done
printf "\n"
Old version:
$ time ls-chars "DejaVu Sans" | wc
592 11269 52740
real 0m2.876s
user 0m2.203s
sys 0m0.888s
New version (the line count indicates 5910+ characters, printed in 0.4 seconds!):
$ time ls-chars "DejaVu Sans" | wc
592 11269 52740
real 0m0.399s
user 0m0.446s
sys 0m0.120s
End of update
(Sample output omitted; it aligns better in my st terminal.)
fc-query my-font.ttf
will give you a map of the supported glyphs and all the locales the font is appropriate for, according to fontconfig.
Since pretty much all modern Linux apps are fontconfig-based, this is much more useful than a raw Unicode list.
The actual output format is discussed here: http://lists.freedesktop.org/archives/fontconfig/2013-September/004915.html
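If you still want a raw code point list from fontconfig's point of view, here is a small sketch (assuming fc-query is available and supports the same %{charset} format string used with fc-match above) that expands the reported ranges in Python:
import subprocess
import sys

# Ask fontconfig for the font's charset, e.g. "20-7e a0-ff 131 152-153 ...".
out = subprocess.run(
    ["fc-query", "--format=%{charset}", sys.argv[1]],
    capture_output=True, text=True, check=True,
).stdout

codepoints = []
for token in out.split():
    start, _, end = token.partition("-")
    codepoints.extend(range(int(start, 16), int(end or start, 16) + 1))

print(len(codepoints), "code points according to fontconfig")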