Finding out what characters a given font supports

你的背包 2020-11-30 22:24

How do I extract the list of supported Unicode characters from a TrueType or embedded OpenType font on Linux?

Is there a tool or a library I can use to process a .tt

  • 2020-11-30 22:44

    Here is a method using the FontTools module (which you can install with something like pip install fonttools):

    #!/usr/bin/env python
    from itertools import chain
    import sys
    
    from fontTools.ttLib import TTFont
    from fontTools.unicode import Unicode
    
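    # Load the font; ignoreDecompileErrors keeps going if a table fails to parse.
    # (The old verbose= and allowVID= arguments are deprecated in newer fontTools.)
    ttf = TTFont(sys.argv[1], ignoreDecompileErrors=True, fontNumber=-1)
    
    # Each cmap subtable maps code point -> glyph name; append the Unicode
    # character name to every (code point, glyph name) pair. Build a list,
    # not an iterator, so it can be reused by the membership check below.
    chars = list(chain.from_iterable(
        [y + (Unicode[y[0]],) for y in x.cmap.items()] for x in ttf["cmap"].tables))
    print(chars)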
    
    # Use this for just checking if the font contains the codepoint given as
    # second argument:
    #char = int(sys.argv[2], 0)
    #print(Unicode[char])
    #print(char in (x[0] for x in chars))
    
    ttf.close()
    

    The script takes the font path as its argument:

    python checkfont.py /path/to/font.ttf
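    A font usually carries several cmap subtables (for example a Windows BMP subtable and a full-Unicode one) that cover the same code points, so the printed list can repeat entries. Below is a minimal sketch of merging them into one de-duplicated, sorted list. It assumes you first pull the mappings out as plain dicts, e.g. [t.cmap for t in ttf["cmap"].tables]. merge_cmaps is a hypothetical helper, not part of fontTools:

```python
from typing import Dict, Iterable, List, Tuple


def merge_cmaps(tables: Iterable[Dict[int, str]]) -> List[Tuple[int, str]]:
    """Merge code point -> glyph name dicts taken from several cmap subtables.

    The first subtable that maps a code point wins; the result is sorted by
    code point and lists each code point exactly once.
    """
    merged: Dict[int, str] = {}
    for table in tables:
        for codepoint, glyph in table.items():
            # setdefault keeps the earlier mapping if one already exists
            merged.setdefault(codepoint, glyph)
    return sorted(merged.items())


if __name__ == "__main__":
    bmp = {0x41: "A", 0x42: "B"}
    full = {0x41: "A", 0x1F600: "u1F600"}
    print(merge_cmaps([bmp, full]))
    # [(65, 'A'), (66, 'B'), (128512, 'u1F600')]
```

    If you only need one mapping, fontTools' TTFont.getBestCmap() returns the dict of a single preferred Unicode subtable instead of merging them all.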
    
  • 2020-11-30 22:44

    Janus's answer above (https://stackoverflow.com/a/19438403/431528) works, but Python is too slow, especially for Asian fonts: it takes minutes for a 40 MB font on my E5 machine.

    So I wrote a small C++ program to do the job. It depends on FreeType2 (https://www.freetype.org/). It is a VS2015 project, but since it is a console application it is easy to port to Linux.

    The code is here: https://github.com/zhk/AllCodePoints. For the same 40 MB Asian font it takes about 30 ms on my E5 machine.
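    For comparison, here is a stdlib-only Python sketch of the core work such a program has to do. It reads only the cmap table with struct and walks format 4 (BMP segment mapping) and format 12 (segmented coverage) subtables, which are the two formats most Unicode fonts use. It is not the linked program and has not been benchmarked here. It deliberately ignores other subtable formats, font collections (.ttc) and WOFF files. codepoints_from_bytes and read_cmap_codepoints are hypothetical names:

```python
import struct
import sys
from typing import Set


def _table_offset(data: bytes, wanted: bytes) -> int:
    """Return the byte offset of table `wanted` in an sfnt (TTF/OTF) file."""
    num_tables = struct.unpack_from(">H", data, 4)[0]
    for i in range(num_tables):
        # Table records follow the 12-byte offset table, 16 bytes each
        tag, _checksum, offset, _length = struct.unpack_from(">4sIII", data, 12 + 16 * i)
        if tag == wanted:
            return offset
    raise ValueError(f"no {wanted!r} table")


def _format4(data: bytes, off: int) -> Set[int]:
    """Code points mapped to a non-zero glyph by a format 4 subtable."""
    seg_x2 = struct.unpack_from(">H", data, off + 6)[0]
    seg = seg_x2 // 2
    ends = struct.unpack_from(f">{seg}H", data, off + 14)
    starts = struct.unpack_from(f">{seg}H", data, off + 16 + seg_x2)  # skips reservedPad
    deltas = struct.unpack_from(f">{seg}h", data, off + 16 + 2 * seg_x2)
    ro_pos = off + 16 + 3 * seg_x2  # start of the idRangeOffset array
    range_offsets = struct.unpack_from(f">{seg}H", data, ro_pos)
    found: Set[int] = set()
    for i in range(seg):
        for c in range(starts[i], ends[i] + 1):
            if range_offsets[i] == 0:
                gid = (c + deltas[i]) & 0xFFFF
            else:
                # idRangeOffset is a byte offset from its own position in the table
                addr = ro_pos + 2 * i + range_offsets[i] + 2 * (c - starts[i])
                gid = struct.unpack_from(">H", data, addr)[0]
                if gid:
                    gid = (gid + deltas[i]) & 0xFFFF
            if gid:  # glyph 0 is .notdef, i.e. "no glyph for this character"
                found.add(c)
    return found


def _format12(data: bytes, off: int) -> Set[int]:
    """Code points mapped to a non-zero glyph by a format 12 subtable."""
    n_groups = struct.unpack_from(">I", data, off + 12)[0]
    found: Set[int] = set()
    for g in range(n_groups):
        start, end, glyph = struct.unpack_from(">III", data, off + 16 + 12 * g)
        # If a group starts at glyph 0, only its first code point maps to .notdef
        found.update(range(start if glyph else start + 1, end + 1))
    return found


def codepoints_from_bytes(data: bytes) -> Set[int]:
    """Union of code points from all Unicode format 4/12 cmap subtables."""
    cmap = _table_offset(data, b"cmap")
    n_sub = struct.unpack_from(">H", data, cmap + 2)[0]
    found: Set[int] = set()
    for i in range(n_sub):
        platform, encoding, sub_off = struct.unpack_from(">HHI", data, cmap + 4 + 8 * i)
        # Unicode subtables: platform 0, or Windows (3) with encoding 1 (BMP) or 10 (full)
        if platform != 0 and not (platform == 3 and encoding in (1, 10)):
            continue
        fmt = struct.unpack_from(">H", data, cmap + sub_off)[0]
        if fmt == 4:
            found |= _format4(data, cmap + sub_off)
        elif fmt == 12:
            found |= _format12(data, cmap + sub_off)
    return found


def read_cmap_codepoints(path: str) -> Set[int]:
    with open(path, "rb") as f:
        return codepoints_from_bytes(f.read())


if __name__ == "__main__":
    for cp in sorted(read_cmap_codepoints(sys.argv[1])):
        print(f"U+{cp:04X}")
```

    Save it under any name and pass the font path as the only argument. A code point counts as supported only when it maps to a glyph other than 0 (.notdef).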

  • 2020-11-30 22:46

    You can do this on Linux in Perl using the Font::TTF module.

  • 2020-11-30 22:46

    If you want to get all characters supported by a font, you can use the following (based on Janus's answer):

    from fontTools.ttLib import TTFont
    
    def get_font_characters(font_path):
        with TTFont(font_path) as font:
            characters = {chr(y[0]) for x in font["cmap"].tables for y in x.cmap.items()}
        return characters
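    Because get_font_characters above returns a set, checking whether a font can render a given string becomes a plain set lookup. The sketch below uses a hypothetical helper, missing_characters (not part of fontTools), that reports the characters of a string missing from that set, in order of first appearance. In practice you would pass get_font_characters("/path/to/font.ttf") as the second argument:

```python
from typing import List, Set


def missing_characters(text: str, supported: Set[str]) -> List[str]:
    """Return the distinct characters of `text` not in `supported`,
    in order of first appearance (control characters such as '\\n' included)."""
    seen: Set[str] = set()
    missing: List[str] = []
    for ch in text:
        if ch not in supported and ch not in seen:
            seen.add(ch)
            missing.append(ch)
    return missing


if __name__ == "__main__":
    # Stand-in for a real font's character set
    supported = set("abcdefghijklmnopqrstuvwxyz ")
    print(missing_characters("hello wörld", supported))  # ['ö']
```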
    
  • 2020-11-30 22:49

    Here is a POSIX[1] shell script that prints each code point and its character in a nice and easy way, with the help of fc-match, which is mentioned in Neil Mayhew's answer (it can even handle code points of up to 8 hex digits):

    #!/bin/sh
    for range in $(fc-match --format='%{charset}\n' "$1"); do
        for n in $(seq "0x${range%-*}" "0x${range#*-}"); do
            n_hex=$(printf "%04x" "$n")
            # \U (unlike \u) takes up to 8 hex digits, so code points above U+FFFF work too
            printf "%-5s\U$n_hex\t" "$n_hex"
            count=$((count + 1))
            if [ $((count % 10)) = 0 ]; then
                printf "\n"
            fi
        done
    done
    printf "\n"
    

    You can pass the font name or anything that fc-match accepts:

    $ ls-chars "DejaVu Sans"
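    Here is the same idea as a Python sketch. It shells out to fc-match with the same --format='%{charset}' string the script uses, then expands each hex range itself instead of calling seq. It assumes the charset text is whitespace-separated hex tokens of the form start-end or a single code point, which is what the script above relies on. parse_fc_charset and font_codepoints are hypothetical names:

```python
import subprocess
import sys
from typing import List


def parse_fc_charset(charset: str) -> List[int]:
    """Expand fontconfig charset text such as '20-7e a0 a2-ac' into code points.

    Each whitespace-separated token is either a single hex code point or an
    inclusive 'start-end' hex range.
    """
    codepoints: List[int] = []
    for token in charset.split():
        start, _, end = token.partition("-")
        # A token without '-' is a single code point, so end falls back to start
        codepoints.extend(range(int(start, 16), int(end or start, 16) + 1))
    return codepoints


def font_codepoints(pattern: str) -> List[int]:
    """Ask fc-match for the charset of the best match for `pattern`."""
    result = subprocess.run(
        ["fc-match", "--format=%{charset}", pattern],
        capture_output=True, text=True, check=True,
    )
    return parse_fc_charset(result.stdout)


if __name__ == "__main__":
    for cp in font_codepoints(sys.argv[1]):
        print(f"{cp:04x}\t{chr(cp)}")
```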
    

    Updated content:

    I learned that spawning subshells is expensive (the $(printf ...) command substitution in my script runs once per character), so I wrote an improved version that is 5 to 10 times faster!

    #!/bin/sh
    for range in $(fc-match --format='%{charset}\n' "$1"); do
        for n in $(seq "0x${range%-*}" "0x${range#*-}"); do
            printf "%04x\n" "$n"
        done
    done | while read -r n_hex; do
        count=$((count + 1))
        printf "%-5s\U$n_hex\t" "$n_hex"
        [ $((count % 10)) = 0 ] && printf "\n"
    done
    printf "\n"
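    For comparison, here is a Python sketch of the same output layout: a %-5s hex label, the character, and a tab, with ten cells per row. All formatting happens inside one process, so there is no per-character subshell cost to avoid. format_rows is a hypothetical name. Its input is any iterable of code points, for example the ones fc-match reports:

```python
from typing import Iterable, List


def format_rows(codepoints: Iterable[int], per_row: int = 10) -> List[str]:
    """Lay out code points like the shell script: 'hex<pad>char<TAB>', N per row."""
    cells = ["%-5s%s\t" % ("%04x" % cp, chr(cp)) for cp in codepoints]
    # Group the cells into rows of `per_row`; the last row may be shorter
    return ["".join(cells[i:i + per_row]) for i in range(0, len(cells), per_row)]


if __name__ == "__main__":
    print("\n".join(format_rows(range(0x41, 0x41 + 26))))
```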
    

    Old version:

    $ time ls-chars "DejaVu Sans" | wc
        592   11269   52740
    
    real    0m2.876s
    user    0m2.203s
    sys     0m0.888s
    

    New version (592 lines of 10 characters each means 5910+ characters, in 0.4 seconds!):

    $ time ls-chars "DejaVu Sans" | wc
        592   11269   52740
    
    real    0m0.399s
    user    0m0.446s
    sys     0m0.120s
    

    End of update

    Sample output (it aligns better in my st terminal):

  • 2020-11-30 22:51

    fc-query my-font.ttf will give you a map of the supported characters (the font's charset) and all the locales the font is appropriate for, according to fontconfig.

    Since pretty much all modern Linux apps are fontconfig-based, this is much more useful than a raw Unicode list.

    The actual output format is discussed here: http://lists.freedesktop.org/archives/fontconfig/2013-September/004915.html
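    fc-query also accepts a --format option, so the language list can be pulled out in a script. Below is a hedged sketch. It assumes %{lang} prints the font's language set as |-separated tags (for example aa|af|ast), one line per face, so check that against your fontconfig version. font_languages and parse_fc_langs are hypothetical names:

```python
import subprocess
import sys
from typing import List, Set


def parse_fc_langs(text: str) -> List[str]:
    """Split fontconfig langset text such as 'aa|af|ast' into sorted unique tags.

    Tags from every line (one line per face) are merged.
    """
    tags: Set[str] = set()
    for line in text.splitlines():
        tags.update(tag for tag in line.strip().split("|") if tag)
    return sorted(tags)


def font_languages(path: str) -> List[str]:
    """Run fc-query on a font file and return the languages fontconfig assigns it."""
    result = subprocess.run(
        ["fc-query", "--format=%{lang}\n", path],
        capture_output=True, text=True, check=True,
    )
    return parse_fc_langs(result.stdout)


if __name__ == "__main__":
    print(" ".join(font_languages(sys.argv[1])))
```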
