Full disclosure: I'm working on my libui GUI framework's text API. This wraps DirectWrite on Windows, Core Text on OS X, and Pango (which uses HarfBuzz for OpenType shaping) o
After some discussions with Peter Sikking and Ebrahim Byagowi, I went and debugged a more general-purpose program I built quickly to test things, and I figured out what's going on internally.
First, however, I will say this applies to Uniscribe and DirectWrite equally.
As it turns out, DirectWrite always provides a set of default OpenType features, regardless of what feature set I use! The catch is that the list of default features differs depending on whether I supply my own features or not, and depending on the shaping engine. For the latn script in horizontal writing mode and for English, shaping is done by the "generic engine".
If I don't provide any features, the generic engine will load script-specific features. For horizontal latn, this list is
locl
ccmp
rlig
rclt
calt
liga
clig
If I do provide features, the generic engine will use the same default list for all scripts:
locl
ccmp
rclt
rlig
mark
mkmk
dist
So I don't know what to do about this. I could probably just provide liga and a few others myself in libui code (marked as a HACK, of course), but this is still weird. I'm not sure what the motivation is either. Either way, this explains the behavior I'm seeing.
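A minimal sketch of that workaround, assuming the two lists above are accurate: start from the script-specific defaults for horizontal latn and let caller-supplied features override or extend them, so that calt, liga and clig are not lost once any feature is supplied. FontFeature is only a stand-in for DWRITE_FONT_FEATURE so the sketch builds without the Windows headers; makeTag, latnDefaults and mergeWithDefaults are hypothetical names, and the parameter value 1 for "enabled" is an assumption. Whether passing the merged list to DirectWrite reproduces the no-features behavior exactly is untested here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for DWRITE_FONT_FEATURE (a tag plus a parameter), declared here so
// the sketch compiles without the Windows headers.
struct FontFeature {
    std::uint32_t tag;
    std::uint32_t parameter;  // assumption: 1 enables the feature, 0 disables it
};

// Packs four characters the way DWRITE_MAKE_OPENTYPE_TAG does:
// the first character goes in the lowest byte.
constexpr std::uint32_t makeTag(char a, char b, char c, char d) {
    return  static_cast<std::uint32_t>(static_cast<unsigned char>(a))
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 8)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 16)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(d)) << 24);
}

// The script-specific defaults observed above for horizontal latn
// when no features are supplied (hypothetical helper).
std::vector<FontFeature> latnDefaults() {
    return {
        {makeTag('l', 'o', 'c', 'l'), 1},
        {makeTag('c', 'c', 'm', 'p'), 1},
        {makeTag('r', 'l', 'i', 'g'), 1},
        {makeTag('r', 'c', 'l', 't'), 1},
        {makeTag('c', 'a', 'l', 't'), 1},
        {makeTag('l', 'i', 'g', 'a'), 1},
        {makeTag('c', 'l', 'i', 'g'), 1},
    };
}

// Hypothetical helper: a user feature with the same tag as a default
// replaces that default's parameter; any other user feature is appended.
std::vector<FontFeature> mergeWithDefaults(const std::vector<FontFeature>& user) {
    std::vector<FontFeature> merged = latnDefaults();
    for (const FontFeature& f : user) {
        bool replaced = false;
        for (FontFeature& d : merged) {
            if (d.tag == f.tag) {
                d.parameter = f.parameter;
                replaced = true;
                break;
            }
        }
        if (!replaced) {
            merged.push_back(f);
        }
    }
    return merged;
}
```

With this, a caller asking only for smcp would still end up with liga in the list, and a caller explicitly setting liga to 0 would override the default instead of duplicating the tag.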
Assuming your question is, broadly, about programming, I will try to answer the questions you raise.
would I have to drop the use of IDWriteTextLayout entirely in my code if I want to be able to add typographical features on top of the defaults?
It depends. If IDWriteTextLayout suits your project in every way except that it makes the DirectWrite default typographic features hard to vary, learn what you need to about typography and create an IDWriteTypography instance that fits your needs. Developing a custom text layout for the program may require substantial time and effort, especially if the program is supposed to render bidirectional text, complex scripts, inline objects, etc.
Your project may also require you to develop a text layout engine for reasons other than controlling the typographic features used in rendered text. For example, your manager or customer may ask for customized line-breaking opportunities or a glyph advance justification algorithm. In this scenario, you will call the IDWriteTextAnalyzer::GetGlyphs method yourself. This method has the parameters DWRITE_TYPOGRAPHIC_FEATURES const** features, UINT32 const* featureRangeLengths, and UINT32 featureRanges, and these parameters let you supersede the set of "default" typographic features for a range of the text to be rendered (see my answer to the other question What are the default typography settings used by IDWriteTextLayout?). Only the features you list are altered; the other features keep their "default" values. Moreover, if you omit these parameters in the GetGlyphs call for the next text range (for example, by passing NULL, NULL, 0), the features altered in the previous GetGlyphs call are not altered in the call for this next range.
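To make the relationship between those three parameters concrete, here is a small model of how I understand them to fit together: one feature set per range, one length per range (counted in characters of the text passed to GetGlyphs), and featureRanges as the number of ranges. It does not call DirectWrite; FontFeature and FeatureRange are stand-in types, rangeIndexAt and totalLength are hypothetical helpers, and the requirement that the lengths add up to the text length is my reading of the documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for DWRITE_FONT_FEATURE so the sketch builds without Windows headers.
struct FontFeature {
    std::uint32_t tag;
    std::uint32_t parameter;
};

// One entry per range: conceptually features[i] paired with featureRangeLengths[i].
struct FeatureRange {
    std::vector<FontFeature> features;  // empty: nothing is overridden in this range
    std::uint32_t length;               // number of characters the range covers
};

// Returns the index of the range that covers textPosition, or -1 if the
// position lies beyond the last range.
int rangeIndexAt(const std::vector<FeatureRange>& ranges, std::uint32_t textPosition) {
    std::uint32_t start = 0;
    for (std::size_t i = 0; i < ranges.size(); ++i) {
        if (textPosition < start + ranges[i].length) {
            return static_cast<int>(i);
        }
        start += ranges[i].length;
    }
    return -1;
}

// Sum of all range lengths; assumed to have to match the textLength
// argument of the same GetGlyphs call.
std::uint32_t totalLength(const std::vector<FeatureRange>& ranges) {
    std::uint32_t sum = 0;
    for (const FeatureRange& r : ranges) {
        sum += r.length;
    }
    return sum;
}
```

For an 8-character run where only the first 5 characters should have liga switched off, this gives two ranges: the first holds the liga override with length 5, the second is empty with length 3.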
the documentation for the equivalent SCRIPT_ANALYSIS type says that its script ID is an "[opaque] value" whose "value for this member is undefined and applications should not rely on its value being the same from one release to the next". And while I can get a language code to identify the script by, there's still no defined value other than LANG_ENGLISH for "Western" (Latin?) scripts.
Strictly speaking, this is not a question, but I guess you are dissatisfied with how these Unicode script IDs are defined, and you wonder how one can use an API with such vaguely defined structures and constants.
This may be off topic, but I will venture a hypothesis about the origin of the "Unicode script ID" values. As of 2010-07-17, Unicode, Inc. had published version 6.0 of Unicode. The standard includes the file http://www.unicode.org/Public/6.0.0/ucd/PropertyValueAliases.txt, which has a section listing the scripts. The list begins:
# Script (sc)
sc ; Arab ; Arabic
sc ; Armi ; Imperial_Aramaic
etc.
In this list, the Arabic script is #1, the Cyrillic script is #20, and the Latin script is #47. Furthermore, elsewhere I saw this list starting with the scripts Common and Inherited. That puts the Arabic script in 3rd place, the Cyrillic in 22nd, and the Latin in 49th. These ordinals are familiar to you, aren't they?
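The counting behind this hypothesis can be checked mechanically. The sketch below takes the text of the script section, counts the "sc ; Xxxx ; Name" lines in order, and returns the 1-based position of a given abbreviation, optionally shifted by an offset (2, if Common and Inherited are assumed to come first). scriptOrdinal and trim are hypothetical helper names, and nothing here is taken from DirectWrite itself.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Strips leading and trailing spaces, tabs and carriage returns.
std::string trim(const std::string& s) {
    const char* ws = " \t\r";
    std::size_t begin = s.find_first_not_of(ws);
    if (begin == std::string::npos) {
        return "";
    }
    std::size_t end = s.find_last_not_of(ws);
    return s.substr(begin, end - begin + 1);
}

// Returns the 1-based position of `abbrev` among the "sc ; Xxxx ; Name"
// lines of `listing`, plus `offset`. Returns 0 if the abbreviation is absent.
int scriptOrdinal(const std::string& listing, const std::string& abbrev, int offset = 0) {
    std::istringstream in(listing);
    std::string line;
    int ordinal = 0;
    while (std::getline(in, line)) {
        std::size_t first = line.find(';');
        if (first == std::string::npos) {
            continue;  // comment or blank line
        }
        std::size_t second = line.find(';', first + 1);
        if (second == std::string::npos) {
            continue;  // not a three-field line
        }
        if (trim(line.substr(0, first)) != "sc") {
            continue;  // some other property
        }
        ++ordinal;
        if (trim(line.substr(first + 1, second - first - 1)) == abbrev) {
            return ordinal + offset;
        }
    }
    return 0;
}
```

Fed the two lines quoted above, scriptOrdinal(listing, "Arab") is 1 and scriptOrdinal(listing, "Arab", 2) is 3; running it over the full file is how the other ordinals could be confirmed.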
Fortunately, we need not rely on the "Unicode script ID" values; we need script properties, not script IDs or abbreviations. The API is self-consistent in that it returns the actual script properties for a text range when we pass the value obtained from an AnalyzeScript call to the GetScriptProperties method.