.tmLanguage files work by defining a list of key-value pairs. Regular expressions are the keys and the type of syntax is the value. This is done in the following XML…
Yes. The .tmLanguage format was originally used by TextMate. The TextMate manual provides full documentation for the format, including the possible types of language constructs.
Copied from the relevant docs page, in hierarchical format:
comment — for comments.
  line — line comments, we specialize further so that the type of comment start character(s) can be extracted from the scope.
    double-slash — // comment
    double-dash — -- comment
    number-sign — # comment
    percentage — % comment
    character — other types of line comments.
  block — multi-line comments like /* … */ and <!-- … -->.
    documentation — embedded documentation.
constant — various forms of constants.
  numeric — those which represent numbers, e.g. 42, 1.3f, 0x4AB1U.
  character — those which represent characters, e.g. <, \e, \031.
    escape — escape sequences like \e would be constant.character.escape.
  language — constants (generally) provided by the language which are “special” like true, false, nil, YES, NO, etc.
  other — other constants, e.g. colors in CSS.
entity — an entity refers to a larger part of the document, for example a chapter, class, function, or tag. We do not scope the entire entity as entity.* (we use meta.* for that). But we do use entity.* for the “placeholders” in the larger entity, e.g. if the entity is a chapter, we would use entity.name.section for the chapter title.
  name — we are naming the larger entity.
    function — the name of a function.
    type — the name of a type declaration or class.
    tag — a tag name.
    section — the name is the name of a section/heading.
  other — other entities.
    inherited-class — the superclass/baseclass name.
    attribute-name — the name of an attribute (mainly in tags).
invalid — stuff which is “invalid”.
  illegal — illegal, e.g. an ampersand or lower-than character in HTML (which is not part of an entity/tag).
  deprecated — for deprecated stuff e.g. using an API function which is deprecated or using styling with strict HTML.
keyword — keywords (when these do not fall into the other groups).
  control — mainly related to flow control like continue, while, return, etc.
  operator — operators can either be textual (e.g. or) or be characters.
  other — other keywords.
markup — this is for markup languages and generally applies to larger subsets of the text.
  underline — underlined text.
    link — this is for links, as a convenience this is derived from markup.underline so that if there is no theme rule which specifically targets markup.underline.link then it will inherit the underline style.
  bold — bold text (text which is strong and similar should preferably be derived from this name).
  heading — a section header. Optionally provide the heading level as the next element, for example markup.heading.2.html for <h2>…</h2> in HTML.
  italic — italic text (text which is emphasized and similar should preferably be derived from this name).
  list — list items.
    numbered — numbered list items.
    unnumbered — unnumbered list items.
  quote — quoted (sometimes block quoted) text.
  raw — text which is verbatim, e.g. code listings. Normally spell checking is disabled for markup.raw.
  other — other markup constructs.
meta — the meta scope is generally used to markup larger parts of the document. For example the entire line which declares a function would be meta.function and the subsets would be storage.type, entity.name.function, variable.parameter etc. and only the latter would be styled. Sometimes the meta part of the scope will be used only to limit the more general element that is styled, most of the time meta scopes are however used in scope selectors for activation of bundle items. For example in Objective-C there is a meta scope for the interface declaration of a class and the implementation, allowing the same tab-triggers to expand differently, depending on context.
storage — things relating to “storage”.
  type — the type of something, class, function, int, var, etc.
  modifier — a storage modifier like static, final, abstract, etc.
string — strings.
  quoted — quoted strings.
    single — single quoted strings: 'foo'.
    double — double quoted strings: "foo".
    triple — triple quoted strings: """Python""".
    other — other types of quoting: $'shell', %s{...}.
  unquoted — for things like here-docs and here-strings.
  interpolated — strings which are “evaluated”: `date`, $(pwd).
  regexp — regular expressions: /(\w+)/.
  other — other types of strings (should rarely be used).
support — things provided by a framework or library should be below support.
  function — functions provided by the framework/library. For example NSLog in Objective-C is support.function.
  class — when the framework/library provides classes.
  type — types provided by the framework/library, this is probably only used for languages derived from C, which has typedef (and struct). Most other languages would introduce new types as classes.
  constant — constants (magic values) provided by the framework/library.
  variable — variables provided by the framework/library. For example NSApp in AppKit.
  other — the above should be exhaustive, but for everything else use support.other.
variable — variables. Not all languages allow easy identification (and thus markup) of these.
  parameter — when the variable is declared as the parameter.
  language — reserved language variables like this, super, self, etc.
  other — other variables, like $some_variables.
For a basic introduction, check out the Language Grammars section of the TextMate Manual. The Naming Conventions section describes some of the base scopes, like comment, keyword, meta, storage, etc. These classes can then be subclassed to give as much detail as possible - for example, constant.numeric.integer.long.hexadecimal.python. However, it is very important to note that these are not hard-and-fast rules - just suggestions. This will become obvious as you scan through different language definitions and see, for example, all the different ways that functions are scoped - meta.function-call, support.function.name, meta.function-call punctuation.definition.parameters, etc.
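To make the subclassing idea concrete, here is a minimal Python sketch of how a general selector can still apply to a more detailed scope. It assumes plain prefix matching on the dot-separated parts of a single scope name; real selectors also support descendant, exclusion and grouping operators, which are ignored here. The function name selector_matches is hypothetical, not part of any editor's API.

```python
def selector_matches(selector: str, scope: str) -> bool:
    """Return True if `selector` is a dotted prefix of `scope`.

    Simplified assumption: single-scope selectors only, compared
    component by component (so "constant.num" does NOT match
    "constant.numeric").
    """
    selector_parts = selector.split(".")
    scope_parts = scope.split(".")
    return scope_parts[:len(selector_parts)] == selector_parts


scope = "constant.numeric.integer.long.hexadecimal.python"

print(selector_matches("constant.numeric", scope))   # True
print(selector_matches("constant.num", scope))       # False
print(selector_matches("markup.underline", "markup.underline.link"))  # True
```

This is why a color scheme that only styles constant.numeric still highlights a very specifically scoped hexadecimal literal, and why markup.underline.link inherits the underline style when no rule targets it directly.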
The best way to learn about scopes is to examine existing .tmLanguage files, and to look through the source of different languages and see what scopes are assigned where. The XML format is very difficult to casually browse through, so I use the excellent PackageDev plugin to translate the XML to YAML. It is then much easier to scan and see what scopes are described by what regexes.
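If you would rather not install a plugin, a rough alternative (not what PackageDev does) is to read the property list with Python's standard plistlib module and print each regex next to its scope. The sketch below is only an illustration: the sample grammar is made up, and iter_rules is a hypothetical helper that looks at the match, begin, name and patterns keys and skips everything else (for example include rules and captures).

```python
import plistlib

# A tiny, made-up grammar in the XML property-list format used by .tmLanguage files.
SAMPLE_GRAMMAR = rb"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
    <key>name</key>
    <string>Example</string>
    <key>scopeName</key>
    <string>source.example</string>
    <key>patterns</key>
    <array>
        <dict>
            <key>match</key>
            <string>#.*$</string>
            <key>name</key>
            <string>comment.line.number-sign.example</string>
        </dict>
        <dict>
            <key>match</key>
            <string>\b(true|false)\b</string>
            <key>name</key>
            <string>constant.language.example</string>
        </dict>
    </array>
</dict>
</plist>
"""


def iter_rules(patterns):
    """Yield (regex, scope) pairs from 'match'/'begin' rules, recursing into nested patterns."""
    for rule in patterns:
        regex = rule.get("match") or rule.get("begin")
        if regex:
            yield regex, rule.get("name", "(no scope name)")
        # Rules may contain their own nested list of patterns.
        yield from iter_rules(rule.get("patterns", []))


grammar = plistlib.loads(SAMPLE_GRAMMAR)
for regex, scope in iter_rules(grammar["patterns"]):
    print(f"{scope:40} {regex}")
```

To inspect a real file, replace the sample with the bytes read from the .tmLanguage file of interest (opened in binary mode).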
Another way to learn is to see how different language constructs are scoped, and for that I highly recommend using ScopeAlways. Once installed and activated, just place your cursor and the scope(s) that apply to that particular position are shown in the status bar. This is particularly useful when designing color schemes, as you can easily see which selectors will highlight a language feature of interest.
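To get a feel for what "the scope at the cursor" means, here is a toy Python sketch. It is not how ScopeAlways or the editor works internally: it assumes a made-up list of single-line match rules, lets the earliest match win (ties go to the earlier rule), and ignores begin/end rules, captures and nested scopes. RULES, tokenize and scope_at are all hypothetical names.

```python
import re

# Hypothetical (regex, scope) pairs, like the "match" rules of a grammar.
RULES = [
    (re.compile(r"#.*$"), "comment.line.number-sign.example"),
    (re.compile(r"\b(?:true|false)\b"), "constant.language.example"),
    (re.compile(r"\b\d+\b"), "constant.numeric.example"),
]


def tokenize(line):
    """Yield (start, end, scope) spans; the earliest match wins, ties go to the earlier rule."""
    pos = 0
    while pos < len(line):
        best = None
        for pattern, scope in RULES:
            m = pattern.search(line, pos)
            # Ignore empty matches so the loop always makes progress.
            if m and m.end() > m.start() and (best is None or m.start() < best[0].start()):
                best = (m, scope)
        if best is None:
            break
        m, scope = best
        yield m.start(), m.end(), scope
        pos = m.end()


def scope_at(line, column, base_scope="source.example"):
    """Return the scope string for a 0-based column, as a status bar might display it."""
    for start, end, scope in tokenize(line):
        if start <= column < end:
            return f"{base_scope} {scope}"
    return base_scope


line = "enabled = true  # 42 retries"
print(scope_at(line, 10))  # source.example constant.language.example
print(scope_at(line, 18))  # source.example comment.line.number-sign.example
print(scope_at(line, 0))   # source.example
```

Note that the 42 inside the comment is reported as part of the comment, not as a number, because the comment rule matched first and consumed the rest of the line.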
If you're interested, the color scheme used here is Neon, which I designed to make as many languages as possible look as good as possible, covering as many scopes as possible. Feel free to look through it to see how the different language elements are highlighted; this could also help you in designing your .tmLanguage to be consistent with other languages.
I hope all this helps, good luck!