How do you even give an (openFST-made) FST input? Where does the output go?

Submitted by 一世执手 on 2019-11-30 08:09:16

One way is to create a machine that performs the transformation. A very simple example is a transducer that upper-cases a string.

M.wfst

0 0 a A
0 0 b B
0 0 c C
0

The accompanying symbols file contains one line for each symbol of the alphabet. Note that 0 is reserved for null (epsilon) transitions and has special meaning in many of the operations.

M.syms

<epsilon> 0
a 1
b 2
c 3
A 4
B 5
C 6
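If the alphabet is larger than a handful of symbols, the symbols file can be generated rather than typed by hand. The following is only a minimal sketch in plain Python: the helper name make_symbol_table is made up for illustration, and it assumes each symbol is a single whitespace-free token.

```python
def make_symbol_table(alphabet):
    """Return symbol-table text with one 'symbol id' pair per line.

    Hypothetical helper, not part of OpenFst. Id 0 is kept for epsilon.
    """
    lines = ["<epsilon> 0"]
    # Number the real symbols from 1 upwards, in the order given.
    for idx, sym in enumerate(alphabet, start=1):
        lines.append("{} {}".format(sym, idx))
    return "\n".join(lines) + "\n"


syms_text = make_symbol_table("abcABC")
print(syms_text, end="")
```

Writing syms_text to M.syms should give the same file as shown above.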

Then compile the machine

fstcompile --isymbols=M.syms --osymbols=M.syms M.wfst > M.ofst

For an input string "abc", create a linear chain automaton: a left-to-right chain with one arc per character. This is an acceptor, so we only need a single label column for the symbols.

I.wfst

0 1 a
1 2 b
2 3 c
3  

Compile as an acceptor

fstcompile --isymbols=M.syms --acceptor I.wfst > I.ofst
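Typing the chain by hand gets tedious for longer inputs. As a rough sketch (plain Python, with a made-up helper name linear_chain_text), the text for I.wfst can be produced from any sequence of symbols, assuming every symbol already appears in M.syms:

```python
def linear_chain_text(symbols):
    """Return text-format arcs for an acceptor that accepts exactly `symbols`.

    Hypothetical helper, not part of OpenFst.
    """
    # One arc per symbol: state i --symbol--> state i+1.
    lines = ["{} {} {}".format(i, i + 1, sym) for i, sym in enumerate(symbols)]
    # The state reached after the last arc is the only final state.
    lines.append(str(len(symbols)))
    return "\n".join(lines) + "\n"


chain_text = linear_chain_text("abc")
print(chain_text, end="")
```

The printed text matches I.wfst above and can be fed to the same fstcompile --acceptor command.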

Then compose the machines and print

fstcompose I.ofst M.ofst | fstprint --isymbols=M.syms --osymbols=M.syms 

This will give the output

0   1   a   A
1   2   b   B
2   3   c   C
3
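To get a feel for what the composition computes here, the same result can be simulated in a few lines of plain Python. This is only an illustrative sketch, not how OpenFst implements fstcompose: it assumes an unweighted transducer in the 4-column text format with no epsilon arcs, and the function names are made up.

```python
from collections import defaultdict


def parse_transducer(text):
    """Parse unweighted text-format arcs into (start, arcs, finals)."""
    arcs = defaultdict(list)  # (state, input label) -> [(next state, output label)]
    finals = set()
    start = None
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields:
            continue
        if start is None:
            start = fields[0]  # the source state of the first line is the start state
        if len(fields) <= 2:
            finals.add(fields[0])  # a final-state line (optionally with a weight)
        else:
            src, dst, ilabel, olabel = fields[:4]
            arcs[(src, ilabel)].append((dst, olabel))
    return start, arcs, finals


def transduce(text, symbols):
    """Return every output sequence the machine produces for `symbols`."""
    start, arcs, finals = parse_transducer(text)
    hyps = [(start, [])]  # each hypothesis: (current state, outputs so far)
    for sym in symbols:
        hyps = [(dst, out + [olabel])
                for state, out in hyps
                for dst, olabel in arcs.get((state, sym), [])]
    # Only hypotheses that end in a final state count as transductions.
    return [out for state, out in hyps if state in finals]


M_WFST = """\
0 0 a A
0 0 b B
0 0 c C
0
"""
results = transduce(M_WFST, "abc")
print(results)
```

An input containing a symbol with no matching arc produces an empty list, which corresponds to fstcompose producing an empty machine.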

The output of fstcompose is a lattice of all transductions of the input string (in this case there is only one). If M.ofst is more complicated, fstshortestpath can be used to extract the n best strings using the flags --unique --nshortest=n. This output is again a transducer; you could either scrape the output of fstprint, or use C++ code and the OpenFst library to run a depth-first search to extract the strings.
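As a sketch of the scraping route, the text printed by fstprint can be walked with a depth-first search in plain Python. The function name extract_strings is made up, and the sketch assumes an acyclic lattice printed in the 4-column transducer format, with epsilon spelled as in M.syms:

```python
def extract_strings(fstprint_text, epsilon="<epsilon>"):
    """Return the output string of every path from the start to a final state."""
    arcs = {}  # state -> [(next state, output label)]
    finals = set()
    start = None
    for line in fstprint_text.strip().splitlines():
        fields = line.split()
        if not fields:
            continue
        if start is None:
            start = fields[0]  # fstprint lists the start state first
        if len(fields) <= 2:
            finals.add(fields[0])  # a final-state line (optionally with a weight)
        else:
            src, dst, _ilabel, olabel = fields[:4]
            arcs.setdefault(src, []).append((dst, olabel))

    results = []

    def dfs(state, output):
        if state in finals:
            results.append("".join(output))
        for dst, olabel in arcs.get(state, []):
            # Epsilon output labels contribute nothing to the string.
            dfs(dst, output if olabel == epsilon else output + [olabel])

    if start is not None:
        dfs(start, [])
    return results


LATTICE = """\
0   1   a   A
1   2   b   B
2   3   c   C
3
"""
strings = extract_strings(LATTICE)
print(strings)
```

On a lattice with cycles this recursion would not terminate, so in that case fstshortestpath should be applied first.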

Inserting fstproject --project_output will convert the output to an acceptor containing only the output labels. (As far as I know, newer OpenFst releases spell this flag --project_type=output.)

fstcompose I.ofst M.ofst | fstproject --project_output |  fstprint --isymbols=M.syms --osymbols=M.syms 

Gives the following

0  1  A  A
1  2  B  B
2  3  C  C
3

This is an acceptor because the input and output labels are the same. The --acceptor option can be used to generate more succinct output.

fstcompose I.ofst M.ofst | fstproject --project_output | fstprint --isymbols=M.syms --acceptor

The example from Paul Dixon is great. As the OP uses Python, I thought I'd add a quick example of how you can "run" transducers with OpenFst's Python wrapper. It's a shame that OpenFst has no built-in way to create "linear chain automata", but it's simple to automate, as seen below:

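import pywrapfst as fst  # assumed import; the snippet uses the name `fst` without showing it

def linear_fst(elements, automata_op, keep_isymbols=True, **kwargs):
    """Produce a linear-chain acceptor over `elements`."""
    # Reuse the input symbol table of the machine we will compose with.
    compiler = fst.Compiler(isymbols=automata_op.input_symbols().copy(),
                            acceptor=True,  # the chain has one label column per arc
                            keep_isymbols=keep_isymbols,
                            **kwargs)

    # One arc per element: state i --el--> state i+1.
    for i, el in enumerate(elements):
        compiler.write("{} {} {}\n".format(i, i + 1, el))
    # The state after the last arc is final (state 0 for empty input).
    compiler.write("{}\n".format(len(elements)))

    return compiler.compile()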

def apply_fst(elements, automata_op, is_project=True, **kwargs):
    """Compose a linear automaton generated from `elements` with `automata_op`.

    Args:
        elements (list): ordered list of edge symbols for a linear automaton.
        automata_op (Fst): automaton that will be applied.
        is_project (bool, optional): whether to keep only the output labels.
        kwargs:
            Additional arguments to the compiler of the linear automaton.
    """
    linear_automata = linear_fst(elements, automata_op, **kwargs)
    out = fst.compose(linear_automata, automata_op)
    if is_project:
        # Newer pywrapfst releases may expect out.project("output") instead.
        out.project(project_output=True)
    return out

Let's define a simple Transducer that uppercases the letter "a":

f_ST = fst.SymbolTable()
f_ST.add_symbol("<eps>", 0)
f_ST.add_symbol("A", 1)
f_ST.add_symbol("a", 2)
f_ST.add_symbol("b", 3)
compiler = fst.Compiler(isymbols=f_ST, osymbols=f_ST, keep_isymbols=True, keep_osymbols=True)

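# Feed the arcs to the compiler in text format, one line per arc.
compiler.write("0 0 a A\n")
compiler.write("0 0 b b\n")
compiler.write("0\n")  # state 0 is final
caps_A = compiler.compile()
caps_A  # in a notebook, the bare name displays the FST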

Now we can simply apply the transducer:

apply_fst(list("abab"), caps_A)

Output:

To see how to use it for an acceptor, look at my other answer.
