How do you even give an (openFST-made) FST input? Where does the output go?

不知归路 2021-02-07 03:04

Before I start, note that I'm using the Linux shell (by calling subprocess.call() from Python), and I am using OpenFst.

I've been sifting through documents


陌清茗 (original poster)

2021-02-07 03:35
One way is to create a machine that performs the transformation. A very simple example is a machine that upper-cases a string.

M.wfst
```
0 0 a A
0 0 b B
0 0 c C
0
```
The accompanying symbols file contains one line for each symbol of the alphabet. Note that ID 0 is reserved for null (epsilon) transitions and has special meaning in many of the operations.

M.syms
```
<eps> 0
a 1
b 2
c 3
A 4
B 5
C 6
```
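Since the question drives OpenFst from Python, the symbols file can also be generated rather than typed by hand. This is only a sketch: the helper name `symbol_table_text` is hypothetical, and `<eps>` as the name of the epsilon symbol is an assumption (any name works as long as it maps to 0).

```python
def symbol_table_text(symbols, epsilon="<eps>"):
    """Return symbol-table text with one 'symbol id' pair per line."""
    lines = [f"{epsilon} 0"]  # ID 0 is reserved for epsilon
    for index, symbol in enumerate(symbols, start=1):
        lines.append(f"{symbol} {index}")
    return "\n".join(lines) + "\n"


# Text for the six-letter alphabet used in this answer.
M_SYMS = symbol_table_text(list("abcABC"))
```

Writing `M_SYMS` to a file named M.syms gives the table shown above.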
Then compile the machine
```
fstcompile --isymbols=M.syms --osymbols=M.syms M.wfst > M.ofst
```
For an input string "abc", create a linear-chain automaton: a left-to-right chain with one arc for each character. This is an acceptor, so we only need a column for the input symbols.

I.wfst
```
0 1 a
1 2 b
2 3 c
3  
```
Compile as an acceptor
```
fstcompile --isymbols=M.syms --acceptor I.wfst > I.ofst
```
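For arbitrary input strings, the text of the linear chain can be produced from Python before handing it to fstcompile. A minimal sketch, assuming every character of the string has an entry in M.syms (the function name `linear_chain_text` is made up for illustration):

```python
def linear_chain_text(text):
    """Return acceptor text with one arc per character of `text`."""
    # State i goes to state i + 1 on the i-th character.
    lines = [f"{i} {i + 1} {ch}" for i, ch in enumerate(text)]
    # The state reached after the last character is the only final state.
    lines.append(str(len(text)))
    return "\n".join(lines) + "\n"
```

`linear_chain_text("abc")` reproduces the contents of I.wfst above. Whitespace characters would need a dedicated symbol name, since the text format separates fields by whitespace.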
Then compose the machines and print
```
fstcompose I.ofst M.ofst | fstprint --isymbols=M.syms --osymbols=M.syms 
```
This will give the output
```
0   1   a   A
1   2   b   B
2   3   c   C
3
```
The output of fstcompose is a lattice of all transductions of the input string (in this case there is only one). If M.ofst is more complicated, fstshortestpath can be used to extract the n best strings using the flags --unique --nshortest=n. This output is again a transducer; you could either scrape the output of fstprint, or use C++ code and the OpenFst library to run a depth-first search to extract the strings.
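As a rough illustration of the scraping route, the sketch below runs a depth-first search over the text printed by fstprint, in Python rather than with the C++ library. It assumes transducer-style output (four or five whitespace-separated fields per arc, one or two per final state), an acyclic machine, and `<eps>` as the epsilon symbol name; `output_strings` is a hypothetical helper, not part of OpenFst.

```python
from collections import defaultdict


def output_strings(fstprint_text, epsilon="<eps>"):
    """List the output string of every accepting path in an acyclic FST."""
    arcs = defaultdict(list)  # source state -> [(destination, output label)]
    finals = set()
    start = None
    for line in fstprint_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if start is None:
            start = fields[0]  # the first line's source state is the start state
        if len(fields) >= 4:   # arc: src dst ilabel olabel [weight]
            arcs[fields[0]].append((fields[1], fields[3]))
        else:                  # final state: state [weight]
            finals.add(fields[0])

    results = []

    def visit(state, prefix):
        if state in finals:
            results.append("".join(prefix))
        for dest, label in arcs[state]:
            # Epsilon output labels contribute nothing to the string.
            visit(dest, prefix if label == epsilon else prefix + [label])

    if start is not None:
        visit(start, [])
    return results
```

The search is recursive, so very long inputs may need an explicit stack instead, and a cyclic machine would never terminate.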

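Inserting fstproject --project_output will convert the output to an acceptor containing only the output labels. (Newer OpenFst releases appear to spell this --project_type=output; check fstproject --help for your version.)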
```
fstcompose I.ofst M.ofst | fstproject --project_output |  fstprint --isymbols=M.syms --osymbols=M.syms 
```
Gives the following
```
0  1  A  A
1  2  B  B
2  3  C  C
3
```
This is an acceptor because the input and output labels are the same; the --acceptor option can be used to generate more succinct output.
```
fstcompose I.ofst M.ofst | fstproject --project_output | fstprint --isymbols=M.syms --acceptor
```
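To answer "where does the output go": each tool writes to standard output, so from Python the result can be captured instead of printed. The sketch below is one possible way to chain the three commands with the subprocess module. The function names are hypothetical, the flags are the ones used in this answer, and `run_pipeline` assumes the OpenFst binaries are on the PATH.

```python
import subprocess


def transduce_commands(input_fst, model_fst, syms):
    """Return the argument lists for the compose | project | print pipeline."""
    return [
        ["fstcompose", input_fst, model_fst],
        ["fstproject", "--project_output"],
        ["fstprint", f"--isymbols={syms}", "--acceptor"],
    ]


def run_pipeline(commands):
    """Feed each command's stdout to the next one; return the last stdout as text."""
    data = None  # the first command reads its input from the named files
    for argv in commands:
        completed = subprocess.run(
            argv, input=data, stdout=subprocess.PIPE, check=True
        )
        data = completed.stdout  # binary FST data, or text after fstprint
    return data.decode()
```

Calling `run_pipeline(transduce_commands("I.ofst", "M.ofst", "M.syms"))` should return the same text the shell pipeline prints.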