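LLVM Study Notes (51)

3.10. Generating the X86 Fold Tables (v7.0)

Instruction folding is an optimization performed during register allocation whose purpose is to remove unnecessary copy instructions. For example, a pair of instructions such as: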

%EBX = LOAD %mem_address

%EAX = COPY %EBX

can safely be replaced by a single instruction:

%EAX = LOAD %mem_address

The latter is usually both smaller and faster than the former.
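The rewrite above can be sketched on a toy instruction list. This is only an illustration: ToyInst and foldLoadCopy are hypothetical names, not LLVM APIs, and the sketch assumes the loaded register has no other use, which a real pass would have to prove.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy instruction of the shape "Dst = Op Src".
struct ToyInst {
  std::string Op;  // "LOAD" or "COPY" in this sketch
  std::string Dst;
  std::string Src;
};

// Rewrites "X = LOAD m; Y = COPY X" into "Y = LOAD m".
// Assumption: X is not used anywhere else.
std::vector<ToyInst> foldLoadCopy(const std::vector<ToyInst> &In) {
  std::vector<ToyInst> Out;
  for (std::size_t I = 0; I < In.size(); ++I) {
    if (I + 1 < In.size() && In[I].Op == "LOAD" && In[I + 1].Op == "COPY" &&
        In[I + 1].Src == In[I].Dst) {
      Out.push_back({"LOAD", In[I + 1].Dst, In[I].Src});
      ++I;  // the COPY has been absorbed into the LOAD
      continue;
    }
    Out.push_back(In[I]);
  }
  return Out;
}
```

Feeding it the two instructions from the example yields a single LOAD whose destination is %EAX.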

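On X86, many instructions come in a memory form and a register form, that is, an operand is passed either through memory or through a register. To perform instruction folding, the two forms of the same instruction must be associated with each other in some way. LLVM currently maintains several very large tables for this purpose.

Ayman Musa, an Intel Israel employee, developed this part of the codegen in the hope that TableGen could generate these tables automatically. Because the code still has a bug, however, the tables are not generated by default. The bug has been described elsewhere.

In short, the problem stems from Musa's main assumption: the two forms of one instruction share the same encoding information except for the part that determines how arguments are passed to the instruction. One version defines a particular argument as a register, the other defines it as a memory operand, and all other arguments are passed the same way.

In practice, some instructions differ between their register and memory forms in ways that violate this assumption. One example is the bt (bit test) instruction: in the register form the relevant bits of the register argument are the low 4, 5, or 6 bits (depending on mode and register size), whereas in the memory form the relevant bits are the low 16, 32, or 64 bits.

This caused compilation errors. Musa suggested handling these instructions specially. For now, this codegen is off by default, so that adding new X86 instructions cannot lead to wrong tables being generated. It can be enabled by building llvm with the option X86_GEN_FOLD_TABLES=ON.

Related to this codegen, at the td level the X86Inst definition originally had an isMemoryFoldable field, set to 1 by default. Later a class was added: class NotMemoryFoldable { bit isMemoryFoldable = 0; }, and every instruction that cannot be folded uses it as a base class. (It duplicates the isMemoryFoldable already present in X86Inst. TableGen is a little odd here: a field may be declared more than once and the last declaration wins, which sometimes leads to slightly awkward extensions.)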

561 void X86FoldTablesEmitter::run(raw_ostream &OS) {
562   emitSourceFileHeader("X86 fold tables", OS);
563
564   // Holds all memory instructions
565   std::vector<const CodeGenInstruction *> MemInsts;
566   // Holds all register instructions - divided according to opcode.
567   std::map<uint8_t, std::vector<const CodeGenInstruction *>> RegInsts;
568
569   ArrayRef<const CodeGenInstruction *> NumberedInstructions =
570       Target.getInstructionsByEnumValue();
571
572   for (const CodeGenInstruction *Inst : NumberedInstructions) {
573     if (!Inst->TheDef->getNameInit() || !Inst->TheDef->isSubClassOf("X86Inst"))
574       continue;
575
576     const Record *Rec = Inst->TheDef;
577
578     // - Do not proceed if the instruction is marked as notMemoryFoldable.
579     // - Instructions including RST register class operands are not relevant
580     //   for memory folding (for further details check the explanation in
581     //   lib/Target/X86/X86InstrFPStack.td file).
582     // - Some instructions (listed in the manual map above) use the register
583     //   class ptr_rc_tailcall, which can be of a size 32 or 64, to ensure
584     //   safe mapping of these instruction we manually map them and exclude
585     //   them from the automation.
586     if (Rec->getValueAsBit("isMemoryFoldable") == false ||
587         hasRSTRegClass(Inst) || hasPtrTailcallRegClass(Inst))
588       continue;
589
590     // Add all the memory form instructions to MemInsts, and all the register
591     // form instructions to RegInsts[Opc], where Opc in the opcode of each
592     // instructions. this helps reducing the runtime of the backend.
593     if (hasMemoryFormat(Rec))
594       MemInsts.push_back(Inst);
595     else if (hasRegisterFormat(Rec)) {
596       uint8_t Opc = getValueFromBitsInit(Rec->getValueAsBitsInit("Opcode"));
597       RegInsts[Opc].push_back(Inst);
598     }
599   }

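On line 569, getInstructionsByEnumValue() returns the sorted list of instructions. The loop at line 572 first filters for named instructions derived from X86Inst. A candidate is skipped if its isMemoryFoldable is not set to true, if it uses the RST register class (the floating-point stack registers), or if it uses the ptr_rc_tailcall register class (these instructions are mapped manually instead).

For a qualifying candidate, if its operand is in memory form it is recorded in the container MemInsts; if its operand is in register form it is stored in the container RegInsts, keyed by its opcode.

Note: for targets with more complex instruction forms, such as X86 and PPC, the TD instruction definitions use a Format field (see the definition of X86Inst). A Format is a definition carrying an integer value; the meanings of these values are listed in X86RecognizableInstr.h, in the namespace X86Local.

Instructions whose operand is in memory form use MRMDestMem (value 32) through MRM7m (value 47); instructions whose operand is in register form use MRMDestReg (value 48) through MRM7r (value 63). (Value 0 denotes a pseudo instruction.)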
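The classification step can be sketched with plain data in place of TableGen records. This is a minimal sketch, assuming only the Format ranges quoted above; ToyInstr, Buckets, classify, and the instruction names in the usage note are hypothetical and do not appear in LLVM.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the fields the emitter reads from a record.
struct ToyInstr {
  std::string Name;
  std::uint8_t Opcode;
  std::uint8_t Format;    // integer value of the Format definition
  bool IsMemoryFoldable;
};

// Format ranges as quoted in the text for v7.0.
bool hasMemoryForm(std::uint8_t F) { return F >= 32 && F <= 47; }
bool hasRegisterForm(std::uint8_t F) { return F >= 48 && F <= 63; }

struct Buckets {
  std::vector<ToyInstr> MemInsts;                          // all memory forms
  std::map<std::uint8_t, std::vector<ToyInstr>> RegInsts;  // keyed by opcode
};

// Mirrors the shape of the first loop in run(): skip non-foldable
// instructions, then bucket the rest by form.
Buckets classify(const std::vector<ToyInstr> &All) {
  Buckets B;
  for (const ToyInstr &I : All) {
    if (!I.IsMemoryFoldable)
      continue;
    if (hasMemoryForm(I.Format))
      B.MemInsts.push_back(I);
    else if (hasRegisterForm(I.Format))
      B.RegInsts[I.Opcode].push_back(I);
  }
  return B;
}
```

With a made-up pair FOOmr (Format 32) and FOOrr (Format 48) sharing opcode 0x10, the first lands in MemInsts and the second in RegInsts[0x10]; a pseudo instruction with Format 0 lands in neither. Keying the register forms by opcode keeps the later matching step from scanning every register instruction.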

X86FoldTablesEmitter::run (continued)

601   // For each memory form instruction, try to find its register form
602   // instruction.
603   for (const CodeGenInstruction *MemInst : MemInsts) {
604     uint8_t Opc =
605         getValueFromBitsInit(MemInst->TheDef->getValueAsBitsInit("Opcode"));
606
607     if (RegInsts.count(Opc) == 0)
608       continue;
609
610     // Two forms (memory & register) of the same instruction must have the same
611     // opcode. try matching only with register form instructions with the same
612     // opcode.
613     std::vector<const CodeGenInstruction *> &OpcRegInsts =
614         RegInsts.find(Opc)->second;
615
616     auto Match = find_if(OpcRegInsts, IsMatch(MemInst, Records));
617     if (Match != OpcRegInsts.end()) {
618       const CodeGenInstruction *RegInst = *Match;
619       // If the matched instruction has it's "FoldGenRegForm" set, map the
620       // memory form instruction to the register form instruction pointed by
621       // this field
622       if (RegInst->TheDef->isValueUnset("FoldGenRegForm")) {
623         updateTables(RegInst, MemInst);
624       } else {
625         const CodeGenInstruction *AltRegInst =
626             getAltRegInst(RegInst, Records, Target);
627         updateTables(AltRegInst, MemInst);
628       }
629       OpcRegInsts.erase(Match);
630     }
631   }

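In the TD instruction definitions, instructions that perform the same operation on operands of different sizes share one Opcode, so fetching the register-form instructions by Opcode on line 613 yields several instructions. Take cvtps2pi and cvtpd2pi: both convert two floating-point values into two integer values, differing only in whether the floating-point values are single or double precision. X86InstrMMX.td defines them as follows: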

defm MMX_CVTPS2PI : sse12_cvt_pint<0x2D, VR128, VR64, int_x86_sse_cvtps2pi,
                      f64mem, load, "cvtps2pi\t{$src, $dst|$dst, $src}",
                      WriteCvtPS2I, SSEPackedSingle>, PS;
defm MMX_CVTPD2PI : sse12_cvt_pint<0x2D, VR128, VR64, int_x86_sse_cvtpd2pi,
                      f128mem, memop, "cvtpd2pi\t{$src, $dst|$dst, $src}",
                      WriteCvtPD2I, SSEPackedDouble>, PD;

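0x2D is the Opcode. WriteCvtPS2I and WriteCvtPD2I are definitions derived from X86SchedWritePair and describe the instruction's resource usage. SSEPackedSingle and SSEPackedDouble describe the SSE execution domain, distinguishing single from double precision here. PS and PD describe the prefix that acts as an Opcode extension. sse12_cvt_pint itself is defined as: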

multiclass sse12_cvt_pint<bits<8> opc, RegisterClass SrcRC, RegisterClass DstRC,
                          Intrinsic Int, X86MemOperand x86memop, PatFrag ld_frag,
                          string asm, X86FoldableSchedWrite sched, Domain d> {
  def irr : MMXPI<opc, MRMSrcReg, (outs DstRC:$dst), (ins SrcRC:$src), asm,
                  [(set DstRC:$dst, (Int SrcRC:$src))], d>,
            Sched<[sched]>;
  def irm : MMXPI<opc, MRMSrcMem, (outs DstRC:$dst), (ins x86memop:$src), asm,
                  [(set DstRC:$dst, (Int (ld_frag addr:$src)))], d>,
            Sched<[sched.Folded]>;
}

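MRMSrcReg and MRMSrcMem describe the format and distinguish the two instruction forms.

Finding the matching counterpart therefore relies on the IsMatch class, which is precisely where Musa's assumption about the encoding of the two versions is embodied.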
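The idea behind the match can be sketched as a predicate that requires every encoding field to agree except the form. This is a simplified sketch, not the real IsMatch: it compares only the opcode and the prefix, and ToyEnc, Prefix, Form, isMatch, and findRegForm are hypothetical names introduced for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

enum class Prefix { PS, PD };       // opcode-extending prefix
enum class Form { SrcReg, SrcMem }; // stands in for MRMSrcReg / MRMSrcMem

// Hypothetical, heavily reduced view of an instruction's encoding.
struct ToyEnc {
  std::string Name;
  std::uint8_t Opcode;
  Prefix P;
  Form F;
};

// The two forms must agree on the encoding, differing only in the form.
bool isMatch(const ToyEnc &Mem, const ToyEnc &Reg) {
  return Mem.F == Form::SrcMem && Reg.F == Form::SrcReg &&
         Mem.Opcode == Reg.Opcode && Mem.P == Reg.P;
}

// Returns the first register-form candidate matching Mem, or nullptr.
const ToyEnc *findRegForm(const ToyEnc &Mem, const std::vector<ToyEnc> &Cands) {
  auto It = std::find_if(Cands.begin(), Cands.end(),
                         [&](const ToyEnc &R) { return isMatch(Mem, R); });
  return It == Cands.end() ? nullptr : &*It;
}
```

Given the two register forms MMX_CVTPS2PIirr (PS) and MMX_CVTPD2PIirr (PD), which share opcode 0x2D, a lookup for MMX_CVTPD2PIirm selects the PD one because the prefix breaks the tie. An instruction like bt, whose two forms differ in ways such a field-by-field comparison cannot see, is exactly where this approach goes wrong.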

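As we saw earlier in instruction selection, some instructions are matched to another instruction through a Pattern during selection (see the section on the alternative matching method), so the instruction actually used is a different one. To handle this case, the X86Inst definition introduces the FoldGenRegForm field, which names the replacement instruction. For example, in this v7.0 definition: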

909 multiclass ArithBinOp_RF<bits<8> BaseOpc, bits<8> BaseOpc2, bits<8> BaseOpc4,
910                          string mnemonic, Format RegMRM, Format MemMRM,
911                          SDNode opnodeflag, SDNode opnode,
912                          bit CommutableRR, bit ConvertibleToThreeAddress> {
913   let Defs = [EFLAGS] in {
914     let Constraints = "$src1 = $dst" in {
915       let isCommutable = CommutableRR in {
916         def NAME#8rr : BinOpRR_RF<BaseOpc, mnemonic, Xi8 , opnodeflag>;
917         let isConvertibleToThreeAddress = ConvertibleToThreeAddress in {
918           def NAME#16rr : BinOpRR_RF<BaseOpc, mnemonic, Xi16, opnodeflag>;
919           def NAME#32rr : BinOpRR_RF<BaseOpc, mnemonic, Xi32, opnodeflag>;
920           def NAME#64rr : BinOpRR_RF<BaseOpc, mnemonic, Xi64, opnodeflag>;
921         } // isConvertibleToThreeAddress
922       } // isCommutable
923
924       def NAME#8rr_REV : BinOpRR_Rev<BaseOpc2, mnemonic, Xi8>, FoldGenData<NAME#8rr>;
925       def NAME#16rr_REV : BinOpRR_Rev<BaseOpc2, mnemonic, Xi16>, FoldGenData<NAME#16rr>;
926       def NAME#32rr_REV : BinOpRR_Rev<BaseOpc2, mnemonic, Xi32>, FoldGenData<NAME#32rr>;
927       def NAME#64rr_REV : BinOpRR_Rev<BaseOpc2, mnemonic, Xi64>, FoldGenData<NAME#64rr>;
928
929       def NAME#8rm : BinOpRM_RF<BaseOpc2, mnemonic, Xi8 , opnodeflag>;
…
988 }

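ArithBinOp_RF is the multiclass from which binary arithmetic operations such as ADD, SUB, AND, and OR are defined. The definition is very complex, but lines 924~927 identify the instructions that need special treatment. Taking ADD as an example, ADD8rr_REV is replaced by ADD8rr, and so on. The getAltRegInst() function below looks up that replacement instruction.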

static inline const CodeGenInstruction *
getAltRegInst(const CodeGenInstruction *I, const RecordKeeper &Records,
              const CodeGenTarget &Target) {

  StringRef AltRegInstStr = I->TheDef->getValueAsString("FoldGenRegForm");
  Record *AltRegInstRec = Records.getDef(AltRegInstStr);
  assert(AltRegInstRec &&
         "Alternative register form instruction def not found");
  CodeGenInstruction &AltRegInst = Target.getInstruction(AltRegInstRec);
  return &AltRegInst;
}

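Once the corresponding instruction is found, the method below associates the two. There are several tables. Table2Addr holds instructions whose memory form both reads and writes memory. Table0 through Table4 hold instructions whose memory form either reads or writes memory and whose i-th operand (the number after Table) is the folded one; Table0 holds the instructions whose destination operand is folded.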

void X86FoldTablesEmitter::updateTables(const CodeGenInstruction *RegInstr,
                                        const CodeGenInstruction *MemInstr,
                                        const UnfoldStrategy S) {

  Record *RegRec = RegInstr->TheDef;
  Record *MemRec = MemInstr->TheDef;
  unsigned MemOutSize = MemRec->getValueAsDag("OutOperandList")->getNumArgs();
  unsigned RegOutSize = RegRec->getValueAsDag("OutOperandList")->getNumArgs();
  unsigned MemInSize = MemRec->getValueAsDag("InOperandList")->getNumArgs();
  unsigned RegInSize = RegRec->getValueAsDag("InOperandList")->getNumArgs();

  // Instructions which Read-Modify-Write should be added to Table2Addr.
  if (MemOutSize != RegOutSize && MemInSize == RegInSize) {
    addEntryWithFlags(Table2Addr, RegInstr, MemInstr, S, 0);
    return;
  }

  if (MemInSize == RegInSize && MemOutSize == RegOutSize) {
    // Load-Folding cases.
    // If the i'th register form operand is a register and the i'th memory form
    // operand is a memory operand, add instructions to Table#i.
    for (unsigned i = RegOutSize, e = RegInstr->Operands.size(); i < e; i++) {
      Record *RegOpRec = RegInstr->Operands[i].Rec;
      Record *MemOpRec = MemInstr->Operands[i].Rec;
      if (isRegisterOperand(RegOpRec) && isMemoryOperand(MemOpRec)) {
        switch (i) {
        case 0:
          addEntryWithFlags(Table0, RegInstr, MemInstr, S, 0);
          return;
        case 1:
          addEntryWithFlags(Table1, RegInstr, MemInstr, S, 1);
          return;
        case 2:
          addEntryWithFlags(Table2, RegInstr, MemInstr, S, 2);
          return;
        case 3:
          addEntryWithFlags(Table3, RegInstr, MemInstr, S, 3);
          return;
        case 4:
          addEntryWithFlags(Table4, RegInstr, MemInstr, S, 4);
          return;
        }
      }
    }
  } else if (MemInSize == RegInSize + 1 && MemOutSize + 1 == RegOutSize) {
    // Store-Folding cases.
    // If the memory form instruction performs a store, the *output*
    // register of the register form instructions disappear and instead a
    // memory *input* operand appears in the memory form instruction.
    // For example:
    //   MOVAPSrr => (outs VR128:$dst), (ins VR128:$src)
    //   MOVAPSmr => (outs), (ins f128mem:$dst, VR128:$src)
    Record *RegOpRec = RegInstr->Operands[RegOutSize - 1].Rec;
    Record *MemOpRec = MemInstr->Operands[RegOutSize - 1].Rec;
    if (isRegisterOperand(RegOpRec) && isMemoryOperand(MemOpRec) &&
        getRegOperandSize(RegOpRec) == getMemOperandSize(MemOpRec))
      addEntryWithFlags(Table0, RegInstr, MemInstr, S, 0);
  }

  return;
}

Each association is represented by an X86FoldTableEntry instance, and Table2Addr and the other tables are all of type std::vector<X86FoldTableEntry>. addEntryWithFlags() prepares these X86FoldTableEntry instances.
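The table-selection rule in updateTables() depends only on operand counts and operand kinds, so it can be sketched without TableGen. This is a minimal sketch under simplifying assumptions: ToyOps, OpKind, and pickTable are hypothetical names, operands are listed outputs first and then inputs, and the operand-size check of the store-folding case is left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class OpKind { Reg, Mem, Imm };

// Hypothetical operand list: the first NumOuts entries are outputs.
struct ToyOps {
  unsigned NumOuts;
  std::vector<OpKind> Ops;
};

// Returns "Table2Addr", "Table0".."Table4", or "" when no table applies.
std::string pickTable(const ToyOps &Reg, const ToyOps &Mem) {
  unsigned RegOut = Reg.NumOuts, MemOut = Mem.NumOuts;
  unsigned RegIn = static_cast<unsigned>(Reg.Ops.size()) - RegOut;
  unsigned MemIn = static_cast<unsigned>(Mem.Ops.size()) - MemOut;

  // Read-modify-write: the output register became part of the memory operand.
  if (MemOut != RegOut && MemIn == RegIn)
    return "Table2Addr";

  if (MemIn == RegIn && MemOut == RegOut) {
    // Load folding: the first input that turned from register into memory.
    for (unsigned I = RegOut; I < Reg.Ops.size(); ++I)
      if (Reg.Ops[I] == OpKind::Reg && Mem.Ops[I] == OpKind::Mem && I <= 4)
        return "Table" + std::to_string(I);
    return std::string();
  }

  if (MemIn == RegIn + 1 && MemOut + 1 == RegOut) {
    // Store folding: the last output register became a memory input.
    if (Reg.Ops[RegOut - 1] == OpKind::Reg && Mem.Ops[RegOut - 1] == OpKind::Mem)
      return "Table0";
  }
  return std::string();
}
```

For the MOVAPSrr/MOVAPSmr pair quoted in the source comment, the register form has one output and one input while the memory form has no output and two inputs, so the sketch picks Table0. A pair shaped like the irr/irm definitions above (one output, one input in both forms) goes to Table1.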

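Back in run(), one safeguard against generating wrong mappings from mismatched encodings is to set up these instructions manually. The array ManualMapSet, iterated on line 634 below, lists these special instruction pairs.

X86FoldTablesEmitter::run (continued)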

633   // Add the manually mapped instructions listed above.
634   for (const ManualMapEntry &Entry : ManualMapSet) {
635     Record *RegInstIter = Records.getDef(Entry.RegInstStr);
636     Record *MemInstIter = Records.getDef(Entry.MemInstStr);
637
638     updateTables(&(Target.getInstruction(RegInstIter)),
639                  &(Target.getInstruction(MemInstIter)), Entry.Strategy);
640   }
641
642   // Print all tables to raw_ostream OS.
643   printTable(Table2Addr, "Table2Addr", OS);
644   printTable(Table0, "Table0", OS);
645   printTable(Table1, "Table1", OS);
646   printTable(Table2, "Table2", OS);
647   printTable(Table3, "Table3", OS);
648   printTable(Table4, "Table4", OS);
649 }

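Finally, printTable() writes each table's contents into an array of the same name, which is straightforward. The process shows that emitting correct tables requires custom handling of special instructions, and that every newly added instruction must be checked by hand for errors, so the work is not simplified by much. That is why this codegen is not run by default and may never be enabled. It does, however, serve as an example of modifying TD definitions to suit one's own needs.

3.10.1. Table Lookup

The code actually in use is in X86InstrFoldTables.cpp/h. Its structure resembles the generated code above.

There are two kinds of tables: fold tables and unfold tables. Fold tables are used more often (though unfold tables are also a step required by other optimizations, such as register allocation and TwoAddressInstructionPass). Fold tables are therefore declared directly as arrays sorted by opcode, so they can be searched by binary search.

The unfold table is built lazily on first use. It is declared as: static ManagedStatic<X86MemUnfoldTable> MemUnfoldTable; (ManagedStatic is described later). The constructor of the X86MemUnfoldTable struct builds the unfold table as its member Table. This table is also sorted by opcode, so it too is searched by binary search.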
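The two lookups can be sketched with std::lower_bound over a sorted vector. This is a toy model rather than the code in X86InstrFoldTables.cpp: ToyFoldEntry, lookup, and buildUnfoldTable are hypothetical names, the opcodes are made-up numbers, and the lazy construction that ManagedStatic provides is not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical entry: KeyOp is the opcode searched for, DstOp its counterpart.
struct ToyFoldEntry {
  std::uint16_t KeyOp;
  std::uint16_t DstOp;
  std::uint16_t Flags;
};

// Binary search; the table must be sorted by KeyOp.
const ToyFoldEntry *lookup(const std::vector<ToyFoldEntry> &Table,
                           std::uint16_t Op) {
  auto It = std::lower_bound(
      Table.begin(), Table.end(), Op,
      [](const ToyFoldEntry &E, std::uint16_t Key) { return E.KeyOp < Key; });
  if (It != Table.end() && It->KeyOp == Op)
    return &*It;
  return nullptr;
}

// Derives an unfold table by swapping key and destination, then re-sorting
// so the same lookup() works in the memory-to-register direction.
std::vector<ToyFoldEntry> buildUnfoldTable(const std::vector<ToyFoldEntry> &Fold) {
  std::vector<ToyFoldEntry> Unfold;
  Unfold.reserve(Fold.size());
  for (const ToyFoldEntry &E : Fold)
    Unfold.push_back({E.DstOp, E.KeyOp, E.Flags});
  std::sort(Unfold.begin(), Unfold.end(),
            [](const ToyFoldEntry &A, const ToyFoldEntry &B) {
              return A.KeyOp < B.KeyOp;
            });
  return Unfold;
}
```

Because the fold table is keyed by the register opcode and the unfold table by the memory opcode, the second table has to be re-sorted after the swap; the order of one does not carry over to the other.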

Source: CSDN

Author: wuhui_gdnt

Link: https://blog.csdn.net/wuhui_gdnt/article/details/103629847
