name: inverse layout: true class: center, inverse --- class: middle, center, inverse # Write yourself a toolchain using LLVM .llvm[![LLVM](sources/llvm.png)] ##### Francis Visoiu Mistrih -
--- class: top, center, inverse # Goal #### C file ```c int foo(int); int bar(int a) { return foo(a); } ``` -- #### J Core machine code ```asm 0x100: e2 2f mov.l r14, @r15 0x102: e3 6f mov r15, r14 0x104: f0 7f add #-16, r15 [...] 0x10c: 0b 40 jsr @r0 0x10e: 09 00 nop [...] 0x118: 0b 00 rts 0x11a: 09 00 nop ``` --- class: middle, center, inverse # Goal #### NOT write a compiler from scratch #### Don't reimplement smart and optimized algorithms #### Instead, discover them #### And discover other targets ### Understand compiler internals --- class: middle, center, inverse # LLVM #### Collection of libraries #### Releases libraries #### Expose APIs to every stage of the compiler #### Rarely care about backward compatibility --- class: top, center, inverse # Toolchain ### Compilation - cc1 -- ### Assembly - as -- ### Linking - ld -- ### Runtime - builtins / libc --- class: top, center, inverse # Compilation ### Lexing -- ### Parsing -- ### Semantic analysis -- ### Intermediate representation -- ### IR optimizations -- ### Instruction selection -- ### Register allocation --- class: top, center, inverse # Assembly ### Parsing -- ### Encoding -- # Linking ### Collect object files -- ### Generate sections (.plt, .dynamic, ...) -- ### Merge sections -- ### Fix relocations --- class: top, center, inverse # Runtime ### Builtins -- ### libc -- ### Sanitizers --- class: top, center, inverse # The LLVM Compiler Infrastructure -- ### LLVM Core -- ### Clang -- ### LLD -- ### LLDB -- #### libc++ -- #### compiler-rt --- class: middle, center, inverse # Let's add our target! #### Clang -> LLVM -> LLD --- class: middle, center, inverse # Clang #### Clang generates LLVM IR from C / C++ / Objective-C #### Lowers front end constructions to lower-level ones --- class: top, center, inverse ## Clang Target informations: ```center Pointer width, pointer align, size_t / ptrdirr_t types, endianness, etc. ``` -- ##### cpp integrated in the front end CPP defines: ```center __TARGET__, __ELF_, __APPLE__, __x86__, etc. ``` --- class: top, center, inverse ## LLVM Core #### Registers: ```center registers, callee saved, caller saved, fp, sp, reserved, etc. ``` -- ###### Integrated assembler #### Instructions: ```center opcode, encoding, in / out regs, IR pattern, string, isBranch, sideEffects, etc. ``` -- #### Calling conventions: ```center argument passing, result value, stack alignment, argument types, etc. ``` -- #### Relocations: ```center symbols, relocation generation, ELF definitions, etc. ``` --- class: top, center, inverse ## LLD ### We generate relocations in the back end ### We need to fix them in the linker -- #### ELF: ```center ELF arch, target info, etc. ``` -- #### Relocations: ```center relocation type, position independent, relocation action, relocation name, etc. ``` --- class: top, center, inverse # J-Core ### SH2 ISA ### Hitachi SuperH - Sega Saturn ### Different models: SH2 <-> J2, SH3 <-> J3, etc. ### ARM Thumb's inspiration ### Open source ### VHDL implementation --- class: top, center, inverse # J2 ### 5-stage RISC -- ### 16-bit instructions #### 133 instructions -- ### 32-bit registers #### 16 general purpose registers #### 1 64-bit register --- class: middle, center, inverse # How to start #### Add the target to LLVM / Clang / LLD ##### Unlike GCC, you don't need to build a cross-compiler ##### Describe your target and the subtargets --- class: middle, center, inverse # Registers ```tablegen def R0 : J2Reg<0, "r0">, DwarfRegNum<[ 0 ]>; [...] def R15 : J2Reg<15, "r15", [ "sp" ]>, DwarfRegNum<[ 15 ]>; ``` Register classes ```tablegen def GPR : RegisterClass<"J2", [ i32 ], 32, (add R0, [...], R15)>; ``` Reserved registers --- class: middle, center, inverse ## Tablegen #### Why ? -- ##### Avoid duplication -- ##### Avoid C++ metaprogramming -- ##### Easily modulable -- ##### Focus on the description of the target --- class: middle, center, inverse # Instructions ## Instruction formats ```center 15 0 ----------------------------- x: opcode | | | | | n : src | xxxx | nnnn | mmmm | xxxx | | | | | | m : dst ----------------------------- x: instruction-specific data ``` -- ```tablegen class FNM
op, bits<4> data, dag outs, dag ins, string asmstr, list
pattern> { bits<4> dst; bits<4> src; let Inst{15 - 12} = op; let Inst{11 - 8} = dst; let Inst{7 - 4} = src; let Inst{3 - 0} = data; } ``` --- class: middle, center, inverse ## my_first_instruction ```center add r0, r14 ``` -- #### RTFM ```center add Rm,Rn Rm + Rn -> Rn 0011nnnnmmmm1100 ``` --- class: middle, center, inverse ```center add Rm,Rn Rm + Rn -> Rn 0011nnnnmmmm1100 ``` ```tablegen let , , in { def ADDrr : < , , (outs ), (ins ), " ", [ ( ) ]>; ``` --- class: middle, center, inverse ```center add Rm,Rn Rm + Rn -> Rn 0011nnnnmmmm1100 ``` ```tablegen let , Constraints = "$rnd = $rns", in { def ADDrr : < , , (outs GPR:$rnd), (ins GPR:$rm, GPR:$rns), " ", [ ( ) ]>; ``` --- class: middle, center, inverse ```center add Rm,Rn Rm + Rn -> Rn 0011nnnnmmmm1100 ``` ```tablegen let , Constraints = "$rnd = $rns", in { def ADDrr : < , , (outs GPR:$rnd), (ins GPR:$rm, GPR:$rns), " ", [ (set i32:$rnd, (add i32:$rm, i32:$rns)) ]>; ``` --- class: middle, center, inverse ```center add Rm,Rn Rm + Rn -> Rn 0011nnnnmmmm1100 ``` ```tablegen let , Constraints = "$rnd = $rns", in { def ADDrr : < , , (outs GPR:$rnd), (ins GPR:$rm, GPR:$rns), "add $rm, $rnd", [ (set i32:$rnd, (add i32:$rm, i32:$rns)) ]>; ``` --- class: middle, center, inverse ```center add Rm,Rn Rm + Rn -> Rn 0011nnnnmmmm1100 ``` ```tablegen let , Constraints = "$rnd = $rns", in { def ADDrr : FNM<0x3, 0xC, (outs GPR:$rnd), (ins GPR:$rm, GPR:$rns), "add $rm, $rnd", [ (set i32:$rnd, (add i32:$rm, i32:$rns)) ]>; ``` --- class: middle, center, inverse ```center add Rm,Rn Rm + Rn -> Rn 0011nnnnmmmm1100 ``` ```tablegen let isCommutable = 1, Constraints = "$rnd = $rns", isAdd = 1 in { def ADDrr : FNM<0x3, 0xC, (outs GPR:$rnd), (ins GPR:$rm, GPR:$rns), "add $rm, $rnd", [ (set i32:$rnd, (add i32:$rm, i32:$rns)) ]>; ``` --- class: middle, center, inverse ### What is actually generated ##### J2GenAsmMatcher.inc - AsmParser ```c++ static const MatchEntry MatchTable0[] = { { 0 /*add*/, J2::ADDrr, Convert_Reg1_1_Reg1_0_Tie0, 0, { MCK_GPR, MCK_GPR }, }, ``` ##### J2GenAsmWriter.inc - AsmPrinter ```c++ case 2: // ADDrr printOperand(MI, 1, O); O << ", "; printOperand(MI, 0, O); return; break; ``` --- class: middle, center, inverse ### What is actually generated ##### J2GenDisassemblerTables.inc - Disassembler ```c++ static const uint8_t DecoderTable16[] = { /* 221 */ MCD::OPC_Decode, 83, 4, // Opcode: ADDrr ``` ##### J2GenMCCodeEmitter.inc - Assembler ```c++ case J2::ADDrr: // op: src op = getMachineOpValue(MI, MI.getOperand(0), Fixups, STI); Value |= (op & UINT64_C(15)) << 4; // op: dst op = getMachineOpValue(MI, MI.getOperand(1), Fixups, STI); Value |= (op & UINT64_C(15)) << 8; break; ``` --- class: middle, center, inverse ### What is actually generated ##### J2GenDAGISel.inc - Instruction selection ```c++ static const unsigned char MatcherTable[] = { /*670*/ OPC_MorphNodeTo1, TARGET_VAL(J2::ADDrr), 0, MVT::i32, 2/*#Ops*/, 0, 1, // Src: (add:i32 i32:i32:$rm, i32:i32:$rns) - Complexity = 3 // Dst: (ADDrr:i32 i32:i32:$rm, i32:i32:$rns) /*678*/ 0, /*End of Scope*/ ``` ##### J2GenInstrInfo.inc - Instruction information ```c++ namespace J2 { enum { ANDrr = 83, ``` ```c++ { 83, 3, 1, 2, 0, 0|(1ULL<
DAG #### Convert IR to a graph (DAG) -- ##### BB -> MBB #### Select one basic block at a time -- ##### Select() #### Generic algorithm, based on the generated files from every target -- ##### Custom #### But some manual work needs to be done --- class: top, center, inverse # Code generation ### Instruction Selection ##### Addressing modes -- ```c int tab[30]; int a = tab[2]; ``` -- J2 ```asm mov.l @Rm, Rn - (Rm) -> Rn mov.l @(R0, Rm), Rn - (R0 + Rm) -> Rn mov.l @(disp, Rm), Rn - (Rm + disp * 4) -> Rn ``` Select the correct addressing mode. ```asm mov.l @(2, r14), r1 ``` --- class: top, center, inverse # Code generation ### Frame lowering -- class: top, center, inverse ##### Prologue / Epilogue ```asm mov.l r14, @r15 ; push $fp mov r15, r14 ; $fp = $sp add #-8, r15 ; $sp -= 8 [...] mov r14, r15 ; $sp = $fp mov.l @14, r14 ; pop $fp rts ; return ``` ##### Callee save / regalloc spilling * storeRegToStackSlot * loadRegFromStackSlot --- class: middle, center, inverse # LLD ## ELF, COFF, Mach-O ### New design ### C++ ### lld == 13k lines ; gold == 146k lines ### 1.2x to 2x faster than gold --- class: middle, center, inverse # What we don't need to do ### Generate sections ### Merge sections ### Generate output file ### Everything except relocations --- class: top, center, inverse # LLD ### Relocations - the other side -- ```c++ void relocateOne(uint8_t *Loc, uint32_t Type, uint64_t Val) const; ``` * Machine code in `Loc` * Relocation type in `Type`: R_J2_BSR, R_J2_BRA, etc. * The value to be written in `Val`: 0x2, 0x403430, etc. --- class: middle, center, inverse # Relocations ## the other side ### J2 calls ```asm bsr label ``` ``` PC + 4 -> PR disp * 2 + PC + 4 -> PC (Delayed branch) ``` ``` 1011dddddddddddd ``` * J2 calls are PC-relative * +/- 4096 --- class: middle, center, inverse # Relocations ## the other side ### J2 calls ```asm jsr @Rm ``` ``` PC + 4 -> PR Rm -> PC (Delayed branch) ``` ``` 0100mmmm00001011 ``` * Indirect call * 32bit absolute address --- class: top, center, inverse # Relocations ## the other side ### 4 x immediate mov Jump to 0x134: ```objdump *10e: 00 e0 mov #0, r0 ; 0b0 110: 18 40 shll8 r0 ; 0b0 *112: 00 cb or #0, r0 ; 0b0 114: 18 40 shll8 r0 ; 0b0 *116: 01 cb or #1, r0 ; 0b1 118: 18 40 shll8 r0 ; 0b100000000 *11a: 34 cb or #52, r0A ; 0b100110100 [...] 122: 0b 40 jsr @r0 124: 09 00 nop ``` * 4 x 1B relocations --- class: top, center, inverse # Relocations ## the other side ### PC-relative mov ```asm mov.l @(disp,PC),Rn ``` ``` (disp * 4 + (PC & 0xFFFFFFFC) + 4) -> Rn ``` ``` 1101nnnndddddddd ``` * PC-relative load * Can use this + JSR to call an absolute 32-bit address --- class: top, center, inverse # Relocations ## the other side ### 1 x immediate mov Jump to 0x134: ```objdump 10c: 05 d0 mov.l @(5, PC), r0 *10e: 0b 41 jsr @r0 110: 09 00 nop [...] 11c: 0b 00 rts 11e: 09 00 nop 120: 34 01 .data.l 0x134 ``` * Add data at the end of the function * Load PC-relative data into register --- class: middle, center, inverse # TODO ### tests, more tests, even more tests, qemu, fpga, exception handling, selectcc, plt, got, default linker script, clang intrinsics, undef, debug info, mul, div, not, 64bit op, fp, vectorization, frame pointer elimination, subtargets, delay slot filling, global variables, constant pools, inline asm, atomics, alloca, byval, varargs, jit, tls, relaxation, scheduling, branch folding, runtime, loader, .data, .bss, tail call, etc. --- class: middle, center, inverse #### References .urls[ https://github.com/thegameg/j2-llvm http://llvm.org/docs/WritingAnLLVMBackend.html http://www.shared-ptr.com/sh_insns.html ] ## Any questions? #####
.twitter[[@thegameg](https://twitter.com/thegameg)] .footnote[Slideshow created using [remark](http://github.com/gnab/remark).] --- class: middle, center, inverse ### xtrct #### WHY ??????????????????????? ``` xtrct Rm,Rn Rm:Rn middle 32 bits -> Rn 0010nnnnmmmm1101 ``` .xtrct[![LLVM](sources/xtrct.svg)]