lightning.git - Portable just-in-time compiler library

path: root/lib/jit_mips-cpu.c
Age | Commit message | Author | Files | Lines
2023-10-04 | mips: Use fallback for jit_extr() | Paul Cercueil | 1 | -22/+1
The code was exactly the same as the fallback function, so just use the latter. Signed-off-by: Paul Cercueil <paul@crapouillou.net>
2023-10-04 | mips: Correct and optimize jit_rshr() and jit_rshr_u() | Paul Cercueil | 1 | -52/+27
Rework the branch-less path to shrink the size of jit_rshr() and jit_rshr_u(), and use one register less, so that the whole code path that uses branches can be dropped. The case where O4 == __WORDSIZE in jit_rshr() was also handled incorrectly, as it would zero the O1 register instead of sign-extending it. Signed-off-by: Paul Cercueil <paul@crapouillou.net>
2023-10-03 | mips: Correct signed qrsh with zero shift. | pcpa | 1 | -12/+3
Like the aarch64 version, this relies on a wordsize shift filling the result with zero or minus one, depending on the sign.
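The sign-fill behavior mentioned above can be illustrated in plain C. This is only a sketch: the function names are hypothetical, a 32-bit word is assumed, and it does not reproduce the MIPS instruction sequence from the commit. C leaves a shift by the full word width undefined, so that case is handled explicitly.

```c
#include <stdint.h>

/* Value every bit takes when a signed word is shifted right by the full
 * word size: 0 for non-negative inputs, -1 (all ones) for negative ones. */
int32_t sign_fill(int32_t x)
{
    return x < 0 ? -1 : 0;
}

/* Arithmetic right shift accepting n in 0..32 (hypothetical helper).
 * A shift by the full width is answered with sign_fill(). */
int32_t sar_upto_wordsize(int32_t x, unsigned n)
{
    if (n >= 32)
        return sign_fill(x);
    /* Portable arithmetic shift: for negatives, shift the complement,
     * which is non-negative, and complement the result again. */
    return x < 0 ? ~(~x >> n) : x >> n;
}
```

With these definitions, a shift by 0 returns the input unchanged and a shift by 32 returns 0 or -1.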
2023-10-02 | mips: Optimize jit_lshr() and jit_lshr_u() | Paul Cercueil | 1 | -51/+21
Rework the branch-less path to shrink the size of jit_lshr() by one instruction, and jit_lshr_u() by two instructions. It also uses one register less, so the whole code path that uses branches can be dropped. Finally, fix whitespace issues as the original code used sometimes tabs, sometimes spaces. Signed-off-by: Paul Cercueil <paul@crapouillou.net>
2023-08-21 | mips: Correct wrong variable setting | pcpa | 1 | -1/+1
2023-08-21 | Add back the jit_hmul interfaces | pcpa | 1 | -0/+50
In Lightning 1.x it did exist, but at first jit_qmul appeared to provide all usages, as hmul was used for high bits computation of a complete multiplication. It turns out there might be other cases where only the top bits are really required. One example is division by constants. Now an optimized version that attempts to reduce used instructions when applicable has been added.
* check/Makefile.am, check/lightning.c: Add new hmul tests.
* doc/body.texi: Document hmul.
* include/lightning.h.in: Create the new hmul codes.
* lib/jit_aarch64-cpu.c, lib/jit_aarch64-sz.c, lib/jit_aarch64.c, lib/jit_alpha-cpu.c, lib/jit_alpha-sz.c, lib/jit_alpha.c, lib/jit_arm-cpu.c, lib/jit_arm-sz.c, lib/jit_arm.c, lib/jit_hppa-cpu.c, lib/jit_hppa-sz.c, lib/jit_hppa.c, lib/jit_ia64-cpu.c, lib/jit_ia64-sz.c, lib/jit_ia64.c, lib/jit_loongarch-cpu.c, lib/jit_loongarch-sz.c, lib/jit_loongarch.c, lib/jit_mips-cpu.c, lib/jit_mips-sz.c, lib/jit_mips.c, lib/jit_ppc-cpu.c, lib/jit_ppc-sz.c, lib/jit_ppc.c, lib/jit_riscv-cpu.c, lib/jit_riscv-sz.c, lib/jit_riscv.c, lib/jit_s390-cpu.c, lib/jit_s390-sz.c, lib/jit_s390.c, lib/jit_sparc-cpu.c, lib/jit_sparc-sz.c, lib/jit_sparc.c, lib/jit_x86-cpu.c, lib/jit_x86-sz.c, lib/jit_x86.c: Implement hmul and update the *-sz.c files.
* lib/jit_names.c, lib/lightning.c: Add knowledge of hmul.
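To illustrate why only the top bits of a product can be enough, here is a small C sketch of division by a constant using a high multiply. The helper names are hypothetical and this is not Lightning code; it assumes unsigned 32-bit operands and uses the well-known reciprocal 0xAAAAAAAB, which is ceil(2^33 / 3).

```c
#include <stdint.h>

/* High 32 bits of the 64-bit product of two unsigned 32-bit values. */
uint32_t hmul_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* Unsigned division by 3 without a divide instruction: multiply by
 * ceil(2^33 / 3), keep the high word, then shift right by one more bit
 * (33 bits in total). */
uint32_t div3_u32(uint32_t x)
{
    return hmul_u32(x, UINT32_C(0xAAAAAAAB)) >> 1;
}
```

The low word of the product is never needed here, which is the case a dedicated high-multiply operation serves.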
2023-08-10 | mips: Fix can_sign_extend_short_p() | Paul Cercueil | 1 | -1/+1
The boundaries were wrong. Signed-off-by: Paul Cercueil <paul@crapouillou.net>
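For reference, a check of this kind usually tests whether a value fits a signed 16-bit immediate. The following is a sketch with a hypothetical function name; it shows the standard boundaries for a 16-bit two's complement field and is not a copy of the Lightning macro.

```c
#include <stdint.h>

/* Nonzero if im can be represented as a sign-extended 16-bit immediate,
 * i.e. it lies in the closed range [-32768, 32767]. */
int fits_signed_16(intptr_t im)
{
    return im >= -32768 && im <= 32767;
}
```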
2023-04-24 | Add back earlier unld* implementation | pcpa | 1 | -6/+14
It has been also made default, as it generates shorter code. Verified to work on several ports that need specialized code for unaligned memory access.
2023-04-19 | Implement the new fnma* and fnms* instructions | pcpa | 1 | -0/+4
These complete the fused multiply add/sub abstractions. There is no work on precise ordering on fallbacks. The interface only makes available the instructions, and the fallbacks are not guaranteed to generate the same result for all inputs. This happens because when actually implemented in hardware, any rounding is only done at the end. Lightning does not handle floating point exceptions or rounding modes, so, handling anything other than "default" configuration is up to the user.
2023-04-18 | mips: Implement fma* and fms* | pcpa | 1 | -2/+30
2023-04-14 | Implement untyped macros to call proper wordsize load or store | pcpa | 1 | -0/+6
2023-04-06 | mips: Pass all tests in mips release 1 | pcpa | 1 | -25/+29
2023-04-06 | mips: Correct misunderstanding of how unaligned instructions work | pcpa | 1 | -65/+48
2023-04-05 | mips: Implement unaligned memory access. | pcpa | 1 | -0/+307
Note that Linux kernels should trap and handle unaligned memory access. For the moment it does not rely on this behavior by default. To change it, add the C code: jit_cpu.unaligned = 1; after calling init_jit(). Note that this is only an option if jit_cpu.version is 5 or lower, as mips release 6 or newer removes the instructions to use unaligned memory access. This has not yet been fully optimized for mips 6. The logic should be to construct special instructions for smaller loads or stores and load/store the unaligned value without trapping.
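The idea of building an unaligned value from smaller accesses, so that nothing traps, can be sketched in portable C. This is an illustration only, with a hypothetical function name and an assumed little-endian layout; it is not the instruction sequence the backend generates.

```c
#include <stdint.h>

/* Load a 32-bit little-endian value from any address by combining four
 * byte loads. Byte loads have no alignment requirement, so this cannot
 * raise an unaligned access trap. */
uint32_t load_u32_le_bytes(const unsigned char *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```

A store can be decomposed the same way, writing one byte at a time.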
2023-03-24 | mips: Implement q{l,r}sh{r,i}{,_u} | pcpa | 1 | -0/+222
2023-03-24 | mips: Correct 32 bit build. | pcpa | 1 | -2/+2
2023-03-20 | Minor optimization to jit_extr. | pcpa | 1 | -2/+6
Most backends do not check for a 0 (or wordsize) shift. Make sure to test it to avoid a nop, or possibly undefined behavior.
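A shift-based bit-field extraction shows where the zero and wordsize cases come from. The sketch below uses hypothetical names and assumes a 32-bit word with `offset + length <= 32`; it is not the Lightning fallback itself.

```c
#include <stdint.h>

#define WORDSIZE 32

/* Extract `length` bits starting at bit `offset` (bit 0 is the least
 * significant), zero-extended. Assumes offset + length <= WORDSIZE. */
uint32_t extract_u32(uint32_t value, unsigned offset, unsigned length)
{
    unsigned left, right;

    if (length == 0)
        return 0;               /* empty field */
    if (length >= WORDSIZE)
        return value;           /* whole word: both shifts would be 0 */
    left = WORDSIZE - (offset + length);   /* drop bits above the field */
    right = WORDSIZE - length;             /* move field down to bit 0  */
    return (value << left) >> right;
}
```

Without the early returns, a full-width field would produce two no-op shifts, and a naive mask such as `(1 << length) - 1` would shift by the full word width, which C leaves undefined.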
2023-03-19 | For consistency, rename ext, ext_u and dep to ext_r, extr_u and dep_r | pcpa | 1 | -18/+18
2023-03-17 | mips: Implement ext, ext_u and dep. | pcpa | 1 | -0/+82
2023-03-09 | Rename fallback_bitswap to fallback_rbit | pcpa | 1 | -1/+1
This matches the 'chosen' lightning instruction name, matching the aarch64 one.
2023-03-08 | mips: Optimize jit_ctor() / jit_ctzr() | Paul Cercueil | 1 | -36/+25
The jit_ctzr() can be performed with just 5 instructions and 2 temporary registers. The jit_ctor() can be performed by just inverting all bits then calculating the ctzr(). Signed-off-by: Paul Cercueil <paul@crapouillou.net>
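The relation between the two operations can be shown in C. This is a sketch with hypothetical names and a 32-bit word; the count-trailing-zeros identity used here is one well-known formulation built on count-leading-zeros, and the commit does not state that it uses this exact sequence.

```c
#include <stdint.h>

/* Count leading zeros, returning 32 for an input of 0. */
unsigned clz32(uint32_t x)
{
    unsigned n = 0;

    if (x == 0)
        return 32;
    while (!(x & UINT32_C(0x80000000))) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Count trailing zeros: ~x & (x - 1) sets exactly the bits below the
 * lowest set bit of x, so its leading-zero count gives the answer. */
unsigned ctz32(uint32_t x)
{
    return 32 - clz32(~x & (x - 1));
}

/* Count trailing ones: invert all bits, then count trailing zeros. */
unsigned cto32(uint32_t x)
{
    return ctz32(~x);
}
```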
2023-03-08 | mips: CLO/CLZ are available on MIPSr1 | Paul Cercueil | 1 | -28/+22
The CLO, CLZ and DCLO, DCLZ instructions were present in the first revision of the MIPS32 and MIPS64 specifications. Signed-off-by: Paul Cercueil <paul@crapouillou.net>
2023-03-07 | Implement new bit rotate instructions. | pcpa | 1 | -0/+62
This commit also corrects some previous changes that were not properly tested and were failing to compile or having runtime problems, like using register 0 for addressing in s390. Still need to test on actual s390, as it fails in Hercules, but has the same encoding as shifts. For the moment presume it is a bug in the Hercules emulator.
* check/alu_rot.tst, check/alu_rot.ok: New test files for the new lrotr, lroti, rrotr and rroti instructions.
* check/Makefile.am, check/lightning.c, include/lightning.h.in, lib/jit_names.c: lib/lightning.c, doc/body.texi: Update for the new instructions.
* lib/jit_aarch64-cpu.c, lib/jit_aarch64.c, lib/jit_arm-cpu.c, lib/jit_arm.c: Implement optimized rrotr and rroti. lrotr and lroti just adjust parameters for a left shift rotate.
* lib/jit_alpha-cpu.c, lib/jit_alpha.c, lib/jit_ia64-cpu, lib/jit_ia64.c, lib/jit_riscv-cpu.c, lib/jit_riscv.c, jit_sparc-cpu.c, jit_sparc.c: Implement calls to fallback lrotr, lroti, rrotr and rroti.
* lib/jit_hppa-cpu.c, lib/jit_hppa.c: Implement optimized rroti. Other instructions use fallbacks.
* lib/jit_loongarch-cpu.c, lib/jit_loongarch.c: Implement optimized rrotr and rroti. lrotr and lroti just adapt arguments and use a right shift.
* lib/jit_mips-cpu.c, lib/jit_mips.c: If mips2, Implement optimized rrotr and rroti. lrotr and lroti just adapt arguments and use a right shift. If mips1 use fallbacks.
* lib/jit_ppc-cpu.c, lib/jit_ppc.c, jit_s390-cpu.c, jit_s390.c, lib/jit_x86-cpu.c, lib/jit_x86.c: Implement optimized lrotr, lroti, rrotr, rroti.
* lib/jit_fallback.c: Implement fallbacks for lrotr, lroti, rrotr and rroti. Also add extra macro to avoid segfaults in s390, that cannot use register zero for some addressing instructions.
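The argument adaptation that turns a left rotate into a right rotate can be sketched in C. The function names are hypothetical and a 32-bit word is assumed; the point is only that rotating left by n equals rotating right by (32 - n) modulo 32.

```c
#include <stdint.h>

/* Rotate right by n bits; n is reduced modulo 32, and n == 0 is handled
 * separately to avoid an undefined shift by 32. */
uint32_t rrot32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x >> n) | (x << (32 - n)) : x;
}

/* Rotate left expressed through the right rotate by adapting the count. */
uint32_t lrot32(uint32_t x, unsigned n)
{
    return rrot32(x, (32 - (n & 31)) & 31);
}
```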
2023-02-28 | Add new Lightning rbitr instruction. | pcpa | 1 | -0/+18
This instruction reverses the bits of a word. A possible extension could be an instruction that reverses the bits in every byte, and/or if there is valid high usage, have typed versions, like bswapr_T. This instruction is made available as it is used internally in some backends to implement count trailing ones or zeros.
2023-02-28 | Use a single implementation of a fallback bitswap. | pcpa | 1 | -47/+2
There were two versions, one with loop for mips, and an unrolled one for ia64, ppc, s390 and sparc. Now there is yet a third version, made default, and using a table of swapped bit values, that should be the faster/smaller version. The unrolled one might be faster if patterns can be easily loaded, but would still be larger.
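A table-driven bit reversal can look like the following C sketch. The table size is an assumption made for illustration: the commit message only says a table of swapped bit values is used, not how large it is. Here a 16-entry table reverses one nibble at a time for a 32-bit word.

```c
#include <stdint.h>

/* rev4[i] is the 4-bit value i with its bits in reverse order. */
static const uint8_t rev4[16] = {
    0x0, 0x8, 0x4, 0xC, 0x2, 0xA, 0x6, 0xE,
    0x1, 0x9, 0x5, 0xD, 0x3, 0xB, 0x7, 0xF
};

/* Reverse all 32 bits: take nibbles from the low end of x, reverse each
 * through the table, and push them into the result from the low end so
 * the first nibble taken ends up at the top. */
uint32_t rbit32(uint32_t x)
{
    uint32_t result = 0;
    int i;

    for (i = 0; i < 8; i++) {
        result = (result << 4) | rev4[x & 0xF];
        x >>= 4;
    }
    return result;
}
```

A larger table, for example 256 entries indexed by byte, trades memory for fewer iterations.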
2023-02-26 | mips: mips release 2 also has issues with delay slot | pcpa | 1 | -2/+2
This change also corrects confusing "!expr == 0" code, that is equivalent to "(!expr) == 0", and was used as the latter.
2023-02-26 | mips: Implement optimized bswapr_ul | pcpa | 1 | -2/+20
2023-02-26 | mips: Pass all tests in mips release 1. | pcpa | 1 | -41/+75
* check/bit.tst: Correct 32 bit sample ctz implementation.
* include/lightning/jit_mips.h: Add jit_cpu flags for instructions that cannot be used in delay slot.
* lib/jit_fallback.c: Mips fallbacks now might need a flush of instructions to get correct label addresses, due to pending instruction candidate to delay slot.
* lib/jit_mips-cpu.c: Flush any pending instruction if it cannot be used in the delay slot. Add calls to fallback clo, clz, cto and ctz for mips 1.
* lib/jit_mips.c: Add code to set defaults or detect if can use certain instructions to delay slots.
2023-02-25 | mips: Add extra mips release 6 instructions. | pcpa | 1 | -2/+55
The AUI and DAUI were implemented only when rs == rt. Documentation says they can be different, but at least with qemu 7.0 behavior is only correct if they are the same. DAHI and DATI work as documented.
2023-02-24 | mips: Add support for bc and balc mips release 6 instructions | pcpa | 1 | -4/+48
These are also disabled as they do not work at least with qemu 7.0.
2023-02-24 | mips: add pcrel instructions to mips release 6 path | pcpa | 1 | -0/+127
Most of it is disabled, as at least qemu 7.0 appears broken with instructions ADDIUPC, LWPC and LWUPC. Instructions AUIPC and LDPC appear functional. ALUIPC not tested.
2023-02-24 | mips: Correct comment and usage of jmpi_p | pcpa | 1 | -5/+4
It was a copy&paste of calli_p. There is no need to check for "t9" usage in the delay slot. Also, a register must be allocated, cannot pass jit_class_chk.
2023-02-24 | mips: Correct possible incorrect code generation in jmpi optimization | pcpa | 1 | -30/+30
Add minor clarification to use a single temporary register in bgei. Also add a minor extra cosmetic change to document where {,D}MOD{,U} is decoded in jit_get_reg_for_delay_slot.
2023-02-23 | mips: Correct build for mips r6. | pcpa | 1 | -29/+15
ABI is known broken for mips64 r6. It does not work with C interprocedure calls if using float or doubles. This is low priority fixing because debugging this in qemu-user is not trivial without a debugger or at least a reliable test environment.
2023-02-23 | mips: Rewrite code to add an out of order instruction in delay slot | pcpa | 1 | -678/+935
This change adds a complete (in the sense of instructions used by Lightning) decoder in the jit_get_reg_for_delay_slot() function, which serves two purposes: getting a safe temporary register, and making sure the delay slot can be filled with the 'pending' instruction; otherwise it just flushes code generation. There is an order of calls, where the new function pending() cannot be called before jit_get_reg_for_delay_slot(), as the latter might call flush(), to emit the instruction. Code has been refactored to use the helpers to fill the delay slot, and also get the 'pending' instruction that is more likely to be safe to be executed out of order. There is no search for already emitted instructions, only a check shortly before the branch, to avoid yet more complex code to ensure the instruction order is correct. The call to pending(), after the call to jit_get_reg_for_delay_slot(), will return either an instruction that can be executed out of order, or a nop, that must be added to the delay slot, using the delay*() function.
* include/lightning/jit_private.h: Add new 'inst' field to jit_compiler_t, if __mips__ is defined. This field is a simple helper for a pending instruction to be emitted, and that can be emitted out of order.
* lib/jit_fallback.c: Update for changes in internal mips patching and jumping macros and function calls.
* lib/jit_mips-cpu.c: Core of changes to attempt to fill delay slots with instructions that can be emitted out of order.
* lib/jit_mips-fpu.c: Update to use delay slot in branches.
* lib/jit_mips.c: Update for new delay slot use logic.
2023-02-20 | mips: Correct typo and wrong line removal when updating to mips6 | pcpa | 1 | -1/+1
2023-02-20 | mips: Add initial mips release 6 support. | pcpa | 1 | -65/+245
Initially tested only on mips32el.
* check/float.tst: Add conditionals for mips release for expected NaN truncated to an integer.
* check/lightning.c: Add extra preprocessor for mips release.
* include/lightning/jit_mips.h: Make the NEW_ABI preprocessor defined to zero if using the n32 or n64 abis. This makes it easier to create runtime checks with an always true or false condition.
* lib/jit_mips-cpu.c, lib/jit_mips-fpu.c: Implement mips release 6 support.
* lib/jit_mips.c: Add more reliable mips release detection code.
2023-02-17 | Update copyright year | pcpa | 1 | -1/+1
2023-02-14 | mips: Implement optimized clor, clzr, ctor and ctzr | pcpa | 1 | -9/+145
This change also adds a better logic to discover mips release version. It is still not complete, but good for the moment. Instructions were added to mips release 6, but these are untested. Also corrected a bug when checking if a forward unconditional jump is reachable, where it did not check when validating range for mips release 2, while the code generating the optimized jump did check for it. Problem noticed when misdetecting mips release version.
2023-01-31 | mips: Optimize ldx* generators | Paul Cercueil | 1 | -70/+28
There is no need to use a temporary register, when the r0 destination register can be used instead. Signed-off-by: Paul Cercueil <paul@crapouillou.net>
2023-01-26 | Unify common code for stack frame size handling | pcpa | 1 | -7/+7
2023-01-23 | mips: Use relative unconditional branch and calls if appropriate | pcpa | 1 | -10/+66
* lib/jit_mips-cpu.c, lib/jit_mips-cpu.c: Use pseudo instructions "b" (BEQ(0,0,disp)) and "bal" (BGEZAL(0,disp)) for mips2, when an unconditional branch or function call is known to be in range of a relative jump. This should significantly reduce jit size generation.
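The two pseudo instructions and the range check can be sketched in C using the standard MIPS32 encodings. The helper names are hypothetical and this is not the Lightning code; it assumes 32-bit addresses, word-aligned targets, and the usual rule that the 16-bit displacement counts words relative to the instruction after the branch.

```c
#include <stdint.h>

/* Word displacement from a branch at `pc` to `target`, measured from the
 * delay slot address (pc + 4). */
int64_t branch_disp(uint32_t pc, uint32_t target)
{
    return ((int64_t)target - ((int64_t)pc + 4)) / 4;
}

/* Nonzero if the displacement fits the signed 16-bit field. */
int branch_in_range(uint32_t pc, uint32_t target)
{
    int64_t disp = branch_disp(pc, target);
    return disp >= -32768 && disp <= 32767;
}

/* "b disp" is BEQ $zero, $zero, disp: opcode 4, rs = 0, rt = 0. */
uint32_t encode_b(int32_t disp)
{
    return (UINT32_C(0x04) << 26) | ((uint32_t)disp & 0xFFFF);
}

/* "bal disp" is BGEZAL $zero, disp: opcode 1 (REGIMM), rs = 0, rt = 0x11. */
uint32_t encode_bal(int32_t disp)
{
    return (UINT32_C(0x01) << 26) | (UINT32_C(0x11) << 16)
         | ((uint32_t)disp & 0xFFFF);
}
```

When the target is out of range, a generator has to fall back to an absolute jump or a jump through a register, which takes more instructions.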
2023-01-20 | mips: Correct regression with code to fill delay slot | pcpa | 1 | -2/+4
Apparently, previously the regression was not noticed due to value in the register that shows the regression. Now the check has been extended to detect the regression condition, and functionality of code to use delay slot reenabled.
2023-01-20 | mips: Use variable stack framesize and simplify leaf functions | pcpa | 1 | -79/+79
The simplification of the frame pointer for leaf functions is only applicable to the new abi. * lib/jit_mips-cpu.c, lib/jit_mips.c, lib/jit_rewind.c: Adapt code to implement a variable framesize and optimize frame pointer for simple leaf functions.
2023-01-14 | mips: Fill delay slots in jit_bger, jit_bgei, jit_bltr, jit_blti | Paul Cercueil | 1 | -156/+87
Fill the delay slots with the opcode that precedes the branch opcode, if possible. To simplify things, the code has also been factorized into a single function. Signed-off-by: Paul Cercueil <paul@crapouillou.net>
2023-01-14 | mips: Fill delay slots in jit_bgtr, jit_bgti, jit_bler, jit_blei | Paul Cercueil | 1 | -137/+82
Fill the delay slots with the opcode that precedes the branch opcode, if possible. To simplify things, the code has also been factorized into a single function. Signed-off-by: Paul Cercueil <paul@crapouillou.net>
2023-01-14 | mips: Fill delay slots in jit_bner / jit_bnei | Paul Cercueil | 1 | -15/+38
When we know that the last generated opcode is not the target of a jump, and that it does not write to the source registers of the BNE opcode, we can swap it with the BNE opcode, so that it now becomes the delay slot of the BNE opcode. Signed-off-by: Paul Cercueil <paul@crapouillou.net>
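The two conditions and the swap can be sketched in C as follows. Everything here is hypothetical: the structure, the function names and the flat code buffer are illustrative stand-ins, not Lightning's internal data structures. The sketch also ignores that a branch moved one word earlier needs its displacement adjusted.

```c
#include <stddef.h>
#include <stdint.h>

#define MIPS_NOP UINT32_C(0x00000000)

/* Hypothetical summary of the most recently emitted opcode. */
typedef struct {
    int dest_reg;        /* register written by the opcode, or -1 if none */
    int is_jump_target;  /* nonzero if a label points at this opcode */
} last_op_info;

/* The last opcode may move into the delay slot only if no jump lands on
 * it and it does not write a source register (rs, rt) of the branch. */
int can_fill_delay_slot(const last_op_info *last, int rs, int rt)
{
    if (last->is_jump_target)
        return 0;
    if (last->dest_reg >= 0 &&
        (last->dest_reg == rs || last->dest_reg == rt))
        return 0;
    return 1;
}

/* Append `branch` to code[0..count). If `fill` is set, swap it with the
 * previous word so that word becomes the delay slot; otherwise pad the
 * delay slot with a nop. Returns the new word count. The caller must
 * leave room for two more words. */
size_t emit_branch(uint32_t *code, size_t count, uint32_t branch, int fill)
{
    if (fill && count > 0) {
        uint32_t last = code[count - 1];
        code[count - 1] = branch;
        code[count] = last;
        return count + 1;
    }
    code[count] = branch;
    code[count + 1] = MIPS_NOP;
    return count + 2;
}
```

Filling the slot saves one word per branch compared with the nop padding.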
2023-01-14 | mips: Fill delay slots in jit_beqr / jit_beqi | Paul Cercueil | 1 | -15/+137
When we know that the last generated opcode is not the target of a jump, and that it does not write to the source registers of the BEQ opcode, we can swap it with the BEQ opcode, so that it now becomes the delay slot of the BEQ opcode. Signed-off-by: Paul Cercueil <paul@crapouillou.net>
2023-01-14 | mips: Fill delay slots of J in jit_jmpi | Paul Cercueil | 1 | -10/+36
When we know that the last generated opcode is not the target of a jump, we can swap it with the J opcode, so that it now becomes the delay slot of the J opcode. Signed-off-by: Paul Cercueil <paul@crapouillou.net>
2023-01-14 | mips: Fill delay slots of JALR opcodes in jit_callr | Paul Cercueil | 1 | -8/+20
Fill the delay slot of the generated JALR opcode, if it is not already used to set the $t9 register. When we know that the last generated opcode is not the target of a jump, and that it does not write the register used by the JALR opcode, we can swap it with the JALR opcode, so that it now becomes the delay slot of the JALR opcode. Signed-off-by: Paul Cercueil <paul@crapouillou.net>