Age | Commit message | Author | Files | Lines |
2023-10-04 | mips: Use fallback for jit_extr() | Paul Cercueil | 1 | -22/+1 |
|
The code was exactly the same as the fallback function, so just use the
latter.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
|
2023-10-04 | mips: Correct and optimize jit_rshr() and jit_rshr_u() | Paul Cercueil | 1 | -52/+27 |
|
Rework the branch-less path to shrink the size of jit_rshr() and
jit_rshr_u(), and use one register less, so that the whole code path
that uses branches can be dropped.
The case where O4 == __WORDSIZE in jit_rshr() was also handled
incorrectly, as it would zero the O1 register instead of sign-extending
it.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
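The sign-extension case can be illustrated in portable C. This is only a sketch of the expected result, not the MIPS backend code; sar_full and WORD_BITS are hypothetical names. C leaves a shift by the full word width undefined, so that case is handled explicitly.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical name: number of bits in a machine word. */
#define WORD_BITS ((unsigned)(sizeof(intptr_t) * CHAR_BIT))

/* Arithmetic right shift that also accepts count == WORD_BITS.
 * A full-width shift must fill the result with the sign bit:
 * -1 for negative inputs, 0 otherwise (never unconditionally 0). */
static intptr_t sar_full(intptr_t value, unsigned count)
{
    assert(count <= WORD_BITS);
    if (count == WORD_BITS)
        return value < 0 ? -1 : 0;
    if (value < 0)
        /* Shift the complement so only well-defined unsigned shifts are used. */
        return ~(intptr_t)(~(uintptr_t)value >> count);
    return (intptr_t)((uintptr_t)value >> count);
}
```

For example, sar_full(-5, WORD_BITS) yields -1, while sar_full(5, WORD_BITS) yields 0.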
|
2023-10-03 | mips: Correct signed qrsh with zero shift. | pcpa | 1 | -12/+3 |
|
Like the aarch64 version, this relies on a wordsize shift filling the
result with zero or minus one, based on the sign.
|
2023-10-02 | mips: Optimize jit_lshr() and jit_lshr_u() | Paul Cercueil | 1 | -51/+21 |
|
Rework the branch-less path to shrink the size of jit_lshr() by one
instruction, and jit_lshr_u() by two instructions. It also uses one
register less, so the whole code path that uses branches can be dropped.
Finally, fix whitespace issues, as the original code mixed tabs and
spaces.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
|
2023-08-21 | mips: Correct wrong variable setting | pcpa | 1 | -1/+1 |
|
2023-08-21 | Add back the jit_hmul interfaces | pcpa | 1 | -0/+50 |
|
It existed in Lightning 1.x, but at first jit_qmul appeared to
cover all uses, as hmul was used to compute the high bits of
a complete multiplication. It turns out there might be other cases
where only the top bits are really required. One example is division
by constants.
An optimized version that attempts to reduce the number of instructions
used, when applicable, has now been added.
* check/Makefile.am, check/lightning.c: Add new hmul tests.
* doc/body.texi: Document hmul.
* include/lightning.h.in: Create the new hmul codes.
* lib/jit_aarch64-cpu.c, lib/jit_aarch64-sz.c, lib/jit_aarch64.c,
lib/jit_alpha-cpu.c, lib/jit_alpha-sz.c, lib/jit_alpha.c,
lib/jit_arm-cpu.c, lib/jit_arm-sz.c, lib/jit_arm.c,
lib/jit_hppa-cpu.c, lib/jit_hppa-sz.c, lib/jit_hppa.c,
lib/jit_ia64-cpu.c, lib/jit_ia64-sz.c, lib/jit_ia64.c,
lib/jit_loongarch-cpu.c, lib/jit_loongarch-sz.c, lib/jit_loongarch.c,
lib/jit_mips-cpu.c, lib/jit_mips-sz.c, lib/jit_mips.c,
lib/jit_ppc-cpu.c, lib/jit_ppc-sz.c, lib/jit_ppc.c,
lib/jit_riscv-cpu.c, lib/jit_riscv-sz.c, lib/jit_riscv.c,
lib/jit_s390-cpu.c, lib/jit_s390-sz.c, lib/jit_s390.c,
lib/jit_sparc-cpu.c, lib/jit_sparc-sz.c, lib/jit_sparc.c,
lib/jit_x86-cpu.c, lib/jit_x86-sz.c, lib/jit_x86.c: Implement
hmul and update the *-sz.c files.
* lib/jit_names.c, lib/lightning.c: Add knowledge of hmul.
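The division-by-constants use case mentioned above can be sketched in plain C. This is an illustration of why only the high half of a product is needed, not Lightning code; mulhi_u32, div_by_magic_u32 and div3_u32 are hypothetical names, and a 32-bit unsigned operand is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* High 32 bits of the full 64-bit product, i.e. what an unsigned
 * "high multiply" returns for 32-bit words. */
static uint32_t mulhi_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* Divide by a constant using a precomputed multiplier and a final shift. */
static uint32_t div_by_magic_u32(uint32_t x, uint32_t magic, unsigned post_shift)
{
    assert(post_shift < 32);
    return mulhi_u32(x, magic) >> post_shift;
}

/* x / 3 for any 32-bit unsigned x: multiplier ceil(2^33 / 3) = 0xAAAAAAAB,
 * then one more right shift, for a total shift of 33 bits. */
static uint32_t div3_u32(uint32_t x)
{
    return div_by_magic_u32(x, 0xAAAAAAABu, 1);
}
```

The low half of the product is never used, which is why a dedicated high multiply can be cheaper than a full one.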
|
2023-08-10 | mips: Fix can_sign_extend_short_p() | Paul Cercueil | 1 | -1/+1 |
|
The boundaries were wrong.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
|
2023-04-24 | Add back earlier unld* implementation | pcpa | 1 | -6/+14 |
|
It has also been made the default, as it generates shorter code.
Verified to work on several ports that need specialized code for
unaligned memory access.
|
2023-04-19 | Implement the new fnma* and fnms* instructions | pcpa | 1 | -0/+4 |
|
These complete the fused multiply add/sub abstractions.
There is no work on precise ordering on fallbacks. The interface only
makes available the instructions, and the fallbacks are not guaranteed
to generate the same result for all inputs. This happens because when
actually implemented in hardware, any rounding is only done at the end.
Lightning does not handle floating point exceptions or rounding modes, so,
handling anything other than "default" configuration is up to the user.
|
2023-04-18 | mips: Implement fma* and fms* | pcpa | 1 | -2/+30 |
|
2023-04-14 | Implement untyped macros to call proper wordsize load or store | pcpa | 1 | -0/+6 |
|
2023-04-06 | mips: Pass all tests in mips release 1 | pcpa | 1 | -25/+29 |
|
2023-04-06 | mips: Correct misunderstanding of how unaligned instructions work | pcpa | 1 | -65/+48 |
|
2023-04-05 | mips: Implement unaligned memory access. | pcpa | 1 | -0/+307 |
|
Note that Linux kernels should trap and handle unaligned memory access.
For the moment it does not rely on this behavior by default. To change
it, add the C code:
jit_cpu.unaligned = 1;
after calling init_jit(). Note that this is only an option if
jit_cpu.version is 5 or lower, as mips release 6 or newer removes the
instructions to use unaligned memory access. This has not yet been fully
optimized for mips 6. The logic should be to construct special instructions
for smaller loads or stores and load/store the unaligned value without
trapping.
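The idea of building an unaligned access out of smaller accesses can be sketched in C. This is not what the backend emits; load_u32_le is a hypothetical helper that assumes little-endian byte order and assembles a 32-bit value from four byte loads, which cannot trap on alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Load a 32-bit little-endian value from any address, aligned or not,
 * using only byte-sized loads. */
static uint32_t load_u32_le(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

A store can be split the same way, writing one byte per shifted slice of the value.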
|
2023-03-24 | mips: Implement q{l,r}sh{r,i}{,_u} | pcpa | 1 | -0/+222 |
|
2023-03-24 | mips: Correct 32 bit build. | pcpa | 1 | -2/+2 |
|
2023-03-20 | Minor optimization to jit_extr. | pcpa | 1 | -2/+6 |
|
Most backends do not check for a 0 (or wordsize) shift. Make sure to
test it to avoid a nop, or possibly undefined behavior.
|
2023-03-19 | For consistency, rename ext, ext_u and dep to ext_r, extr_u and dep_r | pcpa | 1 | -18/+18 |
|
2023-03-17 | mips: Implement ext, ext_u and dep. | pcpa | 1 | -0/+82 |
|
2023-03-09 | Rename fallback_bitswap to fallback_rbit | pcpa | 1 | -1/+1 |
|
This matches the chosen Lightning instruction name, which in turn
matches the aarch64 one.
|
2023-03-08 | mips: Optimize jit_ctor() / jit_ctzr() | Paul Cercueil | 1 | -36/+25 |
|
The jit_ctzr() can be performed with just 5 instructions and 2 temporary
registers. The jit_ctor() can be performed by just inverting all bits
then calculating the ctzr().
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
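The relation between the two operations can be sketched in C. The loop below is a simple reference, not the 5-instruction MIPS sequence; ctz_word, cto_word and WORD_BITS are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical name: number of bits in a machine word. */
#define WORD_BITS ((unsigned)(sizeof(uintptr_t) * CHAR_BIT))

/* Count trailing zero bits; returns WORD_BITS when x is 0. */
static unsigned ctz_word(uintptr_t x)
{
    unsigned n = 0;
    if (x == 0)
        return WORD_BITS;
    while ((x & 1) == 0) {
        x >>= 1;
        n++;
    }
    assert(n < WORD_BITS);
    return n;
}

/* Count trailing one bits: invert all bits, then count trailing zeros. */
static unsigned cto_word(uintptr_t x)
{
    return ctz_word(~x);
}
```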
|
2023-03-08 | mips: CLO/CLZ are available on MIPSr1 | Paul Cercueil | 1 | -28/+22 |
|
The CLO, CLZ and DCLO, DCLZ instructions were present in the first
revision of the MIPS32 and MIPS64 specifications.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
|
2023-03-07 | Implement new bit rotate instructions. | pcpa | 1 | -0/+62 |
|
This commit also corrects some previous changes that were not properly
tested and were failing to compile or having runtime problems, like using
register 0 for addressing in s390. Still need to test on actual s390, as
it fails in Hercules, but has the same encoding as shifts. For the moment
presume it is a bug in the Hercules emulator.
* check/alu_rot.tst, check/alu_rot.ok: New test files for the new
lrotr, lroti, rrotr and rroti instructions.
* check/Makefile.am, check/lightning.c, include/lightning.h.in,
lib/jit_names.c, lib/lightning.c, doc/body.texi: Update for the
new instructions.
* lib/jit_aarch64-cpu.c, lib/jit_aarch64.c, lib/jit_arm-cpu.c,
lib/jit_arm.c: Implement optimized rrotr and rroti. lrotr and
lroti just adjust parameters for a left shift rotate.
* lib/jit_alpha-cpu.c, lib/jit_alpha.c, lib/jit_ia64-cpu,
lib/jit_ia64.c, lib/jit_riscv-cpu.c, lib/jit_riscv.c,
jit_sparc-cpu.c, jit_sparc.c: Implement calls to fallback lrotr,
lroti, rrotr and rroti.
* lib/jit_hppa-cpu.c, lib/jit_hppa.c: Implement optimized rroti.
Other instructions use fallbacks.
* lib/jit_loongarch-cpu.c, lib/jit_loongarch.c: Implement optimized
rrotr and rroti. lrotr and lroti just adapt arguments and use a
right shift.
* lib/jit_mips-cpu.c, lib/jit_mips.c: If mips2, implement optimized
rrotr and rroti. lrotr and lroti just adapt arguments and use a
right shift. If mips1 use fallbacks.
* lib/jit_ppc-cpu.c, lib/jit_ppc.c, jit_s390-cpu.c, jit_s390.c,
lib/jit_x86-cpu.c, lib/jit_x86.c: Implement optimized lrotr,
lroti, rrotr, rroti.
* lib/jit_fallback.c: Implement fallbacks for lrotr, lroti,
rrotr and rroti. Also add extra macro to avoid segfaults in s390,
that cannot use register zero for some addressing instructions.
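The "adapt arguments" step used for left rotates on several backends can be sketched in C: a left rotate by n is a right rotate by (wordsize - n). The function names are hypothetical and a 32-bit word is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right; n == 0 is special-cased because a 32-bit shift of a
 * 32-bit value is undefined in C. */
static uint32_t rotr32(uint32_t x, unsigned n)
{
    assert(n < 32);
    return n ? (x >> n) | (x << (32 - n)) : x;
}

/* Rotate left expressed as a rotate right with an adjusted count. */
static uint32_t rotl32_via_rotr(uint32_t x, unsigned n)
{
    assert(n < 32);
    return rotr32(x, (32 - n) % 32);
}
```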
|
2023-02-28 | Add new Lightning rbitr instruction. | pcpa | 1 | -0/+18 |
|
This instruction reverses the bits of a word. A possible extension
could be an instruction that reverses the bits in every byte, and/or,
if there is enough valid usage, typed versions, like bswapr_T.
This instruction is made available as it is used internally in some
backends to implement count trailing ones or zeros.
|
2023-02-28 | Use a single implementation of a fallback bitswap. | pcpa | 1 | -47/+2 |
|
There were two versions, one with loop for mips, and an unrolled one
for ia64, ppc, s390 and sparc.
Now there is yet a third version, made default, and using a table of
swapped bit values, that should be the faster/smaller version. The unrolled
one might be faster if patterns can be easily loaded, but would still be
larger.
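A table-driven bit reversal can be sketched in C as follows. The table size and layout used by the actual fallback are not stated here, so the 16-entry nibble table, the 32-bit word, and the names nibble_rev and rbit32 are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* nibble_rev[i] is i with its 4 bits in reverse order. */
static const uint8_t nibble_rev[16] = {
    0x0, 0x8, 0x4, 0xC, 0x2, 0xA, 0x6, 0xE,
    0x1, 0x9, 0x5, 0xD, 0x3, 0xB, 0x7, 0xF
};

/* Reverse all 32 bits: consume nibbles from the low end of x and
 * push their reversed form into r, so the lowest nibble ends up highest. */
static uint32_t rbit32(uint32_t x)
{
    uint32_t r = 0;
    int i;
    for (i = 0; i < 8; i++) {
        r = (r << 4) | nibble_rev[x & 0xF];
        x >>= 4;
    }
    assert(x == 0); /* all 8 nibbles consumed */
    return r;
}
```

A larger table trades memory for fewer iterations; the structure stays the same.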
|
2023-02-26 | mips: mips release 2 also has issues with delay slot | pcpa | 1 | -2/+2 |
|
This change also corrects confusing "!expr == 0" code, that is equivalent
to "(!expr) == 0", and was used as the latter.
|
2023-02-26 | mips: Implement optimized bswapr_ul | pcpa | 1 | -2/+20 |
|
2023-02-26 | mips: Pass all tests in mips release 1. | pcpa | 1 | -41/+75 |
|
* check/bit.tst: Correct 32 bit sample ctz implementation.
* include/lightning/jit_mips.h: Add jit_cpu flags for instructions
that cannot be used in delay slot.
* lib/jit_fallback.c: Mips fallbacks now might need a flush of
instructions to get correct label addresses, due to pending
instruction candidate to delay slot.
* lib/jit_mips-cpu.c: Flush any pending instruction if it cannot
be used in the delay slot. Add calls to fallback clo, clz, cto and
ctz for mips 1.
* lib/jit_mips.c: Add code to set defaults or detect if can use
certain instructions to delay slots.
|
2023-02-25 | mips: Add extra mips release 6 instructions. | pcpa | 1 | -2/+55 |
|
The AUI and DAUI were implemented only when rs == rt. Documentation says
they can be different, but at least with qemu 7.0 behavior is only correct
if they are the same. DAHI and DATI work as documented.
|
2023-02-24 | mips: Add support for bc and balc mips release 6 instructions | pcpa | 1 | -4/+48 |
|
These are also disabled as they do not work at least with qemu 7.0.
|
2023-02-24 | mips: add pcrel instructions to mips release 6 path | pcpa | 1 | -0/+127 |
|
Most of it is disabled, as at least qemu 7.0 appears broken with
instructions ADDIUPC, LWPC and LWUPC. Instructions AUIPC and LDPC
appear functional. ALUIPC not tested.
|
2023-02-24 | mips: Correct comment and usage of jmpi_p | pcpa | 1 | -5/+4 |
|
It was a copy&paste of calli_p. There is no need to check for "t9"
usage in the delay slot. Also, a register must be allocated, cannot
pass jit_class_chk.
|
2023-02-24 | mips: Correct possible incorrect code generation in jmpi optimization | pcpa | 1 | -30/+30 |
|
Add minor clarification to use a single temporary register in bgei.
Also add a minor extra cosmetic change to document where {,D}MOD{,U}
is decoded in jit_get_reg_for_delay_slot.
|
2023-02-23 | mips: Correct build for mips r6. | pcpa | 1 | -29/+15 |
|
The ABI is known to be broken for mips64 r6. It does not work with C
interprocedural calls that use floats or doubles. Fixing this is low
priority because debugging it in qemu-user is not trivial without a
debugger or at least a reliable test environment.
|
2023-02-23 | mips: Rewrite code to add an out of order instruction in delay slot | pcpa | 1 | -678/+935 |
|
This change adds a complete (in the sense of instructions used by
Lightning) decoder in the jit_get_reg_for_delay_slot() function. It
serves two purposes: getting a safe temporary register, and making sure
the delay slot can be filled with the 'pending' instruction; otherwise
it just flushes code generation.
There is an order of calls: the new function pending() cannot
be called before jit_get_reg_for_delay_slot(), as the latter might
call flush() to emit the instruction.
Code has been refactored to use the helpers to fill the delay slot,
and also get the 'pending' instruction that is more likely to be safe
to be executed out of order. There is no search for already emitted
instructions, only a check shortly before the branch, to avoid yet
more complex code to ensure the instruction order is correct.
The call to pending(), after the call to jit_get_reg_for_delay_slot()
will return either an instruction that can be executed out of order, or
a nop, that must be added to the delay slot, using the delay*() function.
* include/lightning/jit_private.h: Add new 'inst' field to
jit_compiler_t, if __mips__ is defined. This field is a simple
helper for a pending instruction to be emitted, and that can
be emitted out of order.
* lib/jit_fallback.c: Update for changes in internal mips patching
and jumping macros and function calls.
* lib/jit_mips-cpu.c: Core of changes to attempt to fill delay
slots with instructions that can be emitted out of order.
* lib/jit_mips-fpu.c: Update to use delay slot in branches.
* lib/jit_mips.c: Update for new delay slot use logic.
|
2023-02-20 | mips: Correct typo and wrong line removal when updating to mips6 | pcpa | 1 | -1/+1 |
|
2023-02-20 | mips: Add initial mips release 6 support. | pcpa | 1 | -65/+245 |
|
Initially tested only on mips32el.
* check/float.tst: Add conditionals for mips release for expected
NaN truncated to an integer.
* check/lightning.c: Add extra preprocessor for mips release.
* include/lightning/jit_mips.h: Make the NEW_ABI preprocessor
defined to zero if using the n32 or n64 abis. This makes it
easier to create runtime checks with an always true or false
condition.
* lib/jit_mips-cpu.c, lib/jit_mips-fpu.c: Implement mips release
6 support.
* lib/jit_mips.c: Add more reliable mips release detection code.
|
2023-02-17 | Update copyright year | pcpa | 1 | -1/+1 |
|
2023-02-14 | mips: Implement optimized clor, clzr, ctor and ctzr | pcpa | 1 | -9/+145 |
|
This change also adds a better logic to discover mips release version.
It is still not complete, but good for the moment.
Instructions were added to mips release 6, but these are untested.
Also corrected a bug when checking if a forward unconditional jump is
reachable, where it did not check when validating range for mips release 2,
while the code generating the optimized jump did check for it. Problem
noticed when misdetecting mips release version.
|
2023-01-31 | mips: Optimize ldx* generators | Paul Cercueil | 1 | -70/+28 |
|
There is no need to use a temporary register, when the r0 destination
register can be used instead.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
|
2023-01-26 | Unify common code for stack frame size handling | pcpa | 1 | -7/+7 |
|
2023-01-23 | mips: Use relative unconditional branch and calls if appropriate | pcpa | 1 | -10/+66 |
|
* lib/jit_mips-cpu.c, lib/jit_mips-cpu.c: Use pseudo instructions
"b" (BEQ(0,0,disp)) and "bal" (BGEZAL(0,disp)) for mips2, when an
unconditional branch or function call is known to be in range of a
relative jump. This should significantly reduce the size of the generated code.
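The "in range of a relative jump" condition can be sketched in C. MIPS conditional branches such as BEQ and BGEZAL encode a 16-bit signed word offset relative to the address of the delay slot; the helper name branch16_in_range is hypothetical, and how the backend actually performs the check is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Returns nonzero when target can be reached from a branch located at
 * branch_addr. The displacement is measured from the delay slot
 * (branch_addr + 4) and must fit in 16 signed bits after dividing by 4,
 * i.e. lie between -32768 * 4 and 32767 * 4 bytes. */
static int branch16_in_range(uintptr_t branch_addr, uintptr_t target)
{
    intptr_t disp;
    assert((branch_addr & 3) == 0 && (target & 3) == 0);
    disp = (intptr_t)(target - (branch_addr + 4));
    return disp >= -131072 && disp <= 131068;
}
```

When the check fails, an absolute jump or a register-indirect sequence is still needed.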
|
2023-01-20 | mips: Correct regression with code to fill delay slot | pcpa | 1 | -2/+4 |
|
Apparently the regression previously went unnoticed because of the
value that happened to be in the register that exposes it.
The check has now been extended to detect the regression condition,
and the code that uses the delay slot has been reenabled.
|
2023-01-20 | mips: Use variable stack framesize and simplify leaf functions | pcpa | 1 | -79/+79 |
|
The simplification of the frame pointer for leaf functions is only
applicable to the new abi.
* lib/jit_mips-cpu.c, lib/jit_mips.c, lib/jit_rewind.c: Adapt
code to implement a variable framesize and optimize frame pointer
for simple leaf functions.
|
2023-01-14 | mips: Fill delay slots in jit_bger, jit_bgei, jit_bltr, jit_blti | Paul Cercueil | 1 | -156/+87 |
|
Fill the delay slots with the opcode that precedes the branch opcode, if
possible.
To simplify things, the code has also been factored into a single
function.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
|
2023-01-14 | mips: Fill delay slots in jit_bgtr, jit_bgti, jit_bler, jit_blei | Paul Cercueil | 1 | -137/+82 |
|
Fill the delay slots with the opcode that precedes the branch opcode, if
possible.
To simplify things, the code has also been factored into a single
function.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
|
2023-01-14 | mips: Fill delay slots in jit_bner / jit_bnei | Paul Cercueil | 1 | -15/+38 |
|
When we know that the last generated opcode is not the target of a jump,
and that it does not write to the source registers of the BNE opcode, we
can swap it with the BNE opcode, so that it now becomes the delay slot
of the BNE opcode.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
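The two conditions in this message can be written as a small predicate. This is a sketch only: struct last_op and can_swap_into_delay_slot are hypothetical and do not reflect the data structures used by the backend.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of the most recently generated opcode. */
struct last_op {
    int is_jump_target; /* nonzero if some jump lands on this opcode */
    int dest_reg;       /* register it writes, or -1 if it writes none */
};

/* The previous opcode may be moved into the delay slot of a two-operand
 * branch only if no jump targets it and it does not write either of the
 * registers the branch compares. */
static int can_swap_into_delay_slot(const struct last_op *op,
                                    int branch_src0, int branch_src1)
{
    assert(op != NULL);
    if (op->is_jump_target)
        return 0;
    return op->dest_reg != branch_src0 && op->dest_reg != branch_src1;
}
```

When the predicate fails, the delay slot has to be filled with a nop instead.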
|
2023-01-14 | mips: Fill delay slots in jit_beqr / jit_beqi | Paul Cercueil | 1 | -15/+137 |
|
When we know that the last generated opcode is not the target of a jump,
and that it does not write to the source registers of the BEQ opcode, we
can swap it with the BEQ opcode, so that it now becomes the delay slot
of the BEQ opcode.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
|
2023-01-14 | mips: Fill delay slots of J in jit_jmpi | Paul Cercueil | 1 | -10/+36 |
|
When we know that the last generated opcode is not the target of a jump,
we can swap it with the J opcode, so that it now becomes the delay slot
of the J opcode.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
|
2023-01-14 | mips: Fill delay slots of JALR opcodes in jit_callr | Paul Cercueil | 1 | -8/+20 |
|
Fill the delay slot of the generated JALR opcode, if it is not already
used to set the $t9 register.
When we know that the last generated opcode is not the target of a jump,
and that it does not write the register used by the JALR opcode, we can
swap it with the JALR opcode, so that it now becomes the delay slot of
the JALR opcode.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
|