I am making a PE .exe packer in C and assembly. In C, I do things like creating a new .packed section header, changing the entry point to that new section, updating SizeOfImage, and so on. In my C code, I encrypt the .text section with a key:

unsigned char* textSectionData = (unsigned char*)outputFile + textSection->PointerToRawData;
for (DWORD i = 0; i < textSection->SizeOfRawData; i++) {
    textSectionData[i] ^= 0x19;
}

So, in the new .packed section, I have to inject raw machine code (an unpacking stub) that performs the reverse operation (decrypting the .text section with the key 0x19) and then jumps back to the original entry point. I am using NASM in -f bin mode to get raw binary data that I can execute in that new section.
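
A minimal sketch of why the stub can reuse the same operation: XOR with a fixed key is its own inverse, so applying it a second time restores the original bytes. The helper name xor_buffer is illustrative and not taken from the packer itself.

```c
#include <stddef.h>

/* Hypothetical helper: XOR every byte of buf with key, in place.
 * Calling it twice with the same key restores the original contents. */
void xor_buffer(unsigned char *buf, size_t len, unsigned char key) {
    for (size_t i = 0; i < len; i++) {
        buf[i] ^= key;
    }
}
```

The packer would call this once on the raw .text bytes on disk; the stub has to perform the equivalent loop once on the mapped section in memory.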

I am currently using hardcoded absolute addresses / values for the sake of simplicity and an infinite jmp to signify success.

Here's my XOR loop in assembly:


BITS 64
xor rbx, rbx
loop:
mov rax, byte [0x00007FF75C991000 + rbx] // start of .text section
xor rax, 0x19
inc rbx ,1
cmp rbx, 797696
jne loop
jmp $

Where 797696 corresponds to the SizeOfRawData field on the .text section. Can someone tell me what I'm doing wrong, because NASM gives me this error:

C:\Users\tamar\Downloads\brainfuck compiler\might>nasm -f bin stub.asm
stub.asm:4: error: comma, decorator or end of line expected, got 259

I expected to get a working loop that I can extract the raw bytes of, and use as a stub in my executable packer.

Thanks a lot!

asked Jul 26, 2025 at 9:49
  • Your code is wrong and makes no sense anyway. You xor rax, 0x19? OK, and what is the point? Maybe something like this: mov al, 19h / mov ecx, 797696 / @@0: xor [rdx], al / inc rdx / loop @@0 Commented Jul 26, 2025 at 10:27
  • // is not how you make comments in NASM. Besides that, there are plenty of other errors, as pointed out. On top of that, XOR packers are pretty useless, as most sections have large runs of zeros. You also need to make the text section writable. But all in all, it's a good thing the new generation of malware writers has a lower and lower grasp of how low-level things work; this makes our work much easier. Commented Jul 26, 2025 at 10:47
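
To illustrate the point about writability: the stub modifies .text in place, so the section has to be mapped writable. In a PE file that means setting the IMAGE_SCN_MEM_WRITE bit (0x80000000) in the section header's Characteristics field. A minimal sketch follows; the constant is redefined locally so the snippet compiles without windows.h, and make_section_writable is a hypothetical helper name.

```c
#include <stdint.h>

/* Same value as IMAGE_SCN_MEM_WRITE in winnt.h, redefined here so the
 * sketch does not depend on the Windows headers. */
#define SCN_MEM_WRITE 0x80000000u

/* Hypothetical helper: return the section characteristics with the
 * write bit added, leaving every other flag untouched. */
uint32_t make_section_writable(uint32_t characteristics) {
    return characteristics | SCN_MEM_WRITE;
}
```

With the real headers this would presumably reduce to textSection->Characteristics |= IMAGE_SCN_MEM_WRITE; in the packer. A typical .text value of 0x60000020 (code, execute, read) becomes 0xE0000020.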

3 Answers

3

In NASM you have to use ; for comments. The size keyword is byte, as you already have (byte ptr is MASM syntax). Also, your loop never writes back the elements it reads. Here is a version that should work, although it handles only one byte at a time:

Inputs: rax = ptr, rsi = len
Clobbers: rbx

BITS 64
 mov rax, 0x00007FF75C991000 ; ptr
 mov rsi, 797696 ; length, NOTE: doesn't handle zero length
 xor ebx, ebx ; loop index
loop:
 xor byte [rax + rbx], 0x19
 inc rbx
 cmp rbx, rsi
 jne loop
answered Jul 26, 2025 at 17:02

1 Comment

NASM does use byte. byte ptr is an error unless you enable the MASM-compat macro package or manually %define ptr to the empty string. Also, there's no need for 64-bit operand size in the inc or cmp. Or in mov esi, len. Unless you want to support lengths greater than 4GiB. The only reason to go one byte at a time is to optimize for code-size even when it costs a lot of speed; if you want it fast for a non-tiny size, you'd go by at least 8 bytes at a time, with a cleanup loop if needed. Preferably 16, with movdqu / pxor on XMM regs. Or since we know alignment, movdqa.
2
mov rax, byte [0x00007FF75C991000 + rbx] // start of .text

stub.asm:4: error: comma, decorator or end of line expected, got 259

This error exists because NASM does not use // for comments; use ; instead.

The code that you propose forgets to write the result of the XOR back to memory.
If you're going to do this one byte at a time, then use the following code:

BITS 64
 mov rbx, 0x00007FF75C991000 ; start of .text
 lea rcx, [rbx + 797696] ; end of .text
loop:
 movzx eax, byte [rbx]
 xor eax, 0x19
 mov [rbx], al
 inc rbx
 cmp rbx, rcx
 jb loop
 jmp $

For extra speed you can do it eight bytes at a time:

BITS 64
 mov rbx, 0x00007FF75C991000 ; start of .text
 mov ecx, 797696 / 8 ; number of qwords is 99712
 mov rdx, 0x1919191919191919 ; mask
loop:
 mov rax, [rbx]
 xor rax, rdx
 mov [rbx], rax
 add rbx, 8
 dec ecx
 jnz loop
 jmp $
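
The qword loop above relies on the length being a multiple of 8, which holds here (797696 / 8 = 99712 exactly). As a sketch of the same idea in C with a byte-wise cleanup loop for other sizes; xor_qwords is an illustrative name, and memcpy is used for the 8-byte loads and stores so that the code makes no alignment or aliasing assumptions:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: XOR buf with key, eight bytes per iteration,
 * then finish any remaining 0..7 bytes one at a time. */
void xor_qwords(unsigned char *buf, size_t len, unsigned char key) {
    /* Replicate the key into every byte: 0x19 -> 0x1919191919191919. */
    const uint64_t mask = 0x0101010101010101ULL * key;
    size_t i = 0;

    for (; i + 8 <= len; i += 8) {
        uint64_t q;
        memcpy(&q, buf + i, sizeof q);   /* load 8 bytes */
        q ^= mask;
        memcpy(buf + i, &q, sizeof q);   /* store them back */
    }
    for (; i < len; i++) {               /* cleanup tail */
        buf[i] ^= key;
    }
}
```

Since the section size here is already a multiple of 8, the assembly version can skip the cleanup loop.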
answered Jul 27, 2025 at 13:58

2 Comments

xor [rbx], rdx would be more efficient on some CPUs, at least equal on others. Only potentially worse on P5 Pentium which can't split memory-destination RMWs into uops for better pipelining. But it was 32-bit only, so actually only KNC (first-gen Xeon Phi) had that pipeline for x86-64.
And of course SSE2 is baseline for x86-64 and the data is aligned, so you could movaps-load/xorps/movaps-store, after broadcasting a constant into XMM1. (Perhaps with mov eax, 0x19191919 / movd xmm1, eax / shufps xmm1, xmm1, 0.)
1

Here's a simple example for demonstration (not optimized):

BITS 64
function:
 xor eax, eax
.xorLoop:
;; rcx is the starting address, rax is the counter
 xor byte [rcx + rax], 0x19 ; xor value
 inc rax ; increment loop counter
 cmp rax, 797696 ; number of iterations
 jne .xorLoop
 ret
answered Jul 26, 2025 at 11:45

18 Comments

While this presents a working code, it does not explain the error you got. Please consider to add what specifically was the error and how you solved the issue.
This overwrites 8 times more memory than the size, and if that was fixed, it wouldn't work if the size wasn't divisible by 8.
@RedRam: For this use-case you can almost certainly just round up the size to a multiple of 8. But yes, cmp rbx, 797696/8 or something. But this is super buggy compared to the problem description, so that's the least serious problem.
"This has shown me that handwritten assembly by an experienced person could still beat the compiler": Yeah, absolutely. Especially if I can assume that a section will be aligned by 16 and have a size that is a multiple of 16, so I can skip cleanup. And I know this only runs once per process, for large inputs, so I should somewhat favour code-size. Your answer mentioned beating the "average assembly coder", though, and that's probably true depending on what you mean by "average". Like "average person who knows how to optimize for modern x86", or "average person capable of writing correct asm".
Yeah, compilers are only this good because people taught them what optimizations to look for and how to auto-vectorize. They aren't like AI. We still need some humans that can see whether asm is efficient or not and point out what compilers could be doing better. Plus it's fun to know how CPUs work.