OSDev.org • E820 code example comment incorrect?

E820 code example comment incorrect?

Posted: Wed Jul 16, 2025 3:59 am

by sandras

In the wiki Detecting Memory (x86) code example there is this line:

Code: Select all

	mov edx, 0x0534D4150	; Place "SMAP" into edx

Since x86 is little endian, wouldn't the magic value be "PAMS", not "SMAP"?

Sorry if this is a trivial question. I'm not that confident yet in editing the wiki, so I'm asking for confirmation here. This does concern my OS development too.

Re: E820 code example comment incorrect?

Posted: Wed Jul 16, 2025 5:11 am

by iansjack

"Little endian" and "Big endian" refer to the way that bytes are stored in memory. But we are dealing with registers here, not memory.

Re: E820 code example comment incorrect?

Posted: Wed Jul 16, 2025 5:32 am

by sandras

So since this magic value is not read from or written to memory, we can say that it's up for interpretation which direction we read it in, in ASCII?

Re: E820 code example comment incorrect?

Posted: Wed Jul 16, 2025 6:06 am

by iansjack

No, there's no interpretation involved. That's the magic number you put in the register.
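
As a minimal sketch of this point (Python is used purely as a calculator here, and the variable names are illustrative, not taken from the wiki): the number 0x534D4150 reads as "SMAP" from its most significant byte to its least significant byte, and the order "PAMS" would only appear if the register were stored to little-endian memory.

```python
# The E820 magic number as it is loaded into EDX.
SMAP_MAGIC = 0x534D4150

# Bytes from most significant to least significant: how the number is written.
as_written = SMAP_MAGIC.to_bytes(4, "big")

# Bytes in increasing address order if the register were stored
# to memory on a little-endian machine such as x86.
as_stored_on_x86 = SMAP_MAGIC.to_bytes(4, "little")

print(as_written)        # b'SMAP'
print(as_stored_on_x86)  # b'PAMS'
```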

Re: E820 code example comment incorrect?

Posted: Wed Jul 23, 2025 8:51 am

by sandras

I would like to provide some context on what I was doing and why this question came up.

I have my assembly E820 code. I was also writing my assembly ACPI code. I was putting non-null-terminated strings like "RSD PTR " and "RSDT" in registers as hexadecimal values to compare them to data in memory. As pointed out, this is an instance where endianness matters, as we're touching memory. So the start of these strings had to be in the low part of the register, and the end in the high part of the register.

The confusion came when I realized that "RSD PTR " and "RSDT" were being encoded one way, while "SMAP" was encoded the other way around: the end of the string is in the low part of the register.
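
The difference can be sketched as follows. This is only an illustration in Python; `signature_to_le_value` is a hypothetical helper, not part of any ACPI or BIOS interface. It computes the value a little-endian load would produce for a signature sitting in memory, and contrasts that with the E820 magic.

```python
def signature_to_le_value(sig: bytes) -> int:
    """Value a little-endian load produces for these bytes in memory."""
    return int.from_bytes(sig, "little")

# ACPI signatures are compared against memory, so the first
# character must end up in the low byte of the register.
rsdt = signature_to_le_value(b"RSDT")
rsd_ptr = signature_to_le_value(b"RSD PTR ")

print(hex(rsdt))     # 0x54445352
print(hex(rsd_ptr))  # 0x2052545020445352

# The E820 magic is just a number passed in EDX; its first
# character sits in the high byte instead.
smap = 0x534D4150
print(smap == signature_to_le_value(b"SMAP"))    # False
print(smap == int.from_bytes(b"SMAP", "big"))    # True
```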

Now, as I'm not able to put string literals into registers and have to resort to putting them in as hexadecimal values, I want to comment what these values represent. I'm faced with a question as to how.

I'm leaning towards commenting "SMAP" as

Code: Select all

// SMAP magic value.

and "RSD PTR " as

Code: Select all

// "RSD PTR "

I would like to keep the comments short and simple while not causing more confusion. Any thoughts?

Re: E820 code example comment incorrect?

Posted: Wed Jul 23, 2025 3:40 pm

by BenLunt

In my opinion, this is an assembler topic/issue, not a documentation issue.

For example, the assembler I use allows the following:

Code: Select all

 mov eax,'ABCD'
 mov ebx,"ABCD"

The assembler will see the single quoted string as little-endian and place the value of 'ABCD' into EAX as little-endian: D C B A
Alternatively, the assembler will see the double quoted string as big-endian and place the value of "ABCD" into EBX as big-endian: A B C D

For "RSD PTR ", this doesn't fit into a 32-bit register and should produce an error:

Code: Select all

 mov eax,"RSD PTR " ; <---- error

However,

Code: Select all

 mov rax,"RSD PTR "

may work (though my assembler doesn't support 64-bit registers).

Therefore, as far as the assembler is concerned, the author of the code needs to make sure to use the correct syntax. As far as the documentation is concerned, it only needs to state little- or big-endian. To have clarity, it could state "RSDT" as little-endian with the 'T' at the lower address and the 'R' at the higher address or vice versa if different -endian.

Just my opinion,

Ben
- https://www.fysnet.net/newbasic.htm

Re: E820 code example comment incorrect?

Posted: Thu Jul 24, 2025 5:29 am

by sandras

Someone has faced a similar problem in this topic.

iansjack's comment that endianness is a concern when we're working with values in memory, not with values in registers, is technically correct. However, we are placing a value which represents a string into a register. Characters in a string do go in an order. And so, the order in which we place "SMAP" as a hexadecimal value into a register is important. I think an endianness comment would make things clearer in this case. We do not search for "SMAP" in memory, we just pass it to the BIOS, but the developer still thinks of it as a string, where the order of bytes has significance.

In other words, we do not load "SMAP" from memory, but if we were, the order of bytes would be important.

I think I have settled on commenting these (hexadecimal, in my case) values in my code by pointing out when the endianness does not match the machine's endianness, as well as providing the string they represent.

It would be nice if I could just put string literals into registers, as commented by BenLunt, but it does not seem that GNU as lets me.

BenLunt wrote: ↑ Wed Jul 23, 2025 3:40 pm To have clarity, it could state "RSDT" as little-endian with the 'T' at the lower address and the 'R' at the higher address or vice versa if different -endian.

Wouldn't little-endian mean 'R' in the low address and 'T' in the high address?

Re: E820 code example comment incorrect?

Posted: Thu Jul 24, 2025 10:51 am

by BenLunt

sandras wrote: ↑ Thu Jul 24, 2025 5:29 am

BenLunt wrote: ↑ Wed Jul 23, 2025 3:40 pm To have clarity, it could state "RSDT" as little-endian with the 'T' at the lower address and the 'R' at the higher address or vice versa if different -endian.

Wouldn't little-endian mean 'R' in the low address and 'T' in the high address?

Again, this would depend on the assembler. Let's look at it a little differently.

If I did:

Code: Select all

 mov eax,0x12345678

Wouldn't the 0x78 (the same position as the 'T' above) be placed at the "lower address" (AL) and the 0x12 (the same position as the 'R') be placed in the "upper address", bits 31:24 of EAX?

It all depends on how your assembler translates "RSDT" to an immediate value before it places it in the register, or memory operand.
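
A small sketch of the 0x12345678 case, written in Python only to do the arithmetic (the variable names are made up for illustration). It extracts the byte that lands in AL and the byte in bits 31:24, then shows the layout a little-endian store of the register would produce.

```python
value = 0x12345678  # the immediate loaded into EAX

al = value & 0xFF                # bits 7:0, i.e. AL
top_byte = (value >> 24) & 0xFF  # bits 31:24

# Byte layout if EAX were then stored to little-endian memory,
# listed from the lowest address upward.
memory_image = value.to_bytes(4, "little")

print(hex(al), hex(top_byte))  # 0x78 0x12
print(memory_image.hex(" "))   # 78 56 34 12
```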

Ben
- https://www.fysnet.net/leanfs/index.php

Re: E820 code example comment incorrect?

Posted: Thu Jul 24, 2025 11:11 am

by sandras

We're talking about endianness here, right? Endianness is a computer architecture concept, not an assembler concept.

Wikipedia article on endianness.

Re: E820 code example comment incorrect?

Posted: Thu Jul 24, 2025 5:45 pm

by BenLunt

I think there is a little misunderstanding here. Let me try to explain what I mean.

You originally posted:

Code: Select all

   mov edx, 0x0534D4150 ; Place "SMAP" into edx

With the question:

Since x86 is little endian, wouldn't the magic value be "PAMS", not "SMAP"?

A comment or documentation stating to place "SMAP" (which is short for "System Map", if memory serves) into 'edx' relies 100% on the endianness of the assembler. That same comment or documentation stating to place 0x534D4150 into 'edx' relies only on the endianness of the architecture and not the assembler. I believe we both agree on this statement.

Many documents will state to place a string literal into a register or memory operand giving the human readable format of, for this example, "SMAP". Then, hopefully, explain that the 'P' is the low-byte and the 'S' is the high byte (or the other way around), but some do not, unfortunately.

What I think is the confusion is with the following assembly line:

Code: Select all

   mov edx,"SMAP" ; Place "SMAP" into edx

With your question in mind, the above line 100% relies upon the function of the assembler translating "SMAP" to 0x534D4150 or 0x50414D53.

If the assembler translates "SMAP" to 0x50414D53, in my opinion, this is considered a big-endian translation. When written to memory the 'S' is stored first.

If the assembler translates "SMAP" to 0x534D4150, in my opinion, this is considered a little-endian translation. When written to memory the 'P' is stored first.

(Remembering that we are discussing the target to be a little-endian architecture)
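
The two possible translations can be modelled with a short Python sketch. The function names below are hypothetical and deliberately neutral; they do not correspond to any real assembler's option or syntax. For each translation, the sketch also reports which character would be stored first on a little-endian target.

```python
def pack_first_char_high(text: str) -> int:
    """First character ends up in the most significant byte."""
    return int.from_bytes(text.encode("ascii"), "big")

def pack_first_char_low(text: str) -> int:
    """First character ends up in the least significant byte."""
    return int.from_bytes(text.encode("ascii"), "little")

def first_byte_in_memory(value: int) -> str:
    """Character at the lowest address after a 32-bit little-endian store."""
    return chr(value.to_bytes(4, "little")[0])

for pack in (pack_first_char_high, pack_first_char_low):
    v = pack("SMAP")
    print(pack.__name__, hex(v), first_byte_in_memory(v))
# pack_first_char_high 0x534d4150 P
# pack_first_char_low 0x50414d53 S
```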

Therefore, look at the following code: (no specific assembler stated)

Code: Select all

 mov eax,[some_offset_where_we_read_the_SMAP_value]
 cmp eax,"SMAP"

Since Intel is a little-endian machine, the above statement may fail. However,

Code: Select all

 mov eax,[some_offset_where_we_read_the_SMAP_value]
 cmp eax,"PAMS"

May not fail.
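
The two comparisons can be simulated in Python. This sketch assumes, purely for illustration, that the memory location holds the characters S, M, A, P at increasing addresses; the thread does not say what is actually stored there.

```python
import struct

# Assumed memory contents, lowest address first.
memory = b"SMAP"

# mov eax, [addr] on x86: a 32-bit little-endian load.
(eax,) = struct.unpack("<I", memory)

print(hex(eax))           # 0x50414d53
print(eax == 0x50414D53)  # True:  immediate with 'S' in the low byte
print(eax == 0x534D4150)  # False: immediate with 'S' in the high byte
```

Which of the two immediates a quoted "SMAP" or "PAMS" turns into is exactly the assembler-dependent part.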

The question becomes: does the author prefer the first or the second block of code?

So, to get back to the original question, please note that you did not ask:

Since x86 is little endian, wouldn't the magic value be 0x534D4150, not 0x50414D53?

You asked about the string literal's endianness, not the value's endianness.

Therefore, my comment was/is, if you use the string literal instead of the number, make sure the assembler knows which endianness you desire.
In my assembler, the following two lines produce two separate machine code instructions:

Code: Select all

   mov edx,'SMAP' ; Place 0x534D4150 into edx
   mov edx,"SMAP" ; Place 0x50414D53 into edx

Therefore, I would prefer to use:

Code: Select all

   mov edx,'SMAP'

Instead of:

Code: Select all

   mov edx,'PAMS'

Since the documentation states to use "SMAP" as a string literal *and* it is easier to read the first block instead of the second block.

Where this really comes into play is when you are on a little-endian machine, using a little-endian assembler, trying to read from a memory block that uses big-endian values, such as some of the descriptors in CD-ROMs. So the following code can be used to read the little-endian values and big-endian values using the same string literal:

Code: Select all

   cmp dword ptr [ebx+123], 'SMAP' ; does the little-endian signature match what we want?
   cmp dword ptr [ebx+123], "SMAP" ; does the big-endian signature match what we want?
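
As a rough illustration of reading both byte orders from one buffer, here is a Python sketch. The buffer is made up for the example and is not a real CD-ROM descriptor; it simply records the same number twice, little-endian first and big-endian second, in the style of the both-endian fields used by ISO 9660.

```python
import struct

sector_count = 0x00012345  # made-up value for illustration

# Same number recorded twice: little-endian first, then big-endian.
field = struct.pack("<I", sector_count) + struct.pack(">I", sector_count)

# Read each copy with the byte order it was written in.
(le_copy,) = struct.unpack_from("<I", field, 0)
(be_copy,) = struct.unpack_from(">I", field, 4)

print(field.hex(" "))                      # 45 23 01 00 00 01 23 45
print(le_copy == be_copy == sector_count)  # True
```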

Therefore, depending on the assembler you use, make sure that if you use:

Code: Select all

   mov edx,"SMAP" ; Place "SMAP" into edx

the assembler uses the correct endianness, which in this case is 100% the endianness of the assembler.

So when I say "check the endianness of the assembler", I mean: how does the assembler translate "SMAP" into a dword value?

Does this make sense now? I am sorry for the confusion.

Ben

P.S. This is not assembler specific. This same issue can be found in compilers as well. https://learn.microsoft.com/en-us/cpp/c ... w=msvc-170. When using literals in this way, it is good to make sure you understand what the compiler will translate the following into:

Code: Select all

 auto m0 = 'abcd'; // int, value 0x61626364

Re: E820 code example comment incorrect?

Posted: Fri Jul 25, 2025 1:56 am

by iansjack

Code: Select all

auto m0 = 'abcd'; // int, value 0x61626364

Well, to my mind that is extremely poor code and you deserve any consequent confusion.

I don't know about other compilers, but g++ rightly warns you against such misuse.

Re: E820 code example comment incorrect?

Posted: Fri Jul 25, 2025 2:45 pm

by BenLunt

iansjack wrote: ↑ Fri Jul 25, 2025 1:56 am

Code: Select all

auto m0 = 'abcd'; // int, value 0x61626364

Well, to my mind that is extremely poor code and you deserve any consequent confusion.

The 'auto', the 'abcd', or both? :-)

Re: E820 code example comment incorrect?

Posted: Sat Jul 26, 2025 2:43 am

by sandras

sandras wrote: ↑ Thu Jul 24, 2025 5:29 am

BenLunt wrote: ↑ Wed Jul 23, 2025 3:40 pm To have clarity, it could state "RSDT" as little-endian with the 'T' at the lower address and the 'R' at the higher address or vice versa if different -endian.

Wouldn't little-endian mean 'R' in the low address and 'T' in the high address?

BenLunt, you have explained the confusion occurring in the quote above, and I think I understand what you are saying. However, I'm still of the opinion that you are using the term "endianness" in an inappropriate context. That's what caused the confusion in the first place, I believe. I think it would be better to speak of byte order, not endianness, in the context of an assembler.

As for this thread, I think I got what I needed out of it. Thanks, both of you. : )

Re: E820 code example comment incorrect?

Posted: Sat Jul 26, 2025 5:00 am

by iansjack

The "auto" is fine - it’s more than fine. But using the multi-character constant is horrible for all the reasons that have been discussed in this thread.