
As part of the prologue of all my console applications, I need to determine the extents of the current terminal so that, if there are fewer than 132 columns or 43 lines, the user can be warned that output may not appear as expected. The code has been tested with:

$ AppName /usr/include/*.h

which passes 112 arguments to the process.

Assemble and link as follows, with source being whatever name you want to give the app:

~$ nasm -felf64 source.asm -osource.o
~$ ld source.o -osource

Essentially, what I am going for is contiguous flow with the fewest instructions. Time is a consideration, but it is the least important one, especially since, if my calculations are near correct, this procedure comes in at 4.18 microseconds.

 USE64
 global _start
 section .text 
; *----* *----* *----* *----* *----* *----* *----* *----* *----* *----* *----*
 _start:
 %define argc [rbp+ 8]
 %define args [rbp+16]
 mov rsi, rsp ; Establish pointer to argc.
 push rbp ; So argc & **args can easily be addressed
 mov rbp, rsp ; via base pointer.
; This application expects a minimum 132 x 43 terminal. If this session's metrics
; are less than that, the operator needs to be made aware that output to the screen
; may not be as expected.
 ; [A] Establish a pointer to the array of QWORD pointers to environment
 ; strings. It is determined by &argc + (argc+1) * 8
 lodsq ; Determine # of args passed via command-line
 inc eax ; Bump argument count
 shl rax, 3 ; Multiply by 8
 add rsi, rax ; Add result to &argc
 ; [B] Initialize the two registers needed for the loop that determines
 ; matching entries.
 mov edi, Metrics ; Pntr to the two strings that need to be found.
 ; RDX Bits 07 - 00 = Count of environment variables.
 ; 15 - 08 = Columns defined by "COLUMNS=".
 ; 23 - 16 = Rows " " "LINES=".
 xor edx, edx
 mov ecx, edx ; Should be zero, but just to be safe.
 FindMatch:
 lodsq ; Get pointer to next environment string.
 test eax, eax ; NULL pointer indicates end of array.
 jnz .cont
 ; Now RBP - 1 = Count of environment strings
 ; RBP - 2 = Current display columns
 ; RBP - 3 = rows
 mov [rbp-4], edx
 jmp .done
 .cont:
 inc dl ; Bump count of environment strings.
 mov ecx, 6 ; Length of first string.
 mov bl, [rax] ; Get first character.
 ; Determine if this string begins with either 'L' or 'C'.
 cmp bl, 'L'
 jz .cmpstr
 cmp bl, 'C'
 jnz FindMatch
 push rdi
 add edi, ecx ; Bump to point to next string
 add cl, 2 ; and it is 2 characters longer
 jmp .cmpstr + 1 ; No need to save RDI again
 ; Now that the first character matches, determine if the remaining
 ; characters match for a count of CL.
 .cmpstr:
 push rdi
 push rsi
 mov rsi, rax ; Move pointer to string into source index.
 repz cmpsb ; Compare strings for count of CL.
 jnz .nextone ; Does not match? Carry on.
 mov rax, rcx ; Both registers are NULL now.
 .L0: lodsb ; Read ASCII decimal digit.
 test eax, eax
 jz .J0
 ; Convert ASCII decimal digits to binary. As it is safe to assume we will
 ; only be expecting characters '0' - '9', this works quite effectively.
 and al, 15 ; Strip high nibble
 imul ecx, 10
 add ecx, eax
 jmp .L0
 ; Determine which position result will be written based on which
 ; calculation was done
 .J0: shl ecx, 16 ; Assume value is # of rows.
 cmp byte [rdi], 0
 jnz $ + 5
 shr ecx, 8 ; Move back into columns position.
 or edx, ecx ; Copy to appropriate position in RDX
 .nextone:
 pop rsi
 pop rdi ; Restore pointer to array of pointers.
 jmp FindMatch
 .done:
 shr edx, 8
 sub dx, 0x2b84 ; Equivalent to DH = 43 & DL = 132
 test dx, 0x8080 ; Result equal negative in either 8 bit register
 jz ParseCmdLine
 ; TODO -> Put some kind of prompting here for user to respond to.
 ParseCmdLine:
 ; TODO -> Implement something similar to optarg.
 Exit:
 leave ; Kill empty procedure frame
 xor edi, edi ; Set return code EXIT_SUCCESS
 mov eax, sys_exit
 syscall ; Terminate application
 section .rodata
; =============================================================================
 Metrics db 'LINES='
 db 'COLUMNS=',0,0 ; So next is page aligned.
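For readers more at home in a higher-level language, here is a minimal C++ sketch of what the loop above computes, assuming an envp-style array of "NAME=value" strings terminated by a null pointer. The names TermSize, ascii_to_uint, scan_env and needs_warning are hypothetical. It keeps the same digit conversion (mask off the high nibble, then value * 10 + digit), but stores each result in its own unsigned int rather than packing both into single bytes of EDX, and it uses a plain comparison in place of the packed sub dx, 0x2b84 / test dx, 0x8080 check.

```cpp
#include <cstring>

// Hypothetical result type: rows from LINES=, columns from COLUMNS=.
struct TermSize {
    unsigned rows = 0;
    unsigned cols = 0;
};

// Same conversion as the assembly: strip the high nibble of each ASCII
// character and accumulate value * 10 + digit. Like the original, it
// assumes only '0'..'9' follow the '='.
inline unsigned ascii_to_uint(const char* p) {
    unsigned value = 0;
    for (; *p != '\0'; ++p)
        value = value * 10 + (static_cast<unsigned char>(*p) & 15u);
    return value;
}

// Walk a null-terminated array of "NAME=value" strings looking for the
// two prefixes. Including '=' in the prefix avoids matching e.g. "LINESX=".
inline TermSize scan_env(const char* const* envp) {
    TermSize ts;
    for (; *envp != nullptr; ++envp) {
        const char* s = *envp;
        if (std::strncmp(s, "LINES=", 6) == 0)
            ts.rows = ascii_to_uint(s + 6);
        else if (std::strncmp(s, "COLUMNS=", 8) == 0)
            ts.cols = ascii_to_uint(s + 8);
    }
    return ts;
}

// True when the terminal is smaller than the expected 132 x 43.
inline bool needs_warning(const TermSize& t) {
    return t.rows < 43 || t.cols < 132;
}
```

With an environment containing "LINES=50" and "COLUMNS=120", scan_env reports 50 rows and 120 columns, and needs_warning returns true because of the column count.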
Sᴀᴍ Onᴇᴌᴀ
asked Oct 23, 2019 at 15:03
  • Why assembly? If it's for learning, fine. If it's because you think you can beat the performance of an optimizing compiler, then... I find that suspect, to put it lightly. Commented Oct 24, 2019 at 1:46
  • @Reinderien It is nothing more than a hobby and a relaxing means by which to program and share my invocations with others. However it would be monumentally educational if someone was to implement an HLL version, but of the several times I've suggested this over the years, it's never come to fruition. Why, I don't know, but I suspect it can't be done. Commented Oct 24, 2019 at 3:16
  • Nice 😁 I neglected to include "for fun", because apparently I've become a stick in the mud. Commented Oct 24, 2019 at 3:19
  • Do you have the tput command on Linux? It's a one-liner using that command, and on NetBSD the source code for the tput command is not that complicated either. Written in C, it's probably 20 lines of code. Commented Oct 25, 2019 at 18:57
  • Please see What to do when someone answers. I have rolled back that last edit to the code. Commented Oct 26, 2019 at 0:09

4 Answers


Here are some things that may help you improve your program

Use consistent formatting

The code as posted has irregular indentation, making it not so easy to read. Assembly language programs are typically very linear and neat. Also, I personally don't use tab characters in my code so that it looks the same everywhere (including printing), but that's a personal preference.

Provide the complete program

The program is missing the definition of sys_exit (which should have a value of 60). I'd suggest also telling reviewers how you've compiled and linked the program. Here's what I used:

nasm -o rowcol.o -f elf64 rowcol.asm
ld -o rowcol rowcol.o

Document register use

The comments in your program are generally quite good, but one thing lacking is documentation on how the registers are being used, which is one of the most important aspects to assembly language programming. The x86 architecture is unlike many others in that particular instructions require particular registers. For that reason, it's useful to identify when you'll need to use such instructions and base the register usage around that.

Avoid slow instructions

Although special-purpose instructions such as loop and repnz scasb seem appealing, they are, in fact, relatively slow. Instead, it's usually much faster (and not that many more code bytes) to do things with the more generic instructions.

Use address multipliers for efficiency

We can greatly simplify getting a pointer to the environment list into a register:

mov rbp, rsp ; use rbp for stack pointer
mov rcx, [rbp + 0] ; get argc
lea rbx, [rbp+16+8*rcx] ; rbx now points to env (skips argc, argv[] and argv's NULL)
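Here is a small C++ model of the initial stack that makes the slot arithmetic explicit, with each 8-byte slot stored as one array element. The names env_slot and env_byte_offset are hypothetical. The layout modelled (argc, the argv pointers, a null terminator, then the environment pointers) is the one Linux sets up for x86-64 processes at _start.

```cpp
#include <cstdint>

// Initial stack as seen at _start (RSP points at argc), 8 bytes per slot:
//   [argc][argv[0]] ... [argv[argc-1]][NULL][envp[0]] ... [NULL]
// `sp` is a hypothetical pointer to the argc slot.
inline const std::uint64_t* env_slot(const std::uint64_t* sp) {
    const std::uint64_t argc = sp[0];
    // Skip argc itself, the argc argv pointers, and argv's NULL terminator.
    return sp + 1 + argc + 1;
}

// The same position as a byte offset from the argc slot, i.e. the
// displacement and scale an addressing mode like [base + 16 + 8*argc] uses.
inline std::uint64_t env_byte_offset(std::uint64_t argc) {
    return 16 + 8 * argc;
}
```

For argc = 2 the environment array starts at slot 4, which is byte offset 32.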

Understand environment variables

In Linux, there is a difference between shell variables and environment variables. Environment variables are what your program is searching, but LINES and COLUMNS are shell variables that are set by the shell and typically not exported to the environment. See this question for details.
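Because LINES and COLUMNS are usually not exported, a program that reads them has to cope with their absence. A minimal C++ sketch of that lookup using std::getenv and std::strtol; the helper name env_long is hypothetical:

```cpp
#include <cstdlib>
#include <optional>

// Hypothetical helper: read an integer-valued environment variable.
// Returns std::nullopt if the variable is missing or empty (the common case
// for LINES/COLUMNS), or if it is not a clean decimal number.
inline std::optional<long> env_long(const char* name) {
    const char* value = std::getenv(name);
    if (value == nullptr || *value == '\0')
        return std::nullopt;
    char* end = nullptr;
    const long n = std::strtol(value, &end, 10);
    if (*end != '\0')
        return std::nullopt;  // trailing garbage such as "12x"
    return n;
}
```

Run from an ordinary interactive shell, env_long("COLUMNS") will often return std::nullopt unless the variable has been exported, which is why a fallback (or a different mechanism altogether) is needed.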

Use an IOCTL

The reliable way to get the screen dimensions in Linux is to invoke the TIOCGWINSZ ioctl call. In C++ it might look like this:

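#include <sys/ioctl.h>
#include <unistd.h>
#include <cstdio>
#include <iostream>
int main () {
 struct winsize w;
 // Fails (e.g. with ENOTTY) if stdout is redirected to a file or pipe.
 if (ioctl(STDOUT_FILENO, TIOCGWINSZ, &w) == -1) {
  std::perror("ioctl(TIOCGWINSZ)");
  return 1;
 }
 std::cout << "lines = " << w.ws_row << "\ncolumns = " << w.ws_col << '\n';
}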

So we just need to put that into assembly language. First, some constants:

sys_ioctl equ 0x10
STDOUT_FILENO equ 1
TIOCGWINSZ equ 0x5413

Now the winsize structure:

struc winsize
 .ws_row: resw 1
 .ws_col: resw 1
 .ws_xpixel: resw 1
 .ws_ypixel: resw 1
endstruc
section .bss
w resb winsize_size ; allocate enough for the struc
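To confirm that this struc matches the C definition, the layout can be checked at compile time. A C++ sketch, assuming Linux's <sys/ioctl.h>, which declares struct winsize with four unsigned short fields in this order:

```cpp
#include <cstddef>
#include <sys/ioctl.h>

// Compile-time check that struct winsize matches the NASM struc:
// four 16-bit fields at offsets 0, 2, 4 and 6, 8 bytes in total.
static_assert(offsetof(winsize, ws_row) == 0, "ws_row at offset 0");
static_assert(offsetof(winsize, ws_col) == 2, "ws_col at offset 2");
static_assert(offsetof(winsize, ws_xpixel) == 4, "ws_xpixel at offset 4");
static_assert(offsetof(winsize, ws_ypixel) == 6, "ws_ypixel at offset 6");
static_assert(sizeof(winsize) == 8, "winsize is 8 bytes");
```

If any of these assertions failed on some platform, the resw layout above would need to change to match.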

Finally the call:

mov edx, w
mov esi, TIOCGWINSZ
mov edi, STDOUT_FILENO
mov eax, sys_ioctl
syscall
; do stuff with window size...

If the call was successful (that is, if eax is 0) then the winsize structure is filled in with the current dimensions.
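Note that the raw syscall reports failure by returning a negative errno value in rax (for example -ENOTTY when standard output is redirected to a file), whereas the libc ioctl() wrapper returns -1 and sets errno. A C++ sketch of the same query with the check made explicit; query_winsize is a hypothetical helper name:

```cpp
#include <cerrno>
#include <optional>
#include <sys/ioctl.h>

// Hypothetical helper: returns the window size for `fd`, or std::nullopt
// if the ioctl fails (EBADF for a bad descriptor, ENOTTY if `fd` is not a
// terminal). The errno value is optionally reported through `err`.
inline std::optional<winsize> query_winsize(int fd, int* err = nullptr) {
    winsize w{};
    if (ioctl(fd, TIOCGWINSZ, &w) == -1) {
        if (err != nullptr)
            *err = errno;
        return std::nullopt;
    }
    return w;
}
```

Passing an invalid descriptor such as -1 yields std::nullopt with errno set to EBADF.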

answered Oct 25, 2019 at 19:02
  • Please provide a little more detail in regard to indentation. Documenting has always been a problem. I think what I should start doing is writing a large block, getting it working the way I want, and then documenting it. The tip on winsize is going to shave off many bytes. Commented Oct 25, 2019 at 23:40
  • I see what you mean by the indentation; if you load the code into an editor set for tabs of 8, it is a real mess. When I've implemented TIOCGWINSZ I will make sure to replace tabs with spaces. Commented Oct 26, 2019 at 0:05

A code-size optimization

If you move the mov edi, Metrics instruction to just below the FindMatch label and thus have it repeat with each iteration, you can remove 4 instructions from the code. I've marked these with an exclamation mark:

 xor edx, edx
 mov ecx, edx
 FindMatch:
 mov edi, Metrics ;Restore it from here
 lodsq 
! push rdi
 add edi, ecx
 add cl, 2
! jmp .cmpstr + 1 ; No need to save RDI again
 .cmpstr:
! push rdi
 push rsi
 ...
 .nextone:
 pop rsi
! pop rdi ; Restore pointer to array of pointers.
 jmp FindMatch

cmp bl, 'L'
jz .cmpstr
cmp bl, 'C'

Are these environment strings guaranteed to be in uppercase?

answered Oct 25, 2019 at 16:55
  • I believe they have been and always will be uppercase, although I don't have anything specific to back that up. @Edward pointing me toward TIOCGWINSZ will probably see that part replaced anyway. Commented Oct 25, 2019 at 23:13
  • My first revision implemented your example, but I decided to trade space for speed, as moving from memory takes 6 cycles and push/pop take only one. I figure on my machine that saves about 17 microseconds, but if I were to do that in a thousand places it would amount to 17 milliseconds. Commented Oct 25, 2019 at 23:17

As a result of the alternate method outlined by Edward, overhead has been reduced from 168 bytes to 56, a two-thirds (about 67%) reduction.

~$ nasm -felf64 appname.asm -oappname.o
~$ ld appname.o -oappname

 USE64
 TIOCGWINSZ equ 0x5413
 STDOUT_FILENO equ 1
 sys_ioctl equ 16
 sys_exit equ 60
 global _start
 section .text
; =============================================================================
 _start:
 %define argc [rbp+ 8]
 %define args [rbp+16]
 push rbp ; So argc & **args can easily be
 mov rbp, rsp ; addressed via base pointer.
 xor eax, eax
 mov edx, winsize ; Point to structure.
 mov esi, TIOCGWINSZ ; Read structure.
 mov edi, eax
 mov di, STDOUT_FILENO
 mov al, sys_ioctl
 syscall
 test ax, ax ; If there is an error, just bail,
 jnz Exit ; as the likelihood is slim to none.
 ; ws_xpixel & ws_ypixel are of no consequence, so they will be overwritten
 ; with condition bits. A colon denotes the bit position.
 ; ws_xpixel:0 = 1 if the window has fewer than 43 rows.
 ; ws_xpixel:1 = 1 if it has fewer than 132 columns.
 cld ; Just to be sure of auto increment.
 mov esi, edx ; Move to source index for LODSW.
 mov edx, eax ; Applications status bits (flags).
 lodsw ; Read rows from ws_row.
 sub ax, 43 ; Minimum rows expected.
 jns $ + 5 ; Skips over next instruction.
 or dl, 1 ; Set bit zero (rows below minimum).
 lodsw ; Read columns from ws_col
 sub ax, 132 ; Minimum columns expected.
 jns $ + 5 ; Skips over next instruction.
 or dl, 2 ; Set bit one (columns below minimum).
 ; Save new data where ws_xpixel was and erase any extraneous
 ; data @ ws_ypixel
 mov [rsi], edx ; Overwrite ws_xpixel & ws_ypixel.
 Exit: leave ; Kill empty procedure frame.
 xor edi, edi ; Set return code EXIT_SUCCESS.
 mov eax, sys_exit
 syscall ; Terminate application
 section .bss
; =============================================================================
 winsize:
 .ws_row resw 1
 .ws_col resw 1
 .ws_xpixel resw 1
 .ws_ypixel resw 1
answered Oct 26, 2019 at 4:57
  • Jumps often take longer, and jumps without explicitly named targets are a recipe for future frustration. (What happens if you add an instruction?) So I'd recommend doing this without jumps instead. Remember that a cmp instruction conditionally sets the carry flag; we can use that fact to produce a branchless version of the code: xor edx,edx cmp word [winsize.ws_col], 132 adc edx,edx shl edx,1 cmp word [winsize.ws_row], 43 adc edx,0 Commented Oct 26, 2019 at 16:37
  • That sets the edx register exactly the same way your code did. Commented Oct 26, 2019 at 16:38
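A C++ sketch for checking that the branch-free cmp/adc sequence produces the same flags as the jns version (bit 0 set for fewer than 43 rows, bit 1 for fewer than 132 columns). The function names are hypothetical. The jns version tests the sign of a 16-bit difference, so the two would only diverge for dimensions in the tens of thousands; and since a compiler may turn either function into branch-free code, this checks the logic rather than the instruction selection.

```cpp
#include <cstdint>

// Flag layout from the answer: bit 0 = fewer than 43 rows,
// bit 1 = fewer than 132 columns.
inline unsigned flags_branchy(std::uint16_t rows, std::uint16_t cols) {
    unsigned f = 0;
    if (rows < 43) f |= 1;
    if (cols < 132) f |= 2;
    return f;
}

// Mirrors the cmp/adc sequence: for unsigned operands, "a < b" is exactly
// the carry (borrow) that "cmp a, b" leaves behind, and adc adds it in.
inline unsigned flags_branchless(std::uint16_t rows, std::uint16_t cols) {
    unsigned f = 0;             // xor edx, edx
    f = f + f + (cols < 132);   // cmp ws_col, 132 / adc edx, edx
    f <<= 1;                    // shl edx, 1
    f = f + 0 + (rows < 43);    // cmp ws_row, 43 / adc edx, 0
    return f;
}
```

Both return 3 for a 42 x 131 window and 0 for 43 x 132.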

Edward wrote:

Jumps often take longer, and jumps without explicitly named targets are a recipe for future frustration.

Yes, I remember the days when I used to spend hours for that very reason, but it's become such a habit now that whenever I anticipate a change and there isn't an explicit reference, I look back through the code to see where that register was initialized. What I plan on doing in the future is commenting as follows:

 cld ; Just to be sure indices auto increment.
; RDX still points to the winsize structure passed to the preceding
; sys_ioctl call (TIOCGWINSZ), and that call left RAX at zero.
 cmp word [edx+2], 132 ; Expect a minimum 132 columns
 adc al, al
 shl al, 1 ; Move to next bit position
 cmp byte [edx], 43 ; Expect a minimum 43 rows
 adc al, 0
; Save new data where ws_xpixel was and erase any extraneous
; data @ ws_ypixel
 mov [edx+4], eax ; Overwrite ws_xpixel & ws_ypixel.

I think this would be a step in the right direction, as those reading my code wouldn't have to search all over for where a register was set. This example saves another 5 bytes by using implicit references instead of explicit ones.

This change realized a significant size saving and, by extension, a speed saving.

 22: 89 c2 mov edx,eax
 24: 66 ad lods ax,WORD PTR ds:[rsi]
 26: 66 83 e8 2b sub ax,0x2b
 2a: 79 03 jns 2f <_start+0x2f>
 2c: 80 ca 01 or dl,0x1
 2f: 66 ad lods ax,WORD PTR ds:[rsi]
 31: 66 2d 84 00 sub ax,0x84
 35: 79 03 jns 3a <_start+0x3a>
 37: 80 ca 02 or dl,0x2
 3a: 89 16 mov DWORD PTR [rsi],edx
 0x3c - 0x22 = 26 bytes

versus

 20: 66 67 81 7a 02 84 00 cmp WORD PTR [edx+0x2],0x84
 27: 10 c0 adc al,al
 29: d0 e0 shl al,1
 2b: 67 80 3a 2b cmp BYTE PTR [edx],0x2b
 2f: 14 00 adc al,0x0
 31: 67 89 42 04 mov DWORD PTR [edx+0x4],eax
 0x35 - 0x20 = 21 bytes

Had I used explicit references, the size saving would have been completely negated, but speed is still significantly improved in either case.

answered Oct 26, 2019 at 18:39
  • Using a BYTE PTR for the lines count may save one byte, but IMHO it's a poor bargain because it's a latent bug if anyone uses a screen that has 256 or more lines. Commented Oct 26, 2019 at 19:33
  • @Edward Very interesting point, as it hadn't dawned on me that if someone were to take a 16:9 monitor, use it in portrait mode and change the resolution, the row count could be as high as 475. I've changed the code accordingly, as those kinds of bugs are really hard to find. Commented Oct 26, 2019 at 20:59
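The latent bug comes from comparing only the low byte of the 16-bit ws_row field. A C++ sketch of the two comparisons; both function names are hypothetical:

```cpp
#include <cstdint>

// What "cmp byte [edx], 43" effectively tests: only the low 8 bits of ws_row.
inline bool too_few_rows_byte(std::uint16_t rows) {
    return static_cast<std::uint8_t>(rows) < 43;
}

// What "cmp word [edx], 43" tests: the full 16-bit ws_row field.
inline bool too_few_rows_word(std::uint16_t rows) {
    return rows < 43;
}
```

With 266 rows (0x10A) the low byte is 10, so the byte version reports too few rows while the word version correctly does not.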
