As part of the prologue of all my console applications, I need to determine the extents of the current terminal so that, if there are fewer than 132 columns or 43 lines, the user can be warned that output may not appear as expected. The code has been tested with:
$ AppName /usr/include/*.h
which passes 112 arguments to the process.
Assemble and link as follows, with source being whatever name you want to give the app:
~$ nasm -felf64 source.asm -osource.o
~$ ld source.o -osource
Essentially what I am going for is contiguous flow with the least number of instructions. Time is a consideration, but it is the least important one, especially since, if my calculations are near correct, this procedure comes in at 4.18 microseconds.
USE64
global _start
section .text
; *----* *----* *----* *----* *----* *----* *----* *----* *----* *----* *----*
_start:
%define argc [rbp+ 8]
%define args [rbp+16]
mov rsi, rsp ; Establish pointer to argc.
push rbp ; So argc & **args can easily be addressed
mov rbp, rsp ; via base pointer.
; This application expects a minimum 132 x 43 terminal. If this session's metrics
; are less than that, then the operator needs to be made aware that output to
; the screen may not be as expected.
; [A] Establish a pointer to the array of QWORD pointers to environment
; strings. It is determined by &argc + (argc+1) * 8
lodsq ; Determine # of args passed via command-line
inc eax ; Bump argument count
shl rax, 3 ; Multiply by 8
add rsi, rax ; Add result to &argc
; [B] Initialize the two registers needed for the loop that determines
; matching entries.
mov edi, Metrics ; Pntr to the two strings that need to be found.
; RDX Bits 07 - 00 = Count of environment variables.
; 15 - 08 = Columns defined by "COLUMNS=".
; 23 - 16 = Rows " " "LINES=".
xor edx, edx
mov ecx, edx ; Should be zero, but just to be safe.
FindMatch:
lodsq ; Get pointer to next environment string.
test eax, eax ; NULL pointer indicates end of array.
jnz .cont
; Now RBP - 1 = Count of environment strings
; RBP - 2 = Current display columns
; RBP - 3 = rows
mov [rbp-4], edx
jmp .done
.cont:
inc dl ; Bump count of environment strings.
mov ecx, 6 ; Length of the first string ('LINES=').
mov bl, [rax] ; Get first character.
; Determine if this string begins with either 'L' or 'C'.
cmp bl, 'L'
jz .cmpstr
cmp bl, 'C'
jnz FindMatch
push rdi
add edi, ecx ; Bump to point to next string
add cl, 2 ; and it is 2 characters longer
jmp .cmpstr + 1 ; No need to save RDI again
; Now that the first character matches, determine if the remaining
; do for a count of CL
.cmpstr:
push rdi
push rsi
mov rsi, rax ; Move pointer to string into source index.
repz cmpsb ; Compare strings for count of CL.
jnz .nextone ; Does not match? Carry on.
mov rax, rcx ; Both registers are NULL now.
.L0: lodsb ; Read ASCII decimal digit.
test eax, eax
jz .J0
; Convert ASCII decimal digits to binary. As it is safe to assume we will
; only be expecting characters '0' - '9', this works quite effectively.
and al, 15 ; Strip high nibble
imul ecx, 10
add ecx, eax
jmp .L0
; Determine which position result will be written based on which
; calculation was done
.J0: shl ecx, 16 ; Assume value is # of rows.
cmp byte [rdi], 0
jnz $ + 5
shr ecx, 8 ; Move back into columns position.
or edx, ecx ; Copy to appropriate position in RDX
.nextone:
pop rsi
pop rdi ; Restore pointer to array of pointers.
jmp FindMatch
.done:
shr edx, 8
sub dx, 0x2b84 ; Equivalent to DH = 43 & DL = 132
test dx, 0x8080 ; Result equal negative in either 8 bit register
jz ParseCmdLine
; TODO -> Put some kind of prompting here for user to respond to.
ParseCmdLine:
; TODO -> Implement something similar to optarg.
Exit:
leave ; Kill empty procedure frame
xor edi, edi ; Set return code EXIT_SUCCESS
mov eax, sys_exit
syscall ; Terminate application
section .rodata
; =============================================================================
Metrics db 'LINES='
db 'COLUMNS=',0,0 ; So next is page aligned.
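For readers less used to assembly, the digit loop at .L0 in the listing above can be modelled in C++ roughly as follows. This is only a sketch: the name parse_decimal is made up for illustration, and, like the listing, it assumes the value contains nothing but the characters '0' to '9'.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the .L0 loop: accumulate ASCII decimal
// digits until the terminating NUL. Like the assembly, it does no
// validation and simply masks off the high nibble of each character.
std::uint32_t parse_decimal(const char* s) {
    std::uint32_t value = 0;
    while (*s != '\0') {
        // and al, 15 : keep only the low nibble of the character
        std::uint32_t digit = static_cast<unsigned char>(*s) & 15u;
        // imul ecx, 10 / add ecx, eax
        value = value * 10u + digit;
        ++s;
    }
    return value;
}
```

Because nothing is validated, a stray character such as 'x' (0x78) would silently contribute 8 to the result, which is why the assumption about the input matters.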
4 Answers
Here are some things that may help you improve your program.
Use consistent formatting
The code as posted has irregular indentation, making it not so easy to read. Assembly language programs are typically very linear and neat. Also, I personally don't use tab characters in my code so that it looks the same everywhere (including printing), but that's a personal preference.
Provide the complete program
The program is missing the definition of sys_exit (which should have a value of 60). I'd suggest also telling reviewers how you've compiled and linked the program. Here's what I used:
nasm -o rowcol.o -f elf64 rowcol.asm
ld -o rowcol rowcol.o
Document register use
The comments in your program are generally quite good, but one thing lacking is documentation on how the registers are being used, which is one of the most important aspects to assembly language programming. The x86 architecture is unlike many others in that particular instructions require particular registers. For that reason, it's useful to identify when you'll need to use such instructions and base the register usage around that.
Avoid slow instructions
Although special-purpose instructions such as loop and repnz scasb seem appealing, they are, in fact, relatively slow. Instead, it's usually much faster (and not that many more code bytes) to do things with the more generic instructions.
Use address multipliers for efficiency
We can greatly simplify getting a pointer to the environment list into a register:
mov rbp, rsp ; use rbp for stack pointer
mov rcx, [rbp + 0] ; get argc
lea rbx, [rbp+16+8*rcx] ; skip argc, argv[0..argc-1] and argv's NULL; rbx now points to env
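As a cross-check of the address arithmetic, here is a small C++ model of the initial stack on x86-64 Linux, counted in 8-byte slots. It is a sketch only, and env_slot and env_from_stack are names invented for this illustration. The question's own comment gives the same figure: after stepping past argc, add (argc+1) * 8.

```cpp
#include <cstddef>
#include <cstdint>

// Initial stack image at _start on x86-64 Linux, in 8-byte slots:
//   slot 0            argc
//   slots 1..argc     argv[0] .. argv[argc-1]
//   slot argc+1       NULL terminating argv
//   slot argc+2       first environment pointer
// env_slot() is a hypothetical name used only for this illustration.
inline std::size_t env_slot(std::size_t argc) {
    return argc + 2;
}

// Given a pointer to the argc slot, return a pointer to the env array.
inline const std::uintptr_t* env_from_stack(const std::uintptr_t* stack) {
    return stack + env_slot(static_cast<std::size_t>(stack[0]));
}
```

In byte terms the environment array therefore starts at 16 + 8*argc from the address of argc, so with a single argument (argc = 1) it begins 24 bytes in.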
Understand environment variables
In Linux, there is a difference between shell variables and environment variables. Environment variables are what your program is searching, but the LINES and COLUMNS variables are shell variables that are set by the shell but typically not exported as environment variables. See this question for details.
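To see the effect from a program's point of view, a lookup along the following lines can be used. It is a minimal sketch: env_dimension is a hypothetical helper, and it returns 0 both when the variable is absent and when it does not hold a positive decimal number.

```cpp
#include <cstdlib>

// Hypothetical helper: read a positive integer from an environment variable.
// Returns 0 when the variable is not in the environment (the usual case for
// LINES and COLUMNS unless the shell exports them) or is not a positive
// decimal number.
long env_dimension(const char* name) {
    const char* text = std::getenv(name);
    if (text == nullptr || *text == '\0') {
        return 0;
    }
    char* end = nullptr;
    long value = std::strtol(text, &end, 10);
    if (*end != '\0' || value <= 0) {
        return 0;
    }
    return value;
}
```

Calling env_dimension("COLUMNS") from a program started by a typical interactive shell will often give 0 unless the user has explicitly exported the variable, which is the problem described above.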
Use an IOCTL
The reliable way to get the screen dimensions in Linux is to invoke the TIOCGWINSZ ioctl call. In C++ it might look like this:
#include <sys/ioctl.h>
#include <unistd.h>
#include <iostream>
int main () {
struct winsize w;
ioctl(STDOUT_FILENO, TIOCGWINSZ, &w);
std::cout << "lines = " << w.ws_row << "\ncolumns = " << w.ws_col << '\n';
}
So we just need to put that into assembly language. First, some constants:
sys_ioctl equ 0x10
STDOUT_FILENO equ 1
TIOCGWINSZ equ 0x5413
Now the winsize structure:
struc winsize
.ws_row: resw 1
.ws_col: resw 1
.ws_xpixel: resw 1
.ws_ypixel: resw 1
endstruc
section .bss
w resb winsize_size ; allocate enough for the struc
Finally the call:
mov edx, w
mov esi, TIOCGWINSZ
mov edi, STDOUT_FILENO
mov eax, sys_ioctl
syscall
; do stuff with window size...
If the call was successful (that is, if eax is 0) then the winsize structure is filled in with the current dimensions.
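The success check matters because the call fails whenever the descriptor is not a terminal, for example when output is redirected to a file. At the C++ level ioctl reports this by returning -1, while the raw syscall returns a negative error number in rax. A hedged sketch of a checked version follows; query_winsize and terminal_large_enough are hypothetical names, and the 132 x 43 minimum is taken from the question.

```cpp
#include <sys/ioctl.h>
#include <unistd.h>

// Hypothetical wrapper: returns true and fills `out` when fd refers to a
// terminal; returns false when the ioctl fails (bad descriptor, or output
// redirected away from a terminal).
bool query_winsize(int fd, winsize& out) {
    return ioctl(fd, TIOCGWINSZ, &out) == 0;
}

// Hypothetical policy check using the 132 x 43 minimum from the question.
bool terminal_large_enough(const winsize& w) {
    return w.ws_col >= 132 && w.ws_row >= 43;
}
```

A caller would typically try STDOUT_FILENO first and only warn the user when query_winsize succeeds and terminal_large_enough returns false.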
Please provide a little more detail in regard to indentation. Documenting has always been a problem. I think what I should start doing is writing a large block, getting it working the way I want, and then documenting it. The tip on winsize is going to shave off many bytes. – Shift_Left, Oct 25, 2019 at 23:40
I see what you mean by the indentation, and if you load the code into an editor that is set for tabs of 8, it is a real mess. When I've implemented TIOCGWINSZ I will make sure to replace tabs with spaces. – Shift_Left, Oct 26, 2019 at 0:05
A code-size optimization
If you move the mov edi, Metrics instruction to just below the FindMatch label and thus have it repeat with each iteration, you can remove 4 instructions from the code. I've marked these with an exclamation mark:
xor edx, edx
mov ecx, edx
FindMatch:
mov edi, Metrics ;Restore it from here
lodsq
! push rdi
add edi, ecx
add cl, 2
! jmp .cmpstr + 1 ; No need to save RDI again
.cmpstr:
! push rdi
push rsi
...
.nextone:
pop rsi
! pop rdi ; Restore pointer to array of pointers.
jmp FindMatch
cmp bl, 'L'
jz .cmpstr
cmp bl, 'C'
Are these environment strings guaranteed to be in uppercase?
I believe they have been and always will be uppercase, although I don't have anything specific to back that up. @Edward pointing me toward TIOCGWINSZ will probably see that part replaced anyway. – Shift_Left, Oct 25, 2019 at 23:13
My first revision implemented your example, but I decided to trade space for speed, as moving from memory takes 6 cycles and push/pop take only one. I figure on my machine that saves about 17 microseconds, but if I were to do that in a thousand places it would amount to 17 milliseconds. – Shift_Left, Oct 25, 2019 at 23:17
As a result of the alternative method outlined by Edward, overhead has been reduced from 168 bytes to 56, a reduction of about 67% (one third of the original size).
~$ nasm -felf64 appname.asm -oappname.o
~$ ld appname.o -oappname
USE64
TIOCGWINSZ equ 0x5413
STDOUT_FILENO equ 1
sys_ioctl equ 16
sys_exit equ 60
global _start
section .text
; =============================================================================
_start:
%define argc [rbp+ 8]
%define args [rbp+16]
push rbp ; So argc & **args can easily be.
mov rbp, rsp ; addressed via base pointer.
xor eax, eax
mov edx, winsize ; Point to structure.
mov esi, TIOCGWINSZ ; Read structure.
mov edi, eax
mov di, STDOUT_FILENO
mov al, sys_ioctl
syscall
test ax, ax ; If there is an error just bail.
jnz Exit ; because the likelihood is slim to none.
; ws_xpixel & ws_ypixel are of no consequence, so they will be overwritten
; with condition bits. The number after the colon denotes the bit position.
; ws_xpixel:0 = 1 Window has fewer than 43 rows.
; ws_xpixel:1 = 1 Window has fewer than 132 cols.
cld ; Just to be sure of auto increment.
mov esi, edx ; Move to source index for LODSW.
mov edx, eax ; Applications status bits (flags).
lodsw ; Read rows from ws_row.
sub ax, 43 ; Minimum rows expected.
jns $ + 5 ; Skips over next instruction.
or dl, 1 ; Set bit zero (rows below minimum).
lodsw ; Read columns from ws_col
sub ax, 132 ; Minimum columns expected.
jns $ + 5 ; Skips over next instruction.
or dl, 2 ; Set bit one (columns below minimum).
; Save new data where ws_xpixel was and erase any extraneous
; data @ ws_ypixel
mov [rsi], edx ; Overwrite ws_xpixel & ws_ypixel.
Exit: leave ; Kill empty procedure frame.
xor edi, edi ; Set return code EXIT_SUCCESS.
mov eax, sys_exit
syscall ; Terminate application
section .bss
; =============================================================================
winsize:
.ws_row resw 1
.ws_col resw 1
.ws_xpixel resw 1
.ws_ypixel resw 1
Jumps often take longer, and jumps without explicitly named targets are a recipe for future frustration. (What happens if you add an instruction?) So I'd recommend doing this without jumps instead. Remember that a cmp instruction conditionally sets the carry flag; we can use that fact to produce a branchless version of the code:
xor edx,edx
cmp word [winsize.ws_col], 132
adc edx,edx
shl edx,1
cmp word [winsize.ws_row], 43
adc edx,0
– Edward, Oct 26, 2019 at 16:37
That sets the edx register exactly the same way your code did. – Edward, Oct 26, 2019 at 16:38
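The flag arithmetic in that branchless sequence can be followed step by step in C++. The sketch below is an illustration under the assumption that an unsigned "less than" comparison stands in for the carry flag left by cmp; size_flags is a hypothetical name.

```cpp
#include <cstdint>

// Bit 0 set: fewer than 43 rows. Bit 1 set: fewer than 132 columns.
// size_flags is a hypothetical name; each line mirrors one step of the
// cmp/adc/shl sequence, with (a < b) playing the role of the carry flag.
inline std::uint32_t size_flags(std::uint16_t rows, std::uint16_t cols) {
    std::uint32_t flags = 0;                         // xor edx, edx
    flags = flags + flags + (cols < 132 ? 1u : 0u);  // cmp ..., 132 / adc edx, edx
    flags <<= 1;                                     // shl edx, 1
    flags += (rows < 43 ? 1u : 0u);                  // cmp ..., 43 / adc edx, 0
    return flags;
}
```

A terminal of exactly 132 x 43 gives 0, while a classic 80 x 24 terminal sets both bits and gives 3.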
Edward wrote:
Jumps often take longer, and jumps without explicitly named targets are a recipe for future frustration.
Yes, I remember the days when I used to spend hours on that very problem, but it has become such a habit now that, whenever I anticipate a change and there isn't an explicit reference, I look back up in the code to see where that register was initialized. What I plan on doing in the future is commenting as such:
cld ; Just to be sure indices auto increment.
; RDX has been set to winsize structure by previous
; sys_ioctl call to TIOCGWINSZ, as has RAX been set to zero.
cmp word [edx+2], 132 ; Expect a minimum 132 columns
adc al, al
shl al, 1 ; Move to next bit position
cmp byte [edx], 43 ; Expect a minimum 43 rows
adc al, 0
; Save new data where ws_xpixel was and erase any extraneous
; data @ ws_ypixel
mov [edx+4], eax ; Overwrite ws_xpixel & ws_ypixel.
I think this would be a step in the right direction for those reading my code, so that they wouldn't have to search all over. This example saves another 5 bytes by using implicit references instead of explicit ones.
A significant size saving, and by extension a speed saving, was realized with this change.
22: 89 c2 mov edx,eax
24: 66 ad lods ax,WORD PTR ds:[rsi]
26: 66 83 e8 2b sub ax,0x2b
2a: 79 03 jns 2f <_start+0x2f>
2c: 80 ca 01 or dl,0x1
2f: 66 ad lods ax,WORD PTR ds:[rsi]
31: 66 2d 84 00 sub ax,0x84
35: 79 03 jns 3a <_start+0x3a>
37: 80 ca 02 or dl,0x2
3a: 89 16 mov DWORD PTR [rsi],edx
0x3c - 0x22 = 26 bytes
versus
20: 66 67 81 7a 02 84 00 cmp WORD PTR [edx+0x2],0x84
27: 10 c0 adc al,al
29: d0 e0 shl al,1
2b: 67 80 3a 2b cmp BYTE PTR [edx],0x2b
2f: 14 00 adc al,0x0
31: 67 89 42 04 mov DWORD PTR [edx+0x4],eax
0x35 - 0x20 = 21 bytes
Had I used explicit references, then the size saving would have been completely negated, but speed is still significantly improved in either context.
Using a BYTE PTR for the lines count may save one byte, but IMHO it's a poor bargain because it's a latent bug if anyone uses a screen that has 256 or more lines. – Edward, Oct 26, 2019 at 19:33
@Edward Very interesting point, as it hadn't dawned on me that if someone were to take a 16:9 monitor, use it in portrait mode and change the resolution, the row count could be as high as 475. I've changed the code accordingly, as those kinds of bugs are really hard to find. – Shift_Left, Oct 26, 2019 at 20:59
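The latent bug is easy to reproduce in a few lines of C++. This is a sketch with hypothetical function names; it assumes only that ws_row is a 16-bit value stored little-endian, so a byte-sized compare at its address sees just the low byte.

```cpp
#include <cstdint>

// Models cmp byte [edx], 43: only the low byte of ws_row takes part.
inline bool too_few_rows_byte(std::uint16_t rows) {
    return static_cast<std::uint8_t>(rows) < 43;
}

// Models cmp word [edx], 43: the full 16-bit row count is compared.
inline bool too_few_rows_word(std::uint16_t rows) {
    return rows < 43;
}
```

A 256-row terminal is reported as too small by the byte version (its low byte is 0), as is a 280-row one (low byte 24), whereas the 475-row case happens to pass only because its low byte is 219.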
tput command on Linux? It's a one-liner using that command, and on NetBSD the source code for the tput command is not that complicated either. Written in C, it's probably 20 lines of code.