GFA-BASIC 32 for Windows: Compiler


01 October 2024

DEP and GB32

"Data Execution Prevention (DEP) is a system-level memory protection feature that is built into the operating system starting with Windows XP and Windows Server 2003. DEP enables the system to mark one or more pages of memory as non-executable. Marking memory regions as non-executable means that code cannot be run from that region of memory, which makes it harder for the exploitation of buffer overruns."

The part that matters to GB32 developers is that code cannot be run from memory marked as non-executable. When a program is RUN (F5), the GB32 inline compiler creates an executable in memory and runs it from there. The DEP setting of your system can prevent that code from running. This happened to a user who bought a new PC with a factory DEP setting of 3. Not only was GB32 unable to run the code, other nasty things happened as well; for instance, programs could no longer be saved.

To obtain your system's DEP setting, follow these steps:

Go to This PC -> right-click and select Properties. Then select Advanced Settings, choose the Advanced tab, and click the Settings button in the Performance section. Here you can select the Data Execution Prevention tab. Normally, the option to protect Windows programs and services only is selected. This corresponds to DEP setting = 2.

You could also try this small GB32 program to obtain the system's DEP setting:

$Library "gfawinx"
$Library "UpdateRT"
UpdateRuntime ' Patches GfaWin23.Ocx
Declare Function GetSystemDEPPolicy Lib "kernel32" ()
Debug "DEP-Setting: "; GetSystemDEPPolicy()

The fact that you can Run the code tells you that the DEP setting isn't 3. A setting of 3 wouldn't allow the execution of the code stored in memory by the GB32 compiler.
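The number that is printed can be translated into a readable description. The following is a minimal sketch, assuming the value meanings that Microsoft documents for GetSystemDEPPolicy (0 = AlwaysOff, 1 = AlwaysOn, 2 = OptIn, 3 = OptOut) and assuming the usual Select/Case/EndSelect syntax; the variable name policy% is only illustrative:

```basic
$Library "gfawinx"
$Library "UpdateRT"
UpdateRuntime ' Patches GfaWin23.Ocx
Declare Function GetSystemDEPPolicy Lib "kernel32" ()
Dim policy%
policy% = GetSystemDEPPolicy()
' Value meanings are taken from the Windows documentation (assumption)
Select policy%
Case 0
  Debug "0: AlwaysOff, DEP is disabled"
Case 1
  Debug "1: AlwaysOn, DEP is enforced for all processes"
Case 2
  Debug "2: OptIn, DEP for Windows programs and services only"
Case 3
  Debug "3: OptOut, DEP for all processes except listed ones"
EndSelect
```

Note that the values are not ordered by strictness: 1 stands for DEP always on, so a lower number does not automatically mean a more permissive setting.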

This isn't a problem for the GB32 developer alone; the final EXEs created with GB32 also suffer from a DEP setting of 3. The EXE is a stand-alone program and its code can be executed, but the UpdateRuntime function 'hacks' the GfaWin23.ocx runtime and reroutes some code to new code in memory, and executing new code in memory is not allowed with DEP = 3.

Conclusion
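To be able to execute a GB32-produced EXE, the user's system DEP setting must not be 3. Windows documents the value 1 as DEP always on, so that setting can presumably be expected to cause the same problem; in practice this leaves 2 or 0.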

Posted by Sjouke Hamstra at 1.10.24

Labels: Compiler, General, IDE Issues

23 February 2021

The Include directory

After installing GFA-BASIC 32 you’ll find four directories in the installation path: Bin, Doc, Include, and Samples.


The \Bin directory contains the GB32 binaries; \Doc contains the original (German) doc files that came with GFA-BASIC 32 back in 2001 (now obsolete because everything can be found in the English CHM helpfile); \Include contains the Windows API include files; and \Samples contains the sample g32 files, including the new Direct2D example programs. This time we’ll focus on the \Include directory only.

What is the \Include directory for?
The purpose of the \Include directory is to collect all Windows API definitions and declarations in one directory. Because of the huge number of Windows APIs, the definitions and declarations are split into multiple smaller include library files. These GB32 Windows API include files come with both the source code and the compiled library (lg32) file. The organization of the GB32 include files follows the way the Windows SDK presents the C/C++ include header files. For instance, the C/C++ header file winuser.h has an equivalent GB32 include file winuser.inc.lg32. All include files follow this naming convention: name.inc.lg32.
You import a GB32 include library file using the $Library command, for instance:

$Library "winuser.inc" ' .lg32 may be omitted

By default, the line doesn’t need to specify the full path, because the location of the \Include directory is pre-selected in the Properties | Extra tab dialog box.


You can easily add your own paths that contain your own library files. The entire string with the specified paths is stored in the “lg32paths” registry key. To add a path, first type a semicolon (;) after the existing path and then specify the full path after the semicolon. Once a path is part of the Library paths you no longer need to type the full path in the $Library statement. Note that libraries are searched for (1) in the current program's directory, (2) in My Documents\lg32, and (3) in the paths stored in the registry key "lg32paths".
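As an illustration of the difference, here is a small sketch. The directory C:\MyLibs and the library name mytools are hypothetical; they only stand in for a path and a library of your own:

```basic
' Variant 1: C:\MyLibs is NOT part of the Library paths,
' so the statement has to carry the full path (hypothetical names).
$Library "C:\MyLibs\mytools"

' Variant 2: after appending ;C:\MyLibs to the Library paths
' the name alone is sufficient. Use this line instead of the one above.
' $Library "mytools"
```

The second variant is commented out because a program needs only one of the two statements.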

Note The \Include directory also contains the libraries gfawinx.lg32, direct2d.lg32 and variants.lg32. These are not include files and don’t have the .inc clause in their names. These are libraries that are part of the GFA-BASIC 32 updates and need an easily accessible path.

Why multiple include files?
Each include file contains only a part of the Windows API; that way you don’t need to include a single large file with many APIs just to have a few declarations. The GB32 include files only provide the APIs that are not built into GFA-BASIC 32. (Note that the Win32API.g32 that originally came with GB is incomplete and full of errors.) Due to the number of new APIs that come with each new Windows version, the include files in the \Include directory aren’t complete either. They do, however, provide more declarations and definitions than the original Win32API.g32. Some of the include files, for instance winuser.inc.lg32, are updated with the latest APIs (most specifically APIs supported by Windows 7 and Windows 10). However, many haven’t been updated for a while. They tend to get an update on an ad hoc basis; when I need new APIs I add them to the include files. It is a boring job and, because of the translation from C/C++, errors are easily made. If you need an API that isn’t in one of the include library files yet, let me know (gfabasic32@gmail.com).

Contents of an include library
By default, the GB32 Windows include files only provide function declarations (Declare statements), constant definitions, and user-defined type definitions. They don’t contain executable code and, apart from the Declares, they do not contribute to the size of the program. You should know that Declare-ed DLL functions are collected in a DLL-import table that does become part of the program (executable). By spreading the Declares over multiple include files the DLL-import table can remain small.
There is one “include” file that does contain executable code: winmacros.lib.lg32 (note the missing .inc clause in the name). This library contains functions for frequently used Windows macros that are used as functions in C/C++, but are defined as macros in the C/C++ header files. Some of these function macros are collected into winmacros.lib.lg32. Among others, the library provides (Naked) functions for GET_X_PARAM, GET_Y_PARAM, MakeLParam, MakeIntResource, etc. Please take a look at the source code in winmacros.lib.g32 for an overview of the supported functions/macros.

How to locate a specific API
How do you know in which include file a specific API (type, constant, or function) is located? It might be that GB32 already supports the API as a built-in; this applies to function declarations and constant definitions only. GB32 has no built-in support for user-defined API type definitions; they always have to be defined by the program or imported from an include library. The easiest way to check whether GB32 supports a specific API is by using the auto-complete feature. Just type the first letters of the function or constant and check if the auto-complete pops up with the required name. If it isn’t provided by GB32 you’ll need to check the Windows SDK documentation to see which C/C++ include file provides the declaration or definition for that API and load the equivalent GB32 .inc library file.

An example of using an API
Each topic in the Windows SDK specifies in which C/C++ header file an API function, type, or constant is declared. For instance, if an API is located in the winuser.h C/C++ header file you have a good chance of finding it in the winuser.inc.lg32 file. Let’s look at an example. Suppose your program wants to process the WM_GETMINMAXINFO message. The documentation for this message tells you to obtain the MINMAXINFO structure from the lParam. However, GB32 itself does not provide a definition for the MINMAXINFO user-defined type, and you need to import the type. When you go to the SDK page that describes this type you’ll find at the bottom of the page the location of the definition of this structure: the winuser.h C/C++ header file. Now you know which GB32 include library you need: winuser.inc, and the program can import that library. As it happens this structure contains members of the API type POINT, which is defined in wingdi.inc. However, you won’t need to include wingdi.inc in your program, because it is imported by winuser.inc (otherwise it couldn’t be compiled). After importing winuser.inc in your program the constants and user-defined types from wingdi.inc are available as well. So, the POINT structure is available as a user-defined type in your program.
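A minimal sketch of what this makes possible is shown below. It only declares a variable of the imported type and fills one of its POINT members; it does not handle the WM_GETMINMAXINFO message itself. The member names ptMinTrackSize, x and y are those of the Windows SDK structures and are assumed to be spelled the same way in the include library:

```basic
$Library "winuser.inc" ' imports MINMAXINFO; POINT comes along from wingdi.inc
Dim mmi As MINMAXINFO
' ptMinTrackSize is a member of type POINT (assumed member names x and y)
mmi.ptMinTrackSize.x = 320
mmi.ptMinTrackSize.y = 200
Debug "Minimum tracking size: "; mmi.ptMinTrackSize.x; " x "; mmi.ptMinTrackSize.y
```

Because the type comes from a compiled library, the members should show up in auto-complete as soon as the $Library line is present.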

Note winuser.inc exports the constants and type definitions of wingdi.inc, but not the function declarations. If you need an API function declared in wingdi.inc you’ll need to import wingdi.inc as well. Importing both include files does not cause a collision.

$Library "winuser.inc"
$Library "wingdi.inc"

Although both include files export the same constants and user-defined types, they are added only once to the program. If winuser.inc also exported the function declares from wingdi.inc, the internal database of GB32 might become corrupted. Therefore, winuser.inc exports all constants and types using the $Export Const * and $Export Type * statements, whereas each exported Declare must be specified separately in its own $Export Decl name statement. This is easily done using the App+E shortcut, which inserts $Export Decl lines for each Declare in the library file.
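For a library of your own, the export section could look like the sketch below. It is an assumption-laden illustration, not a copy of winuser.inc: the Declare is borrowed from the DEP post above, and the placement of the $Export lines relative to the Declare is a guess.

```basic
' Source of a hypothetical include library
$Export Const *                  ' export all constants
$Export Type *                   ' export all user-defined types
$Export Decl GetSystemDEPPolicy  ' each Declare is exported by name
Declare Function GetSystemDEPPolicy Lib "kernel32" ()
```

With many Declares, the App+E shortcut saves you from typing the $Export Decl lines by hand.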

Conclusion
You can use the Windows API inc files to import Windows APIs easily. Because the (compiled) user-defined types are imported from a library the autocomplete function has direct access to their members and is fully operational without first compiling your program.

Posted by Sjouke Hamstra at 23.2.21

Labels: Compiler, Library, Windows API

26 November 2020

The Naked attribute in practice

In the previous posts The Anatomy of a Procedure (1) and The Anatomy of a Procedure (2) I discussed the effect of the Naked attribute on the code generated by the compiler. Everything you want to know about the Naked attribute can be found in these posts, but – unfortunately – the posts are rather technical. If you lack any experience in assembly it might be hard to understand, so I will recap on the use of the Naked attribute in ‘layman’s’ terms here.

Naked explained
A Naked procedure is fully optimized, both in size and in performance. This comes with a penalty though: a naked procedure lacks support for dynamic variable types (String, Object, Variant, array and hash), structured exception handling, and runtime debugging (Tron, Trace). The reason for this is the lack of the ‘procedure-housekeeping’ that GFA-BASIC 32 inserts in each regular procedure. In a regular procedure GB starts off by inserting an 80-byte stackframe (68 in an EXE) to store all information necessary for housekeeping of the procedure. At the end of the procedure it inserts code to restore the stack and automatically release all (dynamic) variables (even in case of a runtime error). The housekeeping code is missing in a naked procedure. Consequently, a naked procedure can execute faster, in certain cases up to 50% faster than a regular procedure. Only short procedures benefit from the Naked attribute; the executable code must be relatively small compared to the code necessary to set up a stackframe of 80 bytes, as we will see. The example procedure from the previous post is a good candidate for Naked; it executes 50% faster:

TestMul2(2, 3)
Proc TestMul2(x As Int, y As Int) Naked
 Local tmp As Int
 tmp = x Mul y
EndProc

When the procedure grows and contains more executable code the relevance of Naked disappears. The next example shows two things. First, it declares a local dynamic string variable (which needs to be released explicitly by setting it to the ‘empty-string’). Secondly, the assignment of data to the string will allocate memory and produce code to copy the data to that memory. This is a relatively expensive operation and will reduce the possible performance gain of Naked.

Dim i%, t#
t# = Timer
For i% = 0 To 100000 : TestMul(2, 3) : Next
Debug "Normal: "; Timer - t#
t# = Timer
For i% = 0 To 100000 : TestMul2(2, 3) : Next
Debug "Naked: "; Timer - t#
Proc TestMul(x As Int, y As Int)
 Local tmp As Int, s As String = "Something"
 tmp = x Mul y
EndProc
Proc TestMul2(x As Int, y As Int) Naked
 Local tmp As Int, s As String = "Something"
 tmp = x Mul y
 s = ""
EndProc

The measured time for calling TestMul() and TestMul2() about 100,000 times (the loop count in the code above) is

TestMul: ca 0.021 seconds
TestMul2: ca 0.019 seconds.

In this example, the time to execute the naked procedure is almost the same as executing the regular procedure. Adding a dynamic variable – and assigning it a value - to a naked procedure negates the benefit almost entirely. The code to execute is drastically increased, assigning a value to a string causes the execution of malloc() and memcopy(), and these take so much time the advantage of naked is almost gone. In addition, the string must be released which leads to an extra call of mfree(). All in all, just by adding one dynamic variable the procedure contains too much code to really benefit from Naked.

Other issues
Another disadvantage of using Naked is the issue of runtime error trapping. In case of an error the IDE stops running the program and puts the error-line marker on the line that calls the procedure, not on the offending line inside the naked procedure.

A non-naked, regular procedure will not only trap the error, but also clear the contents of the (dynamic) local variables automatically. With naked procedures the dynamic variables must be released explicitly. The local variables can be released using the following commands.

Local dynamic variable | Release command
String | s$ = "" or Clr s$
Variant | var = Empty
Object (any COM) | Set obj = Nothing
Hash | Hash Erase hs[]
Array (not allowed, unless static) | If static: Erase ar()

A local array cannot be used in a naked-procedure. A local array declaration allocates an array-descriptor plus the memory required to store the array elements. Using the Erase command on a local array only releases the memory for the data, not the array-descriptor. Each time the procedure is executed a new descriptor is allocated without being released. Consequently, the program will leak memory. If you need a local array it must be static, then the descriptor is allocated only once. This is not a problem in regular procedures, of course.
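Put together, a naked procedure that uses dynamic locals has to end with explicit release statements. The sketch below uses only the release commands listed above; the procedure name and the values are made up for illustration:

```basic
Proc NakedCleanup() Naked
 Local s As String = "Something"
 Local v As Variant
 v = 42
 ' ... work with s and v here ...
 s = ""    ' release the string memory
 v = Empty ' release the variant
EndProc
```

Every exit path of the procedure has to pass these release statements, because there is no housekeeping code that does it for you.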

Conclusion
Only short procedures that don’t use local dynamic variables are candidates for the Naked attribute.

Posted by Sjouke Hamstra at 26.11.20

Labels: Commands, Compiler, Subroutines, Variables

18 March 2019

StdFont and StdPicture

GFA-BASIC 32 provides two COM objects for use with fonts and two for use with pictures: Font and StdFont, and Picture and StdPicture. The Font and Picture objects are GB-specific; they are used with other objects like Form.Font and Form.Picture, etc. So, why would you need StdFont and StdPicture?

StdFont and StdPicture are standard automation OLE objects, so maybe you can use them with an automation server like Office? In theory you should be able to. VB/VBA uses StdFont and StdPicture for its font and picture objects and they should be compatible with GB’s StdFont and StdPicture. However, when you try to assign a Font object from an Excel cell to a variable of the StdFont datatype, GB complains about incompatible data types.
There are a few situations where you might stumble upon a Std* COM type, for instance when you are converting VB/VBA code. Another use can be found for StdFont: it allows you to create a font object on its own. StdPicture is less useful in this respect.

StdFont and StdPicture are coclasses
StdFont and StdPicture are for use in a GB program mainly. Both types allow the New clause in a Dim statement, because both StdFont and StdPicture are coclasses from which you can create an object instance. (You cannot use New on a Font or Picture object.) The New keyword in the declaration inserts object-creation code into the program. The result of New is a new instance of the StdFont or StdPicture class provided by olepro32.dll. That’s one of the reasons GB requires the presence of this DLL. After an object has been created it has a pointer to an interface - located in olepro32.dll as well – which holds the address of the array of functions (properties and methods). These interfaces are called IFontDisp and IPictureDisp. In fact, IFontDisp and IPictureDisp only expose the IDispatch functions, there is no way to directly access the properties and methods. When you use a StdFont or StdPicture property the compiler inserts late binding code, it cannot early bind to the properties.
For a discussion on IDispatch see CreateObject Caching.

Normally, as with all IDispatch objects (Object type), you can only tell at runtime whether a property is accessed correctly. However, this is not true for the StdFont and StdPicture objects. GFA-BASIC 32 checks the syntax at compile time because it knows many of the properties that can be accessed through IDispatch. Therefore, the compiler can perform a syntax check on the names and arguments. In addition, the compiler can optimize the late binding code for the properties of these Std* types. The properties’ disp-ids are documented and the compiler can hard-code them into the executable code. This saves the compiler from inserting code to obtain the disp-id before calling IDispatch.Invoke. Although the compiler can optimize, one disadvantage of using the IDispatch interface remains: Variants are used when passing arguments to and from properties.

Using StdFont and StdPicture
So there are only disadvantages in using StdFont and StdPicture, it seems. This is certainly true for the StdPicture; it doesn’t provide any useful functionality to actually create a picture. The only way to create a picture object is when you use CreatePicture or LoadPicture. You can assign such a picture to either a StdPicture or Picture data type. However, why would you want to assign it to a (New) StdPicture type? Let’s see how that should work.

Dim p As New StdPicture ' creates a new instance
Set p = LoadPicture(f$) ' assign new instance

The Set command assigns a new object to a variable. When that variable currently holds a reference to another object that object is released first. So, the StdPicture object instance created with New is released before the new picture is assigned.
The New keyword caused the creation of an ‘empty’ StdPicture object. Since all properties of StdPicture are read-only there is no way to manipulate the data of the StdPicture object (the same is true for a Picture object). Consequently, the statement Dim p As New StdPicture is not very useful. It doesn’t provide any functionality beyond that of the Picture object and it causes the compiler to insert (slower) late binding code.

A StdFont is more useful. A New StdFont creates a new font that can be assigned to anything with a Font property. (In GB StdFont and Font are compatible types.) This makes New more worthwhile on a StdFont than on a StdPicture, as the example shows:

Dim f As New StdFont ' new instance
f.Bold = True ' set properties
f.Name = "Arial"
Set frm1.Font = f ' assign

In contrast with StdPicture, the StdFont properties are also writeable, which makes StdFont a very useful object.

Conclusion
StdFont and StdPicture are IDispatch objects. Using New creates a new instance, but this isn’t very useful for a StdPicture object.

Posted by Sjouke Hamstra at 18.3.19

Labels: COM, Compiler, Ocx Object

16 November 2018

Anatomy of a procedure (1)

Only recently has the IDE gained the ‘Proc Disassembly’ feature, an option available under the Edit | Proc menuitem. This is a valuable resource if you want to get a better understanding of the code generated by the compiler. Once you understand the disassembly of a proc you can use the information to your advantage, especially when it comes to optimizing procedures.

Bare minimum: Naked
Let’s start with a Naked procedure. A Naked procedure is fully optimized, both in size and in performance. This comes with a penalty though: a Naked procedure lacks support for dynamic variables, structured exception handling, and runtime debugging (Tron, Trace). The Naked attribute forces the compiler to produce code much as it would be written in a pure assembly program. The assembly code of such a procedure closely resembles textbook samples. It’s not hard to understand the procedure flow when it is compared to the theory in the assembly books. Therefore, I start this series on the anatomy of procedures with these bare minimum procs. Examining naked procedures allows us to understand how a proc is constructed, and this knowledge can later be used to examine regular procedures.

The following sample shows a Naked proc taking two parameters of a simple type (Int, a 32-bit integer equivalent to Long). For now, we’ll omit the use of dynamic datatypes like String, Variant, Object, etc. The procedure contains the local variable tmp, also of a simple datatype, and assigns the product of x and y to tmp. This is the entire program:

TestMul(2, 3)
Proc TestMul(x As Int, y As Int) Naked
 Local Int tmp
 tmp = x * y
EndProc

Now put the caret inside the procedure TestMul and select Proc | Disassembly; it produces the following listing in the debug output window:

-------- Disassembly -----------------------------------
1 Proc TestMul(x As Int, y As Int) Naked (Lines=5)
042704D0: 53 push ebx
042704D1: 55 push ebp
042704D2: 8B EC mov ebp,esp
042704D4: 8D 5D 80 lea ebx,-128[ebp]
042704D7: 2B C0 sub eax,eax
042704D9: 50 push eax
042704DA: DB 45 0C fild dpt 12[ebp]
042704DD: DA 4D 10 fimul dpt 16[ebp]
042704E0: DB 5B 7C fistp dpt 124[ebx]
042704E3: 8B E5 mov esp,ebp
042704E5: 5D pop ebp
042704E6: 5B pop ebx
042704E7: C2 08 00 ret 8

The first line specifies the line number of the procedure (1), its entire prototype, and the number of lines (here 5, but might be more if the procedure includes any trailing empty lines).
The numbers at the start of each line show the memory address of the instructions, which might be different from your result. Consequently, in this case, the function ProcAddr(TestMul) would return the address of the first byte of the procedure: 0x042704D0.
After the memory address follow the opcodes for the assembly instruction. For instance, the opcode with value 0x53 corresponds to the push ebx assembly command. Some instructions require a one byte opcode only, others require multiple opcodes.

The first 6 lines make up the procedure’s entry code (sometimes called prologue). The last 4 lines are the procedure’s exit code (or epilogue). The lines in between represent the actual functionality of the procedure.

Entry code
The procedure’s entry code prepares the procedure’s code to handle parameters and local variables:

push ebx ; save ebx
push ebp ; save ebp
mov ebp,esp ; establish stackframe
lea ebx,-128[ebp] ; let ebx reference local vars
sub eax,eax ; eax = 0
push eax ; clear first local var

Whenever a procedure takes a parameter or declares a local variable you’ll always find the same three instructions at the start of each procedure: push ebx / push ebp / mov ebp, esp. If the procedure also contains local variables the fourth line lea ebx, -128[ebp] is present as well. Following this line you’ll find the code that initializes the local variable; all local variables are initialized to zero.

Local variables, the purpose of ebx
In GFA-BASIC 32 the ebx register has a special purpose and thus ebx cannot be used as a general purpose register. It is used as a fixed reference point to address the local variables.

Note - According to the documentation this makes it possible to lay out variables that require more than 4 bytes (Double, Date, Large, Currency) on 8-byte boundaries, increasing performance when they are accessed.

The ebx value points to an address 128 bytes down on the stack relative to the value in ebp, the stackframe. Although the first local variable is actually located at ebp - 4, it is referenced using the value in ebx. The location of the local variable is +124 bytes relative to the value in ebx (-128 + 124 = -4 relative to ebp); in assembly syntax the tmp variable is located at 124[ebx].

The value stored at that position is obtained using dword ptr 124[ebx]. This is illustrated by the next three lines of code where the parameters are multiplied by the fpu (floating-point processor) and where the result is assigned to the local variable tmp.

042704DA: fild dpt 12[ebp] ; load value of param x into fpu
042704DD: fimul dpt 16[ebp] ; multiply by value in param y
042704E0: fistp dpt 124[ebx] ; store result in tmp

The parameters x and y are accessed using the value in ebp as we will see.

Stack structure
When the procedure is called, the caller puts the parameters y and x on the stack, in reversed order. GB subroutines conform to the stdcall convention, which means that the parameters are pushed from right to left and that the subroutine corrects the stack before returning. Since y is the rightmost parameter it is pushed first, followed by the parameter to its left (here x). Then the call instruction pushes the return address onto the stack and execution continues in the subroutine.

From this point on the stack is prepared according to the procedure’s entry code discussed above.

The entry code saves the current values from the ebx and ebp registers on the stack. Then ebp is assigned the new esp value. Now ebp is used to address the parameters: parameter x is located at a positive offset of 12 bytes from the value in ebp; in assembly code 12[ebp]. Parameter y is 16 bytes up the stack relative to the value in ebp, in assembly code 16[ebp].
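The resulting stack layout can be written out as follows. This is a reconstruction derived from the disassembly listing above (higher addresses at the top), not output produced by the IDE:

```text
address     contents              addressed as
ebp + 16    parameter y           16[ebp]
ebp + 12    parameter x           12[ebp]
ebp + 8     return address
ebp + 4     saved ebx
ebp + 0     saved ebp             ebp points here (esp too, right after mov ebp,esp)
ebp - 4     local variable tmp    124[ebx]  (esp points here after push eax)
...
ebp - 128   (reference point)     ebx points here
```

The offsets 12 and 16 for the parameters follow from the three 4-byte values (saved ebp, saved ebx, return address) that sit between ebp and the first parameter.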

To address the parameters throughout the procedure ebp needs to remain constant during the execution of the procedure. The same is true for ebx that is used to address the local variables. We cannot use esp to reference both parameters and local variables because esp changes automatically during the execution of the procedure. (Although C/C++ compilers sometimes keep track of esp and address all stack variables using an offset to esp.)

Allocating and initializing
After the stackframe is established (mov ebp, esp) the next step is to reserve stack space for the local variables, see listing. The general idea, described in most textbooks, is to subtract the required number of bytes from esp and then clear that piece of memory. In our sample esp would have to be decreased by 4 bytes (for the Long variable tmp) and the memory then set to zero. Although GB produces the same effect, it proceeds a bit differently.

Note that the first byte of the 32-bit local variable tmp is located at ebp-4. After creating the stackframe by mov ebp, esp the registers esp and ebp point to the same stack address. To reserve and initialize the 4 bytes below esp GB uses the instructions sub eax, eax / push eax.

Subtracting a register from itself results in zero. By pushing zero, now the value in eax, GB both reserves and initializes the local variable in one step. It avoids the additional step of first decreasing esp explicitly. The technique of using push to reserve and initialize is typical for GB. The push eax can be repeated to clear and reserve all stack memory necessary for local variables. Thus, if the procedure had contained two local variables of type Long it would have had two push eax instructions.
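To check this yourself, a variation of the sample with two local variables can be disassembled in the same way. The procedure name TestTwo and the variable names are made up; going by the description above, the entry code is expected to contain two consecutive push eax instructions, but the actual listing is not reproduced here:

```basic
TestTwo(2, 3)
Proc TestTwo(x As Int, y As Int) Naked
 Local a As Int, b As Int ' two 4-byte locals to reserve and clear
 a = x Mul y
 b = a
EndProc
```

Put the caret inside TestTwo and use the Proc Disassembly option to compare the entry code with the listing of TestMul.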

Exit code
Before leaving the procedure the stack must be returned to the state it was in when the procedure was entered. In addition, because of the stdcall convention, the procedure must remove the bytes used by the parameters (2 * 4 bytes for two long parameters). This is how it’s done:

042704E3: mov esp,ebp ; restore esp
042704E5: pop ebp ; restore ebp
042704E6: pop ebx ; restore ebx
042704E7: ret 8 ; return, discarding parameters

When the program returns to the caller, the registers that matter and must remain constant have been restored. This makes sure the caller can use the correct ebp value to access its parameters and that ebx can be used to access its local variables.

Optimize using disassembly
Inspecting a procedure’s disassembly is useful to get an idea of what’s going on underneath the GFA-BASIC statements. The example presented in this blog shows why. The example performs a multiplication of two integer parameters and stores the result in another integer. As you can see, the compiler generates floating-point assembly instructions to perform the math. Since all variables are of type Long, the compiler could have generated more efficient code using the integer multiplication instruction imul. However, the compiler generates integer instructions only for the addition and subtraction operators. Now it’s up to the programmer to optimize this procedure by replacing the multiplication operator * by the Mul operator. The optimized procedure then becomes:

Proc TestMul(x As Int, y As Int) Naked
 Local Int tmp
 tmp = x Mul y
EndProc

Now compile the code and inspect the disassembly. As you can see, the floating-point instructions have vanished and are replaced by the imul instruction.

Conclusion
Inspecting a procedure’s disassembly requires the knowledge to identify the three parts of a procedure: the entry code, the actual code, and the exit code. We discussed how to identify parameters and local variables and saw how GB uses a specific technique to reserve and initialize local variables.

In coming blog posts we’ll discuss non-naked procedures and how you can tell a procedure is a good candidate to be naked.

Posted by Sjouke Hamstra at 16.11.18

Labels: Compiler, Subroutines

26 August 2017

A little array magic

Without going into a formal description of an array, we simply state that an array stores multiple values of the same type in contiguous memory. In code an array is recognized by a variable name followed by parentheses, either with or without indices. Like any other variable an array should be declared before it can be used. (Declaring a variable introduces a variable to the compiler.) Generally, a declaration specifies a name and a type. In case of an array the declaration may include values for lower and upper boundaries for up to 7 dimensions.

Array declaration and dimensioning
An array declaration always results in the creation of an array-descriptor. For a global array the descriptor is added to the program’s global data section and for a local array the compiler inserts code to allocate an array descriptor dynamically.

' Declaration and allocation separated:
Global Dim a() As Long ' adds descriptor to data
ReDim a(6) ' code to allocate memory
' Declaration and allocation at once:
Global Dim b(3, 1) As String

The second declaration forces the compiler to add a descriptor to the global data and to generate code to allocate memory. It is exactly the same as Global Dim b() As String : ReDim b(3,1).
A local array variable declaration is handled differently from a global declaration. First of all, the array is not assigned a static descriptor by the compiler. The declaration of the local array makes the compiler insert code to obtain (or allocate) an array descriptor dynamically when the procedure is executed. The pointer to the descriptor is stored in a hidden local memory location on the stack of the procedure. Then the address of the descriptor is passed to the same ReDim to allocate memory for the array elements.

Proc LocalArr() ' Naked forbidden
 Dim dum$ ' prevent compiler bug
 Dim h() ' allocates a descriptor
 ReDim h(4, 5) ' allocates memory for the elements
 Dim v(3) ' 1-step: descriptor + memory
EndProc ' destruction for h() and v() and dum$

Local arrays have the same anatomy, but they have no descriptors in the global data-section. Both the descriptor and the memory are allocated – from the heap - when the subroutine is executed. Room is reserved on the procedure stack for a (hidden) pointer to store the address of the descriptor. Later this pointer is necessary to clean the local stack and call the array-destruction code when the subroutine is left.

Local Array Destruction
Local array destruction is part of the termination handler of the procedure, that is, if it has one. A Naked procedure doesn’t include termination handlers; the developer needs to clear pointer variables manually. However, a local array cannot be destroyed manually, because there is no statement to do so. The obvious Erase would only release the data memory, not the array-descriptor, leaving it on the stack. Eventually, the stack might overflow when the procedure is executed repeatedly.

An array cannot be destroyed explicitly; Clr doesn’t work with arrays (and hashes).
Local arrays are not allowed in Naked procedures. Naked prevents the compiler from inserting destruction code for all pointer variables (String, Variant, Object).

Be aware of two possible problems
When a subroutine contains one or more local array variables but no other local variables of pointer types (String, Variant, Object), the compiler ‘forgets’ to insert the array-destruction code at all. This is a bug. In this specific situation it is necessary to force the compiler to add array-destruction code by introducing another dynamic data type that requires destruction code. A local String is the easiest solution, as demonstrated in the example above. (This bug is still waiting to be resolved.)

The other problem involves ReDim, which - unlike Dim - does not default to the Option Base setting. Instead, ReDim always uses 0 as the lower bound of the array. When Option Base 1 is the default setting for your application, you need to use ReDim ar(1 .. x) explicitly, rather than ReDim ar(x).
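The off-by-one effect is easy to see in a small model. The Python sketch below is not GB32 code; it only assumes that the number of elements is the product of (upper - lower + 1) over all dimensions, and the helper name element_count is hypothetical.

```python
from math import prod

def element_count(bounds):
    """Number of elements for a list of (lower, upper) bound pairs."""
    return prod(upper - lower + 1 for lower, upper in bounds)

# ReDim ar(10): the lower bound is always 0, so the array gets 11 elements.
redim_default = element_count([(0, 10)])

# ReDim ar(1 .. 10): explicit lower bound 1, so the array gets 10 elements.
redim_explicit = element_count([(1, 10)])

# A two-dimensional array with bounds 0..3 and 0..1.
two_dims = element_count([(0, 3), (0, 1)])
```

With Option Base 1 in effect, code that expects ar(1) to be the first element still works after ReDim ar(10), but the array silently carries an extra element at index 0.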

- Important note on a Hash
A local Hash isn’t destroyed automatically either (Naked or otherwise). Clr cannot be used and there is no way to force the compiler to insert Hash destruction code. All local Hash variables must be released manually using Hash Erase. You might want to use Static Hash for local variables. A Hash is a relatively time-consuming type to destroy, because all its entries are released one by one. Static preserves the contents and prevents time-consuming destruction.

Global Array destruction
GB implements hidden destruction for releasing arrays. A local array is destroyed on exit and a global array when the program is terminated. For a global array the descriptor is static and part of the global data-section and is an inherent part of the program. After a program exits (either as an EXE or in the IDE) the global data-section simply disappears and the descriptors with it. In case of an EXE-process all memory is released to the OS, and in the IDE the global data is destroyed after ending the program (RUN). There is no cause for memory leak on global arrays.

Anatomy of an array
In GB32 an array is described using a variable name, a descriptor, and a piece of contiguous memory to store the array data. When the compiler encounters a global array declaration it creates a mapping between the variable name and an array-descriptor stored in the global data section. This is true for in-memory compiling and when an EXE is created. A local declaration introduces a mapping between a hidden local pointer variable (a 32-bit pointer) and the name. The hidden variable stores the pointer to the dynamically allocated array descriptor.
An array-descriptor is a structure defining the attributes of an array. This ArrayDesc - structure is defined like this (note how the last member reserves LBound/UBound information for a maximum of 7 dimensions):

Type ArrayDesc
 -Int Magic ' "arry" or "ArrY"
 -Int ptype ' vtType (internal const)
 -Int size ' size of datatype
 -Int dimCnt ' number of dimensions
 -Int dimCnt2 ' # of dimensions == IndexCount
 -Int paddr ' address of data == ArrayAddr()
 -Int corr ' correction value
 -Int paddrCorr ' void* addrCorr;
 -Int anzElem ' number of elements == Dim?()
 -Int sizeArr ' size in bytes == ArraySize()
 -Int Idx(7 * 3) ' == LBound()/UBound()
EndType

For global and static arrays an instance of this structure is stored in the global data section. For local arrays the structure is allocated dynamically. It is important to realize that every declaration (Dim/Global/Local/Static) of an array immediately results in an array descriptor, dimmed or un-dimmed. The values of the structure members determine the status of the array. The Magic member is for internal use, although it perfectly well indicates if an array is empty, either Erase-d or an empty declaration. Other members can be retrieved using the following functions.

Function | ArrayDesc member | Description
Dim?(a()) | anzElem (element count) | Returns the number of elements in the array. Erase clears this value (sets to 0). One way to determine if an array has been ‘dimmed’.
IndexCount(a()) | dimCnt2 | Returns the number of dimensions. Returns 0 when not ‘dimmed’. Another way to determine if an array is empty.
ArrayAddr(a()) | paddr | Returns the memory address of the first element of the array. Returns 0 if erased or not ‘dimmed’. Can be used to determine if an array is empty.
ArraySize(a()) | sizeArr | Returns the size of all elements in bytes. Returns 0 if array is empty.
LBound(a()[,i=1]) | Idx[] | Returns the lower bound for a dimension (default is 1). Raises an error when an array is empty.
UBound(a()[,i=1]) | Idx[] | Returns the upper bound for a dimension (default is 1). Raises an error when an array is empty.

LBound and UBound are the only functions that cannot be used to query an ‘un-dimmed’ array.
For the special case OLE Automation array-type ParamArray, LBound and UBound return 0 and -1 respectively; these functions do not raise an error (VB compatibility). The ParamArray datatype is in fact nothing more than a Variant containing an OLE/COM SafeArray.
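To make the relation between the query functions and the descriptor a bit more tangible, here is a toy model in Python. It is only a sketch: the class ArrayDescModel and its method names are hypothetical, it keeps just a few of the ArrayDesc members, and it does not reproduce the real memory layout or the ParamArray special case.

```python
from dataclasses import dataclass, field

@dataclass
class ArrayDescModel:
    """Toy stand-in for a few ArrayDesc members (not the real layout)."""
    size: int = 0                                # size of one element in bytes
    paddr: int = 0                               # address of the data, 0 when empty
    bounds: list = field(default_factory=list)   # (lower, upper) per dimension

    def redim(self, elem_size, bounds, addr):
        self.size, self.bounds, self.paddr = elem_size, list(bounds), addr

    def erase(self):
        self.bounds, self.paddr = [], 0

    def dim_count(self):          # models Dim?(a()), member anzElem
        n = 1 if self.bounds else 0
        for lower, upper in self.bounds:
            n *= upper - lower + 1
        return n

    def index_count(self):        # models IndexCount(a()), member dimCnt2
        return len(self.bounds)

    def array_addr(self):         # models ArrayAddr(a()), member paddr
        return self.paddr

    def array_size(self):         # models ArraySize(a()), member sizeArr
        return self.dim_count() * self.size

    def lbound(self, i=1):        # models LBound(a(), i): fails when empty
        if not self.bounds:
            raise RuntimeError("array is not dimmed")
        return self.bounds[i - 1][0]

    def ubound(self, i=1):        # models UBound(a(), i): fails when empty
        if not self.bounds:
            raise RuntimeError("array is not dimmed")
        return self.bounds[i - 1][1]
```

In this model a freshly declared or erased array answers 0 to the first four queries, while lbound and ubound raise an error, which mirrors the behavior described in the table.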

Functions and statements that do not apply to arrays
An array variable is treated differently from any other variable type. The array’s variable name cannot be used in GB32 functions and statements the way other variables can. For instance, the Clr a() statement is forbidden, and the ArrPtr() function does not return the location of the array’s variable name, but the location of the array-descriptor instead. You cannot use Pointer to redirect an array variable name to another descriptor. TypeName(ar()) cannot be used to obtain the data type of the array. Etc.

Generally, all GB-functions and statements that use a variable name as an argument are forbidden for arrays.

When the compiler refers to an array it refers to the descriptor directly. The compiler doesn’t preserve a mapping between the array’s variable name and a particular location of a pointer as it does with Strings, for instance. The generated code simply doesn’t ‘know’ the array name anymore, only the location of the descriptor.
There is only one runtime function that accepts the address of a (local hidden) pointer containing the address of a descriptor: CLEARARR(), the local array destructor. This function cannot be invoked manually, not even when using assembler, because the address of the hidden variable is unknown. Asm lea eax, ar will not work; it still returns the address of the descriptor.

Posted by Sjouke Hamstra at 26.8.17 No comments:

Labels: Bug, Commands, Compiler, Subroutines, Variables

12 May 2017

ANSI, UNICODE, BSTR and converting

Update 28-06-2017 - Conversion from Ansi to Unicode: WStr() function.
For some reason the WStr() routine contained a stupid bug (that I now have fixed).

More info: The number of bytes to read from a BSTR-address was wrong. GFA-BASIC always uses SysAllocStringLen(Null, lenbytes) when allocating COM String memory. The BSTR returned is preceded by a 32-bit value specifying the BSTR's number of bytes, not the number of characters! This is exactly the value needed when reading the BSTR-bytes into a String datatype using StrPeek(). So, the function should have been: StrPeek(BSTR, {BSTR-4}), see the updated function below.

Another point of confusion was about the number of terminating null-bytes that WStr() returned. The StrPeek() function in WStr() only copies the UNICODE characters from the BSTR to a String, without the two null-bytes that secretly follow a BSTR string. As a result, the UNICODE characters copied to the String datatype are followed by only one (1) null byte; the terminating null byte that each String secretly gets.
When a String of UNICODE characters is to be passed to a Wide API function, two null-bytes must be added 'manually'.

w$ = WStr("GFABASIC") + #0#0 ' assign two nullbytes

The post as it was:
In the previous post I discussed UNICODE versus ANSI in the ANSI-based GFA-BASIC. Basically, GB doesn’t support UNICODE because it expects 1-byte characters where strings are used. In UNICODE each character occupies 2 bytes, which allows for more than 256 characters. Conversion from ANSI to UNICODE is ok, but conversion from UNICODE to ANSI might lead to a loss of characters with a value above 255. But there is more: Variants and BSTRs.
The introduction of COM in GB required the provision of a new data type, the Variant. The Variant is a 16-byte data type that holds data and a value that specifies the type of that data (LONG, CARD, DOUBLE, etc). A Variant can also be used to store (safe-) arrays, a specific COM array type, and BSTRs, special UNICODE strings. So to understand the String and BSTR/Variant in detail ….

How a String is stored
Because a BSTR is much like a GFA-BASIC String data type, I’ll first tell how a GB String is stored. You could skip this part if you already know.
Declaring (Dim) a String-variable introduces a name for a location. The String-variable itself requires four bytes to store a pointer to dynamically allocated memory for the characters. The declaration and the assignment of a location are handled by the compiler; the rest happens at runtime: assigning or initializing. When the String-variable is initialized a call to malloc() reserves memory for all its characters with an additional 5 bytes. The first 4 bytes are reserved to store the length of the string and the last byte for the null-byte (not included in the length value). After allocating and copying the characters, the address of the first character of the string is stored at the variable’s location, a 32-bit address or pointer.

Global a$ ' 32-bits location(=0) in data or stack
a$ = "GFABASIC" ' assign pointer (address) to location
l = Len(a$) ' address <> 0 return length {address-4}
Clr a$ : a$= "" ' free memory, set locations to 0

- String in memory: [xxxx|cccccc…c|0]
- Initially, the variable is a null pointer, the contents of the variable’s location is 0.
- String variable points to the address of the first character c.
- Length is stored in position address – 4, and does not include the terminating zero.

Obtaining the string’s length is a 2-step process. First the variable is tested for a non-null pointer and then the value of the preceding 4 bytes (string-address – 4) is returned.
- Clearing a string (or assigning an empty string “”) will free the allocated memory and reset the variable’s contents to 0.
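The layout [xxxx|cccccc…c|0] can be mimicked with a few lines of Python. This is only a model of the memory block, assuming a little-endian 32-bit length field and Latin-1 as a stand-in for the ANSI code page; the helper names gb_string_block and gb_len are hypothetical.

```python
import struct

def gb_string_block(text):
    """Model of the heap block behind a String: [length|chars|0]."""
    chars = text.encode("latin-1")                 # one byte per character
    return struct.pack("<I", len(chars)) + chars + b"\x00"

def gb_len(block, char_offset=4):
    """Read the length stored in the 4 bytes in front of the first character."""
    return struct.unpack_from("<I", block, char_offset - 4)[0]

block = gb_string_block("GFABASIC")
total_bytes = len(block)       # characters plus the 5 additional bytes
stored_length = gb_len(block)  # the terminating zero is not counted
```

The variable itself would hold the address of the first character, which corresponds to offset 4 in this block.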

BSTR in GB
GB does not provide a data type BSTR, but it provides limited support of hidden BSTRs to pass and obtain BSTR-strings to and from COM objects. GB handles the conversion and memory allocation for BSTRs, but it does not provide string-manipulation functions for BSTRs, or even BSTRs in Variants. More on this below.
BSTR variables are always temporary, hidden local variables used to communicate with COM properties/methods that take or return BSTR arguments. These hidden BSTR variables are always destroyed when leaving a subroutine. Even the Naked attribute won’t prevent the inclusion of the termination code.
BSTR strings are COM based strings. They are allocated from COM-memory and consequently the memory can be managed by both the provider of the COM-object and the client. That is the first difference. Next, a BSTR contains UTF-16 coded wide characters, which I discussed in ANSI and UNICODE. The way COM stores a BSTR is much the same as GB stores a String variable. In fact, a BSTR is a 32-bit location that stores a pointer to dynamically allocated memory with UNICODE formatted characters. The length of the BSTR is stored in front of the BSTR, again like GB’s String data type.

Use Variant for BSTR
Although GB provides hidden support for BSTRs, the only way to get access to a BSTR is by using a Variant. The following example assigns a GB-String to a Variant. At runtime the code allocates a BSTR by calling SysAllocStringLen(0, Len(GB-String)) followed by copying the converted GB-String to the returned address. The address of the BSTR together with its data type is stored in the Variant. When the Variant variable goes out of scope, the BSTR from the Variant is released through a call to SysFreeString(address).

Dim vnt1 = "Hello"

Now it gets interesting. After GB has invoked the SysAllocStringLen() COM API, it converts the ANSI string to UNICODE using a private conversion routine that intersperses zeros between the characters (see ANSI and UNICODE). GB does not turn to the MultiByte*() APIs Windows provides, because GB supports ANSI characters only. In the conversion process to UNICODE no characters will be lost and the private function is extremely fast.
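The idea of interspersing zeros, together with the byte-length prefix of a BSTR, can be sketched in Python. This is a model only, not the private GB routine: it assumes Latin-1 input as a stand-in for ANSI and a little-endian 32-bit prefix, and the helper names are hypothetical.

```python
import struct

def intersperse_zeros(ansi_bytes):
    """Widen 1-byte characters to 2 bytes by adding a zero high byte."""
    wide = bytearray()
    for b in ansi_bytes:
        wide += bytes((b, 0))
    return bytes(wide)

def make_bstr_block(text):
    """Model of BSTR memory: [byte length|wide chars|two null bytes]."""
    wide = intersperse_zeros(text.encode("latin-1"))
    return struct.pack("<I", len(wide)) + wide + b"\x00\x00"

def read_wide_chars(block, bstr_offset=4):
    """Copy exactly the prefixed number of bytes, like StrPeek(BSTR, {BSTR - 4})."""
    nbytes = struct.unpack_from("<I", block, bstr_offset - 4)[0]
    return block[bstr_offset:bstr_offset + nbytes]

block = make_bstr_block("GFABASIC")
wide = read_wide_chars(block)   # 16 bytes, without the two trailing null bytes
```

For Latin-1 input the result is byte-for-byte what Python produces with "GFABASIC".encode("utf-16-le"). Note that the copy stops before the two null bytes, which is why they have to be appended by hand before calling a Wide API.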
An optimized UNICODE conversion function
This knowledge makes it possible to obtain a UNICODE-string (not a BSTR) from a String argument through our own optimized conversion routine. Note:

A UNICODE string is required if you want to use the Wide version APIs.
A UNICODE string does not have a length field in front of it. It is not a BSTR. It only specifies how many bytes a character occupies (2).
Its memory is managed by the program through malloc() (no COM memory) and it ends with two null-bytes (although it seems 1 is ok as well).
The converted ANSI argument is placed in a String only because it is a convenient data type to store consecutive data.

The function makes use of the BSTR allocation and conversion functionality of the Variant.
(The $Export is there because it comes from a .lg32 file).

Function WStr(vnt As Variant) As String Naked ' Return UNICODEd string
 $Export Function WStr "(AnsiString) As String-UNICODE Naked"
 Dim BSTR As Register Long
 BSTR = {V:vnt + 8} ' BSTR address at offset 8
 Return StrPeek(BSTR, {BSTR - 4}) ' <- updated 28-06-2017
EndFunc

1. A function very well suited for the Naked attribute, because it does not contain local variables that contain dynamically allocated memory that would otherwise require explicit release code.
2. The argument of the function is ByVal As Variant. This forces the caller (calling code) to create a Variant and then pass it by value by pushing 16 bytes (4 DWords) on the stack. Whether the Variant is passed by value or by reference, the calling subroutine is responsible for freeing the BSTR stored in the Variant. However, ByVal is interesting because …
3. The GFA-BASIC compiler provides a hidden optimization when you pass a literal string to a ByVal As Variant. A ByVal Variant requires 16 bytes to be pushed on the stack, but the UNICODE characters the Variant points to are already converted at compile time. Therefore the following call is extremely efficient:

Dim t$ = WStr("GFABASIC")

The GFA-BASIC compiler stores the literal string “GFABASIC” as a UNICODE sequence of bytes (2 per character) and does not need to allocate (COM) memory and convert at runtime. This also relieves the caller from releasing the BSTR-COM-memory, so the calling function doesn’t need to execute Variant destruction code.
Assigning a UNICODE formatted string this way, is almost as efficient as initializing a String with an ANSI literal string. It only takes a few cycles to call and execute the WStr() function.
4. The caller provides the String variable to store the return value of the function. That is, the function’s ‘local variable’ WStr is silently declared in the calling subroutine. The hidden string is passed as a ByRef variable to the function. The return value (String) is directly assigned to the hidden variable. If an exception were to occur in function WStr(), the termination code of the caller will release the hidden WStr string variable. (Therefore Naked is perfect for this function: it does not need to provide explicit release code.)
5. Inside the function you can see more optimizations. First, the local Long variable that stores the address of BSTR is a register variable; no stack memory and copying required. The original version of the function also used a Shl 1 expression to multiply the length of the BSTR by 2, which results in an integer asm add eax, eax instruction rather than a floating point multiplication. That expression is gone from the updated function, because the length prefix already holds the number of bytes.
6. Other mathematic operations like V:vnt+8 and BSTR-4 are relative address operations and are properly compiled into indirect addressing instructions. So, no chance here to optimize.
I went in some detail to explain the function hoping you’ll find it useful. I hope to tell more about the way the compiler constructs subroutines and performs optimizations.

Posted by Sjouke Hamstra at 12.5.17 1 comment:

Labels: COM, Commands, Compiler, Subroutines, UNICODE, Variables

10 May 2017

Error free using a library

I use libraries a lot, but there are a few things that make using them a bit obscure and may lead to the nondescript message "Load lg32 error: filename.lg32" in the status bar and debug window. In addition, a compiled library may produce strange runtime errors.

BUG - Runtime errors
When you run a project which includes a library it may generate strange, seemingly unrelated error messages. In particular the error "Hash Internal Error 1/2 (Version?)" pops up regularly. The reason for runtime errors inside the code of a library is a bug(!) in applying the setting for Branch Optimizations.

For a lg32 file, GFA-BASIC wants to apply the Full Optimization for Exe setting on the compiling process. However, it is never applied at all, because the code applies this setting in the wrong place, after the code is compiled ;). Consequently, the compiler switches to the trackbar/slider setting from Branch Optimizations.
This is a bug from a long time ago that was simply never tested properly.

In general, object code generated for a lg32 file is position independent; it differs from code generated for an EXE (and GLL files). Therefore, the lg32-generated code for the jump-tables for Switch/Case statements and On n GoSub/Call statements is wrong (this is also true for a GLL, for which I always use the default settings).

The only setting that works flawlessly is the None position of the slider in 'Branch Optimizations' combined with an unchecked 'Full Optimization' check box.

A lg32-file has to be compiled using the default settings for Branch Optimizations.
The slider must be set to the first position (None) and the checkbox Full Optimization for Exe must be unchecked.

Note - The slider setting is applied to compiling code in memory, independent of the required output file type (EXE, LG32, or GLL). The rightmost position (Full) is exactly the same as checking the Full Optimization for Exe box. This way you can test fully optimized code inside the IDE.

Note - The branch optimizations of the compiler do not lead to remarkable performance results. These days with fast CPUs and large caches performance increase is hard to provide, the only real performance increase is accomplished by using Naked procedures. Remember however, Naked procedures do not include termination code and do not allow exception handlers.

The $Library statement
The $Library statement loads a lg32 file into memory. But sometimes it cannot locate the lg32 file. The IDE code to find a lg32 file is a bit complicated. In some conditions you may omit the extension and in others you cannot. It depends on the inclusion of a path in the $Library statement. For instance, you may include a relative path (relative to the current directory, mostly the g32-file directory, but not necessarily), but then the extension must be provided. It's all a bit incoherent. But there is a solution that always works correctly; that is, the library is always located properly.

Solution for load errors
This solution adds more functionality to the $Library statement and so it complements the current functionality. You must add a (new) registry entry to the GFA/BASIC key under HKEY_CURRENT_USER/Software. The entry must be named "lg32path" and the value can contain multiple full paths separated by commas. (The value is a list of directories, much like the PATH environment variable.)

New key: "lg32path", REG_SZ
Value: "C:\GFA\Include, D:\GFA\MyLibs"
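How such a search path could be walked is sketched below in Python. This is not the IDE's actual lookup code; it merely assumes that the directories are tried from left to right and that a missing extension is completed with .lg32. The function name find_lg32 is hypothetical.

```python
import os
import tempfile

def find_lg32(name, lg32path):
    """Return the first existing 'name.lg32' in a comma-separated path list."""
    filename = name if name.lower().endswith(".lg32") else name + ".lg32"
    for folder in lg32path.split(","):
        candidate = os.path.join(folder.strip(), filename)
        if os.path.isfile(candidate):
            return candidate
    return None

# Self-check with two temporary folders standing in for the registry value.
with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    open(os.path.join(second, "gfawinx.lg32"), "w").close()
    found = find_lg32("gfawinx", first + ", " + second)
    found_name = os.path.basename(found)
    found_in_second = os.path.dirname(found) == second
```

The spaces after the commas are stripped, so a value written like the example above would work in this model.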

Have fun with lg32 files.

Posted by Sjouke Hamstra at 10.5.17 No comments:

Labels: Bug, Commands, Compiler, Library

30 July 2015

Tip: How to create short jmp

When doing some assembler I noticed an important optimization setting for jumps. To force the compiler to generate short jumps (short jmp, short jz, short jnz) the Branch optimization setting must be >= 1.

[Screenshot: GBPropertiesBranch]

A short jump requires 2 bytes of opcode. Otherwise a conditional jump takes 6 bytes (an unconditional jmp takes 5). Short jumps are only generated when the target is within a signed 8-bit offset (-128 to +127 bytes), of course.
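The difference in size follows directly from the x86 encodings. The Python sketch below encodes a jmp and a jz by hand, using the standard 32-bit opcodes (EB/E9 for jmp, 74 and 0F 84 for jz); the function names are hypothetical and the code only illustrates the size rule, not how the GB32 compiler selects the form.

```python
def encode_jmp(from_addr, to_addr):
    """Encode an x86-32 unconditional jmp at from_addr that targets to_addr."""
    rel8 = to_addr - (from_addr + 2)          # relative to the end of a 2-byte jmp
    if -128 <= rel8 <= 127:
        return bytes([0xEB, rel8 & 0xFF])     # short jmp: EB + 8-bit offset
    rel32 = to_addr - (from_addr + 5)         # relative to the end of a 5-byte jmp
    return bytes([0xE9]) + (rel32 & 0xFFFFFFFF).to_bytes(4, "little")

def encode_jz(from_addr, to_addr):
    """Encode an x86-32 jz at from_addr that targets to_addr."""
    rel8 = to_addr - (from_addr + 2)          # relative to the end of a 2-byte jz
    if -128 <= rel8 <= 127:
        return bytes([0x74, rel8 & 0xFF])     # short jz: 74 + 8-bit offset
    rel32 = to_addr - (from_addr + 6)         # relative to the end of a 6-byte jz
    return bytes([0x0F, 0x84]) + (rel32 & 0xFFFFFFFF).to_bytes(4, "little")

near_target = encode_jz(0x1000, 0x1010)   # 2 bytes
far_target = encode_jz(0x1000, 0x2000)    # 6 bytes
```

The offset is always measured from the end of the jump instruction itself, which is why the short form subtracts 2 and the long forms subtract 5 or 6.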

Posted by Sjouke Hamstra at 30.7.15 1 comment:

Labels: Compiler

02 December 2013

Using methods and properties through IDispatch

This is all about two – more or less – undocumented features; the GFA-BASIC _DispID() function and a special dot-curly operator, the .{dispID} syntax.

An object’s layout in memory is, when using a BASIC Object variable data type:

MemObj | vtable (array of function pointers)
AddrOf vtable | AddrOf QueryInterface()
RefCount% | AddrOf AddRef()
…. | AddrOf Release()
…. | AddrOf GetTypeInfoCount()
… data | AddrOf GetTypeInfo()
 | AddrOf GetIDsOfNames()
 | AddrOf Invoke()
 | AddrOf ComPropertyMethod1()
 | AddrOf ComPropertyMethod2()
 | …..

After the following commands the BASIC Object variable oIE holds a reference to an instance of Internet Explorer, an automation server and therefore guaranteed to support a dual interface.

Debug.Show
Dim oIE As Object
Set oIE = CreateObject("InternetExplorer.Application")
Trace Hex(Long{V:oIE})
Set oIE = Nothing

An Object variable is actually an Int32 data type and holds the address returned by CreateObject.
Initially, the Object variable is zero (or Null) and is interpreted as Nothing. The following statement does nothing more than check whether the variable oIE holds a value (is not zero):

If oIE Is Nothing Then _
 Debug.Print "No Object assigned"

To view the contents of the integer oIE we first need its address. In GB32 you can obtain an address of a variable in several ways. You may use VarPtr(oIE), or V:oIE, or *oIE. After obtaining the variable address it must be read, or peeked, to get the contents. Reading a Long or Int32 from some address is done using Long{addr}. The Trace statement uses the Hex function to convert the value to a hexadecimal format, the usual format for a memory address.
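The same two steps, taking the address of a variable and then peeking a 32-bit value at that address, can be tried out in Python with ctypes. This is only an analogy: the c_int32 below stands in for the Object variable and the value in it is made up.

```python
import ctypes

value = ctypes.c_int32(0x12345678)      # stands in for the 32-bit Object variable
addr = ctypes.addressof(value)          # like V:oIE or VarPtr(oIE)

# Read a signed 32-bit integer back from that address, like Long{addr}.
peeked = ctypes.c_int32.from_address(addr).value

as_hex = format(peeked, "X")            # like Hex(...)
```

Just as in GB32, the address and the contents are two different numbers; only the contents are what CreateObject returned.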

From the previous blog post Using OLEVIEW we learned how to process the calling of a property or method through late binding. To summarize, to execute a method or a property of the object an automation client can:

Call GetIDsOfNames to "look up" the DispID for the method or property.
Call Invoke to execute the method or property by using the DispID to index the array of function pointers.

All BASIC languages support this behavior behind the curtains. They provide the object dot-operator to call properties and methods through late binding. For instance, when you want to check to see if Internet Explorer is visible you may call the Visible property:

If oIE.Visible == True Then _
 Debug.Print "Is visible indeed"

After compiling this code, the program will at runtime execute oIE’s interface function GetIDsOfNames to "look up" the DispID for the property. It will then call Invoke to execute the method or property by using the DispID to index the array of function pointers (vtable). Calling Invoke is a bit of a hassle and is best left to the compiler.

Now what if the server gets new functions and new names? Unfortunately, you would need to recompile and redistribute the client application before it would be able to use the new properties and methods. In order to avoid this, you could use a ‘CallByName’ function to pass the new property and method names as strings, without changing the application.

In contrast with other programming languages, GFA-BASIC features a function called _DispID(). This function allows you to call the object's GetIDsOfNames function to "look up" the DispID for the method or property. If you remember the blog post on Using OLEVIEW you might have noticed that every property and method is assigned a unique ID (integer value). Using _DispID(Object, Name$) we can obtain exactly that unique value. For instance, this will display the ID value 402 in the Debug output window.

Trace _DispID(oIE, "Visible")

Obtaining the dispId of a method or property is only useful when you can use the integer value to call the IDispatch function Invoke(). Specifically for this purpose GFA-BASIC 32 provides us with a mighty piece of equipment: the dot-curly operator. To call Invoke using the dispId you can simply replace the name of the property or method with ‘{dispId}’.

Global Long dispIdVisible
dispIdVisible = _DispID(oIE, "Visible")
Trace oIE.{dispIdVisible}
Trace oIE.{402}

How does VB6 do it?
If you’re interested, you should compare this elegance to the VB6 function CallByName. This function allows you to use a string to specify a property or method at run time. The signature for the CallByName function looks like this:

Result = CallByName(Object, ProcedureName, CallType, Arguments())


The first argument to CallByName takes the name of the object that you want to act upon. The second argument, ProcedureName, takes a string containing the name of the method or property procedure to be invoked. The CallType argument takes a constant representing the type of procedure to invoke: a method (vbMethod), a property let (vbLet), a property get (vbGet), or a property set (vbSet). The final argument is optional, it takes a variant array containing any arguments to the procedure.
Conclusion
Due to the elegant syntax, GB detects how to invoke the method or property. The parameters of a property or method don’t go in a Variant array. Due to the dot-curly syntax the parameters are specified as any other property or method call. The only thing you need to do yourself is retrieving the dispID, but this is of great advantage since now you are able to store the ID to use it over and over. The CallByName() function each time has to obtain dispID for the name passed.
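The advantage of storing the ID can be illustrated with a toy model in Python. It is not COM code: DispatchModel, get_ids_of_names, invoke and call_by_name are hypothetical stand-ins for the IDispatch functions and a CallByName-style helper, and the model simply counts how often a name has to be looked up.

```python
class DispatchModel:
    """Toy late-bound object: names map to dispIDs, dispIDs map to members."""
    def __init__(self):
        self.lookups = 0
        self._ids = {"visible": 402}            # 402 as in the Trace example
        self._members = {402: lambda: True}

    def get_ids_of_names(self, name):
        self.lookups += 1                       # count the name lookups
        return self._ids[name.lower()]          # names are case-insensitive here

    def invoke(self, dispid, *args):
        return self._members[dispid](*args)

def call_by_name(obj, name, *args):
    """Looks the name up on every call, like a CallByName-style helper."""
    return obj.invoke(obj.get_ids_of_names(name), *args)

obj = DispatchModel()
for _ in range(3):
    call_by_name(obj, "Visible")
lookups_by_name = obj.lookups                   # one lookup per call

obj = DispatchModel()
dispid_visible = obj.get_ids_of_names("Visible")   # look up once and keep it
results = [obj.invoke(dispid_visible) for _ in range(3)]
lookups_cached = obj.lookups                    # a single lookup in total
```

In this model three calls by name cost three lookups, while the stored ID needs only one, which is the same saving the dot-curly syntax makes possible.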

Posted by Sjouke Hamstra at 2.12.13 No comments:

Labels: COM, Commands, Compiler, Ocx Object

07 April 2013

Favorite: On GoSub/GoTo/Call

One of my favorite programming constructs in GFA-BASIC is, and always has been, the On value GoXxx statements. Although, I must admit, there was a time I doubted the usefulness and legality of this command, because C/C++ doesn’t offer such a branch instruction. At that time I was under the impression that it was a typical BASIC overkill command. However, it is definitely not; on the contrary! These statements let GFA-BASIC produce optimized branching code through the use of jump tables. C/C++ compilers can do this only when their ‘optimize to fast code’ switch is set. Since C/C++ doesn’t support jump tables as a language construct, these compilers crunch together a switch/case block and try to turn it into a jump table just like GFA-BASIC does by default. These are the statements that produce highly optimized branching code without setting a compiler option.

On n GoSub label1, label2, …
On n GoTo label1, label2, …
On n Call Proc1, Proc2, …

My most favorite version is On GoSub. It allows for executing local subroutines that start with a Label: and end with a Return statement. This version is not compatible with C/C++; these compilers produce code compatible with GFA-BASIC’s On GoTo.
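What makes a jump table fast is that the value n is used as an index instead of being compared against each case in turn. A minimal Python sketch of that idea follows; the helper on_n_call is hypothetical, and it assumes that n is 1-based and that an out-of-range value simply does nothing.

```python
def on_n_call(n, *procs):
    """Dispatch to the n-th procedure (1-based) through a table lookup."""
    if 1 <= n <= len(procs):
        return procs[n - 1]()     # one indexed call, no chain of comparisons
    return None                   # assumption: out-of-range values do nothing

def proc1():
    return "first"

def proc2():
    return "second"

def proc3():
    return "third"

picked = on_n_call(2, proc1, proc2, proc3)
skipped = on_n_call(7, proc1, proc2, proc3)
```

The cost of the dispatch stays the same no matter how many procedures are in the list, which is the property the compiled jump table has as well.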

Not in an Editor Extension
As I mostly program editor extensions, I was surprised to run into problems with these branch statements. GFA-BASIC, like Visual C++, creates a jump table at the end of a function/procedure. However, GFA-BASIC hard-wires the address of the jump table, with the presumption that the code starts at $400000 (hInstance). With dynamically loaded GLLs, that place is reserved for the IDE executable image (that is where the code is loaded). When the GLL reaches the execution of the On GoSub code it reads the execution address of the labels from the IDE’s code text section. To the GLL this code contains rubbish. When the GLL is loaded, not all the addresses in the GLL code are updated correctly.
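The effect of hard-wired addresses can be shown with some simple arithmetic. The Python sketch below is a model under assumptions: the label offsets and the GLL load address are made up, and relocate only shows what a loader fix-up would have to do in principle, not what the GLL loader actually does.

```python
ASSUMED_BASE = 0x400000          # base address the compiler presumes

def build_jump_table(label_offsets, base=ASSUMED_BASE):
    """Absolute label addresses as they would be hard-wired at compile time."""
    return [base + off for off in label_offsets]

def relocate(table, actual_base, assumed_base=ASSUMED_BASE):
    """What a loader fix-up would do: shift every absolute address by the delta."""
    delta = actual_base - assumed_base
    return [addr + delta for addr in table]

offsets = [0x1040, 0x1080, 0x10C0]       # hypothetical label offsets
table = build_jump_table(offsets)        # points into the $400000 image
gll_base = 0x10000000                    # hypothetical load address of a GLL
fixed = relocate(table, gll_base)        # points into the GLL itself
```

Without the fix-up, every entry of the table still points into the image at $400000, which in the IDE is the IDE's own code rather than the GLL's labels.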

As my goal is to create code as efficient as possible, I am a bit disappointed and maybe I will try to fix the GLL-loader.

Posted by Sjouke Hamstra at 7.4.13 No comments:

Labels: Commands, Compiler, Editor Extensions

15 July 2011

COM (OOP) in GB32 - IUnknown (1)

In this part I'll show you the layout of a COM class as it could be implemented in GFA-BASIC 32. Note that we restrict ourselves to a minimalist COM object that supports the IUnknown interface only. We will not yet create properties and methods.
The GFA-BASIC 32 approach is loosely based on the article series COM in plain C by Jeff Glatt, specifically Part 1 and Part 2. However, in the first step we are not going to use type libraries, we are not going to store the COM object in a DLL, and we are not going to define another COM object that creates the one we define. COM is merely a binary standard; it dictates how to lay out an array of function pointers and where to put the address of this array (of function pointers). It also dictates how to add reference counting. To comply with this standard a COM object must at least contain three pointers to pre-defined functions, also known as the IUnknown interface. In GFA-BASIC 32 this could look like this:

// IUnknownBlog.g32 17.07.2011
Debug.Show
// Our COM object: an implementation of IUnknown
GUID IID_IUnknownImpl = 9578fdab-97cb-4322-99e4-699abd26be1d
Type IUnknownImpl
 lpVtbl As Pointer IUnknownImplVtbl
 RefCount As Int
EndType
// VTABLE (an array of function pointers)
Type IUnknownImplVtbl
 QueryInterface As Long
 AddRef As Long
 Release As Long
EndType
Static IUnknownImplVtbl As IUnknownImplVtbl
With IUnknownImplVtbl
 .QueryInterface = ProcAddr(IUnknownImplVtbl_QueryInterface)
 .AddRef = ProcAddr(IUnknownImplVtbl_AddRef)
 .Release = ProcAddr(IUnknownImplVtbl_Release)
EndWith
// First create a heap allocated instance
// of our implementation of IUnknownImpl
Dim pIUnk As Pointer IUnknownImpl
Pointer pIUnk = mAlloc(SizeOf(IUnknownImpl))
Pointer pIUnk.lpVtbl = *IUnknownImplVtbl
pIUnk.RefCount = 1
Dim obIUnk As Object
{V:obIUnk} = V:pIUnk
Trace obIUnk
Dim o As Object // assign to other
Set o = obIUnk // AddRef() call
// two calls to Release
/*** END ***/
GUID IID_IUnknown = 00000000-0000-0000-c000-000000000046
Global Const E_NOTIMPL = 0x80004001
Global Const E_NOINTERFACE = 0x80004002
Declare Function IsEqualGUID Lib "ole32" (ByVal prguid1 As Long, ByVal prguid2 As Long) As Bool
Function IUnknownImplVtbl_QueryInterface( _
 ByRef this As IUnknownImpl, riid%, ppv%) As Long Naked
 Trace Hex(*this)
 {ppv} = Null
 If IsEqualGUID(riid, IID_IUnknownImpl) || _
 IsEqualGUID(riid, IID_IUnknown)
 {ppv} = *this
 IUnknownImplVtbl_AddRef(this)
 Return S_OK
 EndIf
 Return E_NOINTERFACE
EndFunc
Function IUnknownImplVtbl_AddRef(ByRef this As IUnknownImpl) As Long Naked
 Trace Hex(*this)
 this.RefCount++
 Return this.RefCount
EndFunc
Function IUnknownImplVtbl_Release(ByRef this As IUnknownImpl) As Long Naked
 Trace Hex(*this)
 this.RefCount--
 If this.RefCount == 0
 ~mFree(*this)
 Return 0 // the object is gone, do not read RefCount from freed memory
 EndIf
 Return this.RefCount
EndFunc

Copy it to a new GFA-BASIC 32 application and save it as IUnknownBlog.g32. Try to run it; it should compile and run flawlessly. Our first minimalist COM object/class has been defined and is up and running. Of course it doesn't do anything yet, but we have an object that integrates with GFA-BASIC 32 and fully complies with the COM binary standard.
Assign to Object
Let us take a brief look at the code. Since this project isn't meant for the beginner, I'll walk you through it in big steps.
The Object data type holds a pointer to a piece of memory of at least 4 bytes (the lpVtbl pointer). Usually that memory also contains an integer for a reference count. In GFA-BASIC 32 the Ocx variables always keep their count in the second slot, so we will do the same.
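For readers who know the C articles this series builds on, the layout described above can be sketched as follows. This is only an illustration, not GFA-BASIC 32 code: the struct names mirror the Types in the listing, the function slots are kept as plain pointer-sized placeholders (like the As Long members), and the helper functions layout_matches, vtable_order_matches and check_layout are hypothetical names introduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C mirror of the two Types in the listing; the names are
   reused only for readability. The slots are pointer-sized placeholders,
   just as the GFA-BASIC 32 Type stores them as plain addresses. */
typedef struct IUnknownImplVtbl {
    void *QueryInterface;   /* slot 0 of the function-pointer array */
    void *AddRef;           /* slot 1 */
    void *Release;          /* slot 2 */
} IUnknownImplVtbl;

typedef struct IUnknownImpl {
    const IUnknownImplVtbl *lpVtbl; /* first slot: address of the vtable */
    int32_t RefCount;               /* second slot: reference count */
} IUnknownImpl;

/* Returns 1 when the object layout matches the description: the vtable
   pointer at offset 0 and the count directly behind it. */
int layout_matches(void)
{
    return offsetof(IUnknownImpl, lpVtbl) == 0
        && offsetof(IUnknownImpl, RefCount) == sizeof(void *);
}

/* Returns 1 when the three IUnknown entries sit in the required order. */
int vtable_order_matches(void)
{
    return offsetof(IUnknownImplVtbl, QueryInterface) == 0
        && offsetof(IUnknownImplVtbl, AddRef) == sizeof(void *)
        && offsetof(IUnknownImplVtbl, Release) == 2 * sizeof(void *);
}

/* Aborts if either assumption about the layout does not hold. */
void check_layout(void)
{
    assert(layout_matches());
    assert(vtable_order_matches());
}
```

Only the position of lpVtbl is dictated by COM; putting the count in the second slot is the convention borrowed from the Ocx objects.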
To put a memory address in an Object variable we usually use Set obj2 = obj1. GFA-BASIC 32 checks at compile time that both sides are proper COM object types. Since we only have an mAlloc-ed address, this syntax wouldn't work, so we must write the address into the Object variable directly ({V:obIUnk} = V:pIUnk). For the same reason we set the RefCount to 1 by hand.
The array of functions (VTABLE) is stored in a Type and shared by all instances of our custom COM object. Therefore a static (global) variable is used and initialized once. Each new COM object must hold the vtable address in its lpVtbl member.
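The difference between a Set assignment and the raw write, and the sharing of one vtable, can be sketched in C. This is a rough model under stated assumptions, not a description of what the GFA-BASIC 32 runtime does internally: Obj, ObjVtbl, set_object, new_object_raw and the demo functions are hypothetical names, and set_object only follows the general COM convention (AddRef the new object, Release the old one).

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct Obj Obj;
typedef struct ObjVtbl {
    uint32_t (*QueryInterface)(Obj *self, const void *riid, void **ppv);
    uint32_t (*AddRef)(Obj *self);
    uint32_t (*Release)(Obj *self);
} ObjVtbl;
struct Obj { const ObjVtbl *lpVtbl; int32_t RefCount; };

static uint32_t stub_query(Obj *self, const void *riid, void **ppv)
{
    (void)self; (void)riid;
    if (ppv) *ppv = NULL;
    return 0x80004002u;               /* E_NOINTERFACE; not the topic here */
}
static uint32_t add_ref(Obj *self) { return (uint32_t)++self->RefCount; }
static uint32_t release(Obj *self)
{
    int32_t left = --self->RefCount;
    if (left == 0) free(self);
    return (uint32_t)left;
}

/* One table, initialized once, shared by every instance. */
static const ObjVtbl g_vtbl = { stub_query, add_ref, release };

/* Raw creation: nobody calls AddRef for us, so the count starts at 1. */
Obj *new_object_raw(void)
{
    Obj *p = malloc(sizeof *p);
    if (!p) return NULL;
    p->lpVtbl = &g_vtbl;
    p->RefCount = 1;
    return p;
}

/* Assumed model of "Set dst = src": copy the address and AddRef it. */
void set_object(Obj **dst, Obj *src)
{
    if (src) src->lpVtbl->AddRef(src);
    if (*dst) (*dst)->lpVtbl->Release(*dst);
    *dst = src;
}

/* Returns the count seen right after the Set-style assignment. */
int32_t demo_set_vs_raw(void)
{
    Obj *first = new_object_raw();    /* like {V:obIUnk} = V:pIUnk */
    Obj *second = NULL;
    int32_t count_after_set;
    if (!first) return -1;
    set_object(&second, first);       /* like Set o = obIUnk */
    count_after_set = first->RefCount;
    second->lpVtbl->Release(second);
    first->lpVtbl->Release(first);    /* count reaches 0, memory is freed */
    return count_after_set;
}

/* Returns 1 when two instances point to the very same table. */
int demo_shared_vtable(void)
{
    Obj *a = new_object_raw();
    Obj *b = new_object_raw();
    int same = a && b && a->lpVtbl == b->lpVtbl;
    if (a) a->lpVtbl->Release(a);
    if (b) b->lpVtbl->Release(b);
    return same;
}
```

The two Release calls at the end of demo_set_vs_raw correspond to the "two calls to Release" comment in the listing.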
_QueryInterface, _AddRef, and _Release
The application COM functions _QueryInterface, _AddRef, and _Release are never called directly, but always by GFA-BASIC 32 or another COM object. They clutter up our application code and are in fact just boilerplate code. We must do something about that, for instance put them in a $Library. Also note the Naked attribute. These functions are never executed in the context of the application and need no TRACE code and no Try/Catch exception handler. They should be as naked as possible to get the best performance.
The Trace Hex(*this) commands in the code are there to verify the address passed. Also note the type of the this pointer. COM passes the address of the COM object, and by declaring the parameter ByRef with the correct type in the function declaration we can access its members directly. All other parameters are simple placeholders for addresses we don't use or only pass on.
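As a cross-check against the C articles, the three functions might look roughly like this in plain C. It is a sketch under assumptions: memcmp stands in for IsEqualGUID so that the code does not depend on ole32, and the names Guid, Impl, ImplVtbl, SK_S_OK, SK_E_NOINTERFACE and demo_query are hypothetical. The two IID values are the ones from the listing.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct { uint32_t d1; uint16_t d2, d3; uint8_t d4[8]; } Guid;

/* 00000000-0000-0000-c000-000000000046 */
static const Guid IID_IUnknown_SK =
    { 0x00000000, 0x0000, 0x0000, { 0xC0, 0, 0, 0, 0, 0, 0, 0x46 } };
/* 9578fdab-97cb-4322-99e4-699abd26be1d */
static const Guid IID_IUnknownImpl_SK =
    { 0x9578fdab, 0x97cb, 0x4322,
      { 0x99, 0xe4, 0x69, 0x9a, 0xbd, 0x26, 0xbe, 0x1d } };

#define SK_S_OK          0x00000000u
#define SK_E_NOINTERFACE 0x80004002u

typedef struct Impl Impl;
typedef struct ImplVtbl {
    uint32_t (*QueryInterface)(Impl *self, const Guid *riid, void **ppv);
    uint32_t (*AddRef)(Impl *self);
    uint32_t (*Release)(Impl *self);
} ImplVtbl;
struct Impl { const ImplVtbl *lpVtbl; int32_t RefCount; };

/* The typed first parameter plays the role of "ByRef this As IUnknownImpl". */
static uint32_t impl_add_ref(Impl *self) { return (uint32_t)++self->RefCount; }

static uint32_t impl_release(Impl *self)
{
    int32_t left = --self->RefCount;  /* keep the count before freeing */
    if (left == 0) free(self);
    return (uint32_t)left;
}

static uint32_t impl_query(Impl *self, const Guid *riid, void **ppv)
{
    *ppv = NULL;
    if (memcmp(riid, &IID_IUnknownImpl_SK, sizeof(Guid)) == 0 ||
        memcmp(riid, &IID_IUnknown_SK, sizeof(Guid)) == 0) {
        *ppv = self;                  /* hand out the same object ... */
        self->lpVtbl->AddRef(self);   /* ... and count the new reference */
        return SK_S_OK;
    }
    return SK_E_NOINTERFACE;
}

static const ImplVtbl g_impl_vtbl = { impl_query, impl_add_ref, impl_release };

/* Returns the count seen after a successful QueryInterface, or -1 when
   the interface is refused or the allocation fails. */
int32_t demo_query(const Guid *riid)
{
    Impl *obj = malloc(sizeof *obj);
    void *out = NULL;
    int32_t seen = -1;
    if (!obj) return -1;
    obj->lpVtbl = &g_impl_vtbl;
    obj->RefCount = 1;
    if (obj->lpVtbl->QueryInterface(obj, riid, &out) == SK_S_OK) {
        seen = obj->RefCount;         /* our reference plus the one handed out */
        obj->lpVtbl->Release((Impl *)out);
    }
    obj->lpVtbl->Release(obj);
    return seen;
}
```

Note that impl_release copies the count into a local before the memory is freed, so the return value never comes from released memory.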
In the next part we will implement an IDispatch object and try to integrate the COM code more into GFA-BASIC 32.

Posted by Sjouke Hamstra at 15.7.11

Labels: Compiler, Ocx Object, Variables, Windows API
