Archives
- November 2025
- October 2025
- September 2025
- August 2025
- July 2025
- June 2025
- May 2025
- April 2025
- March 2025
- February 2025
- January 2025
- December 2024
- November 2024
- October 2024
- September 2024
- August 2024
- July 2024
- June 2024
- May 2024
- April 2024
- March 2024
- February 2024
- January 2024
- October 2023
- September 2023
- August 2023
- July 2023
- June 2023
- May 2023
- April 2023
- March 2023
- January 2023
- December 2022
- November 2022
- October 2022
- September 2022
- July 2022
- June 2022
- May 2022
- April 2022
- March 2022
- February 2022
- January 2022
- December 2021
- November 2021
- October 2021
- September 2021
- August 2021
- July 2021
- June 2021
- May 2021
- April 2021
- March 2021
- February 2021
- January 2021
- December 2020
- November 2020
- October 2020
- September 2020
- August 2020
- July 2020
- June 2020
- May 2020
- April 2020
- March 2020
- February 2020
- January 2020
- December 2019
- November 2019
- October 2019
- September 2019
- August 2019
- July 2019
- June 2019
- May 2019
- April 2019
- March 2019
- February 2019
- January 2019
- December 2018
- November 2018
- October 2018
- August 2018
- July 2018
- June 2018
- May 2018
- April 2018
- March 2018
- February 2018
- January 2018
- December 2017
- November 2017
- October 2017
- August 2017
- July 2017
- June 2017
- May 2017
- April 2017
- March 2017
- February 2017
- January 2017
- December 2016
- November 2016
- October 2016
- September 2016
- August 2016
- July 2016
- June 2016
- May 2016
- April 2016
- March 2016
- February 2016
- January 2016
- December 2015
- November 2015
- October 2015
- September 2015
- August 2015
- July 2015
- June 2015
- May 2015
- April 2015
- March 2015
- February 2015
- January 2015
- December 2014
- November 2014
- October 2014
- September 2014
- August 2014
- July 2014
- June 2014
- May 2014
- April 2014
- March 2014
- February 2014
- January 2014
- December 2013
- November 2013
- October 2013
- September 2013
- August 2013
- July 2013
- June 2013
- May 2013
- April 2013
- March 2013
- February 2013
- January 2013
- December 2012
- November 2012
- October 2012
- September 2012
- August 2012
- July 2012
- June 2012
- May 2012
- April 2012
- March 2012
- February 2012
- January 2012
- December 2011
- November 2011
- October 2011
- September 2011
- August 2011
- July 2011
- June 2011
- May 2011
- April 2011
- March 2011
- January 2011
- November 2010
- October 2010
- August 2010
- July 2010
Hash Tables FTW
Over the last few weeks I did a bit of tinkering on a hobby software project. The code was written by two other people during the last several years and I finally managed to find some time to fix a couple of bugs.
The project is writing a functional clone of WGML 4.0 aka Watcom SCRIPT/GML, a text processor that came out of mainframe SCRIPT tradition (which goes back to 1968 or so). WGML is used by the Open Watcom project to produce documentation in a large number of formats: Windows and OS/2 native help files, homegrown Watcom help format, HTML, CHM, and last but not least PostScript/PDF.
Several slightly different WGML executables exist, and run either under OS/2 or 32-bit extended DOS. Running those on modern systems is possible (via DOSBox and such), but quite painful, and creates undesirable dependencies. The source code to WGML was presumably lost decades ago.
“Why don’t you just convert the documentation to some other format” is what a lot of people said, and when asked to show how, the discussion was over. Because it’s actually not that easy. I’m sure it’s not impossible but it’s much harder than it seems. As it turns out, SCRIPT includes a fairly flexible language with macros, variables (symbols), built-in functions, conditionals, and loops, and the documentation (several thousand pages) makes use of all that.
So we’re looking at either a huge amount of extremely boring and mind-numbing work converting the documentation source files, or a huge amount of interesting and challenging work rewriting the WGML text processor. Given that no one is paid for it, only the latter is actually a realistic option.
A number of years later we’re at a point where the rewritten WGML can process nearly all of the Open Watcom documentation and produce output that in many cases perfectly matches the output of old WGML 4.0, although bugs still remain. In the process of fixing bugs, it’s necessary to often re-process existing documents to make sure nothing broke. And that got boring because the rewritten WGML wasn’t all that fast.
Now, fast or slow is always relative. The new WGML was, I believe, always faster than the old DOS version. Which means that processing a 1,000 page manual might take under 10 seconds, but there is about a dozen of such manuals. And they get built in a handful of different formats, which means the time really adds up.
So I thought I’d look into where the time was being spent. The first culprit turned out to be symbol lookup.
SCRIPT Symbols
The symbols or variables in SCRIPT work somewhat like makefile macros. They might be used like this:
:set symbol="company" value="Open Watcom". :set symbol="watc" value="&company C/C++". A manual for &watc..
After the symbols get substituted, the result will be
A manual for Open Watcom C/C++.
An interesting catch is that if a symbol is not found in a dictionary, nothing gets replaced. Hence a code fragment such as
aptr = &array[0];
will be output without changes, assuming that a symbol named array does not exist.
The newly implemented WGML was using simple linked lists for all symbol dictionaries. There are, roughly speaking, local symbol dictionaries for macros and included files, and there is a global dictionary. The local dictionaries tend to be quite small, typically with under five symbols in them, sometimes even none. They also usually don’t need to satisfy a lot of lookups.
The global dictionary is a different story. On a larger document (the C library reference), there were 406,027 symbol lookups against the global dictionary. There dictionary was eventually filled up with almost 200 symbols, and not surprisingly, repeatedly searching such a list is not fast. To satisfy those 406,027 lookups, the code needed to perform 43,697,703 string comparisons.
I set out to replace the global dictionary linked list with a hash table. However, I decided to leave the local dictionaries alone, as the overhead of extra memory and hashing didn’t seem worthwhile.
With a hash function borrowed from the C compiler, and using a hash table with 241 entries, the number of string compares went from down 43,697,703 to 681,408. To put it differently, the linked list implementation needed on average to go through a little over 100 items before it found (or didn’t find) the desired symbol. In contrast, the hash table implementation on average needs to go through under 1.7 items. That is a huge difference.
The effect unfortunately was not as tangible as I expected. The change shaved around 5% off the execution time.
Macro Lookup
But it made other bottlenecks easier to see: Macro lookup and SCRIPT keyword lookup. When script encounters an input line such as
.br
it first checks whether there is a macro called br. If there isn’t, SCRIPT checks whether .br is a valid control word (which it is) and executes it.
The Open Watcom documentation is rather macro heavy. There is just one global macro dictionary, and when processing my test document, it needed to satisfy 2,321,499 macro lookups. This led to several hundred million string comparisons in the original linked list implementation.
At first I replaced the macro dictionary hashing with the same code as the symbol lookup. And that worked… mostly. Some documents didn’t build and at least one regression test failed.
It turns out that there’s an oddity related to macro name length. WGML only considers up to 8 characters when looking up macros. That required a change to the hashing function so that it not only stops at the end of a string, but also after 8 characters if the input string is longer.
With this adjustment, WGML worked again and to satisfy 2,321,499 macro lookups, only 1,873,022 string comparisons were needed, i.e. about 0.8 string comparisons per lookup.
That may sound strange, but there is a simple explanation: Many of the macro lookups search for overridden SCRIPT control words, but there is no override. The corresponding hash table entry is empty, and there is no string to compare.
The macro dictionary modification had a significant effect, and reduced the WGML run time by about 20%.
Control Word Lookup
The next item on the list was SCRIPT control word lookup. There are over 120 control words and the existing code linearly searched through an array. This, again, was not fast.
The SCRIPT control words, with a few exception, consist of two alphabetic characters. Instead of hashing, I decided to use a direct lookup table. A 26 ×ばつ 26 table holding one-byte indices into the control word array does most of the word. The odd cases are control words .H0 to .H9 and ... (used for labels). These are handled specially in the lookup function.
Changing the SCRIPT control word lookup reduced the run time by another 5% or so. A small but worthwhile change.
Old vs New
This isn’t really a criticism of the original rewritten WGML code. With such a project, it is very difficult to guess at the outset where the performance bottlenecks are going to be. Trying to optimize rarely used code is not useful. It’s much easier to get the code working and then analyze what needs speeding up.
After my round of improvements, I thought it’d be interesting to compare the performance of rewritten WGML against the original. I used a 32-bit Windows 7 VM for that, since the old DOS WGML can run directly there.
I chose one of the largest documents, an online help file combining a number of separate manuals. The old WGML 4.0 needed about 12.3 seconds to process the manual. The optimized rewritten WGML took 2.9 seconds to process the same manual.
That’s quite a difference! The new WGML, at least in some cases, needs a little under 25% of time compared to the old one. The old WGML might incur some NTVDM overhead which could skew the comparison somewhat, then again that’s not really avoidable so it’s still a fair comparison.
The new WGML now feels noticeably faster… because it is.
4 Responses to Hash Tables FTW
Were the changes committed to the OW repository?
Sure, it’s in the semi-private Perforce repository at openwatcom.org.
“With such a project, it is very difficult to guess at the outset where the performance bottlenecks are going to be. Trying to optimize rarely used code is not useful. It’s much easier to get the code working and then analyze what needs speeding up.”
Did you actually profile the code? Most of the text implies optimization was based on informed intuition, but the above quoted text also indicates some profiling might have been done.
Yes, I used the Watcom profiler, which is a simple sampling profiler that did the job pretty well.
This site uses Akismet to reduce spam. Learn how your comment data is processed.