LuaHashMap is a hash table library/implementation for use in C. Since C lacks a hash table in the standard library, this library provides a way to fill that hole. But instead of reinventing the wheel again to implement a hash table for C, LuaHashMap cleverly wraps Lua to leverage a proven implementation, while providing a friendly C API for hash tables without needing to learn/use the low-level Lua C-API.
For those not familar with Lua, Lua is a scripting language implemented in pure ANSI C designed to be embedded in larger applications. It is the de-facto scripting language used in the video game industry for the past 15 years (e.g. Escape from Monkey Island, World of Warcraft, Angry Birds) as well as other non-gaming high profile products such as Adobe Lightroom, Wireshark, and MediaWiki (Wikipedia). Lua is fast, small, lightweight, simple, and under the MIT license. And tables are the sole data structure in Lua so the underlying hash table is fast, reliable, well documented, understood, and proven.
So rather than reinventing a new hash table implementation, LuaHashMap simply wraps Lua's to mimimize bugs and leverage known performance and behaviors. But to you (the end user/developer), LuaHashMap provides a simple/straight-forward C API for hash tables and Lua is simply an implementation detail, so if you don't know anything about Lua, it is not a problem. (But for those wondering about the implementation details, LuaHashMap itself is completely written in C and uses Lua's C API exclusively without any actual Lua script code.)
LuaHashMap is designed to be easy to use and convenient. LuaHashMap can work with keys and values of the following types: strings, numbers, and pointers. The API provides explicit function names for each permutation of types to avoid macro hell and hard to debug situations. (Though C11 users, check out the _Generic macros to use C++ like overloading while still preserving a degree of type safety, and without all the namemangling problems.) And there are no hashing functions you need to figure-out/write/provide. (All of this was already done/tuned by Lua itself.)
LuaHashMap is ideal for projects that may agree with one the following:
LuaHashMap may not be ideal for projects that:
LuaHashMap provides explicit interfaces for 4 types: integers, floating point, strings, and pointers. There is no macro-hell here. (Yes, I typed-out/implemented all the permutations. You can thank me later.) So you get nice straight-forward compiler warnings and more type safety.
The general pattern for all the permutations of the APIs is: LuaHashMap_SetValue<TV>ForKey<TK> where TV is the value type and TK is the key type. The names are "String", "Pointer", "Number" (for floating point), and "Integer".
There are two general ways to access the hash map, either by key or by iterator.
The following example inserts values into the table with string keys.
Iterators in LuaHashMap are conceptually similar to the C++ STL notion of an iterator (but much simpler). They let you iterate (traverse) through a hash table and let you get and set values. Here's a similar example that prints everything in the hash table using an iterator.
Of course this is just a taste. There are a lot more APIs available to make this a full featured library.
Iterators are the way to iterate/traverse through a hash table. To keep things simple, notice that the GetIterator family of functions all return a full copy of a struct (LuaHashMapIterator).
A major concept with iterators to notice is they are simply a struct that are intended to live on the stack when you use them. This allows you to use iterators without worrying about leaking. They are also generally seen as short term, inexpensive objects you should be able to refetch at will (an O(1) lookup by key).
Functions that operate on Iterators like IteratorNext() operate on the iterator by reference so it can change the data in the struct. The pattern is to pass your struct in by reference: LuaHashMap_IteratorNext(&hash_iterator)
In Lua (and thus LuaHashMap), it is safe to remove a key/value entry from the hash table while iterating through it. Note that Lua does not reshuffle entries or compact the table when clearing entries.
Unlike removing, it is not safe to add new entries to the table while iterating because the table may reshuffle.
Also of note, there are two ways to extract values from iterators:
The "Cached" value is the value copied in the struct when the iterator was last created/iterated/updated. The non-cached version incurs a full lookup in the hash to find the value. Generally, if you have the iterator and haven't modified the hash table behind the back of the iterator (e.g. use a non-iterator LuaHashMap function to change or remove a value), the cached value is correct and will be faster. Proper programming style of the LuaHashMap iterator functions should keep the "Cached" value up to date so you should rarely (if ever) need the non-cached version.
This demonstrates a typical pattern for using iterators to traverse an entire hash map. You get an iterator at the begin position and another at the end position. You increment your begin position iterator with ItereratorNext() as long as it does not equal the end iterator.
Alternatively, if you know you have at least one entry in the hash, this is a slightly more succinct pattern which uses a do-while loop and the return value of IteratorNext.
Another pattern is to get a value only if it exists. The non-iterator versions of the GetKey family of functions can't distinguish between a non-existent entry and a NULL/0/empty value. While you could use the ExistsKey family of functions and then get the value with the GetKey family of functions, this will cause you to do two hash table look ups instead of just one. Instead, you should get the iterator and use the ExistsAtIterator function followed by GetCachedValue so only one hash table look up is done.
I said you didn't need to know anything about Lua. Well, that's mostly true, but there are some interesting details you should know about in Lua which may be useful.
Canonical (unmodified) Lua only has one number type, which by default is double. Double has enough bits to store a 32-bit integer without any loss of precision which is why this works. But you may notice that LuaHashMap has APIs to deal with both lua_Number (floating point) and lua_Integer (integer). Canonical Lua converts everything to lua_Number so if you provide a lua_Number key that happens to be the same value as a lua_Integer key, these may in fact be the same key.
(But if you are using a patched version of Lua (e.g. LNUM) Lua that may avoid conversion and keep types separated. Please note that some of the iterator based functions in LuaHashMap may still convert due to the lack of a formal way to detect integer types.)
As an implementation detail, all strings in Lua are internalized which means there is only one immutable instance of a string within Lua for duplicate strings. This potentially can be exploited to yield further optimizations in your code.
LuaHashMap does not rely on this fact, but do note that the APIs return the internalized string pointer when you add a key (which may be different than the one you fed the API). This was intended to allow you to exploit this fact if you choose in your own code.
However, if you use alternative implementations of Lua, they are not guaranteed to behave in this same way.
Related to Lua's string internalization, for LuaHashMap, this means when you add a string into the hash table, Lua creates its own copy so you don't need to hold on to the original.
But be careful. If you Get a string from LuaHashMap and then completely remove the entries that are holding it, the memory could be deleted out from under you if you still are holding on to the string pointer and trying to use it.
LuaHashMap does exploit an implementation detail of Lua for functions that return strings, such as const char* LuaHashMap_GetValueStringForKeyString. In the Lua API, you communicate between C and Lua using a stack API provided by Lua. When you are finished with an operation, the stack should generally be returned to the state you started in. That means when this function finishes, the Lua stack should be reset.
Technically, the pointer to string that is returned is not guaranteed by the Lua specification to still be valid and the pointer could be dangling. But as an implementation detail, the Lua authors have stated that this pointer is still valid as long as the string is still in Lua (e.g. not removed).
It is likely that all implementations of Lua share this behavior (since moving collectors are hard to deal with in C for this exact reason of pointers), so this implementation detail is probably safe. However, if you do use a completely different implementation of Lua that is not from PUC-Rio, you may want to verify this behavior.
If you create a hash with sequential, continuous integer keys, you may trigger an optimization in Lua which treats that as an array. This will yield very fast performance as Lua is working with C arrays instead of hashing.
This is a behavior of Lua which LuaHashMap keeps. (It is related to the fact that Lua does not recompact tables when keys are removed.)
Also note that the LuaHashMap_Clear() function will only clear the entries from the table. The number of allocated buckets will not shrink. If you want to clear all the entries and free the memory, use LuaHashMap_Purge().
The book "Lua Gems" gives wonderful insights to how tables are designed and work in Lua, and what their behavior and performance characteristics are. The free sample chapter tells you everything you need to know.
http://www.lua.org/gems/sample.pdf
Unfortunately, using the standard API, each LuaHashMap instance also creates an entirely new Lua virtual machine instance. On a 64-bit machine, this amounts to about 4-5KB of RAM (measured on a Mac). For large data sets, you will not notice, but for small data sets and a lot of separate instances of LuaHashMap, this might eventually add up.
So an addition to the standard API, a CreateShare API exists to allow you to create a new instance of LuaHashMap without creating an entirely new virtual machine instance. Instead it will simply create a new Lua table instance in the existing virtual machine and not incur any more RAM overhead.
To the rest of the LuaHashMap API, everything is transparent and requires no other special handling (everything except CreateShare and FreeShare). The hash map instance looks like a separate instance with its own hash and you may think about them in those terms.
Technically speaking, the original and shared maps are peers of each other. The implementation does not make a distinction about which one the original is so any hash map with the lua_State you want to share may be passed in as the parameter.
Every CreateShare should be balanced by FreeShare. Free should balance the original Create and should be the very last thing to be called. (Free will destroy the entire virtual machine instance as it calls lua_close().)
Lua supports mixed types (i.e. numbers, strings, pointers) in the same table. However you need to be careful about doing this with LuaHashMap since typing systems are not as dynamic/automatic in C.
Putting in mixed types is easy. You simply call the normal API functions as you always do:
But accessing the key/value pairs is the real trick. The fundamental rule is that you must call the correct type function to access the value. So if you know for a fact that for the "string_key1" is a number type (from the last line above), then you can do as you always do.
In the case where you don't know what the key types and value types are, iterators are the solution. To help support mixed types, three functions are provided:
These return the int values defined by Lua which are LUA_TSTRING, LUA_TNUMBER, LUA_TLIGHTUSERDATA for strings, numbers, and pointers respectively.
You may have noticed that there is no distinct type for integers in this list. This is an extremely important point. You must understand that Lua only has one number type and thus integers and numbers (doubles in stock Lua) are the same. So an integer key of say 100 would be the same as a number key of 100.00, so be very careful about understanding which of your keys are unique. In addition, when you Get a number, there is no identifying information (in stock Lua) about whether the number was originally an integer or number, so it is up to you to decide if that number is really an integer or not.
That said, here is an example that iterates through a collection and dynamically figures out the correct types and calls the correct API functions to extract the values.
So while possible to mix types in a single hash map instance, you may find it cumbersome to deal with depending on what you are doing. You may find keeping separate hash map instances may be easier to work with. But if mixing types is advantageous for you, then this is a powerful feature of Lua/LuaHashMap at your disposal.
For kicks and giggles, there is also a STL (C++ Standard Template Library) like template interface wrapper included which gives an STL hash_map/unordered_map like interface for LuaHashMap.
Theoretically you might be able to switch between implementations fairly easily using a strategic typedef. (Though if you can use the STL, I'm not sure why you would be here. And yes, this STL interface was a colossal waste of time.) This probably won't be maintained over the long haul.
Yes, I said there is "no macro-hell" earlier. But trust me; this isn't really that bad and it's optional. (And it's really quite cool.) C11 introduces _Generic which extends the C preprocessor to allow you to map different functions to a single macro function name based on their parameter types. Essentially this brings C++-like function overloading to C, but without all the name-mangling problems and other C++ baggage. And this was a lot less code than the C++ partial template specialization mentioned above. So for example, these 16 following functions:
can now be called with just this one macro:
The preprocessor/compiler will look at your types and automatically inject the correct version of the function to call.
We can take this a step further with a few more macro tricks from C99. Utilizing these features, we can also handle variable number of arguments to also unify the WithLength versions of the String permutations:
as well as the Iterator versions:
So all of these can be combined and reduced down to just:
So for example:
For convenience, if C11 _Generic support is detected on your compiler, LUAHASHMAP_SUPPORTS_GENERICS will be defined to 1.
Refer to the API documentation to see all the macros available.
This is another (experimental) flag designed to put your table instances in the main Lua script's global table instead of the (private) registry. The advantage of this is if you are using Lua in your project already and have Lua tables in your scripts that you would like to easily interoperate with on the C side. To activate this, recompile with the flag LUAHASHMAP_USE_GLOBAL_TABLE defined.
Also of related note, there are defines for LUAHASHMAP_SETTABLE and LUAHASHMAP_GETTABLE in the implementation which are set to lua_rawset and lua_rawget. Particularly if you are doing advanced things with tables in your scripts, you may want metamethod behaviors (which raw* bypasses for speed). So in that case, you'll want to redefine these to lua_settable and lua_gettable.
Yes, I actually did benchmarks. Check out the main site: http://playcontrol.net/opensource/LuaHashMap
Yes, I actually did unit tests too. You might refer to these for more examples on how to use LuaHashMap.
Note: Depending on your compiler and platform, you may see a lot of compiler warnings in the test programs. These are from using fprintf to print values to the console for visual spot checking/testing. To make these compile cleanly, C99 format tokens are very convenient to handle unpredicatable sizes for standard typedefs. But since this code needs to compile on legacy compilers, it is not possible to take advantage of these. (The C11 test should compile cleanly on C11 compilers.) The actual LuaHashMap library should compile cleanly so these test warnings are not really an issue.
I think LuaHashMap is pretty much done.
Official LuaHashMap Site http://playcontrol.net/opensource/LuaHashMap
Lua Performance Tips (explains tables) http://www.lua.org/gems/sample.pdf
Mailing List Reference: How is string passed from Lua to C http://lua-users.org/lists/lua-l/2011-12/msg00222.html
Programming in Lua http://www.lua.org/pil/
Lua Reference Manual http://www.lua.org/manual/5.1/manual.html
Please look at LuaHashMap.h for the complete list of API functions.
And for more examples, refer to the unit tests in the repository.