Thanks for the pointer. I hadn't thought of the "sliding" fifo algorithm. This idea looks nice but I wonder how it scales on large trees. It seems to me that memory allocation for lua tables in such a situation is not that great. But it might also be irrelevant.
I have worked my way into a nice iterative-deepening depth-first search which only tests each node exactly once and does not need to have any stack or fifo. If I find some time and a large enough xml tree, I will do some testing to compare speed of breadth-first search with the fifo you provided and the other algorithm.