The new EcmaScript standard, ES6, contains specifications for Map and Set built-in classes. Each of these collections treats an object-valued key as opaque and unique, e.g.:
var foo = new Set();
foo.add({bar: "baz"}).has({bar: "baz"}) // == false
However, there are cases in which one might want to treat similar objects as the same key. For this purpose, I wrote the following HashMap and HashSet classes. The objectives:
- High quality code (readable and maintainable)
- Two different objects with identical enumerable properties and corresponding values are treated as identical
- The value of a circular reference is considered identical if it refers to the same relative path from the object used as a key
- With the exception of the previous two objectives, instances of
HashMapandHashSetshould be fully interchangeable with instances ofMapandSetrespectively. In other words, where not conflicting with the above, the ES6 standard for instances of those classes (http://www.ecma-international.org/ecma-262/6.0/#sec-map-objects) should apply to instances of the corresponding classes as well.
Code below uses ES6 features supported by Firefox at present. Sadly that does not include class or super.
// This is for replacing cyclic references with relative names.
// as long as the reference is internally the same relative to
// the hashed object, they will hash equal.
const HASH_PATH = Symbol("hash path");
function hashVal(thing, name = "this") {
var keys, hash;
if (thing instanceof Array) {
hash = "[" +
thing.map((el, idx) =>
hashVal(el, name + "[" + idx.toString() + "]")
).join(",") + "]";
} else if (typeof thing === "object") {
if (thing[HASH_PATH]) {
hash = thing[HASH_PATH];
} else {
keys = Object.keys(thing);
thing[HASH_PATH] = name;
keys.sort((k1, k2) => k1 > k2 ? 1 : -1);
hash = "{" +
keys.map(key =>
key + ":" +
hashVal(thing[key], name + "." + key)
).join(",") + "}";
delete thing[HASH_PATH];
}
} else if (typeof thing === "string") {
hash = '"' + thing + '"';
} else {
hash = thing.toString();
}
return hash;
}
function HashMap(els, hash = this.defHash) {
this.hash = hash;
var elByHash = this.elByHash = new Map();
var keyByHash = this.keyByHash = new Map();
var el, key, hashKey;
if (els)
for ([key, el] of els.entries()) {
hashKey = this.hash(key);
keyByHash.set(hashKey, key);
elByHash.set(hashKey, el);
}
}
HashMap.prototype = {
get size() {
return this.elByHash.size;
},
clear() {
this.elByHash.clear();
this.keyByHash.clear();
return this;
},
delete(key) {
var hash = this.hash(key);
return this.keyByHash.delete(hash) && this.elByHash.delete(hash);
},
entries() {
return function*() {
var keys = this.keyByHash.entries();
var hash, key;
for ([hash, key] of keys) {
yield [key, this.elByHash.get(hash)];
}
};
},
keys() {
return this.keyByHash.values();
},
values() {
return this.elByHash.values();
},
set(key, val) {
var hash = this.hash(key);
this.keyByHash.set(hash, key);
this.elByHash.set(hash, val);
return this;
},
get(key) {
return this.elByHash.get(this.hash(key));
},
has(key) {
return this.keyByHash.has(this.hash(key));
},
forEach(cb, cx = this) {
var key, val;
for ([key, val] of this.entries()) {
cb.call(cx, val, key, this);
}
}, [Symbol.iterator]() {
return this.entries();
},
defHash: hashVal
};
// spec says set.values, set.keys, set[@@iterator] are same object
function hashSetValIterator() {
return this.elements.values();
}
function HashSet(els, hashFn = hashVal) {
var hash = this.hash = hashFn;
var elements = this.elements = new Map();
var key, val;
if (els)
for (val of els) {
elements.set(this.hash(val), val);
}
}
HashSet.prototype = {
defHash: hashVal,
get size() {
return this.elements.size;
},
add(el) {
this.elements.set(this.hash(el), el);
return this;
},
clear() {
this.elements.clear();
},
delete(el) {
return this.elements.delete(this.hash(el));
},
entries() {
var vals = this.values();
return function*() {
for (var v of vals) yield [v, v];
}
},
forEach(cbFn, thisArg = this) {
for (var v of this.values()) {
cbFn.call(thisArg, v, v, this);
}
},
has(el) {
return this.elements.has(this.hash(el));
},
values: hashSetValIterator,
keys: hashSetValIterator,
[Symbol.iterator]: hashSetValIterator,
defHash: hashVal
};
Does this code meet the stated objectives? How so or why not?
I elected to not subclass Map or Set. While I intended to reproduce their interfaces to ensure maximum portability into code currently using the built-in classes and zero cost to learn, the new classes do not fulfill exactly the same contract as the built-ins. In addition, they don't need to borrow any methods from the built-ins, and there are no opportunities for them to exploit a superclass relationship via super as far as I know. Thus, subclassing would be a violation of substitution principle with no practical payoff. If you disagree, I would like to hear about it.
PRE-REVIEW REVISION (no review exists at the time of this revision, so CR policy of forbidding edits to code is not applicable)
The use of the HASH_PATH symbol in the hashVal function to temporarily associate an object with its relative path within the key object is a subtle but important mistake. The following are consequences:
- An object may be frozen or inextensible; the creation of a privately scoped property via adding a closured-key property will throw an error in this case.
Proxyobiects are now widely available, and assignment anddeleteto aProxy-handled object's property may have unpredictable consequences.- The value of
HASH_PATHcould be leaked to code in the scope that defines theProxyhandler, violating the assumption of locality. Proxyhandler methods for these operations are permitted to have side effects with no desirable contract with theHashMaporHashSet.
- The value of
- If the object is
null, adding and deleting properties will cause an unwanted exception (although this can be avoided without abandoning a privateSymbol-valued key).
Tentative solution (pending review) is
- Add a parameter to hashVal, ns = new WeakMap which maps object-valued properties to name paths
- The type of the name path is open for suggestion. To consider:
- Strings formed by concatenation of keys and appropriate punctuation are temptingly simple for objects whose keys are all strings following the grammatical production identifierName, but this is far from guaranteed and overcomplicated to detect.
- A key whose value is a Symbol (widely implemented feature) had no accessible String representation that is different from non-equal Symbol values, leading to identical object path strings for paths that actually differ. Open to suggestion, but one possible mitigation is introduction of a second object-specific local Map of Symbol-valued keys to unique String identifiers. Measures should be taken to ensure that non-identifierName keys are escaped or bracketed or both, whereas path substrings are marked with a distinguishing token remaing unescaped and unbraceket. Again suggestions are welcome.
1 Answer 1
There really isn't too much that can be said about this code - probably the reason it has gone unanswered for so long. That said, there are a few recommendations I can make while keeping in mind the state of browsers when the code was written, and a few more considering browsers today. Most of this is probably nitpicks.
HashSet.prototype.defHashis included twice, at the top and at the bottom of the prototype declaration, since it is not used (constructor function directly referenceshashVal), get rid of it.HashMap.prototype.defHashis not a good name.defaultHashis only a few characters longer and it's worth it for the increase in readability. Additionally, sincehashValis in scope, I would prefer referencing it directly in the constructor instead of referencingthis.defHashasHashSetdoes.Prefer
letandconsttovarin practically every case. In this code, I would prefer it in every case.Prefer
Array.isArraytoarr instanceof Arrayas the latter will fail when running in a frame. See this SO question for details.The spec for both
MapandSetrequires that they be constructed, not just called. This code does not and thus is not completely compliant with the spec.Don't duplicate code, in the
HashSetconstructor you can usethis.setand in theHashMapconstructor you can usethis.addto add values. This is better than duplicating the logic for adding elements.Object.keysonly returns enumerable properties. This may cause some unexpected equality values. To fix, useObject.getOwnPropertyNamesandObject.getOwnPropertySymbolsorReflect.ownKeys. SinceReflect.ownKeysincludes symbols, which this function really can't handle yet, I would useObject.getOwnPropertyNameshashValdoes not behave nicely when passed HTML elements.The sort function for the keys would be better expressed as
(a, b) => a.localeCompare(b)especially since it avoids the potential problem of someone copying the code and using the current comparison function where there may be duplicates in the array and thus the sort order would become undefined.hashValis difficult to read. The cause of this isn't the logic in it, but the fact that you have to remember too much at once. If you utilize early returns instead of a chain ofif..else, it immediately becomes easier to understand without any change to the logical flow since you can immediately stop thinking about a block once it's finished.function hashVal(thing, name = "this") { if (Array.isArray(thing)) { const elementHashes = thing.map((el, i) => hashVal(el, `${name}[${i}]`)); return "[" + elementHashes.join(',') + "]"; } if (typeof thing == "object" && thing != null) { if (thing[HASH_PATH]) return thing[HASH_PATH]; const keys = Object.getOwnPropertyNames(thing); thing[HASH_PATH] = name; const keyHashes = keys.map(key => `${key}:${hashVal(thing[key], `${name}.${key}`)}`); delete thing[HASH_PATH]; return "{" + keyHashes.join(',') + "}"; } if (typeof thing === "string") { return `"${thing}"`; } return thing.toString(); }Using a
WeakMapto store keys is a good idea. I believe the solution proposed in the question to handle symbols is probably the best way to do so. In the implementation below, I have used this method and simply added a symbol index to define it's unique string value.
I've updated the original code with new features available in the latest Firefox and the above feedback, below is the resulting code with a few tests.
const symbolNames = new Map()
function hashVal(thing) {
const savedPaths = new WeakMap();
function hashSymbol(symbol) {
if (!symbolNames.has(symbol)) {
symbolNames.set(symbol, symbol.toString() + ` - ${symbolNames.size}`)
}
return symbolNames.get(symbol)
}
function hash(thing, name) {
if (Array.isArray(thing)) {
return '[' + thing.map((el, i) => hash(el, `${name}[${i}]`)).join(',') + ']';
}
if (typeof thing === "object") {
if (savedPaths.has(thing)) return savedPaths.get(thing)
savedPaths.set(thing, name)
const keyHashes = Object.getOwnPropertyNames(thing)
.sort((a, b) => a.localeCompare(b))
.map(key => `${key}:${hash(thing[key], `${name}.${key}`)}`);
const symbolHashes = Object.getOwnPropertySymbols(thing)
.sort((a, b) => hashSymbol(a).localeCompare(hashSymbol(b)))
.map(symbol => {
const key = hashSymbol(symbol)
return `[${key}]:${hash(thing[symbol], `${name}[${key}]`)}`
})
return '{' + keyHashes.concat(symbolHashes).join(',') + '}'
}
if (typeof thing === "string") {
return `"${thing}"`
}
if (typeof thing === "symbol") {
return hashSymbol(thing)
}
return thing.toString()
}
return hash(thing, "this")
}
class HashMap {
constructor(elements, hash = hashVal) {
this._hash = hash;
this._elements = new Map();
this._keys = new Map();
if (elements) {
for (const [key, val] of elements.entries()) {
this.set(key, val);
}
}
}
get size() {
return this._elements.size;
}
clear() {
this._elements.clear();
this._keys.clear();
return this;
}
delete(key) {
const hash = this._hash(key)
return this._elements.delete(hash) && this._keys.delete(hash)
}
entries() {
return function * () {
const keys = this._keys.entries();
for (const [hash, key] of keys) {
yield [key, this._elements.get(hash)];
}
}
}
keys() {
return this._keys.values();
}
values() {
return this._elements.values();
}
set(key, val) {
const hash = this._hash(key);
this._keys.set(hash, key);
this._elements.set(hash, val);
return this;
}
get(key) {
return this._elements.get(this._hash(key));
}
has(key) {
return this._elements.has(this._hash(key));
}
forEach(callback, thisArg = this) {
for (const [key, val] of this.entries()) {
callback.call(thisArg, val, key, this);
}
}
[Symbol.iterator]() {
return this.entries();
}
}
class HashSet {
constructor(elements, hash) {
this._hash = hash;
this._elements = new Map();
if (elements) {
for (const element of elements) {
this.add(element)
}
}
}
get size() {
return this._elements.size;
}
add(element) {
this._elements.set(this._hash(element), element);
return this;
}
clear() {
this._elements.clear();
}
delete(element) {
return this._elements.delete(this._hash(element));
}
entries() {
return function* () {
const values = this.values();
for (const v of values) yield [v, v];
}
}
forEach(callback, thisArg = this) {
for (var v of this.values()) {
callback.call(thisArg, v, v, this);
}
}
has(element) {
return this._elements.has(this._hash(element));
}
values() {
return this._elements.values()
}
}
HashSet.prototype.keys = HashSet.prototype.values;
HashSet.prototype[Symbol.iterator] = HashSet.prototype.values;
const s = Symbol('s');
const s2 = Symbol('s2');
const s3 = Symbol();
const s4 = Symbol();
const obj1 = {
a: 1,
inner: [ s, s2 ],
[s]: 's',
[s3]: 's3',
[s4]: 's4'
};
obj1.circular = obj1
function addNonEnumerable(obj) {
Object.defineProperties(obj, {
[s2]: { enumerable: false, value: 's2'},
b: { enumerable: false, value: 2 }
});
}
const obj2 = { ...obj1 }
obj2.circular = obj2
addNonEnumerable(obj1)
addNonEnumerable(obj2)
const map = new HashMap();
map.set(obj1, "obj1")
console.log("Resulting hash:", hashVal(obj1))
console.log("Has 2 after inserting 1 =>", map.has(obj2))
console.log("Get with 1 == get with 2 =>", map.get(obj1) === map.get(obj2))
-
\$\begingroup\$ 1. Good catch on the doubling, but I would rather fix the constructor to use
this.defHash. It's there and used in the constructor because you need to be able to override it in a subclass. For example, suppose you want to hash html elements or include non-enumerable keys, but don't want to specify your hash function as a parameter every time younew, make a subclass and overridedefHashand you're done. 2. Fair point, but I'm keeping it as a property. 3. Agreed, maybe it wasn't supported at the time. 4. Good point. 5. At the time there was nosuperornew.target... \$\endgroup\$sqykly– sqykly2018年03月21日 06:31:59 +00:00Commented Mar 21, 2018 at 6:31 -
\$\begingroup\$ ... So that would leave subclasses in a pickle. 5. No, you can't. Calling any method before the constructor returns throws an exception. Otherwise I would have done that. 7. This was by design, so you can hide private or irrelevant properties. 8. True, but giving it a special case for HTML elements could break node code, when you can just as easily subclass and override the hash function. 9. Probably should either way, but how can you get duplicate keys without digging into prototypes? 10. I like making the logic explicit. Maybe I should take a poll. 11. Yeah I went with "@n"... \$\endgroup\$sqykly– sqykly2018年03月21日 06:44:44 +00:00Commented Mar 21, 2018 at 6:44
-
\$\begingroup\$ Oops, that was supposed to be a 6. \$\endgroup\$sqykly– sqykly2018年03月21日 06:51:52 +00:00Commented Mar 21, 2018 at 6:51
-
\$\begingroup\$ 1. Good point, totally forgot about subclassing as I so rarely use it. 5. Classes are just sugar. A proper check would still work as expected. jsFiddle 6. Errr, no, you can call methods. jsFiddle 7. Good point, didn't consider that. 8. I agree - that's why I left it out in my implementation as well. 9. You're right, you can't get duplicate keys in this specific scenario. \$\endgroup\$Gerrit0– Gerrit02018年03月21日 16:25:23 +00:00Commented Mar 21, 2018 at 16:25
hash = thing.toString();, doing(null).toString()(or withundefined) would error, so I would recommend instead doinghash = "" + thing; \$\endgroup\$