I have written two hash map implementations, one that stores Vertices and one that stores self implemented vectors of Edges. They are all fully functioning, however in my application they are quite slow and was hoping if anyone could point out how to improve their speed.
Vertex Hash Map
public class VertexHashMap
{
public double noofitems;
public Vertex[] data;
int resize;
public VertexHashMap(int initlen)
{
noofitems=0;
data=new Vertex[initlen];
this.resize = 67;
}
public void AddItem(Vertex value)
{
if ((noofitems/data.length) > 0.7)
{
resize();
}
int index=value.city % data.length;
boolean inserted = false;
int increment = 1;
do
{
if (data[index] == null) {
data[index]=value;
noofitems++;
inserted = true;
}
else {
index+= (increment<<4);
index = index % data.length;
}
} while (!inserted);
}
public Vertex GetValue(int key)
{
int index=key % data.length;
boolean found = false;
int increment = 1;
do
{
if (data[index].city == key)
return data[index];
else if (data[index] == null)
return null;
else {
index+= (increment<<4);
index = index % data.length;
}
} while (!found);
return null;
}
private void resize()
{
//long time = System.nanoTime();
Vertex[] newData = new Vertex[data.length * resize];
resize = 11;
for (Vertex oldMap1 : data) {
if (oldMap1 != null) {
int index = oldMap1.city % newData.length;
boolean inserted = false;
int increment = 1;
while (!inserted) {
if (newData[index] == null) {
newData[index] = oldMap1;
inserted = true;
} else {
index+= (increment<<4);
index = index % newData.length;
}
}
}
}
//System.out.println("Resizing hashmap took: " + ((System.nanoTime() - time)/1000000) + "ms");
data = newData;
}
public double getNumberOfItems()
{
return noofitems;
}
}
In my application, I call this method 215,000 times which takes (according to my timings) around 300ms.
public void AddVertex(final int vertexid, final int x, final int y)
{
allVertices.AddItem(new Vertex(vertexid, x, y));
}
Edge Hash Map
public class EdgeHashMap
{
private double noofitems, noitems;
public EdgeHashPair[] data;
int multiplier = 60;
public EdgeHashMap(int initlen)
{
noofitems=0;
noitems = 0;
data=new EdgeHashPair[initlen];
}
public void AddItem(Edge value)
{
if ((noofitems/data.length) > 0.7)
resize();
value.hashIndex=HashFunction(value.label);
int index = value.hashIndex % data.length;
int increment = 0;
boolean inserted = false;
do {
if (data[index] != null)
if (data[index].data.GetItem(0).label.compareTo(value.label) == 0) {
data[index].addItem(value);
inserted = true;
noitems++;
}
else {
increment+=1;
index+= (increment<<4);
index = index % data.length;
}
else {
data[index] = new EdgeHashPair();
data[index].addItem(value);
noofitems++;
inserted = true;
}
} while (!inserted);
}
private int HashFunction(String key)
{
//long time = System.nanoTime();
char[] t = key.toCharArray();
int a = 0;
for (int i=0; i<t.length; ++i)
a = ((a<<5) - a) + t[i];
//System.out.println("Hash Function took: " + ((System.nanoTime() - time)) + "ns");
return Math.abs(a);
}
public EdgeVector GetValue(String key)
{
int index=HashFunction(key);
int increment = 0;
index = index % data.length;
boolean found = false;
do
{
if (data[index] == null)
return null;
else if (data[index].data.GetItem(0).label.compareTo(key) == 0)
return data[index].data;
else {
increment+=1;
index+= (increment<<4);
index = index % data.length;
}
} while (!found);
return null;
}
private void resize()
{
//long time = System.nanoTime();
EdgeHashPair[] newMap = new EdgeHashPair[data.length * multiplier];
multiplier = 8;
for (EdgeHashPair oldMap1 : data)
{
if (oldMap1 != null) {
int index = oldMap1.data.GetItem(0).hashIndex;
int increment = 0;
index = index % newMap.length;
boolean inserted = false;
do {
if (newMap[index] == null) {
newMap[index] = oldMap1;
inserted = true;
}
else {
increment+=1;
index+= (increment<<4);
index = index % newMap.length;
}
} while (!inserted);
}
}
data = newMap;
//System.out.println("Edge Hashmap resizing took: " + ((System.nanoTime() - time)/1000000) +"ms");
}
In the application, I call this method around 260,000 times, however I only store around 110,000 Edge Vectors, as they each contain multiple Edges.
public void AddEdge(final String label, final int fromid, final int toid, final double distance, final boolean oneway, final int speedlimit)
{
currentRoad = new Edge(label, fromid, toid, distance, speedlimit, oneway);
allVertices.GetValue(fromid).addEdge(currentRoad);
allEdges.AddItem(currentRoad);
if (!oneway)
{
allVertices.GetValue(toid).addEdge(new Edge(label, toid, fromid, distance, speedlimit, oneway));
}
}
It takes roughly 700-1000ms to execute - I was expecting much less and I can't seem to make it quicker (it originally started at over 1000ms).
EdgeHashPair
public class EdgeHashPair
{
EdgeVector data;
public EdgeHashPair()
{
data = new EdgeVector(50);
}
public EdgeHashPair(int length)
{
data = new EdgeVector(length+10);
}
public void addItem(Edge item)
{
data.AddItem(item);
}
}
The EdgeVector and VertexVector are basic implementations of Vectors. They have methods such as AddItem, GetItem, DeleteItem etc.
If anyone could take a look and suggest some improvements that would be fantastic.
1 Answer 1
Naming Conventions
public double noofitems;
When I first read this, I wondered what a noof
was. Then I realized that you were using No. as an abbreviation for number. Why not write it out? It's a few more letters but much clearer.
Elsewhere you use StudlyCaps for class names, which conforms to the Java standard. The accompanying standard for variable and field names would be camelCase, so numberOfItems
.
I might have called this count
, which is shorter and more consistent with other classes.
this.resize = 67;
As a general rule, methods are verbs while classes, objects, fields, and variables are nouns. Resize is a verb, so it doesn't fit this. I might go with resizeMultiplier
.
Hashing collision resolution
Your collision resolution tends to produce chains. Because it produces the same increment regardless of whether this is the first match or the thirtieth, any collision into a chain has to iterate through all remaining elements in the chain.
Consider using an increasing increment. That way collisions only repeat if they start from the same point. Chains don't merge together.
Write what you mean
boolean inserted = false; int increment = 1; do { if (data[index] == null) { data[index]=value; noofitems++; inserted = true; } else { index+= (increment<<4); index = index % data.length; } } while (!inserted);
What you are doing here is iterating through the structure until you find an empty slot. Then you update that slot. So just do that:
int increment = 16;
while (data[index] != null) {
{
index += increment;
index %= data.length;
increment += 2;
}
data[index] = value;
numberOfItems++;
This removes a redundant branching operation and an unnecessary variable. I suspect that it would be slightly faster, although I haven't tested it.
Set an initial size
Note that a resize
operation reinserts all the existing data. You can avoid this by making the initial data structure the size that it needs to be. This will be much more effective than any kind of efficiency adjustments to the operation.
-
\$\begingroup\$ Thank you that was very informative! I've changed my variable names and also edited my methods with the code you suggested; doing that across both my hashmaps has improved performance by around 100ms! \$\endgroup\$Craig– Craig2015年04月30日 14:53:59 +00:00Commented Apr 30, 2015 at 14:53