1

I can't understand the following and I'm hoping someone can shed some light on it for me:

In C++ if I create a vector of test data containing 2M different bits of text (testdata) then create a map using these strings as the index values, then look up all the values, like this:

 //Create test data
for(int f=0; f<loopvalue; f++)
{ 
 stringstream convertToString;
 convertToString << f;
 string strf = convertToString.str();
 testdata[f] = "test" + strf;
}
 time_t startTimeSeconds = time(NULL);
 for(int f=0; f<2000000; f++) testmap[ testdata[f] ] = f; //Write to map
 for(int f=0; f<2000000; f++) result = testmap[ testdata[f] ]; //Lookup
 time_t endTimeSeconds = time(NULL);
 cout << "Time taken " << endTimeSeconds - startTimeSeconds << "seconds." << endl;

It takes 10 seconds.

If I do seemingly at least the same in PHP:

<?php
$starttime = time();
$loopvalue = 2000000;
//fill array
for($f=0; $f<$loopvalue; $f++)
{
 $filler = "test" . $f;
 $testarray[$filler] = $f;
}
//look up array
for($f=0; $f<$loopvalue; $f++)
{
 $filler = "test" . $f;
 $result = $testarray[$filler];
}
$endtime = time();
echo "Time taken ".($endtime-$starttime)." seconds.";
?>

...it takes only 3 seconds.

Given that PHP is written in C does anyone know how PHP achieves this much faster text index lookup?

Thanks C

asked Nov 9, 2011 at 11:18
5
  • 2
    Switch the string-keyed map to std::unordered_map<std::string, int> and be amazed about the difference. You can do the same for the other map, coming to think of it. If you don't have that class, consider std::tr1::unordered_map from <tr1/unordered_map>. Commented Nov 9, 2011 at 11:20
  • 3
    What compiler settings did you use for the C++ code? Especially which compiler and what level of optimisations? Commented Nov 9, 2011 at 11:24
  • Is result = testmap[ testdata[f] ] = f a typo? Commented Nov 9, 2011 at 11:35
  • Sorry about typo...corrected. Commented Nov 9, 2011 at 11:38
  • I've moved the timing around only the lookup and used unordered_map and sure enough it is much faster, the same speed between PHP and C++. I was expecting C++ to be alot faster. I'm using g++, I'm just looking into the compiler optimisations now. Commented Nov 9, 2011 at 12:04

3 Answers 3

5

Your loops are not absolutely equivalent algorithms. Note that in the C++ version you have

  1. testmap[ testdata[f] ] - this is actually a lookup + insert
  2. testmap[ testdata[f] ] - 2 lookups

In the PHP versions you just have insert in the first loop and lookup in the second one.

PHP is interpreted - generally if you code is faster in PHP, check the code first ! ;-)

answered Nov 9, 2011 at 11:26

Comments

2

I suspect you benchmark the wrong things. Anyway, I used your code (had to make some assumptions on your data types) and here are the results from my machine:

PHP: Time taken 2 seconds.

C++ (using std::map): Time taken 3 seconds.

C++ (using std::tr1::unordered_map): Time taken 1 seconds.

C++ compiled with

g++ -03

Here is my test C++ code:

#include <map>
#include <sstream>
#include <string>
#include <vector>
#include <iostream>
#include <tr1/unordered_map>
int main(){
 const int loopvalue=2000000;
 std::vector<std::string> testdata(loopvalue);
 std::tr1::unordered_map<std::string, int> testmap;
 std::string result;
 for(int f=0; f<loopvalue; f++)
 { 
 std::stringstream convertToString;
 convertToString << f;
 std::string strf = convertToString.str();
 testdata[f] = "test" + strf;
 }
 time_t startTimeSeconds = time(NULL);
 for(int f=0; f<loopvalue; f++) testmap[ testdata[f] ] = f; //Write to map
 for(int f=0; f<loopvalue; f++) result = testmap[ testdata[f] ]; //Lookup
 time_t endTimeSeconds = time(NULL);
 std::cout << "Time taken " << endTimeSeconds - startTimeSeconds << "seconds." << std::endl;
}

Conclusion:

You tested unoptimized C++ code, probably even compiled with VC++, which by default has a bounds check in std::vector::operator[] when compiled in debug mode.

There still is a difference of PHP to the optimised C++ code, when we use std::map, because of the difference in lookup complexity (see n0rd's answer), but C++ is faster when you use a Hashmap.

answered Nov 9, 2011 at 12:15

1 Comment

Of course for benchmarks, one should use a timing function with a lot finer granularity, among other things.
0

According to another question, associative arrays in PHP are implemented as hash tables, which have search complexity of O(1) on average, while std::map in C++ is a binary tree with search complexity of O(log n), which is slower.

answered Nov 9, 2011 at 11:54

2 Comments

Arguably, it's not the theoretical complexity difference between hash table and binary search tree that matters here, but the fact that string comparison is very slow. The hashing makes that aspect a lot more efficient. If you tried the same with maps keyed on integers, I doubt the difference would be very large.
@Kerrek SB You are right, but you can shave of some time of the string comparison by putting the number first.

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.