Concurrent HTTP request loop
I'm using the rolling curl library to fire HTTP requests for the Content-Length headers of images (we need to know their size to weed out placeholders and low-res images). The image URLs are stored in a database, so I need to loop over the data in our products table (approximately 1 million rows now, and it will grow, potentially much bigger).
I'm using PHP and the Laravel framework (the artisan CLI component). The operation seems to slow down as time progresses e.g. it starts processing 100 requests in less than a second and later the time to process 100 rows/requests is logged at over 20 seconds. Can anyone explain this and / or offer any performance improvement suggestions? The task is running on an Amazon EC2 micro instance so processing power / memory is limited.
public function fire()
{
    $dt = new DateTime();
    Log::info("started: ".$dt->format('Y-m-d H:i:s'));
    $counter = 1;

    Item2::where('img_size', '=', null)->chunk(1000, function($items) use (&$counter)
    {
        $results = array();
        $filePath = storage_path().'/imports/new/new_img_sizes_'.$counter.'.csv';
        if (!File::exists($filePath)) {
            File::put($filePath, '');
        }

        $start = microtime(true);
        $rollingCurl = new \RollingCurl\RollingCurl();
        $rollingCurl->setOptions($this->curlOptions);

        // Queue a request for every item in the chunk that has an image URL
        foreach ($items as $item) {
            if ($item->img !== '') {
                $results[$item->id] = array('url' => $item->img, 'size' => null);
                $rollingCurl->get($item->img);
            }
        }

        // Callback runs once per completed curl request
        $rollingCurl->setCallback(function(\RollingCurl\Request $request, \RollingCurl\RollingCurl $rollingCurl) use (&$results, $filePath) {
            $responseInfo = $request->getResponseInfo();
            $length = $responseInfo['download_content_length'];

            // Find the item this response belongs to by matching the URL
            foreach ($results as $key => $value) {
                if (array_search($request->getUrl(), $value)) {
                    $idKey = $key;
                    $results[$idKey]['size'] = $length;
                    File::append($filePath, $idKey.','.$results[$idKey]['size']."\r\n");
                    break;
                }
            }
        })
        ->setSimultaneousLimit(10)
        ->execute();

        $counter++;
        echo 'done in...'.(microtime(true) - $start).PHP_EOL;
        Log::info('1000 records: '.(microtime(true) - $start));
        Log::info('Last url was: '.json_encode(end($results)));
        exit; // NB: stops after the first chunk
    }); // end item chunk
}
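I know the callback does a linear scan over $results for every completed request, and File::append reopens the CSV each time. Keying $results by URL would at least make the lookup constant-time. A sketch of that variant (same RollingCurl calls, just a different $results layout; it assumes image URLs are unique within a chunk):

// (inside the same chunk closure as above)
// Build the lookup keyed by URL instead of by item id
foreach ($items as $item) {
    if ($item->img !== '') {
        $results[$item->img] = $item->id; // url => id
        $rollingCurl->get($item->img);
    }
}

$rollingCurl->setCallback(function(\RollingCurl\Request $request, \RollingCurl\RollingCurl $rollingCurl) use (&$results, $filePath) {
    $responseInfo = $request->getResponseInfo();
    $length = $responseInfo['download_content_length'];
    $url = $request->getUrl();

    // O(1) lookup instead of scanning the whole array per response
    if (isset($results[$url])) {
        File::append($filePath, $results[$url].','.$length."\r\n");
    }
});

That only changes per-callback cost, though, so on its own it wouldn't explain a steady slowdown across chunks.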
Some benchmarks:
done in...3.8803641796112
done in...7.4326379299164
done in...8.1860301494598
done in...8.5088090896606
done in...10.606615781784
done in...10.655412912369
done in...10.804574966431
done in...14.004528045654
done in...10.903785943985
done in...11.905344009399
done in...13.763195991516
done in...14.723680019379
done in...15.823812961578
done in...17.972007989883
done in...31.734715938568
done in...20.509822845459
done in...22.924754858017
done in...34.274693012238
done in...39.217702865601
done in...29.883662939072
done in...24.094554901123
done in...25.726534128189
done in...31.788655996323
done in...24.713880062103
done in...25.855134963989
done in...23.161122083664
done in...32.380167007446
done in...36.53077507019
done in...31.859884023666
done in...71.458341121674
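The steady per-chunk growth above suggests something is accumulating as the command runs. One culprit I'm aware of in long-running Laravel commands (assuming Laravel 4 here, where it is on by default) is the in-memory query log, which records every query chunk() issues; chunk() also paginates with a growing OFFSET, which gets slower on large tables. Disabling the query log is a one-liner worth trying:

public function fire()
{
    // Assumption: Laravel 4, where the connection query log is enabled by
    // default. In a long-running command it grows with every chunk() query,
    // eating memory on a micro instance.
    DB::connection()->disableQueryLog();
    // ... rest of the command as above
}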