I wrote a script that imports data into the database from an uploaded Excel file. For each row it first checks whether the index image can be downloaded without errors, and only then inserts the rest of the data. With tens or hundreds of thousands of rows this takes hours. Is it possible to make it faster somehow?
<?php

namespace App\Services\Import;

use App\Models\Categories;
use App\Models\Stars;
use App\Models\Tags;
use App\Models\Videos;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\File;
use Illuminate\Support\Facades\Log;
use Illuminate\Support\Str;
use Maatwebsite\Excel\Facades\Excel;
class DataImport
{
    /**
     * Check that a URL responds without an HTTP error and that the
     * remote host presents a peer certificate.
     */
    public function checkURL($url)
    {
        if (empty($url)) {
            return false;
        }

        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        // CURLOPT_SSLVERSION expects an integer constant, not the string 'all'
        curl_setopt($ch, CURLOPT_SSLVERSION, CURL_SSLVERSION_DEFAULT);
        curl_exec($ch);
        $http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        $ssl = curl_getinfo($ch, CURLINFO_SSL_VERIFYRESULT);
        curl_close($ch);

        if (($http_code >= 400 && $http_code < 600) || $ssl === 1) {
            return false;
        }

        try {
            // Re-open the URL with a context that captures the peer certificate.
            $stream = stream_context_create(["ssl" => ["capture_peer_cert" => true]]);
            $read = @fopen($url, "rb", false, $stream);
            if ($read === false) {
                return false;
            }
            $params = stream_context_get_params($read);
            fclose($read);

            return isset($params["options"]["ssl"]["peer_certificate"]);
        } catch (\ErrorException $e) {
            // ErrorException lives in the global namespace, hence the leading backslash.
            error_log($e->getMessage());
            return false;
        }
    }
    public function import($path)
    {
        $tempDirectoryName = 'temp';
        $data = Excel::toArray([], $path);
        $dataChunks = array_chunk($data[0], 25);

        foreach ($dataChunks as $chunk) {
            DB::beginTransaction();
            foreach ($chunk as $row) {
                // Skip rows with a missing or unreachable index-image URL.
                if (empty($row[3]) || $this->checkURL($row[3]) !== true) {
                    continue;
                }

                $video = Videos::with(['categories', 'stars', 'tags'])->where('url', $row[0])->first();
                if (!$video) {
                    // Build a slug from the title, suffixing a counter until it is unique.
                    $slug = Str::slug($row[1], '-');
                    $i = 1;
                    while (Videos::where('slug', $slug)->exists()) {
                        $slug = Str::slug($row[1], '-') . '-' . $i;
                        $i++;
                    }
                    $video = Videos::create([
                        'url' => $row[0],
                        'title' => $row[1],
                        'slug' => $slug,
                        'length' => intval($row[2]),
                        'code' => intval($row[7]),
                        'embed_url' => $row[4],
                        'channel' => $row[9],
                    ]);
                }

                try {
                    $media = $video->addMediaFromUrl($row[3])->toMediaCollection('video_index_images', 'public');
                } catch (\Exception $e) {
                    Log::error("Error importing media for video ID {$video->id}: {$e->getMessage()}");
                    continue;
                }
                // "||", not "&&": either a missing or a non-existing media record is an error.
                if (!$media || !$media->exists()) {
                    Log::error("Error importing media for video ID {$video->id}: Invalid media file");
                    continue;
                }

                // Category
                $title = str_replace('_', ' ', $row[8]);
                $slug = strtolower(str_replace(' ', '-', $title));
                $title = ucfirst(strtolower($title));
                $category = Categories::firstOrCreate(['title' => $title], ['slug' => $slug]);
                $video->categories()->syncWithoutDetaching([$category->id]);

                // Stars
                $names = explode(',', $row[6]);
                foreach ($names as $name) {
                    $name = trim($name);
                    $name = str_replace('_', '', $name);
                    $slug = strtolower(str_replace(' ', '-', $name));
                    $star = Stars::firstOrCreate(['name' => $name], ['slug' => $slug]);
                    $video->stars()->syncWithoutDetaching([$star->id]);
                }

                // Tags: match on the name only, so an existing tag is reused
                $tags = explode(',', $row[5]);
                $tagIds = [];
                foreach ($tags as $tagName) {
                    $tagName = str_replace('-', ' ', trim($tagName));
                    if (!empty(trim($tagName))) {
                        $tag = Tags::firstOrCreate(['name' => $tagName], ['slug' => Str::slug($tagName)]);
                        $tagIds[] = $tag->id;
                    }
                }
                // sync() replaces the detach() + attach() pair in one call
                $video->tags()->sync($tagIds);

                // Clean up the media library's temporary download directory.
                $tempDirectory = storage_path('media-library/temp/' . $tempDirectoryName);
                if (file_exists($tempDirectory)) {
                    File::deleteDirectory($tempDirectory);
                }
            }
            DB::commit();
        }
    }
}
I know the code is doing a lot inside the loop, but all of these steps are necessary for it to work as intended.
The Excel file consists of 10 columns (a sketch after this list maps them to the $row indices used in the script):
- url to the video
- title of the video
- length in seconds
- url to an index image of the video
- embed code
- tags
- actors/actresses in the video
- unique code
- category
- name of the site where the video is from
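For readability, this is how the column order above corresponds to the $row[N] lookups in the import code. The field names here are mine, added purely for illustration; the script itself only uses the numeric offsets.

// Column offset => field (names are illustrative, not from the original code)
$columns = [
    0 => 'url',        // url to the video
    1 => 'title',      // title of the video
    2 => 'length',     // length in seconds
    3 => 'image_url',  // url to an index image of the video
    4 => 'embed_url',  // embed code
    5 => 'tags',       // tags
    6 => 'stars',      // actors/actresses in the video
    7 => 'code',       // unique code
    8 => 'category',   // category
    9 => 'channel',    // name of the site where the video is from
];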
The site will be a collection of videos, so the external media files are always needed at the time of the upload. Some of the URLs for the index images are missing or broken; those videos won't get uploaded, which is why I first check for the images and only then import the rest.
The script is called from a FilamentPHP resource file using an action:
->actions([
    Tables\Actions\Action::make('Run')
        ->icon('heroicon-o-play')
        ->requiresConfirmation()
        ->form([
            FileUpload::make('excel')
                ->label('Excel')
                ->acceptedFileTypes(['application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'])
                ->preserveFilenames(),
        ])
        ->action(function ($data) {
            $path = 'public/' . $data['excel'];
            $importScript = new DataImport();
            $importScript->import($path);
            Storage::delete($path);
        })
])
1 Answer
Some quick comments:
- The function checkURL is called in a loop, but inside that function you close the cURL connection every time. You could speed things up by reusing the cURL handle: opening a new connection adds overhead, and there is further overhead in the SSL negotiation. This is not necessarily the biggest bottleneck in your code, but if you call the function often while fetching data from the same remote host, you should be able to reduce the run time significantly; see the first sketch after this list.
- Better yet, you can fetch several URLs in parallel, for example using the curl_multi_init function (see the second sketch after this list). This requires some refactoring of your code. You'll also need to keep in mind that the target website may apply rate limiting. Too many connections will strain the server and possibly get you banned, so be gentle.
- But since you are not spoofing the user agent, it's already obvious you're a bot anyway.
- If you are fetching "tens or hundreds of thousands of rows", this is always going to take some time, especially if the files you're downloading are somewhat large. But again, parallel processing should help.
- This code cannot easily be tested in proper live conditions here, so it would help if you benchmarked the different parts of your code and figured out exactly which one is taking so long.
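To illustrate the first point, here is a minimal sketch of checkURL refactored to accept an optional shared handle. The $ch parameter is my addition for illustration, and the sketch uses CURLOPT_NOBODY (a HEAD-style request) since only the status code is needed, which your current version does not do:

// Sketch only: reuse one cURL handle across many checks. With HTTP
// keep-alive, consecutive requests to the same host can skip the TCP
// and SSL handshakes.
public function checkURL($url, $ch = null)
{
    $ownHandle = ($ch === null);
    if ($ownHandle) {
        $ch = curl_init();
    }
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_NOBODY, true); // we only need the status code
    curl_exec($ch);
    $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $sslError = curl_getinfo($ch, CURLINFO_SSL_VERIFYRESULT);
    if ($ownHandle) {
        curl_close($ch);
    }
    return !(($httpCode >= 400 && $httpCode < 600) || $sslError === 1);
}

The caller would then create one handle with curl_init() before the import loop, pass it into every checkURL call, and close it once after the loop.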
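For the second point, a minimal sketch of batched parallel checks with curl_multi. The function name checkURLsInParallel and the batch size are mine; both the batch size and any pause between batches should be tuned so you stay within the target host's rate limits:

// Sketch: check a batch of URLs in parallel and return [url => bool],
// true meaning the URL responded without an HTTP error.
function checkURLsInParallel(array $urls, int $batchSize = 10): array
{
    $results = [];
    foreach (array_chunk($urls, $batchSize) as $batch) {
        $mh = curl_multi_init();
        $handles = [];
        foreach ($batch as $url) {
            $ch = curl_init($url);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch, CURLOPT_NOBODY, true); // status code only
            curl_multi_add_handle($mh, $ch);
            $handles[$url] = $ch;
        }
        // Drive all transfers in the batch until they are finished.
        do {
            $status = curl_multi_exec($mh, $running);
            if ($running) {
                curl_multi_select($mh);
            }
        } while ($running && $status === CURLM_OK);
        foreach ($handles as $url => $ch) {
            $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
            $results[$url] = $code > 0 && $code < 400;
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
        }
        curl_multi_close($mh);
    }
    return $results;
}

You could then check a whole chunk of image URLs up front and only run the database work for the rows whose URL came back true.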