Code Review

It's been a long time since I've written PHP, but I think I can give you a few pointers anyhow.

I believe the following line is problematic speed-wise:

if (in_array($val, $preventRepeat) || !strpos($val, '@') || !$val) {
...

The in_array function does a sequential, case-sensitive lookup across all currently stored addresses, so each call gets slower the more addresses you store.

The solution is to use a hash-table-like lookup. I'm not entirely sure, but I believe the PHP equivalent can be achieved by using the keys of the array instead of the values.

// Store a certain address.
$addressList[$val] = true;

Checking whether a key is set (for example with isset($addressList[$val])) tells you whether that address has already been stored. Notice how $preventRepeat can be removed, since everything is stored in $addressList. This also removes the need for array_merge, which should improve performance further.
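To make the idea concrete, here is a minimal sketch of key-based de-duplication. The function name uniqueAddresses and the sample input are made up for illustration; the validity checks are copied from the condition quoted above, and the rest of the original script is not reproduced.

```php
<?php
// Hypothetical helper (not from the original script): returns each
// valid address once, in first-seen order.
function uniqueAddresses(array $candidates): array
{
    $addressList = [];
    foreach ($candidates as $val) {
        // Same validity checks as the original condition.
        if (!$val || !strpos($val, '@')) {
            continue;
        }
        // isset() on a key is a hash lookup, so it does not scan the
        // whole array the way in_array() does.
        if (isset($addressList[$val])) {
            continue;
        }
        // Store the address as a key; the value itself is irrelevant.
        $addressList[$val] = true;
    }
    // The keys are the unique addresses.
    return array_keys($addressList);
}

// Made-up sample data.
$sample = ['a@example.com', 'b@example.com', 'a@example.com', '', 'not-an-address'];
print_r(uniqueAddresses($sample));
```

Because the keys themselves are the result, no second array and no merge step are needed.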

Beware, this is speculation, so I'm hoping someone who is certain can verify this. :)

Relating to my earlier comment:

You can probably already make the script about twice as fast by using both cores. Split the processing across two processes, or use two threads. I find it strange that it indicates CPU usage of 100% at the moment. Shouldn't it just run on one core?

PHP doesn't seem to support multithreading, so the only option would be to split the work logically across two separate executions of the script. If my previous suggestions don't improve the speed much, it's probably advisable to use a different language than PHP for this purpose.
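One way to split the work, sketched under my own assumptions: assign every address to a worker by hashing it, so that duplicates of the same address always end up in the same execution and each process can de-duplicate on its own. The names shardFor and addressesForWorker are hypothetical, and how each process learns its worker index (a command-line argument, for instance) is left open.

```php
<?php
// Hypothetical helper: decides which worker handles a given address.
function shardFor(string $address, int $workerCount): int
{
    // crc32() is deterministic, so equal addresses map to the same
    // worker. The mask keeps the value non-negative on 32-bit builds.
    return (crc32($address) & 0x7fffffff) % $workerCount;
}

// Hypothetical helper: the unique addresses one worker is responsible for.
function addressesForWorker(array $candidates, int $workerIndex, int $workerCount): array
{
    $mine = [];
    foreach ($candidates as $val) {
        if (shardFor($val, $workerCount) === $workerIndex) {
            // Key-based storage de-duplicates within this worker.
            $mine[$val] = true;
        }
    }
    return array_keys($mine);
}
```

Each execution would call addressesForWorker with its own index, and the per-worker results can simply be concatenated afterwards because no address can appear in two shards. Whether this pays off depends on how much of the run time is spent reading the input, which every process still has to do.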

Steven Jeuris
