4
\$\begingroup\$

I have the following code which works great to do what it is meant to do. It scrapes the bodies of all e-mails in the inbox at an e-mail address using IMAP and retrieves the unique code sent inside the body and stores it in a database with the amount paid. I'm hoping to make it so when a user purchases something they can send an Interac e-transfer and then enter the code that both of us receive via e-mail in the website and it will apply the credit of the e-transfer to their account/purchase.

However, after setting it up on a cron job to cycle every few minutes so the content in the database stays fresh it eventually exceeds the bandwidth for the account within a day or so (not positive on how long it took but it didn't take long). Now, like I said the code works it's just very resource intensive apparently.

I do not pipe the script because the e-mail account has already been set up a while ago and we do use the account for other e-transfers which I review manually for other transactions.

  1. Is there any way to clean it up so it works the same but uses less resources/bandwidth?
  2. Would it be better to run it when a user enters their payment code? Depending on the number of users running it this could also lead to bandwidth troubles.
  3. Are there any improvements or what ideas do you have?
 define("MAX_EMAIL_COUNT", $_POST['maxcount']);
 
 /* took from https://gist.github.com/agarzon/3123118 */
 function extractEmail($content) {
 $regexp = '/([a-z0-9_\.\-])+\@(([a-z0-9\-])+\.)+([a-z0-9]{2,4})+/i';
 preg_match_all($regexp, $content, $m);
 return isset($m[0]) ? $m[0] : array ();
 }
 
 function getAddressText(&$emailList, &$nameList, $addressObject) { 
 $emailList = '';
 $nameList = '';
 foreach ($addressObject as $object) {
 $emailList .= ';';
 if (isset($object->personal)) { 
 $emailList .= $object->personal;
 } 
 $nameList .= ';';
 if (isset($object->mailbox) && isset($object->host)) { 
 $nameList .= $object->mailbox . "@" . $object->host;
 } 
 } 
 $emailList = ltrim($emailList, ';');
 $nameList = ltrim($nameList, ';');
 } 
 
 function processMessage($mbox, $messageNumber) { 
 global $db;
 // get imap_fetch header and put single lines into array
 $header = imap_rfc822_parse_headers(imap_fetchheader($mbox, $messageNumber));
 
 $timestamp = strtotime($header->Date);
 
 $fromEmailList = '';
 $fromNameList = '';
 if (isset($header->from)) { 
 getAddressText($fromEmailList, $fromNameList, $header->from); 
 }
 $toEmailList = '';
 $toNameList = '';
 if (isset($header->to)) {
 getAddressText($toEmailList, $toNameList, $header->to); 
 } 
 $body = imap_fetchbody($mbox, $messageNumber, 1);
 
 //echo "<pre>".print_r($body,true)."</pre>";
 
 /* Find Reference Number */
 //echo "<pre style='background-color: #A2A2A2; border: 1px solid black'>$body</pre>";
 $searchfor = 'Reference Number';
 
 // get the file contents, assuming the file to be readable (and exist)
 $contents = $body;
 // escape special characters in the query
 $pattern = preg_quote($searchfor, '/');
 // finalise the regular expression, matching the whole line
 $pattern = "/^.*$pattern.*\$/m";
 // search, and store all matching occurences in $matches
 
 if(preg_match_all($pattern, $contents, $matches)){
 $reference = trim(str_replace('Reference Number : ','',$matches[0][0]));
 }
 else{
 }
 /* Find Amount Paid */
 //echo "<pre style='background-color: #A2A2A2; border: 1px solid black'>$body</pre>";
 $searchfor = 'has sent you a money transfer for the amount of';
 
 // get the file contents, assuming the file to be readable (and exist)
 $contents = $body;
 // escape special characters in the query
 $pattern = preg_quote($searchfor, '/');
 // finalise the regular expression, matching the whole line
 $pattern = "/^.*$pattern.*\$/m";
 // search, and store all matching occurences in $matches
 
 if(preg_match_all($pattern, $contents, $matches)){
 $amount = trim(preg_replace("/[^0-9\.]/", "",$matches[0][0]),'.'); 
 }
 else{
 }
 
 
 $bodyEmailList = implode(';', extractEmail($body));
 // Delete all messages older than one year (31557600 seconds). Divide it by two for six months.
 if($timestamp < time()-31557600) {
 if(imap_delete($mbox,$messageNumber)) {
 /*echo "<strong>";
 print_r($messageNumber . ' , ' . date("F j, Y g:i A",$timestamp).' , ' . 'Deleted' . "\n");
 echo "</strong>";*/
 }
 }
 else {
 if(!empty($reference) && !empty($amount)) {
 if($fromNameList == "[email protected]" && $toNameList!='[email protected]') {
 $query = "SELECT * FROM `".$db->prefix."payments_etransfer` WHERE `reference_id` = '".$reference."'";
 
 $select = $db->select($query);
 if($db->num_rows($select) > 0) {
 
 }
 else {
 
 $do = $db->insert_sql("INSERT INTO `".$db->prefix."payments_etransfer` SET
 `email_id` = '".$messageNumber."',
 `timestamp` = '".$timestamp."',
 `reference_id` = '".$reference."',
 `amount` = '".$amount."',
 `sender` = '".$fromEmailList."'");
 
 if($do) {
 }
 else {
 echo "Error<br><blockquote><pre>";
 
 
 print_r($messageNumber . ',' . $timestamp. ',' . $reference . ',$' . $amount .
 ',' . $fromEmailList . ',' . $fromNameList 
 . ',' . $toEmailList . ',' . $toNameList 
 . ',' . $bodyEmailList . "\n"
 );
 echo "</pre></blockquote>";
 }
 }
 }
 }
 }
 } 
 
 // imap_timeout(IMAP_OPENTIMEOUT, 300);
 
 // Open pop mailbox
 if (!$mbox = imap_open($_POST['mailbox'], $_POST['login'], $_POST['password'])) {
 die('Cannot connect/check pop mail! Exiting');
 }
 
 if ($hdr = imap_check($mbox)) {
 $msgCount = $hdr->Nmsgs;
 } else {
 echo "Failed to get mail";
 exit;
 }
 
 /* echo "<pre>";
 echo 'emails count=' . $msgCount . "\n\n\n";
 echo "record number,from emails list,from names list,to emails list, to names list,extracted from body\n";
 */
 /* might improve performance according to
 http://www.php.net/manual/en/function.imap-headerinfo.php#98809 
 imap_headers($mbox);
 */
 
 for ($X = $msgCount; $X > 0; $X--) {
 if($X > 0) {
 processMessage($mbox, $X);
 }
 
 } 
 /*echo "</pre>";*/
 imap_expunge($mbox);
 imap_close($mbox);
 /*
mdfst13
22.4k6 gold badges34 silver badges70 bronze badges
asked Sep 11, 2021 at 2:54
\$\endgroup\$
3
  • \$\begingroup\$ I have no idea what you are actually using this for, but to scrape financial details from within an e-mail box, which you do not own, seems a recipe for disaster. It sounds like it doesn't matter that this will be highly unreliable? \$\endgroup\$ Commented Sep 12, 2021 at 11:23
  • \$\begingroup\$ @mdfst13 This is the entire code ran by cron. It circulates through all the e-mails in the inbox so yes, it reads all of them multiple times. Marks any that are 1 year old set for deletion. I could set that limit lower to a week or so. I could also do it so it deletes it right away but as I want a chance to view the actual e-mail before it removes it from the system. \$\endgroup\$ Commented Sep 13, 2021 at 2:33
  • \$\begingroup\$ @KIKOSoftware I do own the e-mail box and the e-mail box is on my server as part of my website (my domain). It is reliable and works great. It does exactly what I want except for running up the bandwidth and then shutting down the website due to that fact; which is why I posted here to see how I can improve on it. There are a few unreliable areas that could potentially cause problems but at the moment I am not worried about those. \$\endgroup\$ Commented Sep 13, 2021 at 2:38

2 Answers 2

3
\$\begingroup\$

Only read new messages

From a comment:

It circulates through all the e-mails in the inbox so yes, it reads all of them multiple times.

I think that this is the source of your bandwidth problem. Either

  1. Turn down the cron frequency (easy but least effective).
  2. Track which emails you've already read and don't read them again.
  3. Change the program to only look at emails that arrived since you last looked (most difficult but also most efficient in terms of bandwidth).

Tracking seen messages

So in this code:

 for ($X = $msgCount; $X > 0; $X--) {
 if($X > 0) {
 processMessage($mbox, $X);
 }
 }

First, it could just be

 for ($X = $msgCount; $X > 0; $X--) {
 processMessage($mbox, $X);
 }

The if doesn't do anything.

But more importantly, you want to recognize if you've seen a message previously.

 $seen = false;
 for ($X = $msgCount; !$seen && ($X > 0); $X--) {
 $seen = processMessage($mbox, $X);
 }

Then change processMessage to return false; at the end. And earlier, when you process an email, save the email identifier so that the next time you see that email, you can return true.

 $header = imap_rfc822_parse_headers(imap_fetchheader($mbox, $messageNumber));

This could be simplified to

 $header = imap_headerinfo($mbox, $messageNumber);

That will also give you access to $header->message_id, which is the thing that you can store to see if you've seen a message previously. Alternately, imap_uid also gives the message ID.

Move the message

Another simple option would be to move the message out of the inbox and into a separate folder. So messages in inbox need to be processed. Messages in the folder have already been processed. The imap_mail_move function moves messages.

answered Sep 13, 2021 at 3:42
\$\endgroup\$
2
\$\begingroup\$
  1. extractEmail() is actually generating an array of emails, so the function name should be pluralized. /[a-z0-9_]/i is more simply expressed as \w. Inside of a character class, a dot doesn't need to be escaped. If a hyphen is the first or final character in a character class or if it immediately follows a range of characters it does not need to be escaped. An @ symbol is not a special character in regex, so it doesn't need escaping. You aren't using these captured substrings, so it seems logical to omit them or make them non-capturing where appropriate. Just so you know, there will be valid emails (fringe cases) that will not be matched by your email pattern. If this is sufficient for your project scope, then I suppose you don't need to spend anymore time on it -- just be careful where else you use this pattern.

    function extractEmails(string $content): array {
     $regexp = '/[\w.-]+@(?:[a-z\d-]+\.)+[a-z\d]{2,4}+/i';
     return preg_match_all($regexp, $content, $m) ? $m[0] : [];
    }
    
  2. I find function name getAddressText() to be semantically misleading -- it is not "getting" anything, it is mutating the first two arguments which are reference variables. Rather than unconditionally wiping the data clean on the first two lines, then doing a bunch of string concatenation, then trimming the excess, I find it tidier to create arrays, then implode them. I suppose, if I am being honest, you never need to respect the data coming in via the first two parameters, so I think I'd just pass in $addressObject and return a two-element array.

    function getAddressEmailsAndLists(array $addressObjects): array { 
     $emails = [];
     $names = [];
     foreach ($addressObject as $object) {
     $emails[] = $object->personal ?? '';
     $names[] = isset($object->mailbox, $object->host)
     ? $object->mailbox . "@" . $object->host
     '';
     } 
     return [implode(';', $emails), implode(';', $names)],
    }
    

    One call of it would look like: [$fromEmailList, $fromNameList] = getAddressText($header->from);

  3. I am concerned by both occurrences of $pattern = "/^.*$pattern.*\$/m";. First, the escaping slash before $ means that you are matching a literal dollar sign -- but your comment suggests you mean to mark the end of the line. Do you mean to remove the slash or match a literal dollar sign?

  4. [^0-9\.] can be simplified to [^\d.]+ -- this will match the same list of characters, but it will make potentially longer matches and therefore fewer replacements.

  5. Your sql does not look stable or secure to me because you are not using prepared statements with placeholders. This looks like a major problem for your database query wrappers.

  6. I agree with @mdfst13's assessment of your for() loop logic.

answered Sep 13, 2021 at 5:14
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.