As part of a larger Java application I'm working on, I have to retrieve emails and parse them for their content (subject, date, text, attachments, sender). In the method below, I pass a Message as a parameter, which is the JavaMail abstract representation of an email. The details of the Message are extracted and then returned as a MailList. A MailList is just a holder for the five values that describe each email.
public MailList getContent(Message message) throws MessagingException, IOException
{
    String body = "";
    String from = "";
    ArrayList<MimeBodyPart> attachments = new ArrayList<MimeBodyPart>();
    String contentType = message.getContentType();
    Address[] addresses = message.getFrom();
    if(addresses.length == 1)
        from = addresses[0].toString();
    else
    {
        for(int num = 0; num < addresses.length - 1; num++)
            from += addresses[num].toString() + ", ";
        from += addresses[addresses.length].toString();
    }
    if(contentType.contains("TEXT/PLAIN"))
    {
        Object content = message.getContent();
        if(content != null)
            body += content.toString();
    }
    else if(contentType.contains("TEXT/HTML"))
    {
        Object content = message.getContent();
        if(content != null)
            body += Jsoup.parse((String)content).text();
    }
    else if(contentType.contains("multipart"))
    {
        Multipart mp = (Multipart)message.getContent();
        int numParts = mp.getCount();
        for(int count = 0; count < numParts; count++)
        {
            MimeBodyPart part = (MimeBodyPart)mp.getBodyPart(count);
            String content = part.getContent().toString();
            if(MimeBodyPart.ATTACHMENT.equalsIgnoreCase(part.getDisposition()))
                attachments.add(part);
            else if(part.getContentType().contains("TEXT/HTML"))
                body += Jsoup.parse(content).text();
            else
                body += content;
        }
    }
    return new MailList(from, message.getSubject(), body,
            message.getSentDate().toString(), attachments);
}
This code works exactly as intended but is very slow. Each call takes between 0.5 s and 1.5 s depending on the content of the email, so parsing 10 emails might take up to 15 s. Up to around 100 emails might need to be parsed at any given time, and 2 to 3 minutes is too long for such a process. I'm not sure whether this can be made more efficient, but any advice is greatly appreciated.
3 Answers
if(addresses.length == 1)
    from = addresses[0].toString();
else
{
    for(int num = 0; num < addresses.length - 1; num++)
        from += addresses[num].toString() + ", ";
    from += addresses[addresses.length].toString();
}
Concatenation of strings in a loop should be done using a StringBuilder, like so:
StringBuilder builder = new StringBuilder(1024);
for(int i = 0; i < addresses.length; i++)
{
    builder.append(addresses[i].toString())
           .append(", ");
}
int builderLength = builder.length();
if(builderLength > 2)
{
    // removing the last appended ", "
    builder.setLength(builderLength - 2);
}
String from = builder.toString();
if(contentType.contains("TEXT/PLAIN"))
{
    Object content = message.getContent();
    if(content != null)
        body += content.toString();
}
else if(contentType.contains("TEXT/HTML"))
{
    Object content = message.getContent();
    if(content != null)
        body += Jsoup.parse((String)content).text();
}
Here it would be sufficient to just assign content.toString() or Jsoup.parse((String)content).text() to the body variable. There is no need to use +=.
else if(contentType.contains("multipart"))
{
    Multipart mp = (Multipart)message.getContent();
    int numParts = mp.getCount();
    for(int count = 0; count < numParts; count++)
    {
        MimeBodyPart part = (MimeBodyPart)mp.getBodyPart(count);
        String content = part.getContent().toString();
        if(MimeBodyPart.ATTACHMENT.equalsIgnoreCase(part.getDisposition()))
            attachments.add(part);
        else if(part.getContentType().contains("TEXT/HTML"))
            body += Jsoup.parse(content).text();
        else
            body += content;
    }
}
As mentioned above, if you are concatenating strings in a loop, this should be done using a StringBuilder.
You only need the String content if execution reaches the else if or else part, and because variables should be declared as close to their usage as possible, you could use continue; after the attachment is added, like so:
else if(contentType.contains("multipart"))
{
    StringBuilder bodyBuilder = new StringBuilder(1024);
    Multipart mp = (Multipart)message.getContent();
    int numParts = mp.getCount();
    for(int count = 0; count < numParts; count++)
    {
        MimeBodyPart part = (MimeBodyPart)mp.getBodyPart(count);
        if(MimeBodyPart.ATTACHMENT.equalsIgnoreCase(part.getDisposition()))
        {
            attachments.add(part);
            continue;
        }
        String content = part.getContent().toString();
        if(part.getContentType().contains("TEXT/HTML"))
        {
            bodyBuilder.append(Jsoup.parse(content).text());
        } else {
            bodyBuilder.append(content);
        }
    }
    body = bodyBuilder.toString();
}
Good suggestion with StringBuilder. All this shaved off about 200ms per email. Still slow but better than before. Thanks! – connorp, Aug 18, 2015 at 6:55
if(addresses.length == 1)
    from = addresses[0].toString();
else
{
Don't omit the braces around your if statements; funky things can happen later on if you mess those up. It's best to get in the habit of wrapping everything carefully.
if (addresses.length == 1) {
    from = addresses[0].toString();
} else {
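To illustrate what "funky things" can look like, here is a minimal, self-contained sketch. The class and method names are made up for this demonstration and are not part of the code under review. It shows that a second statement added under a brace-less if runs unconditionally, whatever the indentation suggests.

```java
final class BraceDemo {

    // Without braces, only the first statement belongs to the if.
    static int withoutBraces(boolean condition) {
        int counter = 0;
        if (condition)
            counter += 1;
            counter += 10; // indented like the if body, but always executed
        return counter;
    }

    // With braces, both statements are guarded by the condition.
    static int withBraces(boolean condition) {
        int counter = 0;
        if (condition) {
            counter += 1;
            counter += 10;
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(withoutBraces(false)); // prints 10
        System.out.println(withBraces(false));    // prints 0
    }
}
```

Both methods return 11 when the condition is true, so the difference only shows up on the path where the condition is false, which is why this kind of mistake is easy to miss.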
I would suggest replacing the following for loop with an array joiner.
for(int num = 0; num < addresses.length - 1; num++)
    from += addresses[num].toString() + ", ";
from += addresses[addresses.length].toString();
You could try StringUtils.join or Collectors.joining (Java 8).
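A minimal sketch of the Collectors.joining variant follows. The class name SenderJoiner and the method joinSenders are hypothetical, and the parameter is typed as Object[] only so that the sketch compiles without JavaMail on the classpath. Because Java arrays are covariant, the Address[] returned by message.getFrom() could be passed in directly. The null check is an assumption about how a message without a From header should be handled. A joiner also never indexes the array by hand, so it avoids the addresses[addresses.length] access in the quoted loop, which is one past the last valid index.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

final class SenderJoiner {

    // Joins the string form of every element with ", ".
    // Returns an empty string for a null or empty array (an assumption of this sketch).
    static String joinSenders(Object[] addresses) {
        if (addresses == null) {
            return "";
        }
        return Arrays.stream(addresses)
                .map(address -> String.valueOf(address))
                .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        Object[] senders = {"alice@example.com", "bob@example.com"};
        System.out.println(joinSenders(senders)); // prints alice@example.com, bob@example.com
    }
}
```

With this in place, the whole if/else around the from variable would collapse to a single call, since one sender and many senders are handled by the same code path.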
In both the .contains("TEXT/HTML") and .contains("TEXT/PLAIN") branches the code is very similar, so you could consider combining them and splitting the possible pathways only at body +=.
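One way the combined branch could look is sketched below. The names BodyExtractor and extractTextBody are hypothetical, and the HTML-to-text step is passed in as a function so the sketch compiles without Jsoup; in the real method it would be html -> Jsoup.parse(html).text(). The sketch also compares the content type case-insensitively, which is a deliberate change from the original upper-case-only match and should be treated as an assumption.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

final class BodyExtractor {

    // Returns the text of a single-part message, or "" if there is nothing to extract.
    // htmlToText stands in for an HTML stripper such as Jsoup.
    static String extractTextBody(String contentType, Object content,
                                  UnaryOperator<String> htmlToText) {
        if (contentType == null || content == null) {
            return "";
        }
        // Assumption: match case-insensitively instead of upper case only.
        String type = contentType.toLowerCase(Locale.ROOT);
        String text = content.toString();
        // The two former branches now differ only in the value they return.
        if (type.contains("text/html")) {
            return htmlToText.apply(text);
        }
        if (type.contains("text/plain")) {
            return text;
        }
        return "";
    }

    public static void main(String[] args) {
        // Crude tag stripper used only for this demo; it is not a real HTML parser.
        UnaryOperator<String> stripTags = html -> html.replaceAll("<[^>]*>", "");
        System.out.println(extractTextBody("TEXT/PLAIN; charset=us-ascii", "hello", stripTags));
        System.out.println(extractTextBody("text/html", "<p>hi</p>", stripTags));
    }
}
```

The null check and the getContent() call then appear once instead of twice, and the result can be assigned to body directly.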
Instead of:
int numParts = mp.getCount();
for(int count = 0; count < numParts; count++)
You could declare numParts inside the for loop, like so:
for(int count = 0, numParts = mp.getCount(); count < numParts; count++)
@connorp Not that you can do it in Java, but looking at Apple's goto fail bug helps show why it is good practice not to omit the braces. – bunyaCloven, Aug 18, 2015 at 8:13
I would suggest you use email-mime-parser. The following sample code gives you all the relevant info you need:
ContentHandler contentHandler = new CustomContentHandler();
MimeConfig mime4jParserConfig = new MimeConfig();
BodyDescriptorBuilder bodyDescriptorBuilder = new DefaultBodyDescriptorBuilder();
MimeStreamParser mime4jParser = new MimeStreamParser(mime4jParserConfig,DecodeMonitor.SILENT,bodyDescriptorBuilder);
mime4jParser.setContentDecoding(true);
mime4jParser.setContentHandler(contentHandler);
InputStream mailIn = openMailStream(); // hypothetical helper: provide the email MIME stream here
mime4jParser.parse(mailIn);
Email email = ((CustomContentHandler) contentHandler).getEmail();
List<Attachment> attachments = email.getAttachments();
Attachment calendar = email.getCalendarBody();
Attachment htmlBody = email.getHTMLEmailBody();
Attachment plainText = email.getPlainTextEmailBody();
String to = email.getToEmailHeaderValue();
String cc = email.getCCEmailHeaderValue();
String from = email.getFromEmailHeaderValue();