mail-parser is a production-grade, RFC-compliant email parsing library that goes far beyond a simple wrapper for Python's email module. It transforms raw email messages into richly structured Python objects with unparalleled precision, making complex email processing accessible and reliable.
As the battle-tested foundation of SpamScope —a powerful email security and threat analysis platform—mail-parser has proven itself in demanding production environments where accuracy and security matter most.
🔒 Security-First Design: Built specifically for email security analysis and digital forensics, mail-parser excels at detecting malformed structures, hidden content, and RFC non-compliance that could indicate malicious intent.
🎯 Comprehensive Parsing: Extracts every component of an email—headers, bodies (plain text and HTML), attachments, metadata, routing information, and even subtle defects that other parsers miss.
🔍 Multi-Format Access: Every parsed element is accessible in three formats (Python object, raw string, and JSON), enabling seamless integration with any workflow or downstream system.
🛡️ Defect Detection: Identifies and categorizes RFC violations, malformed MIME boundaries, and structural anomalies that could hide malicious payloads or bypass security filters.
📧 Outlook Support: Native handling of Microsoft Outlook .msg files alongside standard email formats, making it versatile for diverse email ecosystems.
⚡ Production-Ready: Trusted by security professionals and developers worldwide, with extensive test coverage and proven reliability in high-stakes environments.
Additionally, mail-parser provides full support for parsing Outlook email formats (.msg). To enable this functionality on Debian-based systems, simply install the required system package:
apt-get install libemail-outlook-message-perl
For further details about the package, you can run:
apt-cache show libemail-outlook-message-perl
mail-parser is fully compatible with Python 3, ensuring modern performance and reliability.
mail-parser can be downloaded, used, and modified free of charge. It is available under the Apache 2 license.
mail-parser is a labor of love and commitment to the open-source community. Thousands of developers and security professionals worldwide rely on this library for critical email processing and threat analysis. Your support directly fuels continued innovation and excellence.
Your contribution—no matter the size—makes a real difference. By supporting mail-parser, you enable us to:
- Advance Security Capabilities: Develop cutting-edge detection mechanisms for emerging email threats and attack vectors.
- Expand Format Support: Add compatibility with new email formats and standards as they evolve.
- Enhance Performance: Optimize parsing speed and memory efficiency for large-scale deployments.
- Maintain Excellence: Ensure comprehensive testing, documentation, and bug-free releases that you can trust in production.
- Foster Community: Respond to issues, review contributions, and build a thriving ecosystem around email security.
- Stay RFC-Compliant: Keep pace with evolving email standards and specifications to ensure maximum compatibility.
Every donation, whether $5 or $500, directly funds development time and infrastructure costs. Join the community of supporters who believe in accessible, reliable, and secure email parsing for everyone.
Or contribute with Bitcoin:
Bitcoin Address: bc1qxhz3tghztpjqdt7atey68s344wvmugtl55tm32
Thank you for supporting the evolution of mail-parser!
Explore mail-parser on these platforms:
mail-parser transforms raw email messages into comprehensive, RFC-compliant Python objects that faithfully mirror the structure defined by IETF email protocol standards. Each property of the parsed object directly corresponds to standard RFC headers—"From", "To", "Cc", "Bcc", "Subject", and many more—providing intuitive, Pythonic access to every email component.
The library extracts and structures every aspect of an email message:
- Multi-format Bodies: Both plain text and HTML body content, cleanly separated and accessible.
- Complete Attachments: Full metadata extraction including filename, content type, encoding, content disposition, content-ID, charset, and base64-encoded payloads.
- Routing Intelligence: Parsed "Received" headers revealing the complete email journey, including hop-by-hop analysis with timestamps, delays, server information, and envelope data.
- Advanced Diagnostics: Timestamp parsing with timezone detection, defect identification for RFC non-compliance, and structural anomaly detection.
- Custom Headers: Full support for non-standard and vendor-specific headers using intuitive underscore substitution for hyphenated names.
Every parsed element offers three distinct access patterns for maximum flexibility:
- Native Python objects: Structured, typed data ready for immediate programmatic use (mail.to, mail.date, mail.attachments).
- Raw strings: Original, unprocessed header content preserving exact formatting (mail.to_raw, mail.subject_raw).
- JSON serialization: Clean, standardized JSON representations for easy integration with APIs, databases, or other tools (mail.to_json, mail.headers_json).
This versatile architecture makes mail-parser exceptionally powerful for diverse use cases—from security analysis and forensics to email migration, compliance auditing, and automated processing pipelines.
Standard RFC Headers (directly accessible as properties):
- bcc - Blind carbon copy recipients
- cc - Carbon copy recipients
- date - Parsed timestamp with timezone support
- delivered_to - Final delivery address
- from_ - Sender address (underscore used since from is a Python keyword)
- message_id - Unique message identifier
- received - Parsed routing chain with hop-by-hop details
- reply_to - Reply-to address
- subject - Email subject line
- to - Primary recipients
Additional Parsed Components:
- body - Complete message body
- text_html - HTML body parts (list)
- text_plain - Plain text body parts (list)
- headers - All headers as a structured object
- attachments - Complete attachment metadata and payloads
- get_server_ipaddress() - Reliable sender IP extraction with trust levels
- to_domains - Extracted recipient domains for analysis
- timezone - Detected timezone information
- defects - RFC compliance issues for security analysis
- defects_categories - Categorized defect types
The attachments property returns a list of dictionaries, each containing comprehensive metadata:
- binary - Boolean flag indicating binary content
- charset - Character encoding of the attachment
- content_transfer_encoding - Transfer encoding method (e.g., base64, quoted-printable)
- content-disposition - Disposition type (attachment, inline, etc.)
- content-id - Content identifier for referencing within HTML bodies
- filename - Original filename of the attachment
- mail_content_type - MIME content type
- payload - Base64-encoded attachment data, ready for decoding or storage
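As a minimal sketch of consuming this structure, the snippet below decodes the base64 payload of one attachment dictionary back into bytes. The sample dictionary is hand-built for illustration (its values are invented) and uses only keys listed above. The helper name attachment_bytes is hypothetical and not part of mail-parser, and the sketch assumes the payload really is base64, as described above.

```python
import base64

# Hand-built sample shaped like one entry of mail.attachments (values are invented).
sample_attachment = {
    "filename": "hello.txt",
    "mail_content_type": "text/plain",
    "content_transfer_encoding": "base64",
    "binary": True,
    "payload": base64.b64encode(b"hello, world\n").decode("ascii"),
}


def attachment_bytes(attachment):
    """Hypothetical helper: return the raw bytes, assuming a base64 payload."""
    return base64.b64decode(attachment["payload"])


data = attachment_bytes(sample_attachment)
print(sample_attachment["filename"], len(data), "bytes")
```

Decoding in memory like this is useful when the bytes go straight to a scanner or a hash function instead of being written to disk.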
To access custom or vendor-specific headers, replace hyphens with underscores. For example, to
access the X-MSMail-Priority header:
mail.X_MSMail_Priority
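The underscore-to-hyphen mapping can be illustrated with the standard library alone. This is a sketch of the idea, not mail-parser's actual implementation: the HeaderView class and the sample message are hypothetical, and only Python's email module is used.

```python
import email

# Invented sample message carrying a vendor-specific header.
RAW = "From: alice@example.com\nX-MSMail-Priority: High\nSubject: hi\n\nbody\n"


class HeaderView:
    """Hypothetical wrapper: attribute names with underscores map to hyphenated headers."""

    def __init__(self, message):
        self._message = message

    def __getattr__(self, name):
        # X_MSMail_Priority -> X-MSMail-Priority; header lookup is case-insensitive.
        value = self._message.get(name.replace("_", "-"))
        if value is None:
            raise AttributeError(name)
        return value


view = HeaderView(email.message_from_string(RAW))
print(view.X_MSMail_Priority)
```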
The received header is intelligently parsed into individual hops, revealing the complete email
routing path. Each hop contains structured fields:
- by - Receiving mail server
- date - Timestamp of receipt (original timezone)
- date_utc - Normalized UTC timestamp
- delay - Time elapsed between consecutive hops
- envelope_from - SMTP envelope sender
- envelope_sender - Alternative envelope sender field
- for - Intended recipient
- from - Sending mail server
- hop - Sequential hop number
- with - Protocol used for transmission (SMTP, ESMTP, etc.)
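A short sketch of how such hop data might be inspected: the list below is invented sample data shaped like the fields above, and the helper slowest_hop is hypothetical. It assumes delay is a plain number of seconds, which should be checked against real output before relying on it.

```python
# Invented sample shaped like mail.received; only fields listed above are used.
hops = [
    {"hop": 1, "from": "client.example.org", "by": "smtp.example.org", "with": "ESMTP", "delay": 0},
    {"hop": 2, "from": "smtp.example.org", "by": "mx.example.net", "with": "ESMTP", "delay": 42},
    {"hop": 3, "from": "mx.example.net", "by": "mailbox.example.net", "with": "LMTP", "delay": 3},
]


def slowest_hop(received):
    """Hypothetical helper: return the hop with the largest delay (missing delay counts as 0)."""
    return max(received, key=lambda hop: hop.get("delay", 0))


slow = slowest_hop(hops)
print(f"hop {slow['hop']}: {slow['from']} -> {slow['by']} took {slow['delay']}s")
```

An unusually long delay on a single hop is often a starting point when investigating delivery problems or suspicious relays.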
Critical Security Feature: mail-parser detects and reports structural defects in email messages.
The defects property identifies RFC non-compliance issues that may indicate malformed or malicious emails—a crucial capability for security analysis and threat detection.
Multi-Format Property Access Pattern:
All parsed properties provide three access variants using intuitive suffixes:
- property_name - Returns structured Python object
- property_name_json - Returns JSON-serialized representation
- property_name_raw - Returns original, unprocessed header string
Example usage:
mail.to       # Python list of recipient objects
mail.to_json  # JSON string representation
mail.to_raw   # Original "To:" header string as it appears in the email
The command-line tool outputs parsed emails in JSON format by default for easy integration with other tools and pipelines.
Email structural defects are not merely technical curiosities—they represent potential security vulnerabilities that sophisticated attackers actively exploit to bypass spam filters, antivirus scanners, and email security gateways.
Malformed MIME boundaries, for example, can conceal illegitimate epilogue sections containing:
- Malware Payloads: Executable files or scripts hidden in non-standard message parts
- Phishing Links: Obfuscated URLs that bypass pattern-matching filters
- Command-and-Control Data: Encoded instructions for compromised systems
- Data Exfiltration: Steganographically hidden sensitive information
mail-parser was specifically engineered for security analysis and digital forensics, with defect detection as a core feature rather than an afterthought. The library captures and categorizes even subtle structural anomalies that other parsers silently ignore or mishandle.
By leveraging mail-parser's defect detection, security teams can:
- Expose Hidden Content: Discover deliberately obfuscated message parts that may contain malicious payloads.
- Identify Attack Patterns: Recognize non-standard formatting techniques used by threat actors to evade detection.
- Enable Deep Forensics: Conduct thorough structural analysis of suspicious emails during incident response.
- Strengthen Defenses: Build more resilient email security rules based on identified defect patterns.
- Ensure Compliance: Verify that outbound emails meet RFC standards to avoid delivery issues.
This robust defect detection mechanism has made mail-parser the trusted choice for security platforms like SpamScope, where identifying malicious intent hidden in structural anomalies can mean the difference between a blocked threat and a successful attack.
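To make the idea of a structural defect concrete, the sketch below uses Python's standard email module directly (not mail-parser) on an invented message that declares a multipart content type without a boundary parameter. The standard parser records the problem on the message instead of failing. With mail-parser, defect information is exposed through mail.defects and mail.defects_categories, as described above.

```python
import email

# Invented message: multipart content type with no boundary parameter.
RAW = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: broken structure\n"
    "MIME-Version: 1.0\n"
    "Content-Type: multipart/mixed\n"
    "\n"
    "This body can never be split into parts.\n"
)

message = email.message_from_string(RAW)

# Each defect is an instance of a class from email.errors; collect the class names.
defect_names = [type(defect).__name__ for defect in message.defects]
print(defect_names)
```

A filter that only looks at successfully parsed parts would treat this message as a single opaque body, which is why recording the defect itself matters.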
Fedele Mantuano: LinkedIn
mail-parser requires Python 3 and can be installed in seconds using pip. Follow these steps:
- Ensure Python 3 is installed on your system.
- Open your terminal or command prompt.
- Install mail-parser from PyPI:
pip install mail-parser
- (Optional) Verify the installation:
pip show mail-parser
For contributors and developers who want to work with the source code, we recommend using uv for
dependency management:
git clone https://github.com/SpamScope/mail-parser.git
cd mail-parser
uv sync
This setup installs all development and testing dependencies in an isolated virtual environment, ensuring a clean and reproducible development workflow.
For comprehensive documentation about uv, visit the official uv documentation.
Import the mailparser module and use the convenient factory functions:
import mailparser

mail = mailparser.parse_from_bytes(byte_mail)        # Parse from bytes object
mail = mailparser.parse_from_file(f)                 # Parse from file path
mail = mailparser.parse_from_file_msg(outlook_mail)  # Parse Outlook .msg file
mail = mailparser.parse_from_file_obj(fp)            # Parse from file object
mail = mailparser.parse_from_string(raw_mail)        # Parse from string
Once parsed, access all email components through intuitive properties:
mail.attachments          # List of all attachments with metadata
mail.body                 # Complete message body
mail.date                 # Parsed datetime object (UTC)
mail.defects              # List of RFC compliance defects
mail.defects_categories   # Categorized defect types
mail.delivered_to         # Delivery address
mail.from_                # Sender information
mail.get_server_ipaddress(trust="my_server_mail_trust")  # Reliable sender IP
mail.headers              # All headers as structured object
mail.mail                 # Fully tokenized mail object
mail.message              # Underlying email.message.Message object
mail.message_as_string    # Reconstructed message as string
mail.message_id           # Unique message identifier
mail.received             # Parsed routing information (hop-by-hop)
mail.subject              # Email subject
mail.text_plain           # Plain text body parts (list)
mail.text_html            # HTML body parts (list)
mail.text_not_managed     # Unprocessed text parts (check logs for subtypes)
mail.to                   # Recipient information
mail.to_domains           # Extracted recipient domains
mail.timezone             # Timezone information (offset from UTC)
mail.mail_partial         # Partial mail object (main parts only)
Write all attachments to a specified directory:
mail.write_attachments(base_path)
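Putting the pieces together, here is a small end-to-end sketch that parses an invented raw message with parse_from_string and collects a few of the properties listed above, including the _raw and _json variants. The summarize helper is hypothetical. The import is guarded only so the snippet still runs where mail-parser is not installed. Exact return shapes may differ between versions, so none are asserted in the comments.

```python
try:
    import mailparser  # provided by `pip install mail-parser`
except ImportError:  # keep the sketch runnable where the library is absent
    mailparser = None

# Invented sample message.
RAW_MAIL = (
    "From: Alice <alice@example.com>\n"
    "To: Bob <bob@example.com>\n"
    "Subject: Quarterly report\n"
    "Date: Mon, 01 Jan 2024 10:00:00 +0000\n"
    "Message-ID: <1234@example.com>\n"
    "\n"
    "Please find the report attached.\n"
)


def summarize(raw):
    """Hypothetical helper: parse a raw message and collect a few properties."""
    if mailparser is None:
        return None
    mail = mailparser.parse_from_string(raw)
    return {
        "subject": mail.subject,
        "from": mail.from_,
        "to": mail.to,
        "to_raw": mail.to_raw,
        "to_json": mail.to_json,
        "defects": mail.defects,
    }


summary = summarize(RAW_MAIL)
if summary is None:
    print("mail-parser is not installed")
else:
    for key, value in summary.items():
        print(f"{key}: {value!r}")
```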
After installing mail-parser with pip, you can use the mailparser command-line tool for quick
email analysis, batch processing, or integration with shell scripts and pipelines.
usage: mailparser [-h] (-f FILE | -s STRING | -k)
[-l {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}] [-j] [-b]
[-a] [-r] [-t] [-dt] [-m] [-u] [-c] [-d] [-o]
[-i Trust mail server string] [-p] [-z] [-v]
Wrapper for email Python Standard Library
optional arguments:
-h, --help show this help message and exit
-f FILE, --file FILE Raw email file (default: None)
-s STRING, --string STRING
Raw email string (default: None)
-k, --stdin Enable parsing from stdin (default: False)
-l {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}, --log-level {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}
Set log level (default: WARNING)
-j, --json Show the JSON of parsed mail (default: False)
-b, --body Print the body of mail (default: False)
-a, --attachments Print the attachments of mail (default: False)
-r, --headers Print the headers of mail (default: False)
-t, --to Print the to of mail (default: False)
-dt, --delivered-to Print the delivered-to of mail (default: False)
-m, --from Print the from of mail (default: False)
-u, --subject Print the subject of mail (default: False)
-c, --receiveds Print all receiveds of mail (default: False)
-d, --defects Print the defects of mail (default: False)
-o, --outlook Analyze Outlook msg (default: False)
-i Trust mail server string, --senderip Trust mail server string
Extract a reliable sender IP address heuristically
(default: None)
-p, --mail-hash Print mail fingerprints without headers (default:
False)
-z, --attachments-hash
Print attachments with fingerprints (default: False)
-sa, --store-attachments
Store attachments on disk (default: False)
-ap ATTACHMENTS_PATH, --attachments-path ATTACHMENTS_PATH
Path where store attachments (default: /tmp)
-v, --version show program's version number and exit
It takes as input a raw mail and generates a parsed object.
Parse an email file and output as formatted JSON:
mailparser -f example_mail -j
Extract only the subject and sender:
mailparser -f example_mail -u -m
Analyze an Outlook .msg file with defect detection:
mailparser -f email.msg -o -d -j
Parse from stdin (useful for pipelines):
cat raw_email.eml | mailparser -k -j
See the transformation from raw email to beautifully parsed JSON output.
mail-parser uses a well-structured exception hierarchy for precise error handling:
MailParserError: Base MailParser Exception
|
\── MailParserOutlookError: Raised with Outlook integration errors
|
\── MailParserEnvironmentError: Raised when the environment is not correct
|
\── MailParserOSError: Raised when there is an OS error
|
\── MailParserReceivedParsingError: Raised when a received header cannot be parsed
A pre-built Docker image is available for easy deployment and containerized workflows. Find the official image on Docker Hub.
After installing Docker, run the containerized mail-parser:
sudo docker run -it --rm -v ~/mails:/mails fmantuano/spamscope-mail-parser
This command mounts your local ~/mails directory into the container at /mails, allowing
mail-parser to access your email files. You can pass any command-line options supported by
mail-parser.
For more complex setups, a docker-compose.yml file is included in the repository. Run it with:
sudo docker-compose up
The default configuration includes:
- Read-only mount of your local ~/mails directory to /mails in the container.
- A test command demonstrating mail-parser functionality.
Customize the docker-compose.yml file to adjust mount points, command-line options, or
environment variables for your specific use case.