SHAROZ221/codealpha_Task-Automation


Task Automation with Python: Email Extractor


📌 Goal

Automate the extraction of all email addresses from a .txt file, categorize them, and save a detailed report to a separate output file.


πŸ“ Project Files

File Purpose
main.py Main script β€” extracts, categorizes, and reports on emails using regex
sample.txt Sample input file with multiple email addresses
extracted_emails.txt Auto-generated output file with extracted emails and stats

▶️ How to Run

Step 1: Install Python

No external libraries are needed. The script uses only built-in Python modules (re, os, collections).

Step 2: Run the script

python main.py

Step 3: Enter the filename

Enter the .txt filename (e.g. sample.txt): sample.txt

Step 4: Check the output

Extracted emails, category breakdown, and domain stats are saved to extracted_emails.txt in the same folder.


πŸ” How It Works

  1. Reads the contents of the input .txt file
  2. Uses a regex pattern to find all valid email addresses
  3. Removes duplicates while preserving order
  4. Categorizes each email as either:
    • Personal/Work: regular human or business addresses
    • System/No-reply: automated addresses (e.g. no-reply@, notifications@, alerts@, newsletter@)
  5. Groups emails by domain and counts how many addresses belong to each domain
  6. Saves a full report to extracted_emails.txt, including:
    • Total email count
    • Category breakdown (Personal/Work vs System/No-reply)
    • Domain breakdown (sorted by frequency)
    • Separate lists of Personal/Work and System/No-reply emails
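Steps 2, 3, and 5 above can be sketched roughly as follows. This is a minimal illustration, not the code in main.py: the regex is an assumed, pragmatic pattern (not a full RFC 5322 matcher), and the function names `extract_unique_emails` and `count_domains` are hypothetical.

```python
import re
from collections import Counter

# Assumed pattern: a common approximation of an email address.
# The pattern used in main.py may differ.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_unique_emails(text):
    """Return matched addresses in first-seen order, without duplicates."""
    seen = set()      # lowercase keys already encountered
    unique = []       # preserves the order of first appearance
    for match in EMAIL_PATTERN.findall(text):
        key = match.lower()
        if key not in seen:
            seen.add(key)
            unique.append(match)
    return unique


def count_domains(emails):
    """Count how many addresses belong to each domain."""
    return Counter(email.split("@", 1)[1].lower() for email in emails)


text = "Contact a@x.org, b@x.org and a@x.org; alerts: no-reply@y.net."
emails = extract_unique_emails(text)
print(emails)                               # ['a@x.org', 'b@x.org', 'no-reply@y.net']
print(count_domains(emails).most_common())  # [('x.org', 2), ('y.net', 1)]
```

A set on its own would discard the original order, so the sketch pairs it with a list: the set answers "seen before?" and the list records first appearances.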

💡 Key Concepts Used

  • re: regular expressions for pattern matching
  • os: file existence check
  • collections.Counter: counting categories and domains
  • File handling β€” reading input, writing output
  • Deduplication logic using set()
  • String matching for email categorization
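As a rough illustration of the file existence check and file handling listed above, the sketch below reads an input file only if it exists and writes lines to an output file. The helper names `read_input_file` and `write_lines` are hypothetical, and the UTF-8 encoding is an assumption; main.py may handle both differently.

```python
import os
import tempfile


def read_input_file(path):
    """Return the file's text, or None if the path is not an existing file."""
    if not os.path.isfile(path):
        return None
    # Assumption: input is UTF-8; undecodable bytes are replaced, not fatal.
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return handle.read()


def write_lines(path, lines):
    """Write one item per line, overwriting any existing file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")


# Round trip in a temporary folder so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as folder:
    out_path = os.path.join(folder, "out.txt")
    write_lines(out_path, ["a@x.org", "b@x.org"])
    round_trip = read_input_file(out_path)
    missing = read_input_file(os.path.join(folder, "missing.txt"))

print(repr(round_trip))
print(missing)
```

Returning None for a missing file lets the caller print a friendly message instead of dealing with a traceback.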

🧪 Sample Output

Console

[+] Found 24 unique email(s):
 michael.ross@company.com
 ...
[*] Category Breakdown:
 Personal/Work: 21
 System/No-reply: 3
[*] Top Domains:
 devteam.org: 5
 company.com: 4
 secops.net: 4
 bigcorp.in: 3
 partnerltd.co.uk: 1
[✓] Saved to: extracted_emails.txt

extracted_emails.txt

Email Extraction Report
========================================
Total emails found: 24
========================================
Category Breakdown:
 Personal/Work emails: 21
 System/No-reply emails: 3
Domain Breakdown:
 devteam.org: 5
 company.com: 4
 secops.net: 4
 bigcorp.in: 3
 ...
----------------------------------------
Personal/Work Emails
----------------------------------------
michael.ross@company.com
rachel.zane@company.com
...
----------------------------------------
System/No-reply Emails
----------------------------------------
no-reply@alerts.system.com
notifications@monitor.net
noreply@newsletter.company.com
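A report with the layout shown above could be assembled along these lines. This is a sketch only: `build_report` is a hypothetical helper, the 40-character rule width is read off the sample, and the real script may build its output differently.

```python
def build_report(personal, system, domain_counts):
    """Build the report text.

    personal, system: lists of email strings.
    domain_counts: list of (domain, count) pairs, already sorted by frequency.
    """
    rule, thin = "=" * 40, "-" * 40  # widths assumed from the sample output
    lines = [
        "Email Extraction Report",
        rule,
        f"Total emails found: {len(personal) + len(system)}",
        rule,
        "Category Breakdown:",
        f" Personal/Work emails: {len(personal)}",
        f" System/No-reply emails: {len(system)}",
        "Domain Breakdown:",
    ]
    lines += [f" {domain}: {count}" for domain, count in domain_counts]
    lines += [thin, "Personal/Work Emails", thin, *personal]
    lines += [thin, "System/No-reply Emails", thin, *system]
    return "\n".join(lines) + "\n"


report = build_report(
    ["a@x.org"],
    ["no-reply@y.net"],
    [("x.org", 1), ("y.net", 1)],
)
print(report)
```

Building the whole report as one string before writing keeps the formatting logic separate from file I/O, which makes it easy to check without touching the disk.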

🚀 Extra Features Added Beyond the Original Brief

  • Email categorization: automatically flags addresses as Personal/Work or System/No-reply based on common automated-mail prefixes (no-reply, notifications, alerts, newsletter, etc.)
  • Domain-based statistics: counts and ranks how many extracted emails belong to each domain
  • Structured report: output file is organized into category stats, domain stats, and separated email lists for easier review
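Prefix-based categorization like this might look like the sketch below. The prefix tuple is an assumption: it contains the four prefixes named in this README plus "noreply", which appears in the sample output. The list in main.py may be longer or matched differently, and `categorize` is a hypothetical name.

```python
from collections import Counter

# Assumed prefix list; the script's actual list is not shown in this README.
SYSTEM_PREFIXES = ("no-reply", "noreply", "notifications", "alerts", "newsletter")


def categorize(email):
    """Label an address by checking the part before '@' against known prefixes."""
    local_part = email.split("@", 1)[0].lower()
    if local_part.startswith(SYSTEM_PREFIXES):
        return "System/No-reply"
    return "Personal/Work"


sample = [
    "michael.ross@company.com",
    "no-reply@alerts.system.com",
    "notifications@monitor.net",
]
breakdown = Counter(categorize(email) for email in sample)
print(breakdown)
```

Only the local part is checked, so a personal address on a domain such as alerts.system.com is not flagged. The trade-off of `startswith` is that a human address like alerts.team@company.com would be labelled as automated; an exact match on the local part would be stricter.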
