BDA Resume Parser - Real-Time Processing

Automated resume parsing using Amazon Bedrock Data Automation with hierarchical data extraction and real-time S3 event triggers.

Architecture

(Architecture diagram: architecture/bda_architecture.png)

Real-Time Event-Driven Processing:

  1. S3 Upload → Automatic trigger via S3 Event Notifications
  2. Lambda Processing → BDA hierarchical blueprint extraction
  3. Structured Output → Organized JSON results saved to S3
  4. Error Handling → DLQ + monitoring for failures

A more detailed description is available in architecture/README.md.

Key Benefits:

  • Hierarchical Data Structure - Organized sections (Personal, Education, Experience, Skills)
  • Sub-second triggers - Immediate processing on upload
  • Auto-scaling - Handles multiple resumes concurrently
  • Infrastructure as Code - Complete CDK deployment with blueprints
  • Development Workflow - DEV → LIVE promotion for safe deployments

Quick Start

1. Deploy Complete Infrastructure (Blueprint + Resources)

# Deploy everything with CDK (blueprint, project, S3, Lambda, etc.)
cd infrastructure
uv sync
uv run cdk bootstrap # One-time setup
uv run cdk deploy
# Note: This creates blueprint and project in DEVELOPMENT stage
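
For orientation, the following is a minimal sketch of the kind of wiring stacks/bda_stack.py sets up: an input bucket, a processor Lambda with a dead-letter queue, and an S3 event notification scoped to the input/ prefix. The construct IDs, runtime, handler path, and timeout below are illustrative assumptions, not the stack's actual values.

# Minimal sketch of the S3 → Lambda → DLQ wiring (illustrative, not the actual stack)
from aws_cdk import Duration, Stack
from aws_cdk import aws_lambda as _lambda
from aws_cdk import aws_s3 as s3
from aws_cdk import aws_s3_notifications as s3n
from aws_cdk import aws_sqs as sqs
from constructs import Construct


class BDAResumeStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Bucket that receives resumes under input/ and holds results under output/
        bucket = s3.Bucket(self, "ResumeBucket")

        # Dead-letter queue for events the function fails to process
        dlq = sqs.Queue(self, "ProcessingDLQ", retention_period=Duration.days(14))

        # Lambda that runs the BDA blueprint extraction (handler path is an assumption)
        processor = _lambda.Function(
            self,
            "BDAProcessorFunction",
            runtime=_lambda.Runtime.PYTHON_3_12,
            handler="handler.lambda_handler",
            code=_lambda.Code.from_asset("../lambda"),
            timeout=Duration.minutes(5),
            dead_letter_queue=dlq,
        )
        bucket.grant_read_write(processor)

        # Fire the function only for new objects under the input/ prefix
        bucket.add_event_notification(
            s3.EventType.OBJECT_CREATED,
            s3n.LambdaDestination(processor),
            s3.NotificationKeyFilter(prefix="input/"),
        )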

2. Interactive Processing

# 🚀 Launch interactive menu
./scripts/bda_workflow.sh data/sample_resume.pdf
# Choose from menu:
# Option 3: Production processing with blueprint stage verification
# Option 5: Full DEV→LIVE workflow automation
# Options 1-2: Individual development and promotion steps
# Option 4: LIVE stage testing

Project Structure

bda_usecases/
├── infrastructure/                  # CDK Infrastructure as Code
│   ├── app.py                       # CDK app entry point
│   ├── stacks/bda_stack.py          # S3, Lambda, IAM resources
│   └── pyproject.toml               # CDK dependencies
├── lambda/                          # Lambda function code
│   ├── handler.py                   # S3 event processor
│   ├── bda_parser/                  # Core BDA processing module
│   │   ├── bda_client.py            # BDA client with blueprint processing
│   │   ├── blueprint_schema.json    # Custom extraction schema
│   │   └── config.py                # AWS configuration
│   └── pyproject.toml               # Runtime dependencies
├── cli/                             # Setup & testing tools
│   └── promote_blueprint.py         # Promote DEV → LIVE stage
├── scripts/                         # Automation scripts
│   ├── bda_workflow.sh              # 🚀 Unified processing & workflow automation
│   └── cleanup.sh                   # Infrastructure cleanup script
├── data/
│   └── sample_resume.pdf            # Test document
└── architecture/                    # Architecture diagrams and documentation
    ├── README.md                    # Detailed architecture overview
    ├── bda_architecture.png         # System architecture diagram
    └── bda_workflow.png             # Development workflow diagram

Real-Time Processing Flow

1. Automatic Trigger

Resume Upload → s3://bucket/input/resume.pdf
        ↓ (S3 Event Notification)
Lambda Function Invoked

2. Lambda Processing

# Lambda receives S3 event
{
  "Records": [{
    "s3": {
      "bucket": {"name": "bda-resume-bucket"},
      "object": {"key": "input/resume.pdf"}
    }
  }]
}
# Processes with BDA blueprint
# Saves results to output/ prefix
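
A minimal handler sketch follows. The process_document helper name is a hypothetical stand-in for the logic in bda_parser/bda_client.py; the actual lambda/handler.py may differ in details.

# Minimal sketch of lambda/handler.py (helper name below is an assumption)
import json
import urllib.parse

from bda_parser.bda_client import process_document  # hypothetical helper


def lambda_handler(event, context):
    results = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 events (e.g. spaces become "+")
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # Run the BDA blueprint extraction and write JSON under output/
        output_key = process_document(bucket, key)
        results.append(output_key)
    return {"statusCode": 200, "body": json.dumps(results)}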

3. Output Structure

s3://bucket/
├── input/
│   └── 20240923_112801_resume.pdf      # Uploaded resume
└── output/
    └── 20240923_112801_resume/         # Processing results
        └── 0/custom_output/0/
            └── result.json             # Structured data
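
To retrieve the structured output programmatically, a short boto3 snippet like the one below works; the bucket name and prefix are placeholders taken from the layout above.

# Fetch the structured result for a processed resume (bucket/prefix are illustrative)
import json

import boto3

s3 = boto3.client("s3")
bucket = "bda-resume-bucket"  # assumption: substitute your stack's bucket name
prefix = "output/20240923_112801_resume/"

# Find the result.json under the BDA output prefix shown above
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        if obj["Key"].endswith("result.json"):
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
            print(json.dumps(json.loads(body), indent=2))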

Development Workflow

(Development workflow diagram: architecture/bda_workflow.png)

  1. DEVELOPMENT Stage - Blueprint and project created for testing
  2. Testing Phase - Validate extraction quality with sample resumes
  3. LIVE Promotion - Move to production when ready
  4. Monitoring - CloudWatch logs and error tracking

Hierarchical Blueprint Schema

Extracts structured resume data in organized sections:

{
  "matched_blueprint": {
    "arn": "arn:aws:bedrock:ap-south-1:###:blueprint/###",
    "name": "resume-parser-hierarchical-###",
    "confidence": 1
  },
  "document_class": {
    "type": "Resume"
  },
  "split_document": {
    "page_indices": [0, 1]
  },
  "inference_result": {
    "skills": {
      "technical": "Programming Languages: Python, JavaScript, Java, Go, SQL Cloud Platforms: AWS, Azure, Google Cloud Platform Frameworks: React, Django, Flask, Express.js Databases: PostgreSQL, MongoDB, Redis, DynamoDB DevOps: Docker, Kubernetes, Jenkins, Terraform, Git",
      "languages": "English (Native), Spanish (Conversational)",
      "certifications": "AWS Solutions Architect Associate (AWS-SAA-123456), Certified Kubernetes Administrator (CKA-789012)",
      "tools": "Python, AWS, Docker, Kubernetes, PostgreSQL, JavaScript, React, Django, Flask, Express.js, JavaScript, React, Node.js, MongoDB, PostgreSQL, Redis, DynamoDB, Jenkins, Terraform, Git",
      "soft": "Leadership, Team Collaboration, Problem Solving, Communication, Project Management"
    },
    "personal_info": {
      "full_name": "John Smith",
      "address": "123 Main Street, Seattle, WA 98101",
      "phone": "(555) 123-4567",
      "linkedin": "linkedin.com/in/johnsmith",
      "email": "john.smith@email.com"
    },
    "educational_info": {
      "institution": "University of Washington, Seattle, WA",
      "graduation_year": "June 2020",
      "degree": "Bachelor of Science",
      "gpa": "3.8",
      "field_of_study": "Computer Science"
    },
    "experience": {
      "key_achievements": "Lead development of cloud-native applications using AWS services, Reduced system latency by 40% through performance optimization, Led team of 5 engineers on microservices migration project, Implemented CI/CD pipeline reducing deployment time by 60%, Developed full-stack web applications for e-commerce platform, Built payment processing system handling $1M+ in monthly transactions, Improved application performance by 50% through code optimization",
      "current_position": "Senior Software Engineer",
      "current_company": "Tech Corp",
      "years_total": "4+",
      "previous_roles": "Software Engineer, StartupXYZ, July 2020 - December 2021"
    }
  }
}
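
Downstream code can consume the hierarchical sections directly. A minimal example, assuming the result.json shape shown above:

# Flatten one section of the hierarchical result for downstream use (illustrative)
import json

with open("result.json") as f:
    result = json.load(f)

info = result["inference_result"]["personal_info"]
skills = result["inference_result"]["skills"]

candidate = {
    "name": info["full_name"],
    "email": info["email"],
    # Comma-separated strings split into lists for database/API ingestion
    "languages": [s.strip() for s in skills["languages"].split(",")],
    "soft_skills": [s.strip() for s in skills["soft"].split(",")],
}
print(candidate)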

Benefits of Hierarchical Structure:

  • Organized Output - Clear sections for downstream processing
  • Easy Integration - Structured for databases and APIs
  • Maintainable Schema - Reusable object definitions
  • Type Safety - Consistent data types per section

Monitoring & Debugging

Real-Time Monitoring

# Check Lambda logs (real-time)
aws logs tail /aws/lambda/BDAResumeStack-BDAProcessorFunction --follow
# List processed results
aws s3 ls s3://<bucket-name>/output/ --recursive
# List deployed CDK stacks
cd infrastructure && uv run cdk list --json

Error Handling

# Check DLQ for failed processing
aws sqs receive-message --queue-url <dlq-url>
# S3 event configuration
aws s3api get-bucket-notification-configuration --bucket <bucket-name>
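
For deeper inspection of failed events, a small boto3 sketch like this can drain and print DLQ messages; the queue URL comes from the stack outputs.

# Inspect failed-event messages on the DLQ (queue URL is a placeholder)
import json

import boto3

sqs = boto3.client("sqs")
queue_url = "<dlq-url>"  # from the stack outputs

resp = sqs.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=10,
    WaitTimeSeconds=5,  # long poll so a single call sees waiting messages
)
for msg in resp.get("Messages", []):
    print(json.dumps(json.loads(msg["Body"]), indent=2))
    # Delete only after the failure has been investigated or replayed:
    # sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])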

Advanced Usage

🚀 Script Usage Options

# Interactive menu (recommended)
./scripts/bda_workflow.sh data/sample_resume.pdf
# Process multiple files (choose option 3 for each)
for file in resumes/*.pdf; do
 echo "Processing $file..."
 ./scripts/bda_workflow.sh "$file"
 # Select option 3 for production processing
done

Infrastructure Management

cd infrastructure
# Preview changes before deployment
uv run cdk diff
# Deploy updates
uv run cdk deploy
# View stack outputs (ARNs, bucket names, etc.)
uv run aws cloudformation describe-stacks --stack-name BDAResumeStack --query "Stacks[0].Outputs"
# Cleanup (removes all resources)
uv run cdk destroy

Complete Infrastructure Cleanup

The project includes a comprehensive cleanup script that safely removes all AWS resources:

# Safe cleanup - shows what would be deleted but doesn't delete
./scripts/cleanup.sh
# Force cleanup - actually deletes all resources
./scripts/cleanup.sh true
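
One reason a plain cdk destroy can leave resources behind is that CloudFormation refuses to delete a non-empty S3 bucket; emptying the bucket first is the kind of step cleanup.sh automates. A minimal sketch, assuming boto3 and the bucket name from the stack outputs:

# Empty a bucket so stack deletion can proceed (illustrative)
import boto3

s3 = boto3.resource("s3")
bucket = s3.Bucket("<bucket-name>")  # from the stack outputs

# Deletes all object versions and delete markers; also works on unversioned buckets
bucket.object_versions.delete()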

Cleanup Script Features:

  • Safe by default - Dry run mode shows resources without deleting
  • CDK-aware - Reads actual resources from CloudFormation stack
  • Comprehensive - Removes S3 buckets, Lambda functions, BDA resources, IAM roles
  • Smart detection - Finds resources even if stack is partially deleted
  • Force mode - passing true as the first argument enables actual deletion
  • Verification - Confirms cleanup completion and shows any remaining resources

What gets cleaned up:

  • CloudFormation stack (CDK-managed resources)
  • S3 bucket and all contents
  • Lambda functions
  • BDA blueprints and projects
  • SQS dead letter queues
  • IAM roles and policies

Usage Examples:

# Check what would be deleted (safe preview)
./scripts/cleanup.sh
# Actually delete everything (use with caution)
./scripts/cleanup.sh true
# Alternative: CDK-only cleanup (may leave some resources)
cd infrastructure && uv run cdk destroy
