This directory contains code for deploying OmniParser v2 to Amazon SageMaker as an asynchronous inference endpoint.
```
omniparser-sagemaker/
├── container/               # Container files for SageMaker deployment
│   ├── Dockerfile           # Docker configuration for the container
│   └── inference.py         # SageMaker model server implementation
├── model/                   # Model artifacts
│   ├── download_weights.py  # Script to download weights from Hugging Face
│   └── weights/             # Local directory for temporary weight storage
├── scripts/                 # Deployment and build scripts
│   ├── build_and_push.sh    # Script to build and push the Docker image to ECR
│   └── deploy.py            # Script to deploy the model to SageMaker
├── .python-version          # Python version specification
├── pyproject.toml           # Project configuration and dev dependencies
├── requirements.txt         # Production dependencies
└── .gitignore               # Git ignore rules
```
- AWS CLI installed and configured with appropriate credentials
- Docker installed and running
- Python 3.11
- Required Python packages (install via `pip install -r requirements.txt`):

```text
# Core Dependencies
boto3
sagemaker
sagemaker-inference
multi-model-server

# ML & Vision
torch
torchvision
transformers
ultralytics==8.3.70
supervision==0.18.0
opencv-python
opencv-python-headless

# OCR Components
paddlepaddle
paddleocr
easyocr

# Utilities
numpy==1.26.4
einops==0.8.0
```
This project uses `pyproject.toml` for development dependencies and configuration. To set up a development environment:
```bash
# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate

# Install development dependencies
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install
```
```bash
# Install required packages
pip install -r requirements.txt

# Configure AWS CLI with your credentials
aws configure
```
```bash
cd sagemaker/scripts

# Set your S3 bucket for model weights
export OMNIPARSER_MODEL_BUCKET="your-model-bucket-name"

# Build and push (this will also download and upload the model weights)
./build_and_push.sh
```
This script will:
- Create the S3 bucket if it doesn't exist
- Download model weights from Hugging Face
- Create a tarball and upload it to `s3://${OMNIPARSER_MODEL_BUCKET}/model/omniparser-v2/model.tar.gz`
- Build and push the Docker container to ECR
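To confirm the upload succeeded, a quick check with boto3 can verify that the tarball exists at the path above (a minimal sketch; it assumes `OMNIPARSER_MODEL_BUCKET` is still exported from the build step):

```python
import os
import boto3

# Assumes OMNIPARSER_MODEL_BUCKET is set as in the build step above
bucket = os.environ["OMNIPARSER_MODEL_BUCKET"]
key = "model/omniparser-v2/model.tar.gz"

s3 = boto3.client("s3")
# head_object raises a ClientError if the tarball is missing
response = s3.head_object(Bucket=bucket, Key=key)
print(f"model.tar.gz uploaded, size: {response['ContentLength']} bytes")
```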
```python
from scripts.deploy import deploy_omniparser

# Deploy using the same bucket used in the build step
predictor = deploy_omniparser(
    model_bucket="your-model-bucket-name"
)
```
This will:
- Create a SageMaker model using the ECR container
- Configure the model to use weights from S3
- Deploy an async inference endpoint
- Return a predictor object for making inferences
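For reference, these steps roughly correspond to the standard SageMaker Python SDK flow. A minimal sketch of an equivalent deployment follows; the image URI, role ARN, instance type, and output path are illustrative assumptions, not values taken from `deploy.py`:

```python
import sagemaker
from sagemaker.model import Model
from sagemaker.async_inference import AsyncInferenceConfig

session = sagemaker.Session()

# All URIs, names, and the role ARN below are placeholders
model = Model(
    image_uri="<account>.dkr.ecr.us-west-2.amazonaws.com/omniparser-v2:latest",
    model_data="s3://your-model-bucket/model/omniparser-v2/model.tar.gz",
    role="arn:aws:iam::<account>:role/YourSageMakerExecutionRole",
    sagemaker_session=session,
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",       # assumed GPU instance type
    endpoint_name="omniparser-v2-async",  # matches the cleanup step below
    async_inference_config=AsyncInferenceConfig(
        output_path="s3://your-model-bucket/async-output/",
    ),
)
```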
```python
from examples.invoke_endpoint import invoke_omniparser, get_results

# Submit an inference request
image_path = "path/to/your/image.png"
output_location = invoke_omniparser(image_path)

# Wait for processing (you can implement polling here)
import time
time.sleep(30)

# Get results
labeled_image, coordinates, content = get_results(output_location)
```
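Instead of a fixed `time.sleep`, you can poll the async output location until the result object lands in S3. A minimal sketch, assuming `output_location` is the `s3://...` URI returned by `invoke_omniparser` (the `wait_for_result` helper is hypothetical):

```python
import time
from urllib.parse import urlparse

import boto3
from botocore.exceptions import ClientError

def wait_for_result(output_location: str, timeout: int = 300, interval: int = 5) -> None:
    """Poll S3 until the async inference result object exists."""
    parsed = urlparse(output_location)
    bucket, key = parsed.netloc, parsed.path.lstrip("/")
    s3 = boto3.client("s3")
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            s3.head_object(Bucket=bucket, Key=key)
            return  # result is ready
        except ClientError:
            time.sleep(interval)  # not ready yet, keep polling
    raise TimeoutError(f"No result at {output_location} after {timeout}s")

wait_for_result(output_location)
labeled_image, coordinates, content = get_results(output_location)
```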
OmniParser v2 uses two main model components:
- Icon Detection Model (YOLO-based)
- Icon Caption Model (Florence2)
The weights are managed in two stages:
- **Build Time** (handled by `build_and_push.sh`; see the sketch after this list):
  - Downloaded from Hugging Face
  - Packaged into `model.tar.gz`
  - Uploaded to S3: `s3://<bucket>/model/omniparser-v2/model.tar.gz`
- **Runtime**:
  - SageMaker automatically downloads the weights from S3
  - Extracts them to `/opt/ml/model` in the container
  - Used by the model for inference
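The build-time stage amounts to roughly the following (a sketch only; `build_and_push.sh` does this for you, and the local paths here are illustrative):

```python
import tarfile

import boto3

# Illustrative paths; download_weights.py populates the weights directory
weights_dir = "model/weights"  # local weights downloaded from Hugging Face
tarball = "model.tar.gz"

# Package the weights directory into model.tar.gz
with tarfile.open(tarball, "w:gz") as tar:
    tar.add(weights_dir, arcname=".")  # contents land at the tarball root

# Upload to the S3 location SageMaker reads at deploy time
boto3.client("s3").upload_file(
    tarball,
    "your-model-bucket",
    "model/omniparser-v2/model.tar.gz",
)
```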
```bash
# Required:
export OMNIPARSER_MODEL_BUCKET="your-bucket"  # S3 bucket for model weights

# Optional:
export AWS_DEFAULT_REGION="us-west-2"  # Defaults to us-west-2
```
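In Python, these are typically consumed with standard `os.environ` lookups; a sketch mirroring the defaults above (the exact lookup pattern inside the scripts is an assumption):

```python
import os

# Required: fail fast if the bucket is not configured
model_bucket = os.environ["OMNIPARSER_MODEL_BUCKET"]

# Optional: fall back to the documented default region
region = os.environ.get("AWS_DEFAULT_REGION", "us-west-2")
```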
```python
# In deploy.py:
predictor = deploy_omniparser(
    model_bucket="your-bucket",
    model_prefix="model/omniparser-v2"  # Optional, defaults to this value
)
```
```python
# In invoke_endpoint.py:
request = {
    'image': encode_image(image_path),
    'box_threshold': 0.05,   # Detection confidence threshold
    'iou_threshold': 0.7,    # Box overlap threshold
    'use_paddleocr': False,  # Whether to use PaddleOCR
    'batch_size': 128        # Batch size for caption generation
}
```
You can monitor the deployment through:

- **CloudWatch Metrics** (see the sketch after this list):
  - Endpoint invocations
  - Model latency
  - GPU utilization
- **CloudWatch Logs**:
  - Container logs
  - Inference errors
- **S3 Monitoring**:
  - Async inference results
  - Failed inference requests
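As an example of the metrics check, you can pull invocation counts with boto3 (a sketch; `AllTraffic` is the default variant name and an assumption here):

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")

# Invocation count for the endpoint over the last hour, in 5-minute buckets
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/SageMaker",
    MetricName="Invocations",
    Dimensions=[
        {"Name": "EndpointName", "Value": "omniparser-v2-async"},
        {"Name": "VariantName", "Value": "AllTraffic"},  # assumed default variant
    ],
    StartTime=datetime.now(timezone.utc) - timedelta(hours=1),
    EndTime=datetime.now(timezone.utc),
    Period=300,
    Statistics=["Sum"],
)
for point in stats["Datapoints"]:
    print(point["Timestamp"], point["Sum"])
```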
Common issues and what to check:

- **Build Issues**:
  - Check S3 bucket permissions
  - Verify Hugging Face access
  - Check the Docker build logs
  - Ensure there is enough disk space for the weights
- **Deployment Issues**:
  - Verify that IAM roles have the necessary permissions
  - Check SageMaker service quotas
  - Verify GPU instance availability
- **Inference Issues**:
  - Check the async output location
  - Verify the input image format
  - Monitor GPU memory usage
To tear down the deployment when you're done:

```python
import boto3

# Delete the endpoint
sagemaker = boto3.client('sagemaker')
sagemaker.delete_endpoint(EndpointName='omniparser-v2-async')

# Delete the model weights (optional)
s3 = boto3.client('s3')
s3.delete_object(
    Bucket='your-model-bucket',
    Key='model/omniparser-v2/model.tar.gz'
)
```
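If you also want to remove the SageMaker model and endpoint configuration, the corresponding boto3 calls are below; the resource names are assumptions, so confirm the exact values in your account:

```python
import boto3

sagemaker = boto3.client('sagemaker')
# Names below are assumptions -- check the SageMaker console for your values
sagemaker.delete_endpoint_config(EndpointConfigName='omniparser-v2-async')
sagemaker.delete_model(ModelName='omniparser-v2')
```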