This project is an assignment for the Artificial Intelligence major at Taylor's University.
This project detects whether a person is lying by analyzing multiple modalities: audio, video, and text. Deep learning models and feature fusion strategies are combined to build a single comprehensive judgment model.
- Multimodal Data Loader
- Audio Feature Extraction
- Text Feature Extraction
- Visual Feature Extraction
- Model Training with Fusion Strategies
- Feature Importance Analysis
- Visualization of Results
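To illustrate the fusion idea behind these features, here is a minimal PyTorch sketch of a late-fusion classifier: each modality gets its own encoder, and the encoded features are concatenated before a shared classification head. The class name, layer sizes, and input dimensions below are assumptions for illustration, not the project's actual architecture.

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Illustrative late-fusion model (hypothetical dimensions):
    encode each modality separately, then concatenate and classify."""

    def __init__(self, audio_dim=40, text_dim=768, visual_dim=512, hidden=128):
        super().__init__()
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, hidden), nn.ReLU())
        self.text_enc = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
        self.visual_enc = nn.Sequential(nn.Linear(visual_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden * 3, 2)  # two classes: truthful / deceptive

    def forward(self, audio, text, visual):
        # Concatenate the per-modality embeddings along the feature axis.
        fused = torch.cat([self.audio_enc(audio),
                           self.text_enc(text),
                           self.visual_enc(visual)], dim=-1)
        return self.head(fused)

model = LateFusionClassifier()
logits = model(torch.randn(4, 40), torch.randn(4, 768), torch.randn(4, 512))
print(logits.shape)  # torch.Size([4, 2])
```

Other fusion strategies (early fusion of raw features, or decision-level fusion of per-modality predictions) fit the same skeleton by moving the concatenation point.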
- Python 3.x
- PyTorch – Deep Learning Framework
- NumPy/Pandas – Data Processing
- Matplotlib/Seaborn – Data Visualization
- Git + GitHub – Version Control and Collaborative Development
| Filename | Description |
|---|---|
| DataConfig.py | Data path configuration class |
| Main.py | Main program entry point |
| ModelTrainer.py | Model training and fusion logic |
| MultimodalDataLoader.py | Multimodal data loading and processing |
| TextFeatureExtractor.py | Text feature extraction module |
| VisualFeatureExtractor.py | Visual feature extraction module |
| AudioFeatureExtractor.py | Audio feature extraction module |
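To illustrate the role of `DataConfig.py`, here is a minimal sketch of what a data-path configuration class could look like; the field and property names are hypothetical, not the project's actual API.

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class DataConfig:
    # Hypothetical sketch of a path-configuration class like DataConfig.py;
    # the real field and property names may differ.
    root: Path = Path("dataset")

    @property
    def clips_dir(self) -> Path:
        return self.root / "Clips"

    @property
    def audio_dir(self) -> Path:
        return self.root / "audio"

    @property
    def annotation_file(self) -> Path:
        return self.root / "Annotation" / "annotation.csv"

cfg = DataConfig()
print(cfg.annotation_file.as_posix())  # dataset/Annotation/annotation.csv
```

Centralizing paths in one class keeps the loaders and extractors free of hard-coded directory strings.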
```
dataset/
├── Clips/           # Video clips
│   ├── false/
│   └── true/
├── Transcription/   # Text transcripts
│   ├── false/
│   └── true/
├── audio/           # Audio files
│   ├── false/
│   └── true/
└── Annotation/      # Annotation files
    └── annotation.csv
```
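Given this layout, the loader presumably pairs files across the modality folders by filename stem and derives labels from the `true`/`false` subdirectories. A hedged sketch of that pairing logic, with illustrative function and file names (the actual `MultimodalDataLoader.py` may differ):

```python
from pathlib import Path
import tempfile

def collect_samples(root):
    """Pair each clip with its transcript and audio file by stem,
    labeling from the true/false folder (layout as in the tree above)."""
    samples = []
    for label in ("true", "false"):
        for clip in sorted((Path(root) / "Clips" / label).glob("*")):
            samples.append({
                "label": 1 if label == "true" else 0,
                "clip": clip,
                "transcript": Path(root) / "Transcription" / label / (clip.stem + ".txt"),
                "audio": Path(root) / "audio" / label / (clip.stem + ".wav"),
            })
    return samples

# Build a tiny throwaway directory tree just to demonstrate the pairing.
root = Path(tempfile.mkdtemp())
for sub in ("Clips/true", "Clips/false", "Transcription/true", "audio/true"):
    (root / sub).mkdir(parents=True)
(root / "Clips/true/s1.mp4").touch()
(root / "Clips/false/s2.mp4").touch()

samples = collect_samples(root)
print(len(samples))  # 2
```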
1. Install dependencies:

   ```bash
   pip install torch numpy pandas matplotlib seaborn
   ```

2. Place your dataset in the `dataset/` directory.

3. Run the main program:

   ```bash
   python Main.py
   ```
The program will automatically load multimodal data, train fusion models, and generate feature analysis charts.
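As a sketch of the kind of feature-analysis chart the program generates, the following saves a horizontal bar chart of importance scores. The feature names and values here are placeholders, purely for illustration, not real results.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Placeholder, purely illustrative feature names and importance scores.
features = ["pitch (audio)", "word count (text)", "gaze (visual)", "MFCC mean (audio)"]
importance = np.array([0.31, 0.18, 0.27, 0.24])

order = np.argsort(importance)  # sort bars so the most important is on top
plt.figure(figsize=(6, 3))
plt.barh(np.array(features)[order], importance[order])
plt.xlabel("Relative importance")
plt.title("Feature importance (illustrative)")
plt.tight_layout()
plt.savefig("feature_importance.png")
```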
| Name |
|---|
| Xiao Changhe |
| Guo Tao |
| Kan YiMing |
| Zhang ZhiAng |
| Zheng YaXin |
MIT License - commercial use is allowed; please retain the original author attribution.
For questions or collaboration opportunities, please contact:
guotao2beijing@gmail.com