D4.3 Datasets for CANTATA project
In the context of the European CANTATA project, the partners working on validation methods for multi-content analysis combined their efforts to create a webpage for sharing knowledge about datasets (data sets, metadata, ground truth, metrics...) in three domains: surveillance, consumer electronics and medical. Each dataset should fall into one of three categories: Surveillance, Medical or Consumer Applications.
Please follow the template below to provide detailed information:
-
Name
- Website: Webpage link (if any)
- Description of Dataset: (Content, size, etc)
- Description of Ground Truth/Metadata: (if any)
- Contextual info: environment conditions (calibration, scene...)
- Results from metrics and ground truth:
- Comments:
- Information on Copyright: Licence, Cost, etc.
- Contact person from Cantata: contact person to get more info.
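For partners who keep their entries in a script or database before posting them, the template above could be mirrored by a small record type. This is only a sketch: the class name `DatasetEntry`, its field names and the helper `missing_fields` are hypothetical and not part of the CANTATA webpage.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class DatasetEntry:
    """One catalogue entry. Field names are hypothetical and mirror the template."""
    name: str
    website: str = ""
    dataset_description: str = ""
    ground_truth_description: str = ""
    contextual_info: str = ""
    results: str = ""
    comments: str = ""
    copyright_info: str = ""
    contact_person: str = ""


def missing_fields(entry: DatasetEntry) -> List[str]:
    """Return the names of the template fields that are still empty."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name).strip()]


# Example entry, filled only with values that appear on this page.
entry = DatasetEntry(
    name="PETS 2000",
    website="ftp://ftp.pets.rdg.ac.uk/pub/PETS2000/",
    copyright_info="Free download",
)
print(missing_fields(entry))
```

Listing the empty fields before submission makes it easy to see which parts of the template still need input from the contact person.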
If you want to add a new dataset, or if you have any comments, please contact Cédric Marchessoux.
Surveillance
-
PETS
- Website: http://www.cvg.rdg.ac.uk/slides/pets.html
- Description of Dataset: Each year PETS runs an evaluation framework on specific datasets with a specific objective. 2000: 2001.... (more on duration and theme)
- Description of Ground Truth/Metadata: Ground truth depends on the theme of each year's workshop.
- Contextual info:
- Results from metrics and ground truth:
- Comments:
- Information on Copyright: Free download from website
- Contact person from Cantata: Dimitrios Makris
-
PETS 2000
- Website: ftp://ftp.pets.rdg.ac.uk/pub/PETS2000/
- Description of Dataset: Outdoor people and vehicle tracking (single camera). Two sequences: a) a training sequence of 3672 frames at 25 Hz (146.88 s) and b) a test sequence of 1452 frames (58.08 s). The sequences are available in two formats: a) QuickTime movie format with Motion JPEG A compression (training.mov and test.mov) and b) individual JPEG files (training_images/*.jpg and test_9images/*.jpeg).
- Description of Ground Truth/Metadata: No Ground Truth provided.
- Contextual info: Camera Calibration provided.
- Results from metrics and ground truth: Tracking Information
- Comments:
- Information on Copyright: Free download
- Contact person from Cantata: Dimitrios Makris
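The durations quoted for the PETS 2000 sequences follow directly from the frame counts and the 25 Hz frame rate. A minimal sketch of that arithmetic is given below; the function name `duration_seconds` is hypothetical, and the test sequence is assumed to run at the same 25 Hz as the training sequence, which is consistent with the stated 58.08 seconds.

```python
def duration_seconds(frame_count: int, frame_rate_hz: float = 25.0) -> float:
    """Duration of a sequence in seconds: number of frames divided by frame rate."""
    if frame_rate_hz <= 0:
        raise ValueError("frame rate must be positive")
    return frame_count / frame_rate_hz


# Frame counts taken from the PETS 2000 description.
training_duration = duration_seconds(3672)
test_duration = duration_seconds(1452)
print(f"training: {training_duration:.2f} s, test: {test_duration:.2f} s")
```

The same helper can be used to sanity-check the stated length of any other sequence once its frame count and frame rate are known.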
PETS 2000 Dataset:
-
PETS 2001
- Website: http://www.cvg.cs.rdg.ac.uk/PETS2001/pets2001-dataset.html
- Description of Dataset: Outdoor people and vehicle tracking (two synchronised views; includes omnidirectional and moving camera). PETS'2001 consists of five separate sets, each made up of one training sequence and one test sequence. All the datasets are multi-view (2 cameras) and are significantly more challenging than PETS'2000 in terms of lighting variation, occlusion, scene activity and the use of multi-view data.
- Description of Ground Truth/Metadata: Tracking information on image plane and ground plane can be found at: http://www.cvg.cs.rdg.ac.uk/PETS2001/ANNOTATION/
- Contextual info: Camera Calibration provided
- Results from metrics and ground truth: Centroid and bounding box coordinates on image plane, object class (person, vehicle, other), position on ground plane and object orientation.
- Comments:
- Information on Copyright: Free download from website
- Contact person from Cantata: Dimitrios Makris
PETS 2001 Dataset 1:
PETS 2001 Dataset 2:
PETS 2001 Dataset 3:
PETS 2001 Dataset 4:
PETS 2001 Dataset 5:
-
PETS 2002 - VISOR BASE: Moving People
- Website: http://www.cvg.cs.rdg.ac.uk/PETS2002/pets2002-db.html
- Description of Dataset: Indoor people tracking (and counting). Two training and four testing sequences show people moving in front of a shop window. Sequences are provided both in MPEG movie format and as individual JPEG images.
- Description of Ground Truth/Metadata: People tracking, counting and activity recognition.
- Contextual info: No calibration
- Results from metrics and ground truth: The number of people passing in front of the shop window; the number of people who stop and look into the window; the number of people looking into the window at each instant (frame); the trajectories of people passing in front of the store; and the processing time per frame, reported as a histogram of the microseconds spent processing each frame.
- Comments:
- Information on Copyright: Free download from website
- Contact person from Cantata: Dimitrios Makris
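The per-frame processing-time histogram requested for PETS 2002 could be built along the following lines. This is a sketch under assumptions: the function name `processing_time_histogram`, the 1000-microsecond bin width and the sample timings are all illustrative, since the workshop's own reporting format is not described here.

```python
from collections import Counter
from typing import Dict, Iterable


def processing_time_histogram(times_us: Iterable[int],
                              bin_width_us: int = 1000) -> Dict[int, int]:
    """Map the lower edge of each bin (in microseconds) to the number of frames in it."""
    if bin_width_us <= 0:
        raise ValueError("bin width must be positive")
    # Integer division assigns each timing to the bin that contains it.
    counts = Counter((t // bin_width_us) * bin_width_us for t in times_us)
    return dict(sorted(counts.items()))


# Made-up per-frame timings in microseconds, for illustration only.
sample_times_us = [850, 1200, 1950, 2100, 990, 1500]
histogram = processing_time_histogram(sample_times_us)
print(histogram)
```

Fixed-width bins keyed by their lower edge keep the result easy to print or plot; a different bin width can be chosen to suit the speed of the algorithm under test.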
PETS 2002 - VISOR BASE Dataset:
-
PETS-ICVS'2003 - FGnet
- Website: http://www.cvg.cs.rdg.ac.uk/PETS-ICVS/pets-icvs-db.html
- Description of Dataset: Smart meeting data that includes facial expressions, gaze and gesture/action. The environment consists of three cameras: one mounted on each of two opposing walls, and an omnidirectional camera positioned at the centre of the room. The dataset consists of four scenarios.
- Description of Ground Truth/Metadata: a) Eye positions of people in Scenarios A, B and D (every 10th frame is annotated). b) Facial expression and gaze estimation for Scenarios A and D, Cameras 1-2. c) Gesture/action annotations for Scenarios B and D, Cameras 1-2.
- Contextual info: Camera Calibration provided.
- Results from metrics and ground truth: For each frame, the requirement is to perform: face localisation (centre location of eyes), recognition of facial expression, recognition of face/hand gesture, estimation of face/head direction (gaze), recognition of actions.
- Comments:
- Information on Copyright: Free download
- Contact person from Cantata: Dimitrios Makris
PETS-ICVS'2003 - FGnet Dataset:
-
VS-PETS'2003 - INMOVE
- Website: http://www.cvg.cs.rdg.ac.uk/VSPETS/vspets-db.html
- Description of Dataset: Outdoor people tracking - football data (three synchronised views). The dataset consists of football players moving around a pitch.
- Description of Ground Truth/Metadata: Tracking information on image plane for camera 3 can be found at: http://www.cvg.cs.rdg.ac.uk/VSPETS/Camera3Xml.zip. An AVI file of the ground truth for camera view 3 is also available at http://www.cvg.cs.rdg.ac.uk/VSPETS/Cam3_Gt.avi
- Contextual info:
- Results from metrics and ground truth: The location of each player on the pitch, for each frame of the sequence. For each player, the bounding box (with origin bottom left) in pixels should be determined. The position of the player is defined as the middle bottom of the bounding box (in pixels).
- Comments:
- Information on Copyright: Free download from website
- Contact person from Cantata: Dimitrios Makris
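The player position defined for VS-PETS'2003 (middle bottom of the bounding box, in pixels) can be computed as sketched below. The names `BoundingBox` and `player_position` are hypothetical, and the sketch assumes the coordinate origin is at the bottom left of the image with y increasing upwards, so the bottom edge of a box is its smallest y value.

```python
from typing import NamedTuple, Tuple


class BoundingBox(NamedTuple):
    """Axis-aligned box in pixels, origin at the bottom left, y increasing upwards."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def player_position(box: BoundingBox) -> Tuple[float, float]:
    """Return the middle of the bottom edge of the bounding box."""
    x_centre = (box.x_min + box.x_max) / 2.0
    return (x_centre, box.y_min)


# Illustrative box, not taken from the dataset.
position = player_position(BoundingBox(x_min=100, y_min=40, x_max=120, y_max=90))
print(position)
```

If a tracker reports boxes with a top-left origin instead, the y coordinates need to be flipped before this helper is applied.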
VS-PETS'2003 - INMOVE Camera Setting:
VS-PETS'2003 - INMOVE Dataset Views: