Guide to finetune on custom dataset #251

Open
@vishalk2999

Description

I have created a dataset in the following format:

- Dataset_folder
  - videos
    - video1.mp4
    - video2.mp4
  - train.json

train.json is in the following format:

[
  {
    "video": "videos/calling.mp4",
    "QA": [
      {
        "i": "Go through the video and understand the all the actions performed in the video",
        "q": "Describe the video",
        "a": "The person is making phone call and talking on the phone"
      }
    ]
  }
]
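Before pointing the training config at this file, it may help to confirm that the annotations parse and that every referenced video exists. The following is a minimal sketch, not part of the Ask-Anything code base: `check_annotations` is a hypothetical helper, and it assumes that each entry needs a "video" path relative to the dataset folder and a non-empty "QA" list whose items carry at least "q" and "a".

```python
import json
from pathlib import Path


def check_annotations(dataset_root, ann_name="train.json"):
    """Return a list of problems found in the annotation file (empty if none).

    Hypothetical helper: the required keys are an assumption based on the
    format shown above, not on the loader's actual requirements.
    """
    root = Path(dataset_root)
    problems = []
    with open(root / ann_name, "r", encoding="utf-8") as f:
        # json.load raises json.JSONDecodeError on invalid JSON,
        # for example a trailing comma after the last entry.
        entries = json.load(f)

    for idx, entry in enumerate(entries):
        video = entry.get("video")
        if not video:
            problems.append(f"entry {idx}: missing 'video'")
        elif not (root / video).is_file():
            problems.append(f"entry {idx}: video not found: {video}")

        qa_list = entry.get("QA") or []
        if not qa_list:
            problems.append(f"entry {idx}: empty or missing 'QA'")
        for turn_idx, turn in enumerate(qa_list):
            missing = sorted({"q", "a"} - set(turn))
            if missing:
                problems.append(f"entry {idx}, QA {turn_idx}: missing {missing}")
    return problems
```

Calling `check_annotations("/home/ubuntu/Custom_Data")` would return an empty list if the file is well formed under these assumptions, and otherwise one message per problem.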

How should I prepare a custom dataset, and what changes do I need to make to train on it for stage3 fine-tuning?

I set the train_file variable in config_7b_stage3.py to the path of this train.json, and I get the following error:

2024-12-07T07:52:41 | __main__: train_file: /home/ubuntu/Custom_Data/train.json
2024-12-07T07:52:41 | __main__: Creating dataset for it
2024-12-07T07:52:41 | dataset.it_dataset: Load json file
Traceback (most recent call last):
  File "/home/ubuntu/Ask-Anything/video_chat2/tasks/train_it.py", line 221, in <module>
    main(cfg)
  File "/home/ubuntu/Ask-Anything/video_chat2/tasks/train_it.py", line 138, in main
    train_loaders, train_media_types = setup_dataloaders(
  File "/home/ubuntu/Ask-Anything/video_chat2/tasks/train_it.py", line 105, in setup_dataloaders
    train_datasets = create_dataset(f"{mode}_train", config)
  File "/home/ubuntu/Ask-Anything/video_chat2/dataset/__init__.py", line 174, in create_dataset
    datasets.append(dataset_cls(**dataset_kwargs))
  File "/home/ubuntu/Ask-Anything/video_chat2/dataset/it_dataset.py", line 37, in __init__
    with open(self.label_file, 'r') as f:
IsADirectoryError: [Errno 21] Is a directory: '/'
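The path in the error, '/', is the first character of the configured train_file. One possible explanation, which the traceback alone does not confirm, is that the dataset class unpacks train_file as a sequence such as [annotation_path, data_root, media_type] rather than reading it as a single string. The sketch below only reproduces that symptom in plain Python: `split_ann_file` is a hypothetical stand-in, not the code in dataset/it_dataset.py, and the list layout and the "video" entry are assumptions.

```python
train_file = "/home/ubuntu/Custom_Data/train.json"


def split_ann_file(ann_file):
    """Hypothetical stand-in for a loader that expects a sequence whose
    first two items are the annotation path and the data root."""
    label_file, data_root = ann_file[:2]
    return label_file, data_root


# Passing a bare string: the slice takes its first two characters.
print(split_ann_file(train_file))  # ('/', 'h')

# Passing a list (assumed layout): the path survives intact.
print(split_ann_file([train_file, "/home/ubuntu/Custom_Data", "video"]))
```

If this is the cause, wrapping the path in a list of that shape in config_7b_stage3.py would be the thing to try; the exact layout should be checked against the existing entries in the repository's configs.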

Could you please help me understand the steps and changes required to train on a custom dataset?
