Google Cloud Discovery Engine V1 Client - Class GcsTrainingInput (1.7.0)

Reference documentation and code samples for the Google Cloud Discovery Engine V1 Client class GcsTrainingInput.

Cloud Storage training data input.

Generated from protobuf message google.cloud.discoveryengine.v1.TrainCustomModelRequest.GcsTrainingInput

Namespace

Google \ Cloud \ DiscoveryEngine \ V1 \ TrainCustomModelRequest

Methods

__construct

Constructor.

Parameters
Name	Description
`data`	`array` Optional. Data for populating the Message object.
`↳ corpus_data_path`	`string` The Cloud Storage corpus data that can be associated with the training data. The data path format is `gs://<bucket_to_data>/<jsonl_file_name>`. The file is a newline-delimited JSONL/NDJSON file. For a search-tuning model, each line should have the `_id`, `title`, and `text` fields. Example: `{"_id": "doc1", "title": "relevant doc", "text": "relevant text"}`
`↳ query_data_path`	`string` The Cloud Storage query data that can be associated with the training data. The data path format is `gs://<bucket_to_data>/<jsonl_file_name>`. The file is a newline-delimited JSONL/NDJSON file. For a search-tuning model, each line should have the `_id` and `text` fields. Example: `{"_id": "query1", "text": "example query"}`
`↳ train_data_path`	`string` Cloud Storage training data path whose format should be `gs://<bucket_to_data>/<tsv_file_name>`. The file should be in TSV format. Each line should have the doc_id, query_id, and score (number). For a search-tuning model, the file should have `query-id`, `corpus-id`, and `score` as the TSV header. The score should be a number in `[0, +inf)`. The larger the number, the more relevant the pair. Example header line: `query-id\tcorpus-id\tscore`. Example data line: `query1\tdoc1\t1`.
`↳ test_data_path`	`string` Cloud Storage test data. Same format as train_data_path. If not provided, a random 80/20 train/test split will be performed on train_data_path.
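
As a minimal sketch of how the constructor's `data` array might be assembled, the snippet below builds the four keys listed above and checks that each path starts with `gs://`. The helper `buildGcsTrainingData` and the bucket and file names are hypothetical, and the object is only instantiated when the client library has been autoloaded (for example through Composer).

```php
<?php
declare(strict_types=1);

use Google\Cloud\DiscoveryEngine\V1\TrainCustomModelRequest\GcsTrainingInput;

/**
 * Hypothetical helper (not part of the client library): builds the `data`
 * array for the constructor and rejects paths that are not gs:// URIs.
 */
function buildGcsTrainingData(
    string $corpus,
    string $query,
    string $train,
    ?string $test = null
): array {
    $data = [
        'corpus_data_path' => $corpus,
        'query_data_path'  => $query,
        'train_data_path'  => $train,
    ];
    // test_data_path is optional; omit the key when no test file is given.
    if ($test !== null) {
        $data['test_data_path'] = $test;
    }
    foreach ($data as $key => $path) {
        if (strncmp($path, 'gs://', 5) !== 0) {
            throw new InvalidArgumentException("$key must be a gs:// URI, got: $path");
        }
    }
    return $data;
}

// Bucket and file names are placeholders.
$data = buildGcsTrainingData(
    'gs://example-bucket/corpus.jsonl',
    'gs://example-bucket/queries.jsonl',
    'gs://example-bucket/train.tsv'
);

// Instantiate only when the client library is available via an autoloader.
if (class_exists(GcsTrainingInput::class)) {
    $input = new GcsTrainingInput($data);
    echo $input->getTrainDataPath(), PHP_EOL;
}
```

The path check is a local convenience only; it does not confirm that the objects exist in Cloud Storage.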

getCorpusDataPath

The Cloud Storage corpus data that can be associated with the training data.

The data path format is `gs://<bucket_to_data>/<jsonl_file_name>`. The file is a newline-delimited JSONL/NDJSON file. For a search-tuning model, each line should have the `_id`, `title`, and `text` fields. Example: `{"_id": "doc1", "title": "relevant doc", "text": "relevant text"}`

Returns
Type	Description
`string`
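
To illustrate the corpus file format described above, here is a small sketch that serializes documents into newline-delimited JSON using only PHP's standard library. The function name `encodeJsonl` and the sample documents are illustrative assumptions; uploading the result to Cloud Storage is out of scope here.

```php
<?php
declare(strict_types=1);

/**
 * Encodes records as newline-delimited JSON, one object per line.
 * Hypothetical helper, not part of the client library.
 */
function encodeJsonl(array $records): string
{
    $lines = [];
    foreach ($records as $record) {
        // JSON_THROW_ON_ERROR (PHP 7.3+) turns encoding failures into exceptions.
        $lines[] = json_encode($record, JSON_THROW_ON_ERROR | JSON_UNESCAPED_UNICODE);
    }
    return implode("\n", $lines) . "\n";
}

// Sample corpus: each document carries _id, title, and text.
$corpus = [
    ['_id' => 'doc1', 'title' => 'relevant doc', 'text' => 'relevant text'],
    ['_id' => 'doc2', 'title' => 'another doc', 'text' => 'more text'],
];

$jsonl = encodeJsonl($corpus);
echo $jsonl;
```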

setCorpusDataPath

The Cloud Storage corpus data that can be associated with the training data.

Parameter
Name	Description
`var`	`string`

Returns
Type	Description
`$this`

getQueryDataPath

The Cloud Storage query data that can be associated with the training data.

The data path format is `gs://<bucket_to_data>/<jsonl_file_name>`. The file is a newline-delimited JSONL/NDJSON file. For a search-tuning model, each line should have the `_id` and `text` fields. Example: `{"_id": "query1", "text": "example query"}`

Returns
Type	Description
`string`
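
Before uploading a query file, it can help to check locally that every line decodes as JSON and carries the `_id` and `text` fields. The following is a sketch of such a check; `validateJsonl` is a hypothetical helper, and the service may apply validation rules beyond what is shown.

```php
<?php
declare(strict_types=1);

/**
 * Returns a list of problems found in a JSONL string; an empty list means
 * every non-blank line decoded and contained the required keys.
 * Hypothetical helper, not part of the client library.
 */
function validateJsonl(string $jsonl, array $requiredKeys = ['_id', 'text']): array
{
    $problems = [];
    foreach (explode("\n", $jsonl) as $index => $line) {
        if (trim($line) === '') {
            continue; // skip blank lines such as the one after a trailing newline
        }
        $lineNo = $index + 1;
        $record = json_decode($line, true);
        // Reject invalid JSON (null) and scalar values.
        if (!is_array($record)) {
            $problems[] = "line $lineNo: could not be decoded as a JSON object";
            continue;
        }
        foreach ($requiredKeys as $key) {
            if (!array_key_exists($key, $record)) {
                $problems[] = "line $lineNo: missing \"$key\"";
            }
        }
    }
    return $problems;
}

// The second line deliberately omits "text" to show a reported problem.
$queries = '{"_id": "query1", "text": "example query"}' . "\n"
         . '{"_id": "query2"}' . "\n";

$problems = validateJsonl($queries);
print_r($problems);
```

Passing `['_id', 'title', 'text']` as the second argument applies the same check to a corpus file.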

setQueryDataPath

The Cloud Storage query data that can be associated with the training data.

Parameter
Name	Description
`var`	`string`

Returns
Type	Description
`$this`

getTrainDataPath

Cloud Storage training data path whose format should be `gs://<bucket_to_data>/<tsv_file_name>`. The file should be in TSV format. Each line should have the doc_id, query_id, and score (number).

For a search-tuning model, the file should have `query-id`, `corpus-id`, and `score` as the TSV header. The score should be a number in `[0, +inf)`. The larger the number, the more relevant the pair. Example (`\t` denotes a tab character):

query-id\tcorpus-id\tscore
query1\tdoc1\t1

Returns
Type	Description
`string`
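
The training file layout described above can be produced with a few lines of standard PHP. This sketch writes the `query-id`, `corpus-id`, `score` header followed by one row per pair and rejects negative or non-numeric scores; `buildTrainTsv` and the sample rows are illustrative assumptions, not library functionality.

```php
<?php
declare(strict_types=1);

/**
 * Builds TSV training data with the query-id/corpus-id/score header.
 * Each row is [queryId, corpusId, score]. Hypothetical helper.
 */
function buildTrainTsv(array $rows): string
{
    $lines = [implode("\t", ['query-id', 'corpus-id', 'score'])];
    foreach ($rows as [$queryId, $corpusId, $score]) {
        // Scores must be numbers in [0, +inf).
        if (!is_numeric($score) || $score < 0) {
            throw new InvalidArgumentException("score must be a number >= 0, got: $score");
        }
        $lines[] = implode("\t", [$queryId, $corpusId, $score]);
    }
    return implode("\n", $lines) . "\n";
}

$tsv = buildTrainTsv([
    ['query1', 'doc1', 1],
    ['query1', 'doc2', 0],
]);
echo $tsv;
```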

setTrainDataPath