Module: tfds.deprecated.text

Text utilities.

tfds includes a set of TextEncoders as well as a Tokenizer to enable expressive, performant, and reproducible natural language research.

Classes

class ByteTextEncoder: Byte-encodes text.

class SubwordTextEncoder: Invertible TextEncoder using word pieces with a byte-level fallback.

class TextEncoder: Abstract base class for converting between text and integers.

class TextEncoderConfig: Configuration for tfds.features.Text.

class TokenTextEncoder: TextEncoder backed by a list of tokens.

class Tokenizer: Splits a string into tokens, and joins them back.
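
As a rough illustration of what "converting between text and integers" means for a byte-level encoder, here is a minimal pure-Python sketch. It does not use tfds and is not the library's implementation: the class names `SketchTextEncoder` and `SketchByteEncoder` are hypothetical, and shifting every byte by 1 so that id 0 can serve as padding is an assumption made for this example.

```python
from abc import ABC, abstractmethod
from typing import List


class SketchTextEncoder(ABC):
    """Hypothetical stand-in for an abstract text <-> integer converter."""

    @abstractmethod
    def encode(self, text: str) -> List[int]:
        """Turn a string into a list of integer ids."""

    @abstractmethod
    def decode(self, ids: List[int]) -> str:
        """Turn a list of integer ids back into a string."""


class SketchByteEncoder(SketchTextEncoder):
    """Maps each UTF-8 byte to an id, shifted by 1 (assumption: id 0 is padding)."""

    vocab_size = 257  # 256 possible byte values plus the padding id 0

    def encode(self, text: str) -> List[int]:
        # Each byte value b in 0..255 becomes the id b + 1.
        return [b + 1 for b in text.encode("utf-8")]

    def decode(self, ids: List[int]) -> str:
        # Drop padding ids, undo the shift, and reassemble the UTF-8 bytes.
        return bytes(i - 1 for i in ids if i != 0).decode("utf-8")


encoder = SketchByteEncoder()
ids = encoder.encode("héllo")
assert encoder.decode(ids) == "héllo"           # round trip is lossless
assert encoder.decode(ids + [0, 0]) == "héllo"  # trailing padding is ignored
```

Working on bytes keeps the vocabulary small and fixed, and any string can be encoded without an out-of-vocabulary case. The cost is longer id sequences for non-ASCII text, since one character may span several bytes.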

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License.

Last updated 2024-04-26 UTC.