Module: tfdv

Init module for TensorFlow Data Validation.

Classes

class CombinerStatsGenerator: A StatsGenerator which computes statistics using a combiner function.

class CrossFeatureView: View of a single cross feature.

class DatasetListView: View of statistics for multiple datasets (slices).

class DatasetView: View of statistics for a dataset (slice).

class DetectFeatureSkew: API for detecting feature skew between training and serving examples.

class FeaturePath: Represents the path to a feature in an input example.

class FeatureView: View of a single feature.

class GenerateStatistics: API for generating data statistics.

class MergeDatasetFeatureStatisticsList: API for merging sharded DatasetFeatureStatisticsList.

class StatsOptions: Options for generating statistics.

class TransformStatsGenerator: A StatsGenerator which wraps an arbitrary Beam PTransform.

class WriteStatisticsToBinaryFile: API for writing serialized data statistics to a binary file.

class WriteStatisticsToRecordsAndBinaryFile: API for writing statistics to both sharded records and a binary proto file.

class WriteStatisticsToTFRecord: API for writing serialized data statistics to a TFRecord file.

Functions

compare_slices(...): Compare statistics of two slices using Facets.

default_sharded_output_suffix(...): Returns the default sharded output suffix.

default_sharded_output_supported(...): True if sharded output is supported by default.

display_anomalies(...): Displays the input anomalies (for use in a Jupyter notebook).

display_schema(...): Displays the input schema (for use in a Jupyter notebook).

experimental_get_feature_value_slicer(...): Returns a function that generates sliced record batches for a given record batch.

generate_dummy_schema_with_paths(...): Generate a schema with the requested paths and no other information.

generate_statistics_from_csv(...): Compute data statistics from CSV files.

generate_statistics_from_dataframe(...): Compute data statistics for the input pandas DataFrame.

generate_statistics_from_tfrecord(...): Compute data statistics from TFRecord files containing TFExamples.

get_confusion_count_dataframes(...): Returns a pandas DataFrame representation of a sequence of ConfusionCount.

get_domain(...): Get the domain associated with the input feature from the schema.

get_feature(...): Get a feature from the schema.

get_feature_stats(...): Get feature statistics from the dataset statistics.

get_match_stats_dataframe(...): Formats MatchStats as a pandas DataFrame.

get_skew_result_dataframe(...): Formats FeatureSkew results as a pandas DataFrame.

get_slice_stats(...): Get statistics associated with a specific slice.

get_statistics_html(...): Build the HTML for visualizing the input statistics using Facets.

infer_schema(...): Infers schema from the input statistics.

load_anomalies_text(...): Loads the Anomalies proto stored in text format in the input path.

load_schema_text(...): Loads the schema stored in text format in the input path.

load_sharded_statistics(...): Read a sharded DatasetFeatureStatisticsList from disk as a DatasetListView.

load_statistics(...): Loads data statistics proto from file.

load_stats_binary(...): Loads a serialized DatasetFeatureStatisticsList proto from a file.

load_stats_text(...): Loads the specified DatasetFeatureStatisticsList proto stored in text format.

set_domain(...): Sets the domain for the input feature in the schema.

update_schema(...): Updates input schema to conform to the input statistics.

validate_corresponding_slices(...): Validates corresponding sliced statistics.

validate_examples_in_csv(...): Validates examples in CSV files.

validate_examples_in_tfrecord(...): Validates TFExamples in TFRecord files.

validate_statistics(...): Validates the input statistics against the provided input schema.

visualize_statistics(...): Visualize the input statistics using Facets.

write_anomalies_text(...): Writes the Anomalies proto to a file in text format.

write_schema_text(...): Writes input schema to a file in text format.

write_stats_text(...): Writes a DatasetFeatureStatisticsList proto to a file in text format.
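
Several of the functions listed above are commonly chained: compute statistics, infer a schema from them, then validate new statistics against that schema. The following is a minimal sketch of that flow, not a definitive recipe. The helper name find_anomalies and its two DataFrame parameters are hypothetical; only the tfdv calls come from the listing above, and the sketch assumes the returned Anomalies proto exposes an anomaly_info map whose values carry a short_description field.

```python
def find_anomalies(train_df, eval_df):
    """Return {feature_name: short_description} for anomalies found in eval_df.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Imported lazily because TFDV pulls in TensorFlow and Apache Beam,
    # which are slow to load.
    import tensorflow_data_validation as tfdv

    # Compute statistics over the training data and infer a schema from them.
    train_stats = tfdv.generate_statistics_from_dataframe(train_df)
    schema = tfdv.infer_schema(train_stats)

    # Compute statistics over the evaluation data and check them
    # against the schema inferred from the training data.
    eval_stats = tfdv.generate_statistics_from_dataframe(eval_df)
    anomalies = tfdv.validate_statistics(eval_stats, schema)

    # Assumption: anomaly_info maps feature names to AnomalyInfo messages.
    return {
        name: info.short_description
        for name, info in anomalies.anomaly_info.items()
    }
```

Calling find_anomalies with two pandas DataFrames should yield an empty dict when the evaluation data conforms to the inferred schema. In a notebook, display_schema and display_anomalies can render the intermediate schema and anomalies instead of flattening them into a dict.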

Other Members

__version__ = '1.16.1'

Last updated 2024-10-18 UTC.