Text_LanguageDetect

Class: Text_LanguageDetect_Parser

Source Location: /Text_LanguageDetect-1.0.0/Text/LanguageDetect/Parser.php


Inherited Variables

Class: Text_LanguageDetect

Text_LanguageDetect::$_clusters
Text_LanguageDetect::$_data_dir
Text_LanguageDetect::$_db_filename
Text_LanguageDetect::$_lang_db
Text_LanguageDetect::$_max_score
Text_LanguageDetect::$_name_mode
Text_LanguageDetect::$_perl_compatible
Text_LanguageDetect::$_threshold
Text_LanguageDetect::$_unicode_db_filename
Text_LanguageDetect::$_unicode_map
Text_LanguageDetect::$_use_unicode_narrowing

Inherited Methods

Class: Text_LanguageDetect

Text_LanguageDetect::__construct()
Constructor
Text_LanguageDetect::clusteredSearch()
Perform an intelligent detection based on clusterLanguages()
Text_LanguageDetect::clusterLanguages()
Cluster known languages according to languageSimilarity()
Text_LanguageDetect::detect()
Detects the closeness of a sample of text to the known languages
Text_LanguageDetect::detectConfidence()
Returns an array containing the most similar language and a confidence rating
Text_LanguageDetect::detectSimple()
Returns only the most similar language to the text sample
Text_LanguageDetect::detectUnicodeBlocks()
Returns the distribution of unicode blocks in a given utf8 string
Text_LanguageDetect::getLanguageCount()
Returns the number of languages that this object can detect
Text_LanguageDetect::getLanguages()
Returns the list of detectable languages
Text_LanguageDetect::languageExists()
Checks if the language with the given name exists in the database
Text_LanguageDetect::languageSimilarity()
Calculate the similarities between the language models
Text_LanguageDetect::omitLanguages()
Omits languages
Text_LanguageDetect::setNameMode()
Sets how language names are accepted and returned.
Text_LanguageDetect::setPerlCompatible()
Make this object behave like Language::Guess
Text_LanguageDetect::unicodeBlockName()
Returns the block name for a given unicode value
Text_LanguageDetect::useUnicodeBlocks()
Whether to use unicode block ranges in detection
Text_LanguageDetect::utf8strlen()
UTF8-safe strlen()
Text_LanguageDetect::_arr_rank()
Converts a set of trigrams from frequencies to ranks
Text_LanguageDetect::_bub_sort()
Sorts an array by value breaking ties alphabetically
Text_LanguageDetect::_checkTrigram()
Checks if this object is ready to detect languages
Text_LanguageDetect::_convertFromNameMode()
Converts a $language input parameter from the configured mode to the language name that is used internally.
Text_LanguageDetect::_convertToNameMode()
Converts a $language output parameter from the language name that is used internally to the configured mode.
Text_LanguageDetect::_distance()
Calculates a linear rank-order distance statistic between two sets of ranked trigrams
Text_LanguageDetect::_get_data_loc()
Returns the path to the location of the database
Text_LanguageDetect::_next_char()
UTF8-safe fast character iterator
Text_LanguageDetect::_normalize_score()
Normalizes the score returned by _distance()
Text_LanguageDetect::_readdb()
Loads the language trigram database from filename
Text_LanguageDetect::_read_unicode_block_db()
Brings up the unicode block database
Text_LanguageDetect::_sort_func()
Sort function used by bubble sort
Text_LanguageDetect::_trigram()
Converts a piece of text into trigrams
Text_LanguageDetect::_unicode_block_name()
Searches the unicode block database
Text_LanguageDetect::_utf8char2unicode()
Returns the unicode value of a utf8 char

Class Details

[line 33]
This class represents a text sample to be parsed.

This separates the analysis of a text sample from the primary LanguageDetect class. After a new profile has been built, the data can be retrieved using the accessor functions.

This class is intended to be used by the Text_LanguageDetect class, not end-users.





Class Variables

$_compile_trigram = false

[line 75]

Whether the parser should compile trigrams
  • Access: protected

Type: bool



$_compile_unicode = false

[line 68]

Whether the parser should compile the unicode ranges
  • Access: protected

Type: bool



$_string =

[line 40]

The piece of text being parsed
  • Access: protected

Type: string



$_trigrams = array()

[line 47]

Stores the trigram frequencies of the sample
  • Access: protected

Type: array



$_trigram_pad_start = false

[line 82]

Whether the trigram parser should pad the beginning of the string
  • Access: protected

Type: bool



$_trigram_ranks = array()

[line 54]

Stores the trigram ranks of the sample
  • Access: protected

Type: array



$_unicode_blocks = array()

[line 61]

Stores the unicode blocks of the sample
  • Access: protected

Type: array



$_unicode_skip_symbols = true

[line 89]

Whether the Unicode parser should skip non-alphabetical ASCII characters
  • Access: protected

Type: bool





Method Detail

Text_LanguageDetect_Parser (Constructor) [line 108]

void Text_LanguageDetect_Parser( string $string)

PHP 4 constructor for backwards compatibility.
  • Access: public

Parameters:

string $string — string to be parsed


__construct (Constructor) [line 96]

Text_LanguageDetect_Parser __construct( string $string)

Constructor
  • Access: public

Overrides Text_LanguageDetect::__construct() (Constructor)

Parameters:

string $string — string to be parsed


analyze [line 220]

void analyze( )

Executes the parsing operation

Before calling this method, call the set*() functions to set options and the prepare*() functions to select which kinds of data to compute.

Afterwards the get*() functions can be used to access the compiled information.

  • Access: public
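
The call order described above can be sketched as follows. This is a minimal illustration, not a recommended usage pattern: the class is meant to be driven by Text_LanguageDetect, so code like this is mainly useful for testing. It assumes the PEAR package is installed so that Text/LanguageDetect.php is on the include path and loads the parser class; the include path and the sample text are assumptions, not taken from this page.

```php
<?php
// Assumption: the PEAR package is installed and on the include path,
// and that this file also loads Text_LanguageDetect_Parser.
require_once 'Text/LanguageDetect.php';

// Sample text is arbitrary and only for illustration.
$parser = new Text_LanguageDetect_Parser('The quick brown fox jumps over the lazy dog');

// 1. Set options and select the data to compute.
$parser->setPadStart(true);   // pad the beginning of the sample string
$parser->prepareTrigram();    // turn on trigram counting ($bool defaults to true)

// 2. Execute the parsing operation.
$parser->analyze();

// 3. Read back the compiled information.
$ranks = $parser->getTrigramRanks();  // trigram ranks in the text sample
$freqs = $parser->getTrigramFreqs();  // trigram frequencies in the text sample
```

Unicode block counting follows the same pattern: prepareUnicode() (optionally with setUnicodeSkipSymbols()) before analyze(), then getUnicodeBlocks() afterwards.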


getTrigramFreqs [line 194]

array getTrigramFreqs( )

Returns the trigram frequency table

Only used in testing to make sure the parser is working

  • Return: Trigram frequencies in the text sample
  • Access: public


getTrigramRanks [line 182]

array getTrigramRanks( )

Returns the trigram ranks for the text sample
  • Return: Trigram ranks in the text sample
  • Access: public
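
To illustrate the difference between a trigram frequency table and a trigram rank table, here is a self-contained sketch. It is not the library's implementation: the function names sketchTrigramFreqs() and sketchTrigramRanks() are hypothetical, the input is assumed to be plain ASCII with no padding, and numbering ranks from 0 is an assumption. The tie-breaking rule (alphabetical order for equal frequencies) follows the description of the inherited _bub_sort() method.

```php
<?php
// Hypothetical helpers for illustration only; not part of Text_LanguageDetect.

// Count every run of three consecutive bytes in the text.
// Assumption: ASCII input, so byte offsets equal character offsets.
function sketchTrigramFreqs(string $text): array
{
    $text  = strtolower($text);
    $freqs = [];
    $len   = strlen($text);
    for ($i = 0; $i + 3 <= $len; $i++) {
        $tri = substr($text, $i, 3);
        $freqs[$tri] = ($freqs[$tri] ?? 0) + 1;
    }
    return $freqs;
}

// Convert frequencies to ranks: the most frequent trigram gets rank 0
// (assumed numbering), and equal frequencies are ordered alphabetically.
function sketchTrigramRanks(array $freqs): array
{
    // Cast keys back to strings, since PHP stores numeric keys as integers.
    $keys = array_map('strval', array_keys($freqs));
    usort($keys, function ($a, $b) use ($freqs) {
        if ($freqs[$a] !== $freqs[$b]) {
            return $freqs[$b] <=> $freqs[$a]; // higher frequency first
        }
        return strcmp($a, $b);                // alphabetical tie-break
    });
    return array_flip($keys);                 // trigram => rank
}

$freqs = sketchTrigramFreqs('banana');
// $freqs is ['ban' => 1, 'ana' => 2, 'nan' => 1]
$ranks = sketchTrigramRanks($freqs);
// $ranks is ['ana' => 0, 'ban' => 1, 'nan' => 2]
```

A rank table discards the absolute counts and keeps only the ordering, which is what a rank-order distance such as the one described for the inherited _distance() method would compare.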


getUnicodeBlocks [line 204]

array getUnicodeBlocks( )

Returns the array of unicode blocks
  • Return: Unicode blocks in the text sample
  • Access: public


prepareTrigram [line 136]

void prepareTrigram( [bool $bool = true])

Turn on/off trigram counting
  • Access: public

Parameters:

bool $bool — true for on, false for off


prepareUnicode [line 148]

void prepareUnicode( [bool $bool = true])

Turn on/off unicode block counting
  • Access: public

Parameters:

bool $bool — true for on, false for off


setPadStart [line 160]

void setPadStart( [bool $bool = true])

Turn on/off padding the beginning of the sample string
  • Access: public

Parameters:

bool $bool — true for on, false for off


setUnicodeSkipSymbols [line 172]

void setUnicodeSkipSymbols( [bool $bool = true])

Sets whether the Unicode block counter should skip non-alphabetical ASCII characters
  • Access: public

Parameters:

bool $bool — true for on, false for off


validateString [line 120]

bool validateString( string $str)

Returns true if a string is suitable for parsing
  • Return: true if acceptable, false if not
  • Access: public

Parameters:

string $str — input string to test



Documentation generated on March 11, 2019, 14:34:13 -0400 by phpDocumentor 1.4.4. PEAR Logo Copyright © PHP Group 2004.
