The IntlCodePointBreakIterator class

(PHP 5 >= 5.5.0, PHP 7, PHP 8)

Introduction

This break iterator identifies the boundaries between UTF-8 code points.
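As a minimal sketch of what "boundaries between code points" means in practice, the snippet below walks a short string and collects each boundary. The sample text is an assumption chosen for illustration; it mixes 1-byte, 2-byte, and 4-byte UTF-8 sequences so that the boundaries, which are reported as byte offsets into the UTF-8 string, are easy to tell apart.

```php
<?php
// Requires the intl extension.
// Sample text (an assumption for illustration): "a", U+00E9, U+1F601.
$text = "a\u{E9}\u{1F601}";

$it = IntlBreakIterator::createCodePointInstance();
$it->setText($text);

// Walk every boundary from the start until next() reports DONE.
$offsets = [];
for ($pos = $it->first(); $pos !== IntlBreakIterator::DONE; $pos = $it->next()) {
    $offsets[] = $pos;
}

echo implode(', ', $offsets), "\n"; // 0, 1, 3, 7
```

The offsets are not character counts: the 4-byte emoji moves the final boundary from 3 to 7.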

Class synopsis

class IntlCodePointBreakIterator extends IntlBreakIterator {
    /* Inherited constants */
    /* Methods */
    public getLastCodePoint(): int
    /* Inherited methods */
    public IntlBreakIterator::getPartsIterator(string $type = IntlPartsIterator::KEY_SEQUENTIAL): IntlPartsIterator
    public IntlBreakIterator::next(?int $offset = null): int
}
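The one method this class adds, getLastCodePoint(), can be sketched as follows: after each call to next(), it reports the code point the iterator just moved past. The sample string and the `$seen` variable are assumptions for illustration only.

```php
<?php
// Requires the intl extension.
$it = IntlBreakIterator::createCodePointInstance();
$it->setText("a\u{E9}\u{1F601}"); // sample text, assumed for illustration

$seen = [];
$it->first();
while ($it->next() !== IntlBreakIterator::DONE) {
    // The code point that the last next() call advanced over.
    $cp = $it->getLastCodePoint();
    $seen[] = sprintf('U+%04X %s', $cp, IntlChar::charName($cp));
}

echo implode("\n", $seen), "\n";
```

This avoids slicing the string and decoding each part separately when only the numeric code point is needed.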

User Contributed Notes 1 note

Matt Kynx
2 years ago
An example of using this to find all the code points in a string that cannot be transliterated to Latin-ASCII:

<?php

$string = "Народm, Intl gurus get paid €10000/hr 😁";

$latinAscii = Transliterator::create('NFC; Any-Latin; Latin-ASCII;');
$transliterated = $latinAscii->transliterate($string);

$codePoints = IntlBreakIterator::createCodePointInstance();
$codePoints->setText($transliterated);

foreach ($codePoints->getPartsIterator() as $char) {
    $ord = IntlChar::ord($char);
    if (255 < $ord) {
        echo IntlChar::charName($ord) . "\n";
    }
}
?>

Outputs:
EURO SIGN
GRINNING FACE WITH SMILING EYES
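
A variant of the loop above, offered as a sketch rather than part of the original note: passing IntlPartsIterator::KEY_LEFT to getPartsIterator() keys each part by the offset of its left boundary, so the position of each reported code point is available too. The helper name `nonAsciiCodePoints`, the sample input, and the threshold of 127 (strict ASCII, where the note uses 255) are assumptions for illustration.

```php
<?php
// Requires the intl extension.
// Hypothetical helper, not part of intl: maps the byte offset of each
// non-ASCII code point in $text to its Unicode character name.
function nonAsciiCodePoints(string $text): array
{
    $it = IntlBreakIterator::createCodePointInstance();
    $it->setText($text);

    $found = [];
    // KEY_LEFT keys each part by the offset of its left boundary.
    foreach ($it->getPartsIterator(IntlPartsIterator::KEY_LEFT) as $offset => $char) {
        $ord = IntlChar::ord($char);
        if ($ord > 127) {
            $found[$offset] = IntlChar::charName($ord);
        }
    }
    return $found;
}

// Sample input (assumed): "a", EURO SIGN, "b", U+1F601.
print_r(nonAsciiCodePoints("a\u{20AC}b\u{1F601}"));
```

With the default KEY_SEQUENTIAL the keys would simply count parts from zero, which is enough when only the characters themselves matter.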