(PHP 4 >= 4.0.6, PHP 5, PHP 7, PHP 8)
mb_strcut — Get part of string
mb_strcut() extracts a substring from a string similarly to mb_substr(), but operates on bytes instead of characters. If the cut position falls between two bytes of a multi-byte character, the cut is performed starting from the first byte of that character. This is also how it differs from substr(), which would simply cut the string between the bytes and thus produce a malformed byte sequence.
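The contrast between the three functions can be sketched as follows. This is a minimal illustration, not part of the original page; the sample string and variable names are assumptions chosen for the demonstration.

```php
<?php
// "é" is two bytes in UTF-8 (0xC3 0xA9), so "aéb" is 4 bytes long.
$s = "a\u{00E9}b";

// substr() cuts at raw byte offsets and can split the character.
$raw = substr($s, 0, 2);                      // "a" plus the first byte of "é"
var_dump(mb_check_encoding($raw, 'UTF-8'));   // bool(false)

// mb_strcut() also takes byte offsets, but backs off to a character boundary.
$cut = mb_strcut($s, 0, 2, 'UTF-8');
var_dump(mb_check_encoding($cut, 'UTF-8'));   // bool(true), $cut is "a"

// mb_substr() counts characters instead of bytes.
$sub = mb_substr($s, 0, 2, 'UTF-8');          // "aé" (3 bytes)
```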
string
The string being cut.
start
If start is non-negative, the returned string will start at the start'th byte position in string, counting from zero. For instance, in the string 'abcdef', the byte at position 0 is 'a', the byte at position 2 is 'c', and so forth.
If start is negative, the returned string will start at the start'th byte counting back from the end of string. However, if the magnitude of a negative start is greater than the length of the string, the returned portion will start from the beginning of string.
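A short sketch of the start rules described above. The multi-byte sample string and the variable names are assumptions made for illustration only.

```php
<?php
// Non-negative and negative byte offsets on a single-byte string.
$a = mb_strcut('abcdef', 2, null, 'UTF-8');    // "cdef"
$b = mb_strcut('abcdef', -2, null, 'UTF-8');   // "ef"

// A negative start whose magnitude exceeds the length starts at the beginning.
$c = mb_strcut('abcdef', -10, null, 'UTF-8');  // "abcdef"

// "Déjà_vu" is 9 bytes in UTF-8; byte offset -4 lands on the second byte
// of "à", so the cut starts from the first byte of that character.
$d = mb_strcut("D\u{00E9}j\u{00E0}_vu", -4, null, 'UTF-8');  // "à_vu"
```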
length
Length in bytes. If omitted or null is passed, extract all bytes to the end of the string.
If length is negative, the returned string will end at the length'th byte counting back from the end of string. However, if the magnitude of a negative length is greater than the number of bytes after the start position, an empty string will be returned.
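The length rules can be sketched in the same way. The strings and variable names below are illustrative assumptions, not taken from the original page.

```php
<?php
// A negative length stops that many bytes before the end of the string.
$a = mb_strcut('abcdef', 1, -2, 'UTF-8');   // "bcd"

// A negative length larger than what remains after start gives "".
$b = mb_strcut('abcdef', 4, -5, 'UTF-8');   // ""

// "Déjà_vu" is 9 bytes in UTF-8. A length of -4 would end in the middle
// of "à", so that character is dropped instead of being cut in half.
$c = mb_strcut("D\u{00E9}j\u{00E0}_vu", 0, -4, 'UTF-8');   // "Déj"
```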
encoding
The encoding parameter is the character encoding. If it is omitted or null, the internal character encoding value will be used.
mb_strcut() returns the portion of string specified by the start and length parameters.
Version | Description |
---|---|
8.0.0 | encoding is nullable now. |
Here is an example with UTF-8 characters that shows how the start and length arguments work:
<?php
// utf8_encode() assumes the script file is saved as ISO-8859-1;
// note that the function is deprecated as of PHP 8.2.
$str_utf8 = utf8_encode("Déjà_vu");
$str_utf8_0 = mb_strcut($str_utf8, 0, 4, "UTF-8"); // Déj
$str_utf8_1 = mb_strcut($str_utf8, 1, 4, "UTF-8"); // éj
$str_utf8_2 = mb_strcut($str_utf8, 2, 4, "UTF-8"); // éj
$str_utf8_3 = mb_strcut($str_utf8, 3, 4, "UTF-8"); // jà_
$str_utf8_4 = mb_strcut($str_utf8, 4, 4, "UTF-8"); // à_v
?>
The string includes two special characters, "é" and "à", each encoded as two bytes.
Note that a multibyte character is removed rather than kept in half at the end of the output.
Note also that the result is the same for a cut 1,4 and a cut 2,4 with this string.
What the manual and the first commenter are trying to say is that mb_strcut uses byte offsets, as opposed to mb_substr which uses character offsets.
Both mb_strcut and mb_substr appear to treat negative and out-of-range offsets and lengths in basically the same way as substr. An exception is that if start is too large, an empty string will be returned rather than FALSE. Testing indicates that mb_strcut first works out start and end byte offsets, then moves each offset left to the nearest character boundary.
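One way to observe this boundary adjustment is to cut at every byte offset and print the raw bytes of each result. This is only a sketch: the sample string, the fixed length of 2 and the variable names are assumptions, and the per-offset output is left for the reader to inspect rather than stated here.

```php
<?php
// "a", "é" and "€" take 1, 2 and 3 bytes in UTF-8 (6 bytes in total).
$s = "a\u{00E9}\u{20AC}";

$allValid = true;
for ($start = 0; $start <= strlen($s); $start++) {
    $piece = mb_strcut($s, $start, 2, 'UTF-8');
    // Hex output makes it visible where the cut actually began and ended.
    printf("start=%d -> %s\n", $start, bin2hex($piece));
    // Whatever the offsets, the result should never contain half a character.
    $allValid = $allValid && mb_check_encoding($piece, 'UTF-8');
}
var_dump($allValid);   // bool(true)
```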
This was driving me crazy, because mb_strcut() kept returning an empty string. The $length parameter seems to have a max value of 2^31-1 (2147483647).
Works:
<?php
# output: Полуустав
echo mb_strcut('Полуустав', 0, pow(2,31)-1);
?>
Doesn't work:
<?php
# nothing is output
echo mb_strcut('Полуустав', 0, pow(2,31));
?>
My PHP_INT_MAX value is much larger than 2^31-1, so I'm not sure why larger values for $length don't work. :(
<?php
# output: 9223372036854775807
echo PHP_INT_MAX;
?>
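A possible way to avoid depending on a very large $length is to pass null, which the parameter description above defines as "to the end of the string", or to use the byte length of the string as the upper bound. This is a sketch under the assumption that the script file is saved as UTF-8; the variable names are illustrative.

```php
<?php
// Each Cyrillic letter below is 2 bytes in UTF-8 (18 bytes in total).
$str = 'Полуустав';

// null (or an omitted $length) means "all bytes to the end of the string".
$whole = mb_strcut($str, 0, null, 'UTF-8');
echo $whole, "\n";   // Полуустав

// If an explicit bound is wanted, the byte length is always large enough.
$tail = mb_strcut($str, 2, strlen($str), 'UTF-8');
echo $tail, "\n";    // олуустав
```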
Difference between mb_strcut and mb_substr
Example:
mb_strcut('I_ROHA', 1, 2) returns 'I_'. Treated as byte stream.
mb_substr('I_ROHA', 1, 2) returns 'ROHA' Treated as character stream.
# Here 'I_', 'RO' and 'HA' each stand for one multi-byte character.
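The same comparison with real multi-byte characters might look like this. It is a sketch that assumes the script file is saved as UTF-8, where each of the three characters occupies 3 bytes; the string and variable names are chosen for illustration.

```php
<?php
$s = '日本語';   // 3 characters, 9 bytes in UTF-8

// Byte stream: offset 1 is inside the first character, so the cut starts
// from byte 0, and 3 bytes cover exactly one character.
$bytes = mb_strcut($s, 1, 3, 'UTF-8');   // "日"

// Character stream: skip 1 character, take 2 characters.
$chars = mb_substr($s, 1, 2, 'UTF-8');   // "本語"
```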