String functions in Postgres do not seem to recognize the non-breaking space as whitespace, neither when trimming nor when using regular expressions:
select 'x' || test || 'x'
, 'x' || trim(test) || 'x'
, 'x' || regexp_replace(test, '\s+', '') || 'x'
from (values
(''),
(' '),
(' y '),
('s s s s')
) as foo(test)
(Not sure if the non-breaking ( ) survives in the above code, but the two last rows contain nbsp.)
Is it a Postgres thing not to handle this, or is it a bug? I know of chr(160)
for nbsp, but would prefer a general-purpose way to strip all whitespace.
Collation in use is en_US.utf8
2 Answers
Defining "whitespace" precisely is a tricky subject with Unicode.
trim() and friends by default only trim the basic Latin space character (Unicode: U+0020 / chr(32)).
In your regexp_replace() call, \s (shorthand for [[:space:]]) would already catch more whitespace characters, as defined in that character class:
- in ASCII: tab, line feed, form feed, carriage return, and space;
- in Unicode: also no-break spaces, next line, and the variable-width spaces (among others).
But the flag 'g' is missing as 4th parameter in your call. Without that, only the first match is replaced. So that was never going to work as expected.
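To illustrate the effect of the 'g' flag in isolation, here is a minimal sketch on a made-up literal that contains only ordinary spaces. It assumes standard_conforming_strings is on (the default in current versions), so the backslash in '\s' reaches the regex engine unchanged:

```sql
-- Without 'g': only the first run of whitespace is removed.
SELECT regexp_replace('a b c', '\s+', '');       -- ab c

-- With 'g': every run of whitespace is removed.
SELECT regexp_replace('a b c', '\s+', '', 'g');  -- abc
```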
On top of that, the non-breaking space (Unicode name/code point "NO-BREAK SPACE"/U+00A0, chr(160), E'\u00A0') is not defined as "whitespace" in that character class to begin with. You would have to add it manually. Basically:
SELECT regexp_replace(test, '[\s\u00A0]+', '', 'g');
Or add more characters to the bracket expression as needed.
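As a sketch, the test query from the question could be rewritten along these lines. The non-breaking spaces are spelled as E'\u00A0' escapes so they survive copy and paste; that notation assumes a UTF8 database, and the column aliases are made up for illustration:

```sql
SELECT 'x' || test || 'x'                                     AS raw
     , 'x' || regexp_replace(test, '[\s\u00A0]+', '', 'g') || 'x' AS stripped
FROM  (VALUES
         ('')                       -- empty string
       , (' ')                      -- plain space
       , (E'\u00A0y\u00A0')         -- nbsp on both sides
       , (E's\u00A0s s\u00A0s')     -- mix of nbsp and plain space
      ) AS foo(test);
```

With the bracket expression covering both \s and U+00A0, the stripped column should come out as xx, xx, xyx and xssssx.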
Yes, the non-breaking space (chr(160)) is not whitespace. But you can specify custom characters to be removed from the text for the TRIM() function. – Akina
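A minimal sketch of that suggestion, assuming a UTF8 database so that E'\u00A0' denotes the non-breaking space. The sample value is made up; the set of characters to trim lists both the plain space and the nbsp:

```sql
-- Standard syntax: trim any mix of the listed characters from both ends.
SELECT 'x' || trim(both E' \u00A0' from E'\u00A0 y \u00A0') || 'x';  -- xyx

-- Equivalent Postgres function form.
SELECT 'x' || btrim(E'\u00A0 y \u00A0', E' \u00A0') || 'x';          -- xyx
```

Note that this only affects the ends of the string; whitespace in the middle stays in place, so it does not replace the regexp_replace() approach for stripping all whitespace.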