&nbsp; (non-breaking space) is not considered whitespace by Postgres?

Question 1

String functions in Postgres do not treat the non-breaking space as whitespace, neither when trimming nor when using regular expressions:

select 'x' || test || 'x'
 , 'x' || trim(test) || 'x'
 , 'x' || regexp_replace(test, '\s+', '') || 'x'
from (values
(''),
(' '),
('  y  '),
('s s s s')
) as foo(test)

(Not sure whether the non-breaking space (&nbsp;) survives in the code above, but the last two rows contain nbsp.)

Onecompiler SQL demonstration

Is this intended Postgres behavior, or is it a bug? I know of chr(160) for nbsp, but would prefer a general-purpose way to strip all whitespace.

Collation in use is en_US.utf8

Comment

Yes, the non-breaking space (chr(160)) is not whitespace.

But you can specify custom characters for the TRIM() function to remove. – Akina

score 2 · Accepted Answer · 2025-03-04 02:09:24Z

Defining "whitespace" precisely is a tricky subject in Unicode. See:

Trim trailing spaces with PostgreSQL

trim() and friends by default only trim the basic Latin space character (Unicode: U+0020 / chr(32)).
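To trim anything else, the characters have to be listed explicitly. A minimal sketch, assuming a UTF8 database so that E'\u00A0' yields the no-break space; the sample value is made up for illustration and is not taken from the question:

```sql
-- btrim(string, characters) strips any of the listed characters from both ends.
-- Here the list is: plain space (U+0020) and no-break space (U+00A0).
SELECT 'x' || btrim(test, E' \u00A0') || 'x'            AS with_btrim
     , 'x' || trim(BOTH E' \u00A0' FROM test) || 'x'    AS with_trim   -- standard SQL spelling
FROM  (VALUES
         (E'\u00A0 y \u00A0')   -- illustrative value: nbsp, space, y, space, nbsp
      ) AS foo(test);
-- both columns: xyx
```

This only touches leading and trailing characters. Tabs, line feeds and other whitespace are not in the list above and would have to be added the same way.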

In your regexp_replace() call, \s (shorthand for [[:space:]]) would already catch more whitespace characters, as defined in that character class:

in ASCII: tab, line feed, form feed, carriage return, and space;
in Unicode: also no-break spaces, next line, and the variable-width spaces (among others).

But the flag 'g' is missing as 4th parameter in your call. Without that, only the first match is replaced. So that was never going to work as expected.
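The effect of the flag can be seen in isolation. A small sketch with a literal that contains only plain spaces, assuming standard_conforming_strings is on (the default), so the backslash reaches the regex engine unchanged:

```sql
SELECT regexp_replace('a b c', '\s+', '')      AS first_match_only  -- 'ab c'
     , regexp_replace('a b c', '\s+', '', 'g') AS all_matches;      -- 'abc'
```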

On top of that, the non-breaking space (Unicode name / code point "NO-BREAK SPACE" / U+00A0, chr(160), E'\u00A0') is not classified as "whitespace" in that character class to begin with. You have to add it manually. Basically:

SELECT regexp_replace(test, '[\s\u00A0]+', '', 'g');

Or you add more. Demo:

fiddle
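In case the linked demo is not available, here is a rough sketch of the same idea. It assumes a UTF8 database and standard_conforming_strings = on. The test values are illustrative stand-ins for the ones in the question, written with explicit escapes so the special characters cannot get lost in copy and paste. Adding U+2007 (figure space) and U+202F (narrow no-break space) is just one possible way to "add more":

```sql
SELECT 'x' || test || 'x'                                                    AS original
     , 'x' || regexp_replace(test, '[\s\u00A0\u2007\u202F]+', '', 'g') || 'x' AS stripped
FROM  (VALUES
         ('')                          -- stripped: xx
       , (' ')                         -- stripped: xx
       , (E'\u00A0y\u00A0')            -- stripped: xyx
       , (E's\u00A0s s\u202Fs')        -- stripped: xssssx
      ) AS foo(test);
```

The bracket expression combines the \s class with the explicitly listed code points, so the pattern does not depend on how the locale classifies the no-break variants.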
