Code Golf

Post Made Community Wiki by Dennis
Regex

Length 2 snippet

[]

JavaScript: An empty character class that doesn't match anything.

PCRE, Java, Python re, Ruby (tested on version 2.0): Syntax error.
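For the Python case, the rejection can be checked directly. This is only a sketch for Python's `re`; the helper name `compiles` is made up for illustration. In Python, a `]` placed right after the opening `[` is treated as a literal, so a lone `[]` is an unterminated class, while `[]]` is a valid class containing `]`.

```python
import re

def compiles(pattern):
    """Return True if Python's re module accepts the pattern (hypothetical helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# A lone [] never closes its character class, so it is rejected.
empty_class_ok = compiles("[]")
# "]" directly after "[" is a literal "]", so this class is valid.
literal_bracket_ok = compiles("[]]")
matches_bracket = re.match("[]]", "]") is not None

print(empty_class_ok, literal_bracket_ok, matches_bracket)
```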

Length 1 snippet

.

., called dot-all, is available in all flavors I had a chance to look at.

What does it match?

I̧n͟ g̨͜e҉̡͞n̵͢e͜͝r̷͝a͘l̢҉,̡͟ ̴̕̕.̸̴̢̛́ ̸̡̢m͞ąt̴̨c͞h̛e͢͡s̶͘ ͘a҉n̛͜͠ỳ̸ ͢c̵̡hár͘͝a̕͢ćt͘͠e͏̀͠r̷̀ ̴̕͢ex͝͞͞c҉ep̀t̛ ̕f̴҉o͟͜r̴͢ ͞n͏ę͟w̢̕͜ ͡l͝i̸̧n͢͡e̶.͟

Java Pattern: In default mode, dot-all matches any code point except these 5: \r, \n, \u0085, \u2028, \u2029. With UNIX_LINES mode on (but without DOTALL), dot-all matches any code point except \n. With DOTALL mode on, dot-all matches any code point. Since Java 5, Pattern operates on code points, so astral characters are matched by dot-all.

Python re (tested on 2.7.8 and 3.2.5, may be different on 3.3+): In default mode, dot-all matches any UTF-16 code unit (0000 to FFFF inclusive) except \n. re.DOTALL lifts the exception and makes . match any UTF-16 code unit. In these versions, re operates on UTF-16 code units, so . only manages to match one code unit of a character in the astral planes.
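A quick way to check this on a current interpreter is sketched below. It assumes Python 3.3 or later, where strings are sequences of code points, so the astral-plane result differs from the 2.7.8 and 3.2.5 behaviour described above. The variable names are illustrative only.

```python
import re

astral = "\U0001F600"  # one code point outside the Basic Multilingual Plane

# Default mode: "." rejects only "\n".
default_rejects_lf = re.match(".", "\n") is None

# Other line terminators are ordinary characters to Python's ".".
default_accepts_others = all(
    re.match(".", ch) is not None for ch in "\r\u0085\u2028\u2029"
)

# re.DOTALL lifts the "\n" exception.
dotall_accepts_lf = re.match(".", "\n", re.DOTALL) is not None

# Assumption: Python 3.3+, where "." consumes the whole astral character.
m = re.match(".", astral)
dot_consumes_whole_char = m is not None and m.end() == len(astral)

# The same character needs two UTF-16 code units, which is what an engine
# working on code units would see.
utf16_units = len(astral.encode("utf-16-le")) // 2

print(default_rejects_lf, default_accepts_others, dotall_accepts_lf,
      dot_consumes_whole_char, utf16_units)
```

On an engine that works on UTF-16 code units, the same character would take two consecutive `.` to consume.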

.NET: Same as Python. The dot-all mode in .NET is called Singleline.

JavaScript (C++11 <regex>): In default mode, dot-all matches any UTF-16 code unit except these 4: \n, \r, \u2028, \u2029. With the s flag on, dot-all matches any UTF-16 code unit. JavaScript also operates on UTF-16 code units.

PCRE: Depending on the build option, in default mode dot-all excludes \r, \n, or \r\n, or all three of these, or any Unicode newline sequence. In default mode, the engine operates on code units (8-, 16-, or 32-bit), so dot-all matches any code unit except the newline sequences. In UTF mode, the engine operates on code points, so dot-all matches any code point except the newline sequences. The dot-all mode is called PCRE_DOTALL.

PHP (tested on ideone): PCRE, compiled as a UTF-8 library, with \n as the only newline sequence by default. Dot-all mode is accessible via the s flag.

Postgres: In default mode, dot-all matches any code point without exception.

Ruby (tested on version 2.0.0): In default mode, . matches any code point except \n. Dot-all mode is accessible via the m flag (!).

In Ruby, the s flag instead indicates Windows-31J encoding.


Factoid

Ŗ͞e̡͟҉ǵ͟͢e̴̢͘͡x̡́͞ ̛̀҉҉̢c҉̷̨a̸̛͞n҉̛͠ ̷̸̀p̴͠͡҉̵ą̧͜͢r̸̸̷̢͝s̢̀͡e̷̷̷͘͞ ̨̧̀H̨̧͜͜T̷͞M̷̛͜L͢.̴̡́ Repeat after me. R̶̶̢̧̰̞̲̻̮̳̦̥ͭͯ́̓̎͂̈ͯͤ̇͊͊͟ĕ̱̹̩̪͈͈͍̗͎̝͚̽̈́ͨ̐̽ͪͮ̍͂͐ͮͧ̔̏̓ͣ̀ĝ̵̢̢̖̤̜̭͔͊͒ͦ͛ͤ͗ͬͧͪ̾͘͟eͦ̄ͭ̑̾҉̨̨̝̬̹̘̭͔͟͢x̣̻͓̠͈͕̥̜͚̱̝̫͚̳̾̍ͦ̑̈́̋̌̉̎͊ͮ͗̄̆̒́̚̚ͅ ̸̦͈̥̬̺͇͂ͧͧ̃͐̎ͮ̌ͤ̈́̒̆ͣ̈́̏̔͊̐̀ç̨̬̪̳̦͎̖͕̦͔ͨ̃̿̓̈́ͅȁ̸̳̺̠̭ͮ̎̓̐͘̕͜͡ņ̨̫͔͍̬̤̘͎͚̣̟̦͍̜ͭͭ̈́ͦ̈́̽͗ͥ̑͝͡ ̸̛̖̝̻̻͎̍̄ͭ̓ͨ̋͋̈́͗̌̇ͤ͋ͬ͘pͪ̒̍ͫͤͭ͊ͮ̇̿̆̐̄̎͌̚͏̧͏͇̼͚̰͓̲͕̰̖̘̟̞̺̲ḁ̛͇̫̻̉̊ͣͭͤ̇ͨ́͘͠rͦ̂̈́̆͑͊ͣ̊ͮ̉̉͆ͧ̒͛̐̋̚͏̴̭̫̞̯̘̖͍̼̖̜̞̖̩͕̹̻̮̗͜͡͞ͅs̟͈̺͖̦̟̙̦͖̤ͬ̋͌̄͂ͩ̓̐̔̓͌̾̀̈͊̊ͤ̀̚eͫ̐͒̽ͯͫͨ͏̨̡̦̤̙͍̙̪̝̮̤͎̭̖̪̻͙͍͖͉̀́ ͉̭̫̰͔̝͓̼̮͚̻͎͎͉̐͗͗͊̇ͣ͒͗͑̆͐̎̐̀ͬ͛ͮ͝H̢̥͕̼͓̫͙̺̼̮ͣͦ̍ͨ͒̔̌T̪̲̦̻̦͖̞̤͒̑ͭ̐̑̃ͭͣ͐̎̒̉͊̀͜͜M̞̪͇͕̩͉͗ͧ̌ͯ͋͂̉̍ͭ̓̇̐̌͜͠Ĺ̷̨̳̘̯͚͓͛͌ͭ̉̍.ͯ͆̊̌ͯ̇̓̏͐ͪ̋̈́͑̕҉̷̠̰̼̤̲̀́
