- 5.9k
- 27
- 41
Regex
Length 2 snippet
[]
JavaScript: An empty character class that doesn't match anything.
PCRE, Java, Python re, Ruby (tested on version 2.0): Syntax error.
Length 1 snippet
.
., called dot-all, is available in all flavors I had a chance to look at.
What does it match?
I̧n͟ g̨͜e҉̡͞n̵͢e͜͝r̷͝a͘l̢҉,̡͟ ̴̕̕.̸̴̢̛́ ̸̡̢m͞ąt̴̨c͞h̛e͢͡s̶͘ ͘a҉n̛͜͠ỳ̸ ͢c̵̡hár͘͝a̕͢ćt͘͠e͏̀͠r̷̀ ̴̕͢ex͝͞͞c҉ep̀t̛ ̕f̴҉o͟͜r̴͢ ͞n͏ę͟w̢̕͜ ͡l͝i̸̧n͢͡e̶.͟
Java Pattern: In default mode, dot-all matches any code point, except for these 5 code points \r\n\u0085\u2028\u2029. With UNIX_LINES mode on (but without DOTALL), dot-all matches any code point, except for \n. With DOTALL mode on, dot-all matches any code point. From Java 5, Pattern operates on code point, so astral characters are matched by dot-all.
Python re (tested on 2.7.8 and 3.2.5, may be different on 3.3+): In default mode, dot-all matches any UTF-16 code unit (0000 to FFFF inclusive), except for \n. re.DOTALL lifts the exception and makes . matches any UTF-16 code unit. In these versions, re operates on UTF-16 code units, so . only manages to match one code unit of characters in astral plane.
.NET: Same as Python. The dot-all mode in .NET is called Singleline.
JavaScript (C++11 <regex>): In default mode, dot-all matches any UTF-16 code unit, except for these 4 code points \n\r\u2028\u2029. With s flag on, dot-all matches any UTF-16 code unit. JavaScript also operates on UTF-16 code units.
PCRE: Depending on build option, dot-all can exclude \r, \n or \r\n, or all 3 CR LF sequences, or any Unicode newline sequence in default mode. In default mode, the engine operates on code unit (can be 8, 16, or 32-bit code unit), so dot-all matches any code unit, except for the newline sequences. In UTF mode, the engine operates on code point, so dot-all matches any code point except for newline sequences. The dot-all mode is called PCRE_DOTALL.
PHP (tested on ideone): PCRE, compiled as UTF-8 library and \n is the only newline sequence by default. Dot-all mode is accessible via s flag.
Postgres: In default mode, dot-all matches any code point without exception.
Ruby (tested on version 2.0.0): In default mode, . matches any code point except for \n. Dot-all mode is accessible via m flag (!).
s flag is used to indicate Windows-31J encoding in Ruby.
Factoid
Ŗ͞e̡͟҉ǵ͟͢e̴̢͘͡x̡́͞ ̛̀҉҉̢c҉̷̨a̸̛͞n҉̛͠ ̷̸̀p̴͠͡҉̵ą̧͜͢r̸̸̷̢͝s̢̀͡e̷̷̷͘͞ ̨̧̀H̨̧͜͜T̷͞M̷̛͜L͢.̴̡́ Repeat after me. R̶̶̢̧̰̞̲̻̮̳̦̥ͭͯ́̓̎͂̈ͯͤ̇͊͊͟ĕ̱̹̩̪͈͈͍̗͎̝͚̽̈́ͨ̐̽ͪͮ̍͂͐ͮͧ̔̏̓ͣ̀ĝ̵̢̢̖̤̜̭͔͊͒ͦ͛ͤ͗ͬͧͪ̾͘͟eͦ̄ͭ̑̾҉̨̨̝̬̹̘̭͔͟͢x̣̻͓̠͈͕̥̜͚̱̝̫͚̳̾̍ͦ̑̈́̋̌̉̎͊ͮ͗̄̆̒́̚̚ͅ ̸̦͈̥̬̺͇͂ͧͧ̃͐̎ͮ̌ͤ̈́̒̆ͣ̈́̏̔͊̐̀ç̨̬̪̳̦͎̖͕̦͔ͨ̃̿̓̈́ͅȁ̸̳̺̠̭ͮ̎̓̐͘̕͜͡ņ̨̫͔͍̬̤̘͎͚̣̟̦͍̜ͭͭ̈́ͦ̈́̽͗ͥ̑͝͡ ̸̛̖̝̻̻͎̍̄ͭ̓ͨ̋͋̈́͗̌̇ͤ͋ͬ͘pͪ̒̍ͫͤͭ͊ͮ̇̿̆̐̄̎͌̚͏̧͏͇̼͚̰͓̲͕̰̖̘̟̞̺̲ḁ̛͇̫̻̉̊ͣͭͤ̇ͨ́͘͠rͦ̂̈́̆͑͊ͣ̊ͮ̉̉͆ͧ̒͛̐̋̚͏̴̭̫̞̯̘̖͍̼̖̜̞̖̩͕̹̻̮̗͜͡͞ͅs̟͈̺͖̦̟̙̦͖̤ͬ̋͌̄͂ͩ̓̐̔̓͌̾̀̈͊̊ͤ̀̚eͫ̐͒̽ͯͫͨ͏̨̡̦̤̙͍̙̪̝̮̤͎̭̖̪̻͙͍͖͉̀́ ͉̭̫̰͔̝͓̼̮͚̻͎͎͉̐͗͗͊̇ͣ͒͗͑̆͐̎̐̀ͬ͛ͮ͝H̢̥͕̼͓̫͙̺̼̮ͣͦ̍ͨ͒̔̌T̪̲̦̻̦͖̞̤͒̑ͭ̐̑̃ͭͣ͐̎̒̉͊̀͜͜M̞̪͇͕̩͉͗ͧ̌ͯ͋͂̉̍ͭ̓̇̐̌͜͠Ĺ̷̨̳̘̯͚͓͛͌ͭ̉̍.ͯ͆̊̌ͯ̇̓̏͐ͪ̋̈́͑̕҉̷̠̰̼̤̲̀́
Regex
Length 1 snippet
.
., called dot-all, is available in all flavors I had a chance to look at.
What does it match?
Java Pattern: In default mode, dot-all matches any code point, except for these 5 code points \r\n\u0085\u2028\u2029. With UNIX_LINES mode on (but without DOTALL), dot-all matches any code point, except for \n. With DOTALL mode on, dot-all matches any code point. From Java 5, Pattern operates on code point, so astral characters are matched by dot-all.
Python re (tested on 2.7.8 and 3.2.5, may be different on 3.3+): In default mode, dot-all matches any UTF-16 code unit (0000 to FFFF inclusive), except for \n. re.DOTALL lifts the exception and makes . matches any UTF-16 code unit. In these versions, re operates on UTF-16 code units, so . only manages to match one code unit of characters in astral plane.
.NET: Same as Python. The dot-all mode in .NET is called Singleline.
JavaScript (C++11 <regex>): In default mode, dot-all matches any UTF-16 code unit, except for these 4 code points \n\r\u2028\u2029. With s flag on, dot-all matches any UTF-16 code unit. JavaScript also operates on UTF-16 code units.
PCRE: Depending on build option, dot-all can exclude \r, \n or \r\n, or all 3 CR LF sequences, or any Unicode newline sequence in default mode. In default mode, the engine operates on code unit (can be 8, 16, or 32-bit code unit), so dot-all matches any code unit, except for the newline sequences. In UTF mode, the engine operates on code point, so dot-all matches any code point except for newline sequences. The dot-all mode is called PCRE_DOTALL.
PHP: PCRE, compiled as UTF-8 library and \n is the only newline sequence by default. Dot-all mode is accessible via s flag.
Postgres: In default mode, dot-all matches any code point without exception.
Factoid
Ŗ͞e̡͟҉ǵ͟͢e̴̢͘͡x̡́͞ ̛̀҉҉̢c҉̷̨a̸̛͞n҉̛͠ ̷̸̀p̴͠͡҉̵ą̧͜͢r̸̸̷̢͝s̢̀͡e̷̷̷͘͞ ̨̧̀H̨̧͜͜T̷͞M̷̛͜L͢.̴̡́ Repeat after me. R̶̶̢̧̰̞̲̻̮̳̦̥ͭͯ́̓̎͂̈ͯͤ̇͊͊͟ĕ̱̹̩̪͈͈͍̗͎̝͚̽̈́ͨ̐̽ͪͮ̍͂͐ͮͧ̔̏̓ͣ̀ĝ̵̢̢̖̤̜̭͔͊͒ͦ͛ͤ͗ͬͧͪ̾͘͟eͦ̄ͭ̑̾҉̨̨̝̬̹̘̭͔͟͢x̣̻͓̠͈͕̥̜͚̱̝̫͚̳̾̍ͦ̑̈́̋̌̉̎͊ͮ͗̄̆̒́̚̚ͅ ̸̦͈̥̬̺͇͂ͧͧ̃͐̎ͮ̌ͤ̈́̒̆ͣ̈́̏̔͊̐̀ç̨̬̪̳̦͎̖͕̦͔ͨ̃̿̓̈́ͅȁ̸̳̺̠̭ͮ̎̓̐͘̕͜͡ņ̨̫͔͍̬̤̘͎͚̣̟̦͍̜ͭͭ̈́ͦ̈́̽͗ͥ̑͝͡ ̸̛̖̝̻̻͎̍̄ͭ̓ͨ̋͋̈́͗̌̇ͤ͋ͬ͘pͪ̒̍ͫͤͭ͊ͮ̇̿̆̐̄̎͌̚͏̧͏͇̼͚̰͓̲͕̰̖̘̟̞̺̲ḁ̛͇̫̻̉̊ͣͭͤ̇ͨ́͘͠rͦ̂̈́̆͑͊ͣ̊ͮ̉̉͆ͧ̒͛̐̋̚͏̴̭̫̞̯̘̖͍̼̖̜̞̖̩͕̹̻̮̗͜͡͞ͅs̟͈̺͖̦̟̙̦͖̤ͬ̋͌̄͂ͩ̓̐̔̓͌̾̀̈͊̊ͤ̀̚eͫ̐͒̽ͯͫͨ͏̨̡̦̤̙͍̙̪̝̮̤͎̭̖̪̻͙͍͖͉̀́ ͉̭̫̰͔̝͓̼̮͚̻͎͎͉̐͗͗͊̇ͣ͒͗͑̆͐̎̐̀ͬ͛ͮ͝H̢̥͕̼͓̫͙̺̼̮ͣͦ̍ͨ͒̔̌T̪̲̦̻̦͖̞̤͒̑ͭ̐̑̃ͭͣ͐̎̒̉͊̀͜͜M̞̪͇͕̩͉͗ͧ̌ͯ͋͂̉̍ͭ̓̇̐̌͜͠Ĺ̷̨̳̘̯͚͓͛͌ͭ̉̍.ͯ͆̊̌ͯ̇̓̏͐ͪ̋̈́͑̕҉̷̠̰̼̤̲̀́
Regex
Length 2 snippet
[]
JavaScript: An empty character class that doesn't match anything.
PCRE, Java, Python re, Ruby (tested on version 2.0): Syntax error.
Length 1 snippet
.
., called dot-all, is available in all flavors I had a chance to look at.
What does it match?
I̧n͟ g̨͜e҉̡͞n̵͢e͜͝r̷͝a͘l̢҉,̡͟ ̴̕̕.̸̴̢̛́ ̸̡̢m͞ąt̴̨c͞h̛e͢͡s̶͘ ͘a҉n̛͜͠ỳ̸ ͢c̵̡hár͘͝a̕͢ćt͘͠e͏̀͠r̷̀ ̴̕͢ex͝͞͞c҉ep̀t̛ ̕f̴҉o͟͜r̴͢ ͞n͏ę͟w̢̕͜ ͡l͝i̸̧n͢͡e̶.͟
Java Pattern: In default mode, dot-all matches any code point, except for these 5 code points \r\n\u0085\u2028\u2029. With UNIX_LINES mode on (but without DOTALL), dot-all matches any code point, except for \n. With DOTALL mode on, dot-all matches any code point. From Java 5, Pattern operates on code point, so astral characters are matched by dot-all.
Python re (tested on 2.7.8 and 3.2.5, may be different on 3.3+): In default mode, dot-all matches any UTF-16 code unit (0000 to FFFF inclusive), except for \n. re.DOTALL lifts the exception and makes . matches any UTF-16 code unit. In these versions, re operates on UTF-16 code units, so . only manages to match one code unit of characters in astral plane.
.NET: Same as Python. The dot-all mode in .NET is called Singleline.
JavaScript (C++11 <regex>): In default mode, dot-all matches any UTF-16 code unit, except for these 4 code points \n\r\u2028\u2029. With s flag on, dot-all matches any UTF-16 code unit. JavaScript also operates on UTF-16 code units.
PCRE: Depending on build option, dot-all can exclude \r, \n or \r\n, or all 3 CR LF sequences, or any Unicode newline sequence in default mode. In default mode, the engine operates on code unit (can be 8, 16, or 32-bit code unit), so dot-all matches any code unit, except for the newline sequences. In UTF mode, the engine operates on code point, so dot-all matches any code point except for newline sequences. The dot-all mode is called PCRE_DOTALL.
PHP (tested on ideone): PCRE, compiled as UTF-8 library and \n is the only newline sequence by default. Dot-all mode is accessible via s flag.
Postgres: In default mode, dot-all matches any code point without exception.
Ruby (tested on version 2.0.0): In default mode, . matches any code point except for \n. Dot-all mode is accessible via m flag (!).
s flag is used to indicate Windows-31J encoding in Ruby.
Factoid
Ŗ͞e̡͟҉ǵ͟͢e̴̢͘͡x̡́͞ ̛̀҉҉̢c҉̷̨a̸̛͞n҉̛͠ ̷̸̀p̴͠͡҉̵ą̧͜͢r̸̸̷̢͝s̢̀͡e̷̷̷͘͞ ̨̧̀H̨̧͜͜T̷͞M̷̛͜L͢.̴̡́ Repeat after me. R̶̶̢̧̰̞̲̻̮̳̦̥ͭͯ́̓̎͂̈ͯͤ̇͊͊͟ĕ̱̹̩̪͈͈͍̗͎̝͚̽̈́ͨ̐̽ͪͮ̍͂͐ͮͧ̔̏̓ͣ̀ĝ̵̢̢̖̤̜̭͔͊͒ͦ͛ͤ͗ͬͧͪ̾͘͟eͦ̄ͭ̑̾҉̨̨̝̬̹̘̭͔͟͢x̣̻͓̠͈͕̥̜͚̱̝̫͚̳̾̍ͦ̑̈́̋̌̉̎͊ͮ͗̄̆̒́̚̚ͅ ̸̦͈̥̬̺͇͂ͧͧ̃͐̎ͮ̌ͤ̈́̒̆ͣ̈́̏̔͊̐̀ç̨̬̪̳̦͎̖͕̦͔ͨ̃̿̓̈́ͅȁ̸̳̺̠̭ͮ̎̓̐͘̕͜͡ņ̨̫͔͍̬̤̘͎͚̣̟̦͍̜ͭͭ̈́ͦ̈́̽͗ͥ̑͝͡ ̸̛̖̝̻̻͎̍̄ͭ̓ͨ̋͋̈́͗̌̇ͤ͋ͬ͘pͪ̒̍ͫͤͭ͊ͮ̇̿̆̐̄̎͌̚͏̧͏͇̼͚̰͓̲͕̰̖̘̟̞̺̲ḁ̛͇̫̻̉̊ͣͭͤ̇ͨ́͘͠rͦ̂̈́̆͑͊ͣ̊ͮ̉̉͆ͧ̒͛̐̋̚͏̴̭̫̞̯̘̖͍̼̖̜̞̖̩͕̹̻̮̗͜͡͞ͅs̟͈̺͖̦̟̙̦͖̤ͬ̋͌̄͂ͩ̓̐̔̓͌̾̀̈͊̊ͤ̀̚eͫ̐͒̽ͯͫͨ͏̨̡̦̤̙͍̙̪̝̮̤͎̭̖̪̻͙͍͖͉̀́ ͉̭̫̰͔̝͓̼̮͚̻͎͎͉̐͗͗͊̇ͣ͒͗͑̆͐̎̐̀ͬ͛ͮ͝H̢̥͕̼͓̫͙̺̼̮ͣͦ̍ͨ͒̔̌T̪̲̦̻̦͖̞̤͒̑ͭ̐̑̃ͭͣ͐̎̒̉͊̀͜͜M̞̪͇͕̩͉͗ͧ̌ͯ͋͂̉̍ͭ̓̇̐̌͜͠Ĺ̷̨̳̘̯͚͓͛͌ͭ̉̍.ͯ͆̊̌ͯ̇̓̏͐ͪ̋̈́͑̕҉̷̠̰̼̤̲̀́
- 5.9k
- 27
- 41
Regex
Length 1 snippet
.
., called dot-all, is available in all flavors I had a chance to look at.
FactoidWhat does it match?
Java Pattern: In default mode, dot-all matches any code point, except for these 5 code points \r\n\u0085\u2028\u2029. With UNIX_LINES mode on (but without DOTALL), dot-all matches any code point, except for \n. With DOTALL mode on, dot-all matches any code point. From Java 5, Pattern operates on code point, so astral characters are matched by dot-all.
Python re (tested on 2.7.8 and 3.2.5, may be different on 3.3+): In default mode, dot-all matches any UTF-16 code unit (0000 to FFFF inclusive), except for \n. re.DOTALL lifts the exception and makes . matches any UTF-16 code unit. In these versions, re operates on UTF-16 code units, so . only manages to match one code unit of characters in astral plane.
.NET: Same as Python. The dot-all mode in .NET is called Singleline.
JavaScript (C++11 <regex>): In default mode, dot-all matches any UTF-16 code unit, except for these 4 code points \n\r\u2028\u2029. With s flag on, dot-all matches any UTF-16 code unit. JavaScript also operates on UTF-16 code units.
PCRE: Depending on build option, dot-all can exclude \r, \n or \r\n, or all 3 CR LF sequences, or any Unicode newline sequence in default mode. In default mode, the engine operates on code unit (can be 8, 16, or 32-bit code unit), so dot-all matches any code unit, except for the newline sequences. In UTF mode, the engine operates on code point, so dot-all matches any code point except for newline sequences. The dot-all mode is called PCRE_DOTALL.
PHP: PCRE, compiled as UTF-8 library and \n is the only newline sequence by default. Dot-all mode is accessible via s flag.
Postgres: In default mode, dot-all matches any code point without exception.
Factoid
Ŗ͞e̡͟҉ǵ͟͢e̴̢͘͡x̡́͞ ̛̀҉҉̢c҉̷̨a̸̛͞n҉̛͠ ̷̸̀p̴͠͡҉̵ą̧͜͢r̸̸̷̢͝s̢̀͡e̷̷̷͘͞ ̨̧̀H̨̧͜͜T̷͞M̷̛͜L͢.̴̡́ Repeat after me. R̶̶̢̧̰̞̲̻̮̳̦̥ͭͯ́̓̎͂̈ͯͤ̇͊͊͟ĕ̱̹̩̪͈͈͍̗͎̝͚̽̈́ͨ̐̽ͪͮ̍͂͐ͮͧ̔̏̓ͣ̀ĝ̵̢̢̖̤̜̭͔͊͒ͦ͛ͤ͗ͬͧͪ̾͘͟eͦ̄ͭ̑̾҉̨̨̝̬̹̘̭͔͟͢x̣̻͓̠͈͕̥̜͚̱̝̫͚̳̾̍ͦ̑̈́̋̌̉̎͊ͮ͗̄̆̒́̚̚ͅ ̸̦͈̥̬̺͇͂ͧͧ̃͐̎ͮ̌ͤ̈́̒̆ͣ̈́̏̔͊̐̀ç̨̬̪̳̦͎̖͕̦͔ͨ̃̿̓̈́ͅȁ̸̳̺̠̭ͮ̎̓̐͘̕͜͡ņ̨̫͔͍̬̤̘͎͚̣̟̦͍̜ͭͭ̈́ͦ̈́̽͗ͥ̑͝͡ ̸̛̖̝̻̻͎̍̄ͭ̓ͨ̋͋̈́͗̌̇ͤ͋ͬ͘pͪ̒̍ͫͤͭ͊ͮ̇̿̆̐̄̎͌̚͏̧͏͇̼͚̰͓̲͕̰̖̘̟̞̺̲ḁ̛͇̫̻̉̊ͣͭͤ̇ͨ́͘͠rͦ̂̈́̆͑͊ͣ̊ͮ̉̉͆ͧ̒͛̐̋̚͏̴̭̫̞̯̘̖͍̼̖̜̞̖̩͕̹̻̮̗͜͡͞ͅs̟͈̺͖̦̟̙̦͖̤ͬ̋͌̄͂ͩ̓̐̔̓͌̾̀̈͊̊ͤ̀̚eͫ̐͒̽ͯͫͨ͏̨̡̦̤̙͍̙̪̝̮̤͎̭̖̪̻͙͍͖͉̀́ ͉̭̫̰͔̝͓̼̮͚̻͎͎͉̐͗͗͊̇ͣ͒͗͑̆͐̎̐̀ͬ͛ͮ͝H̢̥͕̼͓̫͙̺̼̮ͣͦ̍ͨ͒̔̌T̪̲̦̻̦͖̞̤͒̑ͭ̐̑̃ͭͣ͐̎̒̉͊̀͜͜M̞̪͇͕̩͉͗ͧ̌ͯ͋͂̉̍ͭ̓̇̐̌͜͠Ĺ̷̨̳̘̯͚͓͛͌ͭ̉̍.ͯ͆̊̌ͯ̇̓̏͐ͪ̋̈́͑̕҉̷̠̰̼̤̲̀́
Regex
Factoid
Ŗ͞e̡͟҉ǵ͟͢e̴̢͘͡x̡́͞ ̛̀҉҉̢c҉̷̨a̸̛͞n҉̛͠ ̷̸̀p̴͠͡҉̵ą̧͜͢r̸̸̷̢͝s̢̀͡e̷̷̷͘͞ ̨̧̀H̨̧͜͜T̷͞M̷̛͜L͢.̴̡́ Repeat after me. R̶̶̢̧̰̞̲̻̮̳̦̥ͭͯ́̓̎͂̈ͯͤ̇͊͊͟ĕ̱̹̩̪͈͈͍̗͎̝͚̽̈́ͨ̐̽ͪͮ̍͂͐ͮͧ̔̏̓ͣ̀ĝ̵̢̢̖̤̜̭͔͊͒ͦ͛ͤ͗ͬͧͪ̾͘͟eͦ̄ͭ̑̾҉̨̨̝̬̹̘̭͔͟͢x̣̻͓̠͈͕̥̜͚̱̝̫͚̳̾̍ͦ̑̈́̋̌̉̎͊ͮ͗̄̆̒́̚̚ͅ ̸̦͈̥̬̺͇͂ͧͧ̃͐̎ͮ̌ͤ̈́̒̆ͣ̈́̏̔͊̐̀ç̨̬̪̳̦͎̖͕̦͔ͨ̃̿̓̈́ͅȁ̸̳̺̠̭ͮ̎̓̐͘̕͜͡ņ̨̫͔͍̬̤̘͎͚̣̟̦͍̜ͭͭ̈́ͦ̈́̽͗ͥ̑͝͡ ̸̛̖̝̻̻͎̍̄ͭ̓ͨ̋͋̈́͗̌̇ͤ͋ͬ͘pͪ̒̍ͫͤͭ͊ͮ̇̿̆̐̄̎͌̚͏̧͏͇̼͚̰͓̲͕̰̖̘̟̞̺̲ḁ̛͇̫̻̉̊ͣͭͤ̇ͨ́͘͠rͦ̂̈́̆͑͊ͣ̊ͮ̉̉͆ͧ̒͛̐̋̚͏̴̭̫̞̯̘̖͍̼̖̜̞̖̩͕̹̻̮̗͜͡͞ͅs̟͈̺͖̦̟̙̦͖̤ͬ̋͌̄͂ͩ̓̐̔̓͌̾̀̈͊̊ͤ̀̚eͫ̐͒̽ͯͫͨ͏̨̡̦̤̙͍̙̪̝̮̤͎̭̖̪̻͙͍͖͉̀́ ͉̭̫̰͔̝͓̼̮͚̻͎͎͉̐͗͗͊̇ͣ͒͗͑̆͐̎̐̀ͬ͛ͮ͝H̢̥͕̼͓̫͙̺̼̮ͣͦ̍ͨ͒̔̌T̪̲̦̻̦͖̞̤͒̑ͭ̐̑̃ͭͣ͐̎̒̉͊̀͜͜M̞̪͇͕̩͉͗ͧ̌ͯ͋͂̉̍ͭ̓̇̐̌͜͠Ĺ̷̨̳̘̯͚͓͛͌ͭ̉̍.ͯ͆̊̌ͯ̇̓̏͐ͪ̋̈́͑̕҉̷̠̰̼̤̲̀́
Regex
Length 1 snippet
.
., called dot-all, is available in all flavors I had a chance to look at.
What does it match?
Java Pattern: In default mode, dot-all matches any code point, except for these 5 code points \r\n\u0085\u2028\u2029. With UNIX_LINES mode on (but without DOTALL), dot-all matches any code point, except for \n. With DOTALL mode on, dot-all matches any code point. From Java 5, Pattern operates on code point, so astral characters are matched by dot-all.
Python re (tested on 2.7.8 and 3.2.5, may be different on 3.3+): In default mode, dot-all matches any UTF-16 code unit (0000 to FFFF inclusive), except for \n. re.DOTALL lifts the exception and makes . matches any UTF-16 code unit. In these versions, re operates on UTF-16 code units, so . only manages to match one code unit of characters in astral plane.
.NET: Same as Python. The dot-all mode in .NET is called Singleline.
JavaScript (C++11 <regex>): In default mode, dot-all matches any UTF-16 code unit, except for these 4 code points \n\r\u2028\u2029. With s flag on, dot-all matches any UTF-16 code unit. JavaScript also operates on UTF-16 code units.
PCRE: Depending on build option, dot-all can exclude \r, \n or \r\n, or all 3 CR LF sequences, or any Unicode newline sequence in default mode. In default mode, the engine operates on code unit (can be 8, 16, or 32-bit code unit), so dot-all matches any code unit, except for the newline sequences. In UTF mode, the engine operates on code point, so dot-all matches any code point except for newline sequences. The dot-all mode is called PCRE_DOTALL.
PHP: PCRE, compiled as UTF-8 library and \n is the only newline sequence by default. Dot-all mode is accessible via s flag.
Postgres: In default mode, dot-all matches any code point without exception.
Factoid
Ŗ͞e̡͟҉ǵ͟͢e̴̢͘͡x̡́͞ ̛̀҉҉̢c҉̷̨a̸̛͞n҉̛͠ ̷̸̀p̴͠͡҉̵ą̧͜͢r̸̸̷̢͝s̢̀͡e̷̷̷͘͞ ̨̧̀H̨̧͜͜T̷͞M̷̛͜L͢.̴̡́ Repeat after me. R̶̶̢̧̰̞̲̻̮̳̦̥ͭͯ́̓̎͂̈ͯͤ̇͊͊͟ĕ̱̹̩̪͈͈͍̗͎̝͚̽̈́ͨ̐̽ͪͮ̍͂͐ͮͧ̔̏̓ͣ̀ĝ̵̢̢̖̤̜̭͔͊͒ͦ͛ͤ͗ͬͧͪ̾͘͟eͦ̄ͭ̑̾҉̨̨̝̬̹̘̭͔͟͢x̣̻͓̠͈͕̥̜͚̱̝̫͚̳̾̍ͦ̑̈́̋̌̉̎͊ͮ͗̄̆̒́̚̚ͅ ̸̦͈̥̬̺͇͂ͧͧ̃͐̎ͮ̌ͤ̈́̒̆ͣ̈́̏̔͊̐̀ç̨̬̪̳̦͎̖͕̦͔ͨ̃̿̓̈́ͅȁ̸̳̺̠̭ͮ̎̓̐͘̕͜͡ņ̨̫͔͍̬̤̘͎͚̣̟̦͍̜ͭͭ̈́ͦ̈́̽͗ͥ̑͝͡ ̸̛̖̝̻̻͎̍̄ͭ̓ͨ̋͋̈́͗̌̇ͤ͋ͬ͘pͪ̒̍ͫͤͭ͊ͮ̇̿̆̐̄̎͌̚͏̧͏͇̼͚̰͓̲͕̰̖̘̟̞̺̲ḁ̛͇̫̻̉̊ͣͭͤ̇ͨ́͘͠rͦ̂̈́̆͑͊ͣ̊ͮ̉̉͆ͧ̒͛̐̋̚͏̴̭̫̞̯̘̖͍̼̖̜̞̖̩͕̹̻̮̗͜͡͞ͅs̟͈̺͖̦̟̙̦͖̤ͬ̋͌̄͂ͩ̓̐̔̓͌̾̀̈͊̊ͤ̀̚eͫ̐͒̽ͯͫͨ͏̨̡̦̤̙͍̙̪̝̮̤͎̭̖̪̻͙͍͖͉̀́ ͉̭̫̰͔̝͓̼̮͚̻͎͎͉̐͗͗͊̇ͣ͒͗͑̆͐̎̐̀ͬ͛ͮ͝H̢̥͕̼͓̫͙̺̼̮ͣͦ̍ͨ͒̔̌T̪̲̦̻̦͖̞̤͒̑ͭ̐̑̃ͭͣ͐̎̒̉͊̀͜͜M̞̪͇͕̩͉͗ͧ̌ͯ͋͂̉̍ͭ̓̇̐̌͜͠Ĺ̷̨̳̘̯͚͓͛͌ͭ̉̍.ͯ͆̊̌ͯ̇̓̏͐ͪ̋̈́͑̕҉̷̠̰̼̤̲̀́
Regex
Factoid
Ŗ͞e̡͟҉ǵ͟͢e̴̢͘͡x̡́͞ ̛̀҉҉̢c҉̷̨a̸̛͞n҉̛͠ ̷̸̀p̴͠͡҉̵ą̧͜͢r̸̸̷̢͝s̢̀͡e̷̷̷͘͞ ̨̧̀H̨̧͜͜T̷͞M̷̛͜L͢.̴̡́ Repeat after me. R̶̶̢̧̰̞̲̻̮̳̦̥ͭͯ́̓̎͂̈ͯͤ̇͊͊͟ĕ̱̹̩̪͈͈͍̗͎̝͚̽̈́ͨ̐̽ͪͮ̍͂͐ͮͧ̔̏̓ͣ̀ĝ̵̢̢̖̤̜̭͔͊͒ͦ͛ͤ͗ͬͧͪ̾͘͟eͦ̄ͭ̑̾҉̨̨̝̬̹̘̭͔͟͢x̣̻͓̠͈͕̥̜͚̱̝̫͚̳̾̍ͦ̑̈́̋̌̉̎͊ͮ͗̄̆̒́̚̚ͅ ̸̦͈̥̬̺͇͂ͧͧ̃͐̎ͮ̌ͤ̈́̒̆ͣ̈́̏̔͊̐̀ç̨̬̪̳̦͎̖͕̦͔ͨ̃̿̓̈́ͅȁ̸̳̺̠̭ͮ̎̓̐͘̕͜͡ņ̨̫͔͍̬̤̘͎͚̣̟̦͍̜ͭͭ̈́ͦ̈́̽͗ͥ̑͝͡ ̸̛̖̝̻̻͎̍̄ͭ̓ͨ̋͋̈́͗̌̇ͤ͋ͬ͘pͪ̒̍ͫͤͭ͊ͮ̇̿̆̐̄̎͌̚͏̧͏͇̼͚̰͓̲͕̰̖̘̟̞̺̲ḁ̛͇̫̻̉̊ͣͭͤ̇ͨ́͘͠rͦ̂̈́̆͑͊ͣ̊ͮ̉̉͆ͧ̒͛̐̋̚͏̴̭̫̞̯̘̖͍̼̖̜̞̖̩͕̹̻̮̗͜͡͞ͅs̟͈̺͖̦̟̙̦͖̤ͬ̋͌̄͂ͩ̓̐̔̓͌̾̀̈͊̊ͤ̀̚eͫ̐͒̽ͯͫͨ͏̨̡̦̤̙͍̙̪̝̮̤͎̭̖̪̻͙͍͖͉̀́ ͉̭̫̰͔̝͓̼̮͚̻͎͎͉̐͗͗͊̇ͣ͒͗͑̆͐̎̐̀ͬ͛ͮ͝H̢̥͕̼͓̫͙̺̼̮ͣͦ̍ͨ͒̔̌T̪̲̦̻̦͖̞̤͒̑ͭ̐̑̃ͭͣ͐̎̒̉͊̀͜͜M̞̪͇͕̩͉͗ͧ̌ͯ͋͂̉̍ͭ̓̇̐̌͜͠Ĺ̷̨̳̘̯͚͓͛͌ͭ̉̍.ͯ͆̊̌ͯ̇̓̏͐ͪ̋̈́͑̕҉̷̠̰̼̤̲̀́