Ruby syntax highlighting is not working properly for me when regexes are involved.
It looks like multiple issues are happening here:
- String interpolation inside the regex seems to be interpreted as a comment (#), which breaks the highlighting from that point to the end of that line.
- The combination of " and ' on the string_literal line seems to break the highlighting from that point until the end of the file, which is much more serious.
Here is the example as code:
class Tokenizer
  def initialize(expression)
    @expression = expression
  end

  TOKEN_REGEX = /
    (?<whitespace>\s+) |
    (?<parenthesis>[\(\)]) |
    (?<comparison_operator>#{ComparisonNode::OPERATORS.map { |op| Regexp.escape(op) }.join('|')}) |
    (?<logical_operator>\b(?:#{LogicalNode::OPERATORS.join('|')})\b) |
    (?<boolean_literal>\b(?:#{ValueNode::BOOLEAN_LITERALS.join('|')})\b) |
    (?<number_literal>\d+) |
    (?<string_literal>"[^"]*"|'[^']*') |
    (?<identifier>[a-z_][a-z0-9_\.]*) |
    (?<unknown>.)
  /ix.freeze

  def tokenize
    tokens = []
    @expression.scan(TOKEN_REGEX) do
      match_data = Regexp.last_match
      if match_data[:whitespace]
        next
      elsif match_data[:parenthesis]
        tokens << Token.new(:parenthesis, match_data[0])
      elsif match_data[:comparison_operator]
        tokens << Token.new(ComparisonNode::TYPE, match_data[0])
      elsif match_data[:logical_operator]
        tokens << Token.new(LogicalNode::TYPE, match_data[0].upcase)
      elsif match_data[:boolean_literal]
        tokens << Token.new(:literal, match_data[0].downcase)
      elsif match_data[:number_literal]
        tokens << Token.new(:literal, match_data[0])
      elsif match_data[:string_literal]
        value = match_data[0][1...-1] # Remove surrounding quotes
        tokens << Token.new(:literal, value)
      elsif match_data[:identifier]
        tokens << Token.new(FieldNode::TYPE, match_data[0])
      else
        raise "Unexpected character: #{match_data[0]}"
      end
    end
    tokens
  end
end
Initially, this happened with the built-in Ruby syntax highlighting in Sublime Text 3 (Version 3.2.2, Build 3211). I tried installing Ruby-specific syntax highlighting packages that try to fix this issue, such as Sublime Better Ruby, but without success.
Has anyone else run into this issue? If so, how did you fix it? Thanks!
FYI: I tried your code in Sublime Text build 4113, and there this problem does not happen. So it appears to be fixed in later versions. – Casper, Nov 6, 2024 at 20:44
1 Answer
The Sublime Text Ruby syntax takes the opinionated view that multi-line Regexps generally use the %r literal syntax.
So / / only highlights correctly if the leading and trailing forward slashes are on the same line.
This is shown in the Ruby.sublime-syntax excerpt below. I linked v3211 because that is your stated version, but the same applies to all versions up to and including v4108. It appears this was patched in v4109.
try-regex:
  # Generally for multiline regexes, one of the %r forms below will be used,
  # so we bail out if we can't find a second / on the current line
  - match: '\s*(/)(?![*+{}?])(?=.*/)'
    captures:
      1: string.regexp.classic.ruby punctuation.definition.string.ruby
    push:
      - meta_content_scope: string.regexp.classic.ruby
      - match: "(/)([eimnosux]*)"
        scope: string.regexp.classic.ruby
        captures:
          1: punctuation.definition.string.ruby
          2: keyword.other.ruby
        pop: true
      - include: regex-sub
  - match: ''
    pop: true
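To see why the rule bails out, here is a minimal sketch that runs the opening pattern from the excerpt against single lines using Ruby's own Regexp engine. This only approximates what Sublime Text does (its engine and context handling differ), and the helper name opens_regex? and the sample lines are made up for illustration.

```ruby
# Opening pattern copied from the try-regex rule. The lookahead (?=.*/)
# requires a second forward slash later on the same line.
TRY_REGEX_OPENER = Regexp.new('\s*(/)(?![*+{}?])(?=.*/)')

# Hypothetical helper: would this line be treated as opening a /.../ regex?
def opens_regex?(line)
  TRY_REGEX_OPENER.match?(line)
end

single_line      = 'pattern = /ab+c/i'  # both slashes on one line
multi_line_start = 'TOKEN_REGEX = /'    # closing slash is several lines below

puts opens_regex?(single_line)      # => true
puts opens_regex?(multi_line_start) # => false
```

Because the first line of the multi-line literal is not recognized as a regex, the lines after it are presumably highlighted as ordinary Ruby code, which would explain # being read as a comment and the quotes being read as string delimiters.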
Knowing this, you can alter your code to:
TOKEN_REGEX = %r{
  (?<whitespace>\s+) |
  (?<parenthesis>[\(\)]) |
  (?<comparison_operator>#{ComparisonNode::OPERATORS.map { |op| Regexp.escape(op) }.join('|')}) |
  (?<logical_operator>\b(?:#{LogicalNode::OPERATORS.join('|')})\b) |
  (?<boolean_literal>\b(?:#{ValueNode::BOOLEAN_LITERALS.join('|')})\b) |
  (?<number_literal>\d+) |
  (?<string_literal>"[^"]*"|'[^']*') |
  (?<identifier>[a-z_][a-z0-9_\.]*) |
  (?<unknown>.)
}ix.freeze
and the syntax highlighting works as expected.
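If you want to convince yourself that the change is purely syntactic, a small sketch like the following compares the two literal forms. The pattern here is a cut-down, made-up example and not the full TOKEN_REGEX.

```ruby
# Same pattern written with both literal forms.
slash_form = /
  (?<number>\d+) |
  (?<word>[a-z]+)
/ix

percent_form = %r{
  (?<number>\d+) |
  (?<word>[a-z]+)
}ix

# Regexp#== compares source and options, so this shows that the
# two literals build the same pattern.
puts slash_form == percent_form        # => true

# Named captures behave identically.
puts percent_form.match('Foo')[:word]  # => Foo
```

One difference to keep in mind: inside %r{ } a forward slash no longer needs escaping, but unbalanced curly braces do.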
As an aside, Regexp::union builds a union pattern from an Array of values, so you don't need to join or escape them manually. This means you could just use:
  (?<comparison_operator>#{Regexp.union(ComparisonNode::OPERATORS)}) |
  (?<logical_operator>\b(?:#{Regexp.union(LogicalNode::OPERATORS)})\b) |
  (?<boolean_literal>\b(?:#{Regexp.union(ValueNode::BOOLEAN_LITERALS)})\b) |
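One thing worth checking before switching: interpolating a Regexp object embeds it together with its own flags, as (?-mix:...), so the outer i flag does not reach the interpolated part. The sketch below illustrates this with a made-up OPERATORS array standing in for LogicalNode::OPERATORS (whose real contents are not shown in the question). If case-insensitive matching of those operators is intended, interpolating .source is one way to keep it.

```ruby
# Hypothetical stand-in for LogicalNode::OPERATORS.
OPERATORS = %w[AND OR].freeze

# Interpolating the Regexp itself keeps its own (case-sensitive) flags.
embedded  = /\b(?:#{Regexp.union(OPERATORS)})\b/i
# Interpolating only the source lets the outer /i flag apply.
by_source = /\b(?:#{Regexp.union(OPERATORS).source})\b/i

puts embedded.match?('a AND b')   # => true
puts embedded.match?('a and b')   # => false
puts by_source.match?('a and b')  # => true

# Regexp.union escapes metacharacters in plain strings for you.
dotted = Regexp.union(['a.b', 'c'])
puts dotted.match?('a.b')         # => true
puts dotted.match?('axb')         # => false
```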