Ruby syntax highlighting is not working properly when using regexes.

  • Here is the Ruby syntax highlighting issue:

[image: screenshot of the highlighting issue]

It looks like multiple issues are happening here.

  1. String interpolation inside a regex seems to be interpreted as a comment (#), which breaks the highlighting for the rest of that line.
  2. The combination of " and ' on the string_literal line seems to break the highlighting from that point until the end of the file, which is much more serious.
  • Here is the example as code:
class Tokenizer
  def initialize(expression)
    @expression = expression
  end

  TOKEN_REGEX = /
    (?<whitespace>\s+) |
    (?<parenthesis>[\(\)]) |
    (?<comparison_operator>#{ComparisonNode::OPERATORS.map { |op| Regexp.escape(op) }.join('|')}) |
    (?<logical_operator>\b(?:#{LogicalNode::OPERATORS.join('|')})\b) |
    (?<boolean_literal>\b(?:#{ValueNode::BOOLEAN_LITERALS.join('|')})\b) |
    (?<number_literal>\d+) |
    (?<string_literal>"[^"]*"|'[^']*') |
    (?<identifier>[a-z_][a-z0-9_\.]*) |
    (?<unknown>.)
  /ix.freeze

  def tokenize
    tokens = []
    @expression.scan(TOKEN_REGEX) do
      match_data = Regexp.last_match
      if match_data[:whitespace]
        next
      elsif match_data[:parenthesis]
        tokens << Token.new(:parenthesis, match_data[0])
      elsif match_data[:comparison_operator]
        tokens << Token.new(ComparisonNode::TYPE, match_data[0])
      elsif match_data[:logical_operator]
        tokens << Token.new(LogicalNode::TYPE, match_data[0].upcase)
      elsif match_data[:boolean_literal]
        tokens << Token.new(:literal, match_data[0].downcase)
      elsif match_data[:number_literal]
        tokens << Token.new(:literal, match_data[0])
      elsif match_data[:string_literal]
        value = match_data[0][1...-1] # Remove surrounding quotes
        tokens << Token.new(:literal, value)
      elsif match_data[:identifier]
        tokens << Token.new(FieldNode::TYPE, match_data[0])
      else
        raise "Unexpected character: #{match_data[0]}"
      end
    end
    tokens
  end
end

This happens with the built-in Ruby syntax highlighting in Sublime Text 3 (Version 3.2.2, Build 3211). I tried installing Ruby-specific syntax highlighting packages that try to fix this issue, such as Sublime Better Ruby, but without success.

Has anyone else run into this issue? If so, how did you fix it? Thanks!

asked Nov 5, 2024 at 18:04
    FYI: I tried your code in Sublime Text build 4113, and there this problem does not happen. So it appears to be fixed in later versions. Commented Nov 6, 2024 at 20:44

1 Answer

Sublime Text's Ruby syntax takes the opinionated view that multi-line Regexps generally use the %r literal syntax.

So / / is only highlighted correctly if the leading and trailing forward slashes are on the same line.

This is shown in Ruby.sublime-syntax. I linked v3211 because that is your stated version, but the same applies to all versions up through v4108. It appears this was patched in v4109.

try-regex:
  # Generally for multiline regexes, one of the %r forms below will be used,
  # so we bail out if we can't find a second / on the current line
  - match: '\s*(/)(?![*+{}?])(?=.*/)'
    captures:
      1: string.regexp.classic.ruby punctuation.definition.string.ruby
    push:
      - meta_content_scope: string.regexp.classic.ruby
      - match: "(/)([eimnosux]*)"
        scope: string.regexp.classic.ruby
        captures:
          1: punctuation.definition.string.ruby
          2: keyword.other.ruby
        pop: true
      - include: regex-sub
  - match: ''
    pop: true
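
To see why the first line of a multi-line / / literal is rejected, here is a rough sketch that applies the same match pattern using Ruby's own regex engine. This only approximates what Sublime Text does: the \A anchor and the helper name opens_classic_regex? are my own assumptions for illustration and are not part of the syntax file.

```ruby
# Pattern copied from the try-regex context quoted above. The \A prefix is an
# assumption added here to mimic matching at the current position only.
TRY_REGEX = Regexp.new('\A' + '\s*(/)(?![*+{}?])(?=.*/)')

# Hypothetical helper: true when the text after an operator would be treated
# as the start of a classic /.../ regex literal under this rule.
def opens_classic_regex?(rest_of_line)
  TRY_REGEX.match?(rest_of_line)
end

single_line = ' /\d+/ix.freeze' # opening and closing slash on the same line
multi_line  = ' /'              # opening slash only, as in "TOKEN_REGEX = /"

puts opens_classic_regex?(single_line)
puts opens_classic_regex?(multi_line)
```

The lookahead (?=.*/) fails for the second string because no further slash follows on that line, so the rule bails out. That would explain why the lines after it are not scoped as regex content.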

Knowing this, you can alter your code to:

TOKEN_REGEX = %r{
  (?<whitespace>\s+) |
  (?<parenthesis>[\(\)]) |
  (?<comparison_operator>#{ComparisonNode::OPERATORS.map { |op| Regexp.escape(op) }.join('|')}) |
  (?<logical_operator>\b(?:#{LogicalNode::OPERATORS.join('|')})\b) |
  (?<boolean_literal>\b(?:#{ValueNode::BOOLEAN_LITERALS.join('|')})\b) |
  (?<number_literal>\d+) |
  (?<string_literal>"[^"]*"|'[^']*') |
  (?<identifier>[a-z_][a-z0-9_\.]*) |
  (?<unknown>.)
}ix.freeze

and the syntax highlighting works as expected.
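
If you want to reassure yourself that switching delimiters does not change the regex itself, a small check along these lines should do. The two patterns are simplified stand-ins that I made up for illustration, not the TOKEN_REGEX from the question.

```ruby
# Simplified, hypothetical patterns: same body, different literal delimiters.
SLASH_FORM = /
  (?<number>\d+) |
  (?<word>[a-z]+)
/ix

PERCENT_R_FORM = %r{
  (?<number>\d+) |
  (?<word>[a-z]+)
}ix

# Regexp#== compares the pattern source and the options.
puts SLASH_FORM == PERCENT_R_FORM

# Named captures behave the same way with the %r form.
puts 'Foo 42'.match(PERCENT_R_FORM)[:word]
```

One thing to watch for with %r{ }: unbalanced braces inside the pattern need escaping, in the same way that forward slashes need escaping inside / /.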


As an aside, Regexp::union provides a way to union an Array of values, so you don't need to join or escape them manually. This means you could just use:

 (?<comparison_operator>#{Regexp.union(ComparisonNode::OPERATORS)}) | 
 (?<logical_operator>\b(?:#{Regexp.union(LogicalNode::OPERATORS)})\b) |
 (?<boolean_literal>\b(?:#{Regexp.union(ValueNode::BOOLEAN_LITERALS)})\b) |
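
One caveat worth checking before adopting this. Interpolating a Regexp object embeds its to_s form, which carries its own flag group, so the outer i flag no longer applies inside it. If you rely on case-insensitive matching (the .upcase and .downcase calls in your tokenizer suggest you might), interpolating .source instead should keep the outer flags in effect. A minimal sketch, where OPERATORS is a hypothetical stand-in for LogicalNode::OPERATORS:

```ruby
# Hypothetical stand-in for LogicalNode::OPERATORS from the question.
OPERATORS = %w[AND OR].freeze

union = Regexp.union(OPERATORS)

# Interpolating the Regexp itself embeds union.to_s, which includes an
# inline flag group that switches the i flag off inside that group.
embedded_regexp = /\b(?:#{union})\b/ix

# Interpolating only the source string keeps the outer flags in effect.
embedded_source = /\b(?:#{union.source})\b/ix

puts union.to_s
puts embedded_regexp.match?('and')
puts embedded_source.match?('and')
```

Regexp.union still escapes each string for you in both cases, so the only difference is whether the outer flags reach the interpolated part.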
answered Nov 5, 2024 at 19:02

1 Comment

Thanks! That worked! Very good and detailed answer 👑
