-
-
Notifications
You must be signed in to change notification settings - Fork 7k
Fix keyword loading to use any whitespace as separator #6693
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Instead of forcing keywords.txt to use tabs, let library developers use spaces too.
bperrybap
commented
Oct 15, 2017
So is this finally going to be fixed after years of neglect?
This PR was merged in beta-1.9 branch as 13824d7 , if no user will report any problem we are going to proceed merging it into master soon.
merged in eed9e70
@bhagman @bperrybap
unfortunately it's not as easy as it seems, the "old" keyword.txt format uses \t
as a separator (not as a whitespace), if a row contains two consecutive tabs like A\t\tB
it means that there are 3 fields: "A"
, empty string ""
and "B"
.
Some examples are:
operator LITERAL1 RESERVED_WORD_2
enum LITERAL1 RESERVED_WORD_2
delete LITERAL1 RESERVED_WORD_2
bool LITERAL1 RESERVED_WORD_2
double LITERAL1 RESERVED_WORD_2
null LITERAL1 RESERVED_WORD_2
NULL LITERAL1 RESERVED_WORD_2
new LITERAL1 RESERVED_WORD_2
private LITERAL1 RESERVED_WORD_2
protected LITERAL1 RESERVED_WORD_2
public LITERAL1 RESERVED_WORD_2
short LITERAL1 RESERVED_WORD_2
signed LITERAL1 RESERVED_WORD_2
true LITERAL1 LITERAL_BOOLEAN
unsigned LITERAL1 RESERVED_WORD_2
word LITERAL1 RESERVED_WORD_2
Keyboard KEYWORD1 DATA_TYPE
Mouse KEYWORD1 DATA_TYPE
override KEYWORD3 RESERVED_WORD
final KEYWORD3 RESERVED_WORD
goto KEYWORD3 RESERVED_WORD
throw KEYWORD3 RESERVED_WORD
try KEYWORD3 RESERVED_WORD
export KEYWORD3 RESERVED_WORD
this is the IDE before the patch:
and this is after the patch:
as you can see the color coding is not taken but instead is interpreted as a reference-link (3rd position instead of 4th) this is confirmed if I right-click on the operator
word and select "Find in reference":
this failure is also reported by the PdeKeywordsTest test:
java.lang.AssertionError: expected:<IncrementCompound> but was:<null>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
-> at processing.app.syntax.PdeKeywordsTest.testKeywordsTxtParsing(PdeKeywordsTest.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod1ドル.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner3ドル.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner1ドル.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access000ドル(ParentRunner.java:53)
at org.junit.runners.ParentRunner2ドル.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:539)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:761)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:461)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:207)
I'm going to revert this patch, until we find a different solution.
At this point I don't know if a backward-compatible solution may be find, maybe the best way forward is to keep the current format and provide a better documentation?
bperrybap
commented
Feb 19, 2018
@cmaglie
I'm not understanding this:
if a row contains two consecutive tabs like A\t\tB it means that there are 3 fields: "A", empty string "" and "B"
Regardless of how the current code is parsing it, shouldn't a line like that actually mean two fields "A" and "B" ?
So are you saying that the keywords file format intentionally supports the ability to skip fields or just that the current parsing code to parse it is broken?
Off topic but, but your comments seem to suggest that there are fields that I'm not familiar with.
Is there any documentation on the keywords file?
I've not been able to find any.
Regardless of the number of fields in the keywords file,
it would seem better for the keywords file format to NOT support blank/null fields and use whitespace as a field separator.
Supporting null/blank fields is mess and forever precludes the ability to use whitespace as a field separator.
When using whitespace as a field seperator, you won't have empty fields anymore since whitespace means 1 or more of either or
It appears that the patch used didn't use whitespace as a separator but rather simply used a single space or a single tab which is not the same as whitespace.
So are you saying that the keywords file format intentionally supports the ability to skip fields or just that the current parsing code to parse it is broken?
Yes, there is the ability to skip a field using two tabs as in my A\t\tB
example. This is how the parser works now, it's counter-intuitive I know, in fact I discovered it after merging this PR.
We can argue if this is nice or bad, but this is not something we can change now because hundreds of keywords.txt
are based on that parsing, so we should support the old format.
Anyway let's continue the discussion in #7245 that looks more promising.
bperrybap
commented
Feb 19, 2018
What is most important is the intended file format not the way the code may be parsing it.
Was the old format intentionally allowing blank fields? And that old format must be supported?
If so, then it is over since you can never use whitespace as a field separator.
IMO, supporting blank fields is a bad idea.
The only documentation I know of is here:
https://github.com/arduino/Arduino/wiki/Arduino-IDE-1.5:-Library-specification#keywords
and a brief mention here:
https://www.arduino.cc/en/Hacking/LibraryTutorial
Note they both do mention a tab separator. After looking at hundreds of keywords.txt files I can tell you that the use of spaces instead of tabs is very common (~15% of libraries). Of the rest, the great majority use multiple tabs as a separator (even though the documentation doesn't specify that being supported) and disturbingly high number have a random mixture of tabs and spaces.
I don't understand the purpose of the multiple identifiers per keyword used in the IDE's global keywords.txt. If relevant to libraries, it would be nice if that usage was documented on the library specification page. I'm happy to do the writing but can't without understanding how it works.
bperrybap
commented
Feb 19, 2018
It is really messy since depending on the version of the IDE the parsers were changed/modified which alters the expected format and what is supported.
I think the library spec page should be updated to reflect fields supported and the expected format, even if the format (field separators) actually supported under the hood is a bit more liberal.
Currently, I've not found any information other than Cristian's recent comment here about the file format and fields supported:
#7245 (comment)
Thanks to @per1234 to point me to this issue.
I didn't realize about the fact that is completely mandatory, inside keywords file, to use tabulator as member separator. First by the fact that the IDE provides a poor variety of code editors capabilities, and as I'm used to Sublime Text, it add 0 work to see a function as a function, or a variable as a variable.
Being a tabulator a (extremely old) type of CSV file, and giving tons of headaches along lot and lot of years because it's a width-variable and non-visible character. Why is still in use, and moreover as a unique and a mandatory way to create keywords in Arduino?
As a CSV-like file (which means positional members with the error-prone that it directly inferes), Why is not at least selectable which character to use for fields separators? If keywords does not allow spaces, what is the sense of use a non-space character for that?
There is a lot of ways to define keywords, in use in top editors:
- http://git.savannah.gnu.org/cgit/nano.git/tree/syntax/c.nanorc
- http://vimdoc.sourceforge.net/htmldoc/syntax.html
- https://www.gnu.org/software/emacs/manual/html_node/elisp/Customizing-Keywords.html
- https://www.sublimetext.com/docs/3/syntax.html
- https://macromates.com/manual/en/language_grammars
- https://atom.io/packages/highlight-registered-keyword
Not a single one use positional fields for that.
And just offtopic (as we're not talking about indentation), but nice to read:
- https://www.python.org/dev/peps/pep-0008/#tabs-or-spaces
- https://stackoverflow.blog/2017/06/15/developers-use-spaces-make-money-use-tabs/
Come on Arduino! Great devices, great software, great libraries ... but this as the unique way for syntax highlight of custom keywords?
My support to @per1234, hard work he does!
In a bit over 35 years of using s/w on many operating systems. The only tool and files that I have ever seen that have used such a rigid format for separators is make. makefiles are stupidly rigid and use a single tab field as separator for rules.
The format of the keyword file is even worse, in that not only is it using an invisible field separator, but the format of the file is not formally defined and its format has changed over the years making it impossible to create something that is fully compatible with all possible file formats from the past.
IMO, it is time to move to a new/extended format.
A format that is simple and mostly compatible with the existing format.
My suggestion would be to put a version #, key, or identifier on the first line so that the new format could be detected and then used.
If the identifier is not there, fall back to the older/current form of parsing.
The new format should be VERY liberal, and allow whitespace or a comma as field separators and NOT allow blank/missing/skipped fields.
That way, nearly everybody should be able to get something they like and get things to look "pretty" in the file.
single tab, multiple tabs, single space, multiple spaces, single comma, or any combination of those would be a single field separator.
i.e.
<tab><space><space><tab><comma>
would be a single field separator.
If you want/need to skip a field or have a null field, use commas and put nothing between the two commas for that field.
That way it is obvious that the field is not being used.
The point is to be very liberal in the format to support a few different styles of field separation all while keeping the parsing very simple.
But mainly to totally eliminate the use of hidden/invisible non used fields which is what is creating all the issues.
bperrybap
commented
May 28, 2019
Oh and once there is a version identifier in the file, it would be easy to change formats in the future and or add support for additional formats.
The other thing to consider is that this file is not really intended for end user consumption so it doesn't necessarily have to be a simple field separated format like it is today.
Instead of forcing keywords.txt to use tabs, let library developers use spaces too.