Fix keyword loading to use any whitespace as separator #6693

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Closed

bhagman wants to merge 1 commit into arduino:master from bhagman:keyword-load-fix

Closed

Fix keyword loading to use any whitespace as separator #6693

bhagman wants to merge 1 commit into arduino:master from bhagman:keyword-load-fix

Conversation

bhagman

Copy link

Contributor

@bhagman bhagman commented Sep 4, 2017

Instead of forcing keywords.txt to use tabs, let library developers use spaces too.

@bhagman


 Fix keyword loading to use any whitespace as separator

7f2e52a

Instead of forcing keywords.txt to use tabs, let library developers use spaces too.

@bhagman bhagman mentioned this pull request

Sep 4, 2017

Use correct separator in keywords.txt bhagman/Tone#6

Open

@facchinm facchinm added the feature request label

Sep 4, 2017

@facchinm facchinm added this to the Next milestone

Sep 7, 2017

@per1234 per1234 mentioned this pull request

Oct 14, 2017

Multiple white spaces can not be used in keywords.txt #2435

Closed

@bperrybap

Copy link

bperrybap commented Oct 15, 2017

So is this finally going to be fixed after years of neglect?

@facchinm

Copy link

Member

facchinm commented Oct 16, 2017

This PR was merged in beta-1.9 branch as 13824d7 , if no user will report any problem we are going to proceed merging it into master soon.

@cmaglie

Copy link

Member

cmaglie commented Oct 20, 2017

merged in eed9e70

@cmaglie cmaglie closed this

Oct 20, 2017

@cmaglie cmaglie modified the milestones: Next, Release 1.8.6

Oct 20, 2017

@cmaglie

Copy link

Member

cmaglie commented Feb 19, 2018

@bhagman @bperrybap
unfortunately it's not as easy as it seems, the "old" keyword.txt format uses \t as a separator (not as a whitespace), if a row contains two consecutive tabs like A\t\tB it means that there are 3 fields: "A", empty string "" and "B".

Some examples are:

operator	LITERAL1		RESERVED_WORD_2
enum	LITERAL1		RESERVED_WORD_2
delete	LITERAL1		RESERVED_WORD_2
bool	LITERAL1		RESERVED_WORD_2
double	LITERAL1		RESERVED_WORD_2
null	LITERAL1		RESERVED_WORD_2
NULL	LITERAL1		RESERVED_WORD_2
new	LITERAL1		RESERVED_WORD_2
private	LITERAL1		RESERVED_WORD_2
protected	LITERAL1		RESERVED_WORD_2
public	LITERAL1		RESERVED_WORD_2
short	LITERAL1		RESERVED_WORD_2
signed	LITERAL1		RESERVED_WORD_2
true	LITERAL1		LITERAL_BOOLEAN
unsigned	LITERAL1		RESERVED_WORD_2
word	LITERAL1		RESERVED_WORD_2
Keyboard	KEYWORD1		DATA_TYPE
Mouse	KEYWORD1		DATA_TYPE
override	KEYWORD3		RESERVED_WORD
final	KEYWORD3		RESERVED_WORD
goto	KEYWORD3		RESERVED_WORD
throw	KEYWORD3		RESERVED_WORD
try	KEYWORD3		RESERVED_WORD
export	KEYWORD3		RESERVED_WORD

this is the IDE before the patch:

image

and this is after the patch:

image

as you can see the color coding is not taken but instead is interpreted as a reference-link (3rd position instead of 4th) this is confirmed if I right-click on the operator word and select "Find in reference":

image

this failure is also reported by the PdeKeywordsTest test:

java.lang.AssertionError: expected:<IncrementCompound> but was:<null>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:144)
->	at processing.app.syntax.PdeKeywordsTest.testKeywordsTxtParsing(PdeKeywordsTest.java:49)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod1ドル.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner3ドル.run(ParentRunner.java:238)
	at org.junit.runners.ParentRunner1ドル.schedule(ParentRunner.java:63)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
	at org.junit.runners.ParentRunner.access000ドル(ParentRunner.java:53)
	at org.junit.runners.ParentRunner2ドル.evaluate(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:539)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:761)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:461)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:207)

I'm going to revert this patch, until we find a different solution.

At this point I don't know if a backward-compatible solution may be find, maybe the best way forward is to keep the current format and provide a better documentation?

@cmaglie cmaglie removed this from the Release 1.8.6 milestone

Feb 19, 2018

cmaglie added a commit that referenced this pull request

Feb 19, 2018

@cmaglie


 Revert "Fix keyword loading to use any whitespace as separator"

0175a4e

This reverts commit eed9e70.
See #6693

@bhagman bhagman mentioned this pull request

Feb 19, 2018

Allow janky empty fields within double tabs in keywords.txt files #7245

Closed

@bperrybap

Copy link

bperrybap commented Feb 19, 2018

@cmaglie
I'm not understanding this:

if a row contains two consecutive tabs like A\t\tB it means that there are 3 fields: "A", empty string "" and "B"

Regardless of how the current code is parsing it, shouldn't a line like that actually mean two fields "A" and "B" ?

So are you saying that the keywords file format intentionally supports the ability to skip fields or just that the current parsing code to parse it is broken?

Off topic but, but your comments seem to suggest that there are fields that I'm not familiar with.
Is there any documentation on the keywords file?
I've not been able to find any.

Regardless of the number of fields in the keywords file,
it would seem better for the keywords file format to NOT support blank/null fields and use whitespace as a field separator.
Supporting null/blank fields is mess and forever precludes the ability to use whitespace as a field separator.

When using whitespace as a field seperator, you won't have empty fields anymore since whitespace means 1 or more of either or

It appears that the patch used didn't use whitespace as a separator but rather simply used a single space or a single tab which is not the same as whitespace.

@cmaglie

Copy link

Member

cmaglie commented Feb 19, 2018

So are you saying that the keywords file format intentionally supports the ability to skip fields or just that the current parsing code to parse it is broken?

Yes, there is the ability to skip a field using two tabs as in my A\t\tB example. This is how the parser works now, it's counter-intuitive I know, in fact I discovered it after merging this PR.

We can argue if this is nice or bad, but this is not something we can change now because hundreds of keywords.txt are based on that parsing, so we should support the old format.

Anyway let's continue the discussion in #7245 that looks more promising.

@bperrybap

Copy link

bperrybap commented Feb 19, 2018

What is most important is the intended file format not the way the code may be parsing it.
Was the old format intentionally allowing blank fields? And that old format must be supported?
If so, then it is over since you can never use whitespace as a field separator.
IMO, supporting blank fields is a bad idea.

@per1234

Copy link

Collaborator

per1234 commented Feb 19, 2018

The only documentation I know of is here:
https://github.com/arduino/Arduino/wiki/Arduino-IDE-1.5:-Library-specification#keywords
and a brief mention here:
https://www.arduino.cc/en/Hacking/LibraryTutorial

Note they both do mention a tab separator. After looking at hundreds of keywords.txt files I can tell you that the use of spaces instead of tabs is very common (~15% of libraries). Of the rest, the great majority use multiple tabs as a separator (even though the documentation doesn't specify that being supported) and disturbingly high number have a random mixture of tabs and spaces.

I don't understand the purpose of the multiple identifiers per keyword used in the IDE's global keywords.txt. If relevant to libraries, it would be nice if that usage was documented on the library specification page. I'm happy to do the writing but can't without understanding how it works.

@bperrybap

Copy link

bperrybap commented Feb 19, 2018

It is really messy since depending on the version of the IDE the parsers were changed/modified which alters the expected format and what is supported.
I think the library spec page should be updated to reflect fields supported and the expected format, even if the format (field separators) actually supported under the hood is a bit more liberal.
Currently, I've not found any information other than Cristian's recent comment here about the file format and fields supported:
#7245 (comment)

@mfalkvidd mfalkvidd mentioned this pull request

Apr 1, 2018

TinyGSM Gateway for MySensors mysensors/MySensors#1097

Merged

@per1234 per1234 mentioned this pull request

May 28, 2019

Use correct field separator in keywords.txt CieNTi/arduino-IoTesla-client#1

Closed

CieNTi added a commit to CieNTi/arduino-IoTesla-client that referenced this pull request

May 28, 2019

@CieNTi


 Removing keywords.txt until TAB is not required anymore

5128a80

References:
- #1
- arduino/Arduino#6693

@CieNTi

Copy link

CieNTi commented May 28, 2019 •

edited

Loading

Thanks to @per1234 to point me to this issue.

I didn't realize about the fact that is completely mandatory, inside keywords file, to use tabulator as member separator. First by the fact that the IDE provides a poor variety of code editors capabilities, and as I'm used to Sublime Text, it add 0 work to see a function as a function, or a variable as a variable.

Being a tabulator a (extremely old) type of CSV file, and giving tons of headaches along lot and lot of years because it's a width-variable and non-visible character. Why is still in use, and moreover as a unique and a mandatory way to create keywords in Arduino?

As a CSV-like file (which means positional members with the error-prone that it directly inferes), Why is not at least selectable which character to use for fields separators? If keywords does not allow spaces, what is the sense of use a non-space character for that?

There is a lot of ways to define keywords, in use in top editors:

Not a single one use positional fields for that.

And just offtopic (as we're not talking about indentation), but nice to read:

Come on Arduino! Great devices, great software, great libraries ... but this as the unique way for syntax highlight of custom keywords?

My support to @per1234, hard work he does!

@bperrybap

Copy link

bperrybap commented May 28, 2019 •

edited

Loading

In a bit over 35 years of using s/w on many operating systems. The only tool and files that I have ever seen that have used such a rigid format for separators is make. makefiles are stupidly rigid and use a single tab field as separator for rules.

The format of the keyword file is even worse, in that not only is it using an invisible field separator, but the format of the file is not formally defined and its format has changed over the years making it impossible to create something that is fully compatible with all possible file formats from the past.

IMO, it is time to move to a new/extended format.
A format that is simple and mostly compatible with the existing format.

My suggestion would be to put a version #, key, or identifier on the first line so that the new format could be detected and then used.
If the identifier is not there, fall back to the older/current form of parsing.

The new format should be VERY liberal, and allow whitespace or a comma as field separators and NOT allow blank/missing/skipped fields.

That way, nearly everybody should be able to get something they like and get things to look "pretty" in the file.
single tab, multiple tabs, single space, multiple spaces, single comma, or any combination of those would be a single field separator.
i.e.

<tab><space><space><tab><comma>

would be a single field separator.

If you want/need to skip a field or have a null field, use commas and put nothing between the two commas for that field.
That way it is obvious that the field is not being used.

The point is to be very liberal in the format to support a few different styles of field separation all while keeping the parsing very simple.
But mainly to totally eliminate the use of hidden/invisible non used fields which is what is creating all the issues.

@bperrybap

Copy link

bperrybap commented May 28, 2019

Oh and once there is a version identifier in the file, it would be easy to change formats in the future and or add support for additional formats.
The other thing to consider is that this file is not really intended for end user consumption so it doesn't necessarily have to be a simple field separated format like it is today.

@per1234 per1234 added the Type: Invalid label

Mar 13, 2023

Labels

feature request Type: Invalid

6 participants

@bhagman @bperrybap @facchinm @cmaglie @per1234 @CieNTi

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Fix keyword loading to use any whitespace as separator #6693

Fix keyword loading to use any whitespace as separator #6693

Uh oh!

Conversation

@bhagman bhagman commented Sep 4, 2017

Uh oh!

bperrybap commented Oct 15, 2017

Uh oh!

facchinm commented Oct 16, 2017

Uh oh!

cmaglie commented Oct 20, 2017

Uh oh!

cmaglie commented Feb 19, 2018

Uh oh!

bperrybap commented Feb 19, 2018

Uh oh!

cmaglie commented Feb 19, 2018

Uh oh!

bperrybap commented Feb 19, 2018

Uh oh!

per1234 commented Feb 19, 2018

Uh oh!

bperrybap commented Feb 19, 2018

Uh oh!

CieNTi commented May 28, 2019 •

edited

Loading

Uh oh!

Uh oh!

bperrybap commented May 28, 2019 •

edited

Loading

Uh oh!

Uh oh!

bperrybap commented May 28, 2019

Uh oh!

Uh oh!

Uh oh!

Fix keyword loading to use any whitespace as separator #6693

Fix keyword loading to use any whitespace as separator #6693

Uh oh!

Conversation

@bhagman bhagman commented Sep 4, 2017

Uh oh!

bperrybap commented Oct 15, 2017

Uh oh!

facchinm commented Oct 16, 2017

Uh oh!

cmaglie commented Oct 20, 2017

Uh oh!

cmaglie commented Feb 19, 2018

Uh oh!

bperrybap commented Feb 19, 2018

Uh oh!

cmaglie commented Feb 19, 2018

Uh oh!

bperrybap commented Feb 19, 2018

Uh oh!

per1234 commented Feb 19, 2018

Uh oh!

bperrybap commented Feb 19, 2018

Uh oh!

CieNTi commented May 28, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bperrybap commented May 28, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bperrybap commented May 28, 2019

Uh oh!

Uh oh!

CieNTi commented May 28, 2019 •

edited

Loading

bperrybap commented May 28, 2019 •

edited

Loading