Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Accurate column error info with tabs. Avoid copy of pattern string. #178

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
shewitt-au wants to merge 18 commits into WerWolv:master
base: master
Choose a base branch
Loading
from shewitt-au:accurate-column

Conversation

Copy link
Contributor

@shewitt-au shewitt-au commented Jul 10, 2025
edited
Loading

Problem description

The lexer does not handle handle tabs correctly. In some cases this results in incorrect columns in errror messages. Currently PatternLanguage::executeString does a rewrite of the pattern string like this:

code = wolv::util::replaceStrings(code, "\r\n", "\n");
code = wolv::util::replaceStrings(code, "\t", " ");

This does not handle pattern files like this:

#pragma author WerWolv
#pragma description Microsoft PE Portable Executable (exe/dll)
#pragma MIME application/x-dosexec
#pragma MIME application/x-msdownload
#pragma MIME application/vnd.microsoft.portable-executable
import std.core;
import std.string;
import type.guid;
import type.time;
//3456789
;;**destruct DOSHeader {

Note:
"**" is a single tab char (hard coded to four spaces in ImHex). It is shown as two characters to illustrate the space it occupies.

Error:

E: error: Type destruct has not been declared yet.
E: --> in <Source Code>:13:7
E: 13 | ;; destruct DOSHeader {
E: ^^^^^^^^

Link to org fix for this issue:
f29a7fd#diff-715e5fce329a964b5f0cae4fa2d45a4273ba388614701f95a33a0e7750b1e9f9

Implementation description

The lexer has been modifed to handle tabs. Multiple newline encoding support has also been added. This fixes the wrong columns in errors as described above, and as a bonus removes the need to copy the pattern-code string in PatternLanguage::executeString, and instead use a const &

Other stuff

In the lexer before these changes:
m_longestLineLength
The line length of the first line of a pattern file is one char short (compared to subsequent lines). This is due to m_lineBegin pointing to the \n of the previous line, except for the first line, where it points to the first char of the line. This is special-cased in Lexer::location() but not handled in the longest-line-length code.

@shewitt-au shewitt-au marked this pull request as ready for review July 11, 2025 19:00
@shewitt-au shewitt-au changed the title (削除) Add code from dev branch (削除ここまで) (追記) Accurate columns error info with tabs (追記ここまで) Jul 11, 2025
@shewitt-au shewitt-au changed the title (削除) Accurate columns error info with tabs (削除ここまで) (追記) Accurate column error info with tabs. Avoid string copy of pattern. (追記ここまで) Jul 11, 2025
@shewitt-au shewitt-au changed the title (削除) Accurate column error info with tabs. Avoid string copy of pattern. (削除ここまで) (追記) Accurate column error info with tabs. Avoid copy of pattern string. (追記ここまで) Jul 11, 2025
Copy link
Collaborator

paxcut commented Jul 12, 2025

Tabs and newlines are not supported by the pattern language so it is not necessary to create tokens for them. the code you pointed out as being responsible of handling tabs is really not where that happens because that code doesn't work. it swaps a \t char with 4 spaces always regardless of the location of the /t char so in your example it wouldn't have added 2 spaces, it always adds 4. Tabs are eliminated using a function that swaps tabs for spaces in string utilities and the source code is always processed before it even gets sent out to the editor. Pasting tabs or code that has tabs is also processed before being sent and even if you import files that contain tabs in them they get processed before being sent to the editor.

I can't find any evidence of the errors not handling spaces together in your example or any other file that contains errors in them. Here are some pics of your example results in various versions of ImHex:
1.37.4 Release:
image

1.38.0 nightly from July 7 (the colors are wrong on purpose for unrelated reasons)
image

Local build pf feature branch SyntaxHighlights
image

There is only one small issue with misaligned arrows when the error line length is longer than 40 chars where an extra space is added to the arrows padding which has been fixed locally and will be committed soon.

The calculation of the longest line being off by one for the first line is being handled correctly and produces no bugs. The longest line calculation is only used to determine if horizontal scrolling is necessary. If the first line is not the only one then it can't be the longest line if it is empty. If it is the only one a length of 0 or 1 will not result in horizontal scrolling being displayed because the editor windows can't be made that small.

Why are any of the changes in this PR necessary?

Copy link
Collaborator

@paxcut paxcut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original intent of this pr of correcting the error formatting doesn't seem to exist so most of the changes proposed are not necessary. the only changes I think are useful would be the removal of the unnecessary editing in execute sting that allows the code parameter to become constant.

Copy link
Contributor Author

shewitt-au commented Jul 12, 2025
edited
Loading

@paxcut

I can't find any evidence of the errors not handling spaces together in your example or any other file

Perhaps I should have been more specific. Here's how I get the issue:

  • Open a file (I use a PE file)
  • Open the pattern file that already has tab in it.

That last bit is important. I modified pe.hexpat to contains tabs outside of ImHex.

Why are any of the changes in this PR necessary?

I guess they're not necessary. Column numbers being off in certain edge cases is not the end of the world. But given it was quite easy to fix I gave it a go. In my opinion handling the character level parsing belongs in the lexer. I believe it to be more robust.

Copy link
Collaborator

paxcut commented Jul 12, 2025

That last bit is important. I modified pe.hexpat to contains tabs outside of ImHex.

Why sis you modify it? pe.hexpat is known to have been created using tabs and it is stored with tabs. It was used as input example when the changes to eliminate tabs from the text editor were made. It is currently impossible to have an opened file in the text editor that contains tabs. Allowing tabs to be added to the test editor creates many more problems that the one you encountered so they are not allowed to be used and are always converted to spaces before the code is added to the text editor.

I guess they're not necessary. Column numbers being off in certain edge cases is not the end of the world. But given it was quite easy to fix I gave it a go.

If column numbers were wrong under any circumstance then that would require changes to make sure they are correct. I reran the tests in all three versions of imhex using pe.hexpat and in all cases the column numbers were correct. When tabs are introduced in the text editor it is easy to detect them because using the left and right arrow keys skip the entire tab but with spaces you move one char at a time.If tabs were introduced the changes here alone would not have fixed all the problems that tabs create so the only possible way to fix all the issues tabs create is to ensure they are never added and are always converted to spaces beforehand.

In my opinion handling the character level parsing belongs in the lexer. I believe it to be more robust.

That's not an opinion, that's a know fact about lexers. Lexers convert the character stream into token streams so they need to parse the character stream. Languages also have character sets that list all possible characters that can be used to create valid language statements. Tab characters are not part of the pattern language character set so adding support for them in the lexer makes no sense.

The changes to remove tabs were done months ago and all the tests I run in previous versions show the right column numbers in the errors using any known form to add code in the text editor and trying to create a tab character. you can import, paste , type or drop text into the editor and the tabs will be converted into spaces before the editor gets to display them.

That being said it is possible that some of the changes never made it to master but a few hours ago I updated master to include all the changes and fixes that I had in my local build so now they should be included without any doubt.

Copy link
Contributor Author

shewitt-au commented Jul 12, 2025
edited
Loading

Ok, I'll take this a bit at a time. Sorry if that's a pain:

paxcut

Why sis you modify it? pe.hexpat is known to have been created using tabs and it is stored with tabs. It was used as input example when the changes to eliminate tabs from the text editor were made.

I modified it (outside of ImHex) for two reasons. The first was for testing purposes. The second is that, and I mean no insult here, there are better text editors than ImHex's.

Currently ImHex does not supprt tabs, at least not correctly. A tab (4 spaces) on column 1 should take us to column 5. A tab on column 2, also to column 5. Currently, a tab on column 2 will take us to column 6.

#pragma author Me
u32 a = 1+1;
;****struct

**** = single tab. Should be 3 spaces because of the semicolon. A text search and replace does not account for this so it's 4.

Error message

E: error: Invalid function statement.
E: --> in <Source Code>:3:6
E: 3 | ; struct
E: ^^^^^^
I: Evaluation took 0.0037951s

The lexer is where this can be fixed.

Allowing tabs to be added to the test editor creates many more problems that the one you encountered

I sympathise here. Problems are obvious a pain in the ass. And it's more than possible I'm underestimating them. That said a text edit should support tabs. There seems to be a gentlemen's agreement between the lexer and the editor: I don't parse tabs, so don't give them to me. This is sematic coupling. If in future the pattern editor is upgraded to support tabs (and a programmer's editor should) now the lexer and the editor need to be changed. Also on this, I've read requests from users asking for featues like auto-reload so they can use an external editor. Closing the gap between the editor of choice and ImHex's should be made easier if possible. Finally on this, I've not actually changed the editor's behaviour. But, hopefully, the lexer will be more cooperative if it is in future.

That's not an opinion, that's a know fact about lexers. Lexers convert the character stream into token streams so they need to parse the character stream. Languages also have character sets that list all possible characters that can be used to create valid language statements. Tab characters are not part of the pattern language character set so adding support for them in the lexer makes no sense.

Respectfully, I think the pattern language should support tabs. Adding support for them, to me at least, makes perfect sense. Does the pattern language CLI do this string filtering? What if it is decided to add support for tabs in future? What if the pattern language is used in another project (it is a separate repo)?

Later...

$ ./plcli format "C:\projects\SteveImHex\build-release\install\imhex-gui.exe" "C:\projects\SteveImHex\build-debug\install\patterns\pe.hexpat"
Compilation failed
error: Invalid function statement.
 --> in C:\projects\SteveImHex\build-debug\install\patterns\pe.hexpat:3:6
3 | ; struct
 ^^^^^^

Looks like it does do the filtering. Still the wrong column though.

Copy link
Collaborator

paxcut commented Jul 12, 2025

I modified pe.hexpat to contains tabs outside of ImHex.
I modified it (outside of ImHex) for two reasons. The first was for testing purposes.

Right, except the pe pattern already uses tab characters in all indented lines so I can't see how you modified it to contain tabs.

The second is that, and I mean no insult here, there are better text editors than ImHex's.

We can agree on something at least. Being the only person currently working on upgrading the editor I could definitely use some help adding things which nobody is going to argue against the need to adding them or working on them.

Currently ImHex does not supprt tabs, at least not correctly. A tab (4 spaces) on column 1 should take us to column 5. A tab on column 2, also to column 5. Currently, a tab on column 2 will take us to column 6.

I'm not sure I follow what this sentence means. Pressing the tab key on column 1,2,3 and 4 takes you to the exact same location in the text editor so you must be meaning something else. I don't get those error messages with the wrong columns and I can't insert tab characters (\t) in the text editor.

The text editor doesn't accept tabs, period. there is no agreement and no coupling because tabs never make it to the editor. Text editors don't need to support tabs. A tab is a one character sequence that stands for 0, 1, 2 or 3 spaces depending on its location on the line. It is very easy to convert tabs to spaces and it can be done on the fly as the editor is receiving inputs from any source. This is what the text editor does and it guarantees that there will be no tab chars in the character stream. Any input mode will be patched to prevent tabs from entering the character stream. The text editor will never need to support tabs so they will never be added. The rest of your argument about closing gaps and modifying the lexer does not apply then.

Respectfully, I think the lexer should support tabs. The lexer parses files and tabs live in some files.

Respectfully no. In all cases the preprocessor makes sure that tabs are removed from the character stream.

Long ago, when errors related to tabs started to pop up all over the place there was a series of commits dedicated at eliminating tabs from the editor. The problem was resolved and the case was closed. If there are still problems they will be addressed the same way. You say the lexer should support tabs but the reason you give can be easily worked around and you don't address any of the bad consequences that adding tab support will bring. I can't see any advantage of using tabs over spaces and I mean real advantages that cannot be obtained otherwise.

If there is a bug please post all the steps needed to create it and please try the latest nightly first to make sure it hasn't been fixed already. My review of the PR stands.

Copy link
Contributor Author

shewitt-au commented Jul 12, 2025
edited
Loading

@paxcut

I'm not sure I follow what this sentence means. Pressing the tab key on column 1,2,3 and 4 takes you to the exact same location in the text editor so you must be meaning something else. I don't get those error messages with the wrong columns and I can't insert tab characters (\t) in the text editor.

I'm loading a pattern that already contains tabs, not adding tabs in the ImHex editor. Take care not to let the build overwrite your pattern. I did explain this. I'll try to be more clear in future.

More speficially, a character and then a tab. The extra char before the tab skewes the error column.

The example files I gave are pre-existing, not altered in the ImHex editor.

Copy link
Contributor Author

shewitt-au commented Jul 12, 2025
edited
Loading

@paxcut

In all cases the preprocessor makes sure that tabs are removed from the character stream.

As far as I can tell lexing happens before preprocessing. Unless you mean the string search-and replaces.

On a side note I like the lex before preprocess approach.

And the lexer does get tabs. From ImHex there are multiple paths to the lexer.

void ViewPatternEditor::loadPatternFile(const std::fs::path &path, prv::Provider *provider) {
 wolv::io::File file(path, wolv::io::File::Mode::Read);
 if (file.isValid()) {
 auto code = file.readString();
 // Debugging
 std::cout << code << std::endl;
 // TODO: lextwice: we lex twice here! 1st lex.
 this->evaluatePattern(code, provider); // TODO: This seems to be highlighting.
 m_textEditor.get(provider).SetText(code, true);
 m_sourceCode.get(provider) = code;
 // TODO: lextwice: we lex twice here! 2nd lex.
 TaskManager::createBackgroundTask("hex.builtin.task.parsing_pattern", [this, code, provider](auto&) { this->parsePattern(code, provider); });
 }
 }

evaluatePattern and parsePattern both lex. Only one removes tabs. That's how I discovered this issue (by examining the token streams from both lexes).

Copy link
Collaborator

paxcut commented Jul 12, 2025

There are lots of steps you didn't explain on how to reproduce the problem. The pe.hexpat file already contains tabs as I have explained. change the first line that contains a tab to have and error, the -> indicates the presence of a tab in original file. open it in notepad++ and show all characters to confirm.

->char signature[2] [[hex::spec_name("e_magic")]];

into

->c har signature[2] [[hex::spec_name("e_magic")]];

import the file and the column error is 5. This clearly shows that the problem you are seeing is not occurring where you think it is else the lexer would have chocked on the tab and give a value of 2 for the column. The fact is that even for imported files the tabs are also replaced with spaces. I explained this and showed how to check if there are tabs or spaces by following the cursor using the left and right arrows. tabs if present will jump the length of the tab, spaces will move one char at a time.

Now lets look at your error. You neglected to say that the imported file had to be changed so that it contained two valid characters, followed by a tab and followed by valid characters that will trigger an error when evaluating. Under those conditions I see a problem but not where you thing it is at. the column is erroneously set to 7 but notice that the gap in the error line is longer than the one in the test editor.

The gap is indeed 4 spaces which explains why you are getting a column of 7. Why there are 4 spaces in the error is because the tab interception code is not being applied to the code that generates the error strings. That code reads the input file and not the text editor and incorrectly sets the tab to 4 chars instead of two.

Although this is the best explanation I can come up with right now, I suspect there is more to it than that because it doesn't explain how changes to the lexer can revert the effect of the error string processor if indeed it is reading the text from the file and not the editor's buffer. I need some time to look into it.

Copy link
Contributor Author

shewitt-au commented Jul 12, 2025
edited
Loading

Take this with a grain of salt. I'm still trying to figure out exactly what you're getting at.

I've been up to my elbows in the lexer for some time now. I was writing a new lexer using a header-only lexer library when I discoved these issues. The examples I gave are reverse engineered by comparing the token streams of my prototype lexer to that of ImHex's lexer.

And I can tell you for a fact that the lexer does get tabs. I've added breakpoints and seen them.

You need at laeat one character before the tab to show the error.

Stuff:

Assume a tab width of four.

Definition: A tab band is a range of columns where a tab moves the current column to the first column of the following tab band. So a tab in the range of 1 to 4 moves to 5.

ImHex simply replaces tabs with 4 spaces.

This works fine if the tab is on the first character of a tab band.

Lets assume the tab is on the 2nd char of a tab band. ImHew will simply add 4 and come up with 6. Shoudl be 5 (the first column of the next tab band).

Copy link
Contributor Author

shewitt-au commented Jul 13, 2025
edited
Loading

paxcut

You are obviously not going to implement the fix the proper way and are going to continue insisting your way is the only way. I then have no choice again but to create my own pr and fix the problem the way it needs to be fixed.

I am no more guity because I think my way is better than you are for thinking yours is.

paxcut

No it isn't. The only way for the lexer to receive tabs is if you remove the lines in executeString() that convert the tabs to spaces. You can't change the code to make it do what you want to show as being the problem.

I didn't fake the test if that's what you mean by, "You can't change the code to make it do what you want to show as being the problem."

On the tabs, if you mean lexing performed that's used to display errors messages, then as far as I know, yes. But as I mentioned ealier there are many paths to the lexer and not all go through executeString.

void ViewPatternEditor::loadPatternFile(const std::fs::path &path, prv::Provider *provider) {
 wolv::io::File file(path, wolv::io::File::Mode::Read);
 if (file.isValid()) {
 auto code = file.readString();
 // Debugging
 std::cout << code << std::endl;
 // TODO: lextwice: we lex twice here! 1st lex.
 this->evaluatePattern(code, provider); // TODO: This seems to be highlighting.
 m_textEditor.get(provider).SetText(code, true);
 m_sourceCode.get(provider) = code;
 // TODO: lextwice: we lex twice here! 2nd lex.
 TaskManager::createBackgroundTask("hex.builtin.task.parsing_pattern", [this, code, provider](auto&) { this->parsePattern(code, provider); });
 }
 }

parsePattern does no tab replacements. There may be other paths to the lexer I'm not aware of. I know there are background tasks that use the lexer for example. Tabs do make it into the lexer and it's easy to verify.

Copy link
Collaborator

paxcut commented Jul 13, 2025

I am no more guity because I think my way is better than you are for thinking yours is.

this sentence here pretty much summarizes the biggest problem we are dealing with. The code you refer to as being mine is not my code. It is code the author wrote as a fix to deal with the problems tabs were causing. He made a mistake so I am correcting the mistake he made and I can't make any assumptions beyond the fact that a mistake was made that needs fixing. you found the mistake and even quoted his commit of the fix and your fixes go against the intent of his commit. I remember the discussions we had on the issues and implemented my versions of his patches in the projects I was working on.

Copy link
Contributor Author

shewitt-au commented Jul 13, 2025
edited
Loading

I am going to finish this PR. The last issue is the longest line length. After that I can't control what is done with it.

The code you refer to as being mine is not my code

I was referring to the fixes. Not the code base as a whole. I understand that you may not want to go against the intent of the author. But I find it hard to believe fixing the problem in the lexer violates the intent, although it may. I don't know.

Fixing the problem anywhere but the lexer, in my opinion, is just asking for troubles in the future. If the lexer handles tabs and newlines no string preprocessing is needed. We can avoid some string copying. Errors will be reported accuratly without fuss. Any new clients of the lexer (directly or indirectly) are less like to get nasty suprises and have to address this issue again. If any of the existing code that lexes but does not massage the string first (and there is such code) decides to use the location info of a token it will be correct.

Basically it's less fragile.

You are obviously not going to implement the fix the proper way and are going to continue insisting your way is the only way.

I am not being stubborn and refusing to see the "proper way". I've explained my reasons and they make sense.

Copy link
Collaborator

paxcut commented Jul 13, 2025

I am going to finish this PR. The last issue is the longest line length. After that I can't control what is done with it.

As I explained in the code comments the function that returns the longest line length should return exactly that and not the adjusted length to see 2 chars beyond the end.

I was referring to the fixes. Not the code base as a whole. I understand that you may not want to go against the intent of the author. But I find it hard to believe fixing the problem in the lexer violates the intent, although it may. I don't know.

What fixes? the one where I replace the calls to replaceString with a call to replaceTabsWithSpaces? how is that not his code adjusted to fix the problem you found? He posted the original fix and placed the tab substitution in executeString and not in the lexer. so your code is going against his fixes. and yes it violates the intent as we discussed the issue which led to him posting the fix.

Fixing the problem anywhere but the lexer, in my opinion, is just asking for troubles in the future

the difference is that we have experienced the problems that come with allowing tabs to enter imhex and we found a way to solve them and this is not an issue with the lexer. that's your focus because that happens to be the code you wrote. The reasons you give sound like good reasons bur not when weighed against the problems that tabs create in imhex. String preprocessing is what you are doing but in the wrong place. it needs to be done before the code enters the main evaluation engine that starts with the lexer. that's the only way to avoid the issues the original commit addresses. String copying is not even worth mentioning here. Errors are being reported accurately and without fuss. Bugs will happen and they will be fixed when found so that is always the truth. it is possible to introduce errors in the lexer parsing also. no code is immune to bugs. you cannot write code now to address future code issues. if you can predict the future maybe. It isnt less fragile. you are not taking the problems the original commit addresses into consideration in your evaluation and that makes your solution a lot more fragile than the current one where no tabs exist in Imhex and all code locations are reported correctly. the lexer only sees spaces.

I am not being stubborn and refusing to see the "proper way". I've explained my reasons and they make sense.

i dont know if your are being stubborn or not. I know you are not accepting the solution that the author wrote and you think you know better than him. At least that's the impression I get when reading your posts. You may be right, I don't really know. I can only judge evidence that I know about that means make minimal changes that fix the errors. that leaves original code like it was before and fixes the new issues found. To evaluate your changes positively I need a leap of faith that I can't make. From what i understand, trouble will come that is best avoided by doing what we do.

Copy link
Contributor Author

shewitt-au commented Jul 13, 2025
edited
Loading

i dont know if your are being stubborn or not. I know you are not accepting the solution that the author wrote and you think you know better than him

I don't think I "know better", but nor do I think he is some God whose solutions must be strictly adhered to. I don’t know what constraints he was under when he authored the fix. How many bugs were waiting to be fixed at the time? etc...... I am not criticizing the author. I do not have the history to be critical in this regard. I found a problem and fixed it in the place where it only has to be fixed once. The DRY principle. A simple choice: make the lexer more robust or push the burden onto every caller. I tire of this arguing. If you can't at least acknowledge the benifts of my solution by now (and I'm not saying there aren't downsides), and to be fair to you you did say "The reasons you give sound like good reasons bur not when weighed against the problems that tabs create in imhex", there seesms little point in flogging a dead horse. I’ll complete my PR and what will be will be.

You say my reasoning "sound like good reasons bur not when weighed against the problems that tabs create in imhex", but make no attempt to explain why. What are the problems? There are no comments in the code I can find to explain this. Nothing to say why the lexer should not be fed tabs (which it is getting fed btw).

Copy link
Contributor Author

shewitt-au commented Aug 1, 2025
edited
Loading

@paxcut

here you said that, "there should never be a \r char to the code sent to the lexer."

This recent PR by @WerWolv: "lexer: Handle \r correctly"

Given this do you still object to this PR?

I'm not trying to reopen an old argument here. But naturally I would prefer code that I spent time on be useful if possible.

Thanks.

Copy link
Owner

WerWolv commented Aug 2, 2025

I didn't read though this whole arguing tbh but my stance is, and I discussed that with paxcut on Discord, that the pattern language library needs to be able to handle whatever's thrown at it. Even though ImHex specifically filters out these characters, they can still end up in the library if it's not called from within ImHex.
Up until that commit, all I did was to just get rid of carriage returns and tabs before feeding the data into the lexer which is obviously less than ideal.
So the tab handling code I think would be great to add now at this point

shewitt-au reacted with thumbs up emoji

Copy link
Collaborator

paxcut commented Aug 2, 2025

yes, i still object to the code in this pr. I was never given a chance to voice my opinion on those changes so I'll do it now. There are two strategies to deal with this issue. you either fix every problem they create or you don't allow the problem to exist in the first place. The patching of the lexer is not enough to fix the problems that occur outside of the lexer, even if it is only for the pattern language. This 'fix' will have to be revisited again and again. the fix i propose is a one time thing and doesn't need to be looked at again.

Copy link
Collaborator

paxcut commented Aug 2, 2025
edited
Loading

The lexer doesn't handle everything you send its way. When it detects an error it gives a message. that's how errors are handled, not by silently allowing the bad code to remain in the source code. With the changes here we will have no way of knowing if bad characters are making it past the filters so it is a bad idea to simply parse them. they should be treated like any other errors in the code because they are illegal characters.

Copy link
Contributor Author

shewitt-au commented Aug 2, 2025
edited
Loading

paxcut

There are two strategies to deal with this issue. you either fix every problem they create or you don't allow the problem to exist in the first place.

There are three actually. You make the lexer more robust if convenient, as WerWov did, directly contradicting your assertions that the lexer should never get \r chars. And you fix other problems where you have to. In the future, if the lexer needs to handle \r (like it does now), or God forbid, tabs, awesome... Less work.. Even in the mean time, if some leak in, like they did before your work, less bugs.

Conspicuously absent here is an explanation from you as to why you scolded me for handling \r in the lexer but make no comment on WerWov 's PR doing exactly that.

I'm not going to comment on this any further unless prompted.

I'm forced to the conclusion that you are just determined to reject this regardless of its merrits or pitfalls.

If I see tab introduced into the lexer I will commet however.

Copy link
Collaborator

paxcut commented Aug 2, 2025

That's not three, you are describing the first one. you think the lexer is the only one having problems but you can't be sure. Beside, you dont seem to understand why the code was added, it has nothing to do with the error this pr fixes. it arises because there are tests done that bypass al the filters and feed raw text straight to the evaluation. that is going to present problems if tests are done that use the bad characters that don't involve the lexer.
Imhex is free of tab and \r problems thanks to the early elimination of them from the character stream and the pattern language should be using the same proven implementation to deal with the same problem

Copy link
Contributor Author

shewitt-au commented Aug 2, 2025
edited
Loading

paxcut

you think the lexer is the only one having problem

No, I don't. I do think a more flexible lexer is a benifit. I never thought what you assert and this characterisation is incorrect.

This PR had two goals:

  1. Address the location info being off
  2. Make the lexer better along the way

Any time a programmer works on some code they aim to improve it.

You seem to flat out reject any mods to lexer, despite WereWov doing, in part, what I did, and you rectected. Yoy have no problems making numerous mods anywhere else.

Copy link
Collaborator

paxcut commented Aug 2, 2025

look, it is simple. you either deal with it early, or you deal with it at the end. thats two strategies, not three. early means as soon as the characters are read, at the end is when the errors occur. in this case the errors we know of are at the lexer so thats at the end. if errors occur somewhere else we'll have to fix them again and again. if you fix them early you never again have to deal with them. I have explained this over and over and it is simple and clear and can't be argued against. not only patching the parser wont fix errors occurring elsewhere but that hides the fact that those characters exist in the input files. making it harder to find and fix the new errors.

Copy link
Owner

WerWolv commented Aug 2, 2025
edited
Loading

I still disagree with this Ax, if we don't properly handle these characters in the Lexer or at least shortly before the Lexer, we will have to fix this same issue everywhere. We need to guarantee that any code that passes any pattern language code to the library needs to not contain any carriage returns or tabs which moves the responsibility out to the user who now needs to know about this and needs to handle it. If we just handle these characters properly ourselves inside the library, everybody using the library, including us in ImHex, will have it just work. And I am personally a big fan of libraries that just work

shewitt-au reacted with thumbs up emoji

Copy link
Contributor Author

shewitt-au commented Aug 2, 2025
edited
Loading

This is the last timeI will comment on this (extenuating circumstances aside), and here it is:

The lexer not handling tabs and various new line encodings is obviously just bad design. I really can't be bothered trying to convince you of the obvious any more. It just seems pointless.

Copy link
Collaborator

paxcut commented Aug 2, 2025

look for example what happens if the pattern has an invalid identifier. we don't allow it and issue an error. invalid chars are the same thing.thats my argument. if bad chars are in the lexer they are already everywhere else. right now there may be other parts of the pattern language affected by them that we don't know of. maybe not, maybe yes. we'll have to wait until somebody writes a test for that to find them. then they''ll have to be patched too. And patching the lexer will make finding them harder. These are valid arguments that nobody is counteracting. All I hear is the same message that the parser needs to deal with it. O have explained why it is not a good idea.

The lexer not handling tabs and various new line encodings in the lexer is obviously just bad design. I really can't be bothered trying to convince you of the obvious any more. It just seems pointless.

Bad design is one that allows errors to happen. The problem with your statement is that you don't validate them with any sort of evidence.It isn't you can't be bothered and it isn't avoid convincing anybody. If you can't explain why you have an opinion you shouldn't voice it.That's what makes this pointless.

Copy link
Contributor Author

shewitt-au commented Aug 2, 2025
edited
Loading

WerWolv weighing in counts as an extenuating circumstance.

I'm at my wits end here. How about be we build software using reusable components? Components where the price of admission is kept to a minimum. How about we let lexers lex? Why not just let them convert char streams to token streams without having to be hand-fed like a babies?

You're allowed to make things better.

Copy link
Collaborator

paxcut commented Aug 2, 2025

It is a bad idea because there may be other parts affected by those characters so the lexer should issue and error just like when it detects a bad identifier. There is plenty of character streams that the lexer cannot lex. it only accepts a subset of all possible char streams as valid code. The question is not to accept as many as possible. It doesn't matter how you want to use the pattern language library. if it contains code that uses those bad characters in some form that causes errors then accepting them as valid is a bad idea. how can you guarantee that will never happen?

Copy link
Contributor Author

shewitt-au commented Aug 2, 2025
edited
Loading

paxcut

There is plenty of character streams that the lexer cannot lex.

Naturally. But it should handle all chars in the character set.

Don't worry. We'll make other stuff better too. The lexer should not ba an assert.

Copy link
Collaborator

paxcut commented Aug 2, 2025

You want to deal with errors as they happen? fine. If it ever falls on me to fix problems about tabs or other chars I know how to fix them. I have already spent many hours (days?) dealing with bad chars in ImHex as a whole and I have 5 or 6 prs to prove it. I have learned from that experience but nobody wants to hear what that is so they'll have to learn on their own. Imhex is already dealing with tabs and chars properly.

The lexer running in imhex will never use the code you are adding. it will only be used in the unit tests and probably no even there since now all the test are passing.Future test may fall because the characters were not removed early. That can be avoided and I am showing how. removing them early is the only way to guarantee no further problems will exist. thats indisputable unless you can predict the future.

I choose that option because it is clearly better to not have to deal with problems than having to. you choose the other, why? i really don't know and you don't say. how can be better to deal with problems than not having to deal with them? that sounds to me like the very definition of bad design.

From a purely practical point of view there is no doubt that early patching is superior. If you want to include what you have learned at the university or elsewhere that's not just some heuristic or cliche then by all mean, explain in detail. I am very reasonable when it comes to logical statements that are well made and will change my opinion instantly it you can show your pov has merit. I have proven that many times in many interactions in the past and recorded here in github. None of the arguments I have heard so far sound convincing. then again this is not my decision to make so my opinion is not that important. Agree to disagree and move on.

Copy link
Contributor Author

shewitt-au commented Aug 2, 2025
edited
Loading

paxcut

The lexer running in imhex will never use the code you are adding

Not if you have any say in it. There's history here. You object, in unfathomably creative ways, to anything I do. On multiple occasions reimplementing them without any reference to the "many hours (days?)" I spent on them. Frankly, I think you have something against me.

My version of the lexer is better than what currently exits. And I'm writing a new lexer. Which, if history repeats, you will reject.

But so be it.

Copy link
Owner

WerWolv commented Aug 2, 2025

I'm just gonna step in here and stop this whole discussion.
It's honestly extremely frustrating reading all of that.

I'm going to take care of this issue my way which mostly aligns with the things shewitt has proposed.

Repository owner locked and limited conversation to collaborators Aug 2, 2025
Copy link
Collaborator

paxcut commented Aug 2, 2025

What's unfathomably about the verifiable existence of the prs mentioned in my hours of days of work on fixing problems on tabs and other chars on imhex before you ever posted your fist post on github? check the facts before you accuse.

Copy link
Owner

WerWolv commented Aug 2, 2025

Stop means stop for you too.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Reviewers

@paxcut paxcut paxcut requested changes

Requested changes must be addressed to merge this pull request.

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

AltStyle によって変換されたページ (->オリジナル) /