- 
  Notifications
 
You must be signed in to change notification settings  - Fork 133
 
Add binary files whose filenames contain spaces into changeset #217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
...ames contain spaces could be recognized in the changeset. diff -r 000000000000 -r 4f849d92debe 01 foo/bar.jpg Binary file 01 foo/bar.jpg has changed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure whether this would actually be an issue or not (I don't use Hg), but considering that \s matches HT (9), LF (10), FF (12), CR (13), and space (32), your change actually goes beyond what the PR's description says it does (i.e. it allows more than just space), and I'm wondering if this could be problematic in some situations, e.g. with CR+LF line endings on Windows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late response and thank you for point out my fault. Because I have to match Chinese characters, but I was too greedy by using ([^\n]*) only. According to PHP page, I will change it to ([\P{C}]*) and have a test.
As a side note, when sending pull requests, could you please
- create a dedicated feature branch for each pull requests instead of using master (here you used the same branch as for HgWeb: replace invalid function map() by array_map() #213 , and as a result the PR includes an unrelated commit)
 - rebase your feature branch on top of latest master, so there's no need to deal with merge commits and potential conflicts
 
Thanks in advance
OK, I am not familiar with git. I used the old-fashion SVN before. But I will try out your advice next time. Thanks for the guide.
No worries. More importantly, you have not answered #217 (comment)
I have tested \P{C} in PHP 5.6.30.
preg_match_all('#([\P{C}]*)#', '中文', $t_matches, PREG_SET_ORDER); var_dump($t_matches);
But it cannot match the Chinese characters as I thought.
If I change the pattern to ^\n, it will do the matches.
preg_match_all('#([^\n]*)#', '中文', $t_matches, PREG_SET_ORDER); var_dump($t_matches);
I am not sure what is wrong about the \P{C}. Maybe I should change the original patter of [^\s]* to [^\r\n\t\f\v]* in order to enable space in the file path.
By the way, the sample input file for SourceHgWeb to match the file name in the changeset is
diff -r c56b67cfe60d -r bb6504e6a338 01 中文資料夾/中文檔案.jpg Binary file 01 中文資料夾/中文檔案.jpg has changed
According to this post, an Unicode modifier u should be appended in order to enable \P{C} to match Unicode characters.
But the \P{C} will match the \n. That is not what we need. The correct pattern should be [\p{L}\p{N}\p{S}\p{P}\p{M} ]*. The pattern [^\r\n\t\f\v]* also works in my test. And I think the later one is easier to understand. I will create a new PR for this.
Update HgWeb filename match pattern. So that binary files whose filenames contain spaces could be recognized in the changeset.
test sample:
diff -r 000000000000 -r 4f849d92debe 01 foo/bar.jpg
Binary file 01 foo/bar.jpg has changed