Module talk:DecodeEncode
Bug report: bad decoding of U+03B5 ε (epsilon)
About U+03B5 ε GREEK SMALL LETTER EPSILON (&epsi; &epsilon;)
- Issue: after the HTML entity &epsilon; is resolved by mw.text.decode(), the plain character is not found by mw.ustring.gsub(). There is no issue with the alternative HTML entity &epsi;. In short: &epsi; good, &epsilon; bad.
- Report limitations: the original report and bug reproduction are at enwiki Module talk:DecodeEncode, where en:module:DecodeEncode and en:module:String are used live. At Phabricator, pseudocode may be used and some "results" may be hardcoded. In running text the escape &amp; is used; it is not used inside the function calls. Lua patterns are not used ("no %").
- To reproduce:
- 1. Create the research string: X&epsi;1X&epsilon;2X (shows live and unedited as: Xε1Xε2X)
- 2. Render the string with decode() (as inner function)
- 3. Then, on the rendered result, use gsub() to replace the plain character ε → E (as outer function):
mw.ustring.gsub( s=( mw.text.decode( s=X&epsi;1X&epsilon;2X, decodeNamedEntities=true ) ), pattern=ε, repl=E )
[is pseudo-code, see note. 21:10, 7 February 2023 (UTC)]
- 4. Result3 (s&r pattern uses ε from X&epsi;1X): XE1XE2X
- 5. Result4 (s&r pattern uses ε from X&epsilon;2X): XE1XE2X
- Expected: XE1XE2X (only one character ε exists)
- Note 21:10, 7 February 2023 (UTC): step 3 is in pseudo-code. To reproduce, use Lua modules module:String and Module:DecodeEncode:
{{#invoke:String|replace|source={{#invoke:DecodeEncode|decode|s=X&epsi;1X&epsilon;2X}}|pattern=ε|replace=E|plain=true}}
- → XE1XE2X
-DePiep (talk) 21:10, 7 February 2023 (UTC)
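When a decoded character looks right but is not matched by gsub(), it can help to look at the code points actually present in the string. The following is a minimal sketch in plain Lua; the helper name codepoints is hypothetical and not part of Module:DecodeEncode, and it assumes its input is valid UTF-8.

```lua
-- Hypothetical debugging helper (not part of Module:DecodeEncode).
-- Lists the code points of a string as "U+XXXX" labels.
-- Assumption: s is valid UTF-8; malformed input is not handled.
local function codepoints(s)
  local out = {}
  local i = 1
  while i <= #s do
    local b = s:byte(i)
    local len, cp
    if b < 0x80 then
      len, cp = 1, b              -- 1-byte sequence (ASCII)
    elseif b < 0xE0 then
      len, cp = 2, b % 0x20       -- 2-byte sequence, keep low 5 bits
    elseif b < 0xF0 then
      len, cp = 3, b % 0x10       -- 3-byte sequence, keep low 4 bits
    else
      len, cp = 4, b % 0x08       -- 4-byte sequence, keep low 3 bits
    end
    -- Each continuation byte contributes its low 6 bits.
    for j = i + 1, i + len - 1 do
      cp = cp * 64 + (s:byte(j) % 0x40)
    end
    out[#out + 1] = string.format("U+%04X", cp)
    i = i + len
  end
  return table.concat(out, " ")
end

-- "\206\181" is the UTF-8 byte sequence of U+03B5 (ε).
print(codepoints("X" .. "\206\181" .. "1"))  -- U+0058 U+03B5 U+0031
```

In the Scribunto debug console, one could pass the result of mw.text.decode( s, true ) to such a helper for both entities and compare the two listings.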
Workaround A, ad hoc
Workaround A, ad hoc: add an innermost function that first replaces &epsilon; → &epsi; in the research string:
- A1:
{{#invoke:String|replace|source={{#invoke:DecodeEncode|decode|s={{#invoke:String|replace|source=X&epsi;1X&epsilon;2X|pattern=&epsilon;|replace=&epsi;|plain=true}}}}|pattern=ε|replace=E|plain=true}}
- → XE1XE2X
Workaround B, in module (THIN SPACE example)
Workaround B: early in :en:module:DecodeEncode, replace &epsilon; → &epsi;
About THIN SPACE: it looks like character U+2009 THIN SPACE (&thinsp; &ThinSpace;) has a similar issue. &ThinSpace; good, &thinsp; bad.
Currently in code:
    function p._decode( s, subset_only )
        local ret = nil;

        s = mw.ustring.gsub( s, '&thinsp;', '&ThinSpace;' ) -- Workaround for bug: &ThinSpace; gets properly decoded in decode, but &thinsp; doesn't.
        ret = mw.text.decode( s, not subset_only )

        return ret
    end
In en:module:DecodeEncode/sandbox, I have coded a similar handling of EPSILON:
    function p._decode( s, subset_only )
        local ret = nil;

        -- U+2009 THIN SPACE: workaround for bug: HTML entity &thinsp; is decoded incorrectly. Entity &ThinSpace; gets decoded properly
        s = mw.ustring.gsub( s, '&thinsp;', '&ThinSpace;' )
        -- U+03B5 ε GREEK SMALL LETTER EPSILON: workaround for bug (phab:T328840): HTML entity &epsilon; is decoded incorrectly for gsub(). Entity &epsi; gets decoded properly
        s = mw.ustring.gsub( s, '&epsilon;', '&epsi;' )
        ret = mw.text.decode( s, not subset_only )

        return ret
    end
- /sandbox tests:
- B. {{#invoke:String|replace|source={{#invoke:DecodeEncode/sandbox|decode|s=X&epsi;1X&epsilon;2X}}|pattern=ε|replace=E|plain=true}}
- B1. ResultB1 (s&r pattern uses ε from X&epsi;1X): XE1XE2X
- B2. ResultB2 (s&r pattern uses ε from X&epsilon;2X): XE1XE2X
I propose to edit the module along these lines.
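If more entities with this behaviour turn up, the per-entity gsub() lines could be folded into one lookup table. This is only a sketch in plain Lua, not the module's code: the names ENTITY_ALIASES and normalize_entities are hypothetical, and the table holds just the epsilon pair discussed in this section.

```lua
-- Sketch only; ENTITY_ALIASES and normalize_entities are hypothetical names.
-- Maps an entity that decodes badly to an alias that decodes well.
local ENTITY_ALIASES = {
  ["&epsilon;"] = "&epsi;",  -- U+03B5, see phab:T328840
}

-- Rewrites every named entity that has an alias; all others stay as they are.
local function normalize_entities(s)
  -- "&", one or more alphanumerics, ";". When the callback returns nil,
  -- string.gsub keeps the original match unchanged.
  return (s:gsub("&%w+;", function(entity)
    return ENTITY_ALIASES[entity]
  end))
end

print(normalize_entities("X&epsi;1X&epsilon;2X"))  -- X&epsi;1X&epsi;2X
```

Inside the module, the normalized string would then go to mw.text.decode( s, not subset_only ) as before. A single pass over the string scales better than one gsub() per entity, at the cost of a little indirection.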
Workaround C (mw, Lua)
Changes in mw, Lua: I have no idea.
- I propose to consider module editing along § Workaround B. -DePiep (talk) 12:26, 4 February 2023 (UTC)
testcases EPSILON
- Original failure, now solved (no longer showing):
- (hardcoded explanation here): in the cell marked Red XN, the result showed as "XE1Xε2X". That is: the wikitext input "&epsilon;" was not recognised and replaced. -DePiep (talk) 07:49, 19 February 2023 (UTC)
EPSILON ε ⟨ε⟩ error & fix proposal (16 Feb 2023)
1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|
id | entity code | plain | mod:..decode(&entity;) | replace(decode(..)) with E, pattern=hardcoded ⟨ε⟩ from plain, (s=&entity;) (s=checkstring) | mod:..decode/sandbox |
checkstring | X&epsi;1X&epsilon;2X | >Xε1Xε2X< | >Xε1Xε2X< | | |
EPSI | &epsi; | >ε< | >ε< | E XE1XE2X | E XE1XE2X |
EPSILON | &epsilon; | >ε< | >ε< | E XE1XE2X Red XN | E XE1XE2X |
- See § Workaround B, in module (THIN SPACE example) for code change;
- The fix is similar to the one U+2009 THIN SPACE (&thinsp;, &ThinSpace;) already has (though the original cause of the bug may be different for THIN SPACE).
- Phabricator T328840 did not gain traction. A fix there would be at mw level, not in this module.
Template-protected edit request on 16 February 2023
- Please copy all code from module:DecodeEncode/sandbox into module:DecodeEncode (diff)
- Issue: bad decoding of HTML entity &epsilon; Red XN
- re U+03B5 ε GREEK SMALL LETTER EPSILON (&epsi;, &epsilon;)
- Change: fix by replacing it with entity &epsi; Green tickY before applying decode(). See § Workaround B for code diff & background; minor comment change
- Discussion: (1) reported at T328840, no responses (mw-level); (2) bug report here not challenged
- Testcases: See § testcases EPSILON.
- DePiep (talk) 06:49, 16 February 2023 (UTC)
NBSP behaviour
Leaving this note here.
About NBSP, U+00A0 NO-BREAK SPACE ( ,  ). With such input I am experiencing problems reminiscent of § epsilon (T328840, now resolved).
When nested like (replace|s=(decode|s=AB YZ)|replace=AB_YZ), the call returns breaking code (breaking when used in or with HTML/CSS code such as span, sup, class).
No time to build the reproduction/test, so I have to leave it for now. Not reported on phab. DePiep (talk) 07:27, 20 February 2023 (UTC)
Template-protected edit request on 21 March 2023
Please replace all code in Module:DecodeEncode with module:DecodeEncode/sandbox. (compare)
Change: apply require('strict'), and declare the function local explicitly. DePiep (talk) 14:34, 21 March 2023 (UTC)
- |answered=pause: needs some extra eyes first. Will invite. -DePiep (talk) 14:36, 21 March 2023 (UTC)
- Invitation is out. -DePiep (talk) 14:49, 21 March 2023 (UTC)
- Upd: Gonnym has made large improvements, so the sandbox diff is large. I do not see strict-related changes. DePiep (talk) 21:31, 21 March 2023 (UTC)
- The changes are good and no globals remain. The two mw.ustring could be string. Johnuniq (talk) 06:40, 22 March 2023 (UTC)
- thx. As said, please someone with trust perform ER because me editing/commenting in between does not help. DePiep (talk) 08:18, 22 March 2023 (UTC)
- Set |answered=no after two positive critiques. Also, I met no error while developing with this sandbox. -DePiep (talk) 09:00, 22 March 2023 (UTC)
- Done - Martin (MSGJ · talk) 18:35, 22 March 2023 (UTC)
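On the remark above that the two mw.ustring calls could be string: presumably this is because both patterns and both replacements are ASCII-only entity names, and in UTF-8 an ASCII byte never occurs inside a multi-byte character, so a byte-wise string.gsub() cannot damage neighbouring non-ASCII text. A minimal sketch in plain Lua, with sample input made up for illustration:

```lua
-- Illustration only; the sample input is made up.
-- "\206\181" is the UTF-8 byte sequence of U+03B5 (ε).
local EPSILON = "\206\181"
local input = "X&epsilon;1X" .. EPSILON .. "2X"

-- Byte-wise replacement with an ASCII-only pattern and replacement.
-- Neither string contains Lua pattern magic characters.
local output, count = string.gsub(input, "&epsilon;", "&epsi;")

-- The entity is rewritten; the plain ε bytes are left untouched.
print(output == "X&epsi;1X" .. EPSILON .. "2X", count)  -- true	1
```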