I wrote a simple transpiler (that may be a bit of a stretch) in perl to cut down on some boiler plate code. I'm very new to perl, this being maybe my third ever project in it. My main question would be to ask for feedback on my regular expressions and making it more open to expansion (adding error messages, etc).
My code:
#!/usr/bin/perl
$/ = undef;
while(<>) {
print "#include <SDL2/SDL.h>\n#include <stdint.h>\n\n";
s/.include ([^;]+);/#include "1円"/g;
s/^include ([^;]+);/#include <1円>/g;
s/entry:/int main(void) {/g;
s/entry ([A-z][A-z0-9]*)\s*,\s*([A-z][A-z0-9]*)\s*:/int main(int 1,円 char* 2円\[\]) {\n\tSDL_Init(SDL_INIT_EVERYTHING);/g;
s/([A-z][A-z0-9]*)\s+:u(8|16|32|64)(\**)/uint2円_t3円 1円/g;
s/([A-z][A-z0-9]*)\s+:(8|16|32|64)(\**)/int2円_t3円 1円/g;
s/([A-z][A-z0-9]*)\s+:Color\s+({[^}]+})/struct { uint8_t r, g, b, a; }1円 = 2円/g;
s/([A-z][A-z0-9]*)\s+:Color/struct\s+{ uint8_t r, g, b, a; }1円;/g;
s/([A-z][A-z0-9]*)\s+:-([A-z][A-z0-9]*)/struct 2円 1円/g;
s/([A-z][A-z0-9]*)\s+:Win\s*{([^}]+)}/SDL_Window* 1円 = SDL_CreateWindow(2円)/g;
s/([A-z][A-z0-9]*)\s+:Win/SDL_Window* 1円/g;
s/([A-z][A-z0-9]*)\s+:Ren\s*{([^}]+)}/SDL_Renderer* 1円 = SDL_CreateRenderer(2円)/g;
s/([A-z][A-z0-9]*)\s+:Ren/SDL_Renderer* 1円/g;
s/@([A-z][A-z0-9]*)\s*\(([^)]+)\)/typedef struct 1円 {2円} 1円;/g;
s/([A-z][A-z0-9]*)\s+:@@\s*\(([^)]+)\)/struct {2円}1円;/g;
s/@/struct /g;
s/([A-z][A-z0-9]*)\s+:([A-z][A-z0-9]*) ({[^}]*})/2円 1円 = 3円/g;
s/([A-z][A-z0-9]*)\s+:([A-z][A-z0-9]*)/2円 1円/g;
s/Wait\s+a\s+minute!!!/SDL_Delay(60*1000);/g;
s/wait\s+([0-9bx]+)/SDL_Delay(1円)/g;
s/clear\s+([A-z][A-z0-9]*)/SDL_RenderClear(1円)/g;
s/set\s+([A-z][A-z0-9]*)\s+color\s+to\s+([A-z][A-z0-9]*)/SDL_SetRenderDrawColor(1,円 2円.r, 2円.g, 2円.b, 2円.a)/g;
s/set\s+([A-z][A-z0-9]*)\s+([A-z][A-z0-9]*)\s+to\s+([A-z0-9*\/+-]+)/1円.2円 = 3円/g;
s/show\s+([A-z][A-z0-9]*)/SDL_RenderPresent(1円)/g;
s/([A-z][A-z0-9]*)\s([A-z][A-z0-9]*\**)\s+([A-z][A-z0-9]*)\s*:/void 1円(2円 3円) {/g;
s/([A-z][A-z0-9]*)\s*:/void 1円(void) {/g;
s/-!-/}/g;
print;
}
print "\n";
Sample input:
q :@@(
a :u8;
)
@pixel (
r :u8; g :u8; b :u8; a :u8;
)
huh a :u8*:
-!-
letmecheck this :pixel:
-!-
entry:
p :pixel {0, 255, 0, 255};
win :Win {"Hello world!", 300, 300, 600, 600, 0};
ren :Ren {win, -1, 0};
set p g to 127;
set ren color to p;
clear ren;
show ren;
wait 500;
set p b to 127;
set ren color to p;
clear ren;
show ren;
wait 500;
set p r to 127;
set ren color to p;
clear ren;
show ren;
wait 500;
-!-
output:
#include <SDL2/SDL.h>
#include <stdint.h>
struct {
uint8_t a;
}q;
typedef struct pixel {
uint8_t r; uint8_t g; uint8_t b; uint8_t a;
} pixel;
void huh(uint8_t* a) {
}
void letmecheck(pixel this) {
}
int main(void) {
pixel p = {0, 255, 0, 255};
SDL_Window* win = SDL_CreateWindow("Hello world!", 300, 300, 600, 600, 0);
SDL_Renderer* ren = SDL_CreateRenderer(win, -1, 0);
p.g = 127;
SDL_SetRenderDrawColor(ren, p.r, p.g, p.b, p.a);
SDL_RenderClear(ren);
SDL_RenderPresent(ren);
SDL_Delay(500);
p.b = 127;
SDL_SetRenderDrawColor(ren, p.r, p.g, p.b, p.a);
SDL_RenderClear(ren);
SDL_RenderPresent(ren);
SDL_Delay(500);
p.r = 127;
SDL_SetRenderDrawColor(ren, p.r, p.g, p.b, p.a);
SDL_RenderClear(ren);
SDL_RenderPresent(ren);
SDL_Delay(500);
}
I am open to any and all feedback, thanks in advance!
EDIT: This is asking for a review of the PERL code, not the C code
4 Answers 4
Making shebang line more portable
Fixing the perl
binary to /usr/bin/perl
in the shebang will not work if you are using perlbrew
instead of the system perl
. Using /usr/bin/env perl
will be a more portable alternative.
use strict
and warnings
pragmas to catch error at an early stage
By adding warnings
you will get quite a few warnings, as discussed below.
Unescaped left brace in a regex is deprecated and will be illegal from perl version 5.32 on
You have two lines (line 11 and 21) where you have used a literal {
in your regex without escaping it. The warning you will get is:
Unescaped left brace in regex is deprecated here (and will be fatal in Perl 5.32), passed through in regex; marked by <-- HERE in m/([A-z][A-z0-9]*)\s+:([A-z][A-z0-9]*) ({ <-- HERE [^}]*})/
Unrecognized escape \s
used in substitution text
On line 12 you have a \s
in the substitution text. This is not a valid escape in a text string (it is only valid in the regex part of the substitution operator).
Unrecognized escape \s passed through at ./p.pl line 12.
You can replace \s+
with a single space for example.
1円 better written as 1ドル
Replace all uses of the back references 1円
, 2円
, 3円
with 1ドル
, 2ドル
, 3ドル
, see perldoc perlre for the details behind this warning.
Use say
instead of print
to avoid having to type \n
at the end of a string.
Since perl version 5.10 you can add use feature qw(say)
to your program to use this feature.
use local
to avoid clobbering special (global) variables
Even if it is not necessary in your short program, you should localize $/
when you set it to undef
: This will avoid bugs from creeping in at a later time when your program grows larger. If you do not localize the variable it will be set globally for your whole program to undef
.
Use dedicated parsing modules to simplify maintenance and readability of your code
Your code is already starting to become hard to read and maintain (due to the complicated syntax of the regexes). Consider using a dedicated grammar and a parser like Regexp::Grammars
to make your program readable and easier to maintain.
Here is a modified version of your code incorporating some of the changes suggested above:
#!/usr/bin/env perl
use feature qw(say);
use strict;
use warnings;
local $/ = undef;
while(<>) {
say "#include <SDL2/SDL.h>\n#include <stdint.h>\n";
s/.include ([^;]+);/#include "1ドル"/g;
s/^include ([^;]+);/#include <1ドル>/g;
s/entry:/int main(void) {/g;
s/entry ([A-z][A-z0-9]*)\s*,\s*([A-z][A-z0-9]*)\s*:/int main(int 1,ドル char* 2ドル\[\]) {\n\tSDL_Init(SDL_INIT_EVERYTHING);/g;
s/([A-z][A-z0-9]*)\s+:u(8|16|32|64)(\**)/uint2ドル_t3ドル 1ドル/g;
s/([A-z][A-z0-9]*)\s+:(8|16|32|64)(\**)/int2ドル_t3ドル 1ドル/g;
s/([A-z][A-z0-9]*)\s+:Color\s+(\{[^}]+\})/struct { uint8_t r, g, b, a; }1ドル = 2ドル/g;
s/([A-z][A-z0-9]*)\s+:Color/struct { uint8_t r, g, b, a; }1ドル;/g;
s/([A-z][A-z0-9]*)\s+:-([A-z][A-z0-9]*)/struct 2ドル 1ドル/g;
s/([A-z][A-z0-9]*)\s+:Win\s*{([^}]+)}/SDL_Window* 1ドル = SDL_CreateWindow(2ドル)/g;
s/([A-z][A-z0-9]*)\s+:Win/SDL_Window* 1ドル/g;
s/([A-z][A-z0-9]*)\s+:Ren\s*{([^}]+)}/SDL_Renderer* 1ドル = SDL_CreateRenderer(2ドル)/g;
s/([A-z][A-z0-9]*)\s+:Ren/SDL_Renderer* 1ドル/g;
s/@([A-z][A-z0-9]*)\s*\(([^)]+)\)/typedef struct 1ドル {2ドル} 1ドル;/g;
s/([A-z][A-z0-9]*)\s+:@@\s*\(([^)]+)\)/struct {2ドル}1ドル;/g;
s/@/struct /g;
s/([A-z][A-z0-9]*)\s+:([A-z][A-z0-9]*) (\{[^}]*\})/2ドル 1ドル = 3ドル/g;
s/([A-z][A-z0-9]*)\s+:([A-z][A-z0-9]*)/2ドル 1ドル/g;
s/Wait\s+a\s+minute!!!/SDL_Delay(60*1000);/g;
s/wait\s+([0-9bx]+)/SDL_Delay(1ドル)/g;
s/clear\s+([A-z][A-z0-9]*)/SDL_RenderClear(1ドル)/g;
s/set\s+([A-z][A-z0-9]*)\s+color\s+to\s+([A-z][A-z0-9]*)/SDL_SetRenderDrawColor(1,ドル 2ドル.r, 2ドル.g, 2ドル.b, 2ドル.a)/g;
s/set\s+([A-z][A-z0-9]*)\s+([A-z][A-z0-9]*)\s+to\s+([A-z0-9*\/+-]+)/1ドル.2ドル = 3ドル/g;
s/show\s+([A-z][A-z0-9]*)/SDL_RenderPresent(1ドル)/g;
s/([A-z][A-z0-9]*)\s([A-z][A-z0-9]*\**)\s+([A-z][A-z0-9]*)\s*:/void 1ドル(2ドル 3ドル) {/g;
s/([A-z][A-z0-9]*)\s*:/void 1ドル(void) {/g;
s/-!-/}/g;
print;
}
say "";
Along with the excellent answer of Håkon Hægland,
Use perl's quoted regexes to simplify your code and improve readability massively.
my $keyword = qr/[A-z][A-z0-9]*/;
my $bits = qr/(?:8|16|32|64)/;
Makes:
s/entry ([A-z][A-z0-9]*)\s*,\s*([A-z][A-z0-9]*)\s*:/int main(int 1,円 char* 2円\[\]) {\n\tSDL_Init(SDL_INIT_EVERYTHING);/g;
s/([A-z][A-z0-9]*)\s+:u(8|16|32|64)(\**)/uint2円_t3円 1円/g;
into
s/entry ($keyword)\s*,\s*($keyword)\s*:/int main(int 1,円 char* 2円\[\]) {\n\tSDL_Init(SDL_INIT_EVERYTHING);/g;
s/($keyword)\s+:u($bits)(\**)/uint2円_t3円 1円/g;
open to expansion
Avoid naked undocumented numbers
Code has SDL_Delay(500);
3 times. Are the values related? 500 what?
Since it is likely the values are to change in common, create code the reflects that. This makes for easier to expand/maintain code.
#define VIEWING_PAUSE (500 /* ms */)
// SDL_Delay(500);
SDL_Delay(VIEWING_PAUSE);
Likewise for 127, 300, 600 .... as maybe HALF_RED, WIDTH, HEIGHT
I am open to any and all feedback
Minor: keywords
When forming C code, even though not required, recommend to avoid C++ keywords like this
. (delete
, new
, class
, ....)
// void letmecheck(pixel this) {
void letmecheck(pixel this_pixel) {
-
1\$\begingroup\$ I think I need to adjust the question in some way. I was actually asking for advice on the perl code that outputs the C code that you responded too. Sorry about that \$\endgroup\$Shipof123– Shipof1232020年06月09日 16:52:18 +00:00Commented Jun 9, 2020 at 16:52
-
\$\begingroup\$ @Shipof123 My thoughts here are meant to imply a change to the perl code to effect the suggested changes in C. \$\endgroup\$chux– chux2020年06月09日 17:00:49 +00:00Commented Jun 9, 2020 at 17:00
-
1\$\begingroup\$ I think that adding definitions into the perl output could cause collisions, that being said, adding comments into the output code that reflect the input would be useful. \$\endgroup\$Shipof123– Shipof1232020年06月09日 17:41:35 +00:00Commented Jun 9, 2020 at 17:41
-
1\$\begingroup\$ The repetitions of 500 come from the input to the perl code. The transpiler could recognize repeated constants in the input and turn them into a single definition, but that's a feature request and not a code review. \$\endgroup\$Kaz– Kaz2020年06月10日 07:25:26 +00:00Commented Jun 10, 2020 at 7:25
First of all, let me mention that if you want to build a production quality, maintainable, and scalable transpiler you should not do it using regular expressions. That is regardless of the language in which the regular expressions are written, even if perl. Without going into much detail, regular expressions work on plain text, and you need more than that to do anything non-trivial. You can look into compiler theory if you want to learn more about that.
For your use, namely converting something you have full control over into something else you also have full control over in a simple line-by-line conversion, and that only you will use this is fine. Others have discussed the quality of your current code, I won't go into that.
I will, however, mention that undocumented regular expressions are very hard to maintain. You should at least document each well - Perl has a verbose form well suited for this. You should have test cases that ensures that if you put in a known source file, you get exactly what you expect out in the other end. If not, something broke.
You may also want to consider peer review. Have a colleague or fellow student to look at your code and listen to what baffles them. If it confuses them now, it will confuse you too later when you have forgotten the tiny details, which happens to each and every programmer at some point, and THAT is when important projects get rewritten. At the current state you will probably have a hard time convincing anybody to inherit maintainership of your code.
Good luck - this is a very good exercise.
A-z
should beA-Za-z
. These are not equivalent. \$\endgroup\$