I have ack
(the searching tool) "installed" as a single file at C:\ack\bin\ack.pl
on a Windows 10 machine and was wondering how to make it executable from the powershell window and other terminals.
I wanted to do it ideally without touching the PATHEXT
variable and associating an interpreter to the extension because that has system-wide consequences. I'm only really after a shebang workalike.
First I wrote a batch script wrapper around it (ack.bat
).
@echo off
perl.exe %~dp0\ack.pl %*
But, that has the annoying property of handling interruptions via ^C
with an are you sure?
prompt.
Terminate batch job (Y/N)? y
I was curious how to make something interruptible without prompting and tried looking for an equivalent of the exec*
system calls on Windows. Then I came across this answer on ServerFault which complains about the same problem and suggests generating a C program as a potential solution.
I wrote a C++ wrapper that does the trick, but it turned out quite a bit uglier than I would have liked.
// The strings in this file are UTF-16LE for compat with the win32 api
// The source code itself is in UTF-8.
#include <windows.h> // GetCommandLineW
#include <iostream> // wcout
#include <string> // wstring
#include <cassert> // assert
#include <utility> // pair
#include <deque> // deque, size_type
// always debug. Assert is only used when we actually want to
// crash the wrapper process.
#undef NDEBUG
// unsigned index type, probably good enough for traversing
// a vector or deque
typedef std::deque<char>::size_type uidx;
// interpreter_name must be absolute path
std::wstring interpreter_name = %%%%INTERPETER_NAME%%%% ;
std::wstring script_name = %%%%SCRIPT_NAME%%%% ;
class Reaper {
public:
HANDLE job_handle;
JOBOBJECT_EXTENDED_LIMIT_INFORMATION limit_info;
Reaper() {
job_handle = CreateJobObject(NULL, NULL);
assert(job_handle != NULL);
limit_info = { 0 };
limit_info.BasicLimitInformation.LimitFlags = JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE;
DWORD set_success = SetInformationJobObject(
job_handle,
JobObjectExtendedLimitInformation,
&limit_info,
sizeof(limit_info));
assert(set_success);
}
};
Reaper& get_reaper(void) {
static Reaper r;
return r;
}
// the leading int is the error code.
std::pair<DWORD, std::deque<std::wstring>> argvw_of_cmdline(std::wstring command_line) {
LPCWSTR cmd_line = command_line.c_str();
int count = 0;
LPWSTR *the_processed_args = CommandLineToArgvW(
cmd_line, &count
);
// first we handle the error case
if (the_processed_args == nullptr) {
return {GetLastError(), std::deque<std::wstring>()};
} else {
std::deque<std::wstring> s;
for (int i = 0; i < count; ++i) {
s.push_back(the_processed_args[i]);
}
return {0, s};
}
}
std::wstring escape_string(std::wstring ws) {
bool contains_suspect_char = (std::wstring::npos != ws.find_first_of(L"\"" L"\\"));
if (contains_suspect_char) {
std::wstring out(L"\"");
for (uidx i = 0; i < ws.size(); ++i) {
if (ws[i] == L'"' || ws[i] == L'\\') {
out += L'\\';
out += ws[i];
} else {
out += ws[i];
}
}
out += L'"';
return out;
} else {
return ws;
}
}
std::wstring cmdline_of_argvw(const std::deque<std::wstring> &argvw) {
std::wstring the_line(L"");
// this is okay even if the deque is empty
// because the loop will be traversed zero times.
uidx last_index = argvw.size() - 1;
for (uidx i = 0; i < argvw.size() ; i++) {
the_line += escape_string(argvw[i]);
if (i != last_index) {
the_line += L' ';
}
}
return the_line;
}
struct RawWinProcessCreatorW {
LPCWSTR app_name = NULL;
LPWSTR command_line = NULL;
LPSECURITY_ATTRIBUTES process_attributes = NULL;
LPSECURITY_ATTRIBUTES thread_attributes = NULL;
BOOL inherit_handles = false;
DWORD creation_flags = 0;
LPVOID environment = NULL;
LPCWSTR current_directory = NULL;
LPSTARTUPINFOW startup_info = NULL;
LPPROCESS_INFORMATION process_information = NULL;
bool run() {
return CreateProcessW(
app_name,
command_line,
process_attributes,
thread_attributes,
inherit_handles,
creation_flags,
environment,
current_directory,
startup_info,
process_information
);
}
};
std::wstring current_exe_directory(void) {
HMODULE h_module = GetModuleHandleW(nullptr);
WCHAR path[MAX_PATH];
memset(path, 0, sizeof(path));
GetModuleFileNameW(h_module, path, MAX_PATH);
std::wstring w_path(path);
// if the last character is a path separator
// remove it.
if (w_path.back() == L'\\') {
w_path.pop_back();
}
// keep popping until the last character is a \ -- thwart line continuation
while (!w_path.empty()) {
if (w_path.back() == L'\\') {
w_path.pop_back();
return w_path;
} else {
w_path.pop_back();
}
}
return w_path;
}
int main(int argc, char **argv)
{
std::wstring exe_dir(current_exe_directory());
std::wstring fullpath;
fullpath += exe_dir;
fullpath += std::wstring(L"\\");
fullpath += script_name;
std::wstring old_command_line(GetCommandLineW());
std::pair<DWORD, std::deque<std::wstring>> p = argvw_of_cmdline(old_command_line);
DWORD err = p.first;
assert(err == 0);
std::deque<std::wstring> split_cl = p.second;
// remove old executable (it's the current one)
split_cl.pop_front();
// need to push interpreter_name and script_name.
// but the order is reversed.
split_cl.push_front(fullpath);
split_cl.push_front(interpreter_name);
std::wstring command_line = cmdline_of_argvw(split_cl);
// make sure to zero-initialize these things.
STARTUPINFOW si = { 0 };
PROCESS_INFORMATION pi = { 0 };
RawWinProcessCreatorW r;
r.app_name = (interpreter_name.c_str());
r.command_line = const_cast<LPWSTR>(command_line.c_str());
r.inherit_handles = true;
r.startup_info = &si;
r.process_information = π
r.creation_flags |= CREATE_SUSPENDED;
bool success = r.run();
assert(success);
// DWORD last_error = GetLastError();
// assign to the job object whatever.
DWORD assign_status = AssignProcessToJobObject(
get_reaper().job_handle,
pi.hProcess
);
assert(assign_status);
// resume the process.
DWORD resume_status = ResumeThread(pi.hThread);
// wait for the process we spawned.
DWORD wait_res = WaitForSingleObject(pi.hProcess, INFINITE);
assert(wait_res != WAIT_ABANDONED);
assert(wait_res != WAIT_TIMEOUT);
assert(wait_res != WAIT_FAILED);
// after the process is gone, try to figure out whether it succeeded
// and use that information when deciding how to exit yourself.
// we're using 10, bad environment, as a sentinel.
DWORD child_exit_status = 10;
bool recover_exit_status_success = GetExitCodeProcess(
pi.hProcess,
&child_exit_status
);
assert(recover_exit_status_success);
assert(child_exit_status != 10);
return child_exit_status;
}
And here's a Perl script, make_wrapper.pl
, that generates and compiles a wrapper script.
The template is concatenated underneath the end of the __DATA__
token with %%%%INTERPETER_NAME%%%%
appearing where the interpreter name is and %%%SCRIPT_NAME%%%%
appearing where the script name is.
#! /usr/bin/env perl
use strict;
use warnings;
use utf8;
use Getopt::Long;
use File::Spec;
my $interpreter;
my $script;
my $cxx_compiler;
my $output;
GetOptions(
"int|i=s" => \$interpreter,
"script|s=s" => \$script,
"cxx|c=s" => \$cxx_compiler,
"output|o=s" => \$output,
);
sub escape_wide_string_literal {
my ($contents) = @_;
my $out = q[];
$out .= 'L"';
$out .= ($contents =~ s/([\"\\])/\\1ドル/gr);
$out .= '"';
return $out;
}
sub defined_and_nonempty {
my ($x) = @_;
return (defined $x) && ($x ne q[]);
}
die "need interpreter (--int|-i)" unless defined_and_nonempty($interpreter);
die "need script (--script|-s)" unless defined_and_nonempty($script);
die "need C++ compiler (--cxx|-c)" unless defined_and_nonempty($cxx_compiler);
die "need output file (--output|-o)" unless defined_and_nonempty($output);
die "interpreter must exist (--int|-i)" unless (-f $interpreter);
die "script must exist (--script|-s)" unless (-f $script);
die "C++ compiler must exist (--cxx|-c)" unless (-f $cxx_compiler);
die "intepreter must be absolute path (--int|-i)" unless (File::Spec->file_name_is_absolute($interpreter));
die "script should be relative path with no separators (.\\ is okay) (--script|-s)" if ($script =~ /\\/ and not $script =~ /\A[.][\\][^\\]*\z/);
my $cxx_template;
do {
local $/;
$cxx_template = <DATA>;
};
close(DATA);
die "internal error" unless defined $cxx_template;
my $interpreter_literal = escape_wide_string_literal($interpreter);
$cxx_template =~ s/%%%%INTERPETER_NAME%%%%/$interpreter_literal/g;
my $script_literal = escape_wide_string_literal($script);
$cxx_template =~ s/%%%%SCRIPT_NAME%%%%/$script_literal/g;
open my $fh, '>', "temp.cpp";
print $fh $cxx_template;
close($fh);
system($cxx_compiler, "-o", $output, "temp.cpp");
die "did not create file" unless (-f $output);
__DATA__
4 Answers 4
C++
Globally, your C++ seems good enough.
In argvw_of_cmdline
you don't have to wrap your second part in a else
since you return in the if
.
You can remove a lot of things in current_exe_directory
:
std::wstring current_exe_directory(void) {
HMODULE h_module = GetModuleHandleW(nullptr);
WCHAR path[MAX_PATH];
memset(path, 0, sizeof(path));
GetModuleFileNameW(h_module, path, MAX_PATH);
std::wstring_view w_path(path);
return std::wstring{ws.begin(), ws.find_last_not_of(L'\\')};
}
(note: you can improve it, but it's already kinda simplified)
You can simplify your escape_string(...)
function:
std::wstring escape_string(std::wstring ws) {
const auto esc = std::wstring{L"\\\""};
for (size_t index = 0; (index = ws.find_first_of(esc, index)) != std::wstring::npos; index += 2) {
ws.insert(index, 1, L'\\');
}
return ws;
}
Note that I didn't tested these functions, but they should work :)
Disclaimer
It isn't clear to me what part you want reviews to focus on. I'm not a C++ on Windows person so I can't say much about that part of things. Your C++ is nicely formatted. The main()
function could be broken down more, but it is commented well enough that it wouldn't be a burden to maintain in the current state.
Perl
Perl - even on Windows - is something I can talk about...... and your Perl is in good shape too.
- Absolutely have
strict
andwarning
pragmas, all the time, every day, for 99% of Perl, yes yes. - Good formatting.
- Idiomatic use of
die X unless Y
. Yourdie
messages are excellent as well. - Variables are localized! (There's nothing worse than 50k lines of perl and no localization)
- Variables and subroutines are named in ways that are easy to follow. Thanks.
- I like using scalars for filehandles also. I believe this is a Perl Best Practice now.
But - like everything - it could be better:
- It is funny to me that you commented the C++ and not the Perl at all. Part of me takes this as a compliment to Perl's natural readability, but it still seems like you could comment a bit and not distract from the code.
- The primary instance where missing comments slowed me down in reading this was in the variables. Your variables names are fine, but having a comment that uses slightly different words can help avoid ambiguities and confusion.
- The next instance of where a comment would be great is explaining the "hanging"
__DATA__
section. This looks really out of place, but your explanation above the code was perfect. Why not paste that into the code above the "hanging"__DATA__
section?
Beyond commentary on comments:
- The way you slurp data in from the
DATA
filehandle definitely works I would encapsulate it into asub
or useFile::Slurp
itself. Hiding the implementation would make it slightly easier to follow the code. my $out = q[];
isn't idiomatic Perl. I would recommendmy $out = '';
which uses two single quotes instead ofq[]
.- Your
defined_and_nonempty
is pretty similar to what Perl would give you forscalar($x)
. Do you care if$x == 0
that it evaluates false? Again the use ofq[]
feels a bit odd.
#undef NDEBUG
needs to be done before the point at which you #include <cassert>
.
Here are some additional suggestions not addressed in the excellent review of the Perl code in this previous answer.
Documentation
The code needs usage documentation, and it would be helpful to provide a -help
option
to show the usage information.
The standard way to document code in Perl is to use plain old documentation (POD) in conjunction with the Pod::Usage module.
The POD should include:
- Summary of the code's purpose
- A description of the required input
- A description for each command line option
It also gives you manpage-like help with perldoc:
perldoc make_wrapper.pl
Options
There is absolutely nothing wrong with how you are processing the command line options. However, I have found a different approach to be slightly more scalable in terms of adding or removing options.
The approach is to store options in a hash variable as opposed to individual scalar variables.
I have always found the GetOptions
requirement of passing a
reference to a scalar variable to be a bit ugly. In general, it is much more
common to pass a reference to an array or a hash than it is to a scalar.
Here is what it would look like:
my %opt;
GetOptions(\%opt, qw(
int|i=s
script|s=s
cxx|c=s
output|o=s
)) or die "Option parsing failure\n";
Instead of 4 scalar variables, there is a single variable, %opt
,
which would be used as follows, for example:
die "script must exist (--script|-s)" unless (-f $opt{script});
I sometimes find it helpful to know that the variable was a command line option, and this makes that clear.
Another thing to notice is that in this format, GetOptions
accepts
the hash ref as the 1st argument, followed by a list of option
specifications. The specs are just a list of text strings:
'int|i=s', 'script|s=s', etc.
We can take advantage of the "quote words" operator, qw
, and omit
all the quotes and commas, leaving a clean list of specs.
I typically have this code inside a sub
, which is preferable as the
number of options grows. It is also good being in a sub if there
is further option value checking.
Namespace
It is best to import only what is needed to avoid namespace pollution. For example, change:
use Getopt::Long;
use File::Spec;
to:
use Getopt::Long qw(GetOptions);
use File::Spec qw();
Warnings
The previous answer suggested:
use warnings;
This is fine. My preference is to use a very strict version of warnings:
use warnings FATAL => 'all';
In my experience, the warnings have always pointed to a bug in my code. The issue is that, in some common usage scenarios, it is too easy to miss the warning messages unless you are looking for them. They can be hard to spot even if your code generates a small amount of output, not to mention anything that scrolls off the screen. This option will kill your program dead so that there is no way to miss the warnings.
Layout
The subs are in the middle of the code, which interrupts the natural
flow of the code (from a human readability standpoint).
It is common to move the functions to the bottom of the code,
just above the __DATA__
line in your code.
There is some inconsistent use of parentheses around built-in functions. It is cleaner to omit the parens:
close $fh;
There's also inconsistency with quote usage:
open my $fh, '>', "temp.cpp";
In this case, it is more consistent to only use single quotes:
open my $fh, '>', 'temp.cpp';
Since it is hard for most people to remember all of Perl's special variables, like $/, it is a good practice to use the English module:
use English;
$/
becomes $INPUT_RECORD_SEPARATOR
Explore related questions
See similar questions with these tags.