Original code and demo at this gist.
Given a Markdown document like
Here is some text,
and some more text.
```javascript
const message = "This is JavaScript!";
```
More text follows, and then
```javascript
console.log(message);
```
I want to print out the sections in code fences, not including the code fences, separated by a single blank line:
const message = "This is JavaScript!";
console.log(message);
I came up with the following AWK script, which seems to do the job nicely:
#!/usr/bin/awk -f
BEGIN { in_code_block = 0 }
/^```/ {
    if (!in_code_block) {
        in_code_block = 1;
        first_line = 1;
    } else {
        in_code_block = 0;
        print "";
    }
}
{
    if (in_code_block && !first_line) {
        print;
    }
    first_line = 0;
}
A goal is for the script to be dependency-minimal. I don't want to have to install an implementation of CommonMark or an Erlang environment. AWK fits this bill well.
Correspondingly, a non-goal is for this script to be correct in all cases: I’m happy to accept false positives on lines starting with ```inline code``` like this, and similar edge cases.
I’m mostly looking for critique of my AWK, with respect to which I am a total newbie. But any comments are welcome!
3 Answers
You could shorten the code with the `next` statement, which stops processing the current record immediately: the rest of the current rule and all rules after it are skipped, and awk continues with the next input record.
See the Next Statement section of the GNU Awk manual.
You can also use your variable directly as a pattern, so the rule needs no additional `if` inside its action.
BEGIN { in_code_block = 0 }
/^```/ {
    if (in_code_block)
        print "";
    in_code_block = ! in_code_block;
    next
}
in_code_block { print; }
Tested with GNU Awk 4.1.3.
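To try this version out, here is a minimal sketch of a shell harness, assuming a POSIX shell and any `awk` on the PATH. The names `fence`, `sample`, `extract_code`, and `result` are made up for this illustration. The fence test is written as `index($0, fence) == 1`, a stand-in for the fence regex above, only so that the snippet itself contains no literal triple backticks.

```shell
# Hypothetical harness: rebuild the sample document from the question
# and run the next-based script on it.
fence=$(printf '\140\140\140')   # three backticks, spelled as octal escapes

sample=$(printf '%s\n' \
  'Here is some text,' \
  'and some more text.' \
  "${fence}javascript" \
  'const message = "This is JavaScript!";' \
  "$fence" \
  'More text follows, and then' \
  "${fence}javascript" \
  'console.log(message);' \
  "$fence")

extract_code() {
  awk -v fence="$fence" '
    BEGIN { in_code_block = 0 }
    index($0, fence) == 1 {          # the line starts with a fence
      if (in_code_block)
        print ""                     # blank line after each closed block
      in_code_block = !in_code_block
      next                           # never print the fence line itself
    }
    in_code_block { print }
  '
}

result=$(printf '%s\n' "$sample" | extract_code)
printf '%s\n' "$result"
```

On the sample document this prints the two code lines with a single blank line between them. The trailing blank line that the script emits after the last block is stripped by the command substitution.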
- This is great! "Skip this record" is a better description of what I want to do than "toggle this flag". Thanks. (wchargin, May 23, 2018 at 16:29)
- I'm accepting this answer because it includes, IMO, the most readable code. oliv's answer is indeed very cute and a nice one-liner to have in my pocket, but requires some thought to figure out what is going on. This one should be readable even to people who don't know AWK. Thanks to all answerers; I learned something from each. :-) (wchargin, May 29, 2018 at 2:24)
While your code looks OK, it could be improved greatly by making use of `RS` (the record separator) and `NR` (the number of records read so far), provided you're using GNU awk.
awk -v RS='```[a-z]*\n' '(NR+1)%2' file
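In this case `RS` is set so that every fence line (triple backticks, an optional lowercase language tag, and the newline) acts as a record separator. The records therefore alternate between ordinary text and the contents of a code block.
The only awk statement, `(NR+1)%2`, is a pattern with no action, so it prints every record for which it is nonzero, that is, every second record.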
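To see the alternation concretely, the sketch below labels each record with its number and then repeats the extraction with the equivalent test `NR % 2 == 0`. It assumes an awk that treats a multi-character `RS` as a regular expression (gawk does; POSIX leaves the behavior unspecified). The shell variable names are hypothetical, and the fence is built from octal escapes so that the snippet contains no literal triple backticks.

```shell
fence=$(printf '\140\140\140')   # three backticks, spelled as octal escapes

sample=$(printf '%s\n' \
  'Here is some text,' \
  'and some more text.' \
  "${fence}javascript" \
  'const message = "This is JavaScript!";' \
  "$fence" \
  'More text follows, and then' \
  "${fence}javascript" \
  'console.log(message);' \
  "$fence")

# Label each record so it is visible which numbers hold text and which hold code.
printf '%s\n' "$sample" |
  awk -v RS="${fence}[a-z]*\n" '{ printf "--- record %d ---\n%s", NR, $0 }'

# Same selection as (NR+1)%2, written as an explicit even-record test.
code_only=$(printf '%s\n' "$sample" |
  awk -v RS="${fence}[a-z]*\n" 'NR % 2 == 0')
printf '%s\n' "$code_only"
```

Because each code record still ends with its own newline and `print` appends another, the blank line between blocks comes for free.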
- This is very cute. gawk is required so that a multi-character `RS` is treated as a regular expression, as opposed to having unspecified behavior, correct? (I note that this also removes all text after the closing ```, which is fine with me.) One question: why does using RS='^```[a-z]*\n' (added start-of-line anchor) not work? (wchargin, May 23, 2018 at 7:14)
- `RS` is by default set to `\n`, which means every line is an awk record. Changing `RS` changes the meaning of `^` and `$`, because you can then have multi-line records (which is the case here). So you cannot use `^` in `RS` in this case, but you could use RS='\n```[a-z]*\n' (oliv, May 23, 2018 at 7:27)
- @wchargin I don't get your comment "I note that this also removes all text after the closing ```". All text after a closing backtick should not be printed, or did I miss something? (oliv, May 23, 2018 at 7:46)
- It's fine for text after a closing backtick to not be printed; this is what my original implementation did. Technically, a closing code fence may only be followed by whitespace (demo), but this is the kind of restriction that I'm happy to drop. Regarding `RS`: it sounds like `^` is matching beginning-of-document, not beginning-of-line, which is only slightly surprising to me. Good to know, in any case. (wchargin, May 23, 2018 at 15:44)
The code looks perfect to me.
I thought about using the flip-flop operator, but since you take additional action at the beginning and the end of the code block, this may be difficult in this case.
/^```/, /^```/ { ... }
Maybe you want to experiment with that idea nevertheless. It may prove valuable in the future.
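One caveat worth knowing before experimenting: when the start and end patterns of a range are identical, both match the same record, so the range opens and closes on that one line. The GNU Awk manual describes this case for text between identical markers. A minimal sketch, with hypothetical shell variable names and the fence test written as `index($0, fence) == 1` so that the snippet contains no literal triple backticks:

```shell
fence=$(printf '\140\140\140')   # three backticks, spelled as octal escapes

sample=$(printf '%s\n' \
  'Here is some text,' \
  'and some more text.' \
  "${fence}javascript" \
  'const message = "This is JavaScript!";' \
  "$fence" \
  'More text follows, and then' \
  "${fence}javascript" \
  'console.log(message);' \
  "$fence")

# Range pattern with the same test for start and end:
# print the line numbers that the range selects.
matched=$(printf '%s\n' "$sample" |
  awk -v fence="$fence" 'index($0, fence) == 1, index($0, fence) == 1 { print NR }')
printf '%s\n' "$matched"
```

On the sample document this selects only lines 3, 5, 7, and 9, which are the fence lines themselves. A range pattern is a better fit when the opening and closing markers differ.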
- This is good to know; thanks! It looks like these are called "range patterns". I'll keep them in mind. (wchargin, May 23, 2018 at 16:31)