
Original code and demo at this gist.

Given a Markdown document like

Here is some text,
and some more text.
```javascript
const message = "This is JavaScript!";
```
More text follows, and then
```javascript
console.log(message);
```

I want to print out the sections in code fences, not including the code fences, separated by a single blank line:

const message = "This is JavaScript!";

console.log(message);

I came up with the following AWK script, which seems to do the job nicely:

#!/usr/bin/awk -f
BEGIN { in_code_block = 0 }
/^```/ {
    if (!in_code_block) {
        in_code_block = 1;
        first_line = 1;
    } else {
        in_code_block = 0;
        print "";
    }
}
{
    if (in_code_block && !first_line) {
        print;
    }
    first_line = 0;
}
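A quick way to try the script on the sample document without saving any files is to pass the same program to awk inline. This is a minimal POSIX sh sketch, not part of the original gist; sample_doc and extract_code are hypothetical helper names. It also makes one detail visible: because print "" runs at every closing fence, the output ends with a blank line after the last block as well.

```shell
#!/bin/sh
# Hypothetical helper: print the sample Markdown document from the question.
sample_doc() {
  printf '%s\n' \
    'Here is some text,' \
    'and some more text.' \
    '```javascript' \
    'const message = "This is JavaScript!";' \
    '```' \
    'More text follows, and then' \
    '```javascript' \
    'console.log(message);' \
    '```'
}

# Hypothetical helper: the question's script, passed to awk inline
# instead of being run from a #! file.
extract_code() {
  awk '
    BEGIN { in_code_block = 0 }
    /^```/ {
      if (!in_code_block) {
        in_code_block = 1
        first_line = 1
      } else {
        in_code_block = 0
        print ""
      }
    }
    {
      if (in_code_block && !first_line) {
        print
      }
      first_line = 0
    }
  '
}

sample_doc | extract_code
```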

A goal is for the script to be dependency-minimal. I don't want to have to install an implementation of CommonMark or an Erlang environment. AWK fits this bill well.

Correspondingly, a non-goal is for this script to be correct in all cases: I’m happy to accept false positives on lines starting with ```inline code``` like this, and similar edge cases.

I’m mostly looking for critique of my AWK, with respect to which I am a total newbie. But any comments are welcome!

asked May 23, 2018 at 4:42

3 Answers


You could shorten the code with the next statement, which stops processing the current record (the rest of the current rule and all rules after it are skipped) and moves on to the next input record.

See the "Next Statement" section of the GNU Awk manual on gnu.org.

Also, you can use your variable as the pattern of a rule, instead of an additional if() inside the action.

BEGIN { in_code_block = 0 }
/^```/ {
    if (in_code_block)
        print "";
    in_code_block = ! in_code_block;
    next
}
in_code_block { print; }

Tested with GNU Awk 4.1.3.
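The same rules could arguably be trimmed a little further: awk treats an uninitialized variable as 0 (false), so the BEGIN block is optional. The sketch below assumes a POSIX sh and awk, uses the hypothetical helper names sample_doc and extract_code_next, and runs that trimmed variant on the question's sample document, where it prints the same lines as the question's script.

```shell
#!/bin/sh
# Hypothetical helper: print the sample Markdown document from the question.
sample_doc() {
  printf '%s\n' \
    'Here is some text,' \
    'and some more text.' \
    '```javascript' \
    'const message = "This is JavaScript!";' \
    '```' \
    'More text follows, and then' \
    '```javascript' \
    'console.log(message);' \
    '```'
}

# The next-based rules, minus the BEGIN block: in_code_block starts out
# uninitialized, which awk treats as 0 (false).
extract_code_next() {
  awk '
    /^```/ {
      if (in_code_block)
        print ""                      # closing fence: emit the separator line
      in_code_block = !in_code_block  # flip between outside and inside
      next                            # never print the fence line itself
    }
    in_code_block { print }
  '
}

sample_doc | extract_code_next
```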

answered May 23, 2018 at 15:55
  • This is great! "Skip this record" is a better description of what I want to do than "toggle this flag". Thanks. Commented May 23, 2018 at 16:29
  • I'm accepting this answer because it includes, IMO, the most readable code. oliv's answer is indeed very cute and a nice one-liner to have in my pocket, but requires some thought to figure out what is going on. This one should be readable even to people who don't know AWK. Thanks to all answerers; I learned something from each. :-) Commented May 29, 2018 at 2:24

While your code looks OK, it could be greatly improved by making use of RS (the record separator) and NR (the number of the current record), provided you're using GNU awk.

 awk -v RS='```[a-z]*\n' '(NR+1)%2' file

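Here RS is set to a regular expression that matches a fence line: three backticks, an optional run of lowercase letters (the language name, e.g. javascript), and the newline. Every fence therefore ends a record, so the records alternate between text outside the code blocks (odd NR) and the code inside them (even NR).

The only awk statement is the pattern (NR+1)%2, which is true for even-numbered records; since it has no action, awk prints each matching record, i.e. one record out of two.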
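To see the alternation concretely, the sketch below dumps each record with its number and then runs the one-liner. It assumes an awk that treats a multi-character RS as a regular expression (GNU awk does; POSIX leaves this unspecified), and sample_doc, show_records and extract_code_rs are hypothetical helper names. Each record keeps the newline that precedes the next fence, and print appends ORS, so on this sample the one-liner emits the code blocks separated by a blank line, the same text the question's script produces.

```shell
#!/bin/sh
# Hypothetical helper: print the sample Markdown document from the question.
sample_doc() {
  printf '%s\n' \
    'Here is some text,' \
    'and some more text.' \
    '```javascript' \
    'const message = "This is JavaScript!";' \
    '```' \
    'More text follows, and then' \
    '```javascript' \
    'console.log(message);' \
    '```'
}

# Dump every record with its number, to show how the regex RS splits input.
# Each record still ends with the newline that preceded the next fence.
show_records() {
  awk -v RS='```[a-z]*\n' '{ printf "--- record %d ---\n%s", NR, $0 }'
}

# The one-liner from this answer: print only even-numbered records.
extract_code_rs() {
  awk -v RS='```[a-z]*\n' '(NR+1)%2'
}

sample_doc | show_records
sample_doc | extract_code_rs
```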

answered May 23, 2018 at 6:50
  • This is very cute. gawk is required so that a multi-character RS is treated as a regular expression, as opposed to having unspecified behavior, correct? (I note that this also removes all text after the closing ```, which is fine with me.) One question: why does using RS='^```[a-z]*\n' (added start-of-line anchor) not work? Commented May 23, 2018 at 7:14
  • By default RS is set to \n, which means every line is an awk record. Changing RS changes the meaning of ^ and $, because records may now span multiple lines (which is the case here). So you cannot use ^ in RS in this case, but you could use RS='\n```[a-z]*\n'. Commented May 23, 2018 at 7:27
  • @wchargin I don't get your comment "I note that this also removes all text after the closing ```". All text after a closing backtick should not be printed, or did I miss something? Commented May 23, 2018 at 7:46
  • It's fine for text after a closing backtick not to be printed; this is what my original implementation did. Technically, a closing code fence may only be followed by whitespace (demo), but this is the kind of restriction that I'm happy to drop. Regarding RS: it sounds like ^ is matching beginning-of-document, not beginning-of-line, which is only slightly surprising to me. Good to know, in any case. Commented May 23, 2018 at 15:44

The code looks perfect to me.

I thought about using the flip-flop operator (awk calls these range patterns), but since you take additional actions at the beginning and the end of each code block, this may be difficult here.

/^```/, /^```/ { ... }

Maybe you want to experiment with that idea nevertheless. It may prove valuable in the future.
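One caveat if you try it: an awk range pattern can be turned on and off by the same record. With identical start and end patterns, as in /^```/, /^```/, every fence line both opens and closes the range, so only the fence lines themselves are selected; the GNU Awk manual's section on range patterns describes this case and suggests a flag instead. A minimal sketch assuming POSIX sh and awk, with sample_doc and naive_range as hypothetical helper names:

```shell
#!/bin/sh
# Hypothetical helper: print the sample Markdown document from the question.
sample_doc() {
  printf '%s\n' \
    'Here is some text,' \
    'and some more text.' \
    '```javascript' \
    'const message = "This is JavaScript!";' \
    '```' \
    'More text follows, and then' \
    '```javascript' \
    'console.log(message);' \
    '```'
}

# Naive range pattern: each fence line matches both the start and the end
# pattern, so the range opens and closes on that one line and only the
# fence lines themselves are printed.
naive_range() {
  awk '/^```/, /^```/ { print }'
}

sample_doc | naive_range
```

The flag toggle used in the question, and in the next-based answer, avoids this because a single rule flips the state at every fence.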

answered May 23, 2018 at 5:27
  • This is good to know; thanks! It looks like these are called "range patterns". I'll keep them in mind. Commented May 23, 2018 at 16:31