
I'm trying to use sed to remove links like this one and leave just the title:

## [Some title](#some-title)

This is my command:

sed 's/^\(\#*\) *\[\([^\]]*\)\].*/\1 \2/'

I expect to have just the text without the link:

## Some title

But it doesn't work. What am I doing wrong?

I'm using Linux with GNU sed.

asked Jul 17 at 23:35
  • We can't really help unless you provide example input and expected output. Do all target lines start with ## and a space? Can there be text after the (#some-title)? Why are you escaping the #? Most importantly, what OS are you using so we know what sed implementation you have? Commented Jul 18 at 9:42
  • Aside: Don't use -i in examples in questions or answers as then people reading them can't copy/paste your code to test it without trashing their input file. It's trivial for you to add -i (or do anything else that updates the input file) later if you want it. Commented Jul 18 at 11:41
  • Can Some title include (, ], #, ), or newlines? Please edit your question to tell us about that and to provide a few lines of truly representative sample input and the expected output given that input. Commented Jul 18 at 11:42
  • @terdon I added the expected output, in case "remove the link from Markdown" was not clear enough. Commented Jul 18 at 21:15
  • You still didn't tell us if a title can include (, ], # or ) and so far have only provided 1 sunny-day example of an input line and no similar-looking lines you do not want modified so YMMV with how robust the answers you get are when exposed to your real data. Commented Jul 19 at 12:05

3 Answers


Based on what you've told us so far (I'm trying to ... leave just the title:) and the sample input you provided (## [Some title](#some-title)) this might be what you're trying to do, using any awk:

$ awk -F'[][]' '{print $2}' file
Some title

or any sed:

$ sed 's/.*\[\([^]]*\)].*/\1/' file
Some title

but without more truly representative sample input and expected output that's just a guess.
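To see why that's just a guess, here's a minimal sketch that runs both commands on a made-up two-line input (the second line is hypothetical, not from the question). Neither command checks for a heading, so an ordinary paragraph line containing a link gets reduced to its link text too:

```shell
#!/bin/sh
# Hypothetical two-line input: a heading with a link, and an ordinary
# paragraph line that also contains a link.
input='## [Some title](#some-title)
See [the docs](#docs) for more.'

# awk: split each line on [ or ]; field 2 is the text between the first [ and ]
printf '%s\n' "$input" | awk -F'[][]' '{print $2}'

# sed: keep just the text inside [...] and drop everything else on the line
printf '%s\n' "$input" | sed 's/.*\[\([^]]*\)].*/\1/'

# Both commands print:
# Some title
# the docs
```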

As for what's wrong with your sed script:

sed -i 's/^\(\#*\) *\[\([^\]]*\)\].*/\1 \2/'

Using -i like that will do "inplace" editing in GNU sed, but other sed versions will do different things with it, even BSD sed, which also supports inplace editing but requires a backup-suffix argument. You don't tell us what problem you're experiencing when running your script, so maybe that's it?
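If you do want an in-place edit that behaves the same with any sed, a minimal sketch is to write to a temporary file and replace the original only when sed succeeds. The file name example.md and the .tmp suffix are placeholders assumed for illustration:

```shell
#!/bin/sh
# Hypothetical sample file, created here only so the sketch is self-contained.
file=example.md
printf '%s\n' '## [Some title](#some-title)' > "$file"

# Write sed's output to a temporary file and only replace the original
# if sed succeeded. No -i, so GNU, BSD, and other seds all behave alike.
sed 's/^\(##*\)  *\[\([^]]*\)].*/\1 \2/' "$file" > "$file.tmp" &&
mv "$file.tmp" "$file"

cat "$file"   # prints: ## Some title
```

One trade-off: the replacement file is created by the shell redirection, so it gets default permissions rather than the original file's.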

Beyond that, in the first regexp segment \(\#*\):

  1. You're escaping the literal char # as \# which is undefined behavior per POSIX when you wanted just #.
  2. You're using #* which matches zero or more #s when you wanted 1 or more which is ##* or #\{1,\} in a BRE as sed uses by default (or #+ if you were using an ERE).
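Here's a small demonstration of the #* problem using a hypothetical line that is not a heading. The other fixes ([^]] and an unescaped #) are already applied, so only the # count differs between the two commands:

```shell
#!/bin/sh
# Hypothetical non-heading line that starts directly with a link.
line='[a plain link](#somewhere) in a paragraph'

# \(#*\) allows zero #s, so this line matches and is rewritten.
# prints " a plain link" (leading space comes from the empty \1)
printf '%s\n' "$line" | sed 's/^\(#*\) *\[\([^]]*\)].*/\1 \2/'

# \(##*\) requires at least one #, so the line is left unchanged.
printf '%s\n' "$line" | sed 's/^\(##*\) *\[\([^]]*\)].*/\1 \2/'
```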

In the separating spaces part <blank>*:

  1. You're using <blank>* which matches zero or more <blank>s when you wanted 1 or more which is <blank><blank>* or <blank>\{1,\} in a BRE (or <blank>+ if you were using an ERE).
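Similarly for the spaces, here's a sketch with a hypothetical line that has no space after the #s, which CommonMark does not treat as an ATX heading:

```shell
#!/bin/sh
# Hypothetical line with no space between the #s and the link.
line='##[no space](#x)'

# " *" makes the space optional, so this line is rewritten.
printf '%s\n' "$line" | sed 's/^\(##*\) *\[\([^]]*\)].*/\1 \2/'
# prints: ## no space

# "  *" (a space, then zero or more spaces) requires at least one space,
# so the line is left unchanged.
printf '%s\n' "$line" | sed 's/^\(##*\)  *\[\([^]]*\)].*/\1 \2/'
```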

In the last regexp segment \[\([^\]]*\)\].*:

  1. You're using [^\]] and so escaping ] which is undefined behavior per POSIX when you wanted just [^]].
  2. You're using \] at the end which is undefined behavior per POSIX since there's no unescaped [ before it when you wanted just ].
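A quick way to see the POSIX-defined ways of matching a literal ] without any backslash (the a]b input is just a made-up string for illustration):

```shell
#!/bin/sh
# ] placed first in a bracket expression (or right after ^) is a literal ]
printf '%s\n' 'a]b' | sed 's/[]]/X/'     # prints: aXb
printf '%s\n' 'a]b' | sed 's/[^]]*/X/'   # prints: X]b

# Outside a bracket expression an unescaped ] is just an ordinary character
printf '%s\n' 'a]b' | sed 's/]/X/'       # prints: aXb
```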

If you fixed all of those issues you'd get:

$ sed 's/^\(##*\)  *\[\([^]]*\)].*/\1 \2/' file
## Some title

or

$ sed 's/^\(#\{1,\}\) \{1,\}\[\([^]]*\)].*/\1 \2/' file
## Some title

and since you're using GNU sed, which supports EREs, you could write that as:

$ sed -E 's/^(#+) +\[([^]]*)].*/\1 \2/' file
## Some title
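To sanity-check the ERE version against more than one line, here's a sketch with a few hypothetical inputs: a deeper heading, a heading with text after the link, and a non-heading line:

```shell
#!/bin/sh
# Hypothetical sample lines, for illustration only.
printf '%s\n' \
  '## [Some title](#some-title)' \
  '### [Deeper title](#deeper-title)' \
  '## [Title](#title) plus trailing text' \
  '[plain link](#x) in a paragraph' |
sed -E 's/^(#+) +\[([^]]*)].*/\1 \2/'
# prints:
# ## Some title
# ### Deeper title
# ## Title
# [plain link](#x) in a paragraph
```

Note the third line: because the regexp ends in .*, anything after the link is dropped along with it, which may or may not be what you want.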

And then, to leave just the title as you said you wanted, you just remove the first capture group:

$ sed 's/^##*  *\[\([^]]*\)].*/\1/' file
Some title
$ sed 's/^#\{1,\} \{1,\}\[\([^]]*\)].*/\1/' file
Some title
$ sed -E 's/^#+ +\[([^]]*)].*/\1/' file
Some title
answered Jul 18 at 11:46

It looks like this pattern [^\]] doesn't work in sed.

This seems to work:

sed 's/^\(#*\) \[\(.*\)\].*/\1 \2/'
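One thing to watch with the greedy \(.*\) is a heading that contains more than one link. Here's a sketch with a hypothetical two-link heading, compared against the [^]]* form:

```shell
#!/bin/sh
# Hypothetical heading containing two links.
line='## [First](#first) and [Second](#second)'

# Greedy \(.*\) runs to the LAST ] on the line.
printf '%s\n' "$line" | sed 's/^\(#*\) \[\(.*\)\].*/\1 \2/'
# prints: ## First](#first) and [Second

# [^]]* stops at the FIRST ].
printf '%s\n' "$line" | sed 's/^\(#*\) \[\([^]]*\)].*/\1 \2/'
# prints: ## First
```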
Ed Morton
answered Jul 17 at 23:40
  • Use extended regexps with sed's -E option to avoid Leaning Toothpick Syndrome. Also, if ] is the first (optionally after a ^) character in a bracket expression, it doesn't need to be escaped (see man regex). E.g., echo '## [Some title](#some-title)' | sed -E 's/^(#+) *\[([^]]*)\].*/\1 \2/'. BTW, note the use of + after # instead of *: + means one-or-more, * means zero-or-more. It's easy to match more than you mean if you use * instead of +; in this case, * would match ALL URLs, not just those in # headers. Commented Jul 18 at 3:27
  • ALL URLs at the beginning of a line starting with zero-or-more spaces, that is. Commented Jul 18 at 3:33
  • [^\]] is undefined behavior per POSIX, so any sed can do whatever it likes with that. ITYM just [^]] instead. Commented Jul 18 at 12:21

A pandoc filter can handle the most general version of this problem:

Remove any link within any level of heading.

Headers can differ by depth and style, their contents can be formatted, and header-like strings can appear in comments and code blocks. So, for example, your markdown file could be like this:

## [Some title](#some-title)
Some text here

Another *header [with a `link`](https://www.konami.com/yugioh/)*
------------
wow!

# What if [a header link](#like-this) appears outside of code?

    # What if [a header link](#like-this) appears in code?

<!--
# a [header link](#keep-me) that should not be altered
because it's commented out -->

Pandoc knows all about these cases. I don’t want to have to think about them. I just want to say "if you find a link somewhere in a heading, get rid of it." That’s a filter.
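For comparison, here's a sketch of the kind of line-based sed approach the filter avoids (the command is hypothetical, not something suggested in this thread), run on lines taken from the example above. It misses the setext-style heading entirely and does alter the commented-out one:

```shell
#!/bin/sh
# Hypothetical naive approach: on every line starting with #,
# replace each [text](url) with just text.
printf '%s\n' \
  'Another *header [with a `link`](https://www.konami.com/yugioh/)*' \
  '------------' \
  '<!--' \
  '# a [header link](#keep-me) that should not be altered' \
  "because it's commented out -->" |
sed '/^#/ s/\[\([^]]*\)]([^)]*)/\1/g'
# prints:
# Another *header [with a `link`](https://www.konami.com/yugioh/)*
# ------------
# <!--
# # a header link that should not be altered
# because it's commented out -->
```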

Here’s a pandoc filter (Haskell version)

Based on the behead.hs example in the documentation:

#!/usr/bin/env runhaskell
-- removeheaderlinks.hs
import Text.Pandoc.JSON
import Text.Pandoc.Walk
main :: IO ()
main = toJSONFilter removeheaderlinks
-- if this Inline is a link, remove the link but keep the attributes
removelink :: Inline -> Inline
removelink (Link at xs _) = Span at xs
removelink x = x
-- remove all links if the block is a header
removeheaderlinks :: Block -> Block
removeheaderlinks (Header n attr content) = Header n attr $ walk removelink content
removeheaderlinks x = x

You need to have Haskell installed, as well as the pandoc-types library, so first run cabal v2-update && cabal v2-install --lib pandoc-types --package-env . to install it.

Then run this to convert:

pandoc -f markdown -t markdown --filter removeheaderlinks.hs ./example.md

Result:

## Some title

Some text here

## Another *header with a `link`*

wow!

# What if a header link appears outside of code?

    # What if [a header link](#like-this) appears in code?

<!--
# a [header link](#keep-me) that should not be altered
because it's commented out -->
answered Jul 19 at 6:22
  • Don't you think that using Pandoc and Haskell is a bit overkill for something that can be done with a single sed command? I don't need to parse every possible Markdown construct. I have a few Markdown files that I've written myself that I need to remove the links from. Commented Jul 20 at 16:53
  • Yeah, hahah, I admit this is overkill for your case (where the headers and links look exactly like this). But I hope it’s useful for somebody else googling "remove links in headers markdown" whose files look different from yours. Commented Jul 20 at 17:02
