
Almost every language has a built-in function that can split a string at a given position. However, as soon as the string contains HTML tags, the built-in function no longer works properly.

Your task is to write a program or function that splits a string at the nth character, without counting the characters of HTML tags, and outputs valid HTML. The program must keep the formatting. Spaces outside HTML tags may be counted or not, as you wish, but must be preserved. You may, however, collapse multiple consecutive spaces into a single space.
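Since only characters outside tags count, a solution first has to tell tags apart from text. A minimal sketch, assuming (as a comment on this question confirms) that `<` and `>` never appear outside tags; `tokenize` and `visibleLength` are hypothetical helper names:

```javascript
// Split an HTML string into a flat list of tag tokens and text tokens.
// Assumes '<' and '>' only ever delimit tags (no stray brackets in text).
function tokenize(html) {
  // The capturing group keeps the matched tags in the resulting array;
  // the empty strings produced between adjacent tags are filtered out.
  return html.split(/(<[^>]*>)/).filter(part => part !== '');
}

// Only text tokens contribute to the character count.
function visibleLength(html) {
  return tokenize(html)
    .filter(part => part[0] !== '<')
    .reduce((sum, part) => sum + part.length, 0);
}

// tokenize('<b>no</b> <i>html root</i>')
//   returns ['<b>', 'no', '</b>', ' ', '<i>', 'html root', '</i>']
// visibleLength('<b>no</b> <i>html root</i>') returns 12
```

Here spaces are counted like any other character, which the rules allow.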

Input:

  1. the string
  2. the position to split at (0-based)

These can be taken as program or function arguments or can be read from the standard input.

Output: the split string, which can be returned or written to standard output.

The input will be valid HTML and won't contain any entities (such as &nbsp;). Tags that are opened after the character limit should be omitted from the output (see the last example).
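One way to satisfy the omission rule is to count how many elements were opened after the limit, so that their closing tags are dropped too, while closing tags of elements opened before the limit are kept. This is a sketch under stated assumptions, not a reference solution: `splitHtml` is a hypothetical name, spaces are counted like any other character, and only tags ending in `/>` are treated as self-closing (a void tag written without the slash, such as `<br>`, would be mistaken for an opening tag).

```javascript
// Split `html` after `n` visible characters while keeping the markup valid.
// Tags opened after the limit are omitted together with their closing tags.
// Assumes well-formed input where '<' and '>' appear only in tags.
function splitHtml(html, n) {
  const parts = html.split(/(<[^>]*>)/); // captured tags stay in the array
  let out = '';
  let left = n;     // visible characters still allowed (never negative)
  let skipped = 0;  // nesting depth of elements opened after the limit

  for (const part of parts) {
    if (part[0] !== '<') {
      // Text (possibly empty): keep at most `left` characters.
      const kept = part.slice(0, left);
      out += kept;
      left -= kept.length;
    } else if (left > 0) {
      out += part;                 // before the limit: keep every tag
    } else if (part.startsWith('</')) {
      if (skipped > 0) skipped--;  // closes an omitted element
      else out += part;            // closes an element opened before the limit
    } else if (!part.endsWith('/>')) {
      skipped++;                   // opening tag after the limit: omit it
    }                              // self-closing tag after the limit: omit it
  }
  return out;
}

console.log(splitHtml('<b>no img</b><img src="test.png" />more text', 6)); // <b>no img</b>
console.log(splitHtml('<i>a</i><b></b>', 1)); // <i>a</i>
```

The counter only needs to distinguish "opened after the limit" from "opened before", because valid HTML guarantees that closing tags arrive in reverse order of opening.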

Examples:

Input: <i>test</i>, 3
Output: <i>tes</i>

Input: <strong><i>more</i> <span style="color: red">complicated</span></strong>, 7
Output: <strong><i>more</i> <span style="color: red">co</span></strong>

Input: no html, 2
Output: no

Input: <b>no</b> <i>html root</i>, 5
Output: <b>no</b> <i>ht</i>

Input: <b>no img</b><img src="test.png" />more text, 6
Output: <b>no img</b>

You can use any language and the standard library of the given language. This is code golf; the shortest program wins. Have fun!

qwr
asked Jul 4, 2014 at 19:36
  • Can the input contain "<"s and ">"s that are not part of an HTML tag? Commented Jul 4, 2014 at 19:57
  • One should use &lt; and &gt; instead of <>, so no (&lt; and &gt; won't be present either). Commented Jul 4, 2014 at 20:01
  • Could you include an example where there is markup after the text node where the split occurs? Like <i>ab</i><b>cd</b>, 1? Commented Jul 4, 2014 at 20:56
  • Are there any other options than <i>a</i>? Commented Jul 4, 2014 at 21:15
  • @DavidFrank <i>a</i><b></b> (which makes sense if you consider that b could also be div or img). Commented Jul 4, 2014 at 22:14

4 Answers


This answer is no longer valid under the latest rule.

JavaScript (ES6), 91 (previously 94)

f=(s,l)=>s.split(/(<[^>]+>)/).map(x=>x[0]=='<'?x:[l-->0?y:''for(y of x)].join('')).join('')
f('<strong><i>more</i> <span style="color: red">complicated</span></strong>', 7);
// '<strong><i>more</i> <span style="color: red">co</span></strong>'

Ungolfed:

f=(s,l)=>
  s.split(/(<[^>]+>)/).   // split string s on <...>; the capture group keeps the tags in the array
  map(x=>                 // map a function over every item in the array
    x[0]=='<'?            // if the first character is a <
      x                   // don't modify the string
    :                     // else
      [                   // array comprehension (pre-standard syntax, Firefox only)
        for(y of x)       // for every character y in x
          l-->0?          // if l > 0 (then decrement l)
            y             // keep character y
          :               // else
            ''            // empty string
      ].join('')          // join the characters in the array
  ).
  join('')                // join all strings in the array
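Because array comprehensions were a pre-standard syntax that, as far as we know, only Firefox ever implemented (and later removed), the answer above will not run in current engines. Below is a sketch of the same approach in standard ES2015, keeping the answer's names `f`, `s` and `l`. Like the original, it keeps every tag, so tags opened after the limit are not omitted:

```javascript
// Same algorithm as the golfed answer, without array comprehensions.
// Every tag is kept; only text characters are counted and truncated.
const f = (s, l) =>
  s.split(/(<[^>]+>)/)              // captured tags stay in the array
    .map(x => x[0] == '<'
      ? x                           // tag: copy unchanged
      : [...x]                      // text: iterate over its characters
          .map(y => l-- > 0 ? y : '') // keep a character while the budget lasts
          .join(''))
    .join('');

console.log(f('<strong><i>more</i> <span style="color: red">complicated</span></strong>', 7));
// <strong><i>more</i> <span style="color: red">co</span></strong>
console.log(f('<i>a</i><b></b>', 1));
// <i>a</i><b></b>  (the empty <b></b> survives, which is why the answer is no longer valid)
```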
answered Jul 4, 2014 at 22:21
  • Could you please provide the ungolfed code, or maybe just an explanation of what the code does and why? It's currently a bit hard to grasp. Thanks! Commented Jul 5, 2014 at 6:54
  • @GaurangTandon added ungolfed code with comments. Commented Jul 5, 2014 at 7:11

Rebol - 252 chars

c: complement charset"<>"f: func[s n][t: e: 0 to-string collect[parse s[any[(m: 0)copy w[["</"some c">"](-- t)|["<"some c"/>"]|["<"some c">"](++ t)| any c(m: 1)](if e = 0[if m = 1[w: copy/part w n n: n - length? w]keep w]if all[n <= 0 t = 0][e: 1])]]]]

Ungolfed with comments:

c: complement charset "<>"
f: func [s n] [
    t: e: 0    ;; tag level (nesting) & end output flag
    to-string collect [
        parse s [
            any [
                (m: 0)    ;; tag mode
                copy w [
                      ["</" some c ">" ] (-- t)    ;; close tag
                    | ["<" some c "/>"]            ;; self-closing / void elements
                    | ["<" some c ">" ] (++ t)     ;; open tag
                    | any c (m: 1)                 ;; text mode
                ] (
                    ;; flag not set so can still output
                    if e = 0 [
                        ;; in text mode - so trim text
                        if m = 1 [
                            w: copy/part w n
                            n: n - length? w
                        ]
                        keep w
                    ]
                    ; if all trimmed and returned to flat tag level then end future output
                    if all [n <= 0 t = 0] [e: 1]
                )
            ]
        ]
    ]
]
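For readers unfamiliar with Rebol's `parse`, here is a rough JavaScript translation of the same state machine (our own, not the author's). It follows the ungolfed comments: a nesting counter `t`, an end flag, and text trimmed to the remaining budget. The name `splitLikeRebol` is hypothetical, and tags ending in `/>` are assumed to be self-closing, as the comment `;; self-closing / void elements` describes.

```javascript
// Hypothetical port of the Rebol answer: t = tag nesting level,
// ended = Rebol's end flag `e`, n = remaining visible characters.
function splitLikeRebol(s, n) {
  let t = 0;
  let ended = false;
  let out = '';
  // Tokens are whole tags or runs of text; empty strings are dropped.
  const tokens = s.split(/(<[^>]*>)/).filter(w => w !== '');
  for (let w of tokens) {
    const isText = w[0] !== '<';
    if (!isText) {
      if (w.startsWith('</')) t--;       // close tag
      else if (!w.endsWith('/>')) t++;   // open tag; self-closing leaves t alone
    }
    if (!ended) {
      if (isText) {                      // text mode: trim to the remaining budget
        w = w.slice(0, n);
        n -= w.length;
      }
      out += w;                          // tags are always kept while output is on
    }
    // Once the budget is spent and every open tag is closed, stop output.
    if (n <= 0 && t === 0) ended = true;
  }
  return out;
}
```

Because output only stops once the nesting level returns to zero, tags opened after the limit inside still-open elements are emitted as empty elements, which is the behaviour disputed in the comments on this answer.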

Examples in Rebol console:

>> f "<i>test</i>" 3
== "<i>tes</i>"
>> f {<strong><i>more</i> <span style="color: red">complicated</span></strong>} 7
== {<strong><i>more</i> <span style="color: red">co</span></strong>}
>> f {no html} 2
== "no"
>> f {<b>no</b> <i>html root</i>} 5
== "<b>no</b> <i>ht</i>"
>> f {<b>no img</b><img src="test.png" />more text} 6
== "<b>no img</b>"
>> f {<i>a</i><b></b>} 1
== "<i>a</i>"
>> f {<strong><i>even</i> <span style="color: red">more <b>difficult</b></span></strong>} 14
== {<strong><i>even</i> <span style="color: red">more <b>diff</b></span></strong>}
>> f {<strong><i>even</i> <span style="color: red">more <b>difficult</b></span></strong>} 3 
== {<strong><i>eve</i><span style="color: red"><b></b></span></strong>}
answered Jul 10, 2014 at 18:53
  • Again this breaks the last rule: tags that are opened after the character limit should be omitted from the output (see the last example). In the last example the span and b tags should be omitted. This rule makes the challenge almost impossible. Commented Jul 10, 2014 at 21:02
  • @edc65 - Unfortunately (@David Frank) hasn't commented or updated his examples, so it's unclear whether he wants this behaviour or not. I was hoping my last example would stir something! Going to leave it as is until we get clarification. Anyway, it would only take an additional 17 chars to make it work the way you've suggested. I didn't particularly like the hack, so I rewrote it here instead (ungolfed) - gist.github.com/draegtun/93682f5a07c40bd86e31 Commented Jul 12, 2014 at 15:21

Ruby (very un-Ruby-like, with loops)

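def split(str, n)
  i = current = 0
  return_str = ""
  # Phase 1: copy tags verbatim and count only text characters, stopping after n
  while i < n && current < str.length
    if str[current] == "<"
      # copy the whole tag, up to and including ">"
      while str[current] != ">"
        return_str.concat str[current]
        current += 1
      end
      return_str.concat str[current]
      current += 1
    else
      return_str.concat str[current]
      i += 1
      current += 1
    end
  end
  # Phase 2: drop the remaining text; keep closing tags of elements opened in
  # phase 1 and omit every tag opened after the limit (with its closing tag)
  skipped = 0
  while current < str.length
    if str[current] == "<"
      close = str.index(">", current)
      tag = str[current..close]
      current = close + 1
      if tag.start_with?("</")
        if skipped > 0
          skipped -= 1   # closes an omitted element
        else
          return_str.concat tag
        end
      elsif !tag.end_with?("/>")
        skipped += 1     # opening tag after the limit
      end                # self-closing tags after the limit are dropped
    else
      current += 1
    end
  end
  return_str
end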
nderscore
answered Jul 5, 2014 at 22:30
  • This question is tagged code golf, so you should golf your answer. You can start by renaming variables to one-letter names, using shorter function names and removing whitespace wherever you can. Commented May 14, 2017 at 16:40

(IE) JS - 135

Earlier version (struck out): function f(t,n){b=document.body;b.innerHTML=t;r=b.createTextRange();r.moveStart("character",n);r.select();r.execCommand('cut');return b.innerHTML}

Now I feel dirty. But I need to start removing all those chars... (end of struck-out text)

function f(t,n)
{b=document.body;b.innerHTML=t;r=b.createTextRange();r.collapse();r.moveEnd("character",n);
r.select();return r.htmlText}

Disclaimer: run in the IE console.
answered Jul 5, 2014 at 22:18
  • This breaks the last (mad) rule: tags that are opened after the character limit should be omitted from the output (try my example in comments above). Commented Jul 6, 2014 at 1:17
  • @edc65 hopefully the updated version satisfies all the rules. Commented Jul 6, 2014 at 9:05
