Reduce html to n characters while keeping the formatting

Question 1

Almost every language has a built-in function that can split a string at a given position. However, as soon as you have html tags in the string, the built-in function will not work properly.

Your task is to write a program or function that splits a string at the nth character, without counting the characters of HTML tags, and outputs valid HTML. The program must keep the formatting. Spaces outside the HTML tags may be counted or not, as you wish, but they must be preserved. You may, however, collapse multiple consecutive spaces into a single space.

Input:

the string
the position to split at (0-based)

These can be taken as program or function arguments or can be read from the standard input.

Output: The split string which can be returned or written to the standard output.

The input will be valid HTML and won't contain any entities (such as &nbsp;). Tags that are opened after the character limit should be omitted from the output (see the last example).

Example:

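Input: <i>test</i>, 3
Output: <i>tes</i>

Input: <strong><i>more</i> <span style="color: red">complicated</span></strong>, 7
Output: <strong><i>more</i> <span style="color: red">co</span></strong>

Input: no html, 2
Output: no

Input: <b>no</b> <i>html root</i>, 5
Output: <b>no</b> <i>ht</i>

Input: <b>no img</b><img src="test.png" />more text, 6
Output: <b>no img</b>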

You can use any language and the standard library of the given language. This is code golf, shortest program wins. Have fun!
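For reference, here is an ungolfed JavaScript sketch of one way to read the rules above, including the last one about tags opened after the limit. It is not from the original post: the names truncateHtml and VOID are made up, the list of void elements is a guess at what the input might contain, and comments or doctype declarations are not handled. It relies on the guarantee that "<" and ">" only appear as tag delimiters.

```javascript
// Hypothetical helper names; the void-element list is an assumption.
const VOID = new Set(["br", "hr", "img", "input", "link", "meta"]);

function truncateHtml(html, limit) {
  // Each token is either one whole tag or one run of text.
  const tokens = html.match(/<[^>]+>|[^<]+/g) || [];
  const open = [];   // names of the elements that are currently open
  let out = "";
  let left = limit;  // text characters we may still emit

  for (const tok of tokens) {
    // Once the limit is used up, nothing further is read, so tags
    // opened after the limit never reach the output.
    if (left <= 0) break;

    if (tok[0] !== "<") {
      const kept = tok.slice(0, left);
      out += kept;
      left -= kept.length;
      continue;
    }

    out += tok;
    const name = tok.match(/^<\/?([^\s\/>]+)/)[1];
    if (tok[1] === "/") {
      open.pop();                       // closing tag
    } else if (!tok.endsWith("/>") && !VOID.has(name.toLowerCase())) {
      open.push(name);                  // opening tag that needs a close
    }
  }

  // Close whatever is still open, innermost element first.
  while (open.length) out += "</" + open.pop() + ">";
  return out;
}

console.log(truncateHtml('<b>no img</b><img src="test.png" />more text', 6));
```

The design choice is to stop reading at the limit and rebuild the closing tags from a stack, rather than copying the remaining tags from the input. That is what keeps later elements such as the img out of the result.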

Question 2

Can the input contain "<"s and ">"s that are not part of an HTML tag?

Question 3

One should use &lt; and &gt; instead of < and >, so no (&lt; and &gt; won't be present either).

Question 4

Could you include an example where there is markup after the text node where the split occurs? Like abcd 1?

Question 5

Are there any other options than a ?

Question 6

@DavidFrank a (Which makes sense if you consider that b could also be div or img.)

Question 7

This answer is no longer valid with the latest rule.

JavaScript (ES6), 91 bytes (previously 94)

f=(s,l)=>s.split(/(<[^>]+>)/).map(x=>x[0]=='<'?x:[l-->0?y:''for(y of x)].join('')).join('')

f('<strong><i>more</i> <span style="color: red">complicated</span></strong>', 7);
// '<strong><i>more</i> <span style="color: red">co</span></strong>'

Ungolfed:

f=(s,l)=>
 s.split(/(<[^>]+>)/). // split string s by <*>, capture group is spliced into the array 
 map(x=> // map function to every item in the array
 x[0]=='<'? // if first character is a <
 x // don't modify the string
 : // else
 [ // array comprehension
 for(y of x) // for every character y in x
 l-->0? // if l > 0 (and decrement l)
 y // character y
 : // else
 '' // empty string 
 ].join('') // join characters in array
 ).
 join('') // join all strings in array
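The array comprehension syntax used above was a draft feature that only Firefox implemented and that never became part of the standard, so the code will not run in current engines. A minimal sketch of the same idea in standard JavaScript follows; the name cut is hypothetical, and this is an equivalent rewrite rather than the author's golfed submission.

```javascript
// Same approach as f above, using only standard syntax.
const cut = (s, l) =>
  s.split(/(<[^>]+>)/)             // tags land in the array via the capture group
    .map(x => x[0] === "<"
      ? x                          // tags pass through untouched
      : [...x].filter(() => l-- > 0).join(""))  // keep characters while l > 0
    .join("");

console.log(cut('<strong><i>more</i> <span style="color: red">complicated</span></strong>', 7));
// '<strong><i>more</i> <span style="color: red">co</span></strong>'

console.log(cut('<b>no img</b><img src="test.png" />more text', 6));
// '<b>no img</b><img src="test.png" />'
```

The second call shows why the answer is marked as no longer valid: every tag is copied through, so the img that opens after the limit is still in the output.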

Question 8

Could you please provide the ungolfed code, or maybe just an explanation of what the code does and why? It's currently a bit hard to grasp. Thanks!

Question 9

@GaurangTandon added ungolfed code with comments

Question 10

Rebol - 252 chars

c: complement charset"<>"f: func[s n][t: e: 0 to-string collect[parse s[any[(m: 0)copy w[["</"some c">"](-- t)|["<"some c"/>"]|["<"some c">"](++ t)| any c(m: 1)](if e = 0[if m = 1[w: copy/part w n n: n - length? w]keep w]if all[n <= 0 t = 0][e: 1])]]]]

Ungolfed with comments:

c: complement charset "<>"
f: func [s n] [
 t: e: 0 ;; tag level (nesting) & end output flag
 to-string collect [
 parse s [
 any [
 (m: 0) ;; tag mode
 copy w [
 ["</" some c ">" ] (-- t) ;; close tag
 | ["<" some c "/>"] ;; self-closing / void elements
 | ["<" some c ">" ] (++ t) ;; open tag
 | any c (m: 1) ;; text mode
 ] (
 ;; flag not set so can still output
 if e = 0 [
 ;; in text mode - so trim text
 if m = 1 [
 w: copy/part w n
 n: n - length? w
 ]
 keep w
 ]
 ; if all trimmed and returned to flat tag level then end future output
 if all [n <= 0 t = 0] [e: 1]
 )
 ]
 ]
 ]
]

Examples in Rebol console:

>> f "<i>test</i>" 3
== "<i>tes</i>"
>> f {<strong><i>more</i> <span style="color: red">complicated</span></strong>} 7
== {<strong><i>more</i> <span style="color: red">co</span></strong>}
>> f {no html} 2
== "no"
>> f {<b>no</b> <i>html root</i>} 5
== "<b>no</b> <i>ht</i>"
>> f {<b>no img</b><img src="test.png" />more text} 6
== "<b>no img</b>"
>> f {<i>a</i><b></b>} 1
== "<i>a</i>"
>> f {<strong><i>even</i> <span style="color: red">more <b>difficult</b></span></strong>} 14
== {<strong><i>even</i> <span style="color: red">more <b>diff</b></span></strong>}
>> f {<strong><i>even</i> <span style="color: red">more <b>difficult</b></span></strong>} 3 
== {<strong><i>eve</i><span style="color: red"><b></b></span></strong>}
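For readers who don't know Rebol, here is a rough JavaScript port of the state machine above. It is only a sketch: the name rebolStyle is made up, and a regular expression stands in for the parse rules, so the tag detection is an approximation of the original. The variables depth and done play the roles of t and e.

```javascript
// Approximate port of the Rebol logic; not the author's code.
function rebolStyle(s, n) {
  let depth = 0;     // t: current tag nesting level
  let done = false;  // e: set once output should stop
  let out = "";

  for (const w of s.match(/<[^>]+>|[^<]+/g) || []) {
    let piece = w;
    if (w.startsWith("</")) {
      depth--;                               // close tag
    } else if (w.endsWith("/>")) {
      // self-closing element: nesting level is unchanged
    } else if (w[0] === "<") {
      depth++;                               // open tag
    } else {
      piece = w.slice(0, Math.max(n, 0));    // text: trim to what is left
      n -= piece.length;
    }
    if (!done) out += piece;
    // Stop only when the budget is spent AND we are back at level 0.
    if (n <= 0 && depth === 0) done = true;
  }
  return out;
}

console.log(rebolStyle('<strong><i>even</i> <span style="color: red">more <b>difficult</b></span></strong>', 3));
// '<strong><i>eve</i><span style="color: red"><b></b></span></strong>'
```

Because output only stops once the nesting level is back at zero, tags that open after the limit but inside a still-open element are copied with their text removed. That is where the empty span and b in the last console example come from.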

Question 11

Again this breaks the last rule: tags that are opened after the character limit should be omitted from the output (see the last example). In the last example the span and b tags should be omitted. This rule makes the challenge almost impossible.

Question 12

@edc65 - Unfortunately (@David Frank) hasn't commented or updated his examples, so it's unclear whether he wants this behaviour or not. I was hoping my last example would stir something! Going to leave it as is until we get clarification. Anyway, it would only take an additional 17 chars to make it work the way you've suggested. I didn't particularly like the hack, so instead I rewrote it here (ungolfed) - gist.github.com/draegtun/93682f5a07c40bd86e31

Question 13

Ruby... Very unrubylike with loops

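def split(str, n)
  i = current = 0
  return_str = ""
  # Copy up to n text characters; tags are copied whole and not counted.
  # The length check stops the loop if the input has fewer than n characters.
  while i < n && current < str.length
    if str[current] == "<"
      while str[current] != ">"
        return_str.concat str[current]
        current += 1
      end
      return_str.concat str[current]
      current += 1
    else
      return_str.concat str[current]
      i += 1
      current += 1
    end
  end
  # Past the limit: keep the remaining tags and drop the remaining text.
  while current < str.length
    if str[current] == "<"
      while str[current] != ">"
        return_str.concat str[current]
        current += 1
      end
      return_str.concat str[current]
      current += 1
    else
      # Skip one text character only; advancing unconditionally here
      # would swallow the "<" of a tag that directly follows another tag.
      current += 1
    end
  end
  return_str
end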

Question 14

This question is tagged code-golf, so you should golf your answer. You can start by replacing the variable names with one-letter names, using shorter function names, and removing whitespace wherever you can.

Question 15

(IE) JS - 135

Previous version (struck out):

function f(t,n){b=document.body;b.innerHTML=t;r=b.createTextRange();r.moveStart("character",n);r.select();r.execCommand('cut');return b.innerHTML}

Now I feel dirty. But need to start removing all those chars...

function f(t,n)
{b=document.body;b.innerHTML=t;r=b.createTextRange();r.collapse();r.moveEnd("character",n);
r.select();return r.htmlText}

Disclaimer:

run in IE console

Question 16

This breaks the last (mad) rule: tags that are opened after the character limit should be omitted from the output (try my example in the comments above).

Question 17

@edc65 Hopefully the updated version satisfies all the rules.

nderscore · Answer 1 (JavaScript ES6) · 2014-07-04 22:21:57Z

draegtun · Answer 2 (Rebol) · 2014-07-10 18:53:57Z

user26900 · Answer 3 (Ruby) · 2014-07-05 22:30:50Z

eithed · Answer 4 (IE JS) · 2014-07-05 22:18:50Z

