Almost every language has a built-in function that can split a string at a given position. However, as soon as you have html tags in the string, the built-in function will not work properly.
Your task is to write a program or function that splits a string at the nth character, without counting the characters of HTML tags, and outputs valid HTML. The program must keep the formatting. Spaces outside the HTML tags may or may not be counted, as you wish, but must be preserved. You can, however, collapse multiple consecutive spaces into a single space.
Input:
- the string
- the position to split at (0-based)
These can be taken as program or function arguments or can be read from the standard input.
Output: The split string which can be returned or written to the standard output.
The input will be valid HTML and won't contain any HTML entities. Tags that are opened after the character limit should be omitted from the output (see the last example).
Examples:
Input: <i>test</i>, 3
Output: <i>tes</i>
Input: <strong><i>more</i> <span style="color: red">complicated</span></strong>, 7
Output: <strong><i>more</i> <span style="color: red">co</span></strong>
Input: no html, 2
Output: no
Input: <b>no</b> <i>html root</i>, 5
Output: <b>no</b> <i>ht</i>
Input: <b>no img</b><img src="test.png" />more text, 6
Output: <b>no img</b>
You can use any language and its standard library. This is code golf; the shortest program wins. Have fun!
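A minimal, ungolfed reference sketch of the rules above, not part of the original challenge. It assumes well-formed input, no entities, no `>` inside attribute values, and void elements written in self-closing form (`<img ... />`) as in the examples. Spaces are counted, which is what the examples imply (in the second example, "more" plus the space plus "co" makes 7). The name `splitHtml` is hypothetical.

```javascript
// Hypothetical reference sketch: split an HTML string after n visible characters.
// Tag characters are not counted; elements opened after the limit are omitted.
// Assumes well-formed input, no entities, and self-closing void elements ("<img ... />").
function splitHtml(html, n) {
  // Tokenize into tags ("<...>") and text runs; the capture group keeps the tags.
  const tokens = html.split(/(<[^>]*>)/).filter(t => t !== "");
  let out = "";
  let remaining = n;  // visible characters still allowed
  let skipDepth = 0;  // nesting depth of elements opened after the limit
  for (const tok of tokens) {
    if (tok[0] !== "<") {
      // Text: keep as many characters as the budget allows.
      const kept = tok.slice(0, remaining);
      out += kept;
      remaining -= kept.length;
    } else if (tok.startsWith("</")) {
      // Closing tag: drop it if it closes an element that was skipped.
      if (skipDepth > 0) skipDepth--;
      else out += tok;
    } else if (remaining > 0) {
      // Opening or self-closing tag before the limit: keep it.
      out += tok;
    } else if (!tok.endsWith("/>")) {
      // Opening tag after the limit: omit it and, later, its matching close.
      skipDepth++;
    }
    // Self-closing tags after the limit fall through all branches and are dropped.
  }
  return out;
}
```

On edc65's harder case from the comments, `splitHtml('<strong><i>even</i> <span style="color: red">more <b>difficult</b></span></strong>', 3)` returns `<strong><i>eve</i></strong>`: the span and b elements start after the limit, so they are dropped together with their closing tags.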
This answer is no longer valid under the latest rule.
JavaScript (ES6), 91 (previously 94)
f=(s,l)=>s.split(/(<[^>]+>)/).map(x=>x[0]=='<'?x:[l-->0?y:''for(y of x)].join('')).join('')
f('<strong><i>more</i> <span style="color: red">complicated</span></strong>', 7);
// '<strong><i>more</i> <span style="color: red">co</span></strong>'
Ungolfed:
f=(s,l)=>
  s.split(/(<[^>]+>)/).  // split string s by <*>, capture group is spliced into the array
  map(x=>                // map function to every item in the array
    x[0]=='<'?           // if first character is a <
      x                  // don't modify the string
    :                    // else
      [                  // array comprehension
        for(y of x)      // for every character y in x
          l-->0?         // if l > 0 (and decrement l)
            y            // character y
          :              // else
            ''           // empty string
      ].join('')         // join characters in array
  ).
  join('')               // join all strings in array
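Both versions above use array comprehensions, a SpiderMonkey-only syntax from an ES6 draft that was dropped before standardization, so they no longer run in current engines. A sketch of the same approach in standard JavaScript, replacing the comprehension with a spread into an array followed by `map` (written for readability, not golfed):

```javascript
// Same approach as the answer, without array comprehensions.
// s: HTML string; l: number of visible (non-tag) characters to keep.
const f = (s, l) =>
  s
    .split(/(<[^>]+>)/)  // the capture group keeps the tags in the array
    .map(x =>
      x[0] == '<'
        ? x              // tags pass through unchanged
        : [...x].map(y => (l-- > 0 ? y : '')).join(''))  // keep chars while l > 0, decrementing l each time
    .join('');
```

Like the original, this keeps every tag, including those opened after the limit, which is why the answer fails the latest rule: `f('<b>no img</b><img src="test.png" />more text', 6)` returns `<b>no img</b><img src="test.png" />` instead of `<b>no img</b>`.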
Comment by Gaurang Tandon (Jul 5, 2014 at 06:54): Could you please provide the ungolfed code, or just an explanation of what the code does and why? It's currently a bit hard to grasp. Thanks!
Comment by nderscore (Jul 5, 2014 at 07:11): @GaurangTandon added ungolfed code with comments.
Rebol - 252 chars
c: complement charset"<>"f: func[s n][t: e: 0 to-string collect[parse s[any[(m: 0)copy w[["</"some c">"](-- t)|["<"some c"/>"]|["<"some c">"](++ t)| any c(m: 1)](if e = 0[if m = 1[w: copy/part w n n: n - length? w]keep w]if all[n <= 0 t = 0][e: 1])]]]]
Ungolfed with comments:
c: complement charset "<>"
f: func [s n] [
    t: e: 0                                       ;; tag level (nesting) & end output flag
    to-string collect [
        parse s [
            any [
                (m: 0)                            ;; tag mode
                copy w [
                      ["</" some c ">" ] (-- t)   ;; close tag
                    | ["<" some c "/>"]           ;; self-closing / void elements
                    | ["<" some c ">" ] (++ t)    ;; open tag
                    | any c (m: 1)                ;; text mode
                ] (
                    ;; flag not set so can still output
                    if e = 0 [
                        ;; in text mode - so trim text
                        if m = 1 [
                            w: copy/part w n
                            n: n - length? w
                        ]
                        keep w
                    ]
                    ;; if all trimmed and returned to flat tag level then end future output
                    if all [n <= 0 t = 0] [e: 1]
                )
            ]
        ]
    ]
]
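For readers unfamiliar with Rebol's `parse` dialect, here is a hypothetical JavaScript rendering of the logic described by the comments above: a nesting counter `t` and an end-of-output flag (Rebol's `e`, here `done`). It is a sketch, not the author's code; the name `fRebolStyle` is made up, and it treats any tag ending in `/>` as self-closing, as the comment on that branch intends.

```javascript
// Hypothetical JavaScript rendering of the Rebol answer's logic (not the author's code).
// t: tag nesting level; done: end-of-output flag (the Rebol `e`).
function fRebolStyle(s, n) {
  let t = 0, done = false, out = "";
  // Tokenize into tags and text runs, like the alternatives of the parse rule.
  for (const w of s.split(/(<[^>]*>)/).filter(x => x !== "")) {
    const isTag = w[0] === "<";
    if (isTag && w.startsWith("</")) t--;      // close tag
    else if (isTag && !w.endsWith("/>")) t++;  // open tag (self-closing leaves t alone)
    if (!done) {
      let piece = w;
      if (!isTag) {                            // text mode: trim to the remaining budget
        piece = w.slice(0, n);
        n -= piece.length;
      }
      out += piece;
    }
    // Once the text budget is spent and we are back at the top level, stop output.
    if (n <= 0 && t === 0) done = true;
  }
  return out;
}
```

Output stops only when the budget is spent and the nesting level is back to zero, so elements opened after the limit inside a still-open element are emitted empty. That reproduces the last console example below, which is the behaviour edc65 objects to in the comments.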
Examples in Rebol console:
>> f "<i>test</i>" 3
== "<i>tes</i>"
>> f {<strong><i>more</i> <span style="color: red">complicated</span></strong>} 7
== {<strong><i>more</i> <span style="color: red">co</span></strong>}
>> f {no html} 2
== "no"
>> f {<b>no</b> <i>html root</i>} 5
== "<b>no</b> <i>ht</i>"
>> f {<b>no img</b><img src="test.png" />more text} 6
== "<b>no img</b>"
>> f {<i>a</i><b></b>} 1
== "<i>a</i>"
>> f {<strong><i>even</i> <span style="color: red">more <b>difficult</b></span></strong>} 14
== {<strong><i>even</i> <span style="color: red">more <b>diff</b></span></strong>}
>> f {<strong><i>even</i> <span style="color: red">more <b>difficult</b></span></strong>} 3
== {<strong><i>eve</i><span style="color: red"><b></b></span></strong>}
Comment by edc65 (Jul 10, 2014 at 21:02): Again, this breaks the last rule: tags that are opened after the character limit should be omitted from the output (see the last example). In the last example, the span and b tags should be omitted. This rule makes the challenge almost impossible.
Comment by draegtun (Jul 12, 2014 at 15:21): @edc65 Unfortunately (@David Frank) hasn't commented or updated his examples, so it's unclear whether he wants this behaviour or not. I was hoping my last example would stir something! I'm going to leave it as is until we get clarification. Anyway, it would only take an additional 17 chars to make it work the way you've suggested. I didn't particularly like the hack, so instead I rewrote it here (ungolfed): gist.github.com/draegtun/93682f5a07c40bd86e31
Ruby: very un-Ruby-like, with loops
def split(str, n)
  i = current = 0
  return_str = ""
  # Copy input until n visible (non-tag) characters have been emitted
  while i < n && current < str.length
    if str[current] == "<"
      # Copy a whole tag through its closing ">"; tags don't count toward n
      while str[current] != ">"
        return_str.concat str[current]
        current += 1
      end
      return_str.concat str[current]
      current += 1
    else
      return_str.concat str[current]
      i += 1
      current += 1
    end
  end
  # Past the limit: keep only the tags (so open elements still get closed)
  while current < str.length
    if str[current] == "<"
      while str[current] != ">"
        return_str.concat str[current]
        current += 1
      end
      return_str.concat str[current]
    end
    current += 1
  end
  return_str
end
Comment by sagiksp (May 14, 2017 at 16:40): This question is tagged code golf, so you should golf your answer. You can start by renaming variables to one-letter names, using shorter function names, and removing whitespace wherever you can.
(IE) JS - 135
Previous version (struck out by the author): function f(t,n){b=document.body;b.innerHTML=t;r=b.createTextRange();r.moveStart("character",n);r.select();r.execCommand('cut');return b.innerHTML}
Now I feel dirty. But need to start removing all those chars...
function f(t,n)
{b=document.body;b.innerHTML=t;r=b.createTextRange();r.collapse();r.moveEnd("character",n);
r.select();return r.htmlText}
Disclaimer: run in the IE console.
Comment by edc65 (Jul 6, 2014 at 01:17): This breaks the last (mad) rule: tags that are opened after the character limit should be omitted from the output (try my example in the comments above).
Comment by eithed (Jul 6, 2014 at 09:05): @edc65 Hopefully the updated version satisfies all the rules.
… < and > instead of <>, so no (< or > won't be present either).
… <i>ab</i><b>cd</b>, 1?
… <i>a</i>?
… <i>a</i><b></b> (Which makes sense if you consider that b could also be div or img.)