- The definition of a character differs considerably between a programming language and a human reader. For example, in Slovak (thanks to svick for correcting me; I originally wrote Czech), dž is a single character, yet it is made of 2 or 3 Unicode code points, which in this case are also 2 or 3 UTF-16 code units, so "dž".Length > 1. More about this and other cultural issues in this Stack Overflow post.
- Ligatures exist. Assuming a ligature is one code point (and also assuming it is encoded as one code unit), you will treat it as a single glyph even though it represents two characters. What should you do in this case? In general, the definition of character can be pretty vague because it means different things depending on the discipline in which the word is used. You (probably) can't handle everything correctly, but you should set some constraints and document how the code behaves.
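To make the gap between code units and perceived characters concrete, here is a minimal sketch using System.Globalization.StringInfo. The class and method names (TextElementDemo, CountTextElements) are made up for this illustration. Text elements repair the combining-accent case, but the two-letter form of dž is still reported as two elements, which is exactly the kind of constraint worth documenting.

```csharp
using System;
using System.Globalization;

public static class TextElementDemo
{
    // Hypothetical helper: number of text elements (roughly, grapheme clusters).
    public static int CountTextElements(string value)
    {
        return new StringInfo(value).LengthInTextElements;
    }

    public static void Main()
    {
        string combining = "e\u0301"; // 'e' followed by COMBINING ACUTE ACCENT
        string digraph = "d\u017E";   // 'd' followed by LATIN SMALL LETTER Z WITH CARON

        Console.WriteLine(combining.Length);             // 2 UTF-16 code units
        Console.WriteLine(CountTextElements(combining)); // 1 text element
        Console.WriteLine(digraph.Length);               // 2 UTF-16 code units
        Console.WriteLine(CountTextElements(digraph));   // 2 text elements
    }
}
```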
To avoid reinventing the wheel, you can write your own AsEnumerable() extension method to walk through an existing enumerator:
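    using System.Collections;
    using System.Collections.Generic;

    // Continues from the enumerator's current position; every MoveNext() call
    // here advances the caller's enumerator too, because it is shared.
    public static IEnumerable<T> AsEnumerable<T>(this IEnumerator enumerator)
    {
        while (enumerator.MoveNext())
            yield return (T)enumerator.Current;
    }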
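Your code then simplifies to:

    // Needs: using System; using System.Globalization; using System.Linq;
    public static IEnumerable<string> Split(this string value, int desiredLength)
    {
        var characters = StringInfo.GetTextElementEnumerator(value);
        string chunk;
        // AsEnumerable() already calls MoveNext(); advancing here as well would drop
        // the first text element of every chunk. Relies on Take() not reading ahead.
        while ((chunk = String.Concat(characters.AsEnumerable<string>().Take(desiredLength))).Length > 0)
            yield return chunk; // an empty chunk means the enumerator is exhausted
    }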
If you don't need a string[] return type, you can stay with IEnumerable<string> and call LINQ's ToArray() only when and if required. (The code is slightly simplified for demonstration purposes; in the real world you will need some error checking.)
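As a rough idea of what that error checking and the ToArray() call could look like, here is a self-contained sketch. It is not the code above: the class name TextChunker and the SplitIterator helper are hypothetical, and it collects elements in a List<string> instead of using AsEnumerable() and Take(), so it does not depend on how Take() consumes the enumerator.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class TextChunker
{
    // Splits value into chunks of at most desiredLength text elements.
    public static IEnumerable<string> Split(this string value, int desiredLength)
    {
        // Validate eagerly: an iterator body would not run until enumeration.
        if (value == null)
            throw new ArgumentNullException(nameof(value));
        if (desiredLength <= 0)
            throw new ArgumentOutOfRangeException(nameof(desiredLength));

        return SplitIterator(value, desiredLength);
    }

    private static IEnumerable<string> SplitIterator(string value, int desiredLength)
    {
        var characters = StringInfo.GetTextElementEnumerator(value);
        var chunk = new List<string>();

        while (characters.MoveNext())
        {
            chunk.Add((string)characters.Current);
            if (chunk.Count == desiredLength)
            {
                yield return String.Concat(chunk);
                chunk.Clear();
            }
        }

        // Emit the final, shorter chunk if any elements are left over.
        if (chunk.Count > 0)
            yield return String.Concat(chunk);
    }

    public static void Main()
    {
        // "e\u0301" is one text element, so it is never split across chunks.
        string[] parts = TextChunker.Split("abcde\u0301fg", 3).ToArray();
        foreach (string part in parts)
            Console.WriteLine(part);
    }
}
```

The argument checks live in a separate non-iterator method so that bad input fails at the call site rather than on the first MoveNext().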