A function that replaces a Unicode character in a string:
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

void replaceAllOccurences(std::string& source,
                          const std::string& replaceFrom,
                          const std::string& replaceTo)
{
    std::vector<std::uint8_t> data(source.begin(), source.end());
    std::vector<std::uint8_t> pattern(replaceFrom.begin(), replaceFrom.end());
    std::vector<std::uint8_t> replaceData(replaceTo.begin(), replaceTo.end());
    std::vector<std::uint8_t>::iterator itr;
    while((itr = std::search(data.begin(), data.end(), pattern.begin(), pattern.end())) != data.end())
    {
        data.erase(itr, itr + pattern.size());
        data.insert(itr, replaceData.begin(), replaceData.end());
    }
    source = std::string(data.begin(), data.end());
}
Usage:
std::string source = "123€AAA€BBB";
std::string replaceFrom = "€";
std::string replaceTo = "\x80";
replaceAllOccurences(source, replaceFrom, replaceTo);
replaceTo may be obtained from an external conversion library, for example iconvpp. Normally I would convert the whole source with iconvpp, but in this case I need to convert only a particular character, for example "€".
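As a hedged illustration of where a value like "\x80" can come from: in Windows-1252 the euro sign is the single byte 0x80. The sketch below uses the POSIX iconv API directly (not iconvpp, whose interface is not shown in the question) to convert the UTF-8 "€" to that encoding. convertFromUtf8 is a hypothetical helper name, and the sketch assumes a POSIX system whose iconv recognizes the encoding name "CP1252".

```cpp
#include <cassert>
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: converts a UTF-8 string to `toEncoding` with the
// POSIX iconv API. Assumes the platform's iconv knows that encoding name.
std::string convertFromUtf8(const std::string& input, const char* toEncoding)
{
    iconv_t cd = iconv_open(toEncoding, "UTF-8");
    if (cd == (iconv_t)-1)  // iconv_open reports failure as (iconv_t)-1
        throw std::runtime_error("iconv_open failed");

    std::vector<char> in(input.begin(), input.end());
    // Generous output buffer: enough for single-byte targets and for
    // UTF-16/UTF-32 targets including a byte-order mark.
    std::vector<char> out(input.size() * 4 + 4);

    char* inPtr = in.data();
    std::size_t inLeft = in.size();
    char* outPtr = out.data();
    std::size_t outLeft = out.size();

    std::size_t rc = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
    if (rc != static_cast<std::size_t>(-1)) {
        // Flush any pending shift state (a no-op for stateless encodings).
        rc = iconv(cd, nullptr, nullptr, &outPtr, &outLeft);
    }
    iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1))
        throw std::runtime_error("iconv conversion failed");

    return std::string(out.data(), out.size() - outLeft);
}
```

The result could then be passed as replaceTo, for example replaceAllOccurences(source, "€", convertFromUtf8("€", "CP1252")). A character with no Windows-1252 equivalent makes iconv fail, which this sketch reports by throwing.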
Does this have anything specific to do with Unicode, or is it simply a way of replacing one sequence of characters with another? – Loki Astari, Apr 1, 2019 at 18:29
2 Answers
If you're not worried about performance, you can replace all the manual work with the <regex> facilities, which considerably reduces the amount of code to test and maintain.
#include <regex>
source = std::regex_replace(source, std::regex("€"), "\x80");
I would still keep it in a separate function to make it easy to change the implementation afterwards.
You don't need the std::vector<std::uint8_t> objects at all. You can use the input std::string objects directly.
Also, the code in the while loop needs to be updated to address the following issues:
- Capture the return value of source.erase. If you don't, the iterator is invalidated.
- To avoid an infinite loop, use itr as the first argument to std::search.
- Update itr inside the loop appropriately to avoid an infinite loop.
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>

void replaceAllOccurences(std::string& source,
                          const std::string& replaceFrom,
                          const std::string& replaceTo)
{
    // An empty pattern matches everywhere without consuming input,
    // so the loop below would never terminate.
    if (replaceFrom.empty())
        return;

    std::string::iterator itr = source.begin();
    while((itr = std::search(itr, source.end(), replaceFrom.begin(), replaceFrom.end())) != source.end())
    {
        itr = source.erase(itr, itr + replaceFrom.size());
        // itr is going to be invalid after insert. Keep track of its
        // distance from begin() so we can update itr after insert.
        auto dist = std::distance(source.begin(), itr);
        source.insert(itr, replaceTo.begin(), replaceTo.end());
        // Make itr point just past the inserted text. Skipping the whole
        // replacement avoids an infinite loop when replaceTo contains
        // replaceFrom, and never moves itr past end() when replaceTo is empty.
        itr = std::next(source.begin(), dist + static_cast<std::ptrdiff_t>(replaceTo.size()));
    }
}
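Another option, offered here as an alternative rather than as part of the answer above, avoids iterator bookkeeping entirely by working with indices: std::string::find locates the next match and std::string::replace swaps it in place, so no iterator is held across a modification that could invalidate it. It keeps the question's function name and signature; the empty-pattern guard is an added assumption.

```cpp
#include <cassert>
#include <string>

// Index-based variant: positions stay meaningful after the string is
// modified, unlike iterators.
void replaceAllOccurences(std::string& source,
                          const std::string& replaceFrom,
                          const std::string& replaceTo)
{
    if (replaceFrom.empty())
        return;  // an empty pattern would match at every position

    std::string::size_type pos = 0;
    while ((pos = source.find(replaceFrom, pos)) != std::string::npos)
    {
        source.replace(pos, replaceFrom.size(), replaceTo);
        // Resume after the inserted text so a replacement that contains
        // the pattern is not matched again.
        pos += replaceTo.size();
    }
}
```

With this version, replacing "a" with "aa" in "aXa" yields "aaXaa" and terminates, and replacing "a" with an empty string removes every "a".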
Doesn't insert() invalidate iterators? – Toby Speight, Apr 1, 2019 at 13:22
Might have an issue with infinite loops. – Loki Astari, Apr 1, 2019 at 18:28
@TobySpeight, yes, it does. Thanks for pointing it out. – R Sahu, Apr 1, 2019 at 19:16
@MartinYork, Indeed. Updated to address that issue. – R Sahu, Apr 1, 2019 at 19:16