Replacing certain characters in a QString

Question 1

This question is kinda similar to mine. However, I am using C++ with Qt instead of C#.

How would I efficiently and easily remove all accents and special characters like !"§$%&/()=? etc. from a QString?

So "áche" should turn into "ache", and "über dir" into "ueber dir" (in German, ü, ä and ö can be replaced by the base letter with an e appended) or at least "uber dir".

Note: Some people use a $ instead of s in some words so I want to make sure if a file is called "Ke$ha" that it will come out as "Kesha" or at least "KeSha".

The way I do it so far, incomplete, is like this:

void Utils::replaceInvalidChars(QString &str)
{
 // Strip leading dots; the isEmpty() check keeps at(0) valid when the string consists only of dots.
 while( !str.isEmpty() && str.at(0) == '.' ) {
 str.remove(0,1);
 }
 str.replace( "/", "-" );
 str.replace( "|", "" );
 str.replace( ":", "-" );
 str.replace("\"", "" );
 str.replace( "?", "" );
 str.replace( "$", "s" );
 str.replace( "*", "" );
 str.replace( ",", "" );
 str.replace( "¿", "" );
 str.replace( "¡", "" );
 str.replace( "!", "" );
 str.replace( "'", "" );
 str.replace( "ë", "e" );
 str.replace( "ê", "e" );
 str.replace( "é", "e" );
 str.replace( "è", "e" );
 str.replace( "ç", "c" );
 str.replace( "ó", "o" );
 str.replace( "ö", "oe" );
 //U's...
 str.replace( "ü", "ue" );
 str.replace( "Ü", "U" );
 str.replace( "ù", "u" );
 str.replace( "Ù", "U" );
 str.replace( "û", "u" );
 str.replace( "Û", "u" );
 //ns
 str.replace( "ñ", "n" );
 //as
 str.replace( "ä", "ae" );
 str.replace( "Ä", "ae" );
 str.replace( "á", "a" );
 str.replace( "Á", "A" );
 str.replace( "à", "a" );
 str.replace( "À", "A" );
 str.replace( "ï", "i" );
}

So first I remove all dots from the beginning, no matter how many there are. Then I replace certain characters with nothing at all and others with a character like 's' or 'a', depending on what they are.

My way is very long, tedious and chaotic. I am about to organize it a little with comments like "N's", "U's" etc. but still, if I make a mistake somewhere it will take way too long until I (eventually) find it.

Answer 1 · Jerry Coffin · 2016-01-14 17:22:42Z

I would start by separating the data from the logic:

std::vector<std::pair<QString, QString>> replacements { 
 { "/", "-" },
 { "|", "" },
 // ...
 { "ï", "i" }
};
for ( auto const &r : replacements) { 
 str.replace(r.first, r.second);
}

I'm not sure the comments about the groups of letters being replaced really add a lot though.

Then I'd at least consider moving the data out of the program itself, and into a data file the program uses, so the replacements you do can be adjusted without re-compiling the code (this is the sort of thing that frequently seems to need a fair amount of "tweaking", since there's no one way of doing it that's obviously correct and the other ways are wrong).

Loading the data from a file is a good idea. About the map, since "á" and "à" would both turn into "a", do you think I should use a QMap<QString, QStringList>() instead or would that be a rather bad idea?
@Davlog: I doubt it makes a whole lot of difference in either direction.
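For comparison, the grouped layout asked about in the comment could look roughly like the sketch below. It uses std::map and std::string in place of QMap and QString so it builds without Qt; the names GroupedReplacements, applyGrouped and kGroups are made up for this example.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical grouped table: replacement text -> every string that maps to it.
using GroupedReplacements = std::map<std::string, std::vector<std::string>>;

std::string applyGrouped(std::string text, const GroupedReplacements &groups)
{
    for (const auto &group : groups) {
        const std::string &to = group.first;
        for (const std::string &from : group.second) {
            if (from.empty())
                continue;  // nothing to search for
            std::size_t pos = 0;
            while ((pos = text.find(from, pos)) != std::string::npos) {
                text.replace(pos, from.size(), to);
                pos += to.size();  // skip past the text just inserted
            }
        }
    }
    return text;
}

// Example data in the spirit of the question's list.
const GroupedReplacements kGroups = {
    { "a",  { "á", "à" } },
    { "ae", { "ä" } },
    { "e",  { "ë", "ê", "é", "è" } },
    { "",   { "|", "?", "!" } },  // characters that are simply removed
};
```

The grouping mostly changes how the table reads, not what the loop does. One thing to keep in mind is that std::map (like QMap) iterates in key order rather than insertion order, which only matters if one rule's output could be another rule's input.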

Answer 2 · Anon · 2016-10-01 23:50:12Z

I would eliminate lines and clean up the code by relying on regular expressions.

QString s = "áche über dir Ke$ha is worth $100";
// Performance: Eliminate characters you do not wish to have. 
s.remove(QRegularExpression("[" + QRegularExpression::escape("'!*,?|¡¿") + "]"));
qDebug().noquote() << "Before:\t" << s;
// Performance: Check for characters
if (s.contains(QRegularExpression("[" + QRegularExpression::escape("$/:ÀÁÄÙÛÜàáäçèéêëïñóöùûü") + "]")))
{
 // Special Characters 
 // Escape function is a safety measure in case you accidentally insert "^" in the square brackets.
 s.replace(QRegularExpression("[" + QRegularExpression::escape(":/") + "]"), "-");
 s.replace(QRegularExpression("[$]"), "s");
 // Upper Case
 s.replace(QRegularExpression("[ÁÀ]"), "A");
 s.replace(QRegularExpression("[Ä]"), "Ae");
 s.replace(QRegularExpression("[ÜÛÙ]"), "U");
 // Lower Case
 s.replace(QRegularExpression("[áà]"), "a");
 s.replace(QRegularExpression("[ä]"), "ae");
 s.replace(QRegularExpression("[ç]"), "c");
 s.replace(QRegularExpression("[ëêéè]"), "e");
 s.replace(QRegularExpression("[ï]"), "i");
 s.replace(QRegularExpression("[ñ]"), "n");
 s.replace(QRegularExpression("[óö]"), "o");
 s.replace(QRegularExpression("[ûù]"), "u");
 s.replace(QRegularExpression("[ü]"), "ue");
}
qDebug().noquote() << " After:\t" << s;

Before: áche über dir Ke$ha is worth $100
 After: ache ueber dir Kesha is worth s100

Oops; found an error in your code. Let's just adjust this line then, so that a $ followed by a digit is left alone:

 s.replace(QRegularExpression("[$]([^0-9])"), "s\\1");

Before: áche über dir Ke$ha is worth $100
 After: ache ueber dir Kesha is worth $100
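One caveat with the adjusted pattern: [$]([^0-9]) needs a character after the $, so a $ at the very end of the string is left alone, and in "$$" the second $ is consumed by the capture group. A negative lookahead avoids both cases. The sketch below uses std::regex rather than QRegularExpression so it builds without Qt, and the function name dollarToS is made up for this example. The same pattern should also be accepted by QRegularExpression, which is Perl-compatible.

```cpp
#include <regex>
#include <string>

// Replaces '$' with 's' unless the next character is an ASCII digit.
// (?![0-9]) is a negative lookahead: it inspects the following character
// without consuming it, so "$$" and a trailing '$' are handled as well.
std::string dollarToS(const std::string &text)
{
    static const std::regex dollarNotBeforeDigit(R"(\$(?![0-9]))");
    return std::regex_replace(text, dollarNotBeforeDigit, "s");
}
```

Because nothing is captured, the replacement text is just "s" and no back-reference is needed.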
