0
\$\begingroup\$

This question is kinda similar to mine. However, I am using C++ with Qt instead of C#.

How would I efficiently and easily remove all accents and special characters like !"§$%&/()=? etc. from a QString?

So "áche" should turn into "ache", and "über dir" into "ueber dir" (in German, ü, ä and ö can be replaced by the base letter with an e appended) or at least "uber dir".

Note: Some people use a $ instead of s in some words so I want to make sure if a file is called "Ke$ha" that it will come out as "Kesha" or at least "KeSha".

The way I do it so far, incomplete, is like this:

void Utils::replaceInvalidChars(QString &str)
{
 if( str.size() == 0 )
 return;
 while( !str.isEmpty() && str.at(0) == '.' ) { // guard: a string made only of dots becomes empty
 str.remove(0,1);
 }
 str.replace( "/", "-" );
 str.replace( "|", "" );
 str.replace( ":", "-" );
 str.replace("\"", "" );
 str.replace( "?", "" );
 str.replace( "$", "s" );
 str.replace( "*", "" );
 str.replace( ",", "" );
 str.replace( "¿", "" );
 str.replace( "¡", "" );
 str.replace( "!", "" );
 str.replace( "'", "" );
 str.replace( "ë", "e" );
 str.replace( "ê", "e" );
 str.replace( "é", "e" );
 str.replace( "è", "e" );
 str.replace( "ç", "c" );
 str.replace( "ó", "o" );
 str.replace( "ö", "oe" );
 //U's...
 str.replace( "ü", "ue" );
 str.replace( "Ü", "U" );
 str.replace( "ù", "u" );
 str.replace( "Ù", "U" );
 str.replace( "û", "u" );
 str.replace( "Û", "U" );
 //ns
 str.replace( "ñ", "n" );
 //as
 str.replace( "ä", "ae" );
 str.replace( "Ä", "Ae" );
 str.replace( "á", "a" );
 str.replace( "Á", "A" );
 str.replace( "à", "a" );
 str.replace( "À", "A" );
 str.replace( "ï", "i" );
}

So first I remove all dots from the beginning, no matter how many there are. Then I replace certain characters with nothing at all, and others with a character such as 's' or 'a', depending on what they are.

My way is very long, tedious and chaotic. I am about to organize it a little with comments like "N's", "U's" etc. but still, if I make a mistake somewhere it will take way too long until I (eventually) find it.

asked Jan 14, 2016 at 17:14
\$\endgroup\$

2 Answers

3
\$\begingroup\$

I would start by separating the data from the logic:

std::vector<std::pair<QString, QString>> replacements { 
 { "/", "-" },
 { "|", "" },
 // ...
 { "ï", "i" }
};
for ( auto const &r : replacements) { 
 str.replace(r.first, r.second);
}

I'm not sure the comments about the groups of letters being replaced really add a lot though.

Then I'd at least consider moving the data out of the program itself and into a data file that the program reads, so the replacements can be adjusted without recompiling. This is the sort of thing that frequently needs a fair amount of tweaking, since no single way of doing it is obviously correct.
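
As a rough illustration of the data-file idea, here is a minimal sketch. It is not Qt code: it uses std::string (holding UTF-8 bytes) and std::istream so that it stays self-contained, and the file format (one rule per line, "from", a TAB, then "to") as well as the names loadReplacements and applyReplacements are assumptions made up for this example.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using Replacements = std::vector<std::pair<std::string, std::string>>;

// Hypothetical format: one rule per line, "<from><TAB><to>".
// A missing or empty <to> means "delete <from>". Blank lines are skipped.
Replacements loadReplacements(std::istream &in)
{
    Replacements rules;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r')
            line.pop_back();                 // tolerate Windows line endings
        if (line.empty())
            continue;
        const auto tab = line.find('\t');
        if (tab == 0)
            continue;                        // no "from" part: ignore the line
        if (tab == std::string::npos)
            rules.emplace_back(line, "");
        else
            rules.emplace_back(line.substr(0, tab), line.substr(tab + 1));
    }
    return rules;
}

// Apply every rule in file order, replacing all occurrences of "from".
void applyReplacements(std::string &str, const Replacements &rules)
{
    for (const auto &rule : rules) {
        std::string::size_type pos = 0;
        while ((pos = str.find(rule.first, pos)) != std::string::npos) {
            str.replace(pos, rule.first.size(), rule.second);
            pos += rule.second.size();       // continue after the inserted text
        }
    }
}
```

In a Qt program the same loop would presumably read the file through QFile and QTextStream and call str.replace(r.first, r.second) on a QString, as in the snippet above; only the container and the parsing are the point here.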

answered Jan 14, 2016 at 17:22
\$\endgroup\$
2
  • \$\begingroup\$ Loading the data from a file is a good idea. About the map, since "á" and "à" would both turn into "a", do you think I should use a QMap<QString, QStringList>() instead or would that be a rather bad idea? \$\endgroup\$ Commented Jan 14, 2016 at 17:36
  • 1
    \$\begingroup\$ @Davlog: I doubt it makes a whole lot of difference in either direction. \$\endgroup\$ Commented Jan 14, 2016 at 18:31
0
\$\begingroup\$

I would eliminate lines and clean up the code by relying on regular expressions.

QString s = "áche über dir Ke$ha is worth $100";
// Performance: Eliminate characters you do not wish to have. 
s.remove(QRegularExpression("[" + QRegularExpression::escape("'!*,?|¡¿") + "]"));
qDebug().noquote() << "Before:\t" << s;
// Performance: Check for characters
if (s.contains(QRegularExpression("[" + QRegularExpression::escape("$/:ÀÁÄÙÛÜàáäçèéêëïñóöùûü") + "]")))
{
 // Special Characters 
 // Escape function is a safety measure in case you accidentally insert "^" in the square brackets.
 s.replace(QRegularExpression("[" + QRegularExpression::escape(":/") + "]"), "-");
 s.replace(QRegularExpression("[$]"), "s");
 // Upper Case
 s.replace(QRegularExpression("[ÁÀ]"), "A");
 s.replace(QRegularExpression("[Ä]"), "Ae");
 s.replace(QRegularExpression("[ÜÛÙ]"), "U");
 // Lower Case
 s.replace(QRegularExpression("[áà]"), "a");
 s.replace(QRegularExpression("[ä]"), "ae");
 s.replace(QRegularExpression("[ç]"), "c");
 s.replace(QRegularExpression("[ëêéè]"), "e");
 s.replace(QRegularExpression("[ï]"), "i");
 s.replace(QRegularExpression("[ñ]"), "n");
 s.replace(QRegularExpression("[óö]"), "o");
 s.replace(QRegularExpression("[ûù]"), "u");
 s.replace(QRegularExpression("[ü]"), "ue");
}
qDebug().noquote() << " After:\t" << s;

Before: áche über dir Ke$ha is worth $100
 After: ache ueber dir Kesha is worth s100

Oops, found an error in your code: the currency "$" was replaced as well. Let's adjust this line so that only a "$" that is not followed by a digit is replaced:

 s.replace(QRegularExpression("[$]([^0-9])"), "s\\1");

Before: áche über dir Ke$ha is worth $100
 After: ache ueber dir Kesha is worth $100
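
For readers without Qt at hand, the same "only replace a $ that is not followed by a digit" rule can be sketched with std::regex. This is an illustration under assumptions, not the code from this answer: it works on a std::string instead of a QString, the function name replaceLetterDollar is made up, and the back-reference is written "$1" because that is the std::regex (ECMAScript) replacement syntax, where QString::replace uses "\\1".

```cpp
#include <regex>
#include <string>

// Replace '$' with 's' only when the next character is not a digit,
// so "Ke$ha" becomes "Kesha" while "$100" is left alone.
std::string replaceLetterDollar(const std::string &input)
{
    // [$] matches a literal dollar sign; the group captures the character after it.
    static const std::regex dollarBeforeNonDigit(R"([$]([^0-9]))");
    // "$1" re-inserts the captured character that followed the '$'.
    return std::regex_replace(input, dollarBeforeNonDigit, "s$1");
}
```

Because the pattern needs a character after the "$", a "$" at the very end of the string is left untouched. If that case matters, a negative lookahead such as [$](?![0-9]) might be a better fit.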
answered Oct 1, 2016 at 23:50
\$\endgroup\$
