Function for removing forbidden characters

Question 1

void removeForbiddenChar(string* s)
{
 string::iterator it;
 for (it = s->begin() ; it < s->end() ; ++it){
 switch(*it){
 case '/':case '\\':case ':':case '?':case '"':case '<':case '>':case '|':
 *it = ' ';
 }
 }
}

I used this function to remove a string that has any of the following character: ,円 /, :, ?, ", <,>, |. This is for a file's name. This program runs fine. It simply change a character of the string to a blank when the respective character is the forbidden character. However, I have a feeling against this use of switch statement. I simply exploit the case syntax here, but this, somehow nags me. I just don't like it. Anybody else got a better suggestion of a better implementation in this case?

Question 2

If a char isn't forbidden, then we leave it be.

Question 3

Name it replaceForbiddenChars, as it replaces instead of removes and handles multiple characters.

Question 4

@Fred: If none of the cases in a switch-statement match, the control flow continues after the switch statement. That behavior is perfectly defined.

Question 5

@sepp2k: Thanks, it is well-defined. I'm not sure why I thought that, but I'll blame it on articles I've been reading about (micro-)optimizing.

Question 6

Seems you forgot * symbol (asterisk). It is forbidden also.

Question 7

Declare a string containing the illegal characters: "\\/:?"<>|". All you need to do is check if the char is in the array, so use a native function for that, or write a method CharInString(char* needle, string* haystack) which loops through the contents of the provided haystack to check if the needle is inside it.

Your loop should end up looking like this:

string illegalChars = "\\/:?\"<>|"
for (it = s->begin() ; it < s->end() ; ++it){
 bool found = illegalChars.find(*it) != string::npos;
 if(found){
 *it = ' ';
 }
}

It's more maintainable and readable. You can tell if you've duplicated a character quite easily and since you can do it with any target string and any string of illegalChars you've just created for yourself a generic RemoveIllegalChars(string* targetString, string* illegalChars) method usable anywhere in your program.

_{I may be using those pointers wrong. My C++fu is weak... for now.}

Question 8

+1, I was also going to recommend using a string to store the forbidden characters. I would add that this change makes it very easy to add the forbidden characters as a parameter to the removeForbiddenChars function, so that if the need should ever arise, it can be used in situations where different sets of characters are forbidden. Also you can use the find method to find out whether a character is in a string, so you don't necessarily need to write a CharInString function (or you could write is a simple wrapper around find).

Question 9

@sepp2k: We seem to be on the same wavelength here! :) I'll update my answer with the find method.

Question 10

Maybe the file names are short and we don't call this function much, but please be aware that the proposed solution is O(n*m) on the number of characters in the string (n) and the number of illegal characters in the string (m).

Question 11

you could always use transform

#include <algorithm>
#include <string>
#include <iostream>
const std::string forbiddenChars = "\\/:?\"<>|";
static char ClearForbidden(char toCheck)
{
 if(forbiddenChars.find(toCheck) != string::npos)
 {
 return ' ';
 }
 return toCheck;
}
int main()
{
 std::string str = "EXAMPLE:";
 std::transform(str.begin(), str.end(), str.begin(), ClearForbidden);
 std::cout << str << std::endl;
 return 0;
}

Question 12

Didn't even see this when I was just posting my answer. Yet another way to do it with a different STL algorithm :)

Question 13

Same thing with lambda:

std::transform(str.begin(), str.end(), str.begin(), [&forbidden](char c) { return forbidden.find(c) != std::string::npos ? ' ' : c; }

Question 14

Or, here's yet another way you could do it by using all stuff from the STL:

#include <algorithm>
#include <string>
#include <iostream>
bool isForbidden( char c )
{
 static std::string forbiddenChars( "\\/:?\"<>|" );
 return std::string::npos != forbiddenChars.find( c );
}
int main()
{
 std::string myString( "hell?o" );
 std::replace_if( myString.begin(), myString.end(), isForbidden, ' ' );
 std::cout << "Now: " << myString << std::endl;
}

Question 15

One thing that I would change about your function (in addition to Jonathan's recommendation of using a string to store the forbidden characters), is the argument type of removeForbiddenChar to string& instead of string*. It is generally considered good practice in C++ to use references over pointers where possible (see for example this entry in the C++ faq-lite).

One further, minor cosmetic change I'd recommend is renaming the function to removeForbiddenChars (plural) as that is more descriptive of what it does.

Question 16

You are never checking the validity of the string * s, so if a nullptr is passed in the removeForbiddenChar function will attempt to dereference a nullptr. This implies that the caller of removeForbiddenChar should check for nullptr before calling removeForbiddenChar, but the caller won't necessarily be aware of this unless they view the internals of removeForbiddenChar. Requiring the reference to be passed in instead of a pointer conveys that your intention is: "You MUST have a valid string in order to call removeForbiddenChar."

Question 17

C comes with a helpful function size_t strcspn(const char *string, const char *delimiters) that you can implement this on top of. The ASCII version is pretty fast; it uses a bit vector to test for the delimiter characters.

Question 18

If you are looking for performance, this one is hard to beat.

Question 19

Solution with no conditional branching.
Swapping space for time optimization.

Simplified algorithm:

void removeForbiddenChar(string* s)
{
 for (string::iterator it = s->begin() ; it < s->end() ; ++it)
 {
 // replace element with their counterpart in the map
 // This replaces forbidden characters with space.
 (*it) = charMap[*it];
 }
}

Or the C++0x version:

void removeForbiddenChar(std::string* s)
{
 std::transform(s->begin(), s->end(), [](char c) => {return charMap[c];});
}

Just need the data:

char charMap[] =
 // The majority of characters in this array
 // map the poistion to the same character code.
 // charMap['A'] == 'A'
 // For forbidden characters a space is in the position
 // charMap['<'] == ' '
 // Note: \xxx is an octal escape sequence
 "000円001円002円003円004円005円006円007円"
 "010円011円012円013円014円015円016円017円"
 "020円021円022円023円024円025円026円027円"
 "030円031円032円033円034円035円036円037円"
 "040円041円 043円044円045円046円047円" // replaced 042円(") with space
 "050円051円052円053円054円055円056円 " // replaced 057円(/) with space
 "060円061円062円063円064円065円066円067円"
 "070円071円 073円 075円 " // replaced 072円(:)074円(<)076円(>)077円(?) with space
 "100円101円102円103円104円105円106円107円"
 "110円111円112円113円114円115円116円117円"
 "120円121円122円123円124円125円126円127円"
 "130円131円132円133円 135円136円137円" // replaced 134円(\)
 "140円141円142円143円144円145円146円147円"
 "150円151円152円153円154円155円156円157円"
 "160円161円162円163円164円165円166円167円"
 "170円171円172円173円174円175円176円177円"
 "200円201円202円203円204円205円206円207円"
 "210円211円212円213円214円215円216円217円"
 "220円221円222円223円224円225円226円227円"
 "230円231円232円233円234円235円236円237円"
 "240円241円242円243円244円245円246円247円"
 "250円251円252円253円254円255円256円257円"
 "260円261円262円263円264円265円266円267円"
 "270円271円272円273円274円275円276円277円"
 "300円301円302円303円304円305円306円307円"
 "310円311円312円313円314円315円316円317円"
 "320円321円322円323円324円325円326円327円"
 "330円331円332円333円334円335円336円337円"
 "340円341円342円343円344円345円346円347円"
 "350円351円352円353円354円355円356円357円"
 "360円361円362円363円364円365円366円367円"
 "370円371円372円373円374円375円376円377円";

Question 20

Similar to strcspn is strpbrk, but instead of returning offsets, it returns a pointer to the next match and NULL if no more matches. This makes the replacement as simple as:

while ((filename = strpbrk(filename , "\\/:?\"<>|")) != NULL)
 *filename++ = '_';

doppelgreener doppelgreener 9939 silver badges18 bronze badges · Accepted Answer · 2011-01-27 16:14:58Z

Declare a string containing the illegal characters: "\\/:?"<>|". All you need to do is check if the char is in the array, so use a native function for that, or write a method CharInString(char* needle, string* haystack) which loops through the contents of the provided haystack to check if the needle is inside it.

Your loop should end up looking like this:

string illegalChars = "\\/:?\"<>|"
for (it = s->begin() ; it < s->end() ; ++it){
 bool found = illegalChars.find(*it) != string::npos;
 if(found){
 *it = ' ';
 }
}

It's more maintainable and readable. You can tell if you've duplicated a character quite easily and since you can do it with any target string and any string of illegalChars you've just created for yourself a generic RemoveIllegalChars(string* targetString, string* illegalChars) method usable anywhere in your program.

_{I may be using those pointers wrong. My C++fu is weak... for now.}

+1, I was also going to recommend using a string to store the forbidden characters. I would add that this change makes it very easy to add the forbidden characters as a parameter to the removeForbiddenChars function, so that if the need should ever arise, it can be used in situations where different sets of characters are forbidden. Also you can use the find method to find out whether a character is in a string, so you don't necessarily need to write a CharInString function (or you could write is a simple wrapper around find).
@sepp2k: We seem to be on the same wavelength here! :) I'll update my answer with the find method.
Maybe the file names are short and we don't call this function much, but please be aware that the proposed solution is O(n*m) on the number of characters in the string (n) and the number of illegal characters in the string (m).

Stack Exchange Network

Function for removing forbidden characters

7 Answers 7

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Function for removing forbidden characters

7 Answers 7

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions