121

My program will take arbitrary strings from the internet and use them for file names. Is there a simple way to remove the bad characters from these strings or do I need to write a custom function for this?

asked Dec 2, 2008 at 6:00
1

15 Answers

217

Ugh, I hate it when people try to guess at which characters are valid. Besides being completely non-portable (always thinking about Mono), both of the earlier comments missed more than 25 invalid characters.

foreach (var c in Path.GetInvalidFileNameChars()) 
{ 
 fileName = fileName.Replace(c, '-'); 
}

Or in VB:

'Clean just a filename
Dim filename As String = "salmnas dlajhdla kjha;dmas'lkasn"
For Each c In IO.Path.GetInvalidFileNameChars
 filename = filename.Replace(c, "")
Next
'See also IO.Path.GetInvalidPathChars
StayOnTarget
answered Dec 2, 2008 at 7:35
8
  • 10
    How would this solution handle name conflicts? It seems that more than one string can match to a single file name ("Hell?" and "Hell*" for example). If you are ok only removing offending chars then fine; otherwise you need to be careful to handle name conflicts. Commented Jun 13, 2011 at 9:55
  • 2
    What about the filesystem's limits on name (and path) length? What about reserved filenames (PRN, CON)? If you need to store both the data and the original name, you can use 2 files with Guid names: guid.txt and guid.dat Commented Feb 26, 2013 at 11:26
  • 1
    I just wanted to mention that this function allows whitespace characters. Commented Mar 18, 2013 at 21:29
  • 8
    One liner, for fun result = Path.GetInvalidFileNameChars().Aggregate(result, (current, c) => current.Replace(c, '-')); Commented Mar 22, 2013 at 3:44
  • 1
    @PaulKnopf, are you sure JetBrain does not have copyright to that code ;) Commented Jun 20, 2015 at 7:25
47

To strip invalid characters:

static readonly char[] invalidFileNameChars = Path.GetInvalidFileNameChars();
// Builds a string out of valid chars
var validFilename = new string(filename.Where(ch => !invalidFileNameChars.Contains(ch)).ToArray());

To replace invalid characters:

static readonly char[] invalidFileNameChars = Path.GetInvalidFileNameChars();
// Builds a string out of valid chars and an _ for invalid ones
var validFilename = new string(filename.Select(ch => invalidFileNameChars.Contains(ch) ? '_' : ch).ToArray());

To replace invalid characters (and avoid potential name conflict like Hell* vs Hell$):

static readonly IList<char> invalidFileNameChars = Path.GetInvalidFileNameChars();
// Builds a string out of valid chars and replaces invalid chars with a unique letter (Moves the Char into the letter range of unicode, starting at "A")
var validFilename = new string(filename.Select(ch => invalidFileNameChars.Contains(ch) ? Convert.ToChar(invalidFileNameChars.IndexOf(ch) + 65) : ch).ToArray());
Ben Philipp
answered Sep 9, 2010 at 15:58
0
35

This question has been asked many times before and, as pointed out many times before, IO.Path.GetInvalidFileNameChars is not adequate.

First, there are many names like PRN and CON that are reserved and not allowed for filenames. There are other names not allowed only at the root folder. Names that end in a period are also not allowed.

Second, there are a variety of length limitations. Read the full list for NTFS here.

Third, you can attach to filesystems that have other limitations. For example, ISO 9660 filenames cannot start with "-" but can contain it.

Fourth, what do you do if two processes "arbitrarily" pick the same name?

In general, using externally-generated names for file names is a bad idea. I suggest generating your own private file names and storing human-readable names internally.
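The first point and the closing suggestion can be sketched roughly as follows. This is only an illustration: the class and method names are hypothetical, the check covers just the Windows device names and the trailing-period rule mentioned above (not length limits or other file systems), and treating everything before the first period as the device-name part is a deliberately conservative assumption.

```csharp
using System;
using System.Collections.Generic;

static class ReservedNames
{
    // Windows device names; they stay reserved even when followed by an extension (e.g. "NUL.txt").
    static readonly HashSet<string> Reserved = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
    {
        "CON", "PRN", "AUX", "NUL",
        "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
        "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"
    };

    // Hypothetical helper: true when the name is empty, ends in a period,
    // or has a reserved device name before its first period.
    public static bool IsProblematicOnWindows(string fileName)
    {
        if (string.IsNullOrWhiteSpace(fileName)) return true;
        if (fileName.EndsWith(".", StringComparison.Ordinal)) return true;

        int dot = fileName.IndexOf('.');
        string stem = dot < 0 ? fileName : fileName.Substring(0, dot);
        return Reserved.Contains(stem);
    }

    // Hypothetical helper: a private, generated name to use on disk while the
    // human-readable name is stored elsewhere (for example in a database).
    public static string NewPrivateName(string extension)
    {
        return Guid.NewGuid().ToString("N") + extension;
    }
}
```

A caller might reject or rename anything for which IsProblematicOnWindows returns true, or skip the check entirely by always writing to NewPrivateName(".dat").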

answered Sep 9, 2010 at 16:24
2
  • 14
    Although you are technically accurate, the GetInvalidFileNameChars is good for 80%+ of the situations you'd use it in, hence it's a good answer. Your answer would have been more appropriate as a comment to the accepted answer I think. Commented Mar 15, 2011 at 13:24
  • 5
    I agree with DourHighArch. Save the file internally as a guid, reference that against the "friendly name" which is stored in a database. Don't let users control your paths on the website or they will try to steal your web.config. If you incorporate url rewriting to make it clean it will only work for matched friendly urls in the database. Commented Oct 16, 2012 at 13:11
22

I agree with Grauenwolf and would highly recommend Path.GetInvalidFileNameChars().

Here's my C# contribution:

string file = @"38?/.\}[+=n a882 a.a*/|n^%$ ad#(-))";
Array.ForEach(Path.GetInvalidFileNameChars(), 
 c => file = file.Replace(c.ToString(), String.Empty));

p.s. -- this is more cryptic than it should be -- I was trying to be concise.

answered Dec 2, 2008 at 8:03
6
  • 3
    Why in the world would you use Array.ForEach instead of just foreach here? Commented Apr 11, 2012 at 23:21
  • 9
    If you wanted to be even more concise / cryptic: Path.GetInvalidFileNameChars().Aggregate(file, (current, c) => current.Replace(c, '-')) Commented Oct 10, 2012 at 21:09
  • @BlueRaja-DannyPflughoeft Because you want to make it slower? Commented Nov 22, 2014 at 7:07
  • @Johnathan Allen, what makes you think foreach is faster than Array.ForEach ? Commented Nov 24, 2014 at 3:01
  • 5
    @rbuddicom Array.ForEach takes a delegate, which means it needs to invoke a function that can't be inlined. For short strings, you could end up spending more time on function call overhead than actual logic. .NET Core is looking at ways to "de-virtualize" calls, reducing the overhead. Commented Feb 5, 2018 at 23:47
14

Here's my version:

static string GetSafeFileName(string name, char replace = '_') {
 char[] invalids = Path.GetInvalidFileNameChars();
 return new string(name.Select(c => invalids.Contains(c) ? replace : c).ToArray());
}

I'm not sure how the result of GetInvalidFileNameChars is calculated, but the "Get" suggests it's non-trivial, so I cache the results. Further, this only traverses the input string once instead of multiple times, like the solutions above that iterate over the set of invalid chars, replacing them in the source string one at a time. Also, I like the Where-based solutions, but I prefer to replace invalid chars instead of removing them. Finally, my replacement is exactly one character to avoid converting characters to strings as I iterate over the string.

I say all that w/o doing the profiling -- this one just "felt" nice to me. : )
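For what it's worth, the version above still calls GetInvalidFileNameChars on every invocation. A variant that really caches the set might look like the sketch below. The class name and the HashSet lookup are my own assumptions rather than part of the answer, and nothing here has been profiled.

```csharp
using System.Collections.Generic;
using System.IO;

static class SafeNames
{
    // Built once per process; HashSet gives constant-time membership checks.
    static readonly HashSet<char> Invalid = new HashSet<char>(Path.GetInvalidFileNameChars());

    public static string GetSafeFileName(string name, char replace = '_')
    {
        // Single pass over a copy of the input, swapping invalid chars in place.
        char[] buffer = name.ToCharArray();
        for (int i = 0; i < buffer.Length; i++)
        {
            if (Invalid.Contains(buffer[i])) buffer[i] = replace;
        }
        return new string(buffer);
    }
}
```

For example, SafeNames.GetSafeFileName("a/b") gives "a_b", since '/' is invalid in file names on both Windows and Unix-like systems.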

answered Sep 20, 2013 at 12:59
1
  • 1
    You could do new HashSet<char>(Path.GetInvalidFileNameChars()) to avoid O(n) enumeration - micro-optimization. Commented Oct 1, 2015 at 17:36
13

Here's the function that I am using now (thanks jcollum for the C# example):

public static string MakeSafeFilename(string filename, char replaceChar)
{
 foreach (char c in System.IO.Path.GetInvalidFileNameChars())
 {
 filename = filename.Replace(c, replaceChar);
 }
 return filename;
}

I just put this in a "Helpers" class for convenience.

8

If you want to quickly strip out all special characters, which sometimes makes file names more readable for users, this works nicely:

string myCrazyName = "q`w^e!r@t#y$u%i^o&p*a(s)d_f-g+h=j{k}l|z:x\"c<v>b?n[m]q\\w;e'r,t.y/u";
string safeName = Regex.Replace( // needs: using System.Text.RegularExpressions;
 myCrazyName,
 @"\W", /*Matches any non-word character. Verbatim string: a plain "\W" is an invalid C# escape and does not compile*/
 "");
// safeName == "qwertyuiopasd_fghjklzxcvbnmqwertyu"
answered May 28, 2009 at 2:42
2
  • 1
    actually \W matches more than non-alpha-numerics ([^A-Za-z0-9_]). All Unicode 'word' characters (русский中文..., etc.) will not be replaced either. But this is a good thing. Commented Jul 28, 2014 at 21:04
  • 1
    Only downside is that this also removes . so you have to extract the extension first, and add it again after. Commented Sep 23, 2015 at 13:06
7

Here's what I just added to ClipFlair's (http://github.com/Zoomicon/ClipFlair) StringExtensions static class (Utils.Silverlight project), based on info gathered from the links to related stackoverflow questions posted by Dour High Arch above:

public static string ReplaceInvalidFileNameChars(this string s, string replacement = "")
{
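 return Regex.Replace(s,
 //GetInvalidFileNameChars rather than GetInvalidPathChars: the latter still lets ':', '*', '?', '/' and '\' through on Windows
 "[" + Regex.Escape(new String(System.IO.Path.GetInvalidFileNameChars())) + "]",
 replacement, //can even use a replacement string of any length
 RegexOptions.IgnoreCase);
 //not using System.IO.Path.InvalidPathChars (deprecated insecure API)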
}
answered Jul 12, 2013 at 19:55
2
  • 1
    Just missing to compile the Regex and cache it. Commented Jun 23, 2022 at 11:54
  • Yes, ideally you'd compile and cache on first use. Would use a static field and check if it's null. If it is, would compile and cache to it, then use it. A race-condition might occur there if multiple threads call it the 1st time simultaneously, but since all would calculate and cache the same thing, only penalty is that extra calculation by the other threads (last one coming would succeed in persisting its calculated value to the static field) Commented Jun 24, 2022 at 12:14
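One way to do the caching discussed in these comments, without a hand-written null check, is a static readonly field: the runtime runs a static initializer once and in a thread-safe way, so the race described above should not arise. The sketch below is an assumption-laden illustration, not the ClipFlair code. The class name is hypothetical, and it uses GetInvalidFileNameChars on the assumption that file names (not whole paths) are being cleaned.

```csharp
using System.IO;
using System.Text.RegularExpressions;

static class CachedRegexNames
{
    // Built and compiled once, on first use of the class.
    static readonly Regex InvalidChars = new Regex(
        "[" + Regex.Escape(new string(Path.GetInvalidFileNameChars())) + "]",
        RegexOptions.Compiled);

    public static string ReplaceInvalidFileNameChars(this string s, string replacement = "")
    {
        // Note: '$' in the replacement string has special meaning to Regex.Replace.
        return InvalidChars.Replace(s, replacement);
    }
}
```

Usage stays the same as before, for example "a/b".ReplaceInvalidFileNameChars("_") yields "a_b".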
5
static class Utils
{
 public static string MakeFileSystemSafe(this string s)
 {
 return new string(s.Where(IsFileSystemSafe).ToArray());
 }
 public static bool IsFileSystemSafe(char c)
 {
 return !Path.GetInvalidFileNameChars().Contains(c);
 }
}
answered Apr 18, 2013 at 12:30
5

Why not convert the string to a Base64 equivalent like this:

string UnsafeFileName = "salmnas dlajhdla kjha;dmas'lkasn";
string SafeFileName = Convert.ToBase64String(Encoding.UTF8.GetBytes(UnsafeFileName));

If you want to convert it back so you can read it:

UnsafeFileName = Encoding.UTF8.GetString(Convert.FromBase64String(SafeFileName));

I used this to save PNG files with a unique name from a random description.
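One caveat: the standard Base64 alphabet includes '/', '+' and '=', and '/' is not allowed in file names. A common workaround is the URL-safe variant, sketched below with hypothetical class and method names. It swaps '+' and '/' for '-' and '_' and drops the padding. The result is still case-sensitive, so it may not be a good fit for case-insensitive file systems, and it is about a third longer than the input bytes.

```csharp
using System;
using System.Text;

static class Base64FileNames
{
    // Encodes text as URL-safe Base64: '+' -> '-', '/' -> '_', no '=' padding.
    public static string Encode(string text)
    {
        string b64 = Convert.ToBase64String(Encoding.UTF8.GetBytes(text));
        return b64.TrimEnd('=').Replace('+', '-').Replace('/', '_');
    }

    // Reverses Encode: restores the standard alphabet and the padding.
    public static string Decode(string safe)
    {
        string b64 = safe.Replace('-', '+').Replace('_', '/');
        int padding = (4 - b64.Length % 4) % 4;
        b64 = b64.PadRight(b64.Length + padding, '=');
        return Encoding.UTF8.GetString(Convert.FromBase64String(b64));
    }
}
```

For instance, "???" encodes to "Pz8/" in standard Base64 but to "Pz8_" here, and Decode("Pz8_") returns "???".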

answered Apr 28, 2017 at 2:47
1
  • Base64 is not path safe. It contains characters like /, +, =. Besides, it uses both upper and lower case, which is not suitable for the Windows file system, which is case insensitive. Commented Sep 22, 2022 at 0:28
4

This solution comes from one of my older projects, where it has worked perfectly for over 2 years. It replaces illegal chars with "!" and then collapses any "!!" into a single "!"; substitute your own replacement char.

 public string GetSafeFilename(string filename)
 {
 string res = string.Join("!", filename.Split(Path.GetInvalidFileNameChars()));
 while (res.IndexOf("!!") >= 0)
 res = res.Replace("!!", "!");
 return res;
 }
answered Sep 11, 2019 at 17:07
1
  • 1
    I like the Split idea. Quick tip: instead of replacing "!!" in a post process step, use StringSplitOptions.RemoveEmptyEntries as the second argument to Split and you will not get any "!!" in what Join returns. Commented Jun 30, 2024 at 15:26
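The tip in the comment above might look like the following sketch (class name and the separator parameter are hypothetical). Be aware that it is not an exact drop-in: invalid characters at the very start or end of the name disappear instead of becoming "!", and a literal "!!" already present in the input is left alone.

```csharp
using System;
using System.IO;

static class SplitJoinNames
{
    public static string GetSafeFilename(string filename, string separator = "!")
    {
        // RemoveEmptyEntries drops the empty pieces produced by consecutive
        // invalid chars, so no doubled separators appear after the Join.
        string[] parts = filename.Split(
            Path.GetInvalidFileNameChars(),
            StringSplitOptions.RemoveEmptyEntries);
        return string.Join(separator, parts);
    }
}
```

For example, "a//b" becomes "a!b" rather than "a!!b".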
2
private void textBoxFileName_KeyPress(object sender, KeyPressEventArgs e)
{
 e.Handled = CheckFileNameSafeCharacters(e);
}
/// <summary>
/// This is a good function for making sure that a user who is naming a file uses proper characters
/// </summary>
/// <param name="e"></param>
/// <returns></returns>
internal static bool CheckFileNameSafeCharacters(System.Windows.Forms.KeyPressEventArgs e)
{
 // Compare as char: e.KeyChar.Equals(24) boxes an int and is always false.
 if (e.KeyChar == (char)24 || 
 e.KeyChar == (char)3 || 
 e.KeyChar == (char)22 || 
 e.KeyChar == (char)26 || 
 e.KeyChar == (char)25)//Control-X, C, V, Z and Y
 return false;
 if (e.KeyChar == '\b')//backspace
 return false;
 char[] charArray = Path.GetInvalidFileNameChars();
 if (charArray.Contains(e.KeyChar))
 return true;//Stop the character from being entered into the control since it is not valid in a file name
 else
 return false; 
}
answered Feb 19, 2016 at 18:22
2

Many answers suggest using Path.GetInvalidFileNameChars(), which seems like a bad solution to me. I encourage you to use whitelisting instead of blacklisting, because hackers will eventually find a way to bypass a blacklist.

Here is an example of code you could use :

 string whitelist = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.";
 foreach (char c in filename)
 {
 if (!whitelist.Contains(c))
 {
 filename = filename.Replace(c, '-');
 }
 }
answered Jan 11, 2019 at 14:59
1

I find using this to be quick and easy to understand:

<Extension()>
Public Function MakeSafeFileName(FileName As String) As String
 Return FileName.Where(Function(x) Not IO.Path.GetInvalidFileNameChars.Contains(x)).ToArray
End Function

This works because a String can be enumerated as a sequence of Char, and a Char array converts back to a String (VB does this implicitly; there is also a String constructor that takes a Char array).

answered Apr 9, 2013 at 18:28
0

I took Jonathan Allen's answer and made an extension method that can be called on any string.

public static class StringExtensions
{
 public static string ReplaceInvalidFileNameChars(this string input, char replaceCharacter = '-')
 {
 foreach (char c in Path.GetInvalidFileNameChars())
 {
 input = input.Replace(c, replaceCharacter);
 }
 return input;
 }
}

This can then be used like:

string myFileName = "test > file ? name.txt";
string myValidFileName1 = myFileName.ReplaceInvalidFileNameChars();
string myValidFileName2 = myFileName.ReplaceInvalidFileNameChars(' '); // '' is not a valid char literal, so pass a real character
string myValidFileName3 = myFileName.ReplaceInvalidFileNameChars('_');
answered May 4, 2023 at 6:23
