struct with nonsensical default value

Question 1

In my system I frequently operate with airport codes ("YYZ", "LAX", "SFO", etc.), they are always in the exact same format (3 letter, represented as uppercase). The system typically deals with 25-50 of these (different) codes per API request, with over a thousand allocations total, they are passed around through many layers of our application, and are compared for equality quite often.

We started with just passing strings around, which worked fine for a bit but we quickly noticed lots of programming mistakes by passing in a wrong code somewhere the 3 digit code was expected. We also ran into issues where we were supposed to do a case-insensitive comparison and instead did not, resulting in bugs.

From this, I decided to stop passing strings around and create an Airport class, which has a single constructor that takes and validates the airport code.

public sealed class Airport
{
 public Airport(string code)
 {
 if (code == null)
 {
 throw new ArgumentNullException(nameof(code));
 }
 if (code.Length != 3 || !char.IsLetter(code[0]) 
 || !char.IsLetter(code[1]) || !char.IsLetter(code[2]))
 {
 throw new ArgumentException(
 "Must be a 3 letter airport code.", 
 nameof(code));
 }
 Code = code.ToUpperInvariant();
 }
 public string Code { get; }
 public override string ToString()
 {
 return Code;
 }
 private bool Equals(Airport other)
 {
 return string.Equals(Code, other.Code);
 }
 public override bool Equals(object obj)
 {
 return obj is Airport airport && Equals(airport);
 }
 public override int GetHashCode()
 {
 return Code?.GetHashCode() ?? 0;
 }
 public static bool operator ==(Airport left, Airport right)
 {
 return Equals(left, right);
 }
 public static bool operator !=(Airport left, Airport right)
 {
 return !Equals(left, right);
 }
}

This made our code much easier to understand and we simplified our equality checks, dictionary / set usages. We now know that if our methods accept an Airport instance that it will behave the way we expect, it has simplified our method checks to a null reference check.

The thing I did notice, however, was the garbage collection was running much more often, which I tracked down to a lot of instances of Airport getting collected.

My solution to this was to convert the class into a struct. Mostly it was just a keyword change, with the exception of GetHashCode and ToString:

public override string ToString()
{
 return Code ?? string.Empty;
}
public override int GetHashCode()
{
 return Code?.GetHashCode() ?? 0;
}

To handle the case where default(Airport) is used.

My questions:

Was creating an Airport class or struct a good solution in general, or am I solving the wrong problem / solving it the wrong way by creating the type? If it's not a good solution, what is a better solution?
How should my application handle instances where the default(Airport) is used? A type of default(Airport) is nonsensical to my application, so I've been doing if (airport == default(Airport) { throw ... } in places where getting an instance of Airport (and its Code property) is critical to the operation.

Notes: I reviewed the questions C#/VB struct – how to avoid case with zero default values, which is considered invalid for given structure?, and Use struct or not before asking my question, however I think my questions are different enough to warrant its own post.

Question 2

Does the garbage collection have a material impact on the way your application performs? In other words, does it matter?

Question 3

Anyway, yes, the class solution was a "good" one. The way you know that is it solved your problem without creating any new ones.

Question 4

One way you can solve the default(Airport) problem is by simply disallowing default instances. You can do that by writing a parameterless constructor and throwing InvalidOperationException or NotImplementedException in it.

Question 5

On a side note, instead of confirming that your initialization string is in fact 3 alpha characters, why not just compare it to the finite list of all airport codes (e.g., github.com/datasets/airport-codes or similar)?

Question 6

I'm willing to bet several beers that this is not the root of a performance problem. A normal laptop can allocate in the order 10M objects/seconds.

Question 7

Update: I rewrote my answer to address some incorrect assumptions about C# structs, as well as the OP informing us in comments that interned strings are being used.

If you can control the data coming in to your system, use a class as you posted in your question. If someone runs default(Airport) they will get a null value back. Be sure to write your private Equals method to return false whenever comparing null Airport objects, and then let the NullReferenceException's fly elsewhere in the code.

However, if you are taking data into the system from sources you don't control, you don't necessary want to crash the whole thread. In this case a struct is ideal for the simple fact default(Airport) will give you something other than a null pointer. Make up an obvious value to represent "no value" or the "default value" so you have something to print on screen or in a log file (like "---" for instance). In fact, I would just keep the code private and not expose a Code property at all — just focus on behavior here.

public struct Airport
{
 private string code;
 public Airport(string code)
 {
 // Check `code` for validity, throw exceptions if not valid
 this.code = code;
 }
 public override string ToString()
 {
 return code ?? (code = "---");
 }
 // int GetHashcode()
 // bool Equals(...)
 // bool operator ==(...)
 // bool operator !=(...)
 private bool Equals(Airport other)
 {
 if (other == null)
 // Even if this method is private, guard against null pointers
 return false;
 if (ToString() == "---" || other.ToString() == "---")
 // "Default" values should never match anything, even themselves
 return false;
 // Do a case insensitive comparison to enforce logic that airport
 // codes are not case sensitive
 return string.Equals(
 ToString(),
 other.ToString(),
 StringComparison.InvariantCultureIgnoreCase);
 }
}

Worse case scenario converting default(Airport) to a string prints out "---" and returns false when compared to other valid Airport codes. Any "default" airport code matches nothing, including other default airport codes.

Yes, structs are meant to be values allocated on the stack, and any pointers to heap memory basically negate the performance advantages of structs, but in this case the default value of a struct has meaning and provides some additional bullet resistance to the rest of the application.

I would bend the rules a little here, because of that.

Original Answer (with some factual errors)

If you can control the data coming in to your system, I would do as Robert Harvey suggested in the comments: Create a parameterless constructor and throw an exception when it is called. This prevents invalid data from entering the system via default(Airport).

public Airport()
{
 throw new InvalidOperationException("...");
}

However, if you are taking data into the system from sources you don't control, you don't necessary want to crash the whole thread. In this case you can create an airport code that is invalid, but make it seem like an obvious error. This would involve creating a parameterless constructor and setting the Code to something like "---":

public Airport()
{
 Code = "---";
}

Since you are using a string as the Code, there is no point in using a struct. The struct gets allocated on the stack, only to have the Code allocated as a pointer to a string in heap memory, so no difference here between class and struct.

If you changed the airport code to a 3 item array of char's then a struct would be fully allocated on the stack. Even then the volume of data isn't that big to make a difference.

Question 8

If my application were using interned strings for the Code property, would that change your justification regarding your point of the string being in heap memory?

Question 9

@Matthew: Is using a class giving you a performance problem? If not, then flip a coin to decide which to use.

Question 10

@Matthew: Really the important thing is you centralized the troublesome logic of normalizing the codes and comparisons. After that "class versus struct" is just an academic discussion, until you measure a big enough impact in performance to justify the extra developer time to have an academic discussion.

Question 11

That's true, I don't mind having an academic discussion from time to time if it helps me create better informed solutions in the future.

Question 12

@Matthew: Yup, you are absolutely right. They say "talk is cheap." It's certainly cheaper than not talking and building something poorly. :)

Question 13

Use the Flyweight pattern

Since Airport is, correctly, immutable, there is no need to create more than one instance of any particular one, say, SFO. Use a Hashtable or similar (note, I'm a Java guy, not C# so exact details might vary), to cache Airports when they get created. Before creating a new one, check in the Hashtable. You are never freeing up Airports, so GC has no need to free them.

One additional minor advantage (at least in Java, not sure about C#) is that you don't need to write an equals() method, a simple == will do. Same for hashcode().

Question 14

Excellent usage of the flyweight pattern.

Question 15

Assuming OP keeps using a struct and not a class, isn't string interning already handling the reusable string values? The structs already live on the stack, the strings are alreadybeing reused so as to avoid duplicate values in memory. What additional benefit would be gained from the flyweight pattern?

Question 16

Something to watch out for. If an airport is added or removed you'll want to build in a way to refresh this static list without restarting the application or redeploying it. Airports aren't added or removed often, but business owners tend to get a little upset when a simple change becomes this complicated. "Can't I just add it someplace?! Why do we have to schedule a release/application restart and inconvenience our customers?" But I was also thinking of using some sort of static cache at first too.

Question 17

@Flater Reasonable point. I'd say less need for junior programmers to reason about stack vs. heap. Plus see my addition - no need to write equals().

Question 18

@Greg Burghardt If the getAirportOrCreate() code is properly synchronized, there's no technical reason you can't create new Airports as needed during runtime. There may be business reasons.

Question 19

I’m not a particularly advanced programmer, but wouldn’t this be a perfect use for an Enum?

There are different ways to construct enum classes from lists or strings. Here’s one I’ve seen in the past, not sure if it’s the best way though.

https://blog.kloud.com.au/2016/06/17/converting-webconfig-values-into-enum-or-list/

Question 20

When there are potentially thousands of different values (as is the case with airport codes), an enum just isn't practical.

Question 21

Yes, but the link I posted is how to load strings as enums. Here's another link to load a lookup table as an enum. It might be a bit of work, but would take advantage of the power of enums. exceptionnotfound.net/…

Question 22

Or a list of valid codes could be loaded from a database or file. Then an airport code is just checked to be among that list. This is what you typically do when you no longer want to hardcode the values and/or the list becomes to long to manage.

Question 23

@BenCottrell that is precisely what code gen and T4 templates are for.

Question 24

One of the reasons you're seeing more GC activity is because you're creating a second string now -- the .ToUpperInvariant() version of the original string. The original string is eligible for GC right after the constructor runs and the second one is eligible at the same time as the Airport object is. You might be able to minimize it in a different fashion (note the third parameter to string.Equals()):

public sealed class Airport : IEquatable<Airport>
{
 public Airport(string code)
 {
 if (code == null)
 {
 throw new ArgumentNullException(nameof(code));
 }
 if (code.Length != 3 || !char.IsLetter(code[0])
 || !char.IsLetter(code[1]) || !char.IsLetter(code[2]))
 {
 throw new ArgumentException(
 "Must be a 3 letter airport code.",
 nameof(code));
 }
 Code = code;
 }
 public string Code { get; }
 public override string ToString()
 {
 return Code; // TODO: Upper-case it here if you really need to for display.
 }
 public bool Equals(Airport other)
 {
 return string.Equals(Code, other?.Code, StringComparison.InvariantCultureIgnoreCase);
 }
 public override bool Equals(object obj)
 {
 return obj is Airport airport && Equals(airport);
 }
 public override int GetHashCode()
 {
 return Code.GetHashCode();
 }
 public static bool operator ==(Airport left, Airport right)
 {
 return Equals(left, right);
 }
 public static bool operator !=(Airport left, Airport right)
 {
 return !Equals(left, right);
 }
}

Question 25

Doesn't this yield different hash codes for equal (but differently capitalized) Airports?

Question 26

Yeah, I would imagine so. Dangit.

Question 27

This is a very good point, never thought of it, I'll look at making these changes.

Question 28

In regards to GetHashCode, should just use StringComparer.OrdinalIgnoreCase.GetHashCode(Code) or similar

score 6 · Accepted Answer · 2018-09-10 18:09:55Z

Update: I rewrote my answer to address some incorrect assumptions about C# structs, as well as the OP informing us in comments that interned strings are being used.

If you can control the data coming in to your system, use a class as you posted in your question. If someone runs default(Airport) they will get a null value back. Be sure to write your private Equals method to return false whenever comparing null Airport objects, and then let the NullReferenceException's fly elsewhere in the code.

However, if you are taking data into the system from sources you don't control, you don't necessary want to crash the whole thread. In this case a struct is ideal for the simple fact default(Airport) will give you something other than a null pointer. Make up an obvious value to represent "no value" or the "default value" so you have something to print on screen or in a log file (like "---" for instance). In fact, I would just keep the code private and not expose a Code property at all — just focus on behavior here.

public struct Airport
{
 private string code;
 public Airport(string code)
 {
 // Check `code` for validity, throw exceptions if not valid
 this.code = code;
 }
 public override string ToString()
 {
 return code ?? (code = "---");
 }
 // int GetHashcode()
 // bool Equals(...)
 // bool operator ==(...)
 // bool operator !=(...)
 private bool Equals(Airport other)
 {
 if (other == null)
 // Even if this method is private, guard against null pointers
 return false;
 if (ToString() == "---" || other.ToString() == "---")
 // "Default" values should never match anything, even themselves
 return false;
 // Do a case insensitive comparison to enforce logic that airport
 // codes are not case sensitive
 return string.Equals(
 ToString(),
 other.ToString(),
 StringComparison.InvariantCultureIgnoreCase);
 }
}

Worse case scenario converting default(Airport) to a string prints out "---" and returns false when compared to other valid Airport codes. Any "default" airport code matches nothing, including other default airport codes.

Yes, structs are meant to be values allocated on the stack, and any pointers to heap memory basically negate the performance advantages of structs, but in this case the default value of a struct has meaning and provides some additional bullet resistance to the rest of the application.

I would bend the rules a little here, because of that.

Original Answer (with some factual errors)

If you can control the data coming in to your system, I would do as Robert Harvey suggested in the comments: Create a parameterless constructor and throw an exception when it is called. This prevents invalid data from entering the system via default(Airport).

public Airport()
{
 throw new InvalidOperationException("...");
}

However, if you are taking data into the system from sources you don't control, you don't necessary want to crash the whole thread. In this case you can create an airport code that is invalid, but make it seem like an obvious error. This would involve creating a parameterless constructor and setting the Code to something like "---":

public Airport()
{
 Code = "---";
}

Since you are using a string as the Code, there is no point in using a struct. The struct gets allocated on the stack, only to have the Code allocated as a pointer to a string in heap memory, so no difference here between class and struct.

If you changed the airport code to a 3 item array of char's then a struct would be fully allocated on the stack. Even then the volume of data isn't that big to make a difference.

If my application were using interned strings for the Code property, would that change your justification regarding your point of the string being in heap memory?
@Matthew: Is using a class giving you a performance problem? If not, then flip a coin to decide which to use.
@Matthew: Really the important thing is you centralized the troublesome logic of normalizing the codes and comparisons. After that "class versus struct" is just an academic discussion, until you measure a big enough impact in performance to justify the extra developer time to have an academic discussion.
That's true, I don't mind having an academic discussion from time to time if it helps me create better informed solutions in the future.
@Matthew: Yup, you are absolutely right. They say "talk is cheap." It's certainly cheaper than not talking and building something poorly. :)

Stack Exchange Network

struct with nonsensical default value

4 Answers 4

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

struct with nonsensical default value

4 Answers 4

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions