In my system I frequently operate with airport codes ("YYZ"
, "LAX"
, "SFO"
, etc.), they are always in the exact same format (3 letter, represented as uppercase). The system typically deals with 25-50 of these (different) codes per API request, with over a thousand allocations total, they are passed around through many layers of our application, and are compared for equality quite often.
We started with just passing strings around, which worked fine for a bit but we quickly noticed lots of programming mistakes by passing in a wrong code somewhere the 3 digit code was expected. We also ran into issues where we were supposed to do a case-insensitive comparison and instead did not, resulting in bugs.
From this, I decided to stop passing strings around and create an Airport
class, which has a single constructor that takes and validates the airport code.
public sealed class Airport
{
public Airport(string code)
{
if (code == null)
{
throw new ArgumentNullException(nameof(code));
}
if (code.Length != 3 || !char.IsLetter(code[0])
|| !char.IsLetter(code[1]) || !char.IsLetter(code[2]))
{
throw new ArgumentException(
"Must be a 3 letter airport code.",
nameof(code));
}
Code = code.ToUpperInvariant();
}
public string Code { get; }
public override string ToString()
{
return Code;
}
private bool Equals(Airport other)
{
return string.Equals(Code, other.Code);
}
public override bool Equals(object obj)
{
return obj is Airport airport && Equals(airport);
}
public override int GetHashCode()
{
return Code?.GetHashCode() ?? 0;
}
public static bool operator ==(Airport left, Airport right)
{
return Equals(left, right);
}
public static bool operator !=(Airport left, Airport right)
{
return !Equals(left, right);
}
}
This made our code much easier to understand and we simplified our equality checks, dictionary / set usages. We now know that if our methods accept an Airport
instance that it will behave the way we expect, it has simplified our method checks to a null reference check.
The thing I did notice, however, was the garbage collection was running much more often, which I tracked down to a lot of instances of Airport
getting collected.
My solution to this was to convert the class
into a struct
. Mostly it was just a keyword change, with the exception of GetHashCode
and ToString
:
public override string ToString()
{
return Code ?? string.Empty;
}
public override int GetHashCode()
{
return Code?.GetHashCode() ?? 0;
}
To handle the case where default(Airport)
is used.
My questions:
Was creating an
Airport
class or struct a good solution in general, or am I solving the wrong problem / solving it the wrong way by creating the type? If it's not a good solution, what is a better solution?How should my application handle instances where the
default(Airport)
is used? A type ofdefault(Airport)
is nonsensical to my application, so I've been doingif (airport == default(Airport) { throw ... }
in places where getting an instance ofAirport
(and itsCode
property) is critical to the operation.
Notes: I reviewed the questions C#/VB struct – how to avoid case with zero default values, which is considered invalid for given structure?, and Use struct or not before asking my question, however I think my questions are different enough to warrant its own post.
4 Answers 4
Update: I rewrote my answer to address some incorrect assumptions about C# structs, as well as the OP informing us in comments that interned strings are being used.
If you can control the data coming in to your system, use a class as you posted in your question. If someone runs default(Airport)
they will get a null
value back. Be sure to write your private Equals
method to return false whenever comparing null Airport objects, and then let the NullReferenceException
's fly elsewhere in the code.
However, if you are taking data into the system from sources you don't control, you don't necessary want to crash the whole thread. In this case a struct is ideal for the simple fact default(Airport)
will give you something other than a null
pointer. Make up an obvious value to represent "no value" or the "default value" so you have something to print on screen or in a log file (like "---" for instance). In fact, I would just keep the code
private and not expose a Code
property at all — just focus on behavior here.
public struct Airport
{
private string code;
public Airport(string code)
{
// Check `code` for validity, throw exceptions if not valid
this.code = code;
}
public override string ToString()
{
return code ?? (code = "---");
}
// int GetHashcode()
// bool Equals(...)
// bool operator ==(...)
// bool operator !=(...)
private bool Equals(Airport other)
{
if (other == null)
// Even if this method is private, guard against null pointers
return false;
if (ToString() == "---" || other.ToString() == "---")
// "Default" values should never match anything, even themselves
return false;
// Do a case insensitive comparison to enforce logic that airport
// codes are not case sensitive
return string.Equals(
ToString(),
other.ToString(),
StringComparison.InvariantCultureIgnoreCase);
}
}
Worse case scenario converting default(Airport)
to a string prints out "---"
and returns false when compared to other valid Airport codes. Any "default" airport code matches nothing, including other default airport codes.
Yes, structs are meant to be values allocated on the stack, and any pointers to heap memory basically negate the performance advantages of structs, but in this case the default value of a struct has meaning and provides some additional bullet resistance to the rest of the application.
I would bend the rules a little here, because of that.
Original Answer (with some factual errors)
If you can control the data coming in to your system, I would do as Robert Harvey suggested in the comments: Create a parameterless constructor and throw an exception when it is called. This prevents invalid data from entering the system via default(Airport)
.
public Airport()
{
throw new InvalidOperationException("...");
}
However, if you are taking data into the system from sources you don't control, you don't necessary want to crash the whole thread. In this case you can create an airport code that is invalid, but make it seem like an obvious error. This would involve creating a parameterless constructor and setting the Code
to something like "---":
public Airport()
{
Code = "---";
}
Since you are using a string
as the Code, there is no point in using a struct. The struct gets allocated on the stack, only to have the Code
allocated as a pointer to a string in heap memory, so no difference here between class and struct.
If you changed the airport code to a 3 item array of char's then a struct would be fully allocated on the stack. Even then the volume of data isn't that big to make a difference.
-
If my application were using interned strings for the
Code
property, would that change your justification regarding your point of the string being in heap memory?Matthew– Matthew2018年09月10日 18:16:55 +00:00Commented Sep 10, 2018 at 18:16 -
@Matthew: Is using a class giving you a performance problem? If not, then flip a coin to decide which to use.Greg Burghardt– Greg Burghardt2018年09月10日 18:25:23 +00:00Commented Sep 10, 2018 at 18:25
-
4@Matthew: Really the important thing is you centralized the troublesome logic of normalizing the codes and comparisons. After that "class versus struct" is just an academic discussion, until you measure a big enough impact in performance to justify the extra developer time to have an academic discussion.Greg Burghardt– Greg Burghardt2018年09月10日 18:27:13 +00:00Commented Sep 10, 2018 at 18:27
-
1That's true, I don't mind having an academic discussion from time to time if it helps me create better informed solutions in the future.Matthew– Matthew2018年09月10日 19:14:44 +00:00Commented Sep 10, 2018 at 19:14
-
@Matthew: Yup, you are absolutely right. They say "talk is cheap." It's certainly cheaper than not talking and building something poorly. :)Greg Burghardt– Greg Burghardt2018年09月10日 19:25:41 +00:00Commented Sep 10, 2018 at 19:25
Use the Flyweight pattern
Since Airport is, correctly, immutable, there is no need to create more than one instance of any particular one, say, SFO. Use a Hashtable or similar (note, I'm a Java guy, not C# so exact details might vary), to cache Airports when they get created. Before creating a new one, check in the Hashtable. You are never freeing up Airports, so GC has no need to free them.
One additional minor advantage (at least in Java, not sure about C#) is that you don't need to write an equals()
method, a simple ==
will do. Same for hashcode()
.
-
3Excellent usage of the flyweight pattern.Neil– Neil2018年09月11日 06:53:57 +00:00Commented Sep 11, 2018 at 6:53
-
2Assuming OP keeps using a struct and not a class, isn't string interning already handling the reusable string values? The structs already live on the stack, the strings are alreadybeing reused so as to avoid duplicate values in memory. What additional benefit would be gained from the flyweight pattern?Flater– Flater2018年09月11日 08:38:11 +00:00Commented Sep 11, 2018 at 8:38
-
Something to watch out for. If an airport is added or removed you'll want to build in a way to refresh this static list without restarting the application or redeploying it. Airports aren't added or removed often, but business owners tend to get a little upset when a simple change becomes this complicated. "Can't I just add it someplace?! Why do we have to schedule a release/application restart and inconvenience our customers?" But I was also thinking of using some sort of static cache at first too.Greg Burghardt– Greg Burghardt2018年09月11日 12:10:05 +00:00Commented Sep 11, 2018 at 12:10
-
@Flater Reasonable point. I'd say less need for junior programmers to reason about stack vs. heap. Plus see my addition - no need to write equals().user949300– user9493002018年09月11日 14:48:25 +00:00Commented Sep 11, 2018 at 14:48
-
1@Greg Burghardt If the
getAirportOrCreate()
code is properly synchronized, there's no technical reason you can't create new Airports as needed during runtime. There may be business reasons.user949300– user9493002018年09月11日 15:12:38 +00:00Commented Sep 11, 2018 at 15:12
I’m not a particularly advanced programmer, but wouldn’t this be a perfect use for an Enum?
There are different ways to construct enum classes from lists or strings. Here’s one I’ve seen in the past, not sure if it’s the best way though.
https://blog.kloud.com.au/2016/06/17/converting-webconfig-values-into-enum-or-list/
-
2When there are potentially thousands of different values (as is the case with airport codes), an enum just isn't practical.Ben Cottrell– Ben Cottrell2018年09月10日 22:36:46 +00:00Commented Sep 10, 2018 at 22:36
-
Yes, but the link I posted is how to load strings as enums. Here's another link to load a lookup table as an enum. It might be a bit of work, but would take advantage of the power of enums. exceptionnotfound.net/…Adam B– Adam B2018年09月10日 22:43:36 +00:00Commented Sep 10, 2018 at 22:43
-
1Or a list of valid codes could be loaded from a database or file. Then an airport code is just checked to be among that list. This is what you typically do when you no longer want to hardcode the values and/or the list becomes to long to manage.Neil– Neil2018年09月11日 06:52:43 +00:00Commented Sep 11, 2018 at 6:52
-
@BenCottrell that is precisely what code gen and T4 templates are for.RubberDuck– RubberDuck2018年09月11日 11:54:43 +00:00Commented Sep 11, 2018 at 11:54
One of the reasons you're seeing more GC activity is because you're creating a second string now -- the .ToUpperInvariant()
version of the original string. The original string is eligible for GC right after the constructor runs and the second one is eligible at the same time as the Airport
object is. You might be able to minimize it in a different fashion (note the third parameter to string.Equals()
):
public sealed class Airport : IEquatable<Airport>
{
public Airport(string code)
{
if (code == null)
{
throw new ArgumentNullException(nameof(code));
}
if (code.Length != 3 || !char.IsLetter(code[0])
|| !char.IsLetter(code[1]) || !char.IsLetter(code[2]))
{
throw new ArgumentException(
"Must be a 3 letter airport code.",
nameof(code));
}
Code = code;
}
public string Code { get; }
public override string ToString()
{
return Code; // TODO: Upper-case it here if you really need to for display.
}
public bool Equals(Airport other)
{
return string.Equals(Code, other?.Code, StringComparison.InvariantCultureIgnoreCase);
}
public override bool Equals(object obj)
{
return obj is Airport airport && Equals(airport);
}
public override int GetHashCode()
{
return Code.GetHashCode();
}
public static bool operator ==(Airport left, Airport right)
{
return Equals(left, right);
}
public static bool operator !=(Airport left, Airport right)
{
return !Equals(left, right);
}
}
-
Doesn't this yield different hash codes for equal (but differently capitalized) Airports?Hero Wanders– Hero Wanders2018年09月11日 05:37:54 +00:00Commented Sep 11, 2018 at 5:37
-
Yeah, I would imagine so. Dangit.Jesse C. Slicer– Jesse C. Slicer2018年09月11日 08:48:12 +00:00Commented Sep 11, 2018 at 8:48
-
This is a very good point, never thought of it, I'll look at making these changes.Matthew– Matthew2018年09月11日 13:09:19 +00:00Commented Sep 11, 2018 at 13:09
-
1In regards to
GetHashCode
, should just useStringComparer.OrdinalIgnoreCase.GetHashCode(Code)
or similarMatthew– Matthew2018年09月11日 13:14:20 +00:00Commented Sep 11, 2018 at 13:14
default(Airport)
problem is by simply disallowing default instances. You can do that by writing a parameterless constructor and throwingInvalidOperationException
orNotImplementedException
in it.