Deserializing a customly formatted string

Question 1

The component receives a serialized message as a string and a serialized schema as a string. The message contains key-value pairs or just the values. The schema describes how to turn that message into a Java object. Firstly, we parse the schema into a Java object (schema format is known in advance). There is an enum with possible data providers, which imply the message format (it can be XML, JSON, or something custom). So, the schema contains this enum value as well as some additional information (like the keys if the message only has the values, so we can later map already known key names to values and we don't need to send the keys every time over the wire).

Now, that's how it works:

// Get the schema serialization format.
String schemaFormat = "XML";
// Get the schema.
Serialized schemaSerialized = new Serialized(
 "<DataProvider format=\"FIXED_LENGTH_STRING\">\n" +
 " <variable name=\"agentId\" len=\"5\"/>\n" +
 " <variable name=\"callsPerDay\" len=\"3\"/>\n" +
 "</DataProvider>");
// Parse the schema.
Schema schemaAbstract = Deserializer.deserializeSchema(
 schemaSerialized,
 FormatSchema.valueOf(schemaFormat));
// Downcast the schema to its real type.
XMLSchema schema = (XMLSchema) schemaAbstract;
// Get the data from the wire. metric1 | metric2
Serialized msgSerialized = new Serialized("12230015");
// Deserialize the data.
Message msg = Deserializer.deserializeMessage(msgSerialized, schema);
// Serialize the message into JSON
String msgJson= msg.toJSONString();

Result: msgJson is now: {"Metrics":{"agentId":"12230","callsPerDay":"15"}}

Now, I'll explain the classes briefly (too much code to fit here):

class Serialized: Has a String private field, sets in the constructor, there is a getter.

interface Deserialized extends Serializable: just an empty interface. Should I implement Serializable? I don't really know much about it, just seems appropriate.

interface Schema extends Deserialized: represents an abstract Java Object for any kind of schema. Has a getter for FormatMessage enum value.

interface Message extends Deserialized: represents an abstract Java Object for any kind of message. Has a toJSONString() method.

interface Parser: empty. Because for a message I need 2 parameters, for schema one.

interface ParserMessage extends Parser: has a method Message parse(Serialized, Schema)

interface ParserSchema extends Parser: has a method Schema parse(Serialized)

interface Format: empty

enum FormatMessage implements Format

enum FormatSchema implements Format

For each FormatMessage value, there is a concrete class for Message and ParserMessage. For each FormatSchema value, there is a concrete class for Schema and ParserSchema. Obviously, ParserSchema returns its corresponding Schema and the same with messages.

Now, the main part:

class Deserializer: has 2 methods: deserializeMessage and deserializeSchema. For both he accepts a Format (for message in a form of schema). Inside those two methods this class uses a Strategy pattern by calling methods named setParserMessage and setParserSchema. Inside those methods, there is a call to a ParserFactory which holds all the concrete Parser static instances and gives away references to them by switch-casing by the given Format on the appropriate enum.

The problem is that I'm afraid that the Deserializer class abstracts everything too much that I have to cast a lot, even the caller has to cast a little. My team accepted this design, but I just want to make sure :)

UPD: The code for Deserializer and the ParserFactory as requested.

/**
 * Used to deserialize schemas and the corresponding to them messages.
 * Abstracts all the Parser implementations and uses the Strategy design pattern to switch between them based on the Format using the ParserFactory.
 */
public class Deserializer {
 private static ParserMessage parserMessage;
 private static ParserSchema parserSchema;
 private Deserializer() {} // Imitating a static class
 /**
 * Deserializes a schema into the concrete Schema object.
 * The returned object can be later used to deserialize the actual messages sent by the corresponding provider.
 *
 * @param schema a serialized schema
 * @param schemaFormat the serialization type of the schema
 * @return this schema as a Java object
 */
 public static Schema deserializeSchema(Serialized schema,
 FormatSchema schemaFormat
 ) {
 setParserSchema(schemaFormat); // Strategy pattern
 return parserSchema.parse(schema);
 }
 /**
 * Deserializes a message into the concrete Message object.
 *
 * @param msg a serialized message
 * @param schema the schema as a Java object
 * @return this message as a Java object
 */
 public static Message deserializeMessage(Serialized msg,
 Schema schema
 ) {
 setParserMessage(schema.getFormatMessage()); // Strategy pattern
 return parserMessage.parse(msg, schema);
 }
 private static void setParserMessage(FormatMessage msgFormat) { // Strategy pattern
 parserMessage = ParserFactory.getParserMessage(msgFormat); // Factory pattern
 }
 private static void setParserSchema(FormatSchema configFileFormat) { // Strategy pattern
 parserSchema = ParserFactory.getParserSchema(configFileFormat); // Factory pattern
 }
}
/**
 * Abstracts all the Parser implementations using the Factory design pattern.
 * Defines a Factory that gives away the reference at the Parser implementation instance based on the given Format implementation instance (which is an enum option value).
 * <p>
 * The factory contains a lazy-initialized static instance of each currently available Parser implementation.
 */
public class ParserFactory {
 private static final XMLSchemaParser XML_SCHEMA_PARSER = new XMLSchemaParser();
 private static final FIXED_LENGTH_STRINGMessageParser FIXED_LENGTH_STRING_MESSAGE_PARSER = new FIXED_LENGTH_STRINGMessageParser();
 private ParserFactory() {} // imitating a static class
 /**
 * Returns the ParserMessage instance for parsing messages of specific FormatMessage.
 *
 * @param msgFormat the format of the acquired Serialized message
 * @return a reference at the corresponding static ParserMessage instance. Null if not found
 * @see FormatMessage All message formats
 */
 public static ParserMessage getParserMessage(FormatMessage msgFormat) {
 switch (msgFormat) {
 case FIXED_LENGTH_STRING:
 return getFixedLengthStringMessageParserInstance();
 default:
 return null;
 }
 }
 /**
 * Returns the ParserSchema instance for parsing schemas of specific FormatSchema.
 *
 * @param schemaFormat format of the Serialized schema
 * @return reference at the corresponding static ParserSchema instance. Null if not found
 * @see FormatSchema All schema formats
 */
 public static ParserSchema getParserSchema(FormatSchema schemaFormat) {
 switch (schemaFormat) {
 case XML:
 return getXmlSchemaParser();
 default:
 return null;
 }
 }
 private static XMLSchemaParser getXmlSchemaParser() {
 return XML_SCHEMA_PARSER;
 }
 private static FixedLengthStringMessageParser getFixedLengthStringMessageParserInstance() {
 return FIXED_LENGTH_STRING_MESSAGE_PARSER;
 }
}

Question 2

OK, so I'll try to review that which you have provided. Since you didn't post all the relevant code, I might have misunderstood some of your descriptions, so I'll base my review on how I understood them.

First, I suggest you get rid of the class Serialized. The way you describe it, Serialized only seems to be a wrapper class for a String, without any functionality. I really don't see a point in that. You might as well use a String itself whenever you deal with a serialized representation of something.

Then, about your empty interfaces. I don't really see why the interface Deserialized would be useful. As far as I could figure out your description, there is no case where a method parameter can be both a Message or a Schema, and Deserialized doesn't declare any methods itself, so it is completely dispensable. The same applies to Parser and Format. If the ParserMessage and ParserSchema don't even share a single method, then what's the use of them having a common superinterface?

Next, I'm not sure if I understand the purpose of the interfaces Schema and Message. They both return a FormatSchema and FormatMessage, respectively. But if I understand the design of your program correctly, then a message is deserialized using a schema, and a schema is deserialized using a schema format. So why does Schema have a getter for a FormatSchema? I thought a FormatSchema describes how a schema is (de-)serialized, which means that a FormatSchema is not a property of a Schema, so there is no place for a getFormatSchema() (or similar) method in a Schema. The same logic can be applied to Message.

Furthermore, I don't think I understand the difference between a FormatMessage and a Schema. If a message might either contain both keys and values, or just the values, then it seems to me that these would be two different message formats, or at least two different variations of the same message format. Since you didn't provide concrete code, I can only make assumptions, but it seems like you scatter a single responsibility over the two classes/interfaces Schema and FormatMessage, which makes everything more confusing.

Now about something in the code you provided. You cast schemaAbstract to XMLSchema. As far as I can figure out, this is unnecessary, because Deserializer.deserializeMessage accepts a Schema as an argument, and not only an XMLSchema. The whole point of this abstraction is that you don't need to know the concrete type of your Schema at compile time because the deserializer will determine at runtime what to do based on the runtime type of the passed Schema. So there is really no point in casting schemaAbstract to XMLSchema, you might just pass schemaAbstract to the deserializer directly.

And concerning the "main part" of your question: I really think it would be easier to review the class Deserializer if you posted its source code. However, maybe it would be better if you first responded to the points I mentioned, because, depending on how well I understood your description, redesigning the rest of your code might make everything easier to manage.

A last thing about your class/interface names. I think names like FormatSchema or ParserMessage are confusing. If I'm not mistaken, a FormatSchema is primarily a format, and a ParserMessage is primarily a parser, and "schema" and "message" only describe the format/parser further. In the English language, this would be expressed with a noun phrase where the primary noun (the "head") comes last and the modifiers precede it, like "parser factory", or "data provider". Therefore, I also think that a better name for these interfaces would be "MessageParser" or "SchemaFormat" (unless I have misunderstood the purpose of these interfaces). This can cause even more confusion when a method name contains a verb, like "setParserMessage" (at least it did for me when I read this method name).

Update (reviewing `Deserializer` and `ParserFactory`)

OK, so about your class Deserializer: There is really no need to declare two private static variables to hold instances of ParserMessage and ParserSchema, because their sole purpose is to act as intermediate storage when invoking deserializeSchema(Serialized, FormatSchema) and deserializeMessage(Serialized, Schema). You could instead make the methods setParserMessage(FormatMessage) and setParserSchema(FormatSchema) return a ParserMessage and ParserSchema, respectively, instead of void (and rename them accordingly), and just use the returned parser inside the "deserialize" methods, thereby reducing its scope to the smallest extent necessary. Otherwise, you are littering the heap with unused objects that the garbage collector can't collect because, while being assigned to a static field, they are still being referenced by the class Deserializer (and, in a way, littering the class with unused static fields ...).

And regarding the class ParserFactory, I think this class can be dropped entirely by simply taking advantage of the fact that an enum can have fields and methods. Each FormatSchema could contain a field with the respective ParserSchema, and each FormatMessage could contain a field with the respective ParserMessage (in the following code example, I'm going to rename the interfaces by switching the order of the nouns, it's just too confusing for me otherwise):

enum SchemaFormat {
 XML(new XMLSchemaParser());
 final SchemaParser schemaParser;
 SchemaFormat(SchemaParser schemaParser) {
 this.schemaParser = schemaParser;
 }
}

Add access modifiers if needed. Then you can just do this in Deserializer:

public static Schema deserializeSchema(String schema, SchemaFormat schemaFormat) {
 return schemaFormat.schemaParser.parse(schema);
}

The same can be applied to MessageFormat as well. This reduces the number of methods needed in Deserializer to 2, namely the two "deserialize" methods. Maybe you would agree that, with such a design, the Deserializer class can be dropped as well, because now it is nothing more than a container for 2 static methods that both merely wrap another method.

Question 3

Comments are not for extended discussion; this conversation has been moved to chat.

Question 4

@Sam See above comment (my response to your last comment is there).

Stingy Stingy 2,1689 silver badges12 bronze badges · Accepted Answer · 2017-08-02 23:53:42Z