I'm trying to refactor some ugly code and make it easily extensible in the future.
The application should be nothing but a series of components that have input(s) and output(s). The components are chained in such a manner that the current component's input is the output of one (or more) previous components.
Here's a quick overview of what I have so far:
Reader
- Denotes a data source
- Can be a file on HDD, online resource, database, etc.
Splitter
- Input is Reader(s)
- Splits the contents of what the reader delivers into parts
- Outputs are split contents of Reader(s)
Model
- Input is Splitter(s)
- Creates a model of something based on Splitter(s) outputs
- Output is silent, but you can say that the output is an internal state that can be queried for individual inputs
Tester
- Input is a model
- It can query a model for a result on some data
- Output is optional, but in case it is used, it's a stream of (queryInput, queryOutput) data
Writer
- Input is practically anything that produces a collection of objects
- Writes that collection to wherever
- I dunno what the output should be right now
So, I want to have the ability to plug them together in the following manner:
-->Reader-->Splitter-->Model-->Tester-->Writer-->
However, this would also be a valid combination (it obviously does nothing more than a simple data transfer):
-->Reader-->Writer-->
Now, since I'd like to be able to plug (almost) everything into everything and (possibly) create quite long chains, I assume I'd need some sort of Pluggable interface.
Also, when creating such a big chain, I'd probably want to wrap it behind a Facade. And since I'd want each pluggable component (class) to be replaceable by some other, the Strategy pattern comes to mind.
Now, since I've already mentioned the term chain here, the Chain of Responsibility pattern comes to mind, in the following (or a similar) way:
    import java.util.Collection;

    public interface Pluggable<InputType, OutputType> {
        public void consume(Pluggable<?, InputType> previous);
        public Collection<OutputType> produce();
    }
For example, if I wanted my Splitter to split a list of Files provided by a Reader, it might look something like this:
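    import java.util.Collection;

    public class Splitter<InputType, OutputType> implements Pluggable<InputType, OutputType> {
        private Collection<InputType> input;

        public void consume(Pluggable<?, InputType> previous) {
            // retrieves for example a list of InputType objects
            input = previous.produce();
        }

        public Collection<OutputType> produce() {
            // filters the collection it got and returns a sublist for example
            throw new UnsupportedOperationException("splitting logic not shown");
        }
    }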
In the end, the components might look something like this:
Reader<File, Foo> --> Splitter<Foo, Foo> --> Model<Foo, Void> --> Test<Model, Bar> --> Writer<Bar, DAO>
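To make the idea a bit more concrete, here's a rough sketch of two components wired together through the Pluggable interface above. ListReader, WordSplitter and ChainDemo are hypothetical names made up for this illustration, and the reader serves a fixed in-memory list instead of real files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

interface Pluggable<InputType, OutputType> {
    void consume(Pluggable<?, InputType> previous);
    Collection<OutputType> produce();
}

// Hypothetical source: has no predecessor and serves a fixed list of lines.
class ListReader implements Pluggable<Void, String> {
    private final List<String> lines;

    ListReader(List<String> lines) {
        this.lines = lines;
    }

    @Override
    public void consume(Pluggable<?, Void> previous) {
        // A source ignores its (non-existent) predecessor.
    }

    @Override
    public Collection<String> produce() {
        return lines;
    }
}

// Hypothetical splitter: breaks every line it receives into words.
class WordSplitter implements Pluggable<String, String> {
    private Collection<String> input = new ArrayList<>();

    @Override
    public void consume(Pluggable<?, String> previous) {
        input = previous.produce();
    }

    @Override
    public Collection<String> produce() {
        List<String> words = new ArrayList<>();
        for (String line : input) {
            for (String word : line.split("\\s+")) {
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }
}

class ChainDemo {
    // Wires Reader --> Splitter and returns what the splitter produces.
    static Collection<String> run() {
        ListReader reader = new ListReader(Arrays.asList("a b", "c"));
        WordSplitter splitter = new WordSplitter();
        splitter.consume(reader);
        return splitter.produce();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The compiler only accepts splitter.consume(reader) because the reader's output type matches the splitter's input type, which is the property I'd like to keep for longer chains.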
I don't know how else to describe the problem I'm having, but I'm sure something like this is quite achievable. For a visual example, here's an image of a RapidMiner process:
[Image: RapidMinerChainExample]
Note that I'm not trying to replicate or copy RapidMiner; it's just that the project I was assigned looks like it can be implemented in a similar way.
I'd appreciate any help regarding how to design such an application.
2 Answers
No, Chain of Responsibility doesn't make sense here, because it assumes all components have the same interface. I don't think Java's type system is good enough to make this fully generic, so I would opt for type erasure and some kind of "manager" that pipes the output of one module into the input of the next one, while encapsulating the erasure.
The module's interface would look something like this:
    public interface Pluggable {
        public Class<?> inputType();
        public Class<?> outputType();
        public Object handle(Object previous);
    }
The manager would accept a list of instances that implement this interface, check that their input/output classes match, and then do the piping between them.
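A minimal Java sketch of such a manager, under a few assumptions: Stage and PipelineDemo are hypothetical helpers that exist only to make the example runnable, and the link check uses isAssignableFrom rather than strict class equality, which is a design choice and not something the interface dictates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

interface Pluggable {
    Class<?> inputType();
    Class<?> outputType();
    Object handle(Object previous);
}

final class PluginPipeline {
    private final List<Pluggable> plugins;

    private PluginPipeline(List<Pluggable> plugins) {
        this.plugins = plugins;
    }

    // Validates every link before the pipeline can be built.
    static PluginPipeline create(List<? extends Pluggable> plugins) {
        for (int i = 1; i < plugins.size(); i++) {
            Class<?> produced = plugins.get(i - 1).outputType();
            Class<?> expected = plugins.get(i).inputType();
            if (!expected.isAssignableFrom(produced)) {
                throw new IllegalArgumentException("Invalid link at position " + i
                        + ": " + produced.getName() + " -> " + expected.getName());
            }
        }
        return new PluginPipeline(new ArrayList<Pluggable>(plugins));
    }

    // Pipes the output of each plugin into the next one; the first plugin receives null.
    Object execute() {
        Object value = null;
        for (Pluggable plugin : plugins) {
            value = plugin.handle(value);
        }
        return value;
    }
}

// Hypothetical helper for the demo: a function bundled with its type tokens.
final class Stage implements Pluggable {
    private final Class<?> in;
    private final Class<?> out;
    private final Function<Object, Object> body;

    Stage(Class<?> in, Class<?> out, Function<Object, Object> body) {
        this.in = in;
        this.out = out;
        this.body = body;
    }

    @Override public Class<?> inputType() { return in; }
    @Override public Class<?> outputType() { return out; }
    @Override public Object handle(Object previous) { return body.apply(previous); }
}

class PipelineDemo {
    // source -> splitter -> counter; returns the number of parts in the source text.
    static Object run() {
        List<Pluggable> stages = Arrays.asList(
                new Stage(Void.class, String.class, ignored -> "1,2,3"),
                new Stage(String.class, List.class, s -> Arrays.asList(((String) s).split(","))),
                new Stage(List.class, Integer.class, l -> ((List<?>) l).size()));
        return PluginPipeline.create(stages).execute();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

All casts live inside the stages, so code that builds and runs the pipeline never sees the erased types. A mismatched chain is rejected in create, before anything executes.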
If this were C#, I would make it even nicer by creating an abstract class that is generic in the same way as your interface. I would implement Pluggable explicitly (to hide it from sight), implement inputType and outputType with typeof(), and wrap the handle method in a generic abstract method. I don't know how that would be possible in Java.
    using System;

    public interface IPluggable
    {
        Type InputType { get; }
        Type OutputType { get; }
        Object Handle(Object value);
    }

    public abstract class Pluggable<TInput, TOutput> : IPluggable
    {
        Type IPluggable.InputType { get { return typeof(TInput); } }
        Type IPluggable.OutputType { get { return typeof(TOutput); } }

        object IPluggable.Handle(object value)
        {
            return Handle((TInput)value);
        }

        protected abstract TOutput Handle(TInput value);
    }
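For what it's worth, a rough Java counterpart of the C# base class above might look like the sketch below. It assumes the type tokens are passed to the constructor, since Java erases generic parameters and has no equivalent of typeof(TInput). TypedPluggable, process and LineCounter are hypothetical names used only for illustration.

```java
interface Pluggable {
    Class<?> inputType();
    Class<?> outputType();
    Object handle(Object previous);
}

// Generic base class: subclasses only implement the typed process method.
abstract class TypedPluggable<I, O> implements Pluggable {
    private final Class<I> inputType;
    private final Class<O> outputType;

    protected TypedPluggable(Class<I> inputType, Class<O> outputType) {
        this.inputType = inputType;
        this.outputType = outputType;
    }

    @Override
    public final Class<?> inputType() {
        return inputType;
    }

    @Override
    public final Class<?> outputType() {
        return outputType;
    }

    // Untyped entry point used by the manager; the checked cast happens exactly once, here.
    @Override
    public final Object handle(Object previous) {
        return process(inputType.cast(previous));
    }

    protected abstract O process(I input);
}

// Hypothetical plugin: counts the lines of a text.
class LineCounter extends TypedPluggable<String, Integer> {
    LineCounter() {
        super(String.class, Integer.class);
    }

    @Override
    protected Integer process(String text) {
        return text.isEmpty() ? 0 : text.split("\n").length;
    }
}
```

Java has no explicit interface implementation, so the untyped handle method stays publicly visible. Marking it final at least prevents subclasses from bypassing the cast.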
This is how the "manager" class might look. Sorry, C# again.
    using System;
    using System.Collections.Generic;

    public class PluginPipeline
    {
        public static PluginPipeline Create(IEnumerable<IPluggable> plugins)
        {
            EnsurePluginsAreInRightOrder(plugins);
            return new PluginPipeline(plugins);
        }

        private static void EnsurePluginsAreInRightOrder(IEnumerable<IPluggable> plugins)
        {
            Type previousType = typeof(object); // first is little special..
            foreach (IPluggable plugin in plugins)
            {
                if (plugin.InputType != previousType)
                    throw new Exception("Invalid link in pipeline!");
                previousType = plugin.OutputType;
            }
        }

        private readonly IEnumerable<IPluggable> _plugins;

        private PluginPipeline(IEnumerable<IPluggable> plugins)
        {
            _plugins = plugins;
        }

        public void Execute()
        {
            object previousValue = null;
            foreach (IPluggable plugin in _plugins)
            {
                previousValue = plugin.Handle(previousValue);
            }
        }
    }
- @Lopina Yes, see my edit. – Euphoric, Feb 26, 2015 at 8:00
- This is a good answer, but I just want to point out one potential issue (because I have used similar code before): if certain plugins consume multiple objects to produce a single object, or consume a single object and produce multiple objects (or a variable number of outputs, from zero to unbounded), then the abstract generic Pluggable.Handle method will also have to synthesize the code for accepting IEnumerable<InputType> and producing IEnumerable<OutputType> respectively. Also, the foreach (plugin) style of execution scheduling will not work. – rwong, Feb 26, 2015 at 8:05
- @Lopina: by "multiple inputs and/or outputs", do you mean multiple items of data, or multiple source/destination plugin nodes? – rwong, Feb 26, 2015 at 8:06
- One way of extending Euphoric's answer is to put FIFO queues in between each linked producer-consumer pair. If one producer needs to produce data for different consumers, each such link will require its own FIFO queue, and each produced item will need to be "broadcast" into all queues. – rwong, Feb 26, 2015 at 8:15
- @Lopina Of course, my solution only assumes a single input/output. It can be generalized for multiple inputs/outputs, but then the complexity spikes up rapidly. Also, the nice generic implementation stops being so nice and simple. As for the last question, resource management is quite complex in itself. Recreating the whole pipeline only when it is needed might be a good idea if resources are scarce. – Euphoric, Feb 26, 2015 at 8:57
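The FIFO-queue idea from the comments could be sketched roughly as follows in Java. BroadcastPort, attach and emit are hypothetical names, and the sketch is single-threaded; stages running on separate threads would presumably need a blocking queue such as java.util.concurrent.LinkedBlockingQueue instead of ArrayDeque.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// One output port of a producer. Every attached consumer gets its own FIFO queue,
// and every emitted item is copied ("broadcast") into all of them.
final class BroadcastPort<T> {
    private final List<Queue<T>> links = new ArrayList<>();

    // Creates the queue for one producer-consumer link and returns it to the consumer.
    Queue<T> attach() {
        Queue<T> queue = new ArrayDeque<>();
        links.add(queue);
        return queue;
    }

    // A producer may call this zero, one, or many times per input item.
    void emit(T item) {
        for (Queue<T> queue : links) {
            queue.add(item);
        }
    }
}

class BroadcastDemo {
    public static void main(String[] args) {
        BroadcastPort<String> port = new BroadcastPort<>();
        Queue<String> toModel = port.attach();
        Queue<String> toWriter = port.attach();

        port.emit("first");
        port.emit("second");

        // Each consumer drains its own queue at its own pace.
        System.out.println(toModel.poll());
        System.out.println(toWriter.size());
    }
}
```

Because each link owns its queue, one consumer falling behind does not change what the others see, and a producer that emits a variable number of items per input needs no special handling.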
I know it's an oldish question, but check out ReactiveX if you haven't already. I discovered it recently, and it's really changed how I see designs like these. It's basically a combination of type-safe observables, event-based push (sync or async), and classic pipes-and-filters.
You just write little components and snap them together into pipelines, like lego for plumbers.
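This is not ReactiveX itself, just a plain-Java stand-in for the "snap typed components together" part, using java.util.function.Function from the JDK. It is synchronous and has none of the push or async behaviour of real observables; TypedChainDemo and the two stages are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class TypedChainDemo {
    // Splitter --> counter, composed with andThen so the compiler checks every link.
    static int run(String source) {
        Function<String, List<String>> splitter = text -> Arrays.asList(text.split(","));
        Function<List<String>, Integer> counter = List::size;

        // Compiles only because the splitter's output type is the counter's input type.
        Function<String, Integer> pipeline = splitter.andThen(counter);
        return pipeline.apply(source);
    }

    public static void main(String[] args) {
        System.out.println(run("a,b,c"));
    }
}
```

Swapping the order of the two stages would be a compile-time error, which is the kind of safety the observable operators give you as well.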
- This answer is a recommendation of a product, but doesn't actually answer how the product solves the problem in the question. Please fully describe the solution to the problem posed in the question. – user40980, May 20, 2015 at 0:14