Design - Parser.hasInfo(MyClass) vs MyClass.hasInfo()

Question 1

I'm working on a document processing system.

I feel confident with a Document class which represents each document being processed.

The issue:

Each Document can have a CoverSheet, and if it does, we need to get CoverSheetInfo from this CoverSheet (for renaming and processing). But checking for a CoverSheet and coercing the info into CoverSheetInfo involves a fair amount of Apache PDFBox code.

I'm trying to decide on the best place to have this functionality.

Option 1

Document class will have these methods:

public boolean hasCoverSheet()

public CoverSheetInfo getCoverSheetInfo()

Pros:

Behavior is close to the data -- the process of checking a Document for a CoverSheet takes place in Document which seems sensible.

Cons:

This adds a lot of PDFBox - related parsing lines which make the other-wise simple set-get Document look cluttered and makes the Document class exceed 300 lines to include this functionality. Thus Option 2...

Option 2

Create a DocumentParser class which would have:

public boolean hasCoverSheet(Document document)

public CoverSheetInfo getCoverSheetInfo(Document document)

Pros:

All the PDFBox - specific parsing code is in it's own Class. I think this is a good example of enforcing Single Responsibility/Law of Demeter As I don't think Document should necessarily know how to parse information from cover sheets.

Cons:

Awkward(?) separation of behavior from data(?)

Which one seems most reasonable and how so?

Edit: I'm desperate. Any feed back would be absolutely. fricken. loved.

Edit 2

A Document is in this case a scanned mortgage document, and it will always be a PDF. A Document is created when my app finds files in a directory (one Document is made for each file found). DocumentParser should process Documents, right, File was a typo.

At this point, Document is just a wrapper around the File essentially. In Option 1, it would have CoverSheetInfo as a field and File stubFile as well as boolean regarding the existence of these things.

Here's the "story" for what I'm doing:

Someone will scan a document. It will end up in a directory. My app needs to look at that directory, and

rename the files by their cover sheet (if they have one)
make a stub out of the first 8 pages (if the file is very large)
Upload these files (and any stubs made) to Google Drive.

Question 2

Is checking a Document for the existence of a CoverSheet an expensive operation? Or is just retrieving the full CoverSheetInfo the expensive part?

Question 3

Does this breakdown make any sense at all? dropbox.com/s/zyfnmh1yva56o2t/mmcrae-classes.txt?dl=0 (I'm primarily a C# & JavaScript guy, so that tentative design might not work at all in the Java world... if it even makes sense in my .NET world...)

Question 4

@TroyGizzi that's actually super similar to what I had going on in the Option 1 situation, but your idea of private constructor for CoverSheetInfo and StubInfo are different/interesting. One thing on my mind about this though: I feel like it's a bad(ish) thing to have references to specific classes (CoverSheetInfo and StubInfo in this case) set in Document's class i.e. I feel it's best to have those things passed into a constructor -- at least that's something discussed in Dependency Injection but maybe that doesn't quite apply here... Thanks a bunch for your feedback

Question 5

Ah, DI, now I get what you meant by your "CoverSheet and Stub are not part of the constructor" comment in your other post. In C#, we often pass Interfaces as constructor parameters to provide DI. Makes sense that would probably be a common approach in Java as well. -- And shame on me for not getting what you meant right away, especially since I suggested a DI solution to another recent question here. :-)

Question 6

I think answering your question requires a little bit more information. What is a Document and what kind of information it holds? Are all Documents PDF documents? How a Document object gets created? Why the DocumentParser processes Files, not Documents? (Option 2 seems a safer choice. At least it decouples a third party library from an entity.)

Question 7

As far as I understood, the cover page is an important property of a document in your application. Then I'd represent it with a getter. The business classes dealing with renaming the document then only need the Document object and have no dependency to a parser which they do not care about.

How that getter is implemented is a different question. You could for example provide a mock implementation that provides a pre-defined cover page for unit testing. To solve the problem of a long class with multiple responsibilities, you could extract the parsing code to a separate class that is used by the real Document implementation.

To go one step further, the parser instance could be provided to the Document constructor. That's called dependency injection and decoples the parser from the document, so that other parsers could be used. For exapmle, the unit test of Document can use a mock implementation of the parser. There are frameworks like weld which essentially provide a factory to create classes without knowing the exact dependency (i.e. the parser).

Question 8

I know it's been a while... but I came back to this issue again today and so I checked here. I appreciate / understand your first paragraph. That makes sense to me... a Coversheet is an important part of a Document. But you lost me in the second paragraph... are you suggesting I pass in a Parser object to Document constructor, and then use that Parser in Document.get...? If I'm passing a Parser it makes me think I might as well use the Parser. Does your 2nd paragraph contradict your first?

Question 9

@mmcrae: Yes, that's called dependency injection. Note that only the one who creates a document has to know the parser, not the one using it. But you you don't need to use DI if you think it's overkill - my suggestions on the model go on their own. I extracted the DI part into a new paragraph. Did that make it clear?

Question 10

Why not have a separate CoverSheet object? Then you can cleanly do

CoverSheetInfo coverInfo = null;
if(theDocument.hasCoverSheet()) {
 CoverSheet cover = theDocument.getCoverSheet();
 coverInfo = cover.getInfo();
}

Sorry, I'm not familiar with PDFBox, so I don't know if this introduces more complexity than it solves, but (IMHO) it's a much cleaner design.

Yogu Yogu 2701 silver badge10 bronze badges · Accepted Answer · 2014-10-14 15:53:06Z

As far as I understood, the cover page is an important property of a document in your application. Then I'd represent it with a getter. The business classes dealing with renaming the document then only need the Document object and have no dependency to a parser which they do not care about.

How that getter is implemented is a different question. You could for example provide a mock implementation that provides a pre-defined cover page for unit testing. To solve the problem of a long class with multiple responsibilities, you could extract the parsing code to a separate class that is used by the real Document implementation.

To go one step further, the parser instance could be provided to the Document constructor. That's called dependency injection and decoples the parser from the document, so that other parsers could be used. For exapmle, the unit test of Document can use a mock implementation of the parser. There are frameworks like weld which essentially provide a factory to create classes without knowing the exact dependency (i.e. the parser).

I know it's been a while... but I came back to this issue again today and so I checked here. I appreciate / understand your first paragraph. That makes sense to me... a Coversheet is an important part of a Document. But you lost me in the second paragraph... are you suggesting I pass in a Parser object to Document constructor, and then use that Parser in Document.get...? If I'm passing a Parser it makes me think I might as well use the Parser. Does your 2nd paragraph contradict your first?
@mmcrae: Yes, that's called dependency injection. Note that only the one who creates a document has to know the parser, not the one using it. But you you don't need to use DI if you think it's overkill - my suggestions on the model go on their own. I extracted the DI part into a new paragraph. Did that make it clear?

Stack Exchange Network

Design - Parser.hasInfo(MyClass) vs MyClass.hasInfo()

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Design - Parser.hasInfo(MyClass) vs MyClass.hasInfo()

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions