Dynamically apply methods/"rules" to documents
I hope someone here can help me or guide me in the right direction. I am currently creating a web application where users can import a text file and then programmatically apply different methods to the text.
Example
Imagine that a user has imported a text document like the one below.
As explained, I want to let my users apply a range of different methods to the above text. They should be able to apply all rules, in any given order. Consider the example below, where I have applied 4 rules to the original text document:
As you can see, the text is transformed by each rule: each method is applied to the current text and outputs a new text string.
Users should also be able to save these rules on a specific stream. The idea is that the next time a user uploads a document to that stream, the saved rules are applied to it automatically.
My question is, what would be the best approach to allowing my users to do this dynamically?
I will define the methods/rules that a user can apply to the text document, but what is the best approach to:
- Save the rules to the database
- Programmatically apply the rules to each incoming document (parse each document, based on the rules)
My progress so far
I am a bit lost on where to begin, but I was thinking of something like the following.
Streams: A stream is a kind of "stack" for documents. I can upload multiple different documents to a stream, and I can create multiple streams, each of which holds multiple rules.
streams table:
id | name
- Name: the name of the stream, for example "Documents from Acme Inc"
documents table:
id | stream_id | path | content
- Stream Id: A stream can have many documents, so each document uploaded to a specific stream will be parsed by the rules defined on that stream.
- Path: the server path to the document
- Content: the text content of the document
parsing_rules table:
id | stream_id | method | arguments
- Stream Id: Parsing rules belong to a stream, so all documents imported into the stream will be parsed by the rules associated with that stream.
- Method: the name of the rule applied by the user. This also refers to the method name in my PHP code.
- Arguments: optional. The arguments that will be passed to each rule/method.
The rules from the second screenshot above would then look like this in the parsing_rules table:
1 | 5 | remove_empty_lines | null
2 | 5 | text_replace | a:2:{s:6:"Search";s:9:"Laracasts";s:7:"Replace";s:6:"Google";}
3 | 5 | regex_text_replace | a:2:{s:7:"Pattern";s:9:"/Google/i";s:11:"Replacement";s:6:"Amazon";}
4 | 5 | start_position_no_lines | a:1:{s:4:"Line";s:1:"2";}
So here, method holds the name of the actual method that should be called, and arguments holds the arguments that the specific method accepts/requires, serialized.
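The values in the arguments column above are exactly what PHP's serialize() produces for associative arrays. The sketch below shows the round trip, plus json_encode()/json_decode() as an alternative you might consider for a readable, language-neutral column. The ['allowed_classes' => false] option makes unserialize() refuse to create objects, which seems a sensible default for stored rule arguments.

```php
<?php
// The text_replace arguments from row 2 of the parsing_rules example.
$arguments = ['Search' => 'Laracasts', 'Replace' => 'Google'];

// serialize() produces the string stored in the arguments column.
$serialized = serialize($arguments);
echo $serialized, PHP_EOL;
// a:2:{s:6:"Search";s:9:"Laracasts";s:7:"Replace";s:6:"Google";}

// allowed_classes => false stops unserialize() from instantiating objects.
$restored = unserialize($serialized, ['allowed_classes' => false]);
var_dump($restored === $arguments); // bool(true)

// Alternative: store JSON instead; json_decode(..., true) returns an array.
$json = json_encode($arguments);
echo $json, PHP_EOL; // {"Search":"Laracasts","Replace":"Google"}
var_dump(json_decode($json, true) === $arguments); // bool(true)
```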
How to apply these rules?
I was thinking that each time a new document is uploaded/imported into a stream, I will apply the rules associated with the stream. Something like:
$content = $document->content;
$parsing_rules = $stream->parsingRules()->get();
foreach ($parsing_rules as $rule)
{
    $arguments = unserialize($rule->arguments);
    return $this->{$rule->method}($arguments, $content);
}
Now, the above is nowhere near perfect: because of the return inside the loop, it returns already after the first iteration, so only the first rule is ever applied.
Any feedback is highly appreciated. The above is only my idea of how to approach this project, and I am not sure whether there is a better approach.
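For reference, a minimal sketch of the method-based approach above with the loop fixed, so that each rule's output is fed into the next rule and the result is returned only at the end. It assumes the rows are plain arrays with method and arguments keys (with Eloquent they would be model attributes). The class name DocumentParser, the whitelist, and the rule bodies are hypothetical; in particular, start_position_no_lines is assumed to mean "keep the text from line N onwards".

```php
<?php
// Hypothetical parser: each rule's output is fed into the next rule,
// and the result is returned only after the loop has finished.
class DocumentParser
{
    // Whitelist of callable rule methods, so a stored name cannot call anything else.
    private const RULES = [
        'remove_empty_lines',
        'text_replace',
        'regex_text_replace',
        'start_position_no_lines',
    ];

    public function applyRules(array $rules, string $content): string
    {
        foreach ($rules as $rule) {
            if (!in_array($rule['method'], self::RULES, true)) {
                throw new InvalidArgumentException("Unknown rule: {$rule['method']}");
            }
            $arguments = $rule['arguments'] === null
                ? []
                : unserialize($rule['arguments'], ['allowed_classes' => false]);
            // Overwrite $content instead of returning inside the loop.
            $content = $this->{$rule['method']}($arguments, $content);
        }
        return $content;
    }

    private function remove_empty_lines(array $args, string $text): string
    {
        $lines = preg_split('/\R/', $text);
        return implode("\n", array_filter($lines, fn ($line) => trim($line) !== ''));
    }

    private function text_replace(array $args, string $text): string
    {
        return str_replace($args['Search'], $args['Replace'], $text);
    }

    private function regex_text_replace(array $args, string $text): string
    {
        return preg_replace($args['Pattern'], $args['Replacement'], $text);
    }

    // Assumption: keep the text from line N (1-based) onwards.
    private function start_position_no_lines(array $args, string $text): string
    {
        $lines = preg_split('/\R/', $text);
        return implode("\n", array_slice($lines, max(0, (int) $args['Line'] - 1)));
    }
}

// The four rows from the parsing_rules example, as plain arrays.
$rules = [
    ['method' => 'remove_empty_lines', 'arguments' => null],
    ['method' => 'text_replace', 'arguments' => serialize(['Search' => 'Laracasts', 'Replace' => 'Google'])],
    ['method' => 'regex_text_replace', 'arguments' => serialize(['Pattern' => '/Google/i', 'Replacement' => 'Amazon'])],
    ['method' => 'start_position_no_lines', 'arguments' => serialize(['Line' => '2'])],
];

$result = (new DocumentParser())->applyRules($rules, "Title\n\nI watch Laracasts\nLaracasts rocks\n");
echo $result, PHP_EOL;
```

The whitelist matters because the method name comes from the database; without it, any method on the class (including applyRules itself) could be invoked by a stored row.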
Answer
From my standpoint, we can improve your solution a little bit:
Add an order column to the parsing_rules table
Because the list of rules is ordered, we need to save the position of each rule in the database to make sure we get the rules in the correct order every time we retrieve them from the database.
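A sketch of both halves of the ordering idea, using plain arrays and the hypothetical helpers sortRules() and deleteRule(): sort by order before applying the rules, and renumber the remaining rules when one is deleted. With Eloquent, the sort would more likely happen in the query, e.g. $stream->parsingRules()->orderBy('order')->get().

```php
<?php
// Sort rule rows by their `order` value before applying them.
function sortRules(array $rules): array
{
    usort($rules, fn (array $a, array $b) => $a['order'] <=> $b['order']);
    return $rules;
}

// Remove one rule and renumber the rest as 1..n so there are no gaps.
function deleteRule(array $rules, int $id): array
{
    $remaining = array_values(array_filter(sortRules($rules), fn (array $r) => $r['id'] !== $id));
    foreach ($remaining as $index => &$rule) {
        $rule['order'] = $index + 1;
    }
    unset($rule); // break the reference left over from foreach
    return $remaining;
}

$rules = [
    ['id' => 3, 'method' => 'regex_text_replace', 'order' => 3],
    ['id' => 1, 'method' => 'remove_empty_lines', 'order' => 1],
    ['id' => 4, 'method' => 'start_position_no_lines', 'order' => 4],
    ['id' => 2, 'method' => 'text_replace', 'order' => 2],
];

$afterDelete = deleteRule($rules, 3);
foreach ($afterDelete as $rule) {
    echo "{$rule['order']}: {$rule['method']}\n";
}
// 1: remove_empty_lines
// 2: text_replace
// 3: start_position_no_lines
```

Strictly speaking, gaps such as 1, 2, 4 still sort correctly; renumbering just keeps the values contiguous, which tends to make "move up/down" operations simpler.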
Move rule implementations into dedicated classes instead of methods
As you add more rules and they become more complex, keeping all of them in one class makes that class hard to maintain.
First, design a common interface that every rule class implements:
interface RuleInterface {
    public function apply(string $text): string;
}
Now, each rule is represented as a concrete class that implements RuleInterface. If a rule requires some parameters, add them to the constructor of the concrete class. For example, your last rule can be converted to a class:
class StartPositionRule implements RuleInterface {
    private $startLine;

    public function __construct(array $options) {
        $this->startLine = $options['startLine'];
    }

    public function apply(string $text): string {
        // apply this rule to the text and return the results
    }
}
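To make the pattern concrete, here is a sketch of the four rules from the question written as RuleInterface classes. The class names other than StartPositionRule, the lowercase option keys, and the exact behaviour of each rule are assumptions; for StartPositionRule, the sketch assumes "keep the text from line startLine onwards".

```php
<?php
interface RuleInterface
{
    public function apply(string $text): string;
}

// Hypothetical class versions of the question's four rules.
class RemoveEmptyLinesRule implements RuleInterface
{
    public function __construct(array $options = [])
    {
        // This rule takes no options.
    }

    public function apply(string $text): string
    {
        $lines = preg_split('/\R/', $text);
        return implode("\n", array_filter($lines, fn ($line) => trim($line) !== ''));
    }
}

class TextReplaceRule implements RuleInterface
{
    private string $search;
    private string $replace;

    public function __construct(array $options)
    {
        $this->search = $options['search'];
        $this->replace = $options['replace'];
    }

    public function apply(string $text): string
    {
        return str_replace($this->search, $this->replace, $text);
    }
}

class RegexReplaceRule implements RuleInterface
{
    private string $pattern;
    private string $replacement;

    public function __construct(array $options)
    {
        $this->pattern = $options['pattern'];
        $this->replacement = $options['replacement'];
    }

    public function apply(string $text): string
    {
        return preg_replace($this->pattern, $this->replacement, $text);
    }
}

class StartPositionRule implements RuleInterface
{
    private int $startLine;

    public function __construct(array $options)
    {
        $this->startLine = (int) $options['startLine'];
    }

    // Assumption: keep the text from line $startLine (1-based) onwards.
    public function apply(string $text): string
    {
        $lines = preg_split('/\R/', $text);
        return implode("\n", array_slice($lines, max(0, $this->startLine - 1)));
    }
}

$rule = new StartPositionRule(['startLine' => 2]);
echo $rule->apply("line 1\nline 2\nline 3"), PHP_EOL;
```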
To save a rule to the database, save its class name in the method column and the $options array in the arguments column:
$row->method = get_class($rule);
$row->arguments = serialize($options);
To retrieve a rule from the database, create a new instance of the rule class from the class name and the arguments:
$class = $row->method;
$arguments = unserialize($row->arguments);
$rule = new $class($arguments);
// add the rebuilt rule to the list of rules to apply when parsing the text
$rules[] = $rule;
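One possible refinement, sketched here as an assumption rather than as part of the answer: because the method column decides which class gets instantiated, it may be worth checking that the stored name really is a RuleInterface implementation before calling new. The helper ruleFromRow() and the option keys are hypothetical; is_subclass_of() also accepts an interface name and returns false for unknown classes.

```php
<?php
interface RuleInterface
{
    public function apply(string $text): string;
}

class TextReplaceRule implements RuleInterface
{
    private array $options;

    public function __construct(array $options)
    {
        $this->options = $options;
    }

    public function apply(string $text): string
    {
        return str_replace($this->options['search'], $this->options['replace'], $text);
    }
}

// Hypothetical helper: rebuild a rule from a parsing_rules row,
// refusing any class name that does not implement RuleInterface.
function ruleFromRow(array $row): RuleInterface
{
    $class = $row['method'];
    if (!is_subclass_of($class, RuleInterface::class)) {
        throw new InvalidArgumentException("Not a rule class: {$class}");
    }
    $options = $row['arguments'] === null
        ? []
        : unserialize($row['arguments'], ['allowed_classes' => false]);
    return new $class($options);
}

// Saving, as in the answer: class name + serialized options.
$options = ['search' => 'Laracasts', 'replace' => 'Google'];
$rule = new TextReplaceRule($options);
$row = ['method' => get_class($rule), 'arguments' => serialize($options)];

// Loading: rebuild the rule and apply it.
$rules = [ruleFromRow($row)];
echo $rules[0]->apply('I watch Laracasts'), PHP_EOL;
```

If the rule classes live in a namespace, get_class() returns the fully qualified name, so that is what ends up in the method column.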
In your client class, the rules are represented as a list of RuleInterface instances. The text can then be transformed like this:
$text = $originalText;
foreach ($rules as $rule) {
    $text = $rule->apply($text);
}
- This makes sense. I really like the "order" column; I hadn't thought about that. A follow-up question: what if, for example, a user deletes rule #3? How would that be reflected in the database? Because then rule #4 becomes rule #3. (oliverbj, Mar 20, 2019 at 5:34)
- When the list of rules is updated, we should recalculate the order of the rules and update them. I don't think we should allow users to delete rules from the database directly. (Hieu Le, Mar 20, 2019 at 6:23)
- But parsing_rules should be dynamic, and users should be able to apply them to different documents. Further, each parsing rule will be unique to its specific stream. (oliverbj, Mar 20, 2019 at 7:03)
- For example, we have rule #1 (order = 1), rule #2 (order = 2) and rule #3 (order = 3). When a user deletes rule #2, we should update the order of the other rules, so the order of rule #3 will be set to 2 before being saved to the database. (Hieu Le, Mar 20, 2019 at 7:16)
- Can you please show an example of how to actually use the interfaces? Say I have created a class called ApplyParsingRules.php, which will be called whenever a new text document is available to be parsed (and rules are available). How can I dynamically apply all the rules from the database? (oliverbj, Mar 20, 2019 at 14:02)
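A minimal sketch of what such an ApplyParsingRules class could look like, assuming the stream's parsing_rules rows arrive as plain arrays with method (a rule class name), arguments and order keys; in Laravel they would more likely come from $stream->parsingRules()->orderBy('order')->get(). The handle() method name, the option keys and the two small rule classes are hypothetical.

```php
<?php
interface RuleInterface
{
    public function apply(string $text): string;
}

class RemoveEmptyLinesRule implements RuleInterface
{
    public function __construct(array $options = [])
    {
        // This rule takes no options.
    }

    public function apply(string $text): string
    {
        $lines = preg_split('/\R/', $text);
        return implode("\n", array_filter($lines, fn ($line) => trim($line) !== ''));
    }
}

class TextReplaceRule implements RuleInterface
{
    private array $options;

    public function __construct(array $options)
    {
        $this->options = $options;
    }

    public function apply(string $text): string
    {
        return str_replace($this->options['search'], $this->options['replace'], $text);
    }
}

class ApplyParsingRules
{
    // $rows: the stream's parsing_rules rows, each with method, arguments and order keys.
    public function handle(string $content, array $rows): string
    {
        // Apply the rules in the user-defined order.
        usort($rows, fn (array $a, array $b) => $a['order'] <=> $b['order']);

        foreach ($rows as $row) {
            // Each rule receives the output of the previous one.
            $content = $this->makeRule($row)->apply($content);
        }
        return $content;
    }

    // Rebuild a rule object from a row, accepting only RuleInterface classes.
    private function makeRule(array $row): RuleInterface
    {
        $class = $row['method'];
        if (!is_subclass_of($class, RuleInterface::class)) {
            throw new InvalidArgumentException("Not a rule class: {$class}");
        }
        $options = $row['arguments'] === null
            ? []
            : unserialize($row['arguments'], ['allowed_classes' => false]);
        return new $class($options);
    }
}

$rows = [
    ['method' => TextReplaceRule::class, 'arguments' => serialize(['search' => 'Laracasts', 'replace' => 'Google']), 'order' => 2],
    ['method' => RemoveEmptyLinesRule::class, 'arguments' => null, 'order' => 1],
];

echo (new ApplyParsingRules())->handle("Laracasts\n\nis great", $rows), PHP_EOL;
```

Keeping handle() independent of Eloquent (it only needs arrays) makes it easy to unit-test; a thin caller can map the stream's models to rows before passing them in.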