\$\begingroup\$

Since I sanitize all user-provided strings before uploading them to the DB, I wanted to let users format text the way they can here on Stack Exchange or on WhatsApp:

  • *word* -> bold
  • _word_ -> italic

This class contains two functions:

  • upload(): called when the user uploads a text; it replaces *word* with <b>word</b>, _word_ with <i>word</i> and \n with <br />
  • download(): called when the user wants to modify the text; it does exactly the opposite, replacing the HTML tags with the custom signs * and _

My questions:

  1. Can this code be considered a real class in the object-oriented sense, or is it just procedural code put into a class?
  2. Would you improve it in any way?
  3. Do you have any suggestions for writing it better?
class txtFormatting {
    private $text;
    function __construct($text)
    {
        $this->text = $text;
    }
    function upload() {
        $this->text = preg_replace('/[ \t]+/', ' ', $this->text); //transforms: 2/+ whitespaces -> 1 whitespace
        $this->text = nl2br($this->text); //inserts <br /> before each newline
        $this->text = preg_replace(array("/\r\n/", "/\n\r/", "/\n/", "/\r/"), '', $this->text); //removes the newlines left behind by nl2br()
        $this->text = explode(' ', $this->text); //each word becomes a value
        $regexAY =
        [
            '/[*]{1}[a-zA-Z0-9]+[*]{1}/' =>
            [
                "pattern" => "*",
                "openTag" => "<b>",
                "closeTag" => "</b>"
            ],
            '/[_]{1}[a-zA-Z0-9]+[_]{1}/' =>
            [
                "pattern" => "_",
                "openTag" => "<i>",
                "closeTag" => "</i>"
            ]
        ];
        $newText = [];
        foreach ($this->text as $key => $word) {
            foreach ($regexAY as $regex => $value) {
                if (preg_match($regex, $word)) {
                    $pattern = $regexAY[$regex]["pattern"];
                    $openTag = $regexAY[$regex]["openTag"];
                    $closeTag = $regexAY[$regex]["closeTag"];
                    $word = preg_replace('/\\' .$pattern. '(.*?)\\' .$pattern. '/', $openTag. '$1' .$closeTag, $word); // /\*(.*?)\*/ OR /_(.*?)_/
                }
            }
            if ($word !== '') { array_push($newText, $word); }
        }
        return $this->text = implode(' ', $newText);
    }
    function download() {
        /*function br2nl() {
            return preg_replace('/\<br(\s*)?\/?\>/i', "\n", $this->text); // /\<br(\s*)?\/?\>/i
        }*/
        $this->text = preg_replace('/\<br(\s*)?\/?\>/i', "\n", $this->text);
        $this->text = explode(' ', $this->text);
        $regexAY =
        [
            '/<b>[a-zA-Z0-9]+<\/b>/' =>
            [
                "pattern" => ["/<b>/", "/<\/b>/"],
                "replacement" => "*"
            ],
            '/<i>[a-zA-Z0-9]+<\/i>/' =>
            [
                "pattern" => ["/<i>/", "/<\/i>/"],
                "replacement" => "_"
            ]
        ];
        $newText = [];
        foreach ($this->text as $key => $word) {
            foreach ($regexAY as $regex => $value) {
                if (preg_match($regex, $word)) {
                    $word = preg_replace($regexAY[$regex]["pattern"], $regexAY[$regex]["replacement"], $word);
                }
            }
            if ($word !== '') { array_push($newText, $word); }
        }
        return $this->text = implode(' ', $newText);
    }
}
$text = " This _is_ _just_ _a test_
 *text*
 so _don't_
 consider it just *read*
 it";
$a = new txtFormatting($text);
echo $a->upload()."\n";
$text = "This <i>is</i> <i>just</i> _a test_<br /> <b>text</b><br /> so <i>don't</i><br /> consider it just <b>read</b><br /> it";
$b = new txtFormatting($text);
echo $b->download()."\n";
Jamal
asked Jan 23, 2017 at 11:23
\$\endgroup\$

2 Answers

\$\begingroup\$
  • The names of the methods should reflect what they do; for example, call them encode (instead of upload) and decode (instead of download).

  • You should not store encoded information in the database. Consider, for example, what happens if you want to support other output formats in the future (a PDF, say), or if you want * to be rendered as something else. You would then have all this HTML embedded in your data, and you would have to decode it. Instead, store the original, unencoded data in the table, return it unencoded when it is needed for editing, and encode it only just before you need it in the encoded format. This way there is also no need for a decode (download) method. At input time, you only have to make sure the data is valid according to your business rules.

  • Based on the example, it seems there is no <i> if _ is around multiple words? (Sorry, I'm no regex expert.)
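
As a rough sketch of the second point, the raw text could be turned into HTML only at display time. The function name encodeForHtml and the sample string are made up for illustration, and the regexes are one possible choice that also lets * and _ wrap several words on the same line:

```php
<?php
/**
 * Hypothetical helper (not part of the original class): converts the raw
 * text stored in the database into HTML just before it is displayed.
 */
function encodeForHtml(string $raw): string
{
    // Escape first, so markup typed by the user cannot end up as real HTML.
    $html = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

    // *bold* and _italic_; a marker pair may enclose several words,
    // but never spans a line break.
    $html = preg_replace('/\*([^*\r\n]+)\*/', '<b>$1</b>', $html);
    $html = preg_replace('/_([^_\r\n]+)_/', '<i>$1</i>', $html);

    // Line breaks last: nl2br() inserts <br /> before each newline.
    return nl2br($html);
}

// The database would hold exactly this string.
$raw = "This _is a test_\n*text* & more";
echo encodeForHtml($raw), "\n";
```

Because the stored value never contains HTML, the edit form can show it as-is and no decode step is needed. Note that this sketch would also italicize the middle of an identifier such as foo_bar_baz; stricter patterns would be needed if that matters.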

answered Jan 23, 2017 at 13:10
\$\endgroup\$
  • Your note about _ is completely correct, and it also applies to *. Commented Jan 23, 2017 at 13:13
  • Precious tips! I just didn't get what you mean by raw data; or rather, I have heard and read about it before, but my searches never turned up anything useful. Can you briefly explain what it is? (On the third point: yes, that's correct. At this stage I wanted my regex to allow formatting for single words only.) Commented Jan 23, 2017 at 13:52
  • @brigo Basically, in general, one has the actual, unencoded business data, the data that users input and edit. This is the only data that should be in databases, and it is what I called raw data here (which was probably not the most accurate term). After loading this data from the database later, one would transform it (i.e. encode, escape, truncate, add formatting etc.) depending on the context in which it is used. (Though the data should of course be validated against business and integrity rules before being inserted into the table, e.g. make sure it's not empty.) Commented Jan 23, 2017 at 14:11
  • @JanErikGunnar OK, now I understand what you mean by raw data: it's the user-provided data, untouched by any modification. In this case it would be, for example, "Hi, I'm user n. 43240 and I like ice creams"; right? Wouldn't it be basically the same to save the data with the "more standardized" HTML tags? Then I wouldn't need any decoding step when displaying the description to other users, and I would need one only in the less common case in which user n.4320 wants to modify it. (I'm really interested in DB management, so I'd really appreciate your point of view.) Commented Jan 23, 2017 at 15:32
  • I think there is still a little confusion :) Encode = replacing asterisks with HTML etc. You would encode when you need it to be HTML. When the user is editing it, you send what is in the database without any encoding or decoding. You never decode it. Commented Jan 23, 2017 at 15:38
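
To make the "transform depending on context" idea from these comments concrete, here is a small sketch with one raw string and two output contexts. The variable names and the plain-text context are assumptions chosen for illustration:

```php
<?php
// Raw text exactly as the user typed it; this is what the database would hold.
$raw = "I like *ice* creams";

// Context 1: the edit form. There is no decoding step, only HTML-escaping
// so the text is safe to place inside a <textarea>.
$forEditing = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

// Context 2: a plain-text output (say, an e-mail body). The matching
// * or _ markers are dropped and the words between them are kept.
$forPlainText = preg_replace('/([*_])([^*_\r\n]+)\1/', '$2', $raw);

echo $forEditing, "\n";
echo $forPlainText, "\n";
```

Neither context needs to undo an earlier HTML encoding, which is the reason for keeping only the raw text in the table.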
\$\begingroup\$

I just want to add that nl2br() doesn't "transform \n -> <br />".

According to the PHP manual, nl2br()...

Returns string with <br /> or <br> inserted before all newlines (\r\n, \n\r, \n and \r).

This can be a concern if you are doing multiple edits, thinking you are "replacing", but you are actually "bloating".

After a few upload() calls, what started as new\nline might become new<br /><br /><br />\nline.
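
A short sketch of the difference, assuming a sample string and a hypothetical replaceNewlines() helper that really does replace instead of insert:

```php
<?php
$text = "new\nline";

// nl2br() keeps the newline and inserts <br /> in front of it,
// so calling it again adds another <br />.
$once  = nl2br($text);
$twice = nl2br($once);

// Hypothetical helper: every newline variant becomes exactly one <br />.
function replaceNewlines(string $text): string
{
    return preg_replace('/\r\n|\n\r|\n|\r/', '<br />', $text);
}

$replaced = replaceNewlines($text);

var_dump($once, $twice, $replaced);
```

Here $once is "new<br />\nline", $twice is "new<br /><br />\nline", and $replaced is "new<br />line". Running replaceNewlines() on $replaced again changes nothing, because no newline is left to match.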

answered Oct 26, 2017 at 4:06
\$\endgroup\$
