What is the proper way to save all lines from a text file to objects? I have a .txt file that looks something like this:

0001Marcus Aurelius 20021122160 21311
0002William Shakespeare 19940822332 11092
0003Albert Camus 20010715180 01232

I know the position of each field in this file, and all of the data is formatted (fixed-width).

Line number is from 0 to 3
Book author is from 4 to 30
Publish date is from 31 to 37
Page num. is from 38 to 43
Book code is from 44 to 49
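
As a minimal sketch of the layout above, assuming the positions are zero-based and inclusive (so "0 to 3" covers the four characters of the line number), each field can be cut out with Substring(start, end - start + 1). The FieldSpec and FixedWidth names are hypothetical and are not part of the code in this question.

```csharp
using System;

// Hypothetical description of one fixed-width field (inclusive, zero-based bounds).
public readonly struct FieldSpec
{
    public string Name { get; }
    public int Start { get; }
    public int End { get; }

    public FieldSpec(string name, int start, int end)
    {
        Name = name;
        Start = start;
        End = end;
    }

    // Inclusive bounds: positions 0 to 3 cover 4 characters.
    public int Length => End - Start + 1;
}

public static class FixedWidth
{
    // Positions copied from the list above.
    public static readonly FieldSpec[] Layout =
    {
        new FieldSpec("lineNumber", 0, 3),
        new FieldSpec("bookAuthor", 4, 30),
        new FieldSpec("publishDate", 31, 37),
        new FieldSpec("pageNumber", 38, 43),
        new FieldSpec("bookCode", 44, 49),
    };

    // Returns false (instead of throwing) when the line is too short for the field.
    public static bool TryRead(string line, FieldSpec field, out string value)
    {
        if (line == null || line.Length <= field.End)
        {
            value = string.Empty;
            return false;
        }
        value = line.Substring(field.Start, field.Length).Trim();
        return true;
    }
}
```

Returning false for a short line keeps malformed input out of the exception path, which matters when bad lines are expected.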

I made a class Data that holds the start position, end position, value, and error for one field.

Then I made a class Line that holds a list of Data objects and a list of all errors found in that line. After loading the data from a line into the Data objects, I loop through them and add each error to lineError, because I need to save the errors from each line to the database.

My question: is this a proper way to load data from a file into objects and, after processing, save the same data to a database? Any advice on a better approach?

public class Data
{
    public int startPosition = 0;
    public int endPosition = 0;
    public object value = null;
    public string fieldName = "";
    public Error error = null;
    public Data(int start, int end, string name)
    {
        this.startPosition = start;
        this.endPosition = end;
        this.fieldName = name;
    }
    public void SetValueFromLine(string line)
    {
        // endPosition is inclusive (0 to 3 is 4 characters), so the length needs + 1
        string valueFromLine = line.Substring(this.startPosition, this.endPosition - this.startPosition + 1);
        // if/else statement that checks the validity of the data (length, empty value)
        this.value = valueFromLine;
    }
}
public class Line
{
    public List<Data> lineData = new List<Data>();
    public List<Error> lineError = new List<Error>();
    public Line()
    {
        AddObjectDataToList();
    }
    public void AddObjectDataToList()
    {
        lineData.Add(new Data(0, 3, "lineNumber"));
        lineData.Add(new Data(4, 30, "bookAuthor"));
        lineData.Add(new Data(31, 37, "publishData"));
        lineData.Add(new Data(38, 43, "pageNumber"));
        lineData.Add(new Data(44, 49, "bookCode"));
    }
    public void LoadLineDataToObjects(string line)
    {
        foreach (Data s in lineData)
        {
            s.SetValueFromLine(line);
        }
    }
    public void GetAllErrorFromData()
    {
        foreach (Data s in lineData)
        {
            if (s.error != null)
            {
                lineError.Add(s.error);
            }
        }
    }
}
public class File
{
    public string fileName;
    public List<Line> lines = new List<Line>();
}
asked Jan 20, 2018 at 18:36
  • You may want to research serialization. If the data has been saved to a DB, though, why do you need the text form anymore? Please read How to Ask and take the tour. Commented Jan 20, 2018 at 18:39
  • So your question is actually how to parse the text file to a database? Commented Jan 20, 2018 at 18:50
  • No, my question is what the best approach is for saving data from a file to objects. Once all lines from the file are stored in objects, I need to validate the data, and it is easier to loop through all the data from a line and check, for example, whether the author, book code, etc. exist in my database. If a line contains data that is not in my database, I need to skip saving that line. I have no problem saving data to the database; that works fine. I only need advice on whether this model is good for loading the data from one line into objects and checking whether some of that data exists. Commented Jan 20, 2018 at 19:00
  • Are you re-inventing msdn.microsoft.com/en-us/library/… Commented Jan 20, 2018 at 19:14
  • No, but thanks, it looks interesting. My file is not CSV; I do not have a delimiter. As you can see in my example above, the line number and author name are joined together. I only know the position of each field in the file, and that position is constant. Commented Jan 20, 2018 at 19:21

1 Answer

I assume that the focus is on using OOP. I also assume that parsing is a secondary task and I will not consider options for its implementation.

First of all, it is necessary to determine the main acting object. Strange as it may seem, this is not a Book, but the string itself (e.g. DataLine). Initially, I wanted to create a Book from a string (through a separate constructor), but that would be a mistake.

What actions should DataLine be able to perform? In fact, only one: process. I see two acceptable options for this method:

  1. process returns Book or throws exceptions. (Book process())

  2. process returns nothing, but interacts with another object. (void process(IResults result))

The first option has the following drawbacks:

  • It is difficult to test (although this also applies to the second option). All validation is hidden inside DataLine.

  • It is impossible, or at least difficult, to return several errors.

  • The program is meant to work with incorrect data, so expected exceptions would be thrown often. This goes against the purpose of exceptions. There is also a minor concern about performance.

The second option avoids the last two drawbacks. IResults can contain the methods error(...), which can be called several times to report several errors, and success(Book book).

The testability of the process method can be significantly improved by adding IValidator. This object could be passed as a parameter to the DataLine constructor, but that is not entirely correct. First, it is an unnecessary use of memory, because it gives us no tangible benefit. Second, it does not match the essence of the DataLine class: DataLine represents only a line that can be processed in one particular way. Thus, a good solution is void process(IValidator validator, IResults result).

To summarize the above (may contain syntax errors):

interface IResults {
    void error(string message);
    void success(Book book);
}
interface IValidator {
    // just an example
    bool checkBookCode(string bookCode);
}
class DataLine {
    private readonly string _rawData;
    public DataLine(string rawData) {
        _rawData = rawData;
    }
    public void process(IValidator validator, IResults result) {
        // parse _rawData; per the question, the book code occupies positions 44 to 49
        string bookCode = _rawData.Substring(44, 6);
        bool isValid = true; // just an example; it may be better to add IResults.hasErrors()
        if (!validator.checkBookCode(bookCode)) {
            result.error("Bad book code");
            isValid = false;
        }
        if (isValid) {
            result.success(new Book(...));
            // or even result.success(...); to avoid coupling with Book
        }
    }
}
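
The outline above can be turned into something that compiles roughly as follows. This is only a sketch under several assumptions: Book is reduced to two properties, CollectingResults and KnownCodesValidator are hypothetical implementations that stand in for real result handling and a database lookup, and the method names use C# casing (Process, Error, Success) instead of the lowercase names above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, deliberately tiny Book.
public sealed class Book
{
    public string Author { get; }
    public string Code { get; }
    public Book(string author, string code) { Author = author; Code = code; }
}

public interface IResults
{
    void Error(string message);
    void Success(Book book);
}

public interface IValidator
{
    bool CheckBookCode(string bookCode);
}

// Hypothetical IResults that simply remembers what it was told.
public sealed class CollectingResults : IResults
{
    public List<string> Errors { get; } = new List<string>();
    public List<Book> Books { get; } = new List<Book>();
    public void Error(string message) => Errors.Add(message);
    public void Success(Book book) => Books.Add(book);
}

// Hypothetical validator backed by an in-memory set instead of a database.
public sealed class KnownCodesValidator : IValidator
{
    private readonly HashSet<string> _codes;
    public KnownCodesValidator(IEnumerable<string> codes) { _codes = new HashSet<string>(codes); }
    public bool CheckBookCode(string bookCode) => _codes.Contains(bookCode);
}

public sealed class DataLine
{
    private const int MinLength = 50; // positions 0 to 49 from the question
    private readonly string _rawData;

    public DataLine(string rawData) { _rawData = rawData ?? string.Empty; }

    public void Process(IValidator validator, IResults result)
    {
        if (_rawData.Length < MinLength)
        {
            result.Error("Line is shorter than " + MinLength + " characters");
            return;
        }

        string author = _rawData.Substring(4, 27).Trim();
        string bookCode = _rawData.Substring(44, 6);

        // Report every problem found, not just the first one.
        bool isValid = true;
        if (author.Length == 0)
        {
            result.Error("Missing author");
            isValid = false;
        }
        if (!validator.CheckBookCode(bookCode))
        {
            result.Error("Bad book code");
            isValid = false;
        }
        if (isValid)
        {
            result.Success(new Book(author, bookCode));
        }
    }
}
```

Because Process only talks to the two interfaces, a test can pass in-memory implementations like the ones here and inspect Errors and Books afterwards, with no file or database involved.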

The next step is to create a model of the file with its lines. Here again there are many options and nuances, but I would like to draw attention to IEnumerable<DataLine>. Ideally, we would create a DataLines class that implements IEnumerable<DataLine> and loads from a file or from an IEnumerable<string>. However, this approach is relatively complex and redundant; it makes sense only in large projects. A much simpler version:

using System.Collections.Generic;
using System.IO;
using System.Linq;

interface DataLinesProvider {
    IEnumerable<DataLine> Lines();
}
class DataLinesFile : DataLinesProvider {
    private readonly string _fileName;
    public DataLinesFile(string fileName) {
        _fileName = fileName;
    }
    public IEnumerable<DataLine> Lines() {
        // reads the whole file into memory, then wraps each line in a DataLine
        return File
            .ReadAllLines(_fileName)
            .Select(x => new DataLine(x));
    }
}
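
For comparison, the "ideal" DataLines class mentioned above might look something like the sketch below. It is an illustration only: the DataLine here is a stub that just stores the raw string, skipping blank lines is an assumption about the input, and FromFile uses File.ReadLines, which streams the file lazily rather than loading it all at once.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Stub standing in for the DataLine class from the answer.
public sealed class DataLine
{
    public string RawData { get; }
    public DataLine(string rawData) { RawData = rawData; }
}

// Hypothetical DataLines: enumerates DataLine objects over any source of raw strings.
public sealed class DataLines : IEnumerable<DataLine>
{
    private readonly IEnumerable<string> _source;

    public DataLines(IEnumerable<string> source) { _source = source; }

    // File.ReadLines yields lines one at a time as they are enumerated.
    public static DataLines FromFile(string fileName) =>
        new DataLines(File.ReadLines(fileName));

    public IEnumerator<DataLine> GetEnumerator() =>
        _source
            .Where(s => !string.IsNullOrWhiteSpace(s)) // assumption: blank lines carry no data
            .Select(s => new DataLine(s))
            .GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Accepting any IEnumerable<string> means the same class works with a file in production and with an in-memory array in tests.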

You can improve the code indefinitely, introducing more and more abstractions, but you should be guided by common sense and the specific problem at hand.

P.S. Sorry for the "strange" English. Google does not always translate such complex topics correctly.

answered Jan 21, 2018 at 17:43

4 Comments

Thanks for your advice. Yes, you are right, the focus is on using OOP. For validation I have two methods: one validates the data before it is stored in an object (it checks that the length of bookCode is correct and that the position actually contains data, since in some scenarios it can hold only whitespace), and the other runs once all data has been written to objects and checks whether bookCode exists in the database.
"All data written in objects" is not OOP. If you create a Book, then it is a valid book. If it might not be valid, create a PossibleBook that checks validity and only then creates a Book. Besides, why can't you check bookCode without creating a Book?
Because I thought it would slow performance if I read data from the file and checked for records in my database at the same time. As you said, this may hurt performance; sometimes I get a text file with more than 10,000 lines.
I said that unnecessary throwing of exceptions may cause performance issues. If you want to load all the data from the file and validate it later, my code already does this (DataLine just stores a single line and validates it only on demand). If you are working with really big files, you will need to break them into chunks, and maybe even add bulk validation requests to the DB.
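
The chunking and bulk validation idea from the last comment could be sketched roughly like this. Everything here is hypothetical: InBatches and FindUnknownCodes are illustrative names, and lookupKnown stands in for whatever single query returns the subset of codes that exist in the database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Batching
{
    // Splits a sequence into lists of at most `size` items, lazily.
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> source, int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        var batch = new List<T>(size);
        foreach (T item in source)
        {
            batch.Add(item);
            if (batch.Count == size)
            {
                yield return batch;
                batch = new List<T>(size);
            }
        }
        if (batch.Count > 0) yield return batch; // final, partially filled batch
    }

    // Calls lookupKnown once per batch instead of once per book code.
    public static HashSet<string> FindUnknownCodes(
        IEnumerable<string> codes,
        int batchSize,
        Func<IReadOnlyCollection<string>, IEnumerable<string>> lookupKnown)
    {
        var unknown = new HashSet<string>();
        foreach (List<string> batch in InBatches(codes.Distinct(), batchSize))
        {
            var known = new HashSet<string>(lookupKnown(batch));
            unknown.UnionWith(batch.Where(code => !known.Contains(code)));
        }
        return unknown;
    }
}
```

With a batch size of a few hundred, a 10,000-line file would need only a few dozen round trips for this check rather than one per line.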
