Problem Statement
You are receiving n objects in a random order, and you need to print them to stdout correctly ordered by sequence number.
The sequence numbers start from 0 (zero) and you have to wait until you get a complete, unbroken sequence batch of j objects before you output them.
You have to process all objects without loss. The program should exit once it completes outputting the first 50000 objects Batch size j = 100
The object is defined as such:
{
"id" : "object_id", // object ID (string)
"seq" : 0, // object sequence number (int64, 0-49999)
"data" : "" // []bytes
}
Example Output Statement
Step Input Value Output State j = 1 Output state j = 3
0 6
1 0 0
2 4 0
3 2 0
4 1 0,1,2 0,1,2
5 3 0,1,2,3,4 0,1,2
6 9 0,1,2,3,4 0,1,2
7 5 0,1,2,3,4,5,6 0,1,2,3,4,5
Solution
func (receiver *Receiver) Print(seqNumber uint64, batchSize uint64, outputFile io.Writer) (error, bool) {
fmt.Fprintf(outputFile, "[ ")
if seqNumber >= receiver.outputSequence.length {
receiver.outputSequence.bufferSizeIncrease(seqNumber)
}
receiver.outputSequence.sequence[seqNumber] = true
printedCount := uint64(0) // check for MAX_OBJECTS_TO_PRINT
var nthBatchStartingIndex uint64
MaxObjectsToPrint := config.GetMaxPrintSize()
Loop:
for nthBatchStartingIndex < receiver.outputSequence.length { // check unbroken sequence
var assessIndex = nthBatchStartingIndex
for j := assessIndex; j < nthBatchStartingIndex+batchSize; j++ { // Assess nth batch
if j >= receiver.outputSequence.length { //index out of range - edge case
break Loop
}
if receiver.outputSequence.sequence[j] == false {
break Loop
}
}
count, printThresholdReached := receiver.printAssessedBatchIndexes(assessIndex, printedCount, batchSize, MaxObjectsToPrint, outputFile)
if printThresholdReached { // print sequence threshold reached MAX_OBJECTS_TO_PRINT
fmt.Fprintf(outputFile, " ] ")
fmt.Fprintf(outputFile, " ----for input value %d\n", seqNumber)
return nil, false
}
printedCount += count
if printedCount >= MaxObjectsToPrint { // print sequence threshold reached MAX_OBJECTS_TO_PRINT
fmt.Fprintf(outputFile, " ] ")
fmt.Fprintf(outputFile, " ----for input value %d\n", seqNumber)
receiver.Log.Printf("****MaxObjectsToPrint threshold(%d) reached \n", MaxObjectsToPrint)
return nil, false
}
nthBatchStartingIndex = assessIndex + batchSize // next batch
}
fmt.Fprintf(outputFile, " ] ")
fmt.Fprintf(outputFile, " ----for input value %d\n", seqNumber)
return nil, true
}
Here is the complete solution, written for this problem.
Print()
is the method that does heavy lifting in this code, with varying size of memory & heavy CPU usage:
How to make
receiver.outputSequence
memory effective by using datastructure other than array? becausenewBufferSize := 2 * seqNumber
is doubling memory...How to make
Print
method have effective CPU usage? On some goroutine
-
\$\begingroup\$ What does "batch size j=100" mean? Does it mean you will never receive out-of-sequence entries farther than 100? If so, you can do this with a fixed size array of 100. If that's not the case, you might want to try putting the elements in a map instead of an array. No matter the solution, you have to remember max(distance between out of sequence elements). With a slice, you have to move them as well. \$\endgroup\$Burak Serdar– Burak Serdar2020年07月04日 05:04:43 +00:00Commented Jul 4, 2020 at 5:04
-
\$\begingroup\$ @BurakSerdar batch size j = 3 mean {0,1,2} {3,4,5}....batch size j = 2 mean {0,1} {2,3} {4,5}... So, batch size j=100 mean, once we have 100 unbroken sequence {0.1.2.3.4.5,.....99} then display it \$\endgroup\$overexchange– overexchange2020年07月04日 05:56:52 +00:00Commented Jul 4, 2020 at 5:56
-
\$\begingroup\$ @BurakSerdar batch size j = 3 mean display unbroken sequence of set of 3 elements {0,1,2} {3,4,5}....batch size j = 2 mean display unbroken sequence of set of 2 elements {0,1} {2,3} {4,5}... So, batch size j=100 mean, once we have 100 unbroken sequence {0.1.2.3.4.5,.....99} then display it \$\endgroup\$overexchange– overexchange2020年07月04日 06:41:17 +00:00Commented Jul 4, 2020 at 6:41
-
\$\begingroup\$ I think you overcomplicated the problem by trying to keep things in a slice. If the seqnumber is int and no seqnumbers are skipped, you can simply keep a map keyed with seqnumber and the last item printed. Then when you get the next item in sequence, output the next items until you hit a gap. If you need to batch the output, you can batch it up after this stage. \$\endgroup\$Burak Serdar– Burak Serdar2020年07月04日 07:08:20 +00:00Commented Jul 4, 2020 at 7:08
-
\$\begingroup\$ @BurakSerdar sequence numbers are coming in random order. For me linkedlist makes more sense, because ordering is enforced \$\endgroup\$overexchange– overexchange2020年07月04日 08:01:11 +00:00Commented Jul 4, 2020 at 8:01
1 Answer 1
First, observe that if you receive an out-of-sequence object, you have to potentially keep all the objects from the latest printed sequence number to the received number. Worst case, you receive the object in reverse order and have to keep them all in memory. There is no way around this. So, what you can optimize is first, when you are printing objects you can find them quickly, and second, you don't want to move them around, which will happen if you use a slice and grow it as necessary.
So, a sketch of an algorithm that will use CPU and memory better than what you have is as follows:
var nextInSequence=0 // The next item you are expecting in the sequence
var storedObjects=map[int]SomeStruct{}
func Print(seqNumber int, obj SomeStruct) {
if seqNumber==nextInSequence {
output obj
nextInSequence++
} else {
storedObjects[seqNumber]=obj
}
for {
if stored, ok:=storedObjects[nextInSequence] ; ok {
output stored
delete(storedObjects,nextInSequence)
nextInSequence ++
} else {
break
}
}
}