I have to implement a class with a method that takes a document:
struct TDocument {
    string url; // document identifier
    uint64_t pubDate;
    uint64_t fetchTime;
    string text;
    uint64_t firstFetchTime = 0;
};
The method should return the document with its fields updated according to the following rules:
text and fetchTime must be the same as in the document with the max fetchTime (take into account only documents with the same URL)
pubDate must be the same as the pubDate in the document with the min fetchTime (take into account only documents with the same URL)
firstFetchTime must be the same as the fetchTime in the document with the min fetchTime (take into account only documents with the same URL)
Example:
Input:
url="url1", pubDate=1, fetchTime=1, text="text1"
url="url1", pubDate=2, fetchTime=3, text="text2"
Output:
url="url1", pubDate=1, fetchTime=1, text="text1", firstFetchTime=1 // for the first input line, return the same document with firstFetchTime updated
url="url1", pubDate=1, fetchTime=3, text="text2", firstFetchTime=1 // for the second input line, pubDate becomes 1 (the pubDate of the document with the min fetchTime among documents with the same URL)
My idea is to store minFetchTime and maxFetchTime for each URL, together with the fields that belong to them.
Could you please review my solution?
Are there similar tasks on LeetCode.com?
Code
struct TDocument {
    string url;
    uint64_t pubDate; // pubdate of the minFetchTime
    uint64_t fetchTime; // maxFetchTime
    string text; // maxFetchTime
    uint64_t firstFetchTime = 0; // minFetchTime
};
struct UtilityDocument {
    UtilityDocument(string url_, uint64_t pubDate_, string text_, uint64_t firstFetchTime_, uint64_t maxFetchTime_, uint64_t minFetchTime_)
        : url(url_), pubDate(pubDate_), text(text_), firstFetchTime(firstFetchTime_), maxFetchTime(maxFetchTime_), minFetchTime(minFetchTime_) {}
public:
    string url;
    uint64_t pubDate;
    string text;
    uint64_t firstFetchTime;
    uint64_t maxFetchTime;
    uint64_t minFetchTime;
};
class TProcessor {
public:
    std::shared_ptr<TDocument> Process(std::shared_ptr<TDocument> input) {
        if (docs_.contains(input->url)) {
            if (input->fetchTime > docs_[input->url]->maxFetchTime) {
                docs_[input->url]->maxFetchTime = input->fetchTime;
                docs_[input->url]->text = input->text;
                input->pubDate = docs_[input->url]->pubDate;
                input->firstFetchTime = docs_[input->url]->minFetchTime;
            }
            if (input->fetchTime < docs_[input->url]->minFetchTime) {
                docs_[input->url]->minFetchTime = input->fetchTime;
                docs_[input->url]->pubDate = input->pubDate;
                input->text = docs_[input->url]->text;
                input->fetchTime = docs_[input->url]->maxFetchTime;
            }
        } else {
            input->firstFetchTime = input->fetchTime;
            docs_[input->url] = shared_ptr<UtilityDocument>(new UtilityDocument(
                input->url,
                input->pubDate,
                input->text,
                input->firstFetchTime,
                input->fetchTime, // maxFetchTime
                input->fetchTime // minFetchTime
            ));
        }
        return input;
    };
private:
    unordered_map<string, std::shared_ptr<UtilityDocument>> docs_;
};
2 Answers
There should be some documentation, I think, at least of TDocument and TProcessor.Process().
It looks tempting to have something modelling what's common to TDocument and whatever models the info about all same-URL TDocuments for Process(). I don't think the latter is a document, so I would name it differently.
Without any methods in TDocument, the question of is-a seems moot, but seeing no use of UtilityDocument.firstFetchTime, I'm unconvinced derivation helps.
As far as I remember, specifying visibility public is pointless in a struct without a preceding restriction.
In Process(), all those repetitions of docs_[input->url] are too verbose for my taste.
As 37307554 points out, not setting the doc's pubDate and firstFetchTime looks flaky; I'll extend that to text and fetchTime.
/** A TDocument keeps URL, text, publication date and time fetched
 * and in addition the time this URL was first fetched _when Process()ed_ */
struct TDocument {
    string url; // document identifier
    uint64_t pubDate; // processed: of doc with min fetchTime
    uint64_t fetchTime; // processed: of doc with max fetchTime
    string text; // processed: of doc with max fetchTime
    uint64_t firstFetchTime = 0; // processed: of doc with min fetchTime
};
struct URLInfo { // for documents with same URL
    URLInfo(string url_, string text_,
            uint64_t pubDate_, uint64_t maxFetchTime_, uint64_t minFetchTime_)
        : url(url_), text(text_),
          pubDate(pubDate_), maxFetchTime(maxFetchTime_), minFetchTime(minFetchTime_)
    {}
    string url;
    string text;
    uint64_t pubDate;
    uint64_t maxFetchTime;
    uint64_t minFetchTime;
};
class TProcessor {
public:
    /** URL info instantiated or updated as necessary;
     * text and fetchTime are set to the URL's text & max fetchTime;
     * pubDate & firstFetchTime to the URL's pubDate & min fetchTime. */
    std::shared_ptr<TDocument> Process(std::shared_ptr<TDocument> input) {
        auto found = docs_.find(input->url);
        if (found != docs_.end()) {
            // found points at a (url, info) pair; the info is found->second
            URLInfo& info = *found->second;
            // assert(info.minFetchTime <= info.maxFetchTime) ?
            if (input->fetchTime > info.maxFetchTime) {
                info.maxFetchTime = input->fetchTime;
                info.text = input->text;
            } else if (input->fetchTime < info.minFetchTime) {
                info.minFetchTime = input->fetchTime;
                info.pubDate = input->pubDate;
            } // XXX what if ==?
            input->pubDate = info.pubDate;
            input->fetchTime = info.maxFetchTime;
            input->text = info.text;
            input->firstFetchTime = info.minFetchTime;
        } else {
            input->firstFetchTime = input->fetchTime;
            docs_[input->url] = std::make_shared<URLInfo>(
                input->url,
                input->text,
                input->pubDate,
                input->fetchTime, // maxFetchTime
                input->fetchTime // minFetchTime
            );
        }
        return input;
    }
private:
    unordered_map<string, std::shared_ptr<URLInfo>> docs_;
};
Using an else if when comparing the doc's fetchTime against the URL's max/minFetchTime leads me to suspect a gap in the specification: it does not say what should happen when the fetchTimes are equal.
Caveat: I haven't coded C++ in earnest for decades, and obviously no modern C++.
When a new document with the same URL but a different fetchTime is processed in your Process method, you correctly update maxFetchTime and minFetchTime as needed. However, the logic to update pubDate and firstFetchTime in the input document could be more explicitly aligned with the rules. Specifically, pubDate should always reflect the pubDate of the document with the minimum fetchTime, and firstFetchTime should also be updated to reflect the minimum fetchTime.
You can simplify the exchange of values between the input document and docs_ by determining whether an update is necessary before making any changes: first check whether the current document's fetchTime is indeed the new minimum or maximum for that URL, and only then proceed with the updates.
Consider breaking down the Process method into smaller helper functions for handling specific tasks, such as updating the document for a new maximum fetchTime or a new minimum fetchTime.
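One way the helper-function split could look is sketched below. This is only an illustration under assumptions, not the asker's or the other answer's code: the names TUrlState, UpdateOldest, UpdateNewest and ApplyState are hypothetical, C++17 is assumed (for try_emplace and structured bindings), and equal fetchTimes are treated as "no update", which the task statement does not specify.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>

struct TDocument {
    std::string url;  // document identifier
    uint64_t pubDate = 0;
    uint64_t fetchTime = 0;
    std::string text;
    uint64_t firstFetchTime = 0;
};

// Hypothetical per-URL summary; the name is not from the original post.
struct TUrlState {
    uint64_t minFetchTime;
    uint64_t pubDateAtMin;  // pubDate of the document with the min fetchTime
    uint64_t maxFetchTime;
    std::string textAtMax;  // text of the document with the max fetchTime
};

class TProcessor {
public:
    std::shared_ptr<TDocument> Process(std::shared_ptr<TDocument> input) {
        // Insert a fresh state for an unseen URL, or fetch the existing one.
        auto [it, inserted] = states_.try_emplace(
            input->url,
            TUrlState{input->fetchTime, input->pubDate, input->fetchTime, input->text});
        TUrlState& state = it->second;
        if (!inserted) {
            UpdateOldest(state, *input);
            UpdateNewest(state, *input);
        }
        // Always write all four derived fields back, whatever branch was taken.
        ApplyState(state, *input);
        return input;
    }

private:
    // A strictly earlier fetch replaces the min fetchTime and its pubDate.
    static void UpdateOldest(TUrlState& state, const TDocument& doc) {
        if (doc.fetchTime < state.minFetchTime) {
            state.minFetchTime = doc.fetchTime;
            state.pubDateAtMin = doc.pubDate;
        }
    }

    // A strictly later fetch replaces the max fetchTime and its text.
    static void UpdateNewest(TUrlState& state, const TDocument& doc) {
        if (doc.fetchTime > state.maxFetchTime) {
            state.maxFetchTime = doc.fetchTime;
            state.textAtMax = doc.text;
        }
    }

    // Copy the per-URL summary into the document being returned.
    static void ApplyState(const TUrlState& state, TDocument& doc) {
        doc.pubDate = state.pubDateAtMin;
        doc.firstFetchTime = state.minFetchTime;
        doc.fetchTime = state.maxFetchTime;
        doc.text = state.textAtMax;
    }

    std::unordered_map<std::string, TUrlState> states_;
};
```

Storing the state by value in the map also avoids the extra shared_ptr allocation per URL, and writing back all four fields in one place removes the risk of a branch forgetting one of them.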
If and when TProcessor::Process might be called from multiple threads, consider thread safety for your docs_ map: std::unordered_map does not synchronize concurrent modifications, so the current implementation is not thread-safe.
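A minimal sketch of one possible approach is to guard the map with a single std::mutex. The class name TLockedProcessor and the TUrlState struct are hypothetical, and a single coarse lock is an assumption made for simplicity; whether it is adequate depends on the contention you actually expect.

```cpp
#include <cstdint>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

struct TDocument {
    std::string url;  // document identifier
    uint64_t pubDate = 0;
    uint64_t fetchTime = 0;
    std::string text;
    uint64_t firstFetchTime = 0;
};

// Hypothetical per-URL summary; the name is not from the original post.
struct TUrlState {
    uint64_t minFetchTime;
    uint64_t pubDateAtMin;
    uint64_t maxFetchTime;
    std::string textAtMax;
};

class TLockedProcessor {
public:
    std::shared_ptr<TDocument> Process(std::shared_ptr<TDocument> input) {
        // One lock for the whole call: lookup, update and write-back
        // must be seen as a single step by other threads.
        std::lock_guard<std::mutex> lock(mutex_);

        auto it = states_.find(input->url);
        if (it == states_.end()) {
            it = states_.emplace(
                input->url,
                TUrlState{input->fetchTime, input->pubDate,
                          input->fetchTime, input->text}).first;
        } else {
            TUrlState& state = it->second;
            if (input->fetchTime < state.minFetchTime) {
                state.minFetchTime = input->fetchTime;
                state.pubDateAtMin = input->pubDate;
            }
            if (input->fetchTime > state.maxFetchTime) {
                state.maxFetchTime = input->fetchTime;
                state.textAtMax = input->text;
            }
        }

        const TUrlState& state = it->second;
        input->pubDate = state.pubDateAtMin;
        input->firstFetchTime = state.minFetchTime;
        input->fetchTime = state.maxFetchTime;
        input->text = state.textAtMax;
        return input;
    }

private:
    std::mutex mutex_;  // guards states_
    std::unordered_map<std::string, TUrlState> states_;
};
```

The lock only protects the processor's own map. If two threads pass the same TDocument object to Process, or a caller reads it while another thread is processing it, that is still a data race the caller has to prevent.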
Consider adding error handling or validation for the input document, such as checking for null pointers or invalid data.
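As a rough illustration, such a check could be a small function called at the top of Process. The helper name ValidateInput is hypothetical, and treating an empty URL as invalid is an assumption, since the task statement does not define what invalid data means.

```cpp
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>

struct TDocument {
    std::string url;  // document identifier
    uint64_t pubDate = 0;
    uint64_t fetchTime = 0;
    std::string text;
    uint64_t firstFetchTime = 0;
};

// Hypothetical helper: throws std::invalid_argument if the document
// cannot be processed. The empty-URL rule is an assumption.
inline void ValidateInput(const std::shared_ptr<TDocument>& input) {
    if (!input) {
        throw std::invalid_argument("Process: input document is null");
    }
    if (input->url.empty()) {
        throw std::invalid_argument("Process: document has an empty url");
    }
}
```

Throwing is only one option. Returning a null pointer or an error code may fit better if the surrounding code base avoids exceptions.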