I've been out of the C++ game for a few years and I'm trying to knock the rust off by implementing a programming language. This Source
interface is used to read characters from a file, standard input, etc. Includes and outer namespace are omitted below.
source.h
:
class Source {
public:
virtual ~Source() = default;
virtual char32_t advance() = 0;
virtual std::optional<char32_t> peek(std::size_t lookahead = 0) = 0;
virtual bool at_end() const = 0;
};
std::unique_ptr<Source> make_source(std::basic_istream<char32_t>& in);
std::unique_ptr<Source> make_source(std::u32string in);
source.cc
:
using std::cbegin;
using std::cend;
namespace {
template <typename It, std::sentinel_for<It> Sen = It>
class IteratorSource : public Source {
public:
IteratorSource(It begin, Sen end) : container_(), in_(begin), end_(end) {}
template <typename T>
explicit IteratorSource(T&& container)
: container_(std::forward<T>(container)),
in_(cbegin(any_cast<T&>(container_))),
end_(cend(any_cast<T&>(container_))) {}
~IteratorSource() override = default;
char32_t advance() override {
if (empty(lookahead_buffer_)) {
return *in_++;
}
char32_t c = lookahead_buffer_.front();
lookahead_buffer_.pop_front();
return c;
}
std::optional<char32_t> peek(size_t lookahead) override {
while (size(lookahead_buffer_) <= lookahead && in_ != end_) {
lookahead_buffer_.push_back(*in_++);
}
if (size(lookahead_buffer_) <= lookahead) {
return {};
}
return lookahead_buffer_[lookahead];
}
bool at_end() const override {
return empty(lookahead_buffer_) && in_ == end_;
}
private:
std::any container_;
It in_;
Sen end_;
std::deque<char32_t> lookahead_buffer_;
};
} // namespace
std::unique_ptr<Source> make_source(std::basic_istream<char32_t>& in) {
return std::make_unique<IteratorSource<std::istreambuf_iterator<char32_t>>>(
std::istreambuf_iterator<char32_t>{in},
std::istreambuf_iterator<char32_t>{});
}
std::unique_ptr<Source> make_source(std::u32string in) {
return std::make_unique<IteratorSource<std::u32string::const_iterator>>(
std::move(in));
}
1 Answer 1
Using std::any
to maybe store the underlying data is a dangerous inconsistency. If we call the second constructor, we get an object that owns the data source. If we call the first constructor, we get an object that doesn't own its data source. This is unlikely to be expected by the user, and may result in bugs with object lifetimes.
It's much safer to make the IteratorSource
always owning, or always non-owning (presumably the latter).
So I'd suggest just deleting the container_
member.
advance
should probably assert
that in != end
before dereferencing the iterator.
-
\$\begingroup\$ Note that
IteratorSource
is hidden in the unnamed namespace of thesource.cc
file, so the user is not exposed to the "maybe I own it, maybe I don't" issue. \$\endgroup\$ruds– ruds2022年09月06日 17:09:25 +00:00Commented Sep 6, 2022 at 17:09