Converting an array-returning function to a lazy evaluator

Question 1

Take a look at the following function:

std::vector<double> get_data(
 std::vector<double> const &data,
 double const t,
 double const x)
{
 std::vector<double> result(data.size(), 0.0);
 double const
 A = func(t, x),
 B = func2(t, x);
 std::transform(std::begin(data) + 1, std::end(data), std::begin(result) + 1,
 [A, B](double const val){
 return A * val + B;
 });
 return result;
}

I am attempting to convert the function above into some sort of structure that computes the values of the result array lazily. First, I will present the code, then describe some rationale.

class data_obj {
 std::vector<double>
 m_data;
 double
 m_A , m_B;
public:
 data_obj(
 std::vector<double> data,
 double const t,
 double const x)
 : m_data(data)
 {
 set_values(t, x);
 }
 void set_values(double const t, double const x)
 {
 m_A = func(t, x);
 m_B = func2(t, x);
 }
 friend class obj_iterator;
};
class obj_iterator {
 data_obj const
 &m_obj;
 std::vector<double>::const_iterator
 m_data_iter, m_data_end;
 double
 m_val;
public:
 obj_iterator(data_obj const &obj)
 : m_obj(obj)
 , m_data_iter(m_obj.m_data.begin())
 , m_data_end(m_obj.m_data.end())
 , m_val(0) { }
 bool has_next() const {
 return m_data_iter != m_data_end;
 }
 void next() {
 ++m_data_begin;
 m_val = *m_data_iter * m_obj.m_A + m_obj.m_B;
 }
 double val() const {
 return m_val;
 }
};

Here are some points to explain the rationale/requirements:

t and x are independent parameters in an optimizer, hence the obj_iterator::set_values() method. On the other hand, data_obj::m_data is constant throughout.
I chose a reference member in obj_iterator::m_data because I want the lifetime of the iterator to be dependent on the lifetime of the object holding the data, as mentioned here.

Finally, here's an example usage of the classes created:

data_obj res_obj(src_data, t, x);
for(obj_iterator res_iter(res_obj); res_iter.has_next(); res_iter.next()) {
 std::cout << res_iter.val() << ", ";
}

Is my rationale flawed? How would you improve this implementation? As someone who programs in python, this could be easily achieved with generator functions.

Question 2

You've done a lot of things well. Going the route of an iterator was a good choice, and you got the small details of const correctness and references and whatnot correct. There are, however, a few things I think can improved.

Naming

Your names are very undescriptive. In fact, it took me far longer than it should have to realize it's just modelling a linear function over a certain domain. Something like apply_linear for the returning version might be good, and something like linear_iterator might be good for the other one. Also, data_obj is very vague name. Pretty much all objects contain data of some kind. I might call it something like linear_function or something.

Structure

Something about your structure is a little off. I think instead of having obj_iterator take a data_obj for construction, I would have data_obj have begin() and end() methods that return a beginning iterator and an ending one. That way, you don't have to get into any friend stuff (right now data_obj is pretty useless to anything except the iterator since its members are private).

I'm also a bit uncomfortable with func and func2 being hardcoded. What if you want to change them? What if you want them to not be applied? It has essentially tightly coupled data_obj to func and func2, and worse yet, there's not really any reason for it. You can just have A and B be accepted for the values, and then you can apply func and func2 if you want.

That's essentially tightly coupling the linear function to those two functions. Further more, the coupling is super easy to get rid of: instead of taking t and x and applying func and func2, just accept a value that won't be further transformed.

In essence, I'm essentially suggesting that you generalize your thinking a bit farther. Currently you're modeling y(x) = m(t, z) * x + b(t, z). The problem is that m and b are constants, so there's no point in modeling them as part of the function. Instead, you can just model a typical linear function y(x) = m*x + b and take in m and b as constants (m = m(t, z), b = b(t, z)).

Idiomatic Iterator

Like I said earlier, it's good that you went with iterators for this. It's the best (built in) way in C++ to achieve a generator. It would have been a bit nicer if you'd done it more idiomatically though. In C++, iterators are used as pairs of iterators with a currently iterated iterator and an ending iterator. Also, operator++, operator* and operator== (or operator!=) are typically used instead of methods like next(), has_next(), etc.

auto fn = linear_function(2.5, 37.2, {...});
for (auto it = fn.begin(), end = fn.end(); it != end; ++it) {
 std::cout << *it << "\n";
}

A nice side effect of this is that you actually get rewarded for being idiomatic. The standard library becomes a lot more compatible with your stuff, and you can get range based for loops as a happy side effect:

// Wooo standard library!
auto fn = linear_function(2.5, 37.2, {...});
auto sum = std::accumulate(std::begin(fn), std::end(fn), 0.0);
// Wooo pretty loops!
for (auto y : linear_function(2.5, 37.2, {...})) {
 std::cout << *it << "\n";
}

Minor things

These are minor enough that I don't think any of them deserve their own section.

I would consider having data_obj work on a pair of iterators instead of being coupled to vector (what if you want to use it on an array, std::array, std::set, etc?).
I would considering make data_obj immutable.
data should be std::move'd m_data to ensure an extra copy is avoided.

Style

This is all subjective, of course, but:

I'm not a fan of the m_ prefix. I prefer either plain member names (e.g. data), or member names with a trailing underscore if method/member name conflicts are expected (e.g. data_). The m_ prefix is still relatively common, but it seems to be dying off, and personally, I can't wait for it to finish dying.
When function parameters can reasonably fit on one line, they should be on one line.
Indented variables per type is very non-standard. Standard is to have each variable on its own line with its own type declaration.

Putting it together

An example can probably illustrate much more effectively than my rambling words:

// Models a linear function
class linear_function {
private:
 const double m;
 const double b;
public:
 // Construct the function from slop intercept representation
 linear_function(double m, double b) : m(m), b(b)
 { }
 // Evaluate the linear function for a specific value.
 double operator()(double x) const { return m * x + b; }
 bool operator==(const linear_function& other) const {
 return m == other.m && b == other.b;
 }
};
// Iterates over a given domain for a linear function.
template<typename Iter>
class linear_function_iterator { // You'd want to extend or mimic std::iterator in a library-esque implementation
private:
 const linear_function fn_;
 Iter iter_;
 const Iter end_;
public:
 // Whether you want to take a single iterator or a pair is going to depend on typical usecase. Taking
 // a pair makes use a bit safer though since you can do checking to ensure proper use.
 linear_function_iterator(linear_function fn, Iter iter, Iter end)
 : fn_(std::move(fn)), iter_(iter), end_(end)
 { }
 double operator*() const {
 assert(iter_ != end_); // Example of using two iterators instead of one to provide additional error checking.
 return fn_(*iter_);
 }
 linear_function_iterator& operator++() {
 assert(iter_ != end_);
 ++iter_;
 return *this;
 }
 bool operator==(const linear_function_iterator& other) const {
 return fn_ == other.fn_ && iter_ == other.iter_;
 }
 bool operator!=(const linear_function_iterator& other) const {
 return !operator==(other);
 }
};
// Convenience class for a linear function over a given domain.
// Provides both an upfront approach or generator-like.
template<class Iter>
class fixed_linear_function { // This seems like a terrible name, but I'm blanking on anything better.
private:
 linear_function fn_;
 const Iter begin_;
 const Iter end_;
public:
 using iterator = linear_function_iterator<Iter>;
 fixed_linear_function(linear_function fn, Iter domain_begin, Iter domain_end)
 : fn_(std::move(fn)), begin_(domain_begin), end_(domain_end) 
 {}
 std::vector<double> evaluate() const {
 std::vector<double> results;
 std::copy(begin(), end(), std::back_inserter(results));
 return results;
 }
 iterator begin() {
 return iterator(fn_, begin_, end_);
 }
 iterator end() {
 return iterator(fn_, end_, end_);
 }
};
// Because I'm lazy
template<typename Iter>
fixed_linear_function<Iter> make_fixed_linear_function(double m, double b, Iter beg, Iter end) {
 return fixed_linear_function<Iter>(linear_function(m, b), beg, end);
}
// Soooooo lazy
template<typename Container>
auto make_fixed_linear_function(double m, double b, const Container& c) -> fixed_linear_function<decltype(std::begin(c))> {
 return make_fixed_linear_function(m, b, std::begin(c), std::end(c));
}

This brings up an interesting note: generality and idiomaticness in C++ often come at a high cost, but also give high reward with regards to reusability and interoperability. This is significantly longer than your code, but it can be used with the standard library, and it's more flexible. In fact, if you were actually going to implement the idiomatic C++ approach like this, you could actually very easily take it one step farther and arrive at a generic generator. All that would be required would be renaming linear_function_iterator to generator, templating out linear_function to be any function (and also likely removing the end iterator since generators typically support infinite iteration), and deducing the return type of operator* instead of assuming its double.

Note: This is strictly an example implementation. Operators are only partially defined, template traits aren't present, and overall, this is missing a lot of the corner cases/optimizations/etc that would make it applicable for real use.

Example usage:

int main() {
 std::vector<double> values{1.0, 3.5, 27.2};
 for (auto y : make_fixed_linear_function(3, 2.5, values)) {
 std::cout << y << '\n';
 }
 auto fn = make_fixed_linear_function(3, 2.5, values);
 std::cout << std::accumulate(std::begin(fn), std::end(fn), 0) << '\n';
}

Generic generator

For toy, or small applications, it's not really going to matter, but if you find yourself doing this a lot in a real application and not some kind of school assignment or throwaway program, you'd likely want to find a well known and highly regarded generator implementation (I think Boost has a generator). It would save you time, it'd be more likely to be correct, and it'd likely be better designed than anything you or I can think of.

Question 3

Thank you very much for this answer, the first thing I tried was the idiomatic approach, but I didn't succeed, so that was quite helpful. Also, I am incredibly fond of the idea of using iterator members in my data_obj (or equivalent), but I wasn't sure if that was safe (I imagined that values might go out of scope before the instance of fixed_linear_function). Anyway, very useful overall, thanks.

Question 4

Having studied your answer a little bit further, I'd like to ask one more question: the begin and end implementations in your convenience class pointed out a potential problem to me: they return two different linear_function_iterator<Iter> objects that hold duplicate information: the instance of the linear_function instance and the end iterator of the container which holds the values. Isn't that redundant? Is there a better way to implement it? Or is this the best way to be idiomatic?

Question 5

@Nasser The only reason the end is there so sanity checking can be done (presumably in debug builds). Really, that doesn't need to be there, and it's not the typical way to do iterators (though it is the typical way to do 'safe' iterators). As for the replicated linear_function, that could be omitted from the end with a bit of cleverness, but considering how cheap it is (2 doubles), I figured it was easier to just duplicate it.

Corbin Corbin 10.6k2 gold badges30 silver badges51 bronze badges · Accepted Answer · 2015-02-15 22:21:45Z

You've done a lot of things well. Going the route of an iterator was a good choice, and you got the small details of const correctness and references and whatnot correct. There are, however, a few things I think can improved.

Naming

Your names are very undescriptive. In fact, it took me far longer than it should have to realize it's just modelling a linear function over a certain domain. Something like apply_linear for the returning version might be good, and something like linear_iterator might be good for the other one. Also, data_obj is very vague name. Pretty much all objects contain data of some kind. I might call it something like linear_function or something.

Structure

Something about your structure is a little off. I think instead of having obj_iterator take a data_obj for construction, I would have data_obj have begin() and end() methods that return a beginning iterator and an ending one. That way, you don't have to get into any friend stuff (right now data_obj is pretty useless to anything except the iterator since its members are private).

I'm also a bit uncomfortable with func and func2 being hardcoded. What if you want to change them? What if you want them to not be applied? It has essentially tightly coupled data_obj to func and func2, and worse yet, there's not really any reason for it. You can just have A and B be accepted for the values, and then you can apply func and func2 if you want.

That's essentially tightly coupling the linear function to those two functions. Further more, the coupling is super easy to get rid of: instead of taking t and x and applying func and func2, just accept a value that won't be further transformed.

In essence, I'm essentially suggesting that you generalize your thinking a bit farther. Currently you're modeling y(x) = m(t, z) * x + b(t, z). The problem is that m and b are constants, so there's no point in modeling them as part of the function. Instead, you can just model a typical linear function y(x) = m*x + b and take in m and b as constants (m = m(t, z), b = b(t, z)).

Idiomatic Iterator

Like I said earlier, it's good that you went with iterators for this. It's the best (built in) way in C++ to achieve a generator. It would have been a bit nicer if you'd done it more idiomatically though. In C++, iterators are used as pairs of iterators with a currently iterated iterator and an ending iterator. Also, operator++, operator* and operator== (or operator!=) are typically used instead of methods like next(), has_next(), etc.

auto fn = linear_function(2.5, 37.2, {...});
for (auto it = fn.begin(), end = fn.end(); it != end; ++it) {
 std::cout << *it << "\n";
}

A nice side effect of this is that you actually get rewarded for being idiomatic. The standard library becomes a lot more compatible with your stuff, and you can get range based for loops as a happy side effect:

// Wooo standard library!
auto fn = linear_function(2.5, 37.2, {...});
auto sum = std::accumulate(std::begin(fn), std::end(fn), 0.0);
// Wooo pretty loops!
for (auto y : linear_function(2.5, 37.2, {...})) {
 std::cout << *it << "\n";
}

Minor things

These are minor enough that I don't think any of them deserve their own section.

I would consider having data_obj work on a pair of iterators instead of being coupled to vector (what if you want to use it on an array, std::array, std::set, etc?).
I would considering make data_obj immutable.
data should be std::move'd m_data to ensure an extra copy is avoided.

Style

This is all subjective, of course, but:

I'm not a fan of the m_ prefix. I prefer either plain member names (e.g. data), or member names with a trailing underscore if method/member name conflicts are expected (e.g. data_). The m_ prefix is still relatively common, but it seems to be dying off, and personally, I can't wait for it to finish dying.
When function parameters can reasonably fit on one line, they should be on one line.
Indented variables per type is very non-standard. Standard is to have each variable on its own line with its own type declaration.

Putting it together

An example can probably illustrate much more effectively than my rambling words:

// Models a linear function
class linear_function {
private:
 const double m;
 const double b;
public:
 // Construct the function from slop intercept representation
 linear_function(double m, double b) : m(m), b(b)
 { }
 // Evaluate the linear function for a specific value.
 double operator()(double x) const { return m * x + b; }
 bool operator==(const linear_function& other) const {
 return m == other.m && b == other.b;
 }
};
// Iterates over a given domain for a linear function.
template<typename Iter>
class linear_function_iterator { // You'd want to extend or mimic std::iterator in a library-esque implementation
private:
 const linear_function fn_;
 Iter iter_;
 const Iter end_;
public:
 // Whether you want to take a single iterator or a pair is going to depend on typical usecase. Taking
 // a pair makes use a bit safer though since you can do checking to ensure proper use.
 linear_function_iterator(linear_function fn, Iter iter, Iter end)
 : fn_(std::move(fn)), iter_(iter), end_(end)
 { }
 double operator*() const {
 assert(iter_ != end_); // Example of using two iterators instead of one to provide additional error checking.
 return fn_(*iter_);
 }
 linear_function_iterator& operator++() {
 assert(iter_ != end_);
 ++iter_;
 return *this;
 }
 bool operator==(const linear_function_iterator& other) const {
 return fn_ == other.fn_ && iter_ == other.iter_;
 }
 bool operator!=(const linear_function_iterator& other) const {
 return !operator==(other);
 }
};
// Convenience class for a linear function over a given domain.
// Provides both an upfront approach or generator-like.
template<class Iter>
class fixed_linear_function { // This seems like a terrible name, but I'm blanking on anything better.
private:
 linear_function fn_;
 const Iter begin_;
 const Iter end_;
public:
 using iterator = linear_function_iterator<Iter>;
 fixed_linear_function(linear_function fn, Iter domain_begin, Iter domain_end)
 : fn_(std::move(fn)), begin_(domain_begin), end_(domain_end) 
 {}
 std::vector<double> evaluate() const {
 std::vector<double> results;
 std::copy(begin(), end(), std::back_inserter(results));
 return results;
 }
 iterator begin() {
 return iterator(fn_, begin_, end_);
 }
 iterator end() {
 return iterator(fn_, end_, end_);
 }
};
// Because I'm lazy
template<typename Iter>
fixed_linear_function<Iter> make_fixed_linear_function(double m, double b, Iter beg, Iter end) {
 return fixed_linear_function<Iter>(linear_function(m, b), beg, end);
}
// Soooooo lazy
template<typename Container>
auto make_fixed_linear_function(double m, double b, const Container& c) -> fixed_linear_function<decltype(std::begin(c))> {
 return make_fixed_linear_function(m, b, std::begin(c), std::end(c));
}

This brings up an interesting note: generality and idiomaticness in C++ often come at a high cost, but also give high reward with regards to reusability and interoperability. This is significantly longer than your code, but it can be used with the standard library, and it's more flexible. In fact, if you were actually going to implement the idiomatic C++ approach like this, you could actually very easily take it one step farther and arrive at a generic generator. All that would be required would be renaming linear_function_iterator to generator, templating out linear_function to be any function (and also likely removing the end iterator since generators typically support infinite iteration), and deducing the return type of operator* instead of assuming its double.

Note: This is strictly an example implementation. Operators are only partially defined, template traits aren't present, and overall, this is missing a lot of the corner cases/optimizations/etc that would make it applicable for real use.

Example usage:

int main() {
 std::vector<double> values{1.0, 3.5, 27.2};
 for (auto y : make_fixed_linear_function(3, 2.5, values)) {
 std::cout << y << '\n';
 }
 auto fn = make_fixed_linear_function(3, 2.5, values);
 std::cout << std::accumulate(std::begin(fn), std::end(fn), 0) << '\n';
}

Generic generator

For toy, or small applications, it's not really going to matter, but if you find yourself doing this a lot in a real application and not some kind of school assignment or throwaway program, you'd likely want to find a well known and highly regarded generator implementation (I think Boost has a generator). It would save you time, it'd be more likely to be correct, and it'd likely be better designed than anything you or I can think of.

Thank you very much for this answer, the first thing I tried was the idiomatic approach, but I didn't succeed, so that was quite helpful. Also, I am incredibly fond of the idea of using iterator members in my data_obj (or equivalent), but I wasn't sure if that was safe (I imagined that values might go out of scope before the instance of fixed_linear_function). Anyway, very useful overall, thanks.
Having studied your answer a little bit further, I'd like to ask one more question: the begin and end implementations in your convenience class pointed out a potential problem to me: they return two different linear_function_iterator<Iter> objects that hold duplicate information: the instance of the linear_function instance and the end iterator of the container which holds the values. Isn't that redundant? Is there a better way to implement it? Or is this the best way to be idiomatic?
@Nasser The only reason the end is there so sanity checking can be done (presumably in debug builds). Really, that doesn't need to be there, and it's not the typical way to do iterators (though it is the typical way to do 'safe' iterators). As for the replicated linear_function, that could be omitted from the end with a bit of cleverness, but considering how cheap it is (2 doubles), I figured it was easier to just duplicate it.

Stack Exchange Network

Converting an array-returning function to a lazy evaluator

1 Answer 1

Naming

Structure

Idiomatic Iterator

Minor things

Style

Putting it together

Generic generator

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Converting an array-returning function to a lazy evaluator

1 Answer 1

Naming

Structure

Idiomatic Iterator

Minor things

Style

Putting it together

Generic generator

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions