I've been working on a tokeniser/lexer in Rust. I would like the tokeniser to take input from different sources, such as files or in-memory strings. To abstract over this concern I've created a trait Source. This has left me with a wonky situation where I appear to need a PhantomData member in the token iterator. I've created a cut-down example here:
/// Source of Characters
///
/// In this example all that a source can do is be sliced to retrieve
/// a subsection of the overall character buffer.
trait Source<'a> {
    /// Character at Offset
    ///
    /// Gets the character at the given offset in the buffer and
    /// returns it. If no character is available at that offset `None`
    /// is returned.
    fn at(&self, offset: usize) -> Option<(char, usize)>;

    /// Slice the Source Buffer
    fn slice(&self, start: usize, end: usize) -> &'a str;
}

/// Source of Characters from a `str` Slice
struct StringSource<'a> {
    pub buff: &'a str
}

/// Implementation of the Source trait.
impl<'a> Source<'a> for StringSource<'a> {
    fn at(&self, offset: usize) -> Option<(char, usize)> {
        self.buff[offset..].chars().nth(0).map(|ch| { (ch, offset + ch.len_utf8()) })
    }

    fn slice(&self, start: usize, end: usize) -> &'a str {
        &self.buff[start..end]
    }
}

/// Token Iterator Implementation
///
/// This token iterator takes a given source and steps through it,
/// returning string slices for each token.
struct TokenIter<'a, S>
    where S: Source<'a>,
          S: 'a
{
    source: S,
    idx: usize,
    phantom: ::std::marker::PhantomData<&'a ()>,
}

impl<'a, S> TokenIter<'a, S>
    where S: Source<'a>
{
    /// Create a Token Iterator from a Source
    fn new(source: S) -> Self {
        TokenIter {
            source: source,
            idx: 0,
            phantom: ::std::marker::PhantomData,
        }
    }
}

/// Token Iterators implement `Iterator`
impl<'a, S> Iterator for TokenIter<'a, S>
    where S: Source<'a>
{
    type Item = &'a str;

    fn next(&mut self) -> Option<Self::Item> {
        let ts = self.idx;
        self.source.at(ts).map(|(_ch, next)| {
            self.idx = next;
            // Imagine a regex state machine is run here to produce a token
            // rather than just returning single-character tokens.
            self.source.slice(ts, next)
        })
    }
}

fn main() {
    let source = StringSource { buff: "hello world" };
    let iter = TokenIter::new(source);
    println!("{:?}", iter.collect::<Vec<_>>());
}
Run it here: https://is.gd/OCL91u
Is there a nicer way to express the source buffer lifetime 'a so that I don't need a PhantomData member or these funky lifetime constraints:

    where S: Source<'a>,
          S: 'a
1 Answer
"such as files or in-memory strings"
I'm going to give the advice that people dread: I don't think your abstraction is going to work here. When I first read "in-memory strings", I expected a String, not a &str. Since you mentioned a file, I think it's still a valid comparison. I don't believe you can implement this trait for such a type:
struct OwnedStringSource {
    pub buff: String,
}

impl<'a> Source<'a> for OwnedStringSource {
    fn at(&self, offset: usize) -> Option<(char, usize)> { None }

    fn slice(&self, start: usize, end: usize) -> &'a str {
        // Hmm.... what to put here?
    }
}
That is, there's no reasonable lifetime you can give for 'a. I also think it's the same root problem as Can I write an Iterator that yields a reference into itself?.
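To make that concrete: inside that impl, the only lifetime available is the anonymous borrow of self, and the trait gives no way to tie 'a to it. As a sketch (a hypothetical variation, not the trait from the question), you could tie them together, but that only moves the problem into the iterator:

trait Source<'a> {
    fn at(&self, offset: usize) -> Option<(char, usize)>;
    // Hypothetical change: borrow `self` for `'a`, so an owned buffer
    // could hand out references into itself...
    fn slice(&'a self, start: usize, end: usize) -> &'a str;
}

// ...but now `TokenIter::next`, which only borrows `&mut self` for the
// duration of the call, can no longer produce the `&'a` borrow of
// `self.source` that `slice` requires; that is exactly the
// self-referential iterator problem linked above.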
Aside from that, your original error was probably something like:
error[E0207]: the lifetime parameter `'a` is not constrained by the impl trait, self type, or predicates
   --> src/main.rs:117:6
    |
117 | impl<'a, S> Iterator for TokenIter<S>
    |      ^^ unconstrained lifetime parameter
The best advice I've gotten about that specific error was helpful... after thinking about it for a while. Paraphrased and elided for this situation:
what [the error is] trying to tell you is that it cannot get [the generic type] back from either the implemented trait [...] or the type implemented on [...]. [The where clause] is not enough to extract [the generic] from [the type] because one [...] type can have multiple [...] impls with various arguments
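Applied to this code, the impl header the error points at mentions 'a only in the where clause, so neither the implemented trait (Iterator) nor the self type (TokenIter<S>) pins it down. A reconstruction of the offending shape (not code from the post):

impl<'a, S> Iterator for TokenIter<S>
    where S: Source<'a>   // `'a` appears only here, hence "unconstrained"
{
    type Item = &'a str;
    // ...
}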
Instead, I'd suggest using an associated type in place of the generic lifetime parameter:
trait Source {
    type Slice;

    fn at(&self, offset: usize) -> Option<(char, usize)>;
    fn slice(&self, start: usize, end: usize) -> Self::Slice;
}
This separates the lifetimes from the trait. Specific implementations can still participate in it:
impl<'a> Source for StringSource<'a> {
    type Slice = &'a str;
    // ...
}
And you can just bubble up the inner type out of the iterator:
impl<S> Iterator for TokenIter<S>
    where S: Source,
{
    type Item = S::Slice;
    // ...
}
You might need to add extra bounds on that generic (S::Slice: AsRef<str>) depending on what you need to be able to do with the slice in the iterator implementation.
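For example, if next needed to inspect the text of each token, say to skip whitespace, the bound might look like this (a sketch; the whitespace check is an illustrative assumption, not part of the original design):

impl<S> Iterator for TokenIter<S>
    where S: Source,
          S::Slice: AsRef<str>,
{
    type Item = S::Slice;

    fn next(&mut self) -> Option<Self::Item> {
        while let Some((_ch, next)) = self.source.at(self.idx) {
            let ts = self.idx;
            self.idx = next;
            let slice = self.source.slice(ts, next);
            // `AsRef<str>` lets us look at the token text no matter
            // whether `Slice` is `&str`, `String`, or something else.
            if !slice.as_ref().trim().is_empty() {
                return Some(slice);
            }
        }
        None
    }
}

The complete version of the suggested code: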
trait Source {
    type Slice;

    fn at(&self, offset: usize) -> Option<(char, usize)>;
    fn slice(&self, start: usize, end: usize) -> Self::Slice;
}

struct StringSource<'a> {
    pub buff: &'a str
}

impl<'a> Source for StringSource<'a> {
    type Slice = &'a str;

    fn at(&self, offset: usize) -> Option<(char, usize)> {
        self.buff[offset..].chars().nth(0).map(|ch| { (ch, offset + ch.len_utf8()) })
    }

    fn slice(&self, start: usize, end: usize) -> &'a str {
        &self.buff[start..end]
    }
}

struct TokenIter<S> {
    source: S,
    idx: usize,
}

impl<S> TokenIter<S> {
    fn new(source: S) -> Self {
        TokenIter {
            source: source,
            idx: 0,
        }
    }
}

impl<S> Iterator for TokenIter<S>
    where S: Source,
{
    type Item = S::Slice;

    fn next(&mut self) -> Option<Self::Item> {
        let ts = self.idx;
        self.source.at(ts).map(|(_ch, next)| {
            self.idx = next;
            self.source.slice(ts, next)
        })
    }
}

fn main() {
    let source = StringSource { buff: "hello world" };
    let iter = TokenIter::new(source);
    println!("{:?}", iter.collect::<Vec<_>>());
}
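With the associated type in place, the owned-string source from earlier can also be expressed. A sketch, under the assumption that copying each token out of the owned buffer is acceptable (a borrowing wrapper would be another option):

struct OwnedStringSource {
    pub buff: String,
}

impl Source for OwnedStringSource {
    // Tokens are copied out of the owned buffer, so no lifetime is needed.
    type Slice = String;

    fn at(&self, offset: usize) -> Option<(char, usize)> {
        self.buff[offset..].chars().nth(0).map(|ch| (ch, offset + ch.len_utf8()))
    }

    fn slice(&self, start: usize, end: usize) -> Self::Slice {
        self.buff[start..end].to_owned()
    }
}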
Thanks for this. I'd rather be told that it's the wrong abstraction by you now than find out further down the line. I knew I was dealing with the lifetime of the slice wrong; I just didn't know what the correct solution was. I think I'll go for a type parameter on the Source interface with the AsRef<str> restriction on it. Tokens produced from that source can then use that same tactic to expose the captured lexemes to higher levels. – Will, Apr 18, 2017 at 6:35