7
\$\begingroup\$

I've attempted to solve a small coding challenge I found online as a nice way to begin learning Rust. The largest challenge I feel going in with Rust is how working with strings is so fundamentally different from any other programming language I've used before. The challenge was this:

Write a function that takes two strings, s1 and s2 and returns the longest common sequence of s1 and s2:

"ABAZDC", "BACBAD" => "ABAD"
"AGGTAB", "GXTKAYB" => "GTAB"
"aaaa", "aa" => "aa"
"ABBA", "ABJABA" => "ABBA"

The challenge seemed simple but the way I understood how I one could iterate though Rust strings proved to be very difficult. With numerous copies of iterators made and copies of Strings being made back and forth I am aware that this program is in its most naive form.

I wonder if collecting the string's characters into a vector would've been a better approach. I really want to find a more efficient method of solving this problem with Rust that'll be more comparable to C++ in terms of performance.

#[macro_use]
extern crate criterion;
use criterion::Criterion;
fn longest_con_seq(s1:&String, s2:&String) -> String {
 let mut max: Option<String> = None; // Holds value of string with maximum length
 let mut current = String::new(); // String container to hold current longest value
 let mut s1_iter = s1.chars().peekable(); // Peekable iterator for string s1
 let mut s2_iter = s2.chars(); //Iterator for string s2
 let mut s2_prev_pos = s2_iter.clone(); // Iterator that holds position of previous location of first iterator
 let mut s1_prev_pos = s1_iter.clone(); // Peekable iterator used to make sure all possible combinations are located.
 loop { 
 let s1_char = s1_iter.next(); // Get character in s1
 if current.len() == 0 // If no consequtive string found yet store location of iterator
 {
 s1_prev_pos = s1_iter.clone()
 }
 match s1_char{
 Some(s1_char)=>
 { 
 loop{ 
 match s2_iter.next()
 {
 Some(s2_char) if s1_char == s2_char => {
 current.push(s1_char);
 s2_prev_pos = s2_iter.clone();
 break;
 },
 Some(_)=>continue,
 None=>{
 s2_iter = s2_prev_pos.clone();
 break;
 },
 }
 }
 },
 None=>{
 match s1_prev_pos.peek()
 {
 Some(_) => {
 match max{
 Some(_) => {
 let max_str = max.clone();
 let max_str = max_str.unwrap();
 if max_str.len() < current.len(){
 max = Some(current.clone());
 }
 current.clear();
 },
 None => {
 max = Some(current.clone());
 current.clear();
 },
 }
 s1_iter = s1_prev_pos.clone();
 s2_iter = s2.chars();
 },
 None => break,
 }
 },
 }
 }
 if let Some(_) = max {
 return max.unwrap();
 } else {
 return String::from("");
 }
}
fn criterion_benchmark(c: &mut Criterion) {
 let s1 = "GXTKAYB".to_owned();
 let s2 = "AGGTAB".to_owned();
 c.bench_function("Benchmark", move |b| b.iter(|| longest_con_seq(&s1, &s2)));
}
criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);

C++ comparison

I wrote a C++ version of the code that manipulates the strings through their indices. I though it would outperform the Rust code but it wasn't so.


#include <benchmark/benchmark.h>
#include <string>
std::string longest_con_seq(std::string &s1, std::string &s2)
{
 std::string max = "", current = "";
 size_t s2_old_idx = 0, s1_old_idx = 0;
 bool first_iter = true;
 for(size_t s1_idx = 0; s1_idx < s1.length(); ++s1_idx)
 { 
 if(first_iter)
 s1_old_idx = s1_idx;
 first_iter = false;
 for(size_t s2_idx = s2_old_idx; s2_idx < s2.length(); ++s2_idx)
 {
 if(s1[s1_idx] == s2[s2_idx])
 {
 s2_old_idx = s2_idx + 1;
 current += s1[s1_idx];
 break;
 }
 }
 if(s1_idx == s1.length()-1 && s1_old_idx != s1.length()-1)
 {
 s1_idx = s1_old_idx;
 s2_old_idx = 0;
 if(max.length() == 0 || max.length() < current.length())
 {
 max = current;
 }
 current.clear();
 first_iter = true;
 }
 else if( s1_old_idx == s1.length()-1 )
 {
 break;
 }
 }
 return max;
}
static void BM_LongestConSeq(benchmark::State& state) {
 std::string s1 = "GXTKAYB", s2 = "AGGTAB";
 for (auto _ : state)
 {
 longest_con_seq(s1,s2);
 }
}
// Register the function as a benchmark
BENCHMARK(BM_LongestConSeq);
BENCHMARK_MAIN();

Results

Rust

**time:** [96.755 ns **97.291 ns** 98.031 ns] 
**change:** [-2.8521% _-1.4304%_ -0.1568%] (p = 0.04 < 0.05) 
 Change within noise threshold. 
_Found 12 outliers among 100 measurements (12.00%)_ 
5 (5.00%) high mild 
7 (7.00%) high severe

C++

Benchmark
----------
Time: **152 ns** 
CPU: **152 ns** 
Iterations: **4471469**

I really don't know how to interpret the results, I used Google Benchmark for the C++ code and Criterion for the Rust code. It seems to look like the Rust code has outperformed the C++ code, but maybe I'm wrong.

Shepmaster
8,77827 silver badges28 bronze badges
asked Jan 30, 2019 at 17:23
\$\endgroup\$
4
  • 1
    \$\begingroup\$ Sure, I'll write c++ version and benchmark both but its pretty clear with the ability to directly access indices with minimal to no copying of iterators the c++ solution would be much more efficient. \$\endgroup\$ Commented Jan 30, 2019 at 18:22
  • 1
    \$\begingroup\$ @BiggySmallRyeda I think you underestimate LLVM \$\endgroup\$ Commented Jan 30, 2019 at 23:38
  • \$\begingroup\$ From experience with both C++ and Rust, performance is generally comparable, as long as you use equivalent operations. In terms of iterators specifically, C++ iterators compose badly, which generally give an edge to Rust if you start piling "filtering" iterators... but that's not very common. \$\endgroup\$ Commented Jan 31, 2019 at 7:53
  • 1
    \$\begingroup\$ @MatthieuM. I added the edits you and the others recommended. I don't understand the results of the benchmark very well. Also, can someone judge my rust approach to the problem. \$\endgroup\$ Commented Jan 31, 2019 at 18:46

1 Answer 1

5
\$\begingroup\$
  1. Run Rustfmt, a tool for automatically formatting Rust code to the community-accepted style.
  2. Run Clippy, a tool for finding common mistakes that may not be compilation errors but are unlikely to be what the programmer intended.
  3. Don't accept a &StringWhy is it discouraged to accept a reference to a String (&String) or Vec (&Vec) as a function argument?
  4. Don't use explicit return keywords at the end of blocks.
  5. Instead of

    if let Some(_) = max {
     max.unwrap()
    

    bind the variable:

    if let Some(x) = max {
     x
    
  6. Your whole last conditional can be simplified to max.unwrap_or_default().

  7. Add some automated tests.

    #[test]
    fn example_1() {
     assert_eq!(longest_con_seq("ABAZDC", "BACBAD"), "ABAD");
    }
    #[test]
    fn example_2() {
     assert_eq!(longest_con_seq("AGGTAB", "GXTKAYB"), "GTAB");
    }
    #[test]
    fn example_3() {
     assert_eq!(longest_con_seq("aaaa", "aa"), "aa");
    }
    #[test]
    fn example_4() {
     assert_eq!(longest_con_seq("ABBA", "ABJABA"), "ABBA");
    }
    
  8. As before, don't match on Some and then unwrap, bind the variable. We can also avoid the clone by matching on a reference:

    match max {
     Some(_) => {
     let max_str = max.clone();
     let max_str = max_str.unwrap();
    
    match &max {
     Some(max_str) => {
    
  9. current.clear() is used in both match arms; extract it.

  10. max = Some(current.clone()) is the same; extract it and use a boolean.

    let do_it = match &max {
     Some(max_str) => {
     max_str.len() < current.len()
     }
     None => {
     true
     }
    };
    if do_it {
     max = Some(current.clone());
    }
    
  11. This construct can be replaced by as_ref and map_or:

    if max.as_ref().map_or(true, |s| s.len() < current.len()) {
     max = Some(current.clone());
    }
    

End result

#[macro_use]
extern crate criterion;
use criterion::Criterion;
fn longest_con_seq(s1: &str, s2: &str) -> String {
 let mut max: Option<String> = None; // Holds value of string with maximum length
 let mut current = String::new(); // String container to hold current longest value
 let mut s1_iter = s1.chars().peekable(); // Peekable iterator for string s1
 let mut s2_iter = s2.chars(); //Iterator for string s2
 let mut s2_prev_pos = s2_iter.clone(); // Iterator that holds position of previous location of first iterator
 let mut s1_prev_pos = s1_iter.clone(); // Peekable iterator used to make sure all possible combinations are located.
 loop {
 let s1_char = s1_iter.next(); // Get character in s1
 if current.is_empty() {
 // If no consecutive string found yet store location of iterator
 s1_prev_pos = s1_iter.clone()
 }
 match s1_char {
 Some(s1_char) => loop {
 match s2_iter.next() {
 Some(s2_char) if s1_char == s2_char => {
 current.push(s1_char);
 s2_prev_pos = s2_iter.clone();
 break;
 }
 Some(_) => continue,
 None => {
 s2_iter = s2_prev_pos.clone();
 break;
 }
 }
 },
 None => match s1_prev_pos.peek() {
 Some(_) => {
 if max.as_ref().map_or(true, |s| s.len() < current.len()) {
 max = Some(current.clone());
 }
 current.clear();
 s1_iter = s1_prev_pos.clone();
 s2_iter = s2.chars();
 }
 None => break,
 },
 }
 }
 max.unwrap_or_default()
}
fn criterion_benchmark(c: &mut Criterion) {
 let s1 = "GXTKAYB";
 let s2 = "AGGTAB";
 c.bench_function("Benchmark", move |b| b.iter(|| longest_con_seq(s1, s2)));
}
criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
#[test]
fn example_1() {
 assert_eq!(longest_con_seq("ABAZDC", "BACBAD"), "ABAD");
}
#[test]
fn example_2() {
 assert_eq!(longest_con_seq("AGGTAB", "GXTKAYB"), "GTAB");
}
#[test]
fn example_3() {
 assert_eq!(longest_con_seq("aaaa", "aa"), "aa");
}
#[test]
fn example_4() {
 assert_eq!(longest_con_seq("ABBA", "ABJABA"), "ABBA");
}

With numerous copies of iterators made and copies of Strings being made

Copies of iterators are generally going to be extremely lightweight. In many cases, they are equivalent to a pointer and and offset.

Cloning a String is slightly less ideal, but that may just be a restriction of the algorithm.

answered Feb 1, 2019 at 16:06
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.