There is a known problem stated as "Cyclic string":
A string S has been recorded many times, and then from the resulting line took a substring, and give to you.Your task is to determine the minimum possible length of a string S.
Input
In the first and only line of the input file contains a string that contains only letters, line length up to 50000 characters.
Output
The output file you want to display a single number - the answer to the problem.
It can be solved with substring hashes: source (Russian) / English translation.
I composed a solution:
bool substring_eq(const vector<uint64_t> &p_pow, const vector<uint64_t> &h,
size_t i1, size_t i2, size_t len) {
uint64_t h1 = h[i1 + len - 1];
if (i1) h1 -= h[i1 - 1];
uint64_t h2 = h[i2 + len - 1];
if (i2) h2 -= h[i2 - 1];
return (i1 < i2 && (h1 * p_pow[i2 - i1]) == h2) ||
(i1 > i2 && h1 == (h2 * p_pow[i1 - i2]));
}
void compute_hashes(const string &s, vector<uint64_t> &p_pow, vector<uint64_t> &h) {
const int p = 191;
p_pow[0] = 1;
for (size_t i = 1; i < p_pow.size(); ++i) {
p_pow[i] = p_pow[i - 1] * p;
}
for (size_t i = 0; i < s.length(); ++i) {
h[i] = (s[i] - 'A' + 1) * p_pow[i];
if (i) h[i] += h[i - 1];
}
}
size_t solve(const string &s) {
vector<uint64_t> p_pow(s.length()), hashes(s.length());
compute_hashes(s, p_pow, hashes);
size_t cycle_len = s.length();
for (size_t k = 1; k < s.length(); ++k) {
if (substring_eq(p_pow, hashes, 0, k, s.length() - k)) {
cycle_len = k;
break;
}
}
return cycle_len;
}
int main() {
//test();
string input;
cin >> input;
cout << solve(input) << std::endl;
return 0;
}
It doesn't pass the last test in e-judging system (I don't know the contents of it).
I even composed a function that generates input strings, but it catches no errors:
//////////////////////////
/// Stress testing
//////////////////////////
#include <random>
std::string random_string(std::string::size_type length) {
static auto &chrs = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
static std::mt19937 rg{std::random_device{}()};
static std::uniform_int_distribution<std::string::size_type> pick(0, sizeof(chrs) - 2);
std::string s;
s.reserve(length);
while (s.length() < length) {
char ch = chrs[pick(rg)];
if (s.find(ch) == string::npos) {
s += ch;
}
}
return s;
}
size_t solve(const string &s);
void test() {
srand(100);
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wmissing-noreturn"
while (true) {
string::size_type len = (unsigned long) (rand() % 50 + 1);
string s = random_string(len);
int count = rand() % 5000 + 2;
string input;
for (int i = 0; i < count; ++i) {
input += s;
}
string input1 = input;
input1 = input1.substr(rand() % (input1.length() - s.length()));
input1 = input1.substr(0, rand() % (input1.length() - s.length()) + s.length());
if (input1.empty()) continue;
size_t k = solve(input1);
if (k != s.length()) {
cout << "boom:" << input1;
}
}
#pragma clang diagnostic pop
}
//////////////////////////
What did I miss in the solution?
-
\$\begingroup\$ "I can't passed the last test in e-judging system" - is it TLE or WA? \$\endgroup\$vnp– vnp2016年12月17日 21:07:56 +00:00Commented Dec 17, 2016 at 21:07
-
\$\begingroup\$ What are TLE and WA? Particular e-judges? If yes then I don't know them. And I think it makes not a big difference where that tasks came from. \$\endgroup\$Alexander Myltsev– Alexander Myltsev2016年12月17日 21:12:29 +00:00Commented Dec 17, 2016 at 21:12
-
\$\begingroup\$ TLE = Time Limit Exceeded (время истекло), WA = Wrong Answer (ошибка). I am asking because WA is strictly off-topic here. \$\endgroup\$vnp– vnp2016年12月17日 21:13:39 +00:00Commented Dec 17, 2016 at 21:13
-
\$\begingroup\$ Sounds like Wrong Answer, which makes this question inappropriate for Code Review. \$\endgroup\$Mast– Mast ♦2016年12月17日 21:14:44 +00:00Commented Dec 17, 2016 at 21:14
-
1\$\begingroup\$ OK, thanks for keeping me updated. I read some threads here stating that asking "homework" assignments is fine as far as a student made a significant work. I will ask somewhere else then. Good luck! \$\endgroup\$Alexander Myltsev– Alexander Myltsev2016年12月18日 00:18:37 +00:00Commented Dec 18, 2016 at 0:18
1 Answer 1
This code is missing many necessary definitions. I guessed at
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>
using std::cout;
using std::size_t;
using std::string;
using std::uint64_t;
using std::vector;
But I don't recommend all those using
s - just write the types in full (perhaps use an informative alias for your std::vector<std::uint64_t>
s).
Do we really need exactly 64-bit type, or can we be portable and using std::uint_fast64_t
instead?
The functions are very opaque - there's no guidance of what p_pow
, h
, i1
, i2
etc indicate. That makes it almost impossible to review the code without completely dissecting and rewriting it.
However, it seems to be based on a rolling hash function, which immediately raises a red flag: when a hash match is found, it needs to be confirmed as an actual match rather than merely a collision, but this code assumes there are no collisions. That's a serious bug, in my opinion. We need something like
std::size_t solve(std::string_view const s)
{
⋮
for (size_t k = 1; k < s.length(); ++k) {
auto const len = s.length() - k;
if (substring_eq(p_pow, hashes, 0, k, len)
&& s.substr(0, len) == s.substr(k, len)) {
// found it
return k;
}
}
(Note the use of string view, so that .substr()
is cheap.)
The substring_eq()
function could be simplified if we ensure that i1 <= i2
and if we start h
with a zero element:
using power_vector = std::vector<std::uint_fast64_t>;
bool substring_eq(const power_vector &p_pow, const power_vector &h,
std::size_t i1, std::size_t i2, std::size_t len)
{
if (i2 < i1) {
std::swap(i1, i2);
}
assert(i1 <= i2);
auto const h1 = h[i1 + len] - h[i1];
auto const h2 = h[i2 + len] - h[i2];
return h2 == h1 * p_pow[i2 - i1];
}
void compute_hashes(std::string_view const s, power_vector &p_pow, power_vector &h)
{
constexpr unsigned p = 191; // prime for exponentiation
// powers of p
p_pow[0] = 1;
for (std::size_t i = 1; i < p_pow.size(); ++i) {
p_pow[i] = p_pow[i - 1] * p;
}
// polynomial hash at each position in s
h[0] = 0;
for (std::size_t i = 0; i < s.length(); ++i) {
h[i+1] = h[i] + s[i] * p_pow[i];
}
}
We could probably also take advantage of the fact that substr_eq()
is always called with i0 == 0
, and eliminate that argument:
bool substring_eq(const power_vector &p_pow, const power_vector &h,
std::size_t i2, std::size_t len)
{
auto const h1 = h[len];
auto const h2 = h[i2 + len] - h[i2];
return h2 == h1 * p_pow[i2];
}
The test program generates random inputs to the solver. This is not a very good adversarial test of worst-case performance.
Improved code
#include <cstdint>
#include <iostream>
#include <string_view>
#include <vector>
using power_vector = std::vector<std::uint_fast64_t>;
using hash_vector = std::vector<std::uint_fast64_t>;
// Test whether string of length len starting at offset is exact match
// for same length prefix, according to hash values in h, based on
// integer powers in p_pow.
bool substring_eq(const power_vector &p_pow, const hash_vector &h,
std::size_t offset, std::size_t len)
{
auto const h1 = h[len];
auto const h2 = h[offset + len] - h[offset];
return h2 == h1 * p_pow[offset];
}
// Generate integer powers of a prime in p_pow, and rolling polynomial
// hash of input s in h.
void compute_hashes(std::string_view const s, power_vector &p_pow, hash_vector &h)
{
constexpr unsigned p = 191; // prime for exponentiation
// powers of p
p_pow[0] = 1;
for (std::size_t i = 1; i < p_pow.size(); ++i) {
p_pow[i] = p_pow[i - 1] * p;
}
// polynomial hash at each position in s
h[0] = 0;
for (std::size_t i = 0; i < s.length(); ++i) {
h[i+1] = h[i] + s[i] * p_pow[i];
}
}
std::size_t solve(std::string_view const s)
{
power_vector p_pow(s.length());
hash_vector hashes(s.length() + 1);
compute_hashes(s, p_pow, hashes);
for (std::size_t k = 1; k < s.length(); ++k) {
auto const len = s.length() - k;
if (substring_eq(p_pow, hashes, k, len)
&& s.substr(0, len) == s.substr(k, len)) {
// found it
return k;
}
}
// no cycle shorter than the whole string
return s.length();
}
I still don't like the data sharing through the two vectors. The object-oriented approach is to have a clear owner for them both, who is responsible for maintaining their invariants:
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <iterator>
#include <ranges>
#include <string_view>
#include <vector>
class fast_substring_compare
{
std::string_view s;
std::vector<std::uint_fast64_t> powers = {};
std::vector<std::uint_fast64_t> hash = {};
public:
explicit fast_substring_compare(std::string_view s)
: s{s}
{
constexpr std::uint_fast64_t one{1};
// Generate all powers of our chosen prime
powers.reserve(s.length());
powers.push_back(one);
std::generate_n(std::back_inserter(powers), s.length() - 1,
[n{one}] mutable { return n *= 191; });
// polynomial hash at each position in s
hash.reserve(s.length() + 1);
hash.push_back(0);
for (auto [c, p]: std::views::zip(s, powers)) {
hash.push_back(hash.back() + (unsigned char)c * p);
}
}
// Test whether substring string of length len starting at i1 is exact match
// for same length substring starting at i2.
bool substring_eq(std::size_t i1, std::size_t i2, std::size_t len) const
{
if (i2 < i1) {
std::swap(i1, i2);
}
auto const h1 = hash[i1 + len] - hash[i1];
auto const h2 = hash[i2 + len] - hash[i2];
return h2 == h1 * powers[i2 - i1]
&& s.substr(i1, len) == s.substr(i2, len);
}
};
std::size_t solve(std::string_view const s)
{
const fast_substring_compare comparer{s};
for (auto offset: std::views::iota(1uz, s.length())) {
auto const len = s.length() - offset;
if (comparer.substring_eq(0, offset, len)) {
// found it
return offset;
}
}
// no cycle shorter than the whole string
return s.length();
}