
I have a nul-terminated char * (which may be NULL). For logging purposes, I want to write a function inspired by Python's repr() for it: it should return a string such that, if that string is interpreted as a C++ expression, it evaluates to a copy of the original. By "copy" I mean that the string the expression evaluates to should contain the same sequence of bytes. The source character set and the execution character set can be assumed to be ASCII-compatible (e.g. UTF-8, CP-1252, etc.).

What I tested (see code below):

  1. I constructed a string consisting of every byte value from 1 to 255.
  2. I called encode() on it and printed the result.
  3. I copy-pasted the printed result into the source code as a string literal and checked that it compares equal to the original.

Can you find any bugs?

// SPDX-FileCopyrightText: 2025 <https://github.com/hexagonrecursion>
// SPDX-License-Identifier: CC0-1.0
#include <string>
#include <ios>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <locale>
std::string encode(const char *s)
{
    if(s == nullptr) return "nullptr";
    std::locale cLocale("C");
    std::stringstream out;
    out << '"';
    for(; *s; ++s)
    {
        switch(*s)
        {
            case '\"': out << "\\\""; break;
            case '\?': out << "\\?"; break; // May need escaping due to trigraphs
            case '\\': out << "\\\\"; break;
            case '\a': out << "\\a"; break;
            case '\b': out << "\\b"; break;
            case '\f': out << "\\f"; break;
            case '\n': out << "\\n"; break;
            case '\r': out << "\\r"; break;
            case '\t': out << "\\t"; break;
            case '\v': out << "\\v"; break;
            default:
                if(std::isprint(*s, cLocale))
                {
                    out << *s;
                }
                else
                {
                    unsigned c = static_cast<unsigned>(static_cast<unsigned char>(*s));
                    out << '\\' << std::oct << std::setw(3) << std::setfill('0') << c;
                }
                break;
        }
    }
    out << '"';
    return out.str();
}
int main()
{
    std::string s;
    for(unsigned char c = 1; c != 0; ++c)
    {
        s.push_back(c);
    }
    std::cout << encode(s.c_str()) << std::endl;
    // I copy-pasted this string from stdout:
    const char *copyPasteFromStdout = "\001\002\003\004\005\006\a\b\t\n\v\f\r\016\017\020\021\022\023\024\025\026\027\030\031\032\033\034\035\036\037 !\"#$%&'()*+,-./0123456789:;<=>\?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\177\200\201\202\203\204\205\206\207\210\211\212\213\214\215\216\217\220\221\222\223\224\225\226\227\230\231\232\233\234\235\236\237\240\241\242\243\244\245\246\247\250\251\252\253\254\255\256\257\260\261\262\263\264\265\266\267\270\271\272\273\274\275\276\277\300\301\302\303\304\305\306\307\310\311\312\313\314\315\316\317\320\321\322\323\324\325\326\327\330\331\332\333\334\335\336\337\340\341\342\343\344\345\346\347\350\351\352\353\354\355\356\357\360\361\362\363\364\365\366\367\370\371\372\373\374\375\376\377";
    // prints true
    std::cout << std::boolalpha << (copyPasteFromStdout == s) << std::endl;
    return 0;
}

Here is how I plan to use the function:

PHYSFS_File *loud_openRead(const char *filename)
{
    PHYSFS_File *file = PHYSFS_openRead(filename);
    if (file != nullptr) return file;
    GetLogger()->Error("Error opening file with PHYSFS: %%", encode(filename));
    return nullptr;
}
asked Jan 19 at 7:11
  • "From my testing so far" could be more meaningful to reviewers. I suggest you include your test code if possible, so we can see what you've tested (and perhaps what you missed). Commented Jan 19 at 11:17
  • Just FYI, the terminology you're looking for is "serialization" and "deserialization". Commented Jan 19 at 15:40
  • @TobySpeight Thanks. Updated. Commented Jan 19 at 16:39
  • You should check out base-64 encoding. It encodes arbitrary bytes using only 64 printable ASCII characters (6 bits per character), so the result can be represented with standard printable characters. It is a very common way to encode data that is saved in text-based systems. Commented Jan 22 at 4:26
  • @LokiAstari Thanks, but I have specifically mentioned that I want to use this for logging purposes. Using base64 would make the logs hard to read, which is the opposite of what I want. Commented Jan 22 at 17:45

1 Answer


I would write these character constants as '"' and '?' respectively, since neither character needs a backslash inside a character literal:

 case '\"': out << "\\\""; break;
 case '\?': out << "\\?"; break;

We might even combine these cases:

 case '"':
 case '?':
 case '\\':
     out << '\\' << *s; break;

We shouldn't need to construct a new locale object each time we're called. We can construct a single shared instance:

 static const std::locale cLocale{"C"};

Widening from unsigned char to unsigned int happens implicitly, so the outer cast can be dropped:

 unsigned c = static_cast<unsigned char>(*s);

For the use-case in main(), it's wasteful to build the result as a std::string only to copy it to std::cout. It might be better to have one version of the encoder write to a std::ostream&, and implement the string-returning version in terms of it:

std::string encode(const char *s)
{
    if (s == nullptr) { return "nullptr"; }
    std::stringstream out;
    encode(s, out);
    return out.str();
}

Also in main(), there's no need to flush standard output (a plain '\n' will do instead of std::endl), or to explicitly return 0.


#include <iomanip>
#include <locale>
#include <ostream>
#include <sstream>
#include <string>
#include <string_view>
std::ostream& encode(std::string_view s, std::ostream& out)
{
    static const std::locale cLocale{"C"};
    out << '"';
    for (char c: s) {
        switch (c) {
        case '"':
        case '?': // Don't introduce trigraphs
        case '\\':
            out << '\\' << c; break;
        case '\a': out << "\\a"; break;
        case '\b': out << "\\b"; break;
        case '\f': out << "\\f"; break;
        case '\n': out << "\\n"; break;
        case '\r': out << "\\r"; break;
        case '\t': out << "\\t"; break;
        case '\v': out << "\\v"; break;
        default:
            if (std::isprint(c, cLocale)) {
                out << c;
            } else {
                unsigned ci = static_cast<unsigned char>(c);
                out << '\\'
                    << std::oct << std::setw(3) << std::setfill('0')
                    << ci;
            }
            break;
        }
    }
    return out << '"';
}
std::string encode(const char *s)
{
    if (!s) {
        return "nullptr";
    }
    std::stringstream out;
    encode(s, out);
    return out.str();
}
#include <iostream>
#include <limits>
int main()
{
    std::string s;
    s.reserve(std::numeric_limits<unsigned char>::max());
    for (unsigned char c = 1; c != 0; ++c) {
        s.push_back(c);
    }
    encode(s, std::cout);
    std::cout << '\n';
    std::cout << encode(nullptr) << '\n';
    std::cout << encode("") << '\n';
}
answered Jan 19 at 11:44