5
\$\begingroup\$

Recently I submitted some string formatting and printing code (found here) that I wrote as an small exercise. It was implemented naively using string replacement. This time around I wanted to actually parse the format string. This is the first time I've ever attempted to write code which does any type of parsing. I'm not 100% sure that this code does not contain any critical flaws. In addition to pointing out any such oversights, I would appreciate advice on how this could be modified to make it either faster or more readable.

The functionality is supposed to be similar to the Console.WriteLine or String.Format functions of C#.

#include <boost/lexical_cast.hpp>
#include <iostream>
#include <string>
#include <exception>
#include <utility>
#include <cctype>
template <typename T>
std::string to_string(T&& item)
{
 return boost::lexical_cast<std::string>(std::forward<T>(item)); 
}
template <class... Args>
std::string format(const std::string& fmt, Args&&... args)
{
 std::string output;
 output.reserve(fmt.length() * 2);
 std::string arg_strings[] = { to_string(std::forward<Args>(args))... };
 bool placeholder_mode = false;
 std::string index_string;
 for (auto it = fmt.begin(); it != fmt.end(); ++it) { 
 //check for end of brace enclosed placeholder
 if ((*it == '}') && (placeholder_mode == true)) {
 if ((index_string.length() == 0)) {
 throw std::invalid_argument("Invalid format string.");
 }
 //check if index_string contains a leading zero
 if ((index_string.length() > 1) && (index_string[0] == '0')) {
 throw std::invalid_argument("Invalid format string.");
 }
 int index = std::stoi(index_string);
 if (index > (sizeof...(args) - 1)) {
 throw std::out_of_range(
 "Index (zero based) must be greater than or equal "
 "to zero and less than the size of the argument list."
 );
 }
 output += arg_strings[index];
 index_string.clear();
 placeholder_mode = false;
 continue;
 }
 //escape double closing brace as single
 if ((*it == '}') && (*(it+1) == '}')) {
 it++;
 output.push_back('}');
 continue;
 }
 //check for unescaped closing brace
 //which doesn't close a placeholder
 if ((*it == '}') && (placeholder_mode == false)) {
 throw std::invalid_argument("Invalid format string.");
 }
 //escape double opening brace as single
 if ((*it == '{') && (*(it+1) == '{')) {
 it++;
 output.push_back('{');
 continue;
 }
 //check for begining of brace enclosed placeholder
 if ((*it == '{') && (placeholder_mode == false)) {
 placeholder_mode = true;
 continue;
 }
 if(placeholder_mode == true) {
 //check for non-digit characters
 if (isdigit(*it) == false) {
 throw std::invalid_argument("Invalid format string.");
 }
 index_string.push_back(*it);
 }
 else {
 output.push_back(*it);
 }
 }
 if (placeholder_mode == true) {
 //catch any other placeholders that were left open
 throw std::invalid_argument("Invalid format string.");
 }
 return output;
}
template <class... Args>
void print(const std::string& fmt, Args&&... args)
{
 std::cout << format(fmt, std::forward<Args>(args)...);
}
int main()
{
 print("Hello, {0}! The answer is {1}.\n", "World", 42);
}

Update: Version 2

asked Jul 13, 2014 at 3:06
\$\endgroup\$
0

1 Answer 1

3
\$\begingroup\$

There are a few points about this that seem open to improvement (at least to me).

  1. Don't compare bools to true or false, just use their value directly, so

    placeholder_mode == false
    

    becomes:

    !placeholder_mode
    

    likewise:

    if(placeholder_mode == true)
    

    becomes:

    if (placeholder_mode)
    
  2. You have basically everything in one giant function. This makes the code more difficult to understand, analyze, etc.

  3. Some of the checks don't seem to accomplish much (if anything)--preventing leading zeros, for one obvious example.

  4. Like most parsers, this is basically a state machine. As in many cases, it would probably be somewhat clearer if the state machine were a little more explicit.

    I'd probably start with something like this:

    enum states { COPYING, START_PLACEHOLDER, READ_PLACEHOLDER, END_PLACEHOLDER, ERROR };
    states state;
    while (it != fmt.end()) {
    switch (state) { 
    case COPYING:
     if (*it == '{') state = START_PLACEHOLDER;
     else output.push_back(*it);
     break;
    case START_PLACEHOLDER:
     if (*it == '0') state = ERROR;
     else state = READ_PLACEHOLDER;
     // intentional fall-through
    case READ_PLACEHOLDER: 
     num = std::stoi(it);
     check_range(num, 0, arg_count);
     output.push_back(args[num]);
    

    ...and so on. This structure can help clarify what happens in what states, and the conditions under which you progress from one state to another.

Edward
67.2k4 gold badges120 silver badges284 bronze badges
answered Jul 13, 2014 at 5:27
\$\endgroup\$
1
  • \$\begingroup\$ I cannot stress enough how inexperienced I am with parsing and finite state machines. I've updated my question with my attempt to implement one, but it does not quite work correctly. \$\endgroup\$ Commented Jul 13, 2014 at 11:50

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.