C++20 Robust File Interface

Question 1

I'm working on a homebrew game engine in C++20. I needed a robust, minimal-overhead, and easy way to load files into memory. I ended up with this: a simple, easy file loader. It doesn't have many capabilities, but it works for exactly what I need; string conversion and in-built hexdump functionality. However, I wanted to know if this interface follows best practices. I specifically am using C++20, so if any features from that standard work better for a particular purpose, please let me know. I want to be as efficient as possible.

Files.hpp:

/**
 * @file Files.hpp
 * @author Israfil Argos (israfiel-a)
 * @date 2024年12月16日
 * @brief A file providing functionality for loading and understanding
 * files.
 *
 * Copyright (c) 2024 Israfil Argos
 * This file is under the AGPLv3. For more information on what that
 * entails, see the LICENSE file provided with the engine.
 */
#ifndef IRIDIUM_FILES_HPP
#define IRIDIUM_FILES_HPP
// C++ standard I/O operations.
#include <iostream>
// C++ filesystem library.
#include <filesystem>
// C++ standard variable-sized array.
#include <vector>
/**
 * @brief The Iridium namespace.
 */
namespace Iridium
{
 /**
 * @brief The path to a file. This is an STL filesystem::path
 * structure.
 */
 typedef std::filesystem::path FilePath;
 /**
 * @brief A file's contents. This is an STL vector of unsigned bytes.
 */
 typedef std::vector<std::uint8_t> FileContents;
 /**
 * @brief Utilities having to do with MIME type.
 */
 namespace MIME
 {
 /**
 * @brief A file's MIME type. This is an STL string object.
 */
 typedef std::string FileMIME;
 /**
 * @brief The MIME for a JPEG file. The signature for this is `FF
 * D8 FF (D8 | E0)`.
 */
 constexpr const char *jpeg_mime = "image/jpeg";
 /**
 * @brief The MIME for a PNG file. The signature for this is `89 50
 * 4E 47 0D 0A 1A 0A`.
 */
 constexpr const char *png_mime = "image/png";
 /**
 * @brief The MIME for an Iridium config file. Since each one
 * begins with an [Iridium], the signature is just that. This is
 * not standard.
 */
 constexpr const char *config_mime = "text/config";
 /**
 * @brief The MIME for a generic/unknown file.
 */
 constexpr const char *generic_mime = "application/octet-stream";
 FileMIME GetFiletype(const FileContents &file_contents);
 }
 /**
 * @brief A file object. This is not for writing, it's just for
 * reading. For writing, see MutableFile.
 */
 class File
 {
 private:
 /**
 * @brief The path to the file.
 */
 FilePath path;
 /**
 * @brief A vector of raw bytes that represents the
 * contents of the file.
 */
 FileContents contents;
 public:
 /**
 * @brief Load a file from the given path.
 * @param path The path to the file within the game's asset
 * directory.
 */
 File(const FilePath &path);
 /**
 * @brief Get the directory within the game's asset directory
 * that this file exists in.
 * @return The directory name.
 */
 inline const FilePath &GetPath() const noexcept
 {
 return this->path;
 }
 /**
 * @brief Get the basename of the file.
 * @return The file's basename.
 */
 inline FilePath GetBasename() const noexcept
 {
 return this->path.stem();
 }
 /**
 * @brief Get the file's extension. This is NOT the file's MIME
 * type.
 * @return The file's extension.
 */
 inline FilePath GetExtension() const noexcept
 {
 return this->path.extension();
 }
 /**
 * @brief Get the contents of a file as raw bytes.
 * @return A vector of the file's raw bytes.
 */
 inline const FileContents &GetContents() const noexcept
 {
 return this->contents;
 }
 /**
 * @brief Get the size of the file in bytes.
 * @return The size of the file.
 */
 inline std::size_t GetSize() const noexcept
 {
 return this->contents.size();
 }
 /**
 * @brief Get the type of the file.
 * @return The type of the file.
 */
 inline MIME::FileMIME GetType() const noexcept
 {
 return MIME::GetFiletype(this->GetContents());
 }
 /**
 * @brief Stringify the file's metadata.
 * @return The stringified data.
 */
 const std::string &StringifyMetadata() const noexcept;
 /**
 * @brief Get the contents of the file in string format.
 * @return The string of file content.
 */
 inline std::string Stringify() const noexcept
 {
 return std::string(this->contents.begin(),
 this->contents.end());
 }
 /**
 * @brief Hexdump the file to a specified output stream. This
 * flushes the output stream.
 * @param output The ouput stream to print the hexdump to.
 * @param column_count The amount of horizontal columns to
 * print.
 * @param characters A boolean that represents whether or not
 * to print ASCII transliterations of each byte should there be
 * one.
 */
 void Hexdump(std::ostream &output = std::cout,
 std::uint32_t column_count = 32,
 bool characters = true) const;
 };
 /**
 * @brief Attempt to normalize (remove things like root-traversal,
 * double separators, etc.) a given path. This will merge the file path
 * with the application's asset directory.
 * @param path The path to normalize.
 * @return The normalized path.
 */
 FilePath NormalizePath(const FilePath &path);
 /**
 * @brief Get the asset directory.
 * @return The asset directory's path.
 * @note This function is effectively boilerplate at this time; it
 * exists only for the future, where configuring this value will be
 * possible.
 */
 FilePath GetAssetDirectory();
}
#endif // IRIDIUM_FILES_HPP

Files.cpp:

/**
 * @file Files.cpp
 * @author Israfil Argos (israfiel-a)
 * @date 2024年12月21日
 * @brief The implementation file of Iridium's file management interface.
 *
 * Copyright (c) 2024 Israfil Argos
 * This file is under the AGPLv3. For more information on what that
 * entails, see the LICENSE file provided with the engine.
 */
// The parent header file.
#include <Files.hpp>
// C++ standard file streams.
#include <fstream>
// C++ standard iterators.
#include <iterator>
/**
 * @brief A fixed-width type that, when passed to an std::ostream, will
 * allow padding input.
 * @tparam T The type of input.
 * @tparam W The width of padding.
 */
template <typename T, int W>
struct FixedWidthValue
{
 FixedWidthValue(T v_) : v(v_) {}
 T v;
};
/**
 * @brief The default fixed-width stream operator. This does not set a
 * delimiter to fill with.
 * @tparam T The type of input.
 * @tparam W The width of padding.
 * @param output The output stream.
 * @param value The value we're printing.
 * @return The newly streamed-to output stream.
 */
template <typename T, int W>
std::ostream &operator<<(std::ostream &output,
 const FixedWidthValue<T, W> &value)
{
 return output << std::setw(W) << value.v;
}
/**
 * @brief The fixed-width stream operator for unsigned characters. This
 * automatically prints a delimiter character if the byte can't be printed.
 * @tparam W The width of padding.
 * @param output The output stream.
 * @param value The value we're printing.
 * @return The newly streamed-to output stream.
 */
template <int W>
std::ostream &operator<<(std::ostream &output,
 const FixedWidthValue<std::uint8_t, W> &value)
{
 // If the character is able to be printed, print it. Otherwise, print
 // the delimiter.
 if (isprint((int)value.v) && value.v != ' ')
 return output << value.v << " ";
 else return output << std::setw(W) << output.fill();
}
namespace Iridium
{
 std::string MIME::GetFiletype(const FileContents &file_contents)
 {
 if (file_contents.size() > 7 && file_contents[0] == 0x89 &&
 file_contents[1] == 0x50 && file_contents[2] == 0x4E &&
 file_contents[3] == 0x47 && file_contents[4] == 0x0D &&
 file_contents[5] == 0x0A && file_contents[6] == 0x1A &&
 file_contents[7] == 0x0A)
 return "image/png";
 if (file_contents.size() > 3 && file_contents[0] == 0xFF &&
 file_contents[1] == 0xD8 && file_contents[2] == 0xFF &&
 (file_contents[3] == 0xDB || file_contents[3] == 0xE0))
 return "image/jpeg";
 if (file_contents.size() > 8 && file_contents[0] == 0x5D &&
 file_contents[1] == 0x49 && file_contents[2] == 0x72 &&
 file_contents[3] == 0x69 && file_contents[4] == 0x64 &&
 file_contents[5] == 0x69 && file_contents[6] == 0x75 &&
 file_contents[7] == 0x6D && file_contents[8] == 0x5D)
 return "text/config";
 return "application/octet-stream";
 }
 File::File(const FilePath &path)
 {
 this->path = NormalizePath(path);
 if (!std::filesystem::is_regular_file(this->path))
 {
 // No proper error recorder exists yet, so just fail the thread
 // as a placeholder.
 exit(255);
 }
 // Open the file in raw binary mode and with the position pointer
 // at the end of the buffer.
 std::ifstream file_stream(this->path,
 std::ios::binary | std::ios::ate);
 if (!file_stream.is_open())
 {
 // Follows the above comment.
 exit(255);
 }
 std::ifstream::pos_type file_size = file_stream.tellg();
 this->contents = std::vector<std::uint8_t>(file_size);
 // Read any content.
 if (file_size > 0)
 {
 file_stream.seekg(0, std::ios::beg);
 file_stream.read((char *)this->contents.data(), file_size);
 }
 }
 const std::string &File::StringifyMetadata() const noexcept
 {
 static std::string metadata_string =
 "File \"" + this->GetBasename().string() + "\" (" +
 std::to_string(this->GetSize()) +
 " bytes):\n\tFull Path: " + this->GetPath().string() +
 "\n\tMIME Type: " + this->GetType() + "\n";
 return metadata_string;
 }
 void File::Hexdump(std::ostream &output, std::uint32_t column_count,
 bool characters) const
 {
 // Save the flags of the output stream so we can restore them
 // afterward.
 std::ios_base::fmtflags saved_flags(output.flags());
 output << std::uppercase << std::hex;
 const std::size_t max_rows = this->contents.size() / column_count;
 for (std::size_t current_row = 0; current_row < max_rows;
 current_row++)
 {
 const std::size_t row_begin = current_row * column_count;
 const std::size_t row_end = row_begin + column_count;
 const std::vector<std::uint8_t> row(
 this->contents.begin() + row_begin,
 this->contents.begin() + row_end);
 output << std::setfill('0');
 // Print the line number.
 output << std::setw(8) << row_begin << " ";
 // Copy the integer data into the given output stream.
 std::copy(row.begin(), row.end(),
 std::ostream_iterator<FixedWidthValue<int, 2>>(
 output, " "));
 if (characters)
 {
 output << " " << std::setfill('.');
 std::copy(
 row.begin(), row.end(),
 std::ostream_iterator<
 FixedWidthValue<std::uint8_t, 2>>(output, " "));
 }
 output << "\n";
 }
 // Reset flags and flush the stream.
 output.flags(saved_flags);
 output.flush();
 }
 FilePath NormalizePath(const FilePath &path)
 {
 std::filesystem::path canonical_path =
 std::filesystem::weakly_canonical(
 GetAssetDirectory().append(path.string()));
 return canonical_path.make_preferred();
 }
 FilePath GetAssetDirectory()
 {
 return std::filesystem::path("./Assets/");
 }
}

Usage:

#include <Files.hpp>
int main() 
{
 Iridium::File opened_file("<path>");
 opened_file.Hexdump();
}

This should hex-dump the provided file to cout with a column width of 32 bytes and a transliteration of ASCII characters beside the raw contents. Note that, because this is ripped from engine code, your file should be in a folder named "Assets" directly next to the executable.

Any feedback would be wonderful!

Question 2

Style

Using this-> is not needed (and I would consider it bad).

The problem is that you only need to use this syntax when you need to disambiguate variables (shadowed variables). The problem is the compiler will not warn you when you accidentally forget to add this->. But if you use good (distinct) variable names, the problem goes away. The compiler will also warn you about shadowed variables to help prevent the problem in the first place.

Code Review

Extraneous comments are worse than no comments:

// C++ standard I/O operations.
#include <iostream>
// C++ filesystem library.
#include <filesystem>
// C++ standard variable-sized array.
#include <vector>

In older code comment rot is a real problem. Comments should always have a purpose. Don't repeat stuff that the code already does.

I suppose this is OK because it ties in with the documentation system. But I don't particularly like it because it adds nothing to the code.

/**
 * @brief The Iridium namespace.
 */
namespace Iridium
{

This is older C++.

 typedef std::filesystem::path FilePath;
 typedef std::vector<std::uint8_t> FileContents;

The more modern approach is:

 using FilePath = std::filesystem::path;
 using FileContents = std::vector<std::uint8_t>;

These are fine:

 constexpr const char *jpeg_mime = "image/jpeg";
 constexpr const char *png_mime = "image/png";
 constexpr const char *config_mime = "text/config";
 constexpr const char *generic_mime = "application/octet-stream";

Again the more modern aproach would be:

 constexpr std::string_view jpeg_mime = "image/jpeg";
 constexpr std::string_view png_mime = "image/png";
 constexpr std::string_view config_mime = "text/config";
 constexpr std::string_view generic_mime = "application/octet-stream";

Here you are making a copy of the content.
Which is already in a container.

 inline std::string Stringify() const noexcept
 {
 return std::string(this->contents.begin(),
 this->contents.end());
 }

To prevent a copy, I would return a std::string_view (a read-only view of the data).

Dumps the old content and gets a new vector.

 this->contents = std::vector<std::uint8_t>(file_size);

Seems simpler to use:

 this->contents.resize(file_size);

This is a bug.

 const std::string &File::StringifyMetadata() const noexcept
 {
 // Using static here is wrong.
 // This means the variable is created the first time the
 // method is called. But it will retain its values between calls
 // so the initialization is done only the first time.
 //
 //Subsequent calls will use the same value.
 //
 static std::string metadata_string =
 "File \"" + this->GetBasename().string() + "\" (" +
 std::to_string(this->GetSize()) +
 " bytes):\n\tFull Path: " + this->GetPath().string() +
 "\n\tMIME Type: " + this->GetType() + "\n";
 return metadata_string;
 }

Using string addition (concatenation) is very expensive as you create a new string for each addition (copying the data from the previous result). Though the compiler will probably optimize that away.

I personally thing the better technique would be to use a std::stringstream.

 const std::string &File::StringifyMetadata() const noexcept
 {
 std::stringstream metadata;
 metadata << "File \"" 
 << this->GetBasename().string()
 << "\" ("
 << std::to_string(this->GetSize())
 << " bytes):\n\tFull Path: "
 << this->GetPath().string()
 << "\n\tMIME Type: "
 << this->GetType()
 << "\n";
 return metadata.str();
 }

Your mileage may vary. Use the more readable one for you (or your coding standard). If it causes a problem with efficiency, do timings and select the most effecient.

This is very good:

 std::ios_base::fmtflags saved_flags(output.flags());

Most people don't think about this.

Prefer to use the prefix ++ operator:

 for (std::size_t current_row = 0; current_row < max_rows;
 current_row++)
 ^^

Probably makes no difference for integer types. But in C++, code is often changed by changing the type. So by using the prefix version, you always use the most efficient version, and when the type changes, you don't need to change the code.

Question 3

> Using string addition (concatenation) is very expensive. Stringstreams are even more expensive, especially for a few concatenations. They're particularly bad in a multi threaded environment, because every new stream takes a global lock on the locale state.

Question 4

"Using string addition (concatenation) is very expensive as you create a new string for each addition (copying the data from the previous result)." - Not true thanks to rvalue references.

Question 5

"Using string addition (concatenation) is very expensive" - GCC is very good at optimizing std::string manipulation, Clang might be too. Even for long strings the compiler can optimize away multiple memory allocations and copies. Of course, it might still be the case that a std::stringstream is faster here, but don't assume anything: look at the assembly output and measure the actual performance.

Question 6

Wow, this is incredibly thorough. Do you mind explaining why in the const char* case I would want string_view over string? Also, what is the efficiency difference between prefixed ++ and suffixed ++? I was under the impression it was semantic, no?

Question 7

The advantage of using this-> is that your IDE will suggest autocompletion. That's enough of a reason (if the OP is using it), to keep it.

Loki Astari Loki Astari 97.7k5 gold badges126 silver badges341 bronze badges · Accepted Answer · 2024-12-22 07:31:16Z

Style

Using this-> is not needed (and I would consider it bad).

The problem is that you only need to use this syntax when you need to disambiguate variables (shadowed variables). The problem is the compiler will not warn you when you accidentally forget to add this->. But if you use good (distinct) variable names, the problem goes away. The compiler will also warn you about shadowed variables to help prevent the problem in the first place.

Code Review

Extraneous comments are worse than no comments:

// C++ standard I/O operations.
#include <iostream>
// C++ filesystem library.
#include <filesystem>
// C++ standard variable-sized array.
#include <vector>

In older code comment rot is a real problem. Comments should always have a purpose. Don't repeat stuff that the code already does.

I suppose this is OK because it ties in with the documentation system. But I don't particularly like it because it adds nothing to the code.

/**
 * @brief The Iridium namespace.
 */
namespace Iridium
{

This is older C++.

 typedef std::filesystem::path FilePath;
 typedef std::vector<std::uint8_t> FileContents;

The more modern approach is:

 using FilePath = std::filesystem::path;
 using FileContents = std::vector<std::uint8_t>;

These are fine:

 constexpr const char *jpeg_mime = "image/jpeg";
 constexpr const char *png_mime = "image/png";
 constexpr const char *config_mime = "text/config";
 constexpr const char *generic_mime = "application/octet-stream";

Again the more modern aproach would be:

 constexpr std::string_view jpeg_mime = "image/jpeg";
 constexpr std::string_view png_mime = "image/png";
 constexpr std::string_view config_mime = "text/config";
 constexpr std::string_view generic_mime = "application/octet-stream";

Here you are making a copy of the content.
Which is already in a container.

 inline std::string Stringify() const noexcept
 {
 return std::string(this->contents.begin(),
 this->contents.end());
 }

To prevent a copy, I would return a std::string_view (a read-only view of the data).

Dumps the old content and gets a new vector.

 this->contents = std::vector<std::uint8_t>(file_size);

Seems simpler to use:

 this->contents.resize(file_size);

This is a bug.

 const std::string &File::StringifyMetadata() const noexcept
 {
 // Using static here is wrong.
 // This means the variable is created the first time the
 // method is called. But it will retain its values between calls
 // so the initialization is done only the first time.
 //
 //Subsequent calls will use the same value.
 //
 static std::string metadata_string =
 "File \"" + this->GetBasename().string() + "\" (" +
 std::to_string(this->GetSize()) +
 " bytes):\n\tFull Path: " + this->GetPath().string() +
 "\n\tMIME Type: " + this->GetType() + "\n";
 return metadata_string;
 }

Using string addition (concatenation) is very expensive as you create a new string for each addition (copying the data from the previous result). Though the compiler will probably optimize that away.

I personally thing the better technique would be to use a std::stringstream.

 const std::string &File::StringifyMetadata() const noexcept
 {
 std::stringstream metadata;
 metadata << "File \"" 
 << this->GetBasename().string()
 << "\" ("
 << std::to_string(this->GetSize())
 << " bytes):\n\tFull Path: "
 << this->GetPath().string()
 << "\n\tMIME Type: "
 << this->GetType()
 << "\n";
 return metadata.str();
 }

Your mileage may vary. Use the more readable one for you (or your coding standard). If it causes a problem with efficiency, do timings and select the most effecient.

This is very good:

 std::ios_base::fmtflags saved_flags(output.flags());

Most people don't think about this.

Prefer to use the prefix ++ operator:

 for (std::size_t current_row = 0; current_row < max_rows;
 current_row++)
 ^^

Probably makes no difference for integer types. But in C++, code is often changed by changing the type. So by using the prefix version, you always use the most efficient version, and when the type changes, you don't need to change the code.

> Using string addition (concatenation) is very expensive. Stringstreams are even more expensive, especially for a few concatenations. They're particularly bad in a multi threaded environment, because every new stream takes a global lock on the locale state.
"Using string addition (concatenation) is very expensive as you create a new string for each addition (copying the data from the previous result)." - Not true thanks to rvalue references.
"Using string addition (concatenation) is very expensive" - GCC is very good at optimizing std::string manipulation, Clang might be too. Even for long strings the compiler can optimize away multiple memory allocations and copies. Of course, it might still be the case that a std::stringstream is faster here, but don't assume anything: look at the assembly output and measure the actual performance.
Wow, this is incredibly thorough. Do you mind explaining why in the const char* case I would want string_view over string? Also, what is the efficiency difference between prefixed ++ and suffixed ++? I was under the impression it was semantic, no?
The advantage of using this-> is that your IDE will suggest autocompletion. That's enough of a reason (if the OP is using it), to keep it.

Stack Exchange Network

C++20 Robust File Interface

1 Answer 1

Style

Code Review

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

C++20 Robust File Interface

1 Answer 1

Style

Code Review

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions