Here's my assignment:
Write a program that takes a filename as its only command line argument and prints the results to
std::out
formatted as follows:
- The first line of the output should be whether or not the checksum passed ("Checksum passed", "Checksum failed")
- If the checksum passed, the rest of the file will contain the parsed data. Print label value pairs separated by a colon for each field, one per line (all numbers should be represented as decimals).
An expected output file is provided for the first binary file to demonstrate format.
File structure (Little Endian):
size: 4 bytes entry count: 2 bytes --- entry --- first name length: 4 bytes first name: first name length bytes last name length: 4 bytes last name: last name length bytes flags: 1 byte * age: 1 byte * height: 1 byte ------------- zero padding: x bytes checksum: 4 bytes * = optional
Field descriptions:
size: Total number of bytes in the file
entry count: Number of entries in the file
entry
first name length: Number of characters in the following first name string since it is not NULL terminated
first name: Non-null terminated string
last name length: Number of characters in the following last name string since it is not NULL terminated
last name: Non-null terminated string
flags: Bit mask which indicates presence of either age or height fields.
- age field present
- height field present
age: Age in years. Optional field.
height: Height in inches. Optional field.
zero padding - The file is zero padded to ensure 4 byte alignment.
checksum: This value is calculated by summing every four bytes of the data (not including the actual checksum value)
FileParser.cpp
// FileParser.cpp : Defines the entry point for the console application.
//
#include<iostream>
#include <string>
#include <fstream>
#include <sstream>
void ParseFile(char* dataBuffer);
int main(int argc, char *argv[])
{
std::string inputFileName = argv[1];
std::ifstream inputFile(inputFileName, std::ios::binary | std::ios::ate);
if (inputFile.good())
{
//get File size
std::streamoff size = inputFile.tellg();
char *data = new char[size];
inputFile.seekg(0, std::ios::beg);
inputFile.read(data, size);
inputFile.close();
int checkSum = 0;
for (int i = 4; i <= size - 4; i = i + 4)
{
checkSum += (data[i - 1] << 24) |
(data[i - 2] << 16) |
(data[i - 3] << 8) |
(data[i - 4]);
}
int checksumfromFile = 0;
memcpy(&checksumfromFile, &data[size - 4], sizeof(int));
if (checkSum == checksumfromFile)
{
std::cout << "Checksum passed" << std::endl;
ParseFile(data);
}
else
std::cout << "Checksum Failed" << std::endl;
}
return 0;
}
void ParseFile(char* data)
{
unsigned char sizeinBytes = 4;
unsigned char count = 2;
unsigned char firstNameLength = 4;
unsigned char lastNameLength = 4;
unsigned char flags = 1;
unsigned char age_byte = 1;
unsigned char height_byte = 1;
unsigned char checksum = 4;
int offset = 0;
// size
int size;
memcpy(&size, &data[offset], sizeof(int));
std::cout << "size:" << size << std::endl;
offset += sizeinBytes;
//count of entries
short entryCount;
memcpy(&entryCount, &data[offset], sizeof(short));
std::cout << "entry count::" << entryCount << std::endl;
offset += count;
for (int i = 0; i < entryCount; i++)
{
//firstNameLength
int lengthofFirstName;
memcpy(&lengthofFirstName, &data[offset], sizeof(int));
std::cout << "first name length:" << lengthofFirstName << std::endl;
offset += firstNameLength;
//firstName
char* firstName = new char[lengthofFirstName];
memcpy(firstName, &data[offset], lengthofFirstName);
firstName[lengthofFirstName] = '0円';
std::cout << "first name:" << firstName << std::endl;
offset += lengthofFirstName;
// 4 bytes
int lengthofLastName;
memcpy(&lengthofLastName, &data[offset], sizeof(int));
std::cout << "last name length:" << lengthofLastName << std::endl;
offset += lastNameLength;
char* lastName = new char[lengthofLastName];
memcpy(lastName, &data[offset], lengthofLastName);
lastName[lengthofLastName] = '0円';
std::cout << "last name:" << lastName << std::endl;
offset += lengthofLastName;
char flag = data[offset];
std::cout << "flags:" << (int)flag << std::endl;
offset += flags;
char age;
char height;
if (flag == 3)
{
age = data[offset];
height = data[offset + 1];
offset += age_byte + height_byte;
std::cout << "age:" << (int)age << std::endl;
std::cout << "height:" << (int)height << std::endl;
}
else if (flag == 2)
{
height = data[offset];
offset += height_byte;
std::cout << "height:" << (int)height << std::endl;
}
else if (flag == 1)
{
age = data[offset];
offset += age_byte;
std::cout << "age:" << (int)age << std::endl;
}
}
int checkSum;
memcpy(&checkSum, &data[size - 4], sizeof(int));
std::cout << "checksum:" << checkSum << std::endl;
}
1 Answer 1
I like the idea of being able to checksum a file to see if it's been altered or corrupted. That can be very useful in a variety of situations! Here are some thoughts on how you could improve it.
Simplify
I find this loop a little hard to understand at first:
int checkSum = 0;
for (int i = 4; i <= size - 4; i = i + 4)
{
checkSum += (data[i - 1] << 24) |
(data[i - 2] << 16) |
(data[i - 3] << 8) |
(data[i - 4]);
}
Starting i
at 4 and then accessing data[ i - # ]
seems a lot harder to understand than starting i
at 0 and accessing data[ i + # ]
. I think this is the first time I've ever seen it done this way. It's not wrong, it's just different from what 99% of programmers would do, so it takes longer to understand. Note that your checksum technique assumes it's running on a little endian architecture. That's probably fine here, but it's worth noting that there are other architectures out there where this would produce incorrect results. A comment explicitly pointing that out would be sufficient here, in my opinion.
Careful with Assumptions
In getting the checksum out of the file you've made an assumption that's not necessarily true:
int checksumfromFile = 0;
memcpy(&checksumfromFile, &data[size - 4], sizeof(int));
This explicitly claims that sizeof(int)
is 4 bytes. That is likely to be true on common architectures today, but isn't guaranteed by the language's standard. However, the specification for this file format does say the checksum will be 4 bytes. So you need to ensure that you only attempt to copy 4 bytes from the file. I recommend using a sized type like int32_t
instead of int
since you know that you actually need 4 bytes:
int32_t checksumfromFile = 0;
memcpy(&checksumfromFile, &data[ size - sizeof(checksumfromFile) ], sizeof(checksumfromFile));
I also set the thing inside of sizeof()
to be the thing you're writing to. This will ensure you never write past the end of the variable you're copying data into.
Remove Unused Variables
The variable checksum
is unused in the ParseFile()
function. You should remove it. You should also turn up the warnings on your compiler to tell you about these sorts of issues.
Put Common Code into a Function
If you look at the code for reading in the first name and the last name, you'll see that it's largely the same. I recommend making either a single function to read a length and a string, or 2 functions – 1 to get the length and 1 to get the actual string:
int32_t getStringLength(const char* data, const size_t offset)
{
int32_t length = 0;
memcpy(&length, data [ offset ], sizeof(length);
return length;
}
char* copyString(const char* from, const int32_t length)
{
char* to = new char [ length + 1 ];
if (to != nullptr)
{
memcpy(to, from, length);
to [ length ] = '0円';
}
return to;
}
Then you can call it in 2 places like so:
for (int i = 0; i < entryCount; i++)
{
//firstNameLength
int32_t lengthofFirstName = getStringLength(data, offset);
std::cout << "first name length:" << lengthofFirstName << std::endl;
offset += firstNameLength;
//firstName
char* firstName = copyString(data, offset, lengthofFirstName);
if (firstName != nullptr)
{
std::cout << "first name:" << firstName << "\n";
}
offset += lengthofFirstName;
// ... etc.
Then you can do the same thing for the last name. I fixed a memory bug when writing the function, so if you use the function, it's fixed in both cases. You were allocating 1 byte too few for the name plus terminating NUL
character.
Naming
Most of your naming is good. That's rare! I would change a few things, though. First, you're using camelCase, but you're not actually capitalizing each word. You're leaving prepositions lowercase. I would capitalize them. It's not a huge deal, but is unexpected.
Next, the names firstNameLength
and lengthofFirstName
are too close and confusing. firstNameLength
is not the length of the first name. It's the length of the data that holds the length of the first name. Why not call it something like sizeBytes
or firstNameSizeBytes
or numBytesFirstNameSize
or something that explains how it differs from lengthofFirstName
? Same with the last name. Perhaps you could have a named constant at the top of the file that holds the size of each record element? Something like this:
const size_t kNumBytesFirstNameSize = 4;
const size_t kNumBytesLastNameSize = 4;
const size_t kNumBytesFlagsSize = 1;
const size_t kNumBytesChecksum = 4;
I would also add named constants for what the flags mean:
enum PersonFlag {
kAgeMask = 0x01,
kHeightMask = 0x02
}
Then you can check them via:
if (flag & kAgeMask)
{
// ...read in age...
}
if (flag & kHeightMask)
{
// ... read in height ...
}
Note that there's less code here. You only need 2 checks instead of 3.
tested it on a simple file and it works fine
glad to hear, having 2nd thoughts readinginputFile(...)...streamoff size = inputFile.tellg() ... new char[size]
(ate
was beyond the right margin of CR's code peephole...). \$\endgroup\$