I have solved a task which takes input in a unique markup language. In this language each element consists of a starting and closing tag. Only starting tags can have attributes. Each attribute has a corresponding name and value.
For example:
<tag-name attribute1-name = "Value" attribute2-name = "Value2" ... >
A closing tag follows this format:
</tag-name>
The tags may also be nested. For example:
<tag1 value = "HelloWorld">
<tag2 name = "Name1">
</tag2>
</tag1>
Attributes are referenced as:
tag1~value
tag1.tag2~name
The source code of the markup language is given in n
lines of input, the number of attribute references or queries is q
.
I am seeking any improvements to my code and any bad practices I may be using.
#include <vector>
#include <iostream>
using namespace std;
typedef vector < string > VS;
typedef vector < VS > VVS;
//Takes numberOfLines of input and splits the lines into strings
VS Split(int numberOfLines) {
VS myVec;
string x;
for (int i = 0; i < numberOfLines; i++) {
getline(cin, x);
string::size_type start = 0;
string::size_type end = 0;
while ((end = x.find(" ", start)) != string::npos) {
myVec.push_back(x.substr(start, end - start));
start = end + 1;
}
myVec.push_back(x.substr(start));
}
return myVec;
}
//removes unnecessary characters in this case they are: " ,
// > (if it is the final element of the tag), and "
string Remove(string s) {
string c = s;
c.erase(c.begin());
auto x = c.end() - 1;
if ( * x == '>') {
c.erase(x);
x = c.end() - 1;
}
c.erase(x);
return c;
}
int main() {
VS HRML, attributes, values, t, validTags;
VVS Tags, ValidAttributes, ValidValues;
int n, q;
cin >> n >> q;
HRML = Split(n);
//does the heavy lifting
for (int i = 0; i < HRML.size(); i++) {
string x = HRML[i];
//checks if x contains the beginning of the starting tag
if (x[0] == '<' && x[1] != '/') {
//checks if x contains the end of the starting tag
if (x[x.size() - 1] == '>' && x[1] != '/') {
ValidAttributes.push_back(attributes);
attributes.clear();
ValidValues.push_back(values);
values.clear();
}
auto c = x.end() - 1;
if ( * c == '>') {
x.erase(c);
}
x.erase(x.begin());
t.push_back(x);
Tags.push_back(t);
}
// checks if x contains the end of the starting tag
if (x[x.size() - 1] == '>' && x[1] != '/') {
ValidAttributes.push_back(attributes);
attributes.clear();
ValidValues.push_back(values);
values.clear();
}
//checks if x contains the ending tag
else if (x[1] == '/') {
x.erase(x.begin());
x.erase(x.begin());
x.erase(x.end() - 1);
for (int i = 0; i < t.size(); i++) {
if (x == t[i]) {
t.erase(t.begin() + i);
}
}
}
//checks to see if an attribute has been assigned a value
else if (x == "=") {
attributes.push_back(HRML[i - 1]);
values.push_back(Remove(HRML[i + 1]));
}
}
string x = "";
//makes valid(user-usable) tags from all the tags
// passed into vector<string> Tags
for (int i = 0; i < Tags.size(); i++) {
for (int j = 0; j < Tags[i].size(); j++) {
if (Tags[i].size() > 1) {
string begin = Tags[i][j] + '.';
x += begin;
if (j == (Tags[i].size() - 1)) {
x.erase(x.end() - 1);
validTags.push_back(x + '~');
x = "";
}
} else {
validTags.push_back(Tags[i][0] + '~');
}
}
}
//iterates through each query given by the user and checks if it is valid
for (int i = 0; i < q + 1; i++) {
string output = "";
if (i == 0) {
string x;
getline(cin, x);
} else {
string x;
getline(cin, x);
int c = 0;
for (int j = 0; j < validTags.size(); j++) {
for (int p = 0; p < ValidAttributes[c].size(); p++) {
if (x == validTags[j] + ValidAttributes[c][p]) {
output = ValidValues[c][p];
/*if a valid attribute reference has been
found then there is no need to check the rest
of validAttribute[c] */
goto endOfLoop;
} else {
output = "Not Found!";
}
}
if (c < (ValidAttributes.size() - 1)) {
c++;
}
}
endOfLoop:
cout << output << endl;
}
}
return 0;
}
1 Answer 1
Here are some things that may help you improve your code.
Don't abuse using namespace std
Putting using namespace std
at the top of every program is a bad habit that you'd do well to avoid.
Be careful with signed and unsigned
In multiple places in this code an int
i
with an unsigned size_t
returned from size()
. It would be better to declare i
to also be size_t
.
Use all of the required #include
s
The type std::string
is used but its declaration is in #include <string>
which is not actually in the list of includes.
Prefer using
over typedef
When you make more use of templates, you will likely encounter the reason many people prefer using
over typedef
in modern C++. So your declarations become:
using VS = std::vector<std::string>;
using VVS = std::vector<VS>;
See T.43 for details.
Use "range for
" and simplify your code
The code currently uses this:
for (int i = 0; i < HRML.size(); i++) {
string x = HRML[i];
There is a much simpler and more efficient way to do this:
for (auto x : HRML) {
This won't work without further modification to the details of the loop, but for help on that, see the next suggestion.
Use a state machine
The logic of this code could be expressed as a state machine. If that were done, one could process the stream "on the fly" character at a time with little difficulty.
goto
still considered harmful
Generally speaking, the use of goto
is not recommended. Using it as you have, for breaking out nested loops is about the only accepted use. See ES.76 for details. In this case, however, you can avoid it entirely as shown in the next suggestion.
Use appropriate data structures
The use of vectors of vectors of strings is not a very efficient structure for this program. I would suggest that using an unordered_map
would be a better choice and would allow you change the convoluted triple loop at the end of the program to this:
for(std::string query; q > 0 && std::getline(std::cin, query); --q) {
auto search = tagValue.find(query);
if (search == tagValue.end()) {
std::cout << "Not Found!\n";
} else {
std::cout << search->second << '\n';
}
}
Parse input carefully
On my machine, your posted code segfaulted and crashed when run with the sample input provided at the code challenge site. The reason is that this code is not reading input correctly. Specifically, after this line:
std::cin >> n >> q;
there is still a newline character in the input stream. Get rid of it like this:
constexpr std::size_t maxlinelen{200};
std::cin.ignore(maxlinelen, '\n');
The value of maxlinelen
is from the problem description, but generically one could use this:
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
Also, this line is not safe because it assumes that there are at least two characters in the string, which isn't guaranteed.
if (x[0] == '<' && x[1] != '/') {
goto
in your code. Most coding standards consider this bad just by itself. I would consider it bad only if you can't justify it. But there is no comment explaining why you need a goto. \$\endgroup\$