Parsing string and extracting meaningful information

Question 1

A message can contain an user name or group name, the user name has to be 2 character long and should contain uppercase ASCII characters only, similarly group name can be 3-10 characters long. There can be any number of these entities with the optional comment at the end.

The message would have the following format:

amount|<user|group> <,user|group>* opt_comment

Example:

Suppose there is a group COOKIEMONSTERS with users CD, RM:

Note: There can be multiple users in a group.

Valid messages

AB+2, BC, COOKIEMONSTERS "Dinner"
AB, BC
AB, BC, FOODIE
AB, BC, FOODIE+4

Invalid messages

A, BC, COOKIEMONSTERS "Dinner"
A, BC, "Dinner" COOKIEMONSTERS
A, BC, SOMEVERYVERYLONGNAME
1+A, BC

I have to parse the message and if there is group name present then I have to expand it into its constituent user names (assume that I have some service for it) hence, given the group constraint I chose not to follow the regex path.

So, given the above message I need the following information:

['AB+2', 'BC', 'CD', 'RM'] and "Dinner" if comment present

where CD and RM are members of the group COOKIEMONSTERS.

I have written some very ugly code and want to know if some cleaner approach is possible.

Code:

const USER_LENGTH = 2; 
function _parseExpenseMessage(message, dependencies) {
 const parseInfo = parse(message);
 const mul = [];
 const add = [];
 const handles = [];
 let idx = 0;
 for (let handle of parseInfo.handles) {
 handles.push(handle.substring(0, USERNAME_LENGTH));
 idx = handle.indexOf('*');
 if (idx !== -1) {
 mul.push(roundToTwo(handle[idx + 1]));
 } else {
 mul.push(1);
 }
 idx = handle.indexOf('+');
 if (idx !== -1) {
 add.push(roundToTwo(handle[idx + 1]));
 } else {
 add.push(0);
 }
 }
 let amount = +parseInfo.amount;
 let each = (amount - add.reduce((x, sum) => x + sum, 0)) / mul.reduce((x, sum) => x + sum, 0);
 const res = [];
 for (let i = 0; i < handles.length; i++) {
 res.push({
 target: handles[i],
 amount: each * mul[i] + add[i]
 });
 }
 return {expenses: res, comment: parseInfo.comment};
};
function parse(text) {
 const duplicates = new Set();
 const handles = [];
 const amount = text.split('|')[0]; 
 const message = text.split('|')[1];
 let name = '';
 let expr = '';
 let hasComment = false;
 let i = 0;
 let len = message.length;
 for (; i < len; i++) {
 var char = message.charAt(i);
 if (isValidCharacter(char)) {
 name += char;
 } else if (isExpression(char)) {
 expr += char;
 }
 if (isSeparator(char) || i === message.length - 1) {
 if (char === '"') {
 if (message.lastIndexOf('"') === message.length - 1) {
 hasComment = true;
 break;
 } else {
 throw new Error('Invalid comment format');
 }
 }
 if (name.length === USERNAME_LENGTH) {
 handles.push(name + expr);
 duplicates.add(name);
 } else if (name.length > USERNAME_LENGTH) {
 let group = someService(name);
 for (let user of group.users()) {
 let name = user.getName();
 duplicates.add(user);
 handles.push(user + expr);
 }
 }
 name = '';
 expr = '';
 }
 }
 let comment = '';
 if (hasComment) {
 comment = message.substr(i, len).replace(/"/g, '');
 }
 // Duplicate entry not allowed
 if (duplicates.size < handles.length) {
 throw new Error('Duplicate entry');
 }
 return {amount: amount, handles: handles, comment: comment};
}
function isValidCharacter(char) {
 return (char >= 'A' && char <= 'Z');
}
function isExpression (char) {
 return (char === '*' || char === '+' || (char >= '0' && char <= '9'));
}
function isSeparator (char) {
 return (char === ' ' || char === ',' || char === '"');
}
function roundToTwo(num) {
 return Math.round(+(num) * 100) / 100;
}

Note:

I also need to check that there are not duplicate user in the message above.

Update 1:

The user name can contain expression with the limited semantics, eg: AB+1 or AB*2 or AB+1*2 that's it. There can be only two operators + and *.

Question 2

can you clarify with more examples? eg, it's not clear to me where CD and RM are coming from in your example? Also what is the +2 for?

Question 3

@Jonah its part of some large module, having said that the question is in itself self sufficient. I gave the input and expected output. I can add some more examples though.

Question 4

@200_success do you still think there is not enough information?

Question 5

I see that you have made an effort to explain. I have to admit that I'm still confused, though. Unilateral closure by a moderator is no longer justified, so I have reopened the question to let the community decide.

Question 6

Spec still isn't clear. What does the +n suffix mean? Are there always exactly two user names? If not, then AB, CD is ambiguous - CD could be a group. So I assume it's always two usernames. But then what's the purpose of the 2nd valid example, if the idea is to associate users with a group and no group is specified?

Question 7

Much like the other commenters on your question, I don't quite understand your specs. So this will be kind of a "blind" review of the code, focusing on technique, and largely disregarding the overall approach and algorithm.

`let` and `var`

You use let but not enough. For example, in this code var char would be better as let char:

 for (; i < len; i++) {
 var char = message.charAt(i);
 if (isValidCharacter(char)) {
 name += char;
 } else if (isExpression(char)) {
 expr += char;
 }

Reserved words

char is a reserved word in JavaScript. Notice that its syntax highlighting in the code in your question is a bit strange. That was a clue.

Repeated splits

Here the same split is performed twice:

 const amount = text.split('|')[0]; 
 const message = text.split('|')[1];

Handling duplicate users

The parse function parses user and expr strings, collecting user + expr in a list of handles, and checks for duplicate users by putting them in a duplicates set.

First of all, the duplicates is unfortunate for a set, which by definition will only contain distinct values.

But my main objection here is that you check for duplicates near the end of the function, after the parsing has completed. It would be better to check at the point when you insert a user into a set, if it already exists in the set then raise the error immediately, there's no need to continue with parsing.

Setting the `comment`

In the parse function, instead of declaring and setting comment at the end, it would be better to set it at the point where you set hasComment = true. The benefit will be that you can remove the hasComment flag variable, and declare the i loop variable within the scope of the loop.

What is `_parseExpenseMessage` for?

Since the function name starts with an _ character, I assumed it might be an indication of some sort of "private" function, used by other functions, so I started reviewing the parse function first, in an attempt to explore the code through its public interface. But then, it seems the _parseExpenseMessage function is not referenced anywhere. And the code in it doesn't resemble anything you've described in the description, so I'm wondering if it's used at all.

Term ordering

I liked very much a tip from the book Code Complete to write the terms of a range condition in increasing order of the values, when applicable. So instead of this:

 return (char >= 'A' && char <= 'Z');

I find it easier to read when written like this:

 return ('A' <= char && char <= 'Z');

This appeals to me also because it's one step away from Python's cool ... <= ... <= ... operator, and I have secret hopes that other languages may implement this feature too. When the terms are in sorted order, the switch to a more modern (future) writing style will be just one step away. (Yeah right, I'll dream on.)

I would apply the same logic to the other similar functions too.

Question 8

Actually the problem is from a interview coding problem hence I couldn't reveal much details. Having said that I agree that I failed to explain the problem properly because parsing the message is part of the big problem. So, let me give you quick hints, the application is a command line tool to manage expenses, you can register expense with the command AB: amount|BC,CD,PIZZABUDDYS opt_comment here AB is the spender and BC and CD are users names and PIZZABUDDY can be a alias for multiple users and application requires that there cannot be duplicate user names on the right side.

Question 9

I have designed the application is a decoupled manner and my core just accepts the user names so I have to parse the message upfront before calling the core logic. Now, one important variation is that the message can be something like AB: 40.00|BC+1,CD*2,PIZZABUDDYS+1 opt_comment which means that money would be distributed equally but BC would pay 1 dollar extra CD would pay double and users in PIZZABUDDY would each pay 1 dollar extra. So, my main pain point was to somehow parse the message and extract these meaningful information in the most efficient way, makes sense now?

Question 10

@CodeYogi: It would (have) helped to have that motivation/description of "expressions" in the question. Note that Vogel612 is right in rolling back even a single underscore: edits that improve the question are welcome. They never may invalidate answers - in contrast, comments may have been given hoping them to become dispensable. (Feeling with anyone confronted with a haphazard requirements specification (interview or not) - I consider smooth handling a hard professional skill.)

janos janos 113k15 gold badges154 silver badges396 bronze badges · Accepted Answer · 2016-12-26 18:48:54Z

Much like the other commenters on your question, I don't quite understand your specs. So this will be kind of a "blind" review of the code, focusing on technique, and largely disregarding the overall approach and algorithm.

`let` and `var`

You use let but not enough. For example, in this code var char would be better as let char:

 for (; i < len; i++) {
 var char = message.charAt(i);
 if (isValidCharacter(char)) {
 name += char;
 } else if (isExpression(char)) {
 expr += char;
 }

Reserved words

char is a reserved word in JavaScript. Notice that its syntax highlighting in the code in your question is a bit strange. That was a clue.

Repeated splits

Here the same split is performed twice:

 const amount = text.split('|')[0]; 
 const message = text.split('|')[1];

Handling duplicate users

The parse function parses user and expr strings, collecting user + expr in a list of handles, and checks for duplicate users by putting them in a duplicates set.

First of all, the duplicates is unfortunate for a set, which by definition will only contain distinct values.

But my main objection here is that you check for duplicates near the end of the function, after the parsing has completed. It would be better to check at the point when you insert a user into a set, if it already exists in the set then raise the error immediately, there's no need to continue with parsing.

Setting the `comment`

In the parse function, instead of declaring and setting comment at the end, it would be better to set it at the point where you set hasComment = true. The benefit will be that you can remove the hasComment flag variable, and declare the i loop variable within the scope of the loop.

What is `_parseExpenseMessage` for?

Since the function name starts with an _ character, I assumed it might be an indication of some sort of "private" function, used by other functions, so I started reviewing the parse function first, in an attempt to explore the code through its public interface. But then, it seems the _parseExpenseMessage function is not referenced anywhere. And the code in it doesn't resemble anything you've described in the description, so I'm wondering if it's used at all.

Term ordering

I liked very much a tip from the book Code Complete to write the terms of a range condition in increasing order of the values, when applicable. So instead of this:

 return (char >= 'A' && char <= 'Z');

I find it easier to read when written like this:

 return ('A' <= char && char <= 'Z');

This appeals to me also because it's one step away from Python's cool ... <= ... <= ... operator, and I have secret hopes that other languages may implement this feature too. When the terms are in sorted order, the switch to a more modern (future) writing style will be just one step away. (Yeah right, I'll dream on.)

I would apply the same logic to the other similar functions too.

Actually the problem is from a interview coding problem hence I couldn't reveal much details. Having said that I agree that I failed to explain the problem properly because parsing the message is part of the big problem. So, let me give you quick hints, the application is a command line tool to manage expenses, you can register expense with the command AB: amount|BC,CD,PIZZABUDDYS opt_comment here AB is the spender and BC and CD are users names and PIZZABUDDY can be a alias for multiple users and application requires that there cannot be duplicate user names on the right side.
I have designed the application is a decoupled manner and my core just accepts the user names so I have to parse the message upfront before calling the core logic. Now, one important variation is that the message can be something like AB: 40.00|BC+1,CD*2,PIZZABUDDYS+1 opt_comment which means that money would be distributed equally but BC would pay 1 dollar extra CD would pay double and users in PIZZABUDDY would each pay 1 dollar extra. So, my main pain point was to somehow parse the message and extract these meaningful information in the most efficient way, makes sense now?
@CodeYogi: It would (have) helped to have that motivation/description of "expressions" in the question. Note that Vogel612 is right in rolling back even a single underscore: edits that improve the question are welcome. They never may invalidate answers - in contrast, comments may have been given hoping them to become dispensable. (Feeling with anyone confronted with a haphazard requirements specification (interview or not) - I consider smooth handling a hard professional skill.)

Stack Exchange Network

Parsing string and extracting meaningful information

1 Answer 1

`let` and `var`

Reserved words

Repeated splits

Handling duplicate users

Setting the `comment`

What is `_parseExpenseMessage` for?

Term ordering

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Parsing string and extracting meaningful information

1 Answer 1

let and var

Reserved words

Repeated splits

Handling duplicate users

Setting the comment

What is _parseExpenseMessage for?

Term ordering

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

`let` and `var`

Setting the `comment`

What is `_parseExpenseMessage` for?