I wanted to write a function to generate only unpronounceable words. I realised afterwards that the same thing could probably be achieved by just taking out the vowels. Regardless, I feel that there is probably a better way to iterate through the syllable checking.
function unpronouncable(length) {
var twoLetters = ['aa','ae','ag','ai','ar','ao','as','aw','ay','ba','da','de','ed','ef','eh','ei','em','en','er','et','ex','fa','fe','fu','gi','go','hm','ho','so','id','ie','jo','ka','ki','ku','lo','mi','mm','mu','na','ne','nu','od','oe','oi','om','op','os','os','oy','pe','pi','qi','re','se','ta','ti','ut','wo','xi','xi','ya','yu','za','zo','co','my','qu','ge','ro','ra','xa','wa'];
var threeLetters = ['thm','css','dvd','buh','bad','rue','cun','pub','puk','pul'];
var letters = "abcdefghijklmnopqrstuvwxyz";
var string = "";
for (var i = 0; i < length; i += 1) {
string += letters[Math.floor(Math.random() * letters.length)];
}
function generate() {
for (var j = 0; j < string.length; j += 1) {
var test = string.slice(j,j+2);
for (var k = 0; k < twoLetters.length; k += 1) {
if (twoLetters[k] === test) {
console.log('Has two letter syllable');
return unpronouncable(length);
}
}
}
for (var j = 0; j < string.length; j += 1) {
var test = string.slice(j,j+3);
for (var k = 0; k < twoLetters.length; k += 1) {
if (threeLetters[k] === test) {
console.log('Has three letter syllable');
return unpronouncable(length);
}
}
}
return string;
}
return generate();
}
console.log(unpronouncable(8));
-
\$\begingroup\$ Why not generate words based on the Sonority hierarchy? Reverse the hierarchy and they'll be unpronounceable in most langauges. \$\endgroup\$curiousdannii– curiousdannii2014年08月30日 02:53:55 +00:00Commented Aug 30, 2014 at 2:53
3 Answers 3
Copy Pasting
If you are copy-pasting code, it is a sign that you should write it differently (your two for
loops are quite obviously copy pasted).
Bug
I'm pretty sure that here:
for (var k = 0; k < twoLetters.length; k += 1) {
if (threeLetters[k] === test) {
you meant to change it to k < threeLetters.length;
.
Solution
Instead of copy-pasting the for
loop code, extract it to a function:
function generateHelper(string, array, charCount) {
for (var j = 0; j < string.length; j += 1) {
var test = string.slice(j,j+charCount);
for (var k = 0; k < array.length; k += 1) {
if (array[k] === test) {
console.log('Has ' + charCount + ' letter syllable');
return unpronouncable(length);
}
}
}
}
Coding Style
i++
It is customary to write j++
instead of j += 1
in loops.
loop variable
It is also customary to use i
as primary loop variable and j
as secondary variable in case the loop is nested inside another loop. As your j-loops in generate
are not nested, I would use i
for them, and then j
instead of k
for the nested loop.
Call of return generate();
return
should generally be at the end of a function, but in this case, I would pull it up in front of the generate
function definition. It is easier to overlook at the bottom.
But the whole generate
method seems unnecessary. Couldn't you just remove these lines:
function generate() {
}
return generate();
I don't mean the code inside the function, just the function definition and the function call.
General Approach
You already mentioned that this is probably not the best approach, so I'm not going to say too much about it. Generally, I would probably go with an iterative solution as most of the time they are easier to understand and faster as well.
An example of an iterative approach might look like this:
function unpronouncable(length) {
// filter these out
var twoLetters = ['aa', 'ae', 'ag', 'ai', 'ar', 'ao', 'as', 'aw', 'ay', 'ba', 'da', 'de', 'ed', 'ef', 'eh', 'ei', 'em', 'en', 'er', 'et', 'ex', 'fa', 'fe', 'fu', 'gi', 'go', 'hm', 'ho', 'so', 'id', 'ie', 'jo', 'ka', 'ki', 'ku', 'lo', 'mi', 'mm', 'mu', 'na', 'ne', 'nu', 'od', 'oe', 'oi', 'om', 'op', 'os', 'os', 'oy', 'pe', 'pi', 'qi', 're', 'se', 'ta', 'ti', 'ut', 'wo', 'xi', 'xi', 'ya', 'yu', 'za', 'zo', 'co', 'my', 'qu', 'ge', 'ro', 'ra', 'xa', 'wa'];
var threeLetters = ['thm', 'css', 'dvd', 'buh', 'bad', 'rue', 'cun', 'pub', 'puk', 'pul'];
// generate a new string until one is found that does not contain the filters
var generatedString = "";
var done = false;
while (!done) {
generatedString = generate(length);
done = !contains(generatedString, twoLetters, 2) && !contains(generatedString, threeLetters, 3);
}
return generatedString;
// generates a random string with given length
function generate(length) {
var letters = "abcdefghijklmnopqrstuvwxyz";
var string = "";
for (var i = 0; i < length; i++) {
string += letters[Math.floor(Math.random() * letters.length)];
}
return string;
}
// returns true if string contains any characters in the stringArray.
function contains(string, stringArray, charCount) {
for (var k = 0; k < stringArray.length; k++) {
if (string.indexOf(stringArray[k]) > -1) {
console.log('Has ' + charCount + ' letter syllable');
return true;
}
}
return false;
}
}
console.log(unpronouncable(8));
You could also replace the done = [...]
assignment with two if
statements, which might be easier to read.
Also: your approach with iterating over the string and using splice
is quite complicated. Just use string.indexOf(stringArray[k]) > -1;
to check if a string contains another string.
And just out of curiosity: how would you pronounce css
or dvd
?
-
\$\begingroup\$ I was thinking of those acronyms as things that aid memory, so avoiding them would make the output as random as possible. My idea was to come up with a name that nobody would ever be able to say or remember. Could you comment on how I might use an iterative approach? \$\endgroup\$Tom– Tom2014年08月29日 17:56:31 +00:00Commented Aug 29, 2014 at 17:56
-
\$\begingroup\$ @user759 ok, so it's more
unrememberable
thanunpronouncable
. And I updated my answer for an iterative approach. \$\endgroup\$tim– tim2014年08月29日 18:12:28 +00:00Commented Aug 29, 2014 at 18:12 -
\$\begingroup\$ Loop variables don't have to be i then j. Use whatever letter feel rights. Use r and c when dealing with rows and columns or x and y if they have to do with coordinates. Try to use the same index variables on the same collection. \$\endgroup\$Florian F– Florian F2014年08月29日 21:30:23 +00:00Commented Aug 29, 2014 at 21:30
-
\$\begingroup\$ @FlorianF well, obviously they don't have to be. but if there isn't a good reason (like with your two examples),
i
andj
are standard, and usingj
andk
without any reason is odd and can lead to confusion and harder to read code (to some degree; I'm not saying that it is a major mistake, just that it is against common style). \$\endgroup\$tim– tim2014年08月29日 21:41:26 +00:00Commented Aug 29, 2014 at 21:41
An efficient approach could be to build a data structure which will let you look at each letter in turn and test if following letters make the word pronounceable.
That could get complex, but fortunately you can get JavaScript to do this (or something like it) for you, using a regular expression. I'd write it like this:
var twoLetters = ['aa','ae','ag','ai','ar','ao','as','aw','ay','ba','da','de','ed','ef','eh','ei','em','en','er','et','ex','fa','fe','fu','gi','go','hm','ho','so','id','ie','jo','ka','ki','ku','lo','mi','mm','mu','na','ne','nu','od','oe','oi','om','op','os','os','oy','pe','pi','qi','re','se','ta','ti','ut','wo','xi','xi','ya','yu','za','zo','co','my','qu','ge','ro','ra','xa','wa'];
var threeLetters = ['thm','css','dvd','buh','bad','rue','cun','pub','puk','pul'];
var pronounceable = new RegExp(twoLetters.concat(threeLetters).join('|'));
function random_word(length) {
var letters = "abcdefghijklmnopqrstuvwxyz";
var string = "";
for (var i = 0; i < length; i++) {
string += letters[Math.floor(Math.random() * letters.length)];
}
return string;
}
function generate_unpronounceable(length) {
var word;
while (true) {
word = random_word(length);
if (! pronounceable.test(word)) {
return word;
}
}
}
To start with, unpronouncable
should be spelled unpronounceable
, or better yet, generate_unpronounceable()
.
I'm not convinced that this is a valid strategy. If it is a valid approach, then your lists of pronounceable syllables is incomplete. For example, unpronouncable(4)
is capable of returning "test"
, which I consider to be a pronounceable word.
Your two-letter combinations are listed almost alphabetically, which makes the list hard to maintain. For example, "so"
is out of place, "os"
and "xi"
are listed twice, and eight elements are haphazardly added at the end. The three-letter list doesn't appear to be in any recognizable order.
When you reject a candidate word, you use recursion to retry. In JavaScript, you should avoid using recursion when a simple while
loop will do — especially when there is no bound to the depth of the recursion. The code should be reorganized along these lines:
function generate_unpronounceable(length) {
var LETTERS = "abcdefghijklmnopqrstuvwxyz";
do {
var word = "";
for (var i = 0; i < length; i++) {
word += LETTERS[Math.floor(Math.random() * LETTERS.length)];
}
} while (is_pronounceable(word));
return word;
}
function is_prounounceable(word) {
...
}
Decomposing the problem into generate_unpronounceable()
and is_pronounceable()
is also better because is_pronounceable()
can be unit-tested more easily.
-
\$\begingroup\$ I was originally using recursion, but then if there is a match and I break out of the test loop, I'm still stuck in the outer for loop and the next three letter syllable one. Is there a way to avoid that? \$\endgroup\$Tom– Tom2014年08月29日 20:24:14 +00:00Commented Aug 29, 2014 at 20:24