I have implemented a C code in order to answer an interview question. I would be appreciated if someone would improve the code. Task is an easy, however I want to achieve best efficiency in terms of readability and simplicity. Pardon my English again.
- Sorry that I have just printed out the result.
The edit distance between two strings refers to the minimum number of character insertions, deletions, and substitutions required to change one string to the other. For example, the edit distance between "kitten" and "sitting" is three: substitute the "k" for "s", substitute the "e" for "i", and append a "g".
Given two strings, compute the edit distance between them.
CODE:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
unsigned int count_edit_dist(const char list1[],
const char list2[])
{
unsigned int count=0;
size_t len_str1 = strlen(list1);
size_t len_str2 = strlen(list2);
for(int i=0; list1[i]!='0円'; i++)
{
if(list2[i]=='0円') //means len(list2) < len(list1)
{
count += (unsigned int)(len_str1 - len_str2);
return count;
}
if(list1[i] != list2[i])
count++;
}
//reacing here means len(list1) <= len(list2)
count += (unsigned int)(len_str2 - len_str1);
return count;
}
int main(void)
{
unsigned int count;
char list1[] = "category";
char list2[] = "caterirrrr";
count = count_edit_dist(list1, list2);
printf("edit distance between strings: %u\n", count);
}
-
\$\begingroup\$ the program gives edit distance as 7 b/w 'sunday' and 'saturday' \$\endgroup\$hjpotter92– hjpotter922021年10月25日 07:59:48 +00:00Commented Oct 25, 2021 at 7:59
2 Answers 2
Minimum edit distance
count_edit_dist()
does return a value, yet it is certainly not the minimum possible edit distance as defined. To compute the minimum, a fair amount more code is needed.
Small stuff
unsigned
vs. size_t
vs. int
For practical uses, makes no difference, yet code would avoid casting by using size_t
and making the printf()
format strings agree. size_t
also correctly handles extreme cases.
// unsigned int count=0;
size_t count=0;
size_t
is the best full range type for array indexing:
// for(int i=0; list1[i]!='0円'; i++)
for(size_t i=0; list1[i]!='0円'; i++)
Declare when needed
Instead of defining variables far from their first use ...
unsigned int count; ... count = count_edit_dist(list1, list2);
... consider defining when needed:
...
unsigned int count = count_edit_dist(list1, list2);
Spell check
Run spell checker.
// vv
//reacing here means len(list1) <= len(list2)
//reaching
Consider a more informative output
printf("Edit distance between strings \"%s\" and \"%s\": %u\n", list1, list2, count);
Testing with some form of !
I find alternatives that avoid a form of negation easier to follow. Yet this is a style issue - best to code to your group's style guide.
// for(int i=0; list1[i]!='0円'; i++)
for (int i=0; list1[i]; i++)
Good formatting
Good use of
const
inunsigned int count_edit_dist(const char list1[], const char list2[])
Style: I'd use
unsigned
rather thanunsigned int
.
Running strlen()
at the beginning is unecessary and can be replaced with a simpler while
loop.
// A,B are strings
unsigned int count_edit_dist(const char *A, const char *B)
{
unsigned count=0;
// a,b are pointers to current letters of A,B
const char *a = A;
const char *b = B;
// while we have characters from either
while (*a||*b) {
// add to count i different
if (*a!=*b) ++count;
// increment letter if not at end of string
if (*a) ++a;
if (*b) ++b;
}
return count;
}