Code Review

# Not very fast

Your memcpy() implementation is not really any faster than a standard byte-by-byte copy. Even though you attempt to copy more bytes at a time, the limiting factor isn't actually the number of bytes you copy per instruction.

If you research the various memcpy() implementations for x86 targets, you will find a wealth of information about how to make copies faster. I think the simplest thing for you to do is to just use `rep movsb`.

# My own benchmarks

I ran your version against the following two versions: a straightforward byte-by-byte copy, and one that just uses `rep movsb`, which is highly optimized on modern processors:

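    #include <stddef.h>

    /* Naive byte-by-byte copy. */
    void *memcpy2(void *dst, const void *src, size_t n)
    {
        unsigned char *d = dst;
        const unsigned char *s = src;
        size_t i;

        for (i = 0; i < n; i++)
            d[i] = s[i];
        return dst;   /* memcpy() returns the original destination */
    }

    /* "rep movsb" copies (E/R)CX bytes from [(E/R)SI] to [(E/R)DI] and
       modifies all three registers, so all three must be in/out ("+")
       operands. The x86 ABIs guarantee the direction flag is clear here. */
    void *memcpy3(void *dst, const void *src, size_t n)
    {
        void *ret = dst;
        asm volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
        return ret;
    }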
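
Before timing a replacement, it helps to confirm it actually behaves like memcpy(), including the return value (memcpy() must return the original destination pointer, which a loop that returns the incremented `dst` gets wrong). The following is a hypothetical sketch, not part of the original benchmark: `check_copy`, the buffer sizes, and the 0xAA guard pattern are my own assumptions. It compares a candidate against the library memcpy() for every length up to 256 bytes and every source/destination misalignment from 0 to 7, and also checks that no bytes around the destination are touched.

```c
#include <stddef.h>
#include <string.h>

/* Same signature as memcpy() and the variants being compared. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

#define MAX_LEN 256   /* longest copy tested */
#define MAX_OFF 8     /* misalignments 0..7 for src and dst */
#define GUARD   16    /* untouched bytes on each side of the dst region */

/* Hypothetical helper: returns 0 if fn matches memcpy() for every
   length 0..max_len (capped at MAX_LEN) and every src/dst offset,
   checking the return value and the bytes around the destination;
   returns -1 on the first mismatch. */
static int check_copy(copy_fn fn, size_t max_len)
{
    static unsigned char src[MAX_LEN + MAX_OFF];
    static unsigned char dst[GUARD + MAX_OFF + MAX_LEN + GUARD];
    static unsigned char expect[sizeof dst];
    size_t i, len, soff, doff;

    if (max_len > MAX_LEN)
        max_len = MAX_LEN;
    for (i = 0; i < sizeof src; i++)
        src[i] = (unsigned char)(i * 7 + 3);   /* non-uniform pattern */

    for (soff = 0; soff < MAX_OFF; soff++)
        for (doff = 0; doff < MAX_OFF; doff++)
            for (len = 0; len <= max_len; len++) {
                unsigned char *d = dst + GUARD + doff;

                memset(dst, 0xAA, sizeof dst);
                memset(expect, 0xAA, sizeof expect);
                memcpy(expect + GUARD + doff, src + soff, len);

                if (fn(d, src + soff, len) != d)
                    return -1;          /* wrong return value */
                if (memcmp(dst, expect, sizeof dst) != 0)
                    return -1;          /* wrong bytes or overrun */
            }
    return 0;
}
```

Call it as `check_copy(my_copy, 256)`, where `my_copy` stands for whichever candidate you want to test; a result of -1 means the candidate differs from memcpy() for at least one length or offset.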

My results for copying 2 GB (32-bit host):

| Function              | Time    |
|-----------------------|---------|
| OP's function         | 3.74 s  |
| memcpy2 (naive)       | 3.74 s  |
| memcpy3 (`rep movsb`) | 2.96 s  |
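
The answer does not show the harness behind these numbers. As a hypothetical sketch, assuming a POSIX system with `clock_gettime()`, one way to time a copy function is to repeat a fixed-size copy until the total reaches the target volume. The name `time_copy` and the chunked approach are my assumptions, not the original setup; keep in mind that a chunk small enough to stay in cache measures cache bandwidth rather than main-memory bandwidth.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Same signature as memcpy() and the variants being compared. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Hypothetical harness: calls fn(dst, src, chunk) until at least `total`
   bytes have been copied and returns the elapsed wall-clock seconds,
   or -1.0 if chunk is 0 or allocation fails. */
static double time_copy(copy_fn fn, size_t chunk, size_t total)
{
    /* Calling through a volatile pointer stops the compiler from
       inlining the copy or merging the repeated identical calls. */
    copy_fn volatile call = fn;
    volatile unsigned char sink;
    unsigned char *src, *dst;
    struct timespec t0, t1;
    size_t done;

    if (chunk == 0)
        return -1.0;
    src = malloc(chunk);
    dst = malloc(chunk);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0x5A, chunk);   /* fault in the pages before timing */
    memset(dst, 0, chunk);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (done = 0; done < total; done += chunk)
        call(dst, src, chunk);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    sink = dst[chunk - 1];      /* read the result so the copies are used */
    (void)sink;
    free(src);
    free(dst);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

For example, `time_copy(memcpy, (size_t)1 << 20, (size_t)1 << 31)` copies 2 GB in 1 MB chunks. Converting the reported times to throughput, 2 GB / 3.74 s is about 0.53 GB/s and 2 GB / 2.96 s is about 0.68 GB/s, so the `rep movsb` version took roughly 21% less time.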

JS1