added 553 characters in body
Source Link
Andriy

Just tested latency from Java on my Core i5 2.8GHz: a single byte sent and received between 2 freshly spawned Java processes, without pinning them to specific CPU cores with taskset:

TCP - 25 microseconds
Named pipes - 15 microseconds

Now explicitly specifying core masks, like taskset 1 java Srv or taskset 2 java Cli:

TCP, same core: 30 microseconds
TCP, different cores: 22 microseconds
Named pipes, same core: 4-5 microseconds !!!!
Named pipes, different cores: 7-8 microseconds !!!!

So:

TCP overhead is visible
Scheduling overhead (or core caches?) is also a culprit
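For anyone who wants to reproduce the TCP number, here is a minimal sketch of a single-byte ping-pong over loopback. It is not the code I ran: the class and method names are made up for illustration, it measures a full round trip, and it runs the echo side as a thread in the same JVM rather than as a second process, so taskset pinning would have to be applied to a two-process variant.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class TcpPingPong {
    /** Average round-trip time in nanoseconds over `rounds` single-byte exchanges. */
    static long averageRoundTripNanos(int rounds) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            // Echo side: reads one byte at a time and sends it straight back.
            Thread echo = new Thread(() -> {
                try (Socket s = server.accept()) {
                    s.setTcpNoDelay(true);
                    InputStream in = s.getInputStream();
                    OutputStream out = s.getOutputStream();
                    int b;
                    while ((b = in.read()) != -1) {
                        out.write(b);
                        out.flush();
                    }
                } catch (IOException ignored) {
                    // The client went away; nothing left to echo.
                }
            });
            echo.setDaemon(true);
            echo.start();

            try (Socket client = new Socket(loopback, server.getLocalPort())) {
                client.setTcpNoDelay(true); // do not let Nagle's algorithm delay 1-byte writes
                InputStream in = client.getInputStream();
                OutputStream out = client.getOutputStream();
                long start = System.nanoTime();
                for (int i = 0; i < rounds; i++) {
                    out.write(5);
                    out.flush();
                    if (in.read() != 5) {
                        throw new IllegalStateException("unexpected reply");
                    }
                }
                return (System.nanoTime() - start) / rounds;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        averageRoundTripNanos(10_000); // warm-up so the JIT compiles the hot loop
        System.out.println("avg round trip, ns: " + averageRoundTripNanos(100_000));
    }
}
```

Expect different absolute numbers on other hardware and kernels; only the relative comparison between transports is meaningful.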

At the same time, Thread.sleep(0) (which, as strace shows, results in a single sched_yield() Linux kernel call) takes 0.3 microseconds - so named pipes still carry a lot of overhead even when both processes are scheduled on a single core
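A rough way to time Thread.sleep(0) yourself, as a sketch only (the class name is hypothetical, and what the call does under the hood may differ between JVM versions and operating systems, so check with strace on your own setup):

```java
final class SleepZeroCost {
    /** Average wall-clock cost of one Thread.sleep(0) call, in nanoseconds. */
    static double averageNanos(int iterations) throws InterruptedException {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            Thread.sleep(0);
        }
        return (System.nanoTime() - start) / (double) iterations;
    }

    public static void main(String[] args) throws InterruptedException {
        averageNanos(1_000_000); // warm-up
        System.out.printf("Thread.sleep(0): %.1f ns per call%n", averageNanos(10_000_000));
    }
}
```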

Some shared memory measurement: September 14, 2009 – Solace Systems announced today that its Unified Messaging Platform API can achieve an average latency of less than 700 nanoseconds using a shared memory transport. http://solacesystems.com/news/fastest-ipc-messaging/

P.S. - tried shared memory the next day in the form of memory-mapped files. If busy waiting is acceptable, we can reduce latency to 0.3 microseconds for passing a single byte with code like this:

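import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Server side; place inside a method declared "throws Exception"
MappedByteBuffer mem =
 new RandomAccessFile("/tmp/mapped.txt", "rw").getChannel()
 .map(FileChannel.MapMode.READ_WRITE, 0, 1);
while(true){
 while(mem.get(0)!=5) Thread.sleep(0); // waiting for client request
 mem.put(0, (byte)10); // sending the reply
}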

Notes: Thread.sleep(0) is needed so that the 2 processes can see each other's changes (I don't know of another way yet). If the 2 processes are forced onto the same core with taskset, the latency becomes 1.5 microseconds - that's the context switch delay
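The snippet above is only the server half. A sketch of how both halves could fit together follows; the client loop, the class name, the "server" argument and the REQUEST/REPLY constant names are my assumptions for illustration, while the byte values 5 and 10 and the file path come from the snippet above.

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

final class MappedPingPong {
    static final byte REQUEST = 5;
    static final byte REPLY = 10;

    /** Maps the first byte of the file; the mapping stays valid after the file is closed. */
    static MappedByteBuffer map(String path) throws Exception {
        try (RandomAccessFile file = new RandomAccessFile(path, "rw")) {
            return file.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, 1);
        }
    }

    /** Server loop: wait for REQUEST, answer with REPLY, forever. */
    static void serve(MappedByteBuffer mem) throws InterruptedException {
        while (true) {
            while (mem.get(0) != REQUEST) Thread.sleep(0);
            mem.put(0, REPLY);
        }
    }

    /** Client loop: write REQUEST, poll until REPLY; returns average nanoseconds per round trip. */
    static long averageRoundTripNanos(MappedByteBuffer mem, int rounds) throws InterruptedException {
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            mem.put(0, REQUEST);
            while (mem.get(0) != REPLY) Thread.sleep(0);
        }
        return (System.nanoTime() - start) / rounds;
    }

    public static void main(String[] args) throws Exception {
        MappedByteBuffer mem = map("/tmp/mapped.txt");
        if (args.length > 0 && args[0].equals("server")) {
            serve(mem);
        } else {
            System.out.println("avg round trip, ns: " + averageRoundTripNanos(mem, 1_000_000));
        }
    }
}
```

Start one JVM with the argument server and a second one without arguments, optionally under taskset 1 and taskset 2 to pin them to different cores. The client blocks until a server is running.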

P.P.S - and 0.3 microseconds is a good number! For comparison, the following code, which does nothing but a primitive string concatenation, takes 0.1 microseconds:

int j=123456789;
String ret = "my-record-key-" + j + "-in-db";
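If you want to time the concatenation yourself, a naive loop tends to get optimized away by the JIT, so the result has to be consumed somehow. A rough sketch (class and method names are hypothetical; a harness such as JMH would be more trustworthy than this):

```java
final class ConcatCost {
    /** The concatenation from the snippet above, as a method. */
    static String key(int j) {
        return "my-record-key-" + j + "-in-db";
    }

    /** Average nanoseconds per concatenation over `iterations` calls. */
    static double averageNanos(int iterations) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            // Vary the input slightly so the result cannot be treated as a constant.
            sink += key(123456789 + (i & 1)).length();
        }
        long elapsed = System.nanoTime() - start;
        // Use the accumulated value so the loop body is not dead code.
        if (sink == 0) System.out.println("unexpected empty result");
        return elapsed / (double) iterations;
    }

    public static void main(String[] args) {
        averageNanos(1_000_000); // warm-up
        System.out.printf("concatenation: %.1f ns%n", averageNanos(10_000_000));
    }
}
```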

P.P.P.S - hope this is not too far off-topic, but I finally tried replacing Thread.sleep(0) with incrementing a static volatile int variable (the JVM happens to flush CPU caches when doing so) and obtained - record! - 72 nanoseconds latency for Java-to-Java inter-process communication!

When forced onto the same CPU core, however, the volatile-incrementing JVMs never yield control to each other, thus producing exactly 10 millisecond latency - the Linux time quantum seems to be 5ms... So this should be used only if there is a spare core - otherwise sleep(0) is safer.
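A sketch of what the volatile-increment variant of the wait loop could look like. The field name fence and the class and method names are made up for illustration; the field carries no data and is written only for its side effect. This relies on behaviour observed on one JVM, not on anything the Java memory model promises for mapped buffers.

```java
import java.nio.ByteBuffer;

final class SpinWait {
    // Dummy field: incremented only to get the volatile read/write on every poll.
    static volatile int fence;

    /** Busy-waits until mem[0] == expected; returns how many polls saw another value. */
    static long spinUntil(ByteBuffer mem, byte expected) {
        long misses = 0;
        while (mem.get(0) != expected) {
            fence++; // replaces Thread.sleep(0); never yields the CPU
            misses++;
        }
        return misses;
    }

    /** Server loop from the mapped-file example with sleep(0) swapped for the spin. */
    static void serve(ByteBuffer mem) {
        while (true) {
            spinUntil(mem, (byte) 5); // wait for the client request
            mem.put(0, (byte) 10);    // send the reply
        }
    }
}
```

Pass the MappedByteBuffer from the earlier snippet as mem. Because the loop never yields, each process burns a full core while waiting, which is why this only makes sense with a spare core per process.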

