
Thursday, June 14, 2012

Network Management at EMC World 2012

[EMC World 2012 Man - courtesy: computerworld]


Abstract:
EMC purchased network management vendor SMARTS, with its InCharge suite, a number of years ago, rebranding the suite as Ionix. EMC then purchased Voyence, rebranding it as NCM (Network Configuration Manager). After EMC World 2012, EMC completed the acquisition of Watch4Net APG (Advanced Performance Grapher). The suite of these platforms is now being rolled into a single new brand called EMC IT Operations Intelligence. EMC World 2012 was poised to advertise the new branding in a significant way.
Result:
EMC World 2012 in Las Vegas, Nevada, was unfortunately pretty uneventful for service providers. Why was it uneventful?

The labs for EMC IT Operations Intelligence did not function. There were plenty of other labs which did function, but not the Network Management labs. EMC World 2012 was a sure "shot-in-the-head" for demonstrating, to service providers, the benefits of running EMC Network Management tools in a VM.

After 7 days, EMC could not get their IT Operations Intelligence Network Management Suite running in a VMWare VM.

Background:
Small customers may host their network management tools in a VMWare VM. Enterprises will occasionally implement their network management systems on smaller systems, where they know they will get deterministic behavior from the underlying platform.

Service Providers traditionally run their mission-critical network management systems on larger UNIX systems, so as to provide instant scalability (swap in CPU boards) and 99.999% availability (reboot once a year, whether they need to or not).

The platform of choice in the Service Provider market for scalable Network Management platforms has been SPARC Solaris, for decades... clearly, for a reason. This was demonstrated well at EMC World 2012.

The Problem:
Why not host a network management platform in a VMWare infrastructure? Besides the fact that EMC could not make it happen, after a year of preparation and 7 days of struggling, there are basic logistics.

Network Management is dependent upon ICMP and SNMP. Both of these protocols are "connectionless protocols" - sometimes referred to as "unreliable protocols". Why would a network management platform use "unreliable protocols"?

The IETF understands that network management should always be light-weight: each poll is a single packet, while a TCP-based protocol requires a 3-way handshake to start the transaction, the poll itself, and then another handshake to tear the connection back down. Imagine doing this for thousands of devices every x seconds - not very light-weight, not very smart. A "connection based protocol" will also hide the nature of an unreliable underlying network, which is exactly what a network management platform is supposed to expose - so it can be fixed.
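
As a rough back-of-the-envelope illustration (the device count and per-transaction packet counts below are assumptions, not measurements), a nawk one-liner can compare the per-cycle packet budget of a connectionless poll against a TCP transaction:

# nawk 'BEGIN { n = 10000                 # hypothetical device count
 udp = n * 2                              # ICMP/SNMP poll: one request, one response
 tcp = n * (3 + 2 + 4)                    # TCP: 3-packet setup + request/response + 4-packet teardown
 printf("udp: %d packets\ttcp: %d packets\n", udp, tcp) }'
udp: 20000 packets tcp: 90000 packets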

Now stick a network management platform in a VM, where the network connection runs from the VM (holding an operating system, with a TCP/IP stack) down through the hypervisor (another operating system, with another TCP/IP stack, which is also sharing the resources of that host with other VM's). If there is the slightest glitch in the VM or the hypervisor which causes packets to be queued or dropped, the VMWare infrastructure itself will signal to the Network Management Centers that there is a network problem in their customer's network!

Politics:
Clearly, someone at EMC does not understand Network Management, nor do they understand Managed Service Providers.

The Network Management Platform MUST BE ROCK SOLID, so that Network Operations Center personnel NEVER have to wonder whether an alert in their console reflects a fault in a customer's managed device or a local performance issue in their own VM.

EMC used Solaris to reach into the Telco Data Centers, then later used Cisco to reach into the Telco Data Centers - now EMC is done using their partners. VMWare was the platform of choice to [not] demonstrate their Network Management tools on. Cisco was the [soon to be replaced] hardware platform of choice, since EMC announced they will start building their own servers.

Either someone at EMC is asleep at the wheel, or they need to get a spine and support their customers. Either way, this does not bode well for EMC as a provider of software solutions for service providers.


Business Requirements:
In order for a real service provider to reliably run a real network management system in a virtualized environment:
  • The virtualized platform must not insert any overhead.
  • All resources provided must be deterministic.
  • Patches must be installable while the system is live.
  • Engagement of patches must be deterministic.
  • Patch engagement must be fast.
  • Rollback of patches must be deterministic.
  • Patch rollback must be fast.
  • Availability must be 99.999%.




Solutions:
There are many platforms which fulfill these basic business requirements, but none of them are VMWare. Ironically, only the SPARC Solaris platform is currently supported by EMC for IT Operations Intelligence, EMC does not support SPARC Solaris under VMWare, and yet EMC chose not to demonstrate their Network Management suite on a platform which meets service provider requirements.

Today, Zones is about the only virtualization technology which offers 0%-overhead virtualization. (Actually, on SMP systems, virtualizing via Zones can increase application throughput, if Zones are partitioned by CPU board.) Zones, to work in this environment, seem to work best with external storage providers, like EMC.
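
As a minimal sketch of that kind of partitioning, assuming Solaris 10 and using a hypothetical zone name, zonepath, and CPU count (on a real system, ncpus would be sized to line up with a CPU board), a Zone can be pinned to dedicated processors:

# zonecfg -z nmszone
zonecfg:nmszone> create
zonecfg:nmszone> set zonepath=/zones/nmszone
zonecfg:nmszone> add dedicated-cpu
zonecfg:nmszone:dedicated-cpu> set ncpus=8
zonecfg:nmszone:dedicated-cpu> end
zonecfg:nmszone> verify
zonecfg:nmszone> commit
zonecfg:nmszone> exit
# zoneadm -z nmszone install
# zoneadm -z nmszone boot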

Any platform which offers a 0% virtualization penalty with ZFS support can easily meet service providers' technical platform business requirements. Of these, the top 3 are probably the best supported by commercial interests:
  • Oracle SPARC Solaris
  • Oracle Intel Solaris
  • Joyent SmartOS
  • OpenIndiana
  • Illumian
  • BeleniX
  • SchilliX
  • StormOS
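
One reason ZFS support matters here is that it makes the patching requirements above practical via boot environments. The following is only a hedged sketch using Solaris Live Upgrade on a ZFS root; the boot environment names and patch directory are illustrative, and the exact luupgrade arguments vary by release (see luupgrade(1M)):

# lucreate -n patched_be                          # clone the running boot environment (a cheap ZFS clone)
# luupgrade -t -n patched_be -s /var/tmp/patches  # apply patches to the inactive clone while the system stays live
# luactivate patched_be                           # engagement: a deterministic switch at the next boot
# init 6
# luactivate s10_orig                             # rollback: re-activate the prior boot environment
# init 6
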
Conclusion:
Today's market is becoming more proprietary with each passing day. The movement towards supporting applications only under proprietary solutions (such as VMWare) demonstrated its risk during EMC World 2012. A network management provider would not be well advised to use any network management tool which is bound to a single proprietary platform element and does not support POSIX platforms.

Monday, May 10, 2010

Exercising an ICMP Poller



Abstract:
Polling devices for availability on a global basis requires an understanding of the underlying topology and of the impact of latency and reliability concerns. An excellent tool to perform polling on a wide scale is "fping".

Methodology:
The "fping" poller can take a sequence of nodes and probe those nodes. A simple way to start is to leverage the "hosts" table. The basic UNIX command called "time" can be leveraged to understand performance. The "nawk" command can be leverage in order to parse nodes and output into usable information.

Implementation:
Parsing the "hosts" table is simple enough to do via "nawk", leveraging a portion of the device name, IP address, or even comments! The "fping" poller is able to use Name, IP, or even the regular "/etc/hosts" table format for input.

For example, looking for all 192 entries in the host table can be done as follows:

sunt2000/root# nawk '/192./ { print 1ドル }' /etc/hosts # parse node ip addresses
sunt2000/root# nawk '/192./ { print 2ドル }' /etc/hosts # parse node names
sunt2000/root# nawk '/192./' /etc/hosts # parse entire host entries (hosts-table format)

When performing large numbers of pings across the globe, name lookups can unexpectedly add a great deal of time to the run of the command:

sunt2000/root# time nawk '/192./ { print 1ドル }' /etc/hosts | fping |
nawk '/alive/ { Count+=1 } END { print "Count:", Count }'


Count: 2885
real 3m12.70s
user 0m0.45s
sys 0m0.68s

sunt2000/root# time nawk '/192./ { print 2ドル }' /etc/hosts | fping |
nawk '/alive/ { Count+=1 } END { print "Count:", Count }'


Count: 2884
real 8m47.74s
user 0m0.49s
sys 0m0.70s

The name resolution lookup the system performs to find an IP address is mitigated on the second run, due to the name service cache daemon (nscd) under operating systems like Solaris.

sunt2000/root# time nawk '/192./ { print 2ドル }' /etc/hosts | fping |
nawk '/alive/ { Count+=1 } END { print "Count:", Count }'


Count: 2883
real 3m10.87s
user 0m0.44s
sys 0m0.67s
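
To confirm the name service cache is doing the work, its per-cache statistics can be dumped directly (a quick check, assuming nscd is running with its default configuration):

sunt2000/root# nscd -g          # dump nscd statistics; check the hit rate on the hosts cache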

Avoiding Name Resolution Cache Miss:
Sometimes, an application cannot afford an occasional name resolution cache miss. One way of managing this is to do the name mapping in the parsing script on the back end of the "fping" command, with an enhanced "nawk" one-liner.

sunt2000/root# time nawk '/HDFC/ { print 1ドル }' /etc/hosts | fping | nawk '
BEGIN { File="/etc/hosts"
while ( (getline < File) > 0 ) {
Ip=1ドル ; Name=2ドル ; IpArray[Ip]=Ip ; NameArray[Ip]=Name }
close(File) }
/alive/ { Count+=1 ; print NameArray[1ドル] "\t" 0ドル }
END { print "Count:", Count }'

Count: 2882
real 3m9.04s
user 0m0.73s
sys 0m0.76s

Tuning Solaris 10:
When managing a LARGE number of devices, ICMP input overflows may be occurring. This can be checked via the "netstat" command parsed by a nawk one-liner. Note the high ratio.

sunt2000/root# netstat -s -P icmp | nawk '{ gsub("=","") }
/icmpInMsgs/ { icmpInMsgs=3ドル }
/icmpInOverflows/ { icmpInOverflows=2ドル }
END { print "Msgs=" icmpInMsgs "\tOverflows=" icmpInOverflows }'

Msgs=381247797 Overflows=138767274
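
To put a number on that ratio, the figures from the output above can be run through a quick nawk calculation:

sunt2000/root# nawk 'BEGIN { printf("%.1f%% of inbound ICMP messages overflowed\n", 138767274 / 381247797 * 100) }'
36.4% of inbound ICMP messages overflowed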

To check what the tunable is:
sunt2000/root# ndd -get /dev/icmp icmp_max_buf
262144

The above value is the default for this version of Solaris. It can (and should) be increased dramatically, as device counts start to grow aggressively (into the thousands of devices).
sunt2000/root# ndd -set /dev/icmp icmp_max_buf 2097152

Note, the above setting is not persistent after reboot.

Validating Tuning:
Before and after an fping run, the values of ICMP messages and overflows can be observed.
Prior FPing: icmpInMsgs=381267922 icmpInOverflows=138775834
Post FPing: icmpInMsgs=381270809 icmpInOverflows=138778159
Difference: icmpInMsgs=2887 icmpInOverflows=2325

Applying the tunables temporarily
sunt2000/root# ndd -set /dev/icmp icmp_max_buf 2097152
sunt2000/root# ndd -get /dev/icmp icmp_max_buf
2097152

Validate the Ratio:
Prior FPing: icmpInMsgs=381279465 icmpInOverflows=138778662
Post FPing: icmpInMsgs=381282224 icmpInOverflows=138778806
Difference: icmpInMsgs=2759 icmpInOverflows=144


Secondary Validation of the Ratio:
Prior FPing: icmpInMsgs=381296575 icmpInOverflows=138784125
Post FPing: icmpInMsgs=381300943 icmpInOverflows=138784125
Difference: icmpInMsgs=4368 icmpInOverflows=0
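
The before/after figures above were captured by hand; a small helper could automate the snapshot and the subtraction. This is a minimal ksh sketch (the script name is hypothetical, and it assumes the same netstat field positions used in the one-liner above):

#!/bin/ksh
# script: icmpDelta.sh (hypothetical helper)
# purpose: snapshot icmpInMsgs/icmpInOverflows around an fping run and print the deltas
snap() {
netstat -s -P icmp | nawk '{ gsub("=","") }
/icmpInMsgs/ { m=3ドル }
/icmpInOverflows/ { o=2ドル }
END { print m, o }'
}
set -- `snap` ; PriorMsgs=1ドル ; PriorOver=2ドル
nawk '/192./ { print 1ドル }' /etc/hosts | fping > /dev/null 2>&1
set -- `snap` ; PostMsgs=1ドル ; PostOver=2ドル
echo "Difference: icmpInMsgs=`expr $PostMsgs - $PriorMsgs` icmpInOverflows=`expr $PostOver - $PriorOver`"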


Making the Tuning Persistent:
A start/stop script can be created to make the tunables persistent.
t2000/root# vi /etc/init.d/ndd_rmm.sh

#!/bin/ksh
# script: ndd_rmm.sh
# author: david halko
# purpose: start/stop script to make the icmp_max_buf tunable persistent
#
case ${1} in
start)   /usr/sbin/ndd -get /dev/icmp icmp_max_buf | nawk '
         !/2097152/ {
             Cmd="/usr/sbin/ndd -set /dev/icmp icmp_max_buf 2097152"
             system(Cmd) }'
         ;;
status)  ls -al /etc/init.d/ndd_rmm.sh /etc/rc2.d/S89_ndd_rmm.sh
         /usr/sbin/ndd -get /dev/icmp icmp_max_buf
         ;;
install) ln -s /etc/init.d/ndd_rmm.sh /etc/rc2.d/S89_ndd_rmm.sh
         chmod 755 /etc/init.d/ndd_rmm.sh /etc/rc2.d/S89_ndd_rmm.sh
         chown -h ivadmin /etc/init.d/ndd_rmm.sh /etc/rc2.d/S89_ndd_rmm.sh
         ;;
*)       echo "ndd_rmm.sh [start|status|install]\n"
         ;;
esac
:w
:q

t2000/root# ksh /etc/init.d/ndd_rmm.sh install

t2000/root# ksh /etc/init.d/ndd_rmm.sh status
-rwxr-xr-x 1 ivadmin root 647 May 10 21:18 /etc/init.d/ndd_rmm.sh
lrwxrwxrwx 1 ivadmin root 22 May 10 21:20 /etc/rc2.d/S89_ndd_rmm.sh -> /etc/init.d/ndd_rmm.sh
262144

t2000/root# /etc/init.d/ndd_rmm.sh start

t2000/root# /etc/init.d/ndd_rmm.sh status
-rwxr-xr-x 1 root root 647 May 10 21:18 /etc/init.d/ndd_rmm.sh
lrwxrwxrwx 1 root root 22 May 10 21:20 /etc/rc2.d/S89_ndd_rmm.sh -> /etc/init.d/ndd_rmm.sh
2097152

Monitoring Overflows:
You can monitor overflows in near-real-time through a simple script such as:
#!/bin/ksh
# script: icmpOverflowMon.sh
# author: David Halko
# purpose: simple repetitive script to monitor icmp overflows
#
for i in 0 1 2 3 4 5 6 7 8 9 ; do
  for j in 0 1 2 3 4 5 6 7 8 9 ; do
    for k in 0 1 2 3 4 5 6 7 8 9 ; do
      echo "`date` - $i$j$k - \c"
      netstat -s -P icmp | nawk '{ gsub("=","") }
        /icmpInMsgs/ { icmpInMsgs=3ドル ; print 0ドル }
        /icmpInOverflows/ { icmpInOverflows=2ドル ; print 0ドル }
        END { print "InMsgs=" icmpInMsgs "\tOverflows=" icmpInOverflows }'
      sleep 10
    done
  done
done
Conclusion:
Exercising ICMP pollers in a Network Management environment is easy to do. It may be important to tune the OS when polling a large number of devices. Tuning the OS is a very reasonable process, and simple metrics can demonstrate both the behavior and the improvement.


----------------------------------------------
UPDATE --- every time I edit this posting in blogger, it removes my pipe symbols. In general, you may need to add a pipe between some commands like fping, netstat, nawk if you get a syntax error when you copy-paste a line of text.
