Showing posts with label OSPF. Show all posts

Saturday, November 10, 2012

You have to make the right balance between the convergence time and MTU

Lately I'm getting the impression that Cisco is releasing new products without proper internal testing.

I'm going to talk about two recent examples, the ASR1001 and the ASR901: devices that are excellent value for money but (as usual) hide limitations that you unfortunately discover only after exhaustive testing.

The ASR1001 is a fine router, a worthy replacement for the 7200, and it can be used for various purposes. Of course, like every new platform from every vendor these days, it fully supports jumbo frames, which is a nice thing. At least you get that impression until you try to use the large MTU for control/routing protocols, where an equally "nice" surprise may be waiting.

MTU 1500 (~0.4 sec)

17:22:35.415: %OSPF-5-ADJCHG: Process 1, Nbr x.x.x.x on GigabitEthernet0/0/0 from FULL to DOWN, Neighbor Down: Interface down or detached
17:22:35.539: %OSPF-5-ADJCHG: Process 1, Nbr x.x.x.x on GigabitEthernet0/0/0 from DOWN to INIT, Received Hello
17:22:35.539: %OSPF-5-ADJCHG: Process 1, Nbr x.x.x.x on GigabitEthernet0/0/0 from INIT to 2WAY, 2-Way Received
17:22:35.539: %OSPF-5-ADJCHG: Process 1, Nbr x.x.x.x on GigabitEthernet0/0/0 from 2WAY to EXSTART, AdjOK?
17:22:35.643: %OSPF-5-ADJCHG: Process 1, Nbr x.x.x.x on GigabitEthernet0/0/0 from EXSTART to EXCHANGE, Negotiation Done
17:22:35.823: %OSPF-5-ADJCHG: Process 1, Nbr x.x.x.x on GigabitEthernet0/0/0 from EXCHANGE to LOADING, Exchange Done
17:22:35.824: %OSPF-5-ADJCHG: Process 1, Nbr x.x.x.x on GigabitEthernet0/0/0 from LOADING to FULL, Loading Done

MTU 9216 (~48 sec!)
17:43:07.923: %OSPF-5-ADJCHG: Process 1, Nbr x.x.x.x on GigabitEthernet0/0/0 from FULL to DOWN, Neighbor Down: Interface down or detached
17:43:08.001: %OSPF-5-ADJCHG: Process 1, Nbr x.x.x.x on GigabitEthernet0/0/0 from DOWN to INIT, Received Hello
17:43:08.001: %OSPF-5-ADJCHG: Process 1, Nbr x.x.x.x on GigabitEthernet0/0/0 from INIT to 2WAY, 2-Way Received
17:43:08.001: %OSPF-5-ADJCHG: Process 1, Nbr x.x.x.x on GigabitEthernet0/0/0 from 2WAY to EXSTART, AdjOK?
17:43:08.098: %OSPF-5-ADJCHG: Process 1, Nbr x.x.x.x on GigabitEthernet0/0/0 from EXSTART to EXCHANGE, Negotiation Done
17:43:08.241: %OSPF-5-ADJCHG: Process 1, Nbr x.x.x.x on GigabitEthernet0/0/0 from EXCHANGE to LOADING, Exchange Done
17:43:55.942: %OSPF-5-ADJCHG: Process 1, Nbr x.x.x.x on GigabitEthernet0/0/0 from LOADING to FULL, Loading Done
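
The recovery times can be read straight off the timestamps in the two log excerpts above. As a rough sketch (the function name and the trimmed log lists are my own, and it assumes the timestamp is everything before the first ": " on each line), this measures the gap between the "to DOWN" message and the final "to FULL" message:

```python
from datetime import datetime

def adjacency_recovery_seconds(log_lines):
    """Seconds between the first 'to DOWN' and the last 'to FULL' ADJCHG message."""
    down_at = full_at = None
    for line in log_lines:
        if "%OSPF-5-ADJCHG" not in line:
            continue
        # Assumption: the timestamp is the text before the first ": ".
        stamp = datetime.strptime(line.split(": ", 1)[0].strip(), "%H:%M:%S.%f")
        if " to DOWN" in line and down_at is None:
            down_at = stamp
        elif " to FULL" in line:
            full_at = stamp
    if down_at is None or full_at is None:
        raise ValueError("need both a DOWN and a FULL transition")
    return (full_at - down_at).total_seconds()

# First and last lines of each excerpt from the post.
mtu_1500 = [
    "17:22:35.415: %OSPF-5-ADJCHG: Process 1, Nbr x.x.x.x on GigabitEthernet0/0/0 from FULL to DOWN, Neighbor Down: Interface down or detached",
    "17:22:35.824: %OSPF-5-ADJCHG: Process 1, Nbr x.x.x.x on GigabitEthernet0/0/0 from LOADING to FULL, Loading Done",
]
mtu_9216 = [
    "17:43:07.923: %OSPF-5-ADJCHG: Process 1, Nbr x.x.x.x on GigabitEthernet0/0/0 from FULL to DOWN, Neighbor Down: Interface down or detached",
    "17:43:55.942: %OSPF-5-ADJCHG: Process 1, Nbr x.x.x.x on GigabitEthernet0/0/0 from LOADING to FULL, Loading Done",
]

print(adjacency_recovery_seconds(mtu_1500))
print(adjacency_recovery_seconds(mtu_9216))
```

With MTU 9216, nearly all of the time is spent between LOADING and FULL, which is the phase where the LSAs themselves are transferred.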

While trying to use MTU 9216 in a test environment (the same issue was observed even with a smaller MTU), we ran into an interesting issue during the exchange of large OSPF databases between ASR1001s (running 15.1S & 15.2S). Packets carrying LSAs were being dropped inside the router, because the underlying driver of LSMPI (Linux Shared Memory Punt Interface) could not handle such packets at high rates. Internally, these large packets were fragmented into smaller ones (512 bytes each), transmitted and then reassembled, which multiplied the pps rate between the involved subsystems by up to 9216/512 = 18.
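
To put numbers on the pps multiplication, here is a back-of-the-envelope sketch. Only the 512-byte fragment size comes from the observation above; the function names and the 1000 pps input rate are made up for illustration, and header overhead is ignored:

```python
import math

FRAGMENT_BYTES = 512  # internal fragment size quoted in the post

def internal_fragments(packet_bytes, fragment_bytes=FRAGMENT_BYTES):
    """Number of internal fragments one punted packet turns into."""
    return math.ceil(packet_bytes / fragment_bytes)

def internal_pps(external_pps, packet_bytes):
    """Packet rate seen on the internal punt path for a given external rate."""
    return external_pps * internal_fragments(packet_bytes)

print(internal_fragments(1500))   # 3
print(internal_fragments(9216))   # 18
print(internal_pps(1000, 9216))   # 18000 (1000 pps is a hypothetical input rate)
```

So a full-size jumbo packet costs six times as many internal packets as a full-size 1500-byte one, for the same number of packets on the wire.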

To be more exact, packets punted from the ESP to the RP are received by the Linux kernel of the RP. The Linux kernel then sends those packets to the IOSD process through LSMPI, as you can see in the following diagram.


So, this is the complete punt path on the ASR1000 Series router:

QFP <==> RP Linux Kernel <==> LSMPI <==> Fast-Path Thread <==> Cisco IOS Thread

There is a built-in limit on the pps rate that LSMPI can handle. Because large packets are fragmented internally, the internal pps rate increases and some fragments get discarded. Losing a single fragment means the complete packet is dropped and must be retransmitted by the neighboring router, which in turn leads to longer convergence times, if convergence can be achieved at all after such packet loss (there were cases where there was no convergence even after minutes, or the adjacency appeared FULL but the RIB had no entries from the LSDB). Things can get messier if you also run BGP (with path MTU discovery) and a large number of updates need to be processed. At some point in the past, IOS included code that internally retransmitted only the lost 512-byte fragments, so OSPF never noticed it was losing packets, but it was removed because it caused other issues (probably overloading the LSMPI even more).

So this leads to the question: at what layer should the router handle control plane packet loss internally? As low as possible, in order to hide it from the actual protocol, or should everything be left to the protocol itself?

You can use the following command in order to check for issues in the LSMPI path (look out for "Device xmit fail").

ASR1001#show platform software infrastructure lsmpi
LSMPI Driver stat ver: 3
Packets:
 In: 17916
 Out: 4713
Rings:
 RX: 2047 free 0 in-use 2048 total
 TX: 2047 free 0 in-use 2048 total
 RXDONE: 2047 free 0 in-use 2048 total
 TXDONE: 2046 free 1 in-use 2048 total
Buffers:
 RX: 6877 free 1317 in-use 8194 total
Reason for RX drops (sticky):
 Ring full : 0
 Ring put failed : 0
 No free buffer : 0
 Receive failed : 0
 Packet too large : 0
 Other inst buf : 0
 Consecutive SOPs : 0
 No SOP or EOP : 0
 EOP but no SOP : 0
 Particle overrun : 0
 Bad particle ins : 0
 Bad buf cond : 0
 DS rd req failed : 0
 HT rd req failed : 0
Reason for TX drops (sticky):
 Bad packet len : 0
 Bad buf len : 0
 Bad ifindex : 0
 No device : 0
 No skbuff : 0
 Device xmit fail : 103
 Device xmit rtry : 0
 Tx Done ringfull : 0
 Bad u->k xlation : 0
 No extra skbuff : 0
 Consecutive SOPs : 0
 No SOP or EOP : 0
 EOP but no SOP : 0
 Particle overrun : 0
 Other inst buf : 0
...
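
If you collect this output regularly, picking out the non-zero drop counters is easy to script. The following is only a sketch: it assumes the layout shown above ("Reason for RX/TX drops" headers followed by "name : count" lines), the function name is my own, and the sample is a trimmed copy of the output above.

```python
import re

def nonzero_drop_counters(show_output):
    """Return {(direction, reason): count} for every non-zero drop counter."""
    section = None
    found = {}
    for raw in show_output.splitlines():
        line = raw.strip()
        header = re.match(r"Reason for (RX|TX) drops", line)
        if header:
            section = header.group(1)
            continue
        counter = re.match(r"(.+?)\s*:\s*(\d+)$", line)
        # Counters that appear before the first "Reason for" header are ignored.
        if section and counter and int(counter.group(2)) > 0:
            found[(section, counter.group(1))] = int(counter.group(2))
    return found

sample = """\
Packets:
 In: 17916
 Out: 4713
Reason for RX drops (sticky):
 Ring full : 0
 No free buffer : 0
Reason for TX drops (sticky):
 No skbuff : 0
 Device xmit fail : 103
 Device xmit rtry : 0
"""

print(nonzero_drop_counters(sample))
```

Since the counters are sticky, comparing two snapshots taken some time apart tells you more than a single reading.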

Keep in mind that ICMP echoes cannot be used to verify this behavior, because ICMP replies are handled by the ESP/QFP, so you won't notice this issue.

Note: Cisco has an excellent doc describing all cases of packet drops on the ASR1k platform here.
Also, there is a relevant bug (CSCtz53398) that is supposed to provide a workaround.

Cisco's answer? "You have to make the right balance between the convergence time and MTU"!

I tend to agree with them, but until now I had the impression that a larger MTU would lower the convergence time (as long as the CPU could keep up). Well, time to reconsider...


Posted by Tassos at 14:37 2 comments


Monday, February 8, 2010

Should IPC's 127.0.0.0/8 be redistributed by OSPF?

I have had a TAC case open for over 5 months regarding the default redistribution of 127/8 when "service internal" is configured on a C10000 router. Keep in mind that I'm redistributing all connected routes into OSPF.

Specifically:


C10k-33SB7>sh ip route 127.0.0.0
Routing entry for 127.0.0.0/8, 2 known subnets
Attached (2 connections)
Variably subnetted with 2 masks
Redistributing via ospf x
C 127.0.0.0/8 is directly connected, Ethernet0/0/0
L 127.0.0.254/32 is directly connected, Ethernet0/0/0

C10k-33SB7>sh ip ospf database self-originate | i 127.0.0.0
127.0.0.0 x.x.x.x 1590 0x80001C02 0x00FA81 0


If I understand correctly, 127/8 is used by the IPC (Inter-Process Communication) channel on the C10k. According to Cisco:

"Ethernet 0/0/0 is assigned to the backplane. It is not a user configurable interface. The reason for this is to provide a loopback interface for testing purposes. You cannot configure it or make it go away. There will be no impact on routing since all addresses in the 127/8 are illegal for routing purposes."

Of course, the fact that I have to carry all those non-routable prefixes in my OSPF database is of no concern to Cisco.


another-router>sh ip ospf database | i 127.0.0.0
127.0.0.0 x.x.x.x 601 0x80000932 0x00BEC7 0
127.0.0.0 x.x.x.x 347 0x80001C02 0x00FA81 0
127.0.0.0 x.x.x.x 288 0x80000918 0x00FB7B 0
127.0.0.0 x.x.x.x 526 0x800026EE 0x00E9EE 0
127.0.0.0 x.x.x.x 1972 0x80000917 0x00A8DE 0
127.0.0.0 x.x.x.x 110 0x80002563 0x00D622 0
....


Cisco proposes simply filtering them using access-lists/prefix-lists, but I cannot understand why I have to filter something that shouldn't be there in the first place. After all, I'm already filtering customer routes, which is where I would expect such routes to show up.
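
For what it's worth, the test such a filter has to perform is trivial. This is just a sketch of the check in Python, not router configuration; the function name is hypothetical and the prefixes are the ones from the outputs in this post:

```python
import ipaddress

LOOPBACK_BLOCK = ipaddress.ip_network("127.0.0.0/8")

def should_filter(prefix):
    """True if the prefix is 127.0.0.0/8 itself or any subnet of it."""
    return ipaddress.ip_network(prefix).subnet_of(LOOPBACK_BLOCK)

for prefix in ("127.0.0.0/8", "127.0.0.254/32", "100.1.34.0/24"):
    print(prefix, should_filter(prefix))
```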

Some months ago, Cisco issued a security advisory about the IPC being externally reachable. Both bugs (CSCsg15342, CSCsh29217) referenced in that advisory include exactly the same info, but I don't know what was actually fixed.

IPC processing needs to be more robust
Cisco 10000, uBR10012 and uBR7200 series devices use a User Datagram Protocol (UDP) based Inter-Process Communication (IPC) channel that is externally reachable. An attacker could exploit this vulnerability to cause a denial of service (DoS) condition on affected devices. No other platforms are affected.


They probably fixed the reachability part and broke the announcement part, because previous IOS releases didn't have this issue:


C10k-31SB14>sh ip route 127.0.0.0
Routing entry for 127.0.0.0/8
Known via "connected", distance 0, metric 0 (connected, via interface)
Redistributing via ospf x
Routing Descriptor Blocks:
* directly connected, via Ethernet0/0/0
Route metric is 0, traffic share count is 1

C10k-31SB14>sh ip ospf database self-originate | i 127.0.0.0
C10k-31SB14>


I have heard tens of excuses about "service internal", but I haven't heard anything that mandates this silly 127/8 redistribution. If anyone knows the actual reason behind this, I would be very glad to hear it.

Posted by Tassos at 10:50 2 comments


Friday, January 11, 2008

How to filter OSPF routes that have the same source ip

OSPF running on a full-mesh P2MP topology between R3,R4,R5.

Looking up the route to a network that is announced equally by both R3 & R4 to R5 produces the following:


R5#sh ip route 100.1.34.0
Routing entry for 100.1.34.0/24
Known via "ospf 1", distance 110, metric 74, type intra area
Last update from 100.1.0.4 on Serial1/0, 00:00:46 ago
Routing Descriptor Blocks:
100.1.0.4, from 200.1.4.4, 00:00:46 ago, via Serial1/0
Route metric is 74, traffic share count is 1
* 100.1.0.3, from 200.1.4.4, 00:00:46 ago, via Serial1/0
Route metric is 74, traffic share count is 1


As you can see, 200.1.4.4 is the router-id of the router that announces both routes, each with its own next-hop. You're probably expecting "100.1.0.3, from 200.1.3.3" on the 2nd route, but in OSPF P2MP topologies the hub router announces the OSPF routes to the spokes using its own IP.

If you want to prevent this route from entering the routing table specifically when it is originated from R3, you must use a route-map and match on the next-hop address of R3. You cannot match on the source IP, because both routes have the same source (due to the OSPF P2MP network type).



R5(config-route-map)#match ip ?
address Match address of route or match packet
next-hop Match next-hop address of route
route-source Match advertising source address of route



router ospf 1
 distribute-list route-map ROUTE_FROM_R3 in
!
access-list 3 permit 100.1.0.3
!
access-list 34 permit 100.1.34.0
!
route-map ROUTE_FROM_R3 deny 10
 match ip address 34
 match ip next-hop 3
!
route-map ROUTE_FROM_R3 permit 20
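
The logic of the route-map may be easier to follow as a small model. This is only an illustration of how the two clauses combine, not how IOS implements it; the function name is my own and the addresses are the ones from the config above:

```python
def route_permitted(prefix, next_hop):
    """Model of route-map ROUTE_FROM_R3 as applied to one OSPF path."""
    # Clause "deny 10": both match statements must hold (logical AND).
    matches_acl_34 = prefix == "100.1.34.0"    # access-list 34
    matches_acl_3 = next_hop == "100.1.0.3"    # access-list 3 (R3 as next-hop)
    if matches_acl_34 and matches_acl_3:
        return False
    # Clause "permit 20": no match statements, so everything else is permitted.
    return True

print(route_permitted("100.1.34.0", "100.1.0.3"))  # False: path via R3 is filtered
print(route_permitted("100.1.34.0", "100.1.0.4"))  # True: path via R4 stays
print(route_permitted("100.1.35.0", "100.1.0.3"))  # True: other prefixes via R3 stay
```

The empty "permit 20" clause matters: without it, the implicit deny at the end of the route-map would filter every other route as well.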

Posted by Tassos at 17:44 0 comments


How to create a tunnel and connect 2 different ip subnets

This is a handy trick if you want to connect 2 different subnets (belonging to 2 routers that are some hops away) and you can't use your own ip addresses.

Router1 - RouterX - RouterY - Router2

Router 1


interface Loopback0
 ip address 100.1.6.6 255.255.255.0
!
interface Serial0/0
 ip address 100.2.6.6 255.255.255.0
!
interface Tunnel0
 ip unnumbered Loopback0
 tunnel source Serial0/0
 tunnel destination 100.2.7.7
!
router ospf 111
 passive-interface Loopback0
 network 100.1.6.6 0.0.0.0 area 111


Router 2

interface Loopback0
 ip address 100.1.7.7 255.255.255.0
!
interface Serial0/0
 ip address 100.2.7.7 255.255.255.0
!
interface Tunnel0
 ip unnumbered Loopback0
 tunnel source Serial0/0
 tunnel destination 100.2.6.6
!
router ospf 111
 passive-interface Loopback0
 network 100.1.7.7 0.0.0.0 area 111


OSPF can work fine between different subnets, as long as ip unnumbered is used.

Just keep in mind that if you're already using OSPF on these two routers, you have to use a different OSPF process for this tunnel link. Also make sure you define the tunnel interfaces as passive under the other (main) process.

You can do the same (probably easier) if you use RIP and configure "no validate-update-source" under its process.

EIGRP doesn't seem to provide a similar feature.

Posted by Tassos at 01:33 0 comments


Friday, December 28, 2007

3rd Mock Lab - 97%

This was the first mock lab I finished in about 7.5 hours! Of course, it took me another 1.5 hours to review all of it and correct 2 "mistakes". But generally, I'm quite satisfied with the result.

There were 3 topics I had to look up on the DocCD; one of them was completely unknown to me (MVR - Multicast VLAN Registration). I just knew there was a feature that provided the functionality I was asked for, but I knew neither its name nor its configuration. As it turned out, it wasn't so difficult.

Finally, to be honest, I believe I should have been graded with 91%, because there were another 2 tasks that I didn't complete 100% according to the solution provided. I know that sometimes there are many correct solutions to each task, but I think (after looking at the vendor's solution) that my own solutions weren't so correct.

On the other hand, there were 2 tasks where my solutions were correct and the vendor's were probably wrong. That's because one task was influenced by changes near the end of the exam (new routes introduced) and the other one wasn't meant to be tested (full reachability while a backup link was active). If I hadn't checked all the questions one more time, I wouldn't have found them. The second one in particular took me some time until I came up with a solution; I was trying to find out why an OSPF router was load-balancing between a correct path and a "wrong" path, when the "wrong" path went through a totally NSSA area that had a default route back to this router. So I was getting a loop there. I finally changed the cost in order to avoid the NSSA area and everything worked fine.


Today I'm having my second CCIE Assessor Lab... hoping for something > 80% this time.

Posted by Tassos at 09:47 1 comments


Thursday, November 29, 2007

1st Mock Lab - 74%

Last Friday I had my first Mock Lab attempt, which proved easier than expected.

I was expecting IGP redistribution at multiple points, IPv6 BGP/OSPF/RIP routing, DVMRP, etc., but most of the sections were quite "easy". That of course doesn't mean that I passed. I missed 10 whole points in the Bridging & Switching section, which is one of my favorites.

The reason? I was in a hurry to finish the first sections, which I have more experience with, in order to have enough time for the rest. But many questions were tricky, and rushing through them proved to be the wrong method. I missed 3 points on a simple task (because I didn't configure a vlan) and another 4 points (in two other questions) because of the first error.

There were only 2 sections that I didn't know at all and had to improvise (one proved correct with a minor error in an ACL entry, the other totally wrong). There were also another 4 sections for which I had to look into the Doc CD. At least I have now learned to use the Doc CD more efficiently. It's surely invaluable in such situations.

Also, when I finished, I had only 30 mins left (according to the actual 8-hour lab, because in the mock lab you're allowed to exceed this time by 3 hours). So instead of going back over all the sections to verify them, I decided to reload all routers! What was even more disastrous was that I reloaded them all at the same time and didn't look at their logs while they were booting.

The result? After the reload, something wasn't working as expected. After a quick search I found one router that didn't seem to be running OSPF. I checked its configuration (thank god I had saved my whole session locally) and found that two "neighbor" commands were missing! I added them and reloaded again. This time I watched the logs, and there was an error message saying that this particular command is not supported on this kind of topology (a bug? the command is accepted while configuring, but it's rejected after reloading). So I saved my configuration and warned the proctor (through email) about this behavior.

The answer from the support department the next day was to post my "problem" on the forum, where I would get an answer. Of course, after a week there's still no answer there. And after a quick search on their forum, I found out that many questions remain unanswered. Too bad, because I have heard a lot of good words about InternetworkExpert's training material. I wanted to test their Mock Labs and maybe buy their Workbooks, but this kind of support made me rethink it. If I have one question in the Mock Lab (and it goes unanswered), I'll probably have tens in the Workbook labs. Anyway, I still have one Mock Lab from them. If they keep the same -high- price and/or wrong "attitude", I'll have to look at a different vendor.

Let's hope that this Friday I'll have something tougher. The goal still remains something > 50%.

 

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Greece License.
