Saturday, November 12, 2011
aggregate-address ... summary-only-after-a-while
It seems there is always something you think you know, until it is proven otherwise.
Some years ago, when I was studying for the CCIE, I learned that in order to suppress the more specific routes covered by an aggregate advertisement in BGP, you could use the "aggregate-address ... summary-only" command. And I believed it until recently.
Let's suppose you have the following config on an ASR1k (10.1.1.1) running 15.1(2)S2:
router bgp 100
 bgp router-id 10.1.1.1
 neighbor 10.2.2.2 remote-as 100
 neighbor 10.2.2.2 update-source Loopback0
 ...
 address-family ipv4
  aggregate-address 10.10.10.0 255.255.255.0 summary-only
  redistribute connected
  neighbor 10.2.2.2 activate
 ...
Then you have 2 subscribers logging in.
With "show bgp" the 2 /32 routes under 10.10.10.0/24 seem to be suppressed and the /24 is in the BGP table as it should be:
*> 10.10.10.0/24    0.0.0.0                            32768 i
s> 10.10.10.3/32    0.0.0.0                  0         32768 ?
s> 10.10.10.4/32    0.0.0.0                  0         32768 ?
but doing some "debug bgp updates/events" reveals the following:
BGP(0): 10.2.2.2 send UPDATE (format) 10.10.10.3/32, next 10.1.1.1, metric 0, path Local
BGP(0): 10.2.2.2 send UPDATE (format) 10.10.10.4/32, next 10.1.1.1, metric 0, path Local
...and after a while:
BGP: aggregate timer expired
BGP: topo global:IPv4 Unicast:base Remove_fwdroute for 10.10.10.3/32
BGP: topo global:IPv4 Unicast:base Remove_fwdroute for 10.10.10.4/32
At the same time on the peer router (10.2.2.2) you can see the above 2 /32 routes being received:
RP/0/RP0/CPU0:ASR9k#sh bgp neighbor 10.1.1.1 routes | i /32
*>i10.10.10.3/32      10.1.1.1                 0    100      0 ?
*>i10.10.10.4/32      10.1.1.1                 0    100      0 ?
and immediately afterwards you can see the 2 /32 routes being withdrawn:
RP/0/RP0/CPU0:ASR9k#sh bgp neighbor 10.1.1.1 routes | i /32
As usual, Cisco is somewhat contradictory about this behavior.
According to Cisco TAC, the default aggregation logic runs every 30 seconds, while BGP update processing runs almost every 2 seconds. That is why the route is advertised initially and withdrawn later (once the aggregation processing that follows the initial update has run). They also admit that the root cause of this problem lies in the BGP code. The route is advertised as soon as best-path selection completes; it may then take 30 seconds or more for the aggregation logic to complete and withdraw the more specific route.
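To make the timing argument concrete, here is a minimal sketch of the window during which a more specific route can leak. It only uses the two intervals quoted by TAC (updates roughly every 2 seconds, aggregation every 30 seconds). The function name `leak_window` and the assumption that both processes fire at exact multiples of their interval are mine, purely for illustration; this is not how IOS actually schedules its work.

```python
import math


def leak_window(route_arrival, update_interval=2.0, aggregate_interval=30.0):
    """Return (advertised_at, withdrawn_at) in seconds, or None if no leak.

    Toy model (an assumption, not IOS internals): update runs fire at
    multiples of update_interval, aggregation runs at multiples of
    aggregate_interval. A value of 0 for aggregate_interval models
    'bgp aggregate-timer 0', i.e. suppression happens immediately.
    """
    if aggregate_interval == 0:
        return None  # aggregation is immediate, nothing leaks in this model
    next_update = math.ceil(route_arrival / update_interval) * update_interval
    next_aggregate = (
        math.ceil(route_arrival / aggregate_interval) * aggregate_interval
    )
    if next_aggregate <= next_update:
        return None  # suppressed before the next update run
    return next_update, next_aggregate


if __name__ == "__main__":
    # A subscriber route that appears 5 seconds after the last aggregate run.
    print(leak_window(5.0))
    # The same route with the aggregate timer set to 0.
    print(leak_window(5.0, aggregate_interval=0))
```

In this model a route that shows up shortly after an aggregation run stays advertised for most of the 30-second interval, which matches the "advertised, then withdrawn after a while" pattern seen in the debugs.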
Then we also have the following:
Bug CSCsu96698
Related Bug Information
Release notes of 12.2SB and 12.0S
Release notes of 12.2(33)SXH6
CLI change to bgp aggregate-timer command to suppress more specific routes.
Old Behavior: More specific routes are advertised and withdrawn later, even if aggregate-address summary-only is configured. The BGP table shows the specific prefixes as suppressed.
New Behavior: The bgp aggregate-timer command now accepts the value of 0 (zero), which disables the aggregate timer and suppresses the routes immediately.
Command Reference for "bgp aggregate-timer"
To set the interval at which BGP routes will be aggregated or to disable timer-based route aggregation, use the bgp aggregate-timer command in address-family or router configuration mode. To restore the default value, use the no form of this command.
bgp aggregate-timer seconds
no bgp aggregate-timer
Syntax Description
seconds
Interval (in seconds) at which the system will aggregate BGP routes.
•The range is from 6 to 60 or else 0 (zero). The default is 30.
•A value of 0 (zero) disables timer-based aggregation and starts aggregation immediately.
Command Default
30 seconds
Usage Guidelines
Use this command to change the default interval at which BGP routes are aggregated.
In very large configurations, even if the aggregate-address summary-only command is configured, more specific routes are advertised and later withdrawn. To avoid this behavior, configure the bgp aggregate-timer to 0 (zero), and the system will immediately check for aggregate routes and suppress specific routes.
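The value range in the command reference is easy to get wrong (1 to 5 is not accepted, 0 is). A small sketch that encodes the documented rule, for example in a config-generation script; the function names are hypothetical and the rule is taken only from the syntax description quoted above.

```python
def is_valid_aggregate_timer(seconds):
    """Check a value against the documented range: 0, or 6 to 60 seconds."""
    return seconds == 0 or 6 <= seconds <= 60


def describe_aggregate_timer(seconds):
    """Return a short human-readable description of a timer value."""
    if not is_valid_aggregate_timer(seconds):
        return "invalid (must be 0 or 6-60)"
    if seconds == 0:
        return "timer-based aggregation disabled, aggregation starts immediately"
    return "aggregation runs every %d seconds" % seconds


if __name__ == "__main__":
    for value in (0, 5, 30, 61):
        print(value, "->", describe_aggregate_timer(value))
```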
The interesting part is that the command reference for "aggregate-address ... summary-only" doesn't mention anything about this behavior in order to warn you.
Using the summary-only keyword not only creates the aggregate route (for example, 192.*.*.*) but also suppresses advertisements of more-specific routes to all neighbors. If you want to suppress only advertisements to certain neighbors, you may use the neighbor distribute-list command, with caution. If a more-specific route leaks out, all BGP or mBGP routers will prefer that route over the less-specific aggregate you are generating (using longest-match routing).
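The last sentence of that quote is why the leak matters: a leaked /32 wins over the /24 on every router that receives it. A minimal sketch of longest-match selection using Python's standard `ipaddress` module; the helper name `longest_match` is mine, and the prefixes are the ones from the example above.

```python
import ipaddress


def longest_match(destination, prefixes):
    """Return the most specific prefix covering destination, or None."""
    addr = ipaddress.ip_address(destination)
    matching = [
        net
        for net in (ipaddress.ip_network(p) for p in prefixes)
        if addr in net
    ]
    if not matching:
        return None
    # Longest match: the covering prefix with the greatest prefix length.
    return str(max(matching, key=lambda net: net.prefixlen))


if __name__ == "__main__":
    # While the /32 is leaked, it is preferred over the aggregate.
    print(longest_match("10.10.10.3", ["10.10.10.0/24", "10.10.10.3/32"]))
    # Once it is withdrawn, traffic follows the aggregate again.
    print(longest_match("10.10.10.3", ["10.10.10.0/24"]))
```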
The following debug logs show the default aggregate-timer which is 30 secs, vs the default BGP scan timer which is 60 secs:
Nov 12 21:45:32.468: BGP: Performing BGP general scanning
Nov 12 21:45:38.906: BGP: aggregate timer expired
Nov 12 21:46:09.637: BGP: aggregate timer expired
Nov 12 21:46:32.487: BGP: Performing BGP general scanning
Nov 12 21:46:40.379: BGP: aggregate timer expired
Nov 12 21:47:11.099: BGP: aggregate timer expired
Nov 12 21:47:32.506: BGP: Performing BGP general scanning
Nov 12 21:47:41.828: BGP: aggregate timer expired
Nov 12 21:48:12.547: BGP: aggregate timer expired
Nov 12 21:48:32.525: BGP: Performing BGP general scanning
Nov 12 21:48:43.268: BGP: aggregate timer expired
Nov 12 21:49:13.989: BGP: aggregate timer expired
Nov 12 21:49:32.544: BGP: Performing BGP general scanning
Nov 12 21:49:44.765: BGP: aggregate timer expired
Nov 12 21:50:15.510: BGP: aggregate timer expired
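If you want to verify the two timers from your own debug output, a quick sketch like the following computes the gap between consecutive occurrences of a message. The sample lines are the first few from the log above; the helper name `intervals`, the regular expression, and the year 2011 (taken from the post date, since the log carries no year) are assumptions for illustration.

```python
import re
from datetime import datetime

SAMPLE = """\
Nov 12 21:45:32.468: BGP: Performing BGP general scanning
Nov 12 21:45:38.906: BGP: aggregate timer expired
Nov 12 21:46:09.637: BGP: aggregate timer expired
Nov 12 21:46:32.487: BGP: Performing BGP general scanning
Nov 12 21:46:40.379: BGP: aggregate timer expired
Nov 12 21:47:11.099: BGP: aggregate timer expired
Nov 12 21:47:32.506: BGP: Performing BGP general scanning
"""

# Timestamp, then "BGP:", then the message text.
LINE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d\.\d{3}): BGP: (.+)$")


def intervals(log_text, message, year=2011):
    """Return the seconds between consecutive lines carrying `message`."""
    stamps = []
    for line in log_text.splitlines():
        match = LINE.match(line.strip())
        if match and match.group(2) == message:
            # The log has no year, so one is prepended (assumed value).
            stamp = "%d %s" % (year, match.group(1))
            stamps.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S.%f"))
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    print("aggregate:", intervals(SAMPLE, "aggregate timer expired"))
    print("scanner:  ", intervals(SAMPLE, "Performing BGP general scanning"))
```

On the sample above the aggregate timer fires about every 30.7 seconds and the general scanner about every 60 seconds, in line with the documented defaults.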
Guess what! After changing the aggregate-timer to 0, the CPU load increases by a steady +10%, due to the BGP Router process!
ASR1k#sh proc cpu s | exc 0.00
CPU utilization for five seconds: 17%/6%; one minute: 16%; five minutes: 15%
 PID Runtime(ms)     Invoked      uSecs   5Sec   1Min   5Min TTY Process
  61   329991518  1083305044        304  4.07%  4.09%  3.93%   0 IOSD ipc task
 340   202049403    53391862       3784  1.35%  2.10%  2.11%   0 VTEMPLATE Backgr
 404    84594181  2432529294          0  1.19%  1.18%  1.14%   0 PPP Events
 229    49275197     1710570      28806  0.71%  0.33%  0.39%   0 QoS stats proces
 152    39536838   801801056         49  0.63%  0.56%  0.51%   0 SSM connection m
 159    51982155   383585236        135  0.47%  0.66%  0.64%   0 SSS Manager
ASR1k#conf t
Enter configuration commands, one per line. End with CNTL/Z.
ASR1k(config)#router bgp 100
ASR1k(config-router)# address-family ipv4
ASR1k(config-router-af)# bgp aggregate-timer 0
ASR1k(config-router-af)#^Z
...and after a while:
ASR1k#sh proc cpu s | exc 0.00
CPU utilization for five seconds: 29%/6%; one minute: 26%; five minutes: 25%
 PID Runtime(ms)     Invoked      uSecs   5Sec   1Min   5Min TTY Process
 394   143774989    19776654       7269  9.91%  7.93%  7.32%   0 BGP Router
  61   329983414  1083280872        304  4.79%  3.97%  3.90%   0 IOSD ipc task
 340   202044852    53390504       3784  2.71%  2.22%  2.04%   0 VTEMPLATE Backgr
 404    84591714  2432460324          0  1.03%  1.16%  1.09%   0 PPP Events
 159    51980758   383575060        135  0.79%  0.66%  0.62%   0 SSS Manager
 152    39535734   801779715         49  0.63%  0.54%  0.50%   0 SSM connection m
Conclusions
1) By default, the "aggregate-address ... summary-only" command doesn't immediately stop the announcement of more specific routes, as it's supposed to. You also need to change the BGP aggregate-timer to 0.
2) After changing the BGP aggregate-timer to 0, the announcement of more specific routes stops, but the CPU load increases by roughly 10%.
C'mon Cisco! You gave us NH Address Tracking and PIC, and you can't fix the aggregation process?
Posted by Tassos at 19:14 14 comments
Labels: aggregate-address, BGP, summary-only
Monday, April 7, 2008
BGP - Can an aggregate-address suppress another aggregate-address on the same router?
Or can an aggregate-address aggregate another aggregate-address on the same router?
Today I was playing with BGP a little (I found some time to prepare for my CCIP) and here is what I found out:
Suppose you have the following config on a router:
interface Loopback1
ip address 1.1.1.1 255.255.255.128
!
router bgp 1
no synchronization
bgp log-neighbor-changes
network 1.1.1.0 mask 255.255.255.128
aggregate-address 1.1.0.0 255.255.0.0 summary-only
aggregate-address 1.1.1.0 255.255.255.0 summary-only
neighbor 10.10.10.2 remote-as 2
no auto-summary
What do you think "sh ip bgp" will show? (please take some time and think about it...)
To be honest, my first thought was that it would display only the 1.1.0.0/16 summary, which covers the 1.1.1.0/24 summary.
Guess what?
R1#sh ip bgp
BGP table version is 5, local router ID is 1.1.1.1
Status codes: s suppressed, d damped, h history, * valid,> best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*> 1.1.0.0/16       0.0.0.0                            32768 i
s> 1.1.1.0/25       0.0.0.0                  0         32768 i
*> 1.1.1.0/24       0.0.0.0                            32768 i
According to Cisco, in order to aggregate an address, you must have a more-specific route of that address in the BGP table. And if you want the more-specific route to be suppressed, you must use the "summary-only" keyword.
So, if we want to be as strict as possible, in our case we do have a more-specific route in the BGP table (and in the routing table).
R1#sh ip bgp 1.1.1.0/24
BGP routing table entry for 1.1.1.0/24, version 5
Paths: (1 available, best #1, table Default-IP-Routing-Table)
Advertised to non peer-group peers:
10.10.10.2
Local, (aggregated by 1 1.1.1.1)
0.0.0.0 from 0.0.0.0 (1.1.1.1)
Origin IGP, localpref 100, weight 32768, valid, aggregated, local, atomic-aggregate, best
R1#
R1#sh ip route 1.1.1.0 255.255.255.0
Routing entry for 1.1.1.0/24
Known via "bgp 1", distance 200, metric 0, type locally generated
Routing Descriptor Blocks:
* directly connected, via Null0
Route metric is 0, traffic share count is 1
AS Hops 0
Although the aggregation part seems to be working fine (as the debugs further down show, it actually isn't), the suppression part is not. It seems that an aggregate-address cannot suppress another, more-specific aggregate-address when both are created locally on the same router.
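Why would one expect suppression here at all? Because, purely in terms of prefixes, the /24 aggregate is a more-specific of the /16 aggregate. A minimal sketch with Python's standard `ipaddress` module that lists which table entries fall under a given aggregate; the helper name `more_specifics` is mine and the prefixes are the ones from the output above.

```python
import ipaddress


def more_specifics(aggregate, table):
    """Return the prefixes in `table` strictly contained in `aggregate`."""
    agg = ipaddress.ip_network(aggregate)
    result = []
    for prefix in table:
        net = ipaddress.ip_network(prefix)
        # subnet_of() is also true for the network itself, so exclude it.
        if net != agg and net.subnet_of(agg):
            result.append(prefix)
    return result


if __name__ == "__main__":
    table = ["1.1.0.0/16", "1.1.1.0/25", "1.1.1.0/24"]
    print(more_specifics("1.1.0.0/16", table))
    print(more_specifics("1.1.1.0/24", table))
```

By prefix containment alone, both 1.1.1.0/25 and 1.1.1.0/24 sit under 1.1.0.0/16, yet only the /25 carries the "s" flag in the router output.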
Now, let's change the configuration and make it more interesting by removing the "summary-only" keyword from the more-specific aggregate-address:
router bgp 1
no synchronization
bgp log-neighbor-changes
network 1.1.1.0 mask 255.255.255.128
aggregate-address 1.1.0.0 255.255.0.0 summary-only
aggregate-address 1.1.1.0 255.255.255.0
neighbor 10.10.10.2 remote-as 2
no auto-summary
R1#clear ip bgp *
R1#sh ip bgp
BGP table version is 5, local router ID is 1.1.1.1
Status codes: s suppressed, d damped, h history, * valid,> best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*> 1.1.0.0/16       0.0.0.0                            32768 i
s> 1.1.1.0/25       0.0.0.0                  0         32768 i
*> 1.1.1.0/24       0.0.0.0                            32768 i
What do you think of that? We still have the same output. The more-specific (lower level) aggregate-address is not suppressed, but the more-specific network is (by the 1.1.0.0/16 aggregate-address).
Why is that? Although I'm not sure, looking at the following debugs (taken after adding another level of aggregation with the "aggregate-address 1.0.0.0 255.0.0.0 summary-only" command and resetting the BGP session), I have come to this explanation:
*Mar 1 02:28:52.107: BGP(0): nettable_walker 1.1.1.0/25 route sourced locally
*Mar 1 02:28:52.111: BGP(0): Aggregate processing for IPv4 Unicast
*Mar 1 02:28:52.111: BGP(0): For aggregate 1.0.0.0/8
*Mar 1 02:28:52.111: BGP(0): 1.0.0.0/8 subtree has an entry 1.1.1.0/25
*Mar 1 02:28:52.115: BGP(0): sub-prefix : 1.1.1.0/25
*Mar 1 02:28:52.115: BGP(0): Needs to be re-aggregated
*Mar 1 02:28:52.115: BGP(0): 1.0.0.0/8 subtree has an entry 1.1.1.0/25
*Mar 1 02:28:52.115: BGP(0): 1.0.0.0/8 aggregate has 1.1.1.0/25 more-specific
*Mar 1 02:28:52.115: BGP(0): 1.0.0.0/8 aggregate created, attributes updated
*Mar 1 02:28:52.115: BGP(0): created aggregate route for 1.0.0.0/8
*Mar 1 02:28:52.115: BGP(0): 1.0.0.0/8 subtree has an entry 1.0.0.0/8
*Mar 1 02:28:52.115: BGP(0): 1.0.0.0/8 subtree has another entry 1.1.1.0/25
*Mar 1 02:28:52.115: BGP(0): Found sub-prefix 1.1.1.0/25: suppressed
*Mar 1 02:28:52.115: BGP(0): For aggregate 1.1.0.0/16
*Mar 1 02:28:52.115: BGP(0): 1.1.0.0/16 subtree has an entry 1.1.1.0/25
*Mar 1 02:28:52.115: BGP(0): sub-prefix : 1.1.1.0/25
*Mar 1 02:28:52.115: BGP(0): Needs to be re-aggregated
*Mar 1 02:28:52.115: BGP(0): 1.1.0.0/16 subtree has an entry 1.1.1.0/25
*Mar 1 02:28:52.115: BGP(0): 1.1.0.0/16 aggregate has 1.1.1.0/25 more-specific
*Mar 1 02:28:52.115: BGP(0): 1.1.0.0/16 aggregate created, attributes updated
*Mar 1 02:28:52.115: BGP(0): created aggregate route for 1.1.0.0/16
*Mar 1 02:28:52.115: BGP(0): 1.1.0.0/16 subtree has an entry 1.1.0.0/16
*Mar 1 02:28:52.115: BGP(0): 1.1.0.0/16 subtree has another entry 1.1.1.0/25
*Mar 1 02:28:52.115: BGP(0): Found sub-prefix 1.1.1.0/25: suppressed
*Mar 1 02:28:52.115: BGP(0): For aggregate 1.1.1.0/24
*Mar 1 02:28:52.115: BGP(0): 1.1.1.0/24 subtree has an entry 1.1.1.0/25
*Mar 1 02:28:52.115: BGP(0): sub-prefix : 1.1.1.0/25
*Mar 1 02:28:52.115: BGP(0): Needs to be re-aggregated
*Mar 1 02:28:52.115: BGP(0): 1.1.1.0/24 subtree has an entry 1.1.1.0/25
*Mar 1 02:28:52.115: BGP(0): 1.1.1.0/24 aggregate has 1.1.1.0/25 more-specific
*Mar 1 02:28:52.115: BGP(0): 1.1.1.0/24 aggregate created, attributes updated
*Mar 1 02:28:52.115: BGP(0): created aggregate route for 1.1.1.0/24
*Mar 1 02:28:52.115: BGP(0): 1.1.1.0/24 subtree has an entry 1.1.1.0/25
*Mar 1 02:28:52.115: BGP(0): Found sub-prefix 1.1.1.0/25: suppressed
*Mar 1 02:28:52.115: BGP(0): Found sub-prefix 1.1.1.0/24:
*Mar 1 02:28:52.115: BGP(0): Revise route installing 1 of 1 route for 1.0.0.0/8 -> 0.0.0.0 to main IP table
*Mar 1 02:28:52.119: RT: network 1.0.0.0 is now variably masked
*Mar 1 02:28:52.119: RT: add 1.0.0.0/8 via 0.0.0.0, bgp metric [200/0]
*Mar 1 02:28:52.119: RT: NET-RED 1.0.0.0/8
*Mar 1 02:28:52.119: BGP(0): Revise route installing 1 of 1 route for 1.1.0.0/16 -> 0.0.0.0 to main IP table
*Mar 1 02:28:52.119: RT: add 1.1.0.0/16 via 0.0.0.0, bgp metric [200/0]
*Mar 1 02:28:52.119: RT: NET-RED 1.1.0.0/16
*Mar 1 02:28:52.119: BGP(0): nettable_walker 1.1.1.0/25 route sourced locally
*Mar 1 02:28:52.119: BGP(0): Revise route installing 1 of 1 route for 1.1.1.0/24 -> 0.0.0.0 to main IP table
*Mar 1 02:28:52.119: RT: add 1.1.1.0/24 via 0.0.0.0, bgp metric [200/0]
*Mar 1 02:28:52.119: RT: NET-RED 1.1.1.0/24
During the initial scan, BGP's aggregate processing algorithm doesn't even check the locally aggregated addresses (is recursive scanning too difficult or dangerous to implement?), because they aren't in the BGP routing table at the time of the scan (I guess that if the scan went from more-specific to less-specific, it would find them). So it checks only the networks injected into BGP in the three known ways (network command, redistribution, other ASs). In our case, every configured aggregate-address aggregates and suppresses only the locally configured more-specific network. Specifically, the more-specific local network is aggregated and suppressed three times, once for each aggregate-address definition.
The result?
R1#sh ip bgp
BGP table version is 6, local router ID is 1.1.1.1
Status codes: s suppressed, d damped, h history, * valid,> best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*> 1.0.0.0          0.0.0.0                            32768 i
*> 1.1.0.0/16       0.0.0.0                            32768 i
s> 1.1.1.0/25       0.0.0.0                  0         32768 i
*> 1.1.1.0/24       0.0.0.0                            32768 i
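The behavior observed so far can be summarized in a small model: each aggregate looks only at routes that were injected into BGP (network command, redistribution, peers), never at other locally generated aggregates. The sketch below encodes exactly that rule and nothing more. It is my reading of the debugs, not the actual IOS algorithm, and the function name `build_table` and the "s>" / "*>" strings are just illustrative.

```python
import ipaddress


def build_table(sourced, aggregates):
    """Model the observed result of aggregate processing.

    sourced:    prefixes injected into BGP (network, redistribute, peers).
    aggregates: (prefix, summary_only) tuples, one per aggregate-address.
    Returns a dict mapping prefix -> "s>" (suppressed) or "*>" (valid, best).

    Assumption drawn from the debugs: an aggregate considers only sourced
    routes as contributors, so it never suppresses another aggregate.
    """
    sourced_nets = [ipaddress.ip_network(p) for p in sourced]
    table = {str(net): "*>" for net in sourced_nets}
    for prefix, summary_only in aggregates:
        agg = ipaddress.ip_network(prefix)
        contributors = [
            net for net in sourced_nets if net != agg and net.subnet_of(agg)
        ]
        if not contributors:
            continue  # no more-specific route, so no aggregate is created
        table.setdefault(str(agg), "*>")
        if summary_only:
            for net in contributors:
                table[str(net)] = "s>"
    return table


if __name__ == "__main__":
    result = build_table(
        ["1.1.1.0/25"],
        [("1.0.0.0/8", True), ("1.1.0.0/16", True), ("1.1.1.0/24", False)],
    )
    for prefix, status in result.items():
        print(status, prefix)
```

Fed with the configuration from this post, the model marks only 1.1.1.0/25 as suppressed and leaves all three aggregates valid, which is what "sh ip bgp" shows.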
If we add the less-specific aggregate after BGP has already started, we get different debug logs:
*Mar 1 00:05:26.995: BGP(0): Aggregate processing for IPv4 Unicast
*Mar 1 00:05:26.995: BGP(0): For aggregate 1.0.0.0/8
*Mar 1 00:05:26.995: BGP(0): 1.0.0.0/8 subtree has an entry 1.1.0.0/16
*Mar 1 00:05:26.999: BGP(0): sub-prefix : 1.1.0.0/16
*Mar 1 00:05:26.999: BGP(0): Needs to be re-aggregated
*Mar 1 00:05:26.999: BGP(0): 1.0.0.0/8 subtree has an entry 1.1.0.0/16
*Mar 1 00:05:27.003: BGP(0): 1.0.0.0/8 aggregate has 1.1.1.0/25 more-specific
*Mar 1 00:05:27.007: BGP(0): 1.0.0.0/8 aggregate created, attributes updated
*Mar 1 00:05:27.007: BGP(0): created aggregate route for 1.0.0.0/8
*Mar 1 00:05:27.011: BGP(0): 1.0.0.0/8 subtree has an entry 1.0.0.0/8
*Mar 1 00:05:27.011: BGP(0): 1.0.0.0/8 subtree has another entry 1.1.0.0/16
*Mar 1 00:05:27.011: BGP(0): Found sub-prefix 1.1.0.0/16: not suppressed
*Mar 1 00:05:27.011: BGP(0): Found sub-prefix 1.1.1.0/25: suppressed
*Mar 1 00:05:27.011: BGP(0): Found sub-prefix 1.1.1.0/24: not suppressed
*Mar 1 00:05:27.011: BGP(0): For aggregate 1.1.0.0/16
*Mar 1 00:05:27.011: BGP(0): 1.1.0.0/16 subtree has an entry 1.1.0.0/16
*Mar 1 00:05:27.011: BGP(0): 1.1.0.0/16 subtree has another entry 1.1.1.0/25
*Mar 1 00:05:27.011: BGP(0): sub-prefix : 1.1.1.0/25
*Mar 1 00:05:27.011: BGP(0): Needs to be re-aggregated
*Mar 1 00:05:27.011: BGP(0): 1.1.0.0/16 subtree has an entry 1.1.0.0/16
*Mar 1 00:05:27.011: BGP(0): 1.1.0.0/16 subtree has another entry 1.1.1.0/25
*Mar 1 00:05:27.011: BGP(0): 1.1.0.0/16 aggregate has 1.1.1.0/25 more-specific
*Mar 1 00:05:27.011: BGP(0): 1.1.0.0/16 aggregate updated
*Mar 1 00:05:27.011: BGP(0): 1.1.0.0/16 subtree has an entry 1.1.0.0/16
*Mar 1 00:05:27.011: BGP(0): 1.1.0.0/16 subtree has another entry 1.1.1.0/25
*Mar 1 00:05:27.011: BGP(0): Found sub-prefix 1.1.1.0/25: suppressed
*Mar 1 00:05:27.011: BGP(0): Found sub-prefix 1.1.1.0/24: not suppressed
*Mar 1 00:05:27.011: BGP(0): For aggregate 1.1.1.0/24
*Mar 1 00:05:27.011: BGP(0): 1.1.1.0/24 subtree has an entry 1.1.1.0/25
*Mar 1 00:05:27.011: BGP(0): sub-prefix : 1.1.1.0/25
*Mar 1 00:05:27.011: BGP(0): Needs to be re-aggregated
*Mar 1 00:05:27.011: BGP(0): 1.1.1.0/24 subtree has an entry 1.1.1.0/25
*Mar 1 00:05:27.011: BGP(0): 1.1.1.0/24 aggregate has 1.1.1.0/25 more-specific
*Mar 1 00:05:27.011: BGP(0): 1.1.1.0/24 aggregate updated
*Mar 1 00:05:27.011: BGP(0): 1.1.1.0/24 subtree has an entry 1.1.1.0/25
*Mar 1 00:05:27.011: BGP(0): Found sub-prefix 1.1.1.0/25: suppressed
*Mar 1 00:05:27.011: BGP(0): Found sub-prefix 1.1.1.0/24:
*Mar 1 00:05:27.011: BGP(0): Revise route installing 1 of 1 route for 1.0.0.0/8 -> 0.0.0.0 to main IP table
*Mar 1 00:05:27.011: RT: add 1.0.0.0/8 via 0.0.0.0, bgp metric [200/0]
*Mar 1 00:05:27.011: RT: NET-RED 1.0.0.0/8
This time, all the existing more-specific aggregates are scanned, but they are clearly not suppressed.
According to RFC 4271:
3.2. Routing Information Base
The Routing Information Base (RIB) within a BGP speaker consists of
three distinct parts:
a) Adj-RIBs-In: The Adj-RIBs-In stores routing information learned
from inbound UPDATE messages that were received from other BGP
speakers. Their contents represent routes that are available
as input to the Decision Process.
b) Loc-RIB: The Loc-RIB contains the local routing information the
BGP speaker selected by applying its local policies to the
routing information contained in its Adj-RIBs-In. These are
the routes that will be used by the local BGP speaker. The
next hop for each of these routes MUST be resolvable via the
local BGP speaker's Routing Table.
c) Adj-RIBs-Out: The Adj-RIBs-Out stores information the local BGP
speaker selected for advertisement to its peers. The routing
information stored in the Adj-RIBs-Out will be carried in the
local BGP speaker's UPDATE messages and advertised to its
peers.
...
The Decision Process takes place in three distinct phases, each
triggered by a different event:
a) Phase 1 is responsible for calculating the degree of preference
for each route received from a peer.
b) Phase 2 is invoked on completion of phase 1. It is responsible
for choosing the best route out of all those available for each
distinct destination, and for installing each chosen route into
the Loc-RIB.
c) Phase 3 is invoked after the Loc-RIB has been modified. It is
responsible for disseminating routes in the Loc-RIB to each
peer, according to the policies contained in the PIB. Route
aggregation and information reduction can optionally be
performed within this phase.
...
9.2.2.2. Aggregating Routing Information
Aggregation is the process of combining the characteristics of
several different routes in such a way that a single route can be
advertised. Aggregation can occur as part of the Decision Process to
reduce the amount of routing information that will be placed in the
Adj-RIBs-Out.
If someone can provide a better (preferably technical) explanation for both cases, I would be very happy to hear it.
Btw, someone should tell Cisco to write more detailed docs:
"Aggregation applies only to routes that exist in the BGP routing table. An aggregated route is forwarded if at least one more specific route of the aggregation exists in the BGP routing table"
Anyhow, you may want to keep that in mind.
Posted by Tassos at 22:34 2 comments
Labels: aggregate-address, BGP