Showing posts with label radius. Show all posts

Sunday, February 23, 2014

IPv6 radius accounting is still a mess


Ever since we put IPv6 into production on our BRAS/BNG (almost 3 years ago), we have been facing the following issue: radius accounting records are missing IPv4 and/or IPv6 address information. When only IPv4 was in use, everything was easy: just wait for IPCP to complete and then send the accounting record. Now, with the addition of IPv6 and, most importantly, DHCPv6 into the equation, things have become complicated and, even worse, trying to fix IPv6 has broken IPv4.

The issue is mainly caused by the extra PPP/NCP negotiation that IPv6 requires and by DHCPv6 running asynchronously on top of IPv6 and PPP. Unfortunately, most of these issues go unnoticed until you have a lot of IPv6 subscribers and a lot of different CPEs.

All these years, Cisco has tried various solutions to fix this issue, though perhaps going about it the wrong way.

My proposal?

  • Allow the first accounting packet (start) to be sent within a configurable (preferably small) timeframe, assuming at least one of the IPv4/IPv6 address assignments has completed.
  • Allow a second accounting packet (interim/alive) to be sent a little later (within a larger timeframe than the first one), only if some IPv6 info wasn't included in the first accounting packet.
  • Both timeframes should be configurable by the user.
  • Allow an extra accounting packet (interim/alive) to be sent whenever a DHCPv6 prefix delegation happens, only if its info wasn't included in the previous two accounting packets.

TR-187 from the Broadband Forum describes a similar behavior (in R-67 to R-73), but it includes an accounting record for each phase of the PPP/NCP/IPCP negotiation.

Here is a more detailed description of my proposal:

Everything happening at the PPP layer should produce a single accounting record, assuming "logical" PPP timeouts prevent the NCP negotiation from continuing for a long time. PPP should complete at least one of the IPv4/IPv6 address negotiations during this time; otherwise the PPP session should be torn down. If at least one protocol's address assignment is successful, then send the accounting start record for this protocol and allow an extra accounting (interim/alive) record to be sent as soon as the other protocol completes too. But don't add extra delay to the initial accounting record while waiting for both IPv4 and IPv6 address assignments to happen. This initial accounting record (start) should include both the IPv4 (IPCP) address and the IPv6 (IPV6CP) interface-id, with the capability to also add the ICMPv6 RA prefix if SLAAC is being used. Based on my measurements (on an averagely loaded BRAS/BNG), IPCP/IPV6CP assignments should take less than 100ms, while SLAAC completion might take up to 500ms after PPP completes.

A 2nd accounting record should be sent just after any IPv6 address/prefix is acquired through DHCPv6 and/or DHCPv6-PD. Since DHCPv6 runs asynchronously on top of IPv6 and you cannot guarantee the exact timeframe within which the CPE will ask for an address/prefix, it's better to have a separate accounting record for it, even if that comes some seconds later. Based on my measurements, DHCPv6 completion might happen up to 3-5 seconds after PPP completes, depending mostly on the CPE's ability to initiate DHCPv6 at the right time (after that it's just a few ms until the prefix is assigned).

A 3rd accounting record should be sent in very rare cases, e.g. when IPv4 assignment completes early, SLAAC and/or DHCPv6 IA-NA complete afterwards and DHCPv6-PD happens at a later time (although the relevant RFCs "forbid" such behavior, the network should be able to cope with "arrogant" CPEs too).

Of course, if the DHCPv6/DHCPv6-PD assignment happens within the initial small timeframe, then its info should also be included in the initial accounting packet if possible. Or, if SLAAC didn't complete within the initial timeframe, then its info should be included in the second accounting packet.

Also, if PPP protocol rejects are received for IPv4 (i.e. an IPv6-only session) or for IPv6 (i.e. an IPv4-only session), then obviously there shouldn't be any waiting time for the rejected protocol to complete address assignment.
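The timing rules above can be sketched as a small decision function. This is only an illustration under assumptions: the function name `plan_accounting`, the event names and the default window values (0.5 s and 5 s, loosely based on the measurements mentioned earlier) are hypothetical and do not describe how any BRAS actually behaves.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, float, List[str]]  # (record type, send time, included info)


def plan_accounting(
    events: Dict[str, Optional[float]],
    start_window: float = 0.5,    # assumed small window for the start record
    interim_window: float = 5.0,  # assumed larger window for the 2nd record
) -> List[Record]:
    """Decide which accounting records to send for one PPP session.

    `events` maps a hypothetical event name ("ipcp", "ipv6cp", "slaac",
    "dhcpv6_na", "dhcpv6_pd") to its completion time in seconds after PPP
    comes up, or to None if that protocol was rejected.
    """
    done = {name: t for name, t in events.items() if t is not None}

    # Info that is ready inside the small initial window.
    early = sorted(n for n, t in done.items() if t <= start_window)
    if "ipcp" not in early and "ipv6cp" not in early:
        return []  # neither address family came up in time: tear down

    records: List[Record] = []
    # No waiting for rejected protocols: if everything is already there,
    # send the start record immediately instead of at window expiry.
    all_early = len(early) == len(done)
    start_time = max(done[n] for n in early) if all_early else start_window
    records.append(("start", start_time, early))

    # Info arriving inside the larger window goes into one interim record.
    middle = sorted(n for n, t in done.items() if start_window < t <= interim_window)
    if middle:
        records.append(("interim", max(done[n] for n in middle), middle))

    # Anything later (e.g. a very late DHCPv6-PD) gets its own interim record.
    for t, name in sorted((t, n) for n, t in done.items() if t > interim_window):
        records.append(("interim", t, [name]))
    return records


if __name__ == "__main__":
    dual_stack = {"ipcp": 0.05, "ipv6cp": 0.08, "slaac": 0.4, "dhcpv6_pd": 3.2}
    for record in plan_accounting(dual_stack):
        print(record)
```

In this sketch a typical dual-stack session yields a start record carrying the IPCP, IPV6CP and SLAAC info, followed by a single interim record for the delegated prefix, while an IPv4-only session gets its start record without any extra delay.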

An even better proposal is to fix periodic accounting so that it includes every possible type of new information (including IPv4/IPv6 addresses), either as soon as it becomes available or after some configurable time.

All of the above is worth discussing only if the prefix assigned through DHCPv6-PD is indeed included in the radius accounting packets. There have been numerous attempts at fixing that and the IPv4/IPv6 accounting in general ("aaa accounting include auth-profile delegated-ipv6-prefix", "aaa accounting delay-start extended-time", "aaa accounting delay-start [all]", CSCua18679, CSCub63085, CSCuj80527, CSCtn24085, CSCuj09925), so let's hope this time Cisco will be successful.

PS: Some things might work better or worse depending on whether the DHCPv6-PD assignment happens through a local pool or a pool configured on radius. Ivan's blog includes some information too.

Saturday, April 23, 2011

How to assign IPv6 addresses to broadband CPEs

Over the last few months I've been experimenting a lot with all possible methods of assigning IPv6 addresses to a broadband subscriber. As is the case with most ISPs and IPv6, we are testing every possible scenario before we put one (or a combination) of them into production; nevertheless, we still haven't decided which path to follow on every aspect. And although we have made up our minds on many of those dilemmas, the one that will remain unresolved for a little longer is the choice between dynamic and static addresses.

Based on various RFCs, Broadband Forum's TRs, blogs (e.g. Ivan's) and IETF WG presentations, I have gathered below all possible address assignment scenarios/methods for a broadband (PPPoE) IPv6 subscriber CPE. Single host connectivity is out of the scope of this list, because it doesn't seem applicable to wireline providers.

Some of these methods have already been verified internally (using mostly Cisco equipment, followed closely by Juniper), others are waiting to be verified and a few are still being evaluated as to whether they can actually be implemented and verified. So, if you believe that something is not applicable, please write down your objection, so we can discuss it. Also, if you find something missing, please feel free to add it in your comment.

I have put them in 3 different areas, based on the interface and the address type. Each area includes various mechanisms related to address assignment.

  • CPE WAN Interface - Link-Local IPv6 Address
    • IPv6CP (RFC 5072)
      • Random/Unique Interface-Id
      • "Framed-Interface-Id" from Radius (RFC 3162)
  • CPE WAN Interface - Global Unicast IPv6 Address/Prefix
    • SLAAC (RFC 4862)
      • RA using a locally configured IPv6 pool on BRAS
      • RA using a "Framed-IPv6-Pool" from Radius (RFC 3162) to define a locally configured IPv6 pool
      • RA using "Framed-IPv6-Prefix" from Radius (RFC 3162)
    • Stateful DHCPv6 (RFC 3315)
      • IA_NA using a locally configured IPv6 address prefix pool on BRAS
      • IA_NA using an external DHCPv6 server, having BRAS as Relay Agent
      • IA_NA using "Framed-IPv6-Pool" from Radius (RFC 3162) to define a locally configured IPv6 pool on BRAS
      • IA_NA using "Framed-IPv6-Address" from Radius (draft-ietf-radext-ipv6-access)
  • CPE LAN Interface - Global Unicast IPv6 Prefix
    • DHCPv6-PD (RFC 3633)
      • IA_PD using "Delegated-IPv6-Prefix" from Radius (RFC 4818)
      • IA_PD using "Framed-IPv6-Prefix" from Radius (RFC 3162) for $username-dhcpv6 (if RFC 4818 is not supported)
      • IA_PD using "Framed-IPv6-Prefix" from Radius (RFC 3162) (if global addresses are not used on the WAN interface)
      • IA_PD using a locally configured IPv6 prefix pool on BRAS
      • IA_PD using an external DHCPv6-PD server, having BRAS as Relay Agent
      • IA_PD using "Framed-IPv6-Pool" from Radius (RFC 3162) to define a locally configured IPv6 prefix pool on BRAS
      • IA_PD using "Delegated-IPv6-Prefix-Pool" from Radius (draft-ietf-radext-ipv6-access) to define a locally configured IPv6 prefix pool on BRAS
Acronyms
RA = Router Advertisement
IA_NA = Identity Association for Non-temporary Addresses
IA_PD = Identity Association for Prefix Delegation

Color Codes
GREEN : Verified
ORANGE : To be verified
BLUE : To be implemented and verified
RED : Probably not applicable

What I see as the most interesting options are the following two:
  • "Framed-IPv6-Prefix" for WAN (through SLAAC) and "Delegated-IPv6-Prefix" for LAN
  • "Framed-IPv6-Address" for WAN (through DHCPv6) and "Delegated-IPv6-Prefix" for LAN

Hopefully we'll soon have support of "draft-ietf-radext-ipv6-access" by BRAS vendors and we'll be able to verify another bunch of them.


Notes
  • Radiator has been used as the radius server for all our internal tests. Its flexibility probably puts it at the top of the market.
  • Technically, IPv6CP is not an address assignment method (unlike IPCP). It just helps/hints the peer to build a possible IPv6 link-local address by negotiating a 64-bit Interface-Id option. There have been various Internet-Drafts (draft-huang-ipv6cp-options, draft-qin-pppext-ipv6-addr-pref) that were supposed to add more options to it (like Prefix, IPv6-Address, Delegated-Prefix, DNS-IPv6-Address, etc.), but they expired and never reached standard status. Two new ones (draft-hu-pppext-ipv6cp-requirements, draft-hu-pppext-ipv6cp-extensions) have brought the issue to the surface again. IETF members' usual answer is that PPP is a link-layer protocol, so higher-level (i.e. network, application) protocols should be left outside.
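To make the IPv6CP note concrete, here is a minimal sketch of how a peer could derive its link-local address from the negotiated 64-bit Interface-Id by placing it under fe80::/64. The function name and the sample Interface-Id are made up for illustration; the same value could equally come from a random/unique choice or from "Framed-Interface-Id".

```python
import ipaddress


def link_local_from_interface_id(interface_id: int) -> ipaddress.IPv6Address:
    """Combine the fe80::/64 prefix with a 64-bit interface identifier."""
    # An all-zero identifier is treated as invalid in this sketch.
    if not 0 < interface_id < 2**64:
        raise ValueError("interface-id must be a non-zero 64-bit value")
    prefix = int(ipaddress.IPv6Address("fe80::"))
    return ipaddress.IPv6Address(prefix | interface_id)


if __name__ == "__main__":
    # Hypothetical Interface-Id negotiated through IPv6CP.
    print(link_local_from_interface_id(0x021122FFFE334455))
    # prints fe80::211:22ff:fe33:4455
```

Everything beyond this link-local address (global addresses, delegated prefixes, DNS) has to come from SLAAC or DHCPv6, which is why those methods fill the rest of the list above.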

Posted by Tassos at 11:35 1 comments


Tuesday, March 3, 2009

Would you mind waiting for 2-3 mins in a console?

Details about bug CSCed45578 :

Console locked for 2 mins after booting with system accounting

Symptom:

When an IOS system is configured with "aaa accounting system", then users may
be prevented from starting an exec session on the console or terminal lines for
as much as three minutes, after the system reloads.

Conditions:


This delay occurs when the RADIUS and/or TACACS+ accounting server(s) are not
reachable at boot time.

Workaround:


Configure the following hidden global command:

router(config)#no aaa accounting system guarantee-first


Imagine this now on a router using many different authentication methods :


aaa authentication login default group tacacs+
aaa authentication login CONSOLE-ACCESS local
aaa authentication ppp default group radius
!
aaa accounting exec default start-stop group tacacs+
aaa accounting network default start-stop group radius
aaa accounting system default start-stop group radius
!
line con 0
 login authentication CONSOLE-ACCESS

Terminal clients are authenticated through tacacs and their accounting is sent to tacacs too. These are probably the engineers trying to do their job on the router.

Console clients are authenticated using the local usernames ("enable" could be used too) but their accounting is sent to tacacs. These are probably the high-level engineers looking to do a somewhat more important job on the router.

PPP (i.e. adsl) customers are authenticated through radius and their accounting is sent to radius too.

System accounting (before reload and after reload/power-on) is also sent to radius, so that any ppp users from this router left in the database can be erased after the aforementioned events.


What do you think will happen immediately after a reload, if the radius server is not reachable, but the router has ip connectivity?


1) ppp customers will be unable to connect (radius auth)
2) engineers will be unable to connect (tacacs auth)
3) high-level engineers will be unable to connect (local auth)

If you answered 1 (as I believed until some months ago), then you're 1/3 correct.
NOBODY will be able to connect to this router, regardless of the access method used.
This will last for 2-3 minutes in the best case (the retransmit and timeout intervals of the aaa server do not seem to have any effect), but I have seen it last for 7-8 minutes in some cases (I guess before bug CSCed45578 was fixed).

Well, tell me honestly; would you mind waiting for 2-3 mins in a console, while trying to troubleshoot an issue? But let's suppose you can wait (some mins are nothing in comparison to eternity). What if there is a crash or an event happening during this small time and you need to further troubleshoot the issue? EEM can help in some circumstances, but it's not a panacea.

Unfortunately you have to live with it! And according to Cisco this behavior is expected.

But, there is a workaround! No need to worry. "no aaa accounting system guarantee-first", which was a hidden command in older releases, will fix the above issue, so anyone not affected by the non-working aaa server can have access to the router as soon as it boots up.

And just when you thought that a solution had finally been found, you discover that this command causes another headache. And here is the complete story...


When "aaa accounting system default start-stop group radius" is enabled, the router will send an Accounting-On packet to the radius server at some point while it boots up. It will also try to send an Accounting-Off before going down, but that is not guaranteed (e.g. after a crash or a lost connection).

When the radius server receives the Accounting-Off packet, it assumes that all users disconnected from the router. For the same reason, when it receives the Accounting-On packet it assumes that the router just booted up, so the users were disconnected some time ago and now there should be nobody connected.

If you have configured:

aaa accounting system default start-stop group radius
aaa accounting system guarantee-first ! enabled by default => not shown

then the router guarantees that the first aaa packet sent will be the Accounting-On. So, no login (in whatever form) is allowed until a response is received from the radius server (or whatever system accounting method you have used), or until the 2-3 min waiting period passes. This way you're sure that the Accounting-On packet will be the first one to reach the radius server.

If you have configured:

aaa accounting system default start-stop group radius
no aaa accounting system guarantee-first

then some users might log in before the router sends the Accounting-On packet. As a consequence, there is a possibility that some accounting start packets will be sent before the Accounting-On packet and/or some others might be lost. For most radius server implementations it is expected that all the previous user entries will be cleared from the database, when the Accounting-On packet is received. So, in your database, you'll see the above logged-in users as not connected.
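The effect of packet ordering can be shown with a toy model of the radius server's session table. This is only a sketch under the assumption stated above (the server clears all sessions of the router when it sees Accounting-On or Accounting-Off); the class and function names are hypothetical and real radius servers keep far more state.

```python
from typing import Iterable, Optional, Set, Tuple


class SessionTable:
    """Toy model of a radius server's active-session list for one router."""

    def __init__(self) -> None:
        self.active: Set[str] = set()

    def handle(self, packet_type: str, user: Optional[str] = None) -> None:
        if packet_type in ("Accounting-On", "Accounting-Off"):
            # Router (re)booted or is going down: nobody can still be connected.
            self.active.clear()
        elif packet_type == "Start" and user is not None:
            self.active.add(user)
        elif packet_type == "Stop" and user is not None:
            self.active.discard(user)


def replay(packets: Iterable[Tuple[str, ...]]) -> Set[str]:
    """Feed packets to a fresh table in arrival order; return active users."""
    table = SessionTable()
    for packet in packets:
        table.handle(*packet)
    return table.active


if __name__ == "__main__":
    # With guarantee-first: Accounting-On always arrives before any start.
    print(replay([("Accounting-On",), ("Start", "alice"), ("Start", "bob")]))
    # Without it: alice logs in before Accounting-On reaches the server.
    print(replay([("Start", "alice"), ("Accounting-On",), ("Start", "bob")]))
```

In the second replay alice is connected on the router but missing from the table, which is exactly the inconsistency that guarantee-first is meant to prevent.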

I understand the motivation behind Cisco's implementation, but I think it lacks flexibility. In my case I don't care what happens to the accounting records produced by the con/vty exec sessions during bootup. I prefer to lose them (I might not even create them in the first place), rather than postpone my access to the router.

A solution that comes to my mind right now would be to add this first-packet-guarantee specifically on each authentication method, like below:

aaa authentication login default group tacacs+
aaa authentication login CONSOLE-ACCESS enable
aaa authentication ppp default group radius guarantee-system-acct-first
aaa authentication ppp AAA-RADIUS group radius guarantee-system-acct-first

or disable this first-packet-guarantee per aaa method :

aaa authentication login default group tacacs+ no-guarantee-system-acct-first
aaa authentication login CONSOLE-ACCESS enable no-guarantee-system-acct-first
aaa authentication ppp default group radius
aaa authentication ppp AAA-RADIUS group radius

or enable/disable it per aaa server group, something like :

aaa group server radius RADIUS-GROUP
server 1.1.1.1 auth-port 1645 acct-port 1646
server 2.2.2.2 auth-port 1645 acct-port 1646
guarantee-system-acct-first


Before an aaa method or server group with the "guarantee-system-acct-first" parameter is invoked, the router should make sure that the "Accounting-On" packet has been sent and a response has been received. All other aaa methods should be invoked without any restrictions.

This way, engineers will be able to access the router without any waiting (through tacacs/local auth), but ppp customers will not be allowed to log in until the radius server has responded to the Accounting-On packet or the waiting time has passed.
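The proposed behavior boils down to a per-method gate. The sketch below is a hypothetical model, not IOS code: the class name, the method labels and the 180-second wait limit (taken from the 2-3 minute delay described earlier) are all assumptions made for illustration.

```python
from typing import Iterable


class AaaGate:
    """Blocks only the aaa methods flagged to wait for Accounting-On."""

    def __init__(self, guarded_methods: Iterable[str], wait_limit: float = 180.0) -> None:
        self.guarded = set(guarded_methods)  # methods with guarantee-system-acct-first
        self.wait_limit = wait_limit         # assumed maximum wait after boot, in seconds
        self.acct_on_acked = False

    def ack_accounting_on(self) -> None:
        """Call when the radius server has answered the Accounting-On packet."""
        self.acct_on_acked = True

    def may_login(self, method: str, seconds_since_boot: float) -> bool:
        if method not in self.guarded:
            return True  # e.g. tacacs/local logins are never delayed
        return self.acct_on_acked or seconds_since_boot >= self.wait_limit


if __name__ == "__main__":
    gate = AaaGate({"ppp-radius"})
    print(gate.may_login("console-local", 1.0))  # engineers get in right away
    print(gate.may_login("ppp-radius", 1.0))     # customers wait for Accounting-On
    gate.ack_accounting_on()
    print(gate.may_login("ppp-radius", 2.0))
```

The design choice is simply to move the wait from a global boot-time lock to a check made only when a guarded method is invoked.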

Cisco recommends handling this through a PER (Product Enhancement Request), without of course being able to guarantee its implementation (have they tried using the above command? :-p).

Personally, I have had negative experiences with 3 PERs so far and I don't see a reason to make them 4. How about you? Have you ever submitted a PER to Cisco and what was the result? Please spare a min and complete the poll to the right.

Update 9 Mar 2009 : Bug CSCsy20392 has been opened, for everyone interested in following it. It has been opened as an enhancement (to change the default behaviour of the accounting guarantee-first command) and hopefully a PER will be attached to it.

Posted by Tassos at 21:11 7 comments


 

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Greece License.
