Intel 8254x
Revision as of 16:53, 10 November 2025
The Intel 8254x series consists of the 82546GB/EB, 82545GM/EM, 82544GC/EI, 82541(PI/GI/EI), 82541ER, 82547GI/EI, and 82540EP/EM Gigabit Ethernet Controllers.
Overview
Intel 8254x-based cards come in 32-/64-bit, 33/66 MHz PCI and PCI-X flavors.
The Intel 82547GI(EI) connects to the motherboard via a Communications Streaming Architecture (CSA) port instead of a PCI/PCI-X bus.
The 82541xx and 82540EP/EM controllers do not support the PCI-X bus.
They are all high-performance, Gigabit-capable controllers with one or two copper/fiber Ethernet ports per controller (the 82546GB/EB is the dual-port part).
The Intel 8254x series heavily utilizes task offloading. Each controller has an "offloading engine" for tasks such as TCP/UDP/IP checksum calculations, packet filtering, and packet segmentation.
- Jumbo packets are supported.
- Wake on LAN (WoL) is supported.
- A four-wire serial EEPROM interface, as well as a generic EEPROM "read" interface, is implemented within the configuration registers.
- D0 and D3 power states are supported through ACPI.
Programming
Detection
Section 5.2 of the 8254x Software Developer's Manual lists the Vendor and Device IDs of the various devices in the 8254x series. These are used to detect the devices on the PCI bus by reading the PCI Configuration Space registers.
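A minimal sketch of the ID check, assuming you have already read the 16-bit Vendor ID (configuration offset 0x00) and Device ID (offset 0x02) of a PCI function. The ID list is deliberately partial: it only holds two IDs that are commonly seen under emulators (0x100E for the 82540EM and 0x100F for the 82545EM copper part), and you would fill in the rest from Section 5.2. The function name `is_i8254x` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INTEL_VENDOR_ID 0x8086

/* Partial list for illustration only: complete it from Section 5.2 of the manual. */
static const uint16_t i8254x_device_ids[] = {
    0x100E, /* 82540EM (QEMU's default NIC, VirtualBox "PRO/1000 MT Desktop") */
    0x100F, /* 82545EM copper (VMware, VirtualBox "PRO/1000 MT Server") */
};

/* Returns true if the vendor/device pair belongs to a known 8254x controller. */
bool is_i8254x(uint16_t vendor_id, uint16_t device_id) {
    if (vendor_id != INTEL_VENDOR_ID)
        return false;
    for (size_t i = 0; i < sizeof i8254x_device_ids / sizeof i8254x_device_ids[0]; i++) {
        if (i8254x_device_ids[i] == device_id)
            return true;
    }
    return false;
}
```

Call it for every function found while enumerating the bus; on a match, go on to read the BARs.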
The device also fills in the PCI Base Address Registers (BARs). BAR0 is either a 32-bit or a 64-bit memory BAR that points to the device's register space: check bits 2:1, where 00b means 32-bit and 10b means 64-bit (in which case BAR1 holds the upper 32 bits of the address). Always use BAR0 to access the device via MMIO, as its BAR number is the same on every device in the series.
There is also a BAR that contains an I/O base address; it can be found by checking bit 0 of each BAR, which is set for I/O-space BARs. The documentation states this will be either BAR2 or BAR4, but emulators may move it.
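The BAR checks above can be sketched as two pure helpers that take raw BAR values as read from configuration space (offsets 0x10 to 0x24). The names `i8254x_mmio_base` and `i8254x_io_base` are hypothetical, and mapping the returned physical address into your address space is left to your kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Decode BAR0 into the physical MMIO base. bar1 is only used when BAR0 is a 64-bit BAR. */
uint64_t i8254x_mmio_base(uint32_t bar0, uint32_t bar1) {
    assert((bar0 & 0x1) == 0);             /* bit 0 clear: memory-space BAR */
    uint32_t type = (bar0 >> 1) & 0x3;     /* bits 2:1: 00b = 32-bit, 10b = 64-bit */
    uint64_t base = bar0 & ~(uint64_t)0xF; /* bits 3:0 are flags, not address bits */
    if (type == 0x2)
        base |= (uint64_t)bar1 << 32;      /* the upper half lives in the next BAR */
    return base;
}

/* Return the I/O port base of the first I/O-space BAR, or 0 if there is none. */
uint16_t i8254x_io_base(const uint32_t bars[6]) {
    for (int i = 0; i < 6; i++) {
        if (bars[i] & 0x1)                 /* bit 0 set: I/O-space BAR */
            return (uint16_t)(bars[i] & ~(uint32_t)0x3);
        if (((bars[i] >> 1) & 0x3) == 0x2) /* 64-bit memory BAR: */
            i++;                           /* skip its upper half */
    }
    return 0;
}
```

Skipping the upper half of a 64-bit memory BAR matters because its raw value is an address, and bit 0 of it could happen to be set.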
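When using MMIO, reading and writing registers is very straightforward: each register is a 32-bit word at BAR0 + offset. Note that `register` is a C keyword and cannot be used as a parameter name, and that the accesses must be `volatile` so the compiler does not drop or merge them.
```c
#include <stdint.h>

// Virtual address at which BAR0's MMIO region is mapped (map it uncached).
static uint64_t mmio_base = BAR_GOES_HERE;

void write_register(uint32_t reg, uint32_t value) {
    *(volatile uint32_t *)(mmio_base + reg) = value;
}

uint32_t read_register(uint32_t reg) {
    return *(volatile uint32_t *)(mmio_base + reg);
}
```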
When using I/O ports, register access is a little more complicated, because the I/O address space of the 8254x is only 8 bytes wide. The register at offset 0x00 is the "IOADDR" window and the register at offset 0x04 is the "IODATA" window. IOADDR holds the offset of the register that IODATA operates on, so the basic operation is to write the register offset to IOADDR and then read or write the value through IODATA.
```c
#include <stdint.h>

// I/O port base taken from the I/O BAR (with the low flag bits masked off).
static uint16_t iobase = IO_BAR_GOES_HERE;

// outl(port, value) and inl(port) are your kernel's 32-bit port I/O helpers.
// The two-step access is not atomic: serialise it if interrupt handlers also touch registers.
void write_register(uint32_t reg, uint32_t value) {
    outl(iobase + 0x00, reg);   // select the register through IOADDR
    outl(iobase + 0x04, value); // write the value through IODATA
}

uint32_t read_register(uint32_t reg) {
    outl(iobase + 0x00, reg);   // select the register through IOADDR
    return inl(iobase + 0x04);  // read the value through IODATA
}
```
Device Registers
The 8254x controllers have a large number of registers. A complete list of the registers and their offsets is given in Table 13-2 (page 219) of the Intel 8254x Family of Gigabit Ethernet Controllers Software Developer's Manual.
Here are the most important ones:
| Category | Offset | Abbreviation | Name | R/W | Manual Page |
|---|---|---|---|---|---|
| General | 00000h | CTRL | Device Control | R/W | 224 |
| General | 00008h | STATUS | Device Status | R | 229 |
| General | 00010h | EECD | EEPROM/Flash Control/Data | R/W | 232 |
| General | 00014h | EERD | EEPROM Read (not applicable to the 82544GC/EI) | R/W | 236 |
| Interrupt | 000C0h | ICR | Interrupt Cause Read | R/W | 292 |
| Interrupt | 000D0h | IMS | Interrupt Mask Set / Read | R/W | 297 |
| Receive | 00100h | RCTL | Receive Control | R/W | 300 |
| Receive | 02800h | RDBAL | Receive Descriptor Base Low | R/W | 306 |
| Receive | 02804h | RDBAH | Receive Descriptor Base High | R/W | 306 |
| Receive | 02808h | RDLEN | Receive Descriptor Length | R/W | 307 |
| Receive | 02810h | RDH | Receive Descriptor Head | R/W | 307 |
| Receive | 02818h | RDT | Receive Descriptor Tail | R/W | 308 |
| Transmit | 00400h | TCTL | Transmit Control | R/W | 310 |
| Transmit | 03800h | TDBAL | Transmit Descriptor Base Low | R/W | 315 |
| Transmit | 03804h | TDBAH | Transmit Descriptor Base High | R/W | 316 |
| Transmit | 03808h | TDLEN | Transmit Descriptor Length | R/W | 316 |
| Transmit | 03810h | TDH | Transmit Descriptor Head | R/W | 317 |
| Transmit | 03818h | TDT | Transmit Descriptor Tail | R/W | 318 |
| Receive | 05400h-05478h | RAL(8*n) | Receive Address Low (n) | R/W | 329 |
| Receive | 05404h-0547Ch | RAH(8*n) | Receive Address High (n) | R/W | 329 |
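For reference, here are the offsets from the table above as C constants. The `I8254X_` prefix and the `i8254x_ral`/`i8254x_rah` helpers are hypothetical names chosen for this sketch; the examples later in this article use their own placeholder names (`REG_TDT`, `I8254_REG_CTRL`, ...), which you would map onto constants like these.

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets from Table 13-2 of the manual. */
enum i8254x_reg {
    I8254X_CTRL   = 0x00000,
    I8254X_STATUS = 0x00008,
    I8254X_EECD   = 0x00010,
    I8254X_EERD   = 0x00014,
    I8254X_ICR    = 0x000C0,
    I8254X_IMS    = 0x000D0,
    I8254X_RCTL   = 0x00100,
    I8254X_TCTL   = 0x00400,
    I8254X_RDBAL  = 0x02800,
    I8254X_RDBAH  = 0x02804,
    I8254X_RDLEN  = 0x02808,
    I8254X_RDH    = 0x02810,
    I8254X_RDT    = 0x02818,
    I8254X_TDBAL  = 0x03800,
    I8254X_TDBAH  = 0x03804,
    I8254X_TDLEN  = 0x03808,
    I8254X_TDH    = 0x03810,
    I8254X_TDT    = 0x03818,
};

/* The 16 Receive Address Low/High pairs are 8 bytes apart: n = 0..15. */
static inline uint32_t i8254x_ral(unsigned n) { assert(n < 16); return 0x05400 + 8 * n; }
static inline uint32_t i8254x_rah(unsigned n) { assert(n < 16); return 0x05404 + 8 * n; }
```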
Device Control Register (CTRL)

| Field | Bit(s) | Name |
|---|---|---|
| FD | 0 | Full-Duplex |
| RSV | 2:1 | Reserved |
| LRST | 3 | Link Reset |
| RSV | 4 | Reserved |
| ASDE | 5 | Auto-Speed Detection Enable |
| SLU | 6 | Set Link Up |
| ILOS | 7 | Invert Loss-of-Signal |
| SPEED | 9:8 | Speed selection |
| RSV | 10 | Reserved |
| FRCSPD | 11 | Force Speed |
| FRCDPLX | 12 | Force Duplex |
| RSV | 17:13 | Reserved |
| SDP0_DATA | 18 | SDP0 Data Value |
| SDP1_DATA | 19 | SDP1 Data Value |
| ADVD3WUC | 20 | D3Cold Wakeup Capability Advertisement Enable |
| EN_PHY_PWR_MGMT | 21 | PHY Power Management Enable |
| SDP0_IODIR | 22 | SDP0 Pin Directionality |
| SDP1_IODIR | 23 | SDP1 Pin Directionality |
| RSV | 25:24 | Reserved |
| RST | 26 | Device Reset |
| RFCE | 27 | Receive Flow Control Enable |
| TFCE | 28 | Transmit Flow Control Enable |
| RSV | 29 | Reserved |
| VME | 30 | VLAN Mode Enable |
| PHY_RST | 31 | PHY Reset |
Device Status Register (STATUS)

| Field | Bit(s) | Name |
|---|---|---|
| FD | 0 | Link Full Duplex configuration Indication. |
| LU | 1 | Link Up indication. |
| Function ID | 3:2 | Provides software a mechanism to determine the Ethernet controller function number (LAN identifier) for this MAC. Read as: [0b,0b] LAN A, [0b,1b] LAN B. Note: these settings are only applicable to the 82546GB/EB. |
| TXOFF | 4 | Transmission Paused |
| TBIMODE | 5 | TBI Mode/internal SerDes Indication. Note: for the 82544GC/EI, reflects the status of the TBI_MODE input pin. |
| SPEED | 7:6 | Link Speed Setting: 00b = 10 Mb/s, 01b = 100 Mb/s, 10b = 1000 Mb/s, 11b = 1000 Mb/s. These bits are not valid in TBI mode/internal SerDes. |
| ASDV | 9:8 | Auto Speed Detection Value |
| RSV | 10 | Reserved |
| PCI66 | 11 | PCI Bus speed indication (when set, the PCI bus is running at 66 MHz). |
| BUS64¹ | 12 | PCI Bus Width indication (when set, the Ethernet controller is on a 64-bit bus). |
| PCIX_MODE¹ | 13 | PCI-X Mode indication (when set, the Ethernet controller is operating in PCI-X mode). |
| PCIXSPD¹ | 15:14 | PCI-X Bus Speed Indication: 00b = 50-66 MHz, 01b = 66-100 MHz, 10b = 100-133 MHz, 11b = Reserved. |
| RSV | 31:16 | Reserved |

¹ Not applicable to the 82540EP/EM, 82541xx, or 82547GI/EI.
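A small sketch for interpreting STATUS, for example after a link status change. The helper names are hypothetical; the bit positions are the ones from the table above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STATUS_FD (1u << 0) /* link is full duplex */
#define STATUS_LU (1u << 1) /* link is up */

/* Decode STATUS.SPEED (bits 7:6). Not valid in TBI mode/internal SerDes. */
unsigned status_speed_mbps(uint32_t status) {
    switch ((status >> 6) & 0x3) {
    case 0:  return 10;
    case 1:  return 100;
    default: return 1000; /* both 10b and 11b mean 1000 Mb/s */
    }
}

bool status_link_up(uint32_t status) {
    return (status & STATUS_LU) != 0;
}
```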
Transmit Control Register (TCTL)

| Field | Bit(s) | Name |
|---|---|---|
| RSV | 0 | Reserved |
| EN | 1 | Transmit Enable |
| RSV | 2 | Reserved |
| PSP | 3 | Pad Short Packets |
| CT | 11:4 | Collision Threshold |
| COLD | 21:12 | Collision Distance |
| SWXOFF | 22 | Software XOFF Transmission |
| RSV | 23 | Reserved |
| RTLC | 24 | Re-transmit on Late Collision |
| NRTU | 25 | No Re-transmit on Underrun (82544GC/EI only) |
| RSV | 31:26 | Reserved |
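The TCTL fields can be combined as in the following sketch. The function name is hypothetical, and the CT/COLD values are left to the caller: take the recommended values for your duplex mode from the TCTL description in the manual.

```c
#include <assert.h>
#include <stdint.h>

#define TCTL_EN         (1u << 1) /* Transmit Enable */
#define TCTL_PSP        (1u << 3) /* Pad Short Packets */
#define TCTL_CT_SHIFT   4         /* Collision Threshold, bits 11:4 */
#define TCTL_COLD_SHIFT 12        /* Collision Distance, bits 21:12 */

/* Build a TCTL value with EN and PSP set and the given collision parameters. */
uint32_t tctl_value(uint32_t ct, uint32_t cold) {
    assert(ct <= 0xFF && cold <= 0x3FF); /* field widths: 8 and 10 bits */
    return TCTL_EN | TCTL_PSP | (ct << TCTL_CT_SHIFT) | (cold << TCTL_COLD_SHIFT);
}
```

With placeholder values, `tctl_value(0x0F, 0x40)` yields `0x000400FA`.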
Receive Control Register (RCTL)

| Field | Bit(s) | Name |
|---|---|---|
| RSV | 0 | Reserved |
| EN | 1 | Receiver Enable |
| SBP | 2 | Store Bad Packets |
| UPE | 3 | Unicast Promiscuous Enabled |
| MPE | 4 | Multicast Promiscuous Enabled |
| LPE | 5 | Long Packet Reception Enable |
| LBM | 7:6 | Loopback Mode |
| RDMTS | 9:8 | Receive Descriptor Minimum Threshold Size |
| RSV | 11:10 | Reserved |
| MO | 13:12 | Multicast Offset |
| RSV | 14 | Reserved |
| BAM | 15 | Broadcast Accept Mode |
| BSIZE | 17:16 | Receive Buffer Size |
| VFE | 18 | VLAN Filter Enable |
| CFIEN | 19 | Canonical Form Indicator Enable |
| CFI | 20 | Canonical Form Indicator bit value |
| RSV | 21 | Reserved |
| DPF | 22 | Discard Pause Frames |
| PMCF | 23 | Pass MAC Control Frames |
| RSV | 24 | Reserved |
| BSEX | 25 | Buffer Size Extension |
| SECRC | 26 | Strip Ethernet CRC from incoming packet |
| RSV | 31:27 | Reserved |
When BSEX is set, the buffer size selected by BSIZE is multiplied by 16.
| Size (Bytes) | BSIZE | BSEX |
|---|---|---|
| 16384 | 01b | 1 |
| 8192 | 10b | 1 |
| 4096 | 11b | 1 |
| 2048 | 00b | 0 |
| 1024 | 01b | 0 |
| 512 | 10b | 0 |
| 256 | 11b | 0 |
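The buffer size table can be turned into a lookup, as in this sketch (the function name is hypothetical). Because 2048 bytes encodes as all-zero bits, unsupported sizes return `UINT32_MAX` rather than 0.

```c
#include <assert.h>
#include <stdint.h>

#define RCTL_BSIZE_SHIFT 16        /* BSIZE, bits 17:16 */
#define RCTL_BSEX        (1u << 25) /* Buffer Size Extension */

/* RCTL BSIZE/BSEX bits for a receive buffer size in bytes, or UINT32_MAX if unsupported. */
uint32_t rctl_buffer_size_bits(uint32_t bytes) {
    switch (bytes) {
    case 2048:  return 0u << RCTL_BSIZE_SHIFT;
    case 1024:  return 1u << RCTL_BSIZE_SHIFT;
    case 512:   return 2u << RCTL_BSIZE_SHIFT;
    case 256:   return 3u << RCTL_BSIZE_SHIFT;
    case 16384: return RCTL_BSEX | (1u << RCTL_BSIZE_SHIFT); /* 1024 * 16 */
    case 8192:  return RCTL_BSEX | (2u << RCTL_BSIZE_SHIFT); /* 512 * 16 */
    case 4096:  return RCTL_BSEX | (3u << RCTL_BSIZE_SHIFT); /* 256 * 16 */
    default:    return UINT32_MAX;
    }
}
```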
Interrupt Mask Set/Read Register (IMS)

| Field | Bit(s) | Description |
|---|---|---|
| TXDW | 0 | Sets mask for Transmit Descriptor Written Back. |
| TXQE | 1 | Sets mask for Transmit Queue Empty. |
| LSC | 2 | Sets mask for Link Status Change. |
| RXSEQ | 3 | Sets mask for Receive Sequence Error. This is a reserved bit for the 82541xx and 82547GI/EI. Set to 0b. |
| RXDMT0 | 4 | Sets mask for Receive Descriptor Minimum Threshold hit. |
| RSV | 5 | Reserved |
| RXO | 6 | Sets mask for Receiver FIFO Overrun. |
| RXT0 | 7 | Sets mask for Receiver Timer Interrupt. |
| RSV | 8 | Reserved |
| MDAC | 9 | Sets mask for MDI/O Access Complete Interrupt. |
| RXCFG | 10 | Sets mask for Receiving /C/ ordered sets. This is a reserved bit for the 82541xx and 82547GI/EI. Set to 0b. |
| RSV | 11 | Reserved |
| PHYINT | 12 | Sets mask for PHY Interrupt (not applicable to the 82544GC/EI). This is a reserved bit for the 82541xx and 82547GI/EI. Set to 0b. |
| GPI | 14:11 | Sets mask for General Purpose Interrupts (82544GC/EI only). |
| GPI | 14:13 | Sets mask for General Purpose Interrupts (all other controllers). |
| TXD_LOW | 15 | Sets mask for Transmit Descriptor Low Threshold hit (not applicable to the 82544GC/EI). |
| SRPD | 16 | Sets mask for Small Receive Packet Detection (not applicable to the 82544GC/EI). |
| RSV | 31:17 | Reserved |
To enable an interrupt, simply write 1b to the corresponding bit; writing 0b has no effect.
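A sketch of enabling and disabling interrupt causes, assuming `mmio` points at the mapped BAR0 region. Because IMS only ever sets bits, disabling goes through the Interrupt Mask Clear register (IMC, offset 000D8h), which is not in the register table above. ICR reports causes using the same bit layout. The macro and helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define REG_IMS 0x000D0 /* Interrupt Mask Set/Read */
#define REG_IMC 0x000D8 /* Interrupt Mask Clear */

/* Bit positions from the IMS table (ICR uses the same layout). */
#define INT_TXDW   (1u << 0) /* Transmit Descriptor Written Back */
#define INT_TXQE   (1u << 1) /* Transmit Queue Empty */
#define INT_LSC    (1u << 2) /* Link Status Change */
#define INT_RXDMT0 (1u << 4) /* Receive Descriptor Minimum Threshold hit */
#define INT_RXO    (1u << 6) /* Receiver FIFO Overrun */
#define INT_RXT0   (1u << 7) /* Receiver Timer Interrupt */

/* Enable the given causes: 1 bits are set in the mask, 0 bits are left alone. */
static inline void i8254x_enable_irqs(volatile uint8_t *mmio, uint32_t causes) {
    *(volatile uint32_t *)(mmio + REG_IMS) = causes;
}

/* Disable the given causes through IMC. */
static inline void i8254x_disable_irqs(volatile uint8_t *mmio, uint32_t causes) {
    *(volatile uint32_t *)(mmio + REG_IMC) = causes;
}
```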
Descriptor Format
Both receive and transmit descriptors are 16 bytes in size. There are three types of transmit descriptors. The original one is referred to as the "Legacy Transmit Descriptor". The second, the "TCP/IP Data Descriptor", replaces the legacy descriptor and gives access to the newer offloading capabilities. The third, the "TCP/IP Context Descriptor", is fundamentally different because it does not point to packet data: it merely contains control information that is loaded into registers of the controller and affects the processing of future packets. For simplicity, we will only use the legacy transmit descriptor. If you want to learn more about the other descriptor types, have a look at the specification.
Legacy Transmit Descriptor Format

| Quadword | Bit(s) | Field |
|---|---|---|
| 0 | 63:0 | Buffer Address |
| 1 | 15:0 | Length |
| 1 | 23:16 | CSO |
| 1 | 31:24 | CMD |
| 1 | 35:32 | STA |
| 1 | 39:36 | RSV |
| 1 | 47:40 | CSS |
| 1 | 63:48 | Special |
| Name | Description |
|---|---|
| Buffer Address | The address of the buffer. Descriptors with a null address transfer no data. |
| Length | Length of this segment. The maximum length allowed is 16288 bytes. |
| CSO | Checksum Offset. Indicates where, relative to the start of the packet, to insert a TCP checksum if this is enabled in the CMD field. |
| CMD | Command Field |
| STA | Status Field |
| RSV | Reserved |
| CSS | Checksum Start Field. An offset relative to the start of the buffer that indicates where to start computing the checksum. |
| Special | Special Field |
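The legacy transmit descriptor maps onto a C struct as in this sketch. The field names are chosen to match the ones used by the ring setup and transmit code later in this article; `__attribute__((packed))` is a GCC/Clang extension (the layout has no padding anyway, so it only documents intent), and the static assertions check the layout at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Legacy transmit descriptor (little-endian, 16 bytes). */
typedef struct __attribute__((packed)) {
    uint64_t buffer_address; /* physical address of the packet data */
    uint16_t length;         /* bytes in this buffer */
    uint8_t  cso;            /* checksum offset */
    uint8_t  command;        /* CMD field */
    uint8_t  status;         /* STA in bits 3:0, upper nibble reserved */
    uint8_t  css;            /* checksum start */
    uint16_t special;
} transmit_descriptor_t;

static_assert(sizeof(transmit_descriptor_t) == 16, "descriptor must be 16 bytes");
static_assert(offsetof(transmit_descriptor_t, command) == 11, "CMD is byte 11");
```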
Transmit Descriptor Command Field (CMD)

| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|---|---|---|---|---|---|---|---|
| IDE | VLE | DEXT | RPS | RS | IC | IFCS | EOP |

| Name | Description |
|---|---|
| IDE (bit 7) | Interrupt Delay Enable |
| VLE (bit 6) | VLAN Packet Enable |
| DEXT (bit 5) | Extension (set to 0b to indicate legacy mode). |
| RPS/RSV (bit 4) | Report Packet Sent. 82544GC/EI only, otherwise reserved. |
| RS (bit 3) | Report Status. When set, the controller writes back the descriptor status once the packet has been processed, setting STA.DD (Descriptor Done); if the TXDW interrupt is enabled, this also raises an interrupt. |
| IC (bit 2) | Insert Checksum. When set, the controller inserts a checksum based on the values of the CSO and CSS fields. |
| IFCS (bit 1) | Insert FCS. Controls the insertion of the FCS/CRC field in normal Ethernet packets. IFCS is only valid when EOP is set. |
| EOP (bit 0) | End Of Packet. Indicates the last descriptor making up the packet; one or more descriptors can be used to form a packet. |
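The command bits as C constants, plus a hypothetical helper that picks the CMD byte for one segment of a packet. Requesting RS only on the last segment is a choice made for this sketch: it lets the driver detect that the whole packet has been sent by polling STA.DD of that descriptor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TX_CMD_EOP  (1u << 0) /* End Of Packet */
#define TX_CMD_IFCS (1u << 1) /* Insert FCS/CRC (only valid with EOP) */
#define TX_CMD_IC   (1u << 2) /* Insert Checksum */
#define TX_CMD_RS   (1u << 3) /* Report Status: write back STA.DD */
#define TX_CMD_DEXT (1u << 5) /* must stay 0 for legacy descriptors */
#define TX_CMD_VLE  (1u << 6) /* VLAN Packet Enable */
#define TX_CMD_IDE  (1u << 7) /* Interrupt Delay Enable */

#define TX_STA_DD   (1u << 0) /* Descriptor Done */

/* CMD byte for one segment: only the last segment carries EOP, IFCS and RS. */
uint8_t tx_command_for_segment(bool last_segment) {
    return last_segment ? (uint8_t)(TX_CMD_EOP | TX_CMD_IFCS | TX_CMD_RS) : 0;
}
```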
Transmit Descriptor Status Format
| 3 | 2 | 1 | 0 |
|---|---|---|---|
| TU | LC | EC | DD |

| Name | Description |
|---|---|
| TU/RSV (bit 3) | Transmit Underrun. Indicates that a transmit underrun error has occurred. 82544GC/EI only, otherwise reserved. |
| LC (bit 2) | Late Collision. Indicates that a late collision occurred while working in half-duplex mode. It has no meaning in full-duplex mode. |
| EC (bit 1) | Excess Collisions. Indicates that the packet has experienced more than the maximum number of collisions, as defined by the TCTL.CT field. |
| DD (bit 0) | Descriptor Done. Indicates that the controller has finished with the descriptor. |
Legacy Receive Descriptor Format

| Quadword | Bit(s) | Field |
|---|---|---|
| 0 | 63:0 | Buffer Address |
| 1 | 15:0 | Length |
| 1 | 31:16 | Packet Checksum* |
| 1 | 39:32 | Status |
| 1 | 47:40 | Errors |
| 1 | 63:48 | Special* |

*82544GC/EI only. Otherwise reserved!
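The receive descriptor as a C struct, in the same style as the transmit one. The field names match those used by the receive code later in this article; `__attribute__((packed))` is a GCC/Clang extension and the static assertion checks the 16-byte size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Legacy receive descriptor (little-endian, 16 bytes). */
typedef struct __attribute__((packed)) {
    uint64_t buffer_address; /* physical address of the receive buffer */
    uint16_t length;         /* bytes written by the controller */
    uint16_t checksum;       /* packet checksum */
    uint8_t  status;         /* DD, EOP, ... */
    uint8_t  errors;         /* RXE, IPE, TCPE, ... */
    uint16_t special;        /* VLAN tag information for 802.1Q packets */
} receive_descriptor_t;

static_assert(sizeof(receive_descriptor_t) == 16, "descriptor must be 16 bytes");
```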
Receive Descriptor Status Field

| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|---|---|---|---|---|---|---|---|
| PIF | IPCS | TCPCS | RSV | VP | IXSM | EOP | DD |

| Name | Description |
|---|---|
| PIF (bit 7) | Passed In-exact Filter. If set, software must examine this packet to determine whether to accept it. If PIF is clear, the packet is known to be for this station. |
| IPCS (bit 6) | IP Checksum Calculated on Packet. When set, the controller calculated the IP checksum; the result is reported by the IPE error bit. |
| TCPCS (bit 5) | TCP Checksum Calculated on Packet. When set, the controller calculated the TCP/UDP checksum; the result is reported by the TCPE error bit. |
| RSV (bit 4) | Reserved |
| VP (bit 3) | Packet is 802.1Q (matched VET). |
| IXSM (bit 2) | Ignore Checksum Indication. When set, the checksum indication results should be ignored. |
| EOP (bit 1) | End Of Packet. Indicates that this is the last descriptor for an incoming packet. |
| DD (bit 0) | Descriptor Done. Indicates whether the controller is done with the descriptor. |
Receive Descriptor Errors Field

| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|---|---|---|---|---|---|---|---|
| RXE | IPE | TCPE | CXE | RSV | SEQ | SE | CE |

| Name | Description |
|---|---|
| RXE (bit 7) | RX Data Error |
| IPE (bit 6) | IP Checksum Error |
| TCPE (bit 5) | TCP/UDP Checksum Error |
| CXE (bit 4) | Carrier Extension Error (82544GC/EI only, otherwise reserved) |
| RSV (bit 3) | Reserved |
| SEQ (bit 2) | Sequence Error (82541xx, 82547GI/EI, and 82540EP/EM only, otherwise reserved) |
| SE (bit 1) | Symbol Error (82541xx, 82547GI/EI, and 82540EP/EM only, otherwise reserved) |
| CE (bit 0) | CRC Error or Alignment Error |
The Receive Descriptor Special field is only populated for 802.1Q packets. For all other packets its contents are set to 0.
| 15:13 | 12 | 11:0 |
|---|---|---|
| PRI | CFI | VLAN |

| Name | Description |
|---|---|
| VLAN | VLAN Identifier |
| CFI | Canonical Form Indicator |
| PRI | User Priority |
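Splitting the Special field into its three parts is plain bit manipulation; the struct and function names in this sketch are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t  priority; /* PRI, bits 15:13 */
    uint8_t  cfi;      /* CFI, bit 12 */
    uint16_t vlan_id;  /* VLAN, bits 11:0 */
} vlan_tag_t;

vlan_tag_t decode_special(uint16_t special) {
    vlan_tag_t tag;
    tag.priority = (uint8_t)(special >> 13);
    tag.cfi      = (uint8_t)((special >> 12) & 0x1);
    tag.vlan_id  = (uint16_t)(special & 0x0FFF);
    return tag;
}
```

For example, a Special value of `0xB064` decodes to priority 5, CFI 1 and VLAN 100.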
EEPROM Reading
There are a few variants of the card with many differences, most notably in how the EEPROM and the Flash memory of the card are accessed. Here we only describe the method applicable to cards that use the EEPROM.
The EEPROM must be enabled in order to read the MAC address of the NIC. This is done by setting the EECD.SK (0x01), EECD.CS (0x02) and EECD.DI (0x04) bits of the EECD (0x00010) register, which allows software to perform reads from the EEPROM.
Before reading, note that the EEPROM has a "lock/unlock" mechanism to prevent software and hardware from accessing it at the same time.
To lock the EEPROM, set the EECD.REQ (0x40) bit in the EECD register, then wait until the EECD.GNT (0x80) bit becomes set. Unlocking only requires clearing EECD.REQ.
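The request/grant handshake can be sketched against a pointer to the EECD register (on real hardware, the mapped BAR0 address plus 0x10). The bounded poll count and the function names are assumptions of this sketch; if the grant never arrives, the request is withdrawn so hardware is not locked out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EECD_REQ (1u << 6) /* 0x40: request EEPROM access */
#define EECD_GNT (1u << 7) /* 0x80: access granted */

/* Request the EEPROM; returns false if GNT is not seen within max_polls reads. */
bool eeprom_lock(volatile uint32_t *eecd, unsigned max_polls) {
    *eecd |= EECD_REQ;
    for (unsigned i = 0; i < max_polls; i++) {
        if (*eecd & EECD_GNT)
            return true;
    }
    *eecd &= ~EECD_REQ; /* give up: withdraw the request */
    return false;
}

/* Release the EEPROM so hardware can access it again. */
void eeprom_unlock(volatile uint32_t *eecd) {
    *eecd &= ~EECD_REQ;
}
```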
To read a word from the EEPROM, the kernel first masks the word address to 12 bits (82541xx and 82547GI/EI only) or to 8 bits (all other controllers), then shifts it left by 2 (82541xx and 82547GI/EI only) or by 8 (all other controllers). The kernel must OR it with the EERD.START (0x01) bit and finally write the result to the EERD (0x00014) register.
The kernel should then wait until the EEPROM read operation has finished, which is signalled by the EERD.DONE bit becoming set (bit 4, or bit 1 on the 82541xx and 82547GI/EI). Then the kernel reads the EERD register, shifts the value right by 16 bits and truncates it to 16 bits.
After that the EERD.START bit must be cleared.
```c
#include <stdbool.h>
#include <stdint.h>

static uint16_t eeprom_read(uint8_t addr) {
    uint32_t tmp;
    uint16_t data;

    if ((le32_to_cpu(mmio_read_dword(dev_info.mmio.addr, I8254X_EECD)) & I8254X_EECD_EE_PRES) == 0) {
        kpanic("EEPROM present bit is not set for i8254x\n");
    }

    /* Specification says that only 82541x devices and the 82547GI/EI use a
     * 2-bit address shift with DONE at bit 1; the others shift by 8, DONE at bit 4 */
    bool small_eerd = dev_info.version == I82547GI_EI
        || dev_info.version == I82541EI_A0
        || dev_info.version == I82541EI_B0
        || dev_info.version == I82541ER_C0
        || dev_info.version == I82541GI_B1
        || dev_info.version == I82541PI_C0;
    uint32_t done_bit = small_eerd ? (1u << 1) : (1u << 4);

    /* Tell the EEPROM to start reading */
    if (small_eerd) {
        tmp = ((uint32_t)addr & 0xfff) << 2;
    } else {
        tmp = ((uint32_t)addr & 0xff) << 8;
    }
    tmp |= I8254X_EERD_START;
    mmio_write_dword(dev_info.mmio.addr, I8254X_EERD, cpu_to_le32(tmp));

    /* Wait until the read is finished: the controller SETS the DONE bit */
    for (int i = 0; i < 100000; i++) {
        tmp = le32_to_cpu(mmio_read_dword(dev_info.mmio.addr, I8254X_EERD));
        if (tmp & done_bit)
            break;
    }
    if ((tmp & done_bit) == 0) {
        kpanic("EEPROM read timed out for i8254x\n");
    }

    /* The data word is in bits 31:16 */
    data = (uint16_t)(tmp >> 16);

    /* Tell EEPROM to stop reading */
    tmp &= ~(uint32_t)I8254X_EERD_START;
    mmio_write_dword(dev_info.mmio.addr, I8254X_EERD, cpu_to_le32(tmp));
    return data;
}
```
When all data is finally read the kernel should unlock the EEPROM to let hardware access it.
Initialization
The 8254x starts in an undefined state and therefore needs to be reset. The first thing to do is enable bus mastering, memory space and I/O space accesses in the PCI Command register (bits 2, 1 and 0 respectively).
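A sketch of the Command register update, assuming you read the 16-bit Command register at configuration offset 0x04 with your own config-space accessors and write the result back. The helper name is hypothetical; other bits already set in the register are preserved.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_COMMAND_OFFSET   0x04     /* Command register in configuration space */
#define PCI_CMD_IO_SPACE     (1u << 0)
#define PCI_CMD_MEMORY_SPACE (1u << 1)
#define PCI_CMD_BUS_MASTER   (1u << 2)

/* New Command register value with I/O, memory and bus master access enabled. */
uint16_t i8254x_pci_command(uint16_t command) {
    return (uint16_t)(command | PCI_CMD_IO_SPACE | PCI_CMD_MEMORY_SPACE | PCI_CMD_BUS_MASTER);
}
```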
Then reset the NIC by setting the CTRL.RST bit (bit 26, self-clearing) in the Device Control register of the card.
After the card has been reset, set the CTRL.ASDE and CTRL.SLU bits (to enable Auto-Speed Detection (ASDE), you must also set the SLU (Set Link Up) bit) and write the MAC address you want the device to use into the RAL0 and RAH0 registers. The device's own MAC address is stored in the first three 16-bit words (6 bytes) of the EEPROM.
The entire procedure looks something like this:
```c
#include <stdint.h>

uint8_t MAC_ADDRESS[6];

void reset_nic(void) {
    // Set CTRL.RST (bit 26); the bit clears itself once the reset has completed
    uint32_t device_control = read_register(I8254_REG_CTRL);
    device_control |= I8254_CTRL_RESET;
    write_register(I8254_REG_CTRL, device_control);
    while (read_register(I8254_REG_CTRL) & I8254_CTRL_RESET)
        __asm__ volatile("pause"); // x86 busy-wait; "hlt" could hang here if interrupts are disabled

    // Enable Auto Speed Detection; ASDE requires SLU (Set Link Up) as well
    device_control = read_register(I8254_REG_CTRL);
    device_control |= I8254_CTRL_ASDE | I8254_CTRL_SLU;
    write_register(I8254_REG_CTRL, device_control);

    // Read the MAC address from EEPROM words 0-2 (low byte first)
    uint16_t b0 = eeprom_read(0);
    uint16_t b1 = eeprom_read(1);
    uint16_t b2 = eeprom_read(2);
    MAC_ADDRESS[0] = b0 & 0xFF;
    MAC_ADDRESS[1] = b0 >> 8;
    MAC_ADDRESS[2] = b1 & 0xFF;
    MAC_ADDRESS[3] = b1 >> 8;
    MAC_ADDRESS[4] = b2 & 0xFF;
    MAC_ADDRESS[5] = b2 >> 8;

    // Write the MAC address to RAL0/RAH0. RAH bit 31 (AV, Address Valid) must be set,
    // otherwise the controller does not use this entry to filter incoming packets.
    uint32_t writeL = ((uint32_t)b1 << 16) | b0;
    uint32_t writeH = (uint32_t)b2 | (1u << 31);
    write_register(I8254_REG_RAL0, writeL);
    write_register(I8254_REG_RAH0, writeH);
}
```
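The byte juggling in the reset code can be isolated into a pure function, which makes it easy to check against a known address. This is a sketch with a hypothetical name; it assumes, as above, that EEPROM words 0 to 2 hold the MAC address low byte first and that RAH needs the Address Valid bit (bit 31).

```c
#include <assert.h>
#include <stdint.h>

#define RAH_AV (1u << 31) /* Address Valid */

/* Convert EEPROM words 0..2 into MAC bytes and the matching RAL0/RAH0 values. */
void mac_from_eeprom_words(const uint16_t words[3], uint8_t mac[6],
                           uint32_t *ral, uint32_t *rah) {
    for (int i = 0; i < 3; i++) {
        mac[2 * i]     = (uint8_t)(words[i] & 0xFF); /* low byte first */
        mac[2 * i + 1] = (uint8_t)(words[i] >> 8);
    }
    *ral = ((uint32_t)words[1] << 16) | words[0];
    *rah = (uint32_t)words[2] | RAH_AV;
}
```

For example, the words `0x5452, 0x1200, 0x5634` give the MAC address 52:54:00:12:34:56, RAL0 = `0x12005452` and RAH0 = `0x80005634`.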
Ring setup
Theory of operation:
The next step is to set up the rings. Without them, you will not be able to send or receive packets. Luckily the ring system is pretty simple: it consists of the TDH/RDH and TDT/RDT (Transmit/Receive Descriptor Head/Tail) registers and, of course, the ring buffers themselves.
Transmit Ring
In the transmit ring, the descriptors behind the head that the driver has not yet reclaimed are those that have been transmitted but not yet reclaimed. (If you dynamically allocate the descriptor buffers, reclaiming simply involves freeing those buffers.)
Everything between the head and the tail is owned by the controller and constitutes the transmit queue (the descriptors that have been queued for transmission). At reset, both TDT and TDH are set to 0. If TDT = TDH, the queue is empty and there is nothing to transmit.
Receive Ring
In the receive ring, the descriptors behind the head that the driver has not yet processed are those in which the controller has stored incoming packets. You can detect which descriptors have incoming data written to them by checking whether the DD bit of their status field is set.
Any descriptors between RDH and RDT are owned by the hardware and should not be modified!
After the reset, the head should point to the first descriptor and the tail to the last descriptor of the ring (Since all descriptors are available for use).
RDH points to the descriptor into which the controller will write the next received packet. The controller advances it automatically.
RDT points one descriptor past the last descriptor available to the controller. It must still point to a valid descriptor (within Base and Base + Size).
The TDLEN/RDLEN registers contain the size in bytes of the ring.
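The head/tail rules above boil down to modular arithmetic. In this sketch (hypothetical helper names), `count` is the number of descriptors in the ring, that is TDLEN or RDLEN divided by 16. Because head == tail means "empty", one slot always stays unused, which is why a freshly initialised receive ring gives the controller count - 1 descriptors.

```c
#include <assert.h>
#include <stdint.h>

/* Descriptors currently owned by the controller: from head up to (not including) tail. */
uint32_t ring_owned_by_hw(uint32_t head, uint32_t tail, uint32_t count) {
    assert(head < count && tail < count);
    return (tail + count - head) % count;
}

/* Descriptors the driver may still fill on a transmit ring before it is full. */
uint32_t tx_ring_free(uint32_t head, uint32_t tail, uint32_t count) {
    return count - 1 - ring_owned_by_hw(head, tail, count);
}
```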
Setup:
Transmit Ring
- Firstly allocate a region for the descriptor ring
- Next, allocate the packet buffers that the descriptors point to. You can allocate a static buffer for each descriptor up front, or allocate one dynamically when you transmit a packet (the example code uses the first option).
- Set TDH and TDT to 0, TDBAL to the lower 32 bits of the ring's physical address, TDBAH to the upper 32 bits and TDLEN to the total length of the ring in bytes (number of descriptors * 16).
- Set your preferred bits in the TCTL register.
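```c
#include <stdint.h>
#include <string.h> // memset (or your kernel's equivalent)

// Assumes 1:1 (identity) memory mapping for simplicity, so virtual == physical.
// transmit_descriptor_t is a 16-byte struct matching the legacy transmit descriptor format.
#define NUM_OF_TX_DESCRIPTORS 8 // TDLEN must be a multiple of 128 bytes: 8 * 16 = 128
#define SIZE_OF_TX_DESCRIPTOR_BUFFER 4096

transmit_descriptor_t *transmit_ring; // global, because send_data() uses it too

void setup_transmit_ring(void) {
    size_t transmit_ring_size = NUM_OF_TX_DESCRIPTORS * sizeof(transmit_descriptor_t);
    transmit_ring = your_favorite_physical_allocator(transmit_ring_size); // at least 16-byte aligned
    memset(transmit_ring, 0, transmit_ring_size);
    for (int i = 0; i < NUM_OF_TX_DESCRIPTORS; i++) {
        transmit_ring[i].buffer_address =
            (uint64_t)(uintptr_t)your_favorite_physical_allocator(SIZE_OF_TX_DESCRIPTOR_BUFFER);
    }

    write_register(REG_TDBAL, (uint32_t)((uint64_t)(uintptr_t)transmit_ring & 0xFFFFFFFF));
    write_register(REG_TDBAH, (uint32_t)((uint64_t)(uintptr_t)transmit_ring >> 32));
    write_register(REG_TDLEN, transmit_ring_size);
    write_register(REG_TDH, 0);
    write_register(REG_TDT, 0);

    // Set the Enable (EN) and Pad Short Packets (PSP) bits
    // (CT and COLD stay 0 here; see the TCTL description in the manual for recommended values)
    uint32_t tctl = E1000_TCTL_EN | E1000_TCTL_PSP;
    write_register(REG_TCTL, tctl);
}
```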
Receive Ring
- Firstly allocate a region for the descriptor ring
- After that, loop through each descriptor and allocate a buffer of the selected size (set in the Receive Control Register) and set it (its physical address) in the descriptor address field.
- Set RDH to 0 (the first descriptor), RDT to the last descriptor (number of descriptors - 1), RDBAL to the lower 32 bits of the ring's physical address, RDBAH to the higher 32 bits and RDLEN to the total length of the ring buffer (number of descriptors * 16).
- Set your preferred bits in the RCTL register (you must set the EN bit to enable the DMA engine; LPE and BAM are recommended).
```c
#include <stdint.h>
#include <string.h> // memset (or your kernel's equivalent)

// Assumes 1:1 (identity) page mapping for simplicity, so virtual == physical.
// receive_descriptor_t is a 16-byte struct matching the legacy receive descriptor format.
#define NUM_OF_RX_DESCRIPTORS 32
#define SIZE_OF_RX_DESCRIPTOR_BUFFER 4096

receive_descriptor_t *receive_ring; // global, because receive_packets() uses it too

void setup_receive_ring(void) {
    size_t receive_ring_size = NUM_OF_RX_DESCRIPTORS * sizeof(receive_descriptor_t); // 16 bytes each
    receive_ring = your_favorite_physical_allocator(receive_ring_size);
    memset(receive_ring, 0, receive_ring_size); // status = 0: no descriptor holds a packet yet
    for (int i = 0; i < NUM_OF_RX_DESCRIPTORS; i++) {
        receive_ring[i].buffer_address =
            (uint64_t)(uintptr_t)your_favorite_physical_allocator(SIZE_OF_RX_DESCRIPTOR_BUFFER);
    }

    write_register(REG_RDBAL, (uint32_t)((uint64_t)(uintptr_t)receive_ring & 0xFFFFFFFF)); // Base Address Low
    write_register(REG_RDBAH, (uint32_t)((uint64_t)(uintptr_t)receive_ring >> 32));        // Base Address High
    write_register(REG_RDLEN, receive_ring_size);       // Ring size in bytes
    write_register(REG_RDH, 0);                         // Set it to the first descriptor
    write_register(REG_RDT, NUM_OF_RX_DESCRIPTORS - 1); // Set it to the last descriptor

    // Set the Enable, Long Packet Reception, Broadcast Accept Mode and Buffer Size Extension bits.
    // BSIZE = 11b together with BSEX = 1 selects 4096-byte (4 KiB) buffers.
    uint32_t rctl = RCTL_EN | RCTL_LPE | RCTL_BAM | RCTL_BSEX | (3u << RCTL_BSIZE);
    write_register(REG_RCTL, rctl);
}
```
Interrupt Handling
If you want to receive packets, you need a way of knowing when to read them. That's where interrupts come into play.
To enable interrupts, set the corresponding bits in the Interrupt Mask Set/Read (IMS) register. Recommended interrupts are RXT0 (to be notified about incoming packets), RXO (to be notified about overruns) and LSC (to be notified about link status changes, e.g. when the user plugs in or unplugs the Ethernet cable; in such cases, you should redo the DHCP handshake to connect to that network).
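```c
void enable_interrupts(void) {
    // RXT0: packets received, RXO: receiver FIFO overrun, LSC: link status change
    uint32_t ims = E1000_IMS_RXT | E1000_IMS_RXO | E1000_IMS_LSC;
    write_register(REG_IMS, ims);
}
```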
To find out why an interrupt was raised, read the Interrupt Cause Read (ICR) register. Reading ICR also clears it. A simple interrupt handler may look something like this:
```c
void _handle_interrupt(void) {
    uint32_t cause = read_register(REG_ICR); // Cleared upon read

    // RXT0: packets received. RXO: the receive FIFO overran, so drain the ring as well.
    if (cause & (E1000_IMS_RXT | E1000_IMS_RXO)) {
        receive_packets(); // Receive the packets and pass them to the network stack
    }

    if (cause & E1000_IMS_LSC) { // Link status change
        // Read the STATUS register and check the LU bit to get the link status
        if (read_register(E1000_REG_STATUS) & STATUS_LU) {
            kprintf("Link change detected: Link up!\n");
        } else {
            kprintf("Link change detected: Link down!\n");
        }
    }

    // Acknowledge the interrupt at your interrupt controller (e.g. send an EOI) as usual.
}
```
Packet Transmission
To transmit a packet, copy its data into the buffer of a free descriptor (splitting it across several descriptors if it doesn't fit into one), set the EOP bit on the last descriptor and then advance TDT.
In this example we are using preallocated buffers, but you could use dynamically allocated ones; just remember to free them after the packet has been transmitted.
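```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h> // memcpy (or your kernel's equivalent)

static uint32_t tx_tail = 0; // software copy of TDT (both are 0 after reset)

// Fill the next free descriptor. The controller only learns about it in send(),
// once every descriptor of the packet is ready.
void send_data(const void *data, uint32_t size, bool eop) {
    // The ring is full when advancing the tail would make it equal to the head:
    // wait for the controller to finish an older descriptor
    while ((tx_tail + 1) % NUM_OF_TX_DESCRIPTORS == read_register(REG_TDH))
        ;
    transmit_descriptor_t *tx = &transmit_ring[tx_tail];       // next available descriptor
    memcpy((void *)(uintptr_t)tx->buffer_address, data, size); // copy into the preallocated buffer
    tx->length = (uint16_t)size;
    tx->status = 0;
    // Assign rather than OR: descriptors are reused, so clear bits left by earlier packets.
    // EOP marks the last descriptor, IFCS (only valid with EOP) appends the CRC, RS requests STA.DD.
    tx->command = eop ? (TX_CMD_EOP | TX_CMD_IFCS | TX_CMD_RS) : 0;
    tx_tail = (tx_tail + 1) % NUM_OF_TX_DESCRIPTORS;
}

size_t send(const void *data, size_t length) {
    size_t sent = 0;
    // Split the data into buffer-sized chunks; only the last chunk gets EOP
    while (sent < length) {
        size_t to_send = length - sent;
        if (to_send > SIZE_OF_TX_DESCRIPTOR_BUFFER)
            to_send = SIZE_OF_TX_DESCRIPTOR_BUFFER;
        send_data((const uint8_t *)data + sent, (uint32_t)to_send, sent + to_send == length);
        sent += to_send;
    }
    write_register(REG_TDT, tx_tail); // hand the whole packet to the controller at once
    return sent;
}
```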
Packet Reception
To receive packets after an interrupt, loop from the first descriptor the driver has not yet processed up to the last one the controller has filled. To do that, keep track of the next descriptor the driver should read; this also lets you reconstruct the packets in the correct order.
```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h> // malloc/free (or your kernel's heap allocator)
#include <string.h> // memcpy

static uint32_t rx_next = 0; // first descriptor the driver has not processed yet

void receive_packets(void) {
    // static, so a packet whose descriptors arrive across two interrupts is not lost
    static void *buffer = NULL;
    static size_t buffer_len = 0;
    uint32_t idx = rx_next;

    while (receive_ring[idx].status & RX_STATUS_DD) { // the controller has filled this descriptor
        bool eop = receive_ring[idx].status & RX_STATUS_EOP;
        uint16_t len = receive_ring[idx].length;
        void *data = (void *)(uintptr_t)receive_ring[idx].buffer_address; // identity mapped

        // Append this descriptor's data to the packet (handles multi-descriptor packets)
        void *new_buffer = malloc(buffer_len + len); // allocate a bigger buffer
        if (buffer != NULL) {
            memcpy(new_buffer, buffer, buffer_len);  // copy the previous parts
            free(buffer);                            // free the old buffer
        }
        memcpy((uint8_t *)new_buffer + buffer_len, data, len);
        buffer = new_buffer;
        buffer_len += len;

        // Set status to 0 to give ownership back to the controller
        receive_ring[idx].status = 0;
        idx = (idx + 1) % NUM_OF_RX_DESCRIPTORS;

        if (eop) {
            // Last descriptor of the packet: forward it to your network stack,
            // which takes ownership of the buffer
            stack_receive_packet(buffer, buffer_len);
            buffer = NULL;
            buffer_len = 0;
        }
    }

    // Give the controller more free descriptors by updating RDT
    uint32_t tail = (idx == 0) ? NUM_OF_RX_DESCRIPTORS - 1 : idx - 1;
    write_register(REG_RDT, tail);
    rx_next = idx;
}
```
Emulation
- VirtualBox (3.1 is all I can personally confirm) supports rather dodgy implementations of an Intel PRO/1000 MT Server (82545EM), Intel PRO/1000 MT Desktop (82540EM), and Intel PRO/1000 T Server (82543GC).
  - Bugs:
    - The EERD register is unimplemented (you *must* use the 4-wire access method if you want to read from the EEPROM). [01000101 - I had a patch committed to fix this. It will soon be mainstream]
- VMWare Virtual Server 2 emulates/virtualizes an 82545EM-based card rather well.
- QEMU (since 0.10.0) supports an 82540EM-based card and it seems to work OK. It is the default network card since 0.11.0.
  - Bugs:
    - QEMU does not properly handle the software reset operation (CTRL.RST) in builds prior to June 2009.
    - QEMU (version 4.2.1 tested) doesn't seem to emulate the flash BAR; instead, the I/O register BAR is shifted up into the slot the flash BAR would otherwise occupy.
- Microsoft's Hyper-V reportedly supports an 8254x-series card (needs confirmation).
Documentation
- Intel 8254x Family of Gigabit Ethernet Controllers Software Developer's Manual
- The PCIe GbE Controllers Open Source Software Developer’s Manual may also be of interest. In Linux, the PCIe cards are handled by a separate driver (e1000e) but they appear to be mostly if not entirely compatible with the 8254x series.