Intel 8254x: Difference between revisions

Revision as of 16:53, 10 November 2025

The Intel 8254x series is comprised of: 82546GB/EB, 82545GM/EM, 82544GC/EI, 82541(PI/GI/EI), 82541ER, 82547GI/EI, and 82540EP/EM Gigabit Ethernet Controllers.

Intel 82540EM-based card

Overview

Intel 8254x-based cards come in 32-/64-bit, 33/66 MHz PCI and PCI-X flavors.

The Intel 82547GI(EI) connects to the motherboard via a Communications Streaming Architecture (CSA) port instead of a PCI/PCI-X bus.

The 82541xx and 82540EP/EM controllers do not support the PCI-X bus.

They are all high-performance, Gigabit-capable controllers and range from 1 to 4 ethernet/fiber ports per controller.

The Intel 8254x series heavily utilizes task offloading. Each controller has an "offloading engine" for tasks such as TCP/UDP/IP checksum calculations, packet filtering, and packet segmentation.

  • Jumbo packets are supported.
  • Wake on LAN (WoL) is supported.
  • A four wire serial EEPROM interface as well as a generic EEPROM "read" interface is implemented within the configuration registers.
  • D0 and D3 power states are supported through ACPI.

Programming

Detection

Section 5.2 in the 8254x Software Developer's Manual lists the Vendor and Device IDs of the various devices in the 8254x series. These are used to detect devices on the PCI bus by looking in the PCI Configuration Space registers.

The device will also fill in the PCI Base Address Registers (BAR). BAR0 will either be a 64-bit or 32-bit MMIO address (checked by testing bits 2:1 to see if it's 00b (32-bit) or 10b (64-bit)) that points to the device's base register space. BAR0 should always be used to interface with the device via MMIO as the BAR number never changes in different devices in the series.

There is also a BAR that contains an I/O base address. It can be detected by looking at each BAR and testing bit 0, which is set for I/O space BARs and clear for memory BARs. Documentation states this will be in either BAR2 or BAR4, but emulators may move it.
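The BAR checks described above can be sketched as a pair of small helpers. This is only an illustration based on the standard PCI BAR encoding; the names `bar_kind_t`, `bar_kind` and `bar_base` are made up for this example and do not come from the manual.

```c
#include <stdint.h>

/* Kinds of PCI Base Address Register, as encoded in the low bits. */
typedef enum {
    BAR_IO,      /* bit 0 = 1: I/O space */
    BAR_MEM32,   /* bit 0 = 0, bits 2:1 = 00b: 32-bit MMIO */
    BAR_MEM64,   /* bit 0 = 0, bits 2:1 = 10b: 64-bit MMIO */
    BAR_UNKNOWN  /* any other encoding */
} bar_kind_t;

/* Classify a raw BAR value read from PCI configuration space. */
static bar_kind_t bar_kind(uint32_t bar)
{
    if (bar & 0x1u)
        return BAR_IO;
    switch ((bar >> 1) & 0x3u) {
    case 0x0: return BAR_MEM32;
    case 0x2: return BAR_MEM64;
    default:  return BAR_UNKNOWN;
    }
}

/* Strip the flag bits and return the base address.
 * bar_hi is the following BAR; it is only used for 64-bit memory BARs. */
static uint64_t bar_base(uint32_t bar_lo, uint32_t bar_hi)
{
    switch (bar_kind(bar_lo)) {
    case BAR_IO:    return bar_lo & ~(uint32_t)0x3;
    case BAR_MEM32: return bar_lo & ~(uint32_t)0xF;
    case BAR_MEM64: return ((uint64_t)bar_hi << 32) | (bar_lo & ~(uint32_t)0xF);
    default:        return 0;
    }
}
```

A driver would call `bar_kind` on BAR0 to decide whether the next BAR holds the upper 32 bits of the address, and on the remaining BARs to find the I/O one.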

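When using MMIO, reading and writing registers is straightforward.

uint64_t ioaddr = BAR_GOES_HERE;

// "register" is a reserved word in C, so the offset parameter is named reg.
// volatile keeps the compiler from caching or dropping the accesses.
void write_register(uint16_t reg, uint32_t value) {
    *(volatile uint32_t *)(ioaddr + reg) = value;
}

uint32_t read_register(uint16_t reg) {
    return *(volatile uint32_t *)(ioaddr + reg);
}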

When using port I/O, reading and writing registers is a little more complicated because the I/O address space of the 8254x is only 8 bytes wide. The register at offset 0x00 is the "IOADDR" window and the register at offset 0x04 is the "IODATA" window. IOADDR holds the offset of the device register that the IODATA window operates on. So the basic operation is to write the register offset to IOADDR and then read or write the register's value through IODATA.

uint16_t ioaddr = IO_BAR_GOES_HERE;

// outl(port, value) and inl(port) are your kernel's port I/O helpers
void write_register(uint16_t reg, uint32_t value) {
    outl(ioaddr + 0x00, reg);   // select the register through the IOADDR window
    outl(ioaddr + 0x04, value); // write the value through the IODATA window; it ends up in the selected register
}

uint32_t read_register(uint16_t reg) {
    outl(ioaddr + 0x00, reg);   // select the register through the IOADDR window
    return inl(ioaddr + 0x04);  // read the value through the IODATA window
}

Device Registers

The 8254x cards have a large number of registers. A complete list of the registers and their offsets is in Table 13-2 (page 219) of the Intel 8254x Family of Gigabit Ethernet Controllers Software Developer's Manual.

Here are the most important ones:

Category Offset Abbreviation Name R/W Manual Page
General 00000h CTRL Device Control R/W 224
General 00008h STATUS Device Status R 229
General 00010h EECD EEPROM/Flash Control/Data R/W 232
General 00014h EERD EEPROM Read (not applicable to the 82544GC/EI) R/W 236
Interrupt 000C0h ICR Interrupt Cause Read R/W 292
Interrupt 000D0h IMS Interrupt Mask Set / Read R/W 297
Receive 00100h RCTL Receive Control R/W 300
Receive 02800h RDBAL Receive Descriptor Base Low R/W 306
Receive 02804h RDBAH Receive Descriptor Base High R/W 306
Receive 02808h RDLEN Receive Descriptor Length R/W 307
Receive 02810h RDH Receive Descriptor Head R/W 307
Receive 02818h RDT Receive Descriptor Tail R/W 308
Transmit 00400h TCTL Transmit Control R/W 310
Transmit 03800h TDBAL Transmit Descriptor Base Low R/W 315
Transmit 03804h TDBAH Transmit Descriptor Base High R/W 316
Transmit 03808h TDLEN Transmit Descriptor Length R/W 316
Transmit 03810h TDH Transmit Descriptor Head R/W 317
Transmit 03818h TDT Transmit Descriptor Tail R/W 318
Receive 05400h-05478h RAL(8*n) Receive Address Low (n) R/W 329
Receive 05404h-0547Ch RAH(8*n) Receive Address High (n) R/W 329
The Device Control Register (CTRL)
Field Bit(s) Name Field Bit(s) Name
FD 0 Full-Duplex SDP1_DATA 19 SDP1 Data Value
RSV 2:1 Reserved ADVD3WUC 20 D3Cold Wakeup Capability Advertisement Enable

LRST 3 Link Reset EN_PHY_PWR_MGMT 21 PHY Power Management Enable
RSV 4 Reserved SDP0_IODIR 22 SDP0 Pin Directionality
ASDE 5 Auto-Speed Detection Enable SDP1_IODIR 23 SDP1 Pin Directionality
SLU 6 Set Link Up RSV 25:24 Reserved
ILOS 7 Invert Loss-of-Signal RST 26 Device Reset
SPEED 9:8 Speed selection RFCE 27 Receive Flow Control Enable
RSV 10 Reserved TFCE 28 Transmit Flow Control Enable
FRCSPD 11 Force Speed RSV 29 Reserved
FRCDPLX 12 Force Duplex VME 30 VLAN Mode Enable
RSV 17:13 Reserved PHY_RST 31 PHY Reset
SDP0_DATA 18 SDP0 Data Value
Status Register Bit Description
Field Bit(s) Name
FD 0 Link Full Duplex configuration Indication.
LU 1 Link Up indication
Function ID 3:2 Provides software a mechanism to determine the Ethernet controller function number (LAN identifier) for this MAC. Read as: [0b,0b] LAN A, [0b,1b] LAN B. Note: These settings are only applicable to the 82546GB/EB.
TXOFF 4 Transmission Paused
TBIMODE 5 TBI Mode/internal SerDes Indication. Note: For the 82544GC/EI, reflects the status of the TBI_MODE input pin.
SPEED 7:6 Link Speed Setting. Speed indication is mapped as follows: 00b = 10 Mb/s, 01b = 100 Mb/s, 10b = 1000 Mb/s, 11b = 1000 Mb/s. These bits are not valid in TBI mode/internal SerDes.
ASDV 9:8 Auto Speed Detection Value
RSV 10 Reserved
PCI66 11 PCI Bus speed indication. (When set, indicates that the PCI Bus is running at 66 MHz).
BUS64 [1] 12 PCI Bus Width indication. (When set, indicates that the Ethernet controller is on a 64-bit bus)
PCIX_MODE [1] 13 PCI-X Mode indication. (When set, indicates that the Ethernet Controller is operating in PCI-X mode)
PCIXSPD [1] 15:14 PCI-X Bus Speed Indication. 00b = 50-66 MHz, 01b = 66-100 MHz, 10b = 100-133 MHz, 11b = Reserved
RSV 31:16 Reserved

[1] Not applicable to the 82540EP/EM, 82541xx, or 82547GI/EI.
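As an illustration of how the STATUS fields above might be decoded, here is a minimal sketch. The macro and function names are assumptions made for this example; only the bit positions come from the table.

```c
#include <stdint.h>
#include <stdbool.h>

#define STATUS_FD           (1u << 0)  /* Link Full Duplex */
#define STATUS_LU           (1u << 1)  /* Link Up */
#define STATUS_SPEED_SHIFT  6          /* SPEED occupies bits 7:6 */
#define STATUS_SPEED_MASK   0x3u

/* True when the LU bit reports an established link. */
static bool status_link_up(uint32_t status)
{
    return (status & STATUS_LU) != 0;
}

/* True when the FD bit reports full-duplex operation. */
static bool status_full_duplex(uint32_t status)
{
    return (status & STATUS_FD) != 0;
}

/* Translate the SPEED field into Mb/s (not valid in TBI/SerDes mode). */
static unsigned status_speed_mbps(uint32_t status)
{
    switch ((status >> STATUS_SPEED_SHIFT) & STATUS_SPEED_MASK) {
    case 0x0: return 10;
    case 0x1: return 100;
    default:  return 1000;  /* both 10b and 11b mean 1000 Mb/s */
    }
}
```

A driver could pass the value read from the STATUS register (offset 00008h) to these helpers, for example when handling a link status change.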

Transmit Control Register (TCTL)
Field Bit(s) Name
RSV 0 Reserved
EN 1 Transmit Enable
RSV 2 Reserved
PSP 3 Pad Short Packets
CT 11:4 Collision Threshold
COLD 21:12 Collision Distance
SWXOFF 22 Software XOFF Transmission
RSV 23 Reserved
RTLC 24 Re-transmit on Late Collision
NRTU 25 No Re-transmit on underrun (82544GC/EI only)

RSV 31:26 Reserved
Receive Control Register (RCTL)
Field Bit(s) Name Field Bit(s) Name
RSV 0 Reserved BSIZE 17:16 Receive Buffer Size
EN 1 Receiver Enable VFE 18 VLAN Filter Enable
SBP 2 Store Bad Packets CFIEN 19 Canonical Form Indicator Enable
UPE 3 Unicast Promiscuous Enabled CFI 20 Canonical Form Indicator bit value
MPE 4 Multicast Promiscuous Enabled RSV 21 Reserved
LPE 5 Long Packet Reception Enable DPF 22 Discard Pause Frames
LBM 7:6 Loopback Mode PMCF 23 Pass MAC Control Frames
RDMTS 9:8 Receive Descriptor Minimum Threshold Size RSV 24 Reserved
RSV 11:10 Reserved BSEX 25 Buffer Size Extension
MO 13:12 Multicast Offset SECRC 26 Strip Ethernet CRC from incoming packet
RSV 14 Reserved RSV 31:27 Reserved
BAM 15 Broadcast Accept

When BSEX is set, the value in BSIZE is multiplied by 16.

Receive Buffer Size Configuration
Size (Bytes) BSIZE BSEX
16384 01b 1
8192 10b 1
4096 11b 1
2048 00b 0
1024 01b 0
512 10b 0
256 11b 0
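The table above can be expressed as a small calculation: BSIZE selects 2048, 1024, 512 or 256 bytes, and BSEX multiplies that by 16. The sketch below assumes hypothetical helper names (`rctl_buffer_size`, `rctl_size_bits`) and treats the combination BSIZE = 00b with BSEX = 1, which the table does not list, as invalid.

```c
#include <stdint.h>
#include <stdbool.h>

#define RCTL_BSIZE_SHIFT  16          /* BSIZE occupies bits 17:16 */
#define RCTL_BSEX_BIT     (1u << 25)  /* Buffer Size Extension */

/* Receive buffer size in bytes for a BSIZE/BSEX pair, or 0 if the
 * combination does not appear in the table. */
static uint32_t rctl_buffer_size(uint32_t bsize, bool bsex)
{
    uint32_t base = 2048u >> (bsize & 0x3u);  /* 2048, 1024, 512, 256 */

    if (!bsex)
        return base;
    if ((bsize & 0x3u) == 0)
        return 0;          /* BSIZE = 00b with BSEX = 1 is not listed */
    return base * 16u;     /* BSEX multiplies the size by 16 */
}

/* Bits to OR into RCTL for the chosen BSIZE/BSEX pair. */
static uint32_t rctl_size_bits(uint32_t bsize, bool bsex)
{
    return ((bsize & 0x3u) << RCTL_BSIZE_SHIFT) | (bsex ? RCTL_BSEX_BIT : 0u);
}
```

For 4096-byte buffers this gives `rctl_size_bits(3, true)`, which matches the BSIZE = 11b, BSEX = 1 row.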
Interrupt mask Set / Read (IMS)
Field Bit(s) Description
TXDW 0 Sets mask for Transmit Descriptor Written Back
TXQE 1 Sets mask for Transmit Queue Empty.
LSC 2 Sets mask for Link Status Change.
RXSEQ 3 Sets mask for Receive Sequence Error. This is a reserved bit for the 82541xx and 82547GI/EI. Set to 0b.
RXDMT0 4 Sets mask for Receive Descriptor Minimum Threshold hit.
RSV 5 Reserved
RXO 6 Sets mask for Receiver FIFO Overrun.
RXT0 7 Sets mask for Receiver Timer Interrupt.
RSV 8 Reserved
MDAC 9 Sets mask for MDI/O Access Complete Interrupt.
RXCFG 10 Sets mask for Receiving /C/ ordered sets. This is a reserved bit for the 82541xx and 82547GI/EI. Set to 0b.
RSV 11 Reserved
PHYINT 12 Sets mask for PHY Interrupt (not applicable to the 82544GC/EI). This is a reserved bit for the 82541xx and 82547GI/EI. Set to 0b.
GPI 14:11 Sets mask for General Purpose Interrupts (82544GC/EI only).
GPI 14:13 Sets mask for General Purpose Interrupts.
TXD_LOW 15 Sets the mask for Transmit Descriptor Low Threshold hit (not applicable to the 82544GC/EI).
SRPD 16 Sets mask for Small Receive Packet Detection (not applicable to the 82544GC/EI).

RSV 31:17 Reserved

To enable an interrupt, simply write '1' to the corresponding bit.

Descriptor Format

Both receive and transmit descriptors are 16 bytes in size. There are 3 types of transmit descriptors. The original one is referred to as the "Legacy Transmit Descriptor". The second one is referred to as the "TCP/IP Data Descriptor" and is a replacement for the legacy descriptor that offers access to new offloading capabilities. The third descriptor type is fundamentally different as it does not point to packet data. It merely contains control information which is loaded into registers of the controller and affects the processing of future packets. For simplicity we will only use the Legacy Transmit Descriptor. If you want to learn more about the other types of descriptors, you can have a look at the specification.

Legacy Transmit Descriptor Format
63 48 47 40 39 36 35 32 31 24 23 16 15 0
Buffer Address
Special CSS RSV STA CMD CSO Length
Legacy Transmit Descriptor Field Description
Name Description
Buffer Address The address of the buffer. Descriptors with a null address transfer no data.
Length Length is per segment. The maximum length allowed is 16288 bytes.
CSO Checksum Offset. Indicates where, relative to the start of the packet, to insert a TCP checksum if it is enabled in the CMD field.
CMD Command Field
STA Status Field
RSV Reserved
CSS Checksum Start Field. It is an offset relative to the start of the buffer and it indicates where to start computing the checksum.

Special Special Field
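The layout above maps naturally onto a 16-byte C structure. The following is a sketch rather than a definitive definition: the type and field names are chosen to match the `transmit_descriptor_t` usage in the example code on this page, the `TX_CMD_*`/`TX_STA_*` macros and `tx_desc_prepare` are illustrative names, and a little-endian host is assumed so that the lowest-numbered bits come first in memory.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Legacy transmit descriptor (little-endian host assumed). */
typedef struct {
    uint64_t buffer_address; /* first quadword, bits 63:0 */
    uint16_t length;         /* second quadword, bits 15:0 */
    uint8_t  cso;            /* bits 23:16, Checksum Offset */
    uint8_t  command;        /* bits 31:24, CMD */
    uint8_t  status;         /* bits 35:32 STA, bits 39:36 reserved */
    uint8_t  css;            /* bits 47:40, Checksum Start */
    uint16_t special;        /* bits 63:48 */
} transmit_descriptor_t;

_Static_assert(sizeof(transmit_descriptor_t) == 16,
               "transmit descriptors must be 16 bytes");

/* Command bits, numbered as in the Command Field table. */
#define TX_CMD_EOP   (1u << 0)
#define TX_CMD_IFCS  (1u << 1)
#define TX_CMD_IC    (1u << 2)
#define TX_CMD_RS    (1u << 3)
#define TX_CMD_DEXT  (1u << 5)
#define TX_CMD_VLE   (1u << 6)
#define TX_CMD_IDE   (1u << 7)

/* Status bit set by the controller once it is done with the descriptor. */
#define TX_STA_DD    (1u << 0)

/* Fill a descriptor for one buffer; 'last' marks the end of the packet. */
static void tx_desc_prepare(transmit_descriptor_t *d, uint64_t phys,
                            uint16_t len, bool last)
{
    d->buffer_address = phys;
    d->length = len;
    d->cso = 0;
    d->css = 0;
    d->special = 0;
    d->status = 0;  /* clear DD left over from a previous use */
    d->command = (uint8_t)(TX_CMD_RS |
                           (last ? (TX_CMD_EOP | TX_CMD_IFCS) : 0u));
}
```

Setting RS on every descriptor is a design choice of this sketch: it lets the driver poll `status & TX_STA_DD` to see when a buffer may be reused.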
Transmit Descriptor Command Field Format
7 6 5 4 3 2 1 0
IDE VLE DEXT RPS RS IC IFCS EOP
Transmit Descriptor Command Field Description
Name Description
IDE (bit 7) Interrupt Delay Enable
VLE (bit 6) VLAN Packet Enable
DEXT (bit 5) Extension. (Set to 0b to indicate legacy mode)
RPS/RSV (bit 4) Report Packet Sent. 82544GC/EI only. Otherwise reserved!
RS (bit 3) Report Status. (When set, the controller will fire an interrupt when the packet gets transmitted and bit STA.DD (Descriptor Done) will be set).
IC (bit 2) Insert Checksum. (When set, the controller will insert a checksum based on the values of the CSO and CSS fields.)
IFCS (bit 1) Controls the insertion of the FCS/CRC field in normal Ethernet packets. IFCS is only valid when EOP is set.
EOP (bit 0) End Of Packet. It indicates the last descriptor making up the packet. One or many descriptors can be used to form a packet.

Transmit Descriptor Status Format
3 2 1 0
TU LC EC DD
Transmit Descriptor Status Field Description
Name Description
TU/RSV (bit 3) Transmit Underrun. Indicates a transmit underrun error has occurred. 82544GC/EI only. Otherwise reserved!
LC (bit 2) Late Collision. Indicates that a Late Collision occurred while working in half-duplex mode. It has no meaning in full-duplex.
EC (bit 1) Excess Collisions. It indicates that the packet has experienced more than the maximum excessive collisions as defined by the TCTL.CT control field.

DD (bit 0) Descriptor Done. Indicates that the descriptor is finished.
Receive Descriptor Format
63 48 47 40 39 32 31 16 15 0
Buffer Address
Special* Errors Status Packet Checksum* Length

*82544GC/EI only. Otherwise reserved!

Receive Descriptor Status Field
7 6 5 4 3 2 1 0
PIF IPCS TCPCS RSV VP IXSM EOP DD
Receive Descriptor Status Bits
Name Description
PIF (bit 7) Passed inexact filter. If set, the software must examine this packet to determine whether to accept it or not. If PIF is clear, the packet is known to be for this station.

IPCS (bit 6) IP Checksum Calculated on Packet. (0 = do not perform IP checksum, 1 = perform IP checksum)
TCPCS (bit 5) TCP Checksum Calculated on Packet. (0 = do not perform TCP/UDP checksum, 1 = perform TCP/UDP checksum)
RSV (bit 4) Reserved
VP (bit 3) Packet is 802.1Q (matched VET).
IXSM (bit 2) Ignore Checksum Indication. (when set, the checksum indication results should be ignored).
EOP (bit 1) End Of Packet. (Indicates that this is the last descriptor for an incoming packet)
DD (bit 0) Descriptor Done. (Indicates whether the controller is done with the descriptor)
Receive Descriptor Errors Field
7 6 5 4 3 2 1 0
RXE IPE TCPE CXE[a] RSV SEQ[b] SE[b] CE

[a] 82544GC/EI only, otherwise reserved!

[b] 82541xx, 82547GI/EI, and 82540EP/EM only, otherwise reserved.

Receive Descriptor Error bits
Name Description
RXE (bit 7) RX Data Error
IPE (bit 6) IP Checksum Error
TCPE (bit 5) TCP/UDP Checksum Error
CXE (bit 4) Carrier Extension Error
RSV (bit 3) Reserved
SEQ (bit 2) Sequence Error
SE (bit 1) Symbol Error
CE (bit 0) CRC Error or Alignment Error

The Receive Descriptor Special field is only populated for 802.1q packets. For all other packets its contents are set to 0.

Receive Descriptor Special Field
15 13 12 11 0
PRI CFI VLAN
Receive Descriptor Special Field
Name Description
VLAN VLAN Identifier
CFI Canonical Form Indicator
PRI User Priority
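The receive descriptor and its Special field can be sketched in C in the same way. The type and field names follow the `receive_descriptor_t` usage in the example code on this page, the `RX_STATUS_*` macros and the `rx_special_*` helpers are illustrative names, and a little-endian host is assumed.

```c
#include <stdint.h>
#include <stddef.h>

/* Receive descriptor (little-endian host assumed). */
typedef struct {
    uint64_t buffer_address; /* first quadword, bits 63:0 */
    uint16_t length;         /* second quadword, bits 15:0 */
    uint16_t checksum;       /* bits 31:16, Packet Checksum */
    uint8_t  status;         /* bits 39:32 */
    uint8_t  errors;         /* bits 47:40 */
    uint16_t special;        /* bits 63:48 */
} receive_descriptor_t;

_Static_assert(sizeof(receive_descriptor_t) == 16,
               "receive descriptors must be 16 bytes");

/* Status bits used by a simple driver. */
#define RX_STATUS_DD   (1u << 0)  /* Descriptor Done */
#define RX_STATUS_EOP  (1u << 1)  /* End Of Packet */

/* VLAN Identifier: bits 11:0 of the Special field. */
static uint16_t rx_special_vlan(uint16_t special)
{
    return special & 0x0FFFu;
}

/* Canonical Form Indicator: bit 12 of the Special field. */
static uint8_t rx_special_cfi(uint16_t special)
{
    return (uint8_t)((special >> 12) & 0x1u);
}

/* User Priority: bits 15:13 of the Special field. */
static uint8_t rx_special_pri(uint16_t special)
{
    return (uint8_t)((special >> 13) & 0x7u);
}
```

For example, a Special value of 0xA064 decodes to priority 5, CFI 0 and VLAN 100.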

EEPROM Reading

There are a few variants of the card with many differences, most notably the method to access the EEPROM and the Flash memory of the card. Here we will only describe methods applicable to cards that use the EEPROM method.

The EEPROM must first be enabled before the MAC address of the NIC can be read. This is done by setting the EECD.SK (0x01), EECD.CS (0x02) and EECD.DI (0x04) bits of the EECD (0x00010) register. This allows software to perform reads from the EEPROM.

The EEPROM has a "lock-unlock" mechanism to prevent software-hardware collisions when reading from it. To lock the EEPROM, the EECD.REQ (0x40) bit must be set in the EECD register. Then wait until the EECD.GNT (0x80) bit becomes set. Unlocking only requires clearing EECD.REQ.

To finally read the EEPROM, the kernel should first mask the address to 12 bits (82541xx and 82547GI/EI cards only) or 8 bits (all other cards), then shift it left by 2 (82541xx and 82547GI/EI cards only) or by 8 (all other cards). The kernel must OR the result with the EERD.START (0x01) bit and then write it to the EERD (0x00014) register.

The kernel should then wait until the EEPROM read operation has finished by polling until EERD.DONE becomes set. After that the kernel must read the EERD register, shift it right by 16 bits and truncate the result to 16 bits.

Finally, the EERD.START bit must be cleared.

static uint16_t eeprom_read(uint8_t addr) {
    uint32_t tmp;
    uint16_t data;

    if ((le32_to_cpu(mmio_read_dword(dev_info.mmio.addr, I8254X_EECD)) & I8254X_EECD_EE_PRES) == 0) {
        kpanic("EEPROM present bit is not set for i8254x\n");
    }

    /* Tell the EEPROM to start reading */
    if (dev_info.version == I82547GI_EI
        || dev_info.version == I82541EI_A0
        || dev_info.version == I82541EI_B0
        || dev_info.version == I82541ER_C0
        || dev_info.version == I82541GI_B1
        || dev_info.version == I82541PI_C0) {
        /* Specification says that only 82541x devices and the
         * 82547GI/EI do 2-bit shift */
        tmp = ((uint32_t)addr & 0xfff) << 2;
    } else {
        tmp = ((uint32_t)addr & 0xff) << 8;
    }
    tmp |= I8254X_EERD_START;
    mmio_write_dword(dev_info.mmio.addr, I8254X_EERD, cpu_to_le32(tmp));

    /* Wait until the read is finished: the DONE bit is set on completion.
     * timeout(cond, n) is assumed to keep polling while cond holds. */
    timeout((le32_to_cpu(mmio_read_dword(dev_info.mmio.addr, I8254X_EERD)) & I8254X_EERD_DONE) == 0, 100);

    /* Obtain the data from the upper 16 bits */
    data = (uint16_t)(le32_to_cpu(mmio_read_dword(dev_info.mmio.addr, I8254X_EERD)) >> 16);

    /* Tell EEPROM to stop reading */
    tmp = le32_to_cpu(mmio_read_dword(dev_info.mmio.addr, I8254X_EERD));
    tmp &= ~(uint32_t)I8254X_EERD_START;
    mmio_write_dword(dev_info.mmio.addr, I8254X_EERD, cpu_to_le32(tmp));
    return data;
}

When all data is finally read the kernel should unlock the EEPROM to let hardware access it.
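The lock-unlock handshake on EECD.REQ and EECD.GNT might look like the sketch below. To keep it self-contained, `read_register` and `write_register` are stand-ins backed by a small array that models hardware granting access whenever REQ is set; in a real driver they would be the MMIO or port I/O accessors. The function names and the poll limit are assumptions of this example.

```c
#include <stdint.h>
#include <stdbool.h>

#define REG_EECD  0x00010
#define EECD_REQ  0x40u   /* software requests EEPROM access */
#define EECD_GNT  0x80u   /* hardware grants EEPROM access */

/* Stand-in register file (hypothetical): replace with real accessors. */
static uint32_t fake_regs[0x20 / 4];

static uint32_t read_register(uint16_t reg)
{
    uint32_t v = fake_regs[reg / 4];
    if (reg == REG_EECD) {
        /* Model: GNT simply follows REQ. */
        if (v & EECD_REQ)
            v |= EECD_GNT;
        else
            v &= ~EECD_GNT;
    }
    return v;
}

static void write_register(uint16_t reg, uint32_t value)
{
    fake_regs[reg / 4] = value;
}

/* Release the EEPROM by clearing REQ. */
static void eeprom_unlock(void)
{
    write_register(REG_EECD, read_register(REG_EECD) & ~EECD_REQ);
}

/* Request the EEPROM and poll for the grant; gives up after max_polls. */
static bool eeprom_lock(unsigned max_polls)
{
    write_register(REG_EECD, read_register(REG_EECD) | EECD_REQ);
    while (max_polls--) {
        if (read_register(REG_EECD) & EECD_GNT)
            return true;
    }
    eeprom_unlock();  /* withdraw the request on timeout */
    return false;
}
```

A caller would wrap its EEPROM reads between `eeprom_lock(...)` and `eeprom_unlock()`, and skip the reads if the lock fails.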

Initialization

The 8254x will be in an undefined state and as such it needs to be reset. The first thing that should be done is enabling bus mastering, memory and I/O accesses in the PCI command register.

Then the NIC should be reset by setting the CTRL.RST bit (bit 26, self-clearing) in the Device Control register of the card.

After the card has been reset, you should set the CTRL.ASDE and CTRL.SLU bits (to enable Auto-Speed Detection (ASDE), you must also set the Set Link Up (SLU) bit) and write the MAC address you want the device to use to the RAL0 and RAH0 registers. RAH0 must also have its Address Valid bit (bit 31) set, otherwise the entry is not used for filtering. To get the device's MAC address, all you have to do is read the first 3 words (6 bytes) of the EEPROM.


The entire procedure looks something like this:

uint8_t MAC_ADDRESS[6];

void reset_nic() {
    uint32_t device_control = read_register(I8254_REG_CTRL);

    device_control |= I8254_CTRL_RESET; // Set the reset bit
    write_register(I8254_REG_CTRL, device_control);

    // Wait for the self-clearing reset bit to drop. Busy-wait here: "hlt" would
    // stall the CPU until the next interrupt, which may never arrive.
    while (read_register(I8254_REG_CTRL) & I8254_CTRL_RESET)
        __asm__ volatile("pause");

    device_control = read_register(I8254_REG_CTRL);
    device_control |= I8254_CTRL_ASDE | I8254_CTRL_SLU; // Enable Auto Speed Detection.

    write_register(I8254_REG_CTRL, device_control);

    // Read the MAC address from the EEPROM (three 16-bit words)
    uint16_t b0 = eeprom_read(0);
    uint16_t b1 = eeprom_read(1);
    uint16_t b2 = eeprom_read(2);

    MAC_ADDRESS[0] = b0 & 0xFF;
    MAC_ADDRESS[1] = b0 >> 8;
    MAC_ADDRESS[2] = b1 & 0xFF;
    MAC_ADDRESS[3] = b1 >> 8;
    MAC_ADDRESS[4] = b2 & 0xFF;
    MAC_ADDRESS[5] = b2 >> 8;

    // Write the MAC address to RAL/RAH 0.
    uint32_t writeL = ((uint32_t)b1 << 16) | b0;
    uint32_t writeH = b2 | (1u << 31); // bit 31 = Address Valid (AV)

    write_register(E1000_REG_RAL0, writeL);
    write_register(E1000_REG_RAH0, writeH);
}

Ring setup

Theory of operation:

The next step is to set up the rings. Without setting up the rings, you will not be able to send or receive packets. Luckily the ring system is pretty simple. It consists of the T/RDH and T/RDT (Transmit/Receive Descriptor Head/Tail) registers and, of course, the ring buffers.

Transmit Ring

In the image below, you can see the structure of the transmit ring. The shaded boxes represent descriptors that have been transmitted but not yet reclaimed. (If you dynamically allocate the descriptor buffers, reclaiming simply involves freeing those buffers.)

Transmit Ring Structure.png

Anything between the Head and the Tail is owned by the controller and constitutes the transmit queue (the descriptors that have been queued for transmission). At reset, both TDT and TDH are set to 0. (If TDT = TDH, the queue is empty and there is nothing to transmit.)

Receive Ring

The image below depicts the structure of the receive ring. The shaded boxes represent descriptors that have stored incoming packets but have not yet been recognized by the driver. You can detect which descriptors have incoming data written to them by checking whether the status field is non-zero.

Receive Ring Structure.png

Any descriptors between RDH and RDT are owned by the hardware and should not be modified!

After the reset, the head should point to the first descriptor and the tail to the last descriptor of the ring (Since all descriptors are available for use).

The RDH points to the descriptor to which the controller will write the next received packet. It increments automatically.

The RDT points to one descriptor after the last available descriptor. This register should still point to a valid descriptor (should be within Base and Base + Size).


The TDLEN/RDLEN registers contain the size in bytes of the ring.
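The head/tail bookkeeping described above comes down to modular arithmetic on descriptor indices. The helpers below are a sketch with made-up names. Keeping one transmit slot unused is a common software convention assumed here (so that a full ring is never confused with the empty head = tail case), not something taken from the manual.

```c
#include <stdint.h>

#define DESCRIPTOR_SIZE 16u  /* both descriptor types are 16 bytes */

/* Value for TDLEN/RDLEN: ring size in bytes. */
static uint32_t ring_len_bytes(uint32_t num_descriptors)
{
    return num_descriptors * DESCRIPTOR_SIZE;
}

/* Index following idx, wrapping at the end of the ring. */
static uint32_t ring_next(uint32_t idx, uint32_t num_descriptors)
{
    return (idx + 1u) % num_descriptors;
}

/* Number of descriptors from head up to (not including) tail,
 * i.e. the ones currently owned by the controller. */
static uint32_t ring_hw_owned(uint32_t head, uint32_t tail,
                              uint32_t num_descriptors)
{
    return (tail + num_descriptors - head) % num_descriptors;
}

/* Transmit slots software may still queue, leaving one slot unused. */
static uint32_t tx_ring_free(uint32_t head, uint32_t tail,
                             uint32_t num_descriptors)
{
    return num_descriptors - 1u - ring_hw_owned(head, tail, num_descriptors);
}
```

With 8 descriptors, head = 6 and tail = 2, the controller owns descriptors 6, 7, 0 and 1, so `ring_hw_owned` returns 4 and `tx_ring_free` returns 3.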

Setup:

Transmit Ring
  • First, allocate a region for the descriptor ring.
  • Next, either allocate a static packet buffer for each descriptor up front, or allocate a buffer dynamically whenever you transmit a packet (the example code uses the first option).
  • Set TDH and TDT to 0, TDBAL to the lower 32 bits of the ring's physical address, TDBAH to the upper 32 bits and TDLEN to the total length of the ring in bytes (number of descriptors * 16).
  • Set your preferred bits in the TCTL register.
// Assumes 1:1 memory mapping for simplicity
#define NUM_OF_TX_DESCRIPTORS 8
#define SIZE_OF_TX_DESCRIPTOR_BUFFER 4096
typedef struct transmit_descriptor transmit_descriptor_t; // 16-byte struct laid out per the Legacy Transmit Descriptor Format

transmit_descriptor_t *transmit_ring; // file scope: the transmit path needs it later

void setup_transmit_ring() {
    size_t transmit_ring_size = NUM_OF_TX_DESCRIPTORS * 16;
    transmit_ring = your_favorite_physical_allocator(transmit_ring_size);
    memset(transmit_ring, 0, transmit_ring_size); // start with clean descriptors

    for (int i = 0; i < NUM_OF_TX_DESCRIPTORS; i++) {
        transmit_descriptor_t *descriptor = transmit_ring + i;
        descriptor->buffer_address = your_favorite_physical_allocator(SIZE_OF_TX_DESCRIPTOR_BUFFER);
    }

    write_register(REG_TDBAL, ((uint64_t)transmit_ring) & 0xFFFFFFFF); // Base Address Low
    write_register(REG_TDBAH, ((uint64_t)transmit_ring) >> 32);        // Base Address High
    write_register(REG_TDLEN, transmit_ring_size);                     // Ring size in bytes
    write_register(REG_TDH, 0);
    write_register(REG_TDT, 0);

    // Set the Enable (EN) and Pad Short Packets (PSP) bits
    uint32_t tctl = E1000_TCTL_EN | E1000_TCTL_PSP;
    write_register(REG_TCTL, tctl);
}
Receive Ring
  • First, allocate a region for the descriptor ring.
  • After that, loop through each descriptor, allocate a buffer of the selected size (set in the Receive Control Register) and store its physical address in the descriptor's address field.
  • Set RDH to 0 (the first descriptor), RDT to the last descriptor (number of descriptors - 1), RDBAL to the lower 32 bits of the ring's physical address, RDBAH to the upper 32 bits and RDLEN to the total length of the ring in bytes (number of descriptors * 16).
  • Set your preferred bits in the RCTL register (you must set the EN bit to enable the DMA engine; LPE and BAM are recommended).
// Assumes 1:1 page mapping for simplicity
#define NUM_OF_RX_DESCRIPTORS 32
#define SIZE_OF_RX_DESCRIPTOR_BUFFER 4096
typedef struct receive_descriptor receive_descriptor_t; // 16-byte struct laid out per the Receive Descriptor Format

receive_descriptor_t *receive_ring; // file scope: the receive path needs it later

void setup_receive_ring() {
    size_t receive_ring_size = NUM_OF_RX_DESCRIPTORS * 16; // you can substitute 16 with sizeof(receive_descriptor_t)
    receive_ring = your_favorite_physical_allocator(receive_ring_size);

    for (int i = 0; i < NUM_OF_RX_DESCRIPTORS; i++) {
        receive_descriptor_t *descriptor = receive_ring + i;
        descriptor->buffer_address = your_favorite_physical_allocator(SIZE_OF_RX_DESCRIPTOR_BUFFER);
        descriptor->status = 0; // not yet filled by the controller
    }

    write_register(REG_RDBAL, ((uint64_t)receive_ring) & 0xFFFFFFFF); // Base Address Low
    write_register(REG_RDBAH, ((uint64_t)receive_ring) >> 32);        // Base Address High
    write_register(REG_RDLEN, receive_ring_size);                     // Ring size in bytes
    write_register(REG_RDH, 0);                                       // Set it to the first descriptor
    write_register(REG_RDT, NUM_OF_RX_DESCRIPTORS - 1);               // Set it to the last descriptor

    // Set the Enable, Long Packet Reception, Broadcast Accept Mode and Size Extension bits
    // Also set the buffer size. This configuration (BSIZE = 0b11 and BSEX = 1) means 4096 (4kB) buffers
    uint32_t rctl = RCTL_EN | RCTL_LPE | RCTL_BAM | RCTL_BSEX | (0b11 << RCTL_BSIZE);
    write_register(REG_RCTL, rctl);
}

Interrupt Handling

If you want to receive packets, you need a way of knowing when to read them. That's where interrupts come into play.

To enable interrupts, simply set the corresponding bits in the Interrupt Mask Set/Read (IMS) register. Recommended interrupts are: RXT0 (to get notified about incoming packets), RXO (to get notified about overruns) and LSC (to get notified about link status changes, e.g. if the user (un)plugs the Ethernet cable; in such cases, you should redo the DHCP handshake to connect to that network).

void enable_interrupts() {
    uint32_t ims = E1000_IMS_RXT | E1000_IMS_RXO | E1000_IMS_LSC;
    write_register(REG_IMS, ims);
}

To find out why an interrupt was raised, read the Interrupt Cause Read (ICR) register. The ICR register is self-clearing, meaning it gets cleared when you read it. A simple interrupt handler may look something like this:

void _handle_interrupt() {
    uint32_t cause = read_register(REG_ICR); // Cleared upon read

    // The ICR uses the same bit positions as the IMS
    if (cause & E1000_IMS_RXT) { // Packets received
        receive_packets(); // Call the function responsible for receiving
                           // packets and sending them to the network stack
    }

    if (cause & E1000_IMS_LSC) { // Link status change
        // Read the status register and check the LU bit to get the link status
        if (read_register(E1000_REG_STATUS) & STATUS_LU) {
            kprintf("Link change detected: Link up!\n");
        } else {
            kprintf("Link change detected: Link down!\n");
        }
    }
}

Packet Transmission

To transmit a packet, all you have to do is load the data into a free descriptor (or split it across several if it doesn't fit in one descriptor) and set the EOP bit on the last descriptor.

In this example we are using preallocated buffers, but you could use dynamically allocated ones. Just remember to free them after the packet is transmitted.

void send_data(void *data, uint32_t size, bool EOP) {
    uint32_t tail = read_register(REG_TDT);
    transmit_descriptor_t *tx = transmit_ring + tail; // Get the descriptor the tail is pointing at (next available descriptor)

    // Copy the data to the previously allocated buffer (1:1 mapping assumed)
    memcpy((void *)(uintptr_t)tx->buffer_address, data, size);

    tx->length = size; // Set the length of the descriptor
    tx->status = 0;    // Clear status left over from a previous use

    // Assign instead of OR-ing, so command bits from a previous use are dropped.
    // If it's the last one, set EOP (and IFCS to append the CRC).
    tx->command = EOP ? (TX_CMD_EOP | TX_CMD_IFCS) : 0;

    tail = (tail + 1) % NUM_OF_TX_DESCRIPTORS;
    write_register(REG_TDT, tail); // Increment and write the tail
}

size_t send(void *data, size_t length) {
    size_t sent = 0;
    // Split the data into chunks and send them
    while (sent < length) {
        size_t to_send = min(length - sent, SIZE_OF_TX_DESCRIPTOR_BUFFER);
        send_data((uint8_t *)data + sent, to_send, to_send == (length - sent));
        sent += to_send;
    }
    return sent;
}

Packet Reception

To receive packets after an interrupt, all you have to do is loop from the first packet not yet received by the driver to the last one. To do that, it's a good idea to keep track of the last descriptor the driver read. (This is needed to reconstruct the packets in the correct order.)

uint8_t rx_next = 0;

void receive_packets() {
    uint32_t idx = rx_next;

    void *buffer = NULL; // Accumulates the packet across descriptors
    size_t buffer_len = 0;

    while (receive_ring[idx].status & RX_STATUS_DD) {
        // This descriptor has been filled

        bool eop = receive_ring[idx].status & RX_STATUS_EOP;
        uint16_t len = receive_ring[idx].length;
        void *data = (void *)(uintptr_t)receive_ring[idx].buffer_address; // 1:1 mapping assumed

        // Handle multiple-descriptor packets
        if (buffer == NULL) { // This is the first descriptor of the packet
            buffer = malloc(len); // Use your kernel's heap allocator
            buffer_len = len;
            memcpy(buffer, data, len);
        } else {
            // It's the next part of the packet, add it to the packet
            void *new_buffer = malloc(buffer_len + len); // Allocate a bigger buffer
            memcpy(new_buffer, buffer, buffer_len);      // Copy the previous data
            free(buffer);                                // Free the old buffer

            // Copy the new data
            memcpy((uint8_t *)new_buffer + buffer_len, data, len);

            // Set the new buffer into the variables
            buffer_len += len;
            buffer = new_buffer;
        }

        // Set status to 0 (to give ownership back to the controller)
        receive_ring[idx].status = 0;

        idx = (idx + 1) % NUM_OF_RX_DESCRIPTORS;

        if (eop) {
            // This is the last descriptor of the packet
            // Forward the packet to your network stack
            stack_receive_packet(buffer, buffer_len);
            buffer = NULL;
            buffer_len = 0;
        }
    }

    // Give the controller more free descriptors by updating RDT
    uint32_t tail = (idx == 0) ? NUM_OF_RX_DESCRIPTORS - 1 : idx - 1;
    write_register(REG_RDT, tail);

    rx_next = idx;
}

Emulation

  • VirtualBox (3.1 is all I can personally confirm) supports rather dodgy implementations of an Intel PRO/1000 MT Server (82545EM), Intel PRO/1000 MT Desktop (82540EM), and Intel PRO/1000 T Server (82543GC).
    • Bugs:
      • The EERD register is unimplemented (you *must* use the 4-wire access method if you want to read from the EEPROM). [01000101 - I had a patch committed to fix this. It will soon be mainstream]
  • VMware Virtual Server 2 emulates/virtualizes an 82545EM-based card rather well.
  • QEMU (since 0.10.0) supports an 82540EM-based card and it seems to work OK. It is the default network card since 0.11.0.
    • Bugs:
      • QEMU does not properly handle the software reset operation (CTRL.RST) in builds prior to June 2009.
      • QEMU (version 4.2.1 tested) doesn't seem to support flash memory, instead shifting the IO Register Base Address up.
  • IIRC (needs confirmation) Microsoft's Hyper-V supports an 8254x-series card.

Documentation

Example driver
