Linux Crystal Beach driver Pre-Alpha Evaluation Release
-------------------------------------------------------

This README describes how to enable your Bridgeport system to use the Crystal
Beach DMA engine under Linux. This driver should be MP-safe and mostly stable,
but has not yet been tuned for performance.

Contents
--------
1) Integration Instructions
2) Crystal Beach sysfs Interface
3) Implementation Description

Integration Instructions
------------------------

1) Start with a fresh kernel source tree (2.6.12). Make sure it
   compiles and boots. Verify that lspci shows the CB device (bus 0,
   device 8, listed as unknown device 1a38). If it does not, enable it in
   the BIOS under Advanced -> Blackford -> Crystal Beach.

2) Copy everything from the directory containing this README into a new
   directory in the kernel tree named drivers/dma.

3) Move drivers/dma/dmaengine.h to include/linux.

4) Apply drivers/dma/cb-{kernel version}.diff using patch.

5) Configure the kernel (be sure to enable the DMA subsystem and the CB
   driver), then build and install it. It should not be necessary to enable
   the DMA softcopy or Debug options. If the CB driver is built as a module,
   it will be called ioatdma.ko.

6) Verify that the kernel has CONFIG_PCI_MSI enabled. Disabling kernel
   preemption (PREEMPT) is recommended.

7) Boot the newly-compiled kernel.


Crystal Beach sysfs Interface
-----------------------------

If the CB driver loaded successfully, sysfs contains one directory per DMA
channel under /sys/class/dma, named dma0chanX, where X is 0-3.

Channel entries:

in_use
1 if the DMA channel is allocated to a client, such as the network stack.

min_copy_size
Because each DMA engine transaction requires setup, very small copies are
faster when done by the CPU. Copies smaller than min_copy_size bytes are
performed by the CPU. This parameter is writable; for example,
"echo 64 > min_copy_size" sets the threshold to 64 bytes.
A value of 4097 or greater causes all copies to be performed by the CPU.
A value of 0 causes all copies to be executed by the Crystal Beach device.

bytes_transferred
The total number of bytes copied, whether by the DMA engine or by the CPU
(see min_copy_size).

memcpy_count
The total number of copy operations initiated.

Implementation Description
--------------------------

The DMA engine allows the network stack to offload copies of received data
into user memory. Doing so requires the following changes to the network stack:

  -- Initialization --
* During network stack initialization (dev.c), the network stack must request
  DMA channel resources.

  -- Runtime --
* In tcp_recvmsg (tcp.c), the user buffer (the iovec array) is locked down
  (using get_user_pages), and a mapping between the locked pages and the
  original iovec is established (the locked_list).
* When a packet is added to the prequeue (tcp_input.c), if it passes header
  prediction and user buffers have been preposted, the stack initiates the
  copy operation and flags the SKB by setting skb->copied_early.
* When the user process is woken up, it skips SKBs that are already being
  copied by the DMA engine. It then waits for the asynchronous copies to
  complete and frees all SKBs on the async wait queue, which is safe because
  every copy operation is now guaranteed to have finished.
* Finally, the user buffer is unlocked and marked dirty.
