Subject: Re: FPGA multiprocessors
Date: 07 Oct 1997 00:00:00 GMT
Newsgroups: comp.arch.fpga

Charles Sweeney <CharlesSweeney-@compuserve.com> wrote in article <3438A7D6.2431@compuserve.com>...
> Jan Gray wrote:
>> Assuming careful floorplanning, it should be possible to place six 32-bit
>> processor tiles, or twelve 16-bit processor tiles, in a single 56x56
>> XC4085XL with space left over for interprocessor interconnect. Also the
>> number of processor tiles can be doubled if we eschew the I-cache and
>> simplify the microarchitecture -- though performance would greatly suffer.
>
> It's good to see you planning to take advantage of the parallelism
> offered by FPGAs, but why constrain your software to have to run in a
> particular microprocessor architecture? why not go further and compile
> your programs directly into the hardware of the FPGA, Handel-C does
> exactly that, please see our web site below.

Good question. The trite answer is that since designing processor ISAs
and microarchitectures for FPGA implementations is my research interest,
that's my hammer in search of nails. FPGA multiprocessors are now
possible -- but it remains to be seen if they are actually useful!

The other answer is that I don't preclude a modest custom datapath per
processor (and such datapaths could be designed from source code by
tools such as Handel-C). So I think an FPGA multiprocessor is the
preferred solution for problems which:
1. are amenable to n-way "outer loop" parallelism and
2. involve too much irregular computation for a custom datapath alone
   and
3. involve enough regular inner loop computation that an FPGA custom
   datapath is faster/cheaper than a general purpose processor or
   multiprocessor built of same.
(Whether such problems exist and are important remains to be seen.)

As for your question "why not go further and compile your programs
directly into the hardware of the FPGA?":

There will always be very regular signal processing applications,
regular in computation, regular in operand fetch and result store, and
relatively simple in the computation kernel, for which a custom datapath
compiled to an FPGA is a good solution.

But there are also other computations which are either too irregular or
too large to practically implement in an FPGA datapath, even in a
time-multiplexed (reconfiguration) manner. The "outer loops" and "outer
function calls" of these computations are best done in a general purpose
processor, even as you move the inner loop(s) to a custom datapath.
Indeed, the inner loops may constitute only a few percent of the total
text of the source code of the computation. To help these large "dusty
deck" applications take advantage of custom datapaths, it must be
extremely convenient to interface the custom stuff to the general
purpose processor.

For some problems where even the irregular computation is a critical
path, especially those involving floating-point, it probably makes sense
to choose a fast, cheap commercial off-the-shelf microprocessor. Of
course there are penalties here. Cost of processor. Less integration.
Board real-estate costs. "Representation domain crossing" costs.
Relatively slow communication between processor and FPGA. Cost of FPGA
resources spent interfacing to processor.

But for problems where the irregular computation is not the critical
path, the now modest overhead (10-20%) of an embedded general purpose
CPU enables an interesting integrated "system on chip" hybrid: embedded
processor, on-chip bus, on-chip custom datapaths and peripherals.

In theory, you could compile your dusty deck C, C++, Java, FORTRAN,
Scheme, etc. and run it immediately on your FPGA CPU. Then automatically
(profile driven) or through explicit directives, you can compile the
inner loops to a custom datapath. This can be manifest either as an
on-chip command-oriented coprocessor, or in some cases as new
instructions. The latter has the potential advantage of very high custom
operation issue rates (today, 66 MHz) and access to the processor
register file, etc.

Given this approach, even if your dusty deck app stores its data in such
advanced data structures (sarcasm) as a linked list (/sarcasm), it can
still potentially take advantage of a custom datapath. This is much less
feasible if your registers or operand(s) are microseconds away on the
non-embedded host processor.

For example, the unused logic in
  //www3.sympatico.ca/jsgray/sld021.htm
was reserved for the Gouraud rendering instructions described in the
last paragraph in:
  //www3.sympatico.ca/jsgray/render.txt

Of course, an embedded processor in programmable logic is just one point
on the CPU/custom datapath spectrum. See also the BRASS research
  //http.cs.berkeley.edu/Research/Projects/brass
and my old essay on FPGA PC coprocessors
  //www3.sympatico.ca/jsgray/coproc.txt

Jan Gray
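
Editorial sketch (not part of the original post): the argument above for
keeping the outer loops on a general purpose processor and moving only
the inner loops to a custom datapath can be illustrated with Amdahl's
law. The function name, the run-time fractions, and the 10x datapath
speedup below are hypothetical values chosen for illustration. Note that
the fraction is of run time, not of source text, and that the 10-20%
area overhead mentioned in the post is not modeled.

```python
def overall_speedup(inner_fraction: float, datapath_speedup: float) -> float:
    """Amdahl's law: overall speedup when only the inner loops are accelerated.

    inner_fraction   -- share of total run time spent in the inner loops (0..1)
    datapath_speedup -- assumed speedup of those loops on the custom datapath
    """
    if not 0.0 <= inner_fraction <= 1.0:
        raise ValueError("inner_fraction must be in [0, 1]")
    if datapath_speedup <= 0.0:
        raise ValueError("datapath_speedup must be positive")
    outer_time = 1.0 - inner_fraction                # still runs on the processor
    inner_time = inner_fraction / datapath_speedup   # runs on the custom datapath
    return 1.0 / (outer_time + inner_time)


if __name__ == "__main__":
    # Hypothetical workloads: the more run time the inner loops take,
    # the closer the overall gain gets to the datapath's own speedup.
    for frac in (0.5, 0.9, 0.99):
        gain = overall_speedup(frac, 10.0)
        print(f"inner loops = {frac:.0%} of run time: {gain:.2f}x overall")
```

Under these assumptions the irregular outer code quickly becomes the
limit, which is one way to read the post's distinction between problems
where the irregular computation is the critical path and those where it
is not.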
Copyright © 2000, Gray Research LLC. All rights reserved.
Last updated: Feb 03 2001