This section is for my comments on the EDK and USB. The current problem is moving to the AXI bus from PLB/LMB.
AXI USB 2.0 High-Speed Host and Device Cores Are Available
We fixed the FPGA code generation and now have both host and device cores for USB 2.0 High-Speed on AXI for the Xilinx EDK. Each core has both bus master and slave interfaces, allowing performance at the USB bus saturation level, i.e., 52 megabytes/sec sustained.
Both cores have enhanced and configurable queueing support to provide optimized performance while using minimal resources.
Since both cores are coded in System C (C++) and synthesized using the Forte Design Automation Cynthesizer product, we can easily customize each core to meet customer needs. Turnaround time can be measured in days, not weeks or months. If you are a high-level modeler, we will supply System C source; if you are an RTL designer, we supply Verilog RTL or a netlist. And since this is high-level EDA, porting from Xilinx to Altera (when they support AXI) or to an ASIC flow is easy. Essentially, no core IP changes are needed to move the core from flow to flow: the test bench stays the same, and the core coding stays the same.
System C AXI USB Device Core to be released soon.
Using the Forte Cynthesizer compiler, we are able to get hand-coded Verilog performance while coding the core in System C (C++). This is a big productivity boost for everyone. Current ASIC results, taking the Verilog output of Cynthesizer and running it through Design Compiler for a TSMC 0.18 micron process, are excellent: almost the same as hand-coded. FPGA results are not there yet, but this is a compiler problem and will get fixed. The advantages are huge. The entire AXI interface, including high-performance DMA, is in one .h file. It's a class, so every time you need AXI there is no more typing 40 signal names; you just instantiate the class, i.e., axi::master, axi::slave, axi::bus.
System C makes the core much easier to modify and keeps the design process invariant. Moving from software to hardware is much easier since the target language is C++.
For customers who do not use Forte Cynthesizer, we will build Verilog RTL for you. In all cases, the System C model will be available for simulation use.
ULPI Data Bus Timing Problems
One problem I encountered in the last 4 weeks was that the ULPI data bus, in the out direction, would not meet setup and hold requirements in some builds of the EDK project I was using as a test bench.
When the SP605 development board, running test firmware that emulates a mass storage device, was plugged into a PC, it would not enumerate. Looking at the LeCroy Voyager traces, I could see that bit 7 of the ULPI data bus was getting flipped at times, corrupting the returned descriptors and thus making the PC reject the device. After I consulted with a Xilinx FAE, it appeared that the ULPI Data Out registers were not being placed into I/O pin registers when timing failed. I was about to spend the day learning the details of Xilinx constraints when it occurred to me that I had not had this problem in 10 years with the EDK. I went back to the MPD file for my old core, compared its ULPI section to my new core's ULPI section, and found that the syntax was different. The old format seems to work every time; the new format may not. The successful format is shown below. (It is all one line; Sandvox is wrapping it.)
DMA Working on Main Memory AXI Bus
Found the right formula: it was the MPD file for my new USB core. I can now achieve 52 megabytes/sec (USB saturation speed) using AXI DMA to/from the DDR memory on AXI. As usual with Microblaze and any flavor of DDR memory, the cache size is very important. If you use the default 8KB data and instruction caches, the new core architecture requires 4 buffers per endpoint. If you increase the Microblaze caches to 16KB data and instruction, the USB core can use 2 buffers per endpoint. Each BRAM holds 8 buffers, so keeping the Microblaze cache small and adding core buffers may be the better option. More later as I have a chance to play with the various sizes.
Bus Master AXI working on Private AXI bus
AXI Development Wars
Update on AXI:
I have both master and slave interfaces running.
The master interface is blistering fast but unstable, i.e., USB operations start to malfunction after 100 megabytes or so of USB data. I'm working on it. The good news is that burst transfers work and can easily keep up with High-Speed USB traffic demands.
The slave interface is stable, but under Microblaze there are no burst transfers in uncached address space. The only obvious place to put USB on AXI as a slave is AXI "Lite". In typical Xilinx EDK fashion, the AXI Lite bus is dog slow and not useful for USB performance applications. AXI is supposed to synthesize to a crossbar connection with minimal logic (read: no delays to speak of) when the clocks are all synchronous and at the same speed. All I can tell you is that under EDK 13.1 on the SP605 board, I get a thundering 13 megabytes/sec, compared to 52 megabytes/sec on LMB, the forbidden bus that we all use when we need performance.