CRAIS: A Crossbar-Based Interconnection Scheme on FPGA for Big Data by Chao Wang, Xi Li, Xue-Hai Zhou

J. Comput. Sci. Technol.



Wang C, Li X, Zhou XH. CRAIS: A crossbar-based interconnection scheme on FPGA for big data. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 30(1): 84–96 Jan. 2015. DOI 10.1007/s11390-015-1506-5

CRAIS: A Crossbar-Based Interconnection Scheme on FPGA for Big Data


Chao Wang 1 (王 超), Member, CCF, ACM, IEEE

Xi Li 1,2,∗ (李 曦), Senior Member, CCF, Member, ACM, IEEE, and

Xue-Hai Zhou 1 (周学海), Senior Member, CCF, Member, ACM, IEEE

1 School of Computer Science, University of Science and Technology of China, Hefei 230027, China

2 School of Software Engineering, University of Science and Technology of China, Suzhou 215123, China

E-mail: {cswang, llxx, xhzhou}

Received July 15, 2014; revised December 12, 2014.

Abstract On-chip interconnection poses significant challenges in the multiprocessor system on chip (MPSoC) design paradigm, especially in the big data era. With respect to the state of the art, crossbar-based interconnection methodologies are still efficient for FPGA-based small-scale heterogeneous MPSoCs. This paper proposes a crossbar-based on-chip interconnection scheme named CRAIS. CRAIS uses reconfigurable crossbar interconnections between microprocessors and intellectual property (IP) cores in an MPSoC. The hardware interconnection can be dynamically reconfigured during execution.

Empirical results on an FPGA prototype demonstrate that CRAIS can achieve more than 7X speedup compared with the state-of-the-art StarNet approach, while using only 21%∼35% of the hardware resources of StarNet.

Keywords interconnect, big data, crossbar, multiprocessor system on chip

1 Introduction

With the wide application of cloud computing, mobile Internet and networking, social and enterprise big data pervades our daily life. Moreover, vast amounts of data are being generated at an explosive speed, and the growth rate of global data is unprecedented[1]. This makes our lives more convenient, but at the same time poses significant challenges to computer researchers. Novel big data applications are generally considered to have the following four characteristics.

Volume. Data volume has grown from TB to PB and even to ZB, which calls for special care in data mining and processing techniques.

Variety. More and more data is semi-structured or unstructured, such as social media, web pages, images, and videos.

Velocity. Not only the existing volume but also the generation speed of data is tremendous, which requires high-speed data transfer.

Real-Time Requirements and Low Value Density. The efficiency of data mining should be maintained.

The above characteristics will inevitably bring new challenges. Massive data makes data security increasingly difficult. Worse, current capabilities for processing and analyzing large data are far below the ideal level: big data requires high-speed information transmission together with real-time analysis and processing. Furthermore, low value density makes data mining more difficult. Collecting and analyzing big data is quite time-consuming, which poses a significant challenge for real-time processing techniques. To extract useful information from the sea of data, we have to collect all the data that may be potentially useful. However, it takes a long time to transfer massive amounts of data. Therefore, accelerating big data applications is becoming more and more important.

Regular Paper

Special Section on Computer Architecture and Systems for Big Data

This work was supported by the National Natural Science Foundation of China under Grant Nos. 61379040, 61272131, 61202053, 61222204, and 61221062, the Natural Science Foundation of Jiangsu Province of China under Grant No. SBK201240198, the Fundamental Research Funds for the Central Universities of China under Grant No. WK0110000034, the Open Project of State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences (CAS) under Grant No. CARCH201407, and the Strategic Priority Research Program of CAS under Grant No. XDA06010403.

∗ Corresponding Author

©2015 Springer Science+Business Media, LLC & Science Press, China

Meanwhile, over the last few years, heterogeneous computing modes based on Field Programmable Gate Arrays (FPGA)[2], Coarse Grained Reconfigurable Architecture (CGRA)[3] and Graphic Processing Unit (GPU) have been regarded as efficient acceleration platforms for data-intensive computing domains, such as machine learning[4-5] and genome sequencing[6]. As an alternative to big data computation using conventional symmetric multiprocessors, heterogeneous computing can efficiently exploit the benefits of a wide range of computing resources. Among these approaches, custom and reconfigurable accelerators using FPGAs can deliver much higher computational performance with lower power consumption, making them a credible choice for changing and diverse big data applications. As a result, very high throughput and energy efficiency can potentially be achieved using reconfigurable architectures on FPGA devices, such as RAMP[7] and MOLEN[8]. This is particularly true for most dataflow processing and stream applications, where data-level parallelism is regarded as the major evaluation metric. On the other end of the spectrum, reconfigurable parallel computing techniques enable fast prototyping of dataflow applications, ranging from a single case study to a series of similar applications. It is natural to regard each microprocessor, digital signal processor or Intellectual Property (IP) accelerator as a dedicated function unit; hence the multiprocessor system on chip (MPSoC) solution can effectively support researchers in high-level abstraction and platform-based system design. As such, there is tremendous potential for applying reconfigurable computing techniques to dataflow-related fields, including biomedical systems, social networks, machine learning, and deep belief networks.