Parallel architectures are increasingly deployed in computing systems. In this paper we study how a variety of programming techniques and technologies that target these architectures affect the performance of example algorithms. We demonstrate that algorithms that are highly parallel in nature gain significant performance through proper application of both parallel computing methodologies and parallel hardware. Using Java and C++ environments with statistical and mathematical problem sets, we show that proper use of parallel computing paradigms can reduce the execution time of algorithms by up to 99.997%.
To meet the ever-increasing computational demands of modern data-intensive applications, hardware manufacturers have sought to provide increasingly advanced computational architectures. Classical processing-unit designs focused on single-core processors driven by high clock speeds. Modern architectures instead offer a more powerful and energy-efficient solution: multi-core processors that provide multiple lanes of computation and are driven by moderate clock speeds. Recent advances have also revealed the benefit of using the highly parallel architecture of graphics processing units to give applications a basic processing pipeline capable of very high computational throughput.
Operating systems provide mechanisms such as threading architectures so that applications can access the multiple lanes of computation concurrently. Similarly, manufacturers have begun to provide APIs for accessing the parallel computational architecture of graphics processors.1 However, while some compilers may apply basic optimizations at compile time to take advantage of concurrent architectures, these solutions usually fall far short of fully utilizing the available resources.2 To fully utilize the computational throughput of available resources, software developers must design their applications with concurrency as a core element.
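As a minimal illustration of what designing for concurrency can look like in Java, the sketch below splits an array sum into contiguous chunks and hands each chunk to a worker thread from a fixed-size pool. It is not taken from the experiments in this paper; the class name `ParallelSumSketch`, the method names, and the chunking scheme are assumptions chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class; not part of the paper's benchmark code.
public class ParallelSumSketch {

    // Baseline: a single thread walks the whole array.
    public static long sequentialSum(int[] data) {
        long total = 0;
        for (int v : data) {
            total += v;
        }
        return total;
    }

    // Concurrent version: each worker sums one contiguous chunk,
    // then the calling thread combines the partial results.
    // Assumes workers >= 1.
    public static long parallelSum(int[] data, int workers)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Ceiling division so every element falls in some chunk.
            int chunk = (data.length + workers - 1) / workers;
            List<Future<Long>> parts = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final int start = Math.min(data.length, w * chunk);
                final int end = Math.min(data.length, start + chunk);
                parts.add(pool.submit(() -> {
                    long partial = 0;
                    for (int i = start; i < end; i++) {
                        partial += data[i];
                    }
                    return partial;
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // blocks until that chunk is done
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] data = new int[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i % 10;
        }
        int workers = Runtime.getRuntime().availableProcessors();
        // Both versions should agree on the result.
        System.out.println(sequentialSum(data) == parallelSum(data, workers));
    }
}
```

The workers share no mutable state: each one writes only to its own local accumulator, so no locking is needed. Whether the concurrent version is actually faster depends on the input size and the cost of creating and scheduling threads, which this sketch does not measure.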
In this paper we trace the evolution of several algorithms as both language frameworks and concurrency techniques are applied to them, in order to highlight the inefficiencies of classical single-pipeline development and to demonstrate the performance gains that applications achieve by fully utilizing parallel architectures.