As a charitable service-based nonprofit organization (NPO) coordinating individuals, businesses, academia and governments with interests in High Technology, Big Data and Cybersecurity, we bridge the global digital divide by providing supercomputing access, applied research, training, tools and other digital incentives “to empower the underserved and disadvantaged.”

Abstracts and Bios

Raanan Dagan  - Cloudera

BIO: Raanan is a System Engineer at Cloudera and a certified Hadoop developer and administrator. He has been in the software industry for over 20 years, and his main focus for the past 10 years has been distributed systems, including the Cloudera Distribution of Hadoop, Coherence Distributed Cache, JRockit, TopLink O-R Mapping, Complex Event Processing, WebLogic Application Server, and Tuxedo.

ABSTRACT: Hadoop for High Performance Computing

Technical discussion of the value of the Hadoop ecosystem for Big Data analytics. The session will cover the technical elements of HDFS, MapReduce, HBase, Hive, Pig, Sqoop, Flume, and Mahout (machine learning).  Apache Hadoop is a powerful open source technology that addresses the economic, flexibility, and scalability issues surrounding massive amounts of enterprise data and enables actionable insights to be derived from structured and unstructured data sets. Hadoop, which forms the infrastructure foundation of many of the world’s leading social media companies, including Facebook, LinkedIn, and Twitter, has rapidly become a leading solution to the new challenges generated by Big Data.
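The MapReduce model at the heart of Hadoop can be illustrated with a minimal word-count sketch. This is plain Python for conceptual clarity only (real Hadoop jobs are typically written in Java against the Hadoop API); it shows the map, shuffle, and reduce phases that the framework runs in parallel across a cluster.

```python
from collections import defaultdict

def map_phase(document):
    # Emit a (word, 1) pair for every word in an input split.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Sum the counts emitted for each word.
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["Hadoop stores data in HDFS",
        "MapReduce processes data in parallel"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts["data"])  # "data" appears once in each document
```

In Hadoop proper, the map and reduce functions run on different nodes against HDFS blocks, and the shuffle is handled by the framework; the logic above is the same in miniature.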

Steve Hebert  - Nimbix

BIO:  Mr. Hebert is founder and CEO of Nimbix, LLC, an HPC-as-a-Service and GPU hosting company. Prior to Nimbix, Mr. Hebert was at Altera Corporation, where he worked with HP to enable FPGA-based hardware for accelerated computing. Prior to Altera, he held management and consulting roles in the IT industry and notably served as CEO of a digital media company that deployed cloud-based streaming video for corporate communications. He began his career at Texas Instruments’ semiconductor group in Sales & Marketing. Mr. Hebert earned a Bachelor of Science in Electrical Engineering from Santa Clara University.

ABSTRACT: Alternative Deployment Models for Cloud Computing in HPC Applications

With the dramatic growth in the adoption of cloud computing services in the IT industry, it is no surprise that cloud infrastructures have become attractive to high performance computing professionals. Confronted with ever-increasing workloads and bigger computational challenges, the idea of a vast pool of “utility” machines being available to help with computation is intriguing. The challenge for many organizations that have spent years optimizing software and hardware is to answer the question of how best to leverage the cloud to take advantage of potential benefits. This presentation explores three deployment models in cloud computing for high performance applications as well as associated challenges in provisioning, workload management, data movement, and ease of use.

Eric Jones  - Enthought

BIO:  Eric has a broad background in engineering and software development and leads product engineering and software design at Enthought.  Prior to co-founding Enthought, Eric worked in the fields of numerical electromagnetics and genetic optimization in the Department of Electrical Engineering at Duke University.  His research led him to become one of the initial co-authors of SciPy.  He has taught numerous courses about Python and how to use it for scientific computing.  He also serves as a member of the Python Software Foundation.

ABSTRACT: Python in Scientific Computing

Eric will provide a whirlwind tour of the possibilities of Python for scientific computing and demo some industry examples.  You will get an overview of the numeric goodness found in NumPy and SciPy and learn about 2D and 3D visualization tools for quickly displaying results.  The examples will span from simple, useful analysis techniques to full blown applications with rich and elegant user interfaces. Eric will cover examples of how to develop rapidly with Python to build user interfaces as well as integrate Fortran/C/C++/GPU code to combine legacy and HPC code.
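As a small taste of the numeric capabilities the talk covers, the sketch below shows the flavor of NumPy/SciPy code (a generic illustration, not one of Eric's demos): arrays are evaluated in vectorized form without explicit Python loops, and SciPy supplies numerical routines such as quadrature.

```python
import numpy as np
from scipy import integrate

# Vectorized evaluation: sin() is applied to the whole array at once,
# with no explicit Python loop.
x = np.linspace(0.0, np.pi, 1001)
y = np.sin(x)

# Numerically integrate sin(x) over [0, pi]; the exact answer is 2.
area = integrate.trapezoid(y, x)
print(round(area, 4))  # prints 2.0
```

The same array-oriented style scales from one-liners like this up to the full applications with rich user interfaces mentioned in the abstract.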

Charles Lively  - Texas A&M

BIO:  Charles Lively III is a PhD Candidate in the Department of Computer Science and Engineering working with Valerie E. Taylor.  He received his B.S.E. in Computer Engineering from Mercer University and M.S. in Computer Engineering from Texas A&M University.  His research interests include High Performance Computing with special interest in the analysis and modeling of scientific applications.

ABSTRACT: Energy and Performance Characteristics of Different Parallel Implementations of Scientific Applications on Multicore Systems

Energy consumption is a major concern with high performance multicore systems. In this talk, we explore the energy consumption and performance (execution time) characteristics of different parallel implementations of scientific applications. In particular, the experiments focus on message-passing interface (MPI)-only versus hybrid MPI/OpenMP implementations of the hybrid NAS (NASA Advanced Supercomputing) BT (Block Tridiagonal) benchmark (strong scaling), a Lattice Boltzmann application (strong scaling), and the Gyrokinetic Toroidal Code, GTC (weak scaling), as well as CPU frequency scaling. Experiments were conducted on a system instrumented to obtain power information; this system consists of 8 nodes with 4 cores per node. The results indicate that on 16 or fewer cores, whether the MPI-only or the hybrid implementation performs best depends on the application. On 32 cores, the hybrid implementation consistently resulted in less execution time and energy. With CPU frequency scaling, the best case for energy saving was not the best case for execution time.

Dr. Joshua Mora  - AMD

BIO: Joshua Mora is a Senior Member of Technical Staff at AMD. He holds a PhD in computational fluid dynamics and large-scale solvers for state-of-the-art high performance computing systems. His work spans research, development, and consultancy on hardware and software for High Performance Computing solutions in the private sector (oil and gas, manufacturing, Formula 1, cloud) and the public sector (research and academic institutions). He is a member of the SPEC High Performance Group, the HPC Advisory Council, and the HPC Society. His current work covers programming models and algorithms for exascale, and HPC support of AMD customers worldwide.

ABSTRACT: Best practices for programming with OpenMP on NUMA systems

NUMA systems challenge the multithreaded software developer to allocate data in such a way that local memory accesses are maximized and remote accesses minimized. Fundamental concepts on the penalties of remote versus local memory accesses will be presented, along with experimental data collected on AMD-based systems. A set of examples will then be provided as a guideline for improving OpenMP applications with heavy memory access requirements (both latency-sensitive and bandwidth-sensitive applications). Runtime setup is another important component of properly exploiting NUMA systems with OpenMP applications, so it is also covered within these best practices.

Dr. Theodore Omtzigt  - Stillwater

BIO:  Dr. Omtzigt has been part of the most exciting new product development efforts in the PC industry in the past twenty years. He started his professional career as a computer architect at Intel with the first PCI chip set architecture team defining the performance characteristics of Intel's Pentium platform components. After successfully developing three product families and Intel's platform performance evaluation methodology, he moved in 1994 to the Micro Processors Group and was instrumental in setting up the capability to evaluate Pentium processor performance under 16- and 32-bit Microsoft Windows platforms. In his role as Pentium processor architect, Dr. Omtzigt was involved in the enhancements of the bus and cache architecture of the Pentium, Pentium II and Pentium III processors. When Intel engaged with Lockheed Martin to commercialize its 3D graphics technology, Dr. Omtzigt became part of the Accelerated Graphics Port (AGP) definition team. As part of this team, he led the performance modeling of the AGP protocol and attached graphics controllers. During this period Dr. Omtzigt invented Intel's Observation Architecture (OA), a real-time integrated diagnostic processor. He also created the Platform Performance Engineering team to develop the software infrastructure for OA to help developers obtain platform performance information useful for game software optimizations.

Dr. Omtzigt has contributed to product development efforts at two other companies, 3Dfx Interactive and NVIDIA Corporation. At 3Dfx, he developed a new methodology that accurately predicts the performance of new product features on complex workloads commonly found in the graphics industry. This was instrumental in allowing 3Dfx to optimize its engineering resources to focus on the right chip components. At NVIDIA, Dr. Omtzigt applied his product development skills to product verification, the back-end of the development chain. In this context, Dr. Omtzigt developed a round-trip engineering environment that allows the conceptual product specification to be verified against the emerging chip implementation. In all of these venues, Dr. Omtzigt's contributions made the engineering organization more efficient and capable of tackling more complex problems, considered the hallmark of successful new product development in the high-technology market place.

In 2005, Dr. Omtzigt founded Stillwater Supercomputing, Inc. to develop and commercialize technology targeted at the computational science and engineering communities. The goal of Stillwater is to revolutionize Computer Aided Design and Engineering through the use of personal supercomputers and the round-trip engineering methodologies he pioneered at Intel and NVIDIA.

Dr. Omtzigt received his Ph.D. in Electrical Engineering from Yale University in 1992 through original research in advanced supercomputing architectures. He earned his Master's cum laude in Electrical Engineering at Delft University of Technology in 1987. He is a lifelong member of the IEEE, and served as an IEEE Student Chapter president at Delft University of Technology, in the Netherlands. He lives with his wife, Terrie St. Clair, a professional nature photographer, in Freeport, Maine, and El Dorado Hills, California.

ABSTRACT: OpenKL a high-level linear algebra framework for portable geoscience

Seismic imaging has become more accurate with the development of more sophisticated velocity models and the improved affordability of low-cost high-precision accelerometers. During a seismic survey, human operator error and environmental interference with the sensors are the biggest threats to accurate data capture. These errors can carry significant economic cost, measured in hundreds of thousands of dollars, and they are not caught until the data set is analyzed in the back office. To create accurate reservoir models, large-scale supercomputers are used, adding to the opportunity cost of bad survey data. With today's personal computing technologies delivering 8-core, 8 GB laptops, even large seismic surveys can easily be captured and managed on a $1000 laptop. This creates the opportunity to build seismic surveys that produce real-time responses by carrying relatively small form factor computing infrastructure on the thumpers or survey vessels, thus providing immediate feedback on the quality of survey data collection. However, this proliferation of computing platforms, from laptops to multicore desktops, GPU-accelerated desktops, workgroup clusters, and cloud-based supercomputers, complicates the delivery of high-performance state-of-the-art geophysics and geochemistry applications. OpenKL is an effort to solve this problem through a high-level linear algebra framework that abstracts away the details of the underlying hardware. This decouples the geoscience application from the underlying hardware in the same way that OpenGL decoupled a graphics application from the graphics accelerator, enabling portable geoscience applications that run without modification on the full spectrum of computing hardware available to the survey crew. This increases the productivity of the overall effort, as the geophysicist can concentrate on reconstruction algorithms instead of having to select the best SSE or CUDA instructions.
We will discuss the current state of OpenKL, including a demonstration of a high-level algorithm that executes on a multicore CPU, a GPU, and a distributed-memory cluster. We'll conclude with the OpenKL roadmap and a call to the community to help deliver portable, high-performance geoscience applications.

Xingfu Wu  - Texas A&M

BIO: Dr. Xingfu Wu is a TEES Research Scientist at Texas A&M University. He is a senior ACM member and an IEEE member. His research interests are performance evaluation and modeling, parallel and cloud computing, and power and energy analysis in HPC systems. He has served as session chair and PC member for several international conferences, and was a guest editor of the IEEE Distributed Systems Online special issue on Data-intensive Computing (Vol. 5, Issue 1, 2004). His monograph, Performance Evaluation, Prediction and Visualization of Parallel Systems, was published by Kluwer Academic Publishers (ISBN 0-7923-8462-8) in 1999.  He won the best paper award at the 14th IEEE International Conference on Computational Science and Engineering.

ABSTRACT: Parallel Finite Element Earthquake Rupture Simulations on Large-scale Multicore Supercomputers

In this talk, we use the Ms 8.0 Wenchuan earthquake, which occurred in Wenchuan County, Sichuan Province, China on May 12, 2008, as an example to present our earthquake rupture simulations. We integrate a 3D mesh generator into the simulation, use MPI to parallelize the mesh generator, and illustrate an element-based partitioning scheme for explicit finite element methods. Based on this partitioning scheme and on what we learned from our previous work, we implement our hybrid MPI/OpenMP finite element earthquake simulation code, not only to achieve multiple levels of parallelism in the code but also to reduce MPI communication overhead within a multicore node by taking advantage of the shared address space and the on-chip high inter-core bandwidth and low inter-core latency. We evaluate the hybrid MPI/OpenMP finite element earthquake rupture simulations on quad- and hex-core Cray XT4/5 systems at Oak Ridge National Laboratory using the Southern California Earthquake Center (SCEC) benchmark TPV210. Our experimental results indicate that the parallel finite element earthquake rupture simulation produces accurate results and has good scalability on these Cray XT systems. At the end of the talk, we will discuss Prophesy and its extension.

Krishna Sankhavaram  - University of Texas M.D. Anderson Cancer Center

BIO: Krishna Sankhavaram brings 20 years of experience building software solutions and delivering IT infrastructure services to support basic and translational research in cancer research environments. He has a broad background in many technologies, with solution architecture skills blended with extensive experience in project management. He spent more than a decade at St. Jude Children’s Research Hospital, where he helped build the IT infrastructure for research at the Hartwell Center for Bioinformatics and Biotechnology. He is the Director for Research Information Systems & Technology Development at MD Anderson Cancer Center, where he helped build a large research computing cluster with a rapidly growing storage environment. He leads the development of an SOA-based integrated software framework for researchers that bridges clinical and research areas.

ABSTRACT: IT Challenges in supporting Personalized Cancer Medicine

An introduction to and discussion of the IT challenges we face in supporting personalized cancer medicine. The talk will cover the data tsunami we have to deal with, the role of cluster computing, and its impact, with substantial technical detail on the storage and computing resources available at MD Anderson for research. I will also show the software framework that researchers use to work in this environment.


Micah Staggs  - Brocade

BIO:  Micah has worked with computer networks for 15 years and earned CCIE certification in 2001. He designed networks at Enron from 2000 to 2003, then moved to AIM Investments in 2003. He has been a network engineer at Foundry/Brocade since 2007, specializing in Ethernet Layer 2 and 3 and in designing for performance and throughput.

ABSTRACT: Intelligent Ethernet Fabrics in HPC


Doni Branch  - Intel

BIO:  Doni is an HPC Solutions Architect at Intel.

ABSTRACT: The Missing Middle