• Fall'19: Our research group meets weekly on Wed 10:30am in KWII 2-205.
  • If you are interested in joining our research group, please contact Dr. Butt.


[MASCOTS'15] Ali is serving as the General Chair for MASCOTS'15. Consider submitting a paper.

[Funding] DSSL and Chao Wang receive an NSF award for developing online prediction based resource management

[Funding] SCAPE and DSSL receive an NSF award for exploiting slowdowns for speedup in power-scalable systems

[Cluster'14] Paper on supporting Hadoop applications on microservers

[MASCOTS'14] Paper on Hadoop workflow scheduling

[HPDC'14] Paper on performance tuning for MapReduce

[CCGrid'14] Paper on managing heterogeneous storage in Hadoop

[COE Fellowship'13] Ali is named a VT COE Faculty Fellow

[JPDC'13] Paper on reducing energy-management delays in disks

[TPDS'12] Paper on timely input data staging for HPC centers

[MASCOTS'12] Papers on cooperative deduplication for data centers, and GPU-based RAID

[HPDC'12] Paper on resource management for MapReduce in the cloud

[CRA-CMW'12] Ali is a speaker at the CRA Career Mentoring Workshop (Presentation slide)

[IBM Fellowship'12] Min Li is a recipient of a 2012 IBM PhD Fellowship. Congratulations!!

[MBEC'12] Paper on real-time detection of biological molecules using GPUs

[IPDPS'12] Paper on using Hadoop for subgraph analysis in massive graphs

[ACM] Ali is now an ACM Senior Member

[Cluster'11] Paper on QoS-aware scheduling for heterogeneous clusters

[MASCOTS'11] Paper on synthesizing traces for Hadoop simulations

[Funding] Ali receives a NetApp Faculty Fellowship

[ERSS'11] Ali is organizing the Workshop on Energy Consumption and Reliability of Storage Systems (ERSS)

[TPDS] Paper on HPC data offloading

[IGCC'11] Paper on power management of heterogeneous clusters

[ICDCS'11] Paper on multi-tiered data staging for HPC

[IPDPS'11] Paper on cloud-based HPC data transfer service

[JSS] Paper on reusable software components for hybrid clusters

[JPDC] Paper on capability-aware framework for using accelerators in data-intensive computing

[Funding] DSSL receives an NSF Data Intensive Computing award for investigating the interactions between energy savings and reliability improvements in disks

[Funding] DSSL receives an NSF CSR award for applying the Cloud Computing model to HPC, especially for clusters comprising CELL/GPUs etc.

[SC'10] Paper on end-to-end app performance on multi-cores

[NAE US FOE'10] Ali is organizing a session on Cloud Computing

[IGCC'10] Paper on disk energy latency

[IEEE] Ali elevated to IEEE Senior Member

[CCGRID'10] Paper on accelerator-based clusters

[IPDPS'10] Paper on just-in-time data staging

[Funding] Ali receives an IBM SUR Award

[MASCOTS'09] Guanying wins Best Paper Award

[NAE US FOE'09] Ali selected to participate as one of nation's promising young engineers

[MASCOTS'09] Paper on Hadoop Simulation, Poster on disk energy management

[ICS'09] Paper on HPC Scratch Cache

[OSR'09, IPDPS'09] Paper on MapReduce for Cell

[HotPower'08] Paper on impact of disk scrubbing on energy savings

[Funding] Ali receives an IBM Faculty Award

[SC'08 PDS-Workshop] Paper on data staging for supercomputing jobs

[ICS'08] Paper on data offloading in HPC centers

[Funding] Ali receives an NSF CAREER award

[CF'08] Paper on I/O techniques for CELL/BE

[IPDPS'08] Paper on using contributory storage in grids

[FAST'08] Posters on CELL I/O and ReplayCache

[SC'07 PDS-Workshop] Paper on Data management for HPC centers

[HiPC'07] Poster on supporting Disconnected Operation in Grid environments

[HPDC'07] Short paper on PeerStripe: P2P-based large-file storage

[August 2006] DSSL is born!


Our research is supported by grants from the National Science Foundation, DOE's Oak Ridge National Lab., IBM, NetApp, and Virginia Tech.

Current Projects

Please see the Publications page for a more up-to-date listing of our research. The following tends to lag over time...
  • AMOCA: Capability-aware programming model for asymmetric many-core processors for data-intensive workloads
  • HPC Data Management: Developing innovative data staging and offloading techniques for meeting Center-User Service Level Agreements
  • ERP: Energy-Reliability trade-offs in disks
  • MRPerf: Realistic simulator for designing MapReduce clusters
  • I/O for Heterogeneous Multiprocessors: Exploring efficient I/O techniques for the Cell/BE processor
  • ReplayCache: Using history-based cache management for improving I/O performance for repetitive application runs
  • SARD: Techniques for servicing I/O requests from peer node memory
  • Disconnected Operations in Grids: Integrating intermittent mobile devices in Grids via support for disconnected operations
  • PeerStripe: Storing large data on contributory resources
  • FlexiCache: Modular Linux buffer cache design to support customized replacement policies

Old Projects

  • The effect of kernel prefetching on file system buffer cache:

    Designing effective block replacement algorithms to minimize file system buffer cache misses is a challenging task. Despite the well-known interactions between prefetching and caching, almost all buffer cache replacement algorithms have been proposed and studied comparatively without taking into account file system prefetching, which exists in all modern operating systems. In this project we studied the effect of such kernel prefetching and showed (SIGMETRICS'05) that it can significantly affect the relative performance, measured in the number of actual disk I/Os, of many well-known replacement algorithms: it can not only narrow the performance gap but also change the relative performance benefits of different algorithms. The goal of the project is to demonstrate the importance for buffer caching research to take file system prefetching into consideration.
    More information on this project is available at the AccuSim webpage.
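
    As a rough illustration of why prefetching changes the disk I/O count that a replacement policy is judged by, the sketch below replays a block trace through an LRU cache with an optional sequential readahead. It is not the AccuSim model: the function name `count_disk_ios`, the `readahead` parameter, and the assumption that a miss plus its readahead blocks cost one disk I/O are all hypothetical simplifications.

```python
from collections import OrderedDict


def count_disk_ios(trace, cache_size, readahead=0):
    """Count disk I/Os for an LRU cache replaying a list of block numbers.

    Assumption (hypothetical model): on a miss, the requested block and the
    next `readahead` blocks are fetched together and charged as one disk I/O.
    """
    cache = OrderedDict()  # keys are block numbers, ordered from LRU to MRU
    disk_ios = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)  # hit: mark as most recently used
            continue
        disk_ios += 1  # miss: one I/O covers the block and its readahead
        for b in range(block, block + readahead + 1):
            cache[b] = True
            cache.move_to_end(b)
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used
    return disk_ios
```

    Under these assumptions, a purely sequential trace of 8 blocks with a 4-block cache needs 8 disk I/Os without readahead and 2 with `readahead=3`, even though the replacement policy is the same in both runs.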

  • Predicting program behavior using the program counter:

    Borrowing from the ideas in computer architecture research, it was determined that the program counter can also serve as an indication of program behavior for the operating system kernel. This insight was leveraged in two applications, power management of hard disks (IEEE TC'06) and file system buffer cache management (OSDI'04), with promising results. The approach is implemented in the Linux kernel.
    More information about the project can be found at the PCOS webpage.

  • Peer-to-peer resource management:

    Modern resource sharing systems comprise thousands of resources, and peer-to-peer (p2p) approaches can be used to provide resource self-organization in the presence of failures. We designed two projects based on this concept. First, we used p2p mechanisms to manage Condor pools (SC'03, JPDC 66:1). Condor is a distributed system that allows sharing of resources within an administrative domain. We developed an automatic collaboration framework that allowed remote pools to discover each other, and therefore enabled resource sharing across administrative domains. Second, we applied the p2p approach to harness idle disk space on nodes within academic and corporate setups (SC'04, JoGC'06). We developed a distributed file system by extending the Network File System (NFS), which allows sharing of idle disk space in a transparent manner.
    More information about the project can be found here.

  • Ensuring fairness in resource sharing:

    We observed that in resource sharing systems some users tend to consume resources without contributing any to the system. This creates an imbalance and results in the eventual collapse of the system. We developed a DHT-based accountability and feedback mechanism that allows users in the system to determine the "credit-worthiness" of other users (VM'04, PPoPP'05, SC'05, JoGC'06). This information can then be used to decide whether or not to allow exchange of resources with a particular user. The project solves the practical problem of fairness in sharing by providing a distributed accountability mechanism. Greedy users can be quickly identified and isolated, resulting in a robust system.
    More information about the project can be found at the GridCop webpage.

  • Query caching in peer-to-peer networks:

    We observed that p2p query traffic exhibits temporal locality and can benefit from caching. In the first part of this project, a query caching proxy was installed at the boundary of an organization, and queries originating from inside the organization were cached. We refer to this as forward-caching. Next, we cached the queries that originate outside the organization and are forwarded inside (WCW9). We refer to this as reverse-caching. We found that if the cache capacity is fixed in terms of the number of cached query replies, forward and reverse query caching are equivalent in both hit ratio and bandwidth savings. The project provided insight into caching for reducing p2p traffic (which is now the most prevalent traffic on the Internet) and improving bandwidth utilization.
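
    The idea of a proxy cache whose capacity is fixed as a number of cached query replies can be sketched as follows. This is a minimal illustration, not the proxy used in the project: the class `QueryCache`, the LRU eviction policy, and the `fetch` callback that stands in for forwarding a query are assumptions made for the example.

```python
from collections import OrderedDict


class QueryCache:
    """Hypothetical proxy cache holding at most `capacity` query replies."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # query -> reply, ordered LRU to MRU
        self.hits = 0
        self.lookups = 0

    def lookup(self, query, fetch):
        """Return the cached reply, or call fetch(query) and cache the result."""
        self.lookups += 1
        if query in self.entries:
            self.entries.move_to_end(query)  # refresh recency on a hit
            self.hits += 1
            return self.entries[query]
        reply = fetch(query)  # miss: the query is forwarded past the proxy
        self.entries[query] = reply
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
        return reply

    def hit_ratio(self):
        return self.hits / self.lookups if self.lookups else 0.0


def run_trace(trace, capacity):
    """Replay a list of query strings and return the resulting hit ratio."""
    cache = QueryCache(capacity)
    for query in trace:
        cache.lookup(query, lambda q: f"reply:{q}")
    return cache.hit_ratio()
```

    Replaying the same cache over a trace of outgoing queries and over a trace of incoming queries gives one simple way to compare forward and reverse caching at equal capacity; temporal locality in the trace shows up directly as a higher hit ratio.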

  • Designing computational grid portals:

    Computational grids provide computing power by sharing resources across administrative domains. This sharing, coupled with the need to execute untrusted code from arbitrary users, introduces security hazards. Grid environments are built on top of platforms that control access to resources within a single administrative domain, at the granularity of a user. In wide-area multi-domain grid environments, the overhead of maintaining user accounts is prohibitive, and securing access to resources via user accountability is impractical. Typically, these issues are handled by implementing checks that guarantee the safety of applications, so that they can run in shared user accounts. We showed (JPDC 63:10, IPDPS'02) that the safety checks -- language-based, compile-time, link-time or load-time -- currently implemented in most grid environments either are inadequate or limit the allowed grid users and applications. We also surveyed various grid systems to highlight the problems and limitations of current grid environments, and proposed a runtime process monitoring technique. The approach allows setting up an execution environment that supports the full legitimate use allowed by the security policy of a shared resource.