In particular, for the case when a web server is serving static files, we examine the costs and benefits of a policy that gives preferential service to short connections. We start by assessing the scheduling behavior of a commonly used server (Apache running on Linux) with respect to connection size and show that it does not appear to provide preferential service to short connections. We then examine the potential performance improvements of a policy that does favor short connections (shortest-connection-first). We show that mean response time can be improved by factors of four or five under shortest-connection-first, as compared to an (Apache-like) size-independent policy. Finally, we assess the costs of shortest-connection-first scheduling in terms of unfairness (i.e., the degree to which long connections suffer). We show that under shortest-connection-first scheduling, long connections pay very little penalty. This surprising result can be understood as a consequence of heavy-tailed Web server workloads, in which most connections are small, but most server load is due to the few large connections.
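As a rough illustration of why favoring short connections helps under heavy-tailed workloads, the following sketch (a toy model, not the paper's experiment) generates Pareto-distributed job sizes and compares mean response time for a batch served shortest-first versus in arrival order:

```python
import random

def mean_response_time(sizes, policy):
    """Serve a batch of jobs to completion in the given order; a job's
    response time is its completion time (all jobs arrive at t=0)."""
    order = sorted(sizes) if policy == "shortest-first" else list(sizes)
    t, total = 0.0, 0.0
    for s in order:
        t += s
        total += t
    return total / len(sizes)

random.seed(1)
# Pareto-distributed sizes: most jobs are small, but most of the total
# load comes from a few very large jobs.
sizes = [random.paretovariate(1.1) for _ in range(1000)]

fifo = mean_response_time(sizes, "fifo")
scf = mean_response_time(sizes, "shortest-first")
```

With a heavy-tailed size distribution, the shortest-first mean is far lower, while each large job is delayed only by the (small) total work of the short jobs ahead of it.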
Compiler techniques have been used to improve instruction cache performance by mapping code with temporal locality to different cache blocks in the virtual address space, eliminating cache conflicts. These code placement techniques can be applied directly to the problem of placing data for improved data cache performance. In this paper we present a general framework for Cache-Conscious Data Placement. This is a compiler-directed approach that creates an address placement for the stack (local variables), global variables, heap objects, and constants in order to reduce data cache misses. The placement of data objects is guided by a temporal relationship graph between objects generated via profiling. Our results show that profile-driven data placement significantly reduces the data miss rate, by 24% on average. Harchol-Balter, "Connection Scheduling in Web Servers", Proceedings of the USENIX Conference on Internet Technologies, October 1999. Under high loads, a Web server may be servicing many hundreds of connections concurrently. In traditional Web servers, the question of the order in which concurrent connections are serviced has been left to the operating system. In this paper we ask whether servers might provide better service by using non-traditional service ordering.
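The temporal-relationship-graph idea can be sketched as follows; the trace format, window size, and greedy placement heuristic here are illustrative assumptions, not the paper's actual algorithm:

```python
from collections import defaultdict

def temporal_relationship_graph(trace, window=4):
    """Edge weight = how often two objects are accessed close together
    in the profiled access trace."""
    trg = defaultdict(int)
    for i, a in enumerate(trace):
        for b in trace[i + 1 : i + window]:
            if a != b:
                trg[frozenset((a, b))] += 1
    return trg

def place(objects, trg, num_sets):
    """Greedily assign each object a cache set, avoiding the sets used by
    its strongest temporal neighbours (a toy stand-in for CCDP)."""
    placement = {}
    for obj in objects:
        penalty = [0] * num_sets
        for edge, w in trg.items():
            if obj in edge:
                other = next(o for o in edge if o != obj)
                if other in placement:
                    penalty[placement[other]] += w
        placement[obj] = penalty.index(min(penalty))
    return placement
```

Two objects that are touched in alternation end up in different cache sets, so they no longer evict each other.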
We applied the technique to the SPEC95fp benchmark suite, a representative set of numeric programs. We used the SimOS machine simulator to analyze the applications and isolate their performance bottlenecks. We also validated these results on a real machine, an eight-processor 350 MHz Digital AlphaServer. Compiler-directed page coloring leads to significant performance improvements for several applications. Overall, our technique improves the SPEC95fp rating for eight processors by 8% over Digital UNIX's page mapping policy and by 20% over page coloring, a standard page mapping policy. The SUIF compiler achieves a SPEC95fp ratio of .4, the highest ratio to date. Austin, "Cache-Conscious Data Placement", Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems, October 1998. As the gap between memory and processor speeds continues to widen, cache efficiency is an increasingly important component of processor performance.
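Page coloring itself is simple to state: a physical page's "color" is the slice of a physically-indexed cache that it maps to, and the OS (here, hypothetically guided by compiler hints) tries to hand out frames whose colors avoid conflicts. A minimal sketch, with invented parameter names:

```python
def page_color(pfn, cache_size, page_size, associativity):
    """Color of physical frame `pfn`: which slice of the cache its
    contents land in. Same-colored pages can conflict in the cache."""
    num_colors = cache_size // (page_size * associativity)
    return pfn % num_colors

def pick_frame(free_frames, wanted_color, num_colors):
    """Prefer a free frame of the requested color; fall back to any
    free frame if none of that color is available."""
    for f in free_frames:
        if f % num_colors == wanted_color:
            free_frames.remove(f)
            return f
    return free_frames.pop(0)
```

The compiler's contribution in the paper is deciding which color each parallelized data structure should request so that structures accessed together by one processor do not collide.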
This paper motivates and describes the three key techniques employed by Vegas, and presents the results of a comprehensive experimental performance study, using both simulations and measurements on the Internet, of the Vegas and Reno implementations of TCP. Lam, "Compiler-Directed Page Coloring for Multiprocessors", Proceedings of the Seventh International Symposium on Architectural Support for Programming Languages and Operating Systems, 1998. This paper presents a new technique, compiler-directed page coloring, that eliminates conflict misses in multiprocessor applications. It enables applications to make better use of the increased aggregate cache size available in a multiprocessor. This technique uses the compiler's knowledge of the access patterns of the parallelized applications to direct the operating system's virtual memory page mapping strategy. We demonstrate that this technique can lead to significant performance improvements over two commonly used page mapping strategies for machines with either direct-mapped or two-way set-associative caches. We also show that it is complementary to latency-hiding techniques such as prefetching. We implemented compiler-directed page coloring in the SUIF parallelizing compiler and on two commercial operating systems.
This is the second edition of our book about the Linux kernel. The book has been updated to cover the 2.0 version of the kernel, which is a milestone in the development of Linux. Blumofe, "Hoard: A Fast, Scalable, and Memory-Efficient Allocator for Shared-Memory Multiprocessors", The University of Texas at Austin, Department of Computer Sciences. In this paper, we present Hoard, a memory allocator for shared-memory multiprocessors. We prove that its worst-case memory fragmentation is asymptotically equivalent to that of an optimal uniprocessor allocator. We present experiments that demonstrate its speed and scalability. Peterson, "TCP Vegas: End to End Congestion Avoidance on a Global Internet", IEEE Journal on Selected Areas in Communication, 13(8):1465-1480, October 1995. Vegas is an implementation of TCP that achieves between 37% and 71% better throughput on the Internet, with one-fifth to one-half the losses, as compared to the implementation of TCP in the Reno distribution of BSD UNIX.
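Vegas's central congestion-avoidance rule compares expected and actual throughput and keeps a small, bounded amount of data queued in the network. The sketch below follows that published alpha/beta mechanism, with illustrative constants and simplified units:

```python
def vegas_adjust(cwnd, base_rtt, rtt, alpha=1, beta=3):
    """TCP Vegas congestion avoidance (sketch): diff estimates how many
    'extra' packets this flow has queued in the network; keep it
    between alpha and beta packets."""
    expected = cwnd / base_rtt          # throughput if nothing is queued
    actual = cwnd / rtt                 # throughput actually measured
    diff = (expected - actual) * base_rtt
    if diff < alpha:
        return cwnd + 1                 # network underused: grow window
    if diff > beta:
        return cwnd - 1                 # queue building up: shrink window
    return cwnd                         # in the sweet spot: hold steady
```

Because Vegas backs off when queues start to build, before losses occur, it avoids the loss-driven sawtooth of Reno.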
We propose new UNIX interfaces to improve scalability, and to provide fine-grained scheduling and resource management. Mogul, "Measuring the Capacity of a Web Server", Proceedings of the USENIX Symposium on Internet Technologies and Systems, 1997. The widespread use of the World Wide Web and related applications places interesting performance demands on network servers. The ability to measure the effect of these demands is important for tuning and optimizing the various software components that make up a Web server. To measure these effects, it is necessary to generate realistic HTTP client requests. Unfortunately, accurate generation of such traffic in a testbed of limited scope is not trivial. In particular, the commonly used approach is unable to generate client request rates that exceed the capacity of the server being tested, even for short periods of time.
This paper examines pitfalls that one encounters when measuring Web server capacity using a synthetic workload. We propose and evaluate a new method for Web traffic generation that can generate bursty traffic, with peak loads that exceed the capacity of the server. Finally, we use the proposed method to measure the performance of a Web server. Verworner, Linux Kernel Internals, 2nd ed., Addison Wesley Longman, 1998.
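The pitfall described above, that a conventional closed-loop generator cannot push a server past its capacity, can be shown with a toy model (function names and numbers here are illustrative, not from the paper):

```python
def request_times_open_loop(rate, duration):
    """Open-loop generation: issue a request every 1/rate seconds,
    regardless of whether earlier requests have completed."""
    return [i / rate for i in range(int(duration * rate))]

def request_times_closed_loop(response_time, duration):
    """Closed-loop generation: one outstanding request per client; the
    next request waits for the previous response, so the offered rate
    can never exceed 1/response_time per client."""
    times, t = [], 0.0
    while t < duration:
        times.append(t)
        t += response_time
    return times
```

If the server takes 10 ms per request, a single closed-loop client tops out at 100 requests/s no matter what rate was intended, while the open-loop generator delivers the requested rate and can overload the server.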
Balsa, "Linux Benchmarking HOWTO", Linux Documentation Project, August 1997. The Linux Benchmarking HOWTO discusses some issues associated with the benchmarking of Linux systems and presents a basic benchmarking toolkit, as well as an associated form, which enable one to produce significant benchmarking information in a couple of hours. Mogul, "Scalable Kernel Performance for Internet Servers Under Realistic Load", Proceedings of the USENIX Annual Technical Conference, June 1998. UNIX Internet servers with an event-driven architecture often perform poorly under real workloads, even if they perform well under laboratory benchmarking conditions. We investigated the poor performance of event-driven servers. We found that the delays typical in wide-area networks cause busy servers to manage a large number of simultaneous connections.
We also observed that the select system call implementation in most UNIX kernels scales poorly with the number of connections being managed by a process. The UNIX algorithm for allocating file descriptors also scales poorly. These algorithmic problems lead directly to the poor performance of event-driven servers. We implemented scalable versions of the select system call and the descriptor allocation algorithm. This led to an improvement of up to 58% in Web proxy and Web server throughput, and dramatically improved the scalability of the system. Mogul, "Better Operating System Features for Faster Network Servers", SIGMETRICS Workshop on Internet Server Performance, June 1998. Widely used operating systems provide inadequate support for large-scale Internet server applications. Their algorithms and interfaces fail to efficiently support either event-driven or multi-threaded servers. They provide poor control over the scheduling and management of machine resources, making it difficult to provide robust and controlled service.
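The select scalability problem described above is what later stateful event interfaces address: the application registers interest in a descriptor once, and the kernel tracks readiness, instead of the application passing (and the kernel scanning) the entire descriptor set on every call. Python's standard selectors module, which is epoll-backed on Linux, exposes such an interface; a minimal sketch:

```python
import selectors
import socket

sel = selectors.DefaultSelector()
a, b = socket.socketpair()

# Register interest once; readiness is tracked by the kernel from now on.
sel.register(b, selectors.EVENT_READ)

a.send(b"hello")

# Returns only the descriptors that are actually ready, rather than
# requiring a linear scan of every registered descriptor.
events = sel.select(timeout=1.0)
ready = [key.fileobj for key, _ in events]
data = ready[0].recv(16)

sel.unregister(b)
a.close()
b.close()
```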
We then present a detailed analysis of TCP's loss recovery and congestion control behavior from the recorded transfers. Our two most important results are: (1) short Web transfers lead to poor loss recovery performance for TCP, and (2) concurrent connections are overly aggressive users of the network. We then discuss techniques designed to solve these problems. To improve the data-driven loss recovery performance of short transfers, we present a new enhancement to TCP's loss recovery. To improve the congestion control and loss recovery performance of parallel TCP connections, we present a new integrated approach to congestion control and loss recovery that works across the set of concurrent connections. Simulations and trace analysis show that our enhanced loss recovery scheme could have eliminated 25% of all timeout events, and that our integrated approach provides greater fairness and improved startup performance for concurrent connections. Our solutions are more general than application-specific enhancements such as the use of persistent connections in P-HTTP and HTTP/1.1, and address issues, such as improved TCP loss recovery, that are not considered by them.
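One way to picture the integrated approach: treat the concurrent connections to one host as sharing a single aggregate congestion window, rather than letting each connection probe the network independently. A toy sketch (the function name and split rule are invented, not the paper's algorithm):

```python
def share_cwnd(total_cwnd, n_connections):
    """Split one aggregate congestion window across concurrent
    connections to the same host; together they are then no more
    aggressive than a single well-behaved flow."""
    base, extra = divmod(int(total_cwnd), n_connections)
    return [base + 1 if i < extra else base for i in range(n_connections)]
```

With n independent windows, the browser's connections collectively grow n packets per round trip; with one shared window, growth stays at the single-flow rate regardless of how many connections are open.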
Experiments show that this technique can improve the throughput of a Web server. Katz, "TCP Behavior of a Busy Internet Server: Analysis and Improvements", Proceedings of IEEE INFOCOM, March 1998. The rapid growth of the World Wide Web in recent years has caused a significant shift in the composition of Internet traffic. Although past work has studied the behavior of TCP dynamics in the context of bulk-transfer applications and some studies have begun to investigate the interactions of TCP and HTTP, few have used extensive real-world traffic traces to examine the problem. This interaction is interesting because of the way in which current Web browsers use TCP connections: multiple concurrent short connections from a single host. In this paper, we analyze the way in which Web browsers use TCP connections based on extensive traffic traces obtained from a busy Web server (the official Web server of the 1996 Atlanta Olympic Games). At the time of operation, this Web server was one of the busiest on the Internet, handling tens of millions of requests per day from several hundred thousand clients. We first describe the techniques used to gather these traces and reconstruct the behavior of TCP on the server.
Second, we argue that kernel threads are the wrong abstraction on which to support user-level management of parallelism. Finally, we describe the design, implementation, and performance of a new kernel interface and user-level thread package that together provide the same functionality as kernel threads without compromising the performance and flexibility advantages of user-level management of parallelism. Druschel, "Soft Timers: Efficient Microsecond Software Timer Support for Network Processing", Proceedings of the 17th Symposium on Operating Systems Principles (SOSP'99), Kiawah Island Resort, SC, December 1999. This paper proposes and evaluates soft timers, a new operating system facility that allows the efficient scheduling of software events at a granularity down to tens of microseconds. Soft timers can be used to avoid interrupts and reduce context switches associated with network processing without sacrificing low communication delays. More specifically, soft timers enable transport protocols like TCP to efficiently perform rate-based clocking of packet transmissions. Experiments show that rate-based clocking can improve HTTP response time over connections with high bandwidth-delay products by up to 89% and that soft timers allow a server to employ rate-based clocking with little CPU overhead (2-6%) at high aggregate bandwidths. Soft timers can also be used to perform network polling, which eliminates network interrupts and increases the memory access locality of the network subsystem without sacrificing delay.
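The soft-timer idea, checking a queue of pending software timers opportunistically at "trigger states" (such as system call returns) instead of fielding a hardware interrupt per event, can be sketched like this (class and method names invented):

```python
import heapq

class SoftTimers:
    """Minimal sketch of a soft-timer facility: pending timers sit in a
    heap and are checked at opportunistic poll points, not via a
    per-timer hardware interrupt."""

    def __init__(self):
        self.heap = []

    def schedule(self, when, callback):
        """Arm a timer to fire at (or shortly after) time `when`."""
        heapq.heappush(self.heap, (when, id(callback), callback))

    def poll(self, now):
        """Called from a trigger state (e.g., syscall return): fire every
        timer whose deadline has passed. Returns how many fired."""
        fired = 0
        while self.heap and self.heap[0][0] <= now:
            _, _, cb = heapq.heappop(self.heap)
            cb()
            fired += 1
        return fired
```

The trade-off is jitter: a timer fires at the first poll point after its deadline, which on a busy server is typically only microseconds late, at near-zero interrupt cost.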
We solicit papers to enter into this resource, preferably with a suggested classification, and opinions on the papers already in the resource. Levy, "Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism", ACM Transactions on Computer Systems, Vol. 10. Threads are the vehicle for concurrency in many approaches to parallel programming. Threads can be supported either by the operating system kernel or by user-level library code in the application address space, but neither approach has been fully satisfactory. This paper addresses this dilemma. First, we argue that the performance of kernel threads is inherently worse than that of user-level threads, rather than this being an artifact of existing implementations; managing parallelism at the user level is essential to high-performance parallel computing.
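In the scheduler activations design, the kernel delivers an upcall to the user-level scheduler whenever it takes away or returns a processor (for example, when the running thread blocks in the kernel), and the user-level scheduler then picks another ready thread. A toy model of the user-level half, with invented names:

```python
class UserLevelScheduler:
    """Toy model of the user-level side of scheduler activations."""

    def __init__(self, threads):
        self.ready = list(threads)
        self.running = None

    def upcall_new_processor(self):
        """Kernel upcall: a processor is available; run a ready thread."""
        self.running = self.ready.pop(0) if self.ready else None
        return self.running

    def upcall_blocked(self, thread):
        """Kernel upcall: `thread` blocked in the kernel. The activation
        carries the processor back, so schedule another thread on it."""
        if self.running == thread:
            self.running = None
        return self.upcall_new_processor()

    def upcall_unblocked(self, thread):
        """Kernel upcall: `thread`'s blocking operation completed; it is
        ready to run again."""
        self.ready.append(thread)
```

The point of the mechanism is that, unlike plain kernel threads, a blocking operation never idles the processor: the user-level scheduler always gets a chance to run something else.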
Reading and writing papers related to our work is an integral part of being an academic. This site is a collection of annotated bibliography entries for articles in engineering education. The bibliography is organized according to the suggestion for constructing a core literature for computing education research developed by a working group at ITiCSE 2005 in Lisbon, Portugal; however, the areas and criteria used are easily adapted to engineering education in general. We have also added Area E, related to more general literature of interest in teaching and learning in higher education. The idea is that papers are classified into four areas of contribution. We have also used the Lisbon group's quality criteria to rank publications in the different areas. Access to bibliography information for the papers is via the links listed below.