Why Did My PC Suddenly Slow Down?

Abstract

Users are often frustrated when they encounter a sudden decrease in the responsiveness of their personal computers. However, it is often difficult to pinpoint a particular offending process and the resource it is over-consuming, even when such a simple explanation does exist. We present preliminary results from several weeks of PC usage showing that user-perceived unresponsiveness often has such a simple explanation and that simple statistical models often suffice to pinpoint the problem. The statistical models we build use all the performance counters for all running processes. When the user expresses frustration at a given time point, we can use these models to determine which processes are acting most anomalously, and in turn which features of those processes are most anomalous. We present an investigative tool that ranks processes and features according to their degree of anomaly, and allows the user to interactively examine the relevant time series.

KEYWORDS: performance instrumentation, machine management, statistical modeling, anomaly detection.

1. Introduction

Nearly everyone who has used a computer has encountered a situation where an application or the entire machine seems to slow down dramatically: all of a sudden, windows are not as responsive, actions are taking longer than they should, and so on. At this point, although the user might like to investigate what’s wrong, he has a limited set of options. He may open up Windows’ Task Manager or use UNIX’s “ps” command to view the running processes, and then check to see which ones are taking the most CPU, I/O, or memory, but he will generally not know whether the values he sees are typical or surprising. In other words, while he can view instantaneous values of some system features, he has no model of what their typical values are; furthermore, even if he could view all possible features, it would be difficult to glean insights from the resulting deluge of data.

One plausible hypothesis is that most slowdowns are the result of one process consuming an abnormally high amount of one resource (e.g., CPU, disk, network, OS handles or file descriptors, system threads, etc.) from a large set of possible resources. However, presenting the consumption of every resource for every process directly would likely be too much information for a user to digest. We built a system based on this hypothesis and found it to be remarkably effective. Our system collects data and builds a model for each process. The model allows us to determine the level of anomaly for any process, and furthermore for any feature within a process. This allows us to use all possible features, since only the anomalous ones will float to the top. We have also developed a visualization tool that allows interactive investigation of the processes and their features with respect to these models. When a user experiences a slowdown at a given point in time, he can see the processes ranked by their relative level of anomaly and, for each process, the features ranked by anomaly. In addition, the user can see the time series for the feature of interest.
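This section does not say which statistical model is used, so the following is only a minimal sketch of the two-level ranking under an assumed model: each process keeps a mean and standard deviation per feature, a feature's anomaly is its absolute z-score at the moment of frustration, and a process is ranked by its most anomalous feature. The data layout and the names `build_models`, `rank`, `history`, and `snapshot` are hypothetical.

```python
from statistics import mean, pstdev

# Assumed layout (hypothetical, not from the paper):
#   history[proc][feature]  -> list of past samples of that counter
#   snapshot[proc][feature] -> value when the user signalled frustration
EPS = 1e-9  # floor on the spread so constant features do not divide by zero

def build_models(history):
    """Assumed per-process model: (mean, std) for every feature with data."""
    return {
        proc: {f: (mean(v), max(pstdev(v), EPS)) for f, v in feats.items() if v}
        for proc, feats in history.items()
    }

def rank(models, snapshot):
    """Rank processes by their most anomalous feature, and features within each."""
    ranked = []
    for proc, values in snapshot.items():
        model = models.get(proc, {})
        # Absolute z-score of each modeled feature, most anomalous first.
        feats = sorted(
            ((f, abs(x - model[f][0]) / model[f][1]) for f, x in values.items() if f in model),
            key=lambda fs: fs[1],
            reverse=True,
        )
        score = feats[0][1] if feats else 0.0
        ranked.append((proc, score, feats))
    ranked.sort(key=lambda r: r[1], reverse=True)
    return ranked

# Illustrative numbers only.
history = {
    "indexer.exe": {"cpu": [2, 3, 2, 3], "io_read": [10, 12, 11, 9]},
    "editor.exe": {"cpu": [5, 6, 5, 6], "io_read": [1, 1, 2, 1]},
}
snapshot = {
    "indexer.exe": {"cpu": 3, "io_read": 400},
    "editor.exe": {"cpu": 6, "io_read": 1},
}
ranking = rank(build_models(history), snapshot)
for proc, score, feats in ranking:
    print(proc, round(score, 1), feats[0][0])
```

Taking the maximum over features means one wildly abnormal counter is enough to push its process to the top, which matches the hypothesis that a single resource is usually to blame; a sum or average over features would be an alternative design.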

Once a user has identified a high-likelihood offender (perhaps an antivirus product or a desktop search application), he has numerous options to improve the situation. He might start shopping for a new antivirus product, switch to a competing desktop search application, or just stop using something that is more trouble than it’s worth. Additionally, because many developers also use the software they write, this tool may help them catch transient resource usage issues that are significantly slowing down the system as a whole.

The primary question raised by our approach is its effectiveness: how often does user-perceived machine slowness have such a simple explanation? Beyond this, there were also significant questions about how best to represent features that would only be available sporadically (i.e., when the relevant processes were running). In the remainder of this paper, we address these questions and show preliminary results from using our investigative tool.

2. RELATED WORK

There has been a significant amount of work using statistical models to detect and/or diagnose faults and performance problems [A+03, Ba+04, Bo+05, Ch+02, F+04, G+05, KF05, R+04, R05, X+04]. Researchers have investigated many different sources of data (e.g., performance counters [Co+04], request paths [Ch+03, Ch+04]), as well as many different statistical models. Most of this work has focused on a server environment: an environment where a large number of machines serve an even larger number of user requests. In addition to their obvious economic importance, the high request volume of server environments makes them particularly well-suited to analysis using statistical models. The consumers of this analysis are either operations personnel (sometimes viewed as datacenter system administrators) or developers.

In contrast to this previous line of research, our work focuses on end-user desktops, and the consumer of the analysis is the end-user himself. End-user desktops are a significantly different environment from servers. Perhaps most importantly, we have much less expectation of the workload being repetitive. Because of this, we might find it quite difficult to duplicate the success that statistical models have had detecting request failures in server environments. Luckily, we can sidestep this issue because the consumer of the analysis is the frustrated user – the user will himself indicate that the system is slow (detection), and the statistical model is only responsible for narrowing down the reason for this slowness (diagnosis).

Like our work, statistical debugging [Z+04, Z+05, Z+06], Strider [Wa+03], Chronus [Wh+04] and Peer-Pressure [Wa+04] target end-user applications, but they are otherwise radically different. Statistical debugging focuses on helping developers understand why a particular application occasionally fails. To this end, statistical debugging requires an external mechanism to determine which process is failing (in contrast to our goal of determining which process is at fault), and it requires a large number of differently instrumented binaries (in contrast to our goal of working on a single machine without some external correlation mechanism and without instrumentation). Strider, Chronus and Peer-Pressure aim to diagnose problems due to bad persistent state (i.e., file contents and registry settings). Strider and Peer-Pressure presume that the failing process is known a priori, while Chronus requires the user to specify a probe determining if the failure is present. In contrast, our analysis is not restricted to persistent state changes (some other input or change in workload may have triggered resource over-consumption), and we require significantly less expertise from the user (he just pushes the “why is my machine slow” button).

There has also been a significant amount of previous work on helping developers or sophisticated system administrators understand the performance of individual components on a single machine, which may be either a server or an end-user’s desktop. For example, profilers and other instrumentation or logging systems allow a developer or sophisticated system administrator to understand where the time/memory/etc. is spent in an application or OS, thereby guiding refinements to the application or OS [Ca+04, HC+01]. Our work relies on Windows performance counters (one such logging system) to gather the low-level performance metrics; our contribution is to present more useful analysis of this data to the end-user. We are not aware of previous work trying to make the analysis done by these systems useful to an end-user.

A final distinction from previous work is our focus on the user’s perception of slowness. In contrast, most previous work has looked at more directly measurable quantities, like the latency of a particular machine operation.

3. Data Collection and Modeling

To evaluate our approach, we collected data over the course of several weeks on two machines running Windows XP. Every time a user of either machine was frustrated at the slowness of the system, he would press the “why is my machine slow” button (Scroll Lock in our implementation), which we refer to as a “frustration event”. During this time, there was approximately one frustration event per day per machine.
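Each frustration event is a single point in time, while counters are only logged periodically, so an event has to be matched to a logged sample before the models can be queried. A minimal sketch, assuming sample timestamps in seconds sorted in ascending order and choosing the most recent sample taken at or before the key press; the function name `sample_for_event` and the `max_gap` staleness check are hypothetical choices, not details from the paper.

```python
from bisect import bisect_right

SAMPLE_PERIOD_S = 60  # logging interval used in this study

def sample_for_event(sample_times, event_time, max_gap=SAMPLE_PERIOD_S):
    """Index of the latest sample at or before a frustration event, or None.

    sample_times must be sorted ascending. Returns None if the event came
    before logging started or if the nearest earlier sample is stale
    (hypothetical rule: older than one logging period).
    """
    i = bisect_right(sample_times, event_time)
    if i == 0:
        return None  # event happened before the first sample
    idx = i - 1
    if event_time - sample_times[idx] > max_gap:
        return None  # logging gap: no sample close enough to the event
    return idx

times = [0, 60, 120, 180]
print(sample_for_event(times, 150))
```

Picking the preceding sample is one assumption among several; averaging a short window of samples around the event would be a reasonable alternative when a slowdown builds up over a few minutes.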

Instead of carefully choosing which features should be used for the analysis, we gathered all available performance counters for all processes. This included items such as total number of threads in the process, I/O bytes read per second, percentage of CPU used, page faults, and so on.  Our reasoning was that more data could only help us, since we would be prioritizing features to show the user the ones that were most relevant – as such, if we included uninteresting or constant features, they would simply sink to the bottom of the list.
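The claim that uninteresting or constant features "simply sink to the bottom" can be illustrated with any ranking score that treats a usual value as unremarkable. The sketch below swaps in a simple nonparametric score (an assumption, not the paper's stated model): the fraction of past samples at least as large as the current value, so small fractions mean unusually high consumption. It scores only the upper tail, matching the over-consumption hypothesis, and would not flag a drop. The names `tail_fraction` and `rank_features` and the counter names are hypothetical.

```python
def tail_fraction(samples, value):
    """Fraction of past samples >= value; small means unusually high."""
    return sum(1 for s in samples if s >= value) / len(samples)

def rank_features(history, snapshot):
    """Rank every logged counter of one process, most anomalous first.

    No feature selection: every counter with history is scored. A constant
    counter sitting at its usual value gets 1.0 and lands at the bottom.
    """
    scored = [
        (f, tail_fraction(history[f], x))
        for f, x in snapshot.items()
        if history.get(f)  # skip counters with no past samples
    ]
    return sorted(scored, key=lambda fs: fs[1])

# Illustrative numbers only.
history = {
    "handle_count": [350, 350, 350, 350],   # constant counter
    "threads": [12, 14, 13, 12],
    "page_faults_per_s": [20, 35, 25, 30],
}
snapshot = {"handle_count": 350, "threads": 13, "page_faults_per_s": 900}
ranked = rank_features(history, snapshot)
print(ranked)
```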

These features were written to a file for all running processes once every 60 seconds. Every four hours, the resulting file was compressed and copied to a fileserver. We found the overhead of this logging to be negligible (in particular, on a 3 GHz Xeon PC, less than 0.15% of the CPU on average).
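At one sample per minute, each four-hour file holds 4 × 60 = 240 snapshots. The sketch below shows only the batching and compression arithmetic, assuming each snapshot is a JSON-serializable dict; reading the actual Windows performance counters and copying to the fileserver are left out, and the names `batches` and `compress_batch` and the JSON-lines file format are hypothetical.

```python
import gzip
import json

SAMPLE_PERIOD_S = 60                 # one snapshot per minute
ROTATE_PERIOD_S = 4 * 60 * 60        # compress and ship every four hours
SAMPLES_PER_FILE = ROTATE_PERIOD_S // SAMPLE_PERIOD_S  # 240

def batches(snapshots, size=SAMPLES_PER_FILE):
    """Group a stream of snapshots into per-file batches (last may be short)."""
    batch = []
    for s in snapshots:
        batch.append(s)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def compress_batch(snapshots):
    """Serialize snapshots as JSON lines and gzip them (assumed format)."""
    text = "\n".join(json.dumps(s, sort_keys=True) for s in snapshots)
    return gzip.compress(text.encode("utf-8"))

# Fake stand-in for counter reads: 500 minutes of one process's CPU value.
fake = ({"t": i * SAMPLE_PERIOD_S, "proc": {"a.exe": {"cpu": i % 7}}} for i in range(500))
files = [compress_batch(b) for b in batches(fake)]
print(len(files))
```

Batching in memory and compressing once per file keeps the per-sample cost to a dictionary append, which is in the spirit of the low overhead reported above, though this sketch makes no claim about the paper's actual numbers.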

The nature of the data is complex: not only are there dozens of features for each of dozens of processes, but at any given time only some processes are running. Figure 1 below illustrates this aspect of the data.
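One way to represent this sporadic data, and it is only an assumption since the introduction leaves the question open, is to align every process onto the common sample clock and mark samples where it was not running as missing rather than zero, so "not running" is not confused with "running and idle". Statistics for a process's model are then computed only over the samples where it was present. The names `to_series` and `present_values` are hypothetical.

```python
def to_series(snapshots, feature):
    """Align per-minute snapshots into one time series per process.

    snapshots: list of {process: {feature: value}} dicts, one per sample time.
    A process absent from a snapshot gets None at that position.
    """
    procs = sorted({p for snap in snapshots for p in snap})
    return {
        p: [snap[p].get(feature) if p in snap else None for snap in snapshots]
        for p in procs
    }

def present_values(series):
    """Values from samples where the process was actually running."""
    return [v for v in series if v is not None]

snaps = [
    {"explorer.exe": {"cpu": 1}},
    {"explorer.exe": {"cpu": 2}, "backup.exe": {"cpu": 40}},
    {"explorer.exe": {"cpu": 1}},
]
series = to_series(snaps, "cpu")
print(series)
```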
