EP1515234A2 - Programmatic computer problem diagnosis and resolution and automated reporting and updating of the same - Google Patents

Info

Publication number
EP1515234A2
EP1515234A2 EP04010605A EP04010605A EP1515234A2 EP 1515234 A2 EP1515234 A2 EP 1515234A2 EP 04010605 A EP04010605 A EP 04010605A EP 04010605 A EP04010605 A EP 04010605A EP 1515234 A2 EP1515234 A2 EP 1515234A2
Authority
EP
European Patent Office
Prior art keywords
act
module
accordance
following
events
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP04010605A
Other languages
German (de)
French (fr)
Other versions
EP1515234A3 (en)
Inventor
Andrew Ritz
Jee Fung Pang
Jonathan Vines Smith
Michael Richard Fortin
Nicholas Stephen Judge
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Adeia Technologies Inc
Original Assignee
Microsoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Corp
Publication of EP1515234A2
Publication of EP1515234A3
Current legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766 Error or fault reporting or storing
    • G06F11/0784 Routing of error reports, e.g. with a specific transmission path or data flow
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B65 CONVEYING; PACKING; STORING; HANDLING THIN OR FILAMENTARY MATERIAL
    • B65B MACHINES, APPARATUS OR DEVICES FOR, OR METHODS OF, PACKAGING ARTICLES OR MATERIALS; UNPACKING
    • B65B5/00 Packaging individual articles in containers or receptacles, e.g. bags, sacks, boxes, cartons, cans, jars
    • B65B5/10 Filling containers or receptacles progressively or in stages by introducing successive articles, or layers of articles
    • B65B5/101 Filling containers or receptacles progressively or in stages by introducing successive articles, or layers of articles by gravity
    • B65B5/103 Filling containers or receptacles progressively or in stages by introducing successive articles, or layers of articles by gravity for packaging pills or tablets
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B65 CONVEYING; PACKING; STORING; HANDLING THIN OR FILAMENTARY MATERIAL
    • B65B MACHINES, APPARATUS OR DEVICES FOR, OR METHODS OF, PACKAGING ARTICLES OR MATERIALS; UNPACKING
    • B65B35/00 Supplying, feeding, arranging or orientating articles to be packaged
    • B65B35/10 Feeding, e.g. conveying, single articles
    • B65B35/12 Feeding, e.g. conveying, single articles by gravity
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B65 CONVEYING; PACKING; STORING; HANDLING THIN OR FILAMENTARY MATERIAL
    • B65B MACHINES, APPARATUS OR DEVICES FOR, OR METHODS OF, PACKAGING ARTICLES OR MATERIALS; UNPACKING
    • B65B61/00 Auxiliary devices, not otherwise provided for, for operating on sheets, blanks, webs, binding material, containers or packages
    • B65B61/04 Auxiliary devices, not otherwise provided for, for operating on sheets, blanks, webs, binding material, containers or packages for severing webs, or for separating joined packages
    • B65B61/06 Auxiliary devices, not otherwise provided for, for operating on sheets, blanks, webs, binding material, containers or packages for severing webs, or for separating joined packages by cutting
    • B65B61/10 Auxiliary devices, not otherwise provided for, for operating on sheets, blanks, webs, binding material, containers or packages for severing webs, or for separating joined packages by cutting using heated wires or cutters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0709 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766 Error or fault reporting or storing
    • G06F11/0781 Error filtering or prioritizing based on a policy defined by the user or on a policy defined by a hardware/software module, e.g. according to a severity level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61J CONTAINERS SPECIALLY ADAPTED FOR MEDICAL OR PHARMACEUTICAL PURPOSES; DEVICES OR METHODS SPECIALLY ADAPTED FOR BRINGING PHARMACEUTICAL PRODUCTS INTO PARTICULAR PHYSICAL OR ADMINISTERING FORMS; DEVICES FOR ADMINISTERING FOOD OR MEDICINES ORALLY; BABY COMFORTERS; DEVICES FOR RECEIVING SPITTLE
    • A61J7/00 Devices for administering medicines orally, e.g. spoons; Pill counting devices; Arrangements for time indication or reminder for taking medicine
    • A61J7/0076 Medicament distribution means

Definitions

  • the present invention relates generally to software, and more particularly relates to systems and methods for programmatically determining root causes of problems that occur when operating a personal computer, and providing programmatic resolution and/or rich diagnostic data for users to address those problems.
  • the computing system's operating system typically includes some limited mechanism for identifying the presence of a problem in the form of basic error messages.
  • the error messages may not provide enough information to those attempting to diagnose and solve the root cause of the problem or to identify the workarounds for avoiding the problem.
  • the method includes monitoring events generated by appropriate instrumentation within an operating system, logging at least a subset of the events to a log file, and detecting one or more error conditions.
  • a diagnostics module is invoked.
  • the diagnostics module queries the log file for events relevant to diagnosis of the problem, and identifies the root cause by evaluating the results of the query. Once the root cause of the problem is diagnosed, a resolution module corresponding to that root cause may be invoked to programmatically resolve the problem.
  • User-defined or default policy rules may govern if and when a diagnosis module and/or resolution module is invoked. Accordingly, a computing system problem may be diagnosed and resolved programmatically thereby improving the user experience, while still allowing some degree of user control over the diagnosis and resolution process generally.
  • at least some of the query results are sent to an error reporting service, which returns one or more updates to the computing system. These updates modify which events are logged, how the diagnostics module diagnoses, and/or how the resolution module resolves.
  • the present invention relates to mechanisms for programmatically diagnosing the root cause of a problem in a computing system.
  • appropriate instrumentation is added to generate events to describe the state of execution of the tasks to be diagnosed. These events are monitored within an operating system, and at least some of the events are logged to a log file.
  • a diagnostics module is invoked. The diagnostics module queries the log file for events relevant to diagnosis of the problem, and identifies the root cause by evaluating the results of the query.
  • a resolution module corresponding to that root cause may be invoked to programmatically resolve the problem.
  • the invocation of the diagnostic and resolution modules may be subject to policy rules.
  • the detection, diagnostic and resolution modules may be automatically updated as needed by an update service.
  • Figure 1 shows a schematic diagram of a sample computer architecture usable for these devices.
  • the architecture portrayed is only one example of a suitable environment and is not intended to suggest any limitation as to the scope of use or functionality of the invention. Neither should the computing devices be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in Figure 1.
  • the invention is operational with numerous other general-purpose or special-purpose computing or communications environments or configurations.
  • Examples of well known computing systems, environments, and configurations suitable for use with the invention include, but are not limited to, mobile telephones, pocket computers, personal computers, servers, multiprocessor systems, microprocessor-based systems, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
  • a computing system 100 typically includes at least one processing unit 102 and memory 104.
  • the memory 104 may be volatile (such as RAM), non-volatile (such as ROM, flash memory, etc.), or some combination of the two. This most basic configuration is illustrated in Figure 1 by the dashed line 106.
  • the storage media devices may have additional features and functionality. For example, they may include additional storage (removable and non-removable) including, but not limited to, PCMCIA cards, magnetic and optical disks, and magnetic tape. Such additional storage is illustrated in Figure 1 by removable storage 108 and non-removable storage 110.
  • Computer-storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data.
  • Memory 104, removable storage 108, and non-removable storage 110 are all examples of computer-storage media.
  • Computer-storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory, other memory technology, CD-ROM, digital versatile disks, other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, other magnetic storage devices, and any other media that can be used to store the desired information and that can be accessed by the computing device.
  • module can refer to software objects or routines that execute on the computing system.
  • the different components, modules, engines, and services described herein may be implemented as objects or processes that execute on the computing system (e.g., as separate threads). While the system and methods described herein are preferably implemented in software, implementations in hardware, or in a combination of software and hardware, are also possible and contemplated.
  • Computing device 100 may also contain communication channels 112 that allow the host to communicate with other devices.
  • Communication channels 112 are examples of communications media.
  • Communications media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and include any information-delivery media.
  • modulated data signal means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
  • communications media include wired media, such as wired networks and direct-wired connections, and wireless media such as acoustic, radio, infrared, and other wireless media.
  • the term computer-readable media as used herein includes both storage media and communications media.
  • the computing device 100 may also have input components 114 such as a keyboard, mouse, pen, a voice-input component, a touch-input device, and so forth.
  • Output components 116 include screen displays, speakers, printer, etc., and rendering modules (often called "adapters") for driving them.
  • the computing device 100 has a power supply 118. All these components are well known in the art and need not be discussed at length here.
  • FIG. 2 illustrates a more specific architecture 200 that may be used to implement the features of the present invention.
  • the architecture 200 includes a computing system 201 in communication with a remote computing system 236.
  • the computing system 201 may also implement the features of the present invention even without the assistance of the remote computing system 236, albeit without the features of the update service described further below.
  • each of the computing systems 201 and 236 may be structured as described above with respect to the computing system 100.
  • Figure 3 illustrates a flowchart of a method 300 for programmatically diagnosing and potentially resolving a problem in a computing system in accordance with the principles of the present invention. As the method 300 may be performed in the context of the architecture 200, Figures 2 and 3 will now be described with frequent reference to each other.
  • the method 300 includes an act of monitoring events within an operating system (act 301).
  • the monitored events are generated by a number of operating system (OS) components, drivers, applications, and services 262 which will also be collectively referred to herein as "event providers 262".
  • the event providers 262 communicate events 202 to a logger 204.
  • the amount of data that is to be collected at any given point is bounded by the then existing circumstances.
  • the logger 204 would deal with fewer events.
  • any given event provider need not generate an event for every interaction it senses, but may generate only the more relevant events relating to root causes of problems. For example, an event need not be generated every time a disk drive writes to a sector. However, an event might be generated if the disk drive fails to respond to a read or write command, or attempts to write to a forbidden sector.
  • Examples of event providers 262 include software modules that manage power, Plug-and-Play (PnP) operation, memory management, bus control (e.g., PCI), and other low-level APIs (application programming interfaces).
  • Other operating system components or applications or drivers may also raise events to the logger 204. Examples of the other operating system components include networking modules, graphics modules, audio modules, and printing modules.
  • Examples of the types of events 202 that are communicated to the logger 204 include user requests, system calls, device connections, communication requests, or the like.
  • one event may describe that a user has requested to put the computing system 201 into a low power or standby state, and subsequent events may help the user or a support engineer diagnose and resolve a standby failure if the user's request does not succeed.
  • the events describing the standby failure may include which applications or drivers vetoed the request to be put into a low power state.
  • any other events that are detectable by the operating system may be provided by the event providers 262 to the logger 204.
  • the logger 204 logs at least a subset of the events to a log file (act 302).
  • the event trace log file 248 represents an example of such a log file.
  • the logger 204 is configured to log all or a portion of the events 202.
  • the logger may be configured to log those events that are more likely to be helpful with diagnosing a problem.
  • the logger 204 also may notify the diagnostics policy service 208 of the events.
  • the volume of events flowing to the diagnostics policy service 208 may be much lower than the volume of events flowing to the event trace log file 248 in some embodiments.
  • the logger 204 may simply notify the diagnostics policy service 208 when a transaction begins or ends, or when an error condition arises.
  • the computing system 201 detects one or more error conditions (act 303). Referring to Figure 2, this may be accomplished by the diagnostics policy service 208.
  • the diagnostics policy service 208 determines when an actual problem has occurred by, for example, detecting a predetermined single error condition, or by detecting that a predetermined sequence of error conditions has arisen.
  • the computing system 201 performs a functional, result-oriented step for programmatically diagnosing a problem evidenced by the one or more error conditions (step 310).
  • This may include any corresponding acts for accomplishing this result. However, in the illustrated embodiment, this includes corresponding acts 311 through 314.
  • the computing system 201 may consult rules to determine that the diagnostic module should be invoked according to the rules (act 304). Rules may be set by received user input instructions or perhaps by default. Accordingly, the diagnostics policy service 208 is indirectly coupled to the diagnostics module(s) 220 via a monitoring service 212.
  • the monitoring service 212 applies policy to filter which events are propagated up to invoke diagnostic module(s) 220 for root cause determination. Examples of when filtering of such events may be desirable include enterprise environments where an Information Technology (IT) manager or system administrator may prefer that the operating system not perform certain root cause determination and/or problem resolution actions automatically. For example, an IT manager may want to be informed that a problem has occurred, but have no automated root cause analysis or any automated resolution occur. Or, the IT manager may want root cause analysis to occur, but no automated resolution.
  • one action that computing system 201 may undertake in response to determining a root cause problem may be to automatically install an updated device driver. Since updated device drivers may cause unexpected operational changes in some instances, an enterprise's IT manager may input 216 policy 214 to specify that the user may not be enabled or authorized to update device drivers. Another example of a policy that an IT manager may apply to monitoring service 212 is that no automatic problem resolution steps are to be taken. This would allow the user or IT manager to decide whether or not to perform the action instead of having computing system 201 perform the action automatically.
  • the monitoring service 212 invokes 218 an appropriate one of diagnostics module(s) 220 (act 311) when a particular set of one or more error conditions is detected by the diagnostics policy service 208.
  • the appropriate diagnostics module may be invoked directly by the diagnostic policy service 208, or by one of event providers 262 (e.g., in embodiments without a monitoring service 212).
  • the computing system 201 may include multiple diagnostics modules, each for diagnosing the root cause of predetermined error conditions or predetermined sequences of error conditions.
  • When invoked, each diagnostics module is configured to query and correlate 242 relevant data sources to diagnose the problem evidenced by the one or more error conditions (act 312) to determine information about which events and/or state preceded the problem event.
  • relevant data sources may include, for example, event trace log 248, a configuration database 252 such as a registry, system compatibility manager 254, WMI providers 256, and other data sources and log files 250 (e.g., network status logs).
  • other data sources may be queried in addition to, or in lieu of the sources illustrated in the figure.
  • System compatibility manager 254 is a service that receives status and error messages from different subsystems (e.g., the PCI bus subsystem, the USB subsystem, and the AGP subsystem) and other bus drivers and driver stacks in the system regarding known hardware anomalies that require device specific workarounds in order to allow the hardware in question to function properly. Such workarounds may impact how the device functions and may, therefore, end up being the root cause of a problem perceived by the end user. WMI providers expose diagnostic information about hardware devices on the system.
  • the diagnostics module evaluates the results 244 of the query (act 313), and identifies the root cause of the one or more error conditions in response to the evaluation (act 314). This may be accomplished by running a diagnostics routine that corresponds to the error condition(s).
  • Each of at least some of the diagnostics modules (as well as at least some of the resolution modules 224 and the diagnostic policy service 208) may have plug-in capability to allow for more minor modifications of the corresponding diagnostics module. More specifically in one embodiment, the diagnostics module 220 compares the query results 244 with a list of root cause associations. This completes the step for programmatically diagnosing a problem evidenced by the one or more error conditions (step 310).
  • the invoked diagnostics module 220 may invoke an appropriate resolution module 224 (act 308) to perform an identified resolution that corresponds to the identified root cause.
  • the identified root cause for a problem is some problem that is known to exist.
  • the queries are specifically made to diagnose whether that problem is present or not.
  • the monitoring service 212 may once again allow for stored policy to determine whether the resolution module should be invoked according to the rules (act 307). Accordingly, the diagnostics modules may first notify 222A the monitoring service 212 of the root cause. If the stored policy so permits, the monitoring service 212 invokes 222B the appropriate resolution module 224.
  • Each resolution module 224 may be configured with error resolution routines that are invoked by the appropriate diagnostics module subject to the policy in monitoring service 212. Examples of error resolution routines include searching for and/or installing new device drivers, or disabling or reconfiguring conflicting device drivers or applications. In one embodiment, at least some of the routines are performed automatically (i.e., without requiring user input). Some resolution modules, however, may utilize user input that is obtained by invoking 228 a diagnostic user interface module 232 (e.g., a "trouble shooting wizard"). The diagnostic user interface module 232 may be engaged to prompt the user to enter additional information to be used by the appropriate resolution module (or by the computing system as a whole) to attempt to identify or resolve the problem. This may be particularly useful when the root cause of the one or more error conditions cannot be programmatically identified and/or resolved without further user assistance.
  • the interaction between the resolution module(s) 224 and the diagnostic user interface 232 is represented by bi-directional arrow 228A.
  • the interaction between the diagnostics module(s) 220 and the diagnostic user interface 232 is represented by bi-directional arrow 228B.
  • the diagnostic user interface 232 may also allow user interaction 228C with the event generators 262 to modify what events are generated.
  • a trouble shooter application 264 provides a user interface that allows a user to expressly report a problem to the monitoring service 212, rather than wait for the diagnostic policy service 208 to detect the problem.
  • the diagnostic modules 220 would then diagnose the root cause of the reported problem, followed by the resolution modules 224 resolving the problem.
  • a diagnostics module may be modified by modifying what events are logged, how a diagnostics module diagnoses, or how a resolution module resolves an identified root cause of a problem. For example, perhaps a diagnostics module cannot diagnose a problem based on the logged events, or perhaps a resolution module cannot properly resolve the problem without modification. Accordingly, information from the diagnostic policy service 208, the diagnostics modules 220, the resolution modules 224 and/or the diagnostic user interface module 232 may be conveyed to the activity log 230 for reporting to error reporting service 238 (act 305). For example, the resolution module 224 reports to the activity log 230 as represented by arrow 226.
  • the activity log 230 may also be displayed to a user to allow the user to view the problems detected, what the diagnosis is, and how the diagnosed problem was resolved.
  • the activity log 230 may also be provided to a remote location to allow technical support to view the relevant facts without having to rely on the user to state the relevant facts.
  • the activity log 230 may also be sent to the error reporting service 238 to assist in forming statistical information regarding what problems are generally occurring on user systems.
  • Update service 240 may be used to send updates to one or more of the modules of computing system 201 to be received by the computing system 201 (act 306).
  • update service 240 may update logger 204 with additional events or event sequences to store to event trace log file 248 to assist in resolving the root cause of a new problem that has been detected by error reporting service 238 or other sources of information regarding failures experienced by users.
  • Update service 240 may also update the diagnostic policy service 208 to change how problems are detected.
  • Update service 240 may also be used to update (change existing modules, provide new modules, or add or modify a plug in) one or more of diagnostics modules 220 and resolution modules 224 to reflect a new solution that has been determined for a particular problem.
  • update service 240 is operated by vendor computing system 236 and transmits updates to the modules of computing system 201 via the Internet.
  • a third party could provide custom changes or entirely new modules and configuration information.
  • diagnostics module 220 will report this information to the activity log 230, which in turn sends an error report 234 to error reporting service 238.
  • the root cause association information and corresponding problem resolution information is sent to the computing system 201 via update service 240. If the vendor is unable to determine the root cause, the vendor may use update service 240 to instruct diagnostic policy service 204 to store additional event or state information to event trace log file 248. The resolution module 224 may likewise instruct 260 the logger to store additional events 224 in order to ensure that proper resolution is achieved. When the additional information is transmitted to the error reporting service 238 after the next occurrence of the problem, the additional information may enable the vendor to better identify the root cause of the problem.
  • the error report 234 may be sent even before a known root cause is diagnosed. Reporting the error at this early stage allows the update service 240 to update the diagnostic modules 220 and/or resolution modules 224 prior to attempting diagnosis and resolution. Alternatively, the error may be reported 234 after diagnosis, but before resolution. In that case, the update service 240 may update the specific resolution module dedicated to resolving the specifically diagnosed problem.
  • the diagnostic policy service 208, the diagnostic modules 220, and the resolution module 224 may be configured to report their activity to activity log 230 (e.g., an error was detected, a diagnostic module was invoked, the diagnostic module took certain steps, the root cause was found and is this, the root cause could not be determined, a resolution module was invoked, the resolution module took these steps, the problem was resolved, the problem was not resolved, and the like).
  • This provides information to the vendor regarding whether the system is diagnosing problems and whether or not the problems are being resolved. This information may be valuable to the vendor because it may be used to determine if either the diagnostics module 220 or resolution module 224 needs to be updated.
  • the information may also be useful for the vend
  • an update that may be sent via update service 240 is a new problem resolution for resolution module 224. If a particular type of error has been identified, but the root cause is difficult to determine, an update may instruct an event provider or the logger to store more event information to event trace log file 248. This will allow diagnostics module 220 to send more detailed information to activity log 230, which sends the more detailed information to error reporting service. The additional information is likely to assist the vendor computing system 236 in determining the root cause and solution to the problem. In turn, new diagnostic and resolution modules can be downloaded to address the problem.
  • a mechanism that programmatically diagnoses and resolves problems subject to internal policy constraints. Furthermore, the mechanism updates itself as needed in order to better diagnose the root cause of error condition(s), and resolve the root cause.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mechanical Engineering (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Debugging And Monitoring (AREA)
  • Test And Diagnosis Of Digital Computers (AREA)

Abstract

Programmatically diagnosing the root cause of a problem in a computing system. Events are monitored within an operating system, and at least a subset of the events are logged to a log file. In response to the detection of error condition(s), a diagnostics module is invoked. The diagnostics module queries the log file to correlate events relevant to diagnosis of the problem, and identifies the root cause by evaluating the results of the query. Once the root cause of the problem is diagnosed, a resolution module corresponding to that root cause may be invoked to programmatically resolve the problem. The invocation of the diagnostic and resolution modules may be subject to policy rules. Furthermore, the logging, diagnostics and resolution modules may be automatically updated as needed.

Description

    The Field of the Invention
  • The present invention relates generally to software, and more particularly relates to systems and methods for programmatically determining root causes of problems that occur when operating a personal computer, and providing programmatic resolution and/or rich diagnostic data for users to address those problems.
  • Background and Related Art
  • Computing technology has transformed the way we work and play. In recent decades and years, computing technology has become quite complex. This complexity enables a computing system to perform a wide variety of high-complexity functions and applications thereby steadily improving the utility of the computing system. On the other hand, such complexity also makes it increasingly difficult for even the most skillful software engineers to develop software which is perfectly compatible and functional in all possible circumstances. Accordingly, even advanced computing systems often experience problems such as crashes, system hangs, or performance degradations.
  • Currently, it is difficult or impossible to easily diagnose or determine the root cause of many problems in computing systems. The computing system's operating system typically includes some limited mechanism for identifying the presence of a problem in the form of basic error messages. However, the error messages may not provide enough information to those attempting to diagnose and solve the root cause of the problem or to identify the workarounds for avoiding the problem.
  • Since many different applications and devices can run on an operating system at a given time, and since interoperability between such components can result in complex problems, it is often difficult for the operating system to determine which application, device driver, or configuration is the root cause of the problem that has surfaced. Interoperability can especially result in complex problems where the various interoperating components are provided by different vendors. The problems may involve the operating system, applications, or device drivers, but once the problem has surfaced (e.g., via a system crash) it may be too late to provide any information that is useful in solving the problem. This problem is aggravated when the applications or device drivers executing on the operating system do not comply with programming guidelines for the operating system.
  • Furthermore, even if there is sufficient information with which to diagnose a problem, significant user effort is often needed in order to diagnose the root cause of the problem and provide a resolution. The requirement of significant user effort to diagnose and resolve computing system problems can degrade the user experience in working with the computing system, especially if the user expects fewer problems with the computing system.
  • Furthermore, many users are not experienced enough to diagnose and resolve computing system problems on their own. Accordingly, they may take actions they hope are corrective, but which may not resolve the problem due to an incorrect diagnosis or resolution of the problem. Such actions may actually further degrade the performance or stability of the computing system. Users may also solicit the help of others to diagnose and resolve the problem thereby incurring unnecessary costs in time or money on the user or the party that assists in resolving the problem.
  • For these reasons, a system and method that enables an operating system to better determine the root cause of computing system problems would be advantageous. Furthermore, a system and method that provides a programmatic means for addressing identified problems would be advantageous.
  • BRIEF SUMMARY OF THE INVENTION
  • The foregoing problems with the prior state of the art are overcome by the principles of the present invention, which is directed towards a system and method for programmatically diagnosing the root cause of a problem in a computing system. In one embodiment, the method includes monitoring events generated by appropriate instrumentation within an operating system, logging at least a subset of the events to a log file, and detecting one or more error conditions. In response thereto, a diagnostics module is invoked. The diagnostics module queries the log file for events relevant to diagnosis of the problem, and identifies the root cause by evaluating the results of the query. Once the root cause of the problem is diagnosed, a resolution module corresponding to that root cause may be invoked to programmatically resolve the problem.
  • User-defined or default policy rules may govern if and when a diagnostics module and/or resolution module is invoked. Accordingly, a computing system problem may be diagnosed and resolved programmatically thereby improving the user experience, while still allowing some degree of user control over the diagnosis and resolution process generally. In one embodiment, at least some of the query results are sent to an error reporting service, which returns one or more updates to the computing system. These updates modify which events are logged, how the diagnostics module diagnoses, and/or how the resolution module resolves.
  • Additional features and advantages of the invention will be set forth in the description that follows, and in part will be obvious from the description, or may be learned by the practice of the invention. The features and advantages of the invention may be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. These and other features of the present invention will become more fully apparent from the following description and appended claims, or may be learned by the practice of the invention as set forth hereinafter.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • In order to describe the manner in which the above-recited and other advantages and features of the invention can be obtained, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments of the invention and are not therefore to be considered to be limiting of its scope, the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:
  • Figure 1 illustrates a suitable computing system that may implement the features of the present invention;
  • Figure 2 illustrates a more specific architecture that may be used to implement the features of the present invention; and
  • Figure 3 illustrates a flowchart of a method for programmatically diagnosing and potentially resolving a problem in a computing system in accordance with the principles of the present invention.
  • DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
  • The present invention relates to mechanisms for programmatically diagnosing the root cause of a problem in a computing system. First, appropriate instrumentation is added to generate events to describe the state of execution of the tasks to be diagnosed. These events are monitored within an operating system, and at least some of the events are logged to a log file. In response to the detection of error condition(s), a diagnostics module is invoked. The diagnostics module queries the log file for events relevant to diagnosis of the problem, and identifies the root cause by evaluating the results of the query. Once the root cause of the problem is diagnosed, a resolution module corresponding to that root cause may be invoked to programmatically resolve the problem. The invocation of the diagnostic and resolution modules may be subject to policy rules. Furthermore, the detection, diagnostic and resolution modules may be automatically updated as needed by an update service.
  • Turning to the drawings, wherein like reference numerals refer to like elements, the invention is illustrated as being implemented in a suitable computing environment. The following description is based on illustrated embodiments of the invention and should not be taken as limiting the invention with regard to alternative embodiments that are not explicitly described herein.
  • In the description that follows, the invention is described with reference to acts and symbolic representations of operations that are performed by one or more computers, unless indicated otherwise. As such, it will be understood that such acts and operations, which are at times referred to as being computer-executed, include the manipulation by the processing unit of the computer of electrical signals representing data in a structured form. This manipulation transforms the data or maintains them at locations in the memory system of the computer, which reconfigures or otherwise alters the operation of the computer in a manner well understood by those skilled in the art. The data structures where data are maintained are physical locations of the memory that have particular properties defined by the format of the data. However, while the invention is being described in the foregoing context, it is not meant to be limiting as those of skill in the art will appreciate that the various acts and operations described hereinafter may also be implemented in hardware.
  • Referring to Figure 1, the present invention relates to the monitoring of software application and hardware reliability and availability. The software application resides on a computer that may have one of many different computer architectures. For descriptive purposes, Figure 1 shows a schematic diagram of a sample computer architecture usable for these devices. The architecture portrayed is only one example of a suitable environment and is not intended to suggest any limitation as to the scope of use or functionality of the invention. Neither should the computing devices be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in Figure 1.
  • The invention is operational with numerous other general-purpose or special-purpose computing or communications environments or configurations. Examples of well known computing systems, environments, and configurations suitable for use with the invention include, but are not limited to, mobile telephones, pocket computers, personal computers, servers, multiprocessor systems, microprocessor-based systems, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
  • In its most basic configuration, a computing system 100 typically includes at least one processing unit 102 and memory 104. The memory 104 may be volatile (such as RAM), non-volatile (such as ROM, flash memory, etc.), or some combination of the two. This most basic configuration is illustrated in Figure 1 by the dashed line 106.
  • The storage media devices may have additional features and functionality. For example, they may include additional storage (removable and non-removable) including, but not limited to, PCMCIA cards, magnetic and optical disks, and magnetic tape. Such additional storage is illustrated in Figure 1 by removable storage 108 and non-removable storage 110. Computer-storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Memory 104, removable storage 108, and non-removable storage 110 are all examples of computer-storage media. Computer-storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory, other memory technology, CD-ROM, digital versatile disks, other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, other magnetic storage devices, and any other media that can be used to store the desired information and that can be accessed by the computing device.
  • As used herein, the term "module" or "component" can refer to software objects or routines that execute on the computing system. The different components, modules, engines, and services described herein may be implemented as objects or processes that execute on the computing system (e.g., as separate threads). While the system and methods described herein are preferably implemented in software, implementations in software and hardware or hardware are also possible and contemplated.
  • Computing device 100 may also contain communication channels 112 that allow the host to communicate with other devices. Communication channels 112 are examples of communications media. Communications media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and include any information-delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communications media include wired media, such as wired networks and direct-wired connections, and wireless media such as acoustic, radio, infrared, and other wireless media. The term computer-readable media as used herein includes both storage media and communications media.
  • The computing device 100 may also have input components 114 such as a keyboard, mouse, pen, a voice-input component, a touch-input device, and so forth. Output components 116 include screen displays, speakers, printer, etc., and rendering modules (often called "adapters") for driving them. The computing device 100 has a power supply 118. All these components are well known in the art and need not be discussed at length here.
  • Figure 2 illustrates a more specific architecture 200 that may be used to implement the features of the present invention. The architecture 200 includes a computing system 201 in communication with a remote computing system 236. However, the computing system 201 may also implement the features of the present invention even without the assistance of the remote computing system 236, albeit without the features of the update service described further below. Although not required, each of the computing systems 201 and 236 may be structured as described above with respect to the computing system 100.
  • Figure 3 illustrates a flowchart of a method 300 for programmatically diagnosing and potentially resolving a problem in a computing system in accordance with the principles of the present invention. As the method 300 may be performed in the context of the architecture 200, Figures 2 and 3 will now be described with frequent reference to each other.
  • In Figure 3, the method 300 includes an act of monitoring events within an operating system (act 301). Referring to Figure 2, the monitored events are generated by a number of operating system (OS) components, drivers, applications, and services 262 which will also be collectively referred to herein as "event providers 262". The event providers 262 communicate events 202 to a logger 204. In one embodiment, the amount of data that is to be collected at any given point is bounded by the then existing circumstances. Thus, the logger 204 would deal with fewer events. Accordingly, any given event provider need not generate an event for every interaction it senses, but may generate only the more relevant events relating to root causes of problems. For example, an event need not be generated every time a disk drive writes to a sector. However, an event might be generated if the disk drive fails to respond to a read or write command, or attempts to write to a forbidden sector.
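  • The selective event generation described above can be sketched as follows. This is a minimal illustration only, assuming a disk event provider that stays silent on routine writes; the names Event, Logger, and DiskProvider are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    provider: str
    kind: str
    detail: str = ""


@dataclass
class Logger:
    """Hypothetical stand-in for logger 204; it collects the events it is sent."""
    received: List[Event] = field(default_factory=list)

    def submit(self, event: Event) -> None:
        self.received.append(event)


class DiskProvider:
    """Hypothetical event provider that reports only problem-relevant events."""

    def __init__(self, logger: Logger, forbidden_sectors):
        self.logger = logger
        self.forbidden = set(forbidden_sectors)

    def write(self, sector: int, responded: bool = True) -> None:
        if not responded:
            # The drive failed to respond to the command.
            self.logger.submit(Event("disk", "no_response", f"write sector {sector}"))
        elif sector in self.forbidden:
            # The drive attempted to write to a forbidden sector.
            self.logger.submit(Event("disk", "forbidden_write", f"sector {sector}"))
        # A successful write to a permitted sector generates no event.


logger = Logger()
disk = DiskProvider(logger, forbidden_sectors={0})
disk.write(10)                   # routine write, no event
disk.write(0)                    # forbidden sector
disk.write(11, responded=False)  # drive did not respond
kinds = [e.kind for e in logger.received]
```

  • In this sketch the three write calls produce only two events, which reflects the idea that the logger 204 deals with fewer events when providers report only what may bear on a root cause.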
  • Examples of event providers 262 include software modules that manage power, Plug-and-Play (PnP) operation, memory management, bus control (e.g., PCI), and other low-level APIs (application programming interfaces). Other operating system components (or applications or drivers) may also raise events to the logger 204. Examples of the other operating system components include networking modules, graphics modules, audio modules, and printing modules.
  • Examples of the types of events 202 that are communicated to the logger 204 include user requests, system calls, device connections, communication requests, or the like. For example, one event may describe that a user has requested to put the computing system 201 into a low power or standby state, and subsequent events may help the user or a support engineer diagnose and resolve a standby failure in the event that the user's request does not succeed. For example, the events describing the standby failure may identify which applications or drivers vetoed the request to enter the low power state. However, any other events that are detectable by the operating system may be provided by the event providers 262 to the logger 204.
  • As the computing system 201 (specifically the logger 204) monitors the events (act 301), the logger 204 logs at least a subset of the events to a log file (act 302). For example, the event trace log file 248 represents an example of such a log file. The logger 204 is configured to log all or a portion of the events 202. Optionally, the logger may be configured to log those events that are more likely to be helpful with diagnosing a problem. The logger 204 also may notify the diagnostics policy service 208 of the events. The volume of events flowing to the diagnostics policy service 208 may be much lower than the volume of events flowing to the event trace log file 248 in some embodiments. For example, the logger 204 may simply notify the diagnostics policy service 208 when a transaction begins or ends, or when an error condition arises.
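  • The two output paths of the logger can be sketched as follows. This is an illustrative sketch under stated assumptions: the TraceLogger class, the should_log filter, and the set of event kinds that trigger a notification are all hypothetical choices, not details given in the specification.

```python
from typing import Callable, Dict, List

# Assumed event kinds that warrant notifying the diagnostics policy service.
NOTIFY_KINDS = {"transaction_begin", "transaction_end", "error"}


class TraceLogger:
    """Hypothetical stand-in for logger 204."""

    def __init__(self, should_log: Callable[[Dict], bool],
                 notify: Callable[[Dict], None]):
        self.should_log = should_log
        self.notify = notify
        self.trace_log: List[Dict] = []  # stands in for event trace log file 248

    def handle(self, event: Dict) -> None:
        # Log the subset of events judged helpful for diagnosis.
        if self.should_log(event):
            self.trace_log.append(event)
        # Forward only a smaller set of events to the policy service.
        if event["kind"] in NOTIFY_KINDS:
            self.notify(event)


notifications: List[Dict] = []
trace_logger = TraceLogger(
    should_log=lambda e: e["kind"] != "heartbeat",
    notify=notifications.append,
)
for kind in ["transaction_begin", "device_connected", "heartbeat",
             "error", "transaction_end"]:
    trace_logger.handle({"kind": kind})
```

  • Here four of the five events reach the trace log while only three reach the policy service, which mirrors the lower notification volume described above.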
  • At some point while logging at least a subset of the monitored events (act 302), the computing system 201 detects one or more error conditions (act 303). Referring to Figure 2, this may be accomplished by the diagnostics policy service 208. The diagnostics policy service 208 determines when an actual problem has occurred by, for example, detecting a predetermined single error condition, or by detecting that a predetermined sequence of error conditions has arisen.
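  • One way to detect either a single predetermined error condition or a predetermined sequence of conditions is sketched below. The ErrorDetector class and the condition names are hypothetical, and the sketch assumes that a sequence must occur consecutively and that every configured sequence is non-empty; the specification does not state either point.

```python
from collections import deque


class ErrorDetector:
    """Hypothetical detector in the role of the diagnostics policy service 208."""

    def __init__(self, single_errors, sequences):
        self.single = set(single_errors)
        self.sequences = [tuple(s) for s in sequences]
        longest = max((len(s) for s in self.sequences), default=1)
        # Remember only as many recent conditions as the longest sequence needs.
        self.recent = deque(maxlen=longest)

    def observe(self, condition: str) -> bool:
        """Record a condition and report whether a problem is now detected."""
        self.recent.append(condition)
        if condition in self.single:
            return True
        recent = tuple(self.recent)
        return any(
            len(recent) >= len(seq) and recent[-len(seq):] == seq
            for seq in self.sequences
        )


detector = ErrorDetector(
    single_errors={"disk_failure"},
    sequences=[("standby_requested", "standby_vetoed")],
)
results = [
    detector.observe("standby_requested"),
    detector.observe("standby_vetoed"),
    detector.observe("disk_failure"),
    detector.observe("device_connected"),
]
```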
  • Once a problem is detected, the computing system 201 performs a functional, result-oriented step for programmatically diagnosing a problem evidenced by the one or more error conditions (step 310). This may include any corresponding acts for accomplishing this result. However, in the illustrated embodiment, this includes corresponding acts 311 through 314.
  • Prior to actually performing programmatic diagnosis by invoking a diagnostics module (act 311), the computing system 201 may consult rules to determine that the diagnostic module should be invoked according to the rules (act 304). Rules may be set by received user input instructions or perhaps by default. Accordingly, the diagnostics policy service 208 is indirectly coupled to the diagnostics module(s) 220 via a monitoring service 212.
  • The monitoring service 212 applies policy to filter which events are propagated up to invoke diagnostic module(s) 220 for root cause determination. Examples of when filtering of such events may be desirable include enterprise environments where an Information Technology (IT) manager or system administrator may prefer that the operating system not automatically perform certain root cause determination and/or problem resolution actions. For example, an IT manager may want to be informed that a problem has occurred, but have no automated root cause analysis or any automated resolution occur. Or, the IT manager may want root cause analysis to occur, but no automated resolution.
  • For example, one action that computing system 201 may undertake in response to determining a root cause problem may be to automatically install an updated device driver. Since updated device drivers may cause unexpected operational changes in some instances, an enterprise's IT manager may input 216 policy 214 to specify that the user may not be enabled or authorized to update device drivers. Another example of a policy that an IT manager may apply to monitoring service 212 is that no automatic problem resolution steps are to be taken. This would allow the user or IT manager to decide whether or not to perform the action instead of having computing system 201 perform the action automatically.
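  • The policy choices described in the preceding paragraphs can be sketched as a small policy object consulted by the monitoring service. The Policy fields and the action name "install_device_driver" are hypothetical and chosen only to mirror the examples in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical representation of policy 214."""
    allow_diagnosis: bool = True
    allow_resolution: bool = True
    blocked_actions: frozenset = frozenset()


class MonitoringService:
    """Hypothetical stand-in for monitoring service 212."""

    def __init__(self, policy: Policy):
        self.policy = policy

    def may_diagnose(self) -> bool:
        return self.policy.allow_diagnosis

    def may_resolve(self, action: str) -> bool:
        # Resolution must be allowed in general and the action must not be blocked.
        return (self.policy.allow_resolution
                and action not in self.policy.blocked_actions)


# An IT manager wants root cause analysis but no automatic driver updates.
no_driver_updates = MonitoringService(
    Policy(blocked_actions=frozenset({"install_device_driver"}))
)
# An IT manager wants to be informed only: no diagnosis and no resolution.
inform_only = MonitoringService(
    Policy(allow_diagnosis=False, allow_resolution=False)
)
```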
  • If its stored policy so permits, the monitoring service 212 invokes 218 an appropriate one of diagnostics module(s) 220 (act 311) when a particular set of one or more error conditions is detected by the diagnostics policy service 208. Alternatively, the appropriate diagnostics module may be invoked directly by the diagnostic policy service 208, or by one of event providers 262 (e.g., in embodiments without a monitoring service 212). The computing system 201 may include multiple diagnostics modules, each for diagnosing the root cause of predetermined error conditions or predetermined sequences of error conditions.
  • When invoked, each diagnostics module is configured to query and correlate 242 relevant data sources to diagnose the problem evidenced by the one or more error conditions (act 312) to determine information about which events and/or state preceded the problem event. Such relevant data sources may include, for example, event trace log 248, a configuration database 252 such as a registry, system compatibility manager 254, WMI providers 256, and other data sources and log files 250.
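  • A query for the events that preceded the problem event might look like the following sketch. It assumes each logged event is a dictionary with a numeric "time" field and that correlation is limited to a fixed time window; both are illustrative assumptions, and events_preceding is a hypothetical helper.

```python
def events_preceding(trace_log, problem_time, window, relevant=lambda e: True):
    """Return relevant events in [problem_time - window, problem_time), oldest first."""
    start = problem_time - window
    return sorted(
        (e for e in trace_log
         if start <= e["time"] < problem_time and relevant(e)),
        key=lambda e: e["time"],
    )


trace_log = [
    {"time": 5.0, "kind": "device_connected", "source": "usb"},
    {"time": 98.0, "kind": "standby_requested", "source": "power"},
    {"time": 99.0, "kind": "driver_veto", "source": "power"},
    {"time": 99.5, "kind": "packet_sent", "source": "network"},
    {"time": 100.0, "kind": "standby_failed", "source": "power"},
]
related = events_preceding(
    trace_log,
    problem_time=100.0,
    window=10.0,
    relevant=lambda e: e["source"] == "power",
)
related_kinds = [e["kind"] for e in related]
```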
  • Depending on the particular operating system implementation, other log files 250 (e.g., network status logs) and other data sources may be queried in addition to, or in lieu of the sources illustrated in the figure.
  • System compatibility manager 254 is a service that receives status and error messages from different subsystems (e.g., the PCI bus subsystem, the USB subsystem, and the AGP subsystem) and other bus drivers and driver stacks in the system regarding known hardware anomalies that require device specific workarounds in order to allow the hardware in question to function properly. Such workarounds may impact how the device functions and may, therefore, end up being the root cause of a problem perceived by the end user. WMI providers expose diagnostic information about hardware devices on the system.
  • The diagnostics module evaluates the results 244 of the query (act 313), and identifies the root cause of the one or more error conditions in response to the evaluation (act 314). This may be accomplished by running a diagnostics routine that corresponds to the error condition(s). Each of at least some of the diagnostics modules (as well as at least some of the resolution modules 224 and the diagnostic policy service 208) may have plug-in capability to allow for more minor modifications of the corresponding diagnostics module. More specifically in one embodiment, the diagnostics module 220 compares the query results 244 with a list of root cause associations. This completes the step for programmatically diagnosing a problem evidenced by the one or more error conditions (step 310).
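  • The comparison of query results with a list of root cause associations can be sketched as a signature lookup. The sketch assumes each association pairs a set of event kinds with a root cause label and that the first matching association wins; the association contents are invented for illustration.

```python
# Hypothetical association list: (required event kinds, root cause label).
ROOT_CAUSE_ASSOCIATIONS = [
    (frozenset({"standby_requested", "driver_veto"}), "driver_blocks_standby"),
    (frozenset({"no_response"}), "disk_unresponsive"),
]


def identify_root_cause(query_results, associations=ROOT_CAUSE_ASSOCIATIONS):
    """Return the first root cause whose signature is covered by the results."""
    observed = {event["kind"] for event in query_results}
    for signature, root_cause in associations:
        if signature <= observed:  # every required kind was observed
            return root_cause
    return None  # no known root cause is associated with these results


known = identify_root_cause(
    [{"kind": "standby_requested"}, {"kind": "driver_veto"}]
)
unknown = identify_root_cause([{"kind": "packet_sent"}])
```

  • A None result corresponds to the case, discussed later in the description, in which the error has no known root cause and is reported for analysis.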
  • If the query results 244 are associated with an identified root cause, the invoked diagnostics module 220 may invoke an appropriate resolution module 224 (act 308) to perform an identified resolution that corresponds to the identified root cause. The identified root cause for a problem is some problem that is known to exist. The queries are specifically made to diagnose whether that problem is present or not. The monitoring service 212 may once again allow for stored policy to determine whether the resolution module should be invoked according to the rules (act 307). Accordingly, the diagnostics modules may first notify 222A the monitoring service 212 of the root cause. If the stored policy so permits, the monitoring service 212 invokes 222B the appropriate resolution module 224. There may also be multiple resolution modules 224, each associated with a different set of one or more root causes. Each resolution module may also have plug-in capability to allow for minor modifications as needed.
  • Each resolution module 224 may be configured with error resolution routines that are invoked by the appropriate diagnostics module subject to the policy in monitoring service 212. Examples of error resolution routines include searching for and/or installing new device drivers, or disabling or reconfiguring conflicting device drivers or applications. In one embodiment, at least some of the routines are performed automatically (i.e., without requiring user input). Some resolution modules, however, may utilize user input that is obtained by invoking 228 a diagnostic user interface module 232 (e.g., a "trouble shooting wizard"). The diagnostic user interface module 232 may be engaged to prompt the user to enter additional information to be used by the appropriate resolution module (or by the computing system as a whole) to attempt to identify or resolve the problem. This may be particularly useful when the root cause of the one or more error conditions cannot be programmatically identified and/or resolved without further user assistance.
  • The interaction between the resolution module(s) 224 and the diagnostic user interface 232 is represented by bi-directional arrow 228A. The interaction between the diagnostics module(s) 220 and the diagnostic user interface 232 is represented by bi-directional arrow 228B. The diagnostic user interface 232 may also allow user interaction 228C with the event providers 262 to modify what events are generated.
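  • The policy-gated hand-off from diagnosis to resolution can be sketched as follows. The function dispatch_resolution, the returned status strings, and the example modules are hypothetical; the sketch assumes one resolution callable per root cause.

```python
def dispatch_resolution(root_cause, resolution_modules, policy_permits):
    """Invoke the resolution module for a root cause if policy allows it."""
    module = resolution_modules.get(root_cause)
    if module is None:
        return "no_resolution_module"
    if not policy_permits(root_cause):
        # The root cause is known, but stored policy forbids automatic resolution.
        return "blocked_by_policy"
    return module()


# Hypothetical resolution modules keyed by the root cause they address.
resolution_modules = {
    "outdated_driver": lambda: "driver_updated",
    "driver_blocks_standby": lambda: "driver_reconfigured",
}


def policy_permits(root_cause):
    # Assumed policy: automatic driver updates are not allowed.
    return root_cause != "outdated_driver"


allowed = dispatch_resolution("driver_blocks_standby", resolution_modules, policy_permits)
blocked = dispatch_resolution("outdated_driver", resolution_modules, policy_permits)
missing = dispatch_resolution("unknown_cause", resolution_modules, policy_permits)
```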
  • A trouble shooter application 264 provides a user interface that allows a user to expressly report a problem to the monitoring service 212, rather than wait for the diagnostic policy service 208 to detect the problem. The diagnostic modules 220 would then diagnose the root cause of the reported problem, followed by the resolution modules 224 resolving the problem.
  • On occasion, it may be advantageous to modify what events are logged, how a diagnostics module diagnoses, or how a resolution module resolves an identified root cause of a problem. For example, perhaps a diagnostics module cannot diagnose a problem based on the logged events, or perhaps a resolution module cannot properly resolve the problem without modification. Accordingly, information from the diagnostic policy service 208, the diagnostics modules 220, the resolution modules 224 and/or the diagnostic user interface module 232 may be conveyed to the activity log 230 for reporting to the error reporting service 238 (act 305). For example, the resolution module 224 reports to the activity log 230 as represented by arrow 226.
  • The activity log 230 may also be displayed to a user to allow the user to view the problems detected, what the diagnosis is, and how the diagnosed problem was resolved. The activity log 230 may also be provided to a remote location to allow technical support to view the relevant facts without having to rely on the user to state the relevant facts. The activity log 230 may also be sent to the error reporting service 238 to assist in forming statistical information regarding what problems are generally occurring on user systems.
  • Update service 240 may be used to send updates to one or more of the modules of computing system 201 to be received by the computing system 201 (act 306). For example, update service 240 may update logger 204 with additional events or event sequences to store to event trace log file 248 to assist in resolving the root cause of a new problem that has been detected by error reporting service 238 or other sources of information regarding failures experienced by users. Update service 240 may also update the diagnostic policy service 208 to change how problems are detected. Update service 240 may also be used to update (change existing modules, provide new modules, or add or modify a plug-in) one or more of diagnostics modules 220 and resolution modules 224 to reflect a new solution that has been determined for a particular problem. In one embodiment, update service 240 is operated by vendor computing system 236 and transmits updates to the modules of computing system 201 via the Internet. Alternatively, a third party could provide custom changes or entirely new modules and configuration information.
  • If the error event does not have a known root cause associated with it, diagnostics module 220 will report this information to the activity log 230, which in turn sends an error report 234 to error reporting service 238.
  • If the vendor is able to determine the root cause from the information sent by activity log 230, the root cause association information and corresponding problem resolution information is sent to the computing system 201 via update service 240. If the vendor is unable to determine the root cause, the vendor may use update service 240 to instruct the logger 204 to store additional event or state information to event trace log file 248. The resolution module 224 may likewise instruct 260 the logger to store additional events in order to ensure that proper resolution is achieved. When the additional information is transmitted to the error reporting service 238 after the next occurrence of the problem, the additional information may enable the vendor to better identify the root cause of the problem.
  • The error report 234 may be sent even before a known root cause is diagnosed. Reporting the error at this early stage allows the update service 240 to update the diagnostic modules 220 and/or resolution modules 224 prior to attempting diagnosis and resolution. Alternatively, the error may be reported 234 after diagnosis, but before resolution. In that case, the update service 240 may update the specific resolution module dedicated to resolving the specifically diagnosed problem.
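  • The two kinds of vendor response described above can be sketched as updates applied to local state. The update dictionary layout, the "root_cause" and "log_more" type names, and the apply_update function are all hypothetical; the specification does not define an update format.

```python
def apply_update(system, update):
    """Apply one update received from the update service to local state."""
    if update["type"] == "root_cause":
        # The vendor found the root cause: install the association and its resolution.
        system["associations"][update["signature"]] = update["root_cause"]
        system["resolutions"][update["root_cause"]] = update["resolution"]
    elif update["type"] == "log_more":
        # The vendor needs more data: widen the set of event kinds that are logged.
        system["logged_kinds"].update(update["kinds"])
    else:
        raise ValueError(f"unknown update type: {update['type']}")


system = {
    "associations": {},            # event signature -> root cause
    "resolutions": {},             # root cause -> resolution name
    "logged_kinds": {"error"},     # event kinds written to the trace log
}
apply_update(system, {"type": "log_more", "kinds": ["driver_veto", "power_state"]})
apply_update(system, {
    "type": "root_cause",
    "signature": frozenset({"standby_requested", "driver_veto"}),
    "root_cause": "driver_blocks_standby",
    "resolution": "reconfigure_driver",
})
```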
  • The diagnostic policy service 208, the diagnostic modules 220, and the resolution module 224 may be configured to report their activity to activity log 230 (e.g., an error was detected, a diagnostic module was invoked, the diagnostic module took certain steps, the root cause was found and is this, the root cause could not be determined, a resolution module was invoked, the resolution module took these steps, the problem was resolved, the problem was not resolved, and the like). This provides information to the vendor regarding whether the system is diagnosing problems and whether or not the problems are being resolved. This information may be valuable to the vendor because it may be used to determine if either the diagnostics module 220 or resolution module 224 needs to be updated. The information may also be useful for the vendor in understanding which problems are most pervasive to the end user, thereby allowing the vendor to act on that information. For example, the vendor might respond by creating a new architecture to avoid the pervasive problems in the future.
  • As noted above, one example of an update that may be sent via update service 240 is a new problem resolution for resolution module 224. If a particular type of error has been identified, but the root cause is difficult to determine, an update may instruct an event provider or the logger to store more event information to event trace log file 248. This will allow diagnostics module 220 to send more detailed information to activity log 230, which sends the more detailed information to error reporting service 238. The additional information is likely to assist the vendor computing system 236 in determining the root cause and solution to the problem. In turn, new diagnostic and resolution modules can be downloaded to address the problem.
  • Accordingly, a mechanism is described that programmatically diagnoses and resolves problems subject to internal policy constraints. Furthermore, the mechanism updates itself as needed in order to better diagnose the root cause of error condition(s), and resolve the root cause.
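The reporting and updating loop described in the preceding paragraphs can be illustrated with a minimal Python sketch. It is not the implementation disclosed here: the class names (`Update`, `ComputingSystem`), the function `vendor_analyze`, and the event and root-cause names (`disk_io`, `failing_disk`, and so on) are all hypothetical, and the vendor-side rule is deliberately trivial. The sketch only shows how an error report for an unknown root cause may lead first to an update that widens logging and then to an update that supplies a root-cause association and a resolution.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Update:
    """Hypothetical payload delivered by an update service."""
    extra_events: Set[str] = field(default_factory=set)        # more events to log
    root_causes: Dict[str, str] = field(default_factory=dict)  # error -> root cause
    resolutions: Dict[str, str] = field(default_factory=dict)  # root cause -> fix


@dataclass
class ComputingSystem:
    """Toy stand-in for the client computing system."""
    logged_events: Set[str]  # events the logger currently records
    root_causes: Dict[str, str] = field(default_factory=dict)
    resolutions: Dict[str, str] = field(default_factory=dict)
    activity_log: List[str] = field(default_factory=list)

    def handle_error(self, error: str) -> Optional[dict]:
        """Diagnose an error; return an error report if the cause is unknown."""
        self.activity_log.append(f"error detected: {error}")
        cause = self.root_causes.get(error)
        if cause is None:
            self.activity_log.append("root cause could not be determined")
            # The report carries whatever events are currently being logged.
            return {"error": error, "events": sorted(self.logged_events)}
        self.activity_log.append(f"root cause found: {cause}")
        fix = self.resolutions.get(cause)
        self.activity_log.append(
            f"resolved with: {fix}" if fix else "problem was not resolved")
        return None

    def apply_update(self, update: Update) -> None:
        """Merge an update into the logger and module configuration."""
        self.logged_events |= update.extra_events
        self.root_causes.update(update.root_causes)
        self.resolutions.update(update.resolutions)


def vendor_analyze(report: dict) -> Update:
    """Hypothetical vendor rule: a 'disk_io' event is needed to find the cause."""
    if "disk_io" not in report["events"]:
        return Update(extra_events={"disk_io"})  # ask the logger for more detail
    return Update(root_causes={report["error"]: "failing_disk"},
                  resolutions={"failing_disk": "schedule_disk_check"})


system = ComputingSystem(logged_events={"boot"})
first = system.handle_error("slow_startup")    # unknown cause, so a report is produced
system.apply_update(vendor_analyze(first))     # vendor asks for "disk_io" events
second = system.handle_error("slow_startup")   # still unknown, but the report is richer
system.apply_update(vendor_analyze(second))    # vendor ships root cause and resolution
third = system.handle_error("slow_startup")    # now diagnosed and resolved locally
```

In this sketch the activity log doubles as the record of what the diagnostic and resolution steps did, which is the kind of information the vendor could use to decide whether a module needs updating.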

Claims (30)

  1. A method for programmatically diagnosing the root cause of a problem in a computing system that executes an operating system, the method comprising the following:
    an act of generating events within an operating system;
    an act of logging at least a subset of the events to a log file;
    an act of detecting one or more error conditions;
    an act of invoking a diagnostic module in response to the act of detecting one or more error conditions, wherein the diagnostic module is configured to do the following when invoked:
    an act of querying the log file to correlate events relevant to diagnosis of the problem evidenced by the one or more error conditions;
    an act of evaluating the results of the query; and
    an act of identifying the root cause of the one or more error conditions in response to the evaluation.
  2. A method in accordance with Claim 1, further comprising the following after the act of detecting one or more error conditions:
    an act of consulting rules to determine that the diagnostic module should be invoked according to the rules.
  3. A method in accordance with Claim 2, further comprising the following:
    an act of receiving user input to set the rules.
  4. A method in accordance with one of Claims 1 to 3, further comprising the following:
    an act of sending at least a subset of the results of the query to an error reporting service.
  5. A method in accordance with Claim 4, further comprising the following:
    an act of receiving one or more updates, wherein the updates modify which events are logged.
  6. A method in accordance with Claim 5, wherein the updates further alter how the diagnostic module diagnoses.
  7. A method in accordance with Claim 4, further comprising the following:
    an act of receiving one or more updates, wherein the updates alter how the diagnostics module diagnoses.
  8. A method in accordance with one of Claims 1 to 7, further comprising the following:
    an act of invoking a resolution module in response to the act of identifying the root cause of the one or more error conditions, the resolution module configured to do the following when invoked:
    an act of resolving the root cause of the one or more error conditions.
  9. A method in accordance with Claim 8, further comprising the following after the act of detecting one or more error conditions:
    an act of consulting rules to determine that the resolution module should be invoked according to the rules.
  10. A method in accordance with Claim 9, further comprising the following:
    an act of receiving user input to set the rules.
  11. A method in accordance with Claim 8, further comprising the following:
    an act of sending at least a subset of the results of the query to an error reporting service.
  12. A method in accordance with Claim 11, further comprising the following:
    an act of receiving one or more updates, wherein the updates modify which events are logged.
  13. A method in accordance with Claim 12, wherein the updates further alter how the diagnostic module diagnoses.
  14. A method in accordance with Claim 13, wherein the updates further alter how the resolution module resolves.
  15. A method in accordance with Claim 11, wherein the updates alter how the diagnostic module diagnoses.
  16. A method in accordance with Claim 15, wherein the updates further alter how the resolution module resolves.
  17. A method in accordance with Claim 11, wherein the updates alter how the resolution module resolves.
  18. A method in accordance with Claim 17, wherein the updates further alter which events are logged.
  19. A method in accordance with Claim 4, further comprising the following:
    an act of receiving one or more updates, wherein the updates alter how the diagnostics module diagnoses.
  20. A method in accordance with one of Claims 1 to 19, further comprising the following:
    an act of determining that the root cause of the one or more error conditions cannot be programmatically resolved; and
    an act of engaging a user interface module to prompt the user to enter additional information to be used by the diagnostic or resolution module to attempt to identify or resolve the problem.
  21. A method in accordance with one of Claims 1 to 20, wherein the user interface module is a trouble shooting wizard.
  22. A method for programmatically diagnosing the root cause of a problem in a computing system that executes an operating system, the method comprising the following:
    an act of generating events within an operating system;
    an act of logging at least a subset of the events to a log file;
    an act of detecting one or more error conditions; and
    a step for programmatically diagnosing a problem evidenced by the one or more error conditions.
  23. A method in accordance with Claim 22, wherein the step for programmatically diagnosing a problem evidenced by the one or more error conditions comprises the following:
    an act of invoking a diagnostic module in response to the act of detecting one or more error conditions, wherein the diagnostic module is configured to do the following when invoked:
    an act of querying the log file to correlate events relevant to diagnosis of the problem evidenced by the one or more error conditions; and
    an act of evaluating the results of the query; and
    an act of identifying the root cause of the one or more error conditions in response to the evaluation.
  24. A method for determining the root cause of a problem in a computing system that executes an operating system and that is network connected to an error reporting service, the method comprising the following:
    an act of generating events within an operating system;
    an act of logging at least a subset of the events to a log file;
    an act of detecting one or more error conditions, and in response thereto:
    an act of querying the log file to correlate relevant events,
    an act of sending at least a subset of the results of the query to an error reporting service; and
    an act of receiving one or more updates, wherein the updates modify which events are logged, what diagnostic steps are to be taken, or what resolution steps are taken by the computer system's operating system or recommended to the end user.
  25. A computer program product for use in a computing system, the computer program product comprising one or more computer-readable media having thereon computer-executable instructions that, when executed by one or more processors of the computing system, cause the computing system to perform the method of one of claims 1 to 24.
  26. A computer program product in accordance with Claim 25, wherein the one or more computer-readable media are physical memory media.
  27. A computer-readable media having thereon computer-executable instructions that, when executed by one or more processors of a computing system, cause the computing system to instantiate in memory the following:
    an event logger configured to log events in a log file;
    a problem detection module configured to detect a problem when one or more error conditions occur, and configured to cause a resolution module to be invoked in at least some circumstances when the problem is detected; and
    a diagnosis module configured to query the log file, evaluate the results of the query, and diagnose the problem based on the evaluation.
  28. A computer-readable media in accordance with Claim 27, further having thereon computer-executable instructions that, when executed by the one or more processors cause the computing system to further instantiate in memory the following:
    a monitoring module that maintains rules regarding when the diagnosis module should be invoked, wherein the monitoring module causes the diagnosis module to be invoked in response to the problem detection module detecting the problem if the rules so permit.
  29. A computer-readable media in accordance with Claim 27, further having thereon computer-executable instructions that, when executed by the one or more processors cause the computing system to further instantiate in memory the following:
    a resolution module configured to resolve the problem when invoked, wherein the diagnosis module is further configured to cause the resolution module to be invoked in at least some circumstances when the diagnosis module diagnoses the problem.
  30. A computer-readable media in accordance with Claim 29, further having thereon computer-executable instructions that, when executed by the one or more processors cause the computing system to further instantiate in memory the following:
    a monitoring module that maintains rules regarding when the resolution module should be invoked, wherein the monitoring module causes the resolution module to be invoked in response to the diagnosis module diagnosing the problem if the rules so permit.
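
As an informal illustration only, and with no bearing on the scope of the claims, the acts recited in Claims 1, 2, 8 and 9 (logging a subset of events, detecting an error condition, consulting rules, querying the log, identifying a root cause, and invoking a resolution module) might be sketched in Python as follows. The names `Event`, `EventLog`, `diagnose_disk` and `on_error`, the 60-second correlation window, and the threshold of three retries are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Event:
    timestamp: float
    source: str
    name: str


class EventLog:
    """In-memory stand-in for the log file."""

    def __init__(self, keep: Callable[[Event], bool]):
        self._keep = keep                  # selects the logged subset of events
        self._events: List[Event] = []

    def log(self, event: Event) -> None:
        if self._keep(event):
            self._events.append(event)

    def query(self, source: str, start: float, end: float) -> List[Event]:
        """Return logged events from one source within a time window."""
        return [e for e in self._events
                if e.source == source and start <= e.timestamp <= end]


Diagnostic = Callable[[EventLog, float], Optional[str]]


def diagnose_disk(log: EventLog, error_time: float) -> Optional[str]:
    """Hypothetical diagnostic module: correlate disk retries before the error."""
    recent = log.query("disk", error_time - 60.0, error_time)
    retries = sum(1 for e in recent if e.name == "io_retry")
    return "failing_disk" if retries >= 3 else None


def on_error(error: str, error_time: float, log: EventLog,
             rules: Dict[str, bool],
             diagnostics: Dict[str, Diagnostic],
             resolutions: Dict[str, Callable[[], str]]) -> str:
    """Detect, consult rules, diagnose, then resolve if the rules permit."""
    if not rules.get("allow_diagnosis", False):
        return "diagnosis not permitted by rules"
    module = diagnostics.get(error)
    cause = module(log, error_time) if module else None
    if cause is None:
        return "root cause could not be determined"
    if not rules.get("allow_resolution", False) or cause not in resolutions:
        return f"root cause: {cause} (not resolved)"
    return f"root cause: {cause}; {resolutions[cause]()}"


log = EventLog(keep=lambda e: e.source in {"disk", "net"})
for t in (100.0, 110.0, 120.0):
    log.log(Event(t, "disk", "io_retry"))
log.log(Event(115.0, "ui", "click"))       # not in the logged subset, so dropped

outcome = on_error(
    "slow_read", 130.0, log,
    rules={"allow_diagnosis": True, "allow_resolution": True},
    diagnostics={"slow_read": diagnose_disk},
    resolutions={"failing_disk": lambda: "scheduled disk check"},
)
```

Keeping the rules as a separate dictionary mirrors the separation between the module that maintains rules and the diagnosis and resolution modules; a user-facing settings screen could populate that dictionary.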

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US46877203P 2003-05-07 2003-05-07
US468772P 2003-05-07
US651430 2003-08-30
US10/651,430 US7263632B2 (en) 2003-05-07 2003-08-30 Programmatic computer problem diagnosis and resolution and automated reporting and updating of the same

Publications (2)

Publication Number Publication Date
EP1515234A2 true EP1515234A2 (en) 2005-03-16
EP1515234A3 EP1515234A3 (en) 2007-08-08

Family

ID=33423772

Family Applications (1)

Application Number Title Priority Date Filing Date
EP04010605A Withdrawn EP1515234A3 (en) 2003-05-07 2004-05-04 Programatic computer diagnosis and resolution and automated reporting and updating of the same

Country Status (5)

Country Link
US (1) US7263632B2 (en)
EP (1) EP1515234A3 (en)
JP (1) JP2004334869A (en)
KR (1) KR101021394B1 (en)
CN (1) CN100412802C (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000039674A1 (en) * 1998-12-31 2000-07-06 Computer Associates Think, Inc. System and method for dynamic correlation of events
WO2003005200A1 (en) * 2001-07-06 2003-01-16 Computer Associates Think, Inc. Method and system for correlating and determining root causes of system and enterprise events
US6550055B1 (en) * 1998-12-29 2003-04-15 Intel Corp. Method and apparatus for cheating an information report resulting from a diagnostic session

Family Cites Families (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6822553B1 (en) * 1985-10-16 2004-11-23 Ge Interlogix, Inc. Secure entry system with radio reprogramming
US5200958A (en) * 1990-09-28 1993-04-06 Xerox Corporation Method and apparatus for recording and diagnosing faults in an electronic reprographic printing system
US5245615A (en) * 1991-06-06 1993-09-14 International Business Machines Corporation Diagnostic system and interface for a personal computer
US5463768A (en) * 1994-03-17 1995-10-31 General Electric Company Method and system for analyzing error logs for diagnostics
JP2868114B2 (en) * 1994-06-07 1999-03-10 三菱電機株式会社 Computer with monitoring and diagnostic function
EP0690378A1 (en) * 1994-06-30 1996-01-03 Tandem Computers Incorporated Tool and method for diagnosing and correcting errors in a computer programm
US5495573A (en) * 1994-08-05 1996-02-27 Unisys Corporation Error logging system with clock rate translation
JPH08305595A (en) * 1995-05-11 1996-11-22 Mitsubishi Electric Corp Diagnostic method for information processing unit
US5884073A (en) * 1996-10-28 1999-03-16 Intel Corporation System and method for providing technical support of an electronic system through a web bios
US6247149B1 (en) * 1997-10-28 2001-06-12 Novell, Inc. Distributed diagnostic logging system
US6212653B1 (en) * 1998-02-18 2001-04-03 Telefonaktiebolaget Lm Ericsson (Publ) Logging of events for a state driven machine
JP3141856B2 (en) * 1998-09-30 2001-03-07 日本電気株式会社 Failure recovery assisting method, apparatus therefor, and machine-readable recording medium recording a program
US6467052B1 (en) * 1999-06-03 2002-10-15 Microsoft Corporation Method and apparatus for analyzing performance of data processing system
EP1065619A2 (en) * 1999-06-29 2001-01-03 General Electric Company Method and apparatus for automatically guiding a user through a medical diagnostic system service workflow
CN1283029A (en) * 1999-07-29 2001-02-07 神基科技股份有限公司 Remote system diagnosis method
US6615367B1 (en) * 1999-10-28 2003-09-02 General Electric Company Method and apparatus for diagnosing difficult to diagnose faults in a complex system
US7500143B2 (en) * 2000-05-05 2009-03-03 Computer Associates Think, Inc. Systems and methods for managing and analyzing faults in computer networks
JP2002049508A (en) * 2000-05-29 2002-02-15 Mirae E Net Co Ltd Method for diagnosis of computer, system through internet
US7043661B2 (en) * 2000-10-19 2006-05-09 Tti-Team Telecom International Ltd. Topology-based reasoning apparatus for root-cause analysis of network faults
JP3979000B2 (en) * 2000-11-27 2007-09-19 株式会社日立製作所 Log information acquisition / output method and recording medium storing program for realizing the method
US6738933B2 (en) * 2001-05-09 2004-05-18 Mercury Interactive Corporation Root cause analysis of server system performance degradations
US7120685B2 (en) * 2001-06-26 2006-10-10 International Business Machines Corporation Method and apparatus for dynamic configurable logging of activities in a distributed computing system
US6738832B2 (en) * 2001-06-29 2004-05-18 International Business Machines Corporation Methods and apparatus in a logging system for the adaptive logger replacement in order to receive pre-boot information
US7065767B2 (en) * 2001-06-29 2006-06-20 Intel Corporation Managed hosting server auditing and change tracking
US7194445B2 (en) * 2002-09-20 2007-03-20 Lenovo (Singapore) Pte. Ltd. Adaptive problem determination and recovery in a computer system
US7516362B2 (en) * 2004-03-19 2009-04-07 Hewlett-Packard Development Company, L.P. Method and apparatus for automating the root cause analysis of system failures
US7398429B2 (en) * 2005-02-03 2008-07-08 Cisco Technology, Inc. System and method for tracing and logging for software module

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6550055B1 (en) * 1998-12-29 2003-04-15 Intel Corp. Method and apparatus for cheating an information report resulting from a diagnostic session
WO2000039674A1 (en) * 1998-12-31 2000-07-06 Computer Associates Think, Inc. System and method for dynamic correlation of events
WO2003005200A1 (en) * 2001-07-06 2003-01-16 Computer Associates Think, Inc. Method and system for correlating and determining root causes of system and enterprise events

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102004047363A1 (en) * 2004-09-29 2006-03-30 Siemens Ag Processor or method for operating a processor and / or operating system in the event of a fault
WO2015026680A1 (en) * 2013-08-19 2015-02-26 Microsoft Corporation Cloud deployment infrastructure validation engine
US9471474B2 (en) 2013-08-19 2016-10-18 Microsoft Technology Licensing, Llc Cloud deployment infrastructure validation engine

Also Published As

Publication number Publication date
CN1550989A (en) 2004-12-01
CN100412802C (en) 2008-08-20
KR20040095682A (en) 2004-11-15
US7263632B2 (en) 2007-08-28
KR101021394B1 (en) 2011-03-14
EP1515234A3 (en) 2007-08-08
JP2004334869A (en) 2004-11-25
US20040225381A1 (en) 2004-11-11

Similar Documents

Publication Publication Date Title
US7263632B2 (en) Programmatic computer problem diagnosis and resolution and automated reporting and updating of the same
US7028175B2 (en) System and method for computer hardware identification
KR102268355B1 (en) Cloud deployment infrastructure validation engine
EP2427822B1 (en) Exception raised notification
US7877642B2 (en) Automatic software fault diagnosis by exploiting application signatures
US7788537B1 (en) Techniques for collecting critical information from a memory dump
US7523340B2 (en) Support self-heal tool
US20070067754A1 (en) Server application state
US20050203952A1 (en) Tracing a web request through a web server
US20080126887A1 (en) Method and system for site configurable error reporting
US20080320336A1 (en) System and Method of Client Side Analysis for Identifying Failing RAM After a User Mode or Kernel Mode Exception
CN112199284A (en) Program automation testing method and corresponding device, equipment and medium
JP5425720B2 (en) Virtualization environment monitoring apparatus and monitoring method and program thereof
US10846206B2 (en) Adaptive software testing
WO2005103915A2 (en) Method for collecting monitor information
JP5840290B2 (en) Software operability service
US11443011B2 (en) Page objects library
EP3674903B1 (en) Mobile device with secondary debug display
Cisco Release Notes for Cisco Element Management Framework v3.2
US20110179428A1 (en) Self-testable ha framework library infrastructure
EP1274013A1 (en) Process monitor module
CN116974836A (en) Vehicle-mounted USB non-response detection method, device, equipment and medium
CN115543672A (en) Error data positioning method, device, equipment and storage medium
CN114546428A (en) Cuckoo cluster deployment method, cuckoo cluster-based detection method and device
CN116126703A (en) Debugging method, debugging device and readable storage device

Legal Events

Date Code Title Description
PUAI Public reference made under Article 153(3) EPC to a published international application that has entered the European phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PL PT RO SE SI SK TR

AX Request for extension of the European patent

Extension state: AL HR LT LV MK

PUAL Search report despatched

Free format text: ORIGINAL CODE: 0009013

AK Designated contracting states

Kind code of ref document: A3

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PL PT RO SE SI SK TR

AX Request for extension of the European patent

Extension state: AL HR LT LV MK

RTI1 Title (correction)

Free format text: PROGRAMATIC COMPUTER DIAGNOSIS AND RESOLUTION AND AUTOMATED REPORTING AND UPDATING OF THE SAME

17P Request for examination filed

Effective date: 20071217

17Q First examination report despatched

Effective date: 20080218

AKX Designation fees paid

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PL PT RO SE SI SK TR

STAA Information on the status of an EP patent application or granted EP patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

APBK Appeal reference recorded

Free format text: ORIGINAL CODE: EPIDOSNREFNE

APBN Date of receipt of notice of appeal recorded

Free format text: ORIGINAL CODE: EPIDOSNNOA2E

APBR Date of receipt of statement of grounds of appeal recorded

Free format text: ORIGINAL CODE: EPIDOSNNOA3E

APAF Appeal reference modified

Free format text: ORIGINAL CODE: EPIDOSCREFNE

APAX Date of receipt of notice of appeal deleted

Free format text: ORIGINAL CODE: EPIDOSDNOA2E

APAZ Date of receipt of statement of grounds of appeal deleted

Free format text: ORIGINAL CODE: EPIDOSDNOA3E

APBK Appeal reference recorded

Free format text: ORIGINAL CODE: EPIDOSNREFNE

APBN Date of receipt of notice of appeal recorded

Free format text: ORIGINAL CODE: EPIDOSNNOA2E

APBR Date of receipt of statement of grounds of appeal recorded

Free format text: ORIGINAL CODE: EPIDOSNNOA3E

APAF Appeal reference modified

Free format text: ORIGINAL CODE: EPIDOSCREFNE

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: ROVI TECHNOLOGIES CORPORATION

APBT Appeal procedure closed

Free format text: ORIGINAL CODE: EPIDOSNNOA9E

STAA Information on the status of an EP patent application or granted EP patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Effective date: 20210707