WO2001015004A2 - Service bureau architecture

Info

Publication number: WO2001015004A2
Application number: PCT/US2000/023355
Authority: WIPO (PCT)
Other versions: WO2001015004A8 (en)
Other languages: French (fr)
Prior art keywords: content, information, sales, document, metadata
Inventors: Steven Parkes, Ken Kubiak, Michael Peercey, John Chandy
Original assignee: Cma Business Credit Services
Priority application: AU68010/00A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/02: Marketing; Price estimation or determination; Fundraising

Definitions

  • the invention relates to an electronic sales support system. More particularly, the invention relates to a service bureau architecture.
  • Information appliances and the Internet are revolutionizing the buying and selling process. While their primary impact so far has been felt in the retail distribution of branded, commodity products, there is great potential to leverage these technologies to improve the business-to-business sales process for more sophisticated goods and services. In particular, the complexity and rapid change characteristic in industries such as telecommunications, high technology, and financial services make them ripe for the application of innovative Internet technologies.
  • the underlying problem is not that the sales information does not exist. Marketing generates gigabytes of Word documents, PowerPoint presentations, and e-mails, but sales for the most part is not able to take advantage of these efforts. The reason for this is that the information is not in a form that is readily accessible and guaranteed to be accurate and up-to-date.
  • Sales pros and organizations are at a growing disadvantage in this environment.
  • sales reps move to close by answering an unpredictable range of issues and questions from the prospect.
  • Most sales organizations have reams of information for their sales reps and channels, but lack information systems that quickly provide the exact information required to close deals. Information quickly falls out of date. Sellers cannot get immediate access to the precise information they need to compete and win, much less add value to the customer's buying process.
  • Forrester Research calls this sales information gap the most important challenge for companies in the Internet era, and predicts the rise of a new generation of systems to solve it.
  • Harnessing unstructured information for internal and external users is a competitive imperative that few organizations are prepared to meet. Companies do not have the time, infrastructure, tools, and process support to solve the sales information problem by themselves. It would be advantageous to provide a sales and marketing information exchange, e.g. an Internet e-service that lets direct sales reps, telesales, and channel partners zero in on the precise information needed to motivate prospects to close deals, with the assurance they have the latest, most accurate information available.
  • the invention provides a service bureau architecture.
  • the invention provides a personal sales information channel that helps the sales rep quickly find the right document, news article, presentation, competitive analysis, or customer reference to answer prospect and customer issues and keep deals moving.
  • the business partner, the telesales rep, and even prospects and customers see content tailored to their individual needs.
  • Behind each individual's channel is a carefully organized and maintained custom information space containing only relevant marketing and sales information drawn from both inside and outside of the company. From this common base of information, organizations can provide all of their key sales and marketing constituents with quick access to the information that drives sales execution.
  • the preferred embodiment of the invention provides a hosted Internet e-service, which comprises a sales and marketing information exchange that equips direct sales reps, telesales, indirect channels, and channel partners with the precise information needed to develop and close deals.
  • the information exchange organizes information for search, navigation, and delivery to a variety of audiences from a common base.
  • Such information exchange is devoted to sales and marketing information to help sales pros and others in a company's sales channel zero in on the perfect information for each sales situation, drawing on sources inside and outside of the company. This ensures that the information accessible to these sales channel participants is accurate, relevant, and targeted.
  • Such information exchange also fosters creation of better sales and marketing information by supporting collaboration among those in the sales process and between sales and marketing.
  • Information exchanges in accordance with the invention are interactive, and employ collaboration, usage tracking, and other techniques to ensure that information is current and relevant.
  • for sales and marketing, such an information exchange is a vital element of business-to-business Internet commerce, automating information exchange between trading partners, suppliers and customers, suppliers and partners, and other commerce arrangements.
  • the companies in a demand chain, for example, can each set up such information exchanges and automate the movement of new product and competitive information from primary manufacturer to value-added distributor to customer, with requirements following a return route.
  • the inventive information exchange enforces appropriate constraints on information flows.
  • the invention provides a sales and marketing information exchange tailored to individual companies and accessible as an Internet e-service.
  • the invention organizes and categorizes a company's sales and marketing content, integrates it with new information from third parties, and facilitates the exchange of information to motivate closes and generally shorten sales cycles.
  • the invention allows participants in direct sales, inside sales, partner sales, marketing, and other channel participants to reduce significantly the time they spend looking for information and, instead, focus on developing and closing the deal.
  • the invention promotes an efficient, productive relationship between marketing and sales and supports collaboration and teamwork among internal sales reps, sales and marketing, and between personnel and channel partners.
  • organizations can use the invention to promote teamwork with customers and prospects.
  • the invention comprises a delivery platform that employs three major components:
  • Context maps which allow users to find quickly the precise and relevant information needed to close deals.
  • the context maps are based on expertise in sales information organization, process, and provision.
  • Information resource tools which allow users to maintain and provide rich, easy to manage sales information exchanges.
  • Sales information channels which allow organizations to target multiple audiences from a single base of sales and marketing information. The result is strong sales partnerships and communities of interest.
  • Fig. 1 is a block diagram of a service bureau architecture according to the invention;
  • FIGs. 2a and 2b are block diagrams that compare the state of the art (Fig. 2a) to an architecture according to the invention (Fig. 2b);
  • Fig. 3 is a diagram showing multidimensional navigation according to the invention.
  • Fig. 4 is a block diagram showing ICE-enabled content exchange according to the invention.
  • Fig. 5 is a flow diagram of a system architecture according to the invention.
  • Fig. 6 is another block diagram of system architecture according to the invention.
  • Fig. 7 is a further block diagram of the system architecture according to the invention.
  • Fig. 8 is an example of an assisted publishing application according to the invention.
  • Fig. 9 is an example of an information structure according to the invention.
  • one aspect of the invention delivers highly targeted sales information E-services. These services are delivered as separately packaged modules that focus on a specific pain point and can be introduced individually or in combination with a phased implementation approach.
  • Context maps are the vehicle by which the invention embeds marketing/sales domain categories, as well as an understanding of existing business rules, job functions, and terminology for targeted vertical markets. By pre-populating context maps, the invention offers significant value before a customer enters any information about their specific business.
  • the invention's E-services are designed to support two groups of information users: producers and consumers. Although anyone can assume the role of a producer, a consumer, or both, the primary producers are product marketing and corporate marketing professionals. The primary consumers are sales professionals.
  • Figure 1 provides a summary view of the system and Table 1 lists the major feature categories in the solution.
  • Table 1 Feature Categories
  • the invention provides the next level of expressiveness needed to solve the sales information glut while leveraging the strengths of RDBMS technology.
  • the key is to capture significantly more metadata — that is, information about content.
  • the description logics engine, also known as the category engine, enables extremely fast retrieval of useful sales information from a relational database.
  • Ontologies or roadmaps of sales information organize content in a format that mirrors the way sales professionals think, but is able to be processed directly by a computer.
  • ICE: Information and Content Exchange.
  • Figure 2 compares the design of the invention to existing Web-based database applications.
  • the category engine 20 implements mechanisms for enabling powerful, high-performance querying of content based upon resource description framework (RDF)-specified categories and attributes.
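  • To make the category engine's use of RDF concrete, the following sketch (illustrative only; the vocabulary URIs, property names, and the use of the open-source rdflib library are assumptions, not details from the patent) shows category metadata attached to one item of sales content and a simple category query over it.

```python
# Hypothetical sketch: RDF-style category metadata for a sales document and a
# category-engine-like query, using the open-source rdflib library.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DC, RDF

SALES = Namespace("http://example.com/sales-schema#")  # assumed vocabulary

g = Graph()
doc = URIRef("http://example.com/content/competitive-brief-17")
g.add((doc, RDF.type, SALES.CompetitiveAnalysis))
g.add((doc, DC.title, Literal("Competitive brief: Acme vs. Widgets Inc.")))
g.add((doc, SALES.industry, Literal("telecommunications")))
g.add((doc, SALES.salesStage, Literal("closing")))

# Category-style retrieval: every competitive analysis for a given industry.
results = g.query("""
    PREFIX sales: <http://example.com/sales-schema#>
    PREFIX dc:    <http://purl.org/dc/elements/1.1/>
    SELECT ?doc ?title WHERE {
        ?doc a sales:CompetitiveAnalysis ;
             dc:title ?title ;
             sales:industry "telecommunications" .
    }
""")
for row in results:
    print(row.doc, row.title)
```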
  • Full-text indexing: full-text queries have proven to be of limited use in the sales information arena.
  • Full-text indexing technologies are inherently statistical. That is, indexing engines guess what set of documents relate closest to the words the user entered to locate the documents of interest. The result is generally a very long list that the user must plod through looking for something that is truly relevant.
  • due to the nature of sales information, coming up with a set of words that results in a list of relevant materials is extremely hard, if not impossible, and generally strains both the technical abilities and the patience of sales personnel.
  • Static hierarchies: folder-like, hierarchical management structures have proven very popular in the Internet portal marketplace with such companies as Yahoo and Excite@Home. However, this structure again fails in the sales information context. While directories have proven adequate when the individual item cataloged is a Web site, they grow unwieldy when the cataloged item is a document, a section of a document, or a sound bite. When used to catalog tens of thousands of items, a directory must either have thousands of folders or have individual folders with hundreds of entries. Neither of these solutions is useful to the sales professional trying to identify the small set of documents that enable him to make the most compelling case to the prospect.
  • the invention provides virtually limitless navigation possibilities, allowing the user to select the navigation that coincides with his intuition in a specific sales scenario.
  • the use of context maps for capturing and describing sales information is a second technology competitive advantage.
  • the context maps use the World Wide Web Consortium's (W3C) RDF.
  • the W3C metadata committee realized some time ago that simple HTML hyperlink technology, directories, and the full-text indexes of Internet portals are not adequate for locating information on the Web. To meet the challenge, they developed a description language with the goal of encouraging the development of new technologies for locating information.
  • RDF is based on XML and, together with extending HTML to provide a richer display model, is one of the primary reasons XML was developed.
  • RDF provides a language capable of describing content with terminology that:
  • RDF has found broad industry support.
  • Context maps, and their underlying RDF definitions, provide another key benefit of enabling application logic to be implemented outside of an RDBMS. This capability allows organizations to modify and extend their company-specific context maps using a visual configurator without having to change the underlying application.
  • the invention uses the Information and Content Exchange (ICE) protocol to collect and distribute information from various sources, both internal and external.
  • an organization can use ICE to import content automatically from a document management system, such as Documentum.
  • an organization may also use ICE to import content automatically from a news wire service such as PRNewswire or from Web sites using solutions such as Vignette's StoryServer.
  • the invention supports distributing content via ICE to other ICE-enabled systems. This is used to publish content automatically to a Web site created using Vignette's StoryServer and to other content aggregators that support ICE.
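  • As a rough illustration of how such an exchange might look from the subscriber's side, the sketch below builds a simplified content-pull request and reads back the item references. The element names, URL, and subscription identifiers are schematic assumptions and do not reproduce the ICE specification or the patent's implementation.

```python
# Illustrative only: a simplified ICE-style content pull over HTTP.
import urllib.request
import xml.etree.ElementTree as ET

def build_package_request(subscriber_id: str, subscription_id: str) -> bytes:
    # Schematic payload; real ICE payloads carry additional header elements.
    payload = ET.Element("ice-payload", {"payload-id": "req-001"})
    request = ET.SubElement(payload, "ice-request", {"requestor-id": subscriber_id})
    ET.SubElement(request, "get-package", {"subscription-id": subscription_id})
    return ET.tostring(payload, encoding="utf-8")

def pull_content(syndicator_url: str) -> list[str]:
    body = build_package_request("distributor-41", "sales-collateral")
    req = urllib.request.Request(syndicator_url, data=body,
                                 headers={"Content-Type": "text/xml"})
    with urllib.request.urlopen(req) as resp:
        package = ET.fromstring(resp.read())
    # Each returned content item is assumed to reference the actual document.
    return [item.get("url") for item in package.iter("content-item")]
```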
  • Figure 4 shows the system communicating with servers inside and outside the enterprise.
  • the vendor 40 in the example uses ICE to import content automatically from a document management system into the system and to export information to the corporate Web server.
  • the distributor 41 uses their own server to aggregate information automatically from the vendor and one of their premier accounts 42.
  • using ICE, the vendor, the distributor, and key contacts at the customer can work collaboratively on sales opportunities and implementations.
  • the invention provides a high degree of integration with the office productivity tools used by sales and marketing professionals.
  • Microsoft has enabled the next level of integration with Web technologies. All applications in Office 2000 can use HTML-compatible XML as a first class file format. Files saved in this format retain fidelity whether viewed using a Web browser, in the application, or on the printed page.
  • Support for XML enables the next level of value-added browsing for users. They can now browse to individual components of a document — sections or chapters — all with the same seamless interface. Similarly, metadata such as title and author represented in the XML metadata are automatically extracted, making submission and maintenance of content painless.
  • Microsoft also provides better integration with Web servers in Office 2000. Users can both save files to Web servers and organize their work using Web Folders.
  • the invention's integration uses these same technologies and provides the next level of functionality.
  • using the Internet Engineering Task Force (IETF) Web Distributed Authoring and Versioning (WebDAV) protocol, the invention makes adding and maintaining new content, and managing folders and categories, possible without leaving the familiar desktop environment.
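  • A minimal sketch of what desktop-side publishing over WebDAV could look like is given below. The server URL, folder layout, and the custom category property are assumptions for illustration; PUT and PROPPATCH themselves are standard WebDAV methods.

```python
# Hypothetical WebDAV publishing sketch using the third-party requests library.
import requests

BASE = "https://exchange.example.com/dav/sales"  # assumed WebDAV root

def publish(path: str, local_file: str, category: str) -> None:
    # Upload the document itself.
    with open(local_file, "rb") as fh:
        requests.put(f"{BASE}/{path}", data=fh).raise_for_status()
    # Attach category metadata via PROPPATCH; the property name and namespace
    # below are illustrative, not defined by WebDAV or the patent.
    body = f"""<?xml version="1.0"?>
    <D:propertyupdate xmlns:D="DAV:" xmlns:S="http://example.com/sales-schema#">
      <D:set><D:prop><S:category>{category}</S:category></D:prop></D:set>
    </D:propertyupdate>"""
    requests.request("PROPPATCH", f"{BASE}/{path}", data=body,
                     headers={"Content-Type": "application/xml"}).raise_for_status()

publish("datasheets/widget-2000.doc", "widget-2000.doc", "datasheet")
```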
  • the discussion below (See Figure 1 ) primarily reflects the requirement that the invention be extensible. It is not possible to anticipate all the content types, content sources, and delivery mechanisms that are required. Furthermore, it is not entirely possible to anticipate the demands that future knowledge management tools and processes will place on the infrastructure. The discussion below concentrates on a modularization of the architecture such that significant extension is feasible without changes to large portions of the infrastructure.
  • the architecture core 100 represents the common functionality required of any document management architecture.
  • the I/O interfaces block contains the drivers for the different interfaces 101 which are used to access the system.
  • the extensions block 102 contains those interfaces which provide value-added functionality, for example, search and navigation tools. Note that this architecture is not intended to constrain the space of possible solutions built from commercial tools.
  • the architecture core represents the common functionality of any document management architecture.
  • This core generally consists of a relational database management system (RDBMS) 103 upon which are built application-specific tools for document management.
  • the heart of the system is an RDBMS which is responsible for storing all content data along with the metadata attributes which are used to organize the content.
  • the RDBMS may also store the applications that are used to navigate the content. This depends on the characteristics of the chosen commercial components. Use of a commercial RDBMS potentially simplifies administrative and operations tasks significantly.
  • the leading databases include integral support for on-line backups and mirroring. They are also potentially highly scalable, capable of using multiprocessors to reduce response time in a heavily loaded environment.
  • products from the leading database vendors, e.g. Oracle, Informix, and Sybase, are candidates for the RDBMS.
  • a document store 104 adds to the basic features of an RDBMS that functionality which is specific to document management. These features include, at a minimum, authoring support for entering content into the store, mechanisms for fetching content from the store, mechanisms for revision control, mechanisms for specifying and enforcing access control, and audit tools for extracting information about the content store.
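  • The sketch below illustrates one way such a document store could be layered on a relational database, with a stable document ID, bulk content kept per revision, and extensible metadata attributes. Table and column names are assumptions; Python's built-in sqlite3 stands in for a commercial RDBMS.

```python
# Illustrative document-store schema on top of an RDBMS (sqlite3 as a stand-in).
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE document (
        doc_id      INTEGER PRIMARY KEY,      -- stable ID shared by all revisions
        title       TEXT NOT NULL,
        author      TEXT,
        sensitivity TEXT DEFAULT 'internal'
    );
    CREATE TABLE revision (
        doc_id  INTEGER REFERENCES document(doc_id),
        rev_no  INTEGER,
        content BLOB,                          -- bulk content stored as a BLOB
        PRIMARY KEY (doc_id, rev_no)
    );
    CREATE TABLE attribute (                   -- extensible per-document metadata
        doc_id INTEGER REFERENCES document(doc_id),
        name   TEXT,
        value  TEXT
    );
""")

def submit(title: str, author: str, content: bytes, **attrs) -> int:
    """Author-side submission: create the document, its first revision, and metadata."""
    cur = db.execute("INSERT INTO document (title, author) VALUES (?, ?)",
                     (title, author))
    doc_id = cur.lastrowid
    db.execute("INSERT INTO revision VALUES (?, 1, ?)", (doc_id, content))
    db.executemany("INSERT INTO attribute VALUES (?, ?, ?)",
                   [(doc_id, k, v) for k, v in attrs.items()])
    return doc_id

submit("Widget 2000 datasheet", "marketing", b"...",
       doc_type="datasheet", industry="telecommunications")
```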
  • the two primary goals for the document store are high functionality and high extensibility.
  • This session daemon 105 should also be capable of inter-operating with a user preference manager to communicate configuration information between sessions.
  • Input/output interfaces 101 interface the architecture core to consumers and other entities in the outside world.
  • http 107 serves as the primary method of accessing the system.
  • the delivery platform (Fig. 5) is based on three components:
  • Context maps 50 that define, categorize and organize the information
  • Sales information channels 52 that support the delivery of information to multiple channels.
  • the information space contains a range of unstructured content, e.g. documents, sales reports, presentations, case studies, and competitive analyses drawn from both internal and third party sources.
  • the invention uses context maps to organize this information space and facilitate access to it.
  • the context maps are key to the user's ability to access precise and relevant information for each sales opportunity.
  • the maps are a dynamic framework for sales and marketing information exchange. (See Figure 3, discussed above.)
  • the context maps perform three functions:
  • Context maps organize and facilitate access to a wide range of unstructured information.
  • the technology is a unique way of describing and classifying information that lets users access content the way they think about it.
  • the technology also zeros in quickly on precisely the right item from thousands of items.
  • the invention provides processes and tools to load, maintain, and use the information managed by the context maps. These include tools to aid in the extraction of information about the structure of new content and load it into the context maps.
  • the invention comprises a content metadata extraction process for existing documents. Clients use these tools to load their marketing and sales content into the system, and then enhance and expand the metadata for that content over time. This process provides the basic information needed to populate the context maps and allows clients to continue to use existing document tools and file formats.
  • the preferred embodiment of the invention uses Web metadata standards, placing it in alignment with XML, Information and Content Exchange (ICE), and Resource Description Framework (RDF).
  • the invention also provides other tools and processes to help its customers build and maintain their sales information spaces as well.
  • the invention provides specialized documents that improve information capture and exchange for sales channels.
  • One of these is a format for presenting quick bullet-point conclusions about an event or an issue.
  • the intent of such documents is to help ensure that information is presented in the way that sales people think.
  • An effective sales information exchange depends both on the quality and relevance of the information a system contains and on processes to support constant improvement and evolution of the information.
  • the invention tracks the age, version, and usage of information in the sales information space, ensuring that information is current, correct, and relevant.
  • the invention incorporates collaboration features, making it easy for users to provide feedback to content authors through voting and direct comment.
  • the invention also incorporates discussion threads about a particular document or topic. Sales teams can easily use this feature to set up information spaces that make information easy to share and discuss.
  • the invention catalogs and tracks all of this feedback and discussion to preserve the full context surrounding issues and documents.
  • Individual users drive the content delivered by sales information channels using information consumer tools designed to satisfy the wide variety of needs in the market.
  • a sales wizard helps the sales pro decide how best to respond to a prospect or customer situation.
  • the sales wizard asks the sales rep three questions (who is the competition? where are you in the sales cycle? what industry are you targeting?) and uses the answers to locate the best information available in the context maps.
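  • As a hypothetical sketch, the wizard's three answers could be compiled into a metadata query of the kind the category engine would run; the field names and the document/attribute tables assumed here are illustrative, not the patent's schema.

```python
# Hypothetical: compiling the sales wizard's three answers into a metadata query.
def wizard_query(competitor: str, sales_stage: str, industry: str):
    constraints = {
        "competitor": competitor,    # who is the competition?
        "sales_stage": sales_stage,  # where are you in the sales cycle?
        "industry": industry,        # what industry are you targeting?
    }
    joins = " ".join(
        f"JOIN attribute a{i} ON a{i}.doc_id = d.doc_id "
        f"AND a{i}.name = ? AND a{i}.value = ?"
        for i in range(len(constraints)))
    sql = f"SELECT DISTINCT d.doc_id, d.title FROM document d {joins}"
    params = [p for pair in constraints.items() for p in pair]
    return sql, params  # run against document/attribute tables like those sketched above

print(wizard_query("Acme Corp", "closing", "telecommunications"))
```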
  • the front-end interface enables the user to easily search and navigate the context maps. Sales pros, channel partners, and prospects can either search the context maps or navigate through them. Both models allow users to obtain the precise information quickly and close business.
  • the invention next addresses the delivery of information relevant for direct sales reps, business partners, and ultimately prospects and customers themselves.
  • the invention provides sales information channels tailored to the specific needs of different audiences, individuals, and communities of interest.
  • the invention supports the delivery of information to multiple audiences from the single sales information space.
  • This architecture makes it possible for companies to provide the latest sales information to channel partners while still protecting their internal systems.
  • Information delivery semantics are also part of the context maps, providing a flexible mechanism for targeting sales information to audiences.
  • the invention also sets up, with the customer, security permissions to ensure that different audiences can see and access only the information relevant for their needs.
  • Direct sales reps for example, may have access to more information than channel partners and prospects.
  • the result is customized information channels for several audiences driven from a common information space, an efficient approach to information delivery.
  • the preferred embodiment of the invention is implemented by an e-services provider, delivering a comprehensive, Web-based application that is hosted, and therefore, virtually risk free for clients.
  • Companies sign up for subscriptions to the invention, specifying the number of people who have access to the application via the Internet.
  • Customers can use the solution without any additional technology infrastructure investments, and can buy as many subscriptions as they need over time to satisfy the demands of their sales channels and customers.
  • Working with the sales information framework provided by the category maps, the e-services provider and the customer define a custom sales information space that the e-services provider then hosts for that customer.
  • the invention provides a startup process that begins generating value for sales channels within thirty days, and ensures continual expansion and improvement of the customer's sales information space afterward. This approach provides four benefits to clients:
  • Context maps define a common vocabulary for sales and marketing information, as well as a flexible scheme for organizing access and provision of that information.
  • the invention's combination of categories optimized for information access and exchange and of dynamic relationships yields an information space that reflects the real meaning of content, and uses that understanding of meaning to aid access and exchange.
  • the information space is a controlled collection of information that can exist outside of the corporate firewalls. Companies can draw on the same base of information for their internal sales personnel and their channel partners without placing the primary data stored in customer relationship management (CRM), help desk, and accounting systems at risk.
  • the invention provides a new category of solution that complements CRM, sales force automation (SFA), marketing encyclopedias, and other earlier- generation products. These systems are designed to manage data about, e.g. customers, accounts, opportunities, and demographic trends and to manage sales processes involving that data.
  • the invention makes these data management products more useful to the field by categorizing their output of reports and other documents and hooking them to sales information channels.
  • the sales cycle may be thought of as comprising three segments: Pre-sales, Closing the Sale, and Post Sale.
  • Current CRM and SFA solutions are designed to manage structured information, such as data records, critical to the selling process, including contacts, accounts, and opportunities. Organizations primarily use this information to manage leads, pipelines, and campaigns, as well as for forecasting and analysis of sales force performance.
  • CRM and SFA systems have proven to be ineffective for the management of documents and other unstructured information -- the kind of information that is crucial to competitive selling and closing the sale.
  • CRM systems provide marketing encyclopedias for this purpose, but these modules quickly fall victim to the issues that doom file systems to failure as the basis for sales and marketing information exchanges.
  • the invention provides precise and relevant information to close business, which is a phase of the sales cycle that CRM/SFA systems do not address.
  • the invention enables close communication between the field and marketing, directly improving return on investment on marketing investments.
  • the invention addresses the sales information problem directly by managing the unstructured information that is crucial to closing deals.
  • the invention is designed to draw information from CRM, SFA, marketing encyclopedia, and other sources into its context maps for the sole purpose of information exchange.
  • the invention extends the value of investments in CRM, SFA, and other sales and marketing data management systems by providing them with an effective medium for exchange and distribution.
  • the preferred embodiment of the invention is an Internet hosted application, such that users do not see the technology behind the solution. Users see information access and collaboration tools, category maps, and results.
  • the invention adds semantic analysis to Web information searching to improve the relevance of information searching and navigation.
  • the invention provides a semantic Web for sales and marketing information which makes extensive use of document metadata, description logics, cases, categories, and related techniques to make searching operations much more precise than they are today, and to automate information exchange applications.
  • the invention is based on two underlying technologies: description logics and the category engine.
  • the description logics technology classifies content using a set of categories and relationships about a particular domain.
  • the category engine is server-based technology that enables high-performance querying of content.
  • Description logics classify elements for the purpose of reasoning about those elements. Description logics employ a common vocabulary to express the meaning, purpose, and relationships of elements and a small number of operations to reason about those elements.
  • the context maps provide a shared vocabulary about sales and marketing.
  • the DL defines information categories, relationships between information, and operations. The invention supports a variety of relationships, including class-subclass, category membership, product-company, and competitive relationships.
  • the DL's operations address query, navigation, and exchange.
  • the DLs are a subset of ontology technology, which has been used in knowledge management systems.
  • the inventive DL is optimized for information exchange, rather than for a broader knowledge-representation and reasoning purpose.
  • the invention can provide both good performance and flexibility.
  • the design center for the solution was a system with a relatively small number of categories and a large number of instances.
  • the operations that the DL supports are limited to information navigation and query.
  • the invention is not designed to define a broad range of knowledge and, through analysis, interpret and extrapolate from that base of knowledge.
  • the invention uses DL technology for a narrower, more practical purpose: query and navigate a lot of information fast for the purposes of exchange.
  • Each set of context maps is uniquely tailored to an individual company.
  • the context maps are implemented in a framework.
  • a context framework employs four levels of categories for sales and marketing information. At the base are foundation concepts about digital information; at the next level are concepts about commerce activities.
  • An Industry-Specific level contains categories used by specific industries, and a Customer-Specific level contains the customer's specific sales and marketing vocabulary. The categories build on one another, which reduces the number of categories required at the higher and more customer-specific levels.
  • the context framework is the basis for context maps.
  • the framework defines all of the terms for a customer's context maps, including company-specific information.
  • the framework design makes it practical for the invention to customize the context maps to individual customers by isolating changeable categories in the two highest levels of the scheme (industry- and customer-specific).
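  • The layered framework might be pictured as in the sketch below; the concrete category names are invented for illustration and are not taken from the patent.

```python
# Illustrative four-level context framework; example category names are invented.
CONTEXT_FRAMEWORK = {
    "foundation": ["document", "presentation", "author", "date", "revision"],
    "commerce":   ["product", "competitor", "customer", "sales_stage", "price"],
    "industry":   {"telecommunications": ["carrier", "bandwidth", "tariff"]},
    "customer":   {"acme_corp": ["Widget 2000", "Widget Pro", "EMEA channel"]},
}

def all_categories(framework: dict) -> list[str]:
    """Flatten the layers; higher levels build on the lower, shared ones."""
    categories: list[str] = []
    for level in framework.values():
        groups = level.values() if isinstance(level, dict) else [level]
        for group in groups:
            categories.extend(group)
    return categories

print(all_categories(CONTEXT_FRAMEWORK))
```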
  • the context maps are navigated with a server called a category engine (See Figures 6 and 7), which runs at the e-service provider site.
  • the category engine is the access and search front-end to the collection of metadata about the content under management by the application.
  • the invention stores this metadata inside a relational DBMS 103.
  • Actual content is stored in file systems, document management systems, content management systems, streaming servers, and other systems for managing unstructured data.
  • the category engine handles the query, redirection, and routing operations.
  • at the core of the content management architecture (CMA) lie four modules: the RDBMS, the document store, the session daemon, and the event daemon.
  • the heart of the CMA is a relational database management system (RDBMS) which maintains the metadata, and — depending on implementation strategy — may also store content. Owing to the built-in support for backups and replication, the RDBMS is a convenient place to store configuration information and applications. The use of the term RDBMS in this report is not intended to exclude other applicable database technologies, but rather to distinguish this database from the document store.
  • the RDBMS supports the chosen document store.
  • the RDBMS may run on any available platforms, e.g. HP PA-RISC or Intel x86 based servers.
  • the RDBMS is scalable, allowing hardware to be added to provide acceptable performance as datasets and consumer base grow. Scalability support includes using multiprocessor servers.
  • the RDBMS allows the data to be replicated on other servers to meet performance and fault tolerance requirements. Mirror servers should perform incremental, low-latency update against a master.
  • the RDBMS allows subsets of the data to be replicated. The selection of items to be included in the replicated subset is based on specified metadata conditions.
  • Administration is possible from any available platforms. Remote administration via a TCP/IP network is supported. Administration should not require the use of X-windows or other graphic interfaces.
  • the RDBMS supports multiple levels of security, including user-based security. Data and metadata support security level specification, and the RDBMS enforces security constraints based on these specifications.
  • the RDBMS allows a nominal degree of functionality without requiring individual user identification.
  • the RDBMS will be required to store the bulk content of the system. This may be via Binary Large Objects (BLOBs) and/or extensible data types.
  • the RDBMS stores attribute data associated with each item of content. Attribute types should include integers, fixed-length strings, dates, and sets of these basic types. The names and types of attribute data are configurable and extensible, at least on a per-document-type basis.
  • the RDBMS provides an API for accepting metadata specifications from applications, such as keyword indexing programs.
  • the RDBMS presents a complete RDBMS interface that can be used to store other data, for example a solutions catalog.
  • the RDBMS supports a query language based on SQL.
  • Support includes embedded SQL, a stored procedure language, and a subroutine library interface.
  • the interface allows applications to perform both data manipulation and administrative operations.
  • the interface is secure and requires authentication from the application.
  • the API supports C and an interpreted language, such as perl.
  • Applications are executable on either the server or on a workstation or PC client connected to the server via a TCP/IP network.
  • the RDBMS supports a trigger mechanism for executing code when certain events and conditions occur within the database.
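  • The trigger requirement might look like the following sketch, which logs metadata changes so other components (such as an event daemon) can react; sqlite3 syntax is used here as a stand-in, and a commercial RDBMS would use its own trigger dialect.

```python
# Sketch of a metadata-change trigger (sqlite3 syntax as a stand-in).
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE attribute (doc_id INTEGER, name TEXT, value TEXT);
    CREATE TABLE event_log (doc_id INTEGER, kind TEXT,
                            at TIMESTAMP DEFAULT CURRENT_TIMESTAMP);
    CREATE TRIGGER attr_changed AFTER UPDATE ON attribute
    BEGIN
        INSERT INTO event_log (doc_id, kind) VALUES (NEW.doc_id, 'metadata-change');
    END;
""")
db.execute("INSERT INTO attribute VALUES (1, 'status', 'draft')")
db.execute("UPDATE attribute SET value = 'released' WHERE doc_id = 1")
print(db.execute("SELECT * FROM event_log").fetchall())
```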
  • the RDBMS is accessible from code linked into an HTTP server such as the Netscape Commerce Server. This includes support for multiple concurrent accesses.
  • the document store adds features related to content management and is implemented as a layer on top of the RDBMS.
  • the key features of the document store are revision control, access control, support for multiple renditions, and support for compound and structured documents.
  • the requirements of the document store may be fulfilled completely by the RDBMS.
  • the content store contributes a value-added interface to the underlying RDBMS functionality.
  • Rendition refers to either the format, e.g. AmiPro/WordPro and HTML, or the language, e.g. English and Japanese, in which an item of content can be represented.
  • the document store stores any type of rendition. It is possible to add new rendition types at any time.
  • the document store manages multiple renditions of each document. Operations on the document store treat multiple renditions of the document as a single document, and not as separate individual documents. Rendition support includes manual, automatic, and on- demand rendition generation.
  • the document store component does not include particular rendition converters.
  • the document store supports an API flexible enough to support a wide range of rendition converters from multiple vendors.
  • the content store preferably supports the following formats: HTML, SGML, AmiPro/WordPro, Freelance, 1-2-3, PDF, Envoy, ASCII text, an audio type, and a video type.
  • the content store supports the following languages: English and Japanese.
  • the document store allows multiple document elements to be grouped together and managed as a unit. All routine content operations, including but not limited to viewing, printing, and downloading are performed correctly using the same user interface for all simple and compound documents. Elements are included in more than one group.
  • the extension API allows extensions to sequence through and individually access the elements of a compound document.
  • the interface API allows compound documents to be delivered as a single unit.
  • Compound document support is implemented via a proprietary interface, but must also include support for common compound document standards such as Microsoft's OLE.
  • Compound document support includes support for documents where the individual document elements have differing document sensitivities.
  • when structure, e.g. chapters and sections, is explicitly present in a document, the document store preserves and uses this structure.
  • An API is provided which allows extensions to sequence through and individually access the structure elements of a document.
  • the document store supports at least the following structures: document parts, such as sections of an article and chapters of a manual; slides in a slide presentation; articles in a newsletter; and pages in print-ready material.
  • the document store supports a mixture of structured and unstructured renditions for the same documents and should support renditions with different structures for the same document.
  • Structured document support includes support for documents where the individual components have differing sensitivities.
  • the content store allows authors to submit references to materials that are not available on-line.
  • the off-line content type is capable of specifying content metadata, e.g. keywords, from the full item.
  • the document store supports on- line ordering of the item via an extension.
  • the content store stores metadata, such as document attributes, for each item of content. Attributes include information such as author and date of last change. Metadata is extensible. It is possible with minimal effort to add new metadata fields to the content store.
  • the content store allows content to be filtered between extraction from the store and delivery to an interface application.
  • example filter extensions are hyperlink recognizers for HTML and search term highlighters for renditions supporting highlights.
  • the document store provides a query language which allows selection of documents based on combinations of attribute values and extension data, e.g. a full-text search engine.
  • the query language is compatible with SQL.
  • the document store maintains multiple revisions of each document. Operations on the document store treat multiple revisions of the document as a single document, unless specific indication is given to the contrary. Multiple revisions are not treated as separate individual documents.
  • Revision control handles revisions in both content and metadata.
  • the content store determines the differences between revisions of a content item. Revision 'diffs' are viewable by consumers.
  • the content store provides workflow processes to support the production and authoring extensions.
  • the document store workflow supports a production submission queue.
  • the production submission queue accepts documents from authors and maintains them until the production group (SFC personnel and contractors) validates and approves the content. Content in the submission queue is not visible to anyone outside of SFC.
  • Unreleased content is content which has been approved by production personnel but which represents time sensitive material that must be released in synchronization with other events, e.g. a product roll out. Unreleased content is not viewable by general users. It should be viewable by SFC personnel, the content author, and other identified individuals. Content release should be configurable. For example, it is possible to release product information to the field before a product introduction date, while not releasing that information to channel partners.
  • the document store provides a means of synchronizing or cross-indexing the elements of a compound document to provide multimedia delivery.
  • Each document in the store is associated with a unique ID which can be used to refer to the document. All renditions and revisions of a document must share a single ID. The ID for a document does not change when a document is revised or when the document store is reorganized.
  • when receiving a request, the content store receives information about the capabilities of the delivery channel. Using this information and user profiles, the content store automatically selects the most appropriate rendition to deliver to the consumer. Selection includes at least the following criteria: client hardware capabilities (graphics resolution, sound hardware); client software capabilities (installed viewers and applications); connection bandwidth; language preferences; content use (whether the request was for editing or viewing); and explicit indication by the consumer.
  • the content store maintains user profiles.
  • Profiles include a set of attributes of different types that are interpreted by the content store and extensions.
  • the data stored in a profile are updateable. It is possible to add new attribute types to profiles.
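  • A minimal sketch of rendition selection is shown below; the structure of the channel description and user profile, and the scoring weights, are assumptions chosen only to illustrate combining the criteria listed above.

```python
# Illustrative rendition selection from channel capabilities and a user profile.
def select_rendition(renditions, channel, profile):
    """Pick the most appropriate rendition for this delivery channel and user."""
    def score(r):
        s = 0
        if r["format"] in channel.get("viewers", []):
            s += 4                                   # client can display it
        if r["language"] == profile.get("language"):
            s += 3                                   # matches language preference
        if channel.get("bandwidth", 0) >= r.get("min_bandwidth", 0):
            s += 2                                   # fits the connection
        if r["format"] == profile.get("preferred_format"):
            s += 1                                   # explicit user preference
        return s
    return max(renditions, key=score)

renditions = [
    {"format": "HTML", "language": "en", "min_bandwidth": 28_800},
    {"format": "PDF",  "language": "en", "min_bandwidth": 56_000},
    {"format": "HTML", "language": "ja", "min_bandwidth": 28_800},
]
channel = {"viewers": ["HTML", "PDF"], "bandwidth": 33_600}
profile = {"language": "en", "preferred_format": "HTML"}
print(select_rendition(renditions, channel, profile))
```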
  • the document store performs user authorization on all document accesses. A different set of authorized users is maintained for each document. Security flows through to all navigation and searching processes. Users who are not allowed access to a document are not presented with the title of the document during navigation or search. There is no indication that such documents exist. Users are placed in groups so that authorization is extended to an entire group. Operations on a document are only allowed when performed by users authorized to perform the operation on that document. Such operations include modification of the content and metadata, removal of the document, and modification of the authorization lists for the document.
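  • The sketch below illustrates the access-control behavior described above, in which a user who is not authorized for a document never even sees its title in navigation or search results; the group and ACL structures are assumptions for illustration.

```python
# Illustrative per-document authorization applied to navigation/search results.
GROUPS = {"field_sales": {"alice", "bob"}, "channel_partners": {"carol"}}
ACL = {  # document ID -> groups authorized to view it
    101: {"field_sales"},
    102: {"field_sales", "channel_partners"},
}

def authorized(user: str, doc_id: int) -> bool:
    return any(user in GROUPS.get(group, set()) for group in ACL.get(doc_id, set()))

def visible_results(user: str, hits: list[tuple[int, str]]) -> list[tuple[int, str]]:
    """Filter hits so restricted documents are simply absent, title and all."""
    return [(doc_id, title) for doc_id, title in hits if authorized(user, doc_id)]

hits = [(101, "Competitive pricing (internal)"), (102, "Widget 2000 datasheet")]
print(visible_results("carol", hits))  # only the partner-visible document remains
```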
  • the content store provides an API used by interfaces to communicate with the content store.
  • the API supports HTTP, SMTP, and fax, but is also extensible to custom protocols, for example an audio telephone delivery mechanism.
  • the interface API supports navigating content and administrative tasks.
  • the interface API allows requests from one interface to result in content delivery via another interface, e.g. request via HTTP with delivery via e-mail.
  • the document store supports an application programming interface that allows applications to perform data manipulation and administrative operations.
  • the interface is secure and requires authentication from the application.
  • the API supports C and an interpreted language, such as perl.
  • Applications are executable on either the server or on a workstation or PC client connected to the server via a TCP/IP network.
  • the extension API allows some extensions to play the role of content. They have document ID's, keywords, and are full-text searchable. Properties, such as document ID, are assigned by the content store. Properties, such as item contents, in the sense of full text search, are delegated to the extension for evaluation. When an extension produces results in a particular rendering, the content store provides the standard rendering conversion operations where required.
  • the content store is accessible from code linked into an HTTP server, such as, Netscape Commerce Server. This includes support for multiple concurrent accesses.
  • the session daemon maintains information about each session which interacts with the CMA. CMA clients are not required to maintain session state; such state is maintained by the CMA core.
  • the session daemon stores the state of a user while he navigates the content store.
  • the session daemon links individual requests to session data using a magic cookie such as that provided by Netscape Navigator. It is possible that the session daemon is an integral part of one of the other components, for example the RDBMS or the content store.
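  • A small sketch of the session daemon idea follows; the token format and the shape of the per-session state are assumptions, the point being that state lives server-side and is linked to requests by an opaque cookie.

```python
# Illustrative session daemon: server-side state keyed by an opaque cookie.
import secrets

class SessionDaemon:
    def __init__(self) -> None:
        self._sessions: dict[str, dict] = {}

    def start(self) -> str:
        token = secrets.token_hex(16)  # the "magic cookie" handed to the client
        self._sessions[token] = {"constraints": [], "preferences": {}}
        return token

    def state(self, token: str) -> dict:
        # Individual requests present the cookie to recover their session state.
        return self._sessions.setdefault(token, {"constraints": [], "preferences": {}})

daemon = SessionDaemon()
cookie = daemon.start()
daemon.state(cookie)["constraints"].append(("doc_type", "datasheet"))
print(daemon.state(cookie))
```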
  • the event daemon allows actions to be bound to specific events, including new content submissions, feedback messages, changes in content or metadata, administrative actions, and time of day. Actions are specifiable through an API, allowing the development of new functions which respond to events. The following actions are supported: deliver a notification, alter metadata, begin an external process, and remove content. As with the session daemon, it is possible that the event daemon is implemented as a part of another component such as the RDBMS. Any implementation provides a sufficiently flexible API to enable the easy addition of event-driven extensions.
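  • The event daemon's bind-and-fire behavior might be sketched as below; the registration API shown is an assumption, not the patent's interface.

```python
# Illustrative event daemon: actions bound to named events and run when they fire.
from collections import defaultdict
from typing import Callable

class EventDaemon:
    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def bind(self, event: str, action: Callable[[dict], None]) -> None:
        self._handlers[event].append(action)

    def fire(self, event: str, payload: dict) -> None:
        for action in self._handlers[event]:
            action(payload)

daemon = EventDaemon()
daemon.bind("new-content", lambda p: print("notify marketing:", p["title"]))
daemon.bind("metadata-change", lambda p: print("reindex document", p["doc_id"]))
daemon.fire("new-content", {"title": "Q3 competitive brief"})
```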
  • Interfaces translate users' requests into the low-level protocol of the CMA core and format responses for delivery to the user.
  • the most common interface is HTTP, the protocol of the WWW.
  • Other interfaces are electronic mail and fax. All interfaces must abide by a standard content store API.
  • Components of the standard interface are channel properties. Each interface communicates to the content store the properties of the communication channel it implements. Channel properties include connection bandwidth and latency.
  • Each interface communicates to the content store the properties of the client used. These properties include multimedia capabilities (graphics, sound) and document format capabilities (AmiPro/WordPro, Envoy).
  • Interfaces deliver compound documents and collections of documents.
  • Requests can select redirection of the response to another channel. For example, it is possible to make a request via e-mail and receive the content via fax.
  • Each interface provides a mechanism for performing authentication on the source of the request.
  • Each interface provides an indication of the security of the channel with respect to issues such as eavesdropping and data interception.
  • the difficulty of implementing privacy in e-mail and fax delivery requires that some content be excludable from these delivery mechanisms.
  • the HTTP interface is the primary interface to ESP. It provides each of the general requirements listed above.
  • the HTTP interface consists of two parts: the HTTP server and the HTTP client.
  • the HTTP server provides an API by which the HTTP interface is tightly integrated into the content store and RDBMS.
  • the HTTP server responds quickly to requests.
  • serving individual content requests should not require forking in response to each content request if this creates performance problems, and should not require new connections between the HTTP server and the content store, between the HTTP server and the database, or between the content store and the database if any of these operations limit performance.
  • the HTTP interface allows documents to be processed without requiring they be viewed first.
  • documents and sets of documents are downloadable and/or printable without viewing them first.
  • the HTTP interface supports multimedia types, including at least one audio type and one video type. These types may be supported by HTML browser plug-ins.
  • the client is externally specified to be Netscape Navigator 2.0 and the server to be Netscape Commerce Server or Netscape Communications Server.
  • the CMA provides an e-mail interface for users who have electronic mail capability, but do not have the TCP/IP connectivity required for HTTP access.
  • the e-mail interface approximates the HTTP interface as closely as possible.
  • Navigation through the e-mail interface takes place via forms, either textual or in a format suitable for processing by a client application, such as Lotus Forms.
  • Delivery of content via e-mail uses MIME types to support compound and non-text content.
  • the CMA provides delivery via e-mail, even if it does not accept requests via e-mail.
  • the CMA provides a fax interface for delivery and supports a fax-back interface (request by phone, delivery by fax).
  • the CMA provides delivery via fax, even if it does not accept requests by phone.
  • the fax-back interface approximates the HTTP interface as closely as possible.
  • An interface to audio content via telephone is required.
  • This interface allows consumers to select audio renditions of content via telephone.
  • This interface is used to access non-audio content if a text-to-speech rendition converter is acquired.
  • the primary application for the telephone interface is to provide a value-added voice mail distribution mechanism. Combined with user profiling and other features of the system, it allows voice updates and urgent messages to be distributed worldwide with less overhead than individual voice mail implementations.
  • Extensions are applications external to the CMA core which implement additional document management features. Extensions generally interact only with the core, not directly with each other. There is little impact on the remainder of the system when an extension is added, removed, or replaced. Extensions use the content store extension API to communicate with the CMA core.
  • Extensions can be broadly classified as content-like and non-content-like.
  • Content-like extensions appear to most components as a normal item of content but do not actually store content. They generally create content by accessing an external content source or by analyzing other data in the content store and RDBMS. Non-content-like extensions are generally administrative applications, such as an RDBMS management tool. They may not show up as normal content. The most significant aspect of content-like extensions is that they are treated as content by the document store.
  • Content extensions have keyword values and other content metadata. Attribute values such as keywords are determined by the extension and communicated through the extension API. Content-like extensions are indexed for keyword search and produce results in a standard rendition format. User access to individual extensions is controlled by the same access control mechanisms used for plain content.
  • the navigator extension is the primary interface used by consumers to access the content store. It provides the functionality necessary to browse and search the content store.
  • the navigator is constructed modularly.
  • the selection mechanism is composable. It allows multiple selection modules to be combined. For example, the navigator allows a combination of metadata- based selection and keyword-based selection. Modules communicate with each other via the navigator session API.
  • the metadata and search navigation modules are required.
  • the navigator allows the related content module and other custom modules to be added at a later time.
  • the navigator defines an API that allows navigation modules to create, modify, delete, and examine the set of constraints that a consumer has selected during navigation. This API is used, for example, to communicate between the metadata navigation module and the search navigation module.
  • Metadata navigation module: the navigator allows consumers to navigate the repository based on information represented in the metadata of the repository. For example, users can select only datasheet items or only items relating to a particular industry. Metadata navigation is incremental. For example, the user can select only datasheets, and then narrow the list of datasheets to only those relating to a particular product line.
  • the metadata schema of the content repository is not hard coded in the metadata navigation module. The module allows the specification of pertinent metadata fields via tables in the RDBMS.
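  • Incremental, table-driven metadata navigation could look roughly like the sketch below; the navigable field names and the document/attribute tables assumed here are illustrative.

```python
# Illustrative incremental metadata navigation driven by a configuration table.
NAVIGABLE_FIELDS = ["doc_type", "industry", "product_line"]  # read from the RDBMS

def narrow(constraints: list[tuple[str, str]], field: str, value: str):
    """Add one navigation step; the candidate set only ever gets smaller."""
    if field not in NAVIGABLE_FIELDS:
        raise ValueError(f"{field} is not configured for navigation")
    return constraints + [(field, value)]

def to_sql(constraints: list[tuple[str, str]]):
    joins, params = [], []
    for i, (field, value) in enumerate(constraints):
        joins.append(f"JOIN attribute a{i} ON a{i}.doc_id = d.doc_id "
                     f"AND a{i}.name = ? AND a{i}.value = ?")
        params += [field, value]
    return f"SELECT DISTINCT d.doc_id, d.title FROM document d {' '.join(joins)}", params

step1 = narrow([], "doc_type", "datasheet")           # only datasheets
step2 = narrow(step1, "product_line", "Widget 2000")  # narrowed further
print(to_sql(step2))
```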
  • the navigator provides a full-text search component. This component communicates via an API to the search engine to select content which matches a consumer specification.
  • the related content module updates the session criteria to include content based on all or some of the attributes of the current document. Keyword indexes are used to find relevant relationships between documents. Documents are related if they are classified in similar ways.
  • the navigator generates a standard rendition, either HTML or a rendition which can be converted accurately and quickly to HTML.
  • the navigator supports full functionality over an HTTP interface. It supports at least partial functionality over e-mail and fax-back interfaces. The verbosity and depth of navigational menus is adjusted according to interface type, display capabilities, and user preferences.
  • the navigator provides a mechanism by which consumers indicate their preferences for navigation order, renditions, and searches. This profile is customizable on a session basis and on a permanent basis.
  • a full text search extension provides a mechanism to index all content in the repository. It provides an API to the navigation module which allows queries to be composed and matching item references to be returned.
  • the search extension provides a query API that can take a keyword expression and create a list of references to matching content items.
  • the search extension provides an indexing API which includes references to dynamic content generated by CMA extensions. For example, navigational pages are included in keyword searches even though they are generated dynamically.
  • Taxonomic specification: the search extension provides for the specification of a concept taxonomy that can be created and extended to represent company terminology.
  • the taxonomic specification supports multiple languages.
  • the search extension supports searching structured documents. It has a mechanism for identifying where in a structured document a hit occurs.
  • the search extension provides a mechanism for ranking hits. This mechanism tunes the scoring using system domain knowledge.
  • the scoring mechanism, including tuning hooks, supports structured content. Users can influence scoring by indicating groups of items of high or low interest.
  • the search extension supports multiple renditions.
  • Multiple rendition structures are supported, e.g. structured references refer to sections within an HTML document and also to pages within a portable document.
  • the search extension provides a mechanism for filtering content on-the-fly to provide match cuing. For example, a visual cue such as a font change is provided to indicate instances of a keyword match. Visual cuing is provided for as many renditions as possible, but must at least include HTML.
  • the search extension differentiates rendition content by language and uses the appropriate processing for non-English content.
  • the search module does not confuse English words with non-English words and vice versa.
  • the search extension provides a content administration API to manage indexes when an item of content is added, removed, or updated within the content repository.
  • the search module allows indexes of subsets of documents to be created and combined.
  • the search extension provides indexing of multiple revisions of an item of content but only returns outdated content when specifically requested.
  • Rendition converters and filters are used to add value by either converting an item to a more suitable rendition or by annotating a rendition to represent additional information.
  • Some CMA tasks have a preferred rendition.
  • HTML is generally the preferred rendition for viewing documents via Netscape Navigator.
  • the CMA converts content from the submitted rendition to the desired rendition. All such conversions, as much as possible, maintain any formatting present in the original.
  • the fundamental formats used by the CMA are HTML for online viewing and searching, a portable document format such as PDF or Envoy for page preview and printing, and TIFF for faxing.
  • Converters are programmable so that their operation can be tuned for highly valued content such as the most frequently accessed content in the system.
  • Conversion to and from SGML is desirable but is not mandatory. Where a converter recognizes structure in a document, it is able to generate SGML representing that structure. Converters perform conversion between two formats using SGML as an intermediary if significant fidelity is not lost. For example, conversion from AmiPro/WordPro to HTML via SGML is acceptable if there is less loss of information between AmiPro/WordPro and SGML than there is between AmiPro/WordPro and HTML. However, conversion from AmiPro/WordPro to Envoy via SGML is generally not acceptable because Envoy expresses the visual characteristics of an AmiPro/WordPro document much more accurately than most SGML document type definitions.
  • x-to-HTML Conversion filters convert submitted or intermediate types to HTML. Any formatting instructions in the source file are mapped to the corresponding HTML markup, if available. Formatting features which are rendered appropriately in the resultant HTML include: section titles, paragraph breaks, and bulleted and enumerated lists. Sectioned documents are split into several HTML pages. Indexes and tables of contents in the source file are converted to hyperlinked lists in the produced HTML. Intra-document references are converted to hyperlinks.
  • HTML markup When plain text is converted to HTML, standard textual formatting conventions are recognized and converted to HTML markup, including the use of indentation or blank lines to indicate paragraph breaks; the use of underscores and backspaces, or a row of graphic characters beneath the text, to indicate emphasis or titles; the use of stars, dashes, and other graphic characters as bullets in lists; and a row of dashes or other graphic characters as a horizontal rule.
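  • The following sketch, which implements only a subset of the conventions listed above and makes no claim to match the converter actually used, illustrates the kind of heuristics involved in mapping plain text to HTML:

      import html
      import re

      def text_to_html(text):
          # Blank lines separate blocks; bulleted blocks become lists,
          # a row of dashes becomes a horizontal rule, everything else
          # becomes a paragraph.
          out = []
          for block in re.split(r"\n\s*\n", text.strip()):
              lines = [l.rstrip() for l in block.splitlines()]
              if all(re.match(r"^\s*[-*•]\s+", l) for l in lines):
                  items = "".join(
                      "<li>%s</li>" % html.escape(re.sub(r"^\s*[-*•]\s+", "", l))
                      for l in lines)
                  out.append("<ul>%s</ul>" % items)
              elif len(lines) == 1 and re.fullmatch(r"-{3,}", lines[0]):
                  out.append("<hr>")
              else:
                  out.append("<p>%s</p>" % html.escape(" ".join(lines)))
          return "\n".join(out)

      print(text_to_html("Overview of the product line.\n\n* fast\n* reliable\n\n-----"))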
  • a conversion path is provided from every submitted type to a portable document format, such as Envoy or PDF. Pages in the resulting format are scalable to both US letter and metric A4 paper sizes.
  • a conversion path is provided from every submitted type to TIFF.
  • the result of this conversion is similar in appearance to the portable document image but removes formatting features which are not appropriate for fax delivery, e.g. a gray background.
  • TIFF-to-x A conversion path from TIFF scanned images to other content types is required. This mechanism produces results of sufficient clarity to read. Optical character recognition is an option.
  • Rendition annotators filter a document rendition, producing a new rendition of the document with one or more value-added features.
  • Some of the information used to guide annotation is generated from the document itself, for example when recognizing URLs that lack the markup needed to make them live in HTML or PDF. Other information may come from an external source, for example, highlighting matches under the direction of the full text search extension.
  • the hyperlink markup extension scans renditions capable of specifying hyperlinks, such as HTML, and adds the markup to URLs if it does not exist.
  • URL and other hyperlinks are specifiable by an extensible set of pattern matching rules.
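  • As an illustration of such pattern-based markup (the patterns below are examples only, and the filter is deliberately naive about URLs that already appear as anchor text), a hyperlink annotator might be sketched as:

      import re

      # Extensible set of URL patterns. The lookbehinds skip URLs that are
      # already the value of an href attribute or part of a longer URL.
      URL_PATTERNS = [
          re.compile(r'(?<!href=")https?://[^\s<>"]+'),
          re.compile(r'(?<![\w/."])www\.[^\s<>"]+'),
      ]

      def add_hyperlinks(html_text):
          def wrap(match):
              url = match.group(0)
              href = url if url.startswith("http") else "http://" + url
              return '<a href="%s">%s</a>' % (href, url)
          for pattern in URL_PATTERNS:
              html_text = pattern.sub(wrap, html_text)
          return html_text

      print(add_hyperlinks('Visit www.example.com or http://example.org today.'))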
  • the inter-content markup extension annotates other types of hypermedia so that references to other content in the repository become hyperlinks to the associated content.
  • the extension provides a mechanism for identifying potential links. This matching is customizable.
  • An audit extension allows annotations to be added to all content to facilitate the collection of data regarding the value and classification of content and the overall operation of the CMA.
  • An example of this class of extension is an HTML annotator that adds headers and forms to every page to allow the consumer to indicate the usefulness of the content returned.
  • External content sources are extensions which bring content into the CMA.
  • External content sources are integrated via one of two APIs.
  • the pull API is used for external sources which only produce content when asked by a consumer.
  • the push API is used by content sources that provide content streams which are fed directly into the repository. Extensions that provide access to content that has not been validated give a visual cue to this fact. The cue indicates the confidence of the source, e.g. high for a marketing research company, medium for unaudited intranet websites, and low for content retrieved from the Internet.
  • Extensions allow external content to be entered into the system but must also provide access without depositing new content into the content store. All external content sources generate valid and verifiable metadata.
  • This extension is implemented as a native Windows application.
  • the interface on this extension should be rich, even if this requires substantial training.
  • An extension application is provided which allows content authors to submit content directly into the content store. Content so submitted is given provisional status until processed by knowledge management.
  • the online submission process allows submission of document metadata along with the content. It supports compound documents.
  • An extension application is provided which allows consumers to create limited forms of content. All consumer authoring capabilities are completely accessible via HTTP. It is desirable to use an extended web client to perform consumer authoring.
  • An extension is provided to allow users to submit limited textual content related to existing content. Content so submitted is automatically entered into the content store. Author, date, keyword index, and related document metadata are automatically generated. Annotation authoring includes adding 'post-it' style notes to portable document formats such as Envoy or PDF.
  • An extension is provided to allow users to build collections of documents for personal reference.
  • the personal collection extension gives users a means to allow or disallow access to their collections by other users.
  • the consumer authoring extension allows users to create small documents, such as sales success stories.
  • the critical aspect of this function is ease of use.
  • a simple HTML authoring tool is integrated into a client, for example Netscape Navigator Gold.
  • Discussion Groups An extension implements threaded discussion groups similar to Usenet news groups.
  • the mechanism stores discussion group submission as content in the repository and makes it fully accessible via navigator and search extensions.
  • the discussion group interface has a well developed HTTP interface. It is also desirable that the extension allow participation via electronic mail.
  • External sites are classifiable. The following classes of external web sites are included: Customer Web Sites; Channel Partner Web Sites; and Competitor Web Sites.
  • a database API is provided which allows raw database tables to be maintained within the database.
  • the API allows standard SQL database applications to be accessible via the system.
  • the SQL is upwardly compatible with either standard SQL (SQL92 or later) or with the RDBMS vendor's SQL variant.
  • the database API is compatible with the RDBMS vendor's tools, including SQL interpreters, embedded SQL tools, and C language library bindings.
  • the extension is flexible enough to provide the basis for a new version of the solutions catalog database browser, as well as a reference accounts database.
  • Pull API The pull API is used to communicate a query from the content repository to an external source, and to retrieve the results.
  • Query parameters specify the content to be selected and the acceptable formats that can be returned.
  • the extension returns the requested content in an acceptable rendition.
  • Anticipated extensions using the pull API are externally accessible databases such as the Order Fulfillment Initiative, and Dunn and Bradstreet reports.
  • the push API is used to communicate with content streams. Communication is instigated by the extension, not by the content store.
  • the primary push API extensions are the authoring extensions used to submit content. While the product authoring extension may not use the push API because it is integrated with the content store, the push API extension is required to provide simplified submission by non-SFC personnel.
  • Other push API extensions are Usenet news feeds, wire services such as NewsEdge, marketing research feeds from sources such as the Gartner Group and Dunn and Bradstreet, and electronic versions of trade publications.
  • Metadata often occurs in the form of specially formatted headers which precede the content in newswire feeds, Usenet feeds, electronic mail messages, and reports delivered as text.
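  • A minimal sketch of the two integration styles, with invented class and method names (PullSource, PushSource, on_item) standing in for the actual APIs, is:

      class PullSource:
          # External source queried on demand; returns content in one of
          # the renditions the caller can accept.
          def query(self, selection, acceptable_renditions):
              raise NotImplementedError

      class Repository:
          def deposit(self, data, metadata):
              print("stored item with metadata", metadata)

      class PushSource:
          # External stream (newswire, mail, Usenet) that feeds content
          # into the repository as it arrives.
          def __init__(self, repository):
              self.repository = repository

          def on_item(self, raw_item):
              # Metadata often arrives as specially formatted headers that
              # precede the body, as noted above.
              headers, _, body = raw_item.partition("\n\n")
              metadata = dict(line.split(": ", 1) for line in headers.splitlines())
              metadata.setdefault("confidence", "medium")  # assumed cue for unvalidated sources
              self.repository.deposit(body, metadata)

      feed = PushSource(Repository())
      feed.on_item("Source: NewsEdge\nDate: 1999-08-24\n\nStory text ...")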
  • An extension is provided which validates the format of the content.
  • the rules applied by a validator are configurable. Each validator at least checks that the content is in the format in which the metadata claims it to be. Validators check that any intra-document or inter- document references are valid. Validators also check for conformance to standard syntax, especially when authoring tools allow generation of non- conforming documents, e.g. HTML. Validators also check for conformance to submission guidelines. Validators that can check format and links are required for HTML and possibly SGML. Validators that can validate links are required for the chosen portable document format, either Envoy or PDF. Validators that can at least determine rendition type are required for all other types, including AmiPro, Freelance, and ASCII text.
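  • A sketch of such a configurable validator appears below; the rule names are invented and the format tests check only well-known magic numbers, as one plausible minimal way of confirming that content is in the format the metadata claims:

      # Each rule checks one aspect of a submitted item; the rule list is
      # the configurable part. Names and tests are illustrative.
      def claimed_format_matches(item):
          data, claimed = item["data"], item["metadata"]["format"]
          if claimed == "HTML":
              return b"<html" in data.lower() or b"<!doctype html" in data.lower()
          if claimed == "PDF":
              return data.startswith(b"%PDF")
          if claimed == "TIFF":
              return data[:4] in (b"II*\x00", b"MM\x00*")
          return True   # other types: only the rendition type is determined

      def references_resolve(item):
          # Placeholder for intra- and inter-document link checking.
          return True

      RULES = [claimed_format_matches, references_resolve]

      def validate(item, rules=RULES):
          return [rule.__name__ for rule in rules if not rule(item)]

      item = {"data": b"%PDF-1.2 ...", "metadata": {"format": "PDF"}}
      print(validate(item))   # [] means every configured rule passed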
  • An extension is provided which automatically selects new or recently modified documents from the document store, composes a collection of new documents for each user, and delivers these collections to the users.
  • the clipping filter considers the relevance of each document and the significance of the change to the document. The relevance of a particular document varies from user to user.
  • a user profile contains sets of attribute values and keywords to describe the user's interests. It may also contain explicit search criteria, unique document identifiers, and identifiers of documents for which similar documents have been requested. Content may carry an attribute which forces distribution to some or all users. A new document which matches the relevance criteria is always delivered to the user. Modified documents are only delivered if the modification significantly impacts the content.
  • the clipping filter includes an automatic means of excluding insignificant modifications.
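  • One plausible reading of the relevance and significance tests, sketched with invented field names and an arbitrary significance threshold, is:

      def is_relevant(document, profile):
          # Forced-distribution content is always delivered; otherwise the
          # document must match the profile's keywords or attribute values.
          if document.get("force_distribution"):
              return True
          keyword_hit = bool(set(document.get("keywords", [])) &
                             set(profile.get("keywords", [])))
          attribute_hit = any(document.get(field) == value
                              for field, value in profile.get("attributes", {}).items())
          return keyword_hit or attribute_hit

      def should_deliver(document, profile, change_ratio=None):
          if not is_relevant(document, profile):
              return False
          if change_ratio is None:        # new document
              return True
          return change_ratio > 0.10      # exclude insignificant modifications

      profile = {"keywords": ["frame relay"], "attributes": {"industry": "telecom"}}
      doc = {"keywords": ["frame relay", "pricing"], "industry": "finance"}
      print(should_deliver(doc, profile))          # True: new and keyword match
      print(should_deliver(doc, profile, 0.02))    # False: trivial modification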
  • the knowledge management extension is a set of tools used by a knowledge manager and content development specialists to collect and analyze data concerning items in the content store.
  • the knowledge management extension provides a basic set of reports and provides an API or scripting language.
  • A report generator is provided which summarizes system usage. User requests for all resources are included in the report, including access to the content store and all extensions.
  • the usage report generator is configurable, allowing requests to be selected, sorted, and grouped according to the following parameters: document metadata, including type, format, date, author, and keyword classification; user profile data, including user type, language, and geographical location; and additional parameters used when invoking extensions, such as the scope of a navigator request.
  • a report generator is provided which characterizes the content in the CMA. This report generation capability is highly configurable, allowing results to be summarized by document metadata, including metadata extracted from the content.
  • A report generator is provided which compiles survey results.
  • the survey report generator is configurable, allowing results to be selected, sorted, and grouped by responses to survey questions and by the itemized request parameters.
  • Analysis API/Scripting Language Analysis tools in the form of an API or scripting language allow new reports to be generated from the state of the content store and the access patterns of consumers. The tools allow the generation of ad hoc analyses when addressing any problems of accessibility and usability of content in the system.
  • the production support extension is used by production staff to maintain the validity and value of the content store, for example, by automatically removing obsolete items.
  • An audit process periodically scans the content store and identifies items requiring action. Actions supported are removal, notification of production personnel, and notification of item author.
  • the audit process is configurable via an API or a scripting language.
  • the programmable interface allows the selection of content based on metadata attributes and allows the action taken in response to a match to be programmable.
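  • The programmable interface might, for example, resemble the following sketch, in which each audit rule pairs a metadata predicate with an action; the rules, field names, and actions shown are assumptions used only for illustration:

      import datetime

      def expired(item):
          return item["expires"] < datetime.date.today()

      def missing_keywords(item):
          return not item.get("keywords")

      def remove(item):
          print("removing", item["name"])

      def notify_author(item):
          print("notifying", item["author"], "about", item["name"])

      AUDIT_RULES = [(expired, remove), (missing_keywords, notify_author)]

      def run_audit(content_store, rules=AUDIT_RULES):
          # Periodic scan: apply every matching rule's action to every item.
          for item in content_store:
              for predicate, action in rules:
                  if predicate(item):
                      action(item)

      store = [{"name": "docstore/17", "author": "marcom",
                "expires": datetime.date(1999, 1, 1), "keywords": []}]
      run_audit(store)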
  • A mechanism is provided for collecting feedback messages and generating reports from the feedback database.
  • the feedback report generator is configurable, allowing messages to be selected, sorted, and grouped according to the following parameters, in addition to the parameters of the request which instigated the feedback: status of the feedback message, severity of the issue, person responsible for addressing the issue, and date of the feedback message.
  • the feedback mechanism is capable of indicating or otherwise distinguishing classes of issues.
  • Author issues are issues related to the accuracy of an individual item. They involve the actual material in an item of content and are generally correctable only by the original content author.
  • Management issues are all other issues, including the operation of system servers and interfaces.
  • the feedback mechanism allows for messages to be routed automatically based on the class of feedback and other metadata associated with individual content items.
  • the feedback mechanism provides trouble issue tracking mechanisms. Feedback issues are maintained in a database and tracked throughout their lifetime, through creation, assignment to production personnel, analysis, and resolution. Issue tracking supports interaction with content providers, for example a Marcom division, but also maintains state within the system to allow progress by non-system personnel to be monitored.
  • This extension provides an interface that is used to manage the overall system. This interface is capable of starting, stopping, and backing up the repository. It is also used to add, delete, and update users. The bulk of this interface may be implemented via a custom application for PCs running Windows 3.11 or a workstation running HP-UX. Remote connections via TCP/IP are supported. A limited interface allows consumers to update a limited number of fields in their user authorization record, for example, their password.
  • the primary users of the system are sales representatives whose information needs change across the phases of the sales process.
  • Other users of the system include field sales specialists and the professional services organization. When the system is released to channel partners, it is the main place for them to get information about the company.
  • the first section below outlines the sales process, which motivates a classification of the types of information that should be provided by the system. The remaining user requirements are presented in subsequent sections.
  • the sales process is divided into five phases, each with its own information needs. This section describes each phase of the sales cycle and categorizes the types of content used during each phase. While most types do not place any unusual constraints on the architecture, some influence system design and product selection. For example, the goal of being the single online information tool for the sales force requires that information originating outside of the company be incorporated, which impacts the content submission and acquisition processes. Specific requirements such as these are noted where pertinent. The relative importance of each requirement is determined by the value of the content it supports. Currently, the system is largely used for the same purposes for which hardcopy literature is used, i.e. to present the customer with information about company products.
  • Content is abstract information, divorced from its physical appearance or presentation. As an abstraction, content does not exist in its pure form. It is always manifest in some physical form when being stored or delivered. Associated with the abstraction are high level concepts such as rendition and structure.
  • data is used herein to refer to information in a more general sense, including the information which represents content for the purposes of storage and delivery. In contrast to content, data does not have the associated high level concepts. There is no concept of rendition for a database.
  • the architecture discussed herein revolves around content and relies on a uniform model for expressing content.
  • the power of the uniform content model lies in the ability to leverage one code base to manipulate a wide range of content types. If content is not represented in a uniform manner, each function of the system must be reimplemented for each type of content.
  • the model affords uniform access to content across the system via a common API. Because the content model forms the infrastructure of the invention, it is difficult to implement incrementally. While it can be improved iteratively, flaws in the initial design mitigate any potential benefits. Design of a uniform content model requires that the range of content managed be characterized, noting commonalities and differences. This characterization also aids in the understanding of which elements are content and which are only data.
  • Content exhibits the following characteristics: it is identified by name, it is described by attributes, it bears relationships to other content, it has structure, it is updated by revisions, its distribution is governed by its sensitivity, and it is manifest in one or more renditions.
  • Every element of content is identified by a unique and canonical name which is used to refer to the content.
  • An absolute name references the same element of content regardless of where the name is used. Absolute names do not change. If an absolute name for an element of content such as a document is stored the same document can later be retrieved using the same name, regardless of the current state of the retrieving application or any changes to the attribution of the content which may have occurred.
  • the content referenced by a contextual name depends on other information, such as the state of a user's session. For example, if a user navigates to a product line and wishes to bookmark an overview document, the bookmark is made to the contextual name — the document role — not the particular document which is fulfilling that role at the current time.
  • Contextual names are particularly important in implementing applications which generate content dynamically, such as the solutions catalog database browser.
  • Names are instrumental in storage management. Content is accessible only if its name is known, and inaccessible content need not be maintained by the system. This technique, known as garbage collection, is necessary to support a consistent content store in the presence of multiple references to data, which come about via user hot lists, mailing lists, and transient content. While the name model and the name manager that implements that model may appear inconsequential, inadequate initial consideration is potentially the most significant limiting factor of an extensible system.
  • Attributes are intra-content metadata. Some attributes, such as expiration date, are fundamental to the operation of the system. This use of attributes is well-understood and implemented by most document management systems. Attributes can also be used to classify documents according to ontologies, which describe the ways in which content can be classified.
  • Ontologies also capture information about attribute values, for example, the fact that GSY is a division (a subset) of CSO. If available, the system uses such information to determine that the content authored by GSY is a subset of the content authored by CSO. This use of attributes to specify ontological metadata is not widely implemented or even well understood. Support for such usage is lacking in off-the-shelf products. Effective navigation of content relies on complete and accurate attribution. The greater the sophistication of an underlying ontology, the greater the potential for powerful navigation aids.
  • the design of an attribute system for a content management system is analogous to schema design for an RDBMS. Attribute design must be carried out prior to content migration. Subsequent modifications to the attribute system require content be reattributed, a problem akin to schema migration.
  • Relationships Two or more elements of content may bear a relationship to each other. Relationships are inter-content metadata. Relationships are a more powerful construct than attributes. Attributes can actually be implemented using relationships. For example, an author attribute of an item of content can be expressed as a relationship between the content and an object representing the author.
  • Structure organizes and relates the data that comprises content.
  • the structure inherent in content is its logical structure. Examples of this kind of structure are chapters in a book and rows in a table. Structure subdivides content into smaller elements of content, or sub-content.
  • sub-content is also content and has all the qualities of content.
  • present-day content authoring and management tools treat the document as the smallest unit of content and introduce a disparity in functionality between documents and smaller elements of content. As a result, it is generally impossible to name, attribute, render, revise, or control access to sub-content.
  • Content is often grouped into collections to assist in content management and delivery. These collections are also structured content. For example, the set of documents which results from a search query is content.
  • the invention in its entirety can be considered to be one element of content. It is critically important that the content model treat collections as content so that any operations defined for content are also applicable to collections. It is also very valuable that content operations be applicable to the smallest units of content, though this may prove impractical in some cases.
  • the sensitivity of content determines the scope of its distribution. Sensitivity also impacts content generation. For example, when a search results list is generated, it must not contain references to any content whose sensitivity exceeds the user's authorization.
  • the sensitivity attribute of agents can be used to restrict access to system functionality.
  • the sensitivity of an application which allows the user to submit content can be set so that only users with author authorization may exercise this function.
  • Any element of content is manifest in one or more formats, or renditions. There can be great diversity among multiple renditions of an item of content. The abstraction unifying the renditions is that they all convey the same meaning.
  • Rendition is not a simple attribute, but a combination (the cross-product) of several attributes, including written language, file format, encoding, and media type. Rendition types bear relationships to each other through an ontology. For example, HTML and RTF are textual. They explicitly represent characters as distinct objects.
  • TIFF and GIF are raster formats. They only represent the pixels which compose an image of the document.
  • Rendition ontologies allow many rendition types to be treated similarly, provided that they have a unifying quality represented in the ontology. For example, all textual renditions can be searched using full-text search. When submitted, content usually exists in a unique source rendition. This is true even if multiple renditions are submitted because one rendition is usually used as a source from which the others are generated.
  • the rendition designated as the source rendition changes when the content is updated if, for example, an AmiPro file is converted to Word for further editing, or if an English document is translated to Japanese and subsequently revised in Japanese.
  • Identification of source renditions is necessary to ensure that dependent renditions are updated if the source rendition is revised.
  • Conversion between renditions may be necessary for delivery and other operations performed on content. Rendition conversion may be fully automatic, machine-assisted, or fully manual.
  • the use of an on-demand application such as Adobe Acrobat Distiller or Tumbleweed Publisher to convert Postscript into an electronic document format is an example of an automatic conversion.
  • Generation of HTML from a word processor document is an example of a machine-assisted operation, because automatic conversion is imperfect and must be verified by a human.
  • Translation between written languages is an example of a mostly manual conversion. Once converted, new renditions of a document may be cached by the system. This is essential for manual and machine-assisted conversions so that human effort is not lost. For automatic conversions, caching of the converted result represents a tradeoff between retrieval latency and storage requirements.
  • Generated renditions bear a dependence relationship to the source rendition.
  • the architecture must maintain these dependences to determine when conversion must be repeated. For example, if an HTML rendition is generated from an AmiPro source file, changes to the AmiPro file require that the HTML file be regenerated from the AmiPro. Likewise, if an English document translated to Japanese is then revised in its Japanese form, the English document is out of date until the modifications are translated back into English.
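  • Maintaining these dependences can be sketched as follows, with invented class and attribute names: when the source rendition is revised, every generated rendition is marked stale and must be reconverted.

      class RenditionSet:
          def __init__(self, source_format):
              self.source_format = source_format    # e.g. "AmiPro"
              self.generated = {}                   # format -> up-to-date flag

          def add_generated(self, rendition_format):
              self.generated[rendition_format] = True

          def source_revised(self):
              # All generated renditions now depend on stale input.
              for rendition_format in self.generated:
                  self.generated[rendition_format] = False

          def stale(self):
              return [f for f, ok in self.generated.items() if not ok]

      doc = RenditionSet("AmiPro")
      doc.add_generated("HTML")
      doc.add_generated("PDF")
      doc.source_revised()
      print(doc.stale())   # ['HTML', 'PDF'] must be regenerated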
  • the principal file formats required include a neutral format for operations on content, including indexing and annotation.
  • SGML (see Appendix D) is the best choice of a neutral format; HTML for delivery of predominantly textual content; an electronic document format (PDF or Envoy) for delivery of print-quality content; TIFF for delivery to facsimile devices; AmiPro/WordPro for templates such as sales proposals; and Freelance for templates such as sales presentations.
  • Temporal Nature of Content Content is characterized by the duration of its lifetime and frequency of update.
  • Transient content is re-created each time it is requested and is no longer available once it is delivered. Examples of transient content include a hit list resulting from a search and the result of a database query. Transient content is generated by applications. Each time the application is executed, a new agent (process or thread) is created and assigned a name which users may use to interact with that particular agent. Application instances appear to be content in that they have names and may respond to requests by creating content. They are not themselves content. Application programs, generally binary software, may be considered content, but this type of content is generally not exposed to users. Treatment of programs as content is beneficial primarily in administrative operations such as backup and replication.
  • the integrated and evolutionary nature of the invention calls for a modular design.
  • the provision of an electronic interface for the field sales force necessitates a system architecture which can integrate a diverse set of applications.
  • a major aspect of the invention process is the isolation, classification, and specification of the required functional components.
  • the product of that analysis is a set of functional components and a set of interfaces for communicating between those components.
  • the functional components can be separated into four categories: core, services, channels, and agents.
  • the core comprises essential system functionality and serves as a hub for integrating other components.
  • Channels are the means of exchanging requests and data between the core and users and other systems.
  • Agents are modular applications which extend the functionality of the core.
  • the architecture's core provides the system's fundamental capabilities and serves as an integration hub for other components.
  • the fundamental capabilities provided by the core include content addressing (or naming), event scheduling, content caching, and session management. Because the core is a prerequisite for all other components, it is initially minimally functional, to facilitate expedited development, but it is also extensible so that inflexibility does not become a barrier to adding new features.
  • the minimal core is difficult to implement in an incremental manner. A minimum critical functionality must be achieved before the core can serve basic requests. Trying to produce a design smaller than this minimum produces code that in the future is not sufficiently flexible.
  • Name Manager Each element of content has a unique absolute name which never changes and can be used throughout the system to reference the content.
  • Storage for the data representing an element of content may be provided by any of a number of content management agents.
  • content may be stored in a local file system, a local database, or in a remote document management system.
  • the name manager translates content names to physical content locations.
  • a namespace is a set of names, typically designated by a common name prefix.
  • Each module which implements content storage registers its namespace with the name manager.
  • a document store interface might register all names beginning with docstore/.
  • the name manager provides for communication between modules by routing messages, which are directed at individual items of named content, to the module implementing that particular content.
  • the name manager may also interrogate a content cache to improve performance.
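  • A minimal sketch of this routing behavior, with invented module and method names, follows; the docstore/ prefix is the example given above.

      class NameManager:
          def __init__(self):
              self._namespaces = {}          # prefix -> storage module

          def register(self, prefix, module):
              self._namespaces[prefix] = module

          def resolve(self, name):
              # Longest matching registered prefix wins.
              matches = [p for p in self._namespaces if name.startswith(p)]
              if not matches:
                  raise KeyError("no module registered for " + name)
              return self._namespaces[max(matches, key=len)]

          def send(self, name, message):
              # Messages addressed to named content are routed to the module
              # implementing that content.
              return self.resolve(name).handle(name, message)

      class DocStore:
          def handle(self, name, message):
              return "docstore handled %s for %s" % (message, name)

      names = NameManager()
      names.register("docstore/", DocStore())
      print(names.send("docstore/42", "fetch"))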
  • garbage collection The most effective storage management policy for large highly-interconnected systems is garbage collection. Under this policy, data are removed only when they can be proven to be inaccessible. Data are accessible if they can be reached by following references — represented by inter-content relationships — starting from a set of known roots.
  • the roots of the garbage collector include data currently being operated on and data whose name is registered in a permanent namespace.
  • Static content is accessible from a permanent namespace and is protected from the garbage collector as long as a reference to the content exists in the namespace. Deletion of static content occurs when its name binding is deleted from the namespace and no other references exist.
  • With garbage collection, the content expiration process can be modified to keep content referenced by a user's personal folders from being removed even if the expiration period is exceeded. When, in the future, no users have links to this content, the item is automatically removed.
  • Transient content is accessible from the current list of tasks being performed by the system. Once the task is completed, the content referenced by the task is no longer accessible. Transient content can be made static by binding it to a name in a permanent namespace.
  • the garbage collector is implemented as a background daemon process.
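  • The mark phase of such a collector can be sketched as a reachability computation (the data layout below is invented): content is live only if it can be reached from the roots by following inter-content references.

      def reachable(roots, references):
          # references maps a content name to the names it refers to.
          live, stack = set(), list(roots)
          while stack:
              name = stack.pop()
              if name in live:
                  continue
              live.add(name)
              stack.extend(references.get(name, ()))
          return live

      references = {
          "folders/alice": ["docstore/7"],
          "docstore/7": ["docstore/8"],
          "docstore/9": [],                # unreachable: eligible for collection
      }
      roots = ["folders/alice"]            # permanent namespaces, active tasks
      live = reachable(roots, references)
      print(sorted(set(references) - live))   # ['docstore/9']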
  • Event-driven programming is an effective means of providing modular communications between a large number of software components.
  • modules are able to send data to a large number of recipients without knowing their exact identities.
  • modules may receive data from many senders.
  • Applications generate events when they perform operations affecting the state of the system at large.
  • Such operations are performed through the architecture core, which generates an event as a side effect. For example, an event is generated every time content is modified. Certain applications must perform specific operations in response to events occurring elsewhere in the system. For example, a full-text search engine must update its indexes whenever any indexed content is modified. Applications can request notification from the event scheduler whenever events matching a specified pattern occur.
  • the event scheduler is implemented as a daemon process, with possible assistance from the RDBMS in the form of triggers.
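  • The publish/subscribe behavior of the event scheduler might be sketched as follows; the dotted event names and wildcard matching are illustrative conventions only.

      import fnmatch

      class EventScheduler:
          def __init__(self):
              self._subscriptions = []      # (pattern, callback) pairs

          def subscribe(self, pattern, callback):
              self._subscriptions.append((pattern, callback))

          def publish(self, event_type, **details):
              # Events are generated as a side effect of core operations.
              for pattern, callback in self._subscriptions:
                  if fnmatch.fnmatch(event_type, pattern):
                      callback(event_type, details)

      def reindex(event_type, details):
          print("search engine reindexes", details["name"])

      events = EventScheduler()
      events.subscribe("content.*", reindex)       # e.g. the full-text search engine
      events.publish("content.modified", name="docstore/42")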
  • One unique aspect of the invention is its relationship to the content it manages.
  • the system itself is not the definitive source for much of the content it provides to users. Instead, the system serves as a broker, distributing content gathered from various sources.
  • the content storage capabilities at the core of the invention constitute a local subordinate store or cache. Because all content has a definitive source, the caching of the data representing that content is discretionary to a certain extent. There are, however, several practical reasons why some data must be cached:
  • the data are metadata not provided by the content source, but generated manually,
  • the data are a manifestation of dynamically generated content, and must be stored until delivery, or
  • Data Storage Content, when stored, is manifest as data in some particular rendition. Metadata, including attributes and relationships, are other types of data.
  • the content cache maintains local copies of content by storing the data and metadata which represent the content.
  • the invention uses an RDBMS to implement data storage for the content cache. Certain types of data, full-text indexes in particular, are stored separately in databases designed specifically for that type of data. Large data elements, such as documents, may be stored outside the database for reasons of efficiency.
  • the content cache maintains the validity of the data it contains by maintaining a set of dependence relationships between data elements and regularly checking dependences. This is used, for example, to ensure that when a source rendition is changed, all automatically generated renditions are regenerated. Each cache entry bears a dependence relationship to other data, either within the cache, or external to the core.
  • Many applications are session-based and maintain the current state of the session. For example, a navigator maintains the set of current navigation parameters entered by the user.
  • the session manager provides a consistent mechanism for managing the state of a user's session.
  • a central session manager ensures that the session state is always recoverable in the case of unintentional session termination, and provides a means for maintaining bookmarks and history lists.
  • While a session may involve the interaction of many applications, from a user's perspective, the session state is the union of the session states of all applications involved in the session.
  • the session manager provides an interface by which clients can request state to be saved. It also brokers information about individual users and sessions, including log in, log out, and session splitting.
  • the session manager uses an RDBMS as a persistent object store.
  • Services are modular components which extend the basic functionality of the core. They generally operate at the data level rather than the content level and, unlike agents, do not have instance names.
  • the service-level interface to the core is intended to facilitate tight integration of third-party software. A service is automatically invoked when the particular core functionality it provides is required. Services are generally not invoked through a direct request. While the service interface is designed to be extensible to new classes of services, several classes have been identified as immediately valuable: rendition conversion, automatic annotation, and metadata extraction.
  • Rendition converters translate content from one rendition to another.
  • the rendition conversion service architecture facilitates integration of rendition conversion software supplied by various third-party vendors. All rendition conversions should be content-preserving, i.e. all information, both textual and visual, should be maintained by the conversion process. However, automatic conversions are rarely perfect in this regard. Some conversions, AmiPro to text, for example, are imperfect because the resulting format is not capable of expressing all the information in the source format. Other conversions, AmiPro to HTML, for example, are imperfect because of inadequacies in the conversion software or because of the intrinsic difficulty of a particular conversion.
  • Each rendition conversion made available to the system is assigned a fidelity attribute, which is a measure of the ability of the conversion process to faithfully reproduce the content.
  • fidelity is a multidimensional value. Individual components of the value are generally partially ordered. Representing and manipulating multidimensional fidelity is very useful in consistently, automatically presenting the user with the highest fidelity rendition for each request.
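  • The partial order can be sketched as follows; the component names (textual, visual) and numeric scores are assumptions used only to show why one conversion need not dominate another.

      def dominates(a, b):
          # a and b map fidelity components to scores in [0, 1]. a dominates
          # b only if it is at least as good everywhere and better somewhere.
          return (all(a[k] >= b[k] for k in b) and
                  any(a[k] > b[k] for k in b))

      amipro_to_html  = {"textual": 0.95, "visual": 0.5}
      amipro_to_envoy = {"textual": 0.9,  "visual": 0.95}
      amipro_to_text  = {"textual": 0.8,  "visual": 0.1}

      print(dominates(amipro_to_envoy, amipro_to_text))    # True
      print(dominates(amipro_to_html,  amipro_to_envoy))   # False
      print(dominates(amipro_to_envoy, amipro_to_html))    # False: neither dominates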
  • the primary source of rendition converters for word processor (WP) formats is the Mastersoft Word for Word package, now marketed by Adobe as File Utilities.
  • the Mastersoft filters convert between WP formats and HTML, and between WP formats and neutral formats such as RTF.
  • Mastersoft filters can convert one line drawing format to another line drawing format and can convert one raster format to another raster format, but they cannot convert from a line drawing format to a raster format.
  • the Mastersoft filters are available for HP-UX as well as Windows platforms.
  • A number of HTML converters are currently marketed; unfortunately, none is sufficiently robust for use in wholesale automatic conversion. All HTML converters, with the exception of the Mastersoft filter, are only available on Windows platforms. For conversion to PDF and Envoy, Adobe and Tumbleweed both offer Windows printer drivers to generate output in their respective electronic document formats. Adobe offers Distiller and Tumbleweed offers Publisher, which convert Postscript to PDF and Envoy respectively. The Adobe products are available on both HP-UX and Windows platforms. The Tumbleweed products are available on Windows platforms only.
  • TIFF
  • Some versions of Windows include printer drivers which generate TIFF images.
  • Ghostscript, a freeware/commercial product, can produce TIFF from Postscript.
  • Annotators add value to content by gathering other data and interpolating it into the content. For example, when displaying an item of content identified using a full-text search, it is often valuable to highlight the search words in the body of the displayed document. Most annotators are rendition-specific. They can only annotate a single format, such as HTML or PDF. Theoretically, annotators for a neutral document format would be most effective. A single annotator for a neutral format can be leveraged to provide annotation for a wide range of source formats. However, this assumes there exist high-fidelity rendition conversions to and from the neutral format which is currently not the case. Due to their relationship to metadata and therefore the attribute schema, most annotators are custom applications or at least custom interfaces to standard applications.
  • High-level languages designed for manipulating text or structured text may be useful for rapid prototyping and development of automatic annotators.
  • Useful automatic annotators include: Find URLs embedded in textual material and convert them to hyperlinks; For each hyperlink in the content, indicate the relevance of the link target to the current search state; Indicate the security level of the document, available from document metadata, through a header annotation or background; Insert navigational aids, including links to related sub-content; Warn about potential technical or content-related problems which have been reported through feedback or audit procedures; and Highlight words which match the current search criteria.
  • Metadata extraction tools 81 (See Fig. 8) are required which recognize metadata in content and generate metadata in compliance with the metadata schema.
  • An example of intermingled metadata is the city, date, and wire service name which often begins the first paragraph of a newswire story. If such data are not identified as metadata, the effectiveness of metadata-based navigation is compromised.
  • Where metadata is explicitly denoted in source data, accurate metadata extraction tools can be used. Examples of source data which contain explicit metadata are textual forms with labeled fields and HTML files with META tags.
  • Where metadata is not explicitly denoted, it may still be inferred from the entire content. For example, a document which mentions competitors frequently is likely to be competitive information. Extraction of such implicit metadata cannot be performed deterministically, and therefore its accuracy is questionable. In such cases, the extracted metadata should be analyzed and confirmed by a knowledge manager (see Fig. 9).
  • frequent mention of a particular company may suggest that the document contains competitive information.
  • the same company might also be a customer or a channel partner.
  • to resolve such ambiguity, other data might be consulted.
  • the industry code associated with the document may be helpful if the company is a competitor in some markets and a customer in others.
  • Channels are the mechanisms by which requests are accepted and content delivered by the system
  • a channel instance is created when a physical connection is established with an external source. For example, an instance of the HTTP channel is generated when the HTTP server receives a request.
  • a channel instance may also be created by the system to initiate a new communication, such as to deliver content to a fax machine.
  • Content must be delivered to a specific channel instance, not to a channel in general such as the HTTP channel. This requirement follows from the fact that only instances of channels have concrete characteristics. For example, there is no concept of the bandwidth of the HTTP channel as a whole, but for a channel instance representing an individual interaction between an HTTP server and a Web browser, the bandwidth of the channel is well-defined.
  • the capabilities provided by a channel determine the range of content it may deliver and the relative preference of available renditions.
  • a channel may support any number of media formats to a variable degree.
  • the media capabilities for a channel are expressed as the fidelity of the channel when delivering each media format.
  • the security of the channel dictates the security level of all system operations performed while responding to a request from or delivering content to a channel.
  • a channel must provide private communication and user authentication. Privacy may be maintained by session encryption algorithms such as RC4. Authentication may be based on passwords or RSA public-key certificates.
  • Bandwidth The bandwidth of the channel influences rendition selection. Large graphical renditions are not appropriate for delivery over channels with limited bandwidth.
  • the latency of the channel influences the behavior of agents which converse over the channel. Over high-latency channels, the system attempts to deliver more content per communication to reduce the number of round-trip delays incurred. Where latency is not an issue, the use of smaller chunks of information is more ergonomic.
  • Channel capabilities vary between instances of a channel.
  • the media capabilities of the HTTP channel depend on the capabilities of the HTTP client which initiated the communication, which in turn depend on the set of plug-in media viewers which have been installed.
  • the capabilities of another instance of the HTTP channel, initiated by a different HTTP client, may be markedly different.
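  • Rendition selection against a particular channel instance might then be sketched as below; the format names, sizes, and fidelity figures are illustrative only.

      def choose_rendition(available, channel):
          # Pick the highest-fidelity rendition the channel instance can
          # actually carry, respecting its bandwidth limit.
          candidates = []
          for rendition in available:
              fidelity = channel["media_fidelity"].get(rendition["format"], 0.0)
              if fidelity == 0.0:
                  continue                               # format not supported
              if rendition["size_kb"] > channel["max_kb"]:
                  continue                               # too large for this bandwidth
              candidates.append((fidelity, rendition))
          if not candidates:
              return None
          return max(candidates, key=lambda c: c[0])[1]

      available = [
          {"format": "PDF",  "size_kb": 900},
          {"format": "HTML", "size_kb": 40},
          {"format": "TIFF", "size_kb": 2500},
      ]
      low_bandwidth_http = {"media_fidelity": {"HTML": 0.8, "PDF": 1.0},
                            "max_kb": 200}
      print(choose_rendition(available, low_bandwidth_http))   # the HTML rendition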
  • the primary channel is HTTP. Facsimile support is also desirable, especially for delivery. Support for electronic mail interaction is useful as a mechanism for users to send content to non-users. It is also potentially useful for channel partner access. Telephone and pager interfaces are also useful because they provide a uniform mechanism for reaching the worldwide field, cementing the use of the invention as the single electronic interface.
  • Channels communicate with the system through a channel API, which allows new channels to be added to the system at a later date. Some degree of early consideration of channels that may not be implemented immediately is useful in determining that the channel interface is sufficiently expressive.
  • HTTP The HTTP channel provides communication with Web clients, such as Netscape Navigator.
  • the HTTP channel requires the use of an HTTP server to receive requests and deliver responses according to the HTTP protocol.
  • Netscape Enterprise Server is the recommended server software for secure transactions over the World Wide Web. Access from Netscape Enterprise Server to the system can be implemented efficiently by binding the interface code into the server using Netscape's NSAPI protocol. The complexity of such interface code should be minimized to reduce the potential for adverse impact on the reliability of the HTTP server.
  • the electronic mail channel provides communication with electronic mail clients such as CC:MAIL. This channel makes the system accessible to users who have e-mail accounts but no direct network access.
  • the e-mail channel is also useful for asynchronous delivery, such as notifications.
  • the e-mail channel has capabilities similar to those of the HTTP channel, but generally with a significantly higher latency.
  • SMTP the Internet protocol for mail delivery
  • MIME a protocol for encapsulating one or more files of various formats in a single mail message
  • RSA public keys cryptographic keys used with the RSA algorithms to provide privacy and authentication
  • Form Interface A form interface, such as Lotus Forms, allows forms to be filled out and submitted using a mail client. Implementation of simple delivery of content in a single-file rendition such as PDF or Envoy is possible with less effort.
  • a voice telephone may be used to request content through menu navigation. This mechanism is most effective for finding content within a limited domain, such as a user's personal folder, or for finding specific documents, given a document identifier. Delivery of content via telephone is possible for content available as audio data.
  • Brief urgent messages such as notifications, may be delivered via pager. This also requires a notification agent to implement the notification selection.
  • Agent processes may be started when the system is initialized or when certain events occur, for example, initiation of a new user session. It is possible that multiple instances of the same agent are active simultaneously. For example, users are interacting with different instances of the navigation agent. Each instance of an agent has a unique address or name, used by the core to route requests to the agent. Once started, agent instances may be available to accept requests. They continue to service requests until explicitly terminated.
  • Some agents present a human interface, either to users or to administrative personnel. These agents may present a persistent session interface, in which case they use the services of the session manager.
  • agents are categorized as user agents, administrative agents, and system agents. This classification has little bearing on implementation. All agents are treated uniformly by the core.
  • User agents maintain an ongoing dialog with a user and interact with the system on that user's behalf. User agents are created when a user begins a new session or requests content from a namespace registered to an agent. Each instance of a user agent serves only one user session. Only requests generated on behalf of that user session are accepted by the agent instance. All operations performed by the agent instance carry the access permissions of the user.
  • the navigation agent maintains an ongoing session with a user, directing him toward relevant content.
  • the submission agent manages manual submission of content to the system.
  • the user typically an author or other content provider, is presented with a series of forms.
  • the values of various attributes can be specified, and the content data can be submitted directly or by reference to an online location.
  • the submission agent also allows content to be composed from existing content. For example, an author may compose an info kit from a set of product specifications and collateral literature.
  • Submitted content is first approved and processed by the production staff before it is made accessible to the general audience. Authors are able to view content they have submitted prior to its release to the general audience.
  • the submission agent also allows authors to query the production status of content they have submitted.
  • the submission agent provides a user-friendly interface to the submission management agent which, in coordination with the production agent, actually enters content into the database.
  • User Authoring Users are given a limited degree of authoring capability. These capabilities are implemented by the user authoring agent.
  • User-authored content may include contributions to discussion groups; bookmarks and other personal collections; annotations attached to content; and simple notes.
  • By default, content authored by users is not accessible to other users. Users may extend access to portions of their personal content to other users. In most cases, it may be desirable to ascribe very low importance or relevance to user-authored content, so that such content rarely appears in search results.
  • Portal agents provide access to content which does not adhere to the attribution schema and cannot be fitted to the attribution schema without significant loss of information. Because it does not adhere to the attribution schema, such content is not accessible directly through the navigation agent. Content which does adhere or can be fitted to the attribution schema can be made fully accessible at the system level.
  • Portals facilitate a variable degree of integration between the system and non-system content managers. All communication between the portal and the core occurs via a portal API. The level of integration of a particular portal is determined by the portal agent's implementation of the access, search, navigation, and backup portal interfaces. The system may contain links to non-system resources, such as external Web sites. Integration through a portal agent offers several advantages over a hyperlink: Portal agents can cause their content to be indexed by the system, which allows users to find references to the portal agent from the standard system navigator; Content generated by a portal agent is amenable to rendition conversion and annotation, and may be delivered by various channels; Navigation of content through a portal agent is integrated with the standard system navigation agent and uses the same bookmark and history mechanisms. The system provides standard portals for accessing WWW sites and Usenet newsgroups.
  • Each user's profile, maintained by the system, includes a set of user preferences which specify the manner in which the system communicates with the user, including appearance and verbosity. Users may modify the user preference portions of their user profiles through the user preferences agent. User profiles also include a list of clipping requests, expressed as event descriptions. These event descriptions are automatically registered with the notification agent which detects content which matches the descriptions. Any such content is linked by the notification agent into the user's private collection.
  • Feedback may be managed through various means, ranging from a simple electronic mailbox to a customer support database. Users report problems and submit comments regarding the system to the feedback system via the feedback agent.
  • Administrative agents are similar to user agents except that they are only made available to system administrators. This exclusion can be trivially implemented via sensitivity attributes on the agents.
  • the production agent implements a workflow which prepares submitted content for distribution to the system audience.
  • the content is examined by a knowledge manager to verify that it is relevant to the system audience and that the same content does not already exist in ESP.
  • Identification of duplicated content can be aided by searching for other content with identical or almost identical extracted metadata.
  • the content is characterized and attributed by a knowledge manager, starting from the candidate attribute values supplied by the author.
  • Audit The production agent also provides functions for modifying existing content and metadata and for deleting items from the database.
  • the production agent provides a user-friendly interface to content management agents which actually manage content and associated metadata.
  • the production agent coordinates with the submission management agent which receives data to be made into content from the submission agent.
  • the reporting and analysis agent allows administrative users to generate reports and graphs from the system logs.
  • System logs are created by a system agent.
  • Several types of reports can be generated, including performance, knowledge, user activity, and security reports, as described below.
  • the performance of the system can be analyzed and related to time of day, geography, channel, and content source.
  • Knowledge Reports Using data in the knowledge logs, knowledge management issues can be analyzed, including content demand, missing content, and content mis-attribution.
  • reports can be generated which indicate the number of active users or concurrent users, related to user profile and geography.
  • reports can be generated which indicate possible security concerns, including multiple concurrent sessions by the same user, denied requests for sensitive content, and frequent unsuccessful authentication attempts.
  • User profiles are managed by administrative staff via an account management agent.
  • This agent provides a means of assigning passwords to users and associating RSA public keys with users. Users may also be assigned to access control groups.
  • System agents are anonymous automated clients which implement advanced system features. System agents are not associated with a user session. Instances of system agents are created when the system server is started and are not terminated until the server is shut down. Each instance operates as a particular user.
Content Management
  • Content management agents transform external data into internal content.
  • content management agents communicate feedback and other change requests back to the content source.
  • If the content source and interchange mechanisms are reliable, content may be made immediately accessible to the general system audience. In this case, automatic content auditing is required to ensure that all content conforms to the schema and other acceptance criteria. Examples of this type of content management agent are newswire agents and external web site crawlers for trade magazines.
  • Otherwise, content may be placed in a production queue. Such content must be reviewed via the production agent before final release into the system.
  • the primary agent in this class is the submission management agent.
  • the submission management agent internally maintains a queue which accepts data from the submission agent and holds it until the content has been reviewed and approved via the production agent.
  • Content management agents communicate with the core via a standard interface, which ensures uniform treatment by other components in the system.
  • This interface must be sufficiently flexible to support a wide range of potential content sources, including document stores, newswires, feeds from market analysts, Web sites, and Usenet newsgroups.
  • Export content can be extracted from the system by export agents for the purpose of generating other views of the content.
  • the production of CD-ROMs containing content is implemented as an export agent which programmatically walks content and generates an indexed hierarchical file structure suitable for offline browsing.
  • a key to enabling export functions is the treatment of the results of navigation as items of content.
  • An HTML exporter can be created modularly by having an agent create a navigator session and then communicate with the navigator to walk the set of content of interest. The navigator sends the content representing the navigation pages back to the exporter, rather than to an HTTP channel. The exporter makes small changes to the content and then writes it to a CD prototype file system.
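The following Python sketch illustrates, under simplifying assumptions, how an exporter of this kind might drive a navigator session and write the resulting pages into a prototype file tree for offline browsing. NavigatorSession and its walk method are stand-ins invented for the example, not the disclosed navigator API.

    import os

    class NavigatorSession:
        """Stand-in for the navigator: walks a set of content and returns the
        navigation pages it would normally send to an HTTP channel."""
        def __init__(self, content: dict):
            self._content = content

        def walk(self):
            # Yield (relative path, page HTML) pairs for each item of interest.
            for name, body in self._content.items():
                yield name + ".html", "<html><body>%s</body></html>" % body

    def export_to_cd_prototype(session: NavigatorSession, root: str):
        """Receive pages from the navigator, apply small changes (here, a marker
        comment suitable for an offline copy), and write them to a file tree."""
        os.makedirs(root, exist_ok=True)
        for rel_path, page in session.walk():
            page = page.replace("<html>", "<html><!-- offline copy -->")
            with open(os.path.join(root, rel_path), "w") as fh:
                fh.write(page)

    if __name__ == "__main__":
        session = NavigatorSession({"overview": "Product overview",
                                    "pricing": "Pricing summary"})
        export_to_cd_prototype(session, "cd_prototype")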
  • the system notifies users when certain events occur, such as modification to a particular file.
  • the notification agent monitors system events through the event scheduler and generates notification messages when user-specified criteria are met. Notification messages are either delivered via channel, such as e-mail, or linked in to the user's personal content namespace.
  • Logging The logging agent monitors system events through the event scheduler and writes messages to a log database. Each instance of the logging agent can be configured to monitor specific events, so that different types of logs can be created.
  • Performance Log: a log of the response time for each user access.
  • Knowledge Log: a log of accesses to content, including search requests.
  • the logging agent is additionally responsible for importing logs from various other components of the system including HTTP and RDBMS logs.
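A minimal Python sketch of a configurable logging agent instance follows; the event names, table layout, and use of SQLite are illustrative assumptions rather than the actual log schema.

    import sqlite3
    import time

    class LoggingAgent:
        """Each instance monitors a configured set of event classifications and
        writes matching events to its own log table, so that separate instances
        can keep separate logs (performance, knowledge, security, audit)."""
        def __init__(self, db_path: str, table: str, classifications: set):
            self.classifications = classifications
            self.table = table
            self.conn = sqlite3.connect(db_path)
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS %s "
                "(ts REAL, classification TEXT, detail TEXT)" % table)

        def on_event(self, classification: str, detail: str):
            # Ignore events this instance is not configured to monitor.
            if classification in self.classifications:
                self.conn.execute(
                    "INSERT INTO %s VALUES (?, ?, ?)" % self.table,
                    (time.time(), classification, detail))
                self.conn.commit()

    if __name__ == "__main__":
        perf_log = LoggingAgent(":memory:", "performance_log", {"user_access"})
        perf_log.on_event("user_access", "response_time=0.42s")
        perf_log.on_event("content_search", "query=minivan")  # not monitored here
        print(perf_log.conn.execute("SELECT * FROM performance_log").fetchall())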
  • Auditing agents check the validity of the content according to a specified set of rules.
  • the auditing agent is invoked by other agents submitting content. Auditing is a stage in the production process and is performed automatically in conjunction with other content managers.
  • the auditing agent may also scan existing content to check rules which cannot be tested at submission.
  • One example is the rule which invalidates out-of-date content.
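The out-of-date rule might be expressed along the following lines; the 'expires' attribute name and the ISO date format are assumptions made for this sketch.

    from datetime import datetime, timezone

    def audit_out_of_date(metadata: dict, now: datetime = None) -> list:
        """Return a list of rule violations for one item of content.
        Only a single rule is checked here: content whose 'expires' attribute
        lies in the past is flagged as out of date."""
        now = now or datetime.now(timezone.utc)
        violations = []
        expires = metadata.get("expires")
        if expires is not None and datetime.fromisoformat(expires) < now:
            violations.append("content is out of date (expired %s)" % expires)
        return violations

    if __name__ == "__main__":
        print(audit_out_of_date({"title": "Q3 price list",
                                 "expires": "1999-10-01T00:00:00+00:00"}))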
APIs
  • a key element of the design of the system is the design of the interfaces which allow the system components to interoperate. To support current and future needs adequately, care must be taken to develop interfaces that are flexible and adaptable.
  • the content interface provides a uniform mechanism for accessing content.
  • the interface is implemented by objects created by content management agents (repository content) and agents (dynamic content).
  • All objects which provide content implement an interface which provides access to attribute and rendition data.
  • the retrieval interface provides a facility for determining the availability and fidelity of renditions.
  • All objects which store content implement an interface which allows data values to be assigned to attributes and renditions. Assignments can be used to modify the metadata or renditions associated with a particular content name. Assigning to a source rendition and assigning to a derived rendition are distinguished. The former is an update of the content which causes other renditions to be out-of-date, while the latter is the result of a manual rendition conversion.
  • Traversal All objects which provide structured content or collections of content implement a traversal interface which provides access to the sub-content. Through the traversal interface, clients can retrieve a list of the names of each contained element of content. Each name so retrieved can then be used to operate on the sub-content. Structure can be walked in a hierarchical manner by recursive application of the traversal interface.
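A small Python sketch of such a recursive walk follows; ContentNode, contained_names, and resolve are hypothetical stand-ins for the traversal interface described above.

    class ContentNode:
        """Stand-in for an object implementing the traversal interface."""
        def __init__(self, name, children=None):
            self.name = name
            self._children = children or []

        def contained_names(self):
            # Traversal interface: list the names of each contained element.
            return [child.name for child in self._children]

        def resolve(self, name):
            # Look up a sub-element by name so it can be operated on in turn.
            return next(c for c in self._children if c.name == name)

    def walk(node: ContentNode, visit, depth=0):
        """Walk structured content hierarchically by recursive application of
        the traversal interface."""
        visit(node, depth)
        for name in node.contained_names():
            walk(node.resolve(name), visit, depth + 1)

    if __name__ == "__main__":
        doc = ContentNode("whitepaper", [
            ContentNode("chapter-1", [ContentNode("section-1.1")]),
            ContentNode("chapter-2"),
        ])
        walk(doc, lambda n, d: print("  " * d + n.name))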
  • content is indexed for searching by traversal and retrieval of a textual — and preferably neutral — rendition.
  • Content objects which wish to override the default search indexing may override the searching interface.
  • objects may return search indexes, each of which maps terms to content names.
  • a standard protocol that may be applicable in the definition of this interface is WAIS/Z39.50.
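As a rough illustration of the default indexing behaviour (not of WAIS/Z39.50), the sketch below builds a simple term-to-content-name index from textual renditions supplied as plain strings; in the full system those strings would come from traversal and retrieval.

    import re
    from collections import defaultdict

    def build_search_index(items):
        """Map each term found in a textual rendition to the set of content
        names containing it. 'items' is an iterable of
        (content_name, text_rendition) pairs."""
        index = defaultdict(set)
        for name, text in items:
            for term in set(re.findall(r"[a-z0-9]+", text.lower())):
                index[term].add(name)
        return index

    if __name__ == "__main__":
        idx = build_search_index([
            ("brief-17", "GlobalReach costs 22 percent less per minute"),
            ("faq-3", "GlobalReach coverage map"),
        ])
        print(sorted(idx["globalreach"]))   # ['brief-17', 'faq-3']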
  • the delivery interface is implemented by channels and other components which accept content. There are two aspects to the interface: access to channel capability information and delivery.
  • Channel capabilities may impact the behavior of applications which deliver content to the channel. For example, the range of renditions that are acceptable to a channel determines which rendition is chosen to manifest content retrieved from a content store. Other channel capabilities include security, bandwidth, and latency.
  • Channels also implement a delivery function, which transfers metadata and content to a recipient through the channel.
  • the content must be provided in a rendition accepted by the channel.
  • a channel may use content attributes in its operation. For example, an HTTP channel transmits expiration information to a web client if available.
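The following sketch suggests how a delivering application might consult channel capabilities to pick an acceptable rendition before invoking the delivery function; EmailChannel and its attribute names are invented for the example.

    class EmailChannel:
        """Stand-in for a channel implementing the delivery interface: it
        exposes its capabilities and accepts content for delivery."""
        accepted_renditions = ("text/plain", "text/html")

        def deliver(self, recipient, metadata, rendition_type, body):
            # A real channel could also use attributes such as expiration.
            print("to=%s type=%s subject=%s" %
                  (recipient, rendition_type, metadata.get("title", "")))

    def deliver_content(channel, recipient, metadata, renditions):
        """Pick the first rendition the channel accepts, then hand it over
        through the channel's delivery function."""
        for rendition_type in channel.accepted_renditions:
            if rendition_type in renditions:
                channel.deliver(recipient, metadata, rendition_type,
                                renditions[rendition_type])
                return rendition_type
        raise ValueError("no rendition acceptable to this channel")

    if __name__ == "__main__":
        renditions = {"application/pdf": b"...",
                      "text/html": "<p>Q3 price list</p>"}
        deliver_content(EmailChannel(), "rep01@example.com",
                        {"title": "Q3 price list"}, renditions)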
  • Components which provide conversion capabilities, including rendition conversion, metadata extraction, and annotation, implement the conversion interface.
  • This interface is similar to the delivery interface. Both interfaces accept content as input. Instead of a delivery function, the conversion interface specifies a conversion function, which generates new content from the input metadata and content. The conversion function returns the name of a newly generated item of content which represents the results of the conversion.
  • the event interface provides system components the capability of generating events and receiving notification when selected events occur.
  • the event scheduler provides functions that agents may invoke to generate a new event or to request notification of an event.
  • Each event has a classification which describes the occurrence which caused the event to be generated.
  • Events may contain additional information in the form of parameters.
  • the exact parameters depend on the classification of the event.
  • agents specify event selection criteria in terms of event classification and parameters. Agents which request to be notified of certain events also implement a notification function which is invoked by the event scheduler when an event matching the selection criteria occurs.
  • the classification and parameters for the event which triggered the notification are passed to the agent as parameters to the notification function.
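A schematic event scheduler supporting these registrations might look roughly as follows; the classification strings and parameter filters shown are illustrative only.

    class EventScheduler:
        """Agents generate events and register to be notified of events whose
        classification and parameters match their selection criteria."""
        def __init__(self):
            self._subscriptions = []

        def request_notification(self, classification, parameter_filter, notify):
            # 'notify' is the agent's notification function; it receives the
            # classification and parameters of the triggering event.
            self._subscriptions.append((classification, parameter_filter, notify))

        def generate(self, classification, **parameters):
            for cls, flt, notify in self._subscriptions:
                if cls == classification and all(
                        parameters.get(k) == v for k, v in flt.items()):
                    notify(classification, parameters)

    if __name__ == "__main__":
        scheduler = EventScheduler()
        scheduler.request_notification(
            "content_modified", {"category": "pricing"},
            lambda cls, params: print("notify:", cls, params))
        scheduler.generate("content_modified", category="pricing", name="doc-42")
        scheduler.generate("content_modified", category="news", name="doc-43")  # no match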
  • Session Interface provides a uniform means by which all agents maintain state between accesses.
  • a typical user session involves multiple agents.
  • the session state is the union of the states of all agents.
  • Centralized maintenance of the combined session state allows the implementation of user session features such as bookmarks, histories, and session recovery.
  • the session interface is implemented by agents which carry state, and requires that two methods be defined: one for saving or checkpointing the state, and one to restore a saved state.
  • the session manager invokes an agent's session interface to save whatever state information is necessary to return the agent from an unknown state to the current state. This information can be maintained by the session manager to implement bookmarks, histories, and session recovery by requesting that the agent reload an earlier state.
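One possible shape for the checkpoint and restore pair, and for a session manager built on it, is sketched below; the method and class names are assumptions, not the disclosed API.

    import json

    class NavigatorAgent:
        """An agent that carries state and implements the two session methods:
        checkpoint (save) and restore."""
        def __init__(self):
            self.current_category = None

        def checkpoint(self) -> str:
            # Return whatever is needed to bring a fresh agent to this state.
            return json.dumps({"current_category": self.current_category})

        def restore(self, saved: str):
            self.current_category = json.loads(saved)["current_category"]

    class SessionManager:
        """Maintains saved agent states to implement bookmarks, histories,
        and session recovery."""
        def __init__(self):
            self.history = []

        def bookmark(self, agent):
            self.history.append(agent.checkpoint())

        def go_back(self, agent):
            agent.restore(self.history.pop())

    if __name__ == "__main__":
        nav, mgr = NavigatorAgent(), SessionManager()
        nav.current_category = "competitive/minivans"
        mgr.bookmark(nav)
        nav.current_category = "pricing"
        mgr.go_back(nav)
        print(nav.current_category)   # competitive/minivans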
  • Agents which provide portals to content do so by implementing various aspects of the portal interface.
  • This interface allows portal agents to define custom methods for access, search, navigation, and backup.
  • a portal may deliver content online or offline.
  • a portal providing access to hardcopy literature may accept requests online, but only deliver content offline through the mail.
  • An interactive portal, such as a stock quote service, delivers content online.
  • a portal may implement the searchable content API, enabling portions of its content to be searched and allowing navigational agents to present the user with references to the portal when appropriate.
  • the solutions catalog could search-enable the product descriptions it contains so that the standard navigation techniques present the user with links into the solutions catalog when appropriate.
  • the standard navigation agent may not be a convenient interface for certain types of structured data. For example, users expect to be able to browse an events database using the familiar calendar paradigm. In such circumstances, a portal may implement a custom navigation agent.
  • Backup of content accessed by a portal may be the responsibility of another system which manages the data, or may be implemented through the system via the portal API.
  • Content management agents are the sources of information for the system.
  • the content management interface provides a mechanism by which the content source may influence the behavior of the content it provides.
  • a content management agent may override several aspects of content behavior, including retrieval, modification, content creation, and feedback.
  • Retrieval The content manager specifies the manner by which content is retrieved. Most content is cached by the core. Content managers may specify caching parameters or prohibit caching altogether, requiring that the system contact the content manager directly for every access to the content.
  • Modification Content managers must maintain the consistency of data between the system and the content source. To this end, content managers may control modification of the content from the system. For example, modifications originating from the system may be reflected back to the source, or they may generate a feedback message to the content provider. For some sources, modification of the content is disallowed under contractual agreement.
  • a new element of content is created when content is assigned to a name which previously had no content associated with it.
  • Agents which store content provide a common means of creating new names within their registered namespace.
  • the content management interface allows agents which supply content to specify the means by which to deliver feedback on that content.
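As an illustration, a newswire-style content manager that limits caching, prohibits local modification, and routes requested changes back to the provider as feedback might look like the following sketch; the attribute names (caching, modifiable) are hypothetical.

    class NewswireContentManager:
        """Stand-in for a content management agent that overrides aspects of
        content behaviour for the items it supplies."""
        # Retrieval: newswire items may be cached briefly by the core.
        caching = {"allowed": True, "max_age_seconds": 900}
        # Modification: contractual terms forbid editing wire copy in place.
        modifiable = False

        def feedback(self, content_name, message):
            # Feedback is routed back to the provider rather than applied locally.
            print("forwarding feedback on %s to provider: %s"
                  % (content_name, message))

    def apply_modification(manager, content_name, new_body):
        """Apply a modification if the source allows it; otherwise turn the
        request into feedback for the content provider."""
        if not manager.modifiable:
            manager.feedback(content_name, "requested change: %r" % new_body)
            return False
        return True  # a modifiable source would be updated and re-synchronised

    if __name__ == "__main__":
        apply_modification(NewswireContentManager(), "wire-1014", "fix product name")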
  • Information routing systems leverage content metadata and user profiles to deliver relevant information to interested parties.
  • Information routing is a pushed-content model, in which providers publish content to subscription lists. Pushed content is convenient for content providers because it gives them a direct channel to their audience. Users, however, are at the mercy of the content providers. The result is that users either ignore most of the pushed content because they don't have time to determine its usefulness, or they save all pushed content, so that they have it should it become useful.
  • In the first scenario, there is no communication.
  • In the second, content management becomes the responsibility of each individual user.
  • the invention implements a pulled-content model, in which users request information when they need it. Users can specifically request notification of the presence of new material of interest by customizing their user profile. It is also possible for administrative users to enter notification requests that users may not modify, which can be used to force announcements, but this functionality is not available to unprivileged users or authors.

Abstract

The service bureau architecture disclosed herein is an electronic performance support system for sales organizations. The preferred embodiment of the invention is a single window by which field personnel can access all online information of importance to the sales process. One aspect of the invention is to limit the amount of time sales representatives act as information brokers by assisting them in the retrieval and delivery of information. Another aspect is a system which thinks the way the user does to provide an intuitive and efficient interface.

Description

Service Bureau Architecture
BACKGROUND OF THE INVENTION
TECHNICAL FIELD
The invention relates to an electronic sales support system. More particularly, the invention relates to a service bureau architecture.
DESCRIPTION OF THE PRIOR ART
Information appliances and the Internet are revolutionizing the buying and selling process. While their primary impact so far has been felt in the retail distribution of branded, commodity products, there is great potential to leverage these technologies to improve the business-to-business sales process for more sophisticated goods and services. In particular, the complexity and rapid change characteristic in industries such as telecommunications, high technology, and financial services make them ripe for the application of innovative Internet technologies.
It has become increasingly difficult for the information consumers — sales, distribution channels, and customers of companies in these industries — to stay armed with exactly the information they need to compete and win. At the same time, the information producers — marketing — have found it increasingly difficult to respond with the up-to-date product, customer and competitor information required.
Conversely (and paradoxically) the easy availability of information on the Internet has meant that sales prospects have more exposure to competitive offerings than ever before. The end result is that key corporate messages are not consistently delivered; sales are lost to delay, confusion, and misinformation and product marketing experts become expensive, one-to-one support people.
Corporate intranets seemed to offer marketing departments an improved means for distributing the data sheets, white papers, presentations, competitive analyses, and demonstrations they produce. Unfortunately, such information — whether delivered via this new medium or via traditional paper mechanisms — is difficult to navigate and quickly becomes out-of-date. Furthermore, the information rarely incorporates the "net-net" or "sound bites" required in easily digestible form for sales to win in specific competitive situations.
Most industries today are characterized by a changing regulatory environment and a significant increase in global competition. Customers have come to expect more value and greater selection at the same or lower prices. At the same time, companies are undertaking increasingly sophisticated distribution strategies, often employing a multitude of channels and partnerships, in an effort to increase their reach and decrease their costs. Companies are also making wholesale changes to their corporate and product positioning in an effort to differentiate themselves and rise above the noise.
Marketing and sales of products and services in this environment is highly challenging. "Business as usual" is not working. Sales forces are under pressure to beat out the competition and close business, and look to marketing for the information on products, services, and competition to do so. Marketing is scrambling to define and drive new products and services through the development cycle, while at the same time attempting to support sales' need for information on a day-to-day basis.
The underlying problem is not that the sales information does not exist. Marketing generates gigabytes of Word documents, PowerPoint presentations, and e-mails, but sales for the most part is not able to take advantage of these efforts. The reason for this is that the information is not in a form that is readily accessible and guaranteed to be accurate and up-to- date.
The communication from marketing to sales is only half of the problem. The other half of the problem is that there is no convenient and reliable feedback loop from sales to marketing. Through their daily contact with prospects and customers, in-the-trenches sales personnel get valuable firsthand feedback on product capabilities, competition, and industry trends. Sales people do not consistently call or send e-mail to marketing with this information, and thus some of the most vital and current market intelligence is never captured and made broadly available.
Attempts to solve this problem have focused on the organization and accessibility of document-based information. In the absence of a standard software application that addresses the real scope of the problem, companies have developed custom solutions based on enabling technologies, such as Lotus Notes, Microsoft Exchange, intranet Web sites, marketing encyclopedias, and networked file systems.
Such systems have proven to be expensive to develop and maintain, short on features, and are not readily extensible. In the end, sales professionals, sales engineers, and channel partners still rely on product marketing for one-to-one support, or their information needs simply go unmet.
There are many industries that have severe sales information problems, but some are particularly ripe for a sales information automation solution:
• Financial Services,
• High Technology, and
• Telecommunications.
Their characteristics include:
• They have complex product lines that change frequently and are sold through multi-tiered distribution channels,
• They sell in highly competitive, global markets with changing regulatory environments, and
• They are acutely aware of the sales information problem, but have been ineffective in their own attempts to solve the problem.
Wide availability of information on the Internet has changed the relationship between vendors and corporate buyers as discussed above. Buyers now have more sources of information about competitive offerings than ever before and are defining their buying criteria prior to engaging vendors. Every sales opportunity reaches a critical point at which the prospect is qualified and ready to buy if the sales rep can provide the precise information required to close the deal. If the rep can respond quickly with the required information, the deal closes. If not, the deal is at risk of a more effective competitor. Competitors, after all, have access to the same information on the Internet as the prospect!
Sales pros and organizations are at a growing disadvantage in this environment. In a prospect-controlled buying process, sales reps move to close by answering an unpredictable range of issues and questions from the prospect. Most sales organizations have reams of information for their sales reps and channels, but lack information systems that quickly provide the exact information required to close deals. Information quickly falls out of date. Sellers cannot get immediate access to the precise information they need to compete and win, much less add value to the customer's buying process. Forrester Research calls this sales information gap the most important challenge for companies in the Internet era, and predicts the rise of a new generation of systems to solve it.
Forrester and other analysts recognize that companies do not have the systems required to meet the sales information challenge. Email and office productivity tools make communications easier for sales, but do not solve the need to find the right information to power the close. Customer relationship management and sales force automation solutions address pre- and post- sales information, but not the information needed to close.
Companies have used Internet tool kits to create a profusion of sales and marketing portals and Web sites, but these fragment information, are difficult to search, and expensive to keep up to date and relevant. Current systems do not effectively manage documents and other unstructured information, which is precisely the kind of information critical to closing business.
Harnessing unstructured information for internal and external users is a competitive imperative that few organizations are prepared to meet. Companies do not have the time, infrastructure, tools, and process support to solve the sales information problem by themselves. It would be advantageous to provide sales and marketing information exchange, e.g. an Internet e- service that lets direct sales reps, telesales, and channel partners zero in on the precise information needed to motivate prospects to close deals, with the assurance they have the latest, most accurate information available.
SUMMARY OF THE INVENTION
The invention provides a service bureau architecture. To the direct sales rep, the invention provides a personal sales information channel that helps him quickly find the right document, news article, presentation, competitive analysis, or customer reference to answer prospect and customer issues and keep deals moving. The business partner, the telesales rep, and even prospects and customers see content tailored to their individual needs. Behind each individual's channel is a carefully organized and maintained custom information space containing only relevant marketing and sales information drawn from both inside and outside of the company. From this common base of information, organizations can provide all of their key sales and marketing constituents with quick access to the information that drives sales execution.
The preferred embodiment of the invention provides a hosted Internet e- service, which comprises a sales and marketing information exchange that equips direct sales reps, telesales, indirect channels, and channel partners with the precise information needed to develop and close deals. The information exchange organizes information for search, navigation, and delivery to a variety of audiences from a common base. Such information exchange is devoted to sales and marketing information to help sales pros and others in a company's sales channel zero in on the perfect information for each sales situation, drawing on sources inside and outside of the company. This ensures that the information accessible to these sales channel participants is accurate, relevant, and targeted. Such information exchange also fosters creation of better sales and marketing information by supporting collaboration among those in the sales process and between sales and marketing.
Information exchanges in accordance with the invention are interactive, and employ collaboration, usage tracking, and other techniques to ensure that information is current and relevant.
In addition to the immediate solution of the sales information problem, sales and marketing such information exchange is a vital element of business-to- business Internet commerce in automating information exchange between trading partners, suppliers and customers, suppliers and partners, and other commerce arrangements. The companies in a demand chain, for example, each set up such information exchanges and automate the movement of new product and competitive information from primary manufacturer to value- added distributor to customer, with requirements following a return route.
The inventive information exchange enforces appropriate constraints on information flows. Thus, the invention provides a sales and marketing information exchange tailored to individual companies and accessible as an Internet e-service. The invention organizes and categorizes a company's sales and marketing content, integrates it with new information from third parties, and facilitates the exchange of information to motivate closes and generally shorten sales cycles. The invention allows participants in direct sales, inside sales, partner sales, marketing, and other channel participants to reduce significantly the time they spend looking for information and, instead, focus on developing and closing the deal. In addition, the invention promotes an efficient, productive relationship between marketing and sales and supports collaboration and teamwork among internal sales reps, sales and marketing, and between personnel and channel partners. Ultimately, organizations can use the invention to promote teamwork with customers and prospects.
Unique Delivery Platform
The invention comprises a delivery platform that employs three major components:
• Context maps, which allow users to find quickly the precise and relevant information needed to close deals. The context maps are based on expertise in sales information organization, process, and provision.
• Information resource tools, which allow users to maintain and provide rich, easy to manage sales information exchanges.
• Sales information channels, which allow organizations to target multiple audiences from a single base of sales and marketing information. The result is strong sales partnerships and communities of interest.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is a block diagram of a service bureau architecture according to the invention;
Figs. 2a and 2b are block diagrams that compare the state of the art (Fig. 2a) to an architecture according to the invention (Fig. 2b);
Fig. 3 is a diagram showing multidimensional navigation according to the invention;
Fig. 4 is a block diagram showing ice-enabled content exchange according to the invention;
Fig. 5 is a flow diagram of a system architecture according to the invention;
Fig. 6 is another block diagram of system architecture according to the invention;
Fig. 7 is a further block diagram of the system architecture according to the invention;
Fig. 8 is an example of an assisted publishing application according to the invention; and
Fig. 9 is an example of an information structure according to the invention.
DETAILED DESCRIPTION OF THE INVENTION
What is needed to resolve the above problems is a packaged solution, designed from the ground up to address the specific problem of communicating sales information, including capture, dissemination, and feedback. The solution must be designed for sales, used by sales and, ultimately, regarded as indispensable by sales. Thus, one aspect of the invention delivers highly targeted sales information E-services. These services are delivered as separately packaged modules that focus on a specific pain point and can be introduced individually or in combination with a phased implementation approach. At the heart of the invention is the concept of context maps for accessing sound bite and document-level information. Context maps are the vehicle by which the invention embeds marketing/sales domain categories, as well as an understanding of existing business rules, job functions, and terminology for targeted vertical markets. By pre-populating context maps, the invention offers significant value before a customer enters any information about their specific business.
Below is an example of a sound bite in the competition roadmap for a fictitious telecommunications company that was extracted from a lengthy internal document:
Dan Pastorini beat Breakout at FMA Corp. by emphasizing the following: Our GlobalReach service costs 22% less per minute on average than Breakout's TalkTime service. Breakout's network is immature - their coverage is spotty. They are #1 in an independent ranking of dropped calls. (Communications Week article, 12.14.98)
The invention's E-services are designed to support two groups of information users: producers and consumers. Although anyone can assume the role of a producer, a consumer, or both, the primary producers are product marketing and corporate marketing professionals. The primary consumers are sales professionals.
Figure 1 below provides a summary view of the system and Table 1 lists the major feature categories in the solution.
Table 1 : Feature Categories
The vital, mission critical information that the invention captures, organizes, and makes readily accessible is exactly the information companies require to achieve:
• Shorter sales cycles and higher close rates,
• Reduced cost to marketing of supporting the field,
• Streamlined communication of dynamic information, and
• Improved products/services that are more effective.
The invention provides the next level of expressiveness needed to solve the sales information glut while leveraging the strengths of RDBMS technology. The key is to capture significantly more metadata — that is, information about content.
This information takes two forms:
concepts, which are similar to types or classes from the object-oriented (OO) programming model, and
roles, which are similar to members or attributes from OO.
Using concept and role metadata to categorize documents and sound bites, and then applying extremely powerful and proprietary components, such as ontological indexes and query planners, enables the solution to provide interactive response to thousands of objects representing gigabytes of content.
Three technology components of the architecture provide a unique competitive advantage in achieving this kind of performance:
The description logics engine, also known as the category engine, enables incredibly fast retrieval of useful sales information from a relational database.
Ontologies or roadmaps of sales information organize content in a format that mirrors the way sales professionals think, but is able to be processed directly by a computer.
Content exchange channels use the Information and Content Exchange (ICE) protocol to exchange content between servers and other internal/external content sources and clients.
Contrast this with a traditional Web technology approach. Figure 2 compares the design of the invention to existing Web-based database applications.
Category Engine
The category engine 20 implements mechanisms for enabling powerful, high- performance querying of content based upon resource description framework (RDF)-specified categories and attributes. The technology enables users to quickly retrieve contextually relevant information:
• Requested using terminology that is familiar and comfortable to a sales professional,
• Categorized in multiple dimensions so information can be found no matter what path a user takes,
• Written as either a net-net or sound bite brief or as a detailed document, and
• Available at any document level, from an entire document or press kit down to individual sections or paragraphs.
Other approaches in use today are not able to perform such complex and interactive queries against large repositories of content commonly found in sales and marketing organizations. As a data point, consider that Hewlett Packard has over 60,000 documents in their system. Faced with a large repository, other commonly used approaches simply do not stack up:
Full-text indexing: full-text queries have proven to be of limited use in the sales information arena. Full-text indexing technologies are inherently statistical. That is, indexing engines guess what set of documents relate closest to the words the user entered to locate the documents of interest. The result is generally a very long list that the user must plod through looking for something that is truly relevant. Moreover, due to the nature of sales information, coming up with a set of words that result in a list of relevant materials is, if not impossible, extremely hard and generally pushes both the technical abilities and patience of sales personnel.
Static hierarchies: folder-like, hierarchical management structures have proven very popular in the Internet portal marketplace with such companies as Yahoo and Excite® Home. However, this structure again fails in the sales information context. While directories have proven adequate when the individual item cataloged is a Web site, they grow unwieldy when the cataloged item is a document, a section of a document, or a sound bite. When used to catalog tens of thousands of items, a directory must either have thousands of folders or have individual folders with hundreds of entries. Neither of these solutions is useful to the sales professional trying to identify the small set of documents that enable him to make the most compelling case to the prospect.
In the invention, users browse using a combination of categories of information — categories that map to their intuitive understanding of the sales opportunity. To demonstrate, suppose in a fleet sales opportunity against Chrysler, a Ford sales person felt the need for the most up-to-date competitive positioning on minivans. By simply typing "Chrysler minivan competitive" (or selecting them through the graphical user interface), the invention identifies the categories shown in Figure 3 and selects only that content that meets the sales professional's need. If the sales professional wanted broader information he gets there in one click.
For instance, he could get to all information about Chrysler minivans 31 or competitive positioning for minivans across all Ford's competitors 32. The invention provides virtually limitless navigation possibilities, allowing the user to select the navigation that coincides with his intuition in a specific sales scenario.
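A toy Python sketch of this category-combination navigation is given below: requesting all selected categories narrows the result, and dropping one category broadens it by a single step. The category tags and content names are made up for the example.

    def find_content(tagged_content, *categories):
        """Return names of content items tagged with every requested category;
        dropping a category broadens the result set by one click."""
        wanted = set(categories)
        return sorted(name for name, tags in tagged_content.items()
                      if wanted <= tags)

    if __name__ == "__main__":
        content = {
            "positioning-07": {"chrysler", "minivan", "competitive"},
            "spec-sheet-12": {"chrysler", "minivan"},
            "win-story-03": {"gm", "minivan", "competitive"},
        }
        print(find_content(content, "chrysler", "minivan", "competitive"))
        # Broader, one click away: everything about Chrysler minivans.
        print(find_content(content, "chrysler", "minivan"))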
Context Maps
The use of context maps for capturing and describing sales information is a second technology competitive advantage. The context maps use the World Wide Web Consortium's (W3C) RDF. The W3C metadata committee realized some time ago that simple HTML hyperlink technology, directories, and the full-text indexes of Internet portals are not adequate for locating information on the Web. To meet the challenge, they developed a description language with the goal of encouraging the development of new technologies for locating information. RDF is based on XML, and together with extending HTML to provide a richer display model, is one of the primary reasons XML was developed. RDF provides a language capable of describing content with terminology that:
• Sales professionals can understand and use for everyday communications, and
• Computers can read and perform the necessary computational logic to generate results after only two to three mouse clicks.
RDF has found broad industry support.
The problem of developing a descriptive metadata format is extremely difficult because there are so many possibilities. Moreover, the power of RDF comes from its ability to represent metadata in a non-proprietary format. Because no single vendor can mandate a metadata protocol for all the systems with which it must coexist, RDF is presently the only framework that provides vendors a sustainable solution.
Context maps, and their underlying RDF definitions, provide another key benefit of enabling application logic to be implemented outside of an RDBMS. This capability allows organizations to modify and extend their company- specific context maps using a visual configurator without having to change the underlying application.
Content Exchange
Key to any successful sales information solution is the ability to collect and distribute information. Certain classes of content useful to a sales professional in closing a sale come, to a large extent, from outside the organization, for example, recent announcements from a prospect, customer, or competitor. Likewise, many of the users of sales information are outside the organization, in distributors, VARs, SIs, and other components of the channel. No existing solution has attempted to automate the flow of content from all sources inside and outside the organization to all users, both in the direct and indirect channel.
The invention uses the Information and Content Exchange (ICE) protocol to collect and distribute information from various sources, both internal and external.
Internally, an organization can use ICE to import content automatically from a document management system, such as Documentum.
Externally, an organization may use ICE to import content automatically from a news wire service such as PRNewswire or from Web sites using solutions, such as Vignette's StoryServer.
Similarly, the invention supports distributing content via ICE to other ICE- enabled systems. This is used to publish content automatically to a Web site created using Vignette's StoryServer and to other content aggregators that support ICE.
Figure 4 shows the system communicating with servers inside and outside the enterprise. The vendor 40 in the example uses ICE to import content automatically from a document management system into the system and to export information to the corporate Web server. The distributor 41 uses their own server to aggregate information automatically from the vendor and one of their premier accounts 42. Using ICE, the vendor, the distributor, and key contacts at the customer can work collaboratively on sales opportunities and implementations.
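The sketch below shows the general shape of such an incremental exchange as a plain polling loop in Python. It does not reproduce the actual ICE XML message formats; the sequence-number bookkeeping and class names are assumptions made purely for illustration.

    class SubscriptionSource:
        """Stand-in for a syndicating source (e.g. a document management system
        or newswire). The real protocol exchanges XML payloads; here a source
        simply reports the items added since the last incremental update."""
        def __init__(self, items):
            self._items = items

        def get_package(self, since_sequence):
            return [(seq, item) for seq, item in self._items if seq > since_sequence]

    def pull_updates(source, last_sequence, ingest):
        """Incremental pull: fetch everything newer than the last confirmed
        sequence number and hand each item to the local ingest function."""
        newest = last_sequence
        for seq, item in source.get_package(last_sequence):
            ingest(item)
            newest = max(newest, seq)
        return newest  # remembered for the next exchange

    if __name__ == "__main__":
        wire = SubscriptionSource([(1, "press release A"), (2, "press release B")])
        cursor = pull_updates(wire, 0, lambda item: print("ingesting:", item))
        print("next exchange starts after sequence", cursor)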
Microsoft Office Integration
The invention provides a high degree of integration with the office productivity tools used by sales and marketing professionals. With Office 2000, Microsoft has enabled the next level of integration with Web technologies. All applications in Office 2000 can use HTML-compatible XML as a first class file format. Files saved in this format retain fidelity whether viewed using a Web browser, in the application, or on the printed page.
Support for XML enables the next level of value-added browsing for users. They can now browse to individual components of a document — sections or chapters — all with the same seamless interface. Similarly, metadata such as title and author represented in the XML metadata are automatically extracted, making submission and maintenance of content painless.
Microsoft also provides better integration with Web servers in Office 2000. Users can both save files to Web servers and organize their work using Web Folders. The invention's integration uses these same technologies and provides the next level in functionality. Using the Internet Engineering Task Force (IETF)'s Web Distributed Authoring and Versioning (WebDAV) protocol, the invention makes adding and maintaining new content, and managing folders and categories, all possible without leaving the familiar desktop environment.
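As a simplified illustration of automatic metadata extraction from a document saved as HTML, the following Python sketch pulls a title and author from document head elements; real Office 2000 files carry their document properties in richer XML structures that this sketch does not handle.

    from html.parser import HTMLParser

    class OfficeMetadataExtractor(HTMLParser):
        """Collects <title> text and <meta name="author" content="..."> from a
        document saved as HTML, the kind of metadata that can be extracted
        automatically at submission time."""
        def __init__(self):
            super().__init__()
            self.metadata = {}
            self._in_title = False

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "title":
                self._in_title = True
            elif tag == "meta" and attrs.get("name", "").lower() == "author":
                self.metadata["author"] = attrs.get("content", "")

        def handle_endtag(self, tag):
            if tag == "title":
                self._in_title = False

        def handle_data(self, data):
            if self._in_title:
                self.metadata["title"] = self.metadata.get("title", "") + data

    if __name__ == "__main__":
        parser = OfficeMetadataExtractor()
        parser.feed('<html><head><title>Competitive Brief</title>'
                    '<meta name="Author" content="J. Smith"></head></html>')
        print(parser.metadata)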
Architecture
The discussion below (See Figure 1 ) primarily reflects the requirement that the invention be extensible. It is not possible to anticipate all the content types, content sources, and delivery mechanisms that are required. Furthermore, it is not entirely possible to anticipate the demands that future knowledge management tools and processes will place on the infrastructure. The discussion below concentrates on a modularization of the architecture such that significant extension is feasible without changes to large portions of the infrastructure. The architecture core 100 represents the common functionality required of any document management architecture. The I/O interfaces block contains the drivers for the different interfaces 101 which are used to access the system. The extensions block 102 contains those interfaces which provide value-added functionality, for example, search and navigation tools. Note that this architecture is not intended to constrain the space of possible solutions built from commercial tools.
Core
The architecture core represents the common functionality of any document management architecture. This core generally consists of a relational database management system (RDBMS) 103 upon which are built application-specific tools for document management. In an on-line architecture, also required are mechanisms to maintain session information such as consumer profiles and mechanisms for monitoring events of interest.
RDBMS
The heart of the system is an RDBMS which is responsible for storing all content data along with the metadata attributes which are used to organize the content. The RDBMS may also store the applications that are used to navigate the content. This depends on the characteristics of the chosen commercial components. Use of a commercial RDBMS potentially simplifies administrative and operations tasks significantly. The leading databases include integral support for on-line backups and mirroring. They are also potentially highly scalable, capable of using multiprocessors to reduce response time in a heavily loaded environment. The leading database vendors, Oracle, Informix, and Sybase, are candidates for an RDBMS.
Document Store
A document store 104 adds to the basic features of an RDBMS that functionality which is specific to document management. These features include, at a minimum, authoring support for entering content into the store, mechanisms for fetching content from the store, mechanisms for revision control, mechanisms for specifying and enforcing access control, and audit tools for extracting information about the content store.
In the context of the invention, the two primary goals for the document store are high functionality and high extensibility.
Session Daemon
The most common Web model of navigation is stateless. No information is maintained between when a consumer receives a page and when he requests another. Advanced navigation mechanisms require significant state to be maintained to provide an intuitive and coherent set of choices to the consumer. To implement this functionality, an entity which maintains session state for active sessions is required. This session daemon 105 should also be capable of inter-operating with a user preference manager to communicate configuration information between sessions.
Event Daemon
Most interaction with the document store takes the form of a request to view an item of content. The response may be as simple as fetching the document from the RDBMS and returning it to the user. However, other types of requests, for example, management requests to add, modify, or delete a document, require significant additional processing, for example, notifying a set of interested users that a document has been updated. In a few cases, for example periodic report processing, events must be generated without an explicit user request. To implement this mechanism, a programmable event manager 106 is required to generate, propagate, and react to events.
Input/Output Interfaces
Input/output interfaces 101 interface the architecture core to consumers and other entities in the outside world. Currently, http 107 serves as the primary method of accessing the system.
Delivery Platform
The delivery platform (Fig. 5) is based on three components:
• Context maps 50 that define, categorize and organize the information;
• Information resource tools 51 to publish, manage and deliver the information easily; and
• Sales information channels 52 that support the delivery of information to multiple channels.
Context Maps
At the heart of the invention is a sales and marketing information space that is tailored to the needs of each subscribing company. The information space contains a range of unstructured content, e.g. documents, sales reports, presentations, case studies, and competitive analyses drawn from both internal and third party sources.
The invention uses context maps to organize this information space and facilitate access to it. The context maps are key to the user's ability to access precise and relevant information for each sales opportunity. The maps are a dynamic framework for sales and marketing information exchange. (See Figure 3, discussed above.) The context maps perform three functions:
• Categorize unstructured content in the terms familiar to sales and marketing professionals,
• Define the relationships between categories, between content and categories, and between context maps. The invention allows a variety of relationships, which gives customers great flexibility in how they organize information over time, and
• Search and navigate the categories and relationships to find information. Users can zoom in and zoom out of categories to quickly find the information needed.
Context maps organize and facilitate access to a wide range of unstructured information. The technology is a unique way of describing and classifying information that lets users access content the way they think about it. The technology also zeros in quickly on precisely the right item from thousands of items.
Information resource tools
The invention provides processes and tools to load, maintain, and use the information managed by the context maps. These include tools to aid in the extraction of information about the structure of new content and load it into the context maps. The invention comprises a content metadata extraction process for existing documents. Clients use these tools to load their marketing and sales content into the system, and then enhance and expand the metadata for that content over time. This process provides the basic information needed to populate the context maps and allows clients to continue to use existing document tools and file formats. The preferred embodiment of the invention uses Web metadata standards, placing it in alignment with XML, Internet Content Exchange (ICE), and Resource Description Framework (RDF). The invention also provides other tools and processes to help its customers build and maintain their sales information spaces as well.
The invention provides specialized documents that improve information capture and exchange for sales channels. One of these is a format for presenting quick bullet-point conclusions about an event or an issue. The intent of such documents is to help ensure that information is presented in the way that sales people think. An effective sales information exchange depends both on the quality and relevance of the information a system contains and on processes to support constant improvement and evolution of the information. The invention tracks the age, version, and usage of information in the sales information space, ensuring that information is current, correct, and relevant. These features tell sales management and marketing which information is producing results in the sales channel and which is not.
In addition, the invention incorporates collaboration features, making it easy for users to provide feedback to content authors through voting and direct comment. The invention also incorporates discussion threads about a particular document or topic. Sales teams can easily use this feature to set up information spaces that make information easy to share and discuss. The invention catalogs and tracks all of this feedback and discussion to preserve the full context surrounding issues and documents.
Individual users drive the content delivered by sales information channels using information consumer tools designed to satisfy the wide variety of needs in the market. A sales wizard helps the sales pro decide how best to respond to a prospect or customer situation. The sales wizard asks the sales rep three questions (who is the competition? where are you in the sales cycle? what industry are you targeting?) and uses the answers to locate the best information available in the context maps. The front-end interface enables the user to easily search and navigate the context maps. Sales pros, channel partners, and prospects can either search the context maps or navigate through them. Both models allow users to obtain the precise information quickly and close business.
Sales Information Channels
Having established a dynamic sales information space and information resource to support that space, the invention next addresses the delivery of information relevant for direct sales reps, business partners, and ultimately prospects and customers themselves. The invention provides sales information channels tailored to the specific needs of different audiences, individuals, and communities of interest.
First, the invention supports the delivery of information to multiple audiences from the single sales information space. This architecture makes it possible for companies to provide the latest sales information to channel partners while still protecting their internal systems. Information delivery semantics are also part of the context maps, providing a flexible mechanism for targeting sales information to audiences.
The invention also sets up, with the customer, security permissions to ensure that different audiences can see and access only the information relevant for their needs. Direct sales reps, for example, may have access to more information than channel partners and prospects. The result is customized information channels for several audiences driven from a common information space, an efficient approach to information delivery.
E-Service Model
The preferred embodiment of the invention is implemented by an e-services provider, delivering a comprehensive, Web-based application that is hosted, and therefore, virtually risk free for clients. Companies sign up for subscriptions to the invention, specifying the number of people who have access to the application via the Internet. Customers can use the solution without any additional technology infrastructure investments, and can buy as many subscriptions as they need over time to satisfy the demands of their sales channels and customers. Working with the sales information framework provided by the category maps, the e-services provider and the customer define a custom sales information space that the e-services provider then hosts for that customer.
The invention provides a startup process that begins generating value for sales channels within thirty days, and ensures continual expansion and improvement of the customer's sales information space afterward. This approach provides four benefits to clients:
• Clients see results in thirty days or less,
• E-service subscriptions eliminate the risk of custom solutions,
• Customers control content and usage of the e-service, and
• Per-user subscriptions allow customers to pay only for the value they receive.
The invention redefines the way organizations can manage and access sales and marketing content created internally as well as external third party content. Context maps define a common vocabulary for sales and marketing information, as well as a flexible scheme for organizing access and provision of that information.
In the invention, the combination of categories optimized for information access and exchange with dynamic relationships yields an information space that reflects the real meaning of content, and uses that understanding of meaning to aid access and exchange. The information space is a controlled collection of information that can exist outside of the corporate firewalls. Companies can draw on the same base of information for their internal sales personnel and their channel partners without placing the primary data stored in customer relationship management (CRM), help desk, and accounting systems at risk.
The invention provides a new category of solution that complements CRM, sales force automation (SFA), marketing encyclopedias, and other earlier- generation products. These systems are designed to manage data about, e.g. customers, accounts, opportunities, and demographic trends and to manage sales processes involving that data. The invention makes these data management products more useful to the field by categorizing their output of reports and other documents and hooking them to sales information channels.
The sales cycle may be thought of as comprising three segments: Pre-sales, Closing the Sale, and Post Sale. Current CRM and SFA solutions are designed to manage structured information, such as data records, critical to the selling process, including contacts, accounts, and opportunities. Organizations primarily use this information to manage leads, pipelines, and campaigns, as well as for forecasting and analysis of sales force performance. CRM and SFA systems have proven to be ineffective for the management of documents and other unstructured information -- the kind of information that is crucial to competitive selling and closing the sale. CRM systems provide marketing encyclopedias for this purpose, but these modules quickly fall victim to the issues that doom file systems to failure as the basis for sales and marketing information exchanges.
The invention provides precise and relevant information to close business, which is a phase of the sales cycle that CRM/SFA systems do not address. In addition, the invention enables close communication between the field and marketing, directly improving return on investment on marketing investments.
The invention addresses the sales information problem directly by managing the unstructured information that is crucial to closing deals. The invention is designed to draw information from CRM, SFA, marketing encyclopedia, and other sources into its context maps for the sole purpose of information exchange. Thus, the invention extends the value of investments in CRM, SFA, and other sales and marketing data management systems by providing them with an effective medium for exchange and distribution.
The preferred embodiment of the invention is an Internet hosted application, such that users do not see the technology behind the solution. Users see information access and collaboration tools, category maps, and results.
Nevertheless, the technology behind the application is substantial - and unique. The invention adds semantic analysis to Web information searching to improve the relevance of information searching and navigation. The invention provides a semantic Web for sales and marketing information which makes extensive use of document metadata, description logics, cases, categories, and related techniques to make searching operations much more precise than they are today, and to automate information exchange applications.
The invention is based on two underlying technologies: description logics and the category engine. The description logics technology classifies content using a set of categories and relationships about a particular domain. The category engine is server-based technology that enables high-performance querying of content.
Description Logics
Description logics (DLs) classify elements for the purpose of reasoning about those elements. Description logics employ a common vocabulary to express the meaning, purpose, and relationships of elements and a small number of operations to reason about those elements. The context maps provide a shared vocabulary about sales and marketing. The DL defines information categories, relationships between information, and operations. The invention supports a variety of relationships, including class-subclass, category membership, product-company, and competitive relationships. The DL's operations address query, navigation, and exchange.
DLs are a subset of ontology technology, which has been used in knowledge management systems. However, the inventive DL is optimized for information exchange, rather than for a broader knowledge-representation and reasoning purpose. By constraining the scope of the domain to sales and marketing and purpose to information exchange, the invention can provide both good performance and flexibility. The design center for the solution was a system with a relatively small number of categories and a large number of instances. The operations that the DL supports are limited to information navigation and query.
Unlike most prior systems based on DLs specifically and ontology technology in general, the invention is not designed to define a broad range of knowledge and, through analysis, interpret and extrapolate from that base of knowledge. The invention uses DL technology for a narrower, more practical purpose: query and navigate a lot of information fast for the purposes of exchange.
Each set of context maps is uniquely tailored to an individual company. To support this customization, the context maps are implemented in a framework. Consider a context framework which employs four levels of categories for sales and marketing information. At the base are foundation concepts about digital information; at the next level are concepts about commerce activities. An Industry-Specific level contains categories used by specific industries, and a Customer-Specific level contains the customer's specific sales and marketing vocabulary. The categories build on one another, which reduces the number of categories required at the higher, more customer-specific levels.
Context Framework
The context framework is the basis for context maps. The framework defines all of the terms for a customer's context maps, including company-specific information. The framework design makes it practical for the invention to customize the context maps to individual customers by isolating changeable categories in the two highest levels of the scheme (industry- and customer-specific).
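By way of illustration only, the following Python sketch shows one way a layered category scheme with class-subclass relationships could be represented; the level and category names (e.g. "SalesCollateral", "AcmeVsRivalBrief") are hypothetical and are not part of the context framework itself.

    # Hypothetical sketch of a layered category scheme; not the context framework itself.
    class Category:
        def __init__(self, name, level, parent=None):
            self.name = name        # e.g. "Datasheet"
            self.level = level      # "Foundation", "Commerce", "Industry", or "Customer"
            self.parent = parent    # superclass category, if any

        def is_subclass_of(self, other):
            # Follow the class-subclass chain toward the foundation level.
            node = self
            while node is not None:
                if node is other:
                    return True
                node = node.parent
            return False

    # A tiny four-level hierarchy with illustrative category names.
    document         = Category("Document", "Foundation")
    sales_collateral = Category("SalesCollateral", "Commerce", parent=document)
    telecom_brief    = Category("TelecomCompetitiveBrief", "Industry", parent=sales_collateral)
    acme_brief       = Category("AcmeVsRivalBrief", "Customer", parent=telecom_brief)

    # A customer-specific category builds on, and is subsumed by, the lower levels.
    assert acme_brief.is_subclass_of(document)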
Category Engine
The context maps are navigated with a server called a category engine (See Figures 6 and 7), which runs at the e-service provider site. The category engine is the access and search front-end to the collection of metadata about the content under management by the application. The invention stores this metadata inside a relational DBMS 103. Actual content is stored in file systems, document management systems, content management systems, streaming servers, and other systems for managing unstructured data. The category engine handles the query, redirection, and routing operations.
CMA Core
At the core of the content management architecture lie four modules: the RDBMS, the document store, the session daemon, and the event daemon (see Figure 1). These core modules implement the basic content management capabilities. Functionality provided by the CMA beyond that of the core is provided by extension modules.
RDBMS
The heart of the CMA is a relational database management system (RDBMS) which maintains the metadata, and — depending on implementation strategy — may also store content. Owing to the built-in support for backups and replication, the RDBMS is a convenient place to store configuration information and applications. The use of the term RDBMS in this report is not intended to exclude other applicable database technologies, but rather to distinguish this database from the document store.
Document Store Support
The RDBMS supports the chosen document store.
Platform
The RDBMS may run on any available platform, e.g. HP PA-RISC or Intel x86-based servers.
Scalability
The RDBMS is scalable, allowing hardware to be added to provide acceptable performance as datasets and the consumer base grow. Scalability support includes the use of multiprocessor servers.
Replication
The RDBMS allows the data to be replicated on other servers to meet performance and fault tolerance requirements. Mirror servers should perform incremental, low-latency updates against a master. The RDBMS allows subsets of the data to be replicated. The selection of items to be included in the replicated subset is based on specified metadata conditions.
Administration
Administration is possible from any available platform. Remote administration via a TCP/IP network is supported. Administration should not require the use of X Windows or other graphical interfaces.
Security
The RDBMS supports multiple levels of security, including user-based security. Data and metadata support security-level specification, and the RDBMS enforces security constraints based on these specifications. The RDBMS allows a nominal degree of functionality without requiring individual user identification.
Content
It is possible that the RDBMS will be required to store the bulk content of the system. This may be via Binary Large Objects (BLOBs) and/or extensible data types.
Attributes
The RDBMS stores attribute data associated with each item of content. Attribute types should include integers, fixed-length strings, dates, and sets of these basic types. The names and types of attribute data are configurable and extensible, at least on a per-document-type basis. The RDBMS provides an API for accepting metadata specifications from applications, such as keyword indexing programs.
Tables
In addition to storing content, the RDBMS presents a complete RDBMS interface that can be used to store other data, for example a solutions catalog.
Query Languages and APIs
The RDBMS supports a query language based on SQL. Support includes embedded SQL, a stored procedure language, and a subroutine library interface. The interface allows applications to perform both data manipulation and administrative operations. The interface is secure and requires authentication from the application. The API supports C and an interpreted language, such as perl. Applications are executable on either the server or on a workstation or PC client connected to the server via a TCP/IP network.
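As a rough illustration of the kind of SQL-based access described above, the following Python sketch issues a parameterized query against a metadata table; the table and column names are hypothetical, and sqlite3 merely stands in for whatever RDBMS and client library a deployment actually uses.

    import sqlite3  # stand-in for the production RDBMS client library

    # Hypothetical metadata table; the real schema is defined per deployment.
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE content_metadata (
                        doc_id    INTEGER PRIMARY KEY,
                        doc_type  TEXT,
                        author    TEXT,
                        modified  DATE)""")
    conn.execute("INSERT INTO content_metadata VALUES (1, 'datasheet', 'marketing', '2000-05-01')")

    # Data-manipulation query an application might issue through the API.
    rows = conn.execute(
        "SELECT doc_id, author FROM content_metadata WHERE doc_type = ?",
        ("datasheet",)
    ).fetchall()
    print(rows)   # [(1, 'marketing')]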
Triggers
The RDBMS supports a trigger mechanism for executing code when certain events and conditions occur within the database.
HTTP Server Interface
The RDBMS is accessible from code linked into an HTTP server such as the Netscape Commerce Server. This includes support for multiple concurrent accesses.
Document Store
The document store adds features related to content management and is implemented as a layer on top of the RDBMS. Among the key features of the document store are revision control, access control, support for multiple renditions, and support for compound and structured documents. In some cases, the requirements of the document store may be fulfilled completely by the RDBMS. In most cases, the content store contributes a value-added interface to the underlying RDBMS functionality.
Renditions
Rendition refers to either the format, e.g. AmiPro/WordPro and HTML, or the language, e.g. English and Japanese, in which an item of content can be represented. The document store stores any type of rendition. It is possible to add new rendition types at any time. The document store manages multiple renditions of each document. Operations on the document store treat multiple renditions of the document as a single document, and not as separate individual documents. Rendition support includes manual, automatic, and on-demand rendition generation. The document store component does not include particular rendition converters. The document store supports an API flexible enough to support a wide range of rendition converters from multiple vendors. The content store preferably supports the following formats: HTML, SGML, AmiPro/WordPro, Freelance, 1-2-3, PDF, Envoy, ASCII text, an audio type, and a video type. The content store supports the following languages: English and Japanese.
Compound Documents
The document store allows multiple document elements to be grouped together and managed as a unit. All routine content operations, including but not limited to viewing, printing, and downloading, are performed correctly using the same user interface for all simple and compound documents. Elements may be included in more than one group. The extension API allows extensions to sequence through and individually access the elements of a compound document. The interface API allows compound documents to be delivered as a single unit. Compound document support is implemented via a proprietary interface, but must also include support for common compound document standards such as Microsoft's OLE. Compound document support includes support for documents where the individual document elements have differing document sensitivities.
Structured Documents
When structure, e.g. chapters and sections, is explicitly present in a document, the document store preserves and uses this structure. An API is provided which allows extensions to sequence through and individually access the structure elements of a document. The document store supports at least the following structures: document parts, such as sections of an article and chapters of a manual; slides in a slide presentation; articles in a newsletter; and pages in print-ready material. The document store supports a mixture of structured and unstructured renditions for the same documents and should support renditions with different structures for the same document. Structured document support includes support for documents where the individual components have differing sensitivities.
Off-line Content
The content store allows authors to submit references to materials that are not available on-line. The off-line content type is capable of specifying content metadata, e.g. keywords, from the full item. The document store supports on-line ordering of the item via an extension.
Metadata
The content store stores metadata, such as document attributes, for each item of content. Attributes include information such as author and date of last change. Metadata is extensible. It is possible with minimal effort to add new metadata fields to the content store.
Conversion Execution
The content store allows content to be filtered between extraction from the store and delivery to an interface application. Examples of filter extensions are hyperlink recognizers for HTML and search term highlighters for renditions supporting highlights.
Query Language
The document store provides a query language which allows selection of documents based on combinations of attribute values and extension data, e.g. a full-text search engine. The query language is compatible with SQL.
Revision Control
The document store maintains multiple revisions of each document. Operations on the document store treat multiple revisions of the document as a single document, unless specific indication is given to the contrary. Multiple revisions are not treated as separate individual documents. Revision control handles revisions in both content and metadata. The content store determines the differences between revisions of a content item. Revision 'diffs' are viewable by consumers.
Workflow
Workflow requirements are small by document management standards, which generally call for a high level of collaborative authoring support. The content store provides workflow processes to support the production and authoring extensions. The document store workflow supports a production submission queue. The production submission queue accepts documents from authors and maintains them until the production group (SFC personnel and contractors) validates and approves the content. Content in the submission queue is not visible to anyone outside of SFC.
Unreleased content
Unreleased content is content which has been approved by production personnel but which represents time sensitive material that must be released in synchronization with other events, e.g. a product roll out. Unreleased content is not viewable by general users. It should be viewable by SFC personnel, the content author, and other identified individuals. Content release should be configurable. For example, it is possible to release product information to the field before a product introduction date, while not releasing that information to channel partners.
Multimedia
The document store provides a means of synchronizing or cross-indexing the elements of a compound document to provide multimedia delivery.
Addressing
Each document in the store is associated with a unique ID which can be used to refer to the document. All renditions and revisions of a document must share a single ID. The ID for a document does not change when a document is revised or when the document store is reorganized.
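A minimal sketch, in Python, of the addressing behavior described above, assuming a hypothetical registry in which renditions are filed under one permanent document ID; the function names are illustrative, not part of the specification.

    import itertools

    # Hypothetical ID allocator: every document gets one permanent ID; renditions
    # and revisions are recorded under that same ID rather than receiving new ones.
    _next_id = itertools.count(1)
    _documents = {}

    def register_document(title):
        doc_id = next(_next_id)
        _documents[doc_id] = {"title": title, "revisions": [], "renditions": {}}
        return doc_id

    def add_rendition(doc_id, rendition_type, data):
        # The rendition is filed under the existing ID; the ID itself never changes.
        _documents[doc_id]["renditions"][rendition_type] = data

    doc = register_document("Product Overview")
    add_rendition(doc, "html", "<html>...</html>")
    add_rendition(doc, "pdf", b"%PDF-...")
    assert set(_documents[doc]["renditions"]) == {"html", "pdf"}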
Rendition Selection
When receiving a request, the content store receives information about the capabilities of the delivery channel. Using this information and user profiles, the content store automatically selects the most appropriate rendition to deliver to the consumer. Selection includes at least the following criteria: client hardware capabilities (graphics resolution, sound hardware); client software capabilities (installed viewers and applications); connection bandwidth; language preferences; content use (whether the request was for editing or viewing); and explicit indication by the consumer.
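The following Python sketch illustrates one possible selection policy; the capability fields, weights, and thresholds are assumptions for the example, not values specified by the system.

    # Hypothetical rendition chooser: given the channel's capabilities and the
    # user's profile, pick the best available rendition. Criteria and weights
    # are illustrative only.
    def select_rendition(available, channel, profile):
        def score(rendition):
            s = 0
            if rendition["format"] in channel["viewable_formats"]:
                s += 4                                   # client can display it
            if rendition["size_kb"] <= channel["bandwidth_kbps"] * 10:
                s += 2                                   # small enough for the link
            if rendition["language"] == profile["language"]:
                s += 3                                   # matches language preference
            return s
        return max(available, key=score)

    available = [
        {"format": "pdf",  "size_kb": 900, "language": "en"},
        {"format": "html", "size_kb": 40,  "language": "en"},
    ]
    channel = {"viewable_formats": {"html"}, "bandwidth_kbps": 28}
    profile = {"language": "en"}
    print(select_rendition(available, channel, profile)["format"])   # html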
User Profiles
The content store maintains user profiles. Profiles include a set of attributes of different types that are interpreted by the content store and extensions. The data stored in a profile are updateable. It is possible to add new attribute types to profiles.
Security
The document store performs user authorization on all document accesses. A different set of authorized users is maintained for each document. Security flows through to all navigation and searching processes. Users who are not allowed access to a document are not presented with the title of the document during navigation or search. There is no indication that such documents exist. Users are placed in groups so that authorization is extended to an entire group. Operations on a document are only allowed when performed by users authorized to perform the operation on that document. Such operations include modification of the content and metadata, removal of the document, and modification of the authorization lists for the document.
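A minimal Python sketch of result filtering under this security model, assuming hypothetical per-document user and group lists; unauthorized documents simply never appear in the returned list.

    # Hypothetical authorization filter: documents the user may not access are
    # removed from result lists entirely, so their titles never appear.
    def authorized(user, document, group_members):
        allowed = set(document["allowed_users"])
        for group in document.get("allowed_groups", []):
            allowed |= group_members.get(group, set())
        return user in allowed

    def filter_results(user, results, group_members):
        return [doc for doc in results if authorized(user, doc, group_members)]

    group_members = {"field_sales": {"alice", "bob"}}
    results = [
        {"title": "Public datasheet",   "allowed_users": [], "allowed_groups": ["field_sales"]},
        {"title": "Unreleased roadmap", "allowed_users": ["carol"]},
    ]
    print([d["title"] for d in filter_results("alice", results, group_members)])
    # ['Public datasheet']  -- the roadmap is absent, with no hint that it exists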
Replication
All replication requirements for the RDBMS apply to the document store. When replicating subsets of content, culling of sensitive data is performed at the source. Sensitive data do not cross a firewall where one exists between the master and replica server.
Interface API
The content store provides an API used by interfaces to communicate with the content store. The API supports HTTP, SMTP, and fax, but is also extensible to custom protocols, for example an audio telephone delivery mechanism. The interface API supports navigating content and administrative tasks. The interface API allows requests from one interface to result in content delivery via another interface, e.g. a request via HTTP with delivery via e-mail.
Extension API
The document store supports an application programming interface that allows applications to perform data manipulation and administrative operations. The interface is secure and requires authentication from the application. The API supports C and an interpreted language, such as perl. Applications are executable on either the server or on a workstation or PC client connected to the server via a TCP/IP network. The extension API allows some extensions to play the role of content. They have document IDs and keywords, and are full-text searchable. Properties, such as document ID, are assigned by the content store. Properties, such as item contents, in the sense of full-text search, are delegated to the extension for evaluation. When an extension produces results in a particular rendering, the content store provides the standard rendering conversion operations where required.
HTTP Server Interface
The content store is accessible from code linked into an HTTP server, such as the Netscape Commerce Server. This includes support for multiple concurrent accesses.
Session Daemon
The session daemon maintains information about each session which interacts with the CMA. CMA clients are not required to maintain session state. Such state is maintained by the CMA core. The session daemon stores the state of a user while he navigates the content store. The session daemon links individual requests to session data using a magic cookie such as that provided by Netscape Navigator. It is possible that the session daemon is an integral part of one of the other components, for example the RDBMS or the content store.
Event Daemon
The event daemon allows actions to be bound to specific events, including new content submissions, feedback messages, changes in content or metadata, administrative actions, and time of day. Actions are specifiable through an API, allowing the development of new functions which respond to events. The following actions are supported: deliver a notification, alter metadata, begin an external process, and remove content. As with the session daemon, it is possible that the event daemon is implemented as a part of another component such as the RDBMS. Any implementation provides a sufficiently flexible API to ease the development of event-driven extensions.
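A minimal Python sketch of such an event-to-action binding mechanism; the event names and actions are hypothetical and stand in for the notifications, metadata changes, and removals listed above.

    from collections import defaultdict

    # Hypothetical event daemon core: actions are registered against named events
    # and run when those events occur.
    class EventDaemon:
        def __init__(self):
            self._bindings = defaultdict(list)

        def bind(self, event_name, action):
            # Register a callable to run whenever event_name is signalled.
            self._bindings[event_name].append(action)

        def signal(self, event_name, **details):
            for action in self._bindings[event_name]:
                action(**details)

    daemon = EventDaemon()
    daemon.bind("content_submitted",
                lambda doc_id, **_: print(f"notify production: new content {doc_id}"))
    daemon.bind("content_expired",
                lambda doc_id, **_: print(f"remove content {doc_id}"))

    daemon.signal("content_submitted", doc_id=42)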
Interfaces
Interfaces translate users' requests into the low-level protocol of the CMA core and format responses for delivery to the user. The most common interface is HTTP, the protocol of the WWW. Other interfaces are electronic mail and fax. All interfaces must abide by a standard content store API.
Components of the standard interface are channel properties. Each interface communicates to the content store the properties of the communication channel it implements. Channel properties include connection bandwidth and latency.
Viewer Capabilities
Each interface communicates to the content store the properties of the client used. These properties include multimedia capabilities (graphics, sound) and document format capabilities (AmiPro/WordPro, Envoy).
Batch Delivery
Interfaces deliver compound documents and collections of documents.
Alternate Delivery Interface
A request made on one channel can redirect its response to another channel. For example, it is possible to make a request via e-mail and receive the content via fax.
Authentication and Privacy
Each interface provides a mechanism for performing authentication on the source of the request. Each interface provides an indication of the security of the channel with respect to issues such as eavesdropping and data interception. The difficulty of implementing privacy in e-mail and fax delivery requires that some content be excludable from these delivery mechanisms.
HTTP Interface
The HTTP interface is the primary interface to ESP. It provides each of the general requirements listed above.
The HTTP interface consists of two parts: the HTTP server and the HTTP client. The HTTP server provides an API by which the HTTP interface is tightly integrated into the content store and RDBMS. The HTTP server responds quickly to requests. In particular, the server should not fork in response to each content request if this creates performance problems, and should not require new connections between the HTTP server and the content store, between the HTTP server and the database, or between the content store and the database if any of these operations limit performance.
The HTTP interface allows documents to be processed without requiring that they be viewed first. In particular, documents and sets of documents are downloadable and/or printable without viewing them first. The HTTP interface supports multimedia types, including at least one audio type and one video type. These types may be supported by HTML browser plug-ins. The client is externally specified to be Netscape Navigator 2.0 and the server to be Netscape Commerce Server or Netscape Communications Server.
E-mail Interface
The CMA provides an e-mail interface for users who have electronic mail capability, but do not have the TCP/IP connectivity required for HTTP access. The e-mail interface approximates the HTTP interface as closely as possible. Navigation through the e-mail interface takes place via forms, either textual or in a format suitable for processing by a client application, such as Lotus Forms. Delivery of content via e-mail uses MIME types to support compound and non-text content. The CMA provides delivery via e-mail, even if it does not accept requests via e-mail.
Fax Interface
The CMA provides a fax interface for delivery and supports a fax-back interface (request by phone, delivery by fax). The CMA provides delivery via fax, even if it does not accept requests by phone. The fax-back interface approximates the HTTP interface as closely as possible.
Telephone Interface
An interface to audio content via telephone is required. This interface allows consumers to select audio renditions of content via telephone. This interface is used to access non-audio content if a text-to-speech rendition converter is acquired. The primary application for the telephone interface is to provide a value-added voice mail distribution mechanism. Combined with user profiling and other features of the system, it allows voice updates and urgent messages to be distributed worldwide with less overhead than individual voice mail implementations.
Extensions
Extensions are applications external to the CMA core which implement additional document management features. Extensions generally interact only with the core, not directly with each other. There is little impact on the remainder of the system when an extension is added, removed, or replaced. Extensions use the content store extension API to communicate with the CMA core.
Extensions can be broadly classified as content-like and non-content-like.
Content-like extensions appear to most components as a normal item of content but do not actually store content. They generally create content by accessing an external content source or by analyzing other data in the content store and RDBMS. Non-content-like extensions are generally administrative applications, such as an RDBMS management tool. They may not show up as normal content. The most significant aspect of content-like extensions is that they are treated as content by the document store.
Content extensions have keyword values and other content metadata. Attribute values such as keywords are determined by the extension and communicated through the extension API. Content-like extensions are indexed for keyword search and produce results in a standard rendition format. User access to individual extensions is controlled by the same access control mechanisms used for plain content.
Navigator
The navigator extension is the primary interface used by consumers to access the content store. It provides the functionality necessary to browse and search the content store. The navigator is constructed modularly. The selection mechanism is composable, allowing multiple selection modules to be combined. For example, the navigator allows a combination of metadata-based selection and keyword-based selection. Modules communicate with each other via the navigator session API. The metadata and search navigation modules are required. The navigator allows the related content module and other custom modules to be added at a later time.
Session API
The navigator defines an API that allows navigation modules to create, modify, delete, and examine the set of constraints that a consumer has selected during navigation. This API is used, for example, to communicate between the metadata navigation module and the search navigation module.
Metadata Navigation Module
The navigator allows consumers to navigate the repository based on information represented in the metadata of the repository. For example, users can select only datasheet items or only items relating to a particular industry. Metadata navigation is incremental. For example, the user can select only datasheets and then narrow the list of datasheets to only those relating to a particular product line. The metadata schema of the content repository is not hard coded in the metadata navigation module. The module allows the specification of pertinent metadata fields via tables in the RDBMS.
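As an illustration of incremental metadata narrowing, the following Python sketch applies constraints one step at a time; the metadata fields shown are hypothetical examples, since the actual schema is table-driven rather than hard coded.

    # Hypothetical incremental narrowing: each navigation step adds a metadata
    # constraint to the session, and the result set shrinks accordingly.
    documents = [
        {"id": 1, "doc_type": "datasheet", "product_line": "routers"},
        {"id": 2, "doc_type": "datasheet", "product_line": "switches"},
        {"id": 3, "doc_type": "whitepaper", "product_line": "routers"},
    ]

    def narrow(docs, **constraints):
        return [d for d in docs if all(d.get(k) == v for k, v in constraints.items())]

    step1 = narrow(documents, doc_type="datasheet")      # all datasheets
    step2 = narrow(step1, product_line="routers")        # then only router datasheets
    print([d["id"] for d in step2])                      # [1]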
Search Navigation Module
The navigator provides a full-text search component. This component communicates via an API to the search engine to select content which matches a consumer specification.
Related Content Navigation Module
Given a particular item of content, the related content module updates the session criteria to include content based on all or some of the attributes of the current document. Keyword indexes are used to find relevant relationships between documents. Documents are related if they are classified in similar ways.
Standard Rendition
The navigator generates a standard rendition, either HTML or a rendition which can be converted accurately and quickly to HTML.
Interface Support
The navigator supports full functionality over an HTTP interface. It supports at least partial functionality over e-mail and fax-back interfaces. The verbosity and depth of navigational menus is adjusted according to interface type, display capabilities, and user preferences.
Consumer Profile Interface
The navigator provides a mechanism by which consumers indicate their preferences for navigation order, renditions, and searches. This profile is customizable on a session basis and on a permanent basis.
Search
A full text search extension provides a mechanism to index all content in the repository. It provides an API to the navigation module which allows queries to be composed and matching item references to be returned.
Query API
The search extension provides a query API that can take a keyword expression and create a list of references to matching content items.
Indexing API
The search extension provides an indexing API which includes references to dynamic content generated by CMA extensions. For example, navigational pages are included in keyword searches even though they are generated dynamically.
Taxonomic Specification
The search extension provides for the specification of a concept taxonomy that can be created and extended to represent company terminology. The taxonomic specification supports multiple languages.
Structured Content
The search extension supports searching structured documents. It has a mechanism for identifying where in a structured document a hit occurs.
Scoring Mechanism
The search extension provides a mechanism for ranking hits. This mechanism tunes the scoring using system domain knowledge. The scoring mechanism, including tuning hooks, supports structured content. Users can influence scoring by indicating groups of items of high or low interest.
Renditions
The search extension supports multiple renditions. Multiple rendition structures are supported, e.g. structured references refer to sections within an HTML document and also to pages within a portable document.
Match Cuing
The search extension provides a mechanism for filtering content on-the-fly to provide match cuing. For example, a visual cue such as a font change is provided to indicate instances of a keyword match. Visual cuing is provided for as many renditions as possible, but must at least include HTML.
Internationalization
The search extension differentiates rendition content by language and uses the appropriate processing for non-English content. The search module does not confuse English words with non-English words and vice versa.
Content Administration API
The search extension provides a content administration API to manage indexes when an item of content is added, removed, or updated within the content repository.
Index Processing
The search module allows indexes of subsets of documents to be created and combined.
Revision Processing
The search extension provides indexing of multiple revisions of an item of content but only returns outdated content when specifically requested.
Rendition Converters and Filters
Content is submitted to the system in one or more renditions. Rendition converters and filters are used to add value by either converting an item to a more suitable rendition or by annotating a rendition to represent additional information.
Rendition Converters
Some CMA tasks have a preferred rendition. For example, HTML is generally the preferred rendition for viewing documents via Netscape Navigator. In cases where the submitted rendition is not suitable for a given task, the CMA converts content from the submitted rendition to the desired rendition. All such conversions, as much as possible, maintain any formatting present in the original. The fundamental formats used by the CMA are HTML for online viewing and searching, a portable document format such as PDF or Envoy for page preview and printing, and TIFF for faxing.
Accuracy and Programmability
It is not required that converters be 100% accurate, but they should provide a reasonable degree of accuracy. While not precisely defined, reasonable means that the result is more pleasing and useful to the consumer than the original rendition and that the result retains the meaning and semantic structure of the original. Converters are programmable so that their operation can be tuned for highly valued content such as the most frequently accessed content in the system.
SGML
Conversion to and from SGML is desirable but is not mandatory. Where a converter recognizes structure in a document, it is able to generate SGML representing that structure. Converters perform conversion between two formats using SGML as an intermediary if significant fidelity is not lost. For example, conversion from AmiPro/WordPro to HTML via SGML is acceptable if there is less loss of information between AmiPro/WordPro and SGML than there is between AmiPro/WordPro and HTML. However, conversion from AmiPro/WordPro to Envoy via SGML is generally not acceptable because Envoy expresses the visual characteristics of an AmiPro/WordPro document much more accurately than most SGML document type definitions.
x-to-HTML
Conversion filters convert submitted or intermediate types to HTML. Any formatting instructions in the source file are mapped to the corresponding HTML markup, if available. Formatting features which are rendered appropriately in the resultant HTML include: section titles, paragraph breaks, and bulleted and enumerated lists. Sectioned documents are split into several HTML pages. Indexes and tables of contents in the source file are converted to hyperlinked lists in the produced HTML. Intra-document references are converted to hyperlinks. When plain text is converted to HTML, standard textual formatting conventions are recognized and converted to HTML markup, including the use of indentation or blank lines to indicate paragraph breaks; the use of underscores and backspaces, or a row of graphic characters beneath the text, to indicate emphasis or titles; the use of stars, dashes, and other graphic characters as bullets in lists; and a row of dashes or other graphic characters as a horizontal rule.
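A minimal Python sketch of two of the plain-text conventions listed above (blank lines as paragraph breaks, leading stars or dashes as bullets); a production converter would cover the remaining conventions and formats.

    import re

    # Hypothetical plain-text-to-HTML filter: blank lines separate blocks, and
    # lines beginning with '*' or '-' become list items.
    BULLET = re.compile(r"^\s*[*-]\s+")

    def text_to_html(text):
        html_parts = []
        for block in re.split(r"\n\s*\n", text.strip()):      # blank line = new block
            lines = block.splitlines()
            if lines and all(BULLET.match(line) for line in lines):
                items = "".join("<li>" + BULLET.sub("", line) + "</li>" for line in lines)
                html_parts.append("<ul>" + items + "</ul>")
            else:
                html_parts.append("<p>" + " ".join(lines) + "</p>")
        return "\n".join(html_parts)

    print(text_to_html("Overview of the product.\n\n* Fast\n* Reliable"))
    # <p>Overview of the product.</p>
    # <ul><li>Fast</li><li>Reliable</li></ul>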
x-to-portable document format
A conversion path is provided from every submitted type to a portable document format, such as Envoy or PDF. Pages in the resulting format are scalable to both US letter and metric A4 paper sizes.
x-to-TIFF
A conversion path is provided from every submitted type to TIFF. The result of this conversion is similar in appearance to the portable document image but removes formatting features which are not appropriate for fax delivery, e.g. a gray background.
TIFF-to-x
A conversion path from TIFF scanned images to other content types is required. This mechanism produces results of sufficient clarity to read. Optical character recognition is an option.
Rendition Annotators
Rendition annotators filter a document rendition, producing a new rendition of the document with one or more value-added features. Some of the information used to guide annotation is generated from the document itself, for example, when recognizing URLs that do not have the necessary markup to make them live in HTML or PDF. More may come from an external source, for example, highlighting matches under the direction of the full text search extension.
Hyperlink Markup
The hyperlink markup extension scans renditions capable of specifying hyperlinks, such as HTML, and adds the markup to URLs if it does not exist. URL and other hyperlinks are specifiable by an extensible set of pattern matching rules.
Inter-content Hyperlink Markup
The inter-content markup extension annotates other types of hypermedia so that references to other content in the repository become hyperlinks to the associated content. The extension provides a mechanism for identifying potential links. This matching is customizable.
Knowledge Management Audit Annotator
An audit annotator allows annotations to be added to all content to facilitate the collection of data regarding the value and classification of content and the overall operation of the CMA. An example of this class of extension is an HTML annotator that adds headers and forms to every page to allow the consumer to indicate the usefulness of the content returned.
External Content Sources
External content sources are extensions which bring content into the CMA. External content sources are integrated via one of two APIs. The pull API is used for external sources which only produce content when asked by a consumer. The push API is used by content sources that provide content streams which are fed directly into the repository. Extensions that provide access to content that has not been validated give a visual cue to this fact. The cue indicates the confidence of the source, e.g. high for a marketing research company, medium for unaudited intranet websites, and low for content retrieved from the Internet.
Extensions allow external content to be entered into the system but must also provide access without depositing new content into the content store. All external content sources generate valid and verifiable metadata.
Production Authoring Extension
This is the extension used by the authoring staff to enter new content into the repository. This extension is implemented as a native Windows application. The interface on this extension should be rich, even if this requires substantial training.
External Authoring Extension
An extension application is provided which allows content authors to submit content directly into the content store. Content so submitted is given provisional status until processed by knowledge management. The online submission process allows submission of document metadata along with the content. It supports compound documents.
Consumer Authoring Extension
An extension application is provided which allows consumers to create limited forms of content. All consumer authoring capabilities are completely accessible via HTTP. It is desirable to use an extended web client to perform consumer authoring.
Annotation Submission
An extension is provided to allow users to submit limited textual content related to existing content. Content so submitted is automatically entered into the content store. Author, date, keyword index, and related document metadata are automatically generated. Annotation authoring includes adding 'post-it' style notes to portable document formats such as Envoy or PDF.
Personal Collections
An extension is provided to allow users to build collections of documents for personal reference. The personal collection extension gives users a means to allow or disallow access to their collections by other users.
Plain Content
The consumer authoring extension allows users to create small documents, such as sales success stories. The critical aspect of this function is ease of use. A simple HTML authoring tool is integrated into a client, for example Netscape Navigator Gold.
Discussion Groups
An extension implements threaded discussion groups similar to Usenet news groups. The mechanism stores discussion group submissions as content in the repository and makes them fully accessible via the navigator and search extensions. The discussion group interface has a well developed HTTP interface. It is also desirable that the extension allow participation via electronic mail.
WWW Extension
An extension is required that Web-crawls selected sites on the intranet, indexing the content found.
External WWW Extension
An extension is required that Web-crawls selected Internet Web sites, indexing the content found. External sites are classifiable. External Web sites in the following classes are included: Customer Web Sites; Channel Partner Web Sites; and Competitor Web Sites.
Database API
A database API is provided which allows raw database tables to be maintained within the database. The API allows standard SQL database applications to be accessible via the system. The SQL is upwardly compatible with either standard SQL (SQL92 or later) or with the RDBMS vendor's SQL variant. The database API is compatible with the RDBMS vendor's tools, including SQL interpreters, embedded SQL tools, and C language library bindings. The extension is flexible enough to provide the basis for a new version of the solutions catalog database browser, as well as a reference accounts database.
Pull API
The pull API is used to communicate a query from the content repository, and to retrieve the results. Query parameters specify the content to be selected and the acceptable formats that can be returned. The extension returns the requested content in an acceptable rendition. Anticipated extensions using the pull API are externally accessible databases such as the Order Fulfillment Initiative, and Dunn and Bradstreet reports.
Push API
The push API is used to communicate with content streams. Communication is instigated by the extension, not by the content store. The primary push API extensions are the authoring extensions used to submit content. While the product authoring extension may not use the push API because it is integrated with the content store, the push API extension is required to provide simplified submission by non-SFC personnel. Other push API extensions are Usenet news feeds, wire services such as NewsEdge, marketing research feeds from sources such as the Gartner Group and Dunn and Bradstreet, and electronic versions of trade publications.
Metadata Converters
External content sources may use metadata converters to extract metadata from content produced by content extensions. Metadata often occurs in the form of specially formatted headers which precede the content in newswire feeds, Usenet feeds, electronic mail messages, and reports delivered as text.
Validators
For each content rendition, an extension is provided which validates the format of the content. The rules applied by a validator are configurable. Each validator at least checks that the content is in the format in which the metadata claims it to be. Validators check that any intra-document or inter-document references are valid. Validators also check for conformance to standard syntax, especially when authoring tools allow generation of non-conforming documents, e.g. HTML. Validators also check for conformance to submission guidelines. Validators that can check format and links are required for HTML and possibly SGML. Validators that can validate links are required for the chosen portable document format, either Envoy or PDF. Validators that can at least determine rendition type are required for all other types, including AmiPro, Freelance, and ASCII text.
Clipping Agents
An extension is provided which automatically selects new or recently modified documents from the document store, composes a collection of new documents for each user, and delivers these collections to the users. In determining which content to select, the clipping filter considers the relevance of each document and the significance of the change to the document. The relevance of a particular document varies from user to user.
A user profile contains sets of attribute values and keywords to describe the user's interests. It may also contain explicit search criteria, unique document identifiers, and identifiers of documents for which similar documents have been requested. Content may carry an attribute which forces distribution to some or all users. A new document which matches the relevance criteria is always delivered to the user. Modified documents are only delivered if the modification significantly impacts the content.
One example of an insignificant modification is a visitor counter on a WWW page. The clipping filter includes an automatic means of excluding insignificant modifications.
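A minimal Python sketch of the delivery decision described above, assuming hypothetical profile keywords, a forced-distribution flag, and an illustrative change-size threshold for significance.

    # Hypothetical clipping filter: a document is delivered to a user when it is
    # relevant to the user's profile and, for modified documents, when the
    # change is significant.
    def relevant(profile, document):
        keyword_hit = bool(set(profile["keywords"]) & set(document["keywords"]))
        forced = document.get("force_distribution", False)
        return forced or keyword_hit

    def significant_change(document):
        # e.g. ignore trivial edits such as a visitor counter changing
        return document.get("changed_fraction", 1.0) >= 0.05

    def should_deliver(profile, document):
        if not relevant(profile, document):
            return False
        if document["status"] == "new":
            return True
        return significant_change(document)

    profile = {"keywords": {"routers", "pricing"}}
    print(should_deliver(profile, {"status": "new",
                                   "keywords": {"pricing"}}))            # True
    print(should_deliver(profile, {"status": "modified",
                                   "keywords": {"routers"},
                                   "changed_fraction": 0.001}))          # False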
Knowledge Management Extension
The knowledge management extension is a set of tools used by a knowledge manager and content development specialists to collect and analyze data concerning items in the content store. The knowledge management extension provides a basic set of reports and provides an API or scripting language.
Usage Reports
A report generator is provided which summarizes system usage. User requests for all resources are included in the report, including access to the content store and all extensions. The usage report generator is configurable, allowing requests to be selected, sorted, and grouped according to the following parameters: document metadata, including type, format, date, author, and keyword classification; user profile data, including user type, language, and geographical location; and additional parameters used when invoking extensions, such as the scope of a navigator request.
Content Reports
A report generator is provided which characterizes the content in the CMA. This report generation capability is highly configurable, allowing results to be summarized by document metadata, including metadata extracted from the content.
Survey Reports
A report generator is provided which compiles survey results. The survey report generator is configurable, allowing results to be selected, sorted, and grouped by responses to survey questions and by the itemized request parameters.
Analysis API/Scripting Language
Analysis tools in the form of an API or scripting language allow new reports to be generated from the state of the content store and the access patterns of consumers. The tools allow the generation of ad hoc analyses when addressing any problems of accessibility and usability of content in the system.
Production Support Extension
The production support extension is used by production staff to maintain the validity and value of the content store, for example, by automatically removing obsolete items.
Audit Process
An audit process periodically scans the content store and identifies items requiring action. Actions supported are removal, notification of production personnel, and notification of item author. The audit process is configurable via an API or a scripting language. The programmable interface allows the selection of content based on metadata attributes and allows the action taken in response to a match to be programmable.
Feedback Collection and Reporting
A mechanism is provided for collecting feedback messages and generating reports from the feedback database. The feedback report generator is configurable, allowing messages to be selected, sorted, and grouped according to the following parameters, in addition to the parameters of the request which instigated the feedback: status of the feedback message, severity of the issue, person responsible for addressing the issue, and date of the feedback message. The feedback mechanism is capable of indicating or otherwise distinguishing item classes.
Author Issues
Author issues are issues related to the accuracy of an individual item. They involve the actual material in an item of content and are generally correctable only by the original content author.
Management Issues
Management issues are all other issues, including the operation of system servers and interfaces. The feedback mechanism allows for messages to be routed automatically based on the class of feedback and other metadata associated with individual content items. The feedback mechanism provides trouble issue tracking mechanisms. Feedback issues are maintained in a database and tracked throughout their lifetime, through creation, assignment to production personnel, analysis, and resolution. Issue tracking supports interaction with content providers, for example a Marcom division, but also maintains state within the system to allow progress by non-system personnel to be monitored.
Administrative Support Extension
This extension provides an interface that is used to manage the overall system. This interface is capable of starting, stopping, and backing up the repository. It is also used to add, delete, and update users. The bulk of this interface may be implemented via a custom application for PCs running Windows 3.11 or a workstation running HP-UX. Remote connections via TCP/IP are supported. A limited interface allows consumers to update a limited number of fields in their user authorization record, for example, their password.
User Requirements
The primary users of the system are sales representatives whose information needs change across the phases of the sales process. Other users of the system include field sales specialists and the professional services organization. When the system is released to channel partners, it is the main place for them to get information about the company. The first section below outlines the sales process, which motivates a classification of the types of information that should be provided by the system. The remaining user requirements are presented in subsequent sections.
The Sales Process
The sales process is divided into five phases, each with its own information needs. This section describes each phase of the sales cycle and categorizes the types of content used during each phase. While most types do not place any unusual constraints on the architecture, some influence system design and product selection. For example, the goal of being the single online information tool for the sales force requires that information originating outside of the company be incorporated, which impacts the content submission and acquisition processes. Specific requirements such as these are noted where pertinent. The relative importance of each requirement is determined by the value of the content it supports. Currently, the system is largely used for the same purposes for which hardcopy literature is used, i.e. to present the customer with information about company products.
Technical Details
This discussion of design details (see Fig. 8) provides an architectural model. Part of the invention comprises the relevant abstractions in the service bureau architecture. The primary abstraction is content 80, which goes beyond the traditional concept of documents to encompass a wide range of information, including databases and the results of running applications.
Uniform Content Model
Content is abstract information, divorced from its physical appearance or presentation. As an abstraction, content does not exist in its pure form. It is always manifest in some physical form when being stored or delivered. Associated with the abstraction are high level concepts such as rendition and structure.
The term data is used herein to refer to information in a more general sense, including the information which represents content for the purposes of storage and delivery. In contrast to content, data does not have the associated high level concepts. There is no concept of rendition for a database.
The architecture discussed herein revolves around content and relies on a uniform model for expressing content. The power of the uniform content model lies in the ability to leverage one code base to manipulate a wide range of content types. If content is not represented in a uniform manner, each function of the system must be reimplemented for each type of content.
The model affords uniform access to content across the system via a common API. Because the content model forms the infrastructure of the invention, it is difficult to implement incrementally. While it can be improved iteratively, flaws in the initial design undermine any potential benefits. Design of a uniform content model requires that the range of content managed be characterized, noting commonalities and differences. This characterization also aids in the understanding of which elements are content and which are only data.
Content exhibits the following characteristics: it is identified by name, it is described by attributes, it bears relationships to other content, it has structure, it is updated by revisions, its distribution is governed by its sensitivity, and it is manifest in one or more renditions.
Names
Every element of content is identified by a unique and canonical name which is used to refer to the content.
Absolute Name
An absolute name references the same element of content regardless of where the name is used. Absolute names do not change. If an absolute name for an element of content, such as a document, is stored, the same document can later be retrieved using the same name, regardless of the current state of the retrieving application or any changes to the attribution of the content which may have occurred.
For example, if the absolute name of a document is bookmarked, that document can always be retrieved, even if the document is moved to a different folder.
Contextual Name
The content referenced by a contextual name depends on other information, such as the state of a user's session. For example, if a user navigates to a product line and wishes to bookmark an overview document, the bookmark is made to the contextual name — the document role — not the particular document which is fulfilling that role at the current time.
Contextual names are particularly important in implementing applications which generate content dynamically, such as the solutions catalog database browser. Names are instrumental in storage management. Content is accessible only if its name is known, and inaccessible content need not be maintained by the system. This technique, known as garbage collection, is necessary to support a consistent content store in the presence of multiple references to data, which come about via user hot lists, mailing lists, and transient content. While the name model and the name manager that implements that model may appear inconsequential, inadequate initial consideration is potentially the most significant limiting factor of an extensible system.
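The distinction between absolute and contextual names can be illustrated with the following Python sketch; the name formats and the role-binding table are hypothetical.

    # Hypothetical name manager: absolute names map directly to stored content,
    # while contextual names (document roles) resolve through the current
    # binding of role to document, which may change over time.
    class NameManager:
        def __init__(self):
            self._absolute = {}    # absolute name -> content
            self._roles = {}       # contextual name (role) -> absolute name

        def store(self, absolute_name, content):
            self._absolute[absolute_name] = content

        def bind_role(self, role, absolute_name):
            self._roles[role] = absolute_name

        def resolve(self, name):
            # Contextual names resolve indirectly; absolute names resolve directly.
            if name in self._roles:
                name = self._roles[name]
            return self._absolute[name]

    nm = NameManager()
    nm.store("doc:1042", "Router overview, May edition")
    nm.bind_role("role:router-overview", "doc:1042")

    print(nm.resolve("doc:1042"))                # always the May edition
    print(nm.resolve("role:router-overview"))    # whatever currently fills the role

    nm.store("doc:2077", "Router overview, June edition")
    nm.bind_role("role:router-overview", "doc:2077")
    print(nm.resolve("role:router-overview"))    # now the June edition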
Attributes
Content possesses various qualities which are expressed as attribute data. Attributes are intra-content metadata. Some attributes, such as expiration date, are fundamental to the operation of the system. This use of attributes is well-understood and implemented by most document management systems. Attributes can also be used to classify documents according to ontologies, which describe the ways in which content can be classified.
Ontologies also capture information about attribute values, for example, the fact that GSY is a division (a subset) of CSO. If available, the system uses such information to determine that the content authored by GSY is a subset of the content authored by CSO. This use of attributes to specify ontological metadata is not widely implemented or even well understood. Support for such usage is lacking in off-the-shelf products. Effective navigation of content relies on complete and accurate attribution. The greater the sophistication of an underlying ontology, the greater the potential for powerful navigation aids.
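A minimal Python sketch of this kind of ontological inference over attribute values, using the GSY/CSO example; the documents and the 'division of' table are illustrative only.

    # Hypothetical attribute ontology: 'is a division of' relationships between
    # attribute values let the system infer that content authored by GSY is a
    # subset of content authored by CSO.
    division_of = {"GSY": "CSO"}       # child organization -> parent organization

    def within(org, ancestor):
        while org is not None:
            if org == ancestor:
                return True
            org = division_of.get(org)
        return False

    documents = [
        {"title": "GSY field guide", "author_org": "GSY"},
        {"title": "CSO bulletin",    "author_org": "CSO"},
        {"title": "Other report",    "author_org": "R&D"},
    ]

    # A query for CSO-authored content also returns GSY-authored content.
    print([d["title"] for d in documents if within(d["author_org"], "CSO")])
    # ['GSY field guide', 'CSO bulletin']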
The design of an attribute system for a content management system is analogous to schema design for an RDBMS. Attribute design must be carried out prior to content migration. Subsequent modifications to the attribute system require content be reattributed, a problem akin to schema migration.
Relationships
Two or more elements of content may bear a relationship to each other. Relationships are inter-content metadata. Relationships are a more powerful construct than attributes. Attributes can actually be implemented using relationships. For example, an author attribute of an item of content can be expressed as a relationship between the content and an object representing the author.
A number of simple relationships are well understood and widely implemented, for example the containment relationship of files in folders and the synonym relationship between words in a thesaurus. General inter-content relationships are not well understood. No support for them is available in existing document management systems.
Structure
Structure organizes and relates the data that comprises content. The structure inherent in content is its logical structure. Examples of this kind of structure are chapters in a book and rows in a table. Structure subdivides content into smaller elements of content, or sub-content.
Conceptually, sub-content is also content and has all the qualities of content. However, present-day content authoring and management tools treat the document as the smallest unit of content and introduce a disparity in functionality between documents and smaller elements of content. As a result, it is generally impossible to name, attribute, render, revise, or control access to sub-content.
Content is often grouped into collections to assist in content management and delivery. These collections are also structured content. For example, the set of documents which results from a search query is content. By extension, the invention in its entirety can be considered to be one element of content. It is critically important that the content model treat collections as content so that any operations defined for content are also applicable to collections. It is also very valuable that content operations be applicable to the smallest units of content, though this may prove impractical in some cases.
Revisions
Content is corrected or updated through revisions. Each revision is conceptually the same element of content. Users only see one revision of content at a given time. The revision mechanisms offered by document management systems provide greatest value to document authors. For a short period of time when content is being revised, multiple revisions exist. Authors and the production team must be aware of the new revisions of the content, while users still see the old content.
Sensitivity
The sensitivity of content determines the scope of its distribution. Sensitivity also impacts content generation. For example, when a search results list is generated, it must not contain references to any content whose sensitivity exceeds the user's authorization.
The sensitivity attribute of agents can be used to restrict access to system functionality. For example, the sensitivity of an application which allows the user to submit content can be set so that only users with author authorization may exercise this function.
Renditions
Any element of content is manifest in one or more formats, or renditions. There can be great diversity among multiple renditions of an item of content. The abstraction unifying the renditions is that they all convey the same meaning.
All renditions of an element of content share a common content name. Grouping all renditions of the same content regularizes names and avoids redundancy problems. If various renditions of the same content were available under different names, it would be difficult to retrieve a different rendition of the same content. Moreover, search queries would be likely to find the same information under many names, reducing search result quality.
Rendition is not a simple attribute, but a combination (the cross-product) of several attributes, including written language, file format, encoding, and media type. Rendition types bear relationships to each other through an ontology. For example, HTML and RTF are textual. They explicitly represent characters as distinct objects.
On the other hand, TIFF and GIF are raster formats. They only represent the pixels which compose an image of the document.
Rendition ontologies allow many rendition types to be treated similarly, provided that they have a unifying quality represented in the ontology. For example, all textual renditions can be searched using full-text search. When submitted, content usually exists in a unique source rendition. This is true even if multiple renditions are submitted, because one rendition is usually used as a source from which the others are generated.
It is possible that the rendition designated as the source rendition changes when the content is updated, if, for example, an AmiPro file is converted to Word for further editing, or if an English document is translated to Japanese and subsequently revised in the Japanese. Identification of source renditions is necessary to ensure that dependent renditions are updated if the source rendition is revised. Conversion between renditions may be necessary for delivery and other operations performed on content. Rendition conversion may be fully automatic, machine-assisted, or fully manual. The use of an on-demand application such as Adobe Acrobat Distiller or Tumbleweed Publisher to convert Postscript into an electronic document format is an example of an automatic conversion. Generation of HTML from a word processor document is an example of a machine-assisted operation, because automatic conversion is imperfect and must be verified by a human. Translation between written languages is an example of a mostly manual conversion. Once converted, new renditions of a document may be cached by the system. This is essential for manual and machine-assisted conversions so that human effort is not lost. For automatic conversions, caching of the converted result represents a tradeoff between retrieval latency and storage requirements.
Generated renditions bear a dependence relationship to the source rendition. The architecture must maintain these dependences to determine when conversion must be repeated. For example, if an HTML rendition is generated from an AmiPro source file, changes to the AmiPro file require that the HTML file be regenerated from the AmiPro. Likewise, if an English document translated to Japanese is then revised in its Japanese form, the English document is out of date until the modifications are translated back into English.
The principal file formats required include: a neutral format for operations on content, including indexing and annotation, for which SGML (see Appendix D) is the best choice; HTML for delivery of predominantly textual content; an electronic document format (PDF or Envoy) for delivery of print-quality content; TIFF for delivery to facsimile devices; AmiPro/WordPro for templates such as sales proposals; and Freelance for templates such as sales presentations.
Temporal Nature of Content
Content is characterized by the duration of its lifetime and frequency of update.
Static Content
Static content changes infrequently and is maintained indefinitely by the architecture. Stored documents are static content. Newsletters and other serials are also static content. Each issue is new content and not a revision of the previous issue.
Transient Content
Transient content is re-created each time it is requested and is no longer available once it is delivered. Examples of transient content include a hit list resulting from a search and the result of a database query. Transient content is generated by applications. Each time the application is executed, a new agent (process or thread) is created and assigned a name which users may use to interact with that particular agent. Application instances appear to be content in that they have names and may respond to requests by creating content. They are not themselves content. Application programs, generally binary software, may be considered content, but this type of content is generally not exposed to users. Treatment of programs as content is beneficial primarily in administrative operations such as backup and replication.
Architectural Components
The integrated and evolutionary nature of the invention calls for a modular design. The provision of an electronic interface for the field sales force necessitates a system architecture which can integrate a diverse set of applications. A major aspect of the invention process is the isolation, classification, and specification of the required functional components. The product of that analysis is a set of functional components and a set of interfaces for communicating between those components.
In specifying functional components, both current and future needs have been considered. The resulting interfaces are sufficient to support a rich set of future enhancements. As discussed above, the functional components can be separated into four categories: core, services, channels, and agents.
The core comprises essential system functionality and serves as a hub for integrating other components.
Services are tightly integrated with the core but interoperate with the core through defined interfaces which provide interchangeability.
Channels are the means of exchanging requests and data between the core and users and other systems.
Agents are modular applications which extend the functionality of the core.
Core
The architecture's core (See Fig. 1) provides the system's fundamental capabilities and serves as an integration hub for other components. The fundamental capabilities provided by the core include content addressing (or naming), event scheduling, content caching, and session management. Because the core is a prerequisite for all other components, it is initially minimally functional, to facilitate expedited development, but it is also extensible so that inflexibility does not become a barrier to adding new features. The minimal core is difficult to implement in an incremental manner. A minimum critical functionality must be achieved before the core can serve basic requests. Trying to produce a design smaller than this minimum produces code that in the future is not sufficiently flexible.
Name Manager
Each element of content has a unique absolute name which never changes and can be used throughout the system to reference the content. Storage for the data representing an element of content may be provided by any of a number of content management agents. For example, content may be stored in a local file system, a local database, or in a remote document management system. The name manager translates content names to physical content locations.
A namespace is a set of names, typically designated by a common name prefix. Each module which implements content storage registers its namespace with the name manager. For example, a document store interface might register all names beginning with docstore/. The name manager provides for communication between modules by routing messages, which are directed at individual items of named content, to the module implementing that particular content. The name manager may also interrogate a content cache to improve performance.
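A minimal sketch of this routing behavior follows, assuming hypothetical module and method names: storage modules register a namespace prefix, and the name manager resolves each content name to the module whose prefix matches, consulting a cache first.

```python
# Hedged sketch (hypothetical interfaces): a name manager that routes requests
# for named content to the storage module whose registered namespace prefix
# matches the content name.

class NameManager:
    def __init__(self, cache=None):
        self.namespaces = {}   # prefix -> content management module
        self.cache = cache or {}

    def register(self, prefix, module):
        self.namespaces[prefix] = module

    def resolve(self, name):
        # Longest-prefix match selects the module implementing this content.
        matches = [p for p in self.namespaces if name.startswith(p)]
        if not matches:
            raise KeyError("no namespace registered for %s" % name)
        return self.namespaces[max(matches, key=len)]

    def retrieve(self, name):
        if name in self.cache:                 # consult the content cache first
            return self.cache[name]
        data = self.resolve(name).retrieve(name)
        self.cache[name] = data
        return data

class DocStore:
    def retrieve(self, name):
        return "contents of " + name

nm = NameManager()
nm.register("docstore/", DocStore())
print(nm.retrieve("docstore/product-brief"))
```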
Garbage Collection
The most effective storage management policy for large highly-interconnected systems is garbage collection. Under this policy, data are removed only when they can be proven to be inaccessible. Data are accessible if they can be reached by following references — represented by inter-content relationships — starting from a set of known roots.
The roots of the garbage collector include data currently being operated on and data whose name is registered in a permanent namespace. Static content is accessible from a permanent namespace and is protected from the garbage collector as long as a reference to the content exists in the namespace. Deletion of static content occurs when its name binding is deleted from the namespace and no other references exist. By using garbage collection, the content expiration process can be modified to keep content referenced by a user's personal folders from being removed even if the expiration period is exceeded. When, in the future, no users have links to this content, the item is automatically removed.
A similar modification is possible for cross-linked documents; all documents are maintained until all references have expired so that users do not encounter broken hyperlinks. Transient content is accessible from the current list of tasks being performed by the system. Once the task is completed, the content referenced by the task is no longer accessible. Transient content can be made static by binding it to a name in a permanent namespace.
The garbage collector is implemented as a background daemon process.
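The collection policy described above amounts to a reachability computation over inter-content references. The sketch below is a simplified, hypothetical illustration, not the daemon itself: it marks everything reachable from the roots and reports the remainder as removable.

```python
# Illustrative sketch (hypothetical data model): mark-and-sweep collection in
# which content is reachable if it can be found by following inter-content
# references from the roots (permanent namespaces and in-progress tasks).

def collect_garbage(all_content, references, roots):
    """references maps a content name to the names it links to."""
    reachable = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in reachable:
            continue
        reachable.add(name)
        stack.extend(references.get(name, ()))
    return all_content - reachable      # names safe to remove

content = {"a", "b", "c", "d"}
refs = {"a": ["b"], "b": ["c"]}
print(collect_garbage(content, refs, roots={"a"}))   # -> {'d'}
```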
Event Scheduler
Event-driven programming is an effective means of providing modular communications between a large number of software components. In the event-driven paradigm, modules are able to send data to a large number of recipients without knowing their exact identities. Likewise, modules may receive data from many senders. Applications generate events when they perform operations affecting the state of the system at large.
Generally, such operations are performed through the architecture core, which generates an event as a side effect. For example, an event is generated every time content is modified. Certain applications must perform specific operations in response to events occurring elsewhere in the system. For example, a full-text search engine must update its indexes whenever any indexed content is modified. Applications can request notification from the event scheduler whenever events matching a specified pattern occur. The event scheduler is implemented as a daemon process, with possible assistance from the RDBMS in the form of triggers.
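A hedged sketch of this publish/subscribe behavior follows; the event representation (a dictionary of classification and parameters) and the method names are assumptions made for illustration.

```python
# Minimal sketch (hypothetical event fields): an event scheduler that lets
# modules publish events and delivers them to subscribers whose patterns match.

class EventScheduler:
    def __init__(self):
        self.subscriptions = []   # (pattern, callback)

    def subscribe(self, pattern, callback):
        self.subscriptions.append((pattern, callback))

    def publish(self, event):
        for pattern, callback in self.subscriptions:
            if all(event.get(k) == v for k, v in pattern.items()):
                callback(event)

scheduler = EventScheduler()

# A full-text search engine asks to be notified whenever content is modified.
scheduler.subscribe({"classification": "content-modified"},
                    lambda e: print("reindex", e["name"]))

scheduler.publish({"classification": "content-modified",
                   "name": "docstore/price-list"})
```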
Content Cache
One unique aspect of the invention is its relationship to the content it manages. The system itself is not the definitive source for much of the content it provides to users. Instead, the system serves as a broker, distributing content gathered from various sources. The content storage capabilities at the core of the invention constitute a local subordinate store or cache. Because all content has a definitive source, the caching of the data representing that content is discretionary to a certain extent. There are, however, several practical reasons why some data must be cached:
• The source data are not available online, or the source does not meet the system reliability requirements,
• The data are the result of a rendition conversion which requires human intervention,
• The data are metadata not provided by the content source, but generated manually,
• The data are a manifestation of dynamically generated content, and must be stored until delivery, or
• The data are accessed frequently and remote access or regeneration would incur an unacceptable performance penalty.
Data Storage
Content, when stored, is manifest as data in some particular rendition. Metadata, including attributes and relationships, are other types of data. The content cache maintains local copies of content by storing the data and metadata which represent the content. The invention uses an RDBMS to implement data storage for the content cache. Certain types of data, full-text indexes in particular, are stored separately in databases designed specifically for that type of data. Large data elements, such as documents, may be stored outside the database for reasons of efficiency.
Dependence Checking
The content cache maintains the validity of the data it contains by maintaining a set of dependence relationships between data elements and regularly checking dependences. This is used, for example, to ensure that when a source rendition is changed, all automatically generated renditions are regenerated. Each cache entry bears a dependence relationship to other data, either within the cache, or external to the core.
Ultimately, all dependences are rooted outside the cache. When the source of a dependence relationship is modified, dependences are checked to ensure that all dependent data are updated, invalidated, or deleted. The exact action depends on the data source, the demand for the data, and the nature of the dependence. Dependence checking can be implemented in an event-driven manner, using the event scheduler.
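As an illustration of event-driven dependence checking, the following sketch (with hypothetical names) invalidates every cache entry that depends, directly or transitively, on a modified source rendition so that it can be regenerated on demand.

```python
# Hedged sketch (hypothetical names): dependence checking in the content cache.
# When a source rendition changes, every cache entry that depends on it is
# invalidated so it will be regenerated on demand.

from collections import defaultdict

class DependenceTracker:
    def __init__(self):
        self.dependents = defaultdict(set)   # source -> dependent cache entries
        self.valid = set()

    def add(self, dependent, source):
        self.dependents[source].add(dependent)
        self.valid.add(dependent)

    def on_source_modified(self, source):
        # Invalidate transitively: a derived rendition may itself have dependents.
        stack = list(self.dependents[source])
        while stack:
            entry = stack.pop()
            if entry in self.valid:
                self.valid.discard(entry)
                stack.extend(self.dependents[entry])

tracker = DependenceTracker()
tracker.add("report.html", "report.sam")       # HTML generated from AmiPro source
tracker.on_source_modified("report.sam")
print("report.html" in tracker.valid)          # -> False, must be regenerated
```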
Session Manager
Many applications are session-based and maintain the current state of the session. For example, a navigator maintains the set of current navigation parameters entered by the user. The session manager provides a consistent mechanism for managing the state of a user's session. A central session manager ensures that the session state is always recoverable in the case of unintentional session termination, and provides a means for maintaining bookmarks and history lists.
While a session may involve the interaction of many applications, from a user's perspective, the session state is the union of the session states of all applications involved in the session. The session manager provides an interface by which clients can request state to be saved. It also brokers information about individual users and sessions, including log in, log out, and session splitting. The session manager uses an RDBMS as a persistent object store.
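A minimal sketch of the checkpointing role of the session manager follows; it keeps saved state in memory for brevity, whereas the architecture persists it in the RDBMS, and all names are hypothetical.

```python
# Illustrative sketch (hypothetical interfaces): a session manager that
# checkpoints the state of every agent in a session so the session can be
# recovered after unintentional termination.

class SessionManager:
    def __init__(self):
        self.checkpoints = {}    # (session_id, agent_name) -> saved state

    def save(self, session_id, agent_name, state):
        # In the architecture this would be persisted in the RDBMS.
        self.checkpoints[(session_id, agent_name)] = dict(state)

    def restore(self, session_id, agent_name):
        return dict(self.checkpoints.get((session_id, agent_name), {}))

sm = SessionManager()
sm.save("sess-42", "navigator", {"category": "telecom", "page": 3})
print(sm.restore("sess-42", "navigator"))   # survives a dropped connection
```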
Services
Services are modular components which extend the basic functionality of the core. They generally operate at the data level rather than the content level and, unlike agents, do not have instance names. The service-level interface to the core is intended to facilitate tight integration of third-party software. A service is automatically invoked when the particular core functionality it provides is required; services are generally not invoked through a direct request. While the service interface is designed to be extensible to new classes of services, several classes have been identified as immediately valuable: rendition conversion, automatic annotation, and metadata extraction.
Rendition Converters
Rendition converters translate content from one rendition to another. The rendition conversion service architecture facilitates integration of rendition conversion software supplied by various third-party vendors. All rendition conversions should be content-preserving, i.e. all information, both textual and visual, should be maintained by the conversion process. However, automatic conversions are rarely perfect in this regard. Some conversions, AmiPro to text, for example, are imperfect because the resulting format is not capable of expressing all the information in the source format. Other conversions, AmiPro to HTML, for example, are imperfect because of inadequacies in the conversion software or because of the intrinsic difficulty of a particular conversion.
Each rendition conversion made available to the system is assigned a fidelity attribute, which is a measure of the ability of the conversion process to faithfully reproduce the content. When a particular rendition conversion is called for, the invention uses the conversion or series of conversions which yields the highest conversion fidelity.
A simple approach to representing fidelity such as that used by Documentum is straightforward to implement. In general, fidelity is a multidimensional value. Individual components of the value are generally partially ordered. Representing and manipulating multidimensional fidelity is very useful in consistently, automatically presenting the user with the highest fidelity rendition for each request.
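The sketch below illustrates fidelity-driven selection of a conversion path under a simplifying assumption: fidelity is collapsed to a single number in (0, 1] and the fidelity of a chain is the product of its steps. The conversion table and values are invented for illustration.

```python
# Hedged sketch (hypothetical fidelity values): choosing the conversion, or
# chain of conversions, with the highest overall fidelity.

import itertools

CONVERSIONS = {                     # (source, target) -> fidelity
    ("AmiPro", "RTF"): 0.95,
    ("RTF", "HTML"): 0.85,
    ("AmiPro", "HTML"): 0.70,
}

def best_path(src, dst, renditions):
    best, best_fidelity = None, 0.0
    for n in range(len(renditions)):
        for middle in itertools.permutations([r for r in renditions
                                              if r not in (src, dst)], n):
            path = (src,) + middle + (dst,)
            fidelity = 1.0
            for a, b in zip(path, path[1:]):
                fidelity *= CONVERSIONS.get((a, b), 0.0)
            if fidelity > best_fidelity:
                best, best_fidelity = path, fidelity
    return best, best_fidelity

print(best_path("AmiPro", "HTML", ["AmiPro", "RTF", "HTML"]))
# -> (('AmiPro', 'RTF', 'HTML'), 0.8075), preferred over the direct 0.70 conversion
```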
From native word processor formats
The primary source of rendition converters for word processor (WP) formats is the Mastersoft Word for Word package, now marketed by Adobe as File Utilities. In addition to converting between WP formats, the Mastersoft filters convert between WP formats and HTML, and between WP formats and neutral formats such as RTF. Mastersoft filters can convert one line drawing format to another line drawing format and can convert one raster format to another raster format, but they cannot convert from a line drawing format to a raster format. The Mastersoft filters are available for HP-UX as well as Windows platforms.
to/from SGML
Use of SGML as a neutral rendition requires faithful conversion of all formats to and from SGML.
to HTML
A number of HTML converters are currently being marketed. Unfortunately, none is sufficiently robust for use in wholesale automatic conversion. All HTML converters, with the exception of the Mastersoft filter, are only available on Windows platforms.
to PDF/Envoy
Adobe and Tumbleweed both offer Windows printer drivers to generate output in their respective electronic document formats. Adobe offers Distiller and Tumbleweed offers Publisher, which convert Postscript to PDF and Envoy respectively. The Adobe products are available both on HP-UX and Windows platforms. The Tumbleweed products are available on Windows platforms only.
to TIFF
There are several products available which can produce TIFF, the image format used for facsimile transmission. Some versions of Windows include printer drivers which generate TIFF images. Ghostscript, a freeware/commercial product, can produce TIFF from Postscript. There are other commercial products available.
Automatic Annotators
Annotators add value to content by gathering other data and interpolating it into the content. For example, when displaying an item of content identified using a full-text search, it is often valuable to highlight the search words in the body of the displayed document. Most annotators are rendition-specific: they can only annotate a single format, such as HTML or PDF. Theoretically, annotators for a neutral document format would be most effective, because a single annotator for a neutral format can be leveraged to provide annotation for a wide range of source formats. However, this assumes there exist high-fidelity rendition conversions to and from the neutral format, which is currently not the case. Due to their relationship to metadata and therefore the attribute schema, most annotators are custom applications, or at least custom interfaces to standard applications. High-level languages designed for manipulating text or structured text (SGML) may be useful for rapid prototyping and development of automatic annotators. Useful automatic annotators include the following; a sketch of the first appears after this list.
• Find URLs embedded in textual material and convert them to hyperlinks;
• For each hyperlink in the content, indicate the relevance of the link target to the current search state;
• Indicate the security level of the document (available from document metadata) through a header annotation or background;
• Insert navigational aids, including links to related sub-content;
• Warn about potential technical or content-related problems which have been reported through feedback or audit procedures; and
• Highlight words which match the current search criteria.
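A sketch of the first annotator in the list, finding URLs in textual content and converting them to hyperlinks, is shown below; the regular expression is deliberately simple and illustrative.

```python
# Minimal sketch: an annotator that finds URLs embedded in textual content and
# converts them to HTML hyperlinks.

import re

URL_PATTERN = re.compile(r"(https?://[^\s<>\"]+)")

def hyperlink_urls(text):
    return URL_PATTERN.sub(r'<a href="\1">\1</a>', text)

print(hyperlink_urls("See http://www.example.com/specs for details."))
# -> See <a href="http://www.example.com/specs">http://www.example.com/specs</a> for details.
```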
Metadata Extractors
In some cases, content imported into the system contains metadata intermixed with content data. Metadata extraction tools 81 (See Fig. 8) are required which recognize metadata in content and generate metadata in compliance with the metadata schema. An example of intermingled metadata is the city, date, and wire service name which often begins the first paragraph of a newswire story. If such data are not identified as metadata, the effectiveness of metadata-based navigation is compromised.
Where metadata is explicitly denoted in source data, accurate metadata extraction tools can be used. Examples of source data which contain explicit metadata are textual forms with labeled fields and HTML files with META tags. In other situations, metadata is not explicitly denoted but it may be inferred from the entire content. For example, a document which mentions competitors frequently is likely to be competitive information. Extraction of such implicit metadata cannot be performed deterministically, and therefore its accuracy is questionable. In such cases, the extracted metadata should be analyzed and confirmed by a knowledge manager (See Fig. 9).
In the previous example, in most cases it would be accurate to infer that the document contains competitive information. However, the same company might also be a customer or a channel partner. To clarify the relationship, other data might be consulted. For example, the industry code associated with the document may be helpful, if the company is a competitor in some markets and a customer in others.
Alternately, the frequent mention of certain product names in the text may indicate that a competitive relationship is involved. Development of automatic attribution rules is a difficult but highly valuable endeavor. Without such rules, the bandwidth of the content import process is constrained by reviewer resources. Because extracted metadata must conform to the ESP 3 schema, some custom coding is required. High-level programming tools designed for text manipulation may be useful for rapid prototyping and development of explicit metadata extractors. Rule-based programming systems may be applicable for implicit metadata extraction.
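As a rough example of explicit metadata extraction, the sketch below pulls the city, date, and wire service name from a newswire dateline; the dateline format and field names are assumptions for illustration, not part of the schema.

```python
# Hedged sketch (hypothetical schema fields): extracting explicit metadata
# intermingled with content, such as the city, date, and wire service name
# that often lead a newswire story.

import re

DATELINE = re.compile(r"^(?P<city>[A-Z][A-Za-z .]+), (?P<date>\w+ \d{1,2}) "
                      r"\((?P<wire_service>[^)]+)\) -- ")

def extract_dateline(story):
    match = DATELINE.match(story)
    return match.groupdict() if match else {}

story = "San Jose, Aug 24 (Reuters) -- The company today announced ..."
print(extract_dateline(story))
# -> {'city': 'San Jose', 'date': 'Aug 24', 'wire_service': 'Reuters'}
```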
Channels
Channels are the mechanisms by which requests are accepted and content is delivered by the system. A channel instance is created when a physical connection is established with an external source. For example, an instance of the HTTP channel is generated when the HTTP server receives a request. A channel instance may also be created by the system to initiate a new communication, such as to deliver a fax. Content must be delivered to a specific channel instance, not to the HTTP channel in general. This requirement follows from the fact that only instances of channels have concrete characteristics. For example, there is no concept of the HTTP channel bandwidth, but for a channel instance representing an individual interaction between an HTTP server and a Web browser, the bandwidth of the channel is well-defined.
The capabilities provided by a channel determine the range of content it may deliver and the relative preference of available renditions.
Media
The ability of the channel to support various media formats influences rendition selection and limits availability of some content. A channel may support any number of media formats to a variable degree. The media capabilities for a channel are expressed as the fidelity of the channel when delivering each media format.
Security
The security of the channel dictates the security level of all system operations performed while responding to a request from or delivering content to a channel. To be considered secure, a channel must provide private communication and user authentication. Privacy may be maintained by session encryption algorithms such as RC4. Authentication may be based on passwords or RSA public-key certificates.
Bandwidth
The bandwidth of the channel influences rendition selection. Large graphical renditions are not appropriate for delivery over channels with limited bandwidth.
Latency
The latency of the channel influences the behavior of agents which converse over the channel. Over high-latency channels, the system attempts to deliver more content per communication to reduce the number of round-trip delays incurred. Where latency is not an issue, the use of smaller chunks of information is more ergonomic.
Channel capabilities vary between instances of a channel. For example, the media capabilities of the HTTP channel depend on the capabilities of the HTTP client which initiated the communication, which in turn depend on the set of plug-in media viewers which have been installed. The capabilities of another instance of the HTTP channel, initiated by a different HTTP client, may be markedly different. The primary channel is HTTP. Facsimile support is also desirable, especially for delivery. Support for electronic mail interaction is useful as a mechanism for users to send content to non-users. It is also potentially useful for channel partner access. Telephone and pager interfaces are also useful because they provide a uniform mechanism for reaching the worldwide field, cementing the use of the invention as the single electronic interface.
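The following sketch, with invented capability values, illustrates how a channel instance's concrete media fidelity and bandwidth might drive rendition selection at delivery time.

```python
# Illustrative sketch (hypothetical values): a channel instance exposes
# concrete capabilities, and the delivering agent chooses the acceptable
# rendition with the highest fidelity, avoiding large graphical renditions on
# low-bandwidth channel instances.

from dataclasses import dataclass

@dataclass
class ChannelInstance:
    media_fidelity: dict      # rendition type -> fidelity in [0, 1]
    secure: bool
    bandwidth_kbps: int

def choose_rendition(available, channel):
    candidates = []
    for rendition, size_kb in available.items():
        fidelity = channel.media_fidelity.get(rendition, 0.0)
        if fidelity == 0.0:
            continue
        if channel.bandwidth_kbps < 64 and size_kb > 500:
            continue                      # too large for this channel instance
        candidates.append((fidelity, rendition))
    return max(candidates)[1] if candidates else None

browser = ChannelInstance({"HTML": 1.0, "PDF": 0.9}, secure=True, bandwidth_kbps=56)
print(choose_rendition({"HTML": 40, "PDF": 900}, browser))   # -> 'HTML'
```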
Channels communicate with the system through a channel API, which allows new channels to be added to the system at a later date. Some degree of early consideration of channels that may not be implemented immediately is useful in determining that the channel interface is sufficiently expressive.
HTTP
The HTTP channel provides communication with Web clients, such as Netscape Navigator. The HTTP channel requires the use of an HTTP server to receive requests and deliver responses according to the HTTP protocol. Netscape Enterprise Server is the recommended server software for secure transactions over the World Wide Web. Access from Netscape Enterprise Server to the system can be implemented efficiently by binding the interface code into the server using Netscape's NSAPI protocol. The complexity of such interface code should be minimized to reduce the potential for adverse impact on the reliability of the HTTP server.
Electronic Mail
The electronic mail channel provides communication with electronic mail clients such as CC:MAIL. This channel makes the system accessible to users who have e-mail accounts but no direct network access. The e-mail channel is also useful for asynchronous delivery, such as notifications. The e-mail channel has capabilities similar to those of the HTTP channel, but generally with a significantly higher latency. Realization of all potential capabilities depends on vendors' adoption of the following standards for delivery, encapsulation, security, and forms submission:
• SMTP, the Internet protocol for mail delivery;
• MIME, a protocol for encapsulating one or more files of various formats in a single mail message;
• RSA public keys, a series of cryptographic algorithms which provide privacy and authentication; and
• a form interface, such as Lotus Forms, to allow forms to be filled out and submitted using a mail client.
Implementation of simple delivery of content in a single-file rendition such as PDF or Envoy is possible with less effort.
Telephone
A voice telephone may be used to request content through menu navigation. This mechanism is most effective for finding content within a limited domain, such as a user's personal folder, or for finding specific documents, given a document identifier. Delivery of content via telephone is possible for content available as audio data.
Pager
Brief urgent messages, such as notifications, may be delivered via pager. This also requires a notification agent to implement the notification selection.
Agents
The remaining functionality required for the system is provided by a collection of applications, called agents, which interact with the core as automated clients. Agent processes, or instances, may be started when the system is initialized or when certain events occur, for example, initiation of a new user session. It is possible that multiple instances of the same agent are active simultaneously; for example, different users may be interacting with different instances of the navigation agent. Each instance of an agent has a unique address or name, used by the core to route requests to the agent. Once started, agent instances remain available to accept requests and continue to service requests until explicitly terminated.
Some agents present a human interface, either to users or to administrative personnel. These agents may present a persistent session interface, in which case they use the services of the session manager.
In the following exposition, agents are categorized as user agents, administrative agents, and system agents. This classification has little bearing on implementation. All agents are treated uniformly by the core.
User Agents
User agents maintain an ongoing dialog with a user and interact with the system on that user's behalf. User agents are created when a user begins a new session or requests content from a namespace registered to an agent. Each instance of a user agent serves only one user session. Only requests generated on behalf of that user session are accepted by the agent instance. All operations performed by the agent instance carry the access permissions of the user.
Navigation
The navigation agent maintains an ongoing session with a user, directing him toward relevant content.
Submission
The submission agent manages manual submission of content to the system. The user, typically an author or other content provider, is presented with a series of forms. The values of various attributes can be specified, and the content data can be submitted directly or by reference to an online location. The submission agent also allows content to be composed from existing content. For example, an author may compose an info kit from a set of product specifications and collateral literature. Submitted content is first approved and processed by the production staff before it is made accessible to the general audience. Authors are able to view content they have submitted prior to its release to the general audience.
The submission agent also allows authors to query the production status of content they have submitted. The submission agent provides a user-friendly interface to the submission management agent which, in coordination with the production agent, actually enters content into the database.
User Authoring
Users are given a limited degree of authoring capability. These capabilities are implemented by the user authoring agent. User-authored content may include contributions to discussion groups; bookmarks and other personal collections; annotations attached to content; and simple notes.
By default, content authored by users is not accessible to other users. Users may extend to other users access to portions of their personal content. In most cases, it may be desirable to ascribe very low importance or relevance to user-authored content, so that such content rarely appears in search results.
In certain domains, such as discussion groups, where the primary intent is to distribute information to other users, it is desirable to ascribe greater importance to user-authored content.
Portals
Portal agents provide access to content which does not adhere to the attribution schema and cannot be fitted to the attribution schema without significant loss of information. Because it does not adhere to the attribution schema, such content is not accessible directly through the navigation agent. Content which does adhere or can be fitted to the attribution schema can be made fully accessible at the system level.
Portals facilitate a variable degree of integration between the system and non-system content managers. All communication between the portal and the core occurs via a portal API. The level of integration of a particular portal is determined by the portal agent's implementation of the access, search, navigation, and backup portal interfaces. The system may contain links to non-system content, such as external Web sites. Integration through a portal agent offers several advantages over a hyperlink: portal agents can cause their content to be indexed by the system, which allows users to find references to the portal agent from the standard system navigator; content generated by a portal agent is amenable to rendition conversion and annotation, and may be delivered by various channels; and navigation of content through a portal agent is integrated with the standard system navigation agent and uses the same bookmark and history mechanisms. The system provides standard portals for accessing WWW sites and Usenet newsgroups.
User Preferences
Each user's profile, maintained by the system, includes a set of user preferences which specify the manner in which the system communicates with the user, including appearance and verbosity. Users may modify the user preference portions of their user profiles through the user preferences agent. User profiles also include a list of clipping requests, expressed as event descriptions. These event descriptions are automatically registered with the notification agent which detects content which matches the descriptions. Any such content is linked by the notification agent into the user's private collection.
Feedback
Feedback may be managed through various means, ranging from a simple electronic mailbox to a customer support database. Users report problems and submit comments regarding the system to the feedback system via the feedback agent.
Status reports
The feedback system should distinguish production problems, such as missing files, from authoring problems, such as inaccuracies in the content. It should also provide generic feedback mechanisms, e.g. "I would really like to see a summary of the features in product X." Advanced feedback can be implemented via a custom agent but is probably better implemented through a commercial product. In either case, the interface should be implemented via a user agent to present a uniform interface to users.
Administrative Agents
Administrative agents are similar to user agents except that they are only made available to system administrators. This exclusion can be trivially implemented via sensitivity attributes on the agents.
Production
The production agent implements a workflow which prepares submitted content for distribution to the system audience.
Each element of content is tracked through several stages of preparation:
Approval
The content is examined by a knowledge manager to verify that it is relevant to the system audience and that the same content does not already exist in ESP.
Replication identification can be aided by searching for other content with identical or almost identical extracted metadata.
Attribution
The content is characterized and attributed by a knowledge manager, starting from the candidate attribute values supplied by the author.
Audit
All submitted content is checked for conformance to the schema and other criteria before publication.
Rendition Conversion
Manual rendition conversions are performed by the production staff. The results of semi-automatic conversions are reviewed and corrected as necessary. The production agent also provides functions for modifying existing content and metadata and for deleting items from the database. The production agent provides a user-friendly interface to content management agents which actually manage content and associated metadata.
In its primary role of submitted content review, the production agent coordinates with the submission management agent, which receives data to be made into content from the submission agent.
Reporting and Analysis
The reporting and analysis agent allows administrative users to generate reports and graphs from the system logs. System logs are created by a system agent. Several types of reports can be generated.
Performance Reports
Using data in the performance logs, the performance of the system can be analyzed and related to time of day, geography, channel, and content source.
Knowledge Reports
Using data in the knowledge logs, knowledge management issues can be analyzed, including content demand, missing content, and content misattribution.
Access
Using data in the access logs, reports can be generated which indicate the number of active users or concurrent users, related to user profile and geography.
Security
Using data in the security logs, reports can be generated which indicate possible security concerns, including multiple concurrent sessions by the same user, denied requests for sensitive content, and frequent unsuccessful authentication attempts.
Account Management
User profiles are managed by administrative staff via an account management agent. This agent provides a means of assigning passwords to users and associating RSA public keys with users. Users may also be assigned to access control groups.
System Agents
System agents are anonymous automated clients which implement advanced system features. System agents are not associated with a user session. Instances of system agents are created when the system server is started and are not terminated until the server is shut down. Each instance operates as a particular user.
Content Management
All content in the system is made available through a content management agent. Content management agents transform external data into internal content. In addition to providing a conduit for content to enter the system, content management agents communicate feedback and other change requests back to the content source. There is a content agent for each class of content source. By applying feedback and change requests to individual content managers, source-specific behavior can be obtained.
For example, while feedback on submitted content may be e-mailed to the content author, feedback on newswire content must generally be handled in a different manner. Similarly, modification of certain metadata on newswire content may be contractually prohibited. All content must conform to the attribution schema. Content management agents are responsible for translating or extracting and auditing the metadata required by the schema. By virtue of its compliance with the ESP schema, internal content is browsable through the standard navigation agent.
If the content source and interchange mechanisms are reliable, content may be made immediately accessible to the general system audience. In this case, automatic content auditing is required to ensure that all content conforms to the schema and other acceptance criteria. Examples of this type of content management agent are newswire agents and external web site crawlers for trade magazines.
If the content source or interchange mechanisms are unreliable, content may be placed in a production queue. Such content must be reviewed via the production agent before final release into the system. The primary agent in this class is the submission management agent. The submission management agent internally maintains a queue which accepts data from the submission agent and holds it until the content has been reviewed and approved via the production agent.
Content management agents communicate with the core via a standard interface, which ensures uniform treatment by other components in the system. This interface must be sufficiently flexible to support a wide range of potential content sources, including document stores, newswires, feeds from market analysts, Web sites, and Usenet newsgroups.
Export
Content can be extracted from the system by export agents for the purpose of generating other views of the content. For example, the production of CD-ROMs containing content is implemented as an export agent which programmatically walks content and generates an indexed hierarchical file structure suitable for offline browsing.
A key to enabling export functions is the treatment of the results of navigation as items of content. An HTML exporter can be created modularly by having an agent create a navigator session and then communicate with the navigator to walk the set of content of interest. The navigator sends the content representing the navigation pages back to the exporter, rather than to an HTTP channel. The exporter makes small changes to the content and then writes it to a CD prototype file system.
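A hypothetical sketch of such an export agent follows; the navigator walk() generator, the link-rewriting rule, and the FakeNavigator stand-in are assumptions made so the example is self-contained.

```python
# Hedged sketch (hypothetical navigator API): an export agent that drives a
# navigator session, walks the content of interest, and writes lightly edited
# pages into a file hierarchy suitable for a CD-ROM prototype.

import os

def export_to_filesystem(navigator, start_name, output_dir):
    os.makedirs(output_dir, exist_ok=True)
    for name, html in navigator.walk(start_name):          # assumed generator API
        html = html.replace('href="/content/', 'href="')   # rewrite links for offline use
        path = os.path.join(output_dir, name.replace("/", "_") + ".html")
        with open(path, "w") as out:
            out.write(html)

class FakeNavigator:                       # stand-in for a real navigator session
    def walk(self, start_name):
        yield start_name, '<a href="/content/docstore/spec">spec</a>'

export_to_filesystem(FakeNavigator(), "docstore/kit", "cd_prototype")
```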
Notification
The system notifies users when certain events occur, such as modification to a particular file. The notification agent monitors system events through the event scheduler and generates notification messages when user-specified criteria are met. Notification messages are either delivered via channel, such as e-mail, or linked in to the user's personal content namespace.
Logging
The logging agent monitors system events through the event scheduler and writes messages to a log database. Each instance of the logging agent can be configured to monitor specific events, so that different types of logs can be created.
Performance Log: a log of the response time for each user access;
Knowledge Log: a log of accesses to content, including search requests;
Access Log: a log of successful user authentication requests, or logins;
Security Log: a log of successful and unsuccessful user authentication requests and denials of service.
The logging agent is additionally responsible for importing logs from various other components of the system including HTTP and RDBMS logs.
Auditing
Auditing agents check the validity of the content according to a specified set of rules. The auditing agent is invoked by other agents submitting content. Auditing is a stage in the production process and is performed automatically in conjunction with other content managers. The auditing agent may also scan existing content to check rules which cannot be tested at submission. One example is the rule which invalidates out-of-date content.
Interfaces (APIs)
A key element of the design of the system is the design of the interfaces which allow the system components to interoperate. To support current and future needs adequately, care must be taken to develop interfaces that are flexible and adaptable.
Content Interface
The content interface provides a uniform mechanism for accessing content. The interface is implemented by objects created by content management agents (repository content) and agents (dynamic content).
While the content interface is extensible, several fundamental aspects of the content interface, e.g. retrieval, storage, traversal, and searching, must be implemented to provide base-level functionality.
Retrieval
All objects which provide content implement an interface which provides access to attribute and rendition data. The retrieval interface provides a facility for determining the availability and fidelity of renditions.
Storage
All objects which store content implement an interface which allows data values to be assigned to attributes and renditions. Assignments can be used to modify the metadata or renditions associated with a particular content name. Assigning to a source rendition and assigning to a derived rendition are distinguished. The former is an update of the content which causes other renditions to be out-of-date, while the latter is the result of a manual rendition conversion.
Traversal
All objects which provide structured content or collections of content implement a traversal interface which provides access to the sub-content. Through the traversal interface, clients can retrieve a list of the names of each contained element of content. Each name so retrieved can then be used to operate on the sub-content. Structure can be walked in a hierarchical manner by recursive application of the traversal interface.
Searching
By default, content is indexed for searching by traversal and retrieval of a textual — and preferably neutral — rendition. Content objects which wish to override the default search indexing may override the searching interface. Through the searching interface, objects may return search indexes, each of which maps terms to content names. A standard protocol that may be applicable in the definition of this interface is WAIS/Z39.50.
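The four base aspects of the content interface could be expressed as an abstract class along the following lines; the method names are illustrative, not those of the specification.

```python
# Minimal sketch (hypothetical method names): the retrieval, storage,
# traversal, and searching aspects of the content interface as an abstract
# class that content objects and content management agents could implement.

from abc import ABC, abstractmethod

class Content(ABC):
    @abstractmethod
    def get_rendition(self, rendition_type):
        """Retrieval: return data for the requested rendition, if available."""

    @abstractmethod
    def set_rendition(self, rendition_type, data, is_source=False):
        """Storage: assign data to a source or derived rendition."""

    @abstractmethod
    def children(self):
        """Traversal: return the names of contained sub-content."""

    @abstractmethod
    def search_index(self):
        """Searching: return an index mapping terms to content names."""
```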
Delivery Interface
The delivery interface is implemented by channels and other components which accept content. There are two aspects to the interface: access to channel capability information and delivery.
Channel capabilities may impact the behavior of applications which deliver content to the channel. For example, the range of renditions that are acceptable to a channel determines which rendition is chosen to manifest content retrieved from a content store. Other channel capabilities include security, bandwidth, and latency.
Channels also implement a delivery function, which transfers metadata and content to a recipient through the channel. The content must be provided in a rendition accepted by the channel. A channel may use content attributes in its operation. For example, an HTTP channel transmits expiration information to a web client if available.
Conversion Interface
Components which provide conversion capabilities, including rendition conversion, metadata extraction, and annotation, implement the conversion interface. This interface is similar to the delivery interface. Both interfaces accept content as input. Instead of a delivery function, the conversion interface specifies a conversion function, which generates new content from the input metadata and content. The conversion function returns the name of a newly generated item of content which represents the results of the conversion.
Event Interface
The event interface provides system components the capability of generating events and receiving notification when selected events occur. The event scheduler provides functions that agents may invoke to generate a new event or to request notification of an event. Each event has a classification which describes the occurrence which caused the event to be generated.
Events may contain additional information in the form of parameters. The exact parameters depend on the classification of the event. When requesting notifications, agents specify event selection criteria in terms of event classification and parameters. Agents which request to be notified of certain events also implement a notification function which is invoked by the event scheduler when an event matching the selection criteria occurs. The classification and parameters for the event which triggered the notification are passed to the agent as parameters to the notification function.
Session Interface
The session interface provides a uniform means by which all agents maintain state between accesses. A typical user session involves multiple agents. The session state is the union of the states of all agents. Centralized maintenance of the combined session state allows the implementation of user session features such as bookmarks, histories, and session recovery.
The session interface is implemented by agents which carry state, and requires that two methods be defined: one for saving or checkpointing the state, and one to restore a saved state.
The session manager invokes an agent's session interface to save whatever state information is necessary to return the agent from an unknown state to the current state. This information can be maintained by the session manager to implement bookmarks, histories, and session recovery by requesting that the agent reload an earlier state.
Portal Interface
Agents which provide portals to content do so by implementing various aspects of the portal interface. This interface allows portal agents to define custom methods for access, search, navigation, and backup.
Access
A portal may deliver content online or offline. For example, a portal providing access to hardcopy literature may accept requests online, but only deliver content offline through the mail. An interactive portal, such as a stock quote service, delivers content online.
Search
A portal may implement the searchable content API, enabling portions of its content to be searched and allowing navigational agents to present the user with references to the portal when appropriate. For example, the solutions catalog could search-enable the product descriptions it contains so that the standard navigation techniques present the user with links into the solutions catalog when appropriate.
Navigation
The standard navigation agent may not be a convenient interface for certain types of structured data. For example, users expect to be able to browse an events database using the familiar calendar paradigm. In such circumstances, a portal may implement a custom navigation agent.
Backup
Backup of content accessed by a portal may be the responsibility of another system which manages the data, or may be implemented through the system via the portal API.
Content Management Interface
Content management agents are the sources of information for the system. The content management interface provides a mechanism by which the content source may influence the behavior of the content it provides. A content management agent may override several aspects of content behavior, including retrieval, modification, content creation, and feedback.
Retrieval
The content manager specifies the manner by which content is retrieved. Most content is cached by the core. Content managers may specify caching parameters or prohibit caching altogether, requiring that the system contact the content manager directly for every access to the content.
Modification
Content managers must maintain the consistency of data between the system and the content source. To this end, content managers may control modification of the content from the system. For example, modifications originating from the system may be reflected back to the source, or they may generate a feedback message to the content provider. For some sources, modification of the content is disallowed under contractual agreement.
Content Creation
A new element of content is created when content is assigned to a name which previously had no content associated with it. Agents which store content provide a common means of creating new names within their registered namespace.
Feedback
Feedback messages which relate to content must be routed to the individual or organization responsible for such content. The content management interface allows agents which supply content to specify the means by which to deliver feedback on that content.
Information Routing
Information routing systems leverage content metadata and user profiles to deliver relevant information to interested parties. Information routing is a pushed-content model, in which providers publish content to subscription lists. Pushed content is convenient for content providers because it gives them a direct channel to their audience. Users, however, are at the mercy of the content providers. The result is that users either ignore most of the pushed content because they don't have time to determine its usefulness, or they save all pushed content so that they have it should it become useful. In the first scenario, there is no communication. In the second scenario, content management becomes the responsibility of each individual user. In contrast, the invention implements a pulled-content model, in which users request information when they need it. Users can specifically request notification of the presence of new material of interest by customizing their user profile. It is also possible for administrative users to enter notification requests that users may not modify, which can be used to force announcements, but this functionality is not available to unprivileged users or authors.
Although the invention is described herein with reference to the preferred embodiment, one skilled in the art will readily appreciate that other applications may be substituted for those set forth herein without departing from the spirit and scope of the present invention. Accordingly, the invention should only be limited by the Claims included below.

Claims

1. A service bureau architecture, comprising: a personal sales information channel comprising a custom information space containing only relevant marketing and sales information drawn from both inside and outside of a company; and a sales and marketing information exchange devoted to sales and marketing information for organizing said information for search, navigation, and delivery to a variety of audiences from a common base; wherein said information exchange enforces constraints on information flows.
2. A method for implementing a service bureau architecture, comprising the steps of: organizing and categorizing a company's sales and marketing content into one or more context maps; integrating said one or more context maps with new information from third parties; and facilitating exchanges of information to motivate closes and substantially shorten sales cycles.
3. The method of Claim 2, further comprising the steps of: capturing metadata, wherein said metadata comprises information about content expressed as concepts and roles; using said concept and role metadata to categorize information; and applying ontological indexes and query planners to provide interactive response to objects representing said content.
4. A service bureau architecture, comprising: one or more context maps for capturing and describing sales information; information resource tools for allowing users to maintain and provide sales information exchanges; and one or more content exchange channels for allowing organizations to target multiple audiences from a single base of sales and marketing information.
5. The architecture of Claim 4, wherein said one or more context maps embed any of marketing/sales domain categories, as well as an understanding of existing business rules, job functions, and terminology for targeted vertical markets, into said architecture.
6. The architecture of Claim 4, further comprising: a relational database; and a category engine implementing resource description framework (RDF)-specified categories and attributes for retrieving sales information from said relational database.
7. The architecture of Claim 4, said content exchange channels implementing an information and content exchange (ICE) protocol to collect and exchange RDF metadata and content between servers and other internal/external content sources and clients.
8. A service bureau, comprising: an I/O interface module containing drivers for one or more interfaces which are used to access said service bureau; an extensions module containing one or more interfaces which provide value-added functionality; and a core for providing document management, said document management comprising: one or more context maps for capturing and describing sales information; information resource tools for allowing users to maintain and provide sales information exchanges; and one or more content exchange channels for allowing organizations to target multiple audiences from a single base of sales and marketing information; said core comprising: a relational database management system (RDBMS) for storing content data along with metadata attributes which are used to organize said content; and a document store comprising any of authoring support for entering content into said store, one or more mechanisms for fetching content from said store, one or more mechanisms for revision control, one or more mechanisms for specifying and enforcing access control, and audit tools for extracting information about said content.
9. The service bureau of Claim 8, said core further comprising: a session daemon for maintaining session state for active sessions.
10. The service bureau of Claim 8, said core further comprising: an event daemon for generating, propagating, and reacting to events.
11. A service bureau architecture, comprising:
one or more context maps that organize and facilitate access to a wide range of unstructured information for defining relationships between categories, between content and categories, and between context maps, categorizing unstructured content in the terms familiar to sales and marketing professionals, and organizing information for searching and navigation of said categories and relationships to find information; one or more information resource tools for publishing, managing, and delivering said information; and one or more sales information channels tailored to the specific needs of different audiences, individuals, and communities of interest for supporting delivery of said information to multiple channels from a single information space.
12. The architecture of Claim 11, further comprising: a content metadata extraction process for existing documents.
PCT/US2000/023355 1999-08-26 2000-08-24 Service bureau architecture WO2001015004A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU68010/00A AU6801000A (en) 1999-08-26 2000-08-24 Service bureau architecture

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US15075899P 1999-08-26 1999-08-26
US60/150,758 1999-08-26

Publications (2)

Publication Number Publication Date
WO2001015004A2 true WO2001015004A2 (en) 2001-03-01
WO2001015004A8 WO2001015004A8 (en) 2001-12-20

Family

ID=22535874

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/023355 WO2001015004A2 (en) 1999-08-26 2000-08-24 Service bureau architecture

Country Status (2)

Country Link
AU (1) AU6801000A (en)
WO (1) WO2001015004A2 (en)


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
No Search *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7707498B2 (en) 2004-09-30 2010-04-27 Microsoft Corporation Specific type content manager in an electronic document
US7712016B2 (en) 2004-09-30 2010-05-04 Microsoft Corporation Method and apparatus for utilizing an object model for managing content regions in an electronic document
US9110877B2 (en) 2004-09-30 2015-08-18 Microsoft Technology Licensing, Llc Method and apparatus for utilizing an extensible markup language schema for managing specific types of content in an electronic document
US7730394B2 (en) 2005-01-06 2010-06-01 Microsoft Corporation Data binding in a word-processing application
US7945590B2 (en) 2005-01-06 2011-05-17 Microsoft Corporation Programmability for binding data
EP1696347A1 (en) * 2005-02-25 2006-08-30 Microsoft Corporation Data store for software application documents
US7668873B2 (en) 2005-02-25 2010-02-23 Microsoft Corporation Data store for software application documents
US7752224B2 (en) 2005-02-25 2010-07-06 Microsoft Corporation Programmability for XML data store for documents
AU2006200047B2 (en) * 2005-02-25 2011-02-03 Microsoft Technology Licensing, Llc Data store for software application documents
US7953696B2 (en) 2005-09-09 2011-05-31 Microsoft Corporation Real-time synchronization of XML data between applications
WO2007044183A1 (en) * 2005-10-13 2007-04-19 Electronic Data Systems Corporation Locating documents supporting enterprise goals
US11768767B2 (en) 2021-10-29 2023-09-26 Micro Focus Llc Opaque object caching

Also Published As

Publication number Publication date
AU6801000A (en) 2001-03-19
WO2001015004A8 (en) 2001-12-20

Similar Documents

Publication Publication Date Title
US20190272293A1 (en) Automated creation and delivery of database content
US6662178B2 (en) Apparatus for and method of searching and organizing intellectual property information utilizing an IP thesaurus
US7680856B2 (en) Storing searches in an e-mail folder
US8484177B2 (en) Apparatus for and method of searching and organizing intellectual property information utilizing a field-of-search
US8495049B2 (en) System and method for extracting content for submission to a search engine
KR100601578B1 (en) Summarizing and Clustering to Classify Documents Conceptually
US8060513B2 (en) Information processing with integrated semantic contexts
US8473473B2 (en) Object oriented data and metadata based search
US20020138297A1 (en) Apparatus for and method of analyzing intellectual property information
US9858255B1 (en) Computer-implemented method and system for automated claim construction charts with context associations
Roth et al. Information integration: A new generation of information technology
US20060074980A1 (en) System for semantically disambiguating text information
US20080222105A1 (en) Entity recommendation system using restricted information tagged to selected entities
US20050210009A1 (en) Systems and methods for intellectual property management
US20100223250A1 (en) Detecting spam related and biased contexts for programmable search engines
US20070124319A1 (en) Metadata generation for rich media
WO2004097675A1 (en) Digital library system
US6963863B1 (en) Network query and matching system and method
WO2001015004A2 (en) Service bureau architecture
US8131752B2 (en) Breaking documents
Constantopoulos et al. On information organization in annotation systems
Sathiadas et al. Document management techniques & technologies
Heery et al. Metadata
Lu et al. Extensible information brokers
Berwick et al. Research Priorities for the World Wide Web

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 EP: The EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
AK Designated states

Kind code of ref document: C1

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA

AL Designated countries for regional patents

Kind code of ref document: C1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

D17 Declaration under article 17(2)a
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase in:

Ref country code: JP