WO2002017136A1 - System and method for digital information acquisition and distribution according to user profiles - Google Patents

Publication number
WO2002017136A1
WO2002017136A1 PCT/GB2000/003247 GB0003247W WO0217136A1 WO 2002017136 A1 WO2002017136 A1 WO 2002017136A1 GB 0003247 W GB0003247 W GB 0003247W WO 0217136 A1 WO0217136 A1 WO 0217136A1
Authority
WO
WIPO (PCT)
Prior art keywords
content
digital information
users
web
user
Prior art date
Application number
PCT/GB2000/003247
Other languages
French (fr)
Inventor
Michael William Heaney
Original Assignee
Unisys Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unisys Limited
Priority to GB0303891A (GB2382901B)
Priority to PCT/GB2000/003247 (WO2002017136A1)
Priority to AU2000267113A (AU2000267113A1)
Publication of WO2002017136A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06F16/9538 Presentation of query results

Definitions

  • the present invention relates to digital data processing methods and systems and, more particularly, to methods and systems for acquiring (capturing) data from a plurality of sources via a plurality of input channels and distributing (actively delivering and/or making available) acquired data to a plurality of end users via a plurality of output channels and a variety of access devices.
  • the present invention seeks to provide data acquisition and distribution systems adapted to collect and filter data from a range of sources, store that data in a structured manner, and to selectively distribute data in near-real-time to users via the users' choice of access device.
  • the invention is concerned with the collection and distribution of data on a wide area basis, via a variety of data/telecommunications networks, including wireless networks.
  • References herein to specific network systems such as the Internet, the World Wide Web (or, simply “the web”), etc. and associated communications protocols and the like will be understood to include and encompass other similar or equivalent network systems and protocols.
  • Preferred embodiments of the invention employ a set of data processing modules adapted to:
      • collect information from a range of on-line sources, e.g. news agencies, the web, business systems and other data services;
      • maintain profiles of individual users and/or classes of users and match acquired data to individual users and/or classes of users on the basis of said profiles;
      • deliver data to individual users in near-real-time through a range of communications channels and access devices such as email, facsimile, mobile messaging and browser-enabled mobile telephones (WAP);
      • enable the creation of dynamically built web portals, enabling users to access the same data via a range of access devices such as browsers, set-top boxes (interactive television), mobile telephones, etc.
  • a method of acquiring and distributing digital information comprising: defining and storing user profiles for a plurality of users, said profiles including parameters identifying categories of digital information of interest to users and users' preferred communications and access channels for receiving notification of and/or accessing newly acquired digital information; receiving digital information from a plurality of digital information sources via a plurality of digital information input channels; indexing and storing said digital information; selectively notifying users of newly acquired digital information via at least one of a plurality of digital information output channels on the basis of said user profiles and said indexing of said digital information; and selectively making said digital information available to users via at least one of a plurality of digital information access channels.
  • digital information is received from sources including at least one selected from a group comprising on-line news agencies, the Internet/Web, document imaging systems, e-mail and direct user input.
  • users are notified of newly acquired digital information via information output channels including at least one selected from a group comprising e-mail, Short Message Service and fax.
  • digital information is made available via digital information channels including at least one web server and at least one WAP gateway.
  • digital information distribution functions are administered via browser based user interfaces.
  • parameters identifying content of interest to users are defined by means of active software agents, and/or digital information filters and/or query definitions.
  • digital information is made available to users in digital documents formatted by means of pre-defined templates.
  • the content of said documents may be defined by means of active software agents, and/or digital information filters and/or query definitions.
  • digital information is distributed by means of an application host system supporting a plurality of web application servers, a plurality of web servers in communication with said application host system, and a dynamic router operatively associated with said plurality of web servers.
  • said digital information is indexed using a relational multimedia database system and stored in at least one file system separate from said database system.
  • a digital data processing system comprising at least one computer configured and programmed to implement a method in accordance with the first aspect of the invention.
  • FIG. 1 is a block diagram illustrating a first example of a digital data acquisition and distribution system embodying the present invention.
  • Fig. 2 is a block diagram illustrating a second example of a digital data acquisition and distribution system embodying the present invention.
  • a first embodiment of a digital data acquisition and distribution system in accordance with the present invention comprises a set of modular data processing components providing functionality in four main areas, namely content capture and acquisition, content storage and management, service creation and content distribution, and content and user administration.
  • Content capture and acquisition functions are provided by a plurality of input modules 10, 12 and 14 for the capture or acquisition of content from a range of sources.
  • a news feeder module 10 is adapted to capture input from news ("wire") agency systems, typically comprising text and images;
  • a web hunter module 12 is adapted to capture and index web pages and other Internet content;
  • a scan collector module 14 is adapted for the acquisition of documents from a scanner or other document imaging device, and preferably provides optical character recognition (OCR) and full-text indexing functions.
  • Content storage and management functions are provided by a further plurality of modules 16, 18 and 44, providing the basis for content publication, targeted delivery and user-initiated query functions.
  • An active agent environment module 16 and multimedia relational database module 18 are adapted to receive content from the input modules 10, 12, 14.
  • the active agent environment module 16 also hosts active software agents which monitor incoming content against user profiles.
  • the multimedia database module 18 comprises a database system for the indexing of content.
  • the actual content may be stored in the database 18, but it is preferred that the content is stored in at least one separate file system 44 (which may include local or remote file systems, video servers, etc.), with corresponding database records stored in the database module 18 providing pointers to the locations where the actual content is stored.
  • user profiles are also stored in the database module 18. However, these could be stored in a separate database system.
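The arrangement described above, in which the database holds an index record pointing at content held in a separate file system, can be sketched as follows. This is a minimal illustration only: the table name `content_index`, the helper names and the use of SQLite and a temporary directory are assumptions made for the example, not details taken from the specification.

```python
import sqlite3
import tempfile
from pathlib import Path


def store_item(db, content_dir, title, payload):
    """Write the payload to the file system and record a pointer to it in the database."""
    cur = db.execute(
        "INSERT INTO content_index (title, location) VALUES (?, '')", (title,)
    )
    item_id = cur.lastrowid
    path = Path(content_dir) / f"item_{item_id}.bin"
    path.write_bytes(payload)
    # The database row stores only the location of the content, not the content itself.
    db.execute(
        "UPDATE content_index SET location = ? WHERE id = ?", (str(path), item_id)
    )
    return item_id


def load_item(db, item_id):
    """Follow the pointer held in the database to read the content from the file system."""
    row = db.execute(
        "SELECT location FROM content_index WHERE id = ?", (item_id,)
    ).fetchone()
    return Path(row[0]).read_bytes()


# Hypothetical set-up: an in-memory index and a temporary content directory.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE content_index (id INTEGER PRIMARY KEY, title TEXT, location TEXT)"
)
content_dir = tempfile.mkdtemp()
item_id = store_item(db, content_dir, "Market report", b"full text of the report")
```

Keeping only a pointer in the index means large media files never pass through the database, which is one plausible reading of why a separate file system is preferred.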
  • a web port module 20 is adapted to provide full user access to the system from a web browser, via a web server 30, and includes end-user and administrator functions.
  • a WAP (wireless application protocol) port 22 is adapted to provide user access to the system through a WAP browser-enabled access device such as a mobile telephone, via a WAP server and gateway 32.
  • a targeted delivery port 24 is adapted to deliver content in near real-time to end users, based on individual preferences defined in the system's user profiles, via any suitable channels such as a short message centre 34 and email server 36.
  • a system management and service creation environment 42 supports system management and administration functions as follows.
  • a user and system manager module 38 provides user management functions, allowing the creation and management of user accounts, including permissions and membership of user groups, module management functions for control and inspection of particular system services, and database table management functions for simple on-going configuration of specific content-related parameters.
  • a dynamic web builder module 26 is adapted to provide functionality for quickly and easily creating a portal presence, and includes WAP capabilities.
  • a separate WAP port management module 40 may also be included providing tools for management and administration of the WAP port 22.
  • the system preferably further includes an API (application programming interface) module 28, incorporating a software developers kit (SDK), which enables simplified integration of other systems.
  • the associated modules 10, 12 and 14 are designed and configured to make a range of content from a variety of sources available to the system, independent of the media or input channel from which the content has been acquired and independent of its original format.
  • the main channels for the input of content to the system may include: dispatches from wire agencies, including both text and images; text and images from paper documents; web pages and associated content from the Internet or from corporate intranets and/or extranets; manual insertion of content through a web interface.
  • any available categorisation information for an incoming item of content is used by the system, such information being normalised and adapted for use within the system.
  • the news feeder module 10 is adapted to capture text and images from wire agencies such as ANSA, Reuters, Bloomberg etc., employing a robust design to help ensure the reliable capture of content at the very high frequencies with which wire agencies transmit dispatches.
  • the module 10 is preferably user-configurable so as to be able to receive dispatches from any content provider employing any one of a number of wire agency protocols.
  • the web hunter module 12 uses known server-side technologies to capture most types of web pages and associated content and to make that content available to users of the system through any of the range of content services and access devices supported by the system.
  • the module 12 effectively provides "screened" Internet access for users of the system and removes the need for users to actively search for information on the Internet. Instead, relevant content is captured automatically by the system, filtered and delivered automatically to users, thereby reducing the burden on users and minimising external network usage.
  • the system provides tools which enable content administrators to create "subscriptions" to selected web pages and/or groups of web pages quickly and simply, on the basis of which the web hunter 12 regularly polls the nominated web pages or groups of linked web pages to check for changes in content.
  • When a change in the content of a web page is detected, that content is loaded into the system and thus becomes available to users.
  • the content administrator may configure each web subscription to acquire the required depth of links (e.g. the target page only, or the target page plus all pages immediately linked to the target page, etc.).
  • Subscriptions may also be configured to determine the richness of content which is to be captured, using retrieval filters to select from, for example, text, graphics, audio, video or other types of multimedia content.
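The subscription behaviour described above (regular polling, change detection and a configurable link depth) might be sketched along the following lines. The `Subscription` class, the `poll` function and the injected `fetch` callable are hypothetical names introduced for illustration; change detection by comparing content digests is an assumed technique, since the specification does not say how changes are detected, and retrieval filters for content richness are not modelled here.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A fetch function returns the page body and the links found on the page.
Fetch = Callable[[str], Tuple[str, List[str]]]


@dataclass
class Subscription:
    url: str
    depth: int = 0  # 0 = target page only, 1 = plus immediately linked pages, etc.
    last_seen: Dict[str, str] = field(default_factory=dict)  # url -> content digest


def poll(sub: Subscription, fetch: Fetch) -> List[str]:
    """Return the URLs whose content has changed since the previous poll."""
    changed = []
    frontier = [(sub.url, 0)]
    visited = set()
    while frontier:
        url, level = frontier.pop(0)
        if url in visited:
            continue
        visited.add(url)
        body, links = fetch(url)
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if sub.last_seen.get(url) != digest:
            sub.last_seen[url] = digest
            changed.append(url)
        if level < sub.depth:
            frontier.extend((link, level + 1) for link in links)
    return changed


# Usage with an in-memory stand-in for a web site (no network access).
site = {"a": ("v1", ["b"]), "b": ("x", ["c"]), "c": ("y", [])}
sub = Subscription("a", depth=1)
first = poll(sub, lambda u: site[u])   # every page in scope is new on the first poll
site["b"] = ("x2", ["c"])
second = poll(sub, lambda u: site[u])  # only the modified page is reported
```

Passing the fetch function in as a parameter keeps the sketch testable and leaves the choice of retrieval technology open.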
  • the scan collector module 14 allows content to be acquired from printed matter, such as journals, newspapers or other documents, by means of a scanning (imaging) and OCR process.
  • Any of a variety of commercially available scanning and OCR packages may be employed, preferably of a type which is capable of learning and/or being trained in specialist vocabulary specific to the organisation employing the system, so as to maximise the accuracy of the OCR process and minimise the need for manual intervention.
  • the system may support multiple scanning/OCR workstations. Image and text files created by the process may be saved to any suitable data storage system, from where they are captured by the present system.
  • module 16 hosts active agents which monitor incoming content against user profiles
  • database module 18 indexes content residing on file-systems, video servers etc 44.
  • the functions of the modules 16 and 18 are complementary.
  • the active agent environment 16 holds content only as long as necessary for the content to be delivered ("pushed" by the system) to relevant end users, on the basis of pre-defined user profiles.
  • the database module 18 provides a managed store of content which is of longer term interest and which may be accessed (“pulled” from the system) by users as and when necessary.
  • a set of software agents monitor all incoming content from the input modules 10, 12, 14, matching it against user profiles and actively delivering (pushing) relevant content to users in accordance with their individual requirements.
  • These agents may be created as required using an intuitive user interface which may be accessed via a standard web browser. Accordingly, agents may be created by a number of suitably authorised individuals, typically content administrators, but potentially by relatively large numbers of individuals including end-users. This user interface is described in more detail below.
  • the relevant item of content is forwarded to the targeted delivery port 24, together with details of the intended recipient(s) and their preferred access channel (e.g. email or short message).
  • the targeted delivery port 24 then passes the content to the relevant targeted delivery output modules 34 and/or 36, with appropriate headers, formatting etc.
  • the targeted delivery port 24 is described in greater detail below.
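The push path described above (agents match an incoming item against user profiles, and the targeted delivery port hands it to the output module for each recipient's preferred channel) can be illustrated with a short sketch. All class and function names, and the representation of profiles as sets of category names, are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class UserProfile:
    user: str
    interests: FrozenSet[str]
    channel: str  # preferred access channel, e.g. "email" or "sms"


@dataclass(frozen=True)
class ContentItem:
    title: str
    categories: FrozenSet[str]


def match_recipients(item: ContentItem, profiles: List[UserProfile]) -> List[Tuple[str, str]]:
    """Agent step: select users whose interests overlap the item's categories."""
    return [(p.user, p.channel) for p in profiles if p.interests & item.categories]


def targeted_delivery(item: ContentItem,
                      recipients: List[Tuple[str, str]],
                      outboxes: Dict[str, List[Tuple[str, str]]]) -> None:
    """Delivery port step: queue the item on each recipient's preferred channel."""
    for user, channel in recipients:
        outboxes[channel].append((user, item.title))


profiles = [
    UserProfile("alice", frozenset({"finance"}), "email"),
    UserProfile("bob", frozenset({"sport"}), "sms"),
    UserProfile("carol", frozenset({"finance", "sport"}), "sms"),
]
outboxes = {"email": [], "sms": []}  # stand-ins for the output modules 36 and 34
item = ContentItem("Rates rise", frozenset({"finance"}))
recipients = match_recipients(item, profiles)
targeted_delivery(item, recipients, outboxes)
```

Separating matching from delivery mirrors the division of labour between the active agent environment 16 and the targeted delivery port 24.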
  • the active agent environment is preferably configured to be able to manipulate content entirely within dynamic memory (RAM), in a manner similar to existing high performance publishing systems as employed in editorial environments of newspaper publishing systems.
  • the database module 18 is preferably a high performance multimedia relational database system providing a single index of content, supporting search and retrieval by end users as well as the display of content in dynamically created web pages, WAP cards and other content delivery and access services provided by the system.
  • the database module 18 supports the provision of dynamically-updated content services such as these by means of query filters, which are similar in concept to the agents of the active agent environment 16 but which are passive in nature, rather than active.
  • Each query filter relates to a particular set of content and may be defined using the same intuitive user interface which is used to create agents in the active agent environment. Accordingly, any of the content services provided by or created using the system may call upon these query filters to define content which is presented to users or sets of users.
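A passive query filter of the kind described above might be represented as a saved query that is evaluated only when a content service asks for it, so that results always reflect the current state of the database. The sketch below assumes SQLite and hypothetical table names (`content`, `query_filter`); the specification names neither.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE content (id INTEGER PRIMARY KEY, title TEXT, subject TEXT);
CREATE TABLE query_filter (name TEXT PRIMARY KEY, sql TEXT);
""")


def save_filter(name, sql):
    """Store a named query definition; nothing is evaluated at this point."""
    db.execute("INSERT INTO query_filter (name, sql) VALUES (?, ?)", (name, sql))


def run_filter(name):
    """Evaluate the saved query against whatever content is indexed right now."""
    row = db.execute("SELECT sql FROM query_filter WHERE name = ?", (name,)).fetchone()
    return [r[0] for r in db.execute(row[0])]


save_filter("finance", "SELECT title FROM content WHERE subject = 'finance' ORDER BY id")
db.execute("INSERT INTO content (title, subject) VALUES ('Rates rise', 'finance')")
db.execute("INSERT INTO content (title, subject) VALUES ('Cup final', 'sport')")
before = run_filter("finance")
db.execute("INSERT INTO content (title, subject) VALUES ('Bond yields fall', 'finance')")
after = run_filter("finance")
```

In contrast to an agent, which acts when content arrives, the filter here does nothing until it is called. Saved SQL text would need to be restricted to trusted authors in any real deployment.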
  • the multimedia database 18 is capable of indexing (and storing, if required) almost any type of content. Any of a number of commercially available database packages may be used for this purpose, such as Oracle, preferably providing full text indexing and/or other multimedia searching functions such as image matching.
  • Items of content indexed by the database may be associated into groups, by means of links, defined at the time of loading the relevant content into the database 18 and providing an easy means for end users to navigate between related items.
  • the system preferably stores content in the form of a pointer in the database identifying content objects which are themselves stored in at least one separate faster-to-access file system 44.
  • the associated modules 20, 22, 24, 26 and 28 provide for the distribution of content through a range of channels and access devices.
  • These capabilities provide a service creation environment which enables content administrators to maintain quality content services, which in turn provides the basis for three main modes of content delivery:
      • automatic, targeted delivery of content to individuals, on the basis of personal profiles, via a range of channels/devices including email, facsimile ("fax") and mobile telephone - that is, content is selectively "pushed" to users on the basis of user profile information;
      • access to the range of content in the database 18, either by means of search queries or by browsing through appropriate sets of categories - that is, users are able to "pull" content from the system in accordance with individual requirements;
      • publication of content through a set of web pages (which may be organised and managed as a web portal) or its equivalent in other media types - again, users are able to "pull" content from the system in accordance with individual requirements.
  • the web port 20 is a powerful module providing a range of functions both to content administrators and end users. These two types of usage are substantially different and will be discussed separately. Referring to user access via the web port 20, all user access capabilities may be accessed through a standard web browser augmented by any required browser plug-ins. This makes it inherently suited to distributed organisations since it requires no additional software installation at the client end, thereby assisting in the control of desktop management costs.
  • the module may be implemented as a "light client" (e.g. using JavaScript and HTML), so that it is quick to download over networks, including the Internet.
  • a simple search facility may provide a user interface similar to standard Internet search engines, providing a powerful means of access suited to all users, which may be supplemented by applying filters. For example, the user may choose to restrict the scope of a search by very simple filters based on content type or subject, or may choose more complex filters built up from previous search terms. Custom filters of this type may be created by any user of the system. Advanced searching capabilities may be provided for more experienced users, allowing more complex and precise searches to be carried out by exploiting the full capabilities of the database system 18 (e.g. using SQL (Structured Query Language) and text management capabilities).
  • Both simple and advanced searches may return a list of matching items of content, which the user may simply click to view.
  • These results may be sorted by any of the usual display fields (date, type, title, etc.). Where a list of items spans several pages, this sorting applies to all of the pages, so bringing the highest priority items to the first page.
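The point that sorting applies across all pages, not only to the page on display, amounts to sorting the full result set before paginating it. A minimal sketch, with a hypothetical `page_of` helper and invented sample data:

```python
from typing import Dict, List


def page_of(results: List[Dict[str, str]], sort_key: str, page: int,
            page_size: int, reverse: bool = False) -> List[Dict[str, str]]:
    """Sort the whole result list first, then slice out the requested page."""
    ordered = sorted(results, key=lambda r: r[sort_key], reverse=reverse)
    start = (page - 1) * page_size
    return ordered[start:start + page_size]


results = [
    {"title": "Cup final", "date": "2000-08-01"},
    {"title": "Rates rise", "date": "2000-08-20"},
    {"title": "Bond yields fall", "date": "2000-08-10"},
]
# Newest first: the most recent items land on page 1 regardless of arrival order.
page_one = page_of(results, "date", page=1, page_size=2, reverse=True)
page_two = page_of(results, "date", page=2, page_size=2, reverse=True)
```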
  • the item's categorisation information is displayed to the user with a prompt to start the relevant application.
  • server-side viewer technology may be integrated with the system.
  • the web port 20 may also provide the user with the option of adding items of content to a virtual shopping trolley. At any time, the contents of the trolley may be previewed, removed or downloaded to a specified directory.
  • the shopping trolley may also display prices associated with items of content, providing the capability for integration with payment (including micro-payment) and billing systems.
  • Regarding the content management functions of the web port 20: depending on the requirements of the organisation employing the system, the system may be configured to allow users (typically customers and/or employees) to access at least some of the advanced functionality which is normally reserved for content administrators. These functions include the creation and management of filters and agents and the management of content.
  • Filters and agents play an important role in the operation of the system. Agents are used in the active agent environment 16 to match incoming content against user profiles, whilst filters underlie the dynamic display of content in portal pages as well as in helping users to perform searches via the web port 20.
  • Filters and agents may be created very simply by any authorised user, building on the standard search capabilities of the web port 20.
  • the user may perform a search in the normal way, but using the terms of the query/agent employed for the search, check that the results are as expected and save the search parameters together with a few additional details required to manage the agent/query.
  • Creating a filter in this way depends on defining the search parameters with sufficient precision. This may be an iterative process allowing complex filters to be based on simpler, existing filters. The present system supports this process by allowing content administrators to apply a further filter to a search, so defining a new filter. In this way, very detailed and refined filters may be created without the need to resort to highly complex systems.
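The iterative refinement described above, in which a new filter is defined by applying a further condition to an existing one, could be modelled as simple predicate composition. The `Filter` class and its `refine` method are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Item = Dict[str, str]


@dataclass(frozen=True)
class Filter:
    name: str
    predicate: Callable[[Item], bool]

    def refine(self, name: str, extra: Callable[[Item], bool]) -> "Filter":
        """Define a new filter: everything this filter accepts that also satisfies `extra`."""
        base = self.predicate
        return Filter(name, lambda item: base(item) and extra(item))

    def apply(self, items: List[Item]) -> List[Item]:
        return [item for item in items if self.predicate(item)]


items = [
    {"title": "Rates rise", "subject": "finance", "type": "text"},
    {"title": "Market briefing", "subject": "finance", "type": "video"},
    {"title": "Cup final", "subject": "sport", "type": "video"},
]
finance = Filter("finance", lambda i: i["subject"] == "finance")
finance_video = finance.refine("finance-video", lambda i: i["type"] == "video")
```

The original filter is left unchanged, so simple filters remain available as building blocks for later refinements.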
  • Content administrators may manage the set of filters and agents active in the system via the web port 20.
  • authorised users may view a list of filters and agents (with the option of sorting by name/description if necessary). From this list, the user may choose to view details of particular filters/agents (e.g. associated SQL statements) and to delete those which are no longer wanted.
  • authorised users may modify any item of content in the database 18. This capability extends to the modification of categorisation information, captions etc. However, in order to maintain basic version control, modification of actual content will not normally be permitted. Where an item of content is required to be updated or replaced, the content administrator may simply insert a new item of content into the database/file system and, optionally, delete the old content.
  • Regarding the WAP port 22: a basic function of the present system is to provide end users with access to the same set of information regardless of the access device used by a particular user at a particular time (subject of course to inherent device limitations).
  • the web port 20 supports this function for any type of device employing a standard web browser.
  • the WAP port extends access to the system to any type of WAP-based access device, including mobile handsets.
  • the interface provided by WAP port 22 mirrors that provided by the web port 20, as far as possible given the inherent limitations of WAP handsets and the like.
  • WAP and web interfaces
  • User authentication: in its simplest form, this will involve a user name and password. For a WAP interface, this may also involve a unique ID associated with a particular WAP device or SIM card.
  • Personalised start page: immediately following authentication, the user will be presented with a list of content or list of categories or the like determined by their user profile.
  • Free search: similar in concept to a standard Internet search engine, allowing the user to input search parameters in order to search the content stored by the system in accordance with their particular requirements.
  • Category search: in order to facilitate searching with no or minimal keypad input, allowing browsing through multiple levels of categories in order to locate content of interest.
  • Display of content.
  • the WAP port may be configured to return those items which are viewable through the relevant access device, rather than returning only information regarding items without the actual content.
  • Update of personal profiles: users may set or modify their profile parameters in order to determine both the content which is displayed on their personal start page and content which is pro-actively pushed to them by the system.
  • Regarding the dynamic web builder 26: the primary function of this module is to guide content administrators through the creation of portal pages and WAP cards, including those which are dynamically generated from content in the system.
  • the dynamic web builder is preferably implemented in a sufficiently light browser client for it to be used over almost any network connection, including relatively slow modem links.
  • the creation of portal pages and WAP cards may be facilitated by the use of pre-defined templates, allowing an appropriate layout to be selected and populated with static or dynamically-updated content.
  • the template design may ensure that pages and cards created using the template conform to any applicable standards; e.g. to ensure the inclusion of mandatory content such as corporate logos or copyright notices.
  • a set of templates appropriate to the needs of the specific organisation employing the system may be easily maintained by the content administrators.
  • Existing templates may be modified and new templates created using a set of simple template definitions.
  • Template definitions may be based, for example, upon XML (extensible mark-up language) so that templates may be created simply by personnel with reasonable web design skills.
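An XML-based template definition of the kind suggested above, with sections populated by static or dynamically-updated content, might look like the following sketch. The element and attribute names (`template`, `section`, `kind`, `filter`) are an invented schema for illustration; the specification does not define one.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# Hypothetical template: one static section and one section fed by a named filter.
TEMPLATE = """<template name="news-portal">
  <section id="header" kind="static">Example Corp</section>
  <section id="headlines" kind="dynamic" filter="finance"/>
</template>"""


def render(template_xml: str,
           filters: Dict[str, Callable[[], List[str]]]) -> Dict[str, List[str]]:
    """Populate each template section from static text or from a named filter."""
    root = ET.fromstring(template_xml)
    page = {}
    for section in root.findall("section"):
        if section.get("kind") == "dynamic":
            # Dynamic sections are re-evaluated on every render.
            page[section.get("id")] = filters[section.get("filter")]()
        else:
            page[section.get("id")] = [section.text or ""]
    return page


page = render(TEMPLATE, {"finance": lambda: ["Rates rise", "Bond yields fall"]})
```

Mandatory content such as a corporate logo or copyright notice could be carried as a static section in every template, which is one way a template could enforce the standards mentioned above.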
  • the preferred embodiment of the dynamic web builder 26 provides an intuitive browser based interface for content administrators to create portal pages and other content services. Once an appropriate template has been chosen, pre-defined sections of the template may be populated by simply clicking on the corresponding portion of the screen. Depending on the type of content required for that section, the content administrator may be offered a choice of pre-defined filters (as discussed in relation to the active agent environment 16) which enable the inclusion of dynamically updated content from the database 18, and/or the ability to insert static content from the database 18 (using the standard web port search facilities) or from other sources such as the administrator's personal folder on the system server or from a local file system on the administrator's work station.
  • the targeted delivery port 24 operates in conjunction with the active agent environment 16 to match new incoming content against user profiles and to deliver content to each individual's preferred access device (typically a short message to a mobile telephone, an email or a fax).
  • the user profiling functions of the system may be implemented in a number of ways, allowing for effective customisation of the user interface(s).
  • the least sophisticated approach is to provide a subscription form allowing the user simply to select subjects of interest.
  • users may manage their own user profiles.
  • the present system may also be integrated with other systems allowing user profile information to be derived from other sources, such as IVR (interactive voice response) or customer relationship management systems.
  • the content delivery channels provided by the system preferably include, at least, email (e.g. using the SMTP protocol), fax (e.g. via a fax/email gateway) and SMS (short message service) text messages to mobile telephones (e.g. implemented through direct integration to an operator's short message centre or through a SMS gateway, depending on the projected volume and local telephony infrastructure).
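Formatting a single item of content for different delivery channels might be sketched as below. The function name and the truncation policy are assumptions; the 160-character limit is the usual size of a single GSM text message, and the sketch only builds the messages without sending them, since gateway integration is deployment-specific.

```python
from email.message import EmailMessage

SMS_LIMIT = 160  # characters in a single GSM 7-bit short message


def format_for_channel(channel: str, address: str, title: str, body: str):
    """Build a channel-appropriate message; transmission is left to the gateway."""
    if channel == "email":
        msg = EmailMessage()
        msg["To"] = address
        msg["Subject"] = title
        msg.set_content(body)
        return msg
    if channel == "sms":
        text = f"{title}: {body}"
        if len(text) > SMS_LIMIT:
            # Assumed policy: truncate and mark the cut rather than split the message.
            text = text[:SMS_LIMIT - 3] + "..."
        return text
    raise ValueError(f"unsupported channel: {channel}")


email = format_for_channel("email", "alice@example.com", "Rates rise", "Full story text")
sms = format_for_channel("sms", "+440000000000", "Rates rise", "x" * 300)
```

A fax channel routed via a fax/email gateway could reuse the email branch with a gateway-specific address, though the addressing format would depend on the gateway in use.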
  • the API module 28 provides a user interface to the functionality of the web port 20 and allows integration of third-party applications which may be required to interact with the present system. This is made accessible through the associated software development kit.
  • the system is preferably configured so as to separate, as far as possible, the administration of content and users from the technical management of the system itself.
  • the user and system manager module 38 provides a specific set of tools enabling non-technical administrators to manage the user and/or content aspects of the system. All of these administration tools, described further below, may be implemented using a browser client, thus enabling content and user administration without the need for specific software installed on PCs or other work stations used by various administrators.
  • the user and system manager module 38 includes user manager functionality providing for the creation and management of user accounts.
  • the system may also be able to acquire user information from existing multi-user information systems, removing the need to create separate user accounts and avoiding the need for users to log in separately to different systems.
  • the administrator may set a number of properties, including the user's administrator status, membership of user groups, access to particular system functions, details of access devices, etc. Administrators may also temporarily suspend or remove user accounts.
  • the user and system manager module 38 also includes module management functions, providing administrators with views of the status of key system processes, including those of the content acquisition and delivery modules 10, 12, 14, 16 etc., and enables the activation and de-activation of the relevant system modules. These functions also provide access to delivery queues of the targeted delivery port 24, allowing administrators to check the recent history of messages sent to the various content delivery systems, helped by the ability to sort queue listings by user name, message status (e.g. "pending", "sent", etc.), date stamp, or other attributes.
  • the user and system manager module 38 further includes database table management functions, providing system and content administrators with the ability to modify certain tables of the database 18 in order to configure and manage the system on an on-going basis.
  • database tables which may be modified using this facility include: content types (e.g. "video", "pdf document", "text", etc.); content subjects (e.g.
  • arrows indicate the main communication channels between modules involved in acquisition and delivery of content.
  • content is input from the input modules 10, 12 and 14 to the active agent environment 16 and database 18, with news agency content going directly to the active agent environment 16 and other content going directly to the database 18. It will be understood that these specific input paths may be varied.
  • the database is in two-way communication with the file system 44 for storage and retrieval of content. Both the active agent environment 16 and database 18 are in communication with the targeted delivery port 24 to allow content to be "pushed" to users.
  • the active agent environment 16 communicates with the database 18 at least to the extent necessary to allow relevant content acquired via the active agent environment 16 to be indexed and stored in the database 18 and file system 44.
  • Both the active agent environment 16 and database 18 are in two-way communication with the web and WAP ports 20 and 22 to allow content to be "pulled" from the system by users.
  • the ports 20, 22 and 24 communicate with the various associated output servers/gateways 30 - 36.
  • Fig. 2 illustrates an alternative architecture for a system embodying the invention, which may provide similar or enhanced functionality to the embodiment of Fig. 1.
  • external content 120 is acquired by a system server 100, which communicates with a database system 102, similar to the database 18 of Fig. 1, and file system(s) 118, similar to the file system(s) 44 of Fig. 1, in order to index and store the acquired content.
  • Content may be acquired from various sources as in the embodiment of Fig. 1, again using suitable input modules which may be incorporated into the server 100 or may be separate therefrom.
  • all content distribution is performed via a host system 104 supporting a plurality of web-type (suitably HTML or WML) applications and application servers 106.
  • the application servers in turn communicate with a plurality of web servers 112.
  • the web servers 112 are connected to a dynamic router 114.
  • Internet/Web users interact with the system directly via the router 114.
  • WAP users interact with the system via a WAP gateway 116, which is connected to the system via the router 114.
  • the dynamic router 114 allows system traffic to be spread and balanced between the plurality of web servers 112 and application servers 106, thereby increasing the capacity and robustness of the system.
  • Fig. 2 arbitrarily shows six application servers 106 and web servers 112. This architecture is completely scalable to provide any required capacity and degree of robustness.
  • Targeted delivery output 122 selectively delivers content to users on the basis of user profiles via multiple output channels in a manner similar to the embodiment of Fig. 1.
  • User profiles may again be maintained in the database 102 or in a separate database.
  • the functionality of the active agent environment 16 and targeted delivery port 24 of Fig. 1 is incorporated into the database 102 and profiling server 110.
  • Browser based user and administrator interfaces similar to those of the embodiment of Fig. 1 are provided via the application servers 106, web servers 112 and router 114/WAP gateway 116.
  • all acquired content is processed through the database 102/file system 118.
  • preferred embodiments of the invention allow users' main Web/WAP access pages to be dynamically updated with information (typically hyperlinks) regarding new content on the basis of individual and/or group user profiles, again using query-based filters and agents. Active delivery may also involve transmission of a hyperlink to the relevant content rather than the content itself.
  • profile-based content delivery may be implemented by a process of checking all newly acquired content against user profiles and actioning any matches detected by active delivery and/or updating of relevant Web/WAP pages.
  • the indexing of new content may include cross referencing new and existing content and creating templates in accordance with predetermined criteria so as to generate aggregations of related content which may be incorporated into structured sets of dynamically updated XML/HTML documents or the like. Templates employed for this purpose may be structured so as to give such sets of related documents a common look and feel.
  • the categorisation (indexing) of new content is preferably at least partially automated (e.g. on the basis of subscription information related to Web/Internet content and by use of document parsers to extract key words etc. from documents), but may involve some degree of manual input.
  • Other possible content input and output channels include audio and video streaming and natural language recognition.
  • Systems embodying the invention may be implemented using known data processing and communications technologies, protocols and languages, selected to provide a suitable degree of scalability and robustness so as to support any required number of users.

Abstract

Methods and systems for the acquisition and distribution of digital information ('content') in which content is received from a number of sources (on-line news agencies, the Web/Internet, document imaging systems etc), indexed and stored using a database system, and distributed to users on the basis of user profiles against which newly acquired content is matched, via a number of different content delivery and access channels (e-mail, SMS, fax, Web and WAP browsers etc.) defined by said user profiles. Content may be actively delivered to users, stored for retrieval by users, and formatted for publication to users. Browser based user-interfaces simplify system administration and user interaction.

Description

SYSTEM AND METHOD FOR DIGITAL INFORMATION ACQUISITION AND DISTRIBUTION ACCORDING TO USER PROFILES
The present invention relates to digital data processing methods and systems and, more particularly, to methods and systems for acquiring (capturing) data from a plurality of sources via a plurality of input channels and distributing (actively delivering and/or making available) acquired data to a plurality of end users via a plurality of output channels and a variety of access devices.
There is an ever-increasing demand for organisations of various kinds to acquire large volumes of digital data (including multimedia data and referred to hereinafter as "data" or "information" or "content") from a variety of different sources and to distribute the data to end users. Existing database systems are suitable for organising and storing large volumes of data and for providing access thereto to large numbers of users. However, the acquisition and distribution of data using such systems is often inefficient. For example, users are frequently overloaded with unfiltered information, much of which is not relevant to them.
The present invention seeks to provide data acquisition and distribution systems adapted to collect and filter data from a range of sources, store that data in a structured manner, and to selectively distribute data in near-real-time to users via the users' choice of access device.
The invention is concerned with the collection and distribution of data on a wide area basis, via a variety of data/telecommunications networks, including wireless networks. References herein to specific network systems such as the Internet, the World Wide Web (or, simply "the web"), etc. and associated communications protocols and the like will be understood to include and encompass other similar or equivalent network systems and protocols.
Preferred embodiments of the invention employ a set of data processing modules adapted to:
• collect information from a range of on-line sources, e.g. news agencies, the web, business systems and other data services;
• maintain profiles of individual users and/or classes of users and match acquired data to individual users and/or classes of users on the basis of said profiles;
• deliver data to individual users in near-real-time through a range of communications channels and access devices such as email, facsimile, mobile messaging and browser-enabled mobile telephones (WAP);
• enable the creation of dynamically built web portals, enabling users to access the same data via a range of access devices such as browsers, set-top boxes (interactive television), mobile telephones etc.
These services are based on a service-creation and content administration environment which is powerful but sufficiently intuitive for use by non-specialist staff.
In accordance with a first aspect of the invention, there is provided a method of acquiring and distributing digital information comprising: defining and storing user profiles for a plurality of users, said profiles including parameters identifying categories of digital information of interest to users and users' preferred communications and access channels for receiving notification of and/or accessing newly acquired digital information; receiving digital information from a plurality of digital information sources via a plurality of digital information input channels; indexing and storing said digital information; selectively notifying users of newly acquired digital information via at least one of a plurality of digital information output channels on the basis of said user profiles and said indexing of said digital information; and selectively making said digital information available to users via at least one of a plurality of digital information access channels.
Preferably, digital information is received from sources including at least one selected from a group comprising on-line news agencies, the Internet/Web, document imaging systems, e-mail and direct user input.
Preferably also, users are notified of newly acquired digital information via information output channels including at least one selected from a group comprising e-mail, Short Message Service and fax.
Preferably also, digital information is made available via digital information channels including at least one web server and at least one WAP gateway.
Preferably, digital information distribution functions are administered via browser based user interfaces.
Preferably, parameters identifying content of interest to users are defined by means of active software agents, and/or digital information filters and/or query definitions.
Preferably, digital information is made available to users in digital documents formatted by means of pre-defined templates. The content of said documents may be defined by means of active software agents, and/or digital information filters and/or query definitions.
In preferred embodiments, digital information is distributed by means of an application host system supporting a plurality of web application servers, a plurality of web servers in communication with said application host system, and a dynamic router operatively associated with said plurality of web servers.
Preferably, said digital information is indexed using a relational multimedia database system and stored in at least one file system separate from said database system.
In accordance with a second aspect of the invention, there is provided a digital data processing system comprising at least one computer configured and programmed to implement a method in accordance with the first aspect of the invention.
Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings, in which: Fig. 1 is a block diagram illustrating a first example of a digital data acquisition and distribution system embodying the present invention; and
Fig. 2 is a block diagram illustrating a second example of a digital data acquisition and distribution system embodying the present invention.
Referring now to Fig. 1 of the drawings, a first embodiment of a digital data acquisition and distribution system in accordance with the present invention comprises a set of modular data processing components providing functionality in four main areas, namely content capture and acquisition, content storage and management, service creation and content distribution, and content and user administration.
Content capture and acquisition functions are provided by a plurality of input modules 10, 12 and 14 for the capture or acquisition of content from a range of sources. For example, in this embodiment, a news feeder module 10 is adapted to capture input from news ("wire") agency systems, typically comprising text and images; a web hunter module 12 is adapted to capture and index web pages and other Internet content; and a scan collector module 14 is adapted for the acquisition of documents from a scanner or other document imaging device, and preferably provides optical character recognition (OCR) and full-text indexing functions.
Content storage and management functions are provided by a further plurality of modules 16, 18 and 44, providing the basis for content publication, targeted delivery and user-initiated query functions. An active agent environment module 16 and multimedia relational database module 18 are adapted to receive content from the input modules 10, 12, 14. The active agent environment module 16 also hosts active software agents which monitor incoming content against user profiles. The multimedia database module 18 comprises a database system for the indexing of content. The actual content may be stored in the database 18, but it is preferred that the content is stored in at least one separate file system 44 (which may include local or remote file-systems, video servers, etc.), with corresponding database records stored in the database module 18 providing pointers to the locations where the actual content is stored. In this embodiment, user profiles are also stored in the database module 18. However, these could be stored in a separate database system.
Content distribution functions and associated administrative functions are provided by a further plurality of modules 20, 22, 24, 26 and 28, which allow non-technical administrative staff to define content services and which perform the delivery of content to end users via a range of access devices. A web port module 20 is adapted to provide full user access to the system from a web browser, via a web server 30, and includes end-user and administrator functions. A WAP (wireless application protocol) port 22 is adapted to provide user access to the system through a WAP browser-enabled access device such as a mobile telephone, via a WAP server and gateway 32. A targeted delivery port 24 is adapted to deliver content in near real-time to end users, based on individual preferences defined in the system's user profiles, via any suitable channels such as a short message centre 34 and email server 36.
A system management and service creation environment 42 supports system management and administration functions as follows. A user and system manager module 38 provides user management functions, allowing the creation and management of user accounts, including permissions and membership of user groups, module management functions for control and inspection of particular system services, and database table management functions for simple on-going configuration of specific content-related parameters. A dynamic web builder module 26 is adapted to provide functionality for quickly and easily creating a portal presence, and includes WAP capabilities. A separate WAP port management module 40 may also be included, providing tools for management and administration of the WAP port 22. The system preferably further includes an API (application programming interface) module 28, incorporating a software developer's kit (SDK), which enables simplified integration of other systems.
The four main areas of the system functionality, and their associated modules, will now be described in greater detail.
Referring firstly to content capture and acquisition, the associated modules 10, 12 and 14 are designed and configured to make a range of content from a variety of sources available to the system, independent of the media or input channel from which the content has been acquired and independent of its original format. By way of example, the main channels for the input of content to the system may include: dispatches from wire agencies, including both text and images; text and images from paper documents; web pages and associated content from the Internet or from corporate intranets and/or extranets; manual insertion of content through a web interface. Wherever possible, any available categorisation information for an incoming item of content is used by the system, such information being normalised and adapted for use within the system.
The news feeder module 10 is adapted to capture text and images from wire agencies such as ANSA, Reuters, Bloomberg etc., employing a robust design to help ensure the reliable capture of content at the very high frequencies with which wire agencies transmit dispatches. The module 10 is preferably user-configurable so as to be able to receive dispatches from any content provider employing any one of a number of wire agency protocols.
The web hunter module 12 uses known server-side technologies to capture most types of web pages and associated content and to make that content available to users of the system through any of the range of content services and access devices supported by the system. The module 12 effectively provides "screened" Internet access for users of the system and removes the need for users to actively search for information on the Internet. Instead, relevant content is captured automatically by the system, filtered and delivered automatically to users, thereby reducing the burden on users and minimising external network usage.
The system provides tools which enable content administrators to create "subscriptions" to selected web pages and/or groups of web pages quickly and simply, on the basis of which the web hunter 12 regularly polls the nominated web pages or groups of linked web pages to check for changes in content. When a change in the content of a web page is detected, that content is loaded into the system and thus becomes available to users. The content administrator may configure each web subscription to acquire the required depth of links (e.g. the target page only, or the target page plus all pages immediately linked to the target page, etc.). Subscriptions may also be configured to determine the richness of content which is to be captured, using retrieval filters to select from, for example, text, graphics, audio, video or other types of multimedia content.
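The subscription polling described above can be illustrated with a minimal Python sketch. It is not taken from the specification: the `Subscription` class, the `poll` function and the in-memory `Web` mapping (standing in for real HTTP retrieval) are all hypothetical names, and change detection by content digest is an assumed technique chosen for illustration.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real web hunter would fetch pages over the network instead.
Web = dict[str, tuple[str, list[str]]]


@dataclass
class Subscription:
    target: str   # URL of the nominated page
    depth: int = 0  # 0 = target only, 1 = target plus immediately linked pages
    seen: dict[str, str] = field(default_factory=dict)  # URL -> last digest


def pages_in_scope(web: Web, target: str, depth: int) -> list[str]:
    """Breadth-first walk from the target, following links up to `depth` levels."""
    scope, frontier = [target], [target]
    for _ in range(depth):
        next_frontier = []
        for url in frontier:
            for link in web.get(url, ("", []))[1]:
                if link not in scope and link in web:
                    scope.append(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return scope


def poll(web: Web, sub: Subscription) -> list[str]:
    """Return the URLs whose content is new or changed since the last poll."""
    changed = []
    for url in pages_in_scope(web, sub.target, sub.depth):
        if url not in web:
            continue  # page currently unavailable; try again on the next poll
        digest = hashlib.sha256(web[url][0].encode("utf-8")).hexdigest()
        if sub.seen.get(url) != digest:
            sub.seen[url] = digest
            changed.append(url)
    return changed
```

In this sketch the first poll reports every page in scope as changed, and later polls report only pages whose text differs from the stored digest. Retrieval filters for content richness are not modelled.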
The scan collector module 14 allows content to be acquired from printed matter, such as journals, newspapers or other documents, by means of a scanning (imaging) and OCR process. Any of a variety of commercially available scanning and OCR packages may be employed, preferably of a type which is capable of learning and/or being trained in specialist vocabulary specific to the organisation employing the system, so as to maximise the accuracy of the OCR process and minimise the need for manual intervention. The system may support multiple scanning/OCR workstations. Image and text files created by the process may be saved to any suitable data storage system, from where they are captured by the present system.
It will be appreciated that additional input modules could be added to the system for receiving content from other types of sources, such as e-mail servers, or direct user input.
Referring now to content storage and management, these functions are provided by the active agent environment module 16 and database module 18, which receive content from the input modules 10, 12 and 14. In this embodiment, module 16 hosts active agents which monitor incoming content against user profiles, and the database module 18 indexes content residing on file-systems, video servers etc. 44. The functions of the modules 16 and 18 are complementary. The active agent environment 16 holds content only as long as necessary for the content to be delivered ("pushed" by the system) to relevant end users, on the basis of pre-defined user profiles. The database module 18 provides a managed store of content which is of longer term interest and which may be accessed ("pulled" from the system) by users as and when necessary.
Within the framework of the active agent environment 16, a set of software agents monitor all incoming content from the input modules 10, 12, 14, matching it against user profiles and actively delivering (pushing) relevant content to users in accordance with their individual requirements. These agents may be created as required using an intuitive user interface which may be accessed via a standard web browser. Accordingly, agents may be created by a number of suitably authorised individuals, typically content administrators, but potentially by relatively large numbers of individuals including end-users. This user interface is described in more detail below. Whenever an agent is triggered by a content match, the relevant item of content is forwarded to the targeted delivery port 24, together with details of the intended recipient(s) and their preferred access channel (e.g. email or short message). The targeted delivery port 24 then passes the content to the relevant targeted delivery output modules 34 and/or 36, with appropriate headers, formatting etc. The targeted delivery port 24 is described in greater detail below.
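The agent matching and hand-off to the targeted delivery port described above might be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the patented implementation: the `Item` and `Agent` classes, the rule that an agent matches on content type plus a set of required key words, and the `dispatch` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    title: str
    content_type: str
    keywords: frozenset[str]


@dataclass(frozen=True)
class Agent:
    owner: str                # user on whose behalf the agent runs
    content_type: str         # content type the agent is interested in
    keywords: frozenset[str]  # every one of these must appear in the item
    channel: str              # owner's preferred channel, e.g. "email" or "sms"

    def matches(self, item: Item) -> bool:
        """An agent is triggered when the type matches and all key words are present."""
        return (item.content_type == self.content_type
                and self.keywords <= item.keywords)


def dispatch(item: Item, agents: list[Agent]) -> list[tuple[str, str, str]]:
    """Return (recipient, channel, title) tuples to hand to the delivery port."""
    return [(a.owner, a.channel, item.title) for a in agents if a.matches(item)]
```

Each incoming item is checked once against every agent; the resulting tuples carry the recipient and preferred channel that the delivery stage would need in order to add headers and formatting.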
The active agent environment is preferably configured to be able to manipulate content entirely within dynamic memory (RAM), in a manner similar to existing high performance publishing systems as employed in editorial environments of newspaper publishing systems.
The database module 18 is preferably a high performance multimedia relational database system providing a single index of content, supporting search and retrieval by end users as well as the display of content in dynamically created web pages, WAP cards and other content delivery and access services provided by the system.
The database module 18 supports the provision of dynamically-updated content services such as these by means of query filters, which are similar in concept to the agents of the active agent environment 16 but which are passive in nature, rather than active. Each query filter relates to a particular set of content and may be defined using the same intuitive user interface which is used to create agents in the active agent environment. Accordingly, any of the content services provided by or created using the system may call upon these query filters to define content which is presented to users or sets of users.
In preferred embodiments, the multimedia database 18 is capable of indexing (and storing, if required) almost any type of content. Any of a number of commercially available database packages may be used for this purpose, such as Oracle, preferably providing full text indexing and/or other multimedia searching functions such as image matching.
Items of content indexed by the database may be associated into groups, by means of links, defined at the time of loading the relevant content into the database 18 and providing an easy means for end users to navigate between related items. In order to provide optimal performance in conjunction with the benefits of a well designed database, the system preferably stores content in the form of a pointer in the database identifying content objects which are themselves stored in at least one separate faster-to-access file system 44.
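The pointer arrangement described above, in which the database holds an index record and the content object lives in a separate file system, can be sketched in Python. This is a minimal illustration only: SQLite stands in for the multimedia relational database, and the table layout, the `store` and `fetch` functions and the `<id>.bin` file naming are assumptions made for the example.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_index() -> sqlite3.Connection:
    """Create an in-memory index table holding metadata and a path pointer."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE content ("
        "id INTEGER PRIMARY KEY, title TEXT, content_type TEXT, path TEXT)"
    )
    return db


def store(db: sqlite3.Connection, root: Path, title: str,
          content_type: str, data: bytes) -> int:
    """Write the content object to the file system and index it by pointer."""
    cur = db.execute(
        "INSERT INTO content (title, content_type, path) VALUES (?, ?, '')",
        (title, content_type),
    )
    item_id = cur.lastrowid
    path = root / f"{item_id}.bin"  # hypothetical naming scheme
    path.write_bytes(data)
    db.execute("UPDATE content SET path = ? WHERE id = ?", (str(path), item_id))
    db.commit()
    return item_id


def fetch(db: sqlite3.Connection, item_id: int) -> bytes:
    """Resolve the pointer held in the index and read the content object."""
    row = db.execute(
        "SELECT path FROM content WHERE id = ?", (item_id,)
    ).fetchone()
    if row is None:
        raise KeyError(item_id)
    return Path(row[0]).read_bytes()
```

A caller would create a storage directory (for instance with `tempfile.mkdtemp()`), call `store` to index an item, and later call `fetch` with the returned identifier. Searches run against the small index table only, while bulky content stays outside the database.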
Referring now to service creation and content distribution, the associated modules 20, 22, 24, 26 and 28 provide for the distribution of content through a range of channels and access devices. These capabilities provide a service creation environment which enables content administrators to maintain quality content services, which in turn provides the basis for three main modes of content delivery:
• automatic, targeted delivery of content to individuals, on the basis of personal profiles, via a range of channels/devices including email, facsimile ("fax") and mobile telephone - that is, content is selectively "pushed" to users on the basis of user profile information;
• access to the range of content in the database 18, either by means of search queries or by browsing through appropriate sets of categories - that is, users are able to "pull" content from the system in accordance with individual requirements;
• publication of content through a set of web pages (which may be organised and managed as a web portal) or its equivalent in other media types - again, users are able to "pull" content from the system in accordance with individual requirements.
The individual modules 20, 22, 24, 26 and 28 will now be described in more detail.
The web port 20 is a powerful module providing a range of functions both to content administrators and end users. These two types of usage are substantially different and will be discussed separately. Referring to user access via the web port 20, all user access capabilities may be accessed through a standard web browser augmented by any required browser plug-ins. This makes it inherently suited to distributed organisations since it requires no additional software installation at the client end, thereby assisting in the control of desktop management costs. The module may be implemented as a "light client" (e.g. using JavaScript and HTML), so that it is quick to download over networks, including the Internet.
Through the web port 20, end users have access to the full range of content stored in the database 18, as well as short term items managed in the active agent environment 16.
Users may be provided with simple and advanced searching functions. A simple search facility may provide a user interface similar to standard Internet search engines, providing a powerful means of access suited to all users, which may be supplemented by applying filters. For example, the user may choose to restrict the scope of a search by very simple filters based on content type or subject, or may choose more complex filters built up from previous search terms. Custom filters of this type may be created by any user of the system. Advanced searching capabilities may be provided for more experienced users, allowing more complex and precise searches to be carried out by exploiting the full capabilities of the database system 18 (e.g. using SQL (Structured Query Language) and text management capabilities).
Both simple and advanced searches may return a list of matching items of content, which the user may simply click to view. In order to assist in navigating long lists, these may be sorted by any of the usual display fields (date, type, title, etc.). Where a list of items spans several pages, this sorting applies to all of the pages, so bringing the highest priority items to the first page.
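The point that sorting applies across all pages, rather than within each page, can be shown with a short Python sketch. The function name `page_of_results`, the dictionary-based result records and the default page size are assumptions made for illustration.

```python
from operator import itemgetter


def page_of_results(results, sort_field, page, page_size=10, descending=False):
    """Sort the complete result list first, then slice out the requested page.

    Sorting before pagination means the highest-priority items always appear
    on page 1, whichever page they would have fallen on in the unsorted list.
    """
    ordered = sorted(results, key=itemgetter(sort_field), reverse=descending)
    start = (page - 1) * page_size
    return ordered[start:start + page_size]
```

For example, calling `page_of_results(results, "date", 1, page_size=2, descending=True)` returns the two most recent items from the whole result set, not the two most recent of whichever items happened to be listed first.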
For any items of content which require a browser plug-in or other application to view, the item's categorisation information is displayed to the user with a prompt to start the relevant application. Alternatively or additionally, server-side viewer technology may be integrated with the system.
The web port 20 may also provide the user with the option of adding items of content to a virtual shopping trolley. At any time, the contents of the trolley may be previewed, removed or downloaded to a specified directory. The shopping trolley may also display prices associated with items of content, providing the capability for integration with payment (including micro-payment) and billing systems.
Referring to content management functions of the web port 20, depending on the requirements of the organisation employing the system, the system may be configured to allow users (typically customers and/or employees) to access at least some of the advanced functionality which is normally reserved for content administrators. These functions include the creation and management of filters and agents and the management of content.
Filters and agents play an important role in the operation of the system. Agents are used in the active agent environment 16 to match incoming content against user profiles, whilst filters underlie the dynamic display of content in portal pages as well as in helping users to perform searches via the web port 20.
Filters and agents may be created very simply by any authorised user, building on the standard search capabilities of the web port 20. The user may perform a search in the normal way, using the terms intended for the query/agent, check that the results are as expected, and then save the search parameters together with a few additional details required to manage the query/agent. By way of example, a filter for use by business travellers requiring travel bulletins relating to Italy may be created by using the simple search facility to select type = "travel bulletin" and key word = "Italy", executing the search and checking the results before saving the search parameters as a filter with a name and description.
Creating a filter in this way depends on defining the search parameters with sufficient precision. This may be an iterative process allowing complex filters to be based on simpler, existing filters. The present system supports this process by allowing content administrators to apply a further filter to a search, so defining a new filter. In this way, very detailed and refined filters may be created without the need to resort to highly complex systems.
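The idea of saving search parameters as a named filter, and then defining a new filter by applying a further restriction to an existing one, might be sketched in Python as follows. The `Filter` class, its `refine` method and the field names `type` and `key_words` are hypothetical; in the system described, filters would more likely correspond to stored database queries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Filter:
    name: str
    description: str
    criteria: tuple[tuple[str, str], ...]  # (field, required value) pairs

    def matches(self, item: dict) -> bool:
        """True when every criterion holds; list-valued fields match by membership."""
        for field_name, wanted in self.criteria:
            value = item.get(field_name)
            if isinstance(value, (list, tuple, set, frozenset)):
                if wanted not in value:
                    return False
            elif value != wanted:
                return False
        return True

    def refine(self, name: str, description: str, **extra: str) -> "Filter":
        """Define a new filter by adding further criteria to this one."""
        return Filter(name, description, self.criteria + tuple(extra.items()))


# The travel-bulletin example from the text, built in two steps.
bulletins = Filter("bulletins", "All travel bulletins",
                   (("type", "travel bulletin"),))
italy = bulletins.refine("italy-bulletins", "Travel bulletins for Italy",
                         key_words="Italy")
```

Because `refine` returns a new object and leaves the original untouched, a simple filter such as `bulletins` can serve as the basis for any number of more specific filters.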
Content administrators may manage the set of filters and agents active in the system via the web port 20. Using a normal browser interface, authorised users may view a list of filters and agents (with the option of sorting by name/description if necessary). From this list, the user may choose to view details of particular filters/agents (e.g. associated SQL statements) and to delete those which are no longer wanted.
Apart from the search and retrieval capabilities of the web port 20, authorised users (normally only content administrators) may modify any item of content in the database 18. This capability extends to the modification of categorisation information, captions etc. However, in order to maintain basic version control, modification of actual content will not normally be permitted. Where an item of content is required to be updated or replaced, the content administrator may simply insert a new item of content into the database/file system and, optionally, delete the old content.
Referring now to the WAP port 22, a basic function of the present system is to provide end users with access to the same set of information regardless of the access device used by a particular user at a particular time (subject of course to inherent device limitations). The web port 20 supports this function for any type of device employing a standard web browser. The WAP port extends access to the system to any type of WAP-based access device, including mobile handsets. Preferably, the interface provided by WAP port 22 mirrors that provided by the web port 20, as far as possible given the inherent limitations of WAP handsets and the like.
Particularly preferred functions of the WAP (and web) interface include:
• User authentication. In its simplest form, this will involve a user name and password. For a WAP interface, this may also involve a unique ID associated with a particular WAP device or SIM card.
• Personalised start page. Immediately following authentication, the user will be presented with a list of content or list of categories or the like determined by their user profile.
• Free search. Similar in concept to a standard Internet search engine, allowing the user to input search parameters in order to search the content stored by the system in accordance with their particular requirements.
• Category search. In order to facilitate searching with no or minimal keypad input, allowing browsing through multiple levels of categories in order to locate content of interest.
• Display of content. The WAP port may be configured to return those items which are viewable through the relevant access device, rather than returning only information regarding items without the actual content.
• Update of personal profiles. Users may set or modify their profile parameters in order to determine both the content which is displayed on their personal start page and content which is pro-actively pushed to them by the system.
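The category search listed above, which lets a handset user reach content by picking numbered menu entries instead of typing, could be sketched as below. The nested-dictionary category tree and the `browse` function are assumptions made for illustration; the specification does not prescribe a data structure.

```python
def browse(tree, choices):
    """Follow 1-based menu choices down a category tree.

    `tree` is a nested dict whose leaves are lists of content titles.
    Returns the sorted sub-category names at the level reached, or the
    list of content titles once a leaf is reached.
    """
    node = tree
    for choice in choices:
        if not isinstance(node, dict):
            raise ValueError("already at a list of content items")
        options = sorted(node)
        if not 1 <= choice <= len(options):
            raise ValueError(f"choice {choice} is not on the menu")
        node = node[options[choice - 1]]
    return sorted(node) if isinstance(node, dict) else node
```

Each key press selects one entry from the current menu, so a user can reach an item several levels deep with a handful of digits.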
Referring now to the dynamic web builder 26, the primary function of this module is to guide content administrators through the creation of portal pages and WAP cards, including those which are dynamically generated from content in the system. In order to make this functionality as widely available as possible, the dynamic web builder is preferably implemented in a sufficiently light browser client for it to be used over almost any network connection, including relatively slow modem links. The creation of portal pages and WAP cards may be facilitated by the use of pre-defined templates, allowing an appropriate layout to be selected and populated with static or dynamically-updated content. The template design may ensure that pages and cards created using the template conform to any applicable standards; e.g. to ensure the inclusion of mandatory content such as corporate logos or copyright notices.
A set of templates appropriate to the needs of the specific organisation employing the system may be easily maintained by the content administrators. Existing templates may be modified and new templates created using a set of simple template definitions. Template definitions may be based, for example, upon XML (extensible mark-up language) so that templates may be created simply by personnel with reasonable web design skills.
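By way of illustration only, an XML-based template definition of the kind contemplated above might be read as in the following sketch. The element names, attribute names and the "latest-news" filter name are hypothetical and are not taken from the described system; the sketch merely shows how a simple definition could list sections, mark some of them as mandatory (e.g. a corporate logo or copyright notice) and associate others with a pre-defined filter.

```python
import xml.etree.ElementTree as ET

# Hypothetical template definition; element and attribute names are
# illustrative assumptions, not part of the described system.
TEMPLATE_XML = """
<template name="news-portal">
  <section id="header" type="static" mandatory="true"/>
  <section id="headlines" type="dynamic" filter="latest-news"/>
  <section id="footer" type="static" mandatory="true"/>
</template>
"""


def load_template(xml_text):
    """Parse a template definition into its name and a list of section dicts."""
    root = ET.fromstring(xml_text)
    sections = []
    for node in root.findall("section"):
        sections.append({
            "id": node.get("id"),
            "type": node.get("type"),
            "filter": node.get("filter"),  # None for static sections
            "mandatory": node.get("mandatory") == "true",
        })
    return root.get("name"), sections


def missing_mandatory(sections, populated_ids):
    """Return the ids of mandatory sections that have not yet been populated."""
    return [s["id"] for s in sections
            if s["mandatory"] and s["id"] not in populated_ids]
```

A check such as `missing_mandatory` is one possible way for a template to enforce the inclusion of mandatory content before a page or card is published.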
The use of templates in this way enables new content services to be supported with minimal upgrade effort, typically only requiring the creation of a small number of new templates.
The preferred embodiment of the dynamic web builder 26 provides an intuitive browser based interface for content administrators to create portal pages and other content services. Once an appropriate template has been chosen, pre-defined sections of the template may be populated by simply clicking on the corresponding portion of the screen. Depending on the type of content required for that section, the content administrator may be offered a choice of pre-defined filters (as discussed in relation to the active agent environment 16) which enable the inclusion of dynamically updated content from the database 18, and/or the ability to insert static content from the database 18 (using the standard web port search facilities) or from other sources such as the administrator's personal folder on the system server or from a local file system on the administrator's work station.
In addition, fully featured text editors and image editors are available to the administrator to allow changes and additions to items of content which may require amendment, or for input of new text and/or graphic content. For services which employ HTML pages, an immediate preview is available at any time, enabling the content administrator to ensure that the limitations of HTML do not cause the page to be misrepresented.
Referring now to the targeted delivery port 24, this module operates in conjunction with the active agent environment 16 to match new incoming content against user profiles and to deliver content to each individual's preferred access device (typically a short message to a mobile telephone, an email or a fax).
The user profiling functions of the system may be implemented in a number of ways, allowing for effective customisation of the user interface(s). The least sophisticated approach is to provide a subscription form allowing the user simply to select subjects of interest. On this basis, users may manage their own user profiles. The present system may also be integrated with other systems allowing user profile information to be derived from other sources, such as IVR (interactive voice response) or customer relationship management systems.
The content delivery channels provided by the system preferably include, at least, email (e.g. using the SMTP protocol), fax (e.g. via a fax/email gateway) and SMS (short message service) text messages to mobile telephones (e.g. implemented through direct integration to an operator's short message centre or through a SMS gateway, depending on the projected volume and local telephony infrastructure).
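The preparation of a notification for each of these delivery channels may be sketched as follows. This is a minimal illustration under stated assumptions: the `Notification` type, the `build_notification` function and the 160-character limit applied to SMS text are choices made for the sketch, and the actual hand-off to an SMTP server, fax gateway or short message centre is deliberately omitted.

```python
from dataclasses import dataclass

# Assumed limit for a single short message in this sketch.
SMS_MAX_CHARS = 160


@dataclass
class Notification:
    channel: str   # "email", "fax" or "sms"
    address: str   # email address or telephone number
    body: str


def build_notification(channel, address, title, summary):
    """Format one item of content for the user's preferred channel."""
    if channel == "sms":
        text = f"{title}: {summary}"
        if len(text) > SMS_MAX_CHARS:
            # Truncate so the message fits in one short message.
            text = text[:SMS_MAX_CHARS - 3] + "..."
        return Notification("sms", address, text)
    if channel in ("email", "fax"):
        return Notification(channel, address, f"{title}\n\n{summary}")
    raise ValueError(f"unsupported channel: {channel}")
```

Keeping the formatting step separate from transmission would allow the same content item to be rendered differently for each channel before it is placed on a delivery queue.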
Referring now to the API module 28, this provides a user interface to the functionality of the web port 20 and allows integration of third-party applications which may be required to interact with the present system. This is made accessible through the associated software development kit.
Referring now to content and user administration, the system is preferably configured so as to separate, as far as possible, the administration of content and users from the technical management of the system itself. For this purpose, the user and system manager module 38 provides a specific set of tools enabling non-technical administrators to manage the user and/or content aspects of the system. All of these administration tools, described further below, may be implemented using a browser client, thus enabling content and user administration without the need for specific software installed on PCs or other work stations used by various administrators.
The user and system manager module 38 includes user manager functionality providing for the creation and management of user accounts. The system may also be able to acquire user information from existing multi-user information systems, removing the need to create separate user accounts and avoiding the need for users to log in separately to different systems.
When creating or editing a user account, the administrator may set a number of properties, including the user's administrator status, membership of user groups, access to particular system functions, details of access devices, etc. Administrators may also temporarily suspend or remove user accounts. The user and system manager module 38 also includes module management functions, providing administrators with views of the status of key system processes, including those of the content acquisition and delivery modules 10, 12, 14, 16 etc., and enables the activation and de-activation of the relevant system modules. These functions also provide access to delivery queues of the targeted delivery port 24, allowing administrators to check the recent history of messages sent to the various content delivery systems, helped by the ability to sort queue listings by user name, message status (e.g. "pending", "sent", etc.), date stamp, or other attributes.
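The sorting of delivery queue listings by attribute may be illustrated by the following sketch. The queue entries and their field names (`user`, `status`, `stamp`) are hypothetical examples introduced for the illustration only.

```python
from datetime import datetime
from operator import itemgetter

# Hypothetical queue entries; field names are assumptions for this sketch.
QUEUE = [
    {"user": "jones", "status": "sent", "stamp": datetime(2000, 8, 23, 9, 30)},
    {"user": "adams", "status": "pending", "stamp": datetime(2000, 8, 23, 10, 5)},
    {"user": "smith", "status": "sent", "stamp": datetime(2000, 8, 22, 17, 45)},
]


def sort_queue(entries, attribute, newest_first=False):
    """Return a new list of queue entries ordered by the given attribute."""
    return sorted(entries, key=itemgetter(attribute), reverse=newest_first)
```

For example, `sort_queue(QUEUE, "stamp", newest_first=True)` would list the most recently stamped message first, which suits a check of recent delivery history.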
The user and system manager module 38 further includes database table management functions, providing system and content administrators with the ability to modify certain tables of the database 18 in order to configure and manage the system on an on-going basis. Typically, the database tables which may be modified using this facility include: content types (e.g. "video", "pdf document", "text", etc.); content subjects (e.g. "proposal", "employee broadcast", "classified advert", "sports result" etc.); scan parameters (similar to content subject but adapted for use with scanned documents such as press clippings etc.); country codes (to define country codes for content delivery via SMS); server hardware devices (intended for advanced system administrators, for use when hardware devices such as disk drives are added or changed); and module function definitions (again intended for advanced system administrators, to define new system modules as they are implemented).
In Fig. 1, arrows indicate the main communication channels between modules involved in acquisition and delivery of content. In this example, content is input from the input modules 10, 12 and 14 to the active agent environment 16 and database 18, with news agency content going directly to the active agent environment 16 and other content going directly to the database 18. It will be understood that these specific input paths may be varied. The database is in two-way communication with the file system 44 for storage and retrieval of content. Both the active agent environment 16 and database 18 are in communication with the targeted delivery port 24 to allow content to be "pushed" to users. The active agent environment 16 communicates with the database 18 at least to the extent necessary to allow relevant content acquired via the active agent environment 16 to be indexed and stored in the database 18 and file system 44. Both the active agent environment 16 and database 18 are in two-way communication with the web and WAP ports 20 and 22 to allow content to be "pulled" from the system by users. The ports 20, 22 and 24 communicate with the various associated output servers/gateways 30 - 36. Typically, there is one-way communication between the targeted delivery port 24 and the SMS centre 34 and email server 36, and two-way communication between the web and WAP ports and the associated servers/gateways 30, 32.
It will be understood that the various modules 26, 38, 40 of the system management and service creation environment 42, and the API 28, communicate with the other system modules as required by their various functions. The numerous communication paths between these modules have been omitted for the sake of clarity.
Fig. 2 illustrates an alternative architecture for a system embodying the invention, which may provide similar or enhanced functionality to the embodiment of Fig. 1. In this embodiment, external content 120 is acquired by a system server 100, which communicates with a database system 102, similar to the database 18 of Fig. 1, and file system(s) 118, similar to the file system(s) 44 of Fig. 1, in order to index and store the acquired content. Content may be acquired from various sources as in the embodiment of Fig. 1, again using suitable input modules which may be incorporated into the server 100 or may be separate therefrom.
In this embodiment, all content distribution (user access and active delivery) is performed via a host system 104 supporting a plurality of web-type (suitably HTML or WML) applications and application servers 106. The application servers in turn communicate with a plurality of web servers 112. For the purposes of Internet/Web and WAP users, the web servers 112 are connected to a dynamic router 114. Internet/Web users interact with the system directly via the router 114. WAP users interact with the system via a WAP gateway 116, which is connected to the system via the router 114. The dynamic router 114 allows system traffic to be spread and balanced between the plurality of web servers 112 and application servers 106, thereby increasing the capacity and robustness of the system. Fig. 2 arbitrarily shows six application servers 106 and web servers 112. This architecture is completely scalable to provide any required capacity and degree of robustness.
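The description does not specify how the dynamic router 114 balances traffic. Purely as an illustrative assumption, a simple round-robin policy that skips servers marked as unavailable could be sketched as follows; the class name, method names and server labels are hypothetical.

```python
class RoundRobinRouter:
    """Spread requests across servers in turn, skipping any marked as down.

    Round-robin is an assumed policy for this sketch; the source does not
    state which balancing strategy the dynamic router uses.
    """

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._down = set()
        self._next = 0

    def mark_down(self, server):
        self._down.add(server)

    def mark_up(self, server):
        self._down.discard(server)

    def route(self):
        """Return the next available server, or raise if none is available."""
        for _ in range(len(self._servers)):
            server = self._servers[self._next]
            self._next = (self._next + 1) % len(self._servers)
            if server not in self._down:
                return server
        raise RuntimeError("no web server available")
```

Under such a policy, capacity could be increased by adding servers to the list, and robustness follows from requests continuing to be routed to the remaining servers when one is marked as down.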
For the purposes of targeted content delivery, the web servers 112 communicate with an authentication server 108 and profiling server 110, which manage user authentication/session management functions and targeted delivery functions. Targeted delivery output 122 selectively delivers content to users on the basis of user profiles via multiple output channels in a manner similar to the embodiment of Fig. 1.
User profiles may again be maintained in the database 102 or in a separate database. The functionality of the active agent environment 16 and targeted delivery port 24 of Fig. 1 is incorporated into the database 102 and profiling server 110. Browser based user and administrator interfaces similar to those of the embodiment of Fig. 1 are provided via the application servers 106, web servers 112 and router 114/WAP gateway 116. In this embodiment, all acquired content is processed through the database 102/file system 118.
Apart from active, targeted delivery of content, preferred embodiments of the invention allow users' main Web/WAP access pages to be dynamically updated with information (typically hyperlinks) regarding new content on the basis of individual and/or group user profiles, again using query-based filters and agents. Active delivery may also involve transmission of a hyperlink to the relevant content rather than the content itself.
In all embodiments, profile-based content delivery may be implemented by a process of checking all newly acquired content against user profiles and actioning any matches detected by active delivery and/or updating of relevant Web/WAP pages.
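The checking of newly acquired content against user profiles may be sketched as follows. This is a minimal illustration which assumes that each content item carries a set of category labels and that each profile records categories of interest and a preferred channel; the data layout and the function name `match_content` are hypothetical.

```python
def match_content(item_categories, profiles):
    """Return {user: preferred channel} for every profile that shares
    at least one category with the newly acquired content item."""
    item_set = set(item_categories)
    matches = {}
    for user, profile in profiles.items():
        if item_set & set(profile["categories"]):
            matches[user] = profile["channel"]
    return matches


# Hypothetical profiles used only to demonstrate the matching step.
PROFILES = {
    "adams": {"categories": {"sports result", "proposal"}, "channel": "sms"},
    "jones": {"categories": {"employee broadcast"}, "channel": "email"},
    "smith": {"categories": {"proposal"}, "channel": "fax"},
}
```

Each match returned in this way could then be actioned by active delivery over the indicated channel, by updating the user's Web/WAP page, or both.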
In all embodiments, the indexing of new content may include cross referencing new and existing content and creating templates in accordance with predetermined criteria so as to generate aggregations of related content which may be incorporated into structured sets of dynamically updated XML/HTML documents or the like. Templates employed for this purpose may be structured so as to give such sets of related documents a common look and feel. The categorisation (indexing) of new content is preferably at least partially automated (e.g. on the basis of subscription information related to Web/Internet content and by use of document parsers to extract key words etc. from documents), but may involve some degree of manual input.
Other possible content input and output channels include audio and video streaming and natural language recognition.
Systems embodying the invention may be implemented using known data processing and communications technologies, protocols and languages, selected to provide a suitable degree of scalability and robustness so as to support any required number of users.
Modifications and improvements may be incorporated without departing from the scope of the invention as defined in the Claims appended hereto.

Claims
1. A method of acquiring and distributing digital information comprising: defining and storing user profiles for a plurality of users, said profiles including parameters identifying categories of digital information of interest to users and users' preferred communications and access channels for receiving notification of and/or accessing newly acquired digital information; receiving digital information from a plurality of digital information sources via a plurality of digital information input channels; indexing and storing said digital information; selectively notifying users of newly acquired digital information via at least one of a plurality of digital information output channels on the basis of said user profiles and said indexing of said digital information; and selectively making said digital information available to users via at least one of a plurality of digital information access channels.
2. A method as claimed in Claim 1, wherein digital information is received from sources including at least one selected from a group comprising on-line news agencies, the Internet/Web, document imaging systems, e-mail and direct user input.
3. A method as claimed in Claim 1 or Claim 2, wherein users are notified of newly acquired digital information via information output channels including at least one selected from a group comprising e-mail, Short Message Service and fax.
4. A method as claimed in any preceding Claim, wherein digital information is made available via digital information channels including at least one web server and at least one WAP gateway.
5. A method as claimed in any preceding Claim, wherein digital information distribution functions are administered via browser based user interfaces.
6. A method as claimed in any preceding Claim, wherein parameters identifying content of interest to users are defined by means of active software agents.
7. A method as claimed in any preceding Claim, wherein parameters identifying content of interest to users are defined by means of digital information filters.
8. A method as claimed in any preceding Claim, wherein parameters identifying content of interest to users are defined by means of query definitions.
9. A method as claimed in any preceding Claim, wherein digital information is made available to users in digital documents formatted by means of pre-defined templates.
10. A method as claimed in Claim 9, wherein the content of said documents is defined by means of active software agents.
11. A method as claimed in Claim 9, wherein the content of said documents is defined by means of digital information filters.
12. A method as claimed in Claim 9, wherein the content of said documents is defined by means of query definitions.
13. A method as claimed in any preceding Claim, wherein digital information is distributed by means of an application host system supporting a plurality of web application servers, a plurality of web servers in communication with said application host system, and a dynamic router operatively associated with said plurality of web servers.
14. A method as claimed in any preceding Claim, wherein said digital information is indexed using a relational multimedia database system.
15. A method as claimed in Claim 14, wherein said digital information is stored in at least one file system separate from said database system.
16. A digital data processing system comprising at least one computer configured and programmed to implement a method as claimed in any one of Claims 1 to 15.
PCT/GB2000/003247 2000-08-23 2000-08-23 System and method for digital information acquisition and distribution according to user profiles WO2002017136A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
GB0303891A GB2382901B (en) 2000-08-23 2000-08-23 System and method for digital information acquisition and distribution according to user profiles
PCT/GB2000/003247 WO2002017136A1 (en) 2000-08-23 2000-08-23 System and method for digital information acquisition and distribution according to user profiles
AU2000267113A AU2000267113A1 (en) 2000-08-23 2000-08-23 System and method for digital information acquisition and distribution according to user profiles

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/GB2000/003247 WO2002017136A1 (en) 2000-08-23 2000-08-23 System and method for digital information acquisition and distribution according to user profiles

Publications (1)

Publication Number Publication Date
WO2002017136A1 true WO2002017136A1 (en) 2002-02-28

Family

ID=9885475

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2000/003247 WO2002017136A1 (en) 2000-08-23 2000-08-23 System and method for digital information acquisition and distribution according to user profiles

Country Status (3)

Country Link
AU (1) AU2000267113A1 (en)
GB (1) GB2382901B (en)
WO (1) WO2002017136A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004109576A1 (en) * 2003-06-09 2004-12-16 National University Of Singapore System and method for providing a service
WO2005059676A2 (en) * 2003-12-16 2005-06-30 Sunil Goyal A method and system for personalized request/subscription based advertising and content services
GB2411079A (en) * 2004-02-11 2005-08-17 Urban Fun Ltd Data processing and transmission
US6999972B2 (en) 2001-09-08 2006-02-14 Siemens Medical Systems Health Services Inc. System for processing objects for storage in a document or other storage system
US7003529B2 (en) 2001-09-08 2006-02-21 Siemens Medical Solutions Health Services Corporation System for adaptively identifying data for storage
EP1698986A3 (en) * 2005-02-25 2007-06-27 Microsoft Corporation Creation and composition of sets items
US8135801B2 (en) 2002-06-18 2012-03-13 Wireless Ink Corporation Method, apparatus and system for management of information content for enhanced accessibility over wireless communication networks

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1997041654A1 (en) * 1996-04-29 1997-11-06 Telefonaktiebolaget Lm Ericsson Telecommunications information dissemination system
WO1999033293A1 (en) * 1997-12-23 1999-07-01 Global Mobility Systems, Inc. System and method for controlling personal information and information delivery to and from a telecommunications device
US5931907A (en) * 1996-01-23 1999-08-03 British Telecommunications Public Limited Company Software agent for comparing locally accessible keywords with meta-information and having pointers associated with distributed information
WO2000039666A1 (en) * 1998-12-28 2000-07-06 Spyglass, Inc. Converting content of markup data for wireless devices

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ERLANDSON C ET AL: "WAP - THE WIRELESS APPLICATION PROTOCOL", ERICSSON REVIEW, ERICSSON. STOCKHOLM, SE, no. 4, 1998, pages 150 - 153, XP000792053, ISSN: 0014-0171 *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6999972B2 (en) 2001-09-08 2006-02-14 Siemens Medical Systems Health Services Inc. System for processing objects for storage in a document or other storage system
US7343385B2 (en) 2001-09-08 2008-03-11 Siemens Medical Solutions Usa, Inc. System for processing objects for storage in a document or other storage system
US7003529B2 (en) 2001-09-08 2006-02-21 Siemens Medical Solutions Health Services Corporation System for adaptively identifying data for storage
US8135801B2 (en) 2002-06-18 2012-03-13 Wireless Ink Corporation Method, apparatus and system for management of information content for enhanced accessibility over wireless communication networks
US8793336B2 (en) 2002-06-18 2014-07-29 Wireless Ink Corporation Method, apparatus and system for management of information content for enhanced accessibility over wireless communication networks
US8825801B2 (en) 2002-06-18 2014-09-02 Wireless Ink Corporation Method, apparatus and system for management of information content for enhanced accessibility over wireless communication networks
US9032039B2 (en) 2002-06-18 2015-05-12 Wireless Ink Corporation Method, apparatus and system for management of information content for enhanced accessibility over wireless communication networks
US9619578B2 (en) 2002-06-18 2017-04-11 Engagelogic Corporation Method, apparatus and system for management of information content for enhanced accessibility over wireless communication networks
US9922348B2 (en) 2002-06-18 2018-03-20 Engagelogic Corporation Method, apparatus and system for management of information content for enhanced accessibility over wireless communication networks
US10839427B2 (en) 2002-06-18 2020-11-17 Engagelogic Corporation Method, apparatus and system for management of information content for enhanced accessibility over wireless communication networks
US11526911B2 (en) 2002-06-18 2022-12-13 Mobile Data Technologies Llc Method, apparatus and system for management of information content for enhanced accessibility over wireless communication networks
WO2004109576A1 (en) * 2003-06-09 2004-12-16 National University Of Singapore System and method for providing a service
WO2005059676A3 (en) * 2003-12-16 2006-01-26 Sunil Goyal A method and system for personalized request/subscription based advertising and content services
WO2005059676A2 (en) * 2003-12-16 2005-06-30 Sunil Goyal A method and system for personalized request/subscription based advertising and content services
GB2411079A (en) * 2004-02-11 2005-08-17 Urban Fun Ltd Data processing and transmission
EP1698986A3 (en) * 2005-02-25 2007-06-27 Microsoft Corporation Creation and composition of sets items

Also Published As

Publication number Publication date
GB2382901B (en) 2004-10-06
GB2382901A (en) 2003-06-11
GB0303891D0 (en) 2003-03-26
AU2000267113A1 (en) 2002-03-04

Similar Documents

Publication Publication Date Title
KR100573037B1 (en) Content extraction server on the rss and method thereof, service system for idle screen on mobile using the same
US7406329B2 (en) Method and apparatus for subscribing and receiving personalized updates in a format customized for handheld mobile communication devices
US8296324B2 (en) Systems and methods for analyzing, integrating and updating media contact and content data
US8386513B2 (en) System and method for analyzing, integrating and updating media contact and content data
US20020080170A1 (en) Information management system
US8886704B2 (en) Method, system, and computer program product for automatically performing an operation in response to information
AU2005231112B2 (en) Methods and systems for structuring event data in a database for location and retrieval
US20020083093A1 (en) Methods and systems to link and modify data
US20040230566A1 (en) Web-based customized information retrieval and delivery method and system
US20040122912A1 (en) Method and apparatus for automatic document generation based on annotation
US20020059251A1 (en) Method for maintaining people and organization information
US20050108363A1 (en) Web page update notification method and web page update notification device
JP2009523284A (en) Search platform
US7421476B2 (en) Method for converting internet messages for publishing
GB2350758A (en) Message broker providing a publish/subscribe sevice and method of processing messages in a publish/subscribe environment
WO2002017136A1 (en) System and method for digital information acquisition and distribution according to user profiles
JP4642903B2 (en) Message conversion system and method with enhanced context recognition
KR20030004653A (en) Information support system and the method using real-time web search
Arregui et al. Yaka: Document notification and delivery across heterogeneous document repositories
US8239401B2 (en) System for sharing network accessible data sets
JP2009054166A (en) Posted data clipping system
US20110313970A1 (en) Method and device for resource management and recording medium for said method
JP2005085010A (en) Information service providing method and providing device
KR20000050072A (en) System and method of searching data using internet
JP2002007453A (en) System to release internet message through plug-in filter

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

ENP Entry into the national phase

Ref document number: 0303891

Country of ref document: GB

Kind code of ref document: A

Free format text: PCT FILING DATE = 20000823

Format of ref document f/p: F

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP