WO2009070841A1 - Social multimedia management

Info

Publication number: WO2009070841A1
Application number: PCT/AU2008/001794
Authority: WO
Inventors: Svetha Venkatesh; Stewart Ellis Smith Greenhill; Brett Adams; Dinh Phung
Original assignee: It Au0801806Rsity Of Technology
Priority date: 2007-12-05
Filing date: 2008-12-04
Publication date: 2009-06-11

Abstract

This invention concerns social multimedia management, and more specifically the organisation and sharing of multiple media among multiple users. In particular the invention is a computer operated method and system for managing social multimedia by: receiving and storing plural types of user generated media items, such as photos, audio, activity and video; receiving and storing plural streams of contextual information about user activity, such as collected media metadata, time stamps, location, user identity and calendar events; automatically organising stored media items collected from more than one user to create plural streams of media items, each arranged according to a spatial or temporal pattern extracted from the collected contextual information; and presenting the streams of media items in a graphical user interface for user navigation, search and sharing of both media items and contextual data. Also, a processor automatically organises stored media items collected from more than one user to create plural streams of media items, each arranged according to a spatial or temporal pattern extracted from the collected contextual information. A graphical user interface presents the streams of media items for user navigation, search and sharing of both media items and contextual data. In another aspect it is a GPS cell phone and an application to run in such a phone. In a final aspect it is a computer browsing application.

Description

Social Multimedia Management

This application claims priority from Australian Provisional Patent Application Nos 2007906640 and 2008902318, and the contents of the specifications of both these applications are incorporated herein by reference.

Technical Field

This invention concerns social multimedia management, and more specifically the organisation and sharing of multiple media among multiple users. In particular the invention is a computer operated method and system for managing social multimedia. In other aspects it is a GPS cell phone and an application to run in such a phone. In a final aspect it is a computer browsing application.

Background Art

Media can be collected using different devices, including video or phone cameras, and these devices store data in different formats. Captured files must be offloaded from the devices into a file repository. This has led to the development of media specific management applications, like PhotoMesa, ACDSee, ProShow, iPhoto and Picasa, which allow photo collections to be organised and navigated in different ways. Many such systems require meta-data to be added manually by the user. This is a typical example of the current browsing paradigm that deals with single media at a time for single users.

In addition to personal search and retrieval, it is becoming increasingly important to be able to share material with others. Sites like YouTube and Flickr are good at disseminating material to a wide audience, but they rely heavily on meta-data such as tags and descriptions for findability. These sites also tend to focus on just one type of media, so if a user has a variety of interests they end up with fragmented profiles and social networks on different sites.

In order to share with particular people, rather than with everyone, those people need to be on the same site. Each site only knows about one facet of our experience, so users lose the ability to find or make relationships between different types of media.

While applications exist for streaming and uploading media from phones to the web, their use of context is limited to what can be done at the time of upload, such as adding tags, or deciding who to share with.

Disclosure of the Invention

The invention is a computer operated method for managing social multimedia, comprising the steps of:

Receiving and storing plural types of user generated media items, such as photos, audio, user activity and video.

Receiving and storing plural streams of contextual information about user activity, such as collected media metadata, time stamps, location, user identity and calendar events.

Automatically organising stored media items collected from more than one user to create plural streams of media items, each arranged according to a spatial or temporal pattern extracted from the collected contextual information. And

Presenting the streams of media items in a graphical user interface for user navigation, search and sharing of both media items and contextual data.

This method integrates multiple media types and multiple users, and transcends the gap required to deliver shared multi-media repository browsing experiences, by effectively harvesting context. As a result the invention is able to deliver a rich browsing experience, and supports sharing of both media and meta-data between users. The users typically carry media collection devices, such as digital cameras and other types of recorders. The devices generally time stamp new media at the time of creation. Where these devices are equipped with one or more radio technologies, then copresence data may be harvested to create another stream of contextual information. The joint use of shared context, especially copresence, and multiple media across users enables the synchronisation of media from multiple users to give a composite perspective of places and events. For example, audio from one user can be automatically fused with photos from another user to produce an ad-hoc video.

Where the media collection devices include a radio communications capability the plural types of user generated media items may be extracted from radio transmissions to or from the mobile devices. Alternatively, or in addition the plural streams of contextual information about user activity may be extracted from radio transmissions to or from the mobile devices.

The invention makes use of devices such as cell phones, smart phones, PDAs, ultra-mobile PCs, digital cameras and MP3 players. These devices are not only able to capture traditional media such as images, audio and video, but often have sensing capabilities using radio technologies such as Bluetooth, RFID, WIFI, and satellite positioning systems such as GPS, and physical sensors such as accelerometers, thermometers, and infrared. All of these are able to capture contextual information.

The contextual information may also include media metadata harvested from media directly, for instance the EXIF meta-data of photos, or the activity time-stamps of Twitter, Facebook or Gmail where users communicate their activity or status on-line and in real time. Alternatively, or in addition metadata may also be collected indirectly, for instance from the file time for video and audio.

The information received from each mobile device may be first filtered of noise based on physical constraints such as velocity and acceleration. The media items may be arranged in time ordered streams, one stream for each user, for display at a graphical user interface.

The location of each device at the time of media recording may be derived directly from positioning information, including coarse location data based on cell tower ID or fine resolution data from satellite positioning systems, such as GPS. Location may also be interpolated by collating locations of that user within a spatial tolerance. Additional locations of a user may be interpolated from the information received from one or more copresent users.

Parameters defining locations, times and the identities of copresent users may be extracted from the contextual information.

Context analysis algorithms may include, for example, time and location clustering, or image similarity based on content features. Other sources of context could also be added, such as calendar or blog entries, and activity streams could be analysed for semantics using topic modelling. Clustering may also make use of DBSCAN [6].
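By way of illustration only, density-based location clustering in the style of DBSCAN may be sketched as follows. This is a minimal, non-incremental sketch and not the implementation used by the invention: it assumes positions already projected to local metres, and the names dbscan, eps and min_pts are hypothetical.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each (x, y) point with a cluster id (0, 1, ...) or -1 for noise.

    eps is the neighbourhood radius and min_pts the minimum number of
    points (including the point itself) needed to seed a cluster.
    """
    UNSEEN, NOISE = None, -1
    labels = [UNSEEN] * len(points)

    def neighbours(i):
        # Brute-force neighbourhood query; fine for a small illustration.
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins, does not expand
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # core point: keep expanding
    return labels
```

Because clusters grow through chains of dense neighbourhoods, they can take arbitrary shapes, and isolated positions are reported as noise rather than forced into a cluster.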

As well as simple time and location clustering of media items, it is possible to cluster by:

Events involving place, users and time.

Simple places involving location and user. And

Social Context.

For Social Context the extracted parameters may be used to generate one or more social spheres for each user, where each social sphere is defined in terms of a location, a period of time and one or more copresent users.

A social sphere of a user may be generated from the extracted parameters using clustering. When clustering is performed in real time, a social sphere may be updated incrementally whenever new information is received using an incremental DBSCAN algorithm.

The method may further comprise the step of assigning labels for the social spheres generated for a user, based on one or more of the following: time of day the user is in the social sphere, job profile of the user and label assigned to neighbouring social spheres. Labels may also be set by the user.

One or more social rhythms may be generated for each user by clustering the social spheres generated for that user, along one or more dimensions of location, time and copresent user. Clustering of social spheres may be performed using a multidimensional DBSCAN (M-DBSCAN) algorithm.

One or more types of social rhythms may be generated, as follows:

A frequency-based social rhythm of a user may be generated by clustering one or more social spheres generated for that user along the dimensions of the number of visits to the social spheres and start time of the last visit.

A timed rhythm, which is associated with scheduled activities, may be generated by clustering one or more social spheres generated for that user along the dimensions of location, start time of a visit to the location and optionally the duration of the visit.

A relational rhythm of a user may be generated by clustering one or more social spheres generated for that user along the dimension of a copresent user in the social spheres. Relational rhythms are associated with the interpersonal relationship between pairs of users.

An optional rhythm, which is associated with unscheduled activities, may be generated by clustering one or more social spheres generated for that user along any dimensions, except the time dimension.

The method may further comprise the step of determining a measure of social tie between a pair of users, from overlapping social spheres or social rhythms, or both, of that pair of users. A measure of social tie between a first and second user may be based on the frequency of the second user appearing in the one or more social spheres generated for the first user. The measure may be weighted by a weight factor assigned to each of the social spheres.
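A frequency-based, weighted measure of this kind may, for example, be sketched as follows. The record layout (a dictionary with "label" and "people" keys), the function name social_tie and the normalisation by total weight are assumptions made for illustration; the description does not prescribe them.

```python
def social_tie(spheres, other, weights=None):
    """Weighted fraction of a user's social spheres in which `other` appears.

    spheres: list of dicts, each with a "label" (e.g. "Home") and a
             "people" set of copresent user identifiers.
    weights: optional mapping from sphere label to weight factor;
             spheres without an entry default to a weight of 1.0.
    """
    weights = weights or {}
    total = sum(weights.get(s["label"], 1.0) for s in spheres)
    if total == 0:
        return 0.0
    shared = sum(weights.get(s["label"], 1.0)
                 for s in spheres if other in s["people"])
    return shared / total
```

With equal weights the measure reduces to the plain frequency of the second user in the first user's spheres; raising the weight of, say, "Home" makes copresence there count for more.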

Another measure of social tie between a first and second user may be based on joint entropy along any one of the dimensions of location, time and copresent user of the one or more social spheres generated for each of the users.
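As a sketch only, joint entropy along a single dimension may be computed from paired observations of the two users. It is assumed here, purely for illustration, that the social spheres of both users have been sampled over the same time slots so that each slot yields one value per user (for example a location label); the description does not specify this pairing, and the name joint_entropy is hypothetical.

```python
import math
from collections import Counter

def joint_entropy(xs, ys):
    """Shannon joint entropy H(X, Y) in bits of two paired sequences.

    xs[i] and ys[i] are the values observed for the first and second
    user in the same (assumed) time slot.
    """
    pairs = Counter(zip(xs, ys))
    n = sum(pairs.values())
    return -sum((c / n) * math.log2(c / n) for c in pairs.values())
```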

For the purposes of display in a browser, a stream of media items, or thumbnails, may first be ordered chronologically in a "time strip" with a header showing the date for each new day. For visual media (images, video) thumbnails may be extracted from the media file. For text objects, the text may be rendered in-line. Other objects may be shown using icons.
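The chronological ordering with per-day headers may be sketched as follows. This is an illustrative outline only; the item layout (a timestamp paired with a name) and the function name time_strip are assumptions, and the sample items are made up.

```python
from datetime import datetime
from itertools import groupby

def time_strip(items):
    """Order (timestamp, name) items chronologically and group them by day.

    Returns a list of (date, [names]) pairs; each date acts as the
    header shown at the start of a new day in the strip.
    """
    ordered = sorted(items, key=lambda item: item[0])
    return [(day, [name for _, name in group])
            for day, group in groupby(ordered, key=lambda item: item[0].date())]

# Made-up sample items from more than one media type.
items = [
    (datetime(2008, 12, 5, 9, 30), "photo_b.jpg"),
    (datetime(2008, 12, 4, 18, 0), "clip_a.mp4"),
    (datetime(2008, 12, 4, 8, 15), "photo_a.jpg"),
]
strip = time_strip(items)
```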

A time-line may represent a global timebase across the entire media repository. When we move the cursor in the time-line, the browser dynamically creates JMF players for media items, synchronising their time-bases as required. The media view includes embedded players for stream-based media, and rendered images for other media like photos. By synchronising all media, we get the impression of a "compound stream", composed of contributions of multiple users.

A number of filter components may be presented on the graphical user interface to provide different ways of filtering the media thumbnails for presentation. This includes media type, creator, time and place. The results of multiple filters are applied to the time-strip display, and to media displayed, for contextual navigation.

The invention enables searches of the media via spatial and temporal filters, which may be used together with a chronological time-strip view for the traditional "top-down" navigation of media collections. Alternatively, the invention may provide contextual navigation of multiple users' media by using co-presence, spatial clustering and temporal clustering over time and location across the separate media streams. This enables an alternative "bottom-up" navigation of media collections.

Contextual links may allow navigation from one displayed media item to another based on time, place, and co-present users. Thumbnails, or some other token, may be used in place of media items for navigation purposes. In any case, selection of the item or token may return one or more hyperlinks to other media items, or tokens, related in time, space or by another copresent user.

By use of meta-data tuples, such as:

{media, place, time, activities, friends},

rich queries can be made by selecting subsets of available meta-data. Examples include:

1. What did my friends do on the weekend? (User, Time → Place, Media, Activity)

2. What do people do in place X? (Place → Media, Activity)

3. What does this place sound like? (Place → Media, Place)

4. Where do people do activity A? (Activity → Place)

Other rich queries can be made by selecting different subsets of available meta-data.
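Queries of this form, in which some fields of the tuple are given and others are returned, may be sketched as a filter followed by a projection. The dictionary keys and the function name query below are hypothetical and serve only to illustrate the idea.

```python
def query(records, select, **given):
    """Return the `select` fields of every record matching the `given` fields.

    records: list of dicts with keys such as "user", "time", "place",
             "media" and "activity" (assumed field names).
    select:  tuple of field names to return, e.g. ("media", "activity").
    given:   field=value constraints, e.g. place="Cafe".
    """
    matches = [r for r in records
               if all(r.get(key) == value for key, value in given.items())]
    return [tuple(r.get(name) for name in select) for r in matches]
```

For instance, "What do people do in place X?" corresponds to query(records, ("media", "activity"), place="X"), and "Where do people do activity A?" to query(records, ("place",), activity="A").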

Context analysis algorithms could include, for example, joint space-time clustering, or image similarity based on content features. Other sources of context could also be added, such as calendar or blog entries, and activity streams could be analysed for semantics using topic modelling.

In a second aspect the invention is a computer system for managing social multimedia, comprising:

An input port to receive and store plural types of user generated media items, such as photos, audio, activity and video. An input port to receive and store plural streams of contextual information about user activity, such as collected media metadata, time stamps, location, user identity and calendar events.

A processor to automatically organise stored media items collected from more than one user to create plural streams of media items, each arranged according to a spatial or temporal pattern extracted from the collected contextual information. And,

A graphical user interface to present the streams of media items for user navigation, search and sharing of both media items and contextual data.

The processor may be located on a personal computer or server where media repositories are stored separate from the source media files. For each user, the repository may include a media index, thumbnail database (rendered or extracted from original files), a location log, a log of bluetooth contacts, and a temporal and spatial cluster index.

A browser application may run on the personal computer or server to present a "time-strip" component on a graphical user interface that summarises all existing media items using small low-resolution thumbnail images. The media items may be ordered chronologically, with a header showing the date for each new day.

In a third aspect, the invention is a GPS cell-phone equipped with one or more media collection devices, and an application for collecting and storing plural streams of contextual information about user activity, such as media metadata, time stamps, location, user identity, calendar events and co-presence data. The application may periodically monitor the cell-phone's file system to detect new media files, and when it discovers a new media item, it may activate the GPS to sample position, and initiate a bluetooth enquiry. The application also collects posts from the phone related to current activity.

The invention is able to represent and structure heterogeneous signals from various devices embedded in daily life, and inherently noisy sensors, to support tele-mediated relationships. The invention takes advantage of multi-user settings to propagate persistent GPS trace data and copresence among the mobile users, and thus is able to provide robustness to the heterogeneity of device types and signal qualities.

Brief Description of the Drawings

An example of the invention will now be described with reference to the accompanying drawings, in which:

Fig. 1 is a block diagram of a system exemplifying the invention.

Fig. 2 is a flowchart showing an overview of processing flow.

Fig. 3 is a flowchart showing the flow of social context analysis.

Figs. 4(a) and (b) are graphs of single user entropy plotted against dimension (a) by duration; and (b) by count.

Figs. 5(a), (b), (c) and (d) are plots that illustrate social tie by proximity among the users (a) by duration, equal weights; (b) by count, equal weights; (c) by duration, weighted spheres; and (d) by count, weighted spheres.

Figs. 6(a), (b), (c) and (d) are plots that illustrate joint entropy of users by (a) location; (b) time; (c) duration by counts; and (d) people.

Figs. 7(a), (b) and (c) are plots of ranked timed rhythms for User X, and presence of Users Y and Z in those rhythms.

Fig. 8 is an image of the browser interface layout, showing a spatial filter, time-strip, media view and time-line.

Fig. 9(a) and (b) are diagrams showing examples of two filter interfaces; and Fig. 9(c) is a block diagram showing the propagation of a selection through the filters to the viewer.

Fig. 10 is an image of a synchronised media display, with contributions from three users.

Fig. 11 is two collections of images illustrating two modes of the time-strip: (a) sequential and (b) event display.

Fig. 12 is a series of images of context within a display, showing the media producer and other people present. Also a filmstrip at the bottom shows related images.

Figs. 13(a) and (b) are diagrams that illustrate another application of the invention for blog navigation via social rhythms.

Best Modes of the Invention

Referring first to Fig. 1, the system 100 comprises plural mobile devices 110 in communication with a server 120 and a database 130 over a mobile communications network. All the mobile devices are equipped with GPS and Bluetooth capabilities, and have embedded image, audio or video capture capabilities. The standard applications include a camera which captures still images up to 1600x1200 pixels. It records MP4 video for up to 1 hour at a time at 320x240. The voice recorder supports 16-bit 8 kHz recordings with durations up to 1 hour.

The mobile devices 110 are programmed with a background task to log context items including GPS location, time, user identity data and calendar events. GPS location, Wifi and Bluetooth signals are logged, for example, using software such as Placelab. Bluetooth contacts are used to indicate user co-presence.

The user's activity can be logged via applications like "Twitter".

The logging applications are scheduled by a watchdog process that also logs useful information such as device usage, available memory, and battery charge. The watchdog process schedules upload of logs and media to the server 120, for example, via a client-side executable and server-side PHP. Uploading is performed automatically whenever a connection to the repository is possible, using wireless connection with a base station 150 or wired connection with an upload station 160.

Uploading occurs when, for example, the device is charging or a wireless network is detected. Uploaded items are routinely imported into the database 130 and all subsequent processing accesses data and stores results via the API of the database. For example, the database 130 may be an SQLite database and the API the SQLite API. The data collected 200 (see Fig. 2) is generally large-scale, multimodal, noisy and missing in parts, such that it requires further processing by server 120.

The main limitation on logging is power consumption so a number of power saving features are employed: Rather than sampling GPS continuously, we passively record position every ten minutes, switching the GPS off between samples. The system also passively scans for bluetooth devices every two minutes; these scans are synchronised with the GPS samples. The combination of location and co-presence is called the users' physical context. As well as passive sampling the system also actively samples physical context when events such as media creation are detected.

Users are able to capture media using the watchdog application but to save power they capture media via the inbuilt applications. In this case the watchdog application periodically monitors the file system to detect new media files. When it discovers a new media item, it activates the GPS to sample position, and initiates a Bluetooth enquiry.

The result of the logging process is that the user's position is sampled sparsely, but the locations of media items are still known with reasonable precision.

Preliminary Data Analysis

Referring now to Fig. 2 the multimodal data 200 is pre-processed 210 to separate it into media items 220 and contextual items 230. In addition, an analysis application "Geode" 245 extracts activity information, and this is treated as media items (time-stamped text descriptions).

Organisation of Media Items

Media items 220 are imported into a media repository 240 by nominating a folder for each user to contain media files. The system recursively scans folders, extracting basic data from the files. For images it gathers the size, creation time, and a thumbnail, which are normally available in EXIF data. For video and audio files there are no standards for embedding meta-data. Geode 245 computes start time and duration from the file modification time and media stream duration. If necessary, time corrections can be applied to the file times or EXIF times for particular media folders. This allows us to handle errors in time setting, or offsets due to factors like timezone or daylight savings. At the end of this process, Geode knows the temporal extent of every media item.
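The derivation of a temporal extent for video and audio files may be sketched as follows. The sketch assumes that the file modification time marks the end of the recording, so that the start time is the modification time less the stream duration; this reading, the per-folder correction parameter and the name temporal_extent are assumptions for illustration.

```python
from datetime import datetime, timedelta

def temporal_extent(modified, duration_s, correction_s=0.0):
    """Return the (start, end) datetimes of a stream-based media item.

    modified:     file modification time, assumed to be the end of recording.
    duration_s:   media stream duration in seconds.
    correction_s: per-folder time correction in seconds, e.g. 3600 to
                  compensate for a clock left on the wrong timezone.
    """
    end = modified + timedelta(seconds=correction_s)
    start = end - timedelta(seconds=duration_s)
    return start, end
```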

Implementation

Geode is implemented in Java and uses the Java Media Framework (JMF) for media access and display. JMF is a flexible multimedia platform, but only supports a narrow range of media formats. A small set of formats is implemented in Java, but it relies on "performance packs" to use native codecs, giving poor portability across platforms. We use the FOBS plugins (fobs.sf.net) which address this problem by implementing a JMF interface to the ffmpeg platform (ffmpeg.mplayerhq.hu), supporting a diverse range of codecs, formats and media transport across many platforms (Windows, Macintosh, Unix).

Media repositories are stored separate from the source media files. For each user, the repository includes a media index, thumbnail database (rendered or extracted from original files), a location log, a log of bluetooth contacts, and a temporal and spatial cluster index. These are stored using a mixture of plain-text and binary files, but could easily be stored in a relational database if required. The system works well for moderately sized repositories (eg. around 15,000 objects: 20Gb photos, and 50Gb video).

The map viewer retrieves maps from tile servers over HTTP. Maps are available from many providers including NASA, Yahoo, Microsoft and Google. Geode maintains its own persistent local cache of map tiles, so once the system has been primed the network load for map browsing is very low. The cache also allows the map browser to be used off-line.

Time Clustering

The media items are arranged in time ordered streams 250, one stream for each user, for display at a graphical user interface 260.

Location Clustering

The media items 220 (photo, video, audio, activity) are compared to the GPS stream within the contextual data to geolocate each item. The GPS traces are filtered based on physical constraints. Successive points are subject to velocity and acceleration reality checks and removed if they violate them. Raw GPS traces tend to be sparse, due to a combination of device, environment and user characteristics.
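The velocity reality check may be sketched as follows. The sketch is illustrative only: the fix layout (time in seconds, latitude, longitude), the 70 m/s speed limit and the function names are assumptions, since no threshold values are given here, and the acceleration check is omitted for brevity.

```python
import math

EARTH_RADIUS_M = 6_371_000.0

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def filter_by_velocity(fixes, max_speed_mps=70.0):
    """Drop fixes implying an implausible speed from the last kept fix.

    fixes: time-sorted list of (t_seconds, lat, lon) tuples.
    max_speed_mps: hypothetical plausibility limit in metres per second.
    """
    kept = []
    for t, lat, lon in fixes:
        if kept:
            t0, lat0, lon0 = kept[-1]
            dt = t - t0
            if dt <= 0:
                continue  # duplicate or out-of-order timestamp
            if haversine_m(lat0, lon0, lat, lon) / dt > max_speed_mps:
                continue  # violates the velocity check
        kept.append((t, lat, lon))
    return kept
```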

A two-pass approach is used to interpolate or attenuate the amount of missing data. The first pass seeks to estimate when the user is stationary at a position, and interpolates GPS fixes at that position. This is done with a simple spatio-temporal threshold test. If a trace disappears and reappears within a time threshold, and within a distance threshold of the last position, the user is inferred to have remained at the last seen position for the duration. The second pass becomes possible in a multi-user context: sharing GPS (and other) information by virtue of copresence.
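The first pass may be sketched as follows, as an illustration under stated assumptions: positions are taken to be already projected to local metres, and the sampling step and both thresholds are hypothetical values chosen for the example rather than values taken from the experiment.

```python
import math

def interpolate_stationary(fixes, step_s=600, max_gap_s=4 * 3600, max_dist_m=100.0):
    """Fill gaps in a trace where the user appears to have stayed put.

    fixes: time-sorted list of (t_seconds, x_m, y_m) tuples.
    If the trace reappears within max_gap_s and within max_dist_m of the
    last seen position, synthetic fixes at that last position are inserted
    every step_s seconds for the duration of the gap.
    """
    out = []
    for prev, cur in zip(fixes, fixes[1:]):
        out.append(prev)
        gap = cur[0] - prev[0]
        dist = math.hypot(cur[1] - prev[1], cur[2] - prev[2])
        if step_s < gap <= max_gap_s and dist <= max_dist_m:
            t = prev[0] + step_s
            while t < cur[0]:
                out.append((t, prev[1], prev[2]))  # user inferred stationary
                t += step_s
    if fixes:
        out.append(fixes[-1])
    return out
```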

Geode 245 also extracts position and co-presence information, and where possible, location and co-presence is propagated between users. This helps deal with the vagaries of wireless reception, which sometimes result in GPS and bluetooth visibility being different for users at the same location. It also allows location to be available to users without GPS (eg. if they are with a user with a GPS phone).

We first infer copresence of users directly from either absolute collocation via (interpolated) GPS, within a spatial tolerance; or Bluetooth device discovery, within a temporal tolerance. Copresence is then propagated among users up to a given number of hops. The worst case algorithm complexity is O(NU²), where N is the number of GPS or Bluetooth samples, and U is the number of users in the database.

In an online setting, N can be reduced arbitrarily, for example, when logs are processed a day at a time. If copresence that falls outside the time range of the currently considered sample is culled from consideration when propagating copresence, complexity becomes O(UV), where V is the number of others copresent with the user on average. In the experiment, we processed 10 million samples (GPS fixes with variable duration plus Bluetooth device discoveries) with 8 users at 10 hops in under a minute on desktop hardware without culling. Hops, and spatial and temporal thresholds are proportionally related to resolution, that is higher thresholds allow users further away to be considered copresent. In the experiment, spatial threshold is set to 150m and temporal window is set to 1 minute.
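Propagation of copresence up to a given number of hops may be sketched as a bounded breadth-first traversal. The sketch considers a single time window, takes the directly inferred copresence as a symmetric mapping, and uses the hypothetical name propagate_copresence; it is not the implementation used in the experiment.

```python
from collections import deque

def propagate_copresence(direct, max_hops):
    """Extend direct copresence to users reachable within max_hops.

    direct: dict mapping each user to the set of users directly copresent
            with them in the current time window (assumed symmetric).
    Returns a dict mapping each user to everyone reachable within
    max_hops links, excluding the user themselves.
    """
    result = {}
    for start in direct:
        seen = {start}
        frontier = deque([(start, 0)])
        while frontier:
            user, hops = frontier.popleft()
            if hops == max_hops:
                continue  # do not expand beyond the hop limit
            for other in direct.get(user, ()):
                if other not in seen:
                    seen.add(other)
                    frontier.append((other, hops + 1))
        seen.discard(start)
        result[start] = seen
    return result
```

With direct links A-B, B-C and C-D, one hop gives A only B, whereas two hops also make C copresent with A.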

Copresence may be viewed as a junction through which other information can pass. We pass not only copresence (generating a fully spanned copresence network subject to the parameters mentioned above), but also location information if a user has missing data. If more than one donor is available, the user with the longest GPS range donates their information to the user. Other policies that, say, preferred higher signal quality to duration could be used. Fig. 3(b) depicts an example of GPS data shared through copresence. The total signal comprises filtered and interpolated signal 315 and copresence signal 320 derived from data collected by the user or shared by other users. Single user interpolation is then performed again on the augmented data.

For each user, Geode 245 builds an index to the media repository 240. Geode also attempts to derive additional layers of meta-data by clustering over time and location across the basic media streams. This information could be shared between distributed Geode instances via a network, but for this example we consider only the possibilities of this scenario with data hosted on a single machine.

Event Clustering

The system uses clustering to attempt to identify events (using place, users and time).

Events are deduced by clustering over the media creation times. Often photos appear in bursts, as the user captures different aspects of a scene over time. The system implements hierarchical event clustering based on the method of [8]. Starting with an initial time difference of 4 hours the system derives an initial set of clusters which is then further refined by searching for outliers. The system computes the time differences between all photos within a cluster then finds the boundary points Q1 and Q3 for the first and third quartiles of the distribution. An outlier is any difference greater than Q3 + 2.5 × (Q3 − Q1), where 2.5 is an empirically selected value. Outliers are used hierarchically to partition clusters into sub-clusters.
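One level of this refinement may be sketched as follows. The sketch assumes that the time differences are those between consecutive items, that the outlier threshold is the third quartile plus 2.5 times the interquartile range, and that clusters with fewer than four gaps are left unsplit; these choices and the function names are assumptions made for illustration.

```python
from statistics import quantiles

def split_on_gaps(times, threshold):
    """Split a non-empty sorted list of times wherever a gap exceeds threshold."""
    clusters = [[times[0]]]
    for prev, cur in zip(times, times[1:]):
        if cur - prev > threshold:
            clusters.append([])
        clusters[-1].append(cur)
    return clusters

def event_clusters(times, initial_gap_s=4 * 3600, k=2.5):
    """Return a list of events, each a list of sub-clusters of creation times.

    Events are first separated by gaps longer than initial_gap_s; each
    event is then split at outlier gaps, i.e. gaps greater than
    Q3 + k * (Q3 - Q1) of that event's consecutive time differences.
    """
    times = sorted(times)
    if not times:
        return []
    events = []
    for cluster in split_on_gaps(times, initial_gap_s):
        gaps = [b - a for a, b in zip(cluster, cluster[1:])]
        if len(gaps) < 4:
            events.append([cluster])  # too few gaps to estimate quartiles
            continue
        q1, _, q3 = quantiles(gaps, n=4)
        events.append(split_on_gaps(cluster, q3 + k * (q3 - q1)))
    return events
```

Applying the same refinement recursively to each sub-cluster would yield deeper levels of the hierarchy.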

Events are used by the system to simplify summaries of user activity. A cluster of images can be replaced with a single image which can optionally be expanded to show the original set. Events are also used as a "coarse" scale at which to navigate a user's media stream. Lastly, events are used to find objects related to a particular media item.

Simple Places Clustering

The system uses clustering to attempt to identify simple places (using GPS and user).

Simple places are regions where users spend time or perform activities. The system uses them in two ways. Firstly, if a media item can be associated with a place, it can easily be linked to other objects at that place. In the browser, this corresponds to the operation "Find Other Objects Here". In this sense it is important that the place is a symbolic object rather than an arbitrary positioning of an area on a map, because we want this operation to be free of user interaction.

Secondly, places are used to simplify the display of objects on maps. If we attempt to display all objects at their location the display quickly becomes cluttered, and objects become obscured by other objects at the same location. Instead, we display a marker indicating the number of objects at the place. The user can "mouse over" the marker to see a representative set of these objects. This clustering can be done hierarchically with increasing granularity so that low-level clusters merge as the user zooms out on the map display.

Geode implements spatial clustering using DBSCAN [6], a density-based clustering algorithm. Advantages of DBSCAN are that it (a) discovers clusters of arbitrary shape, (b) requires minimal assumptions about the data, and (c) works efficiently on large data sets. DBSCAN requires the specification of two parameters: ε, the threshold for neighbourhood reachability, and D, the minimum number of points per cluster. The system derives places by clustering over the location of media collected by all users. We found that ε = 100 m works well for our data, but note that this assumes "urban-scale" activity which might not be appropriate in all scenarios (eg. world travel). These situations would require either a hierarchical approach (clustering at multiple scales of ε), or use of variable-density [H]. Events and Places offer a wealth of browsing possibilities.

Social Context

In addition to simple clustering the following social context can be extracted:

(i) Social spheres are defined by a location, a period of time and one or more copresent users. Clustering of GPS traces is performed to identify social spheres, and these are labelled according to their locations. Social spheres characterise where users go and are labelled geo-spatial locations of significance to a user.

Location is a useful piece of knowledge. From a media management point of view, it is a useful index of recall. The knowledge of location also allows users to be mapped loosely to role(s). For example, a user may be a father at home but a consumer at the shops. This may then be used to provide proactive device behaviour, as well as forming the basis of relational metrics such as social tie strength.

(ii) Social rhythms are generated by clustering a user's social spheres along one or more of the dimensions of location, time and copresent user. Social rhythms are latent pursuits of daily life that characterise what users do.

(iii) Social ties are determined based on the frequency of a user appearing in one or more social spheres of another user. The tie strength of a user is measured based on time spent with a person, weighted by the nature of the spheres of the interaction. The nature of ties is also characterised based on the type of social rhythms shared. Social ties are the relationships a user has with others, and they capture who users know.

Social Spheres Extraction (see Fig. 3, 320)

To extract social spheres, the system performs clustering of GPS traces and labels discovered spheres, for instance, as 'Home', 'Work' or 'Other' using heuristics. In displaying the spheres subtle differences in cluster shapes are preserved in a compact, efficiently updatable manner over time by using convex hulls. For example, adjacent buildings can coincide with markedly different roles or activities, such as an office and a cafe, and preservation of these distinctions provides a better foundation for subsequent analyses than representations with lower degrees of freedom, such as bounding boxes or centroid-radius pairs.

Clustering GPS Traces to Discover Places and Stays

A Place is an approximation of a real-world area visited by a user, as evidenced by his or her GPS traces. The invention may represent the area characterizing a place as a convex hull of latitude-longitude pairs. Simple indices of a user's particular idiosyncratic use of a place are also stored, including the total time spent there (totalduration), the time of their last stay (lastseen), and the number of stays (numstays). Hence, for a user a:

Place(a) = {cvhull, totalduration, lastseen, numstays}

A stay is a period of time spent at a discovered place, and is characterized by the start time and duration of the visit, together with the set of people copresent at any time during the stay. Stays are the unit of clustering to discover social rhythms. For a user a:

stay(a) = {place, time, duration, people}
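The two records above can be sketched as plain data classes. This is only an illustrative sketch: the field names follow the text, but the Python types, the `record_stay` helper, and the choice to store the start time of the most recent stay in `lastseen` are assumptions, not details given in the specification.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import FrozenSet, List, Optional, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


@dataclass
class Place:
    cvhull: List[LatLon]                      # convex hull vertices of the place
    totalduration: timedelta = timedelta(0)   # total time spent at the place
    lastseen: Optional[datetime] = None       # assumed: start time of latest stay
    numstays: int = 0                         # number of stays recorded so far

    def record_stay(self, stay: "Stay") -> None:
        """Hypothetical helper: fold one stay into the per-place indices."""
        self.totalduration += stay.duration
        if self.lastseen is None or stay.time > self.lastseen:
            self.lastseen = stay.time
        self.numstays += 1


@dataclass
class Stay:
    place: Place                              # the discovered place
    time: datetime                            # start time of the visit
    duration: timedelta                       # length of the visit
    people: FrozenSet[str] = frozenset()      # users copresent at any time
```

Keeping the indices on the place record means they can be updated as each stay ends, which suits the online setting described in the text.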

Places are extracted online using IncDBSCAN, which is an incremental version of DBSCAN. DBSCAN has the threefold advantage of being deterministic, tractable for large datasets, and able to discover clusters of arbitrary shape. Convex hulls are also computed online, incrementally in O(log m) for each insertion, where m is the number of points in the cluster during the update. These algorithms are suitable for resource-constrained mobile devices and for the streaming nature of radio sensor data.
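For orientation, the following is a minimal sketch of the batch (non-incremental) DBSCAN procedure on which IncDBSCAN is based. It is not the incremental variant used by the system. The function names, the brute-force neighbourhood query, and the assumption that GPS fixes have already been projected to planar coordinates in metres are all illustrative choices.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # planar coordinates in metres (assumed projection)
NOISE = -1


def region_query(points: Sequence[Point], idx: int, eps: float) -> List[int]:
    """Indices of all points within eps of points[idx] (brute force)."""
    px, py = points[idx]
    return [j for j, (qx, qy) in enumerate(points)
            if math.hypot(px - qx, py - qy) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Return one cluster label per point; NOISE marks unclustered points."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1                   # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = list(neighbours)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster    # border point absorbed by the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)   # j is also a core point
    return [NOISE if lab is None else lab for lab in labels]
```

A production system would replace the brute-force `region_query` with a spatial index so that each neighbourhood query is logarithmic rather than linear in the number of points.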

Assume that all points prior to the arrival of a new point have been clustered. The essential step in IncDBSCAN is the insertion of a new point p. When this occurs, the set of core points after the arrival of p is affected and needs to be refined. Convex hull descriptions of clusters are maintained and updated incrementally using techniques from computational geometry whenever a point is added to a cluster (Absorption and Merge), or computed from scratch (Creation). Informally, the algorithm divides an existing convex hull into a sequence of 'pie pieces' arranged anti-clockwise by maintaining the list of points sorted by order of angles. When a new point p arrives, a search is performed to find the corresponding slice that p belongs to and the convex hull is updated accordingly.

Two speed thresholds are applied separately to GPS traces to obtain places at two resolutions: the first filters out all points that are not approximately stationary, while the second filters out points above slow walking speed. The second pass was added after initial experiments found areas connected by travel on foot to be a useful index of recall. Use of coarser resolution will also depend on the particular application. To determine if a user is staying at a place, points are tested for inclusion within the smaller convex hulls obtained with the first threshold, and then the latter. Table I presents results for place discovery.

[Table I. Columns: User, Detected, Unknown, Home (hours), Work (hours). The row values are illegible in the source text. Table note: User 10 is a stay-at-home mum; Home and Work are correctly detected as the same sphere.]

Table I: Social sphere discovery results.

Labelling Places

At this point we have extracted static representations of places visited by a user, together with some attributes derived from the history of the user's appearance at those places. We now desire to label those places with socially semantic categories. That is, beyond anonymous coordinates on a map, what significance does a place hold in the web of a user's social life? This information will provide the necessary building blocks for more complex inference about social context. We choose to label places from the set Home, Work or Other, as it is a widely applicable trichotomy and offers potential for inferring about the nature of activities occurring at a place or the relationships carried on there. We use a simple heuristic, which is by no means the final word on this classification task, but despite this, it demonstrates surprising effectiveness.

The algorithm attempts to discover the place corresponding to Home or Work by applying a time filter, τ(x_t), to GPS fixes, x_t, to obtain only those in the assumed appropriate time ranges, and returns the cluster with maximal duration. To detect Home we specify that τ_home(x_t) is true if x_t is collected before 7am or after 7pm. Similarly, τ_work(x_t) is set to 8am-11am and 1pm-4pm on a weekday. An example of a labelled sphere is shown in Fig. 5(b).
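The heuristic can be sketched as follows. The time windows are those stated above. The function names, the representation of fixes as (timestamp, cluster id) pairs, and the approximation of duration by counting fixes at an assumed fixed sampling period are illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Callable, Hashable, Iterable, Optional, Tuple


def tau_home(t: datetime) -> bool:
    """True if the fix was collected before 7am or after 7pm."""
    return t.hour < 7 or t.hour >= 19


def tau_work(t: datetime) -> bool:
    """True if the fix falls in 8am-11am or 1pm-4pm on a weekday."""
    is_weekday = t.weekday() < 5
    return is_weekday and (8 <= t.hour < 11 or 13 <= t.hour < 16)


def label_place(
    fixes: Iterable[Tuple[datetime, Optional[Hashable]]],
    tau: Callable[[datetime], bool],
    sample_period: timedelta = timedelta(minutes=1),  # assumed sampling rate
) -> Optional[Hashable]:
    """Return the cluster with maximal duration among fixes passing tau."""
    duration = defaultdict(timedelta)
    for t, cluster_id in fixes:
        if cluster_id is not None and tau(t):
            duration[cluster_id] += sample_period
    return max(duration, key=duration.get) if duration else None
```

Calling `label_place(fixes, tau_home)` and `label_place(fixes, tau_work)` yields the candidate Home and Work clusters; any remaining place would be labelled Other.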

A more sophisticated classifier might handle varying user profiles, such as having many part-time jobs, a night-time job, or even no job at all, by using fundamental assumptions about the need for sleep or socialization, and try alternate hypotheses about temporal patterns. Urban zoning information, such as can be obtained from GIS services, might provide a valuable input. A complementary approach could represent places as nodes in a graph with edges as travel routes between places. Home is often a 'decision point,' and would appear as a hub in such a graph, and might be detected by typical graph measures, such as centrality.

Social Sphere Discovery Results and Observations

All algorithms in this paper are implemented in C# or C++; an R*-tree data structure is used for the database (a region query is thus in O(log N)), and a binary tree for the convex hull (searching is thus in O(log m)).

Referring to Table I, the users with the highest Unknown counts are Users 2 and 6. User 2's false positives come from unfiltered noise (6) and car parks (4). The spheres derived from noise consist of (anomalous) convex hulls of only 1, 2 or 3 vertices. The car parks achieve sufficient cluster duration in cases where the device was turned off and left in the car, for convenience, and GPS was subsequently interpolated for the entire period the car was parked.

User 6 went on holidays for 2 weeks during data collection to an unfamiliar city. As a tourist, his movements consisted of many fragmented stays at places not always easy to recall. We note spheres can be refined based on total duration spent there. Significance of a sphere is user- and application-dependent, but unknown places tend to cluster at the tail and can be pruned with thresholds corresponding to differing confidence levels. All homes and workplaces were discovered with the labelling algorithm, barring only the home of User 8, which is due to the small amount of data collected for him, exacerbated by a daily routine of starting late and working late.

Social Rhythms Extraction (330)

Latent pursuits of daily life give rise to repeated occurrences along the dimensions of people, place and time, and we collectively call this complex set of projections social rhythms. We hypothesise that instances of specific, real-world activities entail inherent constraints that will leave footprints in the dimensions of people, place and time. Rhythms characterized by repeated start time or duration often indicate an inherent constraint imposed by an institutional timetable or felt expectation.

Repeated duration can also indicate structure inherent to an activity. For example, a basketball game might be timetabled at various start times, but is always forty minutes long. Rhythms formed by similarity of people present arise from activities constrained by who must attend, such as the choir at both practice and performance; rhythms comprised of repeated place can indicate the presence of a resource (animal, mineral, vegetable or even intangible) that is a necessary component of an activity tied to a location, and consequently draws the user to that location, such as the requirement of a pool for swimming lessons.

Social rhythms do not define an activity to the degree of high-level textual label, such as is the case with work aimed at (usually supervised) classification of highly specific situations. Rather, to continue with the footprint metaphor, a paw print could come from either the family dog or a hungry wolf. It is up to the user to close the semantic gap in their own unique context and name it a "chess club event" or "skydivers outing."

Again, we use the DBSCAN algorithm for our rhythm clustering task. DBSCAN can handle multi-dimensional data in any metric space as long as the distance function is a metric. We term this adapted algorithm multi-dimensional DBSCAN (M-DBSCAN). Note that a function μ() is a metric if it is positive definite, symmetric and satisfies the triangle inequality. A stay is the fundamental unit in the clustering process. A stay, represented as a vector, geometrically spans a multi-dimensional space where each axis corresponds to a contributing element.

The rhythm extraction problem is thus refined as a problem of clustering stays detected for each user by folding in certain dimensions. If n is the number of dimensions, there are $\binom{n}{1} + \binom{n}{2} + \dots + \binom{n}{n} = 2^n - 1$ different ways to perform clustering, which grows exponentially with n. Even for a small n of 4 or 5, this number is already 15 or 31, which could be undesirably large in this setting. Table II presents a subset of all possible configurations of more readily interpretable rhythms in terms of the constraints.
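The count of clustering configurations can be checked by enumerating every non-empty subset of dimensions. The dimension names below are assumed for illustration only.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def clustering_configurations(dimensions: Sequence[str]) -> List[Tuple[str, ...]]:
    """All non-empty subsets of dimensions that stays could be clustered on."""
    return [
        subset
        for k in range(1, len(dimensions) + 1)
        for subset in combinations(dimensions, k)
    ]


# Illustrative dimension names (assumed, not taken from the specification).
dims4 = ("place", "start_time", "duration", "people")
dims5 = dims4 + ("day_of_week",)
configs4 = clustering_configurations(dims4)
configs5 = clustering_configurations(dims5)
```

Here `len(configs4)` is 15 and `len(configs5)` is 31, matching the figures quoted in the text.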

[Table II contents are illegible in the source text.]

Table II: Rhythm detection as a combinatorial clustering process by folding in certain dimensions.

Where and when are, arguably, two of the most important aspects of a person's life, and we cast this collective footprint as rhythms bound by place and time.

We first examine rhythms as a function of the number of stays, numstays, at a place and the last time it was visited, normalized by the period of data collection up to the time the set of places is updated, and term them rare and frequent. This functional relationship is a natural one. For example, going to "a Pink Floyd live show at Langley Park" yesterday is not a rare event today, but it may be in a month's time. Recall that for a user a, the set of discovered places Ω(a) is actually changing over time as new places are discovered and existing ones are refined. This classification is subjective and controllable by two parameters, R_n and R_d, which are thresholds on the number of stays and the time last seen at that location.

Examples of rare rhythms 410 are a visit to a knee surgeon (30) and attending a conference (33). Examples of frequent rhythms 425 include basketball games on Monday nights (12) and visits to GP2s at random times (9).

[Table III. Columns: User; # Places; # Stays; then count and recall columns under each of Rare, Frequent, Timed and Optional. The row values are illegible in the source text.]

Table III. Social rhythms extraction results.

Quantitative results for rare and frequent rhythms using R_n = 3 and R_d = 180 days are presented in Table III. Groundtruth from each user indicates activities and their rough frequency such as "often, sometimes, once or twice". Recall rates for rare and frequent rhythms are reported using groundtruth. As expected, the number of rare rhythms is much higher than the number of frequent ones (an average of 31.6 and 8.9 respectively). Recall rates for frequent rhythms are also higher than those for rare rhythms. This is expected because frequent rhythms tend to be easier to recall than rare ones, which can be more sensitive to users' 'memory noise.' We also note that since the data collection spans a long period of time, manual self-report, and even semi-automated experience sampling tools, are untenable, and hence calculation of precision is not possible.

From a social perspective, punctuality plays an important role in shaping one's life patterns: arriving at work on time, dropping your child at childcare, etc. Far from being rare, this type of social pattern, termed timed rhythms, is usually bounded by institutional timetables and schedules such as work or training sessions. For example, rhythms in Table II are clustered on time. Timed rhythms are thus detected by clustering on the set of stays constrained by place, day of the week and approximate start time, and when coupled with duration this gives us a stricter form of timed rhythm. Optional rhythms, on the other hand, are flexible and elastic sequences resulting from pursuits during free time, and may or may not relate to institutional schedules and timetables. This type of social rhythm is somewhat spontaneous and may or may not have fixed duration. For example, rhythms in Table II are not clustered using time.

Using M-DBSCAN, clustering for timed and optional rhythms is controlled by three parameters: π_min_stays is the minimum number of stays in a cluster to claim it as a rhythm; and π_time_relax and π_dur_relax are respectively the variation in starting time and duration allowed for two stays to be directly density reachable. Unless otherwise stated, the following parameters are used: π_min_stays = 3, π_time_relax = 15 minutes and π_dur_relax = 15 minutes.
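One way to read the relaxation parameters is as a neighbourhood test between two stays, which a DBSCAN-style procedure would then use in place of a Euclidean radius. The sketch below is an assumption about how such a test could look for timed rhythms; the record type, field names and the `use_duration` switch (for the stricter form that also constrains duration) are hypothetical, and wrap-around at midnight is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StayRecord:
    place: str          # identifier of the discovered place
    weekday: int        # 0 = Monday ... 6 = Sunday
    start_min: int      # start time in minutes since midnight
    duration_min: int   # duration in minutes


def is_neighbour(
    a: StayRecord,
    b: StayRecord,
    time_relax: int = 15,       # allowed variation in start time (minutes)
    dur_relax: int = 15,        # allowed variation in duration (minutes)
    use_duration: bool = False  # True gives the stricter timed rhythm
) -> bool:
    """True if two stays are close enough to be directly density reachable."""
    if a.place != b.place or a.weekday != b.weekday:
        return False
    if abs(a.start_min - b.start_min) > time_relax:
        return False
    if use_duration and abs(a.duration_min - b.duration_min) > dur_relax:
        return False
    return True
```

A cluster of stays linked by this test would be reported as a rhythm once it contains at least the minimum number of stays (3 by default in the text).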

Relational Rhythms: People-based Clustering

We now add the dimension of people to reflect that social life is interactive in nature: hanging out with relatives, friends, working with colleagues. Specifically, we consider the set of people co-located in stays. This type of social rhythm constrained by whom the user interacts with is termed relational rhythms (for examples, see rhythms in Table II clustered using people). Relational rhythms are clustered using similar settings to timed and optional rhythms but with added constraints on co-located people. Table IV presents a matrix of results detected for these rhythms where clustering on stays is constrained by place, any day of the week, approximate starting time (π_time_relax = 15 minutes) and copresence with a given user. Each cell (i,j) reports the number of relational rhythms shared by users i and j, where i, j ∈ {2, 4, 5, 6, 7, 8, 9, 10}.

Table IV. Relational rhythms result matrix: each cell (i,j) contains the total number of shared rhythms from user i's perspective, and a description for a small sample.

It can be noted that Table IV is asymmetric. Recall that both places, and hence stays, are user-specific. Hence rhythms built atop these are also user-specific. For example, User 2 has three specific places connected with the university, whereas User 10 has two. Thus User 2's stays at the university may be more fragmented than User 10's, which will affect clustering performed on those stays.

Relational rhythms form a natural basis for inferring something about the nature of interpersonal relationships. For example, two actors may have shared relational rhythms based on Home, Grandparents (GP2), Parks (ParkLake), or Work. As shown in Table IV, User 2 shares timed rhythms with both User 10 and User 5, but those with User 5 occur exclusively at Work, whereas those with User 10 are more spread. Users {2, 5, 6, 7, 8, 9} are colleagues with a high proportion of rhythms occurring at work, with the exception of the shared anomaly, Heathcote, a social gathering.

Lifestyle Measures Based on Social Rhythms

Social rhythms provide a basis for formulating measures of the kinds of activities a person engages in. For example, questions that may be answered using the extracted social rhythms are "is a person run by the clock?" or "do the people tend to choose activities for the nature of the activity, or the people who can be involved?" Below we present results of calculating entropy on each dimension for a user. The resulting vector characterizes a user's predictability of each dimension on his mix of activities.

Figs. 4(a) and (b) plot entropy on each dimension for all users, where cluster entropy is calculated by total duration in each cluster, and number of stays in each cluster, respectively. These, and all subsequent, calculations are performed on a five month subset of the data where copresence is detected using Bluetooth and GPS. Referring to Fig. 4(a), it is interesting to observe that User 10 has the lowest entropy on location (455), being a stay-at-home mum, while User 9 has the highest (450), having a working week split between home and the university campus. Observations regarding the people dimension must be tentative given the small number of users, but we note that User 10 has the highest entropy on people by duration (460), stemming from the fairly even split of time alone and with User 2. Users 2 and 7 are involved in a number of projects at work, and therefore see more people, and the lowest, User 6, doesn't collaborate with any of the other users in the study (465). User 8 has by far the smallest amount of data, and his probabilities, and hence entropies, are severely skewed. This might be attenuated by interpolating over periods of missing data with synthetic data.

Entropy by count is higher for greater diversity in the location of stays, regardless of the amount of time spent there. Entropy by duration takes time spent into account. Thus, if a user visits many locations, but spends 95% of time at one of those locations, entropy by duration will be low, but by count will be high. By count sensitizes the measure to the degree the person 'gets around'; by duration sensitizes it to the diversity of activities (presumably different for different locations) the user performs. In addition to a static vector calculated from the totality of a user's data, the vector, or its elements, can be plotted in time. For example, User 2 has edges in both location and time entropy calculated week by week that coincide with a holiday period.
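The contrast between entropy by duration and entropy by count can be illustrated with a small worked example. The sketch below assumes Shannon entropy with a base-2 logarithm (the specification does not state the base), and the numbers are invented purely for illustration.

```python
import math
from typing import Sequence


def entropy(weights: Sequence[float]) -> float:
    """Shannon entropy (bits) of the distribution obtained by normalising weights."""
    total = sum(weights)
    if total == 0:
        return 0.0
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probs)


# Six location clusters: the user spends 95% of the time at the first one,
# but makes the same number of stays at each (illustrative figures only).
hours_per_cluster = [95.0, 1.0, 1.0, 1.0, 1.0, 1.0]
stays_per_cluster = [5, 5, 5, 5, 5, 5]

entropy_by_duration = entropy(hours_per_cluster)
entropy_by_count = entropy(stays_per_cluster)
```

With these figures, entropy by duration is well below one bit, while entropy by count reaches the maximum of log2(6) bits for six clusters, mirroring the behaviour described in the text.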

Social Ties Extraction (340)

Social context, in addition to place, and activity, includes relationships termed social ties. A tie may be characterized by an ordered pair of actors, to borrow a term from social network theory, its nature (such as familial, friends and work-related), the strength of the bond, and shared social spheres.

We focus on two aspects. The first is an approximation of tie strength, formulated as a function of proximity, which falls easily out of the collected data. Simply put, how much time do you spend with someone? This time may be further weighted, for example, by ascribing more or less significance to the social spheres it occurs within.

The second aspect is the more subjective notion of closeness, and we present some initial investigative measures aimed at providing a foundation for its inference. For example, closeness may be categorised as close and somewhat close. Close people were those with whom respondents "discuss important matters with" or "regularly keep in touch with", whereas somewhat close people were more than just casual acquaintances, but not "very close".

Tie Strength from Proximity

We require an estimate of the user's interaction with others, and there are many ways this can be estimated, such as detection of presence through audio, co-located GPS, active RFID and so on. We focus on physical proximity, while noting that virtual proximity or interaction via information communication technologies (ICTs), such as Instant Messaging, email or even web documents, is an important input to social tie strength. Regardless of the technique used to assess copresence, tie strength may be formulated with respect to user i as follows. Let user i be observed over a set of S sampled periods, and let l_i(s) be the social sphere of this user at sample s. Then let p_j(s) denote the Boolean presence of another actor j in sample s, 1 denoting present and 0 not present. To account for the relative importance of spheres when users interact, we introduce ω_l, a weight expressing the relative significance of sphere l. For example, depending on the application, home might be more socially significant than the dry cleaners. Social tie strength T between actors i and j is defined as:

$$T(i,j) = \frac{1}{N_{i,S}} \sum_{s=1}^{S} p_j(s)\, \omega_{l_i(s)}$$

where N_{i,S} is a normalizing constant for actor i over the sample set. T = 1 represents a strong tie, that is 'actor j is always with i' for the sample set, and T = 0 represents no tie, that is 'j is never seen with i'. It can be noted that T is not commutative, by virtue of the weights, reflecting that the strength of a bond from one person's point of view isn't necessarily shared by the other.
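A minimal sketch of the tie strength calculation follows. The text does not spell out the normalizing constant; the sketch assumes it is the sum of the sphere weights over all of user i's samples, which yields T = 1 when j is always present and T = 0 when j is never present. The function name and the input representation are illustrative.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def tie_strength(
    samples: Iterable[Tuple[str, Set[str]]],
    j: str,
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Tie strength of user i towards actor j.

    samples: one (sphere, users_present) pair per sampled period of user i.
    weights: sphere -> significance; unit weights are used when omitted.
    """
    weighted_presence = 0.0
    normaliser = 0.0  # assumed: sum of sphere weights over all samples
    for sphere, present in samples:
        w = 1.0 if weights is None else weights.get(sphere, 1.0)
        normaliser += w
        if j in present:
            weighted_presence += w
    return weighted_presence / normaliser if normaliser else 0.0


# Five sampled periods for a hypothetical user i.
samples_i = [
    ("Home", {"10"}), ("Home", {"10"}),
    ("Work", {"5"}), ("Work", {"5"}), ("Work", set()),
]
sphere_weights = {"Home": 3.0, "Other": 2.0, "Work": 1.0}
```

With unit weights, both actors score 0.4 in this example; with Home weighted above Work, the tie to actor "10" rises to 6/9 while the tie to actor "5" drops to 2/9.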

Relationships carried on at familiar places imbue those places with a derivative significance, and those places in turn may imbue continuing or new relationships carried on there with significance reciprocally. One option for generating sphere weights is to use a media-favoured approach: the significance of a sphere is proportional to how much media is captured there. Let l_m be 1 if media item m was created at sphere l, and 0 otherwise, and let M be the total number of media items captured in sample set S at all spheres. Then, ω_l is defined as:

$$\omega_l = \frac{1}{M} \sum_{m=1}^{M} l_m$$
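The media-favoured weighting can be sketched as the fraction of all captured media items that were created at each sphere. The function name and the input representation (one sphere label per media item) are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def media_sphere_weights(media_spheres: Iterable[str]) -> Dict[str, float]:
    """Weight of each sphere: share of all media items captured there.

    media_spheres: the sphere label at which each media item was created.
    """
    counts = Counter(media_spheres)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {sphere: n / total for sphere, n in counts.items()}


# Four media items: two at Home, one at Work, one at a park (illustrative).
example_weights = media_sphere_weights(["Home", "Home", "Work", "Park"])
```

In this example Home receives a weight of 0.5 and the other two spheres 0.25 each, so the weights sum to one across spheres.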

Social ties used in the media browser Socio-Graph are calculated this way. Other possibilities for calculating sphere significance include sphere type, such as Home or Work, cumulative time spent there, or even the kind of social rhythms that occur there. For example, spheres that host optional rhythms might be weighted higher. Weighting schemes will depend most on the immediate application of the tie strength approximation, but whatever the flavour of ω_l, the assumption is that time spent together is a coarse indicator of significance of the relationship, and social spheres factor this.

Figs. 5(a) to (d) each present matrices of social ties, T(i,j), mapped on a log scale to a colour gradient, where each row is calculated from a single user's logs, which gives rise to the observed asymmetry. Black indicates lowest strength (never proximate), white highest (always proximate). Social ties by proximity among the users are illustrated (a) by duration, with equal weights; (b) by count, with equal weights; (c) by duration, with weighted spheres; and (d) by count, with weighted spheres.

Figs. 5(a) and (b) are for T(i,j) calculated with unit sphere weights. By duration, User 2 has an approximately equal tie with User 5, a close collaborator at work, and User 10, his spouse; see 505 and 510 in Fig. 5(a). By count, the tie with User 5 is stronger than that with User 10; see 515 and 520 in Fig. 5(b).

Figs. 5(c) and (d) are for T(i,j) calculated with weights ascribing a higher significance to Home (3) and Other (2), over Work (1). Accordingly, User 2's ties with Users 10 and 5 become about equal by count; see 535 and 540 in Fig. 5(d). The tie is significantly stronger with User 10 by duration; see 525 and 530 in Fig. 5(c).

T(i,j) is affected by the distribution of logging times, as exemplified by User 8. The user has consistently higher tie strengths with work colleagues, a product of his small and work-skewed data set (day time is best represented due to the short battery life on the type of device carried).

Indicators of Closeness

Tie strength, as formulated above, does not make use of the approximation of activities supplied by social rhythms. Intuitively, the kinds of activities two people share ought to provide a valuable insight into the closeness of their relationship. That is, proximity is useful, but the reason why the users are proximate might be even more useful.

(i) Symmetric: Joint Entropy.

A natural extension of the single-user measure based on social rhythms is joint entropy on dimensions. This is a symmetric measure that treats the copresence of x and y as a new actor. Given users x and y, joint entropy is:

$$H(x, y) = -\sum_{x,y} p_{x,y} \log p_{x,y}$$

where p_{x,y} is calculated by duration of stays in each cluster. Note that H(x,y) is not strictly symmetric. That is, H(x,y) doesn't necessarily equal H(y,x), as discrepancies arise from the likely difference in social sphere definition for each user. Recall that social spheres are discovered per user, based on their GPS data. Stays are recreated from scratch, such that x and y are strictly copresent; that is, if y leaves, the stay is terminated at that point. The earlier definition of stays only required that another user be present at any point during a stay to be registered as present.

Figs. 6(a), (b), (c) and (d) illustrate the joint entropy matrix H(x, y) of users by (a) place; (b) start time; (c) duration by counts; and (d) people. The lower the entropy between two users, the darker the colour of the box associated with the joint entropy, indicating that dimension plays a greater role in bringing two people together. For example, User 2 and User 10 have low entropy on location but high on time of day; see 550 and 555 in Figs. 6(a) and (b). They see each other at various times, but home is often the shared location. User 2 and User 5 collaborate according to a timetable, indicated by the shade of cell (2, 5).

(ii) Asymmetric: Presence of User Y with respect to User X's Time Rhythms.

Ego-centred networks are often studied in social science, and pose questions like: how does actor x perceive his relationship with actor y? In our terminology, we could ask: how does actor y fit into an actor x's social rhythms? The time and location dimensions of our experiment are the strongest compared to duration (which is noisy) and people (which is sparse), so we formulate the following asymmetric measure by way of demonstration. Timed rhythms for user x are generated with π_min_stays = 1 to include lone stays. The resulting rhythms are partially ranked according to the number of stays (numstays) in each cluster. The largest cluster is ranked 1, and those rhythms with equal stays are ranked equally. For example, clusters of a single stay are all pooled with equal, lowest rank. Given this reference ranking of social rhythms in time for user x, we now calculate a measure of how user y appears in them. If user y is present in any of the stays in a cluster at a given rank, it is interpreted as user y sharing that rank of timed social rhythm.
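The partial ranking and the derivation of shared ranks can be sketched as follows. The sketch assumes "dense" ranking, in which tied clusters share a rank and the next distinct size takes the next integer; the text does not say how ranks after a tie are numbered. The function names and the representation of a cluster as a list of per-stay sets of copresent users are hypothetical.

```python
from typing import Dict, Hashable, List, Set

Clusters = Dict[Hashable, List[Set[str]]]  # cluster id -> one set of users per stay


def rank_rhythms(clusters: Clusters) -> Dict[Hashable, int]:
    """Rank clusters by number of stays; largest is rank 1, ties share a rank."""
    sizes = sorted({len(stays) for stays in clusters.values()}, reverse=True)
    rank_of_size = {size: rank for rank, size in enumerate(sizes, start=1)}
    return {cid: rank_of_size[len(stays)] for cid, stays in clusters.items()}


def shared_ranks(clusters: Clusters, y: str) -> List[int]:
    """Ranks of user x's rhythm clusters in which user y appears at least once."""
    ranks = rank_rhythms(clusters)
    return sorted(
        ranks[cid]
        for cid, stays in clusters.items()
        if any(y in people for people in stays)
    )


# Timed rhythm clusters for a hypothetical user x.
clusters_x: Clusters = {
    "a": [{"y"}, set(), set()],
    "b": [set(), set()],
    "c": [{"y"}, set()],
    "d": [set()],
    "e": [{"y"}],
}
```

For this example the ranking is a:1, b:2, c:2, d:3, e:3, and user y shares ranks 1, 2 and 3.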

From the list of shared ranks we generate a probability density function (pdf), p_{x←y}(rank), where the weight of each rank corresponds to the proportion of rhythm clusters user y appears in. We then calculate the median rank, med(p_{x←y}), which provides the average type of user x's timed rhythms in which user y appears, and rank entropy, H(p_{x←y}), which characterizes how scattered across all of x's rhythms y's presence is. For example, for a user y who only sees user x in activities strongly tied to a timetable, median rank med(p_{x←y}) will be closer to 1, whereas for a relationship whose context is impromptu from x's point of view, med(p_{x←y}) will be closer to user x's maximum rank. Different application contexts might call for different emphases, and correspondingly, different choices for each of these components of the measure, such as a weighted ordinal mean that takes the number of rhythms at a given rank into account, or even avoids ranking altogether.

Figs. 7(a) to (c) plot ranked timed rhythms for users, together with the presence of other users in those rhythms. Median rank and rank entropy are med(p_{2←5}) = 4 and H(p_{2←5}) = 0.94 respectively. Referring to Fig. 7(a), this indicates that User 5 is toward the timed end (see 570) of User 2's 13 rhythm ranks (see 572) with even spread. Conversely, med(p_{5←2}) = … and H(p_{5←2}) = 0.86, which puts User 2 slightly towards the untimed end (580) of User 5's 9 rhythm ranks (see 585) from his perspective in Fig. 7(b). In regards to User 2's relationship with User 10, med(p_{2←10}) = 7 and H(p_{2←10}) = 0.76. Fig. 7(a) shows User 10's relationship to be both more untimed and less scattered (see 574) from User 2's perspective. Conversely, as indicated in Fig. 7(c), med(p_{10←2}) = 1 and H(p_{10←2}) = 1.0, which may be due to User 10's home rhythms being her most timed (see 590), which is not mirrored as strongly by User 2. That is, the variation of User 2's time of arrival at home causes fragmentation of rhythms occurring at home, pushing them to higher ranks.

Measures like these might provide a cheap alternative or supplement to expensive and time-consuming data collection techniques, such as surveys and interviews.

The extracted social context may then be used for various applications via an Application Interface 170 in Fig. 1; step 250 in Fig. 2.

Time, Event and Simple Place Browsing

A browser fulfils several roles: It allows a user to browse and retrieve their own media via meaningful concepts like time, events and simple places. Events and simple places offer a wealth of browsing possibilities. As a result the browser can provide a visual diary that relates media and activities. It allows exploration of other users' media. It discovers and highlights relationships with other users. Lastly, it allows users to explore events by browsing multiple media streams synchronously.

Fig. 8 shows an example of the user interface, which consists of several interlinked displays. The display is divided into regions using resizable dividers. The right-hand side is for query and selection and includes filters 640 and the "time-strip" 610. The left-hand side is the player, including the media viewer 620 and time-line 630.

The Browser works like this:

1. To search for particular events, users can specify constraints via filters on time, place, people, and media.

2. The filters select a set of media items which are presented in the time-strip. This is a chronologically ordered sequence of thumbnail images, intermixed with activity descriptions. We can choose to see all media, or see clusters of media corresponding to events. Items can be selected from the time-strip for display in the media viewer.

3. The media viewer renders multiple media items, each in its own frame of a tiled display. Each frame provides navigation to contextually related media, and can be used to explore interrelated items.

4. The time-line allows multiple media streams to be played back synchronously. This means that an event can be observed from multiple perspectives if there were multiple observers present.

The "time-strip" component 610 summarises all existing media items using small low-resolution thumbnail images. Media are ordered chronologically, with a header showing the date for each new day. For visual media (images, video) the thumbnail is extracted from the media file. For text objects, the text is rendered in-line. Other objects (e.g. audio) are shown using icons. Clicking in the time-strip toggles selection of the object for media viewing.

The filters component 640 (Fig. 8, top right) provides several different ways of filtering media for selection. This includes media type, creator, time and place. The results of multiple filters are applied to the time-strip display, and to media displayed for contextual navigation. The filters pane may optionally be hidden, maximising space available for the time-strip.

The media viewer 620 (Fig. 8, top left) renders one or more media items in a tiled display. The tiling adapts automatically as the number of selected objects changes. Time-based media (audio, video) are rendered using JMF players. Multiple video/audio streams can be rendered in parallel. The system handles this situation in one of two ways. In free mode each object has an independent time-base, allowing non-synchronous objects to be played in parallel. This is useful for producing photo or video montages. In linked mode, the time-bases of displayed objects are synchronised to a master clock which is controlled by absolute time.

The time-line 820 (Fig. 8, bottom left) shows the times at which media exist in the repository. This can be displayed either in absolute time or relative time (a representation that preserves duration of media, but removes the gaps where no media exist). A cursor in the time-line view is synchronised with the playback time in the media view.
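The relative-time representation can be sketched as follows. This is a minimal illustration under stated assumptions, not the Geode implementation: the function names `merge_extents` and `to_relative`, and the representation of media extents as (start, end) pairs in seconds, are hypothetical.

```python
def merge_extents(extents):
    """Merge overlapping (start, end) extents into disjoint, sorted spans."""
    merged = []
    for start, end in sorted(extents):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def to_relative(t, spans):
    """Map absolute time t to relative time: durations kept, gaps removed.

    spans must be the output of merge_extents. A time that falls inside a
    gap maps to the relative position at which the next span begins.
    """
    elapsed = 0
    for start, end in spans:
        if t < start:
            break
        elapsed += min(t, end) - start
    return elapsed


# Two overlapping items followed, after a long gap, by a third item.
spans = merge_extents([(0, 10), (5, 20), (100, 130)])
```

With these spans, an absolute time of 110 maps to a relative time of 30, because the 80-second gap between the second and third items is skipped.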

Filters

A filter is a Geode component that selects media based on criteria that the user defines. Some filters, such as time and simple place filters, display their input in different ways as part of the user interaction; see Fig. 9(a) for the display mode of a time filter, and Fig. 9(b) for the display mode of a simple place filter.

In any given application the filters are arranged in a sequence: the first takes the set of all media as its input, each subsequent filter (media, people, time and simple place) takes the output of the one before it, and the last passes its output to the time-strip display; see Fig. 9(c). Any ordering of filters is allowed.

The Media filter 650 selects media by criteria such as type (photo, audio, video, activity) and constraints like duration and keywords in text. Optionally, objects in the same event cluster can be selected too.

The People filter 655 selects media created by a given set of people.

The Time filter 660 selects media by creation time. A calendar can be used to select specific days of interest. It displays one thumbnail image from its input for each day; see Fig. 9(a).

The Place filter 665 selects media by location. Clusters of media items are displayed on a map. The user can select groups of clusters using the mouse, or by defining geometric regions; see Fig. 9(b).

Note that the algorithm used to cluster media for map display is different from the place clustering algorithm. This is because we want to show the actual distribution of samples in a way that quickly adapts with changes in zoom level, so a quick but granular display is preferred. Each dot in the map display is labelled with the number of objects at that place.
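The text does not say which algorithm is used for the map display. One quick, granular scheme that adapts to zoom level is grid binning, sketched below purely as an assumption; the function name `map_dots` and the rule that the cell size halves with each zoom level are hypothetical.

```python
from collections import Counter


def map_dots(points, zoom):
    """Bucket (lat, lon) points into a grid whose cell size halves per zoom level.

    Returns {(centre_lat, centre_lon): count}, one labelled dot per occupied cell.
    """
    cell = 360.0 / (2 ** zoom)  # cell width in degrees (assumed scheme)
    counts = Counter((int(lat // cell), int(lon // cell)) for lat, lon in points)
    return {((i + 0.5) * cell, (j + 0.5) * cell): n for (i, j), n in counts.items()}


points = [(-32.05, 115.74), (-32.06, 115.75), (-31.95, 115.86)]
coarse = map_dots(points, zoom=2)   # all three items fall in one cell
fine = map_dots(points, zoom=12)    # the items separate into two dots
```

Because each call is a single pass over the points, the dots can be recomputed whenever the zoom level changes.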

These filters can be combined to answer a variety of questions. For example:

What did my friends do on the weekend? Select friends and time of interest using the People and Time filters. This shows all of the relevant activity in the time-strip. The Place filter shows the corresponding locations, and can be used to focus on media related to a particular location.

Where do people go for coffee? Use the Media filter to select just Activity, then enter the keyword "coffee". Select "Include Event". This selects all media in events that include the activity "coffee". These locations appear on the map in the Place filter.
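The filter sequence and the "coffee" query above can be sketched as a chain of functions, each taking the output of the previous one. This is an illustrative sketch only; the `Media` record, its field names, and the helpers `media_filter`, `people_filter` and `run_chain` are assumptions, not part of Geode.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Media:
    id: int
    kind: str                    # "photo", "audio", "video" or "activity"
    creator: str
    text: str = ""
    event: Optional[int] = None  # hypothetical event-cluster id


def media_filter(kind, keyword, include_event=False):
    """Select items of one type containing a keyword; optionally add event-mates."""
    def apply(items):
        hits = [m for m in items if m.kind == kind and keyword in m.text.lower()]
        if include_event:
            events = {m.event for m in hits if m.event is not None}
            hit_ids = {m.id for m in hits}
            hits = [m for m in items if m.id in hit_ids or m.event in events]
        return hits
    return apply


def people_filter(creators):
    """Select items created by any of the given people."""
    return lambda items: [m for m in items if m.creator in creators]


def run_chain(items, filters):
    """Feed the full media set through the filters in order."""
    for f in filters:
        items = f(items)
    return items


repo = [
    Media(1, "activity", "ann", "coffee with bob", event=1),
    Media(2, "photo", "bob", event=1),
    Media(3, "photo", "ann", event=2),
    Media(4, "activity", "cat", "jogging", event=3),
]
coffee = run_chain(repo, [media_filter("activity", "coffee", include_event=True)])
```

Here `coffee` holds the coffee activity and the photo from the same event; the locations of those items are what the Place filter would then plot.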

Synchronised Playback

In the Geode media viewer, the time-line works like the familiar seek bar in media players. However, note that the Geode's time-line represents a global timebase across the entire media repository. Temporal extents are computed by Geode in the initial analysis phase so that media items can be related in a common time frame.

When we move the cursor in the time-line, Geode dynamically creates JMF players for media items, synchronising their time-bases as required. The media view includes embedded players for stream-based media, and rendered images for other media like photos. By synchronising all media, we get the impression of a "compound stream", composed of contributions of multiple users.
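As a rough sketch of the cursor logic, the following computes which stream-based items need a player at a given cursor position, and the offset into each. The `Item` record and `players_at` are hypothetical names, and the actual creation and synchronisation of JMF players is not shown.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    start: float      # absolute start time, in seconds
    duration: float   # 0 for items with no inherent duration, such as photos


def players_at(items, cursor):
    """Return {name: offset} for stream items whose extent contains the cursor."""
    return {
        i.name: cursor - i.start
        for i in items
        if i.duration > 0 and i.start <= cursor < i.start + i.duration
    }


items = [
    Item("video", start=100, duration=60),
    Item("audio", start=130, duration=600),
    Item("photo", start=120, duration=0),
]
active = players_at(items, cursor=140)
```

Seeking every active player to its own offset from a shared cursor is what gives the impression of one compound stream.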

One issue in this type of display is how we deal with objects that have no inherent duration (eg. photos). We need to allow photos to persist for a few seconds but also need to minimise the number of re-tiling operations in the media player; both rapid flickering and retiling can be disturbing. One approach is to allow just one image per stream per user to be displayed, but to have each image persist until the next one is available. Another option is to display each image in its own frame for a fixed time- window around its creation time. The latter approach can increase the level of visual concurrency at busy times, but results in a less cluttered display between photo events.
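The two approaches can be compared with a small sketch that turns one user's photo times into display intervals and then counts how many frames are needed at once. The function names and the 5-second default window are assumptions chosen for illustration.

```python
def persist_until_next(photo_times, stream_end):
    """Approach 1: each photo stays visible until the same user's next photo."""
    times = sorted(photo_times)
    return list(zip(times, times[1:] + [stream_end]))


def fixed_window(photo_times, window=5.0):
    """Approach 2: each photo is visible for a window centred on its creation time."""
    half = window / 2
    return [(t - half, t + half) for t in sorted(photo_times)]


def max_concurrency(intervals):
    """Largest number of intervals visible at the same moment."""
    # At equal times an end (-1) sorts before a start (+1), so touching
    # intervals are not counted as overlapping.
    events = sorted([(s, 1) for s, _ in intervals] + [(e, -1) for _, e in intervals])
    best = current = 0
    for _, delta in events:
        current += delta
        best = max(best, current)
    return best


times = [0.0, 2.0, 60.0]
one_frame = persist_until_next(times, stream_end=100.0)
windows = fixed_window(times, window=5.0)
```

For these times the first approach never needs more than one frame for the user, while the fixed window briefly needs two (the photos at 0 and 2 seconds overlap) and shows nothing between 4.5 and 57.5 seconds.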

Fig. 10 depicts notorious sword-swallower Matty Blade busking in Fremantle's Henderson Street mall. The images are from three separate users, synchronised via Geode's media viewer. The top left frame 670 is a video stream, the other frames represent photos from two other users. The time-line 680 (below) shows the overlapping extents of the video and photo times. Note the absolute time on the time-base.

Photos can be imported into Geode from a Flickr account where there is no context information associated with them. This procedure involves:

1) creating a new user and importing their avatar image;

2) downloading the original photos from Flickr into the repository;

3) defining the bluetooth address of the phone; and

4) defining a time offset for the imported images.

This is sufficient to have all the photos geo-tagged via propagation of location information, and synchronisable to media of other users.
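A minimal sketch of the last step is shown below: the time offset is applied to each imported photo, and a location is then propagated from the user's own position track. Taking the nearest position sample in time is an assumption made for this example, as are the name `geotag` and the (time, lat, lon) track format.

```python
from bisect import bisect_left


def geotag(photo_times, offset, track):
    """Correct photo times by offset and attach the nearest track position.

    track is a non-empty, time-sorted list of (time, lat, lon) samples
    recorded by the phone associated with the imported user.
    """
    times = [t for t, _, _ in track]
    tagged = []
    for photo_time in photo_times:
        t = photo_time + offset
        i = bisect_left(times, t)
        # Only the samples just before and just after t can be nearest.
        candidates = track[max(i - 1, 0): i + 1]
        nearest = min(candidates, key=lambda s: abs(s[0] - t))
        tagged.append((t, nearest[1], nearest[2]))
    return tagged


track = [(0, 1.0, 1.0), (100, 2.0, 2.0)]
tagged = geotag([3590, 3505], offset=-3500, track=track)
```

Once corrected in this way, the imported photos share the global time-base and can be synchronised with other users' media.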

One exciting possibility enabled by Geode is the ad-hoc fusion of media based on synchronism (ie. events occurring at the same time). For example, an audio stream recorded at a concert can be synchronised with another user's photos to create an ad-hoc audio-video presentation. The audio consists of discrete clips of roughly ten minutes each. Images are imported from a Flickr photo-stream, and are synchronised with the audio via the Geode media viewer. Both streams are seekable in parallel using the time-line without any special preprocessing.

Event Browsing

The time-strip can be used to browse many images at once, but if we want a concise summary of activity over a day we can use event clustering to simplify the display. In "event" mode, media in the same temporal cluster is reduced to one representative image. Icons 690 on the thumbnails indicate media type. Controls 695 on the thumbnails allow the clusters to be opened and closed.

Fig. 11 shows how this works. Fig. 11(a) on the left is part of the raw time-strip in time sequence. Fig. 11(b) on the right is the event-clustered time-strip. On this day, one hundred and twenty three photos were collected by five users. This reduces to just five events:

i) User 1: shopping;

ii) User 1: listening to discussion;

iii) User 1: at a party;

iv) User 2: concert; and

v) User 1: at earth hour gig.
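The reduction of the time-strip in "event" mode can be sketched as follows, assuming event cluster ids have already been assigned by a prior analysis step. The tuple layout and the choice of the first item as the representative image are assumptions for illustration.

```python
from itertools import groupby


def event_strip(items):
    """Collapse a time-ordered strip into one (representative, count) per event.

    items are (time, user, event_id) tuples sorted by time, with the items of
    each event adjacent to one another.
    """
    strip = []
    for _, group in groupby(items, key=lambda item: item[2]):
        members = list(group)
        strip.append((members[0], len(members)))  # first item represents the event
    return strip


raw = [
    (900, "user1", "shopping"),
    (905, "user1", "shopping"),
    (1300, "user1", "discussion"),
    (2000, "user2", "concert"),
    (2010, "user2", "concert"),
    (2015, "user2", "concert"),
]
clustered = event_strip(raw)
```

Opening a cluster in the interface corresponds to replacing its representative with the full list of members.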

Context-based Navigation

Contextual information allows us to answer many meaningful questions about an image: where is this place? who else was here? and what did they see?

The Analysis above outlines examples of relationships that can be established from available data. Geode exposes these contextual queries via the image frame in the media view. When the user moves the mouse over an image frame, contextual data overlays the image.

Fig. 12 shows an example of the context overlay. The "postage stamp" image 700 in the top right indicates the user who created the image. The other avatars 710 and 720 show other users that were present, based on either their GPS location, or bluetooth co-presence. For each user present, the associated transport controls jump to the previous or next image by that user, or the previous or next cluster of images.

The filmstrip 730 at the bottom of the image indicates images or events occurring at the same place. Clicking on an image replaces the current image tile with the linked image. Clusters can be expanded or collapsed by clicking the open and close icons in the corner of each thumbnail.

The filmstrip can also be used to display other relationships. For example: other events happening around the same time as the image (ie. at different places). Clicking on any of the avatars shows the stream of events for that user around that time. This shows us what was happening for other users at that time.
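The contextual links behind the overlay can be sketched as simple queries over the repository. The sketch below is an approximation under stated assumptions: places are taken to be precomputed labels, presence is inferred from media items alone (whereas the text derives it from GPS location or bluetooth co-presence), and the names `Item`, `context_links` and the 30-minute window are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    id: int
    user: str
    time: float   # seconds on the global time-base
    place: str    # hypothetical precomputed place label


def context_links(item, repo, window=1800):
    """Return ids of related items and the users who appear to have been present."""
    others = [m for m in repo if m.id != item.id]
    near = [m for m in others if abs(m.time - item.time) <= window]
    return {
        "same_place": [m.id for m in others if m.place == item.place],
        "same_time_elsewhere": [m.id for m in near if m.place != item.place],
        "present": sorted({m.user for m in near
                           if m.place == item.place and m.user != item.user}),
    }


repo = [
    Item(1, "ann", 1000, "mall"),
    Item(2, "bob", 1200, "mall"),
    Item(3, "cat", 1500, "beach"),
    Item(4, "ann", 90000, "mall"),
]
links = context_links(repo[0], repo)
```

Each list in the result corresponds to one view of the filmstrip or one set of avatars in the overlay.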

This kind of browsing can often reveal common interests. For example, while browsing your photos of an event, you may discover other users who were at the same event, or at other events at the same venue. By automating the discovery and presentation of contextual relationships, we enable both purposeful and serendipitous navigation of image collections.

Applications of Social Context (250)

Encoded social context can drive applications requiring representation of embodied social life, and is conceivably useful in a range of application domains, including automated media annotation and sharing, diarizing and collaboration tools, to name a few. We choose two examples of automated media annotation and browsing, and perform user studies aimed at examining their comprehensibility and perceived usefulness.

Personal Media Browsing with Socio-Graph

The first application is another media browser, called Socio-Graph, which aims to demonstrate the utility of social context metadata for the task of personal media exploration and sharing. Specifically, it is a multi-user spatio-temporal browser with the ability to render images, video, and movies (structured for flexible delivery and containing cinematic and content annotation) in a unified environment, and filter media items on time, position, labelled significant place, shared places, presence of actor, and social tie strength by proximity.

This metadata can be used to filter media in isolation or combination. Spatial and temporal filtering are provided by the field of view and timeline scope, respectively; social spheres are labelled on the map; display of media from actors who share a social sphere with the user can be on or off; actor presence at media capture can be specified to be any sub-set; and media can be thresholded on the user's social tie strength with its owner. For example, queries expressing the following intentions can be formulated with the simple interactions detailed below: find media taken at home in the previous month; media owned by anyone from the party I missed; or media from the last family outing to the park near the city.

At root, Socio-Graph is an attempt to explicitly support navigation of personal media with a set of conceptions common to human beings, regardless of experience with computers, namely the who, what, where and when of social context. This can be viewed as an instance of aligning the software's 'structure or paths' with the user's, thus promoting the 'disappearance of the interface'.

The browsing environment is primarily a 3D, first-person point of view. A timeline is displayed in a pane on the right. Full traversal to any latitude or longitude on the globe is supported in order to enable visualization of shared repositories around the globe. Specific design decisions include unified navigation around the concept of zooming, the metaphor used in both time and space (left and right click zoom in and out respectively), which is able to simultaneously deal with volume of items while providing a measure of orientation. Media item access is simplified also, lacking an array of complex widgets.

Double-clicking selects an item, and if it is time-based, clicking again plays it. Serendipity, the possibility for a user to stumble upon unlooked for items, is a desirable trait, and is achieved via field of view and the hidden interplay between tie strength and shared place filters. For example, adding shared places viewing relaxes the tie strength threshold at the current place in view if the initial result set is very small. Media items entering or leaving the query set due to filter changes fade in and out dynamically, respectively, providing an additional cue as to the effect of the new filter configuration.

A user can import photos, videos, and movies in the aforementioned format. Time of creation for each item is extracted from the EXIF header of JPEGs, and of the thumbnails that digital cameras create for videos. Movie creation time is obtained from the file creation time stamp of the first shot.

Media items are tagged with position and actor presence when available. If an (interpolated) position is not available for the exact creation time of a media item, a widening neighbourhood in time is searched. For this study, a maximum search range of 30 minutes was used. Media items are tagged with any actors detected present in the 15 minute sample in which its creation time falls. No attempt has been made to improve this annotation using coarser resolutions than 15 minutes or confidence values of actor presence. Media items without position are indexed on the timeline only and rendered in the main pane when selected.
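The tagging rules above can be sketched as follows. The 30-minute limit and 15-minute samples come from the text; the one-minute widening step, the function names and the data layouts are assumptions made for this example.

```python
def tag_position(t, fixes, max_range=30 * 60, step=60):
    """Search a widening neighbourhood in time around t for a position.

    fixes is a list of (time, lat, lon). Returns (lat, lon) of the nearest fix
    found within max_range seconds, or None if there is none.
    """
    radius = 0
    while radius <= max_range:
        near = [f for f in fixes if abs(f[0] - t) <= radius]
        if near:
            return min(near, key=lambda f: abs(f[0] - t))[1:]
        radius += step
    return None


def tag_actors(t, presence, sample=15 * 60):
    """Return the actors detected in the 15-minute sample containing time t.

    presence maps a sample index (time // sample) to a set of actor names.
    """
    return presence.get(int(t // sample), set())


fixes = [(0, -31.9, 115.8), (4000, -32.0, 115.7)]
presence = {1: {"bob"}}
position = tag_position(1000, fixes)
actors = tag_actors(1000, presence)
```

An item for which `tag_position` returns None would be indexed on the timeline only, as described above.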

Media items clustered in time may be recognized as signatures of events. For example, we cluster timestamps agglomeratively with a hardwired cut-off at 1 hour, while noting that dynamic navigation of the entire cluster tree within Socio-Graph is desirable. The cut-off was set with an aim to preserve micro-events, for example, the cluster of photos of cutting the cake within the party event. Euclidean distance on time, represented as seconds since an origin, was used, and experiments found that distance between cluster centroids performed best as judged by cophenetic distance.
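A minimal sketch of agglomerative clustering of timestamps with centroid distance and a one-hour cut is given below. It is one reading of the description rather than the implementation used in the study, and the function name `cluster_timestamps` is hypothetical.

```python
def cluster_timestamps(times, cut=3600.0):
    """Agglomeratively cluster timestamps (in seconds) using centroid distance.

    The closest pair of clusters is merged repeatedly until the smallest
    distance between centroids exceeds cut.
    """
    clusters = [[t] for t in sorted(times)]
    while len(clusters) > 1:
        centroids = [sum(c) / len(c) for c in clusters]
        # In one dimension the clusters are contiguous runs of sorted values,
        # so the closest pair of centroids is always an adjacent pair.
        gaps = [centroids[i + 1] - centroids[i] for i in range(len(centroids) - 1)]
        i = min(range(len(gaps)), key=gaps.__getitem__)
        if gaps[i] > cut:
            break
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters


events = cluster_timestamps([0, 600, 1200, 10000, 10300, 50000])
```

Recording the order of merges instead of stopping at the cut would give the full cluster tree that the text suggests navigating dynamically.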

Socio-Graph was developed in tandem with the experiment explained, with an overlap of 3 users. The chief difference was a lack of Bluetooth logs. Consequently, actor presence is derived from diary study ground truth taken by a user as part of the audio presence detection component of that work. Social sphere weightings for the calculation of social tie by proximity are calculated from the number of media capture instances at a given sphere.
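One way to combine sphere weightings with actor presence is sketched below. The weighted sum and the normalisation of weights by total capture count are assumptions; the text states only that the weightings are calculated from the number of media capture instances at each sphere.

```python
def tie_strength(appearance_rate, capture_counts):
    """Weighted frequency of another user appearing in this user's spheres.

    appearance_rate maps sphere -> fraction of samples in which the other user
    was present; capture_counts maps sphere -> number of media items captured there.
    """
    total = sum(capture_counts.values())
    if total == 0:
        return 0.0
    return sum(capture_counts.get(sphere, 0) / total * rate
               for sphere, rate in appearance_rate.items())


strength = tie_strength({"home": 0.5, "work": 0.0}, {"home": 30, "work": 10})
```

A threshold on this value is the kind of test the tie-strength filter in the browser would apply to each media owner.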

Blog with Jive

A second example application is a ubiquitous blog. Blogs are an inherently serial genre, but non-linear browsing behaviours are often exhibited. Frequent use of manually maintained categories (e.g. "Holidays" or "Scrapping") attests to this. In our demonstration application, named Jive (conjuring the notion of navigating by rhythm), each blog entry is treated as an item of media. Timestamps of photos contained in an entry are used to anchor it in the user's signal stream, and hence social rhythms.

Similarity metrics derived from the user's rhythms are used to navigate to entries about similar events or on similar days.

Fig. 13(a) depicts typical inter-entry links generated by an underlying social rhythm. On the left of the figure are stay clusters generated for a place-constrained, timetable-constrained dimension folding; see 850. On the right of the figure is a sequence of blog entries, two of which (852 and 854) have photos that associate them with stays which have been clustered. If the user begins navigating the blog from the topmost entry 852, the other entry 854 appears as a linked entry. This is an example of Query by Example (QBE), with social rhythms providing an implicit similarity measure. Our similarity measure is a function of all social rhythms, and awards a higher score to more constraining clustering.

Entries that have more photos from a rhythm also accrue a higher score. This can be viewed as increased confidence in inferring the presence of a topic when a strongly associated term has higher frequency. Notably, the user may not be aware of why these entries have been deemed similar. For an explicit use of social rhythm categories, Jive collects all entries discovered in one or more social rhythm types and allows the user to navigate parameter space. For example, in order to find entries that refer to, say, swimming lessons, the user can specify location-bound, people-bound, structured activities (with variable starting time each term), and peruse the entries grouped by this criterion.
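The two scoring properties described above (a higher score for more constraining clusterings and for more photos from a rhythm) can be illustrated with a hypothetical scoring function. The formula, the function name and the weights below are assumptions for illustration; the text does not give the actual measure.

```python
def entry_similarity(photos_a, photos_b, constraint):
    """Score two blog entries by the social rhythms their photos share.

    photos_a and photos_b map rhythm id -> number of the entry's photos that
    fall in that rhythm. constraint maps rhythm id -> weight, larger for
    rhythms produced by more constraining clusterings.
    """
    shared = photos_a.keys() & photos_b.keys()
    return sum(constraint[r] * min(photos_a[r], photos_b[r]) for r in shared)


constraint = {"swimming": 2.0, "park": 1.0}   # hypothetical weights
entry_852 = {"swimming": 3, "park": 1}
entry_854 = {"swimming": 2}
score = entry_similarity(entry_852, entry_854, constraint)
```

Entries with no rhythm in common score zero and would not appear as linked entries.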

Explicit search by social rhythm proved comprehensible (reflected in the preference for the taxonomic label), more so than the simple related posts links, where a couple of users wanted a clearer idea of the sense by which entries were related. Explicit search was also perceived as being more useful (median of 5), but this result may be skewed by the large portion of computing students in the study. One user commented that "listing entries in social rhythm clusters gave an interesting overview of the organization of the blog".

Discovering Friends with compatible rhythms

A third application of social rhythms is to discover compatible people to share social activities. Suppose someone engages in a regular activity such as sport or recreation. This activity is likely to have particular repeated patterns in time and/or space. For example if someone regularly jogs around the park each morning around 7AM, we expect the corresponding rhythm to be place-bound and structured.

In a social-network application, it is important to be able to establish new contacts or "friends". In the first instance this is normally done by searching for known members by name, but subsequently can occur via common interests that may be exposed through the application (eg. participation in discussions on particular topics, or listening to similar music). Recommendation systems try to find correlations between interests. If a significant number of users who like A also like B, then the system might recommend B to a new user who is known to like A.
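The "users who like A also like B" rule can be sketched with simple co-occurrence counts. The function name `recommend` and the `min_support` threshold standing in for "a significant number" are assumptions for this example.

```python
from collections import Counter


def recommend(likes, user, min_support=2):
    """Recommend items liked by enough users who share an interest with user.

    likes maps user name -> set of liked items.
    """
    mine = likes[user]
    counts = Counter()
    for other, theirs in likes.items():
        if other != user and mine & theirs:
            counts.update(theirs - mine)  # items the user does not yet like
    return [item for item, n in counts.most_common() if n >= min_support]


likes = {
    "new": {"A"},
    "u1": {"A", "B"},
    "u2": {"A", "B"},
    "u3": {"A", "C"},
}
suggestions = recommend(likes, "new")
```

Here B is recommended because two users who like A also like B, while C falls below the threshold.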

Social rhythms could be used in such systems to:

Recommend new activities or places based on correlations between the rhythms of groups of users. For example, if a place is associated with particular types of activity (eg. exercise) then that place might be recommended to someone who is interested in exercise and is known to be near that area.

Recommend potential friends to members based on common rhythms. For example, if person A regularly exercises at the park at a particular time then the system might offer "friendship" with other users known to have similar rhythms. Depending on the user's requirements this could happen automatically ("find people with similar rhythms") or by request ("find someone else who exercises in this park around this time").

Such facilities would of course need to respect the privacy of all members concerned. While this sort of thing could be achieved via explicitly placed "want ads", this would require individuals to generate an exhaustive and probably overly detailed description of their interests and activities. Using social rhythms as the basis for such a system would in some senses allow for the automated discovery of propinquity, the physical and psychological proximity between people.
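A request such as "find someone else who exercises in this park around this time" can be sketched as a match on place and start time. Representing a rhythm as a (place, start hour) pair and the half-hour tolerance are simplifying assumptions, and a real system would first apply each member's privacy settings.

```python
def similar_rhythm_users(rhythms, user, tolerance=0.5):
    """Find users with a rhythm at the same place and a similar start time.

    rhythms maps user name -> list of (place, start_hour) pairs, where
    start_hour is a time of day in hours (0 to 24).
    """
    matches = set()
    for place, hour in rhythms[user]:
        for other, other_rhythms in rhythms.items():
            if other == user:
                continue
            for other_place, other_hour in other_rhythms:
                diff = abs(other_hour - hour)
                diff = min(diff, 24 - diff)  # times of day wrap around midnight
                if other_place == place and diff <= tolerance:
                    matches.add(other)
    return sorted(matches)


rhythms = {
    "ann": [("park", 7.0)],
    "bob": [("park", 7.25), ("gym", 18.0)],
    "cat": [("park", 17.0)],
    "dan": [("pool", 7.0)],
}
partners = similar_rhythm_users(rhythms, "ann")
```

Only bob is returned: cat uses the same park at a different time, and dan keeps the same hours at a different place.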

Using this invention, a user's social networking pattern is represented using one or more social contexts including: (i) social spheres characterising physical locations of significance of the user, (ii) social rhythms characterising activities performed by the user and (iii) social ties characterising relationships the user has with others.

Applications

The method may further comprise the step of applying one or more of the social spheres, rhythms and ties generated to categorise and search media items. Media items may be photos, videos, blogs and audio files.

It is an advantage of the invention to generate a combination of social spheres, social rhythms and social ties to mediate the provision of online applications that enhance user satisfaction. For example, the invention may be used to provide semi-automatic calendaring or collaboration tools, information discovery and personalized advertising, media browsing and sharing, demographic tools for market research and surveillance, and context-sensitive device management. Qualitative feedback from user studies of these applications demonstrates the effectiveness of using social context to organise and navigate media items.

Although the invention has been described with reference to a particular example, it should be appreciated that it could be exemplified in many other forms and in combination with other features not mentioned above. For instance, we have implemented a centralised version, but the framework can be implemented in a distributed way, allowing each user to maintain their repository separately, yet giving them the power of shared media and context.

References

[1] B. Adams, D. Q. Phung, and S. Venkatesh. Extraction of social context and application to personal multimedia exploration. In ACM Int. Conference on Multimedia, Santa Barbara, USA, Oct. 2006.

[2] Benjamin B. Bederson. Photomesa: a zoomable image browser using quantum treemaps and bubblemaps. In UIST '01: Proceedings of the 14th annual ACM symposium on User interface software and technology, pages 71-80, New York, NY, USA, 2001. ACM.

[3] Matthew Cooper, Jonathan Foote, Andreas Girgensohn, and Lynn Wilcox. Temporal event clustering for digital photo collections. ACM Trans. Multimedia Comput. Commun. Appl., 1(3):269-288, 2005.

[4] Marc Davis, Nancy Van House, Jeffrey Towle, Simon King, Shane Ahern, Carrie Burgener, Dan Perkel, Megan Finn, Vijay Viswanathan, and Matthew Rothenberg. MMM2: mobile media metadata for media sharing. In CHI '05: CHI '05 extended abstracts on Human factors in computing systems, pages 1335-1338, New York, NY, USA, 2005. ACM.

[5] N Eagle, A Pentland, and D Lazer. Inferring social network structure using mobile phone data, 2007.

[6] M. Ester, H-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of Second International Conference on Knowledge Discovery and Data Mining, pages 226-231, 1994.

[7] David Frohlich, Allan Kuchinsky, Celine Pering, Abbe Don, and Steven Ariss. Requirements for photoware. In CSCW '02: Proceedings of the 2002 ACM conference on Computer supported cooperative work, pages 166-175, New York, NY, USA, 2002. ACM.

[8] Adrian Graham, Hector Garcia-Molina, Andreas Paepcke, and Terry Winograd. Time as essence for photo browsing through personal digital libraries. In JCDL '02: Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries, pages 326-335, New York, NY, USA, 2002. ACM.

[9] Akshay Java, Xiaodan Song, Tim Finin, and Belle Tseng. Why we twitter: understanding microblogging usage and communities. In WebKDD/SNA-KDD '07: Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis, pages 56-65, New York, NY, USA, 2007. ACM.

[10] N O'Hare, G Jones, C Gurrin, and A F Smeaton. Combination of content analysis and context features for digital photograph retrieval. In 2nd IEE European Workshop on the Integration of Knowledge, Semantic and Digital Media Technologies, 2005.

[11] P Liu, D Zhou, and N Wu. Vdbscan: Varied density based spatial clustering of applications with noise. In 2007 International Conference on Service Systems and Service Management, 2007.

[12] S Mann, J Fung, and Raymond Lo. Cyborglogging with camera phones: Steps toward equiveillance. In ACM Multimedia 2006, 23-27 October, Santa Barbara, USA, 2006.

[13] Tao Mei, Bin Wang, Xian-sheng Hua, He-qin Zhou, and Shipeng Li. Probabilistic multimodality fusion for event based home photo clustering. ICME, 0:1757-1760, 2006.

[14] Andrew D. Miller and W. Keith Edwards. Give and take: a study of consumer photo-sharing culture and practice. In CHI '07: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 347-356, New York, NY, USA, 2007. ACM.

[15] Mor Naaman, Yee Jiun Song, Andreas Paepcke, and Hector Garcia-Molina. Automatic organization for digital photographs with geographic coordinates. In JCDL '04: Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries, pages 53-62, New York, NY, USA, 2004. ACM.

[16] John C. Platt, Mary Czerwinski, and Brent A. Field. Phototoc: Automatic clustering for browsing personal photographs, 2002.

[17] Mika Raento, Antti Oulasvirta, Renaud Petit, and Hannu Toivonen. Contextphone: A prototyping platform for context-aware mobile applications. IEEE Pervasive Computing, 4(2):51-59, 2005.

[18] Kerry Rodden and Kenneth R. Wood. How do people manage their digital photographs? In CHI '03: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 409-416, New York, NY, USA, 2003. ACM.

[19] Risto Sarvas, Mikko Viikari, Juha Pesonen, and Hanno Nevanlinna. Mobshare: controlled and immediate sharing of mobile images. In MULTIMEDIA '04: Proceedings of the 12th annual ACM international conference on Multimedia, pages 724-731, New York, NY, USA, 2004. ACM.

[20] Bongwon Suh and Benjamin B. Bederson. Semi-automatic image annotation using event and torso identification. Technical Report Tech Report HCIL-2004-15, Computer Science Department, University of Maryland, College Park, MD, 2004.

[21] Appan, P. and Sundaram, H. 2004. Networked multimedia event exploration. In Proceedings of the 12th annual ACM international conference on Multimedia.

[22] Ester, M., Kriegel, H. -P., Sander, J., and Xu, X. 1998. Incremental clustering for mining in a data warehousing environment. In Proceedings of 24th Int. Conf on Very Large Databases. 323-333.

Claims

1. A computer operated method for managing social multimedia, comprising the steps of: receiving and storing plural types of user generated media items; receiving and storing plural streams of contextual information about user activity; automatically organising stored media items collected from more than one user to create plural streams of media items, each arranged according to a spatial or temporal pattern extracted from the collected contextual information; and, presenting the streams of media items in a graphical user interface for user navigation, search and sharing of both media items and contextual data.

2. A method according to claim 1, wherein the media items include photos, audio, user radio activity and video, comprising the further step of extracting media items from radio communications with the devices that create the media items.

3. A method according to claim 1 or 2, wherein the contextual information includes media metadata, time stamps, location, user identity, copresence data, velocity, acceleration, temperature, blog and calendar events, comprising the further step of extracting the contextual information from radio communications with the devices that collect the contextual information or create the media items, or both.

4. A method according to claim 1, 2 or 3, comprising the further step of filtering positioning information received from the devices that collect it to remove noise, based on physical constraints such as velocity and acceleration.

5. A method according to claim 1, comprising the further step of arranging the media items in time ordered streams, one stream for each user, for storage and display at a graphical user interface.

6. A method according to claim 1 or 5, comprising the further step of clustering the media items or streams for display at a graphical user interface, according to one or more of the following criteria: location; events involving place, users and time; simple places involving location and user; and social context.

7. A method according to claim 6, when media items are clustered on the basis of social context, comprising the further step of extracting parameters related to location, a period of time and one or more copresent users, and clustering these parameters to generate one or more social spheres for each user.

8. A method according to claim 7, when clustering is performed in real time, comprising the further step of incrementally updating the social spheres whenever new contextual information is received.

9. A method according to claim 6 or 7, further comprising the step of automatically assigning labels for the social spheres, based on the time of day when the user is in the social sphere, a role the user has or a label assigned to neighbouring social spheres.

10. A method according to claim 7, further comprising the step of generating one or more social rhythms for each user by clustering the social spheres generated for that user, along one or more dimensions of: location; time; copresent user; the number of visits to the social spheres and start time of the last visit; the dimensions of location, start time of a visit to the location; and the duration of a visit.

11. A method according to claim 10, further comprising the step of determining a measure of social tie between a pair of users from overlapping social spheres or social rhythms, or both, of that pair of users.

12. A method according to claim 11, wherein a measure of social tie between a first and second user is based on the frequency of the second user appearing in one or more social spheres generated for the first user.

13. A method according to claim 12, wherein the measure is weighted by a weight factor assigned to each of the social spheres.

14. A method according to claim 10, further comprising the step of determining a measure of social tie between a pair of users based on joint entropy along any one of the dimensions of location, time and copresent user of one or more social spheres generated for each of the users.

15. A method according to claim 1, comprising the further step of arranging media items, or representation of them, in one or more chronologically ordered streams, with a header showing calendar divisions, for the purpose of display on a graphical user interface.

16. A method according to claim 15, comprising the further steps of: establishing a time-line upon which all the media items sit; selecting a point on the timeline; automatically and dynamically creating media players to play the media items sitting at that time; and playing the media items sitting at that time.

17. A method according to claim 15 or 16, comprising the further step of filtering one or more chronologically ordered streams to restrict the media items that are displayed or played to those having a chosen combination of contextual data.

18. A method according to claim 15 comprising the further step of using contextual links to navigate from one media item to another based on time, place, and co-present users.

19. A method according to claim 15 comprising the further step of using meta-data tuples to make enquiries.

20. A computer system for managing social multimedia, comprising: an input port to receive and store plural types of user generated media items; an input port to receive and store plural streams of contextual information about user activity; a processor to automatically organise stored media items collected from more than one user to create plural streams of media items, each arranged according to a spatial or temporal pattern extracted from the collected contextual information; and, a graphical user interface to present the streams of media items for user navigation, search and sharing of both media items and contextual data.

21. A browser application comprising machine readable code on a machine readable medium for running on a personal computer or server to present a "time-strip" component on a graphical user interface that summarises all media items stored by the personal computer or server, using small low-resolution tokens of the items, ordered chronologically, with headers indicating natural time divisions.

22. A cell-phone equipped with one or more media collection devices, satellite positioning capability and an application for collecting and storing plural streams of contextual information about user activity, wherein the application periodically monitors the cell-phone's file system to detect new media items, and when it discovers a new media item, it operates to determine its position, collect data related to current activity and initiate a bluetooth enquiry.

23. A method according to claim 15 comprising the further step of responding to the selection of a media item by returning one or more hyperlinks to other media items, or tokens representing media items, related in time, space or by another copresent user.