

Using FBM to Benchmark Content Search

In any Content Distribution Network, it must be possible to search for the stored items. In a file-sharing scenario, it is necessary to find the peers that store the desired content. In P2P streaming applications, it is necessary to know the root transmitting a content or the peers collaborating in its retransmission. Therefore, the search mechanisms in Content Distribution Networks need to be evaluated.

Related Work on Content Search in Local Databases

Most existing search solutions are based on text search. The user inputs a text string, which is then searched for in the repository. The textual content in the repository may have different origins. The repository can simply consist of textual documents, as in the case of search within Internet web pages. In the case of media repositories, textual search can be performed using file names, as is most common in P2P networks, or using manual annotations made by the repository managers. With a search method based on textual annotations, we can search for movies with Marlon Brando by typing the actor's name and hoping that the movies in the repository have been tagged with it. The effectiveness of this type of media search therefore depends on the accuracy of the textual description of the media and is independent of the media itself. The textual search method for media is the easiest, but also the crudest. More advanced content-based search techniques make use of low-level features of the content, such as dominant colour or shape histogram. This makes it possible to search by providing an example. If the repository provides tools for low-level, content-based query-by-example (QbE) search, it is possible to search for Marlon Brando wearing a blue suit by providing a visual example of the actor wearing a blue suit. Benchmarks for such local databases exist; however, they are insufficient for Content Delivery Networks.

Content Search Benchmarking Framework

In Content Delivery Networks, the measurable attributes of a search system can be evaluated within three regimes: the performance metric, the cost metric and the Quality of Experience (QoE) metric. There are several aspects of a search system that can be measured.

The first important aspect is search accuracy, which can be defined as the ability of the system to find the desired results for a query. The second measurable attribute of the search system is the search time. According to works describing search benchmarking frameworks for database-based systems, speed is not of central concern [8]; this is due to the high performance and locality of database systems. However, distributed P2P systems are characterized by considerable and varying communication delays, which also depend on the different overlay approaches and topologies. Therefore, search time will also be a measured factor. When considering search time and accuracy, it has to be noted that there is a trade-off between the two. When searching in a P2P overlay network, a longer search may, but does not have to, yield more accurate results. The importance of the measured aspects depends on the type of application that the search supports. For one kind of application the search accuracy may be critical, whereas for another, search time may be of the most importance.

The resource consumption of the service is another measurable factor. The resource consumption induced by the search service can be thought of as the cost of the search system. Resources are understood here as memory, processing power, disk space and network load. Memory is consumed mainly by the routing operations of the overlay and can be treated as service-independent. Processing power is consumed mainly by query preparation and processing, and contributes to the search delay. Disk space is required, apart from content storage, to hold the descriptors. Because the space consumed by the descriptors is minimal compared to the disk space required for the media, it will not be analyzed in this work.

How the user perceives the quality of using the search system is another factor that can be assessed. We can call this metric the usability of the system, which is part of the QoE perceived by the end user. It can be measured by the number of interactions with other systems during the preparation of the QbE query. The rationale is that the less interaction with search support systems is needed, the easier the search system is to use. These search support systems, e.g., those transforming text into exemplary images, must therefore operate transparently to the end user. Their implication for the search time can be measured as the bootstrap time, i.e., the time needed to prepare the query for execution.

The Architecture of the FBMS

The FBMS architecture is built on two branches: horizontal and vertical designs.

In the horizontal design, the benchmarking framework is organized in a layered layout. At each of the three layers (user, application and overlay), three core abstract elements are defined. The System Under Test (SUT) is the benchmarked subject layer of the search service. The Upper Interface defines the outputs of the benchmarking; these outputs are fed to the Lower Interface of the benchmarking of the layer above. The Lower Interface defines the inputs to the benchmarking of the current layer; these are the factors that have a direct influence on the results of the benchmarking. These core elements provide, and are also supported by, the input and output parameters described in the following. The horizontal branch can be viewed in Figure 1, Figure 2 and Figure 3, which show the three layers of this branch (user, application and overlay).

Fig. 1. Benchmarking Framework Design: the user layer

Fig. 2. Benchmarking Framework Design: the application layer

Fig. 3. Benchmarking Framework Design: the overlay layer

The vertical branch (shown in Figure 4 and Figure 5) covers the cross-layer design and the dependencies of accuracy and search-time benchmarking. Each of these benchmarking topics is analyzed at the three benchmarking layers. The analysis covers both the metrics and the most important parameters influencing the measurements.

Fig. 4. The vertical design of the search accuracy benchmarking

Fig. 5. The vertical design of the search time benchmarking

Accuracy

In order to benchmark the accuracy of a search system, it is necessary to have a ground truth, which may be defined as full knowledge of all data stored in the system. This ground truth serves as the reference level for benchmarking accuracy. In the case of benchmarking multimedia, the ground truth is usually a collection of manually annotated media files, where the manual annotation ensures that the descriptions are accurate. In the case of search for images, there are several requirements for the reference collection.

First, although the search system may have access to an almost unlimited number of images, having a voluminous benchmark database is not critical (10,000 images are considered sufficient) [8]. The second consideration is the content of the images. To make the measurements consistent with the real environment of the application, the content should be natural (e.g., photographs) and complex (semantically rich and multi-object). Available sources of such annotated images are the annotated collections intended for benchmarking, typically available to the research community free of charge. The content of these databases varies from natural images [1] to strongly artificial multi-angle shots of single objects on a uniform background [3].

The existing evaluation metrics for accuracy can be divided into quantitative and qualitative metrics. The former are measured objectively and refer to the broadly understood performance of the system; the latter refer to the quality of the system. An example of a qualitative metric is the Mean Opinion Score (MOS) scale, initially standardized by the International Telecommunication Union [6]. In this metric, the quality of the system is subjectively assessed by users on a five-grade scale, where 5 is the best quality and 1 is the worst. Another example of a qualitative metric is the R-factor, which may be utilized in a way similar to MOS and is used for the subjective evaluation of speech quality in voice transmission systems [7].
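As a minimal illustration, a MOS value is simply the arithmetic mean of the users' ratings on the five-grade scale. The sketch below is hypothetical and not part of the FBM framework; the example ratings are invented:

```python
def mean_opinion_score(scores):
    """Arithmetic mean of subjective quality ratings on the 1-5 MOS scale."""
    if not scores:
        raise ValueError("at least one rating is required")
    if any(not 1 <= s <= 5 for s in scores):
        raise ValueError("ratings must lie in the 1-5 range")
    return sum(scores) / len(scores)

# Example: seven users rate the search system's quality.
print(mean_opinion_score([4, 5, 3, 4, 4, 2, 5]))  # ≈ 3.857
```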

There are numerous quantitative metrics for the assessment of search accuracy in the retrieval process [10]. The most commonly used are Precision and Recall. Precision is the number of detections (defined as the number of relevant items detected) divided by the total number of returned items. Recall is defined as the number of detections divided by the total number of relevant items in the system.
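These two definitions translate directly into code. The sketch below follows the definitions above; the item identifiers and sets are hypothetical:

```python
def precision_recall(returned, relevant):
    """Precision and recall for a single query.

    returned: items the search system returned (collection of ids)
    relevant: ground-truth relevant items in the system (collection of ids)
    """
    returned, relevant = set(returned), set(relevant)
    detections = len(returned & relevant)  # relevant items actually found
    precision = detections / len(returned) if returned else 0.0
    recall = detections / len(relevant) if relevant else 0.0
    return precision, recall

# 8 items returned, 5 of them relevant; 10 relevant items exist in total.
p, r = precision_recall(returned={1, 2, 3, 4, 5, 6, 7, 8},
                        relevant={1, 2, 3, 4, 5, 11, 12, 13, 14, 15})
print(p, r)  # 0.625 0.5
```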

The methodology for the quantitative measurement of search accuracy yields numeric values that describe several aspects of the system. In order to draw a conclusion about the accuracy of the system, evaluation methods need to be defined.

The recommended evaluation method is Retrieval Effectiveness (the comparison of precision versus recall). The best evaluation method is the one that reveals the weaknesses of the benchmarked system. An overview of recommended evaluation methods is presented in [10].
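One common way to obtain precision-versus-recall points is to evaluate both metrics at every cut-off depth of a ranked result list. This is a hypothetical sketch (the ranking and ground truth below are invented), not the specific procedure of [10]:

```python
def precision_recall_curve(ranked, relevant):
    """(precision, recall) at each cut-off depth of a ranked result list."""
    relevant = set(relevant)
    hits, points = 0, []
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
        points.append((hits / k, hits / len(relevant)))
    return points

ranked = ["a", "x", "b", "y", "c"]  # system's ranking, best match first
relevant = {"a", "b", "c"}          # ground truth for the query
for prec, rec in precision_recall_curve(ranked, relevant):
    print(f"precision={prec:.2f} recall={rec:.2f}")
```

Plotting these pairs gives the precision-recall curve: precision typically falls as deeper cut-offs raise recall.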

Time-of-Search

The time-of-search is, simply speaking, the time from the moment of issuing the query to the moment the results are presented to the user. The time-of-search problem is complex in the world of P2P overlay networks because the search process is composed of several stages. An example model of query delay for a structured P2P overlay network is covered in detail in [2].
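As a simplified illustration of such a multi-stage composition, the total time-of-search can be modelled additively. This is a hypothetical sketch, not the model from [2]; the stage names and delay values are assumptions for the example:

```python
def time_of_search(prep_ms, hop_delays_ms, processing_ms, return_ms):
    """Simplified additive model of time-of-search in an overlay network:
    query preparation + per-hop overlay routing delays + query processing
    at the responding peer + returning the results to the user."""
    return prep_ms + sum(hop_delays_ms) + processing_ms + return_ms

# A query routed over three overlay hops (hypothetical delays, in ms).
total = time_of_search(prep_ms=15, hop_delays_ms=[40, 55, 30],
                       processing_ms=20, return_ms=60)
print(total)  # 220
```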

In fact, a trade-off between accuracy and search time exists in Content Delivery Networks. Content Delivery Networks are based on distributed platforms such as peer-to-peer networks, and their accuracy is closely correlated with time. Queries for content in a CDN depend on the workload and environment of the system. The delay is directly proportional to the network conditions and to the peer-to-peer overlay used to support the content. Furthermore, it must also be taken into account that a query can be launched in parallel over different paths in order to reach as many peers as possible. The results would therefore arrive at different time intervals. The more time is allowed for receiving results, the higher the accuracy, but the higher the delay as well. This trade-off must be adjusted, and QoE can be a good mechanism for doing so.
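The trade-off can be sketched by simulating parallel query paths whose responses arrive at different times: the recall achieved grows with the deadline the querier is willing to wait. All arrival times and item ids below are hypothetical:

```python
def recall_at_deadline(responses, relevant_total, deadline_ms):
    """Recall achieved if the querier stops waiting at deadline_ms.

    responses: list of (arrival_ms, relevant_ids_found) per overlay path
    relevant_total: number of relevant items in the whole system
    """
    found = set()
    for arrival_ms, ids in responses:
        if arrival_ms <= deadline_ms:  # only responses that made the deadline
            found |= ids
    return len(found) / relevant_total

# Three parallel query paths answering at different times (hypothetical).
responses = [(80, {1, 2}), (150, {2, 3, 4}), (400, {5, 6})]
for deadline in (100, 200, 500):
    print(deadline, round(recall_at_deadline(responses, 6, deadline), 2))
```

Waiting 100 ms captures only the fastest path; waiting 500 ms captures all three, trading delay for accuracy.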

P2P streaming systems comprise both on-demand streaming of stored content, such as Video-on-Demand, and live streaming. These distribution approaches are characterized for the FBM by the following elements:

To provide new input, comments, etc., please send an email to Piotr Srebrny, Gareth Tyson, Andreas Mauthe or Thomas Plagemann.