## Using FBM to Benchmark Content Search

Any Content Distribution Network must allow its users to search for the stored items. In a file-sharing scenario, it is necessary to locate the peers that store the desired content. In P2P streaming applications, it is necessary to know the root transmitting a content, or the peers collaborating in its retransmission. Therefore, the search mechanisms of Content Distribution Networks need to be evaluated.

## Related Work on Content Search in Local Databases

Most existing search solutions are based on text search. The user inputs a text string, which is then searched for in the repository. The textual content in the repository may have different origins. The repository can simply consist of textual documents, as in the case of search within Internet web pages. In the case of media repositories, textual search can be performed on file names, as is most common in P2P networks, or on the manual annotations made by the repository managers. Using a search method based on textual annotations, we can search for movies with Marlon Brando by typing the actor's name and hoping that the movies in the repository have been tagged with it. The effectiveness of this type of media search therefore depends on the accuracy of the textual description of the media and is independent of the media itself. The textual search method for media is the easiest, but also the crudest. More advanced content-based search techniques make use of low-level features of the content, such as the dominant colour or a shape histogram. This gives us the opportunity to search by providing an example. If the repository provides tools for low-level, content-based Query-by-Example (QbE) search, it is possible to search for Marlon Brando wearing a blue suit by providing a visual example of the actor wearing a blue suit. Benchmarks for such local databases exist; however, they are insufficient for Content Delivery Networks.

## Content Search Benchmarking Framework

In
Content Delivery Networks, the evaluation of the measurable attributes
of a search system can be done within three regimes, namely, the
performance metric, the cost metric and the Quality of Experience (QoE)
metric. There are several aspects of a search system that can be
measured. The
first important aspect is search accuracy, which can be defined as the ability of the system to find the desired results for a given query. The
second measurable attribute of the search system is the search time.
According to works describing search benchmarking frameworks for database-backed systems, speed is not of central concern [8]; this is due to the high performance and locality of database systems.
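As an illustration, the per-query search time in such a distributed setting could be recorded with a small harness. This is a minimal sketch, assuming a synthetic stand-in for the P2P lookup; all function names here are hypothetical and not part of FBM:

```python
import random
import statistics
import time

def timed_search(search_fn, query):
    """Run one query and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = search_fn(query)
    return results, time.perf_counter() - start

def fake_p2p_search(query):
    """Stand-in for a P2P lookup: simulates a variable overlay delay."""
    time.sleep(random.uniform(0.001, 0.005))
    return [f"peer-{i}" for i in range(3)]

# Collect a latency sample and summarize it; in a P2P benchmark the
# mean alone is not enough, since overlay delays vary widely.
latencies = [timed_search(fake_p2p_search, "q")[1] for _ in range(20)]
print(f"mean={statistics.mean(latencies):.4f}s max={max(latencies):.4f}s")
```

In a real benchmark the distribution (e.g., percentiles), not a single number, would characterize the varying delays the text describes.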
However, distributed P2P systems are characterized by considerable and varying communication delays, which also depend on the overlay approach and topology. Therefore, the search time will also be a measured factor. When taking into consideration the
search time and accuracy, it has to be noted that a trade-off exists between the two. When searching in the P2P
overlay network, a longer search may, but does not have to, yield more
accurate results. The importance of the measured aspects depends on the
type of application that the search supports. For one kind of
application the search accuracy may be critical, whereas for another,
the search time may be of the greatest importance. The
resource consumption of the service is another measurable factor. The
resource consumption induced by the search service can be thought of as
the cost in the search system. Resources are here understood as the memory, processing power, disk space and network load. Memory is consumed mainly by the routing operations of the overlay and can be treated as service-independent. Processing power is consumed mainly by the operations of query preparation and processing and contributes to the search delay. The disk space
is required, apart from the content storage, to contain the
descriptors. Because the space consumed by the descriptors is minimal compared to the disk space required for the media, it will not be analyzed in this work.

How the user perceives the quality of using the search system is another factor that can be assessed. We can call this metric the usability of the system, which is part of the QoE perceived by the end user. We can measure it by the number of interactions with other systems during the preparation of the QbE query. The rationale behind it is that the less interaction with search support systems is required, the easier the search system will be to use.
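A minimal sketch of how such a usability measurement might be instrumented, counting the interactions with support systems during QbE preparation. The class and method names are assumptions for illustration, not part of FBM:

```python
import time

class QbePreparation:
    """Wrap QbE query preparation, counting interactions with external
    search-support systems and timing the preparation phase.
    (Illustrative sketch; all names here are hypothetical.)"""

    def __init__(self):
        self.interactions = 0

    def call_support(self, system, request):
        # Every call to a support system (e.g., a text-to-image service)
        # counts as one interaction in the usability metric.
        self.interactions += 1
        return system(request)

    def prepare(self, text_query, text_to_image):
        """Prepare a QbE query from text; returns (example, prep_time)."""
        start = time.perf_counter()
        example = self.call_support(text_to_image, text_query)
        return example, time.perf_counter() - start

# Usage: a dummy support system that 'renders' text into an example image.
prep = QbePreparation()
example, prep_time = prep.prepare("blue suit", lambda t: f"<image:{t}>")
print(prep.interactions, example)  # 1 <image:blue suit>
```

The fewer interactions the counter records for a typical query, the more transparent the support systems are to the end user.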
These search support systems, e.g., those transforming text into exemplary images, must then operate transparently to the end user. Their implication for the search time can be measured as the bootstrap time, i.e., the time needed to prepare the query for execution.

## The Architecture of the FBMS

The FBMS architecture is built on two branches: the horizontal and the vertical design. In
the horizontal design, the benchmarking framework is arranged in a layered layout. At each of the three layers (user, application and overlay), three core abstract elements are defined. The System Under Test (SUT) is the benchmarked subject, i.e., the layer of the search service being evaluated. The Upper Interface defines the outputs of the benchmarking; these outputs are fed to the Lower Interface of the benchmarking of the upper layer. The Lower Interface defines the inputs to the benchmarking of the current layer; these are the factors that have a direct influence on the results of the benchmarking. These core elements provide, and are in turn supported by, the input and output parameters described below. The horizontal branch can be viewed in Figure 1, Figure 2 and Figure 3, which show the three layers of this branch (user, application and overlay).

*(Figures 1–3: the benchmarking layout at the user, application and overlay layers.)*
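The layered arrangement described above might be sketched in code. This is a minimal illustration, with hypothetical names, of how each layer's Upper Interface outputs feed the Lower Interface of the layer above:

```python
from dataclasses import dataclass, field

@dataclass
class LayerBenchmark:
    """One benchmarking layer: the SUT plus its two interfaces.
    The Lower Interface carries the inputs; the Upper Interface
    carries the outputs fed to the layer above. (Sketch only; the
    names are assumptions, not part of FBMS.)"""
    name: str
    lower_inputs: dict = field(default_factory=dict)
    upper_outputs: dict = field(default_factory=dict)

    def run(self, sut):
        # Benchmark this layer's SUT using the inputs from below.
        self.upper_outputs = sut(self.lower_inputs)
        return self.upper_outputs

def chain(layers, suts, initial_inputs):
    """Feed each layer's Upper Interface into the next layer's
    Lower Interface, from the overlay layer up to the user layer."""
    inputs = initial_inputs
    for layer, sut in zip(layers, suts):
        layer.lower_inputs = inputs
        inputs = layer.run(sut)
    return inputs

# Dummy SUTs: each layer adds its own measurement to the outputs.
layers = [LayerBenchmark("overlay"), LayerBenchmark("application"),
          LayerBenchmark("user")]
suts = [lambda i, n=l.name: {**i, n: "measured"} for l in layers]
result = chain(layers, suts, {})
print(result)  # {'overlay': 'measured', 'application': 'measured', 'user': 'measured'}
```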
The vertical branch (shown in Figures 4 and 5) covers the cross-layer design and the dependencies of accuracy and search-time benchmarking. Each of these benchmarking topics is analyzed at the three benchmarking layers. The analysis covers both the metrics and the most important parameters influencing the measurements.

*(Figures 4–5: cross-layer benchmarking of accuracy and search time.)*

## Accuracy

In
order to benchmark the accuracy of the search system, it is necessary
to have a ground truth, which may be defined as a full knowledge of all
data stored in the system. This ground truth serves as a reference
level for benchmarking accuracy. In the case of benchmarking multimedia, a ground truth is usually a collection of manually annotated media files; the manual annotation ensures that the annotations are accurate. In the case of search for images, there are several
requirements for the reference collection. First,
although the search system may have access to an almost unlimited
number of images, having a voluminous benchmark database is not
critical (10,000 images being enough) [8]. The second
consideration is the content of the images. To make the measurements
coherent with the real environment of the application, the content
should be natural (e.g. photographs) and complex (semantically rich and
multi-object). The available sources of such annotated images are the annotated collections intended for benchmarking, which are typically available to the research community free of charge. The
content of the databases varies from natural images [1] to strongly
artificial multi-angle shots of single objects on a uniform background
[3]. The
existing evaluation metrics for accuracy can be divided into
quantitative and qualitative metrics. The former are measured objectively and refer to the broadly understood performance of the system. The latter refer to the quality of the system. An example of a
qualitative metric is a Mean Opinion Score (MOS) scale, which was
initially standardized by the International Telecommunication Union
[6]. In this metric the quality of the system is subjectively assessed
by the users in a five-grade scale, where 5 is the best quality and 1
is the worst. Another example of a qualitative metric is the R-factor, which may be utilized in a way similar to MOS; it is used for the subjective evaluation of speech quality in voice transmission systems [7].

There are numerous quantitative metrics for the assessment of search accuracy in the retrieval process [10]. The most commonly used are Precision and Recall. Precision is the number of detections (defined as the number of relevant items detected) divided by the total number of returned items. Recall is defined as the number of detections divided by the total number of relevant items in the system.

The methodology for the quantitative measurement of search accuracy yields numeric values which describe several aspects of the system. In order to draw conclusions about the accuracy of the system, evaluation methods need to be defined. The recommended evaluation method is Retrieval Effectiveness (the comparison of precision versus recall). The best evaluation method is the one that reveals the weaknesses of the benchmarked system. An overview of the recommended evaluation methods is presented in [10].

## Time-of-Search

The
time-of-search is, simply speaking, the time from the moment of querying to the moment of the presentation of the results to the user. The time-of-search problem is complex in the world of P2P overlay networks because the search process is composed of several stages. An example model of query delay for a structured P2P overlay network is covered in detail in [2].

In fact, a trade-off between accuracy and search time exists in Content Delivery Networks. Content Delivery Networks are based on distributed platforms such as peer-to-peer networks, and their accuracy is closely correlated with time. The queries used to search for content in a CDN depend on the workload and environment of the system. The delay depends directly on the network conditions and on the peer-to-peer overlay used to support the content. Furthermore, it must also be taken into account that a query can be launched in parallel along different paths in order to reach as many peers as possible. The query responses will therefore arrive at different times. The more time is allowed for receiving responses, the higher the accuracy will be; however, the delay will also increase. This trade-off must be adjusted, and QoE can be a good mechanism to do so.

P2P streaming systems comprise both on-demand streaming of stored content, such as Video-on-Demand, and live streaming. These distribution approaches are characterized for the FBM by the following elements:
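The accuracy/delay trade-off discussed above can be illustrated with a small simulation: responses holding relevant items arrive after random delays, and waiting longer before answering raises recall at the cost of delay. The numbers and names are purely illustrative; precision and recall follow the definitions given in the Accuracy section:

```python
import random

def precision_recall(returned, relevant):
    """Precision and Recall as defined earlier: detections are the
    relevant items among the returned ones."""
    detections = len(set(returned) & set(relevant))
    precision = detections / len(returned) if returned else 0.0
    recall = detections / len(relevant)
    return precision, recall

random.seed(7)
relevant = {f"item-{i}" for i in range(20)}
# Each relevant item is held by some peer whose response arrives after
# a random delay (seconds); waiting longer collects more responses.
arrival = {item: random.uniform(0.0, 2.0) for item in relevant}

for deadline in (0.5, 1.0, 2.0):
    returned = [it for it, t in arrival.items() if t <= deadline]
    p, r = precision_recall(returned, relevant)
    # Here every returned item is relevant, so precision stays 1.0;
    # in a real overlay, irrelevant responses would lower it.
    print(f"deadline={deadline:.1f}s recall={r:.2f}")
```

Recall grows monotonically with the deadline, which is exactly the trade-off a QoE-driven benchmark would have to balance.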
To provide new input, comments, etc., please send an email to Piotr Srebrny, Gareth Tyson, Andreas Mauthe or Thomas Plagemann.