Software testing is essentially an exercise in continuous exploration, learning and questioning. It becomes especially interesting and challenging when the application under test is as complex as Maps. You have probably used applications like Google Maps or Yahoo Maps. Their primary use is to help users find a route: the user supplies a source and a destination, and the application returns directions from one to the other. From that description the application may sound simple, but testing it poses numerous challenges.
As a tester, you need to identify the relevant queries and assess the quality of the results the system produces for them. During beta testing of the application, we collected thousands of queries and input data entered by end users. To give an idea of the volume, there were more than 8000 queries for every city, for example "Hotels in Mumbai", "Escort Mumbai" and "Taj Mumbai". Finding the relevant queries in this data is a difficult and time-consuming task.
This data can be analyzed for relevant queries in two ways: assign people to review it by hand, or write a smart tool that uses Artificial Intelligence techniques. Since human resources are very expensive :) , we decided to develop a tool to classify the input data.
After looking at the various possible solutions, we decided to use a Bayesian classifier. For those interested in knowing more about it, this is what Wikipedia says about the underlying theorem:
Bayes' theorem (also known as Bayes' rule or Bayes' law) is a result in probability theory, which relates the conditional and marginal probability distributions of random variables. In some interpretations of probability, Bayes' theorem tells how to update or revise beliefs in light of new evidence a posteriori.
The probability of an event A conditional on another event B is generally different from the probability of B conditional on A. However, there is a definite relationship between the two, and Bayes' theorem is the statement of that relationship.
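In symbols, for two events A and B with P(B) > 0, the relationship described above is usually written as:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}, \qquad P(B) > 0
```

Here P(A) is the probability of A before B is observed, and P(A | B) is the revised probability once B is known.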
Classifiers based on Bayes' theorem are well known from email spam filtering. A spam filter typically has a large set of mail already labelled as good mail or spam, and it works on the probability that certain words appear in spam rather than in normal email. The filter also learns from its users every time one of them hits the "report spam" or "not spam" button.
So we decided to write our own tool based on Bayes' theorem, with the capability of learning what is good data and what is bad data. The tool learns how to classify data based on how we train it. In simple terms, its input is a definition of what is good and what is bad, plus sample data; based on this, it classifies data as good or bad, as simple as that. To classify a set of text, we first have to teach the tool what is good and what is bad. During training, the classifier keeps track of how often words show up in each category.
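To make the training idea concrete, here is a minimal sketch of a word-count based naive Bayes classifier. It is an illustration only, not the tool described in this post: the class name `NaiveBayesSketch`, its methods, the add-one smoothing and the good/bad labels given to the sample queries are all assumptions made for the example.

```ruby
# Minimal naive Bayes text classifier (illustrative sketch only).
# All names here are hypothetical; this is not the original tool's code.
class NaiveBayesSketch
  def initialize(*categories)
    # @word_counts[category][word] => how often the word was seen in that category
    @word_counts = {}
    # @doc_counts[category] => number of training examples seen for that category
    @doc_counts = Hash.new(0)
    categories.each { |category| @word_counts[category] = Hash.new(0) }
  end

  # Record the words of one labelled example.
  def train(category, text)
    @doc_counts[category] += 1
    tokenize(text).each { |word| @word_counts[category][word] += 1 }
  end

  # Return the category with the highest log-probability score.
  def classify(text)
    scores(text).max_by { |_category, score| score }.first
  end

  # Log-probability score of the text for every category.
  def scores(text)
    total_docs = @doc_counts.values.sum.to_f
    vocabulary = @word_counts.values.flat_map(&:keys).uniq.size
    @word_counts.each_with_object({}) do |(category, counts), result|
      total_words = counts.values.sum
      # Prior: share of training examples that belong to this category.
      score = Math.log(@doc_counts[category] / total_docs)
      tokenize(text).each do |word|
        # Add-one (Laplace) smoothing so unseen words do not zero out the score.
        score += Math.log((counts[word] + 1.0) / (total_words + vocabulary))
      end
      result[category] = score
    end
  end

  private

  # Lower-case the text and split it into alphanumeric words.
  def tokenize(text)
    text.downcase.scan(/[a-z0-9]+/)
  end
end

# Example training data; the labels are made up for illustration.
classifier = NaiveBayesSketch.new(:good, :bad)
classifier.train(:good, "hotels in mumbai")
classifier.train(:good, "taj mumbai")
classifier.train(:bad, "escort mumbai")
puts classifier.classify("hotels near taj")
```

The scores are summed as logarithms rather than multiplied as raw probabilities, which avoids numeric underflow on longer texts. With so little training data the result is only indicative; a real run would train on many labelled queries per category.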
Implementation in Ruby: continued on the next page.