Implementation
This tool was developed in Ruby because Lucas Carlson's Classifier library is already available as the classifier gem. This library provides a naive Bayesian classifier. More information about it can be found here. In our implementation, the following code reads three files:
- good.yml
- not_good.yml
- input file
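The layout of good.yml and not_good.yml is not shown in this article. Because the script iterates over each loaded file and trains on every item, a flat YAML list of strings is a reasonable assumption. The sketch below works under that assumption: the sample phrases and the helper name load_examples are made up for illustration. It writes such a file to a temporary directory and loads it back.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical sample phrases; the real training data is not shown in the article.
SAMPLE_GOOD = ['clean rooms and friendly staff', 'great location near the station']

# Hypothetical helper: load a training file and check that it is
# a flat list of strings (the layout assumed in this sketch).
def load_examples(path)
  examples = YAML.load_file(path)
  unless examples.is_a?(Array) && examples.all? { |e| e.is_a?(String) }
    raise ArgumentError, "#{path} must contain a YAML list of strings"
  end
  examples
end

# Write the sample phrases as YAML into a temporary directory, then read them back.
def demo_round_trip
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'good.yml')
    File.write(path, SAMPLE_GOOD.to_yaml)
    load_examples(path)
  end
end

puts demo_round_trip.inspect
```

Checking the layout up front gives a clear error message instead of a confusing failure later, when the classifier is trained on something that is not a string.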
To run the script, we need to give two command line arguments: the city name and the input file name. Based on the definitions of good and not good in the training files, the script creates a directory named after the city and writes good.txt and nogood.txt into that directory, containing the input lines classified as good or not good respectively.
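The directory handling described above can be sketched separately from the classification. The helper name output_paths is hypothetical, and the sketch uses FileUtils.mkdir_p and File.join so that it also works when the city directory already exists and on systems that do not use backslashes as path separators.

```ruby
require 'fileutils'

# Hypothetical helper: build the output locations for one city.
# base_dir defaults to the current working directory, as in the script.
def output_paths(city_name, base_dir = Dir.getwd)
  city_dir = File.join(base_dir, city_name.to_s.downcase)
  FileUtils.mkdir_p(city_dir) # does not raise if the directory already exists
  {
    good: File.join(city_dir, 'good.txt'),
    not_good: File.join(city_dir, 'nogood.txt')
  }
end
```

Assuming the script were saved as classify.rb (a made-up name), it would be started with something like `ruby classify.rb CityName input.txt`, and the two result files would then appear under a lower-cased directory named after the city.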
require 'yaml'
require 'stemmer'
require 'classifier'

if ARGV.empty?
  puts "*** You should supply CityName and Input File name to the script ***"
elsif ARGV[1]
  puts "I am searching for the city #{ARGV[0]}"
  puts "The input file is #{ARGV[1]}"
  inputfile = ARGV[1].to_s.downcase
  pwd = Dir.getwd
  city = ARGV[0].to_s.downcase
  # Create the output directory for this city unless it already exists
  Dir.mkdir(city) unless File.directory?(city)
  # Load previous classifications
  good = YAML.load_file('good.yml')
  not_good = YAML.load_file('not_good.yml')
  data = File.open(inputfile, "r")
  goody = File.open(File.join(pwd, city, "good.txt"), "a")
  nogood = File.open(File.join(pwd, city, "nogood.txt"), "a")
  classifier = Classifier::Bayes.new('good', 'No good')
  # Train the classifier
  not_good.each { |bad_one| classifier.train_no_good bad_one }
  good.each { |good_one| classifier.train_good good_one }
  # Classify each input line and append it to the matching file
  while line3 = data.gets
    if classifier.classify(line3) == "Good"
      goody.write line3
    else
      nogood.write line3
    end
  end
  # Close the files so that buffered output is written to disk
  [data, goody, nogood].each(&:close)
else
  puts "*** The second argument, the name of the input file, is required ***"
end
Quality Of Results
The quality of the results depends on how much training the classifier has been given. It is a kind of learning system, so the results improve as it is trained on more examples. The major benefit of this approach is the reduction in human effort required to classify the data. There are many similar applications where human intervention is required to decide what is good and what is bad, and a properly trained classifier like this one can be helpful in those situations.
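To make the link between training and result quality concrete, here is a tiny word-count naive Bayes classifier with add-one smoothing. It is only an illustrative sketch and not the implementation used by the classifier gem: the class name TinyBayes, its methods and the sample phrases are all made up for this example.

```ruby
# A tiny word-count naive Bayes classifier with add-one smoothing.
# Illustrative sketch only; this is not the classifier gem's implementation.
class TinyBayes
  def initialize(*categories)
    @word_counts = categories.to_h { |c| [c, Hash.new(0)] }
    @doc_counts = categories.to_h { |c| [c, 0] }
  end

  # Count the words of one training text under the given category.
  def train(category, text)
    @doc_counts[category] += 1
    tokenize(text).each { |word| @word_counts[category][word] += 1 }
  end

  # Return the category with the highest log-probability score.
  def classify(text)
    words = tokenize(text)
    vocabulary = @word_counts.values.flat_map(&:keys).uniq.size
    total_docs = @doc_counts.values.sum
    @word_counts.keys.max_by do |category|
      counts = @word_counts[category]
      total_words = counts.values.sum
      # Smoothed prior, so an untrained category never produces log(0)
      prior = Math.log((@doc_counts[category] + 1.0) / (total_docs + @doc_counts.size))
      words.sum(prior) do |word|
        Math.log((counts[word] + 1.0) / (total_words + vocabulary + 1.0))
      end
    end
  end

  private

  # Split text into lower-case words.
  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end
end

bayes = TinyBayes.new(:good, :not_good)
bayes.train(:good, 'clean rooms and friendly staff')
bayes.train(:not_good, 'dirty rooms and rude staff')
puts bayes.classify('friendly and clean')

# Words the classifier has never seen carry no signal yet.
# One more training example is enough to change that.
bayes.train(:not_good, 'noisy street at night')
puts bayes.classify('noisy street')
```

Before the third training call the words "noisy" and "street" are unknown to both categories, so the classifier has nothing to decide on. After it, they count as evidence for the not_good category. The same effect, at a larger scale, is why a well-populated good.yml and not_good.yml matter.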
I hope you find this article interesting and will be able to use this approach if you need to classify data for your application.