Architecture

Page history last edited by PBworks 14 years, 4 months ago

This wiki page discusses the high-level component architecture of OpenMatchEngine.

 

 

Search

 

The Search component is responsible for identifying the likely matching records (a.k.a. potential candidates) of a record. The idea is to reduce the number of redundant comparisons by generating an array of fingerprinting bands for a record, such that every likely matching record has at least one of its fingerprints inside one of those bands.

 

Each record will have n fingerprints and m fingerprinting bands. A fingerprinting band has a start fingerprint and an end fingerprint. The number of fingerprints and fingerprinting bands will differ from record to record. Every record will have at least one of its fingerprints falling within at least one of its own fingerprinting bands.
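
The band lookup described above could be sketched roughly as follows. This is only an illustration, not OpenMatchEngine's actual code: the class name FingerprintIndex, its methods, and the choice of integer fingerprints with inclusive bands are all assumptions made for the example.

```python
from bisect import bisect_left


class FingerprintIndex:
    """Hypothetical sorted index of (fingerprint, record_id) pairs."""

    def __init__(self):
        self._entries = []  # kept sorted by fingerprint

    def add(self, record_id, fingerprints):
        # Assumption: fingerprints are integers, so they can be ordered.
        self._entries.extend((fp, record_id) for fp in fingerprints)
        self._entries.sort()

    def candidates(self, bands):
        """Return ids of records with a fingerprint inside any (start, end) band."""
        found = set()
        for start, end in bands:
            # First entry with fingerprint >= start.
            lo = bisect_left(self._entries, (start,))
            # First entry with fingerprint > end (bands are inclusive).
            hi = bisect_left(self._entries, (end + 1,))
            found.update(record_id for _, record_id in self._entries[lo:hi])
        return found


index = FingerprintIndex()
index.add("r1", [10, 55])
index.add("r2", [12, 90])
index.add("r3", [300])

# Bands generated for some query record (values are made up).
likely_matches = index.candidates([(8, 15), (50, 60)])
```

Only the records returned by candidates() would then be handed to the Match component, which is where the reduction in comparisons comes from.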

 

Match

 

The Match component is responsible for fuzzy matching between two records. The idea is to generate a similarity score (also called a confidence score or match score) for the two records being compared, based on how far apart they are.
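
One way to turn "how far apart" into a score is a normalized edit distance. The sketch below assumes records are compared as plain strings and uses Levenshtein distance; the page does not say which distance measure OpenMatchEngine uses, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def match_score(a, b):
    """Similarity in [0, 1]; 1.0 means identical, 0.0 means nothing in common."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty values are treated as identical
    return 1.0 - edit_distance(a, b) / longest


score = match_score("Jon Smith", "John Smith")
```

A caller would typically compare the score against a threshold to decide whether two records count as a match; the threshold value is a tuning decision not covered here.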

 

RulesProcessor

 

The RulesProcessor component is responsible for applying various sets of rules to a record as it is being searched and matched. One example of a rule is to ignore "Inc." and "Corp" in a record.
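
A minimal sketch of such a processor, assuming a rule is simply a function from a field value to a cleaned field value. The helper make_ignore_rule and the apply method are hypothetical names chosen for the example, not the project's API.

```python
def make_ignore_rule(words):
    """Build a rule that drops the given words (case-insensitive) from a value."""
    ignored = {w.lower().rstrip(".") for w in words}

    def rule(value):
        kept = [tok for tok in value.split()
                if tok.lower().strip(".,") not in ignored]
        return " ".join(kept)

    return rule


class RulesProcessor:
    """Applies an ordered list of rules to a field value."""

    def __init__(self, rules):
        self.rules = list(rules)

    def apply(self, value):
        for rule in self.rules:
            value = rule(value)
        return value


processor = RulesProcessor([make_ignore_rule(["Inc.", "Corp"])])
cleaned = processor.apply("Acme Widgets Inc.")
```

Because the rule works on whole tokens, a name such as "Incorporated Goods" is left untouched.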

 

Rules

 

Rules represents a knowledge base of rules that must be processed as a record is being searched or matched.

 

Stabilization

 

 

The Stabilization component is responsible for stabilizing a record against phonetic and typographical errors.

NYSIIS and Double Metaphone are examples of stabilization algorithms.
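
To illustrate what stabilization does, the sketch below uses American Soundex, a simpler phonetic code than the NYSIIS and Double Metaphone algorithms named above. It is a stand-in chosen for brevity, not one of the algorithms the page lists. Names that sound alike map to the same code, so spelling variants compare as equal.

```python
# Consonant groups and their Soundex digits.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character American Soundex code of a name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not break a run of equal codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (result + "000")[:4]


same_code = soundex("Smith") == soundex("Smyth")
```

With this stand-in, "Robert" and "Rupert" both stabilize to R163, so a typo or phonetic variant in one record no longer prevents it from lining up with the other.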
