Understanding Duplicate Document Detection

This article explains how similar-document search works and which factors are considered when detecting duplicates.

Written by Romeo Bellon
Updated over 2 weeks ago

Receiptor AI's document management system includes a duplicate detection feature that helps you keep your financial records clean and organized. The sections below cover the detection criteria, the order in which the checks run, and the benefits of linking duplicates.

Duplicate Detection Criteria

Receiptor uses a multi-factor approach to determine if two documents are duplicates. The system considers the following criteria:

  1. Exact ID Match: If the documents have the same invoice ID, receipt ID, or transaction ID, they are considered exact duplicates and will be automatically linked.

  2. Amount and Date Match: Documents are considered duplicates if their transaction amounts (after currency conversion) and their transaction dates are identical or fall within the following tolerances:

    • Amount Tolerance: 5 cents

    • Date Tolerance: 1 day

  3. Vector Similarity: Receiptor AI generates a unique vector embedding for each document using natural language processing. If the vector similarity between two documents is above 80%, they are considered potential duplicates.

  4. Document Quality: Receiptor AI also evaluates the overall quality of the documents, considering factors like document type, completeness of information, and presence of key identifiers. To be considered duplicates, documents must meet a minimum quality threshold (70%).
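
The amount/date tolerances and the similarity threshold listed above can be illustrated with a minimal Python sketch. The function names, the choice of cosine similarity as the similarity measure, and the treatment of the tolerances as inclusive bounds are assumptions made for illustration; this article does not specify how Receiptor AI implements these checks internally.

```python
import math
from datetime import date
from decimal import Decimal

# Thresholds quoted in this article.
AMOUNT_TOLERANCE = Decimal("0.05")  # 5 cents, after currency conversion
DATE_TOLERANCE_DAYS = 1             # 1 day
SIMILARITY_THRESHOLD = 0.80         # 80%


def amount_and_date_match(amount_a: Decimal, date_a: date,
                          amount_b: Decimal, date_b: date) -> bool:
    """Return True if both the amounts and the dates are within tolerance.

    Assumptions: amounts are already converted to a common currency,
    and the tolerances are inclusive.
    """
    amount_ok = abs(amount_a - amount_b) <= AMOUNT_TOLERANCE
    date_ok = abs((date_a - date_b).days) <= DATE_TOLERANCE_DAYS
    return amount_ok and date_ok


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embeddings (an assumed metric)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero embedding as dissimilar
    return dot / (norm_u * norm_v)


def is_potential_duplicate(u: list[float], v: list[float]) -> bool:
    """Flag a pair whose similarity is strictly above the 80% threshold."""
    return cosine_similarity(u, v) > SIMILARITY_THRESHOLD
```

For example, a receipt for 19.99 dated March 1 and one for 20.03 dated March 2 differ by 4 cents and one day, so `amount_and_date_match` would treat them as a match under these assumptions.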

How Receiptor AI's Duplicate Detection Algorithm Works

Duplicate Detection Workflow

When Receiptor AI processes a new document, it follows these steps to detect potential duplicates:

  1. ID Match Check: Receiptor AI first checks whether the new document shares an invoice ID, receipt ID, or transaction ID with any existing document. If a match is found, the documents are automatically linked as duplicates.

  2. Amount and Date Match: If no ID match is found, Receiptor AI checks whether the transaction amount and date of the new document match those of any existing document within the defined tolerances (5 cents and 1 day).

  3. Vector Similarity Evaluation: If the amount and date do not match any existing documents, Receiptor calculates the vector similarity between the new document and all other relevant documents. Documents with a vector similarity above 80% are considered potential duplicates.

  4. Quality Verification: For potential duplicates identified based on vector similarity, Receiptor checks if the documents meet the minimum quality threshold of 70%. Only documents that pass this check are considered true duplicates.

  5. Linking and Updating: Once Receiptor AI has identified the best matching document(s), it links the new document to the existing one(s) and updates the database accordingly.
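
The steps above can be sketched as a cascade in which each check runs only if the previous one found nothing. This is a hypothetical illustration, not Receiptor AI's actual code: the `Document` record, its field names, the `find_duplicates` function, the use of cosine similarity, the 0-to-1 quality score, and the requirement that both documents meet the quality threshold are all assumptions. Step 5 (linking and updating the database) is left out.

```python
import math
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal

AMOUNT_TOLERANCE = Decimal("0.05")  # 5 cents
DATE_TOLERANCE_DAYS = 1             # 1 day
SIMILARITY_THRESHOLD = 0.80         # 80%
QUALITY_THRESHOLD = 0.70            # 70%


@dataclass
class Document:
    # Hypothetical record; field names are illustrative only.
    name: str
    amount: Decimal            # assumed already converted to one currency
    txn_date: date
    embedding: list[float]
    quality: float             # assumed score between 0 and 1
    ids: dict[str, str] = field(default_factory=dict)  # e.g. {"invoice": "INV-1"}


def _cosine(u: list[float], v: list[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def find_duplicates(new: Document,
                    existing: list[Document]) -> tuple[str, list[Document]]:
    """Return the name of the check that matched and the matching documents."""
    # Step 1: same invoice, receipt, or transaction ID.
    matches = [d for d in existing
               if any(new.ids.get(kind) == value for kind, value in d.ids.items())]
    if matches:
        return "id", matches

    # Step 2: amount and date within the tolerances.
    matches = [d for d in existing
               if abs(new.amount - d.amount) <= AMOUNT_TOLERANCE
               and abs((new.txn_date - d.txn_date).days) <= DATE_TOLERANCE_DAYS]
    if matches:
        return "amount_date", matches

    # Steps 3 and 4: similarity above 80%, then the quality check.
    matches = [d for d in existing
               if _cosine(new.embedding, d.embedding) > SIMILARITY_THRESHOLD
               and new.quality >= QUALITY_THRESHOLD
               and d.quality >= QUALITY_THRESHOLD]
    if matches:
        return "vector", matches

    return "none", []
```

Returning the name of the check alongside the matches keeps the exact ID match, which the article describes as automatically linked, distinguishable from the weaker similarity-based matches.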

Benefits of Duplicate Detection

Receiptor's duplicate detection feature provides several key benefits:

  • Accurate Financial Records: By automatically linking related documents, Receiptor ensures that your financial records are complete and accurate, reducing the risk of missed transactions or double-counting.

  • Streamlined Bookkeeping: When Receiptor detects duplicates, it can automatically handle tasks like updating the linked documents, reducing the manual effort required for bookkeeping.

  • Improved Audit Readiness: Clearly identified and linked duplicate documents make it easier to prepare for financial audits, as all relevant information is readily available.

  • Enhanced Reporting and Analysis: Receiptor's duplicate detection enables more reliable financial reporting and analysis, as the data is free from duplicates and inconsistencies.

If you have any more questions about Receiptor AI's duplicate detection capabilities, please feel free to let us know.
