FORUMS > SentryFile Version 5 Topics > SF5 - API Integration / Programming Assistance
Topic Title: OCR'ing a Document When Adding it?
Created On Thu September 18, 2014 12:37 PM
cearl

Posts: 5
Joined: Jun 2014

Thu September 18, 2014 12:37 PM

Appreciate the support forums, hoping one of the developers can help me out with this one!

I'm adding customer purchase orders to one of our Cabinets programmatically. My program scans our orders email, saves the attachments (converting HTML emails to PDF files and saving those as well), and then adds the resulting document(s) to SentryFile. Everything is working great, but I just realised that SentryFile is not OCR'ing those documents, because I'm not telling it to!

So, I've looked at the Add_PutOCRFile() method, but it's not really clear how to use it. Do I call it in conjunction with Add_PutFile or instead of it? Could someone please explain how I can add a PDF file and request the system to OCR it as well? Thank you! My current method for adding the customer orders to the document management system is posted here:

        internal static bool AddToDocMgt( string fileName, string subject, string message )
        {
            bool result = false;
            try
            {
                if ( !string.IsNullOrEmpty( Global.DocMgtSessionId ) )
                {
                    int cabinetId = Global.DocMgtInstance.Repository_RetrieveID( Global.DocMgtSessionId, "A/R Customer POs" );
                    int docId = Global.DocMgtInstance.Add_CreateRecord( Global.DocMgtSessionId, cabinetId );
                    int revisionId = Global.DocMgtInstance.Revision_RetrieveCurrentRevisionID( Global.DocMgtSessionId, cabinetId, docId );
                    Global.DocMgtInstance.Add_PutFile( Global.DocMgtSessionId, cabinetId, docId, Path.GetFileName( fileName ), "", File.ReadAllBytes( fileName ) );
                    // Add index data, first processed field
                    int procId = Global.DocMgtInstance.RepositoryField_RetrieveID( Global.DocMgtSessionId, Properties.Settings.Default.IdxProcessed, cabinetId );
                    if ( Global.DocMgtInstance.Add_Validate_YesNo( Global.DocMgtSessionId, procId, false ) )
                    {
                        Global.DocMgtInstance.Add_PutIndex_YesNo( Global.DocMgtSessionId, cabinetId, docId, procId, false );
                    }
                    // Now the Date Entered Field
                    int dateId = Global.DocMgtInstance.RepositoryField_RetrieveID( Global.DocMgtSessionId, Properties.Settings.Default.IdxOrderDate, cabinetId );
                    if ( Global.DocMgtInstance.Add_Validate_Date( Global.DocMgtSessionId, dateId, DateTime.Now ) )
                    {
                        Global.DocMgtInstance.Add_PutIndex_Date( Global.DocMgtSessionId, cabinetId, docId, dateId, DateTime.Now );
                    }
                    // Next, the Email Subject if not empty
                    if ( !string.IsNullOrEmpty( subject ) )
                    {
                        int subjectId = Global.DocMgtInstance.RepositoryField_RetrieveID( Global.DocMgtSessionId, Properties.Settings.Default.IdxSubject, cabinetId );
                        if ( Global.DocMgtInstance.Add_Validate_Text( Global.DocMgtSessionId, subjectId, subject ) )
                        {
                            Global.DocMgtInstance.Add_PutIndex_Text( Global.DocMgtSessionId, cabinetId, docId, subjectId, subject );
                        }
                    }
                    // Finally, the Email Body, again if not empty
                    if ( !string.IsNullOrEmpty( message ) )
                    {
                        int messageId = Global.DocMgtInstance.RepositoryField_RetrieveID( Global.DocMgtSessionId, Properties.Settings.Default.IdxBody, cabinetId );
                        if ( Global.DocMgtInstance.Add_Validate_Notes( Global.DocMgtSessionId, messageId, message ) )
                        {
                            Global.DocMgtInstance.Add_PutIndex_Notes( Global.DocMgtSessionId, cabinetId, docId, messageId, message );
                        }
                    }
                    result = Global.DocMgtInstance.Add_RecordIsActive( Global.DocMgtSessionId, cabinetId, docId );
                }
            }
            catch ( Exception e )
            {
                Global.ErrHandlingToConsole( "Processing.AddToDocMgt", e );
            }
            return result;
        }

 
SupportRep

Posts: 6587
Joined: Feb 2004

Fri September 19, 2014 8:56 PM

Actually, just add a line of code that calls: Add_OCRPDF_LowPriority

This will kick off a background process that OCRs the document!
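
A minimal sketch of where that call might sit relative to Add_PutFile, for illustration only. The parameter list of Add_OCRPDF_LowPriority is not given in this thread, so the (session, cabinet, document) signature here is an assumption to check against the API documentation. IDocMgt, RecordingDocMgt, and OcrSketch are hypothetical names that stand in for the real client so the snippet compiles on its own:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical slice of the API used in this sketch. Add_PutFile mirrors the
// argument order in the original post (the meaning of the fifth argument is
// not stated in the thread). The parameters of Add_OCRPDF_LowPriority are an
// ASSUMPTION; the thread only gives the method name.
public interface IDocMgt
{
    void Add_PutFile(string sessionId, int cabinetId, int docId,
                     string fileName, string extra, byte[] content);
    void Add_OCRPDF_LowPriority(string sessionId, int cabinetId, int docId);
}

// Stand-in that only records the order of calls, so the sketch runs
// without a SentryFile server.
public sealed class RecordingDocMgt : IDocMgt
{
    public List<string> Calls { get; } = new List<string>();

    public void Add_PutFile(string sessionId, int cabinetId, int docId,
                            string fileName, string extra, byte[] content)
        => Calls.Add("Add_PutFile:" + fileName);

    public void Add_OCRPDF_LowPriority(string sessionId, int cabinetId, int docId)
        => Calls.Add("Add_OCRPDF_LowPriority:" + docId);
}

public static class OcrSketch
{
    // Upload the file first, then queue background OCR for the same record.
    public static void PutFileAndQueueOcr(IDocMgt api, string sessionId,
        int cabinetId, int docId, string fileName, byte[] content)
    {
        api.Add_PutFile(sessionId, cabinetId, docId, fileName, "", content);
        api.Add_OCRPDF_LowPriority(sessionId, cabinetId, docId);
    }
}
```

In the AddToDocMgt method from the first post, the equivalent change would be a single extra line directly after the existing Add_PutFile call, using whatever arguments the real method turns out to require.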

Let us know if you need anything else!

-------------------------
-SentryFile Support
 
cearl

Posts: 5
Joined: Jun 2014

Fri September 19, 2014 10:39 PM

Perfect, thanks for the reply! Much appreciated.

Actually, one more question, if I may, about the built-in Full Text Indexing service. We've got SentryFile installed on a server running Windows Server 2012 R2 and it's running fine, but obviously I can't use the Microsoft Indexing Service for the full-text index, since the Indexing Service no longer exists on Server 2012.

So, in the course of testing all of our cabinets, auto-capture jobs, and the full-text indexing, it looks like we've ended up with some leftover junk in the built-in index folder (\SentryFile\App_Data\FullTextIndex). We've deleted all of the documents out of the various cabinets/repositories, but is there a procedure for clearing out the built-in indexing database? If I simply delete the contents of that folder, will it re-create an empty database or will I wreak havoc? LOL

Thanks again!
 
SupportRep

Posts: 6587
Joined: Feb 2004

Fri September 19, 2014 10:43 PM

Hmmm... Good question!

I think deleting those files will corrupt the index. So, I wouldn't do that.

Generally speaking, the index should maintain itself. When a document is added through the SentryFile interface or API, the system adds its keywords to the index. When a document is deleted through the SentryFile interface or API, its entries in the index are removed as well.

Why do you want to delete those files? Would it be possible to leave them as-is?

-------------------------
-SentryFile Support
 
cearl

Posts: 5
Joined: Jun 2014

Fri September 19, 2014 10:45 PM

No, it's absolutely possible to leave them. I was just wondering whether the index was self-maintaining or not. If the index entries are removed when the files are removed through the SentryFile interface, then I'll leave everything be and let it take care of itself. Thanks for the answer.
 