
searching and form filling software

(OP)
For projects we need to manually search through documents and find answers to the same questions on a recurring basis. Essentially there is a form that is filled with technical data pulled from documents that could either be from the customer or internally created. It is time consuming because of variations in the document structures.

What I am looking for is a software tool that I could point to a directory of documents and it would find the answers to the standard questions. All the data mining software I have looked at is geared towards either marketing or legal firms, and is actually more complex than is required.

Windows advanced search is not that effective on its own (or that is my experience anyway). It seems like a simple concept considering the power of searching, text mining and semantic searching tools on the market. Just find the answers/data for 30 repeating questions per project.

thanks
Snowshoe2

RE: searching and form filling software

When I was using Windows, I found Agent Ransack to be far, far superior to Windows Search.
Also, I hated that stupid dog.

Note also that Windows Search is preset to not bother searching certain kinds of files; MS does not go out of its way to make this point clear. There's a system setting for it, I forget where or how to change its behavior, but Google should help with that.

None of that will help with your problem.

Are the documents stored as word processor files, e.g. *.doc, or as scanned images, or what?

If they're in a form that the computer recognizes as text, you might be able to use *nix tools like grep and awk (Windows versions exist) to extract data from them. This becomes easier if the formats are standardized or the data fields are tagged somehow, but probably cannot be completely automated.

... But wait a sec. The hospitals I have visited in recent years have equipped every one of their computers with a document scanner, and they scan _everything_. I was thinking it was odd, because of the huge amount of space required to store all those documents as scanned images.

It occurs now that maybe they don't keep the images forever. If instead, they cached the scanned images and had them OCR'd and, er, mined, they could get whatever information they needed from each image, and just store the information, which takes up much less space than an image, and can be sorted, searched, etc.

A Google search on "data mining service", without the quotes, was quite productive. Clearly, multiple outfits have made a business out of data mining for other businesses, so maybe you don't have to buy the data mining software yourself.

Mike Halloran
Pembroke Pines, FL, USA

RE: searching and form filling software

(OP)
Thanks for the ideas Mike. Your hospital experience with computers pulling data and making it work is what I would like to create here. It is such a time waster looking for information; I just want to define what I need and let the computer do the work.

all the best
Snowshoe2

RE: searching and form filling software

snowshoe2,

Do you know what these questions are?

Could you write a FAQ?

Learn HTML. The language is dead simple. Composing it in NOTEPAD probably is simpler than using the commercial editors. A FAQ written in HTML allows you to answer the questions, and link to the more detailed documentation.

--
JHG

RE: searching and form filling software

Wrong thread, drawoh?

Mike Halloran
Pembroke Pines, FL, USA

RE: searching and form filling software

It should be straightforward to write a program in Python or Perl to parse your data and then output it to a text document. You could then write it to a text file with the proper spacing to fill out your forms.

You may, in fact, be able to use tools like awk, sed, and grep to do what you'd like to do without having to get into a full-fledged programming language like Python or Perl.

Not knowing how much variation you have from document to document, it's hard to say how successful an automated search will be... but if there are known keywords, etc. you can probably work out some logic to parse the files.
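
As a rough illustration of the keyword approach described above, here is a minimal Python sketch. It assumes the documents have already been converted to plain text and that each answer sits on the same line as its keyword, after a colon or equals sign. The field names, keywords, and function names are hypothetical placeholders, not anything taken from the OP's actual forms.

```python
import re
from pathlib import Path

# Hypothetical form fields mapped to the keywords that introduce them.
FIELDS = {
    "Design pressure": r"design\s+pressure",
    "Material": r"material",
}


def extract_fields(text, fields=FIELDS):
    """Return {field: value} using the first match for each keyword."""
    found = {}
    for name, keyword in fields.items():
        # Keyword, then ':' or '=', then the value up to the end of the line.
        pattern = re.compile(rf"{keyword}\s*[:=]\s*(.+)", re.IGNORECASE)
        match = pattern.search(text)
        found[name] = match.group(1).strip() if match else ""
    return found


def fill_form(values, width=20):
    """Lay out the answers as fixed-width 'label value' lines."""
    return "\n".join(f"{name:<{width}}{value}" for name, value in values.items())


def process_directory(folder):
    """Extract the fields from every .txt file in a folder."""
    results = {}
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        results[path.name] = extract_fields(text)
    return results
```

A field that comes back empty is a signal that the document words things differently, so a person still needs to review those cases by hand.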

RE: searching and form filling software

(OP)
Thanks flash3780,
there are known keywords that we can work with. I was hoping to find an off-the-shelf program that would just take a bit of setup.

RE: searching and form filling software

I think that grep and sed are probably the closest thing to an off-the-shelf solution that you'll come across. grep is a tool that parses text files and returns lines which match an expression (e.g. your keywords). sed can manipulate that text to output only the information that you want. grep can be piped into sed, so between the two, you should be able to grab lines containing the keywords you're looking for, trim any info you don't need from those lines, and dump it into a text file with stdout.
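
For anyone without grep and sed to hand, the same filter-then-trim pipeline can be sketched in Python. This is only an approximation of the idea: re.search stands in for grep and re.sub stands in for sed's substitute command. The function names and the "voltage" keyword in the usage note are made up for illustration.

```python
import re


def grep(lines, pattern):
    """Yield only the lines that match the pattern, ignoring case."""
    regex = re.compile(pattern, re.IGNORECASE)
    return (line for line in lines if regex.search(line))


def sed(lines, pattern, replacement):
    """Apply one regex substitution to every line."""
    regex = re.compile(pattern, re.IGNORECASE)
    return (regex.sub(replacement, line) for line in lines)


def extract(lines, keyword):
    """Keep lines containing the keyword, then strip everything up to and
    including the keyword and an optional ':' or '=' separator."""
    matched = grep(lines, keyword)
    trimmed = sed(matched, rf"^.*{keyword}\s*[:=]?\s*", "")
    return [line.strip() for line in trimmed]
```

Calling extract(open("spec.txt"), "voltage") on a hypothetical spec.txt would return whatever follows "voltage" on each matching line, ready to be written out to a text file.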

If you're dealing with Word documents, you may need a tool like wvWare (http://wvware.sourceforge.net/) or docx2txt (http://docx2txt.sourceforge.net/) to batch convert them to text files so that you can parse them more easily. I haven't tried those, so I can't vouch for how well they work... but the price is right.
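
If installing a converter is not an option, a .docx file is a zip archive whose main text lives in word/document.xml, so a bare-bones conversion can be sketched with the Python standard library alone. This is not how either tool above works internally, as far as I know; it is just a minimal alternative. It ignores headers, footers, and footnotes, and it does not handle the older binary .doc format. The function names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# WordprocessingML namespace used inside word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(source):
    """Return the body text of a .docx file, one paragraph per line."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{W}p"):
        # A paragraph's text is split across one or more <w:t> runs.
        paragraphs.append("".join(t.text or "" for t in para.iter(f"{W}t")))
    return "\n".join(paragraphs)


def convert_folder(folder):
    """Write a .txt file next to every .docx file in the folder."""
    written = []
    for path in sorted(Path(folder).glob("*.docx")):
        target = path.with_suffix(".txt")
        target.write_text(docx_to_text(path), encoding="utf-8")
        written.append(target)
    return written
```

Once the documents are plain text, the keyword searches discussed in this thread can run against the resulting .txt files.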

At the end of the day, you should be able to put together a short shell script to do what you want, I think.

RE: searching and form filling software

Textpad (http://www.textpad.com/) has a "find in files" function that works perfectly to find a certain string, word, etc. in all sorts of file types.

RE: searching and form filling software

Textpad has a batch mode? Neat. If the OP is still following the thread, an example of the text to be parsed would be helpful.
