Automating Windows Applications with Win32::OLE

Getting My Feet Wet

My first glimpse of the Internet happened at Lotus Development somewhere around 1994. A Mosaic demonstration duly impressed me. A few months later, working at a small Cambridge, MA company called Dataware brought me closer to the online revolution.

I landed a job at America Online in 1995 working for their browser team. They originally had their own browser, which they had purchased from CMGI. It was kind of cool at the time because it included a tabbed frames feature about eight years before the Gecko engine.

It was there that Pete Deschanes wrote a tool using Microsoft Visual Test to automate the America Online embedded web browser. I was especially interested in certain functions he was using that captured browser events.

Later, I moved on to a company in Chelmsford, MA that was using OLE Automation to drive the translation of Microsoft Word Documents into Fax documents. Rob Murtha explained a lot to me about OLE Automation and introduced me to Perl and Java.

Washed Up

The internet bubble burst for me in 2002 and I wound up stranded on the shores of Fidelity Investments as a manual tester for one of their many investment Web page groups. After a few weeks of manual testing, I was ready to automate several of my tasks.

There was a problem. There were plenty of available WinRunner licenses, but I was not allowed to use them because I was not part of their automation group. In fact, I was reprimanded and almost lost my job for being too persistent in asking to use one. So I decided to write my own automation tool.

I had experience with C, C++, Java, and Perl. I decided to start with a scripting language just to get a prototype going. I also thought it would be cool to have a real open source script language to write code with instead of something that a few engineers had developed exclusively for Web automation.

My process consisted of:

  • Ask a question.
  • Do some research.
  • Write some code.

Getting Started

The first thing I needed to do was to see if I could start IExplore.exe from Perl. I knew I did not want to simply start the process with system("C:\\Program Files\\Internet Explorer\\IExplore.exe"). Yes, I could get IE up and running that way, but I would not be able to do anything useful with it except kill it.

I noticed that ActivePerl contained an interesting module called Win32::OLE. I opened the OLE.pm file and began to read the comments.

Comments like this looked very promising:

This module provides an interface to OLE Automation from Perl. OLE Automation brings VisualBasic-like scripting capabilities and offers powerful extensibility and the ability to control many Win32 applications from Perl scripts.

This one also looked pretty good:

The MessageLoop() class method will run a standard Windows message loop, dispatching messages until the QuitMessageLoop() class method is called. It is used to wait for OLE events.

Keeping in mind that faith is the substance of things hoped for and the evidence of things not seen, I set about to write a Simple Automation Module for Internet Explorer using ActivePerl’s Win32::OLE.

Going back to the Win32::OLE documentation I found out how to start IE through the COM object. I translated some examples for Excel and Word and wound up with this:

$IE = Win32::OLE->new("InternetExplorer.Application")
    || die "Could not start Internet Explorer.Application\n";

That was nice, but nothing appeared on my computer screen. I could hear the hard drive making a sound like it was starting an application but I couldn’t see Internet Explorer. I decided to Google for some examples. The information out there was very sparse but I found something that set the visible attribute to 1:

$IE->{visible} = 1;

This time Internet Explorer appeared with a blank screen. It was a start. I figured that there were a bunch of IE processes that I could not see on my machine from my previous efforts, so I killed those using the task manager.
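One way to avoid leaving hidden IE processes behind is to give Win32::OLE->new() a destructor. The following is a minimal sketch, not the code I used at the time: it assumes the optional second argument to new() (a method name or CODE reference called when the object is destroyed) and the LastError() class method, both described in the Win32::OLE documentation. It only runs on Windows.

```perl
use strict;
use warnings;
use Win32::OLE;

# The second argument names an OLE method to call when the object is
# destroyed, so the browser is asked to quit even if the script dies.
my $IE = Win32::OLE->new("InternetExplorer.Application", "Quit")
    or die "Could not start InternetExplorer.Application: "
         . Win32::OLE->LastError() . "\n";

$IE->{visible} = 1;    # without this the window stays hidden

sleep 5;               # leave the window on screen for a moment

# Dropping the last reference triggers the "Quit" destructor.
undef $IE;
```

With the destructor in place there should be no stray processes to hunt down in the task manager after a failed run.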

Next, I started a free Microsoft tool called OLEVIEW.exe. This gave me a tree view of all automation objects registered on my machine. There were hundreds of them. I found the one called Internet Explorer (Ver. 1.0) and expanded the tree looking for methods. IWebBrowser2 looked interesting so I clicked on that and selected the View Type Info button. Up popped a new window with a list of methods. This was looking better all the time.

I clicked on a method called Navigate and saw:

[id(0x00000068), helpstring("Navigates to a URL or file.")]
void Navigate(
    [in] BSTR URL,
    [in, optional] VARIANT* Flags,
    [in, optional] VARIANT* TargetFrameName,
    [in, optional] VARIANT* PostData,
    [in, optional] VARIANT* Headers);

I decided to try it:

$IE->Navigate("http://www.google.com");

Now I had navigated to my first website with my new automation tool.

On the Road

I knew I was getting close to hitting the wall. I had seen through Google and newsgroups that lots of folks had reached this point in the game and turned around. Remembering back to Pete Deschanes’ Visual Test Tool, I was certain that I could make more progress if I could capture Internet Explorer’s events. I went back to the OLE.pm documentation to do a little more reading about events.

=item Win32::OLE->WithEvents(OBJECT[, HANDLER[, INTERFACE]])

This class method enables and disables the firing of events by
the specified OBJECT.

If I could just grok what OBJECT, HANDLER, and INTERFACE represented, I felt that I could get my events. I made some guesses.

The Object: that would be what Win32::OLE->new() returned. Everyone knows you instantiate an object with the new operator.

The Handler: I read further through OLE.pm:

The HANDLER argument to Win32::OLE->WithEvents() can either be a CODE reference or a package name. In the first case, all events will invoke this particular function. The first two arguments to this function will be the OBJECT itself and the name of the event. The remaining arguments will be event-specific.

Win32::OLE->WithEvents($Obj, \&Event);

Now I understood that WithEvents was going to tell Internet Explorer to call my Perl Handler whenever IE fired an event. I had to give the WithEvents call a reference to my subroutine like this:

\&Event

What was the Interface name going to be? I went back to OLEVIEW and looked through the interface folder. It looked like DWebBrowserEvents2 would deliver what I wanted.

This is what I came up with:

Win32::OLE->WithEvents($IE,\&Event,"DWebBrowserEvents2");

Now all I needed was to write an Event subroutine for IE to call. The OLE.pm comments told me that “the first two arguments to this function will be the OBJECT itself and the name of the event”, so I just used the example from the documentation.

sub Event {
    my ($Obj,$Event,@Args) = @_;
    print "Event triggered: '$Event'\n";
}

I noticed that there was a third argument called @Args and assumed that this was there to catch all other unknown parameters for each event.

How was I going to block my code to wait for the events to transpire? I returned to the Win32::OLE comments:

The MessageLoop() class method will run a standard Windows message loop, dispatching messages until the QuitMessageLoop() class method is called. It is used to wait for OLE events.

I added this code, just to see if I could capture IE events:

Win32::OLE->MessageLoop();

I’ve Been to the Mountain Top

It was quite a moment when I saw these events pouring out of my nine lines of code:

Event triggered: 'CommandStateChange'
Event triggered: 'OnVisible'
Event triggered: 'PropertyChange'
Event triggered: 'BeforeNavigate2'
Event triggered: 'DownloadBegin'
Event triggered: 'StatusTextChange'
Event triggered: 'ProgressChange'
Event triggered: 'FileDownload'
Event triggered: 'DownloadComplete'
Event triggered: 'TitleChange'
Event triggered: 'NavigateComplete2'
Event triggered: 'OnQuit'

Internet Explorer was firing off events and its COM object was calling my Perl Event subroutine.
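The OLE.pm text quoted earlier says the HANDLER can also be a package name. Here is a hedged sketch of that variant, not what I actually wrote: as I read the documentation, Win32::OLE looks in the package for a subroutine named after the event and passes it the object followed by the event-specific arguments, without the event name. The package name IEEvents is my own invention, and the script only runs on Windows.

```perl
use strict;
use warnings;
use Win32::OLE qw(EVENTS);

my $IE = Win32::OLE->new("InternetExplorer.Application")
    or die "Could not start InternetExplorer.Application\n";

# Pass a package name instead of a CODE reference.
Win32::OLE->WithEvents($IE, "IEEvents", "DWebBrowserEvents2");

$IE->{visible} = 1;
$IE->Navigate("http://www.google.com");

# Blocks until the OnQuit handler below calls QuitMessageLoop(),
# which happens when the browser window is closed.
Win32::OLE->MessageLoop();

package IEEvents;    # hypothetical package name

# Called for the DocumentComplete event only.
sub DocumentComplete {
    my ($obj, @args) = @_;
    print "DocumentComplete fired\n";
}

# Called when the browser is closing; ends the message loop.
sub OnQuit {
    Win32::OLE->QuitMessageLoop();
}
```

Events with no matching subroutine in the package are simply not handled, which keeps the output much quieter than the catch-all Event subroutine.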

From here, I kept searching the newsgroups. I found one sentence where someone mentioned acquiring the DOM from the DocumentComplete Event. I knew this was a key, but how could I take this reference using Perl?

I read up about the DOM. I borrowed a book from someone at work. Microsoft has their own version of the DOM called DHTML and I came across their Web page. After reading this documentation for a while, I saw that the DOM could give me everything I needed to have a full-blown automation tool. All I needed was the reference.

Finding the DOM

I broke up my Event subroutine into pieces. I wanted to do something different for each triggered event. Specifically, I wanted to try to take a reference to the DOM when the DocumentComplete event triggered. My idea was that if I shifted out the first element of the @Args array, I would find the reference I was looking for. I rewrote the Event subroutine:

sub Event {
    my ($Obj,$Event,@Args) = @_;
    print "Here is the Event: $Event\n";
    if ($Event eq "DocumentComplete") {
        $IEObject = shift @Args;
        print "Here is my reference: $IEObject\n";
    }
}

This printed out:

Here is the Event: DocumentComplete
Here is my reference: Win32::OLE=HASH(0x1a524fc)

This was looking better all the time.
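Doing something different for each event can also be organized as a dispatch table instead of a chain of if statements. This is only a sketch of that idea: the names %handler_for and @log are hypothetical, and the calls at the bottom are stand-ins for what Win32::OLE would pass to Event, so the block runs without Internet Explorer.

```perl
use strict;
use warnings;

my @log;    # records what the handlers saw, newest last

# Hypothetical dispatch table: event name => handler.
my %handler_for = (
    DocumentComplete => sub {
        my ($obj, @args) = @_;
        push @log, "DocumentComplete with " . scalar(@args) . " argument(s)";
        return 1;
    },
    OnQuit => sub {
        push @log, "OnQuit";
        return 1;
    },
);

# Same signature as the Event subroutine above: object, event name,
# then the event-specific arguments.
sub Event {
    my ($obj, $event, @args) = @_;
    my $handler = $handler_for{$event};
    return 0 unless $handler;    # events without an entry are ignored
    return $handler->($obj, @args);
}

# Stand-in calls; in the real script Win32::OLE makes these.
Event(undef, "StatusTextChange", "Done");
Event(undef, "DocumentComplete", "fake-browser", "http://example.com/");
print "$_\n" for @log;
```

Adding support for another event then means adding one entry to the table rather than another branch in Event.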

I had a reference, but was it to the DHTML on the page that had just loaded? There was only one way to find out: could I use it to make a DHTML call? I looked on the Microsoft DHTML reference page for a property that would tell me I had a reference. URL looked like a good one, so I tried this code:

print "URL: " . $IEObject->URL . "\n";

That gave me nothing. I went back to OLEVIEW and found this interesting method:

[id(0x000000cb), propget, helpstring("Returns the active Document automation object, if any.")]
IDispatch* Document();

The Active document sounded good, so I tried:

print "URL: " . $IEObject->Document->URL . "\n";

This gave me:

URL: http://www.google.com/

Golden! My next step was to find a way to break out of the MessageLoop() I was in.

Win32::OLE->QuitMessageLoop();
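MessageLoop() blocks until QuitMessageLoop() is called, so a page that never fires DocumentComplete would leave the script waiting forever. A possible alternative, which I did not use in the proof of concept, is to poll with a timeout. The sketch below keeps the waiting logic in plain Perl so it runs anywhere; wait_until and $page_loaded are hypothetical names, and the Windows wiring in the comment assumes Win32::OLE->SpinMessageLoop() and Win32::Sleep() behave as documented.

```perl
use strict;
use warnings;

# Poll until $is_done->() returns true or $timeout seconds have passed.
# $pump is called between checks. Returns 1 on success, 0 on timeout.
sub wait_until {
    my ($is_done, $timeout, $pump) = @_;
    my $deadline = time + $timeout;
    until ($is_done->()) {
        return 0 if time > $deadline;
        $pump->();
    }
    return 1;
}

# On Windows the pieces might be wired up like this (untested sketch;
# $page_loaded would be set in the DocumentComplete branch of Event,
# and Win32::Sleep comes from the Win32 module):
#
#   wait_until(sub { $page_loaded }, 30,
#              sub { Win32::OLE->SpinMessageLoop(); Win32::Sleep(50) });

# Stand-in demonstration that runs without Internet Explorer.
my $ticks = 0;
my $ok = wait_until(sub { $ticks >= 3 }, 5, sub { $ticks++ });
print $ok ? "done after $ticks ticks\n" : "timed out\n";
```

The trade-off is a little extra code in exchange for a script that can report a failed page load instead of hanging.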

Going On Home

My last step was to do something more useful with my reference to the DHTML. It was time to write a subroutine that would enter text into an edit box. That would seal my proof of concept.

Other automation tools need you to make a GUI map (Mercury WinRunner) or an include file (Segue Silk) of every page you view before running the automation. I wanted something that would just look through the code that was already on the page and pick out a control on the fly using the power of regular expressions.

I found all of the methods and properties in the following example in the Microsoft DHTML API documentation. First I needed a name for my subroutine. SetEditBox seemed easy to understand.

sub SetEditBox {
}

Next I needed to pass in two parameters. The first would be the name of the control and the second would set the text.

sub SetEditBox {
    my ($name,$value) = @_;

I had to start with the document object from my reference to the DOM:

sub SetEditBox {
    my ($name,$value) = @_;
    $IEDocument = $IEObject->{Document};

To save time iterating, I made the assumption that edit boxes would only appear inside forms. I used the collection called forms to return all forms on the page.

$forms = $IEDocument->forms;

Now it was time to iterate.

for ($i = 0; $i < $forms->length; $i++) {
}

First I needed each item in the form collection.

$form = $forms->item($i);

Inside this iteration I wanted to find a specific element of the form with the name of the edit box.

if (defined($form->elements($name))) {
}

Inside this if statement I wanted to set the value of the edit box to the value passed into the subroutine.

$form->elements($name)->{value} = $value;

Then it was time to get out of Dodge so I wouldn’t waste time continuing the iteration.

return;

Here is the complete first version of the subroutine.

sub SetEditBox {
    my ($name, $value) = @_;
    my $IEDocument     = $IEObject->{Document};
    my $forms          = $IEDocument->forms;

    for (my $i = 0; $i < $forms->length; $i++) {
        my $form       = $forms->item($i);
        if (defined($form->elements($name))) {
            $form->elements($name)->{value} = $value;
            return;    # stop at the first matching control
        }
    }
}
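SetEditBox looks a control up by its exact name. The regular-expression idea mentioned earlier could be sketched as follows. This is an illustration rather than SAMIE's actual code: FindControls and SetEditBoxLike are hypothetical names, and the document object is passed in explicitly. The code assumes only what the subroutine above already relies on, namely that forms and elements are collections with length and item(), and that controls expose name and value properties.

```perl
use strict;
use warnings;

# Return every form control whose name matches $pattern (a qr// regex).
sub FindControls {
    my ($document, $pattern) = @_;
    my @found;
    my $forms = $document->forms;

    for (my $i = 0; $i < $forms->length; $i++) {
        my $elements = $forms->item($i)->elements;
        for (my $j = 0; $j < $elements->length; $j++) {
            my $element = $elements->item($j);
            my $name    = $element->{name};
            push @found, $element
                if defined $name && $name =~ $pattern;
        }
    }
    return @found;
}

# Set the value of every control whose name matches the pattern and
# return how many controls were changed.
sub SetEditBoxLike {
    my ($document, $pattern, $value) = @_;
    my @controls = FindControls($document, $pattern);
    $_->{value} = $value for @controls;
    return scalar @controls;
}
```

A call such as SetEditBoxLike($IEObject->{Document}, qr/^user/i, "samie") would then fill in every control whose name starts with "user", whatever the page author happened to call it exactly.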

Here is something very similar to the original proof-of-concept version of SAMIE:

use Win32::OLE qw(EVENTS);
my $URL = "http://samie.sf.net/simpleform.html";
my $IE  = Win32::OLE->new("InternetExplorer.Application")
    || die "Could not start Internet Explorer.Application\n";
Win32::OLE->WithEvents($IE,\&Event,"DWebBrowserEvents2");

$IE->{visible} = 1;

$IE->Navigate($URL);

Win32::OLE->MessageLoop();
SetEditBox("name","samie");

sub Event {
    my ($Obj,$Event,@Args) = @_;
    print "Here is the Event: $Event\n";
    if ($Event eq "DocumentComplete") {
        $IEObject = shift @Args;
        print "Here is my reference: $IEObject\n";
        print "URL: " .  $IEObject->Document->URL . "\n";
        Win32::OLE->QuitMessageLoop();
    }
}

sub SetEditBox {
    my ($name, $value) = @_;
    my $IEDocument     = $IEObject->{Document};
    my $forms          = $IEDocument->forms;

    for (my $i = 0; $i < $forms->length; $i++) {
        my $form       = $forms->item($i);
        if (defined($form->elements($name))) {
            $form->elements($name)->{value} = $value;
            return;    # stop at the first matching control
        }
    }
}

It made me laugh and tip my hat to Larry Wall to think that I had the basic proof of concept of a $3,000-per-seat automation tool in about thirty lines of Perl. See more at the SAMIE home page.
