Logic Programming with Perl and Prolog
by Robert Pratte
|
Pages: 1, 2
Now that I have loaded my Prolog database, I need to feed it some more information. I need to take my data, in Dot format, and translate it into something that my Prolog interpreter will understand. There are some modules out there that may be helpful, such as DFA::Simple, but since I can assume that my data will look a certain way--having written it from my other application--I will build my own simple parser. First, I am going to take a look at the data.
The visualization program created the diagram in Figure 1 from the code:
digraph family_tree {
{ jill [ color = pink ]
rob [ color = blue ] } -> { ann [ color = pink ]
joe [ color = blue ] } ;
{ sue [ color = pink ]
dan [ color = blue ] } -> { sara [ color = pink ]
mike [ color = blue ] } ;
{ nan [ color = pink ]
tom [ color = blue ] } -> sue ;
{ nan
jim [ color = blue ] } -> rob ;
{ kate [ color = pink ]
steve [ color = blue ] } -> dan ;
{ lucy [ color = pink ]
chris [ color = blue ] } -> jill ;
}

Figure 1. A family tree from the sample data
There are a few peculiarities worth mentioning here. First, it may seem that the all-lower-case names are a bit strange, but I am already preparing for the convention that data in Prolog is typically lower-case. Also, I inserted an extra space before the semicolons in an effort to make matching them easier. While both of these conventions are easy to code around, they seems to create extra questions when illustrating a point. Therefore, assume that the above Dot snippet illustrates the range of possible formats in the example. While the "real-world examples" may provide a richer set of possibilities, the fact that applications with defined behavior generated this data will limit the edge cases.
Returning to the data, it will be easiest to parse the Dot data using a simple state machine. Previously, I had defined some constants to represent states:
use constant { modInit => 0,
modTag => 1,
modValue => 2 };
Basically, I assume that anything on the left-hand side of the = is a parent and anything on the right is a child. Additionally, modifiers (in this case only color) begin with a left square-bracket and males have the blue modifier, whereas females are pink. I know that I have completed a parent-child relationship "block" when I hit the semicolon. Past these stipulations, if it isn't a character I know that I can safely ignore, then it must be a noun.
sub parse_dotFile {
##----------------------------------------
## Examine data a word at a time
##----------------------------------------
my @dotData = split( /\s+/, shift() );
my ( $familyBlock, $personName, @prologQry ) = ();
my $personModPosition = modInit;
my $relationship = 'parent';
for ( my $idx = 3; $idx < @dotData; $idx++ ) {
chomp( $dotData[$idx] );
SWITCH: {
## ignore
if ( $dotData[ $idx ] =~ /[{}=\]]/ ) {
last SWITCH; }
## begin adding attributes
if ( $dotData[ $idx ] eq '[' ) {
$personModPosition = modTag;
last SWITCH; }
## switch from parents to children
if ( $dotData[ $idx ] eq '->' ) {
$relationship = 'child';
last SWITCH; }
## end of this block
if ( $dotData[ $idx ] =~ /\;/ ) {
##-----------------------------------------
## Generate is_parent rules for Prolog
##-----------------------------------------
foreach my $parentInBlock ( @{ $familyBlock->{ parent } } ) {
foreach my $childInBlock ( @{ $familyBlock->{ child } } ) {
push( @prologQry,
"is_parent(${parentInBlock}, ${childInBlock})" );
}
}
$familyBlock = ();
$relationship = 'parent';
last SWITCH; }
## I have a noun, need to set something
else {
## I have a modifier tag, next is the value
if ( $personModPosition == modTag ) {
$personModPosition = modValue;
last SWITCH;
} elsif ( $personModPosition == modValue ) {
##--------------------------------------
## Set modifier value and reset
## We currently assume it is color
##--------------------------------------
if ( $dotData[ $idx ] eq 'blue' ) {
push( @prologQry, "is_male(${personName})" );
} else {
push( @prologQry, "is_female(${personName})" );
}
$personModPosition = modInit;
$personName = ();
last SWITCH;
} else {
##--------------------------------------
## Grab the name and id as parent or child
##--------------------------------------
$personName = $dotData[ $idx ];
push( @{ $familyBlock->{ $relationship } }, $personName );
}
}
}
}
return( \@prologQry );
}
Rather than simply pushing my new rules into the Prolog interpreter directly, I return an array that contains the full ruleset. I am doing this so that I can easily dump it to a file for troubleshooting purposes. I can simply write the rules to a file, and consult this file in a Prolog shell.
With a subroutine to parse my Dot file into Prolog rules, I can now push those rules into the interpreter:
##-------------------------------------------
## Read in Dot file containing relations
## and feed it into the Prolog instance
##-------------------------------------------
open( DOTFILE, 'family_tree.dot' ) or die "$! \n";
my $parsedDigraph = parse_dotFile( <DOTFILE> );
close( DOTFILE );
foreach ( @$parsedDigraph ) {
$prologDB->do("assert($_).");
}
Now I can easily query my Prolog database using the query method in AI::Prolog:
##-------------------------------------------
## Run the query
##-------------------------------------------
$prologDB->query( "is_cousin(joe, sara)." );
while (my $results = $prologDB->results) { print "@$results\n"; }
What Next?
Even though this is a trivial example, I think that it provides an idea of the powerful ways in which Perl can be supplemented with Prolog. Just within the context of evaluating genealogical data (a mainstay of Prolog tutorials and examples), it seems that a Perl/Prolog application that uses genealogical data from open source genealogical software or websites would be a killer application. The possibilities seem endless: rules based upon Google maps, mining information from online auctions or news services, or even harvesting information for that new test harness are all tremendous opportunities for the marriage of Perl and Prolog.
You must be logged in to the O'Reilly Network to post a talkback.
Showing messages 1 through 1 of 1.
- Alternate solution to an interesting problem
2005-12-20 15:14:07 trwww [Reply]
Thanks for the article. This is a problem space I have been thinking about a lot lately.
Your article has definitely made me aware of what Prolog is and made me want to start looking in to it to solve relationship problems.
After getting your example running, I solved the problem using my solution to this kind of thing. It involves a rdbms (mysql in this case), perl, and Class::DBI.
I am NOT saying which way is better, more efficient, more scaleable, etc, etc... It is just how I would have done it had I never read this article.
I am definitely interested in a discussion on how the two solutions might be merged together for optimization (think large data sets).
The beginning of the program sets up the Class::DBI subclass and two CDBI classes. The first class abstracts a table with two columns, id and name. The second class is a mapping class used to signify a parent <-> child relationship between two people.
The driver part of the code loops over each person in the table. For each person in the table, another loop over each person in the table is ran. Inside the looping code, a check is made to see if the two people are cousins. If they are, a message is printed to the screen.
There is some SQL after the __END__ marker to recreate the tables.
Here is the output of the program:
$ perl ancestry.pl
ann and sara are cousins
ann and mike are cousins
joe and sara are cousins
joe and mike are cousins
sara and ann are cousins
sara and joe are cousins
mike and ann are cousins
mike and joe are cousins
and now the code :0)
use warnings;
use strict;
package FamilyTree::DBI;
use base qw|Class::DBI|;
our($dsn, $user, $pass) = qw(dbi:mysql:test user pass);
__PACKAGE__->connection( $dsn, $user, $pass );
package FamilyTree::Person;
use base qw|FamilyTree::DBI|;
__PACKAGE__->table("people");
__PACKAGE__->columns(All => qw(id name));
__PACKAGE__->has_many(
children =>
[ 'FamilyTree::Relationship' => 'child' ] =>
'parent'
);
__PACKAGE__->has_many(
parents =>
[ 'FamilyTree::Relationship' => 'parent' ] =>
'child'
);
sub is_sibling {
my( $self, $person ) = @_;
return if ( $self->id == $person->id );
my $sql = 'SELECT DISTINCT parent FROM parentchild where child in (?, ?)';
my $sth = __PACKAGE__->db_Main->prepare( $sql );
$sth->execute( $self->id, $person->id );
my $count = @{ $sth->fetchall_arrayref };
$sth->finish;
return( $count == 2 ? 1 : 0 );
}
sub is_cousin {
my( $self, $person ) = @_;
return if ( $self->id == $person->id );
return if ( $self->is_sibling( $person ) );
my @parents = $self->parents;
# Danish word for aunt or uncle
my @onkel = $person->parents;
my $sql = <<' END_OF_SQL';
SELECT
count(id) AS counter
FROM
parentchild
WHERE
child IN (?, ?, ?, ?)
GROUP BY
parent
HAVING
counter > 1
END_OF_SQL
my $sth = __PACKAGE__->db_Main->prepare( $sql );
$sth->execute( map $_->id, @parents, @onkel );
my $result = $sth->fetchrow_array;
$sth->finish;
return( $result ? 1 : 0 );
}
package FamilyTree::Relationship;
use base qw|FamilyTree::DBI|;
__PACKAGE__->table("parentchild");
__PACKAGE__->columns(All => qw(id parent child));
__PACKAGE__->has_a( parent => 'FamilyTree::Person' );
__PACKAGE__->has_a( child => 'FamilyTree::Person' );
package main;
my $people = FamilyTree::Person->retrieve_all;
while ( my $person = $people->next ) {
my $cousins = FamilyTree::Person->retrieve_all;
while ( my $cousin = $cousins->next ) {
if ( $person->is_cousin( $cousin ) ) {
print( $person->name, ' and ', $cousin->name, " are cousins\n" );
}
}
}
__END__
CREATE TABLE people (
id int(10) unsigned NOT NULL auto_increment,
name varchar(31) NOT NULL default '',
PRIMARY KEY (id),
UNIQUE KEY id (id),
KEY id_2 (id)
) TYPE=MyISAM;
INSERT INTO people VALUES (NULL,'chris');
INSERT INTO people VALUES (NULL,'lucy');
INSERT INTO people VALUES (NULL,'jim');
INSERT INTO people VALUES (NULL,'nan');
INSERT INTO people VALUES (NULL,'tom');
INSERT INTO people VALUES (NULL,'steve');
INSERT INTO people VALUES (NULL,'kate');
INSERT INTO people VALUES (NULL,'jill');
INSERT INTO people VALUES (NULL,'rob');
INSERT INTO people VALUES (NULL,'sue');
INSERT INTO people VALUES (NULL,'dan');
INSERT INTO people VALUES (NULL,'ann');
INSERT INTO people VALUES (NULL,'joe');
INSERT INTO people VALUES (NULL,'sara');
INSERT INTO people VALUES (NULL,'mike');
CREATE TABLE parentchild (
id int(10) unsigned NOT NULL auto_increment,
parent int(10) unsigned NOT NULL default '0',
child int(10) unsigned NOT NULL default '0',
PRIMARY KEY (id),
UNIQUE KEY id (id),
KEY id_2 (id)
) TYPE=MyISAM;
INSERT INTO parentchild VALUES (NULL,1,8);
INSERT INTO parentchild VALUES (NULL,2,8);
INSERT INTO parentchild VALUES (NULL,3,9);
INSERT INTO parentchild VALUES (NULL,4,9);
INSERT INTO parentchild VALUES (NULL,4,10);
INSERT INTO parentchild VALUES (NULL,5,10);
INSERT INTO parentchild VALUES (NULL,6,11);
INSERT INTO parentchild VALUES (NULL,7,11);
INSERT INTO parentchild VALUES (NULL,8,12);
INSERT INTO parentchild VALUES (NULL,9,12);
INSERT INTO parentchild VALUES (NULL,8,13);
INSERT INTO parentchild VALUES (NULL,9,13);
INSERT INTO parentchild VALUES (NULL,10,14);
INSERT INTO parentchild VALUES (NULL,11,14);
INSERT INTO parentchild VALUES (NULL,10,15);
INSERT INTO parentchild VALUES (NULL,11,15);



