ComicSearch 1.0

Apisitt Rattana and Andrew Davison
Dept. of Computer Engineering
Prince of Songkla University
Hat Yai, Songkhla, Thailand
E-mail: ad@coe.psu.ac.th

15th March 2001
Last Updated: 21st November, 2001

Contents

  1. What is ComicSearch?
  2. What do I need to run ComicSearch?
  3. How to Install ComicSearch
  4. Using ComicSearch
  5. The Run Bar
  6. Database Sources
  7. Getting Past a Proxy/Firewall
  8. Extending the Local Database
  9. Software Used by ComicSearch
  10. Changes History
  11. Contacting Us


1. What is ComicSearch?

ComicSearch searches for comic covers and information using:


2. What do I need to run ComicSearch?

Your machine must have Java 2 installed, since ComicSearch is a Java application. Java can be obtained for free from http://java.sun.com/j2se/. Download the Software Development Kit (SDK) if you plan to write your own Java programs; otherwise the Java Runtime Environment (JRE) is sufficient for running ComicSearch.

The source, compiled code, support files, and documentation for ComicSearch only amount to a bit under 360K, so don't worry about disk space.

ComicSearch needs a network connection to carry out its searches. It can also be used through a proxy/firewall (see section 7).

ComicSearch displays URLs by calling your machine's default browser, such as Netscape, Internet Explorer, or Opera (my current favourite).

ComicSearch has only been tested on Windows 9x and 2000, but it should work on any platform that supports Java 2.


3. How to Install ComicSearch


4. Using ComicSearch


5. The Run Bar

The 'run bar' at the top right of the GUI shows the status of the various searches. SS stands for Nick Simon's Marvel Silver Age search engine, GG for Google, DB for the local database, GCD for the Grand Comic Database search engine, and AH for the AuctionHawk search engine.

While a search is in progress, its square turns green. When the search has finished, the square reverts to gray.

A user doesn't need to wait for the searches to complete; they can be stopped by pressing the "Stop Search" button. Note: the searches may continue for a short time after the button has been pressed.
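The delay after pressing "Stop Search" is typical of cooperative cancellation in Java, where a stop request is a flag that the search checks between network requests. The sketch below is hypothetical and not taken from the ComicSearch source: the class StoppableSearch, its requestStop() method, and the simulated fetch() are invented for illustration. A request that is already under way runs to completion, which is why the searches can keep going briefly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a search that can be asked to stop. */
public class StoppableSearch implements Runnable {
    private volatile boolean stopRequested = false;  // set by the GUI thread
    private final List<String> requests;
    private final List<String> completed = new ArrayList<>();

    public StoppableSearch(List<String> requests) {
        this.requests = requests;
    }

    /** What a "Stop Search" button handler would call. */
    public void requestStop() {
        stopRequested = true;
    }

    @Override
    public void run() {
        for (String request : requests) {
            if (stopRequested) {
                break;             // the flag is only checked between requests
            }
            fetch(request);        // a request already started is not interrupted
        }
    }

    /** Stand-in for a network request to a search engine (hypothetical). */
    private void fetch(String request) {
        completed.add(request);
    }

    /** Requests that finished; read this only after the search thread has ended. */
    public List<String> completed() {
        return completed;
    }

    public static void main(String[] args) throws InterruptedException {
        StoppableSearch search = new StoppableSearch(Arrays.asList("SS", "GG", "DB", "GCD", "AH"));
        Thread worker = new Thread(search);
        worker.start();
        search.requestStop();      // some requests may already have completed
        worker.join();
        System.out.println("Finished before stopping: " + search.completed());
    }
}
```

How many requests finish before the stop is noticed depends on thread timing, so the printed list can differ from run to run.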


6. Database Sources

The local database contains a large number of sites which ComicSearch will examine for comics. A list of these sources can be seen by clicking on the "Sources" combo box in the GUI:

Sources Combo Box

Clicking on a source ID (e.g. GCD) will display a dialog box giving details on the source's name and URL:

Sources Dialog Box

The source IDs are also used in the fourth column of the table, to identify where each piece of comic information comes from. If you click on a source ID in that column, the URL of that site will be loaded into your browser.


7. Getting Past a Proxy/Firewall

ComicSearch can be supplied with proxy and authorization information to permit it to access Web sites through a firewall.

The information must be added to the file proxyInfo.txt, which is in the ComicSearch directory. The format of the file consists of three special lines:

   proxyHost: [your proxy's address]
   proxyPort: [your proxy's port number]
   authID: [your login ID for authorization]

The authID line may not be necessary for some proxies. You need an authID line if you get a login/password dialog when you normally use the browser on your machine.

The file format allows blank lines, and lines beginning with // are treated as comments.
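A minimal sketch of reading this format in Java, assuming each special line is split at its first colon and any other line is skipped. The class name ProxyInfoParser is hypothetical; ComicSearch's own reader may behave differently.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical reader for the proxyInfo.txt format. */
public class ProxyInfoParser {

    /** Collects "key: value" lines, skipping blank lines and lines starting with //. */
    public static Map<String, String> parse(Iterable<String> lines) {
        Map<String, String> info = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("//")) {
                continue;                    // blank lines and comments are ignored
            }
            int colon = line.indexOf(':');   // split at the first colon only
            if (colon < 0) {
                continue;                    // not a "key: value" line
            }
            info.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
        }
        return info;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = (args.length > 0)
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList("// office proxy", "", "proxyHost: proxy.example.com", "proxyPort: 8080");
        // With no arguments this prints {proxyHost=proxy.example.com, proxyPort=8080}
        System.out.println(parse(lines));
    }
}
```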

proxyInfo.txt is used if ComicSearch is called with the -proxy option:

    C> java ComicSearch -proxy

If ComicSearch sees an authID line in the file then it will display a password dialog box for you to enter your authorization password. DO NOT TYPE YOUR PASSWORD INSIDE proxyInfo.txt.

If you have several proxy/authorization IDs, then you can put the details in several text files (e.g. proxyInfo.txt and p2.txt) and supply the filename as an argument to the -proxy option:

    C> java ComicSearch -proxy p2.txt
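The option handling can be sketched as follows. The class ProxyOption and its methods are hypothetical. The sketch assumes a filename is taken only when the argument after -proxy doesn't start with '-', and it hands the host and port to Java through the standard http.proxyHost and http.proxyPort system properties. The authID/password handling is not shown.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical handling of the -proxy command-line option. */
public class ProxyOption {
    static final String DEFAULT_FILE = "proxyInfo.txt";

    /** Returns the proxy file to read, or null when -proxy was not given. */
    public static String proxyFile(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("-proxy")) {
                boolean hasName = i + 1 < args.length && !args[i + 1].startsWith("-");
                return hasName ? args[i + 1] : DEFAULT_FILE;
            }
        }
        return null;
    }

    /** Sets the standard java.net HTTP proxy system properties from the parsed file. */
    public static void applyProxy(Map<String, String> info) {
        if (info.containsKey("proxyHost")) {
            System.setProperty("http.proxyHost", info.get("proxyHost"));
        }
        if (info.containsKey("proxyPort")) {
            System.setProperty("http.proxyPort", info.get("proxyPort"));
        }
    }

    public static void main(String[] args) {
        System.out.println(proxyFile(new String[] {"-proxy"}));            // proxyInfo.txt
        System.out.println(proxyFile(new String[] {"-proxy", "p2.txt"}));  // p2.txt
        Map<String, String> info = new HashMap<>();
        info.put("proxyHost", "proxy.example.com");
        info.put("proxyPort", "8080");
        applyProxy(info);
        System.out.println(System.getProperty("http.proxyHost"));          // proxy.example.com
    }
}
```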

8. Extending the Local Database

An important part of ComicSearch is its database of sites to search for comic information. The database is just a text file (titlesDB.txt) in the ComicSearch directory, with a fairly simple format.

One of my hopes for ComicSearch is that the large army of comic collectors out there will add their information to the database, improving ComicSearch for everyone.

Contributing your details is simple:

  1. Put your information into the database format (explained in a moment).
  2. E-mail the information to me (Andrew Davison, ad@coe.psu.ac.th).
  3. I will update the database file (titlesDB.txt), which is located at http://coe.psu.ac.th/~ad/ComicSearch/titlesDB.txt.
  4. When you want to upgrade your database, download this file and replace the old version. Bingo!!
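Step 4 can also be done from Java. This is a sketch only: the class UpdateDB is hypothetical, and the URL from step 3 may no longer be reachable. The download goes to a temporary file in the same directory, which is then moved over the old titlesDB.txt, so a failed download leaves the old database in place.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Hypothetical updater that replaces the local titlesDB.txt with a downloaded copy. */
public class UpdateDB {
    static final String DB_URL = "http://coe.psu.ac.th/~ad/ComicSearch/titlesDB.txt";

    public static void download(URL source, Path target) throws IOException {
        // Temporary file next to the target, so the final move stays on one file system
        Path tmp = Files.createTempFile(target.toAbsolutePath().getParent(), "titlesDB", ".tmp");
        try {
            try (InputStream in = source.openStream()) {
                Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            }
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            Files.deleteIfExists(tmp);  // no-op after a successful move
        }
    }

    public static void main(String[] args) throws IOException {
        download(URI.create(DB_URL).toURL(), Paths.get("titlesDB.txt"));
        System.out.println("titlesDB.txt updated");
    }
}
```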

8.1. The Database Format

Probably the best thing for you to do is to look through titlesDB.txt. The format is pretty obvious after a few examples.

A new site is introduced by three source lines, e.g.:

     sourceID: AD
     sourceName: Andrew's Banana Collection
     sourceURL: http://foobar.com/~ad/banana/

The source URL is normally the top page of the collection.

This information is used by the "Sources" combo box and the fourth column of the display table.

After the source details, information on each title is supplied using 4 lines. e.g.:

    Title: The Mighty Banana
    Issues: 2 4 56-67 1002
    imageURL: http://foobar.com/ban***.htm
    $

This describes my extensive collection of the well-known comic "The Mighty Banana". Issues can be single numbers or ranges. imageURL is the URL of the comic details (e.g. the cover scans). Notice, though, that the URL contains *'s, which will be replaced by an issue number when ComicSearch displays the URL.

For example, if I search for issue 2 of "The Mighty Banana", then the image URL returned will be http://foobar.com/ban002.htm. If the issue number has fewer digits than there are *'s, it is left-padded with 0's.

If I search for issue 1002, the returned URL will be http://foobar.com/ban1002.htm. Numbers with more digits than there are *'s are inserted without any padding.

This means that your cover scans will have to be labelled in this style (i.e. the URLs must contain the issue number). This seems a fairly standard practice, and is actually helpful when you're storing 100's of scans at your site.
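The Issues and imageURL rules above can be sketched as a small Java class. TitleEntry, hasIssue() and urlFor() are hypothetical names. The sketch assumes that a range such as 56-67 includes both end points and that the imageURL contains a single run of *'s.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical model of one "Issues:" + "imageURL:" pair from titlesDB.txt. */
public class TitleEntry {
    private final List<int[]> ranges = new ArrayList<>();  // each element is {low, high}
    private final String imageURL;

    public TitleEntry(String issues, String imageURL) {
        for (String token : issues.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            int dash = token.indexOf('-');
            if (dash < 0) {
                int n = Integer.parseInt(token);   // a single issue, e.g. "1002"
                ranges.add(new int[] {n, n});
            } else {                               // a range, e.g. "56-67"
                ranges.add(new int[] {
                    Integer.parseInt(token.substring(0, dash)),
                    Integer.parseInt(token.substring(dash + 1))});
            }
        }
        this.imageURL = imageURL;
    }

    /** True if the issue is listed on its own or falls inside a range. */
    public boolean hasIssue(int issue) {
        for (int[] r : ranges) {
            if (issue >= r[0] && issue <= r[1]) {
                return true;
            }
        }
        return false;
    }

    /** Replaces the run of *'s with the issue number, zero-padded to the run's width. */
    public String urlFor(int issue) {
        int start = imageURL.indexOf('*');
        if (start < 0) {
            return imageURL;                       // no placeholder: use the URL as-is
        }
        int end = start;
        while (end < imageURL.length() && imageURL.charAt(end) == '*') {
            end++;
        }
        // %0Nd left-pads with zeros up to N digits; longer numbers are printed in full
        String number = String.format(Locale.ROOT, "%0" + (end - start) + "d", issue);
        return imageURL.substring(0, start) + number + imageURL.substring(end);
    }

    public static void main(String[] args) {
        TitleEntry banana = new TitleEntry("2 4 56-67 1002", "http://foobar.com/ban***.htm");
        System.out.println(banana.hasIssue(60));    // true
        System.out.println(banana.hasIssue(3));     // false
        System.out.println(banana.urlFor(2));       // http://foobar.com/ban002.htm
        System.out.println(banana.urlFor(1002));    // http://foobar.com/ban1002.htm
    }
}
```

Using the %0Nd format directive gives both behaviours described above (padding short numbers, leaving long ones alone) without a separate branch.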

A collection of issues with the same title but different image URLs can be grouped without repeating the title:

    Title: The Mighty Banana
    Issues: 2 4 56-67 1002
    imageURL: http://foobar.com/ban***.htm
    $
    Issues: 1005-1222
    imageURL: http://narfoo.com/ban***.html 
    $

Blank lines and lines beginning with // are ignored when titlesDB.txt is read in.
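Putting the format together, a reader for titlesDB.txt might look like the following sketch. TitlesDBReader and its Entry class are hypothetical. The sketch assumes keys are split from values at the first colon, that a $ line closes an entry while keeping the current title for any following Issues/imageURL groups, and that a new sourceID line starts a new site. sourceName and sourceURL are recognised but not stored here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical reader for the titlesDB.txt format. */
public class TitlesDBReader {

    /** One entry: the source it came from, its title, issue list and URL template. */
    public static class Entry {
        public final String sourceID, title, issues, imageURL;

        Entry(String sourceID, String title, String issues, String imageURL) {
            this.sourceID = sourceID;
            this.title = title;
            this.issues = issues;
            this.imageURL = imageURL;
        }

        @Override
        public String toString() {
            return sourceID + " | " + title + " | " + issues + " | " + imageURL;
        }
    }

    public static List<Entry> read(List<String> lines) {
        List<Entry> entries = new ArrayList<>();
        String sourceID = null, title = null, issues = null, imageURL = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("//")) {
                continue;                          // blank lines and comments are ignored
            }
            if (line.equals("$")) {
                // '$' closes an entry; the title is kept for any following group
                if (title != null && issues != null && imageURL != null) {
                    entries.add(new Entry(sourceID, title, issues, imageURL));
                }
                issues = null;
                imageURL = null;
                continue;
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;                          // unrecognised line: skipped in this sketch
            }
            String key = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            switch (key) {
                case "sourceID": sourceID = value; title = null; break;
                case "Title":    title = value;    break;
                case "Issues":   issues = value;   break;
                case "imageURL": imageURL = value; break;
                default:         break;            // sourceName, sourceURL: not stored
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "sourceID: AD",
                "sourceName: Andrew's Banana Collection",
                "sourceURL: http://foobar.com/~ad/banana/",
                "// comments and blank lines are skipped",
                "",
                "Title: The Mighty Banana",
                "Issues: 2 4 56-67 1002",
                "imageURL: http://foobar.com/ban***.htm",
                "$",
                "Issues: 1005-1222",
                "imageURL: http://narfoo.com/ban***.html",
                "$");
        for (Entry e : read(sample)) {
            System.out.println(e);
        }
    }
}
```

To read the real file instead of the sample, pass Files.readAllLines(Paths.get("titlesDB.txt")) to read().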


9. Software Used by ComicSearch

Of course, we used the wonderful class library in Java 2.

The analysis of the search engine results was greatly helped by using regular expressions supported by the COM.stevesoft.pat package, shareware release 1.3.2. It can be obtained from http://javaregex.com.

We used two Java Tips from the JavaWorld Web site (http://www.javaworld.com):


10. Changes History


11. Contacting Us

This ComicSearch Web page is: http://coe.psu.ac.th/~ad/ComicSearch/readme.html

Andrew Davison can be sent e-mail at: ad@coe.psu.ac.th

