ComicSearch 1.0

Apisitt Rattana and Andrew Davison
Dept. of Computer Engineering
Prince of Songkla University
Hat Yai, Songkhla, Thailand
E-mail: ad@coe.psu.ac.th

15th March 2001
Last Updated: 21st November, 2001

What is ComicSearch?
What do I need to run ComicSearch?
How to Install ComicSearch
Using ComicSearch
The Run Bar
Database Sources
Getting Past a Proxy/Firewall
Extending the Local Database
Software Used by ComicSearch
Changes History
Contacting Us

Quick Links

the winzipped ComicSearch application (about 205K) (last updated 21st November, 2001)
the local database titlesDB.txt (last updated 30th April, 2001)

1. What is ComicSearch?

ComicSearch searches for comic covers and information using:

Google
AuctionHawk
The Grand Comic Database (GCD)
Nick Simon's Marvel Silver Age Site
a database of good sources, including:
- Two Tub Man's site
- Mike's Amazing World of DC site
- and many more, which you can add to (details below)

2. What do I need to run ComicSearch?

Your machine must have a copy of Java 2, since ComicSearch is a Java application. Java can be obtained for free from http://java.sun.com/j2se/. Download either the Software Development Kit (SDK), if you plan to write your own Java programs, or the Runtime Environment (JRE), which is sufficient for running ComicSearch.

The source, compiled code, support files, and documentation for ComicSearch only amount to a bit under 360K, so don't worry about disk space.

ComicSearch needs a network connection to carry out its searches. It can be used through a proxy/firewall (details below).

It displays its URLs by calling your machine's default browser, such as Netscape, Internet Explorer, or Opera (my current favourite).

ComicSearch has only been tested on Windows 9x and Windows 2000, but it should work on any platform that supports Java.

3. How to Install ComicSearch

Download the winzip file from http://coe.psu.ac.th/~ad/ComicSearch/CS.zip.
Unzip the file to produce the directory ComicSearch.
That's it.

4. Using ComicSearch

Open a DOS window inside the ComicSearch directory.
Start ComicSearch by typing:
```
C> java ComicSearch
```
If you access the Web through a firewall, you will have to do a bit more than this. See below for details.
After starting ComicSearch, the following application window will appear:

Also look occasionally at the DOS window since error messages and less important information messages are printed there.
Type in a title and issue number, then press the "Start Search" button.
For example, I entered Iron Man and 1. After a few seconds the table contains numerous entries, as shown in the picture:
Select a row from the results table by clicking anywhere in the row, then click on the title or URL to load the page into your browser.
To start a new query, press the "Reset" button.

5. The Run Bar

The 'run bar' at the top right of the GUI shows the status of the various searches. SS stands for Nick Simon's Marvel Silver Age search engine, GG for Google, DB for the local database, GCD for the Grand Comic Database search engine, and AH for the AuctionHawk search engine.

While a search is in progress, its square turns green. When the search has finished, the square reverts to gray.

You don't need to wait for the searches to complete; you can ask them to stop by pressing the "Stop Search" button. Note: the searches may continue for a short time after the button has been pressed.

6. Database Sources

The local database contains a large number of sites which ComicSearch will examine for comics. A list of these sources can be seen by clicking on the "Sources" combo box in the GUI:

Clicking on a source ID (e.g. GCD) will display a dialog box giving details on the source's name and URL:

The source IDs are also used in the fourth column of the table, to identify where comic information comes from. If you click on a source ID in that column, the URL of that site will be loaded into your browser.

7. Getting Past a Proxy/Firewall

ComicSearch can be supplied with proxy and authorization information to permit it to access Web sites through a firewall.

The information must be added to the file proxyInfo.txt, which is in the ComicSearch directory. The format of the file consists of three special lines:

   proxyHost: [your proxy's address]
   proxyPort: [your proxy's port number]
   authID: [your login ID for authorization]

The authID line may not be necessary for some proxies. You need an authID line if you get a login/password dialog when you normally use the browser on your machine.

The file format allows blank lines, and lines beginning with // are treated as comments.
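
The rules above (name/value lines, blank lines allowed, // comments skipped) can be sketched in Java as follows. This is only an illustration of the file format, not the code ComicSearch itself uses: the class name ProxyInfoSketch is hypothetical, the host, port and ID in the sample are made-up placeholders, and ignoring lines without a colon is my assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of ComicSearch: reads "name: value" lines
// written in the proxyInfo.txt style.
class ProxyInfoSketch {

    // Returns the settings found, keyed by the text before the first colon.
    static Map<String, String> parse(String text) {
        Map<String, String> settings = new LinkedHashMap<>();
        for (String raw : text.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("//")) {
                continue; // blank lines and comments are skipped
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // assumption: lines without a colon are ignored
            }
            settings.put(line.substring(0, colon).trim(),
                         line.substring(colon + 1).trim());
        }
        return settings;
    }

    public static void main(String[] args) {
        // Placeholder values only; use your own proxy's details.
        String sample = "// office proxy\n"
                      + "proxyHost: proxy.example.com\n"
                      + "\n"
                      + "proxyPort: 8080\n"
                      + "authID: myLogin\n";
        System.out.println(parse(sample));
    }
}
```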

proxyInfo.txt is used if ComicSearch is called with the -proxy option:

    C> java ComicSearch -proxy

If ComicSearch sees an authID line in the file then it will display a password dialog box for you to enter your authorization password. DO NOT TYPE YOUR PASSWORD INSIDE proxyInfo.txt.
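
For the curious, here is a rough sketch of how a Java program can use these three pieces of information. It assumes the proxy wants HTTP Basic authorization, which is only my guess about a typical setup, and it uses the modern java.util.Base64 class rather than the Base64Converter class listed later in this file. The class and method names (ProxyAuthSketch, useProxy, basicCredentials) are hypothetical; http.proxyHost and http.proxyPort are the standard Java networking properties.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch, not the actual ComicSearch code.
class ProxyAuthSketch {

    // Tells Java's HTTP support to route requests through the proxy.
    static void useProxy(String host, String port) {
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", port);
    }

    // Builds the value of a "Proxy-Authorization" header for Basic
    // authorization: the word Basic followed by base64("id:password").
    static String basicCredentials(String authId, String password) {
        String pair = authId + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Placeholder values; the password would come from the dialog box,
        // never from proxyInfo.txt.
        useProxy("proxy.example.com", "8080");
        String header = basicCredentials("myLogin", "secret");
        // A connection would then carry the header, for example:
        //   connection.setRequestProperty("Proxy-Authorization", header);
        System.out.println("Proxy-Authorization: " + header);
    }
}
```

Base64 is an encoding, not encryption, which is one more reason to keep the password out of any file.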

If you have several proxy/authorization IDs, then you can put the details in several text files (e.g. proxyInfo.txt and p2.txt) and supply the filename as an argument to the -proxy option:

    C> java ComicSearch -proxy p2.txt

8. Extending the Local Database

An important part of ComicSearch is its database of sites to search for comic information. The database is just a text file (titlesDB.txt) in the ComicSearch directory, with a fairly simple format.

One of my hopes for ComicSearch is that the large army of comic collectors out there will add their information to the database, so improving ComicSearch for everyone.

Contributing your details is simple:

Put your information into the database format (explained in a moment).
E-mail the information to me (Andrew Davison, ad@coe.psu.ac.th).
I will update the database file (titlesDB.txt), which is located at http://coe.psu.ac.th/~ad/ComicSearch/titlesDB.txt.
When you want to upgrade your database, download this file and replace the old version. Bingo!!

8.1. The Database Format

Probably the best thing for you to do is to look through titlesDB.txt. The format is pretty obvious after a few examples.

A new site is identified by three source lines. e.g.:

     sourceID: AD
     sourceName: Andrew's Banana Collection
     sourceURL: http://foobar.com/~ad/banana/

The source URL is normally the top page of the collection.

This information is used by the "Sources" combo box and the fourth column of the display table.

After the source details, information on each title is supplied using 4 lines. e.g.:

    Title: The Mighty Banana
    Issues: 2 4 56-67 1002
    imageURL: http://foobar.com/ban***.htm
    $

This describes my extensive collection of the well-known comic "The Mighty Banana". Issues can be single numbers or ranges. imageURL is the URL of the comic details (e.g. the cover scans). But notice that the URL contains *'s, which will be replaced by an issue number when ComicSearch displays the URL.

For example, if I search for issue 2 of "The Mighty Banana", then the image URL returned will be http://foobar.com/ban002.htm. The issue number is left-padded with 0's if it has fewer digits than there are *'s.

If I search for issue 1002, the returned URL will be http://foobar.com/ban1002.htm. Numbers with more digits than there are *'s are inserted without any padding.

This means that your cover scans will have to be labelled in this style (i.e. the URLs must contain the issue number). This seems a fairly standard thing, and is actually helpful when you're storing 100's of scans at your site.
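
The *-replacement rule can be sketched in a few lines of Java. This is a minimal illustration of the padding behaviour described above, not the code inside ComicSearch; the class name ImageUrlSketch and the method expand are hypothetical, and replacing only the first run of *'s is my assumption.

```java
// Hypothetical sketch of the imageURL expansion rule.
class ImageUrlSketch {

    // Replaces the first run of *'s in the template with the issue number,
    // left-padded with 0's up to the length of the run.
    static String expand(String template, int issue) {
        int start = template.indexOf('*');
        if (start < 0) {
            return template; // no *'s: nothing to replace
        }
        int end = start;
        while (end < template.length() && template.charAt(end) == '*') {
            end++;
        }
        int width = end - start; // number of *'s in the run
        StringBuilder number = new StringBuilder(Integer.toString(issue));
        while (number.length() < width) {
            number.insert(0, '0'); // pad short issue numbers
        }
        // Longer issue numbers are inserted as they are, without padding.
        return template.substring(0, start) + number + template.substring(end);
    }

    public static void main(String[] args) {
        String template = "http://foobar.com/ban***.htm";
        System.out.println(expand(template, 2));    // http://foobar.com/ban002.htm
        System.out.println(expand(template, 1002)); // http://foobar.com/ban1002.htm
    }
}
```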

A collection of issues with the same title, but different image URLs can be grouped without repeating the title:

    Title: The Mighty Banana
    Issues: 2 4 56-67 1002
    imageURL: http://foobar.com/ban***.htm
    $
    Issues: 1005-1222
    imageURL: http://narfoo.com/ban***.html 
    $

Blank lines and lines beginning with // are ignored inside titlesDB.txt when it is read in.
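
If you want to check your entries before e-mailing them, the format is easy to read with a short program. The following Java sketch is only my illustration of the rules in this section (the title carried over between grouped entries, $ ending an entry, single issues and ranges); it is not the reader used by ComicSearch, and the names TitlesDbSketch, Entry, parse and covers are hypothetical. For brevity it skips the three source lines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a titlesDB.txt reader.
class TitlesDbSketch {

    // One Issues/imageURL pair, tagged with the title it belongs to.
    static final class Entry {
        final String title;
        final String issues;
        final String imageUrl;

        Entry(String title, String issues, String imageUrl) {
            this.title = title;
            this.issues = issues;
            this.imageUrl = imageUrl;
        }
    }

    // Collects the title entries found in the database text.
    static List<Entry> parse(String text) {
        List<Entry> entries = new ArrayList<>();
        String title = null, issues = null, imageUrl = null;
        for (String raw : text.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("//")) {
                continue; // blank lines and comments are ignored
            } else if (line.equals("$")) {
                if (title != null && issues != null && imageUrl != null) {
                    entries.add(new Entry(title, issues, imageUrl));
                }
                issues = null;   // the title is kept, so a grouped entry
                imageUrl = null; // does not need to repeat it
            } else if (line.startsWith("Title:")) {
                title = value(line);
            } else if (line.startsWith("Issues:")) {
                issues = value(line);
            } else if (line.startsWith("imageURL:")) {
                imageUrl = value(line);
            }
            // sourceID, sourceName and sourceURL lines are skipped here
        }
        return entries;
    }

    // Returns the text after the first colon of a "name: value" line.
    static String value(String line) {
        return line.substring(line.indexOf(':') + 1).trim();
    }

    // True if the issue is listed in an Issues value such as "2 4 56-67 1002".
    static boolean covers(String issues, int issue) {
        for (String token : issues.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            int dash = token.indexOf('-');
            if (dash < 0) {
                if (Integer.parseInt(token) == issue) {
                    return true; // single issue number
                }
            } else {
                int low = Integer.parseInt(token.substring(0, dash));
                int high = Integer.parseInt(token.substring(dash + 1));
                if (low <= issue && issue <= high) {
                    return true; // inside a range, ends included
                }
            }
        }
        return false;
    }
}
```

A search for a title and issue then amounts to finding the first Entry with a matching title whose Issues value covers the issue number.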

9. Software Used by ComicSearch

Of course, we used the wonderful class library in Java 2.

The analysis of the search engine results was greatly helped by the regular expressions supported by the COM.stevesoft.pat package, shareware release 1.3.2. It can be obtained from http://javaregex.com.

We used two Java Tips from the JavaWorld Web site (http://www.javaworld.com):

Base64Converter.java
David W. Croft, Tip 47
http://www.javaworld.com/javaworld/javatips/jw-javatip47.html
A class that encodes a string using BASE 64 encoding.

BrowserControl.java
Steven Spencer, Tip 66
http://www.javaworld.com/javaworld/javatips/jw-javatip66.html
A class that simplifies the displaying of a URL in a browser.

10. Changes History

April 11th, 2001: updated URL for AuctionHawk search page
April 24th, 2001: updated "Mike's Amazing World of DC Comics" URL in titlesDB.txt
April 30th, 2001: added "From Cover to Cover" (Disney, Warner, etc) to titlesDB.txt
June 11th, 2001: updated anchor extraction for GCD searches
August 27th, 2001: updated URL extraction for AuctionHawk
November 21st, 2001: updated URL for Google search

11. Contacting Us

This ComicSearch Web page is: http://coe.psu.ac.th/~ad/ComicSearch/readme.html

Andrew Davison can be sent e-mail at: ad@coe.psu.ac.th
