A Tour of Blue Crab

Welcome to the Blue Crab Tour. This is not an exhaustive tour of features. Please read the help documentation for more complete information on available features. Help documentation is available from the Help menu when Blue Crab is running. You should also consult the version history, included in the software distribution, for a complete list of recent changes.

Context-sensitive help is also available in most windows. Click the button with the help icon "?" to open the reference documentation for that window in the Help Viewer. The Help Viewer also lets you search the reference material using keywords.

Blue Crab also provides help through help tags. If you hold the mouse over any window item, such as a button, and wait a couple of seconds, a help tag appears with a brief description of that item.

To get started, please download a copy of Blue Crab. This is a hands-on tour designed to familiarize you with the product in a short amount of time.

You can get a copy of Blue Crab here. After you download Blue Crab place it anywhere you like. A good place for it is in the Applications folder of your computer.

This tour will cover the following topics:

Launch Blue Crab and enter a password

Grab a web site

Troubleshooting

Create a bookmark

Create a configuration

Search the downloaded files

Improving speed

The discussion below refers to a fictional web site named "www.some-host.com". Please use a real web site of your own choosing.

1) Launch Blue Crab and enter a password

If this is the first time you are running Blue Crab, a dialog box will alert you that Blue Crab requires a password to run. Use the "Register..." button to obtain a password.

The "Register..." button takes you to our Purchase page where you can purchase any of our products, including Blue Crab. After you purchase a product you will receive a permanent password for each product by email. Permanent passwords are valid through the life of the program, i.e. all versions.

After you enter the password in the Password dialog box the "Save" button is activated. Click the Save button to verify and save the password. Click the red close button in the upper left portion of the dialog to dismiss the window when you are done.

2) Grab a web site

Once a valid password has activated Blue Crab you are ready to start grabbing web sites. By default Blue Crab creates a folder named "Grabbed files" in the Documents folder of your Home folder. This is where Blue Crab will save the files it downloads. Later on we will see how you can change this location, as well as many other downloading parameters using configurations.

Select "Crawl URL..." under the Grab menu. The "Grabber" window will appear.

The Grabber is used to monitor the progress of a "crawl". At the top of the window is a text field named "URL". Enter the page address you want to start crawling. When you click the "Start" button at the bottom right of the Grabber (or press the "return" or "enter" keys) Blue Crab will retrieve the page at "URL" and save it to your hard drive in the location:

~/Documents/Grabbed files/www.some-host.com/

(The ~ refers to your "Home" directory. Blue Crab always creates a top level directory named after the domain name of the starting URL, in this case "www.some-host.com", and puts all downloaded files for that domain inside it.)
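
The naming rule described above can be sketched in Python. This is only an illustration of the folder layout, not Blue Crab's own code: the function name `local_path_for` is hypothetical, and storing directory-style URLs as "index.html" is an assumption the tour does not confirm.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Default root taken from the tour text; a configuration can change it.
GRABBED_ROOT = PurePosixPath("~/Documents/Grabbed files")


def local_path_for(url: str, root: PurePosixPath = GRABBED_ROOT) -> PurePosixPath:
    """Map a URL to a save path inside a folder named after the URL's host."""
    # Accept bare addresses such as "www.some-host.com" by assuming http.
    parts = urlsplit(url if "://" in url else "http://" + url)
    path = parts.path or "/"
    # Assumption: a URL that names a directory is stored as index.html.
    if path.endswith("/"):
        path += "index.html"
    return root / parts.hostname / path.lstrip("/")


print(local_path_for("http://www.some-host.com/images/logo.gif"))
print(local_path_for("www.some-host.com"))
```

The top level folder always comes from the host part of the URL, so every file grabbed from the same domain ends up under the same directory.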

Blue Crab then scans the page at "URL" for any URL's it contains and proceeds to grab them, one at a time, saving them into the proper subdirectories of the "www.some-host.com" directory. Blue Crab continues in this fashion until no unvisited pages remain, at which point the process stops.

The starting URL corresponds to "Level 1". All the URL's referenced by the starting URL comprise the "Level 2" URL's, and so on. Note that the Grabber window displays progress by Level, as well as the progress of the current URL Blue Crab is downloading.
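
The level-by-level behavior described above resembles a breadth-first traversal. The following Python sketch shows the idea under stated assumptions: `crawl_by_level` and the in-memory `SITE` table are hypothetical stand-ins for fetching a page and scanning it for URL's, and nothing here is Blue Crab's actual implementation.

```python
from typing import Callable, Iterable, List


def crawl_by_level(start: str, links_of: Callable[[str], Iterable[str]]) -> List[List[str]]:
    """Return URLs grouped by level; each URL is visited at most once."""
    visited = {start}
    levels = [[start]]  # Level 1 is the starting URL
    while True:
        next_level = []
        for url in levels[-1]:
            for link in links_of(url):
                if link not in visited:
                    visited.add(link)
                    next_level.append(link)
        if not next_level:  # no unvisited pages remain, so the crawl stops
            return levels
        levels.append(next_level)


# Hypothetical site: maps each page to the URLs it references.
SITE = {
    "/": ["/a.html", "/b.html"],
    "/a.html": ["/b.html", "/c.html"],
    "/b.html": ["/"],
    "/c.html": [],
}

for number, urls in enumerate(crawl_by_level("/", lambda u: SITE.get(u, [])), start=1):
    print(f"Level {number}: {urls}")
```

Because visited pages are remembered, the link from "/b.html" back to "/" does not cause the crawl to loop.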

Note: unless you configure Blue Crab differently, it will not only download the page, but all resources like graphics which are on that page. You can change this using Configuration files, discussed below.

At the bottom of the Grabber window are the "Status" and "log" panels. The Status panel details the current operation Blue Crab is performing. The log panel shows the URL's that were grabbed, along with header information provided by the server.

Blue Crab also provides two other methods for grabbing web sites.

3) Troubleshooting

Blue Crab does best on web sites that consist of standard HTML, because the content on such sites is not dynamic. If the site you download is not working, in the sense that you can't navigate it well offline, then consider the following:

Sites that contain JavaScript do not always work well because Blue Crab does not process JavaScript.

So, for example, if a page contains JavaScript code to generate links to other pages dynamically then those links are probably unreachable by Blue Crab.

Sites that rely on CGI's etc. may not work well because the CGI is not running on your local copy.

So, for example, if a page contains a form then submitting that form using the downloaded copy will not work because the program that processes the form resides on the server.

In fact, any content which is generated programmatically on the server will not work on the local copy of the web site.

In other words functionality may be lost due to the fact that the pages are not being served by a web server.
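
To see why links generated by JavaScript are out of reach, consider this Python sketch of a scanner that only reads static markup. It uses the standard `html.parser` module; the class name `LinkScanner`, the helper `static_links` and the sample page are hypothetical, and the sketch is not a description of how Blue Crab parses pages.

```python
from html.parser import HTMLParser


class LinkScanner(HTMLParser):
    """Collect href values from <a> tags found in the static markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def static_links(html: str) -> list:
    scanner = LinkScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.links


# The script below would add a gallery link in a browser, but the script is
# never run here, so only the link written directly in the HTML is found.
PAGE = """
<a href="about.html">About</a>
<script>
  var a = document.createElement("a");
  a.href = "gallery" + 1 + ".html";
  document.body.appendChild(a);
</script>
"""

print(static_links(PAGE))
```

The gallery link exists only after a browser executes the script, so a tool that reads the HTML without running JavaScript never learns about it.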

4) Create a bookmark

The "Bookmarks" window is useful for collecting common starting URLs for the Grabber. You can also use this window to assign a Configuration to a bookmark.

To create a bookmark select the Bookmarks window by clicking on it and then choose "Insert Bookmark" from the "Bookmarks" menu, or type "Command-B". You can also drag and drop from other applications to add bookmarks. Blue Crab will insert a new entry and select it. Click on the URL to begin editing it. Change the value to "www.some-host.com". When you double click the bookmark a Grabber window will open. The starting URL of the grabber will be set to the contents of the bookmark, which in this case would be "www.some-host.com".

In the next section we will discuss "configurations", showing how to create and edit them. For now just note that a configuration can be associated with a URL by double clicking the "preferences icon" in the "Configuration" column of the URL.

5) Create a configuration

A configuration is a collection of settings that affect various attributes of a crawl. To create a configuration select "Configurations..." from the File menu, or press "Command-K". The "Configurations" window appears:

Click "New..." to create a configuration. Enter the name "Test" and click okay. Then click "Edit...". The "Configuration editor" window is opened:

Click on the "+Filters" tab, and then select "Save filter" from the popup menu. In the "Suffix (Extension) is" text field enter "jpeg jpg gif", without the quotes, with a space between the words. Now click the window's red close button and save the changes when prompted.

In the Bookmarks window double click the "preferences icon" in the "Configuration" column of the bookmark entry for the URL "www.some-host.com". Select the "Test" configuration you just edited. This configuration will only save files whose suffix is "jpeg" or "jpg" or "gif".

Now double click the URL "www.some-host.com" to launch the Grabber again for this web site. Click the "Start" button in the lower right hand corner of the Grabber window to start the crawl. This time Blue Crab will use the "Test" configuration to control the crawl. In this case, Blue Crab will only save graphics files whose extensions are "gif", "jpg" or "jpeg" to your hard drive.
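
The effect of the "Test" configuration's save filter can be approximated in Python as follows. The function names `parse_suffixes` and `should_save` are hypothetical, and matching the suffix case-insensitively while ignoring any query string is an assumption; consult the help documentation for the exact matching rules Blue Crab applies.

```python
import posixpath
from urllib.parse import urlsplit


def parse_suffixes(field: str) -> set:
    """Split the space-separated filter field into a set of lowercase suffixes."""
    return {word.lower().lstrip(".") for word in field.split()}


def should_save(url: str, suffixes: set) -> bool:
    """True when the file named by the URL's path ends in an allowed suffix."""
    path = urlsplit(url).path  # the query string is not part of the path
    extension = posixpath.splitext(path)[1].lstrip(".").lower()
    return extension in suffixes


allowed = parse_suffixes("jpeg jpg gif")
for url in (
    "http://www.some-host.com/images/logo.gif",
    "http://www.some-host.com/photos/beach.JPG",
    "http://www.some-host.com/index.html",
):
    print(url, "->", "save" if should_save(url, allowed) else "skip")
```

Pages that are skipped by a save filter may still need to be read during the crawl so that the links they contain can be followed; the filter here only decides what is written to disk.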

You can specify the exact location you would like to have the downloaded files saved. On the "General" panel there is a sub panel named "Grabbed Files Folder Location" which you can use to change the location for this configuration.

6) Search the downloaded files

You can use the Search window to perform a content search of any local collection of files, not just the files you download using Blue Crab. One of the objectives of Blue Crab was to provide a means of searching the contents of a web site. The Search tool built into Blue Crab was developed to help meet that objective.

Select "Search..." from the File menu to open up the Search window. Use the "Choose folder to search..." button to navigate to the folder of files you want to search. Enter the search terms in the text field named "Search terms", and click "Search".

Blue Crab presents the results of the search hierarchically, based on their location within the folder's hierarchy. You can double click a found file to open it in the program that edits files of that type.
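
A content search of a local folder, with results grouped by location, can be sketched in Python like this. The function name `search_folder` is hypothetical, and requiring every search term to appear, matching case-insensitively, and reading files as text are all assumptions; the Search window's actual matching rules are described in the help documentation.

```python
import os


def search_folder(folder: str, terms: str) -> dict:
    """Return {relative directory: [file names]} for files containing every term."""
    wanted = [term.lower() for term in terms.split()]
    results = {}
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                # Undecodable bytes are ignored so binary files do not stop the search.
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    text = handle.read().lower()
            except OSError:
                continue  # unreadable file: skip it
            if all(term in text for term in wanted):
                relative_dir = os.path.relpath(dirpath, folder)
                results.setdefault(relative_dir, []).append(name)
    return results
```

For example, `search_folder(os.path.expanduser("~/Documents/Grabbed files"), "blue crab")` would list, folder by folder, the grabbed files that mention both words.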

7) Improving speed

A web site can be very large, consisting of thousands of pages and resources. Obviously the larger a site is the longer it will take to download. Here are some suggestions for improving the speed of Blue Crab:

1) Blue Crab provides filters that enable you to download only parts of a web site. For example, part of a web site may be contained in a directory named "Galleries". You can restrict Blue Crab to download files only from that directory by creating a pathname filter that tells Blue Crab to download only resources that have the word "Galleries" in the pathname. Blue Crab supports various other filters for the same purpose. Refer to the online documentation for more information.

2) Blue Crab can grab files using two different methods. The first method is the standard Grabber which is used when you double click a bookmark or select "Grab..." from the File menu. The second method is available using the "Grab quickly..." menu item in the File menu.

This method of grabbing is about 20% faster because it

a) Does not provide detailed progress information.
b) Uses a faster mechanism to download files (which is possible because of a) )

3) Turn OFF these options on the General configuration panel (default values shown in parentheses):

a) Save offsite URL's (off, by default)
b) Save errors (off, by default)
c) Remove JavaScript when saving files (off, by default)
d) Rename dynamic URL's. (on, by default)
e) Relink URL's (on, by default)

d) and e) can be the most time consuming, especially for web sites that contain many URL's with search and/or path arguments, or URL's that specify directories and not files.

Note that d) and e) are on by default. This aids in offline navigability of the downloaded files. But if you don't need to navigate the files offline (say because you just want to be able to search content) then you should turn these off.

4) Yield Rate

Finally, also on the General configuration panel is a number called "Yield Rate". This is the rate at which Blue Crab will yield time to other processes. The smaller the value the more Blue Crab yields. Thus the larger the value the faster Blue Crab becomes.

A recommended value is the default value of 10, but feel free to raise it. Note that if you set the value too high, the user interface will become very sluggish during an intensive crawl.

Also bear in mind that it is kind to be easy on the server you are grabbing, and to not grab URL's too fast.
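
Two of the suggestions above, the pathname filter from item 1 and the advice to go easy on the server, can be sketched together in Python. This is not Blue Crab's implementation: the function names `path_contains` and `paced`, the case-sensitive match, and the pause length are all assumptions made for illustration.

```python
import time
from urllib.parse import urlsplit


def path_contains(url: str, word: str) -> bool:
    """True when the URL's path (not its host or query) contains the word."""
    return word in urlsplit(url).path


def paced(urls, delay_seconds=1.0, sleep=time.sleep):
    """Yield URLs one at a time, pausing between consecutive items."""
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # wait before every URL except the first
        yield url


candidates = [
    "http://www.some-host.com/Galleries/2004/beach.jpg",
    "http://www.some-host.com/news/index.html",
    "http://www.some-host.com/Galleries/index.html",
]
selected = [url for url in candidates if path_contains(url, "Galleries")]

# A zero delay keeps this demonstration instant; a real crawl would pause.
for url in paced(selected, delay_seconds=0):
    print("would grab", url)
```

Filtering first shrinks the list of resources to fetch, which is where most of the time savings come from; the pause then spreads the remaining requests out so the server is not flooded.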


* The display of some images or other user interface items may be different from the current version of the program due to intervening software updates.