The Internet Archive is a wonderful resource for old books, many of which are available as PDF files. One minor quibble I have with this fabulous library is that a lot of the material was scanned in color, resulting in pages with a nasty yellow tint (reflecting their age). However, I'm mostly interested in textbooks, which were usually published in black and white. In those cases, a color scan is unnecessary, and actually something of a drawback, since a PDF reader can take noticeably longer to render such pages.
The three programs available at the bottom of this page (GetPage.java, WhitenPage.java, and WhitenDoc.java) address this issue by allowing you to adjust the brightness and contrast of a grayscale version of a PDF file. The resulting document has 'cleaner'-looking pages, which also load faster.
GetPage.java is used to extract a page from a PDF file as a PNG image. For example,
This image (elements20-17.png) is loaded into the ImageJ application and converted to an 8-bit grayscale image using its "Image > Type > 8-bit" menu item. Then its brightness and contrast are adjusted using the dialog displayed by the "Image > Adjust > Brightness/Contrast" item. The Minimum and Maximum sliders should be adjusted until the image is suitable, and their values (displayed underneath the graph) noted down for stages 3 and 4 (WhitenPage.java and WhitenDoc.java). In the picture below, the settings are 19 and 137.
There's no need to save any changes to the image when the application is closed.
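To make the meaning of the Minimum and Maximum settings concrete, here is a minimal sketch of the kind of mapping they describe, written with only the standard Java library. It is not taken from WhitenPage.java or WhitenDoc.java, and the names WindowSketch, buildLut, toGray, and applyWindow are made up for illustration. The assumption is a simple linear stretch: gray levels at or below the minimum become black, levels at or above the maximum become white, and everything in between is spread over the full 0-255 range. ImageJ's own rounding may differ slightly.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

class WindowSketch {

    // Build a 256-entry lookup table for a linear stretch between min and max.
    // Levels <= min map to 0 (black), levels >= max map to 255 (white).
    static int[] buildLut(int min, int max) {
        if (min < 0 || max > 255 || min >= max) {
            throw new IllegalArgumentException("need 0 <= min < max <= 255");
        }
        int range = max - min;
        int[] lut = new int[256];
        for (int v = 0; v < 256; v++) {
            if (v <= min) {
                lut[v] = 0;
            } else if (v >= max) {
                lut[v] = 255;
            } else {
                // Integer arithmetic with rounding to the nearest level.
                lut[v] = ((v - min) * 255 + range / 2) / range;
            }
        }
        return lut;
    }

    // Redraw a color image into an 8-bit grayscale image.
    static BufferedImage toGray(BufferedImage src) {
        BufferedImage gray = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = gray.createGraphics();
        g.drawImage(src, 0, 0, null);
        g.dispose();
        return gray;
    }

    // Apply the min/max stretch in place to an 8-bit grayscale image.
    static void applyWindow(BufferedImage gray, int min, int max) {
        int[] lut = buildLut(min, max);
        WritableRaster raster = gray.getRaster();
        for (int y = 0; y < gray.getHeight(); y++) {
            for (int x = 0; x < gray.getWidth(); x++) {
                int v = raster.getSample(x, y, 0);
                raster.setSample(x, y, 0, lut[v]);
            }
        }
    }

    public static void main(String[] args) {
        // A tiny stand-in for a scanned page: yellowish paper and dark ink.
        BufferedImage page = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        page.setRGB(0, 0, 0xF0E0B0);
        page.setRGB(1, 0, 0x202020);

        BufferedImage gray = toGray(page);
        applyWindow(gray, 19, 137);   // the settings noted in ImageJ

        System.out.println("paper -> " + gray.getRaster().getSample(0, 0, 0));
        System.out.println("ink   -> " + gray.getRaster().getSample(1, 0, 0));
    }
}
```

With the settings 19 and 137, any pixel brighter than level 137 (which includes a typical yellowed background once it is converted to gray) is pushed to pure white, which is what produces the 'whitened' page.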
WhitenPage.java is used to try out the ImageJ minimum and maximum settings on a single page from a PDF file before converting the entire document. For example,
The resulting page is saved as a PDF file (elements20-W17.pdf) which can be examined in any PDF reader.
WhitenDoc.java applies the ImageJ minimum and maximum settings to every page in a document. For example,
This step can be time-consuming. For example, the conversion of a 1,000-page encyclopedia took nearly 10 minutes.
My code uses Apache PDFBox for PDF manipulation and ImageJ for image manipulation. These libraries can be downloaded from their websites, or from below.