- Jenn's wiki page analyzing the extraction of text from PDF files: https://wikis-mit-edu.ezproxy.canberra.edu.au/confluence/pages/viewpage.action?pageId=59034461
- Question to OCA about the level and type of compression and the DPI of their PDFs (per request from Jenn Morris):
Hello Beverly, I am not well versed in the answer to that question but here is the response I received from one of our tech people: PDF compression is (currently) done by proprietary software from LuraTech. It uses the "mixed raster content" technique. We use whatever dpi was set during the scanning, so the page-size info in the PDF should match the original images. This is all going to change, though, at the end of the year, when we drop our LuraTech licenses for OCR and PDF compression; at that point, PDFs will be generated by Abbyy FineReader, when it does the OCR. I don't know anything about what kind of compression Abbyy uses. Please let me know if this does or does not answer your question! Thanks,
...
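
The reply says the PDFs keep whatever dpi was set at scan time, so the page size recorded in the PDF should match the original images. A minimal sketch of that relation, assuming the default PDF unit of 72 points per inch; the function names and the sample numbers are hypothetical and not taken from OCA's tooling:

```python
# Default PDF user-space unit: 1 point = 1/72 inch.
POINTS_PER_INCH = 72


def page_size_points(width_px, height_px, dpi):
    """Page size in points that a PDF should report for an image scanned at `dpi`."""
    return (width_px / dpi * POINTS_PER_INCH,
            height_px / dpi * POINTS_PER_INCH)


def implied_dpi(width_px, page_width_pt):
    """Scan resolution implied by an image's pixel width and the PDF page width."""
    return width_px / (page_width_pt / POINTS_PER_INCH)


# Illustrative values only: a 2550 x 3300 pixel image scanned at 300 dpi.
print(page_size_points(2550, 3300, 300))
print(implied_dpi(2550, 612))
```

If the dpi implied by a PDF page differs from the dpi of the source image, the page-size info was probably not carried over from the scan settings.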