
Content Providing FAQ

DP Official Documentation - Content Providing and Content Management
Content of this page is being reviewed. If you have questions, please contact one of the page editors (shown in the footer at the bottom of this page).



A Content Provider does not have to be a registered Distributed Proofreaders (DP) volunteer on the DP website. However, it might be a little difficult to get in contact with a member to be a Project Manager for the project if you are not. If you wish to provide scans, you can use the OCR Pool.


Selecting a Project

Which book you pick is up to you. The only requirement is that it be copyright clearable (discussion below). It is best if it is something in which you have interest. Chances are that you will find others who will work on it as well.

Finding a book.

There are several ways to find a project to CP (Content Provide). You can search the library, buy from a local bookstore, raid your own bookshelves, ask a friend, pull them out of the trash, or find projects that are already scanned at some of the many on-line sources for scans. Be sure to pick a project on which you will enjoy working, because you will be shepherding this project through up to 5 Rounds (if you choose to be the project manager), and until the project is posted. This may take several years from start to finish, but much of this time can be spent waiting for your project to be released in a round.

If you want the project to go through the system quickly, pick a popular genre; watch which release queues are moving fast, as this changes regularly.

If you choose to get a book from one of the on-line book archives, please follow the individual site guidelines regarding acceptable use and protocol. We don't want to be bad neighbors. It is considered good form to credit the source of the scan when the text is submitted to Project Gutenberg, so make sure the PM knows its source.

Difficulty.

Some things can make the project harder than others. The amount of time you wish to spend on this should be considered. Check the inner margin (gutter) of the book. The wider this is, the easier it will be to scan, and the fewer extra measures you'll need to take in OCR and answering forum questions. This does not mean that you should not work with books that have a narrow gutter, just that they will be much harder. Projects with a lot of illustrations are also harder and more time-consuming. This will be discussed more under Scan/Download images and Prepare the Illustrations.

Copyrights and clearances.

Do a preliminary check to see if it is clearable. Usually that means it was published before 1923. See clearances, below, for more information.

Does the project already exist?

Make sure the book is not already available by searching Project Gutenberg. You will also want to make sure nobody else is working on it by checking David Price's in-progress list split by author last name. This list is ordered by author's name, but you may want to search the entire page for the title as well. At the end of each line there is a status tag. Most likely this will be either "copyright cleared" with a date, or "released" with a number. An In-Progress Search by title at DP is also available.

"Copyright cleared" means someone has requested and received copyright clearance, but has not yet finished the project. If this clearance is several years old, it has probably (though not certainly) been abandoned. After you get clearance, you will get an e-mail along with the other clearance holder, letting each of you know that the other is working on it. You can then communicate with them to find out if they are working on it, or if you are free to begin processing it.

Some projects, most notably periodicals and multi-volume editions, will have "blanket clearances." This does not mean that the person who requested the original clearance has all of the volumes ready to scan! Most of these clearances are associated with DP in some way, so if an Überproject doesn't exist for the periodical/set you have (where the PM will often list the volumes they have available), you can post in the Content Provider's forum to find out who's working on what.

If the project says "released," it has been posted to Project Gutenberg with the accompanying ebook number. It is a good idea to search both these lists by author and separately by title.

Running a project that is already in PG.

Even if a book is already in PG, it may be worth processing again. This will require some legwork to determine, so be sure you feel strongly about the book before pursuing this. PG welcomes different editions, illustrated versions, different translations, etc. In addition, many of the older ebooks have more errors than we would find acceptable today and reprocessing them through DP may be the best way to change that. If the book has a PG number under 10,000 then it probably doesn't have an illustrated version and might be a good candidate for an upgrade.

Below is a list of reasons you might provide an existing PG project through DP. You will need a copyright clearance for each of these cases. For reworks, the PM should put a note into the Project Comments section explaining why the project is being redone. If you are a CP only, then you should include details of why the project is being redone in a text file attached in the project zip file.

Basic upgrade
You have the same text version, there are no illustrations, and the PG version is riddled with errors: Be sure to let PG know, when you upload the final version, that this is a revision of an existing ebook, based on a paper copy in hand. If there are only a few problems, submit them via the PG errata process.
Illustration Upgrade
You have the same text version, but there are illustrations and they are not present in the PG version: Same as the basic upgrade except that you'll be submitting an illustrated html version.
Different Translation
PG will treat this as a completely different ebook and welcomes them. There are already at least half a dozen translations of the Iliad, for example, and more are always welcome.
Different Edition
Some books were published in very different editions. Where this is the case, PG welcomes them as separate ebooks. You will have to document the fact that your edition has significant differences from the version that is already in PG.


Get a clearance

You have obtained a book, and have decided that it is both clearable and not already in PG or in progress, or you have a book you think is clearable and need to find out for sure. In both cases, it is time to ask the experts.

You will need to have scans of the Title Page and Verso (the back of the title page), also known as the TP&V. You may need scans of other material as well, such as an inscription on the fly-leaf, in order to establish date.

Conflicts.

The clearance team does not check for duplicate or pre-existing clearances.

All checking is done after a clearance has been granted. Conversely, having a clearance does not mean you "own" that title for some period of time.

Copyright Clearance.

Copyright clearance is a process by which Project Gutenberg determines if a book is in the public domain according to the copyright laws of the United States. Project Gutenberg maintains a set of Rules that are used to determine if a book is clearable. This DP site operates under U.S. law; if you cannot obtain a clearance, your book cannot be processed through this site.

Please read PG's copyright clearance rules for details.

If your book is not clearable under PG's rules, but the author and everyone else associated with the book (e.g., the illustrator, editor, or translator) have been deceased for at least 50 years, you may wish to send the book to our sister site, DP-EU. You may also wish to send it to DP-EU if it requires Unicode text (e.g., Hebrew, Russian, Chinese).

Create a PGLAF Account.

The next step is to set up an account at PGLAF (the branch of PG that handles clearances). If you have Direct Uploading or PPV access, you already have a PGLAF account.

To create a new account, browse to PGLAF and read the welcome page. This contains a lot of useful information on the clearance process, and a number of useful links. Next, click the New username link and fill out the form. Be sure that the email address you enter is valid and is checked regularly; this is the address where posted notices and clearance notifications go, and also where you will be contacted if a conflict occurs.

Submit a Clearance Request.

After completing the registration process, log in, and select "Submit a New Clearance Request". A large form will appear; most of the information required should be available directly from the title page of your book. If not, you will have to do some research. Document any findings in the field provided; be sure to list the source of any information not found on your book's title page and verso page (the page immediately following the title page). If a date is listed twice in different contexts (separate publication date and copyright date, for example) enter it twice. Remember when attaching images that they should be small in size (100k is a reasonable maximum; most should be smaller), but the smallest text should still be legible. Multi-volume works can be cleared in a single clearance request if the dates are the same, or if you provide the earliest and latest title and verso.

Types of Clearances.

There are several types of clearances. The most common is rule 1, but some others are used on occasion. Project Gutenberg only clears based on the United States Copyright Laws. However, if you would like a detailed discussion of copyrights in other countries, visit The Online Books Page.

Normal (Rule 1) Clearance
This is the most common way in which books are cleared. Basically this means that the book has left copyright ("risen to the public domain") through normal expiration. No special rules need to be applied. If the work was first published prior to 1923, then it falls in this category. If the work does not have a copyright date printed on it, you will need to do some research. You will need to find a library catalog (or two) that lists the date. The best catalogs to use are national catalogs such as The Library of Congress Online Catalog and The British Library Public Catalogue.
Non-Renewal (Rule 6) Clearances
Rule 6 clearances (see here for a list of rules; this is the non-renewal case) require a fair amount of research, which needs to be documented in the clearance submission. Start with a keyword search at the Rutgers' search engine. You may also need to use the copyright office db for later potential renewals. Be sure to use multiple keywords from your work; there may be typos in the renewal data. If you find a match for your book, you cannot proceed any further. If it comes up clean, confirm your findings by following the official Rule 6 HOWTO.
Government Publication (Rule 8) Clearances
In the United States the government places no copyright on the works it creates. This means that they are free for us to use. Be careful to check and make sure that the work was created by the government and not just for the government. Works created for the government by a third party and published by the government do have a copyright. Works created by the U.K. or Canada Crown have a 50-year copyright.
Other Clearances
There are other copyrights that can be cleared, but the three listed here are most likely the only ones you will get clearance for. For Project Gutenberg's take, see the Copyright HOWTO.

Wait.

All that is left now is to wait for the results of your request. Basic clearances using the standard rules are usually processed quickly, anywhere from a day to a week. Rule 6 clearances, which require more research, usually take longer (and may require further research on your part before it clears). You must receive the clearance before loading the project onto DP.

You may get a response that says NOT OK. A reason for the denial of the clearance will always be given. Be sure to check that reason, since technical difficulties such as corrupted files can easily generate this response. Feel free to resubmit your clearance request after correcting whatever problem was noted.


Scan/Download images

There are two ways to get these images. You can scan them yourself, or you can find an Image Provider that has already scanned them.

Scanning.

For the text of the project, it is best to scan within your OCR package. Many OCR packages deskew in a way that works well for text but mangles illustrations, so do not use the OCR package for the illustrations. If you have a few illustrations, it is best to make two passes with the scanner. On the first pass, scan every page in black and white for the OCR package. On the second pass, scan only the illustrations. IrfanView and XnView both have a scanning interface that is good for this. Be sure to get full-color scans of all color illustrations, and grey-scale scans of all black and white or grey-scale illustrations. It is also nice to get a scan of the cover and spine of the book, and of the back if it is illustrated. If there are any advertisements in the book, please scan them as well.

When you first use your scanner, check to see if it dithers in black and white mode. Dithering is a method of simulating colors you don't actually have available by scattering dots around and fooling the eye. The first image has been dithered, and is actually somewhat easier to read, but will confuse the OCR program and inflate the file size. The second has been thresholded, and is the preferred method. If your scanner driver dithers, consider scanning in grey scale and letting your OCR engine convert it to black and white.

Dithered scan
Thresholded scan
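
To make the difference concrete, here is a minimal sketch in Python that converts the same patch of light-grey "paper" to black and white in both ways. It assumes a 4x4 Bayer ordered-dither pattern and a cutoff of 128, both chosen purely for illustration; your scanner driver may dither differently.

```python
# 4x4 Bayer matrix, a common textbook pattern for ordered dithering.
# (Assumption: real scanner drivers may use a different dithering method.)
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def threshold(gray, cutoff=128):
    """Return a 1-bit image: 1 (white) where the pixel is >= cutoff, else 0 (black)."""
    return [[1 if px >= cutoff else 0 for px in row] for row in gray]


def ordered_dither(gray):
    """Return a 1-bit image using a cutoff that varies with pixel position."""
    out = []
    for y, row in enumerate(gray):
        out_row = []
        for x, px in enumerate(row):
            # Position-dependent cutoff between 8 and 248 on a 0-255 scale.
            cutoff = (BAYER_4X4[y % 4][x % 4] + 0.5) * 16
            out_row.append(1 if px >= cutoff else 0)
        out.append(out_row)
    return out


# A flat 8x8 patch of light-grey paper (value 200 on a 0-255 scale).
paper = [[200] * 8 for _ in range(8)]
clean = threshold(paper)
speckled = ordered_dither(paper)

white_after_threshold = sum(map(sum, clean))    # every pixel stays white
white_after_dither = sum(map(sum, speckled))    # some pixels turn black
```

Thresholding leaves the blank paper entirely white, while dithering scatters black dots across it. Those dots are what confuse the OCR program and inflate the file size.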


Generally you should not despeckle the images, because this process often removes punctuation marks. If you find that despeckling improves the OCR quality, then do so, but use the non-despeckled versions for the page scans that you upload to DP.

For instructions on how to scan using Abbyy, see the Abbyy Scanning Documentation.

Scanning advice

Avoiding the most common pitfalls

Image providers

There are many online image archives that make available scans of public domain books. For a list of some of these sites see Details of Image Sources.

Please do not use scans from any archives that charge for the use of their service. These archives usually have a compilation copyright, and other restrictions on their use. Please follow the individual site guidelines regarding acceptable use and protocol. We don't want to be bad neighbors. It is considered good form to credit the source of the scan when the text is submitted to Project Gutenberg.

If you've downloaded images to process from an online source, it's important that you record the source of the scans. Filling out the "image provider" field when you create a project allows DP to coöperate with online image archives' policies. It's also nice to let Project Gutenberg know the source of the scans at clearance time, but it is not required.

DP accepts PNG and JPG images only for proofreading. If the images that you've downloaded are in a different format, you'll need to convert them as part of your preparation process.

You'll need to prepare the page scans, plus the illustration scans if there are any, prior to uploading the images to DP. See those links for recommendations. If this results in images of reduced quality, consider adding a link to the original images in the project comments, but do ensure that the images you provide for proofers are legible themselves without reference to outside sources.

Two programs to make harvesting from some sites easier have been developed by DP volunteers:

Snatch: Allows you to "snatch" images from several of the on-line archives. It can be updated to snatch from others as well.

Gharvest: This program is made specifically for Google Print.

Scanners.

Scanners are devices we use to create images of books. When choosing a scanner, or checking to see if a scanner is useful for DP, there are a few important factors to consider. First is form factor. The most common types of scanner are flatbed scanners, ADF (Automatic Document Feeder) scanners, and scanners with both a flatbed and an ADF. There are also some less common types of scanners discussed below. Flatbed scanners are useful for scanning books while they are still intact, while ADF scanning is faster but requires that the spine of the book be removed. Modern scanners almost always use a USB interface and have sufficient optical resolution for our needs, so we will focus on other aspects of the scanner.

If you have questions, or just want to see what others have discussed, there is a forum thread on scanner recommendations: Scanner Reviews.

Flatbed Scanners.

Flatbed scanners have a number of advantages for providing content for DP. Most CPs start with a flatbed scanner. They are cheap and relatively common, you can scan material that is still bound, and they are moderately fast. You can also scan two pages at a time and have the OCR software or image preparation software separate them for you. Flatbed scanners have a fixed glass plate where you place the book, and an internal moving head that passes underneath the glass plate.

When choosing a flatbed scanner for providing content, there are a few key factors to consider: Scanning speed, maximum size, and type of scan head. Speed is obvious; you're going to be scanning a few hundred impressions per book, and the difference between 10 second scans and 45 second scans adds up. Maximum size affects what type of material you can scan; most books will fit entirely on a standard A4/Letter-sized scanner, but periodicals and large portfolio-sized books are much easier to scan on A3/11x17-sized scanners. The type of scan head is important because of book gutters; you want a flatbed scanner with a CCD (charge-coupled device) scanning element, not a CIS (Contact Image Scanner) scanning element. CCD scanners can focus much further above the glass plate than CIS scanners, and keep the letters in the gutter from getting too blurry.
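
As a rough illustration of how scan speed adds up, the arithmetic can be sketched in Python. The figure of 300 impressions is an assumption standing in for "a few hundred", and the calculation ignores the time spent turning pages and repositioning the book.

```python
def scanning_minutes(impressions, seconds_per_scan):
    """Total time spent waiting on the scanner, in minutes."""
    return impressions * seconds_per_scan / 60


# Assumed book size: 300 impressions.
fast = scanning_minutes(300, 10)   # scanner that takes 10 seconds per scan
slow = scanning_minutes(300, 45)   # scanner that takes 45 seconds per scan
print(f"fast scanner: {fast:.0f} min, slow scanner: {slow:.0f} min")
```

For a book of that size, the slower scanner costs nearly three extra hours of waiting.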

There are also a few specialized book scanners, like the Plustek Optibook, that avoid gutter problems by having a very narrow margin on one side and scanning a single page at a time. For this type of scanner it doesn't matter what type of scanning element it uses, as the page lies flat upon the glass.

ADF Scanners.

ADF (Automatic Document Feeder) scanners pass the pages of a book over a stationary scanning head. They usually have a hopper that allows you to load a number of pages and let the computer handle the scanning. This can be much faster than a flatbed scanner.

Some important factors for selecting an ADF scanner include simplex/duplex (whether the scanner can digitize both sides of the paper at the same time), hopper size (how many pages the scanner can hold at once), paper path size (letter/A4 scanners are much more common, but can't handle folio/quarto-sized books or periodicals), double feed and jam detection, and ease of maintenance/availability of spares (the rubber rollers and other parts tend to wear out faster on old books).

Just as important as the scanner is a reliable method of removing the book spine. The best method is to use a professional-grade paper trimmer; this will slice cleanly through the entire book and reduce the odds of the paper double feeding or jamming. You may be able to find a local print shop that will do this for a small fee or for free. You can also use a band saw or scroll saw, but these tend to leave more ragged edges that induce more double feeding and jamming.

Pen scanners/handheld scanners.

These are primarily useful for scanning texts in a reference library with strict rules about scanners and cameras. Pen scanners scan a single line of text at a time, while handheld scanners scan a larger swath as you pass them over a page. These are not recommended for general scanning use because they are much slower than other methods.

Digital cameras.

A few CPs, and many large-scale scanning operations, use digital cameras instead of a traditional scanner. They have the advantage of not requiring the book to be laid flat, and can be very fast. They do tend to be more expensive and much larger than traditional scanners. Some even have automatic page turners. Results can vary depending upon the quality of the cameras, placement of the cameras relative to the pages, method of holding the pages flat, lighting, and vibration. See the Internet Archive's scanning robot and DIY Book Scanner for some examples.

OCR

You have the scans; now you need the text. (Alternatively, this might be a type-in project.) Optical character recognition (OCR) is the process through which a program takes the image and "reads" it, producing the text files. There are many programs that do this. Some are very good, some are adequate, and many are not good at all. Some have more functions than others, and some are fairly expensive.

Please note, since the changeover to separate proofing and formatting rounds, pre-formatting should not be added to the project. Pre-formatting gets in the way of the proofers doing their jobs.

OCR Software.

If you do not have Abbyy FineReader Pro, do not feel that you need to go out and buy the software in order to OCR. You can use any OCR program. So long as you get it into the correct format in the end, that is fine. The instructions given below are for Abbyy FineReader Pro. We will attempt to make them as general as possible, so that you can convert them to other programs, but some software will not do everything that Abbyy FineReader does.

ABBYY FineReader

The most popular program is ABBYY FineReader. It does an excellent job, and you can find an older version on eBay without breaking the bank. Try to stick with the Pro version. The Home and Sprint versions are much less expensive, but far less feature-rich. Instead of getting the newest Home edition, get a Pro version that is one release old; you will be much happier. There is a forum thread, ABBYY FineReader Tips and Tricks, for help with ABBYY FineReader.

Readiris

Readiris v.11 has adequate character recognition, but does have some limitations when it comes to use for DP. It has a limit of 50 pages per batch recognition, meaning that for a 300-page book, you need to run 6 separate batches of OCR, then rename the files.
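
The batch arithmetic and the renaming step can be sketched in Python as follows. This is only an illustration: it assumes each batch numbers its output pages starting again from 1, and that the final files use zero-padded serial names such as 051.txt. Check what your copy of Readiris actually produces before relying on it.

```python
def plan_batches(total_pages, batch_limit=50):
    """Split pages 1..total_pages into consecutive (first, last) batches."""
    batches = []
    for first in range(1, total_pages + 1, batch_limit):
        last = min(first + batch_limit - 1, total_pages)
        batches.append((first, last))
    return batches


def merged_name(batch_index, page_in_batch, batch_limit=50, width=3):
    """Name of a page's text file once all batches are merged into one sequence.

    batch_index counts from 0; page_in_batch counts from 1 (an assumption
    about how the OCR output is numbered within each batch).
    """
    serial = batch_index * batch_limit + page_in_batch
    return f"{serial:0{width}d}.txt"


batches = plan_batches(300)
print(len(batches), "batches:", batches)
print(merged_name(1, 1))   # first page of the second batch
```

For a 300-page book this gives the six batches mentioned above, and the first page of the second batch becomes 051.txt.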

More specifics to follow.

Ocrad

See Ocrad.

Tesseract

Tesseract is an open source OCR engine. See Tesseract's homepage or Wikipedia for more details.

ABBYY FineReader Scanning Instructions.

Prepare the Illustrations.

OK. You have page scans and your text is ready, but you still have some illustrations to prepare. It is also a good idea to make a scan of the cover and the spine if they have any decoration on them. Some people will scan them even if they have no decoration, as this gives a nice feel to the HTML version.

How to handle illustrations on a page.

Many books have illustrations within the text. We like to create HTML versions of all books with illustrations. This means the CP or PM must get these illustrations and include them in the project. It is not OK to just say "The illustrations will be provided by the PM at a later date" or "You can download the illustrations from this location" as the PM or site may not be around when the text is finished.

In order to get the illustrations, scan in full color, or greyscale as needed, in an application other than ABBYY FineReader. IrfanView, the GIMP, or your scanner's software should all provide decent image scanning. ABBYY FineReader processes images in several ways that are effective on text, but unacceptable for illustrations.

Illustrations should be scanned at a sufficient resolution to capture fine detail. While it may not be needed now, it is important if the book is to be reprinted or screen technology improves. Generally speaking, 300 DPI is adequate for line art, continuous tone, and descreened images; screened images often require 600 DPI to avoid moiré effects.

Then crop around the illustration, leaving some space around the illustration in order to rotate and clean up the illustration. Do not feel that you need to provide clean rotated images in perfect, ready to post format. This can be done by the PP. If you do wish to clean them up, many PPs appreciate this, however, please leave them larger than you think the PP will need. This allows the PP to resize them to the way they like it.

How to handle plates

Plates are handled in much the same way, but please make sure that you keep a black and white copy of the PNG in the page images. This provides a place marker that the PP can use to put the illustration in the right place. Some PMs leave the blank page following unnumbered plates, some do not. If the page numbers include the plate and blank page, then the blank page must be included or the PP will be hunting down the missing page later.

OCR Pool.

If you don't have an OCR package at all, don't want to bother with it, or really want it done with a good OCR program, then you can use the OCR Pool. This group will take the scans you provide and produce the text for you.

Check the project.

  • Check that every page image is there, and is complete. Include every page: title, verso, all illustrations and plates, and all blank pages within the book. Leading (prior to the first printed page) and trailing (after the final printed page) blank pages should be removed. If any pages are missing or damaged they should be replaced or repaired before continuing.
  • Check that every page has been OCR'd. You should have one text file for each page image file, and they should have the same base name (e.g., 001.txt and 001.png). (If you're submitting a type-in project, just create empty text files with the appropriate names.)
  • Image & text files should be named so that a simple sort of the filenames (e.g., an "order by name" listing of the files) puts them in the proper (book-binding) sequence.
    • One common convention is to simply number each page serially starting from 001 (or 0001 if there are more than 999 pages). Note that typically, this serial number will not agree with the page number printed in the book, but the difference (or 'offset') will usually be consistent over the body of the book. Check for a consistent offset, and investigate any anomalies, as they may indicate missing or duplicated pages. Changes in the offset can also occur due to unnumbered content pages (plates/appendices/introductions/whatever), which are fine. (Don't try to achieve a consistent offset at the expense of the proper sequence of pages.)
    • Another possibility is to name the image & text files according to the original printed page number. This is complicated by books with multiple page-numbering sequences (e.g., frontmatter numbered with roman numerals) and pages without an explicit or implied page number (e.g., plates). Such complications can be accommodated by judicious use of extra characters in the filename. For page files, the filename can be up to 12 characters long, so in practice the base name can have up to 8 characters. Allowed characters include digits, letters, underscore, hyphen, and dot. Just make sure that a simple sort of the filenames puts them in the proper sequence. (Don't rely on a particular collation for uppercase vs. lowercase letters. To be safe, only use one or the other.)
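
The checks in this list can be partly automated. The following Python sketch is not an official DP tool; the function names and messages are made up for illustration. It takes a list of page filenames, reports names that break the rules above, and flags images and text files that have no partner. A second helper reports where the offset between serial number and printed page number changes.

```python
import re

# Allowed characters per the rules above: digits, letters, underscore, hyphen, dot.
ALLOWED = re.compile(r"[A-Za-z0-9_.\-]+")


def check_page_files(filenames):
    """Return a list of problems found in a flat list of page filenames."""
    problems = []
    image_bases, text_bases = set(), set()
    for name in filenames:
        base, dot, ext = name.rpartition(".")
        if not dot or not base:
            problems.append(f"{name}: no base name or extension")
            continue
        if len(name) > 12:
            problems.append(f"{name}: longer than 12 characters")
        if not ALLOWED.fullmatch(name):
            problems.append(f"{name}: contains a character that is not allowed")
        if ext.lower() in ("png", "jpg"):
            image_bases.add(base)
        elif ext.lower() == "txt":
            text_bases.add(base)
    for base in sorted(image_bases - text_bases):
        problems.append(f"{base}: image without a text file")
    for base in sorted(text_bases - image_bases):
        problems.append(f"{base}: text file without an image")
    letters = "".join(image_bases | text_bases)
    if any(c.isupper() for c in letters) and any(c.islower() for c in letters):
        problems.append("base names mix uppercase and lowercase letters")
    return problems


def offset_changes(numbered_pages):
    """Report where (serial number - printed page number) changes.

    numbered_pages is a list of (serial, printed) pairs for the pages that
    carry a printed number. Returns (serial, old_offset, new_offset) tuples.
    """
    changes = []
    previous = None
    for serial, printed in numbered_pages:
        offset = serial - printed
        if previous is not None and offset != previous:
            changes.append((serial, previous, offset))
        previous = offset
    return changes
```

For example, check_page_files(os.listdir("my_project")) would check a hypothetical folder named my_project (after import os), provided it holds only the page images and text files. A change reported by offset_changes is not necessarily an error, since unnumbered plates shift the offset legitimately, but each one is worth a look.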

GuiPrep

GuiPrep is a software package created by DP's own Thundergnat. This is a great package that takes the OCR output, checks it for common OCR errors and then spits out a ready for DP version of the text file. It will also renumber the images, and run PNGCrush to make the images smaller. This is a very handy tool indeed. You can find it here. Note that you can use the post-processing tool Guiguts to import the text files into a single document for pre-processing, such as spell check, which can be used to produce the initial good words list. In Guiguts, use "File, Import Prep Text Files".

GuiPrep has a lot of options. Don't let that scare you. Most you do not need to touch, but can change if you have special texts that do not function well with the defaults. Here we will discuss only the basics. If you want more information you can read the manual at the GuiPrep site, or post in the Providing Content Forum.

Installing GuiPrep

Instructions yet to come

Using GuiPrep

These are the basics of how to use GuiPrep. Detailed instructions can be found on the GuiPrep home page. These instructions will need to be altered a little for your specific program.

Setup text files

Save two copies of the text output from your OCR program. The first should go in the "textw" directory of your project folder. Save the text with these settings: Save as text document (or UTF8 if applicable). Check Save all pages and Create a separate file for each page. Under Formats Setting, check Keep page breaks, Keep line breaks, and Use a blank line as paragraph separator. It doesn't matter what the File name is set to.

The second file should be set just like the first, except it should be in the "textwo" directory, and Keep line breaks should be unchecked.

This will allow GuiPrep to merge words split across lines.
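
To see why two copies help, consider the following Python sketch. It is not GuiPrep's actual code, only one plausible way to use the two versions together: a word split by a hyphen at the end of a line is rejoined without the hyphen only if the version without line breaks spells it that way.

```python
def rejoin_line_end_hyphens(lines_with_breaks, text_without_breaks):
    """Rejoin words split across lines, guided by the no-line-break text.

    lines_with_breaks: lines from the "textw" version of a page.
    text_without_breaks: the same page from the "textwo" version.
    (Illustrative sketch only; GuiPrep's real routine may differ.)
    """
    known = set(text_without_breaks.split())
    rows = [line.split() for line in lines_with_breaks]
    for i in range(len(rows) - 1):
        if not rows[i] or not rows[i + 1]:
            continue
        head = rows[i][-1]
        if not head.endswith("-") or len(head) < 2:
            continue
        tail = rows[i + 1][0]
        solid = head[:-1] + tail
        # Drop the hyphen only when the OCR program itself joined the word.
        rows[i][-1] = solid if solid in known else head + tail
        # The rest of the word moves up to the end of the earlier line.
        del rows[i + 1][0]
    return [" ".join(row) for row in rows]


with_breaks = ["It was a well-", "known exam-", "ple of care."]
without_breaks = "It was a well-known example of care."
print(rejoin_line_end_hyphens(with_breaks, without_breaks))
```

Here "well-known" keeps its hyphen because the version without line breaks has it, while "exam-" and "ple" are merged into "example".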

If your version of ABBYY has the Legacy Options, use with caution: If you opt for "Read as plain text formatted with spaces" you will disable guiprep's de-hyphenating routine (because each line of the plain text is treated as a paragraph). The end result is plain text with a line of text alternating with a blank line.

Run GuiPrep

Now you should have at least three directories in your project folder: pngs (with the PNG page scans), textw (with the text files that have line breaks), and textwo (with the text files that have no line breaks).

Open GuiPrep, go to the change directory tab and navigate to the folder your project is in.

Go to the process text tab and make sure all the options you want to run are checked. If your project includes the long s character, check "Fix Olde Englifh" and an extra routine to check for f/s mistakes will be run. Check this only if you have a long s project. If your project is for a site using Unicode, such as DP-EU or DP-Canada, uncheck the "Convert to ISO 8859-1" box. If the project is for the main site, leave it checked. In a pinch you can use GuiPrep's option to reduce PNG size, but other programs are more efficient; pngcrush will only save roughly 7-9 percent.

Click start. When GuiPrep is done it will say "Finished all selected routines."

There have been reports of Unicode characters reappearing after using the search and replace tools; you may wish to rerun the "Convert to ISO 8859-1" step if you run any search and replaces.
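
If you want to confirm that no stray characters remain, a short Python check along these lines can help. It is a sketch, not part of GuiPrep; it relies on the fact that ISO 8859-1 covers exactly the code points 0 through 255.

```python
def non_latin1_characters(text):
    """Return (line, column, character) for each character outside ISO 8859-1."""
    found = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0xFF:  # ISO 8859-1 ends at code point 255
                found.append((line_no, col, ch))
    return found


sample = "caf\u00e9\n\u201cquoted\u201d"   # e-acute is fine; curly quotes are not
print(non_latin1_characters(sample))
```

An empty list means the page can be saved as ISO 8859-1 without loss. Anything reported, such as the curly quotes in the sample, needs converting before upload to the main site.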

Once the text files are ready, you may want to use Guiguts (File, Import Prep Text Files) to pre-process your text, for instance by spell checking and developing a good words list.

FAQ, or what do I need to know?

What is the difference between a CP and a PM? And what do those abbreviations mean?

A: The CP or Content Provider supplies the scans to be processed at DP, and may also prepare the files for the proofreaders, but does not necessarily deal with the project beyond that. CPs do not have to be members of DP.

The PM or Project Manager is responsible for creating the project at DP, guiding it through the rounds, answering proofreader/formatter questions, and making decisions that will help create the most consistent output possible for the post-processor. PMs may provide their own content or acquire scans from another CP. The term PM in a different context means Private Message.

How much of my time will CPing take?

A: It depends on the book you choose to CP. If you choose a short novella with no illustrations, then it could take a couple hours to scan, OCR, check and prep your project. If, on the other hand, you are working on a thousand-plus page book on ship construction with 33 fold-out plates and a couple hundred illustrations, then it could take a year or more to finish the scanning alone.

What are the qualifications necessary to become a CP?

A: There are no qualification requirements to be a CP. You just need to be able to get the images into good order and find a PM willing to work with you.

What kind of equipment do I need to CP?

A:

  • In order to CP you need a scanner that is capable of scanning the material you want to provide. Some libraries have scanners for public use if you do not have one.
  • You will also need some sort of OCR package capable of providing the OCR text needed to start from. If you do not have an OCR package, there is an OCR Pool with volunteers willing to do this for you.
  • You will also need GuiPrep installed. However if you use the OCR pool, they can run GuiPrep for you, if you ask them nicely.

Are there deadlines? Who sets the schedule? What if the schedule is not met?

A: The only deadlines and schedules are set by the CP. If as the CP you do not want to set a deadline or schedule, then don't. If you do set a deadline and it passes, the only one who is going to come down on you is you. Some projects take very little time; others take a long time.

What files do I need to provide?

A: You should provide clear black and white PNG images of every page. These should be large enough to be read easily, but small enough to download over a dial-up modem; keeping each image below 100K is usually sufficient.

You will also need to provide text files containing the OCR output of each page. The PNG and the text file must have the same base name. For example, 005.png goes with 005.txt; otherwise the upload software won't know what to do. (Note: GuiPrep has a tool to help get the names to match, so long as both sets of files are in correct alphanumeric sort order.)

If there are any illustrations in the book, or a decorative cover, grey-scale or color images of each should be provided. It is best if these are in jpg format. Depending on the image, png format may also be used. If the illustration is black and white, these can be provided in png format. Post processors appreciate if these files are named to correspond to the correct project pages. For example, i005-1.jpg and i005-2.jpg would be illustrations from 005.png.
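
The illustration naming pattern in the example above can be checked mechanically. This Python sketch assumes the exact pattern shown (a leading "i", the page's base name, a hyphen, and a running number); it is one common convention, not a DP requirement, and the function name is made up for illustration.

```python
import re

# Matches names such as i005-1.jpg: "i" + page base name + "-" + running number.
ILLUSTRATION = re.compile(r"i(?P<page>.+)-(?P<index>\d+)\.(?:jpg|png)", re.IGNORECASE)


def page_for_illustration(filename):
    """Return the page image an illustration belongs to, or None if the name does not fit."""
    match = ILLUSTRATION.fullmatch(filename)
    if match is None:
        return None
    return match.group("page") + ".png"


for name in ["i005-1.jpg", "i005-2.jpg", "cover.jpg"]:
    print(name, "->", page_for_illustration(name))
```

Files that return None, such as cover.jpg, are not necessarily wrong; they simply are not tied to a page by name, so it helps to tell the PM where they belong.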

If my project has music in it, is there anything special I need to do?

A: Consult the Music Guidelines for detailed help with projects containing music.


Last edited: 2017-02-16

To comment or request edits to this page, please contact srjfoo.
