PaperJet uses LLMs in the extraction process. The size and capabilities of the LLM can vary greatly. To get good results, especially with smaller models, limit the amount of “work” the LLM must perform. The guidelines below help you get good results with your documents. We’ve found them to work well for both small inputs (images) and very large documents (500+ pages).
Field names are important
Field names are used to match the data from the source document, so descriptive, self-explanatory field names are crucial for good results. Spaces are allowed in field names, but avoiding them is recommended. Don’t try to be clever.
Good example: invoice_number
Bad example: the text from the 3rd row on the left side
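As a rough illustration of the naming advice above, here is a minimal Python sketch that flags field names that read like instructions instead of labels. The helper `is_descriptive_field_name` and its `max_words` threshold are hypothetical and not part of PaperJet. The check is also stricter than PaperJet itself, since it rejects spaces that PaperJet accepts.

```python
import re

# Hypothetical helper, not part of PaperJet: a rough heuristic for
# "short, self-explanatory label" style field names.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def is_descriptive_field_name(name: str, max_words: int = 4) -> bool:
    """Return True if the name is a short snake_case label."""
    if not SNAKE_CASE.match(name):
        # Spaces, capitals, or punctuation: treated as a warning sign here,
        # even though PaperJet allows spaces in field names.
        return False
    # Long names tend to describe a position rather than a meaning.
    return len(name.split("_")) <= max_words


good = is_descriptive_field_name("invoice_number")
bad = is_descriptive_field_name("the text from the 3rd row on the left side")
```

A check like this could run over a list of field names before they are configured, but the threshold is arbitrary and should be adjusted to your own naming habits.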
Use descriptions to clarify intent
Descriptions act as an additional layer of instructions to help match data from the source document. You can add a description to any object, field, or column to help the engine extract the correct data.
Good example: Full name of the individual
Field name: “Tax ID”
Description: “Can also appear as VAT ID or EIN number”
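To make the pairing of field names and descriptions concrete, the sketch below models the examples above as plain Python data. The `Field` class and the `full_name` field name are assumptions made for illustration only. This is not PaperJet's configuration format or API.

```python
from dataclasses import dataclass, asdict


# Hypothetical structure for illustration only; PaperJet's real
# configuration format may look different.
@dataclass
class Field:
    name: str
    description: str = ""  # optional extra instruction for the engine


fields = [
    # "full_name" is an assumed field name; the description is from the text.
    Field(name="full_name", description="Full name of the individual"),
    # The description lists alternative labels the value may appear under.
    Field(name="Tax ID", description="Can also appear as VAT ID or EIN number"),
]

# Plain dictionaries are easy to inspect or serialize.
schema = [asdict(f) for f in fields]
```

The name states what the value is, and the description covers the variations a document might use for it.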
