Quickly and easily parse any document with just a few simple steps! With our unique API, you can build your own custom document rules in seconds to save valuable time and resources.
What is Custom Document Parser?
A custom document parser API is a tool that can extract specific information from unstructured text-based documents, such as PDFs or web pages to be further analyzed and manipulated.
With Custom Document Parsing, you can easily input a query to search for the exact data you need, and the system uses advanced optical character recognition (OCR) technology to scan the document and advanced natural language processing (NLP) models extract the relevant information automatically.
Additionally, Custom Document Parsing is capable of answering yes/no questions, which can help in document classification and organization. This innovative technology streamlines document processing, allowing businesses to focus on more important tasks.
This can be particularly useful to extract data from a large number of documents quickly and accurately, such as invoices or legal documents. By using a custom document parser API, businesses can automate the process of data extraction, saving time and increasing efficiency.
Access many Doc Parsers with one API
Our standardized API allows you to use different providers on Eden AI to easily integrate OCR APIs into your system and offer your users a convenient way to automatically parse any document.
AWS Textract’s Query functionality — Available on Eden AI
In addition to its powerful data extraction capabilities, AWS Textract offers a Query functionality that allows users to search for specific information within a document. With this feature, you can input a query and AWS Textract will scan the document to find and extract the relevant information. This makes it a highly efficient tool for tasks such as data analysis and document organization.
AWS Textract’s Query functionality also offers a high level of accuracy, with the ability to search for information in different document layouts and structures, including tables and forms. By using AWS Textract’s Query functionality, businesses can save valuable time and resources by automating the search for information in large volumes of documents.
Google Cloud — Available soon
Google Cloud Custom Document Parsing API allows users to define custom document layouts and train the system to recognize specific data points such as names, dates, and addresses. It supports a variety of file formats including PDF, PNG, and TIFF. The API can also be integrated with other Google Cloud services such as Cloud Storage, Cloud Functions, and Pub/Sub. It offers features like document classification, entity extraction, and natural language processing to make document processing more efficient and accurate. With Google Cloud Custom Document Parsing API, businesses can automate their document processing workflows and improve operational efficiency.
Microsoft Azure — Available soon
With Microsoft Azure Custom Document Parsing, users can define custom document layouts and train the system to recognize specific data points such as names, dates, and addresses. The API uses machine learning to improve accuracy over time, and it can be integrated with other Azure services like Cognitive Search and Azure Functions. Custom Document Parsing API supports a variety of file formats including PDF, JPG, PNG, and TIFF, making it a versatile solution for document processing needs.
Benefits of using a Custom Doc Extractor API
Using a custom document parser API offers a wide range of benefits for businesses, including increased efficiency, accuracy, and productivity. Here are some of the key advantages of using a custom document parser API:
- Automated data extraction: Custom document parser APIs can quickly and accurately extract data from large volumes of unstructured documents, saving time and reducing the risk of errors associated with manual data entry.
- Improved accuracy: With advanced machine learning algorithms and optical character recognition technology, custom document parser APIs can accurately identify and extract specific information from even the most complex documents.
- Enhanced productivity: By automating the process of data extraction, businesses can increase productivity and free up valuable resources to focus on other important tasks.
- Better data analysis: Custom document parser APIs can help businesses to organize and analyze large volumes of data more efficiently, allowing for better insights and decision-making.
- Reduced costs: By eliminating the need for manual data entry, businesses can reduce costs associated with labor and errors, leading to significant cost savings over time.
What are the uses of Custom Doc Parsing APIs?
Custom document parser APIs have a wide range of use cases across many different industries. Here are some examples of how custom document parser APIs can be used:
1. Customer service
Custom document parser APIs can be used to extract customer information, such as contact details and order history, and answer yes or no questions to classify customer inquiries, improving the efficiency of customer service operations.
For example, Customer service representatives can use custom document parsing APIs to quickly analyze and extract key information from customer inquiries, such as account numbers, order numbers, and product names. This can help to streamline the inquiry process and ensure that all relevant information is considered.
2. Legal discovery
Custom document parser APIs can be used to search through large volumes of legal documents to find specific information, such as relevant case law or precedent.
Lawyers can use custom document parsing APIs to analyze contracts and other legal agreements to identify key terms and clauses. This can help to identify potential risks and opportunities in a legal agreement and improve negotiation outcomes.
3. Real estate
Custom document parser APIs can be used to extract property information from real estate documents, such as zoning and tax information, and answer yes or no questions to classify the type of property.
Real estate agents can use custom document parsing APIs to extract key information from property listings, such as property address, price, square footage, and number of bedrooms and bathrooms. This can help agents to more easily search and compare properties and provide more accurate and complete information to clients.
4. Fraud detection
Custom document parser APIs with query and classification capabilities can be used to search for patterns and anomalies in financial documents, helping to identify potential fraud.
For instance, a custom document parsing API can be used to verify the identity of individuals by analyzing identification documents such as driver’s licenses, passports, and social security cards. By comparing the information on these documents to other records, such as credit reports or employment records, the API can identify potential cases of identity theft.
5. Healthcare
Custom document parser APIs can extract important information from medical records, such as patient information, diagnoses, and treatments, improving the efficiency and accuracy of healthcare providers.
For example, a custom document parsing API can be trained to recognize specific patterns or keywords in medical records that are associated with particular diagnoses or conditions. This can help healthcare professionals to quickly identify potential diagnoses and rule out others, without having to manually review every piece of information in a patient’s medical record
How to use Custom Document Parsing with the Eden AI API?
To start parsing your documents you’ll need to create an account on Eden AI for free. Then, you’ll be able to get your API key directly from the homepage with free credits offered by Eden AI.
Best Practices for Custom Document Parsing on Eden AI
To optimize input documents for better results with Eden AI, consider the following:
General best practices
- Ensure that the document text is in a language supported by the chosen engine. For instance, Amazon Textract supports English, Spanish, German, Italian, French, and Portuguese.
- Provide a high-quality image, ideally at least 150 DPI.
- If the document is already in a supported file format (PDF, TIFF, JPEG, PNG), don’t convert or downsample it before uploading.
- For text extraction from tables, ensure tables are visually separated from surrounding elements, and text within the table is upright.
Best practices for queries
- When asking a question, use natural language and start with “What is,” “Where is,” or “Who is,” unless you’re extracting standard key-value pairs (in which case you can pass the key name as a query.)
- Avoid ill-formed or grammatically incorrect questions and be specific as possible.
- Use words from the document to construct the query.
- Construct a query that contains words from both the row header and column header.
- Ask questions that respond with answers fewer than 100 words to avoid response latency and timeouts.
- When the document contains multiple sections, ask specific questions for each section.
- When the document has multiple date-related fields, be specific in the query language.
- Give location hints to improve accuracy of results if you know the layout of the document beforehand.
NOTE: When working with queries for multipage documents, you can use the Page parameter to specify which pages to look for the query answer on.
How Eden AI can help you?
Eden AI is the future of AI usage in companies: our app allows you to call multiple AI APIs.
- Centralized and fully monitored billing on Eden AI for all OCR APIs
- Unified API for all providers: simple and standard to use, quick switch between providers, access to the specific features of each provider
- Standardized response format: the JSON output format is the same for all suppliers thanks to Eden AI’s standardization work. The response elements are also standardized thanks to Eden AI’s powerful matching algorithms.
- The best Artificial Intelligence APIs in the market are available: big cloud providers (Google, AWS, Microsoft, and more specialized engines)
- Data protection: Eden AI will not store or use any data. Possibility to filter to use only GDPR engines.