In this tutorial, we’ll show you how to integrate Eden AI’s Invoice parser API into your data processing workflow using Dataiku to help streamline your financial operations and free up time for more important tasks.
The same process applies if you want to use other features such as image tagging, explicit content detection, text analysis, and the many other AI APIs we offer.
Build AI on Dataiku with Eden AI
Eden AI is used by AI experts to quickly test, choose and integrate ready-to-use AI APIs. Managing multiple accounts for each app can be a tough job, but with Eden AI, you can connect and manage all your APIs on a single account.
Since some AI providers can be complex to implement, we wanted to simplify the integration to make AI APIs accessible as fast as possible.
Eden AI allows you to solve multiple AI tasks on Dataiku:
- Document parsing: extracting text from images and parsing invoices, receipts or resumes to extract data
- Computer vision: detect faces, objects, logos, explicit content, etc.
- Text analysis: sentiment analysis, keyword extraction, summarization, etc.
- Machine Translation
- Speech recognition & Speech generation
Another advantage of using Eden AI on Dataiku is flexibility: you can select the best AI features and providers for a particular task, or even combine multiple providers to build a solution better suited to your use case.
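As a rough illustration of what combining providers can look like, the sketch below keeps the value that most providers agree on for a single field. The function name `consensus_value`, the provider names and the sample totals are all hypothetical; this is one possible strategy, not a feature of the Eden AI API.

```python
from collections import Counter

def consensus_value(values_by_provider):
    """Return the value reported by the most providers, ignoring missing ones."""
    # Drop providers that returned nothing for this field.
    found = [v for v in values_by_provider.values() if v is not None]
    if not found:
        return None
    # most_common(1) returns [(value, count)]; ties go to the first value seen.
    return Counter(found).most_common(1)[0][0]

# Made-up totals, for illustration only.
totals = {"provider_a": 120.5, "provider_b": 120.5, "provider_c": 102.5, "provider_d": None}
best_total = consensus_value(totals)
```

A majority vote is only one option; you could just as well prefer a specific provider per field or flag disagreements for review.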
Let’s practice with Invoice parsing!
Just like Receipt and Resume Parsing, Invoice Parsing is a tool powered by OCR to extract and digitize meaningful data, Computer Vision to identify the structure of the document, and NLP techniques to pin down the fields. Invoice parser technology extracts key information from an invoice (.pdf, .png or .jpg format) such as the invoice ID, total amount due, invoice date, customer name, etc.
Invoice processing relies on software to automate the handling and management of invoices. It includes tasks such as capturing invoice data, validating it against purchase orders, and routing it for approval, payment and archiving. The goal of AI in invoice processing is to improve efficiency, accuracy, and speed in handling invoices without human intervention.
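To make the validation and routing step more concrete, here is a minimal sketch that compares a parsed invoice total with a purchase order amount. The function names, the tolerance and the two routing labels are assumptions made for this example; real approval rules are usually richer.

```python
def matches_purchase_order(invoice_total, po_total, tolerance=0.01):
    """Return True when the parsed invoice total agrees with the purchase order."""
    if invoice_total is None or po_total is None:
        return False  # a missing amount cannot be validated automatically
    return abs(float(invoice_total) - float(po_total)) <= tolerance

def route_invoice(invoice_total, po_total):
    """Decide the next step for an invoice: 'approve' or 'manual_review'."""
    if matches_purchase_order(invoice_total, po_total):
        return "approve"
    return "manual_review"

decision = route_invoice(100.0, 100.0)
```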
How to execute invoice parsing in Dataiku?
If you’re looking for an easier and faster way to run the invoice parsing API in Dataiku, skip the tutorial and watch the video below:
The steps to extract information from invoices using Eden AI invoice parser in Dataiku are as follows:
- Get your API key and install Dataiku DSS.
- Create or open a Dataiku project.
- Create a folder dataset and upload your invoices.
- Create a new recipe and choose the type of recipe you want to create.
- Code the connection to the Eden AI invoice parser API and extract basic information from the invoice.
- Import the invoices from the folder dataset.
- Call the function defined in your code and write the dataframe response into the output dataset.
1. Get started with Dataiku
To use the Eden AI API with Dataiku, you’ll need the following:
- Dataiku DSS installed and configured.
- An Eden AI API key, which you can get for free:
2. Create a project in Dataiku
To begin with, you’ll need to create a new Dataiku project or open an existing one:
Once your project is open, click on “New Dataset” located on the right-hand side panel, then select the “Folder” option to create a folder dataset:
3. Create your first code recipe in Dataiku
Next, you’ll need to upload your invoices in the folder as follows:
Once your invoices are imported into the folder, you’ll need to create a new recipe by clicking on the action button. Then, select the new code recipe:
You can choose the type of recipe you want to create, such as Python or Shell. You will also need to create a dataset output for the recipe and give it a name:
4. Start coding the connection to Eden AI
After creating the recipe, you can start coding the connection to Eden AI invoice parser. You’ll need to define the invoice parser endpoint that you want to connect to and call the API with your key:
import pandas as pd
import requests

def edenai_invoice(invoice, providers):
    url = "https://api.edenai.run/v2/ocr/invoice_parser"
    # One list per extracted field, filled provider by provider
    totals = []
    sub_totals = []
    customer_names = []
    customer_addresses = []
    headers = {
        "authorization": "Bearer Your API KEY"  # replace with your Eden AI API key
    }
    data = {"providers": ','.join(providers), "language": "en"}
    files = {"file": ("image.png", invoice, "application/octet-stream")}
    try:
        response = requests.post(url, data=data, files=files, headers=headers).json()
    except ValueError as e:
        # Raised when the response body is not valid JSON
        raise ValueError(str(e))
    if 'error' in response:
        raise Exception(response['error']['message'])
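The snippet above hardcodes the API key and always sends the file as "image.png". As a hedged alternative, the sketch below builds the same request arguments while reading the key from an environment variable and guessing the content type from the file name. The helper name `build_request_kwargs`, the variable name `EDENAI_API_KEY` and the 60-second timeout are assumptions for this example, not something Eden AI or Dataiku prescribes.

```python
import mimetypes
import os

def build_request_kwargs(invoice_bytes, filename, providers, api_key=None):
    """Build the keyword arguments for requests.post without sending anything."""
    # EDENAI_API_KEY is a hypothetical variable name; use whatever your setup defines.
    api_key = api_key or os.environ.get("EDENAI_API_KEY")
    if not api_key:
        raise ValueError("No API key provided")
    # Guess the content type from the file name (.pdf, .png, .jpg, ...).
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    return {
        "headers": {"authorization": f"Bearer {api_key}"},
        "data": {"providers": ",".join(providers), "language": "en"},
        "files": {"file": (filename, invoice_bytes, content_type)},
        "timeout": 60,  # seconds; avoids waiting forever on a stalled connection
    }

kwargs = build_request_kwargs(b"%PDF-", "invoice.pdf", ["mindee", "google"], api_key="my-key")
```

The result can then be passed to the call as `requests.post(url, **kwargs)`, which keeps the key out of the recipe code.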
5. Put the response in a Pandas dataframe
Once you have retrieved the data from the API, you’ll need to put the response in a Pandas dataframe. In this example, we chose to extract some basic information from the invoice, such as total, subtotal, customer name, and customer address:
    # Still inside edenai_invoice: collect the fields returned by each provider
    for pro in providers:
        total = response.get(pro, {}).get('extracted_data', [{}])[0].get('invoice_total')
        sub_total = response.get(pro, {}).get('extracted_data', [{}])[0].get('invoice_subtotal')
        customer_name = response.get(pro, {}).get('extracted_data', [{}])[0].get('customer_information', {}).get('customer_name')
        customer_address = response.get(pro, {}).get('extracted_data', [{}])[0].get('customer_information', {}).get('customer_address')
        totals.append(total)
        sub_totals.append(sub_total)
        customer_names.append(customer_name)
        customer_addresses.append(customer_address)
    # One row per provider, with the provider name as the first column
    df = pd.DataFrame(list(zip(totals, sub_totals, customer_names, customer_addresses)), columns=['total', 'sub_totals', 'customer_name', 'customer_address'])
    df.insert(loc=0, column='providers', value=providers)
    return df
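The chained `.get(...)[0]` calls above raise an `IndexError` if a provider returns an empty `extracted_data` list. The sketch below shows one way to pull out the same four fields more defensively, producing one dictionary per provider. The helper name `extract_row` and the sample response are made up for illustration; only the nesting of the keys is taken from the code above.

```python
def extract_row(response, provider):
    """Pull the four fields used above for one provider, tolerating missing data."""
    # Fall back to [{}] when the provider is absent or returned an empty list.
    extracted = response.get(provider, {}).get("extracted_data") or [{}]
    first = extracted[0] or {}
    customer = first.get("customer_information") or {}
    return {
        "providers": provider,
        "total": first.get("invoice_total"),
        "sub_totals": first.get("invoice_subtotal"),
        "customer_name": customer.get("customer_name"),
        "customer_address": customer.get("customer_address"),
    }

# Made-up response with the same nesting as the code above.
sample = {
    "mindee": {"extracted_data": [{
        "invoice_total": 110.0,
        "invoice_subtotal": 100.0,
        "customer_information": {"customer_name": "ACME", "customer_address": "1 Main St"},
    }]},
    "google": {"extracted_data": []},  # provider returned nothing
}
rows = [extract_row(sample, p) for p in ["mindee", "google", "amazon"]]
```

Passing `rows` to `pd.DataFrame(rows)` would give the same columns as the dataframe built above, with `None` wherever a provider had no value.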
6. Import your invoices
Once you have coded your Eden AI invoice call and returned the data in a structured format (Pandas dataframe), you’ll need to import the invoices from the folder dataset.
# The loop visits every file in the folder; after it finishes, data holds the
# content of the last file read (in our case the folder contains a single file).
for img in [item["fullPath"] for item in invoices.get_path_details()["children"]]:
    with invoices.get_download_stream(path=img) as stream:
        data = stream.read()
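If the folder holds several invoices, you may want to keep the content of each file instead of only the last one. The sketch below is one way to do that, using the same two folder methods as the loop above. The helper `read_invoices` and the extension filter are assumptions for this example, and `FakeFolder` is only a stand-in so the sketch runs outside DSS; in a recipe you would pass the `dataiku.Folder` object instead.

```python
import io

INVOICE_EXTENSIONS = (".pdf", ".png", ".jpg")

def read_invoices(folder):
    """Return {path: bytes} for every invoice file directly inside the folder."""
    contents = {}
    for item in folder.get_path_details()["children"]:
        path = item["fullPath"]
        if not path.lower().endswith(INVOICE_EXTENSIONS):
            continue  # skip anything that is not a supported invoice format
        with folder.get_download_stream(path=path) as stream:
            contents[path] = stream.read()
    return contents

class FakeFolder:
    """Stand-in for dataiku.Folder exposing only the two methods used here."""
    def __init__(self, files):
        self._files = files

    def get_path_details(self):
        return {"children": [{"fullPath": p} for p in self._files]}

    def get_download_stream(self, path):
        return io.BytesIO(self._files[path])

folder = FakeFolder({"/a.pdf": b"A", "/b.png": b"B", "/notes.txt": b"N"})
invoice_bytes = read_invoices(folder)
```

Each entry of `invoice_bytes` can then be sent to the parsing function in turn.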
7. Call the function defined in your code
Finally, you’ll need to call the function defined earlier and apply it to the invoice with the providers that you want.
# Call the process function
output_df = edenai_invoice(data, ['mindee','google','microsoft','amazon','base64'])
Last but not least, don’t forget to write the dataframe response into the output dataset!
invoice_output_dataset = dataiku.Dataset("invoice_output")  # the output dataset of the recipe
invoice_output_dataset.write_with_schema(output_df)  # Write your results
By following these steps, you’ll be able to get the extracted information from the invoices in a structured format as follows:
Congrats 🥳 You’re all set and ready to automate your invoice processing with Dataiku!
You can access the full code sample for the recipe here:
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
import requests

# Read recipe inputs
invoices = dataiku.Folder("OqrjsSjE")
invoices_info = invoices.get_info()

# Compute recipe outputs
# ==============================================================================
# AUXILIARY FUNCTIONS
# ==============================================================================
def edenai_invoice(invoice, providers):
    url = "https://api.edenai.run/v2/ocr/invoice_parser"
    # One list per extracted field, filled provider by provider
    totals = []
    sub_totals = []
    customer_names = []
    customer_addresses = []
    headers = {
        "authorization": "Bearer Your API KEY"  # replace with your Eden AI API key
    }
    data = {"providers": ','.join(providers), "language": "en"}
    files = {"file": ("image.png", invoice, "application/octet-stream")}
    try:
        response = requests.post(url, data=data, files=files, headers=headers).json()
    except ValueError as e:
        # Raised when the response body is not valid JSON
        raise ValueError(str(e))
    if 'error' in response:
        raise Exception(response['error']['message'])
    for pro in providers:
        total = response.get(pro, {}).get('extracted_data', [{}])[0].get('invoice_total')
        sub_total = response.get(pro, {}).get('extracted_data', [{}])[0].get('invoice_subtotal')
        customer_name = response.get(pro, {}).get('extracted_data', [{}])[0].get('customer_information', {}).get('customer_name')
        customer_address = response.get(pro, {}).get('extracted_data', [{}])[0].get('customer_information', {}).get('customer_address')
        totals.append(total)
        sub_totals.append(sub_total)
        customer_names.append(customer_name)
        customer_addresses.append(customer_address)
    # One row per provider, with the provider name as the first column
    df = pd.DataFrame(list(zip(totals, sub_totals, customer_names, customer_addresses)), columns=['total', 'sub_totals', 'customer_name', 'customer_address'])
    df.insert(loc=0, column='providers', value=providers)
    return df

# The loop visits every file in the folder; after it finishes, data holds the
# content of the last file read (in our case the folder contains a single file).
for img in [item["fullPath"] for item in invoices.get_path_details()["children"]]:
    with invoices.get_download_stream(path=img) as stream:
        data = stream.read()

# Call the process function
output_df = edenai_invoice(data, ['mindee', 'google', 'microsoft', 'amazon', 'base64'])
invoice_output_dataset = dataiku.Dataset("invoice_output")  # the output dataset of the recipe
invoice_output_dataset.write_with_schema(output_df)  # Write your results
If you’re interested in more low-code tools, have a look at our step-by-step tutorials on how to bring AI to your application with Power Apps, Google App Script, Retool, Make, IFTTT, n8n, Bubble, and Zapier.