Read Business Card with ChatGPT

ChatGPT API as assistant

Background

I collected hundreds of business cards during my entrepreneurship years, roughly 2014 to 2019. The state of the art then was to use OCR software such as CamCard or Evernote to turn the cards into a structured format. I stored the majority of the cards in Evernote, as it was also my note-taking platform. Last year I decided to migrate from Evernote to Obsidian, and I have been cherry-picking the useful pieces batch by batch (most notes are useless; more on this topic later).

Most notes are usable after Export Evernote to Markdown. However, for the collection of business cards, the formatting comes out badly garbled. The following is an example.

Example of markdown business card from evernote export.png

While it is easy for a human to hand-pick the data into a spreadsheet, the job is tedious. I once posted a message to my circle of friends asking whether any service could turn those name cards into a structured format. The common quote was about US$0.10 per card.

ChatGPT API in 10 minutes

It turns out the ChatGPT API is very slick. I literally got the POC done in 10 minutes.

The first thing is to set up the API key.

import openai
openai.api_key = 'The Key from Open AI'
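Hard-coding the key is fine for a throwaway POC, but a slightly safer variant reads it from an environment variable. The following is a minimal sketch, assuming the key is stored in a variable named OPENAI_API_KEY; the helper load_api_key is a hypothetical name and not part of the openai package.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in the given environment variable.

    Both the helper and the default variable name are illustrative assumptions.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


# Usage (with the openai package imported as in the snippet above):
# openai.api_key = load_api_key()
```

This keeps the key out of the source file, which matters once the script is shared or committed.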

A simple zero-shot example with a system prompt can be implemented as:

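def talk(system_prompt, text):
    # The first message uses the "system" role so that it sets the
    # assistant's behaviour; the second carries the actual user input.
    r = openai.ChatCompletion.create(
      model="gpt-3.5-turbo",
      messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": text}
      ]
    )
    # Return only the text of the first completion.
    return r['choices'][0]['message']['content']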

The following test shows that the system prompt can set a style for the bot (plain vs. comedy):

talk('You are a helpful assistant.', 'Can you tell me the date Google was founded?')

'Google was founded on September 4, 1998.'


talk('You are a comedy actor that always talks in a humorous tone.', 'Can you tell me the date Google was founded?')

"Well, I could tell you the date Google was founded, but where's the fun in that? Let's put a comedic spin on it! On September 4, 1998, two Stanford University PhD students, Larry Page and Sergey Brin, decided to start a little search engine company. Who knew that years later, we would all be googling everything from the meaning of life to how to boil an egg? So, there you have it folks, the birth of Google in all its nerdy glory."

Python's functools now comes in handy: partial application turns the talk function into a business card parser:

import functools

# Adjacent string literals are concatenated, so each sentence ends with a space.
read_business_card = functools.partial(talk,
  'You are a data input expert. '
  'The user gives a piece of garbled text from an OCR software by scanning a business card. '
  'Output in JSON format of those fields if you can find them: Name, Telephone, Phone, Mobile, Fax, Email, Address, Company, Evernote Snapshot Path'
)
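To see what functools.partial does here without calling the API, the sketch below swaps talk for a hypothetical stand-in, fake_talk, that simply echoes its arguments. It shows that the partial object fixes the first positional argument (the system prompt), so the caller only supplies the card text. The shortened prompt and the sample text are made up for illustration.

```python
import functools


def fake_talk(system_prompt, text):
    """Hypothetical stand-in for talk(); returns its inputs instead of calling the API."""
    return {"system": system_prompt, "user": text}


# Adjacent string literals are concatenated by Python, so each piece
# needs its own trailing space to keep the sentences apart.
read_card_stub = functools.partial(
    fake_talk,
    "You are a data input expert. "
    "Output JSON.",
)

result = read_card_stub("J0hn Sm1th  ACME Ltd")
print(result["system"])  # You are a data input expert. Output JSON.
print(result["user"])    # J0hn Sm1th  ACME Ltd
```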

Next we send the garbled raw text shown earlier into the read_business_card() function and get the following nice JSON output:

chatgpt api example output of business card to json.png

Note that error handling with ChatGPT is a bit different, as the error response also comes in natural language. The following shows an ill-formed input (raw) and the response from ChatGPT (parsed).

a broken business card image.png
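Because the reply is free text, one simple way to tell the two cases apart is to try parsing it as JSON and treat anything else as a failure. The following is a minimal sketch, assuming the model returns either a single JSON object (possibly wrapped in a sentence or two) or a plain-language apology. parse_card_reply is a hypothetical helper, and the sample replies are made up for illustration.

```python
import json


def parse_card_reply(reply):
    """Return the dict encoded in reply, or None if no JSON object is found.

    Hypothetical helper: it looks at the span from the first '{' to the
    last '}' so that a short sentence around the JSON does not break parsing.
    """
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end < start:
        return None  # no braces at all: most likely a natural-language refusal
    try:
        parsed = json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


good = 'Here is the result: {"Name": "Jane Doe", "Email": "jane@example.com"}'
bad = "I'm sorry, the text does not seem to contain business card details."

print(parse_card_reply(good))  # {'Name': 'Jane Doe', 'Email': 'jane@example.com'}
print(parse_card_reply(bad))   # None
```

Cards that come back as None can be set aside for a manual look instead of stopping the whole batch.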

Stitch it

Now comes the part familiar to every programmer: stitching code together. This part took me about 1 hour to write, troubleshoot, iterate, add error handling, and finish parsing 100 business cards. I will skip the details but summarize a few considerations when developing with an AI assistant.

  • Handle openai.error.RateLimitError
  • Handle openai.error.InvalidRequestError. A common cause is exceeding the token limit of 4096. One token is roughly 3/4 of a word, so the limit is about 3,000 words.
  • Even when both the input and the ChatGPT API call are correct, the output can still contain logical errors, expressed in natural language. More generic error handling is needed.
  • Consider logging the request and response for iteration and troubleshooting. In particular, usage is useful for counting token consumption (and thus dollars) yourself, and id is useful when talking to customer support.
  • Check more samples before scaling, even if there is no technical error. In my example, the output JSON sometimes becomes too clever, e.g. Address itself becomes a dictionary that includes City and Address, or Phone becomes a dictionary that includes Mobile and Telephone. Those situations may not surface during initial prompt development, so we may need to go back and iterate on the system prompt to better control the output.
  • Use it strategically. I find the sweet spot is a task of size ~100. Below that, I may prefer to do it manually. Far above that, it is worth developing a more specific program. In ChatGPT for Data Wrangling Works, I asked ChatGPT to parse an XML. When the result was satisfactory, I asked it to turn the logic into a piece of Python code.
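Two of the considerations above, retrying on rate limits and taming "too clever" nested output, can be sketched in a few lines. This is not the code I actually ran, just an illustration under stated assumptions: call_with_retry and flatten_card are hypothetical helpers, FakeRateLimitError stands in for openai.error.RateLimitError so the sketch runs offline, and the dotted key names are one arbitrary choice among many.

```python
import time


class FakeRateLimitError(Exception):
    """Stand-in for openai.error.RateLimitError so the sketch runs offline."""


def call_with_retry(func, *args, retry_on=(FakeRateLimitError,),
                    max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(*args), retrying with exponential backoff on retry_on errors."""
    for attempt in range(max_attempts):
        try:
            return func(*args)
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ...


def flatten_card(card, sep="."):
    """Flatten nested dicts, e.g. {"Phone": {"Mobile": "1"}} -> {"Phone.Mobile": "1"}."""
    flat = {}
    for key, value in card.items():
        if isinstance(value, dict):
            for sub_key, sub_value in flatten_card(value, sep).items():
                flat[f"{key}{sep}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


# Demo: a fake parser that fails twice before answering.
calls = {"n": 0}


def flaky_parser(text):
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeRateLimitError("slow down")
    return {"Name": text, "Phone": {"Mobile": "123", "Telephone": "456"}}


# The sleep argument is replaced so the demo does not actually wait.
card = call_with_retry(flaky_parser, "Jane Doe", sleep=lambda seconds: None)
print(flatten_card(card))
# {'Name': 'Jane Doe', 'Phone.Mobile': '123', 'Phone.Telephone': '456'}
```

With the real API, one would presumably pass retry_on=(openai.error.RateLimitError,) and wrap read_business_card instead of flaky_parser.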

Did you find this article valuable?

Support HU, Pili by becoming a sponsor. Any amount is appreciated!