Background
I had hundreds of business cards collected during my entrepreneurship days, roughly 2014 to 2019. The state of the art back then was to use OCR software to turn the cards into a structured format; examples include CamCard and Evernote. I used Evernote to store the majority of the cards since it was also my note-taking platform. I decided to migrate from Evernote to Obsidian last year and was in the process of cherry-picking useful pieces (most notes are useless; more on this topic later) batch by batch.
Most notes are usable after exporting from Evernote to Markdown. However, when it comes to the collection of business cards, the formatting is badly garbled. The following is an example.
While it is easy for a human to hand-copy the data into a spreadsheet, the job is tedious. I once posted a message in my friend circle asking whether there was any service to turn those name cards into a structured format. The common quote was about US$0.10 per card.
ChatGPT API in 10 minutes
It turns out the ChatGPT API is very slick. I literally got the proof of concept (POC) done in 10 minutes.
The first thing is to set up the API key.
import openai
openai.api_key = 'The Key from Open AI'
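As a small aside (my own habit, not part of the original setup), the key can also come from an environment variable such as OPENAI_API_KEY so it never appears in the notebook:

import os
import openai

# Read the key from the environment instead of hard-coding it.
openai.api_key = os.environ['OPENAI_API_KEY']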
A simple zero-shot learning example with a system prompt can be implemented as:
def talk(system_prompt, text):
    r = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "user", "content": system_prompt},
            {"role": "user", "content": text}
        ]
    )
    return r['choices'][0]['message']['content']
The following test shows that we can use the system prompt to set a style for the bot (plain vs. comedy):
talk('You are a helpful assistant.', 'Can you tell me the date Google was founded?')
'Google was founded on September 4, 1998.'
talk('You are a comedy actor that always talks in a humorous tone.', 'Can you tell me the date Google was founded?')
"Well, I could tell you the date Google was founded, but where's the fun in that? Let's put a comedic spin on it! On September 4, 1998, two Stanford University PhD students, Larry Page and Sergey Brin, decided to start a little search engine company. Who knew that years later, we would all be googling everything from the meaning of life to how to boil an egg? So, there you have it folks, the birth of Google in all its nerdy glory."
Python's functools now comes in handy to turn the talk function into a business card parser via partial application:
import functools

read_business_card = functools.partial(talk,
    'You are a data input expert. '
    'The user gives a piece of garbled text from an OCR software by scanning a business card. '
    'Output in JSON format of those fields if you can find them: Name, Telephone, Phone, Mobile, Fax, Email, Address, Company, Evernote Snapshot Path'
)
Next, we send the garbled raw text shown earlier into the read_business_card() function and get a nice JSON output with the fields filled in.
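For illustration only, a call might look like the sketch below; the card text is made up and not from my actual Evernote export:

import json

# A made-up piece of garbled OCR text standing in for a real card.
raw = 'John Doe\nAcme Corp\nTel +1 555 0100 j.doe@acme.example'

reply = read_business_card(raw)
card = json.loads(reply)   # works when the model returns pure JSON
print(card.get('Name'), card.get('Company'), card.get('Email'))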
Note that error handling with ChatGPT is a bit different, as the response is also in natural language. The following shows an ill-formed input (raw) and the response from ChatGPT (parsed).
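Whatever form such a natural-language reply takes, a minimal guard is to attempt JSON parsing and fall back gracefully. This helper is my own sketch (the name try_parse_card is hypothetical), not code from the original script:

import json

def try_parse_card(raw_text):
    reply = read_business_card(raw_text)
    try:
        return json.loads(reply)    # the happy path: the reply is pure JSON
    except json.JSONDecodeError:
        # The model answered in prose (e.g. an apology or a clarification),
        # so keep the raw reply for manual review instead of crashing.
        return {'error': reply}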
Stitch it
Now comes the part familiar to every programmer: stitching code together. This part took me about an hour to write, troubleshoot, iterate, add error handling, and finish parsing 100 business cards. I will skip the details but summarize a few considerations when developing with an AI assistant.
- Handle openai.error.RateLimitError.
- Handle openai.error.InvalidRequestError. A common issue is exceeding the input token limit of 4096; one token is roughly 3/4 of a word.
- Even if both the input and the ChatGPT API call are correct, the output can still contain logical errors phrased in natural language, so a more generic error-handling strategy is needed (a rough sketch follows this list).
- Consider logging the request and response for iteration and troubleshooting. In particular, the usage field is useful for counting token consumption (and thus dollars) yourself, and the id field is useful when talking to customer support.
- Check more carefully before scaling up, even if there is no technical error. In my example, the output JSON sometimes becomes too clever: for instance, Address itself becomes a dictionary containing City and Address, and Phone becomes a dictionary containing Mobile and Telephone. These situations may not surface when we initially develop the prompt, so we may need to go back and iterate on the system prompt to better control the output.
- Use it strategically. I find the sweet spot is a task of size ~100. Below that, I may prefer to do it manually; much above that, it is worth developing a more specific program. In ChatGPT for Data Wrangling Works, I asked ChatGPT to parse an XML file, and when the result was satisfactory, I asked it to turn the logic into a piece of Python code.
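To make these considerations concrete (the sketch referenced in the list above), here is a rough wrapper around the try_parse_card helper sketched earlier. The retry count, the back-off delay, and the flattening rule are my own choices, and logging usage and id would additionally require talk() to return the full response object rather than just the message content:

import time
import openai

def parse_with_care(raw_text, max_retries=3):
    for attempt in range(max_retries):
        try:
            card = try_parse_card(raw_text)
            break
        except openai.error.RateLimitError:
            time.sleep(20)                    # back off, then retry
        except openai.error.InvalidRequestError as e:
            return {'error': str(e)}          # e.g. beyond the 4096-token input limit
    else:
        return {'error': 'rate limited after retries'}
    # Flatten over-clever nested outputs such as Address -> {City, Address}.
    for key in ('Address', 'Phone'):
        if isinstance(card.get(key), dict):
            card[key] = ', '.join(str(v) for v in card[key].values())
    return card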