A Prototype Document of an LLM Running on an AR Device

Hi there! I’d love to hear your thoughts. Feel free to leave a comment, even if it’s just a simple ‘Hi’.

Meta Smart Glasses do not have a display, which makes voice commands the only option for users. For me, typing and reading are far more convenient than speaking and listening.

Having an AI assistant on a VR or MR device is clearly not a wise choice either: not only because of the neck-killing weight that most people cannot bear for more than an hour, but also because of the battery that drains at light speed, which at least saves users' necks from breaking.

AR glasses become the best choice among all technological devices; the answer obviously cannot be a smartphone, since staring at one for too long only gives you hemiplegia.

More importantly, I got a pair of AR glasses.

And here is the plan I've got:

I will create an app that calls the API of an existing LLM, which could be ChatGPT or a home server running llama, and the best part is that the chat UI will not disappear from your sight unless you take off the glasses.
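Here is a minimal sketch of that API call, in plain Kotlin so it runs on the JVM (the real thing would live inside an Android app). It posts one prompt to an OpenAI-compatible chat endpoint; the URL, model name, and the lack of an API key are assumptions for a local llama server, and a hosted API would also need an Authorization header.

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Naive JSON string escaping, good enough for this sketch.
fun jsonString(s: String) = "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

// Post one prompt to an OpenAI-compatible chat endpoint and return the raw JSON reply.
// The URL and model name are placeholders for a local llama server.
fun askAssistant(prompt: String): String {
    val payload =
        """{"model": "llama3", "messages": [{"role": "user", "content": ${jsonString(prompt)}}]}"""
    val request = HttpRequest.newBuilder()
        .uri(URI.create("http://localhost:11434/v1/chat/completions"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(payload))
        .build()
    val response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString())
    return response.body() // the app would parse choices[0].message.content out of this
}

fun main() {
    println(askAssistant("Hello from the AR glasses"))
}
```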

There will be a gigantic button in the middle of the phone screen that lets users start chatting without moving their eyes away from the AR glasses. Once the button is pressed, the keyboard pops up. Nowadays everyone can type on a smartphone without looking at it, which also keeps the pressure off the neck. No one still needs to look at the phone keyboard to type, right?
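For the glasses to show the typing as it happens, the phone side only needs to forward every keystroke to whatever draws the overlay. A tiny sketch with a hypothetical GlassesDisplay hook; on Android this would hang off a TextWatcher on the input field.

```kotlin
// Hypothetical hook for whatever draws the overlay on the glasses.
fun interface GlassesDisplay {
    fun show(text: String)
}

// Mirrors the in-progress text to the glasses on every keystroke.
class TypingMirror(private val display: GlassesDisplay) {
    private val buffer = StringBuilder()

    fun onKey(char: Char) {
        buffer.append(char)
        display.show(buffer.toString()) // glasses redraw the text as it is typed
    }

    // Called when the user hits send; returns the finished prompt and clears the buffer.
    fun onSend(): String {
        val prompt = buffer.toString()
        buffer.clear()
        return prompt
    }
}

fun main() {
    val mirror = TypingMirror { text -> println("[glasses] $text") }
    "Hi".forEach(mirror::onKey)
    println("prompt = ${mirror.onSend()}")
}
```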

Then the AI assistant will put the answer right in front of the user. Elegant. To satisfy the other group of users, those who prefer speaking and listening over typing and reading, there will be another button that triggers exactly that. The text of the answer will also be shown on the screen, so nobody misses an extremely important detail when the AI reads out, say, some ancient name they are not familiar with.
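In other words, the answer always goes to the display, and speech is just an optional extra channel. A small sketch, with both callbacks as placeholders; on Android, speak would most likely wrap TextToSpeech.

```kotlin
// The reply always lands on the glasses as text; speech is an optional extra channel.
fun deliverAnswer(answer: String, show: (String) -> Unit, speak: ((String) -> Unit)? = null) {
    show(answer)          // text always appears on the glasses
    speak?.invoke(answer) // speech only when voice mode is active
}

fun main() {
    deliverAnswer(
        "Hello from the assistant",
        show = { println("[glasses] $it") },
        speak = { println("[speaker] $it") }
    )
}
```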

Furthermore, picture input through the phone camera will also be supported. People use AR glasses precisely because they do not want to be cut off from reality, and the camera is the only way the device can take in the information that reality carries on photons. Which means there will be another button to activate the camera.
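Sending the picture can reuse the same chat endpoint, assuming it accepts OpenAI-style image_url content parts; hosted vision models do, and whether a local llama server does depends on the model and server. A sketch of packaging a capture, with the file path as a placeholder:

```kotlin
import java.io.File
import java.util.Base64

// Package a camera capture as one chat message with an inline data URL.
// The question is assumed to need no JSON escaping; the file path is a placeholder.
fun imageMessage(question: String, jpegPath: String): String {
    val b64 = Base64.getEncoder().encodeToString(File(jpegPath).readBytes())
    return """
        {"role": "user",
         "content": [
           {"type": "text", "text": "$question"},
           {"type": "image_url",
            "image_url": {"url": "data:image/jpeg;base64,$b64"}}
         ]}
    """.trimIndent()
}

fun main() {
    // This JSON fragment would go into the same messages array as the plain text chat.
    println(imageMessage("What am I looking at?", "capture.jpg").take(200))
}
```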

So,

The app will have two parts. One is the output interface, which is the screen on the AR glasses. The other is the input interface, which is the phone. The AR glasses will show the following:

  1. The typing animation, with the text as it is entered.
  2. The final text that was typed.
  3. The result text from the assistant.
  4. A listening animation.
  5. The camera capture in real time.
  6. The final camera capture.
  7. A cartoon figure that follows you around, like Microsoft Office's Clippy.

The phone will do the following (both halves are sketched in code after this list):

  1. A button to activate the keyboard.
  2. A button to activate the microphone.
  3. A button to activate the camera.
  4. A button to calibrate the axis.
  5. A button to exit.
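Put together, the two halves reduce to a pair of states and a mapping between them. A hypothetical sketch; every name below is a placeholder of mine, not a decided API.

```kotlin
// Placeholder names for the glasses-side views and the phone-side buttons.
enum class GlassesView {
    TYPING_ANIMATION, FINAL_INPUT_TEXT, RESULT_TEXT,
    LISTENING_ANIMATION, CAMERA_PREVIEW, CAMERA_CAPTURE, ASSISTANT_FIGURE
}

enum class PhoneButton { KEYBOARD, MICROPHONE, CAMERA, CALIBRATE, EXIT }

// Which view the glasses switch to when a phone button is pressed.
fun viewFor(button: PhoneButton): GlassesView? = when (button) {
    PhoneButton.KEYBOARD   -> GlassesView.TYPING_ANIMATION
    PhoneButton.MICROPHONE -> GlassesView.LISTENING_ANIMATION
    PhoneButton.CAMERA     -> GlassesView.CAMERA_PREVIEW
    PhoneButton.CALIBRATE  -> GlassesView.ASSISTANT_FIGURE // recenter, keep the companion visible
    PhoneButton.EXIT       -> null                          // tear the overlay down
}

fun main() {
    PhoneButton.values().forEach { println("$it -> ${viewFor(it)}") }
}
```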

From now on, I will start working on this project. Hopefully I can demonstrate it to the public in my lifetime, if my procrastination does not drag my feet. One method to prevent that is to write blog posts more frequently, and that is what I am going to do.

If you have any comments, please share them in the issues.