
My Hope in Apple’s “AI Sauce”

[Image: AI sauce pouring from a small pitcher over an apple; the sauce sparkles with zeros, ones, and stars as a metaphor for AI.]

Since publishing My AI Company Vision, I’ve been deeply immersed in developing a framework aimed at automating various aspects of development. This journey has led me to explore LLM-based AI technologies extensively. Along the way, I’ve kept a close watch on Apple’s efforts to enhance their OS-level AI capabilities to stay competitive with other tech giants. With WWDC 2024 on the horizon, I am eagerly anticipating Apple’s announcements, confident they will address many current shortcomings in AI development.

In my daily work, I see the limitations of LLMs firsthand. They are getting better at understanding human language and visual input, but they still hallucinate when they lack sufficient input. In enterprise settings, companies like Microsoft use Retrieval-Augmented Generation (RAG) to provide relevant document snippets alongside user queries, grounding the LLM’s responses in the company’s data. This approach works well for large corporations but is challenging to implement for individual users.
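To make the pattern concrete, here is a minimal sketch of the RAG flow: retrieved snippets are placed in front of the user’s question so the model answers from real data instead of guessing. The retrieve and complete closures are hypothetical stand-ins for whatever search backend and chat model you actually use.

```swift
// Minimal RAG sketch: retrieved snippets are prepended to the user's question
// so the model answers from real data. Both closures are hypothetical
// stand-ins for an actual search backend and LLM.
func answer(question: String,
            retrieve: (String) async throws -> [String],
            complete: (String) async throws -> String) async throws -> String {
    // 1. Retrieve document snippets relevant to the question.
    let snippets = try await retrieve(question)

    // 2. Ground the model by placing the snippets in front of the question.
    let prompt = """
    Answer using only the context below. If it is insufficient, say so.

    Context:
    \(snippets.joined(separator: "\n---\n"))

    Question: \(question)
    """

    // 3. Let the LLM generate the grounded answer.
    return try await complete(prompt)
}
```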

I’ve encountered several interesting RAG projects that use mdfind on macOS to perform Spotlight searches for documents. These projects map the query to suitable search terms and extract relevant passages to enrich the LLM’s context. However, there are challenges: the disconnect between query intent and search terms, and the fact that Notes are not accessible via mdfind. If Apple enabled an on-device chat LLM to use Notes as a knowledge base, with the necessary privacy approvals, it would be a game-changer.
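As a rough sketch of what these projects do under the hood, the following shells out to mdfind so that Spotlight performs the document search. The query string is ordinary Spotlight metadata syntax; error handling is kept minimal.

```swift
import Foundation

// Ask Spotlight (via mdfind) for documents whose text content matches a term.
func spotlightHits(for term: String) throws -> [URL] {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: "/usr/bin/mdfind")
    // Case- and diacritic-insensitive full-text query.
    process.arguments = ["kMDItemTextContent == '*\(term)*'cd"]

    let pipe = Pipe()
    process.standardOutput = pipe
    try process.run()

    // mdfind prints one absolute path per line.
    let data = pipe.fileHandleForReading.readDataToEndOfFile()
    process.waitUntilExit()

    let output = String(decoding: data, as: UTF8.self)
    return output.split(separator: "\n").map { URL(fileURLWithPath: String($0)) }
}
```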

On-Device Built-In Vector Database

SwiftData has greatly simplified data persistence on top of Core Data, but we also need efficient local vector search. Although NLContextualEmbedding provides sentence embeddings and similarity calculations, the current approach of a linear scan over all vectors does not scale. Apple could enhance the on-device embedding models to support multi-language queries and build an efficient vector search mechanism directly into SwiftData.
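For reference, here is roughly how a sentence vector can be obtained with NLContextualEmbedding today (macOS 14 and later). The model returns one vector per token; mean-pooling them into a single sentence vector is my own choice for this sketch, not something the API prescribes.

```swift
import NaturalLanguage

// Produce a fixed-size sentence vector by mean-pooling the token vectors
// that NLContextualEmbedding emits for the given text.
func sentenceVector(for text: String, language: NLLanguage = .english) throws -> [Double]? {
    guard let embedding = NLContextualEmbedding(language: language) else { return nil }
    try embedding.load()

    let result = try embedding.embeddingResult(for: text, language: language)

    var sum = [Double](repeating: 0, count: embedding.dimension)
    var tokenCount = 0
    result.enumerateTokenVectors(in: text.startIndex..<text.endIndex) { vector, _ in
        for (i, value) in vector.enumerated() { sum[i] += value }
        tokenCount += 1
        return true // keep enumerating
    }

    guard tokenCount > 0 else { return nil }
    return sum.map { $0 / Double(tokenCount) }
}
```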

I’ve experimented with several embedding models besides Apple’s: models served by Ollama and LM Studio, as well as OpenAI’s embeddings. Apple’s offering is supposedly multilingual, using the same model for both English and German text. However, I found its performance lacking compared to the other embedding models, especially when my source text was in German but my search query was in English.

My prototype keeps a large in-memory array of vectors and performs cosine-similarity searches over normalized vectors. While this approach works well and is hardware-accelerated, I am concerned about its scalability. Linear scans are not efficient for large datasets, and real vector databases employ techniques like partitioning the vector space to keep searches fast. Apple could provide such advanced vector search extensions within SwiftData, allowing us to avoid third-party solutions.
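The brute-force search in my prototype looks roughly like this: with unit-length vectors, cosine similarity reduces to a dot product, which Accelerate’s vDSP computes efficiently. This works fine for thousands of entries, but the cost grows linearly with the corpus, which is exactly the scaling problem a proper vector index would solve.

```swift
import Accelerate

// Linear scan over normalized vectors: cosine similarity == dot product.
func nearestNeighbors(to query: [Float],
                      in corpus: [(id: String, vector: [Float])],
                      topK: Int = 5) -> [(id: String, score: Float)] {
    corpus
        .map { (id: $0.id, score: vDSP.dot($0.vector, query)) } // hardware-accelerated dot product
        .sorted { $0.score > $1.score }                          // best matches first
        .prefix(topK)
        .map { $0 }
}
```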

Local LLM Chat and Code Generation

In my daily work, I heavily rely on AI tools like ChatGPT for code generation and problem-solving. However, there’s a significant disconnect: these tools are not integrated with my local development environment. To use them effectively, I often have to copy large portions of code and context into the chat, which is cumbersome and inefficient. Moreover, there are valid concerns about data privacy and security when using cloud-based AI tools, as confidential information can be at risk.

I envision a more seamless and secure solution: a local LLM that is integrated directly within Xcode. This would allow for real-time code generation and assistance without needing to expose any sensitive information to third-party services. Apple has the capability to create such a model, leveraging their existing hardware-accelerated ML capabilities.

Furthermore, I frequently use Apple Notes as my knowledge base, but the current setup doesn’t allow AI tools to access these notes directly. Not only Notes, but also all my other local files, including PDFs, should be RAG-searchable. This would greatly enhance productivity and ensure that all information remains secure and local.

To achieve this, Apple should develop a System Vector Database that indexes all local documents as part of Spotlight. This database would enable Spotlight to perform not only keyword searches but also semantic searches, making it a powerful tool for RAG tasks. Ideally, Apple would provide a RAG API, allowing developers to build applications that leverage this extensive and secure indexing capability.
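To illustrate what such a RAG API could look like, here is a purely imagined Swift shape. Nothing in it exists in any Apple SDK; all names are made up for the sake of the example.

```swift
import Foundation

// Imagined result type: one semantically relevant passage from a local document.
struct SemanticHit {
    let documentURL: URL
    let passage: String
    let score: Double
}

// Imagined system service: semantic (vector) search across locally indexed
// documents, Notes, PDFs, etc., scoped by the privacy grants the user allowed.
protocol SystemSemanticIndex {
    func passages(matching query: String,
                  scopes: [URL],
                  limit: Int) async throws -> [SemanticHit]
}
```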

This integration would let me have a code chat right within Xcode, powered by a local LLM that can seamlessly access all my local files, ensuring a smooth and secure workflow.

Large Action Models (LAMs) and Automation

The idea of Large Action Models (LAMs) emerged with the introduction of Rabbit, the AI device that promised to perform tasks on your computer based solely on voice commands. While the future of dedicated AI devices remains uncertain, the concept of having a voice assistant take the reins is very appealing. Imagine wanting to accomplish a specific task in Numbers; you could simply instruct your Siri-Chat to handle it for you, much like Microsoft’s Copilot in Microsoft Office.

Apple has several technologies that could enable it to leapfrog competitors in this area. Existing systems like Shortcuts, user activities, and VoiceOver already allow for a degree of programmatic control and interaction. By combining these with advanced AI, Apple could create a sophisticated action model that understands the screen context and uses enhanced Shortcuts or Accessibility controls to navigate through apps seamlessly.
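One of those building blocks already exists today in App Intents, the framework behind modern Shortcuts actions. Anything an app declares this way becomes visible to Shortcuts and Siri, which is exactly the kind of surface a future action model could drive. The intent below is a made-up example, not an API Apple has announced for any such model.

```swift
import AppIntents

// A hypothetical app action exposed to Shortcuts and Siri via App Intents.
struct AddExpenseIntent: AppIntent {
    static var title: LocalizedStringResource = "Add Expense"

    @Parameter(title: "Amount") var amount: Double
    @Parameter(title: "Category") var category: String

    func perform() async throws -> some IntentResult & ProvidesDialog {
        // A real app would write to its model layer here.
        return .result(dialog: "Added \(amount) to \(category).")
    }
}
```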

This essentially promises 100% voice control. You can type if you want (or need to, so as not to disturb your coworkers), or you can simply say what you want to happen, and your local agent will execute it for you. This level of integration would significantly enhance productivity, providing a flexible and intuitive way to interact with your devices without compromising on privacy or security.

The potential of such a feature is vast. It could transform how we interact with our devices, making complex tasks simpler and more intuitive. This would be a major step forward in integrating AI deeply into the Apple ecosystem, providing users with powerful new tools to enhance their productivity and streamline their workflows.

Conclusion

Contrary to what many pundits say, Apple isn’t out of the AI game. They have been carefully laying the groundwork, preparing hardware and software to be the foundation for on-device, privacy-preserving AI. As someone deeply involved in developing my own agent framework, I am very much looking forward to Apple’s continued journey. The potential AI advancements from Apple could significantly enhance my day-to-day work as a Swift developer and provide powerful new tools for the developer community.

