Open Source AI

Open Source Document AI

In this presentation, we introduce two recent developments in AI-powered document processing and multimodal understanding. First, we present Docling, an easy-to-use, MIT-licensed open-source package for PDF document conversion. It is powered by specialized AI models: a layout analysis model trained on the DocLayNet dataset and TableFormer for table structure recognition. Docling runs efficiently on commodity hardware with modest resource requirements, and its extensible code interface makes it straightforward to integrate new AI models and custom features.

Next, we explore the Large Language and Vision Assistant (LLaVA), a multimodal AI model that combines visual and linguistic understanding. Its potential applications range from intelligent document understanding to complex real-world problem-solving.

This talk covers the technical foundations of these systems, their practical applications, and their future potential in the AI landscape.

Contact the speaker: Peter Staar TAA@zurich.ibm.com