Recent Episodes
ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases
Nov 1, 2024 – 32:59
Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities
Oct 31, 2024 – 30:12
Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation
Oct 30, 2024 – 39:12
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
Oct 18, 2024 – 35:59
LightRAG: Simple and Fast Retrieval-Augmented Generation
Oct 17, 2024 – 37:42
Aria: An Open Multimodal Native Mixture-of-Experts Model
Oct 16, 2024 – 17:56
AgentKit: Structured LLM Reasoning with Dynamic Graphs
Oct 15, 2024 – 30:22
PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling
Oct 14, 2024 – 33:45
Diffusion Models are Evolutionary Algorithms
Oct 10, 2024 – 31:05
Is Safer Better? The Impact of Guardrails on the Argumentative Strength of LLMs in Hate Speech Countering
Oct 9, 2024 – 39:11
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
Oct 8, 2024 – 36:51
Internal Consistency and Self-Feedback in Large Language Models: A Survey
Oct 7, 2024 – 01:20:28
On the Diagram of Thought
Oct 2, 2024 – 17:27
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion
Oct 1, 2024 – 46:12
StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation
Sep 30, 2024 – 28:41
On the limits of agency in agent-based models
Sep 24, 2024 – 32:39
Symbolic Prompt Program Search: A Structure-Aware Approach to Efficient Compile-Time Prompt Optimization
Sep 23, 2024 – 17:23
PuLID: Pure and Lightning ID Customization via Contrastive Alignment
Sep 22, 2024 – 29:56
MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery
Sep 21, 2024 – 33:14
PuLID: Pure and Lightning ID Customization via Contrastive Alignment
Sep 20, 2024 – 29:56
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming
Sep 19, 2024 – 30:36
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
Sep 18, 2024 – 32:15
GeoCalib: Learning Single-image Calibration with Geometric Optimization
Sep 17, 2024 – 19:16
Artificial Immune System of Secure Face Recognition Against Adversarial Attacks
Sep 13, 2024 – 01:10:54
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
Sep 12, 2024 – 29:24
rerankers: A Lightweight Python Library to Unify Ranking Methods
Sep 11, 2024 – 15:39
Automated Design of Agentic Systems
Sep 10, 2024 – 23:55
Text2SQL is Not Enough: Unifying AI and Databases with TAG
Sep 9, 2024 – 42:53
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Sep 5, 2024 – 35:05
Sapiens: Foundation for Human Vision Models
Sep 4, 2024 – 25:58
OctFusion: Octree-based Diffusion Models for 3D Shape Generation
Sep 3, 2024 – 33:00
Writing in the Margins: Better Inference Pattern for Long Context Retrieval
Sep 2, 2024 – 29:22
Fact Finder -- Enhancing Domain Expertise of Large Language Models by Incorporating Knowledge Graphs
Aug 30, 2024 – 19:53
RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation
Aug 29, 2024 – 18:01
RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation
Aug 28, 2024 – 27:28
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
Aug 23, 2024 – 47:39
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
Aug 21, 2024 – 38:53
ControlNeXt: Powerful and Efficient Control for Image and Video Generation
Aug 20, 2024 – 26:50
OpenResearcher: Unleashing AI for Accelerated Scientific Research
Aug 19, 2024 – 29:59
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation
Aug 14, 2024 – 33:50
AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls
Aug 13, 2024 – 41:29
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Aug 9, 2024 – 38:55
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
Aug 8, 2024 – 29:11
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
Aug 7, 2024 – 31:47
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher
Aug 5, 2024 – 26:22
Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models
Jul 31, 2024 – 34:03
FinanceBench: A New Benchmark for Financial Question Answering
Jul 30, 2024 – 41:34
Stable-Hair: Real-World Hair Transfer via Diffusion Model
Jul 29, 2024 – 30:25
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
Jul 26, 2024 – 31:03
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
Jul 25, 2024 – 34:06
Recent Reviews
Bland25: Amazing consistency
I love what you are doing.
Disclaimer: The podcast and artwork on this page are property of the podcast owner, and not endorsed by UP.audio.