What's new on Article Factory and the latest in the generative AI world

ByteDance Unveils Seedance 2.0, Multimodal AI Video Generator

ByteDance announced Seedance 2.0, a next‑generation AI model that can create short video clips from combined text, image, audio, and video prompts. The system supports up to nine images, three video clips, and three audio clips per request and can produce 15‑second videos that respect camera movement, visual effects, and physical laws. Demonstrations include synchronized figure‑skating routines, anime‑style scenes, and celebrity‑lookalike cinematic fights. Seedance 2.0 is currently available through ByteDance’s Dreamina AI platform and the Doubao assistant, with no clear plan for TikTok integration.

Moonshot AI Launches Kimi K2.5 Multimodal Model and Open-Source Coding Tool Kimi Code

Moonshot AI, backed by major investors, announced the release of Kimi K2.5, a multimodal model trained on a massive dataset of text, image, and video tokens. The model is positioned to match or exceed the performance of proprietary competitors in coding and video understanding benchmarks. Alongside the model, Moonshot introduced Kimi Code, an open‑source coding assistant that lets developers work with text, images, and video inputs across popular development environments. The moves underscore Moonshot's push to become a leading player in AI‑driven software development tools.

Google Search Adds Gemini 3 Pro AI for Multimodal Queries

Google has integrated its Gemini 3 Pro artificial‑intelligence model into Search through AI Mode, allowing users to ask chatbot‑style questions directly in the search interface. The multimodal model can work with text, images, video, and code, supports reasoning and planning, and aims to understand intent and provide richer answers. Example prompts cover tasks such as summarizing long‑form videos; planning meals, weekend trips, and workout routines; and building custom games. The article also offers general prompting tips for AI chatbots like ChatGPT, emphasizing specificity, role assignment, and iterative questioning.

AI Glossary: Essential Terms Explained

A comprehensive glossary of artificial intelligence terminology has been compiled to help readers understand the rapidly expanding AI landscape. The guide covers core concepts such as generative AI, large language models, and deep learning, as well as emerging topics like AI safety, ethics, and agentive systems. Definitions are presented in clear language, with practical examples ranging from chatbots like ChatGPT and Claude to multimodal models that process text, images, and audio. The resource serves as a reference for anyone navigating AI‑driven products, research, and industry trends.

Google Gemini Gains Personalization by Tapping Into Your Apps

Google has rolled out a new personalization feature for its Gemini AI, allowing the model to draw on data from connected Google apps such as Calendar, Photos, and Gmail. The capability, currently in beta for Google AI Pro and Ultra subscribers, lets Gemini provide answers that reflect a user’s personal context, from travel preferences to specific product recommendations. Users control which apps are linked, and the system does not use the full content of those apps to train its models, adhering to existing privacy policies. The update aims to make Gemini’s responses more useful and individually tailored.

OpenAI Launches GPT Image 1.5, a Faster, Cheaper Native Multimodal Image Model

OpenAI has introduced GPT Image 1.5, an AI image‑synthesis model that runs within the same neural network that processes language prompts. The new model is reported to generate images up to four times faster and at roughly 20 percent lower cost than its predecessor. Integrated into ChatGPT, GPT Image 1.5 enables users to edit photos with simple text commands, from adding objects to changing clothing, while preserving facial likenesses. The rollout marks a shift toward more seamless, conversational image editing without needing specialized graphics skills.

Google launches Gemini 3 Flash as default model in Gemini app

Google unveiled Gemini 3 Flash, a faster and cheaper AI model built on the recent Gemini 3 architecture. The company is making Flash the default model in its Gemini app and AI‑enabled search, while still offering the Pro version for more demanding tasks. Gemini 3 Flash delivers notable performance gains on benchmark tests, supports multimodal inputs such as video, sketches, and audio, and is available through Vertex AI, Gemini Enterprise, and an API preview. Early adopters like JetBrains and Figma are already integrating the model, and Google highlights its suitability for bulk, workhorse workloads.

Character.ai Launches “Stories” as It Phases Out Open‑Ended Chat for Under‑18 Users

Character.ai is ending open‑ended AI chat for users under 18 and replacing it with a new visual adventure mode called Stories. The shift follows a tragic suicide involving a 14‑year‑old user and a subsequent wrongful‑death lawsuit that prompted the company to add safety measures. While the unrestricted chat feature will disappear for minors, the platform will still provide tools such as Feed, Imagine, Avatar FX, Streams, and the newly introduced Stories, which let teens pick characters, genres, and plot premises and make choices that shape the narrative.

ChatGPT, Gemini, and Claude Compete in Multimodal Image Understanding

A side‑by‑side evaluation examined how three leading AI chat models (ChatGPT, Gemini, and Claude) interpret complex images. The test used a bustling Times Square scene, Michelangelo’s densely populated "Last Judgment," and a cluttered indoor room to gauge each system’s ability to identify objects, read text, and describe spatial relationships. ChatGPT delivered careful, structured inventories, Gemini produced highly detailed, context‑rich descriptions, and Claude offered more narrative‑style overviews with occasional imaginative leaps. The findings highlight Gemini’s precision, ChatGPT’s reliability, and Claude’s creative flair, offering clear guidance for users seeking specific strengths in visual AI tasks.

Mistral closes in on Big AI rivals with new open-weight frontier and small models

French AI startup Mistral unveiled its Mistral 3 family, featuring a large frontier model with multimodal and multilingual capabilities and nine smaller, fully customizable models. The launch emphasizes open-weight access, allowing developers to run some of the models on a single GPU and fine‑tune them for specific enterprise tasks. Mistral positions its models as cost‑effective alternatives to closed‑source rivals, highlighting efficient architecture, extensive context windows, and suitability for on‑premises deployment. The company also announced collaborations with partners in robotics, cybersecurity, and automotive sectors to integrate its models into specialized applications.

Salesforce CEO Marc Benioff Switches from ChatGPT to Google Gemini 3

Salesforce chief executive Marc Benioff announced a rapid shift from OpenAI's ChatGPT to Google's Gemini 3 after testing the new model for just two hours. He praised Gemini 3’s speed, reasoning, and multimodal abilities (covering text, images, code, audio, and video) as a major leap forward, stating he will not return to ChatGPT. Benioff’s public endorsement reflects his long‑standing involvement in enterprise AI, including Salesforce’s early partnership with OpenAI, and may signal broader corporate realignment toward Google’s AI platform.

Google’s Gemini 3 Takes Lead in AI Race, But Challenges Remain

Google launched Gemini 3, its newest large language model, to immediate fanfare and strong early adoption. The model outperformed competitors on a range of benchmarks, topped the LMArena leaderboard, and attracted over a million users within its first day. Industry leaders praised its speed, reasoning, and multimodal abilities, while some professionals noted that real‑world performance still varies by domain. Google plans to roll Gemini 3 out across its products, acknowledging that future iterations will address current limitations.

Google's NotebookLM Adds Nano Banana Pro for AI‑Generated Infographics and Slide Decks

Google has expanded its NotebookLM platform with the Nano Banana Pro AI model, enabling users to turn research into polished infographics and slide decks without leaving the notebook. The new tools synthesize accurate information, render text within images, and maintain visual consistency across styles. Early tests show the system can visualize complex topics, such as the evolution of Arthurian legend, by creating coherent, publication‑ready graphics that still benefit from human refinement. The integration marks a significant step toward seamless, multimodal research and design within a single AI‑driven workspace.

Google Unveils Gemini 3, Boosting Multimodal Reasoning and Agentic AI

Google has launched Gemini 3, the newest generation of its AI model, bringing notable upgrades in reasoning, accuracy, and multimodal understanding. The update powers the Gemini app, AI Mode in Google Search, NotebookLM, and developer platforms, and introduces generative interfaces that can produce magazine‑style layouts, dynamic interactive views, and an experimental Agent mode for task automation. Demonstrations include trip planning, educational visualizations, inbox organization, and rental car logistics, showcasing the model’s ability to handle complex, multi‑step prompts with greater autonomy.

Google Gemini 3 Pro Delivers Mixed Performance in Real‑World Tests

Google’s Gemini 3 Pro model introduces upgraded reasoning, visual generation, and agentic capabilities, but hands‑on testing shows results that fall short of the company’s demos. The new Canvas workspace can combine text, images, and video to create interactive 3D visualizations, while the generative UI offers magazine‑style layouts for travel itineraries and other topics. Gemini Agent can organize Gmail, set reminders, and attempt reservations, yet it occasionally misstates costs and requires multiple confirmations. Compared with other AI assistants, Gemini excels at Gmail integration but lags in speed and consistency, delivering a mixed overall experience.

Gemini 3 vs. ChatGPT 5.1: How the New AI Chatbots Stack Up in Real‑World Use

A recent hands‑on test compares Google’s Gemini 3 and OpenAI’s ChatGPT 5.1 across everyday scenarios such as gift shopping, school explanations, travel planning, smart‑home troubleshooting, and bedtime routines. Both models deliver accurate answers, but Gemini 3 leans toward tidy, structured responses while ChatGPT 5.1 offers a more conversational tone. The review highlights each model’s strengths (Gemini’s organized layouts and multimedia aids, ChatGPT’s nuanced emotional framing), suggesting that user preference will hinge on the desired balance between precision and personable dialogue.

Google Unveils Gemini 3, Its Latest AI Model

Google has introduced Gemini 3, a new AI model that powers the Gemini app, Search, and developer tools. The rollout includes two variants: Gemini 3 Pro for everyday use and Gemini 3 Deep Think for enhanced reasoning. The Gemini app receives a major redesign with a My Stuff folder and an AI agent that can execute multi‑step tasks across Google services. Search now leverages Gemini 3 to generate visual, interactive answers. The model also adds multimodal strengths such as video analysis and a million‑token context window, positioning it as a significant step forward in Google’s AI ecosystem.

Google Unveils Gemini 3, Its Most Intelligent Multimodal AI Model

Google announced the launch of Gemini 3, branding it as the company’s most intelligent and factually accurate AI system to date. The flagship Gemini 3 Pro model, available in the Gemini app and to select Search subscribers, is natively multimodal, handling text, images and audio together. Google highlighted new capabilities such as translating recipe photos, generating interactive flashcards, and creating visual, magazine‑style layouts. The rollout includes generative interfaces, an upgraded query‑fan‑out technique, reduced sycophancy, and an experimental Gemini Agent that can manage emails and book travel. The model is accessible to all users in the Gemini app, with additional features for Google AI Pro and Ultra subscribers.

Google Unveils Gemini 3 AI Model with Deeper Understanding and New Agentic Tools

Google announced Gemini 3, its most advanced AI model to date, highlighting an improved ability to grasp user intent and richer multimodal features. The model can transform long video lectures into interactive flash cards and analyze sports footage for performance insights. Gemini 3 will appear in AI Mode in Search and in AI Overviews for Pro and Ultra subscribers, and it powers Antigravity, a new agentic platform that can autonomously plan and execute software tasks. The company also noted stronger security against prompt‑injection attacks and reduced sycophancy. Gemini 3’s advanced capabilities are initially available to Google AI Ultra subscribers.

ChatGPT Expands Hands‑Free Interaction with Voice Mode

OpenAI has broadened the capabilities of its ChatGPT assistant by adding a Voice Mode that lets users speak their queries and hear spoken answers. The feature works across mobile, desktop and web platforms, allowing a natural back‑and‑forth conversation without typing. Two versions are offered: a standard, free voice option and an advanced, paid option that provides real‑time, multimodal interaction. Users report that the hands‑free experience improves speed, accessibility, language practice and on‑the‑go brainstorming, while still relying on the same underlying language model.