ScalingGaussian: Enhancing 3D Content Creation with Generative Gaussian Splatting
The creation of high-quality 3D assets is critical in areas such as digital heritage, entertainment, and robotics, and has traditionally relied on professionals working with specialized software. As demand for 3D content in gaming and virtual reality (VR) grows, 3D generation technologies are steadily reducing the need for such specialized skills. However, current methods often struggle to achieve both fine texture detail and geometric consistency. To address this, we propose ScalingGaussian, a framework that combines 3D and 2D diffusion models and is optimized through the introduction of Gaussian noise and a Score Distillation Sampling (SDS) loss. This approach improves the geometric stability and texture fidelity of the generated 3D assets.
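The abstract names the SDS loss but does not give ScalingGaussian's exact formulation. For orientation, the standard SDS gradient as introduced in DreamFusion is written out next. It assumes a differentiable renderer $g$ with parameters $\theta$ (here, the Gaussian splats), a rendered image $\mathbf{x} = g(\theta)$, a pretrained 2D diffusion model with noise predictor $\hat{\epsilon}_\phi$ conditioned on a prompt $y$, noise-schedule coefficients $\alpha_t, \sigma_t$, and a timestep weight $w(t)$. How ScalingGaussian weights this term or combines it with its 3D diffusion prior is not stated here, so treat this as the generic form only.

```latex
\mathbf{x}_t = \alpha_t\,\mathbf{x} + \sigma_t\,\epsilon, \qquad \epsilon \sim \mathcal{N}(\mathbf{0}, \mathbf{I}), \qquad \mathbf{x} = g(\theta)

\nabla_\theta \mathcal{L}_{\mathrm{SDS}} = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\big(\hat{\epsilon}_\phi(\mathbf{x}_t; y, t) - \epsilon\big)\,\frac{\partial \mathbf{x}}{\partial \theta} \right]
```

The Jacobian of the diffusion network itself is omitted from this gradient, which is what makes SDS cheap enough to use as a per-step supervision signal for optimizing the 3D representation.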
The Role of Deductive and Inductive Reasoning in Large Language Models
Large language models (LLMs) have made significant progress and perform well on many reasoning tasks. However, their reliance on static prompts limits their adaptability to complex, dynamic problems. To address this, we propose the Deductive and Inductive (DID) method, which combines inductive reasoning, to extract general rules, with deductive reasoning, to apply those rules flexibly. Inspired by cognitive science, DID dynamically adjusts its reasoning path based on the task context. Empirical tests on multiple datasets, including the challenging Holiday Puzzle, show that DID significantly improves accuracy and reasoning quality without adding computational burden. The method offers a robust framework for advanced problem-solving in LLMs.
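To make the induce-then-deduce pattern concrete, here is a toy sketch. It is not the DID method, which works through LLM prompting; it only mirrors the two roles described above. The rule set `CANDIDATE_RULES` and the helpers `induce`, `deduce`, and `solve` are hypothetical: induction picks the first general rule consistent with all observed examples, and deduction applies that rule to a new case. Re-running induction when new evidence arrives is a crude analogue of adjusting the reasoning path to the task context.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical hypothesis space of general rules over integers.
# (An LLM would propose rules in natural language; this dict is a toy stand-in.)
CANDIDATE_RULES: Dict[str, Callable[[int], int]] = {
    "add_3": lambda x: x + 3,
    "double": lambda x: 2 * x,
    "square": lambda x: x * x,
    "double_plus_1": lambda x: 2 * x + 1,
}


def induce(examples: List[Tuple[int, int]]) -> Optional[str]:
    """Inductive step: return the first candidate rule consistent with every example."""
    for name, rule in CANDIDATE_RULES.items():
        if all(rule(x) == y for x, y in examples):
            return name
    return None


def deduce(rule_name: str, query: int) -> int:
    """Deductive step: apply the induced general rule to a specific new case."""
    return CANDIDATE_RULES[rule_name](query)


def solve(examples: List[Tuple[int, int]], query: int) -> Tuple[str, int]:
    """Induce a rule from the current evidence, then deduce the answer for the query."""
    rule = induce(examples)
    if rule is None:
        raise ValueError("no candidate rule is consistent with the examples")
    return rule, deduce(rule, query)


if __name__ == "__main__":
    print(solve([(2, 4)], 5))          # ('double', 10): one example, "double" fits first
    print(solve([(2, 4), (3, 9)], 5))  # ('square', 25): more evidence changes the rule
```

The second call shows why the inductive step matters: with only `(2, 4)` several rules fit, and adding `(3, 9)` rules out `double`, so the deduced answer changes accordingly.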
Human Motion Instruction Tuning
We introduce LLaMo (Large Language and Human Motion Assistant), a multimodal framework for human motion instruction tuning. Unlike traditional methods that convert motion data into text tokens, LLaMo retains motion in its native form, preserving important details and improving the model’s ability to understand complex human behaviors. By integrating video, motion, and text data, it supports flexible, human-centered analysis. Experiments show that LLaMo performs strongly in complex domains such as human behavior and professional activities, improving understanding and prediction in motion-related scenarios. It also has potential applications in fields such as sports analytics and behavioral prediction.
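The claim that tokenizing motion loses detail can be illustrated with a small sketch. The abstract does not describe LLaMo's motion encoder, so the codebook, the projection weights, and the helpers `tokenize` and `project` are all hypothetical. Two slightly different poses are sent down two routes: nearest-codebook tokenization (a simplified vector-quantization step) and a continuous linear projection into an embedding space.

```python
from typing import List, Sequence

Vector = List[float]


def tokenize(frame: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Discrete route: index of the nearest codebook entry (squared L2 distance)."""
    def sq_dist(code: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(frame, code))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def project(frame: Sequence[float], weight: Sequence[Sequence[float]],
            bias: Sequence[float]) -> Vector:
    """Continuous route: linear projection W @ frame + b into an embedding space."""
    return [sum(w * x for w, x in zip(row, frame)) + b for row, b in zip(weight, bias)]


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0]]             # hypothetical 2-entry codebook
    weight = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]  # hypothetical 2 -> 3 projection
    bias = [0.0, 0.0, 0.0]
    a, b = [0.9, 1.0], [1.1, 1.2]                   # two slightly different poses
    print(tokenize(a, codebook), tokenize(b, codebook))  # 1 1: same token, difference lost
    print(project(a, weight, bias) != project(b, weight, bias))  # True: difference kept
```

Both poses collapse to the same token, while their continuous embeddings stay distinct. Real systems use learned codebooks and encoders, so this only illustrates the direction of the trade-off.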
3DSceneEditor: Controllable 3D Scene Editing with Gaussian Splatting
3DSceneEditor simplifies 3D scene editing with a fully 3D-based approach built on Gaussian Splatting, enabling precise edits in real time. Unlike traditional methods, it streamlines the process by integrating semantic labeling, zero-shot object grounding with CLIP, and direct scene modifications such as adding, repositioning, or recoloring objects. Experiments show that it outperforms existing methods in both speed and accuracy, setting a new standard for efficient 3D scene customization.
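Because edits are made directly on the 3D representation, operations such as repositioning or recoloring reduce to updating the attributes of the Gaussians that belong to an object. The sketch below assumes each Gaussian carries a precomputed semantic label and replaces CLIP-based grounding with an exact label match; the `Gaussian` class and the helper functions are hypothetical, and real splats also carry covariance and opacity, which are omitted here.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Gaussian:
    mean: Vec3   # 3D center of the splat
    color: Vec3  # RGB in [0, 1]
    label: str   # semantic label, assumed assigned by an earlier labeling step


def select(scene: List[Gaussian], query: str) -> List[int]:
    """Stand-in for zero-shot grounding: exact label match instead of CLIP similarity."""
    return [i for i, g in enumerate(scene) if g.label == query]


def shift(v: Vec3, offset: Vec3) -> Vec3:
    return (v[0] + offset[0], v[1] + offset[1], v[2] + offset[2])


def reposition(scene: List[Gaussian], query: str, offset: Vec3) -> List[Gaussian]:
    """Translate every Gaussian grounded to `query`; all others are left untouched."""
    hit = set(select(scene, query))
    return [replace(g, mean=shift(g.mean, offset)) if i in hit else g
            for i, g in enumerate(scene)]


def recolor(scene: List[Gaussian], query: str, rgb: Vec3) -> List[Gaussian]:
    """Assign a new color to every Gaussian grounded to `query`."""
    hit = set(select(scene, query))
    return [replace(g, color=rgb) if i in hit else g for i, g in enumerate(scene)]


def add_copy(scene: List[Gaussian], query: str, offset: Vec3,
             new_label: str) -> List[Gaussian]:
    """Add an object by duplicating the Gaussians grounded to `query` at an offset."""
    copies = [replace(scene[i], mean=shift(scene[i].mean, offset), label=new_label)
              for i in select(scene, query)]
    return scene + copies
```

Each edit returns a new list of immutable `Gaussian` records, so the original scene is never modified in place, which keeps undo trivial in this toy setting.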
Graph Canvas for Controllable 3D Scene Generation
GraphCanvas3D is a flexible framework for 3D scene generation, designed to overcome the limitations of traditional methods that rely on predefined datasets. Using in-context learning, it adapts without retraining, enabling real-time object manipulation and scene adjustment. By representing spatial elements as graph nodes with hierarchical relationships, it also supports 4D scene generation, modeling how a scene changes over time. Experiments show that GraphCanvas3D improves usability, flexibility, and adaptability when creating and modifying 3D scenes.
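A hierarchical scene graph makes object manipulation cheap: each node stores its position relative to its parent, so moving a parent moves all of its descendants without touching them. The sketch assumes one way to extend this to 4D, namely time-stamped keyframes per node with linear interpolation. The `Node` class, `local_offset`, and `world_position` are hypothetical, and the in-context learning that GraphCanvas3D uses to build and edit the graph is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    name: str
    parent: Optional[str]  # hierarchical relation: position is relative to the parent
    # time -> local offset from the parent (assumed 4D extension via keyframes)
    keyframes: Dict[float, Vec3] = field(default_factory=dict)


def local_offset(node: Node, time: float) -> Vec3:
    """Linearly interpolate the local offset between keyframes, clamping at the ends."""
    if not node.keyframes:
        raise ValueError(f"node {node.name!r} has no keyframes")
    times = sorted(node.keyframes)
    if time <= times[0]:
        return node.keyframes[times[0]]
    if time >= times[-1]:
        return node.keyframes[times[-1]]
    for t0, t1 in zip(times, times[1:]):
        if t0 <= time <= t1:
            w = (time - t0) / (t1 - t0)
            a, b = node.keyframes[t0], node.keyframes[t1]
            return (a[0] + w * (b[0] - a[0]),
                    a[1] + w * (b[1] - a[1]),
                    a[2] + w * (b[2] - a[2]))
    raise AssertionError("unreachable: time lies within the keyframe range")


def world_position(graph: Dict[str, Node], name: str, time: float) -> Vec3:
    """Compose offsets up the hierarchy to get the node's world position at `time`."""
    node = graph[name]
    off = local_offset(node, time)
    if node.parent is None:
        return off
    p = world_position(graph, node.parent, time)
    return (p[0] + off[0], p[1] + off[1], p[2] + off[2])


if __name__ == "__main__":
    graph = {
        "room": Node("room", None, {0.0: (0.0, 0.0, 0.0)}),
        "table": Node("table", "room", {0.0: (2.0, 0.0, 0.0)}),
        "cup": Node("cup", "table", {0.0: (0.0, 0.0, 1.0), 10.0: (0.5, 0.0, 1.0)}),
    }
    print(world_position(graph, "cup", 5.0))  # (2.25, 0.0, 1.0): cup slides along the table
    graph["table"].keyframes = {0.0: (3.0, 0.0, 0.0)}  # move the parent only
    print(world_position(graph, "cup", 0.0))  # (3.0, 0.0, 1.0): the cup follows its parent
```

Editing a single node's keyframes is enough to relocate a whole subtree, which is the property that makes graph-based scene editing fast in this toy model.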