AI playground: Trellis


GitHub: https://github.com/microsoft/TRELLIS

Trellis is an AI framework for generating 3D assets from text or image inputs. Its primary goal is to make the creation of high-quality 3D models faster and more accessible for applications in gaming, virtual reality, digital art, and industrial design. Because its representation concentrates on object surfaces, a single generated result can be decoded into several 3D formats, which lets one model serve a range of needs.
Key purposes include:

  • Simplifying and Speeding Up 3D Content Creation: Providing a user-friendly interface to generate complex 3D assets without requiring expertise in 3D modeling.
  • Enhancing Cross-Format Utility: Allowing users to export assets in multiple formats like meshes, radiance fields, and 3D Gaussians.
  • Advancing AI-driven Design: Pioneering structured latent representation techniques that push the boundaries of 3D content creation.

Core Features of Trellis

  • Structured Latent Representation (SLAT): Trellis encodes each object as Structured LATents (SLAT), a representation that concentrates on the surfaces of 3D objects rather than the full volume. This keeps data processing efficient and allows multiple 3D asset formats to be decoded from a single representation.
  • Diverse Output Formats: The model can produce 3D assets as Radiance Fields, 3D Gaussians, and meshes, covering different needs in fields like gaming, virtual reality, and digital art.
  • Scalability and Versatility: Trellis has been developed in three model sizes: Basic (342 million parameters), Large (1.1 billion parameters), and X-Large (2 billion parameters). The models were trained on 64 A100 GPUs for 400,000 steps with a batch size of 256. Generation quality improves with model size, so users can trade compute cost against output quality.
  • Integration with CLIP and Custom Transformers: Trellis uses CLIP to convert text prompts into features and a custom-developed Rectified Flow Transformer to generate the 3D grid conditioned on those features. Together, these components turn textual or visual input into 3D assets.
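
The rectified flow idea mentioned above can be illustrated with a toy example. This is a minimal sketch, not Trellis's implementation: it assumes one common convention (data at t=0, noise at t=1, straight-line paths between them), and `toy_velocity` is a hypothetical stand-in for the trained transformer. It uses the exact velocity field for a "dataset" that contains a single target point, so the sampler can be checked by hand.

```python
import random

def interpolate(x0, noise, t):
    """Point on the straight path between data x0 (t=0) and noise (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, noise)]

def toy_velocity(x, t, target):
    """Exact velocity when the data distribution is the single point `target`.

    On the path x_t = (1 - t) * target + t * noise the velocity is
    noise - target, which equals (x_t - target) / t.
    ASSUMPTION: in a real system a trained network predicts this quantity.
    """
    return [(xi - ti) / t for xi, ti in zip(x, target)]

def sample(velocity, noise, steps=25):
    """Euler integration of dx/dt = v(x, t) from t=1 (noise) down to t=0 (data)."""
    x = list(noise)
    dt = 1.0 / steps
    for i in range(steps):
        t = 1.0 - i * dt  # current time; never reaches 0 inside the loop
        v = velocity(x, t)
        x = [xi - dt * vi for xi, vi in zip(x, v)]
    return x

# Start from Gaussian noise and follow the velocity field back to the data point.
target = [0.5, -1.0, 2.0]
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in target]
result = sample(lambda x, t: toy_velocity(x, t, target), noise)
```

Because the paths are straight lines, a small number of Euler steps is enough in this toy case; with a learned velocity field the step count becomes a trade-off between speed and quality.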

Basic Hardware Requirements

  • GPU: An NVIDIA GPU with a minimum of 16 GB VRAM is essential. The code has been tested on NVIDIA A100 and A6000 GPUs.
  • CUDA Toolkit: Necessary for compiling certain submodules. The code has been tested with CUDA versions 11.8 and 12.2.
  • Python: Version 3.8 or higher is required.
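
The requirements above can be turned into a small pre-flight check. This is a sketch only: the function name and message wording are illustrative, and the caller is assumed to supply the GPU memory and CUDA version (for example, read from `nvidia-smi`).

```python
import sys

MIN_VRAM_GB = 16                  # minimum GPU memory stated above
TESTED_CUDA = ("11.8", "12.2")    # CUDA versions the code has been tested with
MIN_PYTHON = (3, 8)               # minimum Python version

def check_requirements(vram_gb, cuda_version, python_version=None):
    """Return a list of problems found; an empty list means none were found."""
    if python_version is None:
        python_version = sys.version_info[:2]
    problems = []
    if vram_gb < MIN_VRAM_GB:
        problems.append(
            f"GPU has {vram_gb} GB VRAM; at least {MIN_VRAM_GB} GB is needed"
        )
    if cuda_version not in TESTED_CUDA:
        # Untested does not necessarily mean unsupported, so this is a warning.
        problems.append(
            f"CUDA {cuda_version} is untested (tested: {', '.join(TESTED_CUDA)})"
        )
    if tuple(python_version) < MIN_PYTHON:
        problems.append("Python 3.8 or higher is required")
    return problems

# Example: a 24 GB card with CUDA 11.8 on the running interpreter.
for problem in check_requirements(24, "11.8"):
    print("WARNING:", problem)
```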

Setting Up Trellis on Different Operating Systems

  1. Linux:
    • Prerequisites:
      • Ensure the NVIDIA GPU drivers and CUDA Toolkit are installed.
      • Install Conda for managing dependencies.
    • Installation Steps:
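      1. Clone the Repository: Clone https://github.com/microsoft/TRELLIS together with its submodules.
      2. Set Up the Environment: Create and activate a dedicated Conda environment.
      3. Install Dependencies: Install the packages listed in the repository README; some submodules are compiled against the CUDA Toolkit.
      4. Execute: Run one of the example scripts from the repository to generate a first asset.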
  2. Windows:
    • Prerequisites:
      • Install the latest NVIDIA GPU drivers and CUDA Toolkit.
      • Install Conda for dependency management.
    • Installation Steps:
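      1. Clone the Repository: Clone https://github.com/microsoft/TRELLIS together with its submodules.
      2. Set Up the Environment: Create and activate a dedicated Conda environment.
      3. Install Dependencies: Install the packages listed in the repository README; some submodules are compiled against the CUDA Toolkit.
      4. Execute: Run one of the example scripts from the repository to generate a first asset.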
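
The four installation steps can be sketched as a short command plan. The helper below only assembles the commands as strings and does not run them. The clone URL comes from this article; the `setup.sh` script, its `--new-env` and `--basic` flags, and the `example.py` script are assumptions based on the repository README at the time of writing and should be verified against the current README. The commands are written for a Linux shell and may need adapting on Windows.

```python
import shlex

REPO_URL = "https://github.com/microsoft/TRELLIS"

def setup_commands(repo_url=REPO_URL, target_dir="TRELLIS"):
    """Return (step, shell command) pairs in order, without executing anything."""
    return [
        # Step 1: clone the repository together with its submodules.
        ("Clone the repository",
         f"git clone --recurse-submodules {shlex.quote(repo_url)} {shlex.quote(target_dir)}"),
        ("Enter the clone",
         f"cd {shlex.quote(target_dir)}"),
        # Steps 2-3. ASSUMPTION: the repository ships a setup.sh that creates
        # the Conda environment and installs the basic dependencies.
        ("Set up the environment and install dependencies",
         ". ./setup.sh --new-env --basic"),
        # Step 4. ASSUMPTION: example.py is the repository's sample script.
        ("Execute",
         "python example.py"),
    ]

# Print the plan so it can be reviewed before running it in a single shell session.
for step, command in setup_commands():
    print(f"{step}: {command}")
```

Printing the plan rather than executing it keeps the sketch safe to run anywhere and makes it easy to adjust individual commands.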

Applications and Accessibility

Trellis is aimed at professionals in gaming, virtual reality, and digital art who need detailed 3D models from simple text descriptions or images. A demo on Hugging Face lets users generate 3D assets from images and explore Trellis's capabilities hands-on.

Demos

Hugging Face online demo: https://huggingface.co/spaces/JeffreyXiang/TRELLIS
  • Image-to-3D Transformation:
    • Upload an image, and Trellis converts it into a structured 3D representation, useful for creating virtual objects from 2D references.
  • Multi-format Output:
    • Generate the same object in different formats (e.g., radiance fields for rendering, meshes for 3D printing) to suit various use cases.
  • Text-to-3D Asset Creation:
    • Users input descriptive text prompts, such as "a futuristic flying car", and Trellis generates a detailed 3D model representing the object.
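
For local use, the image-to-3D demo can be approximated with a few lines of Python. This is a hedged sketch: the `trellis.pipelines` import, the `TrellisImageTo3DPipeline` class, the checkpoint name, and the result keys are assumptions based on the repository README at the time of writing and may have changed. `first_assets` and `image_to_3d` are hypothetical helper names introduced here. Running `image_to_3d` requires the installed package, Pillow, a suitable GPU, and a model download.

```python
# ASSUMPTION: checkpoint name and result keys as shown in the repository README.
MODEL_ID = "JeffreyXiang/TRELLIS-image-large"
OUTPUT_KEYS = ("gaussian", "radiance_field", "mesh")

def first_assets(outputs):
    """Pick the first generated asset of each available format."""
    return {key: outputs[key][0] for key in OUTPUT_KEYS if key in outputs}

def image_to_3d(image_path, seed=1):
    """Generate one asset per output format from a single input image."""
    # Imported lazily so first_assets stays usable without the trellis package.
    from PIL import Image
    from trellis.pipelines import TrellisImageTo3DPipeline

    pipeline = TrellisImageTo3DPipeline.from_pretrained(MODEL_ID)
    pipeline.cuda()  # move the model to the GPU
    outputs = pipeline.run(Image.open(image_path), seed=seed)
    return first_assets(outputs)
```

A call such as `image_to_3d("chair.png")` (the file name is a placeholder) would return a dictionary with one entry per format, matching the multi-format output described above.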

Future Work

Trellis represents a foundational step in structured 3D asset generation, and its future development could focus on:
  1. Improving Generation Quality:
    • Enhancing the resolution and fidelity of generated 3D models, especially for high-precision use cases like medical imaging or engineering.
  2. Real-time Applications:
    • Optimizing for real-time 3D generation in VR/AR environments, enabling dynamic object creation during gameplay or simulations.
  3. Integration with Design Tools:
    • Developing plugins for popular design software (e.g., Blender, Unity) to streamline workflows for artists and developers.
  4. Comparison Studies and Benchmarks:
    • Conducting systematic comparisons with other frameworks like DreamFusion or NerfStudio to establish Trellis's advantages in quality, speed, and versatility.
  5. Fine-tuning for Specific Industries:
    • Training specialized versions for niches such as automotive design, architecture, or virtual try-ons in e-commerce.

Comparison

Feature             | Trellis                        | DreamFusion               | NerfStudio
Representation      | Structured Latent Surfaces     | Neural Radiance Fields    | Neural Radiance Fields
Input               | Text/Image                     | Text                      | Image
Output Formats      | Mesh, Radiance, 3D Gaussian    | Radiance Fields           | Radiance Fields
Applications        | Multi-purpose, versatile       | Creative asset generation | Visual rendering focus
Model Size          | Scalable (Basic to X-Large)    | Fixed                     | Fixed
Training Efficiency | Highly scalable, surface-based | Radiance field-dependent  | Radiance field-dependent

Conclusion

Trellis is more than a single-purpose 3D generation tool: one structured latent representation yields meshes, radiance fields, and 3D Gaussians, and the model scales from Basic to X-Large. This approach, together with its potential for further development, makes it a strong contender in the AI-driven 3D content space.