S.C.A.R.P. - Self Correcting Auto-Refined Prompts with the SambaNova Llama Text & Vision Models #lightning_hackathon

robert.oschler · November 20, 2024, 7:41pm

Overview

Livepeer Image Helper harnesses the power of the decentralized Livepeer stable diffusion model network to create stunning AI-generated art, eliminating the pain and confusion often associated with the process. Thanks to the blazing speed of the SambaNova API, we introduce Self-Correcting Auto-Refined Prompt technology (S.C.A.R.P.), a game-changing feature for seamless image creation.

The first version of Livepeer Image Helper was an AI-driven chatbot that used text completion models to interact with users, helping them craft the perfect generation prompt. Once an image was generated, users told the chatbot what was wrong with the image or how they wanted it modified. This capability remains fully intact in the app. However, we asked an important question: Could the Llama vision model automate this feedback process and eliminate the need for user intervention? The answer was a resounding “yes,” and this breakthrough led to the creation of S.C.A.R.P.

With S.C.A.R.P., users are freed from the complexities of selecting stable diffusion models or tweaking intricate parameters. Instead, they simply describe the image they want. If the result doesn’t match their expectations (an all-too-common issue with stable diffusion models), S.C.A.R.P. steps in. Using a chain of Llama text and vision model operations, it automatically reviews the generated image and refines the prompt so that the result better matches the user’s request. (See the S.C.A.R.P. diagram or demo video for more details.)

Not Shown in the Video

While the demo video showcases core functionality, there are three additional standout features powered by the lightning-fast SambaNova API and advanced Llama text model pipelines:

1. Intent Detector Pipeline

Every input submitted to the app undergoes analysis by a suite of intent detectors. Each detector runs as an independent Llama text model call in parallel, interpreting the user’s intent. These results guide the app in selecting the best stable diffusion model and adjusting parameters based on user feedback. This system makes advanced image generation accessible to everyone, not just technical users, reducing frustration and delivering superior results. (See the Intent Detector Pipeline diagram for details.)

2. Share Image on Twitter/X

With a single button click, users can showcase their creations on Twitter/X. The app leverages a Llama text model to generate the tweet text, emojis, hashtags, and a preview image automatically, streamlining the sharing process and maximizing engagement.

3. Mint NFTs with License Terms

Users have the option to mint NFTs from their creations on the Story Protocol digital rights management blockchain. An AI-powered chatbot, driven by Llama text models, interviews users to help them select the most appropriate license terms for their NFTs. This feature ensures that users can protect and monetize their work with minimal effort.

Livepeer Image Helper, powered by S.C.A.R.P., the SambaNova API, and Llama text models, revolutionizes the image-generation experience by combining cutting-edge technology with user-centric design.

Intent Detector Pipeline

(Diagram of the intent detector pipeline described under “1. Intent Detector Pipeline” above: each detector runs as an independent, parallel Llama text model call.)
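
For readers who want a feel for the parallel detector calls, here is a minimal Python sketch. It is not the app's actual code: the detector names, the prompts, the `steps` setting, and the rule in `adjust_settings` are all hypothetical, and `fake_text_model` is a keyword-matching stub standing in for a real Llama text model call so the example runs without API access.

```python
import asyncio
from typing import Awaitable, Callable, Dict

# Hypothetical detectors: the post does not list the app's real ones.
DETECTOR_PROMPTS: Dict[str, str] = {
    "wants_text_in_image": "Does the user want legible text in the image? Answer yes or no.",
    "complains_blurry": "Is the user complaining that the image is blurry? Answer yes or no.",
}

# (system_prompt, user_input) -> answer; stands in for one Llama text model call.
ModelCall = Callable[[str, str], Awaitable[str]]


async def run_intent_detectors(user_input: str, call_model: ModelCall) -> Dict[str, str]:
    """Run every detector concurrently and collect the answers by detector name."""
    names = list(DETECTOR_PROMPTS)
    answers = await asyncio.gather(
        *(call_model(DETECTOR_PROMPTS[name], user_input) for name in names)
    )
    return dict(zip(names, answers))


def adjust_settings(settings: Dict[str, int], intents: Dict[str, str]) -> Dict[str, int]:
    """Illustrative (assumed) rule mapping detected intents to generation settings."""
    updated = dict(settings)
    if intents.get("complains_blurry") == "yes":
        updated["steps"] = updated.get("steps", 20) + 10
    return updated


async def fake_text_model(system_prompt: str, user_input: str) -> str:
    """Keyword-matching stub so the sketch runs without any API access."""
    await asyncio.sleep(0.01)  # simulate network latency
    keyword = "blurry" if "blurry" in system_prompt else "text"
    return "yes" if keyword in user_input.lower() else "no"


if __name__ == "__main__":
    intents = asyncio.run(run_intent_detectors("The picture is too blurry", fake_text_model))
    print(intents)
    print(adjust_settings({"steps": 20}, intents))
```

Because the detectors are awaited together with `asyncio.gather`, total latency is roughly that of the slowest single call rather than the sum of all calls, which is why a fast inference API matters for this design.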

S.C.A.R.P. Pipeline

This diagram shows the complex series of calls made to the Llama text and vision models to perform S.C.A.R.P. processing. Without the blazing-fast SambaNova API, neither the intent detector pipeline nor this pipeline would be possible, because of the large number of operations each one has to execute in real time.
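
The overall shape of a generate, critique, and refine loop can be sketched in Python as follows. This is only an illustration under stated assumptions, not the pipeline from the diagram: the three callables, the `Critique` type, and the `max_generations` cutoff are hypothetical, and the stubs at the bottom replace the real image generation, vision, and text model calls with simple string operations.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Critique:
    matches: bool   # did the image satisfy the user's request?
    problems: str   # description of what is wrong, if anything


# All three callables are hypothetical stand-ins for model calls.
GenerateImage = Callable[[str], str]              # prompt -> image reference
CritiqueImage = Callable[[str, str], Critique]    # (request, image) -> critique
RefinePrompt = Callable[[str, str, str], str]     # (request, prompt, problems) -> new prompt


def scarp_loop(
    request: str,
    generate_image: GenerateImage,
    critique_image: CritiqueImage,
    refine_prompt: RefinePrompt,
    max_generations: int = 3,  # assumed cutoff; the post does not state one
) -> Tuple[str, str, int]:
    """Regenerate with refined prompts until the critique passes or the cutoff is hit."""
    prompt = request
    image = generate_image(prompt)
    generations = 1
    while generations < max_generations:
        critique = critique_image(request, image)   # vision model step
        if critique.matches:
            break
        prompt = refine_prompt(request, prompt, critique.problems)  # text model step
        image = generate_image(prompt)
        generations += 1
    return image, prompt, generations


# Stubs so the sketch runs without any model access.
def stub_generate(prompt: str) -> str:
    return f"image<{prompt}>"


def stub_critique(request: str, image: str) -> Critique:
    ok = "snow" in image
    return Critique(matches=ok, problems="" if ok else "no snow visible")


def stub_refine(request: str, prompt: str, problems: str) -> str:
    return f"{prompt}, snow-covered peak"


if __name__ == "__main__":
    print(scarp_loop("Mt. Hood in the winter", stub_generate, stub_critique, stub_refine))
```

Capping the number of generations keeps the loop from running indefinitely when the critique never passes; in that case the last image is returned as the best effort.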

Links:

Website: Plastic Educator
Demo Video: Watch
GitHub: Repo

coby.adams · November 21, 2024, 4:44am

@robert.oschler

This was cool. I went through generate, enhance, and refine from a simple prompt of Mt. Hood in the winter and wound up with this gorgeous image.

I have an old friend who is doing a lot of AI graphic design. Would you mind if I had him play with it and see what he thinks?

-Coby

omkar.gangan · November 21, 2024, 5:52am

Great work, such a cool and impactful idea!

robert.oschler · November 21, 2024, 12:11pm

Of course not. That’s why I published it, so people could have fun with it. It’s also open source, MIT license.

robert.oschler · November 21, 2024, 12:13pm

Thank you! I was rather astonished by how good the Llama vision models are and how fast the SambaNova API is. It’s interesting how better tech and raw speed make many things possible that weren’t before.

robert.oschler · November 21, 2024, 12:32pm

Note: I have just updated the code to @-mention SambaNova AI in every auto-generated Twitter/X post when you use the app’s Twitter/X share button.

@omkar.gangan

prafull.thokal · November 27, 2024, 7:30pm

@robert.oschler This tool sets a new standard for AI-powered creativity—fantastic job!

robert.oschler · November 27, 2024, 8:41pm

Thank you! I couldn’t have done it without SambaNova’s speed and its vision model support.