Project Sparky: Mobile Web App for Multimodal Prompting with Google’s Gemini API


UC Berkeley Codebase Project Collaboration with Google

Hello! We are a team from Codebase, a student organization from UC Berkeley that empowers students to break into the software industry through hands-on technical projects. This past semester, the eight of us had an incredible experience collaborating with Google to build Sparky, a mobile web app for multimodal prompting using the Gemini API.

Experiment with Gemini in multimodal fashion with Sparky!

Why Sparky?

Sparky is heavily inspired by Google AI Studio, and endeavors to build upon its ease of experimentation and prototyping by providing additional, mobile-friendly features such as camera and photo library integration, as well as audio recording.

Through Sparky, we hope to provide a seamless and robust user experience by integrating all media capturing, processing, and uploading functionalities directly into the web app, enabling rapid iteration and testing!

Project Scope

API Features

To build our app, we used two key features of the Gemini API: multi-turn chat and multimodal input parsing. Gemini 1.5 Pro was released during the course of our project, so we also incorporated capabilities that came with that release, such as audio input.

const chat = model.startChat({
  history: await constructChatHistory(
    currentPrompt.variants[variantIndex].variantHistory
  ),
  generationConfig: {
    maxOutputTokens: 2048
  }
})

Implementing multi-turn chat with Gemini was straightforward from a development standpoint. As the snippet above shows, a single startChat call takes the conversation history so far along with a generation config, so the chat slotted into Sparky smoothly and we could quickly move on to building out the rest of the app.
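The snippet above relies on a constructChatHistory helper that is not shown in this post. As a rough idea of what such a helper has to produce, here is a minimal sketch that converts stored turns into the role/parts entries that startChat accepts as history. The function name toGeminiHistory and the stored turn shape ({ sender, text }) are assumptions made for illustration, not Sparky's actual code.

```javascript
// Hypothetical stored-turn shape: { sender: "user" | "gemini", text: string }.
// Each output entry has the form { role, parts: [{ text }] }, with role
// being either "user" or "model".
function toGeminiHistory(turns) {
  return turns
    // Skip turns with no usable text so they are not sent as empty messages.
    .filter((turn) => typeof turn.text === "string" && turn.text.trim() !== "")
    .map((turn) => ({
      role: turn.sender === "user" ? "user" : "model",
      parts: [{ text: turn.text }]
    }))
}

const history = toGeminiHistory([
  { sender: "user", text: "Name a cat breed." },
  { sender: "gemini", text: "Maine Coon." },
  { sender: "user", text: "   " }
])
console.log(JSON.stringify(history, null, 2))
```

The resulting array could then be passed as the history option of startChat. A real helper would likely also need to handle image and audio parts, which this sketch leaves out.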

Sparky Features

We built Sparky with developers in mind, so it includes several features that make experimenting with the Gemini API easier. Below are a few of them, each of which was simple to implement thanks to the intuitive and comprehensive API.

Compare different prompt variants: This feature generates a new Variant, a copy of all of the prompts and responses so far, and places it directly below the original on the screen. Users can easily branch off and try different prompts while still being able to compare the results.

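// Copy each request bubble so edits in the new variant don't touch the original.
// Note: the spread copies one level only, so nested objects are still shared.
const deepCopiedRequests = variantToCopy.currentRequests.map((bubble) => {
  return { ...bubble }
})

const copiedVariant = new Variant()
copiedVariant.variantHistory = [...variantToCopy.variantHistory]
copiedVariant.currentRequests = deepCopiedRequests

// Append the copy to the existing list of variants.
const newVariants = [...prevData.variants, copiedVariant]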
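One thing to watch when branching a variant is that object spread only copies the top level of each bubble. If bubbles ever hold nested data, a fully independent copy is needed. The sketch below shows one way to do that with the built-in structuredClone, using plain objects. The branchVariant name and the attachments field are hypothetical and only serve the illustration.

```javascript
// Hypothetical plain-object variant:
// { variantHistory: [...], currentRequests: [...] }
function branchVariant(variants, index) {
  // structuredClone copies nested arrays and objects, so the branch
  // shares no mutable state with the variant it was copied from.
  const copy = structuredClone(variants[index])
  return [...variants, copy]
}

const variants = [
  {
    variantHistory: [{ role: "user", parts: [{ text: "Describe this photo." }] }],
    currentRequests: [{ text: "Describe this photo.", attachments: ["photo.png"] }]
  }
]

const branched = branchVariant(variants, 0)
// Editing the branch leaves the original variant untouched.
branched[1].currentRequests[0].attachments.push("clip.mp3")
console.log(variants[0].currentRequests[0].attachments.length) // 1
console.log(branched[1].currentRequests[0].attachments.length) // 2
```

Note that structuredClone returns plain objects, so a class instance such as Variant would lose its prototype and methods. In that case, copying field by field as in the snippet above, with deeper copies where needed, may be the better fit.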

Regenerate prompts: Users can regenerate responses to their prompts with the click of a button, and toggle between the different response options.
Efficiently generate image/audio inputs: Users can add multimodal inputs through the camera and microphone to capture image and audio, respectively.
Prompt auto-saving/auto-naming: Prompts are saved automatically and given a generated title, so users can hop on and off the web app at any point and pick their experiments back up.

// Ask Gemini for a short title that summarizes the given message.
async function generateTitle(msg) {
  try {
    const prompt = [
      "If you were a chatbot, describe the following input as a prompt title in about 7 words or less:"
    ].concat(msg)
    const result = await model.generateContent(prompt)
    const response = await result.response
    const text = response.text()
    console.log("generated title: ", text)
    return text
  } catch (error) {
    console.error("Error creating title for prompt: ", error)
    // Rethrow the original error so its message and stack trace are preserved.
    throw error
  }
}
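For the image and audio inputs mentioned in the feature list, the captured media has to be packaged as a request part before it is sent alongside the text prompt. Here is a minimal sketch of that step, assuming the bytes are already in memory and that Node's global Buffer is available for base64 encoding (a browser build would need a different encoder, such as FileReader). The toInlinePart name is ours and not part of Sparky or the SDK.

```javascript
// Wrap raw media bytes in an inline-data part:
// { inlineData: { data: <base64 string>, mimeType: <string> } }
function toInlinePart(bytes, mimeType) {
  return {
    inlineData: {
      data: Buffer.from(bytes).toString("base64"),
      mimeType
    }
  }
}

// The first four bytes of a PNG file stand in for a captured photo here.
const imagePart = toInlinePart(new Uint8Array([137, 80, 78, 71]), "image/png")

// A multimodal request is then an array mixing text and media parts,
// which could be passed to model.generateContent(request).
const request = ["Describe the attached image.", imagePart]
console.log(request.length, imagePart.inlineData.mimeType)
```

Audio recordings would follow the same pattern with an audio MIME type. For large files, sending the data inline may not be practical, so this sketch is best read as covering small captures only.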

What we learned

Throughout Sparky’s development, we were constantly playing around with the API. Here are some of our team’s thoughts on Gemini:

Very useful for debugging code: it is knowledgeable about its own implementation, which was quite relevant for our team.
Skilled at creative applications: for example, it can generate many entertaining stories (our favorites included a tragic love story between two cats).
Accurate at image identification.
Capable of playing intelligent games, such as trivia, twenty questions, and guessing games.

Additionally, our team had an amazing opportunity to work hands-on with more recent technologies and practices: interacting with LLMs, refining our mobile web development skills with Next.js, and designing a collaborative product from scratch. From a Figma layout of Sparky to a design doc outlining the integral features of the app, we were truly able to see our ideas come to life.

Thank You, Google Labs Team!

A shout out to the amazing team at Google Labs! Cher Hu, Jaclyn Konzelmann, Dimitri Glazkov, and Barnaby James all made this project and collaboration possible! We really appreciate the ability to work hands-on with the Gemini model and gain such invaluable experience. We loved working with the team, and we are looking forward to seeing our work integrated with the developer workflow for multimodal prompting with Gemini!

Written by Codebase