
Image by Author | Canva
Introduction
This article is a continuation of my beginner project series. If you missed the first one on Python, I highly encourage you to check out 5 Fun Python Projects for Absolute Beginners.
So, what exactly is generative AI or Gen AI? Essentially, it involves using artificial intelligence to generate new content—whether that be text, images, audio, or even video. The landscape has dramatically changed since the emergence of advanced foundation models like GPT, LLaMA, and LLaVA, allowing newcomers to create creative tools and applications without needing to build models from scratch.
In this article, I’m sharing five projects that cover a wide array of generative AI concepts: text generation, image creation, voice interaction, and more, including backend concepts like fine-tuning and Retrieval Augmented Generation (RAG). These projects utilize both API-based solutions and local setups, ensuring that you grasp essential building blocks used in many modern Gen AI applications. Let’s dive in!
1. Recipe Generator App (Text Generation)
Link: Build a Recipe Generator with React and AI: Code Meets Kitchen
Let’s kick things off with a fun and simple project that utilizes text generation through an API. This app lets users input basic information—like ingredients, meal type, cuisine, cooking time, and complexity—and generates a complete recipe using GPT. You’ll learn how to create a frontend form, send data to GPT, and render the AI-generated recipe back to the user. If you’re up for something a bit more advanced, check out this version: Create an AI Recipe Finder with GPT in 1 Hour. It incorporates advanced prompt engineering, suggestions, ingredient substitutions, and a more dynamic frontend.
2. Image Generator App (Stable Diffusion, Local Setup)
Link: Build a Python AI Image Generator in 15 Minutes (Free & Local)
While you might be familiar with generating images using cloud-based tools like ChatGPT, DALL·E, or Midjourney, what if you wanted to run everything locally—avoiding API costs or cloud limitations? This project shows you how to set up Stable Diffusion on your computer, allowing you to enter text prompts and generate AI images instantly without an internet connection. The simple steps include installing Python, cloning a lightweight web UI repository, downloading model checkpoints, and setting up a local server.
3. Medical Chatbot with Voice, Vision, and Text
Link: Build an AI Voice Assistant App using Multimodal LLM Llava and Whisper
This project taps into multiple modalities, combining voice interaction, image analysis, and text comprehension. Although it isn’t specifically designed as a medical chatbot, the use case fits well in medical contexts. Users can speak to the app, which listens, looks at an image (like an X-ray or document), and provides intelligent responses using LLaVA (a multimodal vision-language model) and Whisper (OpenAI’s speech-to-text model). The guide walks you through the process of setting it up on Google Colab, installing the necessary libraries, and integrating audio replies using gTTS.
4. Fine-Tuning Modern LLMs
Link: Fine-tune Gemma 3, Qwen3, Llama 4, Phi 4, and Mistral Small with Unsloth and Transformers
Utilizing off-the-shelf models is a good start, but gaining more control through fine-tuning is where you can really unleash creativity. This particular video by Trelis Research takes you through the fine-tuning process for models like Gemma 3, Qwen3, Llama 4, Phi 4, and Mistral Small using Unsloth (a library for faster, memory-efficient training) and Transformers. It’s a longer video—about 1.5 hours—but it’s worth the investment as it walks you through preparing datasets, running evaluations, and troubleshooting training issues.
5. Build Local RAG from Scratch
Link: Local Retrieval Augmented Generation (RAG) from Scratch (step-by-step tutorial)
Chatbots often struggle when asked questions outside their training data, and this is where RAG becomes invaluable. You provide your language model with a vector database containing relevant documents, enabling it to retrieve context before generating answers. This video tutorial guides you through building a full local RAG system either on Google Colab or your machine. You’ll learn how to load documents (like textbook PDFs), split them into chunks, generate embeddings using a sentence-transformer model, store them in SQLite-VSS, and connect it to a local LLM (such as Llama 2 via Ollama).
Key Takeaways
Throughout these projects, you will gain a foundational understanding of essential components in generative AI:
Text → Image → Voice → Fine-tuning → Retrieval
If you’re eager to dive into generative AI and want to build something tangible—rather than just interact with demos—these projects provide a comprehensive roadmap. Begin with whichever project excites you most, and remember, it’s perfectly fine to experiment and even break things along the way. That’s all part of the learning process!
Kanwal Mehreen is a machine learning engineer and technical writer, passionate about data science and the intersection of AI with medicine. She co-authored the ebook “Maximizing Productivity with ChatGPT” and champions diversity in tech as a Google Generation scholar. Kanwal has founded FEMCodes to empower women in STEM.