DietLogApp is a powerful, AI-driven application designed to analyze food images and provide detailed nutritional feedback. Built with FastAPI, this application serves as a backend API that can be integrated into various health and wellness applications. It leverages the capabilities of Anthropic’s Claude 3.5 Sonnet model to understand and interpret food images, offering users a comprehensive breakdown of their meals.

Check out the project source code on GitHub.

This post will take you on a deep dive into the DietLogApp, exploring its features, architecture, and how you can get it up and running on your local machine.

Features

  - Food image analysis powered by Anthropic’s Claude 3.5 Sonnet vision model
  - Two-step prompt chain: a factual image description followed by nutritional feedback
  - Image size validation (4MB limit) before any processing
  - Streaming responses, so feedback appears in real time
  - Modular architecture with interfaces, providers, and dependency injection
  - Docker support for easy local deployment

API Deep Dive: From Request to Nutritional Insights

The application’s core logic resides in its API, which processes image URLs to deliver detailed nutritional analysis. Let’s break down the entire workflow, from receiving a request to streaming back AI-generated feedback.

1. Receiving and Processing the Request

The process begins when a user submits an image URL through the web interface, triggering a POST request to the /diet/process endpoint. This endpoint is defined in app/api/diet.py and expects a JSON body with a single url field, validated by the ImageRequest Pydantic model.

Here’s a step-by-step look at how the request is handled (a sketch of the endpoint follows the list):

  1. Dependency Injection: The API leverages FastAPI’s dependency injection to get instances of the LLMService and ImageService from their respective providers (LLMProvider and ImageProvider). This promotes a clean separation of concerns.
  2. Image Fetching: The HTTPXService (an implementation of ImageService) fetches the image from the provided URL. Before downloading, it sends a HEAD request to check the Content-Length header.
  3. Image Validation: The application enforces a size limit of 4MB. If the image exceeds this size, an ImageTooLargeError is raised and the API returns a 400 Bad Request response, preventing excessive memory consumption and processing time. While any image URL can be sent, the analysis works best with clear, food-focused images in common formats like JPEG and PNG.
  4. Image Decoding: If the image is valid, its content is read into bytes and then encoded into a Base64 string. This standardized format is required for the vision model API.
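
Putting steps 1–4 together, here is a minimal sketch of what the endpoint might look like. The names ImageRequest, HTTPXService, and ImageTooLargeError come from the project; the exact method signatures, the dependency wiring, and the constant below are illustrative assumptions.

```python
import base64

import httpx
from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel

MAX_IMAGE_SIZE = 4 * 1024 * 1024  # the 4MB limit described above

router = APIRouter()


class ImageRequest(BaseModel):
    url: str


class ImageTooLargeError(Exception):
    pass


class HTTPXService:
    """Fetches and validates images; method names are illustrative."""

    async def fetch_image(self, url: str) -> bytes:
        async with httpx.AsyncClient() as client:
            # HEAD request first: reject oversized images before downloading.
            head = await client.head(url)
            if int(head.headers.get("Content-Length", 0)) > MAX_IMAGE_SIZE:
                raise ImageTooLargeError(url)
            response = await client.get(url)
            response.raise_for_status()
            return response.content


@router.post("/diet/process")
async def process_diet_image(
    request: ImageRequest,
    image_service: HTTPXService = Depends(HTTPXService),  # via ImageProvider in the project
):
    try:
        content = await image_service.fetch_image(request.url)
    except ImageTooLargeError:
        # The custom exception is mapped to a 400 Bad Request.
        raise HTTPException(status_code=400, detail="Image exceeds the 4MB limit")
    # Encode the raw bytes as Base64 for the vision model API.
    image_b64 = base64.b64encode(content).decode("utf-8")
    # ...hand the Base64 string to the LLM service (next section)
```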

2. Integrating with Claude for Food Analysis

The magic of DietLogApp lies in its integration with Anthropic’s Claude 3.5 Sonnet model, which is handled by the AnthropicService. The analysis is a two-step process:

  1. Image Description: The Base64-encoded image is sent to Claude with a detailed prompt (food_image_description_prompt). This prompt instructs the model to act as an AI assistant and provide a factual description of the food, including ingredients, cooking methods, and presentation, while explicitly warning against making assumptions about unidentifiable items.
  2. Nutritional Feedback: The descriptive text generated in the first step is then embedded into a second prompt (food_nutritional_feedback_prompt). This prompt asks the model to take on the persona of an AI nutritionist, analyze the description, and generate a comprehensive nutritional breakdown.

The application uses the official anthropic Python library to interact with the Claude API. Specifically, it calls client.messages.create for the initial description and client.messages.stream to generate the final analysis.
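
In outline, the two calls might look like this. This is a sketch using the anthropic SDK; the model string, token limits, and the way the description is appended to the second prompt are assumptions, and the actual prompt texts live in the project.

```python
import anthropic

# The real prompt texts ship with the project; placeholders here.
food_image_description_prompt = "..."
food_nutritional_feedback_prompt = "..."

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def describe_image(image_b64: str) -> str:
    """Step 1: ask Claude for a factual description of the food image."""
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # illustrative model string
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image", "source": {
                    "type": "base64",
                    "media_type": "image/jpeg",
                    "data": image_b64,
                }},
                {"type": "text", "text": food_image_description_prompt},
            ],
        }],
    )
    return response.content[0].text


def stream_feedback(description: str):
    """Step 2: stream the nutritional feedback, feeding in the description."""
    with client.messages.stream(
        model="claude-3-5-sonnet-latest",
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": f"{food_nutritional_feedback_prompt}\n\n{description}",
        }],
    ) as stream:
        for text in stream.text_stream:
            yield text
```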

3. Advanced Prompt Engineering for Accurate Analysis

The reliability of DietLogApp’s analysis hinges on sophisticated prompt engineering. Instead of a single, generic query, the application uses a chain of highly structured prompts to guide the AI, ensuring the output is accurate, relevant, and consistent. Here are the key techniques employed:

1. Persona Pattern

Both prompts assign a specific role to the AI: the first casts it as a factual assistant describing the image, the second as an AI nutritionist analyzing that description.

Advantage: Adopting a persona significantly improves the quality and relevance of the response. It helps the model select the appropriate tone, style, and depth of information for the given task.
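
The actual prompt text ships with the project; as a rough paraphrase of what the post describes, the description prompt might open along these lines:

```python
# Paraphrased for illustration; not the project's actual prompt text.
food_image_description_prompt = """\
You are an AI assistant that describes food images factually.
Describe the visible ingredients, cooking methods, and presentation.
Do not make assumptions about items you cannot clearly identify."""
```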

2. Two-Step Prompt Chaining

The analysis is broken into two distinct steps: description and nutritional feedback. This “chaining” technique, where the output of the first prompt becomes the input for the second, is a deliberate design choice.

Advantage: Decomposing a complex task into simpler, sequential steps reduces the cognitive load on the model. It allows the AI to focus on one thing at a time—first observation, then interpretation—leading to a more accurate and detailed final output than a single, complex prompt could achieve.

3. Structured Output and XML Tagging

The second prompt commands the model to structure its output using specific XML-like tags: <nutritional_breakdown>, <reasoning>, <score>, and <feedback>.

Advantage: This is the most powerful technique for ensuring a reliable and predictable output. It forces the model to provide all the required pieces of information in a consistent format. For future development, this makes the API’s output easily parsable, allowing a frontend application to cleanly separate and display the score, the reasoning, and the feedback in different UI components.
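
For example, a future API layer or frontend could pull each section out with a few lines of parsing. This helper is hypothetical and not part of the current codebase:

```python
import re


def extract_tag(text: str, tag: str) -> str | None:
    """Return the contents of an XML-like tag from the model output, if present."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
    return match.group(1).strip() if match else None


# Usage against the fully streamed response:
# score = extract_tag(analysis, "score")
# feedback = extract_tag(analysis, "feedback")
```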

4. Step-by-Step Instructions and Negative Constraints

Both prompts provide clear, enumerated steps for the AI to follow and explicitly state what not to do.

Advantage: Enumerated steps keep the model focused and on task, while explicit negative constraints (such as the warning against making assumptions about unidentifiable items) reduce hallucinated details in the final analysis.

These combined techniques elevate the application from a simple AI wrapper to a robust analysis tool that produces reliable and well-structured nutritional feedback.

4. Why Stream the Response?

Instead of waiting for the entire nutritional analysis to be generated, the application streams the response back to the client as it’s being created by the model. This is achieved using FastAPI’s StreamingResponse.
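
Wiring this up takes only a few lines; a minimal sketch, reusing the hypothetical stream_feedback generator from the Claude integration sketch above:

```python
from fastapi.responses import StreamingResponse


def build_streaming_response(description: str) -> StreamingResponse:
    """Wrap the token generator so FastAPI flushes chunks to the client as they arrive."""
    return StreamingResponse(
        stream_feedback(description),  # generator yielding text chunks
        media_type="text/plain",
    )
```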

The primary benefits of this approach are:

  - Lower perceived latency: the user sees the first words of the analysis almost immediately, rather than waiting for the full response to be generated.
  - Real-time feedback: the UI can render the analysis progressively as tokens arrive from the model.

The simple frontend in index.html demonstrates how to consume this stream using the fetch API and a ReadableStream to update the UI in real time.

Project Structure

The project is organized into a clean and modular structure:

.
├── app/
│   ├── api/
│   │   └── diet.py         # Defines the API endpoints
│   ├── exceptions/
│   │   └── image.py        # Custom exception for image processing
│   ├── integration/
│   │   └── anthropic.py    # Integration with the Anthropic API
│   ├── interfaces/
│   │   ├── image.py        # Interface for image services
│   │   └── llm.py          # Interface for LLM services
│   ├── providers/
│   │   ├── image.py        # Provider for image services
│   │   └── llm.py          # Provider for LLM services
│   ├── static/
│   │   └── index.html      # Simple frontend for the application
│   └── main.py             # Main application entry point
├── Dockerfile              # Dockerfile for building the application image
├── pyproject.toml          # Project metadata and dependencies
├── README.md               # Project README file
└── requirements.txt        # Project dependencies

Getting Started

You can run the DietLogApp locally using Docker.

Prerequisites

  - Docker installed on your machine
  - An Anthropic API key

Steps

  1. Clone the repository:

    git clone <repository-url>
    cd <repository-directory>
    
  2. Create a .env file: Create a .env file in the root of the project and add your Anthropic API key:

    ANTHROPIC_API_KEY=your-api-key
    
  3. Build the Docker image:

    docker build -t dietlogapp .
    
  4. Run the Docker container:

    docker run -d --name dietlogapp -p 8000:8000 --env-file .env dietlogapp
    
  5. Access the application: Open http://localhost:8000 in your browser to use the simple web interface served from app/static/index.html.
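
You can also exercise the API directly from the command line. The image URL below is a placeholder; the -N flag tells curl not to buffer the streamed response:

    curl -N -X POST http://localhost:8000/diet/process \
      -H "Content-Type: application/json" \
      -d '{"url": "https://example.com/meal.jpg"}'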

Conclusion

The DietLogApp is a great example of how to build a modern, AI-powered application with FastAPI. Its modular architecture, use of streaming responses, and integration with a powerful AI model make it a robust and scalable solution for food image analysis.

Future improvements could include:

  - Parsing the XML-tagged output server-side into structured JSON, so a frontend can cleanly separate the score, reasoning, and feedback (as noted in the prompt engineering section).
  - A richer frontend than the simple index.html demo.

I hope this blog post has given you a good overview of the DietLogApp. Feel free to explore the code, run it locally, and build upon it!