From 141327b9790df8d0f5927ac95f223a440c6bf409 Mon Sep 17 00:00:00 2001 From: Stamkulov Sattar Date: Tue, 24 Feb 2026 21:26:06 +0100 Subject: [PATCH 1/3] Created using Colab --- .../intro_batch_prediction.ipynb | 1035 +++++++++++++++++ 1 file changed, 1035 insertions(+) create mode 100644 gemini/batch-prediction/intro_batch_prediction.ipynb diff --git a/gemini/batch-prediction/intro_batch_prediction.ipynb b/gemini/batch-prediction/intro_batch_prediction.ipynb new file mode 100644 index 0000000..01e2381 --- /dev/null +++ b/gemini/batch-prediction/intro_batch_prediction.ipynb @@ -0,0 +1,1035 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ur8xi4C7S06n" + }, + "outputs": [], + "source": [ + "# Copyright 2024 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JAPoU8Sm5E6e" + }, + "source": [ + "# Intro to Batch Predictions with the Gemini API\n", + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Google
Open in Colab\n", + "
\n", + "
\n", + " \n", + " \"Google
Open in Colab Enterprise\n", + "
\n", + "
\n", + " \n", + " \"Vertex
Open in Workbench\n", + "
\n", + "
\n", + " \n", + " \"BigQuery
Open in BigQuery Studio\n", + "
\n", + "
\n", + " \n", + " \"GitHub
View on GitHub\n", + "
\n", + "
\n", + "\n", + "
\n", + "\n", + "Share to:\n", + "\n", + "\n", + " \"LinkedIn\n", + "\n", + "\n", + "\n", + " \"Bluesky\n", + "\n", + "\n", + "\n", + " \"X\n", + "\n", + "\n", + "\n", + " \"Reddit\n", + "\n", + "\n", + "\n", + " \"Facebook\n", + " " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "84f0f73a0f76" + }, + "source": [ + "| | |\n", + "|-|-|\n", + "|Author(s) | [Eric Dong](https://github.com/gericdong), [Holt Skinner](https://github.com/holtskinner) |" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tvgnzT1CKxrO" + }, + "source": [ + "## Overview\n", + "\n", + "Different from getting online (synchronous) responses, where you are limited to one input request at a time, the batch predictions with the Gemini API in Vertex AI allow you to send a large number of multimodal requests to a Gemini model in a single batch request. Then, the model responses asynchronously populate to your storage output location in [Cloud Storage](https://cloud.google.com/storage/docs/introduction) or [BigQuery](https://cloud.google.com/bigquery/docs/storage_overview).\n", + "\n", + "Batch predictions are generally more efficient and cost-effective than online predictions when processing a large number of inputs that are not latency sensitive.\n", + "\n", + "To learn more, see the [Get batch predictions for Gemini](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/batch-prediction-gemini) page.\n", + "\n", + "### Objectives\n", + "\n", + "In this tutorial, you learn how to make batch predictions with the Gemini API in Vertex AI. 
This tutorial shows how to use **Cloud Storage** and **BigQuery** as input sources and output locations.\n", + "\n", + "You will complete the following tasks:\n", + "\n", + "- Preparing batch inputs and an output location\n", + "- Submitting a batch prediction job\n", + "- Retrieving batch prediction results\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "61RBz8LLbxCR" + }, + "source": [ + "## Get started" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "No17Cw5hgx12" + }, + "source": [ + "### Install Google Gen AI SDK\n" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "id": "tFy3H3aPgx12", + "outputId": "f44aae84-cd07-426a-f7b0-f7d6b719f3ea", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m53.2/53.2 kB\u001b[0m \u001b[31m2.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m79.5/79.5 kB\u001b[0m \u001b[31m2.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m728.8/728.8 kB\u001b[0m \u001b[31m5.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m10.9/10.9 MB\u001b[0m \u001b[31m32.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25h\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. 
This behaviour is the source of the following dependency conflicts.\n", + "google-colab 1.0.0 requires pandas==2.2.2, but you have pandas 3.0.1 which is incompatible.\n", + "bqplot 0.12.45 requires pandas<3.0.0,>=1.0.0, but you have pandas 3.0.1 which is incompatible.\n", + "db-dtypes 1.5.0 requires pandas<3.0.0,>=1.5.3, but you have pandas 3.0.1 which is incompatible.\n", + "gradio 5.50.0 requires pandas<3.0,>=1.0, but you have pandas 3.0.1 which is incompatible.\u001b[0m\u001b[31m\n", + "\u001b[0m" + ] + } + ], + "source": [ + "%pip install --upgrade --quiet google-genai pandas google-cloud-storage google-cloud-bigquery" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dmWOrTJ3gx13" + }, + "source": [ + "### Authenticate your notebook environment (Colab only)\n", + "\n", + "If you're running this notebook on Google Colab, run the cell below to authenticate your environment." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "id": "NyKGtVQjgx13" + }, + "outputs": [], + "source": [ + "import sys\n", + "\n", + "if \"google.colab\" in sys.modules:\n", + " from google.colab import auth\n", + "\n", + " auth.authenticate_user()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "06489bd14f16" + }, + "source": [ + "### Import libraries\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DF4l8DTdWgPY" + }, + "source": [ + "### Set Google Cloud project information and create client\n", + "\n", + "To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).\n", + "\n", + "Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)." 
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {
+ "id": "Nqwi-5ufWp_B"
+ },
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "import time\n",
+ "from datetime import datetime\n",
+ "\n",
+ "import fsspec\n",
+ "import pandas as pd\n",
+ "from google import genai\n",
+ "from google.cloud import bigquery\n",
+ "from google.genai.types import CreateBatchJobConfig\n",
+ "\n",
+ "# fmt: off\n",
+ "PROJECT_ID = \"[your-project-id]\" # @param {type: \"string\", placeholder: \"[your-project-id]\", isTemplate: true}\n",
+ "# fmt: on\n",
+ "if not PROJECT_ID or PROJECT_ID == \"[your-project-id]\":\n",
+ " PROJECT_ID = str(os.environ.get(\"GOOGLE_CLOUD_PROJECT\"))\n",
+ "\n",
+ "LOCATION = os.environ.get(\"GOOGLE_CLOUD_REGION\", \"global\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {
+ "id": "cfca4d7bd6db"
+ },
+ "outputs": [],
+ "source": [
+ "client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "e43229f3ad4f"
+ },
+ "source": [
+ "### Load model\n",
+ "\n",
+ "You can find a list of the Gemini models that support batch predictions in the [Multimodal models that support batch predictions](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/batch-prediction-gemini#multimodal_models_that_support_batch_predictions) page.\n",
+ "\n",
+ "This tutorial uses the Gemini 2.5 Flash (`gemini-2.5-flash`) model."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {
+ "id": "cf93d5f0ce00"
+ },
+ "outputs": [],
+ "source": [
+ "MODEL_ID = \"gemini-2.5-flash\" # @param {type:\"string\", isTemplate: true}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "265f180b58e0"
+ },
+ "source": [
+ "## Cloud Storage"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "1_xZADsak23H"
+ },
+ "source": [
+ "### Prepare batch inputs\n",
+ "\n",
+ "The input for batch requests specifies the items to send to your model for prediction. You can learn more about the batch input formats in the [Batch text generation](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/batch-prediction-gemini#prepare_your_inputs) page.\n",
+ "\n",
+ "This tutorial uses Cloud Storage as an example. The requirements for Cloud Storage input are:\n",
+ "\n",
+ "- File format: [JSON Lines (JSONL)](https://jsonlines.org/)\n",
+ "- Multiple files are supported with a wildcard pattern such as `gs://bucketname/path/to/*.jsonl`\n",
+ "- Located in `us-central1`\n",
+ "- Appropriate read permissions for the service account\n",
+ "\n",
+ "Each request that you send to a model can include parameters that control how the model generates a response. 
Learn more about Gemini parameters in the [Experiment with parameter values](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/prompts/adjust-parameter-values) page.\n", + "\n", + "This is one of the example requests in the input JSONL file `batch_requests_for_multimodal_input_2.jsonl`:\n", + "\n", + "```json\n", + "{\"request\":{\"contents\": [{\"role\": \"user\", \"parts\": [{\"text\": \"List objects in this image.\"}, {\"file_data\": {\"file_uri\": \"gs://cloud-samples-data/generative-ai/image/office-desk.jpeg\", \"mime_type\": \"image/jpeg\"}}]}],\"generationConfig\":{\"temperature\": 0.4}}}\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "id": "uWb8QzxwbH6W" + }, + "outputs": [], + "source": [ + "# fmt: off\n", + "INPUT_DATA = \"gs://gen-ai-async-gemini-batch-jobs-bucket/input/gemini_batch_requests_for_multimodal_input_2.jsonl\" # @param {type:\"string\"}\n", + "# fmt: on" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "T3jQ59mCsXLc" + }, + "source": [ + "### Prepare batch output location\n", + "\n", + "When a batch prediction task completes, the output is stored in the location that you specified in your request.\n", + "\n", + "- The location is in the form of a Cloud Storage prefix.\n", + " - For example: `gs://path/to/output/data`.\n", + "\n", + "- You can specify the URI of your Cloud Storage bucket in `BUCKET_URI`, or\n", + "- If it is not specified, this notebook will create a Cloud Storage bucket in the form of `gs://PROJECT_ID-TIMESTAMP`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "id": "OtUodwGXZ7US" + }, + "outputs": [], + "source": [ + "BUCKET_URI = \"gs://gen-ai-async-gemini-batch-jobs-bucket/output\" # @param {type:\"string\"}\n", + "GCS_LOCATION = \"us-central1\" # @param {type:\"string\"}\n", + "\n", + "if BUCKET_URI == \"[your-cloud-storage-bucket]\":\n", + " TIMESTAMP = datetime.now().strftime(\"%Y%m%d%H%M%S\")\n", + " BUCKET_URI = f\"gs://{PROJECT_ID}-{TIMESTAMP}\"\n", + "\n", + " ! gcloud storage buckets create {BUCKET_URI} --project={PROJECT_ID} --location={GCS_LOCATION}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "T90CwWDHvonn" + }, + "source": [ + "### Send a batch prediction request\n", + "\n", + "To make a batch prediction request, you specify a source model ID, an input source and an output location where Vertex AI stores the batch prediction results.\n", + "\n", + "To learn more, see the [Batch prediction API](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/batch-prediction-api) page.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "id": "d8e54c57072e", + "outputId": "6347b9da-3695-4b07-d6a6-95687cbe6e8e", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 35 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'projects/714644158567/locations/global/batchPredictionJobs/6979938407145275392'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 9 + } + ], + "source": [ + "gcs_batch_job = client.batches.create(\n", + " model=MODEL_ID,\n", + " src=INPUT_DATA,\n", + " config=CreateBatchJobConfig(dest=BUCKET_URI),\n", + ")\n", + "gcs_batch_job.name" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "A-Fo_Kd9FYRj" + }, + "source": [ + "Print out the job status and other properties. 
You can also check the status in the Cloud Console at https://console.cloud.google.com/vertex-ai/batch-predictions"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {
+ "id": "DWq7m79PbjG8",
+ "outputId": "1439f4e6-86f9-4698-80c5-f0361747c808",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "BatchJob(\n",
+ " create_time=datetime.datetime(2026, 2, 24, 20, 8, 27, 188146, tzinfo=TzInfo(0)),\n",
+ " dest=BatchJobDestination(\n",
+ " format='jsonl',\n",
+ " gcs_uri='gs://gen-ai-async-gemini-batch-jobs-bucket/output'\n",
+ " ),\n",
+ " display_name='genai_batch_job_20260224200826_850cb',\n",
+ " model='publishers/google/models/gemini-2.5-flash',\n",
+ " name='projects/714644158567/locations/global/batchPredictionJobs/6979938407145275392',\n",
+ " src=BatchJobSource(\n",
+ " format='jsonl',\n",
+ " gcs_uri=[\n",
+ " 'gs://gen-ai-async-gemini-batch-jobs-bucket/input/gemini_batch_requests_for_multimodal_input_2.jsonl',\n",
+ " ]\n",
+ " ),\n",
+ " state=<JobState.JOB_STATE_PENDING: 'JOB_STATE_PENDING'>,\n",
+ " update_time=datetime.datetime(2026, 2, 24, 20, 8, 29, 397307, tzinfo=TzInfo(0))\n",
+ ")"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 10
+ }
+ ],
+ "source": [
+ "gcs_batch_job = client.batches.get(name=gcs_batch_job.name)\n",
+ "gcs_batch_job"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "WHUUEREoiewD"
+ },
+ "source": [
+ "Optionally, you can list all the batch prediction jobs in the project."
+ ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "id": "QVgOnasfigx1", + "outputId": "b8c769f5-1d90-4c08-b2c3-cf69c8b4f18b", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "projects/714644158567/locations/global/batchPredictionJobs/6979938407145275392 2026-02-24 20:08:27.188146+00:00 JobState.JOB_STATE_PENDING\n", + "projects/714644158567/locations/global/batchPredictionJobs/5502370701274775552 2026-02-22 20:55:19.297011+00:00 JobState.JOB_STATE_SUCCEEDED\n", + "projects/714644158567/locations/global/batchPredictionJobs/7129647910383255552 2026-02-22 15:58:08.745802+00:00 JobState.JOB_STATE_SUCCEEDED\n", + "projects/714644158567/locations/global/batchPredictionJobs/3900566977558609920 2026-02-22 15:50:43.800609+00:00 JobState.JOB_STATE_FAILED\n", + "projects/714644158567/locations/global/batchPredictionJobs/5365239611058552832 2026-02-20 00:07:26.242456+00:00 JobState.JOB_STATE_SUCCEEDED\n", + "projects/714644158567/locations/global/batchPredictionJobs/6652143204579672064 2026-02-19 22:22:27.732340+00:00 JobState.JOB_STATE_SUCCEEDED\n", + "projects/714644158567/locations/global/batchPredictionJobs/7203376762095403008 2026-02-18 14:28:15.327604+00:00 JobState.JOB_STATE_SUCCEEDED\n", + "projects/714644158567/locations/global/batchPredictionJobs/316106294450716672 2026-02-17 23:10:56.755787+00:00 JobState.JOB_STATE_SUCCEEDED\n", + "projects/714644158567/locations/global/batchPredictionJobs/5516453246203330560 2026-02-11 00:26:18.174842+00:00 JobState.JOB_STATE_SUCCEEDED\n", + "projects/714644158567/locations/global/batchPredictionJobs/7905612848523378688 2026-02-11 00:08:04.265348+00:00 JobState.JOB_STATE_SUCCEEDED\n", + "projects/714644158567/locations/global/batchPredictionJobs/4842039202004598784 2026-02-10 22:01:21.728474+00:00 JobState.JOB_STATE_SUCCEEDED\n", + "projects/714644158567/locations/global/batchPredictionJobs/3526214054626459648 
2026-02-10 16:21:23.083033+00:00 JobState.JOB_STATE_SUCCEEDED\n", + "projects/714644158567/locations/global/batchPredictionJobs/187920830838079488 2026-02-10 14:57:27.709171+00:00 JobState.JOB_STATE_SUCCEEDED\n", + "projects/714644158567/locations/global/batchPredictionJobs/3580397987643260928 2026-02-09 11:53:29.131326+00:00 JobState.JOB_STATE_SUCCEEDED\n", + "projects/714644158567/locations/global/batchPredictionJobs/7163817433239650304 2026-02-05 23:06:55.839232+00:00 JobState.JOB_STATE_SUCCEEDED\n", + "projects/714644158567/locations/global/batchPredictionJobs/8429328928530759680 2026-02-05 22:55:13.221729+00:00 JobState.JOB_STATE_FAILED\n", + "projects/714644158567/locations/global/batchPredictionJobs/8661827259293761536 2026-02-05 17:54:20.718076+00:00 JobState.JOB_STATE_SUCCEEDED\n", + "projects/714644158567/locations/global/batchPredictionJobs/14915974742409216 2026-02-05 17:41:12.852362+00:00 JobState.JOB_STATE_FAILED\n", + "projects/714644158567/locations/global/batchPredictionJobs/2941622413835632640 2026-02-04 14:57:31.883126+00:00 JobState.JOB_STATE_FAILED\n", + "projects/714644158567/locations/global/batchPredictionJobs/8781418940023701504 2026-02-04 05:51:21.058064+00:00 JobState.JOB_STATE_SUCCEEDED\n", + "projects/714644158567/locations/global/batchPredictionJobs/7241539711183880192 2026-02-03 15:27:13.674738+00:00 JobState.JOB_STATE_SUCCEEDED\n", + "projects/714644158567/locations/global/batchPredictionJobs/6584700260843520000 2026-01-27 17:25:03.313675+00:00 JobState.JOB_STATE_FAILED\n", + "projects/714644158567/locations/global/batchPredictionJobs/315689579543789568 2026-01-27 17:02:24.740220+00:00 JobState.JOB_STATE_FAILED\n", + "projects/714644158567/locations/global/batchPredictionJobs/1738827061792866304 2026-01-27 16:49:54.610306+00:00 JobState.JOB_STATE_FAILED\n" + ] + } + ], + "source": [ + "for job in client.batches.list():\n", + " print(job.name, job.create_time, job.state)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": 
"7aJaPNBrGPqK" + }, + "source": [ + "### Wait for the batch prediction job to complete\n", + "\n", + "Depending on the number of input items that you submitted, a batch generation task can take some time to complete. You can use the following code to check the job status and wait for the job to complete." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "id": "dtJDIXdHc0W-", + "outputId": "f28997d4-bfd5-4cd0-f72c-c312db350bd3", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Job succeeded!\n" + ] + } + ], + "source": [ + "# Refresh the job until complete\n", + "while gcs_batch_job.state in (\n", + " \"JOB_STATE_RUNNING\",\n", + " \"JOB_STATE_PENDING\",\n", + " \"JOB_STATE_QUEUED\",\n", + "):\n", + " time.sleep(5)\n", + " gcs_batch_job = client.batches.get(name=gcs_batch_job.name)\n", + "\n", + "# Check if the job succeeds\n", + "if gcs_batch_job.state == \"JOB_STATE_SUCCEEDED\":\n", + " print(\"Job succeeded!\")\n", + "else:\n", + " print(f\"Job failed: {gcs_batch_job.error}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XWUgAxL-HjN9" + }, + "source": [ + "### Retrieve batch prediction results\n", + "\n", + "When a batch prediction task is complete, the output of the prediction is stored in the bucket in JSONL that you specified in your request.\n", + "\n", + "The file name should look like this: `{gcs_batch_job.dest.gcs_uri}/prediction-model-TIMESTAMP/predictions.jsonl`\n", + "\n", + "Example output:\n", + "\n", + "```json\n", + "{\"status\": \"\", \"processed_time\": \"2024-11-13T14:04:28.376+00:00\", \"request\": {\"contents\": [{\"parts\": [{\"file_data\": null, \"text\": \"List objects in this image.\"}, {\"file_data\": {\"file_uri\": \"gs://cloud-samples-data/generative-ai/image/gardening-tools.jpeg\", \"mime_type\": \"image/jpeg\"}, \"text\": null}], \"role\": \"user\"}], \"generationConfig\": {\"temperature\": 
0.4}}, \"response\": {\"candidates\": [{\"avgLogprobs\": -0.10394711927934126, \"content\": {\"parts\": [{\"text\": \"Here's a list of the objects in the image:\\n\\n* **Watering can:** A green plastic watering can with a white rose head.\\n* **Plant:** A small plant (possibly oregano) in a terracotta pot.\\n* **Terracotta pots:** Two terracotta pots, one containing the plant and another empty, stacked on top of each other.\\n* **Gardening gloves:** A pair of striped gardening gloves.\\n* **Gardening tools:** A small trowel and a hand cultivator (hoe). Both are green with black handles.\"}], \"role\": \"model\"}, \"finishReason\": \"STOP\"}], \"modelVersion\": \"gemini-3-flash-preview@default\", \"usageMetadata\": {\"candidatesTokenCount\": 110, \"promptTokenCount\": 264, \"totalTokenCount\": 374}}}\n", + "```\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qZlRIYsC01F1" + }, + "source": [ + "The example code below shows how to load the `.jsonl` file in the Cloud Storage output location into a Pandas DataFrame and print out the object.\n", + "\n", + "You can retrieve the specific responses in the `response` field." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "id": "-jLl3es3dTqB", + "outputId": "bad541fa-29e7-4aca-f5f3-a54c1f9ceba8", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 182 + } + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + " key \\\n", + "0 user-defined-request-id-2 \n", + "1 user-defined-request-id-1 \n", + "\n", + " request status \\\n", + "0 {'contents': [{'parts': [{'file_data': None, '... \n", + "1 {'contents': [{'parts': [{'file_data': None, '... \n", + "\n", + " response \\\n", + "0 {'candidates': [{'avgLogprobs': -0.84266102567... \n", + "1 {'candidates': [{'avgLogprobs': -0.43489231782... 
\n", + "\n", + " processed_time avgLogprobs finishReason score \\\n", + "0 2026-02-24 20:11:22.961222+00:00 -0.842661 STOP -158.420273 \n", + "1 2026-02-24 20:09:47.208845+00:00 -0.434892 STOP -44.359016 \n", + "\n", + " content.parts content.role \n", + "0 [{'text': 'Here are the objects visible in the... model \n", + "1 [{'text': 'Here are the objects visible in the... model " + ], + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
keyrequeststatusresponseprocessed_timeavgLogprobsfinishReasonscorecontent.partscontent.role
0user-defined-request-id-2{'contents': [{'parts': [{'file_data': None, '...{'candidates': [{'avgLogprobs': -0.84266102567...2026-02-24 20:11:22.961222+00:00-0.842661STOP-158.420273[{'text': 'Here are the objects visible in the...model
1user-defined-request-id-1{'contents': [{'parts': [{'file_data': None, '...{'candidates': [{'avgLogprobs': -0.43489231782...2026-02-24 20:09:47.208845+00:00-0.434892STOP-44.359016[{'text': 'Here are the objects visible in the...model
\n", + "
" + ] + }, + "metadata": {} + } + ], + "source": [ + "fs = fsspec.filesystem(\"gcs\")\n", + "\n", + "file_paths = fs.glob(f\"{gcs_batch_job.dest.gcs_uri}/*/predictions.jsonl\")\n", + "\n", + "if gcs_batch_job.state == \"JOB_STATE_SUCCEEDED\":\n", + " # Load the JSONL file into a DataFrame\n", + " df = pd.read_json(f\"gs://{file_paths[0]}\", lines=True)\n", + "\n", + " df = df.join(pd.json_normalize(df[\"response\"], \"candidates\"))\n", + " display(df)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bfb2a462a7c6" + }, + "source": [ + "## BigQuery" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5ea69e283023" + }, + "source": [ + "### Batch Input Preparation \n", + "\n", + "To send batch requests for prediction, you need to structure your input properly. For more details, visit the [Batch text generation](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/batch-prediction-gemini#prepare_your_inputs) page. \n", + "\n", + "This guide uses **BigQuery** as an example. To use a BigQuery table as input: \n", + "- Ensure the dataset is created in a supported region (e.g., `us-central1`). Multi-region locations (e.g., `us`) are not allowed. \n", + "- The input table must include a `request` column of type `JSON` or `STRING` containing valid JSON, structured as a [`GenerateContentRequest`](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference). \n", + "- Additional columns can use any BigQuery data types except `array`, `struct`, `range`, `datetime`, and `geography`. These are ignored for generation but appear in the output table. The system reserves `response` and `status` for output. \n", + "- Only public YouTube or Cloud Storage URIs are supported in the `fileData` or `file_data` field. \n", + "- Requests can include parameters to customize the model's output. Learn more in the [Gemini parameters guide](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/prompts/adjust-parameter-values)." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "835bfd485d49" + }, + "source": [ + "This is an example BigQuery table with sample requests:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "b932f02d1f9c" + }, + "outputs": [], + "source": [ + "# fmt: off\n", + "INPUT_DATA = \"bq://storage-samples.generative_ai.batch_requests_for_multimodal_input_2\" # @param {type:\"string\"}\n", + "# fmt: on" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7d62eb771ad6" + }, + "source": [ + "You can query the BigQuery table to review the input data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "fbfcefb7d295" + }, + "outputs": [], + "source": [ + "bq_client = bigquery.Client(project=PROJECT_ID)\n", + "\n", + "bq_table_id = INPUT_DATA.replace(\"bq://\", \"\")\n", + "sql = f\"\"\"\n", + " SELECT *\n", + " FROM {bq_table_id}\n", + " \"\"\"\n", + "\n", + "query_result = bq_client.query(sql)\n", + "\n", + "df = query_result.result().to_dataframe()\n", + "df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "529dae543bfc" + }, + "source": [ + "### Prepare batch output location\n", + "\n", + "When a batch prediction task completes, the output is stored in the location that you specified in your request.\n", + "\n", + "- The location is in the form of a BigQuery URI prefix, for example: `bq://projectId.bqDatasetId`.\n", + "- If not specified, `bq://PROJECT_ID.gen_ai_batch_prediction.predictions_TIMESTAMP` will be used.\n", + "\n", + "This tutorial uses a **BigQuery** table as an example.\n", + "\n", + "- You can specify the URI of your BigQuery table in `BQ_OUTPUT_URI`, or\n", + "- If it is not specified, this notebook will create a new dataset `bq://PROJECT_ID.gen_ai_batch_prediction` for you." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "75914840e555" + }, + "outputs": [], + "source": [ + "BQ_OUTPUT_URI = \"[your-bigquery-table]\" # @param {type:\"string\"}\n", + "\n", + "if BQ_OUTPUT_URI == \"[your-bigquery-table]\":\n", + " bq_dataset_id = \"gen_ai_batch_prediction\"\n", + "\n", + " # The output table will be created automatically if it doesn't exist\n", + " timestamp = datetime.now().strftime(\"%Y%m%d%H%M%S\")\n", + " bq_table_id = f\"prediction_result_{timestamp}\"\n", + " BQ_OUTPUT_URI = f\"bq://{PROJECT_ID}.{bq_dataset_id}.{bq_table_id}\"\n", + "\n", + " bq_dataset = bigquery.Dataset(f\"{PROJECT_ID}.{bq_dataset_id}\")\n", + " bq_dataset.location = \"us-central1\"\n", + "\n", + " bq_dataset = bq_client.create_dataset(bq_dataset, exists_ok=True, timeout=30)\n", + " print(\n", + " f\"Created BigQuery dataset {bq_client.project}.{bq_dataset.dataset_id} for batch prediction output.\"\n", + " )\n", + "\n", + "print(f\"BigQuery output URI: {BQ_OUTPUT_URI}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "14f84475dc26" + }, + "source": [ + "### Send a batch prediction request\n", + "\n", + "To make a batch prediction request, you specify a source model ID, an input source and an output location where Vertex AI stores the batch prediction results.\n", + "\n", + "To learn more, see the [Batch prediction API](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/batch-prediction-api) page.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "e809955e753b" + }, + "outputs": [], + "source": [ + "bq_batch_job = client.batches.create(\n", + " model=MODEL_ID,\n", + " src=INPUT_DATA,\n", + " config=CreateBatchJobConfig(dest=BQ_OUTPUT_URI),\n", + ")\n", + "bq_batch_job.name" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5fcb3dc2a5fc" + }, + "source": [ + "Print out the job status and other properties. 
You can also check the status in the Cloud Console at https://console.cloud.google.com/vertex-ai/batch-predictions" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "b19319d92bf0" + }, + "outputs": [], + "source": [ + "bq_batch_job = client.batches.get(name=bq_batch_job.name)\n", + "bq_batch_job" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e72800144910" + }, + "source": [ + "Optionally, you can list all the batch prediction jobs in the project." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6a9fb087f9ba" + }, + "outputs": [], + "source": [ + "for job in client.batches.list():\n", + " print(job.name, job.create_time, job.state)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ebcb00b3add9" + }, + "source": [ + "### Wait for the batch prediction job to complete\n", + "\n", + "Depending on the number of input items that you submitted, a batch generation task can take some time to complete. You can use the following code to check the job status and wait for the job to complete." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "189945468c6b" + }, + "outputs": [], + "source": [ + "# Refresh the job until complete\n", + "while bq_batch_job.state in (\n", + " \"JOB_STATE_RUNNING\",\n", + " \"JOB_STATE_PENDING\",\n", + " \"JOB_STATE_QUEUED\",\n", + "):\n", + " time.sleep(5)\n", + " bq_batch_job = client.batches.get(name=bq_batch_job.name)\n", + "\n", + "# Check if the job succeeds\n", + "if bq_batch_job.state == \"JOB_STATE_SUCCEEDED\":\n", + " print(\"Job succeeded!\")\n", + "else:\n", + " print(f\"Job failed: {bq_batch_job.error}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ec25f40f0dd9" + }, + "source": [ + "### Retrieve batch prediction results\n", + "\n", + "When a batch prediction task is complete, the output of the prediction is stored in the location that you specified in your request. 
It is also available in `batch_job.dest.bigquery_uri` or `batch_job.dest.gcs_uri`.\n",
+ "\n",
+ "- When you are using BigQuery, the output of batch prediction is stored in an output dataset. If you provided a dataset, `BQ_OUTPUT_URI` is the name that you specified earlier.\n",
+ "- If you did not provide an output dataset, a default dataset `bq://PROJECT_ID.gen_ai_batch_prediction` will be created for you.\n",
+ "- The name of the table is formed by appending `predictions_` and the timestamp of when the batch prediction job started."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "c459121f0169"
+ },
+ "source": [
+ "You can use the example code below to retrieve predictions and store them in a Pandas DataFrame.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "ca8b73b0ad1b"
+ },
+ "outputs": [],
+ "source": [
+ "bq_table_id = bq_batch_job.dest.bigquery_uri.replace(\"bq://\", \"\")\n",
+ "\n",
+ "sql = f\"\"\"\n",
+ " SELECT *\n",
+ " FROM {bq_table_id}\n",
+ " \"\"\"\n",
+ "\n",
+ "query_result = bq_client.query(sql)\n",
+ "\n",
+ "df = query_result.result().to_dataframe()\n",
+ "df.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "2a4e033321ad"
+ },
+ "source": [
+ "## Cleaning up\n",
+ "\n",
+ "Clean up resources created in this notebook."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ZNCyIKIrdPJY" + }, + "outputs": [], + "source": [ + "# Delete the batch prediction jobs\n", + "if gcs_batch_job:\n", + " client.batches.delete(name=gcs_batch_job.name)\n", + "if bq_batch_job:\n", + " client.batches.delete(name=bq_batch_job.name)" + ] + } + ], + "metadata": { + "colab": { + "name": "intro_batch_prediction.ipynb", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file From e1a6cf32b96d92d555fe5782d0c581d2e2cf4c60 Mon Sep 17 00:00:00 2001 From: Stamkulov Sattar Date: Tue, 24 Feb 2026 23:26:58 +0100 Subject: [PATCH 2/3] Created using Colab --- .../intro_batch_prediction.ipynb | 328 ++++++++++++++++-- 1 file changed, 295 insertions(+), 33 deletions(-) diff --git a/gemini/batch-prediction/intro_batch_prediction.ipynb b/gemini/batch-prediction/intro_batch_prediction.ipynb index 01e2381..e4ab259 100644 --- a/gemini/batch-prediction/intro_batch_prediction.ipynb +++ b/gemini/batch-prediction/intro_batch_prediction.ipynb @@ -141,34 +141,94 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": 16, "metadata": { - "id": "tFy3H3aPgx12", - "outputId": "f44aae84-cd07-426a-f7b0-f7d6b719f3ea", + "id": "tFy3H3aPgx12" + }, + "outputs": [], + "source": [ + "%pip install --upgrade --quiet google-genai pandas google-cloud-storage google-cloud-bigquery" + ] + }, + { + "cell_type": "markdown", + "source": [ + "## Install current version used in `gen-ai` services" + ], + "metadata": { + "id": "ajN6ylLlwo_L" + } + }, + { + "cell_type": "code", + "source": [ + "%pip install google-genai==1.56.0" + ], + "metadata": { + "id": "akbvyf80wmk3", + "outputId": "35540121-e817-4c04-8422-2b90d07a0401", "colab": { - "base_uri": "https://localhost:8080/" + "base_uri": "https://localhost:8080/", + "height": 760 } }, + "execution_count": 47, "outputs": [ { 
"output_type": "stream", "name": "stdout", "text": [ - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m53.2/53.2 kB\u001b[0m \u001b[31m2.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m79.5/79.5 kB\u001b[0m \u001b[31m2.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m728.8/728.8 kB\u001b[0m \u001b[31m5.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m10.9/10.9 MB\u001b[0m \u001b[31m32.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25h\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n", - "google-colab 1.0.0 requires pandas==2.2.2, but you have pandas 3.0.1 which is incompatible.\n", - "bqplot 0.12.45 requires pandas<3.0.0,>=1.0.0, but you have pandas 3.0.1 which is incompatible.\n", - "db-dtypes 1.5.0 requires pandas<3.0.0,>=1.5.3, but you have pandas 3.0.1 which is incompatible.\n", - "gradio 5.50.0 requires pandas<3.0,>=1.0, but you have pandas 3.0.1 which is incompatible.\u001b[0m\u001b[31m\n", - "\u001b[0m" + "Collecting google-genai==1.56.0\n", + " Downloading google_genai-1.56.0-py3-none-any.whl.metadata (53 kB)\n", + "\u001b[?25l \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m0.0/53.3 kB\u001b[0m \u001b[31m?\u001b[0m eta \u001b[36m-:--:--\u001b[0m\r\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m53.3/53.3 kB\u001b[0m \u001b[31m3.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hRequirement already satisfied: anyio<5.0.0,>=4.8.0 in /usr/local/lib/python3.12/dist-packages (from google-genai==1.56.0) (4.12.1)\n", + "Requirement already satisfied: 
google-auth<3.0.0,>=2.45.0 in /usr/local/lib/python3.12/dist-packages (from google-auth[requests]<3.0.0,>=2.45.0->google-genai==1.56.0) (2.47.0)\n", + "Requirement already satisfied: httpx<1.0.0,>=0.28.1 in /usr/local/lib/python3.12/dist-packages (from google-genai==1.56.0) (0.28.1)\n", + "Requirement already satisfied: pydantic<3.0.0,>=2.9.0 in /usr/local/lib/python3.12/dist-packages (from google-genai==1.56.0) (2.12.3)\n", + "Requirement already satisfied: requests<3.0.0,>=2.28.1 in /usr/local/lib/python3.12/dist-packages (from google-genai==1.56.0) (2.32.4)\n", + "Requirement already satisfied: tenacity<9.2.0,>=8.2.3 in /usr/local/lib/python3.12/dist-packages (from google-genai==1.56.0) (9.1.4)\n", + "Requirement already satisfied: websockets<15.1.0,>=13.0.0 in /usr/local/lib/python3.12/dist-packages (from google-genai==1.56.0) (15.0.1)\n", + "Requirement already satisfied: typing-extensions<5.0.0,>=4.11.0 in /usr/local/lib/python3.12/dist-packages (from google-genai==1.56.0) (4.15.0)\n", + "Requirement already satisfied: distro<2,>=1.7.0 in /usr/local/lib/python3.12/dist-packages (from google-genai==1.56.0) (1.9.0)\n", + "Requirement already satisfied: sniffio in /usr/local/lib/python3.12/dist-packages (from google-genai==1.56.0) (1.3.1)\n", + "Requirement already satisfied: idna>=2.8 in /usr/local/lib/python3.12/dist-packages (from anyio<5.0.0,>=4.8.0->google-genai==1.56.0) (3.11)\n", + "Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.12/dist-packages (from google-auth<3.0.0,>=2.45.0->google-auth[requests]<3.0.0,>=2.45.0->google-genai==1.56.0) (0.4.2)\n", + "Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.12/dist-packages (from google-auth<3.0.0,>=2.45.0->google-auth[requests]<3.0.0,>=2.45.0->google-genai==1.56.0) (4.9.1)\n", + "Requirement already satisfied: certifi in /usr/local/lib/python3.12/dist-packages (from httpx<1.0.0,>=0.28.1->google-genai==1.56.0) (2026.1.4)\n", + "Requirement already satisfied: 
httpcore==1.* in /usr/local/lib/python3.12/dist-packages (from httpx<1.0.0,>=0.28.1->google-genai==1.56.0) (1.0.9)\n", + "Requirement already satisfied: h11>=0.16 in /usr/local/lib/python3.12/dist-packages (from httpcore==1.*->httpx<1.0.0,>=0.28.1->google-genai==1.56.0) (0.16.0)\n", + "Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.12/dist-packages (from pydantic<3.0.0,>=2.9.0->google-genai==1.56.0) (0.7.0)\n", + "Requirement already satisfied: pydantic-core==2.41.4 in /usr/local/lib/python3.12/dist-packages (from pydantic<3.0.0,>=2.9.0->google-genai==1.56.0) (2.41.4)\n", + "Requirement already satisfied: typing-inspection>=0.4.2 in /usr/local/lib/python3.12/dist-packages (from pydantic<3.0.0,>=2.9.0->google-genai==1.56.0) (0.4.2)\n", + "Requirement already satisfied: charset_normalizer<4,>=2 in /usr/local/lib/python3.12/dist-packages (from requests<3.0.0,>=2.28.1->google-genai==1.56.0) (3.4.4)\n", + "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.12/dist-packages (from requests<3.0.0,>=2.28.1->google-genai==1.56.0) (2.5.0)\n", + "Requirement already satisfied: pyasn1<0.7.0,>=0.6.1 in /usr/local/lib/python3.12/dist-packages (from pyasn1-modules>=0.2.1->google-auth<3.0.0,>=2.45.0->google-auth[requests]<3.0.0,>=2.45.0->google-genai==1.56.0) (0.6.2)\n", + "Downloading google_genai-1.56.0-py3-none-any.whl (426 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m426.6/426.6 kB\u001b[0m \u001b[31m29.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hInstalling collected packages: google-genai\n", + " Attempting uninstall: google-genai\n", + " Found existing installation: google-genai 1.64.0\n", + " Uninstalling google-genai-1.64.0:\n", + " Successfully uninstalled google-genai-1.64.0\n", + "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. 
This behaviour is the source of the following dependency conflicts.\n", + "google-cloud-aiplatform 1.137.0 requires google-genai<2.0.0,>=1.59.0; python_version >= \"3.10\", but you have google-genai 1.56.0 which is incompatible.\u001b[0m\u001b[31m\n", + "\u001b[0mSuccessfully installed google-genai-1.56.0\n" ] + }, + { + "output_type": "display_data", + "data": { + "application/vnd.colab-display-data+json": { + "pip_warning": { + "packages": [ + "google" + ] + }, + "id": "3f79808eeb264dd896c9914df936e985" + } + }, + "metadata": {} } - ], - "source": [ - "%pip install --upgrade --quiet google-genai pandas google-cloud-storage google-cloud-bigquery" ] }, { @@ -184,7 +244,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 17, "metadata": { "id": "NyKGtVQjgx13" }, @@ -222,7 +282,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 35, "metadata": { "id": "Nqwi-5ufWp_B" }, @@ -230,13 +290,14 @@ "source": [ "import os\n", "import time\n", + "import json\n", "from datetime import datetime\n", "\n", "import fsspec\n", "import pandas as pd\n", "from google import genai\n", - "from google.cloud import bigquery\n", - "from google.genai.types import CreateBatchJobConfig\n", + "from google.cloud import bigquery, storage\n", + "from google.genai.types import CreateBatchJobConfig, GenerateContentResponse\n", "\n", "# fmt: off\n", "PROJECT_ID = \"b-ferecommend-dev\" # @param {type: \"string\", placeholder: \"[your-project-id]\", isTemplate: true}\n", @@ -258,6 +319,17 @@ "client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)" ] }, + { + "cell_type": "code", + "source": [ + "storage_client = storage.Client()" + ], + "metadata": { + "id": "mgUY8xrEt3Yc" + }, + "execution_count": 28, + "outputs": [] + }, { "cell_type": "markdown", "metadata": { @@ -273,7 +345,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 19, "metadata": { "id": "cf93d5f0ce00" }, @@ -383,11 +455,11 @@ "execution_count": 
9, "metadata": { "id": "d8e54c57072e", - "outputId": "6347b9da-3695-4b07-d6a6-95687cbe6e8e", "colab": { "base_uri": "https://localhost:8080/", "height": 35 - } + }, + "outputId": "6347b9da-3695-4b07-d6a6-95687cbe6e8e" }, "outputs": [ { @@ -427,10 +499,10 @@ "execution_count": 10, "metadata": { "id": "DWq7m79PbjG8", - "outputId": "1439f4e6-86f9-4698-80c5-f0361747c808", "colab": { "base_uri": "https://localhost:8080/" - } + }, + "outputId": "1439f4e6-86f9-4698-80c5-f0361747c808" }, "outputs": [ { @@ -480,10 +552,10 @@ "execution_count": 11, "metadata": { "id": "QVgOnasfigx1", - "outputId": "b8c769f5-1d90-4c08-b2c3-cf69c8b4f18b", "colab": { "base_uri": "https://localhost:8080/" - } + }, + "outputId": "b8c769f5-1d90-4c08-b2c3-cf69c8b4f18b" }, "outputs": [ { @@ -538,10 +610,10 @@ "execution_count": 12, "metadata": { "id": "dtJDIXdHc0W-", - "outputId": "f28997d4-bfd5-4cd0-f72c-c312db350bd3", "colab": { "base_uri": "https://localhost:8080/" - } + }, + "outputId": "f28997d4-bfd5-4cd0-f72c-c312db350bd3" }, "outputs": [ { @@ -601,14 +673,14 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 20, "metadata": { "id": "-jLl3es3dTqB", - "outputId": "bad541fa-29e7-4aca-f5f3-a54c1f9ceba8", "colab": { "base_uri": "https://localhost:8080/", "height": 182 - } + }, + "outputId": "722038e7-3b83-4f94-bbe9-914dcce27872" }, "outputs": [ { @@ -714,6 +786,196 @@ " display(df)" ] }, + { + "cell_type": "markdown", + "source": [ + "### Testing GenerateContentResponse.model_validate() vs GenerateContentResponse._from_response()" + ], + "metadata": { + "id": "wxHv-TT7s3lM" + } + }, + { + "cell_type": "code", + "source": [ + "gcs_uri = f\"gs://{file_paths[0]}\"" + ], + "metadata": { + "id": "x2kOlX1ztGSE" + }, + "execution_count": 30, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "bucket_name, blob_name = gcs_uri[5:].split(\"/\", 1)\n", + "\n", + "bucket = storage_client.bucket(bucket_name)\n", + "blob = bucket.blob(blob_name)\n", + "\n", + 
"response_content = blob.download_as_text()\n", + "response_content" + ], + "metadata": { + "id": "8F4Jft1qtwlR", + "outputId": "bf878dfe-099a-41b3-96b5-c8cc0dc96827", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 143 + } + }, + "execution_count": 40, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'{\"key\":\"user-defined-request-id-2\",\"request\":{\"contents\":[{\"parts\":[{\"file_data\":null,\"text\":\"List objects in this image.\"},{\"file_data\":{\"file_uri\":\"gs://cloud-samples-data/generative-ai/image/gardening-tools.jpeg\",\"mime_type\":\"image/jpeg\"},\"text\":null}],\"role\":\"user\"}],\"generationConfig\":{\"temperature\":0.4}},\"status\":\"\",\"response\":{\"candidates\":[{\"avgLogprobs\":-0.8426610256763215,\"content\":{\"parts\":[{\"text\":\"Here are the objects visible in the image:\\\\n\\\\n1. **Watering can:** A dark green plastic watering can with a white sprinkler head (rose) on its spout. It has an embossed floral design on its side.\\\\n2. **Plant pot with a plant:** A terracotta-colored pot containing a small green plant.\\\\n3. **Empty plant pots:** Two terracotta-colored pots, stacked one on top of the other.\\\\n4. **Gardening trowel:** A small green hand trowel with a black handle.\\\\n5. **Hand cultivator/rake:** A small green hand tool with three tines (a cultivator or small rake) and a black handle.\\\\n6. **Gardening gloves:** A pair of fabric gloves, striped green, yellow, and white, with a solid green cuff.\\\\n7. 
**Grass:** The green lawn serving as the background surface.\"}],\"role\":\"model\"},\"finishReason\":\"STOP\",\"score\":-158.42027282714844}],\"createTime\":\"2026-02-24T20:11:05.247484Z\",\"modelVersion\":\"gemini-2.5-flash\",\"responseId\":\"2QWeabyND8T5jNsPlLOF-AY\",\"usageMetadata\":{\"billablePromptUsage\":{\"imageCount\":1,\"textCount\":23},\"candidatesTokenCount\":188,\"candidatesTokensDetails\":[{\"modality\":\"TEXT\",\"tokenCount\":188}],\"promptTokenCount\":1812,\"promptTokensDetails\":[{\"modality\":\"TEXT\",\"tokenCount\":6},{\"modality\":\"IMAGE\",\"tokenCount\":1806}],\"thoughtsTokenCount\":835,\"totalTokenCount\":2835,\"trafficType\":\"ON_DEMAND\"}},\"processed_time\":\"2026-02-24T20:11:22.961222+00:00\"}\\n{\"key\":\"user-defined-request-id-1\",\"request\":{\"contents\":[{\"parts\":[{\"file_data\":null,\"text\":\"List objects in this image.\"},{\"file_data\":{\"file_uri\":\"gs://cloud-samples-data/generative-ai/image/office-desk.jpeg\",\"mime_type\":\"image/jpeg\"},\"text\":null}],\"role\":\"user\"}],\"generationConfig\":{\"temperature\":0.4}},\"status\":\"\",\"response\":{\"candidates\":[{\"avgLogprobs\":-0.43489231782801013,\"content\":{\"parts\":[{\"text\":\"Here are the objects visible in the image:\\\\n\\\\n* Globe\\\\n* Eiffel Tower miniature\\\\n* Tablet (with a blank screen)\\\\n* Shopping cart miniature (with a red gift box inside)\\\\n* Cup of coffee (with saucer)\\\\n* Airplane model\\\\n* Keyboard\\\\n* Computer mouse\\\\n* Passport\\\\n* Sunglasses\\\\n* Banknotes (appears to be a one-dollar bill and another bill)\\\\n* Notebook\\\\n* 
Pen\"}],\"role\":\"model\"},\"finishReason\":\"STOP\",\"score\":-44.35901641845703}],\"createTime\":\"2026-02-24T20:09:44.015246Z\",\"modelVersion\":\"gemini-2.5-flash\",\"responseId\":\"iAWeaY53u4SK1Q_ZnpfABg\",\"usageMetadata\":{\"billablePromptUsage\":{\"imageCount\":1,\"textCount\":23},\"candidatesTokenCount\":102,\"candidatesTokensDetails\":[{\"modality\":\"TEXT\",\"tokenCount\":102}],\"promptTokenCount\":1812,\"promptTokensDetails\":[{\"modality\":\"IMAGE\",\"tokenCount\":1806},{\"modality\":\"TEXT\",\"tokenCount\":6}],\"thoughtsTokenCount\":96,\"totalTokenCount\":2010,\"trafficType\":\"ON_DEMAND\"}},\"processed_time\":\"2026-02-24T20:09:47.208845+00:00\"}\\n'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 40 + } + ] + }, + { + "cell_type": "code", + "source": [ + "response_lines = [json.loads(line) for line in response_content.splitlines()]\n", + "response_lines" + ], + "metadata": { + "id": "3zBeLHy6uWE0", + "outputId": "39b54031-d0c9-4c1c-f2db-d79efe5fed8d", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "execution_count": 41, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "[{'key': 'user-defined-request-id-2',\n", + " 'request': {'contents': [{'parts': [{'file_data': None,\n", + " 'text': 'List objects in this image.'},\n", + " {'file_data': {'file_uri': 'gs://cloud-samples-data/generative-ai/image/gardening-tools.jpeg',\n", + " 'mime_type': 'image/jpeg'},\n", + " 'text': None}],\n", + " 'role': 'user'}],\n", + " 'generationConfig': {'temperature': 0.4}},\n", + " 'status': '',\n", + " 'response': {'candidates': [{'avgLogprobs': -0.8426610256763215,\n", + " 'content': {'parts': [{'text': 'Here are the objects visible in the image:\\n\\n1. **Watering can:** A dark green plastic watering can with a white sprinkler head (rose) on its spout. It has an embossed floral design on its side.\\n2. 
**Plant pot with a plant:** A terracotta-colored pot containing a small green plant.\\n3. **Empty plant pots:** Two terracotta-colored pots, stacked one on top of the other.\\n4. **Gardening trowel:** A small green hand trowel with a black handle.\\n5. **Hand cultivator/rake:** A small green hand tool with three tines (a cultivator or small rake) and a black handle.\\n6. **Gardening gloves:** A pair of fabric gloves, striped green, yellow, and white, with a solid green cuff.\\n7. **Grass:** The green lawn serving as the background surface.'}],\n", + " 'role': 'model'},\n", + " 'finishReason': 'STOP',\n", + " 'score': -158.42027282714844}],\n", + " 'createTime': '2026-02-24T20:11:05.247484Z',\n", + " 'modelVersion': 'gemini-2.5-flash',\n", + " 'responseId': '2QWeabyND8T5jNsPlLOF-AY',\n", + " 'usageMetadata': {'billablePromptUsage': {'imageCount': 1, 'textCount': 23},\n", + " 'candidatesTokenCount': 188,\n", + " 'candidatesTokensDetails': [{'modality': 'TEXT', 'tokenCount': 188}],\n", + " 'promptTokenCount': 1812,\n", + " 'promptTokensDetails': [{'modality': 'TEXT', 'tokenCount': 6},\n", + " {'modality': 'IMAGE', 'tokenCount': 1806}],\n", + " 'thoughtsTokenCount': 835,\n", + " 'totalTokenCount': 2835,\n", + " 'trafficType': 'ON_DEMAND'}},\n", + " 'processed_time': '2026-02-24T20:11:22.961222+00:00'},\n", + " {'key': 'user-defined-request-id-1',\n", + " 'request': {'contents': [{'parts': [{'file_data': None,\n", + " 'text': 'List objects in this image.'},\n", + " {'file_data': {'file_uri': 'gs://cloud-samples-data/generative-ai/image/office-desk.jpeg',\n", + " 'mime_type': 'image/jpeg'},\n", + " 'text': None}],\n", + " 'role': 'user'}],\n", + " 'generationConfig': {'temperature': 0.4}},\n", + " 'status': '',\n", + " 'response': {'candidates': [{'avgLogprobs': -0.43489231782801013,\n", + " 'content': {'parts': [{'text': 'Here are the objects visible in the image:\\n\\n* Globe\\n* Eiffel Tower miniature\\n* Tablet (with a blank screen)\\n* Shopping cart miniature (with 
a red gift box inside)\\n* Cup of coffee (with saucer)\\n* Airplane model\\n* Keyboard\\n* Computer mouse\\n* Passport\\n* Sunglasses\\n* Banknotes (appears to be a one-dollar bill and another bill)\\n* Notebook\\n* Pen'}],\n", + " 'role': 'model'},\n", + " 'finishReason': 'STOP',\n", + " 'score': -44.35901641845703}],\n", + " 'createTime': '2026-02-24T20:09:44.015246Z',\n", + " 'modelVersion': 'gemini-2.5-flash',\n", + " 'responseId': 'iAWeaY53u4SK1Q_ZnpfABg',\n", + " 'usageMetadata': {'billablePromptUsage': {'imageCount': 1, 'textCount': 23},\n", + " 'candidatesTokenCount': 102,\n", + " 'candidatesTokensDetails': [{'modality': 'TEXT', 'tokenCount': 102}],\n", + " 'promptTokenCount': 1812,\n", + " 'promptTokensDetails': [{'modality': 'IMAGE', 'tokenCount': 1806},\n", + " {'modality': 'TEXT', 'tokenCount': 6}],\n", + " 'thoughtsTokenCount': 96,\n", + " 'totalTokenCount': 2010,\n", + " 'trafficType': 'ON_DEMAND'}},\n", + " 'processed_time': '2026-02-24T20:09:47.208845+00:00'}]" + ] + }, + "metadata": {}, + "execution_count": 41 + } + ] + }, + { + "cell_type": "code", + "source": [ + "parsed_response_using_model_validate = GenerateContentResponse.model_validate(response_lines[1], extra=\"ignore\")\n", + "parsed_response_using_model_validate" + ], + "metadata": { + "id": "46fHnGWUu4xH", + "outputId": "07427de6-8da9-498c-d3eb-24a41aa661c7", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "execution_count": 46, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "GenerateContentResponse()" + ] + }, + "metadata": {}, + "execution_count": 46 + } + ] + }, + { + "cell_type": "code", + "source": [ + "parsed_response_using_from_response = GenerateContentResponse._from_response(response=response_lines[0], kwargs={})\n", + "parsed_response_using_from_response" + ], + "metadata": { + "id": "q7q7Y_00uJOJ", + "outputId": "305bbd5b-17c5-42c4-af42-3fa3405a4d18", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + 
"execution_count": 42, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "GenerateContentResponse()" + ] + }, + "metadata": {}, + "execution_count": 42 + } + ] + }, { "cell_type": "markdown", "metadata": { From f7c7f5fa690c14515e8d4862fc725e13c87dfb27 Mon Sep 17 00:00:00 2001 From: Stamkulov Sattar Date: Wed, 25 Feb 2026 00:58:26 +0100 Subject: [PATCH 3/3] Created using Colab --- .../intro_batch_prediction.ipynb | 444 +++++++----------- 1 file changed, 165 insertions(+), 279 deletions(-) diff --git a/gemini/batch-prediction/intro_batch_prediction.ipynb b/gemini/batch-prediction/intro_batch_prediction.ipynb index e4ab259..0271647 100644 --- a/gemini/batch-prediction/intro_batch_prediction.ipynb +++ b/gemini/batch-prediction/intro_batch_prediction.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "code", - "execution_count": null, + "execution_count": 3, "metadata": { "id": "ur8xi4C7S06n" }, @@ -141,13 +141,32 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": 1, "metadata": { - "id": "tFy3H3aPgx12" + "id": "tFy3H3aPgx12", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "871f22a7-cfa2-4fd2-cb7d-62efee89dc9e" }, - "outputs": [], + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m79.5/79.5 kB\u001b[0m \u001b[31m3.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m10.9/10.9 MB\u001b[0m \u001b[31m80.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25h\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. 
This behaviour is the source of the following dependency conflicts.\n", + "google-colab 1.0.0 requires pandas==2.2.2, but you have pandas 3.0.1 which is incompatible.\n", + "bqplot 0.12.45 requires pandas<3.0.0,>=1.0.0, but you have pandas 3.0.1 which is incompatible.\n", + "db-dtypes 1.5.0 requires pandas<3.0.0,>=1.5.3, but you have pandas 3.0.1 which is incompatible.\n", + "gradio 5.50.0 requires pandas<3.0,>=1.0, but you have pandas 3.0.1 which is incompatible.\u001b[0m\u001b[31m\n", + "\u001b[0m" + ] + } + ], "source": [ - "%pip install --upgrade --quiet google-genai pandas google-cloud-storage google-cloud-bigquery" + "%pip install --upgrade --quiet pandas google-cloud-storage google-cloud-bigquery" ] }, { @@ -162,26 +181,28 @@ { "cell_type": "code", "source": [ + "%pip uninstall -y google-cloud-aiplatform\n", "%pip install google-genai==1.56.0" ], "metadata": { - "id": "akbvyf80wmk3", - "outputId": "35540121-e817-4c04-8422-2b90d07a0401", "colab": { - "base_uri": "https://localhost:8080/", - "height": 760 - } + "base_uri": "https://localhost:8080/" + }, + "collapsed": true, + "id": "akbvyf80wmk3", + "outputId": "31b3bbab-cb1b-49fb-e273-06d71e2aa83a" }, - "execution_count": 47, + "execution_count": 3, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ - "Collecting google-genai==1.56.0\n", - " Downloading google_genai-1.56.0-py3-none-any.whl.metadata (53 kB)\n", - "\u001b[?25l \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m0.0/53.3 kB\u001b[0m \u001b[31m?\u001b[0m eta \u001b[36m-:--:--\u001b[0m\r\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m53.3/53.3 kB\u001b[0m \u001b[31m3.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hRequirement already satisfied: anyio<5.0.0,>=4.8.0 in /usr/local/lib/python3.12/dist-packages (from google-genai==1.56.0) (4.12.1)\n", + "Found existing installation: google-cloud-aiplatform 1.137.0\n", + "Uninstalling google-cloud-aiplatform-1.137.0:\n", 
+ " Successfully uninstalled google-cloud-aiplatform-1.137.0\n", + "Requirement already satisfied: google-genai==1.56.0 in /usr/local/lib/python3.12/dist-packages (1.56.0)\n", + "Requirement already satisfied: anyio<5.0.0,>=4.8.0 in /usr/local/lib/python3.12/dist-packages (from google-genai==1.56.0) (4.12.1)\n", "Requirement already satisfied: google-auth<3.0.0,>=2.45.0 in /usr/local/lib/python3.12/dist-packages (from google-auth[requests]<3.0.0,>=2.45.0->google-genai==1.56.0) (2.47.0)\n", "Requirement already satisfied: httpx<1.0.0,>=0.28.1 in /usr/local/lib/python3.12/dist-packages (from google-genai==1.56.0) (0.28.1)\n", "Requirement already satisfied: pydantic<3.0.0,>=2.9.0 in /usr/local/lib/python3.12/dist-packages (from google-genai==1.56.0) (2.12.3)\n", @@ -202,32 +223,8 @@ "Requirement already satisfied: typing-inspection>=0.4.2 in /usr/local/lib/python3.12/dist-packages (from pydantic<3.0.0,>=2.9.0->google-genai==1.56.0) (0.4.2)\n", "Requirement already satisfied: charset_normalizer<4,>=2 in /usr/local/lib/python3.12/dist-packages (from requests<3.0.0,>=2.28.1->google-genai==1.56.0) (3.4.4)\n", "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.12/dist-packages (from requests<3.0.0,>=2.28.1->google-genai==1.56.0) (2.5.0)\n", - "Requirement already satisfied: pyasn1<0.7.0,>=0.6.1 in /usr/local/lib/python3.12/dist-packages (from pyasn1-modules>=0.2.1->google-auth<3.0.0,>=2.45.0->google-auth[requests]<3.0.0,>=2.45.0->google-genai==1.56.0) (0.6.2)\n", - "Downloading google_genai-1.56.0-py3-none-any.whl (426 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m426.6/426.6 kB\u001b[0m \u001b[31m29.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hInstalling collected packages: google-genai\n", - " Attempting uninstall: google-genai\n", - " Found existing installation: google-genai 1.64.0\n", - " Uninstalling google-genai-1.64.0:\n", - " Successfully uninstalled google-genai-1.64.0\n", 
- "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n", - "google-cloud-aiplatform 1.137.0 requires google-genai<2.0.0,>=1.59.0; python_version >= \"3.10\", but you have google-genai 1.56.0 which is incompatible.\u001b[0m\u001b[31m\n", - "\u001b[0mSuccessfully installed google-genai-1.56.0\n" + "Requirement already satisfied: pyasn1<0.7.0,>=0.6.1 in /usr/local/lib/python3.12/dist-packages (from pyasn1-modules>=0.2.1->google-auth<3.0.0,>=2.45.0->google-auth[requests]<3.0.0,>=2.45.0->google-genai==1.56.0) (0.6.2)\n" ] - }, - { - "output_type": "display_data", - "data": { - "application/vnd.colab-display-data+json": { - "pip_warning": { - "packages": [ - "google" - ] - }, - "id": "3f79808eeb264dd896c9914df936e985" - } - }, - "metadata": {} } ] }, @@ -244,7 +241,7 @@ }, { "cell_type": "code", - "execution_count": 17, + "execution_count": 4, "metadata": { "id": "NyKGtVQjgx13" }, @@ -282,7 +279,7 @@ }, { "cell_type": "code", - "execution_count": 35, + "execution_count": 5, "metadata": { "id": "Nqwi-5ufWp_B" }, @@ -310,7 +307,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 6, "metadata": { "id": "cfca4d7bd6db" }, @@ -327,7 +324,7 @@ "metadata": { "id": "mgUY8xrEt3Yc" }, - "execution_count": 28, + "execution_count": 7, "outputs": [] }, { @@ -345,7 +342,7 @@ }, { "cell_type": "code", - "execution_count": 19, + "execution_count": 8, "metadata": { "id": "cf93d5f0ce00" }, @@ -391,7 +388,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 9, "metadata": { "id": "uWb8QzxwbH6W" }, @@ -421,7 +418,7 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 10, "metadata": { "id": "OtUodwGXZ7US" }, @@ -452,30 +449,11 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": null, "metadata": { - "id": "d8e54c57072e", - "colab": { - "base_uri": 
"https://localhost:8080/", - "height": 35 - }, - "outputId": "6347b9da-3695-4b07-d6a6-95687cbe6e8e" + "id": "d8e54c57072e" }, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "'projects/714644158567/locations/global/batchPredictionJobs/6979938407145275392'" - ], - "application/vnd.google.colaboratory.intrinsic+json": { - "type": "string" - } - }, - "metadata": {}, - "execution_count": 9 - } - ], + "outputs": [], "source": [ "gcs_batch_job = client.batches.create(\n", " model=MODEL_ID,\n", @@ -496,43 +474,11 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": null, "metadata": { - "id": "DWq7m79PbjG8", - "colab": { - "base_uri": "https://localhost:8080/" - }, - "outputId": "1439f4e6-86f9-4698-80c5-f0361747c808" + "id": "DWq7m79PbjG8" }, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "BatchJob(\n", - " create_time=datetime.datetime(2026, 2, 24, 20, 8, 27, 188146, tzinfo=TzInfo(0)),\n", - " dest=BatchJobDestination(\n", - " format='jsonl',\n", - " gcs_uri='gs://gen-ai-async-gemini-batch-jobs-bucket/output'\n", - " ),\n", - " display_name='genai_batch_job_20260224200826_850cb',\n", - " model='publishers/google/models/gemini-2.5-flash',\n", - " name='projects/714644158567/locations/global/batchPredictionJobs/6979938407145275392',\n", - " src=BatchJobSource(\n", - " format='jsonl',\n", - " gcs_uri=[\n", - " 'gs://gen-ai-async-gemini-batch-jobs-bucket/input/gemini_batch_requests_for_multimodal_input_2.jsonl',\n", - " ]\n", - " ),\n", - " state=,\n", - " update_time=datetime.datetime(2026, 2, 24, 20, 8, 29, 397307, tzinfo=TzInfo(0))\n", - ")" - ] - }, - "metadata": {}, - "execution_count": 10 - } - ], + "outputs": [], "source": [ "gcs_batch_job = client.batches.get(name=gcs_batch_job.name)\n", "gcs_batch_job" @@ -549,46 +495,11 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": null, "metadata": { - "id": "QVgOnasfigx1", - "colab": { - "base_uri": 
"https://localhost:8080/" - }, - "outputId": "b8c769f5-1d90-4c08-b2c3-cf69c8b4f18b" + "id": "QVgOnasfigx1" }, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "projects/714644158567/locations/global/batchPredictionJobs/6979938407145275392 2026-02-24 20:08:27.188146+00:00 JobState.JOB_STATE_PENDING\n", - "projects/714644158567/locations/global/batchPredictionJobs/5502370701274775552 2026-02-22 20:55:19.297011+00:00 JobState.JOB_STATE_SUCCEEDED\n", - "projects/714644158567/locations/global/batchPredictionJobs/7129647910383255552 2026-02-22 15:58:08.745802+00:00 JobState.JOB_STATE_SUCCEEDED\n", - "projects/714644158567/locations/global/batchPredictionJobs/3900566977558609920 2026-02-22 15:50:43.800609+00:00 JobState.JOB_STATE_FAILED\n", - "projects/714644158567/locations/global/batchPredictionJobs/5365239611058552832 2026-02-20 00:07:26.242456+00:00 JobState.JOB_STATE_SUCCEEDED\n", - "projects/714644158567/locations/global/batchPredictionJobs/6652143204579672064 2026-02-19 22:22:27.732340+00:00 JobState.JOB_STATE_SUCCEEDED\n", - "projects/714644158567/locations/global/batchPredictionJobs/7203376762095403008 2026-02-18 14:28:15.327604+00:00 JobState.JOB_STATE_SUCCEEDED\n", - "projects/714644158567/locations/global/batchPredictionJobs/316106294450716672 2026-02-17 23:10:56.755787+00:00 JobState.JOB_STATE_SUCCEEDED\n", - "projects/714644158567/locations/global/batchPredictionJobs/5516453246203330560 2026-02-11 00:26:18.174842+00:00 JobState.JOB_STATE_SUCCEEDED\n", - "projects/714644158567/locations/global/batchPredictionJobs/7905612848523378688 2026-02-11 00:08:04.265348+00:00 JobState.JOB_STATE_SUCCEEDED\n", - "projects/714644158567/locations/global/batchPredictionJobs/4842039202004598784 2026-02-10 22:01:21.728474+00:00 JobState.JOB_STATE_SUCCEEDED\n", - "projects/714644158567/locations/global/batchPredictionJobs/3526214054626459648 2026-02-10 16:21:23.083033+00:00 JobState.JOB_STATE_SUCCEEDED\n", - 
"projects/714644158567/locations/global/batchPredictionJobs/187920830838079488 2026-02-10 14:57:27.709171+00:00 JobState.JOB_STATE_SUCCEEDED\n", - "projects/714644158567/locations/global/batchPredictionJobs/3580397987643260928 2026-02-09 11:53:29.131326+00:00 JobState.JOB_STATE_SUCCEEDED\n", - "projects/714644158567/locations/global/batchPredictionJobs/7163817433239650304 2026-02-05 23:06:55.839232+00:00 JobState.JOB_STATE_SUCCEEDED\n", - "projects/714644158567/locations/global/batchPredictionJobs/8429328928530759680 2026-02-05 22:55:13.221729+00:00 JobState.JOB_STATE_FAILED\n", - "projects/714644158567/locations/global/batchPredictionJobs/8661827259293761536 2026-02-05 17:54:20.718076+00:00 JobState.JOB_STATE_SUCCEEDED\n", - "projects/714644158567/locations/global/batchPredictionJobs/14915974742409216 2026-02-05 17:41:12.852362+00:00 JobState.JOB_STATE_FAILED\n", - "projects/714644158567/locations/global/batchPredictionJobs/2941622413835632640 2026-02-04 14:57:31.883126+00:00 JobState.JOB_STATE_FAILED\n", - "projects/714644158567/locations/global/batchPredictionJobs/8781418940023701504 2026-02-04 05:51:21.058064+00:00 JobState.JOB_STATE_SUCCEEDED\n", - "projects/714644158567/locations/global/batchPredictionJobs/7241539711183880192 2026-02-03 15:27:13.674738+00:00 JobState.JOB_STATE_SUCCEEDED\n", - "projects/714644158567/locations/global/batchPredictionJobs/6584700260843520000 2026-01-27 17:25:03.313675+00:00 JobState.JOB_STATE_FAILED\n", - "projects/714644158567/locations/global/batchPredictionJobs/315689579543789568 2026-01-27 17:02:24.740220+00:00 JobState.JOB_STATE_FAILED\n", - "projects/714644158567/locations/global/batchPredictionJobs/1738827061792866304 2026-01-27 16:49:54.610306+00:00 JobState.JOB_STATE_FAILED\n" - ] - } - ], + "outputs": [], "source": [ "for job in client.batches.list():\n", " print(job.name, job.create_time, job.state)" @@ -607,23 +518,11 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": null, "metadata": { - 
"id": "dtJDIXdHc0W-", - "colab": { - "base_uri": "https://localhost:8080/" - }, - "outputId": "f28997d4-bfd5-4cd0-f72c-c312db350bd3" + "id": "dtJDIXdHc0W-" }, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "Job succeeded!\n" - ] - } - ], + "outputs": [], "source": [ "# Refresh the job until complete\n", "while gcs_batch_job.state in (\n", @@ -673,106 +572,11 @@ }, { "cell_type": "code", - "execution_count": 20, + "execution_count": null, "metadata": { - "id": "-jLl3es3dTqB", - "colab": { - "base_uri": "https://localhost:8080/", - "height": 182 - }, - "outputId": "722038e7-3b83-4f94-bbe9-914dcce27872" + "id": "-jLl3es3dTqB" }, - "outputs": [ - { - "output_type": "display_data", - "data": { - "text/plain": [ - " key \\\n", - "0 user-defined-request-id-2 \n", - "1 user-defined-request-id-1 \n", - "\n", - " request status \\\n", - "0 {'contents': [{'parts': [{'file_data': None, '... \n", - "1 {'contents': [{'parts': [{'file_data': None, '... \n", - "\n", - " response \\\n", - "0 {'candidates': [{'avgLogprobs': -0.84266102567... \n", - "1 {'candidates': [{'avgLogprobs': -0.43489231782... \n", - "\n", - " processed_time avgLogprobs finishReason score \\\n", - "0 2026-02-24 20:11:22.961222+00:00 -0.842661 STOP -158.420273 \n", - "1 2026-02-24 20:09:47.208845+00:00 -0.434892 STOP -44.359016 \n", - "\n", - " content.parts content.role \n", - "0 [{'text': 'Here are the objects visible in the... model \n", - "1 [{'text': 'Here are the objects visible in the... model " - ], - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
keyrequeststatusresponseprocessed_timeavgLogprobsfinishReasonscorecontent.partscontent.role
0user-defined-request-id-2{'contents': [{'parts': [{'file_data': None, '...{'candidates': [{'avgLogprobs': -0.84266102567...2026-02-24 20:11:22.961222+00:00-0.842661STOP-158.420273[{'text': 'Here are the objects visible in the...model
1user-defined-request-id-1{'contents': [{'parts': [{'file_data': None, '...{'candidates': [{'avgLogprobs': -0.43489231782...2026-02-24 20:09:47.208845+00:00-0.434892STOP-44.359016[{'text': 'Here are the objects visible in the...model
\n", - "
" - ] - }, - "metadata": {} - } - ], + "outputs": [], "source": [ "fs = fsspec.filesystem(\"gcs\")\n", "\n", @@ -786,6 +590,64 @@ " display(df)" ] }, + { + "cell_type": "markdown", + "source": [ + "## Alternative: set Finished batch job" + ], + "metadata": { + "id": "H2xIS6KCyUmG" + } + }, + { + "cell_type": "code", + "source": [ + "gcs_batch_job = client.batches.get(name='projects/714644158567/locations/global/batchPredictionJobs/6979938407145275392')\n", + "gcs_batch_job" + ], + "metadata": { + "id": "fB2tOJB3yR-T", + "outputId": "a7b895ef-c818-4725-9c9b-6fa3cdede60e", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "execution_count": 14, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "BatchJob(\n", + " completion_stats=CompletionStats(\n", + " successful_count=2\n", + " ),\n", + " create_time=datetime.datetime(2026, 2, 24, 20, 8, 27, 188146, tzinfo=TzInfo(0)),\n", + " dest=BatchJobDestination(\n", + " format='jsonl',\n", + " gcs_uri='gs://gen-ai-async-gemini-batch-jobs-bucket/output'\n", + " ),\n", + " display_name='genai_batch_job_20260224200826_850cb',\n", + " end_time=datetime.datetime(2026, 2, 24, 20, 15, 33, 987090, tzinfo=TzInfo(0)),\n", + " model='publishers/google/models/gemini-2.5-flash',\n", + " name='projects/714644158567/locations/global/batchPredictionJobs/6979938407145275392',\n", + " src=BatchJobSource(\n", + " format='jsonl',\n", + " gcs_uri=[\n", + " 'gs://gen-ai-async-gemini-batch-jobs-bucket/input/gemini_batch_requests_for_multimodal_input_2.jsonl',\n", + " ]\n", + " ),\n", + " start_time=datetime.datetime(2026, 2, 24, 20, 9, 11, 226574, tzinfo=TzInfo(0)),\n", + " state=,\n", + " update_time=datetime.datetime(2026, 2, 24, 20, 15, 33, 987090, tzinfo=TzInfo(0))\n", + ")" + ] + }, + "metadata": {}, + "execution_count": 14 + } + ] + }, { "cell_type": "markdown", "source": [ @@ -798,13 +660,37 @@ { "cell_type": "code", "source": [ - "gcs_uri = f\"gs://{file_paths[0]}\"" + "fs = 
fsspec.filesystem(\"gcs\")\n", + "\n", + "file_paths = fs.glob(f\"{gcs_batch_job.dest.gcs_uri}/*/predictions.jsonl\")\n", + "\n", + "gcs_uri = f\"gs://{file_paths[0]}\"\n", + "gcs_uri" ], "metadata": { - "id": "x2kOlX1ztGSE" + "colab": { + "base_uri": "https://localhost:8080/", + "height": 35 + }, + "id": "x2kOlX1ztGSE", + "outputId": "de262ef4-c01f-49ca-9d59-fcdcd3b8a745" }, - "execution_count": 30, - "outputs": [] + "execution_count": 16, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'gs://gen-ai-async-gemini-batch-jobs-bucket/output/prediction-model-2026-02-24T20:08:26.982399Z/predictions.jsonl'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 16 + } + ] }, { "cell_type": "code", @@ -818,14 +704,14 @@ "response_content" ], "metadata": { - "id": "8F4Jft1qtwlR", - "outputId": "bf878dfe-099a-41b3-96b5-c8cc0dc96827", "colab": { "base_uri": "https://localhost:8080/", "height": 143 - } + }, + "id": "8F4Jft1qtwlR", + "outputId": "82f6b251-2099-4a35-b122-30c45a6b3e53" }, - "execution_count": 40, + "execution_count": 17, "outputs": [ { "output_type": "execute_result", @@ -838,7 +724,7 @@ } }, "metadata": {}, - "execution_count": 40 + "execution_count": 17 } ] }, @@ -849,13 +735,13 @@ "response_lines" ], "metadata": { - "id": "3zBeLHy6uWE0", - "outputId": "39b54031-d0c9-4c1c-f2db-d79efe5fed8d", "colab": { "base_uri": "https://localhost:8080/" - } + }, + "id": "3zBeLHy6uWE0", + "outputId": "f8058ceb-6e59-459a-8bdf-c998dfad4db0" }, - "execution_count": 41, + "execution_count": 18, "outputs": [ { "output_type": "execute_result", @@ -918,7 +804,7 @@ ] }, "metadata": {}, - "execution_count": 41 + "execution_count": 18 } ] }, @@ -929,13 +815,13 @@ "parsed_response_using_model_validate" ], "metadata": { - "id": "46fHnGWUu4xH", - "outputId": "07427de6-8da9-498c-d3eb-24a41aa661c7", "colab": { "base_uri": "https://localhost:8080/" - } + }, + "id": 
"46fHnGWUu4xH", + "outputId": "620dda90-e4c7-40b9-d252-133f59a61565" }, - "execution_count": 46, + "execution_count": 19, "outputs": [ { "output_type": "execute_result", @@ -945,7 +831,7 @@ ] }, "metadata": {}, - "execution_count": 46 + "execution_count": 19 } ] }, @@ -956,13 +842,13 @@ "parsed_response_using_from_response" ], "metadata": { - "id": "q7q7Y_00uJOJ", - "outputId": "305bbd5b-17c5-42c4-af42-3fa3405a4d18", "colab": { "base_uri": "https://localhost:8080/" - } + }, + "id": "q7q7Y_00uJOJ", + "outputId": "555f4a7d-cdbe-4e31-f4c1-82453ef1a049" }, - "execution_count": 42, + "execution_count": 20, "outputs": [ { "output_type": "execute_result", @@ -972,7 +858,7 @@ ] }, "metadata": {}, - "execution_count": 42 + "execution_count": 20 } ] },