Quick HTML With ChatGPT Vision: I’ve Been Waiting For This

Ever since the original GPT-4 developer livestream, where we all got our first glimpse of GPT-4, I’ve been waiting to do one thing: repeat the part of the video where the dev doodles a webpage on paper, snaps a pic with his phone, and has GPT build the webpage.

Now, with the upgrades bringing “sight and speech” to GPT via image recognition and the Whisper voice-to-text model, we finally get to play with the toys we were promised. You can upload an image to GPT with one click and a standard file browse window. I haven’t checked yet whether you can paste images directly into ChatGPT.

Back to the visual web page coding. Oh, Glory! HTML/CSS isn’t that hard, but it is finicky and annoying sometimes. Getting the skeleton of a site on the page quickly is a great time saver. As soon as I noticed I had gotten the ChatGPT “Vision” upgrade (some are calling it GPT-V), I had to try the website trick. I’m pleased to say it all works as advertised.

A screenshot of ChatGPT showing the image icon for GPT Vision boxed in red.
The small image icon in the prompt entry field means you have GPT-V! Image: Author

I didn’t have paper handy, so I opened Paint and drew some quick boxes for divs. My new web page will have a header area at the top with my logo image, then a series of text containers with H2 headings and paragraph text elements. I added a note that the dimensions should be those of standard paper (8.5 inches wide) and the containers should be 7 inches wide, effectively giving 3/4″ side margins. See below:

A screenshot of the Paint mockup image of the page given to ChatGPT.
A quick Paint image showing GPT-4 what I wanted my page to look like. Image: Author.

Uploading the image is simple. If you’ve used Advanced Data Analysis to upload files, it’s the same process (except that GPT Vision is currently available on the Default model of GPT-4, not in Advanced Data Analysis, a.k.a. ChatGPT Code Interpreter). Click the small image icon on the left side of the prompt entry field to open a browse window and select your image.

Type your prompt alongside the image and click send! No problems.

Here’s the HTML and CSS I got back from GPT:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>GCD Reports Layout</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            margin: 0;
            padding: 0;
            width: 8.5in; /* Equivalent to the standard paper width */
        }

        .container {
            width: 7in; /* 7 inches for the container */
            margin: 0 auto; /* Centers the container on the page */
            border: 1px dashed black;
            padding: 10px;
        }

        .header {
            border: 1px solid red;
            height: 100px; /* Adjust height as needed */
            text-align: center;
            margin-bottom: 20px;
        }

        .content-box {
            border: 1px solid black;
            margin-bottom: 20px;
            padding: 10px;
        }

        .content-box h2 {
            margin: 0;
        }

        .content-box p {
            margin: 10px 0;
        }
    </style>
</head>
<body>
    <div class="container">
        <div class="header">
            <img src="LOGO_IMAGE_PATH" alt="Logo" style="max-width: 100%; height: auto;"> <!-- Replace LOGO_IMAGE_PATH with the path to your logo -->
        </div>
        <div class="content-box">
            <h2>Placeholder Text</h2>
            <p>Paragraph Placeholder Text</p>
        </div>
        <div class="content-box">
            <h2>Placeholder Text</h2>
            <p>Paragraph Placeholder Text</p>
        </div>
        <div class="content-box">
            <h2>Placeholder Text</h2>
            <p>Paragraph Placeholder Text</p>
        </div>
    </div>
</body>
</html>

Here you can see the HTML GPT-V gave me rendered in a preview pane. It's spot on!

It looks almost exactly like my diagram, it has the dimensions I asked for, and everything is formatted appropriately in the included CSS.
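
One detail is worth checking if the exact measurements matter to you. Under CSS's default content-box sizing, `width: 7in` applies to the content area only, so the 10px padding and 1px border on `.container` are added on top and the rendered box ends up 22px wider than 7 inches. Here is a minimal sketch of one possible tweak. It assumes the goal is for the container's outer edge to measure 7 inches, which is an assumption about the mockup's intent and not something GPT was explicitly asked for:

```css
/* Assumption: the container's outer edge (border included) should be 7in wide. */
.container {
    box-sizing: border-box; /* width now includes padding and border */
    width: 7in;
    margin: 0 auto; /* Centers the container on the page */
    border: 1px dashed black;
    padding: 10px;
}
```

With this rule swapped in, the side margins come out to the full 3/4″. Without it, they are about 11px narrower on each side.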

If you’ve ever asked yourself, “Can I put pictures in chat gpt?”, well, now you can. I’m super excited about this! Nerd alert! It’s just… so… handy. Give it a whirl.