
Building a Local Minecraft Skybox Generator with SD1.5
Text prompt to 128³ Minecraft room: SD1.5 panorama generation, equirectangular-to-cubemap projection, and CIEDE2000 block color matching — fully offline.
Why Build This?
Services like Blockade Labs will convert a text prompt into a 360° panoramic skybox for $25/month. The output looks great — until you need to generate hundreds of variants for a game project, a film reference library, or a Minecraft server build. The bill compounds fast, and the entire pipeline stalls the moment their API goes down.
Beyond cost, there is an autonomy problem. When a creative workflow depends on an external API, you are not building a pipeline — you are renting one.
I built an equivalent that runs completely offline on a local GPU: seamless 1024×512 equirectangular panoramas from SD1.5, reprojected onto six cubemap faces, and matched to Minecraft blocks using CIEDE2000 color distance. One /function command runs all 98,304 setblock commands (6 faces × 128 × 128 blocks) in a single game tick.
Here is how each stage works and why the choices were made.
The Full Pipeline
text prompt
│
▼
[SD1.5 + 3 LoRAs + SeamlessTile + CircularVAEDecode]
│ ComfyUI workflow, 1024×512
▼
equirectangular panorama (.png)
│
▼
[equirect_to_faces() — bilinear interp + horizontal wrap]
│ 6 cubemap face images, 128×128 each
▼
[face_to_blocks() — CIEDE2000 in CIELAB space]
│ 6 block grids, 89-entry Minecraft palette
▼
[build_room_commands() — wall orientation logic]
│ 98,304 setblock commands
▼
room_name.mcfunction
Each stage has a distinct job and a distinct failure mode. Let’s go through them.
Stage 1: Generating a Seamless Panorama
Standard Stable Diffusion does not generate panoramas. Ask SD for a wide landscape at 2:1 resolution and you get a landscape, but the left and right edges won’t connect. A room built from that image has a hard vertical seam on whichever wall the panorama’s left and right edges land on.
Three things eliminate the seam at generation time, before any pixels are committed to disk.
The 360 Panorama LoRA (360_panorama_sd15.safetensors) is the foundation. It was trained on equirectangular images and understands the geometric distortion that format requires: objects near the poles are stretched horizontally, and the horizon wraps continuously left to right. Including the trigger words 360, panorama in your prompt activates that conditioning.
SeamlessTile is a ComfyUI custom node that patches the Conv2d layers in the U-Net to use circular padding on the X axis instead of zero padding. In practice: during every convolution pass in the diffusion process, the model treats the right edge of the image as the left edge’s neighbor. The seam is eliminated structurally, not repaired afterward.
CircularVAEDecode extends the same approach through the VAE decoder. Without it, the VAE, which encodes images into latents and decodes latents back to pixels, would reintroduce a seam at decode time even if the U-Net produced a seamless latent. CircularVAEDecode applies circular padding during decoding, preserving what SeamlessTile built upstream.
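To see why the padding mode matters, here is a toy one-dimensional sketch in plain Python. It is not the SeamlessTile implementation; `conv1d` and the sample row are hypothetical, and a 3-tap box blur stands in for a real convolution kernel.

```python
def conv1d(row, kernel, mode):
    """Convolve one row of pixel values; mode is 'zero' or 'circular' padding."""
    half = len(kernel) // 2
    n = len(row)
    out = []
    for i in range(n):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = i + j - half
            if mode == "circular":
                # Out-of-range taps wrap around to the opposite edge
                acc += weight * row[idx % n]
            elif 0 <= idx < n:
                # Zero padding: out-of-range taps contribute nothing
                acc += weight * row[idx]
        out.append(acc)
    return out


# A row whose left edge is dark (1.0) and right edge is bright (9.0)
row = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0]
box = [1 / 3, 1 / 3, 1 / 3]

zero = conv1d(row, box, "zero")
circ = conv1d(row, box, "circular")
print("zero padding edges:    ", zero[0], zero[-1])
print("circular padding edges:", circ[0], circ[-1])
```

With zero padding each edge is filtered in isolation, so the two edges end up with unrelated values. With circular padding each edge pixel mixes in its neighbor from the opposite side, and the two edges converge on matching values. Interior pixels are identical in both modes.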
Two additional LoRAs stack on top:
- Detail Tweaker sharpens fine texture and increases local contrast. Panoramas at 1024×512 have less real estate per scene element than a standard 512×512 generation, so the extra sharpness recovers lost detail.
- epi_noiseoffset_v2 shifts the noise offset during sampling, producing deeper shadows and more dramatic lighting — indoor scenes feel three-dimensional, outdoor scenes feel atmospheric.
The ComfyUI workflow builds the LoRA chain programmatically:
// LoRA chain: base checkpoint → 360 LoRA → Detail Tweaker → noise offset
// (excerpt: ComfyUI's LoraLoader also requires strength_clip, and KSampler
// also requires seed; neither is shown here)
"10": { class_type: "LoraLoader",
        inputs: { lora_name: "360_panorama_sd15.safetensors",
                  strength_model: loraStrength, model: ["1", 0], clip: ["1", 1] } },
"11": { class_type: "LoraLoader",
        inputs: { lora_name: "detail_tweaker.safetensors",
                  strength_model: detailStrength, model: ["10", 0], clip: ["10", 1] } },
"12": { class_type: "LoraLoader",
        inputs: { lora_name: "epi_noiseoffset_v2.safetensors",
                  strength_model: noiseOffsetStrength, model: ["11", 0], clip: ["11", 1] } },

// SeamlessTile patches Conv2d for circular X padding
"3": { class_type: "SeamlessTile",
       inputs: { model: ["12", 0], tiling: "x_only", copy_model: "Make a copy" } },

// KSampler uses the patched model
"7": { class_type: "KSampler",
       inputs: { model: ["3", 0], sampler_name: "euler_ancestral",
                 scheduler: "karras", steps: 25, cfg: 7.5, denoise: 1.0,
                 positive: ["4", 0], negative: ["5", 0], latent_image: ["6", 0] } },

// CircularVAEDecode preserves seamlessness through decoding
"8": { class_type: "CircularVAEDecode",
       inputs: { samples: ["7", 0], vae: ["1", 2], tiling: "x_only" } }
The negative prompt matters as much as the positive. The default negative prompt lists visible seams, repeating patterns, tiled textures, wrong perspective, panoramic distortion, equirectangular artifact, and pole distortion. Without these terms, the model’s default tendencies push toward conventional framing rather than the panoramic geometry you need.
The output is a 1024×512 PNG. For reference, I compared this against an 8192×4096 Blockade Labs panorama (generated earlier on a paid key). Both feed into the same projection and matching stages, and once each face is reduced to 128×128 blocks, the quality difference is negligible.
Stage 2: Equirectangular to Cubemap
The panorama encodes all 360° of horizontal view and 180° of vertical view in spherical coordinates. To wrap it onto six walls, you reverse-project it: for each pixel on each cubemap face, calculate the 3D viewing direction that pixel represents, map that direction back to panorama coordinates, and sample.
import numpy as np

def equirect_to_faces(equirect_img, face_size):
    eq = np.array(equirect_img.convert('RGB'), dtype=np.float64)
    eq_h, eq_w = eq.shape[:2]

    # Face-plane coordinates in [-1, 1]: uu runs left to right, vv top to bottom
    u = np.linspace(-1, 1, face_size)
    v = np.linspace(-1, 1, face_size)
    uu, vv = np.meshgrid(u, v)
    ones = np.ones_like(uu)

    # Unnormalized 3D view direction (dx, dy, dz) for every pixel of each face
    face_dirs = {
        'north':   (uu, -vv, -ones),
        'south':   (-uu, -vv, ones),
        'east':    (ones, -vv, uu),
        'west':    (-ones, -vv, -uu),
        'ceiling': (uu, ones, vv),
        'floor':   (uu, -ones, -vv),
    }
The direction tuples define the 3D vector for every pixel on a face. For north, as u sweeps -1 to 1 you move left to right across the wall while staying at z = -1 (pointing away from you). For south, the X component is negated, a mirror flip, which is geometrically correct: when you face south from inside the room, east (+X) is to your left, so the left edge of the face image has to map to +X.
Converting direction vectors to panorama sample coordinates:
# dx, dy, dz are the three arrays from face_dirs[face]
r = np.sqrt(dx**2 + dy**2 + dz**2)
lon = np.arctan2(dx, dz)                  # longitude: -π to +π
lat = np.arcsin(np.clip(dy/r, -1, 1))     # latitude: -π/2 to +π/2
eq_x = (lon / np.pi + 1) / 2 * (eq_w - 1)
eq_y = (0.5 - lat / np.pi) * (eq_h - 1)
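As a sanity check on the mapping above, the sketch below evaluates the same formulas for a single direction in plain Python and prints where the center of each face lands in a 1024×512 panorama. `dir_to_equirect` and `FACE_CENTERS` are hypothetical names introduced for illustration.

```python
import math


def dir_to_equirect(dx, dy, dz, eq_w, eq_h):
    """Map one 3D view direction to fractional panorama pixel coordinates."""
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    lon = math.atan2(dx, dz)                        # -pi .. +pi
    lat = math.asin(max(-1.0, min(1.0, dy / r)))    # -pi/2 .. +pi/2
    eq_x = (lon / math.pi + 1) / 2 * (eq_w - 1)
    eq_y = (0.5 - lat / math.pi) * (eq_h - 1)
    return eq_x, eq_y


# Direction at the middle of each face (u = v = 0 in the face_dirs table)
FACE_CENTERS = {
    "north": (0, 0, -1),
    "south": (0, 0, 1),
    "east": (1, 0, 0),
    "west": (-1, 0, 0),
    "ceiling": (0, 1, 0),
    "floor": (0, -1, 0),
}

centers = {name: dir_to_equirect(*d, 1024, 512) for name, d in FACE_CENTERS.items()}
for name, (x, y) in centers.items():
    print(f"{name:8s} x={x:8.2f} y={y:7.2f}")
```

Under these formulas the south face is centered on the middle column of the panorama, while the center of the north face lands on the last column, which shares its longitude with the first. That is the wall where horizontal wrapping matters.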
The bilinear interpolation step is where horizontal wrapping becomes load-bearing:
x0w = x0 % eq_w          # wrap left edge to right edge
x1w = (x0 + 1) % eq_w    # and its neighbor
face_rgb = (eq[y0c, x0w] * (1 - fx) * (1 - fy) +
            eq[y0c, x1w] * fx * (1 - fy) +
            eq[y1c, x0w] * (1 - fx) * fy +
            eq[y1c, x1w] * fx * fy)
The modulo on x0w and x1w lets samples near the panorama’s 360°/0° boundary interpolate across that seam rather than clamping to the edge. With the direction vectors above, lon = arctan2(dx, dz) flips from +π to -π where dx = 0 and dz = -1, which is the vertical centerline of the north face. Skip the wrap and that wall gets a hard vertical artifact right where the panorama wraps.
Each face is resized to 128×128 — one pixel per block.
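The wrap-then-interpolate step can be sketched on a tiny grayscale grid in plain Python. This is an illustration under assumptions rather than the tool's code: `sample_wrap` is a hypothetical helper, the image is a list of rows, and x is allowed to fall anywhere in [0, width).

```python
import math


def sample_wrap(img, x, y):
    """Bilinear sample with horizontal wrap-around and vertical clamping."""
    h, w = len(img), len(img[0])
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    # Horizontal neighbors wrap: the column after the last one is column 0
    x0w, x1w = x0 % w, (x0 + 1) % w
    # Vertical neighbors clamp: there is nothing above the zenith or below the nadir
    y0c = min(max(y0, 0), h - 1)
    y1c = min(max(y0 + 1, 0), h - 1)

    top = img[y0c][x0w] * (1 - fx) + img[y0c][x1w] * fx
    bottom = img[y1c][x0w] * (1 - fx) + img[y1c][x1w] * fx
    return top * (1 - fy) + bottom * fy


img = [
    [10.0, 20.0, 30.0, 40.0],
    [50.0, 60.0, 70.0, 80.0],
]

# Halfway between the last column (40) and the first column (10) of row 0
across_seam = sample_wrap(img, 3.5, 0.0)
print(across_seam)
```

A clamped sampler would return the last column's value at x = 3.5. The wrapped version blends the last and first columns, which is what keeps the wall continuous across the panorama's edge.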
Stage 3: Block Color Matching with CIEDE2000
This is where most pixel-art-to-Minecraft converters fail. They use Euclidean distance in RGB space, which does not reflect how humans perceive color. A pair of colors a small RGB distance apart can look clearly different (a dark blue versus a dark purple), while a pair separated by a larger RGB distance can look nearly identical (two slightly different grays).
CIEDE2000 is the CIE’s current standard formula for perceptual color difference. It operates on CIELAB coordinates, a space designed to be approximately perceptually uniform, and adds lightness, chroma, and hue corrections for the regions where plain CIELAB distance still misjudges perceived difference.
The conversion chain from sRGB to CIELAB:
import numpy as np

def srgb_to_linear(c):
    # Undo the sRGB transfer curve; input is 0-255, output is linear 0-1
    c = c / 255.0
    return np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)

def linear_to_xyz(rgb):
    # Linear sRGB to CIE XYZ (D65 white point)
    M = np.array([
        [0.4124564, 0.3575761, 0.1804375],
        [0.2126729, 0.7151522, 0.0721750],
        [0.0193339, 0.1191920, 0.9503041],
    ])
    return rgb @ M.T

def xyz_to_lab(xyz):
    # Normalize by the D65 white point, then apply the CIELAB cube-root curve
    D65 = np.array([0.95047, 1.00000, 1.08883])
    xyz = xyz / D65
    delta = 6.0 / 29.0
    f = np.where(xyz > delta**3, np.cbrt(xyz), xyz / (3*delta**2) + 4.0/29.0)
    L = 116 * f[..., 1] - 16
    a = 500 * (f[..., 0] - f[..., 1])
    b = 200 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)
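A quick way to check a conversion chain like this is to run a few reference colors through it. The sketch below is a scalar, standard-library version of the same three steps for one 8-bit color; `srgb8_to_lab` is a hypothetical name, and the constants are the ones used above.

```python
def srgb8_to_lab(r, g, b):
    """Convert one 8-bit sRGB color to CIELAB (D65), mirroring the chain above."""

    def to_linear(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = to_linear(r), to_linear(g), to_linear(b)

    # Linear sRGB -> XYZ
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        delta = 6.0 / 29.0
        return t ** (1.0 / 3.0) if t > delta ** 3 else t / (3 * delta ** 2) + 4.0 / 29.0

    # Normalize by the D65 white point before the cube-root curve
    fx, fy, fz = f(x / 0.95047), f(y / 1.00000), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


white = srgb8_to_lab(255, 255, 255)
black = srgb8_to_lab(0, 0, 0)
red = srgb8_to_lab(255, 0, 0)
print(white, black, red)
```

White should land near L = 100 with a and b near 0, and black at L = 0. If either drifts, the usual culprits are a missing gamma step or a mismatched white point.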
The Minecraft palette contains 89 blocks: 16 colors each of concrete, terracotta, and wool, plus natural stone variants (granite, diorite, andesite, deepslate, obsidian), wood planks, copper in four oxidation states, gem blocks, Nether materials, and prismarine. At the bright end: iron block, gold block, diamond block, quartz block, sandstone. At the dark end: coal block, obsidian, black concrete.
An optional --glow flag extends the palette with 10 emissive blocks, including glowstone, magma block, sea lantern, and the three froglight variants, for scenes where you want bright areas to cast light in-game.
One important pre-processing step: before matching, each face image is brightness-boosted by +10% and contrast-enhanced by 1.1x. Minecraft’s ambient occlusion and reduced sky access inside an enclosed room make blocks render noticeably darker than their texture colors. The adjustment compensates before any matching happens.
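The adjustment can be sketched per channel value as follows. The text only gives the two factors, so the order of operations (brightness first, then contrast) and the mid-gray pivot of 128 are assumptions, and `pre_adjust` is a hypothetical name.

```python
def pre_adjust(value, brightness=1.10, contrast=1.10, pivot=128.0):
    """Brighten one 0-255 channel value, then stretch contrast around a pivot.

    Assumptions: brightness is a plain multiplier, contrast scales the distance
    from mid-gray, and the result is clamped back into the 0-255 range.
    """
    boosted = value * brightness
    stretched = (boosted - pivot) * contrast + pivot
    return max(0.0, min(255.0, stretched))


for v in (0, 100, 128, 255):
    print(v, "->", round(pre_adjust(v), 2))
```

Running the adjustment before color matching means the matcher sees the brighter target, so it picks lighter blocks to offset how dark the room renders in-game.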
PaletteIndex converts the entire palette to CIELAB once at startup, then computes the full CIEDE2000 distance matrix per batch:
class PaletteIndex:
    def __init__(self, palette, glow_palette=None):
        # palette is a list of ((r, g, b), block_name) pairs
        colors = np.array([c for c, _ in palette], dtype=np.float64)
        self.names = [n for _, n in palette]
        self.lab = rgb_to_lab(colors)  # converted once, reused for every batch

    def nearest_batch(self, pixel_labs, brightness):
        # One row per pixel, one column per palette entry
        dists = ciede2000(pixel_labs, self.lab)
        return [self.names[i] for i in dists.argmin(axis=1)]
Converting once at startup means the 89 CIELAB vectors are computed a single time, no matter how many pixels you match. The per-batch cost is the distance matrix itself: one CIEDE2000 evaluation per pixel-palette pair, so it scales linearly with pixel count.
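The ciede2000() call above is not shown in the excerpt. For readers who want to see what one evaluation involves, here is a scalar sketch of the standard formula with all weighting factors set to 1. It is not the vectorized version the tool uses, and the three-entry palette holds placeholder Lab values, not colors measured from Minecraft textures.

```python
import math


def ciede2000(lab1, lab2):
    """CIEDE2000 difference between two (L, a, b) tuples, with kL = kC = kH = 1."""
    L1, a1, b1 = lab1
    L2, a2, b2 = lab2

    # Rescale a* by a chroma-dependent factor, then get chroma and hue angle
    c_bar = (math.hypot(a1, b1) + math.hypot(a2, b2)) / 2
    g = 0.5 * (1 - math.sqrt(c_bar**7 / (c_bar**7 + 25.0**7)))
    a1p, a2p = (1 + g) * a1, (1 + g) * a2
    c1p, c2p = math.hypot(a1p, b1), math.hypot(a2p, b2)
    h1p = math.degrees(math.atan2(b1, a1p)) % 360
    h2p = math.degrees(math.atan2(b2, a2p)) % 360

    # Differences in lightness, chroma, and hue
    d_l = L2 - L1
    d_c = c2p - c1p
    d_h_deg = h2p - h1p
    if c1p * c2p == 0:
        d_h_deg = 0.0
    elif d_h_deg > 180:
        d_h_deg -= 360
    elif d_h_deg < -180:
        d_h_deg += 360
    d_h = 2 * math.sqrt(c1p * c2p) * math.sin(math.radians(d_h_deg / 2))

    # Mean values used by the weighting functions
    l_bar = (L1 + L2) / 2
    c_bar_p = (c1p + c2p) / 2
    if c1p * c2p == 0:
        h_bar = h1p + h2p
    elif abs(h1p - h2p) <= 180:
        h_bar = (h1p + h2p) / 2
    elif h1p + h2p < 360:
        h_bar = (h1p + h2p + 360) / 2
    else:
        h_bar = (h1p + h2p - 360) / 2

    def cos_deg(x):
        return math.cos(math.radians(x))

    t = (1 - 0.17 * cos_deg(h_bar - 30) + 0.24 * cos_deg(2 * h_bar)
         + 0.32 * cos_deg(3 * h_bar + 6) - 0.20 * cos_deg(4 * h_bar - 63))
    s_l = 1 + 0.015 * (l_bar - 50) ** 2 / math.sqrt(20 + (l_bar - 50) ** 2)
    s_c = 1 + 0.045 * c_bar_p
    s_h = 1 + 0.015 * c_bar_p * t

    # Rotation term that couples chroma and hue in the blue region
    d_theta = 30 * math.exp(-(((h_bar - 275) / 25) ** 2))
    r_c = 2 * math.sqrt(c_bar_p**7 / (c_bar_p**7 + 25.0**7))
    r_t = -math.sin(math.radians(2 * d_theta)) * r_c

    return math.sqrt((d_l / s_l) ** 2 + (d_c / s_c) ** 2 + (d_h / s_h) ** 2
                     + r_t * (d_c / s_c) * (d_h / s_h))


def nearest_block(pixel_lab, palette_lab):
    """Return the palette name with the smallest CIEDE2000 distance."""
    return min(palette_lab, key=lambda name: ciede2000(pixel_lab, palette_lab[name]))


# Placeholder Lab values for illustration only
PALETTE_LAB = {
    "black_concrete": (5.0, 0.0, 0.0),
    "stone": (50.0, 0.0, 0.0),
    "white_concrete": (85.0, 0.0, -2.0),
}

print(nearest_block((48.0, 1.0, 1.0), PALETTE_LAB))
```

The scalar form is far too slow for 98,304 pixels against 89 blocks, which is why the pipeline evaluates the whole distance matrix with array operations instead.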
Stage 4: Room Assembly
The six block grids become setblock commands. The orientation logic is the subtlest part of the pipeline — get a mirror direction wrong and the room looks correct from five sides and backwards on the sixth.
The rule: when you stand inside the finished room facing north, the north wall shows the front of the panorama, east shows what is to your right, and the floor connects seamlessly at its base with all four walls.
# g is the 2D grid of block names for the face being placed

# North wall (z=0): left-to-right matches panorama left-to-right
for row in range(size):
    for col in range(size):
        commands.append(f"setblock ~{col} ~{size-1-row} ~0 minecraft:{g[row][col]}")

# South wall (z=size-1): mirrored on X so east stays east from inside
for row in range(size):
    for col in range(size):
        commands.append(f"setblock ~{size-1-col} ~{size-1-row} ~{size-1} minecraft:{g[row][col]}")

# Floor (y=0): Z-flipped so the front edge aligns with the north wall's base
for row in range(size):
    for col in range(size):
        commands.append(f"setblock ~{col} ~0 ~{size-1-row} minecraft:{g[row][col]}")
The south wall uses size-1-col instead of col; that is the mirror flip. The floor uses size-1-row for Z, so the last row of the floor image lands at z=0, against the north wall’s base. East and west walls mirror each other the same way.
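The three faces not shown can be worked out from the Stage 2 direction vectors. The sketch below does that for all six faces in one table, assuming Minecraft's axes (+X east, +Y up, +Z south). The north, south, and floor entries match the loops above; the east, west, and ceiling entries are derived here and are not taken from the tool's source. `face_commands` is a hypothetical helper.

```python
def face_commands(face, grid):
    """Build setblock commands for one face of a size x size x size room."""
    size = len(grid)
    top = size - 1

    # (row, col) of the face image -> (x, y, z) offset inside the room
    placements = {
        "north":   lambda r, c: (c,       top - r, 0),
        "south":   lambda r, c: (top - c, top - r, top),
        "east":    lambda r, c: (top,     top - r, c),
        "west":    lambda r, c: (0,       top - r, top - c),
        "ceiling": lambda r, c: (c,       top,     r),
        "floor":   lambda r, c: (c,       0,       top - r),
    }
    to_xyz = placements[face]

    commands = []
    for r in range(size):
        for c in range(size):
            x, y, z = to_xyz(r, c)
            commands.append(f"setblock ~{x} ~{y} ~{z} minecraft:{grid[r][c]}")
    return commands


# A 2x2 face with placeholder block names, small enough to check by hand
grid = [["a", "b"],
        ["c", "d"]]
for face in ("north", "south", "east", "west", "ceiling", "floor"):
    print(face, face_commands(face, grid)[0])
```

Checking a 2×2 room by hand is a cheap way to catch a mirrored wall before generating a full-size function file.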
Running it for a 128³ room:
$ python3 tools/pano_to_room.py \
--image data/skybox/panoramas/skybox_crystal_canyon.png \
--name room_crystal_canyon_skybox \
--size 128
Loading panorama: skybox_crystal_canyon.png
Size: 1024x512 (ratio 2.00)
Projecting to 128x128 cubemap faces...
Converting north...
Converting south...
Converting east...
Converting west...
Converting ceiling...
Converting floor...
Assembling room...
room_crystal_canyon_skybox.mcfunction (98304 commands)
clear_room_crystal_canyon_skybox.mcfunction
Room: 128³ = 98304 blocks across 6 faces
Required: /gamerule maxCommandChainLength 98404
Build: /function p2:room_crystal_canyon_skybox
Clear: /function p2:clear_room_crystal_canyon_skybox
Tip: stand at the CORNER of where you want the room, facing +X +Z
In-game:
/gamerule maxCommandChainLength 98404
/reload
/function p2:room_crystal_canyon_skybox
The function runs in a single tick. 98,304 blocks placed at once.
Where This Lives in the Broader Pipeline
The skybox generator is integrated into the id8 creative pipeline as a dedicated tab. The Express backend (src/server/routes/skybox.mjs) handles:
- Submitting the ComfyUI panorama workflow via /api/skybox/comfyui/generate-panorama, which builds the workflow JSON programmatically and posts it to ComfyUI’s /prompt API
- Polling ComfyUI’s history API for completion and copying the output to data/skybox/panoramas/
- Running the cubemap preview via pano_to_room.py --save-faces (256×256 faces for browser preview)
- Running the full mcfunction export on demand and serving the file for download
The pipeline also accepts Blockade Labs API output as an alternative panorama source — useful for A/B quality comparisons when you already have a key. Both sources feed into the same projection and block-matching stages, so the backend code stays clean regardless of origin.
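For orientation, the submit-and-poll exchange with ComfyUI looks roughly like the sketch below. It is written in Python to match the other tools, although the backend described here is Express, and it only builds the request without sending it. The host uses ComfyUI's default port; `build_prompt_request`, `history_url`, and the client_id value are hypothetical.

```python
import json
import urllib.request

COMFY_HOST = "http://127.0.0.1:8188"  # ComfyUI's default local address


def build_prompt_request(workflow, host=COMFY_HOST, client_id="skybox-tab"):
    """Wrap a workflow dict in the JSON body that ComfyUI's /prompt endpoint expects."""
    body = json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def history_url(prompt_id, host=COMFY_HOST):
    """URL to poll until the finished prompt appears in ComfyUI's history."""
    return f"{host}/history/{prompt_id}"


# A stub workflow with a single node, just to show the payload shape
workflow = {"8": {"class_type": "CircularVAEDecode",
                  "inputs": {"samples": ["7", 0], "vae": ["1", 2], "tiling": "x_only"}}}
req = build_prompt_request(workflow)
print(req.get_method(), req.full_url)
print(history_url("example-prompt-id"))
```

Sending the request with urllib.request.urlopen(req) returns a JSON response containing a prompt_id, which is the value to pass to the history endpoint.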
Known Limitations and Next Steps
A few honest tradeoffs worth naming before you build this:
Palette ceiling. 89 blocks cover a wide tonal range but will struggle with saturated colors that have no direct Minecraft analog — neon greens, electric blues. CIEDE2000 finds the closest perceptual match, but “closest” is constrained by what the palette contains. Adding more blocks helps; adding dyed glass for semi-transparent effects helps more.
Pole distortion. Equirectangular images distort geometry toward the poles, where a single point is stretched across the full image width. At 128×128 face resolution, the ceiling and floor carry less visual information than the four walls. For most scenes this is acceptable, since skies and floors tend to be simpler, but if your prompt has detailed zenith content, you will see it smear.
One block per pixel. The 1:1 mapping means face resolution sets room size. A 64³ room saves commands but loses detail. A 256³ room requires raising maxCommandChainLength significantly and may stress older hardware when the function fires.
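The command count grows with the square of the room size, which is easy to tabulate. In the sketch below the headroom of 100 is inferred from the tool's own suggestion (98,404 for 98,304 commands); `room_command_count` and `suggested_chain_length` are hypothetical names.

```python
def room_command_count(size):
    """Six faces, one setblock command per pixel of each size x size face."""
    return 6 * size * size


def suggested_chain_length(size, headroom=100):
    """Command count plus a small margin (the margin is inferred, not documented)."""
    return room_command_count(size) + headroom


for size in (64, 128, 256):
    print(size, room_command_count(size), suggested_chain_length(size))
```

A 64³ room needs 24,576 commands, which fits under Minecraft's default maxCommandChainLength of 65,536 without touching the gamerule. A 256³ room needs 393,216, four times the 128³ figure.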
The pipeline as described handles the core use case well. The extensions — larger palettes, upsampled faces, progressive block placement — are straightforward from here once you understand how each stage connects.
What You Built
- Seamless panorama generation requires circular padding at two stages — the U-Net convolutions (SeamlessTile) and the VAE decoder (CircularVAEDecode). Post-processing seam repair is not needed when you solve it at generation time.
- Equirectangular-to-cubemap projection is vectorized spherical coordinate math. The direction vectors and mirror logic are load-bearing — they encode the viewer’s inside perspective, not an abstract coordinate system.
- CIEDE2000 in CIELAB space produces perceptually correct block matches. Euclidean RGB distance does not, and the difference is visible in the final room.
- The brightness and contrast pre-adjustment compensates for Minecraft’s ambient occlusion before any matching decisions are made.
- The 128³ room is 98,304 setblock commands. Raise maxCommandChainLength before firing the function.
- The entire pipeline runs offline with no API dependencies. Once it is working, it keeps working.