Quick ACES Thoughts

Appears that the RRT global de-saturate step applied in AP1 drops to a gamut smaller than Rec2020. This seems to be ok when targeting Rec709/sRGB but not sure if this is future proof in the context of Rec2020. Seems like the reference ACES ODT for Rec709 at 48 nits ends up with gamut clipping when inputs to the RRT had covered the full positive part of AP1 space. Those working with sRGB/Rec709 primaries in the rendering pipeline might not have issues here depending on how much saturation is added during grading before the RRT. Guessing some people would rather be able to go nuts anywhere in the human perceptual space and have it smoothly map to the output display space?


The Written Word

Growing older, I find that games, movies, TV are all limiting forms of entertainment, and that by far the best form of story-driven consumables is the book. Right now I'm halfway through On the Steel Breeze by Alastair Reynolds, taking a break to reflect. Something was lost over the years as digital entertainment has evolved from the soup of interactive text adventures. Certainly enjoy the visual representation of a good story, but I enjoy more the freedom to explore stories which could never gain the support necessary for a non-literary translation. Early in gaming there was an interesting balance forced by the limitations of the machine, where the written word took the place of electronically "physically" realizing everything in the game. Would be great once in a while to trade the modern game single player storyline, played out in "cut scenes", for a story of the caliber of a great novel, represented instead in "cut pages" of text. Then shift the focus of development and polish back into the game itself.


Cloudhead Games : Blink Locomotion for VR

Blink locomotion for VR is quite a cool idea ... but beyond its use for removing VR sickness, it is interesting for graphics. Brings back the feeling of classic adventure games. Fixed spaces in which the player interacts, with instant connectivity between the spaces. The opportunity for graphics is to pre-compute the spaces to extremely high fidelity. Effectively pre-solving the visibility and light transport, with a secondary system which composites the dynamic 3D elements into the scene...


1536-5 : Keys

Evening 5 on 1536. Wrote a mini PS/2 keyboard driver (source below) based on prior work. Ran out of time for testing, got distracted by SIGGRAPH slides. Only supporting 64 keys (bit array in register), good enough to run arcade controllers which alias as keyboards. The driver only handles key release for {shift, control, alt}; for other keys the application clears the bits itself. Had an interesting bug today: forgot to implement the "MOV REG,REG" opcode, surprised I got this far in 1536 without a register to register move. Manually keeping 16-byte groupings for instructions has some interesting side effects on coding style...
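
A minimal sketch of the 64-key bit array idea, in Python rather than the project's own assembler. The key indices, function names, and the exact release policy are my assumptions for illustration, not the actual driver: modifier bits are cleared by the driver on release, all other bits stay set until the application clears them.

```python
# Hypothetical sketch: the state of 64 keys held in one 64-bit integer.
MASK64 = (1 << 64) - 1

# Assumed key indices for the modifiers (illustrative, not from the driver).
SHIFT, CONTROL, ALT = 0, 1, 2
MODIFIERS = (1 << SHIFT) | (1 << CONTROL) | (1 << ALT)

def on_make(state, key):
    """Key press: set the key's bit."""
    return (state | (1 << key)) & MASK64

def on_break(state, key):
    """Key release: the driver clears the bit only for modifier keys."""
    return state & ~((1 << key) & MODIFIERS) & MASK64

def app_clear(state, key):
    """Application side: clear a non-modifier key's bit once it is handled."""
    return state & ~(1 << key) & MASK64
```

With this policy a quick tap on a non-modifier key cannot be missed between polls, since the bit stays set until the application clears it.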

SIGGRAPH : Ready At Dawn

Ready at Dawn is starting to post SIGGRAPH content: readyatdawn.com/ready-at-dawn-siggraph

GL and Vulkan at SIGGRAPH 2015

The biggest news is that Google is going to ship Vulkan on Android. Vulkan is set to become the best option for cross-platform portable lower-level graphics development: Android, Linux, SteamOS, Windows 7/8/10/etc. Vulkan has some great advantages: (a.) Vulkan is not locked to a given OS version, (b.) Vulkan has an extension system both in the API and shader language which enables hardware vendors to expose features of the hardware and enables the API to continue to rapidly improve, (c.) Vulkan as an open standard promotes great 3rd party support (see what people are already doing with SPIR-V)...

GL released extension specs for a lot of great new features including ARB_shader_ballot. This is a great step forward in the process of getting some support for basic ISA functionality which has been shipping in hardware for the past 3 years.


1536-4 : Coloring

Night 4 on 1536. Brought up most of the "x56-40" (x86-64 in hex) assembler now. Also have the majority of the Forth-like words needed to assemble self-documenting constants {add,mul,neg,not,and,or,xor,...}.

Started on the editor. Just enough of a quick prototype to render the text view in the editor (sans cursor for now). All screens on this post are captured from the editor running in an x86-64 emulator. Keeping the fixed 64 character lines makes everything very simple. Syntax highlighting was carefully designed to only need one line of context. Just a simple backward sweep to color, then a forward sweep to correct the color for comments (the \ marks rest of line as comment). Adjusted the font, {_,-,=} all now extend out full font cell width so they can double as lines. Adjusted the colors closer to what I like for syntax highlighting. Still experimenting with how to comment and arrange source.

Have 16 characters to the right of the source window to use for real-time debug data. Like viewing values of registers, memory, etc. Thinking through details in the background. Next step is to bring up the non-USB throw-away keyboard driver, then get the editor functional.

Still finding the no-errors, no-tools, know-everything path easy to work with. This time lost some time to an opcode assembly bug. A full class of opcodes was broken, something never validated from last time: just forgot to make a RIP relative offset RIP relative for non-branch instructions. Everything else is working out of the box with no human errors. When the mind can reason about the entire system, and the edit/execute loop is near instant, bugs are normally an instant fix. Quite satisfying to work this way.


Demo Tubes: Parnassum & Monolith

Thoughts on the Evolution of Processor Design

Feels like the fundamental limiter in the evolution of processor design is the {load,alu,store} design paradigm: the separation of memory and ALU at all scales. A CPU is effectively like having billions of people each with a mailbox to store data, each routing the data to just one single person (out of billions) with a calculator doing computation. As CPUs have evolved, there has only been a tiny increase in the number of people with calculators. Then the GPU enters the timeline, providing a substantial increase in the number of people with calculators, but this increase is still relatively tiny with respect to the number of people routing data to and from the mailboxes. I'm wondering if perhaps all people should just have calculators. Looking at some numbers,

Chip Capacity in Flop per Clock per Transistor
Using numbers from Wikipedia for Fury X,

8601 Gflop/s
8900 Mtransistors
1050 MHz

Capacity for upwards of 8000 flops each clock, but with around 1 million transistors per flop.
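
The arithmetic behind those two figures, as a small Python sketch using only the three numbers quoted above (the variable names are mine):

```python
gflops = 8601          # peak Gflop/s
mtransistors = 8900    # millions of transistors
mhz = 1050             # clock in MHz

# Flops issued per clock across the whole chip (about 8191).
flops_per_clock = gflops * 1e9 / (mhz * 1e6)

# Transistors spent per flop-per-clock of capacity (about 1.09 million).
transistors_per_flop = mtransistors * 1e6 / flops_per_clock

print(round(flops_per_clock), round(transistors_per_flop))
```

The per-clock figure lands just under 8192, which is consistent with 4096 lanes each counted as two flops for a fused multiply-add.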

Science Fiction Version of the Suggested Paradigm Shift
A completely science fiction version of the suggested paradigm shift might be a chip with 256 MB of memory, divided into 32 million 64-bit cells, with each cell doing one 64-bit uber-operation every 64 clocks (bit per clock), clocked at a relatively low clock rate like 500 MHz: providing something like 250,000,000,000,000 uber-ops per second. The board is composed of 3D stacks of these chips connected by TSVs, with stacks connected by interposers (like HBM). Board might have something like 16 stacks, providing 4,000,000,000,000,000 uber-ops per second. The local parallel just-in-time compile step configures cells to work around bad cells, so yield problems go away. The mindset used to program the machine is quite different. Data constantly flows around the chip as it is filtered by computation. The organization of data constantly changes to adapt to locality of reference. Programs reconfigure parts of the chip at run-time to solve problems. Reconfiguring a cell is basically adjusting which neighborhood connections are inputs to the uber-op, and the properties of the uber-op.
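
A quick Python check of those throughput numbers, under the assumption that "256 MB" means 256 * 2^20 bytes. The text does not say how many chips sit in each stack, so the board line below simply multiplies the per-chip rate by the 16 stacks, which is what the quoted 4,000,000,000,000,000 figure corresponds to.

```python
memory_bytes = 256 * 2**20           # 256 MB per chip (binary megabytes assumed)
cell_bytes = 8                       # one 64-bit cell
cells = memory_bytes // cell_bytes   # 33,554,432 cells, roughly "32 million"

clock_hz = 500e6                     # 500 MHz
clocks_per_op = 64                   # bit-serial: one 64-bit uber-op per 64 clocks

ops_per_cell = clock_hz / clocks_per_op   # 7.8125 million uber-ops/s per cell
chip_ops = cells * ops_per_cell           # about 2.6e14 uber-ops/s per chip
board_ops = chip_ops * 16                 # about 4.2e15, counting one chip per stack

print(f"{chip_ops:.3e} {board_ops:.3e}")
```

Any additional chips per stack would scale the board figure up linearly.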


1536-3 : Simplify, Repeat

Night 3 on the 1536 project, decided to make some changes before going full ahead with writing an editor.

(1.) Switched the boot loader to not relocate to zero. Now leaving the BIOS areas alone, since I doubt I'll ever go back to real mode after switching to long mode. This ends up making the code easier, and provides room for the next change.

(2.) Switched the boot loader to fetch the first 15 tracks instead of just 1 track. Now have a little over 472KB of source to work with on boot, which is effectively infinite for this project. The motivation for this change was the realization that x86-64 assembly source would get big. Don't want to change this later. 472KB is close to the maximum without checking for things like the EBDA, or handling non-full-track reads.
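
For reference, one way the 472KB figure can come out. The geometry here is my assumption, not stated in the post: 63 sectors per track and 512-byte sectors, which is a common BIOS disk geometry and happens to match "a little over 472KB" for 15 tracks.

```python
# Assumed geometry (not from the post): 63 sectors/track, 512 bytes/sector.
SECTORS_PER_TRACK = 63
BYTES_PER_SECTOR = 512
TRACKS = 15

track_bytes = SECTORS_PER_TRACK * BYTES_PER_SECTOR   # 32,256 bytes per track
total_bytes = TRACKS * track_bytes                    # 483,840 bytes
total_kb = total_bytes / 1024                         # 472.5 KB

print(track_bytes, total_bytes, total_kb)
```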

(3.) Switched to an easier source model. Lines are now always 64 characters long. Comments are switched to a single \ which functions like the C // style comment, ignoring the rest of the line. Since lines are always 64 bytes (cacheline sized and aligned), the interpreter can quickly skip over comments. This trades increased source size for simplification of the editor: fixed size lines make everything trivial.
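
A rough Python sketch of why fixed 64-byte lines make the comment skip cheap: hitting a lone \ word just rounds the position up to the next multiple of 64. The helper names and the space-separated word model are my assumptions for illustration, not the project's interpreter.

```python
LINE = 64  # every source line is exactly 64 bytes

def pad(text):
    """Hypothetical helper: pad one line of source out to 64 bytes."""
    return text.ljust(LINE).encode()

def next_line(pos):
    """Offset of the first byte of the line after the one containing pos."""
    return (pos | (LINE - 1)) + 1

def words(source):
    """Collect space-separated words, skipping the rest of a line after a lone backslash."""
    out = []
    pos = 0
    while pos < len(source):
        if source[pos:pos + 1] == b" ":
            pos += 1
            continue
        line_end = min(next_line(pos), len(source))
        end = pos
        while end < line_end and source[end:end + 1] != b" ":
            end += 1
        word = source[pos:end]
        if word == b"\\":
            pos = line_end          # comment: jump straight to the next line
        else:
            out.append(word)
            pos = end
    return out

src = pad("1 2 add \\ sum") + pad("\\ whole line comment") + pad("neg")
print(words(src))
```

The jump is a single OR and add on the position, with no scan for a newline character.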

(4.) Making a convention that syntax highlighting with color only has a line of context. Which translates into don't let things wrap. Easy.

Star Wars Battle Pod Arcade Review : Save Your Tokens for Air Hockey

Went to the Cary, North Carolina Dave & Buster's to try out the Star Wars Battle Pod a few days ago, after posting someone's YouTube review ages ago. The arcade experience in the US has certainly changed since I was a youth. Nearly everything I loved as a kid is gone, with the exception of some classic physical games like air hockey, skee ball, pool, etc. The Battle Pod is a great example of how the spirit of the arcade is getting lost. Starting with the screen: it's a spherical projection screen where Dave & Buster's had the awesome idea of keeping their card reader illuminated so strongly during gameplay that the screen black level was practically white: nearly impossible to see what was going on. That might have actually been a blessing, because whoever wrote the spherical projection code apparently figured out how to do something worse than bilinear filtering: it looks horrible. What is left is relatively low resolution, which would be fine if properly filtered, except in this case the aliasing is so bad I kept on getting the feeling that the only point to the card reader was to hand out a refund to pay for the player's eye pain. It gets better: the game hitches, doesn't even feel like 30 Hz, let alone the 60 Hz which sets the minimum bar for frame rate in a real arcade game. The classic arcades were defined by perceptually zero latency input designed to take a beating, with locked frame rates at the highest rate possible on the display hardware, and stunning visuals pushing the limits of the hardware. Someone badly needs to bring that experience back...