
chenfeng — Mon, 21 Apr 2025 05:41:41 +0000

Linux’s efficient and robust memory management is a cornerstone of its success. While users rarely think about it, a lot of complex work happens behind the scenes to ensure that processes get the memory they need and don’t interfere with each other. This post explores how Linux uses the x86 architecture’s memory addressing capabilities, focusing on logical, linear, and physical addresses, segmentation, and paging.

The Foundation: Logical, Linear, and Physical Addresses

At its core, memory addressing means translating a logical address into a physical address. The 80x86 architecture, particularly in protected mode, introduces a crucial intermediary: the linear address.

Think of it as a three-step process:

Logical Address: This is the address a programmer uses in code. It consists of a segment and an offset; in Linux’s flat memory model the programmer effectively sees only the 32-bit offset.
Segmentation: The 80x86 architecture traditionally uses segmentation. The segment defines a base address and the offset specifies the distance from that base; adding the two yields the linear address. Modern Linux largely avoids segmentation for most processes because of its complexity and limitations.
Paging: The linear address is then translated into a physical address through paging. The Memory Management Unit (MMU) uses page tables to map linear addresses to physical memory locations.

Why Segmentation Isn’t the Star in Linux

Historically, segmentation was integral to the 80x86 architecture. However, Linux prefers paging for several reasons:

Simplicity: Memory management is simpler when all processes share the same set of linear addresses. Paging can still map that same linear address space onto different physical addresses for each process.

Portability: RISC architectures often have limited support for segmentation, and Linux is designed to be portable across a wide range of architectures.

Introducing Segment Descriptors

Segment descriptors are 8-byte data structures that describe the characteristics of each segment in memory. Each segment (code, data, stack, etc.) has a corresponding segment descriptor. Segment descriptors are stored in the Global Descriptor Table (GDT) or in a Local Descriptor Table (LDT).

Segmentation Unit

The segmentation unit is the component of the 80x86 processor responsible for translating a logical address into a linear address.

Conversion steps:

Examines the TI field of the Segment Selector: This field determines whether the Segment Descriptor is stored in the GDT or in the LDT.

Computes the address of the Segment Descriptor: The index field of the Segment Selector is multiplied by 8 (the size of a descriptor) and added to the table address held in gdtr or ldtr.
Adds the offset to the Base field: Finally, it adds the offset of the logical address to the Base field of the Segment Descriptor to obtain the linear address.
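
The conversion steps can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not kernel code: the SegmentDescriptor class, the dictionary-based gdt and ldt tables, and the function names are hypothetical, and privilege checks are left out. The selector layout (13-bit index, 1-bit TI, 2-bit RPL) is the standard 80x86 one.

```python
from dataclasses import dataclass


@dataclass
class SegmentDescriptor:
    base: int   # 32-bit base linear address of the segment
    limit: int  # highest valid offset (simplified: always in bytes)


def split_selector(selector: int):
    """Split a 16-bit segment selector into (index, ti, rpl)."""
    index = selector >> 3      # bits 15..3: index into the GDT or LDT
    ti = (selector >> 2) & 1   # bit 2: 0 selects the GDT, 1 selects the LDT
    rpl = selector & 0b11      # bits 1..0: requested privilege level
    return index, ti, rpl


def logical_to_linear(selector: int, offset: int, gdt: dict, ldt: dict) -> int:
    """Translate (selector, offset) into a linear address."""
    index, ti, _rpl = split_selector(selector)
    table = ldt if ti else gdt         # step 1: pick the table from TI
    descriptor = table[index]          # step 2: locate the descriptor
    if offset > descriptor.limit:
        raise ValueError("offset beyond segment limit")
    return (descriptor.base + offset) & 0xFFFFFFFF  # step 3: base + offset


# Hypothetical tables: entry 15 is a flat 4 GB segment, reached by selector 0x7b.
gdt = {15: SegmentDescriptor(base=0, limit=0xFFFFFFFF)}
ldt = {}
linear = logical_to_linear(0x7B, 0x08048000, gdt, ldt)
```

Because the segment base is 0, the linear address equals the offset, which is why a flat segment makes segmentation effectively invisible.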

Global Descriptor Table (GDT):

Purpose: The GDT is a central table containing segment descriptors that define the memory segments accessible by the system. In a multiprocessor system, each CPU has its own GDT.

Storage: Stored in the cpu_gdt_table array, with addresses and sizes in cpu_gdt_descr.

ENTRY(cpu_gdt_table)
	.quad 0x0000000000000000	/* NULL descriptor */
	.quad 0x0000000000000000	/* 0x0b reserved */
	.quad 0x0000000000000000	/* 0x13 reserved */
	.quad 0x0000000000000000	/* 0x1b reserved */
	.quad 0x0000000000000000	/* 0x20 unused */
	.quad 0x0000000000000000	/* 0x28 unused */
	.quad 0x0000000000000000	/* 0x33 TLS entry 1 */
	.quad 0x0000000000000000	/* 0x3b TLS entry 2 */
	.quad 0x0000000000000000	/* 0x43 TLS entry 3 */
	.quad 0x0000000000000000	/* 0x4b reserved */
	.quad 0x0000000000000000	/* 0x53 reserved */
	.quad 0x0000000000000000	/* 0x5b reserved */

	.quad 0x00cf9a000000ffff	/* 0x60 kernel 4GB code at 0x00000000 */
	.quad 0x00cf92000000ffff	/* 0x68 kernel 4GB data at 0x00000000 */
	.quad 0x00cffa000000ffff	/* 0x73 user 4GB code at 0x00000000 */
	.quad 0x00cff2000000ffff	/* 0x7b user 4GB data at 0x00000000 */

	.quad 0x0000000000000000	/* 0x80 TSS descriptor */
	.quad 0x0000000000000000	/* 0x88 LDT descriptor */

	/* Segments used for calling PnP BIOS */
	.quad 0x00c09a0000000000	/* 0x90 32-bit code */
	.quad 0x00809a0000000000	/* 0x98 16-bit code */
	.quad 0x0080920000000000	/* 0xa0 16-bit data */
	.quad 0x0080920000000000	/* 0xa8 16-bit data */
	.quad 0x0080920000000000	/* 0xb0 16-bit data */
	/*
	 * The APM segments have byte granularity and their bases
	 * and limits are set at run time.
	 */
	.quad 0x00409a0000000000	/* 0xb8 APM CS    code */
	.quad 0x00009a0000000000	/* 0xc0 APM CS 16 code (16 bit) */
	.quad 0x0040920000000000	/* 0xc8 APM DS    data */

	.quad 0x0000000000000000	/* 0xd0 - unused */
	.quad 0x0000000000000000	/* 0xd8 - unused */
	.quad 0x0000000000000000	/* 0xe0 - unused */
	.quad 0x0000000000000000	/* 0xe8 - unused */
	.quad 0x0000000000000000	/* 0xf0 - unused */
	.quad 0x0000000000000000	/* 0xf8 - GDT entry 31: double-fault TSS */

Layout: Contains 18 segment descriptors and 14 null, unused, or reserved entries. The unused entries are placed deliberately so that descriptors usually accessed together share the same hardware cache line.

Segments Defined:

Four user and kernel code and data segments
A Task State Segment (TSS), unique for each processor
Three Thread-Local Storage (TLS) segments
Three segments related to Advanced Power Management (APM)
Five segments related to Plug and Play (PnP) BIOS services

Access: The CPU uses the gdtr register to locate the GDT.
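
To make the .quad values in the table less opaque, here is a small Python sketch that unpacks a 64-bit descriptor into its fields. The function name and the returned dictionary keys are made up for this illustration; the bit positions follow the standard 80x86 segment descriptor layout.

```python
def decode_descriptor(quad: int) -> dict:
    """Unpack the main fields of an 8-byte 80x86 segment descriptor."""
    # The limit is split into bits 0..15 and bits 48..51.
    limit = (quad & 0xFFFF) | (((quad >> 48) & 0xF) << 16)
    # The base is split into bits 16..39 and bits 56..63.
    base = ((quad >> 16) & 0xFFFFFF) | (((quad >> 56) & 0xFF) << 24)
    granular = (quad >> 55) & 1  # G flag: limit counted in 4 KB units
    size = (limit + 1) * (4096 if granular else 1)
    return {
        "base": base,
        "limit": limit,
        "type": (quad >> 40) & 0xF,   # segment type (code/data, access bits)
        "dpl": (quad >> 45) & 0x3,    # descriptor privilege level
        "present": (quad >> 47) & 1,
        "granularity_4k": granular,
        "size_bytes": size,
    }


kernel_code = decode_descriptor(0x00CF9A000000FFFF)  # entry at 0x60
user_data = decode_descriptor(0x00CFF2000000FFFF)    # entry at 0x7b
```

Decoding the four code and data entries this way shows that they all start at base 0 and span 4 GB, and that they differ only in the type and DPL fields.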

Local Descriptor Table (LDT)

Purpose: An LDT is a per-process table of segment descriptors. It allows processes to define their own custom segments. Most Linux user-mode applications don’t use LDTs, so the kernel defines a default LDT that most processes share.

Custom LDTs: Processes can create their own custom LDTs using the modify_ldt() system call.

GDT update: When a process starts using a custom LDT, the LDT entry in the CPU-specific GDT is updated accordingly.

Storage: The address and size of the current LDT are held in the ldtr register.

Introducing Memory Paging

Core Concepts: Paging and Page Tables

Paging: Linux uses paging to manage memory efficiently. Instead of being handed out in contiguous blocks, RAM is divided into fixed-size units called page frames (typically 4KB). Processes don’t access physical memory locations directly; they use linear addresses, and paging translates these linear addresses into physical addresses.

Page Tables: The translation from linear to physical address is done by Page Tables. These are data structures stored in RAM that map linear addresses to physical page frames.

Regular Paging

The 32 bits of a linear address are divided into three fields:

Directory (10 bits)
Table (10 bits)
Offset (12 bits)

The cr3 control register stores the physical address of the Page Directory in use. The Directory field of the linear address selects the Page Directory entry that points to the proper Page Table. The Table field selects the Page Table entry that holds the physical address of the page frame. Finally, the Offset field gives the position inside that page frame, which yields the physical address.
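
The field split and the two-step lookup can be sketched in Python. The nested dictionaries stand in for the Page Directory and Page Table, and the function names are chosen only for this example; real entries also carry flag bits that are ignored here.

```python
def split_linear_address(addr: int):
    """Split a 32-bit linear address into (directory, table, offset)."""
    directory = (addr >> 22) & 0x3FF  # top 10 bits
    table = (addr >> 12) & 0x3FF      # middle 10 bits
    offset = addr & 0xFFF             # low 12 bits
    return directory, table, offset


def translate(addr: int, page_directory: dict) -> int:
    """Walk a toy two-level table: directory -> page table -> page frame."""
    directory, table, offset = split_linear_address(addr)
    page_table = page_directory[directory]  # entry chosen by the Directory field
    frame_base = page_table[table]          # entry chosen by the Table field
    return frame_base + offset              # position inside the page frame


# Hypothetical mapping: linear page 0xC0101000 -> physical frame 0x00101000.
page_directory = {768: {257: 0x00101000}}
fields = split_linear_address(0xC0101234)
physical = translate(0xC0101234, page_directory)
```

With 10 + 10 + 12 bits, each Page Directory and Page Table holds 1024 entries and each page frame is 4096 bytes.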

Paging In Linux

Linux adopts a common paging model that fits both 32-bit and 64-bit architectures.

The structure of Page Tables:

Page Global Directory (PGD): This is the top-level table. It contains pointers (addresses) to the PUDs; think of it as the “root” of the paging tree. Each process has its own PGD. The kernel carefully manages the PGDs and ensures that the correct one is loaded on a process switch. The cr3 control register holds the physical address of the current process’s PGD.

Page Upper Directory (PUD): Not always present, especially on 32-bit systems. It contains pointers to the Page Middle Directories.
Page Middle Directory (PMD): It contains pointers to the Page Tables.
Page Table (PT): This is the bottom level. Each entry in a Page Table points to a specific page frame in physical memory.

Process Page Table (PPT)

These are sets of page tables maintained for each process running on the system. They map the process’s linear address to a physical memory address.

When a process is in User Mode, it uses linear addresses below 0xc0000000.

When a process is in Kernel Mode, it uses linear addresses greater than or equal to 0xc0000000.
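
A quick Python check shows where the 0xc0000000 boundary falls in the Page Global Directory. The constants below mirror the kernel names PAGE_OFFSET, PGDIR_SHIFT, and PTRS_PER_PGD, but their values assume a 32-bit x86 system without PAE, and the Python functions are only an illustration of the arithmetic.

```python
PAGE_OFFSET = 0xC0000000  # start of the kernel part of the address space
PGDIR_SHIFT = 22          # assumption: 32-bit x86 without PAE
PTRS_PER_PGD = 1024       # assumption: 1024 entries in the PGD


def pgd_index(addr: int) -> int:
    """Index of the PGD entry that covers a linear address."""
    return (addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1)


def is_kernel_address(addr: int) -> bool:
    """True for linear addresses used in Kernel Mode."""
    return addr >= PAGE_OFFSET


first_kernel_entry = pgd_index(PAGE_OFFSET)
last_user_entry = pgd_index(PAGE_OFFSET - 1)
```

Under these assumptions the first 768 PGD entries (0 to 767) cover user addresses and the remaining 256 entries cover kernel addresses.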

Kernel Page Table (KPT)

The kernel maintains its own set of page tables, rooted at a master kernel Page Global Directory. This master PGD and its associated tables are not used directly by any process or kernel thread.

The Key Differences & Relationship (PPT and KPT)

Process-specific vs. kernel-wide: Process page tables are unique to each process, while the kernel page tables are a central, shared resource.

Template/Reference: The kernel page tables act as a template or reference for setting up the kernel portion of each process’s page tables.
Dynamic updates: Changes to the kernel page tables are propagated to the process page tables, which keeps them consistent.

Summary

Core Concept: The Linux kernel uses a hierarchical page table system to translate the linear (virtual) addresses used by processes into physical addresses in RAM. This allows for memory protection, virtual memory, and efficient memory management.

Page Global Directory (PGD)
The top-level structure in the paging hierarchy. It contains entries that point to Page Upper Directories (PUDs).
Each process has its own PGD; the kernel has a master PGD.
- pgd_index(addr): Calculates the index of the PGD entry for a given linear address
- pgd_offset(mm, addr): Calculates the linear address of the PGD entry for a given address and memory descriptor
- pgd_offset_k(addr): Calculates the linear address of the kernel PGD entry
- pgd_page(pgd): Gets the address of the page frame containing the PGD
Page Upper Directory (PUD)
Points to Page Middle Directories (PMDs).
Each PGD entry points to a PUD.
- pud_offset(pgd, addr): Calculates the linear address of the PUD entry for a given address
- pud_page(pud): Gets the address of the page frame containing the PUD
Page Middle Directory (PMD)
Points to Page Tables (PTs).
Each PUD entry points to a PMD.
- pmd_index(addr): Calculates the index of the PMD entry for a given address
- pmd_offset(pud, addr): Calculates the linear address of the PMD entry for a given address
- pmd_page(pmd): Gets the address of the page frame containing the PMD
Page Table (PT)
The lowest level. Contains entries that directly map linear addresses to physical addresses (page frames).
Each PMD entry points to a PT.
- pte_offset_map(dir, addr): Calculates the linear address of the PT entry for a given address
- pte_page(x): Gets the page descriptor address of the page referenced by the PT entry
- pte_to_pgoff(pte): Extracts the file offset stored in a PTE for a page that belongs to a nonlinear file mapping

Additional Important Concepts:

Page Descriptor: A data structure associated with each physical page frame. It contains information about the page’s status, access rights, and other metadata.
CR3 Register: A CPU register that holds the physical address of the current process’s PGD. Switching processes involves updating CR3.
TLB (Translation Lookaside Buffer): A cache that stores recent translations from linear to physical addresses to speed up memory access.

Paging Levels:

32-bit system without PAE: two levels (PGD -> PT)
32-bit system with PAE (Physical Address Extension): three levels (PGD -> PMD -> PT)
64-bit system: four levels (PGD -> PUD -> PMD -> PT)

The Paging Process (simplified)

CPU requests memory: The CPU generates a linear address to access memory.
MMU lookup: The MMU checks the TLB for a translation of that linear address.
- TLB Hit: The MMU immediately retrieves the corresponding physical address.
- TLB Miss: The MMU must walk the page tables. It uses the bits of the linear address to index into the PGD, then the PUD, then the PMD, and finally the PT, to find the physical address. This is a slower process. The translation is then stored in the TLB for future use.
Memory Access: The MMU provides the physical address to the memory controller, which accesses the data in RAM.
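
The hit and miss paths can be modeled with a toy Python class. TinyMMU is a made-up name, the page table is flattened into a single dictionary that stands in for the multi-level walk, and nothing here models TLB capacity, eviction, or flushing.

```python
PAGE_SHIFT = 12
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


class TinyMMU:
    """Toy model of a TLB sitting in front of a page table walk."""

    def __init__(self, page_table: dict):
        self.page_table = page_table  # virtual page number -> frame number
        self.tlb = {}                 # cached translations
        self.hits = 0
        self.misses = 0

    def translate(self, addr: int) -> int:
        vpn, offset = addr >> PAGE_SHIFT, addr & OFFSET_MASK
        if vpn in self.tlb:
            self.hits += 1                # TLB hit: no walk needed
            frame = self.tlb[vpn]
        else:
            self.misses += 1              # TLB miss: walk the page tables
            frame = self.page_table[vpn]  # stands in for PGD -> PUD -> PMD -> PT
            self.tlb[vpn] = frame         # cache the translation for next time
        return (frame << PAGE_SHIFT) | offset


# Hypothetical mapping of one virtual page to one page frame.
mmu = TinyMMU({0x08048: 0x00123})
first = mmu.translate(0x08048010)   # misses, then fills the TLB
second = mmu.translate(0x08048FFC)  # same page, so this one hits
```

Two accesses to the same page cost one walk and one cached lookup, which is the locality effect that makes the TLB worthwhile.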

In essence, the page tables form a tree-like structure that allows the kernel to map a process’s linear address space to the physical memory, providing memory management and protection.

chenfeng — Mon, 26 Feb 2024 05:01:03 +0000

When I used the Robomaster Development Board Type C as my embedded microcontroller system under Linux, I ran into a problem: I didn’t know how to flash the board, and everything I found on the web about flashing was done under Windows. So I investigated on my own and learned something about STM32, ST-Link, and J-Link. Here I’ll share how to set up the Robomaster dev board development environment under Linux.

Prepare

ST-Link / J-Link (I used the ST-Link v2 mini by Waveshare and the J-Link EDU mini)

STlink-v2-mini

Jlink-edu-mini

Robomaster Development Board (Type A / Type C)

Type-A Board

Type-C Board

SWD cable link (4pin)

A Linux host (a Raspberry Pi running arm64 works too, but the ST-Link v2 mini does not work on arm64)

Pins Connect Figure

The figure above shows how to connect the development board to the ST-Link/J-Link flasher. For more information, refer to the Robomaster Development Board Type C User Manual.

Using the ST-Link v2 mini is very simple, but it only works under Windows and Linux AMD64, so I will only describe how to use the J-Link with the dev board, since J-Link also supports macOS and Linux arm64.

Jlink Edu Mini Connect

The J-Link EDU mini uses a 9-pin JTAG connector

On the mini board, pin 1 is VTref. You need a 9-pin ribbon cable to connect the J-Link EDU mini because the pin pitch is 1.27mm. You also need an SWD (2*5 1.27mm) cable breakout board to connect to the Robomaster dev board.

Now connect the pins between the J-Link mini and the dev board, as shown in the figure below.

J-Link 9-pin cable connected to the dev board SWD interface

For example, I used a Raspberry Pi as my Linux host, with the connections shown below.

Flash the dev Board

Download and install STM32CubeMX for Linux
Download and install OpenOCD

Download and install gcc-arm-none-eabi-10.3-2021.10-aarch64-linux.tar.bz2
Download and install the J-Link software for arm64
git clone https://github.com/RoboMaster/Development-Board-C-Examples.git
Open the example project’s ioc file in STM32CubeMX
In the Project Manager tab, change Toolchain/IDE to Makefile, then click GENERATE CODE
Run make in the example project directory you want; the ELF binary is written to the build directory

Download program to robomaster dev board

In this section, I flash the ELF file of the 1.light_led example to the dev board using the openocd command.

Jlink openocd config file (jlink.cfg)

source [find interface/jlink.cfg]
transport select swd
source [find target/stm32f4x.cfg]

program build/light_led.elf verify reset exit

STLink openocd config file (stlink.cfg)

source [find interface/stlink.cfg]
source [find target/stm32f4x.cfg]
program build/light_led.elf verify reset exit

Flash command line:

# Use STLink
openocd -f ./stlink.cfg

# Use Jlink
openocd -f ./jlink.cfg

To check that it works, change the main loop in Src/main.c so that only the red LED lights up.

  while (1)
  {
    /* USER CODE END WHILE */

    /* USER CODE BEGIN 3 */
        //set GPIO output high level
        HAL_GPIO_WritePin(LED_R_GPIO_Port, LED_R_Pin, GPIO_PIN_SET);
        HAL_GPIO_WritePin(LED_G_GPIO_Port, LED_G_Pin, GPIO_PIN_RESET);
        HAL_GPIO_WritePin(LED_B_GPIO_Port, LED_B_Pin, GPIO_PIN_RESET);
  }
  /* USER CODE END 3 */
}

Now the red LED is lit.

chenfeng — Mon, 02 Oct 2023 13:55:33 +0000

Transformer

Last year, when OpenAI released ChatGPT, the large language model (LLM) made waves in the AI community. Why does ChatGPT work so well? Can we understand the internals of GPT?

A deep neural network is like a black box: we can observe its results, but we cannot easily understand how data is processed inside it. For example, when designing an MLP, does it need 5 layers, 6, or more to give more accurate answers? There is no theory that settles this, because deep neural networks are hard to explain. So we turn to ChatGPT and dive into its internals to understand how it operates.

The Transformer is the basic architecture of GPT, so first things first: we will use the language of mathematics to understand the Transformer model.

The Transformer model has four parts:

Embedding
Encoder
Decoder
Softmax

We will use the Wolfram Language and mathematics to construct these four parts.

EmbeddingBlock

Tokenizer

When we feed a text sequence to the transformer network, we first need to convert the text strings into a numeric array. How can language text be converted into a numeric array? The answer is subword tokenization (the “SubWordTokens” method). Suppose we have a large dictionary that contains all the basic tokens from which words are built; we call this dictionary the vocabulary. If the vocabulary size is N_v, a text sequence can be represented as a set of points in the N_v-dimensional vocabulary space. Each vocabulary element is assigned a unique index:

[N_v] = \left\{1, 2, \ldots, N_v\right\}

A piece of text is represented as a sequence of indices, called token IDs, corresponding to its subwords. The vocabulary contains three special tokens:

mask token (used in masked language modeling)
bos token (marks the beginning of a sequence)
eos token (marks the end of a sequence)

For example:

Wolfram

vocabulary = NetExtract[ResourceFunction["GPTTokenizer"][], "Tokens"];
voca = NetExtract[ResourceFunction["GPTTokenizer"][], "Tokens"];
(* tokenizer encoder: convert text strings to tokenIDs *)
netTextEncoder = ResourceFunction["GPTTokenizer"][]

(* tokenizer decoder: convert tokenIDs to string sequence*)
netTextDecoder = 
 NetDecoder[{"Class", 
   NetExtract[ResourceFunction["GPTTokenizer"][], "Tokens"]}]

seq = netTextEncoder["what is transformer language model"]
(* represent sequence tokenIDs *)
(* output is {10920, 319, 47386, 3304, 2747}*)

(* convert tokenIDs to sequence *)

(* map tokenIDs to vector of vocabulary space *)
tokenids = Map[UnitVector[Length@voca, #] &, seq] 

StringReplace[StringJoin@Map[netTextDecoder[#] &, tokenids], 
 "Ġ" -> " "]
(* output is "what is transformer language model" *)
(* Ġ represent whitespace *)

Now we can represent string sequences as token IDs, which are easier to feed into our neural network.

Embedding

The embedding layer converts an input sequence of tokens into a sequence of embedding vectors, which we call the context.

V represents the vocabulary
[N_v] represents the set of vocabulary indices
e represents a token embedding
W represents a matrix
W(All, i) represents column i of the matrix
W(i, All) represents row i of the matrix
t represents the index of a token in a sequence
ℓ represents the length of the token sequence

Token Embedding Algorithm:

\text{Input: } v \in V \cong [N_v] \text{, a token ID} \\ \text{Output: } e \in \mathbb{R}^{d_e} \text{, the vector representation of the token} \\ \text{Parameters: } W_e \in \mathbb{R}^{d_e \times N_v} \text{, the token embedding matrix} \\ \text{Return: } e = W_e(\text{All}, v)
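
The token embedding algorithm is just a column lookup. As a language-neutral illustration outside the Wolfram code, here is a minimal pure-Python sketch; the tiny matrix values are invented for the example, and token IDs are taken to be 1-based to match the index set in the text.

```python
def token_embedding(W_e, v):
    """Return column v of W_e, where W_e has d_e rows and N_v columns.

    v is a 1-based token ID, so column v lives at Python index v - 1.
    """
    return [row[v - 1] for row in W_e]


# Made-up embedding matrix with d_e = 2 and N_v = 4.
W_e = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
]
e = token_embedding(W_e, 3)  # embedding of token ID 3
```

In a trained model the entries of W_e are learned parameters, which is what EmbeddingLayer provides in the Wolfram code.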

Because the transformer model uses only feed-forward layers, with no recurrence, the model by itself cannot know the position of a token. We therefore need to feed token position information into the network; this is called positional embedding.

Hard-coded Positional Embedding Algorithm:

W_p : \mathbb{N} \rightarrow \mathbb{R}^{d_e} \text{, defined by:} \\ W_p[2i - 1, t] = \sin\left(\frac{t}{\ell_{\max}^{2i/d_e}}\right) \\ W_p[2i, t] = \cos\left(\frac{t}{\ell_{\max}^{2i/d_e}}\right) \\ 0 < i \leq \frac{d_e}{2} \\ \text{For the } t\text{-th token, the embedding is:} \\ e = W_e(\text{All}, x(t)) + W_p(\text{All}, t)
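
The sinusoidal formula can be evaluated directly. The following pure-Python sketch computes one column of W_p for position t; the function name is chosen for this example, and the value 10000 for the maximum length is an assumption that matches the constant used in the Wolfram code of this post.

```python
import math


def positional_embedding(t, d_e, l_max=10000):
    """Column t of the hard-coded positional matrix W_p (d_e must be even)."""
    vec = [0.0] * d_e
    for i in range(1, d_e // 2 + 1):
        angle = t / (l_max ** (2 * i / d_e))
        vec[2 * i - 2] = math.sin(angle)  # row 2i - 1 in the 1-based formula
        vec[2 * i - 1] = math.cos(angle)  # row 2i in the 1-based formula
    return vec


p0 = positional_embedding(0, 4)  # position 0: every sine is 0, every cosine is 1
p1 = positional_embedding(1, 4)
```

Adding such a vector to the token embedding gives each position a distinct, fixed signature without any trainable parameters.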

Wolfram

(* learned positional embeding *)
embedding[embedDim_, vocabulary_] := 
 NetInitialize@NetGraph@FunctionLayer[
    (* learned position embeding  *)
    Block[{emb1, emb2, posembed},
      emb1 = EmbeddingLayer[embedDim][#Input];
      posembed = SequenceIndicesLayer[embedDim][#Input];
      emb2 = EmbeddingLayer[embedDim][posembed];
      emb1 + emb2 ] &,
    "Input" -> {"Varying", NetEncoder[{"Class", vocabulary}]}]

Net graph of token embedding plus trainable positional embedding (context dimension 128):

For example, with the embedding dimension set to 64:

Wolfram

ArrayPlot[
 embedding[64, voca][
  StringSplit["what is transformer language model"]]]

The sequence “what is transformer language model” is represented by a numeric array in which every row represents one word. The black, white, and gray squares show the values of the embedding output, whose dimensions are (5, 64).

Wolfram

(* hard code positional embeding *)
coeffsPositionalEncoding[embedDim_] := 
  NetMapOperator[
   LinearLayer[embedDim/2,
    "Weights" -> 
     List /@ Table[1./10000.^(2 i/embedDim ), {i, embedDim/2}],
    "Biases" -> None, "Input" -> "Integer",
    LearningRateMultipliers -> 0
    ]
   ];

embeddingSinusoidal[embedDim_, vocabulary_,dropout_ : 0.1] := NetInitialize@NetGraph[
   <|
    "sequenceLength" -> SequenceIndicesLayer[],
    "coeffs" -> coeffsPositionalEncoding[embedDim],
    "sin" -> ElementwiseLayer[Sin],
    "cos" -> ElementwiseLayer[Cos],
    "catenate" -> CatenateLayer[2],
    "+" -> ThreadingLayer[Plus],
    "dropout" -> DropoutLayer[dropout],
    "embeddingTokenID" -> 
     EmbeddingLayer[embedDim, 
      "Input" -> {"Varying",  NetEncoder[{"Class", vocabulary}]}]
    |>,
   {
    NetPort["Input"] -> 
     "sequenceLength" ->  
      "coeffs" -> {"sin", "cos"} -> "catenate" ->  "+" -> "dropout",
    NetPort["Input"] -> "embeddingTokenID" -> "+"}
   ]

Learned Positional Embedding Algorithm:

\text{Input: } \ell \in [\ell_{\max}] \text{, the position of a token in the sequence} \\ \text{Output: } e_p \in \mathbb{R}^{d_e} \text{, the vector representation of the position} \\ \text{Parameters: } W_p \in \mathbb{R}^{d_e \times \ell_{\max}} \text{, the positional embedding matrix} \\ \text{Return: } e_p = W_p(\text{All}, \ell)

Net graph of token embedding plus hard-coded positional embedding (context dimension 128):

Wolfram

ArrayPlot[
 embeddingSinusoidal[64, voca][
  StringSplit["what is transformer language model"]]]

We can see that the rows are easy to tell apart, just as with the learned embedding.

Now we combine all of these functions into embeddingBlock:

Wolfram

Options[embeddingBlock] = {"depth" -> None, "voca" -> None, 
   "hardCode" -> False};
embeddingBlock[OptionsPattern[]] := Block[
  {embedding, embeddingSinusoidal, posencoding, 
   embedDim = OptionValue["depth"], 
   vocabulary = OptionValue["voca"],
   posWeightHardCode = OptionValue["hardCode"]},
  
  posencoding[embedDim_] := NetMapOperator[
    LinearLayer[embedDim/2,
     "Weights" -> 
      List /@ Table[1./10000.^(2 i/embedDim ), {i, embedDim/2}],
     "Biases" -> None, "Input" -> "Integer",
     LearningRateMultipliers -> 0
     ]
    ];
  
  embedding[embedDim_, vocabulary_] := 
   NetInitialize@NetGraph@FunctionLayer[
      (* learned position embeding  *)
      Block[{emb1, emb2, posembed, add, dropout},
        emb1 = EmbeddingLayer[embedDim][#Input];
        posembed = SequenceIndicesLayer[embedDim][#Input];
        emb2 = EmbeddingLayer[embedDim][posembed];
        add = emb1 + emb2;
        dropout = DropoutLayer[0.1][add]] &,
      "Input" -> {"Varying", NetEncoder[{"Class", vocabulary}]}];
  
  embeddingSinusoidal[embedDim_, vocabulary_,  dropout_ : 0.1] := 
   NetInitialize@NetGraph[
     <|
      "sequenceLength" -> SequenceIndicesLayer[],
      "coeffs" -> posencoding[embedDim],
      "sin" -> ElementwiseLayer[Sin],
      "cos" -> ElementwiseLayer[Cos],
      "catenate" -> CatenateLayer[2],
      "+" -> ThreadingLayer[Plus],
      "dropout" -> DropoutLayer[dropout],
      "embeddingTokenID" -> 
       EmbeddingLayer[embedDim, 
        "Input" -> {"Varying",  NetEncoder[{"Class", vocabulary}]}]
      |>,
     {
      NetPort["Input"] -> 
       "sequenceLength" ->  
        "coeffs" -> {"sin", "cos"} -> 
          "catenate" ->  "+" -> "dropout",
      NetPort["Input"] -> "embeddingTokenID" -> "+"}
     ];
  If[posWeightHardCode,
   embeddingSinusoidal[embedDim, vocabulary],
   embedding[embedDim, vocabulary]]
  ]
  
embeddingBlock["depth" -> 128, "voca" -> voca, "hardCode" -> False]

Embedding Block

EncoderBlock

Figure-1

SelfAttention Unit

Attention is the main architectural component of the transformer. It enables a neural network to make use of context information when predicting the current token.

Basic Single Query Attention Algorithm:

\text{Input: } e \in \mathbb{R}^{d_{\text{in}}} \text{, vector representation of the current token (Q in figure-2)} \\ \text{Input: } e_t \in \mathbb{R}^{d_{\text{in}}} \text{, vector representations of the context tokens (K, V in figure-2)} \\ \text{Output: } \tilde{v} \in \mathbb{R}^{d_{\text{out}}} \text{, vector representation of the token and context combined} \\ \text{Parameters: } W_q, W_k \in \mathbb{R}^{d_{\text{atten}} \times d_{\text{in}}} \text{, } b_q, b_k \in \mathbb{R}^{d_{\text{atten}}} \text{, the query and key linear projections} \\ \text{Parameters: } W_v \in \mathbb{R}^{d_{\text{out}} \times d_{\text{in}}} \text{, } b_v \in \mathbb{R}^{d_{\text{out}}} \text{, the value linear projection}

Figure-2

Attention Pseudo Code: (Unmasked SelfAttention)

q \leftarrow W_q e + b_q \\ \forall t: k_t \leftarrow W_k e_t + b_k \\ \forall t: v_t \leftarrow W_v e_t + b_v \\ \forall t: \alpha_t = \frac{\exp\left(q^T k_t / \sqrt{d_{\text{atten}}}\right)}{\sum_u \exp\left(q^T k_u / \sqrt{d_{\text{atten}}}\right)} \\ \tilde{v} = \sum_{t=1}^T \alpha_t v_t \\ \text{In matrix form: } \text{Attention}(Q, K, V) = \text{softmax}\left(\frac{Q K^T}{\sqrt{d_k}}\right) V

Q, K, and V come from linear projections: Q maps the current token to a query vector, K maps each context token to a key vector, and V maps each context token to a value vector.
The product of Q and K is interpreted as the degree to which token t is important for predicting the current token q.
The softmax output is then combined with V by matrix multiplication.
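
To make the single-query computation concrete, here is a small pure-Python sketch, written outside the Wolfram code purely for illustration. It assumes each weight matrix is stored as a list of rows with shape (output dimension, input dimension), so that a projection is a matrix-vector product; all function names and the toy inputs are invented for this example.

```python
import math


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def single_query_attention(e, context, W_q, b_q, W_k, b_k, W_v, b_v):
    """Return (v_tilde, alpha) for one query token and a list of context tokens."""
    q = add(matvec(W_q, e), b_q)
    keys = [add(matvec(W_k, e_t), b_k) for e_t in context]
    values = [add(matvec(W_v, e_t), b_v) for e_t in context]
    scale = math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
    alpha = softmax(scores)  # attention weights over the context
    d_out = len(values[0])
    v_tilde = [sum(a * v[j] for a, v in zip(alpha, values)) for j in range(d_out)]
    return v_tilde, alpha


# Toy setup: identity projections, zero biases, two context tokens.
identity = [[1.0, 0.0], [0.0, 1.0]]
zero = [0.0, 0.0]
v_tilde, alpha = single_query_attention(
    [1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]],
    identity, zero, identity, zero, identity, zero,
)
```

In this toy setup the query matches the first context token, so the first attention weight is the larger one and the output leans toward that token's value vector.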

Masked SelfAttention Algorithm:

\text{Input: } X \in \mathbb{R}^{d_x \times \ell_x}, Z \in \mathbb{R}^{d_z \times \ell_z} \text{, vector representations of the primary and context sequences} \\ \text{Output: } \tilde{V} \in \mathbb{R}^{d_{\text{out}} \times \ell_x} \text{, updated representations of the tokens in X, folding in information from the tokens in Z} \\ \text{Parameters: } W'_{\text{qkv}} \text{ consisting of:} \\ \quad \quad | \; W_q \in \mathbb{R}^{d_{\text{atten}} \times d_x} \text{, } b_q \in \mathbb{R}^{d_{\text{atten}}} \\ \quad \quad | \; W_k \in \mathbb{R}^{d_{\text{atten}} \times d_z} \text{, } b_k \in \mathbb{R}^{d_{\text{atten}}} \\ \quad \quad | \; W_v \in \mathbb{R}^{d_{\text{out}} \times d_z} \text{, } b_v \in \mathbb{R}^{d_{\text{out}}} \\ \text{Hyperparameters: Mask} \in \{0,1\}^{\ell_z \times \ell_x} \\ \text{Mask}[t_z, t_x] = 1 \text{ for bidirectional attention} \\ \text{Mask}[t_z, t_x] = 0 \text{ for unidirectional attention when } t_z > t_x \text{, and } 1 \text{ otherwise}

Masked SelfAttention pseudo code:

Q \leftarrow W_q X + b_q 1^T \quad [[\text{Query} \in \mathbb{R}^{d_{\text{atten}} \times \ell_x}]] \\ K \leftarrow W_k Z + b_k 1^T \quad [[\text{Key} \in \mathbb{R}^{d_{\text{atten}} \times \ell_z}]] \\ V \leftarrow W_v Z + b_v 1^T \quad [[\text{Value} \in \mathbb{R}^{d_{\text{out}} \times \ell_z}]] \\ S \leftarrow K^T Q \quad [[\text{Score} \in \mathbb{R}^{\ell_z \times \ell_x}]] \\ \forall t_z, t_x: \text{ if not Mask}[t_z, t_x] \text{ then } S[t_z, t_x] \leftarrow -\infty \\ \tilde{V} = V \cdot \text{softmax}\left(S / \sqrt{d_{\text{atten}}}\right)
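
The effect of the mask is easiest to see on the attention weights themselves. The pure-Python sketch below is an illustration outside the Wolfram code; it assumes the score matrix is indexed as S[t_z][t_x] and that the softmax runs over the context index t_z for each token t_x. The function names are made up for this example.

```python
import math


def causal_mask(n):
    """mask[t_z][t_x] is 1 when context token t_z is visible to token t_x."""
    return [[1 if t_z <= t_x else 0 for t_x in range(n)] for t_z in range(n)]


def masked_softmax_columns(S, mask):
    """Mask the scores, then apply softmax over t_z for every column t_x."""
    n_z, n_x = len(S), len(S[0])
    out = [[0.0] * n_x for _ in range(n_z)]
    for t_x in range(n_x):
        col = [S[t_z][t_x] if mask[t_z][t_x] else -math.inf for t_z in range(n_z)]
        m = max(col)  # finite, because each token can always see itself
        exps = [math.exp(s - m) for s in col]  # exp(-inf) evaluates to 0.0
        total = sum(exps)
        for t_z in range(n_z):
            out[t_z][t_x] = exps[t_z] / total
    return out


# With equal scores, each token spreads its attention evenly over the
# tokens it is allowed to see (itself and everything before it).
scores = [[0.0] * 3 for _ in range(3)]
weights = masked_softmax_columns(scores, causal_mask(3))
```

Setting a score to minus infinity before the softmax is what turns the corresponding weight into exactly zero, so later tokens contribute nothing to earlier positions.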

Types of self-attention:

Unmasked SelfAttention
- Attention is applied to each token, and the context contains all tokens of the sequence.
Masked SelfAttention
- Attention is applied to each token, but the context contains only the preceding tokens. A causal mask ensures that each position only has access to the positions that come before it. This version can be used for online prediction.
Cross Attention
- Often used in sequence-to-sequence tasks. Given two sequences of token representations X and Z, Z is used as the context sequence and the mask is set to 1. The output has the same length as the input X, while the length of Z may differ from the length of X.

\text{MultiHead}(Q, K, V) = \text{Concat}\left(\text{head}_1, \text{head}_2, \ldots, \text{head}_n\right) W^O

Multi head self attention Algorithm

the structure of MHA is same as Basic Single Query Attention, but it has multiple attention heads, with separate learnable parameters. (figure-2)

\text{Input} \text{: X $\in $ } \mathbb{R}^{d_x * \ell _x} \text{, Z $\in $ } \mathbb{R}^{d_z * \ell _z} \text {vector representation of primary and context sequence} \\ \text{output} \text{: } \tilde{v} \text{$\in $ } \mathbb{R}^{d_{\text{out}} * \ell _x}\text {updated representation of tokens in X, folding in information from tokens in Z} \\ \text {Hyperparameters: H, number of attention heads} \\ \text{HyperParameters: } \text{Mask $\in $ } \{0,1\}^{\ell _x * \ell _z} \\ \text{Parameters} \text{: } W' \text{consisting} \text{of}: \\ \text{For h $\in $ [H], } \left(W'\right)_{\text{qkv}}^h \text{ consisting of :} \\ \text{$\quad \quad |$ } W_q^h \text{$\in $ } \mathbb{R}^{d_{\text{atten}}* d_x} \text{, } b_q^h \text{ $\in $ } \mathbb{R}^{d_{\text{atten}}} \\ \text{$\quad \quad |$ } W_k^h \text{$\in $ } \mathbb{R}^{d_{\text{atten}}* d_z} \text{, } b_k^h \text{ $\in $ } \mathbb{R}^{d_{\text{atten}}} \\ \text{$\quad \quad |$ } W_v^h \text{$\in $ } \mathbb{R}^{d_{\text{mid}}*d_z} \text{, } b_v^h \text{ $\in $ } \mathbb{R}^{d_{\text{mid}}} \\ W_O \text{ $\in $ } \mathbb{R}^{d_{\text{out}} *\text{Hd}_{\text{mid}}} \text{, } b_O \text{$\in $ } \mathbb{R}^{d_{\text{out}}}

Multi head self attention Pseudo Code

\text{For } h \in [H]: \\ \quad Y^h \leftarrow \text{Attention}\left(X, Z \mid \left(W'\right)_{\text{qkv}}^h, \text{Mask}\right) \\ Y \leftarrow \left[Y^1; Y^2; \ldots; Y^H\right] \\ \tilde{V} = W_O Y + b_O \mathbf{1}^T
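
The pseudocode can be traced on plain matrices to check that the shapes fit together. The following is a rough sketch with random, untrained parameters; the sizes, the helper names newHead and head, and the choice to apply the output projection as W_O . Y plus a bias on every column (which is what the declared shapes of W_O and Y allow) are assumptions made for illustration.

```wolfram
(* Shapes follow the parameter list above: one column per token. *)
dx = 6; dz = 6; dAtten = 4; dMid = 4; dOut = 6; nHeads = 2; lx = 3; lz = 5;
SeedRandom[2];
xs = RandomReal[{-1, 1}, {dx, lx}];   (* primary sequence X *)
zs = RandomReal[{-1, 1}, {dz, lz}];   (* context sequence Z *)
mask = ConstantArray[1, {lx, lz}];    (* 1 = may attend *)

(* random parameters for a single head *)
newHead[] := <|
   "Wq" -> RandomReal[{-1, 1}, {dAtten, dx}], "bq" -> RandomReal[{-1, 1}, dAtten],
   "Wk" -> RandomReal[{-1, 1}, {dAtten, dz}], "bk" -> RandomReal[{-1, 1}, dAtten],
   "Wv" -> RandomReal[{-1, 1}, {dMid, dz}], "bv" -> RandomReal[{-1, 1}, dMid]|>;

head[x_, z_, p_, m_] := Module[{q, k, v, w},
  (* matrix + vector adds the bias entry i to row i, i.e. to every column *)
  q = p["Wq"] . x + p["bq"];
  k = p["Wk"] . z + p["bk"];
  v = p["Wv"] . z + p["bv"];
  w = m*Exp[Transpose[q] . k/Sqrt[dAtten]];  (* lx x lz masked scores *)
  w = w/Total[w, {2}];                       (* softmax over context positions *)
  v . Transpose[w]                           (* dMid x lx *)
];

params = Table[newHead[], nHeads];
yStack = Join @@ Table[head[xs, zs, p, mask], {p, params}];  (* (H dMid) x lx *)

wO = RandomReal[{-1, 1}, {dOut, nHeads*dMid}];
bO = RandomReal[{-1, 1}, dOut];
vTilde = wO . yStack + bO;

Dimensions[vTilde]   (* {6, 3}, i.e. dOut x lx *)
```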

Wolfram

selfAttentionBlock[embedDim_, heads_, masking_ : None] := 
 NetInitialize[
  NetGraph[
   FunctionLayer[
    Block[{keys, queries, values, seq, attention, merge},
      (* pre layer normalization*)
      seq = 
       NormalizationLayer[2 ;;, "Same", 
         "Epsilon" -> 0.0001 ][#Input];
      keys = NetMapOperator[{heads, embedDim/heads}][seq];
      queries = NetMapOperator[{heads, embedDim/heads}][seq];
      values = NetMapOperator[{heads, embedDim/heads}][seq];
      attention = 
       AttentionLayer["Dot", "MultiHead" -> True, "Mask" -> masking, 
         "ScoreRescaling" -> "DimensionSqrt"][<|"Key" -> keys, 
         "Query" -> queries, "Value" -> values|>];
      merge = NetMapOperator[embedDim][attention];
      seq = DropoutLayer[0.1][merge];
      seq = seq + #Input
      ] &, "Input" -> {"Varying", embedDim}
    ]
   ]
  ]

figure-3

If the model’s embedding dimension is 128 and we use 8 attention heads, then the keys, queries, and values each have dimension n * 8 * 16 (n tokens, 8 heads, 16 features per head). In selfAttentionBlock, a linear layer (NetMapOperator[embedDim]) merges all attention heads back into dimension n * 128.

Feed Forward Network

Wolfram

feedForwardBlock[embedDim_] := NetInitialize[
  NetGraph[
   FunctionLayer[
    Block[{seq},
      seq = 
       NormalizationLayer[2 ;;, "Same", "Epsilon" -> 0.0001][#Input];
      seq = NetMapOperator[4 embedDim][seq];
      seq = ElementwiseLayer["GELU"][seq];
      seq = NetMapOperator[embedDim][seq];
      seq = DropoutLayer[0.1][seq];
      seq = seq + #Input
      ] &,
    "Input" -> {"Varying", embedDim}
    ]
   ]
  ]
  
Information[feedForwardBlock[128], "SummaryGraphic"]

The feed forward network contains two linear layers. The first has output dimension n * 512 and is followed by a “GELU” activation; the second maps the data back to output dimension n * 128. Notice that both the attention block and the feed forward block use the pre layer normalization arrangement; the decoder network does the same.
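
The arithmetic of this block can be sketched on a single token vector with plain list operations. This is an illustration under simplifying assumptions, not the layers used above: the weights are random, dropout is left out, the normalization has no learned scale or offset, and the names layerNorm, gelu, and feedForward are made up for the example.

```wolfram
(* Toy pre layer normalization feed forward block on one token vector. *)
layerNorm[v_, eps_ : 0.0001] := (v - Mean[v])/Sqrt[Mean[(v - Mean[v])^2] + eps];
gelu[t_] := t/2 (1 + Erf[t/Sqrt[2]]);   (* Erf threads over lists *)

embedDim = 8;
SeedRandom[3];
w1 = RandomReal[{-1, 1}, {4 embedDim, embedDim}]; b1 = ConstantArray[0., 4 embedDim];
w2 = RandomReal[{-1, 1}, {embedDim, 4 embedDim}]; b2 = ConstantArray[0., embedDim];

feedForward[v_] := Module[{h},
  h = layerNorm[v];    (* normalise first: pre layer normalization *)
  h = w1 . h + b1;     (* expand to 4 * embedDim *)
  h = gelu[h];         (* nonlinearity *)
  h = w2 . h + b2;     (* project back to embedDim *)
  v + h                (* residual connection *)
];

token = RandomReal[{-1, 1}, embedDim];
Length[feedForward[token]]   (* 8, the same as embedDim *)
```

In the post layer normalization arrangement the normalization would instead be applied after the residual sum, i.e. layerNorm[v + h].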

Pre Layer Normalization

Post Layer Normalization

Decoder

The decoder network’s components are the same as the encoder’s, but the attention layer is replaced by a masked attention layer and a cross attention layer. (figure-1)

Wolfram

crossAttentionBlock[embedDim_, heads_] := NetInitialize[
  NetGraph[
   FunctionLayer[
    Block[{keys, queries, values, seq},
      seq = 
       NormalizationLayer[2 ;;, "Same", 
         "Epsilon" -> 0.0001 ][#Input];
      keys = NetMapOperator[{heads, embedDim/heads}][seq];
      values = NetMapOperator[{heads, embedDim/heads}][seq];
      queries = NetMapOperator[{heads, embedDim/heads}][ #Query];
      seq = 
       AttentionLayer["Dot", "MultiHead" -> True, 
         "ScoreRescaling" -> "DimensionSqrt"][<|"Key" -> keys, 
         "Query" -> queries, "Value" -> values|>];
      seq = NetMapOperator[embedDim][seq];
      seq = DropoutLayer[0.1][seq];
      seq = #Query + seq
      ] &,
    "Input" -> {"Varying", embedDim},
    "Query" -> {"Varying", embedDim}
    ]
   ]
  ]
Information[crossAttentionBlock[128, 8], "SummaryGraphic"]

Cross Attention Layer

The Key and Value inputs of the cross attention layer come from the output of the encoder stack; the Query input comes from the output of the decoder’s masked attention layer.

Encoder and Decoder Stack

Now that we have encoder and decoder blocks, we can stack them into a larger neural network.

Wolfram

encoderStack[embedDim_, heads_, blocks_Integer, vocabulary_] := 
 Block[
  {emBlock, encoderBlock},
  emBlock = 
   embeddingBlock["depth" -> embedDim, "voca" -> vocabulary, 
    "hardCode" -> False];
  encoderBlock = 
   NetChain[{selfAttentionBlock[embedDim, heads], 
     feedForwardBlock[embedDim]}];
  NetInitialize@NetGraph@FunctionLayer[
     Block[{embedding, block},
       embedding = emBlock[#Input];
       block = encoderBlock[embedding];
       Do[block = encoderBlock[block], {blocks - 1}];
       block
       ] &,
     "Input" -> {"Varying", NetEncoder[{"Class", vocabulary}]}
     ]
  ]
encoderStack[128, 8, 1, voca]

1 * encoder stack

6 * encoder stack

Wolfram

decoderStack[embedDim_, heads_, blocks_Integer, vocabulary_] := 
 Block[
  {emBlock, decoderBlock},
  emBlock = 
   embeddingBlock["depth" -> embedDim, "voca" -> vocabulary, 
    "hardCode" -> False];
  decoderBlock = NetFlatten@NetGraph[
     <|"maskedSelfAtt" -> 
       selfAttentionBlock[embedDim, heads, "Causal"],
      "crossatt" -> crossAttentionBlock[embedDim, heads],
      "FFN" -> feedForwardBlock[embedDim]|>,
     {NetPort["Input"] -> NetPort["maskedSelfAtt", "Input"],
      "maskedSelfAtt" -> NetPort["crossatt", "Query"],
      NetPort["EncoderInput"] -> NetPort["crossatt", "Input"],
      "crossatt" -> "FFN"}
     ];
  NetInitialize@NetGraph@FunctionLayer[
     Block[{emb, block, linear, softmax},
       emb = emBlock[#Input];
       block = decoderBlock[emb];
       Do[block = decoderBlock[block], {blocks - 1}];
       linear = LinearLayer[] /@ block;
       softmax = SoftmaxLayer[] /@ linear
       ] &,
     "Input" -> {"Varying", NetEncoder[{"Class", vocabulary}]},
     "Output" -> {"Varying", NetDecoder[{"Class", vocabulary}]}
     ]
  ] 
decoderStack[128, 8, 1, voca]

1 * decoder stack

6 * decoder blocks

Finally, we merge the encoder stack and the decoder stack into the complete transformer model:

Training Transformer Model

We will train a sequence-to-sequence transformer model to translate from English to French. So how do we train a seq2seq transformer model?

Training Algorithm:

\text{Input: } \left\{x_n,y_n\right\} \text{, a dataset of sequence pairs; the dataset size is } N_{\text{data}} \\ \text{Input: } \theta \text{, initial transformer parameters} \\ \text{Output: } \tilde{\theta} \text{, the trained parameters} \\ \text{for } i = 1, 2, \ldots, N_{\text{epochs}} \text{ do} \\ \quad \text{for } n = 1, 2, \ldots, N_{\text{data}} \text{ do} \\ \quad\quad \ell \leftarrow \text{length}\left(y_n\right) \\ \quad\quad P(\theta) \leftarrow \text{TransformerNet}\left(x_n, y_n \mid \theta\right) \\ \quad\quad \text{loss}(\theta) = -\sum_{t=1}^{\ell-1} \log P(\theta)\left[y_n(t+1), t\right] \\ \quad\quad \theta \leftarrow \theta - \eta \cdot \nabla \text{loss}(\theta) \\ \quad \text{end for} \\ \text{end for} \\ \text{return } \tilde{\theta} = \theta

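The algorithm can be explained with the diagram below: Source Input is X, Target Input is Y with its last token dropped, and Labels is Y with its first token dropped, so each label is the token that follows the corresponding target input position. [start] and [end] represent BOS (beginning of sequence) and EOS (end of sequence).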
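
The shifting of the target sequence and the summed log loss can be seen on a tiny made-up example. The token ids 11, 22, 33 and the probabilities in pCorrect are invented for illustration; only the use of 50257 for both BOS and EOS comes from this article.

```wolfram
bos = 50257; eos = 50257;                    (* this article uses 50257 for both *)
target = Join[{bos}, {11, 22, 33}, {eos}];   (* 11, 22, 33 are made-up token ids *)

decoderInput = Most[target]   (* {50257, 11, 22, 33} *)
labels = Rest[target]         (* {11, 22, 33, 50257} *)

(* made-up probabilities the model assigns to the correct label at each position *)
pCorrect = {0.5, 0.25, 0.5, 0.25};

(* loss = -sum of log probabilities of the correct next tokens *)
loss = -Total[Log[pCorrect]]   (* equals Log[64.], about 4.159 *)
```

At position t the decoder sees decoderInput up to t and is scored on labels[[t]], which is why the two lists have the same length.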

Prepare Datasets:

Wolfram

(* Dataset *)
(* please download the database from https://www.manythings.org/anki/ *)
dataFilePath = "~/CineNeural/notebooks/Datasets/fra-eng/fra.txt"
text = Import[dataFilePath];
sentencePairs = Rule @@@ Part[StringSplit[StringSplit[text, "\n"], "\t"], All, {1, 2}];
$randomSeed = 1357;
SeedRandom[$randomSeed];
trainingSet = RandomSample[sentencePairs];

netEncoder = ResourceFunction["GPTTokenizer"][];
netDecoder = NetDecoder[{"Class", NetExtract[ResourceFunction["GPTTokenizer"][], "Tokens"]}]

trainingTokens = MapAt[netEncoder, trainingSet, {All, {1, 2}}];
(* in the training Tokens we use 50257 tokenid as bos and eos *)
trainingTokens = MapAt[Join[{50257}, #1, {50257}] &, trainingTokens, {All, {1, 2}}];

x = Keys[trainingTokens]; (* source input *)
y = Values[trainingTokens]; (* target input *)

create training network:

Wolfram

encoder = encoderStack[128, 8, 6, voca];
decoder = decoderStack[128, 8, 6, voca];
transformerNet = 
 NetGraph@
  FunctionLayer[
   Block[{encode, decode, most, rest}, most = Most[#TargetSequence];
     rest = Rest[#TargetSequence]; 
     encode = encoder[#SourceSequence];
     decode = decoder[<|"Input" -> most, "EncoderInput" -> encode|>];
     CrossEntropyLossLayer["Index"][{decode, rest}]] &, 
   "SourceSequence" -> NetEncoder[{"Class", voca}], 
   "TargetSequence" -> NetEncoder[{"Class", voca}]]

Training Model:

Wolfram

result =
 NetTrain[
  transformerNet, <|"SourceSequence" -> x, "TargetSequence" -> y|>,
  All, MaxTrainingRounds -> 4, ValidationSet -> Scaled[0.1],
  BatchSize -> 16,
  TargetDevice -> {"GPU", All}]

(*save trained model*)
net = result["TrainedNet"]
Export["~/transformer-128depth.wlnet", net]

Inference Transformer Trained Model

Prediction algorithm for the trained seq2seq model:

\text{Input: a seq2seq transformer and its trained parameters } \tilde{\theta} \\ \text{Input: } x \in V^* \text{, input sequence} \\ \text{Output: } \tilde{x} \in V^* \text{, output sequence} \\ \text{Hyperparameters: } \tau \in (0, \infty) \\ \tilde{x} \leftarrow [\text{bos\_token}] \\ y \leftarrow 0 \\ \text{while } y \neq \text{eos\_token} \text{ do} \\ \quad P \leftarrow \text{TransformerNet}\left(x, \tilde{x} \mid \tilde{\theta}\right) \\ \quad p \leftarrow P\left[\text{All}, \text{length}\left(\tilde{x}\right)\right] \\ \quad \text{sample a token } y \text{ from } q \propto p^{1/\tau} \\ \quad \tilde{x} \leftarrow \left[\tilde{x}, y\right] \\ \text{end} \\ \text{return } \tilde{x}
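
The sampling step with temperature τ can be sketched on its own. The helper name sampleToken and the three-token probability vector are assumptions for illustration; in the algorithm above p would be the model's distribution over the whole vocabulary.

```wolfram
(* Sample one token index from probabilities p with temperature tau: q ∝ p^(1/tau). *)
sampleToken[p_, tau_] := Module[{q},
  q = p^(1/tau);
  q = q/Total[q];                       (* renormalise so q sums to 1 *)
  RandomChoice[q -> Range[Length[p]]]   (* weighted random choice of an index *)
];

probs = {0.7, 0.2, 0.1};   (* made-up next-token probabilities *)
SeedRandom[4];
Table[sampleToken[probs, 1.0], 10]   (* tau = 1: samples follow probs *)
Table[sampleToken[probs, 0.1], 10]   (* small tau: almost always index 1 *)
```

As τ approaches 0 the sampling approaches always picking the most probable token, which appears to be what the translate function below does when it decodes each step with a "Class" NetDecoder.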

Wolfram

trainedNet = 
 Import["/Users/alexchen/CineNeural/neural-models/transformer/\
transformer-v2-128depth.wlnet"]

trainedEncodeNet = NetExtract[trainedNet, "encode"]
trainedDecodeNet = NetExtract[trainedNet, "decode"]
predictor = 
 NetReplacePart[
  trainedDecodeNet, {"Output" -> NetDecoder[{"Class", voca}]}]


translate[sourceSentence_String] := 
 Module[{sourceSequence, translationSequence, translationTokens, 
   tokenEncoder, tokenDecoder},
   
  sourceSequence = 
   trainedEncodeNet[
    Join[{50257}, netEncoder[sourceSentence], {50257}]]; (* add bos and eos for source sequence *)
    
  tokenEncoder = NetEncoder[{"Class", voca}];
  tokenDecoder = 
   NetDecoder[{"Class", 
     NetExtract[ResourceFunction["GPTTokenizer"][], "Tokens"]}];
     
  translationSequence = NestWhile[
    Append[
      #,
      tokenEncoder[Last[predictor[<|
          "Input" -> #,
          "EncoderInput" -> sourceSequence|>]]]] &,
    {50257},
    (* check last token of the sequence, if not eos token, then append to sequence for next prediction. *)
    If[Length@# >= 2, Last[#] != 50257, True] &,
    1, 512];
  translationTokens = 
   Map[tokenDecoder[UnitVector[Length@voca, #]] &, 
    translationSequence];
    
  StringReplace[StringJoin[Cases[translationTokens, _String]], 
   "Ġ" -> " "]
  ]

translate["thank you"] (* returns French: Merci. *)

Transformer Data Flow

Reference

Attention Is All You Need

Tensorflow transformer tutorials

wolfram-use-transformer-neural-nets

The Illustrated Transformer

The Annotated Transformer

The Encoder-Decoder Transformer Neural Network Architecture – Wolfram Research

Formal Algorithms for Transformers – DeepMind

PyTorch transformer tutorials

Natural Language Processing with Transformers

explainable-ai-for-transformers

2015-08-Understanding-LSTMs

cheatsheet-recurrent-neural-networks

Paper

https://arxiv.org/pdf/2311.01906.pdf

chenfeng — Wed, 30 Aug 2023 15:12:30 +0000

What is Mecanum wheel

The Mecanum wheel is an omnidirectional wheel design for land-based vehicles, invented by the Swedish engineer Bengt Erland Ilon.

Robomaster EP Mecanum wheels

A series of free-moving rollers is attached around the whole circumference of the wheel. The rollers are typically mounted at 45 degrees to the axle line and rotate freely about their own axes, while the overall side profile of the wheel remains circular. How does the Mecanum wheel drive a mobile robot?

First, we define the wheel numbering:

FrontRightWheel as w1
FrontLeftWheel as w2
RearLeftWheel as w3
RearRightWheel as w4

How the vehicle moves:

Running all four wheels in the same direction at the same speed moves the vehicle forward or backward.
Running both wheels on one side in one direction and both wheels on the other side in the opposite direction rotates the vehicle in place.
Running w1 backward, w2 forward, w3 backward, and w4 forward moves the vehicle sideways to the right.
Running w2 and w4 forward with the other wheels stopped moves the vehicle diagonally toward the top right.
Running w2 and w3 forward with the other wheels stopped rotates the vehicle around a point on the x-axis.
Running w2 forward and w1 backward with the other wheels stopped rotates the vehicle around a point on the y-axis.

Question:

How do the wheel speeds map to the mobile robot’s velocity? Suppose we know the robot’s velocity is:

\left\{v_x,v_y,\omega _z\right\} \\ v_x, v_y \text{ - } \text{robot linear} \text{ velocity} \\ \omega _z \text{ - } \text{robot angular} \text{ velocity}

What angular velocity does each wheel need?

To solve this mapping problem, we need a kinematic model.

Kinematic model

Inertial basis frame A

Define

XY – world: the inertial frame, called frame A
XY – robot: the robot’s base frame, called frame C
XY – wheel: the wheel’s base frame, called frame B; the wheel’s center point is at (x_i, y_i) in frame C
The robot’s position is (x, y) in frame A, and its orientation angle is 𝜙
v-x, v-y are the components of the wheel’s linear velocity, v-slide is the sliding speed, and v-drive is the speed in the drive direction.
𝛾 is the angle between the free sliding direction and the y axis of frame B.
𝛽 is the angle between frame B (wheel) and frame C (robot).
𝜔 – the wheel’s angular velocity.
r – the wheel’s radius.
v-c is the robot’s velocity in frame C, v-a is the robot’s velocity in frame A.
𝜔i – the angular velocity of the robot’s i-th wheel
x, y – the distances from the robot’s geometric center to the wheels’ geometric centers along the x and y axes.

v_{\text{drive}}=v_x+\tan (\gamma ) v_y \\ v_{\text{slide}}=\frac{v_y}{\cos (\gamma )}

\left( \begin{array}{c} v_x \\ v_y \\ \end{array} \right)=\left( \begin{array}{c} 1 \\ 0 \\ \end{array} \right) v_{\text{drive}}+v_{\text{slide}} \left( \begin{array}{c} -\sin (\gamma ) \\ \cos (\gamma ) \\ \end{array} \right)

\omega =\frac{v_{\text{drive}}}{r}=\frac{v_x+\tan (\gamma ) v_y}{r}

Robot’s base frame, Cartesian coordinate system, Frame C

v_a=\left( \begin{array}{c} \dot{\phi } \\ \dot{x} \\ \dot{y} \\ \end{array} \right)=\left( \begin{array}{ccc} 1 & 0 & 0 \\ 0 & \cos (\phi ) & -\sin (\phi ) \\ 0 & \sin (\phi ) & \cos (\phi ) \\ \end{array} \right).\left( \begin{array}{c} \omega _{\text{cz}} \\ v_{\text{cx}} \\ v_{\text{cy}} \\ \end{array} \right)

v_c=\left( \begin{array}{c} \omega _{\text{cz}} \\ v_{\text{cx}} \\ v_{\text{cy}} \\ \end{array} \right)=\left( \begin{array}{ccc} 1 & 0 & 0 \\ 0 & \cos (\phi ) & \sin (\phi ) \\ 0 & -\sin (\phi ) & \cos (\phi ) \\ \end{array} \right).\left( \begin{array}{c} \frac{d\phi }{dt} \\ \frac{dx}{dt} \\ \frac{dy}{dt} \\ \end{array} \right)=\left( \begin{array}{ccc} 1 & 0 & 0 \\ 0 & \cos (\phi ) & \sin (\phi ) \\ 0 & -\sin (\phi ) & \cos (\phi ) \\ \end{array} \right).\left( \begin{array}{c} \dot{\phi } \\ \dot{x} \\ \dot{y} \\ \end{array} \right)
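
As a quick sanity check of the two rotation matrices above, the following sketch verifies that they are inverses of each other and applies one of them to a simple case. The function names toRobot and toWorld are illustrative only.

```wolfram
(* velocity vectors are ordered {angular, x, y}, as in the equations above *)
toWorld[phi_] := {{1, 0, 0}, {0, Cos[phi], -Sin[phi]}, {0, Sin[phi], Cos[phi]}};
toRobot[phi_] := {{1, 0, 0}, {0, Cos[phi], Sin[phi]}, {0, -Sin[phi], Cos[phi]}};

(* converting to the robot frame and back changes nothing *)
Simplify[toRobot[phi] . toWorld[phi]]   (* {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}} *)

(* a robot turned by 90 degrees that moves along world +y moves along its own +x *)
toRobot[Pi/2] . {0, 0, 1}   (* {0, 1, 0} *)
```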

Inverse Kinematic

We can decompose the velocity in Frame A (world) into Frame C (robot), and then into Frame B (wheel), which gives:

Transform Matrix

Transform v-a to v-c

T_1=\left( \begin{array}{ccc} 1 & 0 & 0 \\ 0 & \cos (\phi ) & \sin (\phi ) \\ 0 & -\sin (\phi ) & \cos (\phi ) \\ \end{array} \right).\left( \begin{array}{c} \dot{\phi } \\ \dot{x} \\ \dot{y} \\ \end{array} \right)

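Transform v-c to v-x, v-y in wheel frame B

Because the robot can perform both translational and rotational movements, the robot’s angular velocity also contributes to the linear velocity of each wheel. In the projection below, the sine and cosine factors are y_i and x_i divided by the distance from the robot’s center to the wheel’s center.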

T_1=\left\{\dot{\phi },\dot{x} \cos (\phi )+\dot{y} \sin (\phi ),\dot{y} \cos (\phi )-\dot{x} \sin (\phi )\right\}

\omega _{\text{robot}}=\dot{\phi } \\ v_{\text{xRobot}}=\dot{x} \cos (\phi )+\dot{y} \sin (\phi ) \\ v_{\text{yRobot}}=\dot{y} \cos (\phi )-\dot{x} \sin (\phi )

v_{\text{xwheel}}=v_{\text{xRobot}}-\sin (\beta ) \omega _{\text{robot}} \sqrt{x_i^2+y_i^2}=v_{\text{xRobot}}-\frac{y_i \omega _{\text{robot}} \sqrt{x_i^2+y_i^2}}{\sqrt{x_i^2+y_i^2}}=v_{\text{xRobot}}-\dot{\phi } y_i \\ v_{\text{ywheel}}=v_{\text{yRobot}}+\cos (\beta ) \omega _{\text{robot}} \sqrt{x_i^2+y_i^2}=v_{\text{yRobot}}+\frac{x_i \omega _{\text{robot}} \sqrt{x_i^2+y_i^2}}{\sqrt{x_i^2+y_i^2}}=v_{\text{yRobot}}+\dot{\phi } x_i

So we get the translation matrix from the robot frame to the wheel frame:

T_2=\left( \begin{array}{ccc} -y_i & 1 & 0 \\ x_i & 0 & 1 \\ \end{array} \right).T_1

Then we get the rotation matrix from the robot frame to the wheel frame:

T_3=\left( \begin{array}{cc} \cos \left(\beta _i\right) & \sin \left(\beta _i\right) \\ -\sin \left(\beta _i\right) & \cos \left(\beta _i\right) \\ \end{array} \right).T_2

Transform v-x, v-y to the wheel’s angular speed

T_4=\left( \begin{array}{cc} \frac{1}{r_i} & \frac{\tan \left(\gamma _i\right)}{r_i} \\ \end{array} \right).T_3

Finally, since each of T_1 to T_4 already contains the previous steps, the wheel’s angular velocity is T_4 written out in full:

\omega _i=T_4

\omega _i=\left( \begin{array}{cc} \frac{1}{r_i} & \frac{\tan \left(\gamma _i\right)}{r_i} \\ \end{array} \right).\left( \begin{array}{cc} \cos \left(\beta _i\right) & \sin \left(\beta _i\right) \\ -\sin \left(\beta _i\right) & \cos \left(\beta _i\right) \\ \end{array} \right).\left( \begin{array}{ccc} -y_i & 1 & 0 \\ x_i & 0 & 1 \\ \end{array} \right).\left( \begin{array}{ccc} 1 & 0 & 0 \\ 0 & \cos (\phi ) & \sin (\phi ) \\ 0 & -\sin (\phi ) & \cos (\phi ) \\ \end{array} \right).\left( \begin{array}{c} \dot{\phi } \\ \dot{x} \\ \dot{y} \\ \end{array} \right)

\omega _i=h_i(\phi ).\left( \begin{array}{c} \dot{\phi } \\ \dot{x} \\ \dot{y} \\ \end{array} \right)

h_i(\phi )=\left( \begin{array}{ccc} \frac{\sec \left(\gamma _i\right) \left(x_i \sin \left(\beta _i+\gamma _i\right)-y_i \cos \left(\beta _i+\gamma _i\right)\right)}{r_i} & \frac{\sec \left(\gamma _i\right) \cos \left(\beta _i+\gamma _i+\phi \right)}{r_i} & \frac{\sec \left(\gamma _i\right) \sin \left(\beta _i+\gamma _i+\phi \right)}{r_i} \\ \end{array} \right)

In particular, when 𝜙 is 0:

h_i(0)=\left( \begin{array}{ccc} \frac{\sec \left(\gamma _i\right) \left(x_i \sin \left(\beta _i+\gamma _i\right)-y_i \cos \left(\beta _i+\gamma _i\right)\right)}{r_i} & \frac{\sec \left(\gamma _i\right) \cos \left(\beta _i+\gamma _i\right)}{r_i} & \frac{\sec \left(\gamma _i\right) \sin \left(\beta _i+\gamma _i\right)}{r_i} \\ \end{array} \right)

Assuming our robot has four wheels, we get the H matrix:

H=\left( \begin{array}{c} h_1(\phi ) \\ h_2(\phi ) \\ h_3(\phi ) \\ h_4(\phi ) \\ \end{array} \right)

Now we consider the case where the angle 𝛽 between the robot’s frame and each wheel’s frame is 0, and set the values for W1, W2, W3, W4. Here “->” means substituting the value for the symbol in the formula.

W_1=\left\{\gamma _1\to \frac{\pi }{4},\beta _1\to 0,x_1\to x,y_1\to -y,r_1\to r\right\}\\ W_2=\left\{\gamma _2\to -\frac{\pi }{4},\beta _2\to 0,x_2\to x,y_2\to y,r_2\to r\right\} \\ W_3=\left\{\gamma _3\to \frac{\pi }{4},\beta _3\to 0,x_3\to -x,y_3\to y,r_3\to r\right\} \\ W_4=\left\{\gamma _4\to -\frac{\pi }{4},\beta _4\to 0,x_4\to -x,y_4\to -y,r_4\to r\right\}

Defining l = x + y, we get the inverse kinematics:

\left( \begin{array}{c} \omega _1 \\ \omega _2 \\ \omega _3 \\ \omega _4 \\ \end{array} \right)=\left( \begin{array}{c} \frac{v_{\text{xRobot}}+v_{\text{yRobot}}+(x+y) \omega _{\text{zRobot}}}{r} \\ \frac{v_{\text{xRobot}}-v_{\text{yRobot}}-(x+y) \omega _{\text{zRobot}}}{r} \\ \frac{v_{\text{xRobot}}+v_{\text{yRobot}}-(x+y) \omega _{\text{zRobot}}}{r} \\ \frac{v_{\text{xRobot}}-v_{\text{yRobot}}+(x+y) \omega _{\text{zRobot}}}{r} \\ \end{array} \right)=\frac{1}{r}.\left( \begin{array}{ccc} 1 & 1 & l \\ 1 & -1 & -l \\ 1 & 1 & -l \\ 1 & -1 & l \\ \end{array} \right).\left( \begin{array}{c} \dot{x} \\ \dot{y} \\ \dot{\phi } \\ \end{array} \right)
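
The substitution can be reproduced symbolically. The sketch below multiplies the four transforms for each wheel at 𝜙 = 0 and compares the result with the matrix obtained above; hRow, wheels, and hMatrix are names chosen for this example, and the columns are ordered as the angular rate followed by the x and y rates, the same order as in h_i.

```wolfram
ClearAll[x, y, r, l];

(* row vector h_i(phi): the product of the four transforms derived above *)
hRow[gamma_, beta_, xi_, yi_, ri_, phi_] := First[
   {{1/ri, Tan[gamma]/ri}} .
    {{Cos[beta], Sin[beta]}, {-Sin[beta], Cos[beta]}} .
    {{-yi, 1, 0}, {xi, 0, 1}} .
    {{1, 0, 0}, {0, Cos[phi], Sin[phi]}, {0, -Sin[phi], Cos[phi]}}];

(* {gamma, beta, x_i, y_i} for W1..W4, as listed above *)
wheels = {{Pi/4, 0, x, -y}, {-Pi/4, 0, x, y}, {Pi/4, 0, -x, y}, {-Pi/4, 0, -x, -y}};

(* H(0), with y replaced by l - x so that x + y shows up as l *)
hMatrix = Simplify[
   Table[hRow[w[[1]], w[[2]], w[[3]], w[[4]], r, 0], {w, wheels}] /. y -> l - x];

(* difference from the expected matrix: all zeros if the derivation agrees *)
Simplify[hMatrix - {{l, 1, 1}, {-l, 1, -1}, {-l, 1, 1}, {l, 1, -1}}/r]
```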

Forward Kinematic

Since we know the inverse kinematics, the forward kinematics follow from a matrix inverse. Because T is a 4 × 3 matrix and not square, T^{-1} below denotes its left pseudo-inverse.

T=H(0)

\omega =T.\left( \begin{array}{c} \dot{\phi } \\ \dot{x} \\ \dot{y} \\ \end{array} \right)

T=\left( \begin{array}{ccc} \frac{l}{r} & \frac{1}{r} & \frac{1}{r} \\ -\frac{l}{r} & \frac{1}{r} & -\frac{1}{r} \\ -\frac{l}{r} & \frac{1}{r} & \frac{1}{r} \\ \frac{l}{r} & \frac{1}{r} & -\frac{1}{r} \\ \end{array} \right)=\frac{1}{r}.\left( \begin{array}{ccc} l & 1 & 1 \\ -l & 1 & -1 \\ -l & 1 & 1 \\ l & 1 & -1 \\ \end{array} \right)

T^{-1}.\omega =\left( \begin{array}{c} \dot{\phi } \\ \dot{x} \\ \dot{y} \\ \end{array} \right)

T^{-1}=\left(T^T.T\right)^{-1}.T^T

T^{-1}=\left( \begin{array}{cccc} \frac{r}{4 l} & -\frac{r}{4 l} & -\frac{r}{4 l} & \frac{r}{4 l} \\ \frac{r}{4} & \frac{r}{4} & \frac{r}{4} & \frac{r}{4} \\ \frac{r}{4} & -\frac{r}{4} & \frac{r}{4} & -\frac{r}{4} \\ \end{array} \right)

T^{-1}=\left( \begin{array}{cccc} \frac{r}{4 l} & -\frac{r}{4 l} & -\frac{r}{4 l} & \frac{r}{4 l} \\ \frac{r}{4} & \frac{r}{4} & \frac{r}{4} & \frac{r}{4} \\ \frac{r}{4} & -\frac{r}{4} & \frac{r}{4} & -\frac{r}{4} \\ \end{array} \right)=\frac{r}{4}.\left( \begin{array}{cccc} \frac{1}{l} & -\frac{1}{l} & -\frac{1}{l} & \frac{1}{l} \\ 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ \end{array} \right)

finally, we get the forward kinematic:

\left( \begin{array}{c} \dot{\phi } \\ \dot{x} \\ \dot{y} \\ \end{array} \right)=\frac{r}{4}.\left( \begin{array}{cccc} \frac{1}{l} & -\frac{1}{l} & -\frac{1}{l} & \frac{1}{l} \\ 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ \end{array} \right).\left( \begin{array}{c} \omega _1 \\ \omega _2 \\ \omega _3 \\ \omega _4 \\ \end{array} \right)
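
As a check, the pseudo-inverse can be computed directly and applied to some of the wheel patterns from the "How the vehicle moves" list. This is a sketch with r = 1 and l = 1 chosen arbitrarily; bodyVelocity is a name made up for the example.

```wolfram
ClearAll[r, l];
t = {{l, 1, 1}, {-l, 1, -1}, {-l, 1, 1}, {l, 1, -1}}/r;   (* T = H(0) *)

(* left pseudo-inverse (T^T . T)^-1 . T^T, as in the formula above *)
tInv = Simplify[Inverse[Transpose[t] . t] . Transpose[t]];

(* {angular rate, x rate, y rate} for wheel speeds {w1, w2, w3, w4} *)
bodyVelocity[wheelSpeeds_] := (tInv /. {r -> 1, l -> 1}) . wheelSpeeds;

bodyVelocity[{1, 1, 1, 1}]     (* {0, 1, 0}: straight ahead *)
bodyVelocity[{1, -1, -1, 1}]   (* {1, 0, 0}: rotation in place *)
bodyVelocity[{-1, 1, -1, 1}]   (* {0, 0, -1}: sideways along -y *)
bodyVelocity[{0, 1, 0, 1}]     (* {0, 1/2, -1/2}: diagonal *)
```

With this numbering and sign convention, a negative y rate corresponds to the movement described as "right" in the list.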

Conclusion

In this article, we derived the forward kinematics (FK) and inverse kinematics (IK) of Mecanum wheels. With the FK and IK, we can control a Mecanum-wheeled robot more precisely.

Robot’s Mecanum wheels parameters:

wheel id	wheels Name	𝛾	𝛽	X	Y	R
1	FrontRight (FR)	𝜋/4	0	x	-y	r
2	FrontLeft (FL)	-𝜋/4	0	x	y	r
3	RearLeft (RL)	𝜋/4	0	-x	y	r
4	RearRight (RR)	-𝜋/4	0	-x	-y	r

Forward Kinematic

\left( \begin{array}{c} \dot{\phi } \\ \dot{x} \\ \dot{y} \\ \end{array} \right)=\frac{r}{4}.\left( \begin{array}{cccc} \frac{1}{l} & -\frac{1}{l} & -\frac{1}{l} & \frac{1}{l} \\ 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ \end{array} \right).\left( \begin{array}{c} \omega _1 \\ \omega _2 \\ \omega _3 \\ \omega _4 \\ \end{array} \right)

Inverse Kinematic

\left( \begin{array}{c} \omega _1 \\ \omega _2 \\ \omega _3 \\ \omega _4 \\ \end{array} \right)=\frac{1}{r}.\left( \begin{array}{ccc} l & 1 & 1 \\ -l & 1 & -1 \\ -l & 1 & 1 \\ l & 1 & -1 \\ \end{array} \right).\left( \begin{array}{c} \dot{\phi } \\ \dot{x} \\ \dot{y} \\ \end{array} \right)

References

Modern Robotics Mechanics, Planning, and Control – Chapter: Wheeled Mobile Robots
Kinematic Model of a Four Mecanum Wheeled Mobile.

chenfeng — Fri, 14 Oct 2022 02:45:43 +0000

creating a sci-fi alleyway

Image Reference from GnomonWorkshop

Refining your idea
To have a clear path forward
- Inspiration
- Reference
- Blockout studies
- Photobash && Concept

To have a rough idea of what the final layout and camera angle will be
- Ref man
- Move fast and make big changes
Finishing the blockout
To detail out the final blockout and solidify the camera angle
- Add simple details
- Play with camera angles
Find the crack : to not get overwhelmed with our scene and start designing and creating some props
- Decide which prop to make
- Using Reference, design and create our props
- Consider functionality and efficiency

To go over useful steps for efficiently creating props
- Reference searching basics
- Reusing props and elements of props
- Replacing blockout shapes with props
Initial lighting set up
- Speeding up the rendering processes
- Initial things to consider
- Very basic render settings
Render pass and composite
Proper way to composite layers
Decals and paint adjustments
Glow, flares, DoF (depth of field), finishing touches

chenfeng — Wed, 12 Oct 2022 07:27:08 +0000

The Early Years

1902 – Georges Méliès films A Trip to the Moon, using actors in front of painted backdrops to create a fanciful journey.

1903 – Edwin S. Porter directs The Great Train Robbery. Porter creates some of the first matte composites by rewinding the film in camera.

1905 – Norman Dawn, commercial artist and photographer for the Thorpe Engraving Company, experiments with glass mattes on still photographs on the advice of his boss, Max Handschiegl.

1907 – With The Missions of California, Norman Dawn produces the first known example of the glass shot. Using the technique to “restore” damage caused by weather to the neglected missions, he places a glass with the painted corrections between the camera and existing buildings.

1912 – Edward Rogers produces what is possibly the first glass shot in England.

1913 – Norman Dawn employs one of the first known uses of “rear projection” by projecting a still film image on a frosted glass plate behind an actor during photography for his western, The Drifter.

1914 – Norman Dawn purchases the new Bell & Howell 2709 camera that is precise enough to do convincing multiple exposures. The camera helps Dawn to develop the original negative matte painting technique.

1916 – Walter Hall, the English art director of D. W. Griffith’s Intolerance, develops his own method of creating the glass shot. He paints the additions to the scene on composition board, cuts them out with a beveled edge, and mounts them in front of the camera. He patented this variation of the glass shot technique, known as “The Hall Process” in 1921.

1920s

1921 – Ferdinand Pinney Earle directs and paints mattes for The Rubaiyat of Omar Khayyam. Paul Detlefsen assists.

1922 – Walter Percy (“Pop”) Day introduces “The Hall Process” to the French film industry in Les Opprimés.

1925 – Warren Newcombe becomes head of the MGM matte department. Ralph Hammeras paints mattes for The Lost World. Ferdinand P. Earle paints mattes, which include a shooting star over Bethlehem, in Ben-Hur.

1927 – Clarence Slifer arrives in Hollywood to become an assistant cameraman after winning a contest in Screenland magazine. Still in Paris, Percy Day uses “The Hall Process” for Napoleon.

1928 – Linwood G. Dunn joins the visual effects department at RKO.

1929 – Bud Thackery and Paul Grimm paint glass shots of the ark, photographed at the Iverson Ranch for Noah’s Ark.

1930s

1930 – Percy Day develops his version of the latent image technique and applies it in Au Bonheur des Dames.

1933 – Mario and Juan Larrinaga, Byron L. Crabbe, and Henri Hillinck paint the ominous Skull Island and views of New York City for King Kong.

1934 – Returning to England, Percy Day and his assistant and stepson Peter Ellenshaw paint 八戒午夜伦理s for producer Alexander Korda. Day will head the visual effects departments at Denham Studio and later at the Shepperton Studio. Jack Cosgrove and Russell Lawson paint 八戒午夜伦理s for The Black Cat. They team up at the beginning of the 1930s, establishing headquarters at Universal, among other studios.

1935 – Director Alfred Hitchcock has illustrator Fortunino Matania create a 八戒午夜伦理八戒午夜伦理 for the trap sequence at the Royal Albert Hall in The Man Who Knew Too Much.

1936 – Clarence Slifer and Jack Cosgrove paint mattes for Garden of Allah, the first Technicolor film to use original negative matte paintings.
Jack Cosgrove becomes head of the Selznick International visual effects department.
Ray Kellogg becomes the chief matte painter at Twentieth Century Fox. Emil Kosa, Jr., is his assistant.
Percy Day and assistant Peter Ellenshaw paint mattes for Things to Come.

1937 – Albert Maxwell Simpson and Byron Crabbe paint mattes for The Prisoner of Zenda.

1939 – Jack Cosgrove supervises and paints mattes along with Albert Maxwell Simpson, Jack Shaw, and Fitch Fulton to create the establishing shots of Scarlett’s Tara and views of Atlanta under siege in Gone With the Wind. Clarence Slifer supervises matte camera effects and opticals.
Chesley Bonestell paints mattes for The Hunchback of Notre Dame and Only Angels Have Wings.
Fred Sersen supervises and paints mattes along with Ray Kellogg on The Rains Came.
Warren Newcombe and his department paint mattes for The Wizard of Oz, including one of the most famous matte shots, the Emerald City.

chenfeng — Wed, 12 Oct 2022 06:26:06 +0000

Titanic

Bram Stoker’s Dracula

Casino

“The invading army was the technical people who built the machines. At first we [artists] were all confused; traditional matte painting and digital was a head-on collision. There was lots of carnage. Then, eventually, the smoke cleared and it became clear what to do. What happened was artists who were afraid of the thing eventually said, ‘Step aside, let me take a look at that.’”
Robert Stromberg, digital matte painter

Although computer-generated effects had begun appearing in the 1980s, notably with ILM’s “Genesis Effect” of a barren planet becoming transformed into a garden world for Star Trek II, it was not until a decade later that digital technology became reliable and cost effective. The turning point was ILM’s creation of the realistic computer-generated dinosaurs for the 1993 release Jurassic Park.

Predictions of the time, which prophesied the end of all traditional visual effects, were greatly exaggerated. Makeup, creature costumes, animatronic effects, miniatures, and scale models all remain vital crafts, although every one of those disciplines has been changed by computer technology.

But the computer did have a sudden impact on other aspects of the craft. Almost overnight, optical printers were replaced because of the new freedom to scan images into a computer and seamlessly create final composites free of image degradation. And traditional matte painting was soon transformed, with digital paint programs allowing for new freedoms and, potentially, more complex shots.

But for the new breed of digital matte painter, the transition from brush and oils and canvas to software and computer monitors has not altered the irreducible essence at the heart of any creative equation: the inventive mind and talent of the individual artist.

Titanic (1997) (Origin)

Titanic (1997) (Composited)

When director James Cameron was making this film about the doomed 1912 maiden voyage of the Titanic, the production logistics included a nearly full-scale re-creation of the luxury ship and a special studio, built in the Mexican seaside town of Rosarito, that included an eight-acre water tank and three large stages. The production’s scale, and the price tag that went with it, had Hollywood and movie critics primed for the kind of legendary box-office failure that bankrupts studios. What Cameron delivered was the most successful box-office film of all time, with eleven Oscars awarded at the Academy’s annual ceremony.

While the effects-heavy film took full advantage of digital technology, this climactic image of the crew of the Carpathia searching the icy waters for survivors was a fusion of traditional and digital techniques. The shot, created by Matte World Digital, combined a live plate of lifeboats shot in Mexico, physical and computer-generated models of floating icebergs, and a live-action smoke element added to the painted smokestack of the Carpathia. The rescue ship itself was created by Chris Evans as an old-fashioned, acrylic-on-Masonite board painting. The painting was then photographed and scanned into the computer along with the other elements, including a digitally painted dawn sky, in the final composite image.

The Carpathia painting marked a full circle for Evans, the first artist to take a matte painting into the digital realm (created at Industrial Light + Magic for a scene of a stained-glass knight magically coming to life in the 1985 release Young Sherlock Holmes). Although Matte World Digital had originally considered doing the ship with computer graphics, the looming deadline allowed only two weeks for creating all the elements and a final composite. It was Evans who suggested it would be quicker to create the ship as a traditional matte painting, a rare recourse to brush and paints in the digital age.

For Evans, the Titanic assignment had a personal echo. His great-grandfather, John Bartholomew, worked for the White Star Line as chief victuals officer and was scheduled to sail on the Titanic maiden voyage as a company VIP. The night before the launch, however, Bartholomew was stricken with an illness and canceled his trip. The notice came so late his luggage was already aboard the ship, and early reports on the disaster listed Bartholomew as one of the casualties. “When he heard that the Titanic went down with so many of the friends he’d worked with for thirty or forty years, he was heartbroken,” Evans recalled in a December 1997 Cinefex special issue on the making of Titanic.

Bram Stoker’s Dracula (1992)

The 1990s was an intriguing decade for matte painting, a time when new digital tools appeared and began to be applied, but also a time when entire productions still embraced traditional effects. One such production was Bram Stoker’s Dracula, with director Francis Ford Coppola contracting Matte World specifically to create matte shots the old-fashioned way. The film, set in the Victorian times that coincided with the earliest days of movies, inspired Coppola to attempt to use effects appropriate to that era. (Matte World did, however, dissuade the director from shooting glass shots on location, which, although a seminal effect, had always been laborious and time-consuming even under the best conditions.)

In this shot of a horse-driven carriage approaching Dracula’s castle, the live-action matte element was combined with artist Bill Mather’s painting on the same strip of film, with the camera rewound to film each new element. The film was then put into a high-speed camera to shoot several passes of “snow,” actually baking soda shaken through a wire mesh screen.

This shot also demonstrates the subliminal effects that can be achieved by a matte painting. Beginning with a production sketch by artist Jim Steranko, Matte World concept artist Sean Joyce worked with the director to develop the initial idea of the vampire’s castle being shaped like a body on a throne. “Francis wanted this subconscious effect of a tortured man, screaming some kind of plea to Heaven,” Joyce noted.

Casino (1995)

Before

After

This Martin Scorsese film was set in the Las Vegas of 1974, a time period when the fabled Strip was dominated by the Tropicana and Flamingo hotels and such iconic structures as the glittering, 180-foot-tall Dunes sign. But twenty years later, when Scorsese was making Casino, those landmarks had been demolished. Enter Matte World Digital, the traditional matte-painting company having adapted to the new digital verities in both name and technology. Scorsese’s assignment was that the effects house re-create the Strip’s period look and add the fictitious Tangiers Hotel to the mix.

For the shot pictured here, Matte World Digital combined a live-action plate with computer-generated images of the Dunes sign and the Tangiers Hotel, with the glittering neon itself created through radiosity lighting software developed by Lightscape, a Silicon Valley firm. Prior to radiosity, the rendering of a 3-D computer model only accounted for light coming from a specific source, ignoring the way light actually interacts and breaks up. The complex interplay of direct illumination and “bounce light” is the way the real world looks, which is why the earliest computer-generated models with only direct light sources look so flat and unrealistic.

Using the 3-D wireframe models that Matte World Digital built in the computer, the radiosity software allowed for the computer-generated surfaces to incorporate a 2-D “mesh,” made up of triangles and rectangles, which helped to automatically determine and represent all illuminative gradations, from strong light sources to diffuse bounce-light effects. The Tangiers sign here is composed of some 158,000 mesh elements, the Dunes sign a staggering 2.5 million.

3-D artist Morgan Trotter modeled various signs to create a virtual Las Vegas.

chenfeng — Wed, 12 Oct 2022 06:07:03 +0000

Star Wars: The Empire Strikes Back

Dune

The Love Bug

“What’s great about matte painting is you get to control a little bit of the movie. Sure, you’ve got everybody telling you how to do it, but you get to bring across some narration, maybe even an emotion, and that’s heady stuff. It’s your moment. It can be intoxicating, can make you feel powerful. You’re fighting for control over this image!”
Harrison Ellenshaw, Disney Studio/Industrial Light + Magic matte painter

In 1975 a young director named Steven Spielberg saw his movie Jaws become a national phenomenon, while another young director named George Lucas was in production on a little film called Star Wars. This would be the beginning of a new era, with genres resurrected from science fiction themes to adventures patterned after old Saturday matinee serials and pumped up into effects-fueled spectacles with crossover appeal and boffo box office. Although Lucas had always imagined Star Wars as a saga requiring a number of films, in the decades to come any successful movie might produce potential sequels. Marketing would become more sophisticated, with the phrase “summer movie” understood to mean a potential blockbuster. And with the billions that Star Wars licensed products have generated (the rights to which George Lucas shrewdly kept during his first Star Wars negotiation with Twentieth Century Fox), once marginal, ancillary marketing tie-ins became potentially more lucrative than the box office itself.

The new blockbuster era was also a time of transition. Lucas’ Industrial Light + Magic (ILM) effects house, organized to create the effects for Star Wars, was soon being hired out for effects assignments at other studios, and other independent effects houses such as Apogee and Boss Films entered the field. Meanwhile, behind the scenes, the think tanks within ILM and other effects shops were busily making the first feature films to venture into the digital realm.

With fantasy and adventure themes so popular, matte painting was more important than ever. Thus, in a time of change the tradition continued, with brush and oils truly conjuring worlds.

Star Wars: The Empire Strikes Back (1980)

Many Star Wars fans find this sequel to the phenomenally popular first film their favorite chapter, with dramatic plot turns including Luke Skywalker’s first encounter and apprenticeship with Jedi master Yoda and the dramatic final confrontation between the aspiring Jedi Knight and the evil Darth Vader. The fantastic ILM effects ranged from the stop-motion animation of the Imperial Walkers during their attack on the Rebel base on ice planet Hoth to matte paintings creating everything from an asteroid field in space to the swamp planet of Dagobah and entrancing visions of Cloud City.

Here we see a shot from the dramatic duel between Luke and Vader in an air shaft on Cloud City. ILM’s matte department composited the live action of the doorway and actors with a Ralph McQuarrie matte painting, via a “front-projection system” developed for Empire by Richard Edlund and Neil Krepela. The light-saber beams themselves were rotoscoped animation elements provided by ILM’s animation department, which Krepela’s matte-camera assistant Craig Barron, who was working on his first movie, composited into the scene using the front-projection system.

For McQuarrie, who had developed the look of Star Wars back when Lucas was first dreaming everything up, Empire was a chance not only to create production designs of characters and environments, but to follow through and bring them to life as final matte-painted shots. McQuarrie laughed as he recalled that the exact nature of the environment pictured here was never totally explained before his department set to work: “I never quite figured it out, frankly. That’s one of the things that was total fantasy. I’ve forgotten whether it was my idea, or George’s, or somebody else’s to use an air shaft. Basically, the point was George wanted a cliffhanger location for the duel.”

Dune (1984)

In this David Lynch production of the Frank Herbert novel, the planet Arrakis is a desert world in which water is more precious than gold and Melange, the “spice” vital for interstellar travel, is mined. Against this backdrop a Holy War is brewing, as the people long for a messiah to lead them against the evil Harkonnen empire.

Dune, released by Universal, was another challenge for Albert Whitlock’s matte department. Whitlock gave his apprentice Syd Dutton (who would cofound Illusion Arts with Universal matte cameraman Bill Taylor) complete freedom to create this shot of a cable car passing over the labyrinthine city of Giedi Prime, the domain of Baron Harkonnen and a center of spice processing. This shot followed the Whitlock philosophy of shooting matte paintings onto original negative.

Dutton’s work was inspired by a sketch provided by Dune production designer Tony Masters. It was also a unique effect on the production, as Bill Taylor described in an article documenting the making of the film for the April 1985 issue of Cinefex: “The Giedi Prime shot was unusual for several reasons…. First, because there was no set involved at all. The set had long been struck by the time we got the assignment. So it’s a full painting, with just a couple of live-action inserts. Second, the painting was actually begun before the live-action elements were shot. Third, it also marked the first time we photographed our own motion-control miniature (the cable car) for incorporation into a matte shot.”

The Love Bug (1969)

One of Disney’s classic live-action, family fantasy films, The Love Bug was released in a downtime for visual effects. It was the twilight of the studio system, as most studios were in the process of selling their backlots, auctioning off their assets, and closing their production departments. There were rare exceptions, notably Albert Whitlock’s matte-painting department at Universal. But it was Disney Studio, having won its enduring fame with cartoon animation, that maintained the tradition of a backlot and soundstages and in-house effects departments for live-action films.

In this Love Bug scene we see Herbie, the magical Volkswagen, in front of an old San Francisco firehouse overlooking the bay. Although painter Alan Maley visited San Francisco for research, this and other scenes were created in Burbank on the Disney lot. It is a city of the imagination, with a fanciful firehouse on a street that doesn’t exist, created with a bit of soundstage set and Maley’s masterful glass painting.

Matte painter Harrison Ellenshaw commented on this unsung example of matte-painting magic: “I wish I could have been there to watch Alan work on this painting. Very few matte artists would be so daring and clever. The composition is brilliant: the idea of putting the telephone pole in the foreground helps balance the shot, and it’s a nice touch adding the two traffic cones. But when we watch the film we look past the foreground to see Herbie in front of this wonderful old firehouse, which is what the shot is all about.

“Note how Alan even incorporated some lens distortion into his painting: the horizontals and verticals near the edges curve slightly, which is a subtle yet effective touch. We know it’s late in the day because of the long shadows, another daring and clever idea. Alan did this painting a few years before I joined the department as an apprentice, but I recall him telling me, years later, that he’d started the painting to match the live-action plate as if it were shot in bright sunlight. But since the set was inside a soundstage with stage lighting, he decided, after much struggling, to try it as if the live action were in shadow. Alan was very proud of this shot and actually kept it intact, a rarity in those days when a finished painting on glass was scraped off so the glass could be used again.”

chenfeng — Sun, 31 Jul 2022 08:57:16 +0000

Mame

Colossus: The Forbin Project

“Al Whitlock taught me things mostly by osmosis. It was about being around him, seeing him. Al didn’t believe in drawing out a shot. It was about the energy of the moment when he was painting. He believed you had to come to a matte painting with focus and a certain energy.”
Syd Dutton, Whitlock protégé and cofounder, Illusion Arts

As film production began changing with the digital breakthroughs of the 1990s, it became necessary to make a distinction between digital and “traditional” effects. While computer graphics entailed complex new digital technology, traditional effects artists worked in a hands-on world with hallowed tools of the trade, traditions, and techniques passed down the lineage of their craft.

Although Albert Whitlock always worked in the traditional era, he stands in the first rank of the pantheon of matte painters, be they traditional or digital artists. Like others from those predigital days, he could wield his brush expertly, each stroke leaving impressionistic dabs of paint that “read” as real when a final painting was filmed. But Whitlock was also a master at designing and enhancing his paintings with special effects, optical illusions, and effects photography. During his reign as head of the Universal matte department he won two Academy Awards, became the trusted effects guru for Alfred Hitchcock, and was in demand by such directors as John Huston, Robert Wise, and Hal Ashby.

Mame (1974)

This film about a wealthy eccentric (played by Lucille Ball) whose adventures span the flapper era of the Roaring Twenties, the stock market crash, and the Depression, featured this fantasy scene, created by Albert Whitlock, in which Mame and her young ward Patrick (Kirby Furlong) sit on the most precarious of perches. The stage setup needed only one practical spike of the Statue of Liberty’s crown, and the painted blue floor was not a “bluescreen” effect but a guide for the ocean that would be part of Whitlock’s final painting.

Although the shot seems impossible, Whitlock actually designed the scene as it could potentially have been filmed, he revealed: “I don’t like the omnipotent viewpoint, those kinds of shots never feel real to me. So I designed it to look as if the camera had been set up on the torch of the statue’s raised arm. I think taking a realistic approach to how you could really shoot something like this helps make the shot seem more real. Of course, they would never let you shoot it like that for real, and the actors would refuse, anyway. I remember that although the little boy had a safety belt on him, he was nervous at first, but got more comfortable after the first take.”

Colossus: The Forbin Project (1969)

This film, released by Universal the year men first walked on the moon, still seemed like science fiction in a world in which personal computers didn’t exist. But big mainframes, kept in the province of scientists and academics, did exist, as did fears that those mysterious machines might someday supplant humans. That fear fuels this film’s premise, with Dr. Forbin working in a mountain fortress research center where he has developed Colossus, a thinking machine that decides human beings are The Enemy.

In this ominous shot, crafted by Albert Whitlock, Dr. Forbin turns on the seemingly endless computer banks that comprise Colossus. Whitlock was always an advocate for an original-negative approach, trying to get a shot “in-camera” and thus avoid the inevitable degradation of a filmed image that is rephotographed many times in the optical duping process. Here, Whitlock utilized painted cel animation overlays of silver light panels to create the illusion of his computer turning on in stages. The effect was captured on the original negative, with the camera rewound several times for each new exposure.

The shot also had to match the practical lighting effect for the live-action element of Dr. Forbin walking down the vast computer corridor. “I thought we’d get lucky and smoothly match my animation in the painting to the on-set lighting,” Whitlock recalled. “What helped was Forbin standing exactly between the lighting effect, near the matte line so you didn’t notice the changeover. And he’s wearing a white suit that distracts your eye just at the right time. I remember this shot went over very big with the brass in the front office. They loved the way the lights turned on.”

chenfeng — Sun, 31 Jul 2022 08:08:39 +0000

The Great Race

Ben-Hur

It’s a Mad Mad Mad Mad World

“It was always effective, when you went to the big films in those days, that people actually were so moved by the impact of these tremendous big screen cataclysms and effects… But the secrets [of their creation] were well kept.”
Jesse Lasky, Jr., screenwriter, The Ten Commandments

The decade of the 1950s could be called the Big Picture era. “Wide-screen” debuted in 1952 with Cinerama, which utilized three cameras for filming, three electronically synchronized projectors running at twenty-four frames per second, and a gigantic curved screen, creating a feeling of limitless space. In 1953, Twentieth Century Fox ushered in CinemaScope, and a host of other big-screen innovations followed from the various studios.

The term Big Picture also sums up the type of production then in vogue. While the movies had always been enamored of epics, wide-screen technology provided an irresistible staging ground for blockbuster productions and the matte paintings needed to bring those grand visions to life.

The Great Race (1965)

The Great Race is a madcap chase movie and a particularly complex production, given its period setting at the turn of the twentieth century and its premise of an around-the-world race between rival automobile companies. This production was another challenge for Linwood G. Dunn’s Film Effects of Hollywood, with matte artists Albert Simpson and Cliff Silsby creating more than twenty-five matte paintings for the show. The Great Race also marked a reunion for Simpson and Dunn, both of them veterans of RKO’s glory days and such productions as The Devil and Daniel Webster.

The effect seen here is a classic “jeopardy shot,” successfully and safely created by combining Albert Simpson’s matte-painted building with live action of this performer hanging from a ledge set that was, in reality, only a few feet off a soundstage floor.

Ben-Hur (1959)

Lew Wallace’s story of Judah Ben-Hur, set in the time of ancient Rome, was first adapted for the screen in 1927. The early film was legendary for its sea battle and chariot race when this remake went into production. But audiences were ready for an updated version, and the new blockbuster swept the Oscars with an astounding eleven golden statuettes, including Best Picture, Best Actor (Charlton Heston in the title role), and Best Visual Effects.

Matte paintings were vital in re-creating the grandeur of the lost world in Ben-Hur. For this image of the emperor and an enthusiastic crowd greeting legions of soldiers returning from another victorious campaign, MGM matte painter Matthew Yuricich not only painted Rome in all its imperial splendor, but added that old matte painter’s trick: dabs of paint to represent people, from “marching” soldiers to waving crowds. Yuricich used Masonite board rather than glass; he then poked holes behind the painted people and, by moving another painting behind the holes, created a flickering effect and the illusion of movement.

In the “before” image we see the painting as Yuricich worked on it, complete with a black-and-white photograph taken on the small live-action set and pasted onto his Masonite. Painting to a photographic reference helped MGM matte painters to line up their paintings perfectly with live-action sets. This finished painting was then combined with the live-action element in the optical printer, the live film simply replacing the photographic reference.

Yuricich explained that his creative partner Clarence Slifer (who had moved to MGM in the last days of Selznick International) used an optical printer to replicate photography of a group of live-action soldiers, rephotographing the same element in smaller perspective to create the image of columns of soldiers marching in review. But on the first optical composite test combining the new dupe negative of a legion of soldiers with the matte painting, things got a little out of whack, Yuricich recalled: “The foremost legions were to reach the base of the steps [of the emperor’s reviewing stand], turn screen right and exit frame. Unfortunately, because our timing was off, the entire legion turned at once, marched under the matte line and disappeared! We had a good laugh seeing that. By take two we had it figured right.”

In this close-up of the Roman crowd, you can see holes in the painting. During photography, a moiré pattern was moved behind the painting to create the illusion of crowd movement.

It’s a Mad Mad Mad Mad World (1963)

Director Stanley Kramer’s all-star comedy (with a cast ranging from Spencer Tracy and Milton Berle to Jonathan Winters and Phil Silvers, plus a Three Stooges cameo thrown in for good measure) is a madcap dash for buried cash. In this penultimate scene, the Mad gang are trying to get off the collapsing fire escape of a dilapidated building, but have tumbled onto a fire engine ladder that becomes overweighted and starts swaying dangerously back and forth.

“What we were doing here was trying to make things look real and scary,” explained Linwood G. Dunn, who created this effect with matte artist Cliff Silsby at Film Effects of Hollywood, the independent company Dunn formed after RKO closed in the 1950s. “Our matte-painted building completes a fictitious twelve-story building that had, as its base, a two-story set shot on location. It was our job to make this sequence look convincing, and who’s going to know it’s painted if it’s a good job? That’s the matte painter’s job on a shot like this: to be invisible.”

Dunn, ever the innovator, years later noted this shot was an example of what would become known in the digital age as “previsualization,” the rough computer graphics imagery that works out the look of a shot or effects element. Dunn’s version was a crude test of a swaying, three-foot miniature fire ladder, which became a running joke between Dunn and Kramer, the director harboring a worried suspicion that the rough test was going to be as good as it got.

This Mad shot was also part of a traveling presentation in which Dunn revealed secrets of visual effects, in the process inspiring many a young person to enter the business, including Syd Dutton, a future matte painter and cofounder of Illusion Arts.

Live action photographed on the back lot before the addition of the matte painting

Close-up detail showing matte artist Cliff Silsby at work