
A crash course in assembly

Original: A crash course in assembly

This post is based on an article from Lin Clark’s cartoon introduction to WebAssembly series; the original English text is included.

To understand how WebAssembly works, it helps to understand what assembly is and how compilers produce it.

In the article on the JIT, I talked about how communicating with the machine is like communicating with an alien.

I want to take a look now at how that alien brain works: how the machine’s brain parses and understands the communication coming into it.

There’s a part of this brain that’s dedicated to the thinking, things like adding and subtracting, or logical operations. There’s also a part of the brain near that which provides short-term memory, and another part that provides longer-term memory.

These different parts have names.

  • The part that does the thinking is the Arithmetic-logic Unit (ALU).
  • The short-term memory is provided by registers.
  • The longer-term memory is the Random Access Memory (or RAM).
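As a rough illustration of how these three parts cooperate, here is a minimal sketch in Python. The function name `alu`, the register count, and the RAM size are all assumptions made for this example and do not describe any real CPU.

```python
# A toy model of the three parts named above. Sizes and names are
# illustrative assumptions, not a description of real hardware.

def alu(operation, a, b):
    """The 'thinking' part: arithmetic and logical operations."""
    if operation == "add":
        return a + b
    if operation == "sub":
        return a - b
    if operation == "and":
        return a & b
    raise ValueError(f"unknown operation: {operation}")

registers = [0] * 8   # short-term memory: a handful of fast slots
ram = [0] * 256       # longer-term memory: many more slots

# Values live in RAM, get copied into registers, and the ALU works on registers.
ram[10], ram[11] = 2, 3
registers[0] = ram[10]
registers[1] = ram[11]
registers[2] = alu("add", registers[0], registers[1])
ram[12] = registers[2]  # write the result back to RAM
```

The point of the sketch is only the division of labour: RAM holds the data, registers hold what is being worked on right now, and the ALU does the actual operation.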

The sentences in machine code are called instructions.

What happens when one of these instructions comes into the brain? It gets split up into different parts that mean different things.

The way that this instruction is split up is specific to the wiring of this brain.

For example, a brain that is wired like this might always take the first six bits and pipe them into the ALU. The ALU will figure out, based on the location of ones and zeros, that it needs to add two things together.

This chunk is called the “opcode”, or operation code, because it tells the ALU what operation to perform.

Then this brain would take the next two chunks of three bits each to determine which two numbers it should add. These would be addresses of the registers.
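The split described above (a six-bit opcode followed by two three-bit register addresses) can be sketched with a few shifts and masks. The opcode table below is invented for this example; the actual bit patterns depend on the wiring of the machine in question.

```python
OPCODE_BITS = 6
REGISTER_BITS = 3

# Hypothetical opcode table; these bit patterns are made up for illustration.
OPCODES = {0b000001: "ADD", 0b000010: "SUB"}

def decode(instruction):
    """Split a 12-bit instruction into (operation, register_a, register_b)."""
    opcode = (instruction >> (2 * REGISTER_BITS)) & 0b111111  # first six bits
    reg_a = (instruction >> REGISTER_BITS) & 0b111            # next three bits
    reg_b = instruction & 0b111                               # last three bits
    return OPCODES[opcode], reg_a, reg_b

# 000001 | 001 | 010  ->  add the contents of register 1 and register 2
decoded = decode(0b000001_001_010)
```

Here `decoded` is the tuple `("ADD", 1, 2)`: the same twelve bits, just read in the three chunks this particular brain is wired to expect.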

Note the annotations above the machine code here, which make it easier for us humans to understand what’s going on. This is what assembly is. It’s called symbolic machine code. It’s a way for humans to make sense of the machine code.

You can see here there is a pretty direct relationship between the assembly and the machine code for this machine. Because of this, there are different kinds of assembly for the different kinds of machine architectures that you can have. When you have a different architecture inside of a machine, it is likely to require its own dialect of assembly.

So we don’t just have one target for our translation. It’s not just one language called machine code. It’s many different kinds of machine code. Just as we speak different languages as people, machines speak different languages.
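To make the link between assembly and machine code concrete, here is a small sketch of two made-up architectures that encode the same symbolic instruction differently. Both layouts and all opcode values are assumptions for illustration. Real architectures usually differ in their mnemonics and operand syntax as well, not only in their bit layout.

```python
# Two hypothetical architectures that lay out the same ADD differently.

def assemble_arch_a(mnemonic, reg_a, reg_b):
    """Arch A: 6-bit opcode first, then two 3-bit register addresses."""
    opcodes = {"ADD": 0b000001, "SUB": 0b000010}
    return (opcodes[mnemonic] << 6) | (reg_a << 3) | reg_b

def assemble_arch_b(mnemonic, reg_a, reg_b):
    """Arch B: two 3-bit register addresses first, then a 6-bit opcode."""
    opcodes = {"ADD": 0b100000, "SUB": 0b110000}
    return (reg_a << 9) | (reg_b << 6) | opcodes[mnemonic]

# The same symbolic instruction, "ADD R1 R2", as 12-bit machine code.
code_a = format(assemble_arch_a("ADD", 1, 2), "012b")  # "000001001010"
code_b = format(assemble_arch_b("ADD", 1, 2), "012b")  # "001010100000"
```

The assembly line is a human-readable label for a specific bit pattern, and that bit pattern only makes sense to the machine it was written for.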

With human to alien translation, you may be going from English, or Russian, or Mandarin to Alien Language A or Alien Language B. In programming terms, this is like going from C, or C++, or Rust to x86 or to ARM.

You want to be able to translate any one of these high-level programming languages down to any one of these assembly languages (which correspond to the different architectures). One way to do this would be to create a whole bunch of different translators that can go from each language to each assembly.

That’s going to be pretty inefficient. To solve this, most compilers put at least one layer in between. The compiler will take this high-level programming language and translate it into something that’s not quite as high level, but also isn’t working at the level of machine code. And that’s called an intermediate representation (IR).

This means the compiler can take any one of these higher-level languages and translate it to the one IR language. From there, another part of the compiler can take that IR and compile it down to something specific to the target architecture.

The compiler’s front end translates the higher-level programming language to the IR. The compiler’s back end goes from IR to the target architecture’s assembly code.
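The front-end and back-end split can be sketched as two small tables of functions that meet at a shared IR. Everything here is a toy: the two input syntaxes, the tuple-shaped IR, and the assembly-like output strings are assumptions for illustration, not the output of any real compiler.

```python
# Toy front ends: each parses one invented surface syntax into a shared IR.
# The IR is just a tuple ("add", left, right); real IRs are far richer.

def frontend_infix(source):
    """Parses 'a + b'."""
    left, _, right = source.split()
    return ("add", left, right)

def frontend_prefix(source):
    """Parses '(+ a b)'."""
    _, left, right = source.strip("()").split()
    return ("add", left, right)

# Toy back ends: each turns the IR into assembly-like text for one target.

def backend_x86(ir):
    op, left, right = ir
    return f"{op} {left}, {right}"                  # two-operand style

def backend_arm(ir):
    op, left, right = ir
    return f"{op.upper()} {left}, {left}, {right}"  # three-operand style

FRONTENDS = {"infix": frontend_infix, "prefix": frontend_prefix}
BACKENDS = {"x86": backend_x86, "arm": backend_arm}

def compile_source(source, language, target):
    """Any front end can be paired with any back end through the IR."""
    return BACKENDS[target](FRONTENDS[language](source))

x86_text = compile_source("a + b", "infix", "x86")
arm_text = compile_source("(+ a b)", "prefix", "arm")
```

With this shape, supporting m languages and n architectures takes m front ends plus n back ends, instead of m × n direct translators. For the three languages and two architectures mentioned earlier that is 5 components instead of 6, and the gap widens quickly as either list grows.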

That’s what assembly is and how compilers translate higher-level programming languages to assembly. In the next article, we’ll see how WebAssembly fits into this.
