原文:Creating and working with WebAssembly modules
本文译自Lin Clark 关于 WebAssembly 的卡通介绍系列,渣翻译,因此附上英文原文
- 概述:
- 背景:
- 现在的 WebAssembly:
- 创建和使用 WebAssembly 模块
- 是什么使 WebAssembly 很快?
- 未来的 WebAssembly:
WebAssembly is a way to run programming languages other than JavaScript on web pages. In the past when you wanted to run code in the browser to interact with the different parts of the web page, your only option was JavaScript.
So when people talk about WebAssembly being fast, the apples to apples comparison is to JavaScript. But that doesn’t mean that it’s an either/or situation—that you are either using WebAssembly, or you’re using JavaScript.
WebAssembly 是一种在网页上运行 JavaScript 以外的编程语言的方法。过去,当你想在浏览器中运行代码与网页的不同部分进行交互时,你唯一的选择是JavaScript。
所以当人们谈论 WebAssembly 很快的时候,是在拿它和 JavaScript 作对比。但是,这并不意味着这是一个二选一的问题:要么使用 WebAssembly,要么使用 JavaScript。
In fact, we expect that developers are going to use both WebAssembly and JavaScript in the same application. Even if you don’t write WebAssembly yourself, you can take advantage of it.
WebAssembly modules define functions that can be used from JavaScript. So just like you download a module like lodash from npm today and call functions that are part of its API, you will be able to download WebAssembly modules in the future.
So let’s see how we can create WebAssembly modules, and then how we can use them from JavaScript.
事实上,我们期望开发人员将在同一应用程序中同时使用 WebAssembly 和 JavaScript。即使你不自己编写 WebAssembly,你也可以利用它。
WebAssembly 模块定义了可以在 JavaScript 中使用的函数。就像你今天从 npm 下载 lodash 这样的模块并调用其 API 中的函数一样,将来你也可以下载 WebAssembly 模块。
那么让我们来看看如何创建 WebAssembly 模块,然后如何在 JavaScript 中使用它们。
WebAssembly 处于什么位置?(Where does WebAssembly fit?)
In the article about assembly, I talked about how compilers take high-level programming languages and translate them to machine code.
在关于汇编的文章中,我讨论了编译器如何将高级编程语言转换为机器代码。
Where does WebAssembly fit into this picture?
You might think it is just another one of the target assembly languages. That is kind of true, except that each one of those languages (x86, ARM) corresponds to a particular machine architecture.
WebAssembly 在这幅图景中处于什么位置呢?
也许你认为它只是另一种目标汇编语言。这在某种程度上是对的,只不过这些语言(x86、ARM)中的每一种都对应一种特定的机器体系结构。
When you’re delivering code to be executed on the user’s machine across the web, you don’t know what target architecture the code will be running on.
So WebAssembly is a little bit different than other kinds of assembly. It’s a machine language for a conceptual machine, not an actual, physical machine.
当你通过网络分发要在用户机器上执行的代码时,你并不知道代码将运行在哪种目标体系结构上。
所以 WebAssembly 与其他类型的汇编略有不同:它是概念机器的机器语言,而不是实际物理机器的机器语言。
Because of this, WebAssembly instructions are sometimes called virtual instructions. They have a much more direct mapping to machine code than JavaScript source code. They represent a sort of intersection of what can be done efficiently across common popular hardware. But they aren’t direct mappings to the particular machine code of one specific hardware.
正因为如此,WebAssembly 指令有时称为虚拟指令。它们比 JavaScript 源代码更直接地映射到机器代码。它们代表了各种常见硬件都能高效完成的操作的交集。但是它们并不直接映射到某一特定硬件的特定机器代码。
The browser downloads the WebAssembly. Then, it can make the short hop from WebAssembly to that target machine’s assembly code.
浏览器下载 WebAssembly。然后,它只需很短的一步就可以从 WebAssembly 转换到目标机器的汇编代码。
编译为 .wasm(Compiling to .wasm)
The compiler tool chain that currently has the most support for WebAssembly is called LLVM. There are a number of different front-ends and back-ends that can be plugged into LLVM.
Note: Most WebAssembly module developers will code in languages like C and Rust and then compile to WebAssembly, but there are other ways to create a WebAssembly module. For example, there is an experimental tool that helps you build a WebAssembly module using TypeScript, or you can code in the text representation of WebAssembly directly.
当前对 WebAssembly 支持最多的编译器工具链称为 LLVM。有许多不同的前端和后端可以插入到 LLVM 中。
注意:大多数 WebAssembly 模块开发人员将使用 C 和 Rust 等语言进行编码,然后编译到 WebAssembly,但还有其他方法可以创建 WebAssembly 模块。例如,有一个实验性的工具可以帮助你使用 TypeScript 构建 WebAssembly 模块,你也可以直接用 WebAssembly 的文本表示来编写代码。
Let’s say that we wanted to go from C to WebAssembly. We could use the clang front-end to go from C to the LLVM intermediate representation. Once it’s in LLVM’s IR, LLVM understands it, so LLVM can perform some optimizations.
To go from LLVM’s IR (intermediate representation) to WebAssembly, we need a back-end. There is one that’s currently in progress in the LLVM project. That back-end is most of the way there and should be finalized soon. However, it can be tricky to get it working today.
There’s another tool called Emscripten which is a bit easier to use at the moment. It has its own back-end that can produce WebAssembly by compiling to another target (called asm.js) and then converting that to WebAssembly. It uses LLVM under the hood, though, so you can switch between the two back-ends from Emscripten.
假设我们想从 C 到 WebAssembly。我们可以使用 clang 前端把 C 转换为 LLVM 中间表示。一旦代码变成 LLVM 的 IR,LLVM 就能理解它,因此 LLVM 可以执行一些优化。
要从 LLVM 的 IR(中间表示)到 WebAssembly,我们需要一个后端。LLVM 项目中有一个这样的后端正在开发中,它已经基本完成,应该很快就会定稿。然而,目前要让它工作起来还比较棘手。
还有另外一个名为 Emscripten 的工具,目前更容易使用一些。它有自己的后端,可以先编译到另一个目标(称为 asm.js),然后再把 asm.js 转换为 WebAssembly。不过它的底层使用了 LLVM,所以你可以在 Emscripten 中切换这两个后端。
Emscripten includes many additional tools and libraries to allow porting whole C/C++ codebases, so it’s more of a software developer kit (SDK) than a compiler. For example, systems developers are used to having a filesystem that they can read from and write to, so Emscripten can simulate a file system using IndexedDB.
Regardless of the toolchain you’ve used, the end result is a file that ends in .wasm. I’ll explain more about the structure of the .wasm file below. First, let’s look at how you can use it in JS.
Emscripten 包含许多额外的工具和库,可以移植整个 C/C++ 代码库,因此与其说它是编译器,不如说它是软件开发工具包(SDK)。比如,系统开发人员习惯于有一个可以读写的文件系统,因此 Emscripten 可以使用 IndexedDB 来模拟文件系统。
不管你使用什么样的工具链,最终的结果都是一个以 .wasm 结尾的文件。我将在下面详细解释 .wasm 文件的结构。首先,我们来看看如何在 JS 中使用它。
在 JavaScript 中加载 .wasm 模块(Loading a .wasm module in JavaScript)
The .wasm file is the WebAssembly module, and it can be loaded in JavaScript. As of this moment, the loading process is a little bit complicated.
.wasm 文件就是 WebAssembly 模块,可以在 JavaScript 中加载。目前,加载过程有点复杂。
function fetchAndInstantiate(url, importObject) {
You can see this in more depth in our docs.
详细用法可以参考文档
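As a rough sketch of the loading step described above, the snippet below instantiates a module from bytes that are already in memory using `WebAssembly.instantiate`. The helper name `instantiateFromBytes` and the `wasmBytes` array are assumptions made for illustration: the bytes are a hand-assembled stand-in for a compiled `add42` module (using binary version 1, which current engines accept), not the article's actual file. In a browser the bytes would typically come from `fetch(url)` followed by `response.arrayBuffer()`.

```javascript
// Hand-assembled stand-in for a compiled add42 module (an assumption for
// illustration, not the article's actual .wasm file).
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm" magic + version 1
  0x01, 0x06, 0x01, 0x60, 0x01, 0x7f, 0x01, 0x7f, // type section: (i32) -> i32
  0x03, 0x02, 0x01, 0x00,                         // function section: func 0 uses type 0
  0x07, 0x09, 0x01, 0x05,                         // export section: one export, name length 5
  0x61, 0x64, 0x64, 0x34, 0x32, 0x00, 0x00,       // "add42", kind = function, index 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                   // code section: one body, no extra locals
  0x20, 0x00, 0x41, 0x2a, 0x6a, 0x0b,             // local.get 0; i32.const 42; i32.add; end
]);

// Hypothetical helper: compile and instantiate, then hand back the instance.
function instantiateFromBytes(bytes, importObject = {}) {
  return WebAssembly.instantiate(bytes, importObject)
    .then(results => results.instance);
}

// Exported functions are called like ordinary JavaScript functions.
instantiateFromBytes(wasmBytes).then(instance => {
  console.log(instance.exports.add42(8)); // logs 50
});
```

The import object is left empty here because this particular module imports nothing; a module that imports functions or memory would need matching entries.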
We’re working on making this process easier. We expect to make improvements to the toolchain and integrate with existing module bundlers like webpack or loaders like SystemJS. We believe that loading WebAssembly modules can be as easy as loading JavaScript ones.
There is a major difference between WebAssembly modules and JS modules, though. Currently, functions in WebAssembly can only use numbers (integers or floating point numbers) as parameters or return values.
我们正在努力使这个过程更容易。我们期望对工具链进行改进,并与现有的模块打包工具(如 webpack)或加载器(如 SystemJS)整合。我们相信,加载 WebAssembly 模块可以像加载 JavaScript 模块一样简单。
但是,WebAssembly 模块和 JS 模块之间存在一个很大的差异。目前,WebAssembly 中的函数只能使用数值(整数或浮点数)作为参数或返回值。
For any data types that are more complex, like strings, you have to use the WebAssembly module’s memory.
If you’ve mostly worked with JavaScript, having direct access to memory isn’t so familiar. More performant languages like C, C++, and Rust, tend to have manual memory management. The WebAssembly module’s memory simulates the heap that you would find in those languages.
对于任何更复杂的数据类型(如字符串),必须使用 WebAssembly 模块的内存。
如果你主要使用 JavaScript,可能对直接访问内存并不熟悉。而性能更好的语言,如 C、C++ 和 Rust,往往采用手动内存管理。WebAssembly 模块的内存模拟了你在这些语言中会用到的堆。
To do this, it uses something in JavaScript called an ArrayBuffer. The array buffer is an array of bytes. So the indexes of the array serve as memory addresses.
If you want to pass a string between the JavaScript and the WebAssembly, you convert the characters to their character code equivalent. Then you write that into the memory array. Since indexes are integers, an index can be passed in to the WebAssembly function. Thus, the index of the first character of the string can be used as a pointer.
为此,它使用 JavaScript 中的 ArrayBuffer(数组缓冲区)。数组缓冲区是一个字节数组,因此,数组的索引可以充当内存地址。
如果要在 JavaScript 和 WebAssembly 之间传递一个字符串,需要先将字符转换为对应的字符码,然后把它们写入内存数组。由于索引是整数,因此可以将索引传入 WebAssembly 函数。这样,字符串第一个字符的索引就可以用作指针。
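The string-passing idea above can be sketched with a standalone `WebAssembly.Memory`. The helper names `writeString` and `readString`, the start address `16`, and the trailing zero byte (the C string convention) are assumptions for illustration. The sketch also assumes ASCII-only text; real code would more likely encode the string as UTF-8.

```javascript
// One page of linear memory (64 KiB), viewed as an array of bytes.
const memory = new WebAssembly.Memory({ initial: 1 });
const bytes = new Uint8Array(memory.buffer);

// Hypothetical helper: copy the character codes of an ASCII string into
// memory starting at index `ptr`, followed by a terminating 0 byte.
function writeString(str, ptr) {
  for (let i = 0; i < str.length; i++) {
    bytes[ptr + i] = str.charCodeAt(i);
  }
  bytes[ptr + str.length] = 0;
  return ptr; // the index of the first character acts as the pointer
}

// Hypothetical helper: read bytes back until the terminating 0 byte.
function readString(ptr) {
  let result = '';
  for (let i = ptr; bytes[i] !== 0; i++) {
    result += String.fromCharCode(bytes[i]);
  }
  return result;
}

const ptr = writeString('hi', 16);
console.log(ptr, bytes[16], bytes[17]); // 16 104 105
console.log(readString(ptr));           // "hi"
```

Only the integer `ptr` would be passed to a WebAssembly function; the function then reads the characters from the same memory.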
It’s likely that anybody who’s developing a WebAssembly module to be used by web developers is going to create a wrapper around that module. That way, you as a consumer of the module don’t need to know about memory management.
If you want to learn more, check out our docs on working with WebAssembly’s memory.
任何开发供 Web 开发人员使用的 WebAssembly 模块的人,很可能都会围绕该模块创建一个包装器。这样,作为模块使用者的你就不需要了解内存管理。
如果你想了解更多信息,请查看我们关于使用 WebAssembly 内存的文档。
.wasm 文件的结构(The structure of a .wasm file)
If you are writing code in a higher level language and then compiling it to WebAssembly, you don’t need to know how the WebAssembly module is structured. But it can help to understand the basics.
If you haven’t already, we suggest reading the article on assembly (part 3 of the series).
如果你使用较高级别的语言编写代码,然后将其编译到 WebAssembly,则无需知道 WebAssembly 模块的结构。但了解一些基础知识会有帮助。
如果你还没有阅读关于汇编的文章(本系列的第 3 部分),我们建议你先阅读。
Here’s a C function that we’ll turn into WebAssembly:
这是一个 C 函数,我们会将其转换为 WebAssembly:
int add42(int num) { return num + 42; }
You can try using the WASM Explorer to compile this function.
If you open up the .wasm file (and if your editor supports displaying it), you’ll see something like this.
你可以尝试使用 WASM Explorer 编译这个函数。
如果你打开 .wasm 文件(并且你的编辑器支持显示它),你会看到这样的内容:
00 61 73 6D 0D 00 00 00 01 86 80 80 80 00 01 60
That is the module in its “binary” representation. I put quotes around binary because it’s usually displayed in hexadecimal notation, but that can be easily converted to binary notation, or to a human readable format.
For example, here’s what num + 42 looks like.
这就是模块的“二进制”表示。我给“二进制”加上引号,是因为它通常以十六进制表示法显示,但可以很容易地转换成二进制表示法,或者转换成人类可读的格式。
例如,这里是 num + 42 的样子。
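To make the hexadecimal-versus-binary point concrete, here is a small sketch that takes the first eight bytes of the dump shown above and converts them into other notations. The variable names are assumptions for illustration. It relies on the fact that every WebAssembly binary starts with the four bytes `00 61 73 6D` (a zero byte followed by the ASCII letters "asm") and then a 32-bit little-endian version number.

```javascript
// First eight bytes of the hex dump, written as text.
const hex = '00 61 73 6D 0D 00 00 00';

// Hexadecimal text -> actual bytes.
const headerBytes = Uint8Array.from(hex.split(' '), h => parseInt(h, 16));

// The same bytes written in binary notation.
const binary = Array.from(headerBytes, b => b.toString(2).padStart(8, '0'));

// A human-readable view: bytes 1..3 are the letters "asm",
// bytes 4..7 are the binary format version (little-endian).
const magic = String.fromCharCode(...headerBytes.subarray(1, 4));
const version = new DataView(headerBytes.buffer).getUint32(4, true);

console.log(binary[1]); // "01100001" (0x61, the letter "a")
console.log(magic);     // "asm"
console.log(version);   // 13
```

Version 13 (`0D`) was a pre-release version of the binary format; modules produced by current toolchains carry version 1 in the same position.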
代码如何工作:堆栈机(How the code works: a stack machine)
In case you’re wondering, here’s what those instructions would do.
如果你想知道的话,下面是这些指令所做的事情。
You might have noticed that the add operation didn’t say where its values should come from. This is because WebAssembly is an example of something called a stack machine. This means that all of the values an operation needs are queued up on the stack before the operation is performed.
你可能已经注意到,add 操作没有说明它的值应该来自哪里。这是因为 WebAssembly 是一种被称为堆栈机的东西。这意味着在执行操作之前,该操作需要的所有值都已经在堆栈上排好了。
Operations like add know how many values they need. Since add needs two, it will take two values from the top of the stack. This means that the add instruction can be short (a single byte), because the instruction doesn’t need to specify source or destination registers. This reduces the size of the .wasm file, which means it takes less time to download.
像 add 之类的操作知道它们需要多少个值。由于 add 需要两个值,它将从堆栈的顶部取出两个值。这意味着 add 指令可以很短(单个字节),因为指令不需要指定源寄存器或目标寄存器。这减小了 .wasm 文件的大小,也就意味着下载所需的时间更少。
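The stack behaviour described above can be imitated with a toy interpreter. This is only a sketch for three instructions, not how a browser executes WebAssembly. The function name `run` and the array-based instruction encoding are assumptions for illustration; the mnemonics `local.get`, `i32.const` and `i32.add` follow the current WebAssembly text format (older tools spelled the first one `get_local`).

```javascript
// Toy stack machine: each instruction is [mnemonic, optional immediate].
function run(instructions, locals) {
  const stack = [];
  for (const [op, arg] of instructions) {
    switch (op) {
      case 'local.get': // push the value of a parameter or local
        stack.push(locals[arg]);
        break;
      case 'i32.const': // push a constant
        stack.push(arg | 0);
        break;
      case 'i32.add': { // pop two values, push their 32-bit sum
        const right = stack.pop();
        const left = stack.pop();
        stack.push((left + right) | 0);
        break;
      }
      default:
        throw new Error(`unknown instruction: ${op}`);
    }
  }
  return stack.pop(); // the result is whatever is left on top
}

// num + 42: note that i32.add names no operands or registers.
const add42Body = [['local.get', 0], ['i32.const', 42], ['i32.add']];
console.log(run(add42Body, [8])); // 50
```

Because `i32.add` carries no operands, its encoding stays at a single byte (`6a`), while `local.get 0` and `i32.const 42` each take two bytes.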
Even though WebAssembly is specified in terms of a stack machine, that’s not how it works on the physical machine. When the browser translates WebAssembly to the machine code for the machine the browser is running on, it will use registers. Since the WebAssembly code doesn’t specify registers, it gives the browser more flexibility to use the best register allocation for that machine.
尽管 WebAssembly 是按照堆栈机来规定的,但它在物理机器上并不是这样工作的。当浏览器将 WebAssembly 转换为浏览器所在机器的机器代码时,它会使用寄存器。由于 WebAssembly 代码没有指定寄存器,浏览器可以更灵活地为该机器选择最佳的寄存器分配。
模块的各个部分(Sections of the module)
Besides the add42 function itself, there are other parts in the .wasm file. These are called sections. Some of the sections are required for any module, and some are optional.
除了 add42 函数本身,.wasm 文件中还有其他部分。这些部分被称为 sections。某些 sections 是任何模块都必须有的,有些则是可选的。
Required:
- Type. Contains the function signatures for functions defined in this module and any imported functions.
- Function. Gives an index to each function defined in this module.
- Code. The actual function bodies for each function in this module.
必选的:
- Type: 包含本模块中定义的函数以及所有导入函数的函数签名。
- Function: 为本模块中定义的每个函数提供一个索引。
- Code: 本模块中每个函数的实际函数体。
Optional:
- Export. Makes functions, memories, tables, and globals available to other WebAssembly modules and JavaScript. This allows separately-compiled modules to be dynamically linked together. This is WebAssembly’s version of a .dll.
- Import. Specifies functions, memories, tables, and globals to import from other WebAssembly modules or JavaScript.
- Start. A function that will automatically run when the WebAssembly module is loaded (basically like a main function).
- Global. Declares global variables for the module.
- Memory. Defines the memory this module will use.
- Table. Makes it possible to map to values outside of the WebAssembly module, such as JavaScript objects. This is especially useful for allowing indirect function calls.
- Data. Initializes imported or local memory.
- Element. Initializes an imported or local table.
可选的:
- Export: 使函数、内存、表和全局变量可用于其他 WebAssembly 模块和 JavaScript。这允许单独编译的模块动态链接在一起。这是 WebAssembly 版本的 .dll。
- Import: 指定从其他 WebAssembly 模块或 JavaScript 中导入的函数、内存、表和全局变量。
- Start: 一个在加载 WebAssembly 模块时会自动运行的函数(基本上类似 main 函数)。
- Global: 声明模块的全局变量。
- Memory: 定义该模块将使用的内存。
- Table: 使得映射到 WebAssembly 模块以外的值(例如 JavaScript 对象)成为可能。这对于实现间接函数调用特别有用。
- Data: 初始化导入的或本地的内存。
- Element: 初始化导入的或本地的表。
For more on sections, here’s a great in-depth explanation of how these sections work.
有关 sections 的更多内容,这里有一篇深入讲解这些 sections 如何工作的文章。
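To see the sections listed above inside an actual binary, the sketch below walks a module byte by byte. After the 8-byte header, each section starts with a one-byte id followed by its size encoded as an unsigned LEB128 integer. The names `moduleBytes`, `readU32Leb` and `listSections` are assumptions for illustration, and the bytes are a hand-assembled stand-in for a compiled `add42` module (binary version 1), not the article's actual file.

```javascript
// Hand-assembled stand-in for a compiled add42 module (illustration only).
const moduleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // header: magic + version 1
  0x01, 0x06, 0x01, 0x60, 0x01, 0x7f, 0x01, 0x7f, // id 1, size 6: Type
  0x03, 0x02, 0x01, 0x00,                         // id 3, size 2: Function
  0x07, 0x09, 0x01, 0x05, 0x61, 0x64, 0x64, 0x34, 0x32, 0x00, 0x00, // id 7, size 9: Export
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x41, 0x2a, 0x6a, 0x0b, // id 10, size 9: Code
]);

// Section ids defined by the binary format, indexed by id.
const SECTION_NAMES = ['custom', 'type', 'import', 'function', 'table', 'memory',
  'global', 'export', 'start', 'element', 'code', 'data'];

// Decode an unsigned LEB128 integer (small values only in this sketch).
function readU32Leb(bytes, offset) {
  let value = 0;
  let shift = 0;
  let byte;
  do {
    byte = bytes[offset++];
    value |= (byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  return { value: value >>> 0, next: offset };
}

// List every section as { name, size } without decoding its contents.
function listSections(bytes) {
  const sections = [];
  let offset = 8; // skip the header
  while (offset < bytes.length) {
    const id = bytes[offset];
    const size = readU32Leb(bytes, offset + 1);
    sections.push({ name: SECTION_NAMES[id] || 'unknown', size: size.value });
    offset = size.next + size.value; // jump to the next section
  }
  return sections;
}

console.log(listSections(moduleBytes).map(s => s.name));
// [ 'type', 'function', 'export', 'code' ]
```

This tiny module carries the three required sections plus Export, which is what makes `add42` callable from JavaScript.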
下一节(Coming up next)
Now that you know how to work with WebAssembly modules, let’s look at why WebAssembly is fast.
现在,你已经知道如何使用 WebAssembly 模块,接下来我们来看看 WebAssembly 为什么很快。