Small RISC emulator


If you want something rooted in the real world, one of the most-used embedded RISC microcontrollers is the PIC family. Google turns up several emulators, but I don't think the source is available for most of them.

Another possibility is QEMU, which already emulates several ARM varieties.

And, of course, if you're not interested in emulating a real-world device, rolling your own is far easier and performs better: you implement only what you need, without getting into the mess of status flags, overflow bits, limited bus width, RAM timings, etc.
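To make "roll your own" concrete: a minimal fetch-decode-execute loop is essentially the whole machine. Here is a sketch in C, with an opcode set invented purely for illustration:

    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    /* hypothetical opcode set -- invented for this sketch */
    enum { OP_HALT, OP_PUSH, OP_ADD, OP_PRINT };

    static void run(const uint8_t *code) {
        int32_t stack[64];
        int sp = 0;                          /* stack pointer */
        for (size_t pc = 0; ; ) {            /* fetch-decode-execute loop */
            switch (code[pc++]) {
            case OP_HALT:  return;
            case OP_PUSH:  stack[sp++] = (int8_t)code[pc++]; break;
            case OP_ADD:   sp--; stack[sp - 1] += stack[sp]; break;
            case OP_PRINT: printf("%d\n", stack[--sp]);      break;
            }
        }
    }

    int main(void) {
        /* "push 2, push 3, add, print" */
        const uint8_t prog[] = { OP_PUSH, 2, OP_PUSH, 3,
                                 OP_ADD, OP_PRINT, OP_HALT };
        run(prog);                           /* prints 5 */
        return 0;
    }

No flags, no bus widths, no timing: just the operations you actually need.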

I wrote Wren for a friend who wanted a VM language running on an embedded controller with around 16K of RAM. (As written, though, the code allows up to 64K per process.) It includes a compiler for a dumb little programming language. It's all, uh, pretty basic and hasn't seen much use, but it's just what you described in your first paragraph.

The FORTH "virtual machine" is about as simple as they come. 16-bit address space (typically), 16-bit data words, two stacks, memory. Loeliger's "Threaded Interpretive Languages" tells you a lot about how to build a FORTH interpreter on a Z80.

If you want simple, consider the Manchester Mark I. See page 15 of this PDF. The machine has 7 instructions. It takes about an hour to write an interpreter for it. Unfortunately, the instructions are pretty dang limited (which is why pretty much a full spec of the machine can fit on one page).
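As a taste of that hour-long exercise, here is a Baby-style interpreter sketched in C. The seven function codes follow the machine's instruction set, but the word encoding and program-counter details are simplified:

    #include <stdio.h>
    #include <stdint.h>

    /* the seven function codes (a real word packs the line number
       and function bits together; this sketch keeps them separate) */
    enum { JMP, JRP, LDN, STO, SUB, SUB2, SKN, STP };

    static int32_t store[32];                /* 32 words of memory */

    static void run(const uint8_t *fn, const uint8_t *addr) {
        int32_t acc = 0;
        for (int ci = 0; ; ci++) {           /* CI, the program counter */
            switch (fn[ci]) {
            case JMP: ci = store[addr[ci]];              break;
            case JRP: ci += store[addr[ci]];             break;
            case LDN: acc = -store[addr[ci]];            break; /* load negated */
            case STO: store[addr[ci]] = acc;             break;
            case SUB: case SUB2: acc -= store[addr[ci]]; break;
            case SKN: if (acc < 0) ci++;                 break; /* skip if negative */
            case STP: printf("acc = %d\n", acc);         return;
            }
        }
    }

    int main(void) {
        store[10] = -5; store[11] = -3;
        uint8_t fn[]   = { LDN, SUB, STP };  /* acc = 5, then 5 - (-3) */
        uint8_t addr[] = { 10,  11,  0   };
        run(fn, addr);                       /* prints acc = 8 */
        return 0;
    }

The negated load looks odd until you notice there is no ADD: negation plus subtraction is enough to build addition, and with the jumps and the skip you can compute anything that fits in 32 words.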

Javier's approach of rolling your own is very pragmatic. Designing and building a tiny machine is a two-day task, if that. I built a tiny VM for a project a few years ago, and it took me two days to write the VM together with a simple visual debugger.

Also: does it have to be RISC? You could choose, say, the 68K, for which there are open-source emulators, and the 68K was a well-understood target for gcc.

David Cary

Many people writing game programs and other applications embed a language into the application to allow users to write small programs.

As far as I can tell, the most popular embedded languages, in very roughly most-popular-first order (although "more popular" doesn't necessarily mean "better"), seem to be:

  • a domain-specific language custom-designed for this one specific application and used nowhere else
  • Lua
  • Tcl
  • Python, often a simplified subset such as PyMite
  • Forth
  • JavaScript
  • Lisp
  • AngelScript
  • XPL0
  • Squirrel
  • Haskell
  • NPCI (Nano Pseudo C Interpreted)
  • RoboTalk
  • interpreting some hardware machine language (Why is this the least-popular choice? For good reasons, described below.)

You may want to check out the Gamedev StackExchange, in particular questions like "How do you add a scripting language to a game?".

You may want to check out some of the questions here on StackOverflow tagged "embedded language", such as "Selecting An Embedded Language", "What is a good embeddable language I can use for scripting inside my software?", "Alternatives to Lua as an embedded language?", "Which game scripting language is better to use: Lua or Python?", etc.

Many implementations of these languages use some sort of bytecode internally. Often two different implementations of the same high-level programming language, such as JavaScript, use two completely different bytecode languages internally. Often several high-level programming languages compile to the same underlying bytecode language -- for example, the Jython implementation of Python, the Rhino implementation of JavaScript, the Jacl implementation of Tcl, JScheme and several other implementations of Scheme, and several implementations of Pascal all compile to the same JVM bytecode.

details

Why use a scripting language rather than interpreting some hardware machine language?

Why "Alternate Hard And Soft Layers"? To gain simplicity, and faster development.

faster development

People generally get stuff working faster with scripting languages rather than compiled languages.

Getting the initial prototype working is generally much quicker: the interpreter handles a bunch of stuff behind the scenes that machine language forces you to write out explicitly -- setting the initial values of variables to zero, subroutine-prolog and subroutine-epilog code, malloc and realloc and free and related memory-management stuff, growing containers when they get full, etc.
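For a sense of what "behind the scenes" means, here is the container-growth bookkeeping in C that a one-line append in a scripting language does for you (a sketch, not code from any real interpreter):

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>

    /* a growable array by hand: what a script's list-append hides */
    typedef struct { int32_t *items; size_t len, cap; } vec_t;

    static int vec_push(vec_t *v, int32_t x) {
        if (v->len == v->cap) {              /* grow when full */
            size_t ncap = v->cap ? v->cap * 2 : 8;
            int32_t *p = realloc(v->items, ncap * sizeof *p);
            if (!p) return -1;               /* caller must handle failure */
            v->items = p;
            v->cap = ncap;
        }
        v->items[v->len++] = x;
        return 0;
    }

    int main(void) {
        vec_t v = { 0 };                     /* zero-initialize by hand, too */
        for (int32_t i = 0; i < 100; i++)
            if (vec_push(&v, i) != 0) return 1;
        printf("len=%zu cap=%zu\n", v.len, v.cap);  /* len=100 cap=128 */
        free(v.items);
        return 0;
    }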

Once you have an initial prototype, adding new features is faster: Scripting languages have rapid edit-execute-debug cycles, since they avoid the "compile" stage of edit-compile-execute-debug cycles of compiled languages.

simplicity

We want the embedded language to be "simple" in two ways:

  • If a user wants to write a little code that does some conceptually trivial task, we don't want to scare this person off with a complex language that takes 20 pounds of books and months of study in order to write a "Hello, $USER" without buffer overflows.

  • Since we're implementing the language, we want something easy to implement. Perhaps a few simple underlying instructions we can knock out a simple interpreter for in a weekend, and perhaps some sort of pre-existing compiler we can reuse with minimal tweaking.

When people build CPUs, hardware restrictions always end up limiting the instruction set. Many conceptually "simple" operations -- things people use all the time -- end up requiring lots of machine-language instructions to implement.

Embedded languages don't have these hardware restrictions, allowing us to implement more complicated "instructions" that do things that (to a human) seem conceptually simple. This often makes the system simpler in both ways mentioned above:

  • People writing directly in the language (or people writing compilers for the language) end up writing much less code, spending less time single-stepping through the code debugging it, etc.

  • For each such higher-level operation, we shift complexity from the compiler into the interpreter's implementation of one instruction. Rather than having the compiler break a higher-level operation into a short loop in the intermediate language (and your interpreter repeatedly stepping through that loop at runtime), the compiler emits one instruction in the intermediate language (and you write the same series of operations once, in your interpreter's implementation of that intermediate "instruction"). With all the CPU-intensive stuff implemented in your compiled language ("inside" complex instructions), extremely simple interpreters are often more than fast enough -- i.e., you avoid spending a lot of time building a JIT or trying to speed things up in other ways.
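As a sketch of that trade (opcodes invented for illustration): a single high-level "instruction" whose loop lives entirely inside the interpreter, so the compiled program shrinks to one bytecode:

    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    /* two invented opcodes: OP_SUM does an entire loop's worth of work */
    enum { OP_HALT, OP_SUM };

    static int32_t data[] = { 1, 2, 3, 4, 5 };

    static void run(const uint8_t *code) {
        int32_t acc = 0;
        for (size_t pc = 0; ; pc++) {
            switch (code[pc]) {
            case OP_SUM:
                /* the whole loop runs as compiled C: no bytecode fetch
                   per element, and no JIT needed to make it fast */
                for (size_t i = 0; i < sizeof data / sizeof *data; i++)
                    acc += data[i];
                break;
            case OP_HALT:
                printf("acc = %d\n", acc);
                return;
            }
        }
    }

    int main(void) {
        const uint8_t prog[] = { OP_SUM, OP_HALT };  /* one instruction */
        run(prog);                                   /* prints acc = 15 */
        return 0;
    }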

For these reasons and others, many game programmers use a "scripting" language as their "embedded language".

(I see now that Javier already recommended "use an embedded scripting language", so this has turned into a long rant on why that's a good alternative to interpreting a hardware machine language, and pointing out alternatives when one particular scripting language doesn't seem suitable).
