Lenovo X1 Carbon Gen 1

I've owned this laptop since 2012. I've been very happy with it, and I think it's worth breaking down what I enjoy about it, and what I don't. I also want to share how the mindset I had when buying it - purchase for your needs now, not for the things you might want to do in the future - turned out to be spot-on, and if anything I didn't take it far enough.

I'm going to break it down by feature. This is the third ThinkPad I've owned - my 2001 ThinkPad T20, which I bought used in 2005 and which was my main machine until 2009, set the standard for the years that followed.

Keyboard

The keyboard is the most important part of the computer, because it's how you connect to it; a good-quality keyboard is not optional when computer shopping. It's still a low-profile scissor-action Lenovo keyboard which feels pretty good, but the keys now have this upside-down bread-loaf shape, which is much less satisfying than the ThinkPad keyboards of the past. However, it still feels great, probably nearly as good as the new butterfly keyboard on the latest Apple laptops.

Notably, it was not trivial to map the caps-lock key to operate as a control key. This is a fairly common thing for programmers to do - the spot right above the shift key is too valuable a piece of real estate to leave to caps-lock. I'm not sure why it was more difficult than usual to remap this key, but I have the following command in my X11 startup script which manages to do so:

xmodmap -e 'keycode 66 = Control_L' -e 'clear Lock' -e 'add control = Control_L'

This laptop does not come with Dvorak keycaps as an option, and the indentations for the trackpoint mean you couldn't move the keycaps around anyway; though since nobody else would ever use this laptop, this does not bother me.

This keyboard has an Fn key at the bottom left, where the control key would normally extend. This didn't bother me because I remap caps-lock, but I can understand that it will bother some people. I don't install the ThinkPad extensions under Linux, so I've no idea whether any of those function keys do anything.

One last thing about the keyboard - this one doesn't have a Break key. I'm extremely interested in secure operating systems, and this practice of not having a physical Break key bothers me a little. Nevertheless, it has other keys that could be mapped in its place, such as PrtSc.

Screen

For someone like me who spends more than two hours a day on public transport, 14" is the most useful size for a laptop. The X1 is extremely thin and light, which is effortlessly convenient. When I first considered the 16:9 ratio I found it silly - are people watching movies on their laptops? Why would you ever want a widescreen? However, it actually makes for a more convenient size. It's more like an A4 book than a big awkward tile. It has a 1600x900 native resolution which looks fine.

Trackpad / Trackpoint

The trackpad is way too big, and will constantly get on your nerves. You'll be typing and then accidentally select text and delete it, or accidentally click out of a game, or accidentally... who knows. However, it's not too difficult to disable the stupid thing (the device number comes from xinput list):

xinput disable 11

The trackpoint is wonderful for a keyboard user - you don't have to take your hand far from the home row to move the cursor. Honestly, everything else being equal, this is what keeps me coming back to ThinkPads.

Camera

Unfortunately, the laptop comes with an in-built camera. However, it is easy to cover it with a small segment of post-it note or with a piece of paper and sticky tape.

Solid-State Drive

The default SSD is 128 GiB in size, which may seem small by most measures, but looking at my own usage patterns I'm not sure that's true. I divided it in two, leaving a 64 GiB partition with the standard Windows 8 install, and left a few GiB for swap. In the 59 GiB I have left, I built a 20 GiB Windows VM. In the remaining space, I once ran out of room! However, that turned out to be WebGL hammering the system log. I've never wanted more space.

That said, I did all my professional work in a chroot on a USB stick. Work's development environment was based on Ubuntu LTS. I would use Xephyr to run programs like IntelliJ IDEA, which does not work well with a non-reparenting window manager; I even added a flag to Xephyr to prevent programs running within it from grabbing my stinking keyboard focus.

Internals

This computer has 4 GiB of RAM, which by the time I bought it was no longer enough to translate PyPy with itself, let alone run my analysis tools on the live graphs. I decided with this purchase that I would not optimise for running large jobs on this laptop. Why not rent a server from AWS or GCP? I could get a machine with 120+ GiB of RAM for an hour for about AU$2. It turns out that in the time since, the number of occasions I've needed more than that 4 GiB can be counted on one hand.

The important thing is that the battery would easily last six hours, which was effectively unheard of in 2012. The performance, frankly, wasn't important.

That said, I did run Windows in a VM from time to time, using it to run SQL Server databases and a Java Swing client under test, and the laptop held up perfectly.

It has a USB 3 port, a powered USB port, and Mini DisplayPort out. The laptop itself isn't thick enough for an Ethernet port, though it does have built-in 4G. I bought an external DVD drive, "just in case".

I very much considered going for an alternate architecture - I had a handful of uses for x86-64, but would have enjoyed going for ARM or something more exotic. In the end, it didn't make much of a difference, though the Windows VM thing was handy in the early days.

Setting Up For Development

I like a low-clutter but flexible environment to work in.

  • Debian Stable is a great base to build from. It has whatever tools you need, and I feel very experienced at wrangling it to my needs. I also use GNU Guix or Nix to install packages for my use or for development, which better handles different versions - especially different C library versions.

  • GNU Emacs. If you've never used an editor where the user experience is based around search and interactivity, I really pity you. To this day, nothing has come close to the fantastic usability of Emacs. Especially if you have the essential ido-mode.

  • StumpWM is the window manager of choice - it's clearly written by someone who thinks usability and flexibility are important. The prefix key for window management commands is C-t, which on a Dvorak layout is on the home row, and doesn't shadow anything meaningful. There's also the fact that you can connect to the running process and control it from the command line or from SLIME under Emacs.

    Initially I also needed CLISP, as SBCL had a bug that made StumpWM boot extremely slowly, but that was fixed by the first Debian upgrade.

  • Conkeror is pretty close to the ideal browser. It organises open pages into buffers rather than tabs, which means it's easy to switch between them by searching. All navigation and editing can be done by searching. It's a bit tricky to figure out which Firefox version you need to be running to keep Conkeror working, but it's well worth it.

  • Schroot, which is a fantastic way to get environments from different systems running.

  • I also install Xephyr, for when applications don't want to work under StumpWM. I have added a flag to it to prevent applications within from grabbing the X input, which is such an ugly security-violating feature baked right into the X protocol.

  • HATE is a simple terminal emulator. It has no menus, which means you don't have to go to the effort of figuring out how to prevent the Alt key from being swallowed. How people use Bash without a working Alt key, I have no idea.

Conclusion

Since AWS exists, don't buy specs that you don't need on a regular basis. Since laptops make terrible gaming machines, buy a much cheaper desktop that you can optimise for that purpose. If you want a laptop for programming, optimise for ergonomics.

Getting some rest - random thoughts

Today, I'm learning to get rest the hard way. No, not representational state transfer, and not reStructuredText - actually stopping and recovering from the last year. From looking for work while planning a wedding, to working for eight full-time hours and commuting for six, to getting run-down and never really recovering, this has been one crazy year. I've had no desire at all to put down my keyboard and stop thinking about code, but I feel that I no longer have any choice. I am so very tired.

My plan is to cut the time I spend coding on interesting problems into nice discrete blocks, to plan ahead the things I will work on in that time, and to take generous rest breaks. I'll spend some time writing about the things I've been playing with, even incomplete thoughts.

Entangled

I've moved my research project into its own garden, where I can tweak the IR and compiler to fit the needs of the analysis and optimiser. At the moment it can compile several versions of Python bytecode into two different SSA forms - the first with implicit stack and locals, and the second with the more familiar arguments. It also has a web-based visualisation that can animate the SSA graph as it is constructed. I have a little work to do on the visualisation before I can get back to the next graph transformation - I want to display and highlight the use-dep relationships by drawing nice paths in SVG. Then I get to convert the explicit SSA form into one with explicit special-method invocation.

Starting over with my own compiler has made me realise some of the design decisions I wish had been made in PyPy. Entangled has static descriptions of the languages it operates on, much like Chez Scheme has. This gives a great place to add extra information to particular operations, such as their stack effect, whether they never return, and how to decode their operand. It also means we can be certain of the language that reaches each pass - we know there are no operations we don't expect.
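As a rough illustration (all the names here are invented, not Entangled's actual API), a static description can be as simple as a table of per-operation facts that each pass consults:

from collections import namedtuple

# One record per operation: how it changes the stack depth, whether it can
# fall through, and how its operand should be decoded.
Op = namedtuple("Op", ["stack_effect", "never_returns", "operand_kind"])

STACK_LANGUAGE = {
    "LOAD_CONST":   Op(stack_effect=+1, never_returns=False, operand_kind="const_index"),
    "BINARY_ADD":   Op(stack_effect=-1, never_returns=False, operand_kind=None),
    "RETURN_VALUE": Op(stack_effect=-1, never_returns=True,  operand_kind=None),
}

def check_language(instructions, language=STACK_LANGUAGE):
    """A pass can assert up front that it only ever sees operations it expects."""
    for opname, _operand in instructions:
        if opname not in language:
            raise ValueError("unexpected operation: %s" % opname)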

Another property I wish we had in PyPy and in firm is that there is no global state in the compiler. Individual passes can maintain their own state; for example, if we did call-family lowering in this compiler, that pass would maintain a concordance between functions and their 'call family', which is their set of invoked signatures as well as other functions that may be aliased at the same call site. However, once that pass was done, any information not present in the graph would be thrown away.
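A sketch of what that looks like in practice - the names here (lower_call_families, the graph methods) are hypothetical, but the shape is the point: the concordance is local to the pass and simply goes out of scope when it returns:

def lower_call_families(graph):
    # Pass-local state: callee -> the set of signatures it is invoked with.
    families = {}
    for call in graph.calls():
        families.setdefault(call.target, set()).add(call.signature)
    # Use the concordance to annotate (or rewrite) the graph...
    for call in graph.calls():
        call.annotate(family=frozenset(families[call.target]))
    # ...and return only the graph; `families` is discarded here.
    return graph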

I've also discovered how to get 80 percent of the benefit with only 20 percent of the algebra: we can track, with static analysis, which objects we know have no other alias present on the stack. Or more weakly, which objects have no alias that has been written through. The truly interesting bit of this compiler will be how we determine this quality and how we use it.
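Here's a toy version of the first property over straight-line stack code, just to show the flavour - the three opcodes and their rules are made up, and the real thing has to run over the SSA graph and handle writes, calls and merges:

def unaliased_depths(ops):
    """For each instruction, return the stack depths whose value provably
    has no other live alias at that point."""
    stack = []      # one bool per stack slot: True = known un-aliased
    out = []
    for op in ops:
        if op == "NEW":            # fresh allocation: nothing else refers to it yet
            stack.append(True)
        elif op == "LOAD_GLOBAL":  # anything could hold another reference
            stack.append(False)
        elif op == "DUP":          # the value now has two stack aliases
            stack[-1] = False
            stack.append(False)
        elif op == "POP":
            stack.pop()
        out.append({i for i, unique in enumerate(stack) if unique})
    return out

# The freshly NEW'd object at depth 0 stays un-aliased; the DUP'd global does not.
print(unaliased_depths(["NEW", "LOAD_GLOBAL", "DUP", "POP"]))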

Bytecode VMs

I love to play with bytecode VMs. Of particular interest to me are VMs for bootstrapping. These involve writing minimal assembler or C, thus providing a runtime on which you can write code to get things done. This is something of a mashup of two things: the bootstrapping king is FORTH, which can be defined quite easily with a little assembler code; on the other hand, a truly nice bytecode VM is implemented by femtolisp, which uses entirely printable opcodes.

My bytecode VMs have no practical use - they are just fun to implement. First I like to define the types and operations I would like to have, then I define the representation of values and what state needs to be tracked, and then write a function to perform each operation. I take care that each operation is defined in terms of primitives that translate easily into assembly.
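In Python that workflow ends up looking something like this - a toy machine with a made-up three-opcode language, just to show the shape of "one small function per operation":

class Machine:
    """All the state the toy VM tracks: just an operand stack."""
    def __init__(self):
        self.stack = []

# One small function per operation, written in terms of primitives
# (push, pop, add) that translate easily into assembly.
def op_push(m, operand):
    m.stack.append(operand)

def op_add(m, _operand):
    b = m.stack.pop()
    a = m.stack.pop()
    m.stack.append(a + b)

def op_print(m, _operand):
    print(m.stack.pop())

OPS = {"PUSH": op_push, "ADD": op_add, "PRINT": op_print}

def run(program):
    m = Machine()
    for opname, operand in program:
        OPS[opname](m, operand)

run([("PUSH", 1), ("PUSH", 2), ("ADD", None), ("PRINT", None)])   # prints 3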

I've recently designed two of these languages - a fancy one with datatypes and pattern matching and all the stuff I'd want in a bytecode language - and a very simple one, with just array-lists, bytestrings, fixed-size integers, closures, and a handful of constants (booleans and a none value). This last one has the handy benefit that integer literals appear almost as their hexadecimal representation in source. I've written a handful of functions in it, such as equals? and foldr, and can confirm it's not the sort of language you would want to use on a regular basis, but it is quite usable.

The most dramatic thing I have taken away from the latter implementation: make all your functions unary, and have them take a list. If instead your CALL_* operations take an extra nargs parameter, that requires a lot more work in the code for the operation, and you still need to do extra work to get the desired arguments to the top of the stack. Another alternative is just giving the function the current stack to work with, but that truly complicates return, as well as one other thing I ran into.
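To make the difference concrete, here are both conventions sketched against the toy machine above (again, illustrative names only):

# CALL_N: the interpreter has to peel nargs values back off the stack and
# restore their order before it can invoke anything.
def op_call_n(m, nargs):
    args = [m.stack.pop() for _ in range(nargs)]
    args.reverse()
    func = m.stack.pop()
    m.stack.append(func(*args))

# Unary CALL: the caller has already built one list, so the operation is
# a single pop each for the arguments and the function.
def op_call_unary(m, _operand):
    args = m.stack.pop()
    func = m.stack.pop()
    m.stack.append(func(args))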

That brings me to the second big takeaway - you really want a locals vector. Before I started creating one in my functions as a general practice, I found myself needing to set the stack up a certain way so that I could tail-call into different conditions. Most of the time, that was not a constant-time process. So make it easy - I tend to push a vector with default values for the locals at function entry.
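In the same toy style, that's nothing more than an entry operation that pushes the defaults, so later code can index into a known slot instead of rearranging the stack:

def op_enter(m, nlocals):
    # The locals vector sits at a known position for the rest of the function.
    m.stack.append([None] * nlocals)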

The third takeaway is that if people are going to write bytecode by hand - that is, without compilers or assemblers - don't use a relative offset or even an absolute position in your jump opcode: instead, have it take a command string as its argument. This made function construction fairly readable: functions start by allocating an empty vector, then appending a string for each basic block, then appending the rest of the function's 'environment', and finally combining the environment with the function's start block.

I made a couple of other concessions for readability. Whitespace is a no-op, so spaces can be used to logically group operations. Comments can be embedded, as the ; operation skips commands until the next newline. String literals can be embedded too, with the " operation performing a similar function.

Here's a code example - foldr:

;; ZL creates a list of length zero.  This will be our environment.
ZL
;; Here, the backtick ` aborts with the message on the top of the
;; stack, and the letter 'a' at the end appends this block to the
;; list.
"\"number is not iterable\"`"a
;; take ([) item 9 (X9) from the environment (e) and branch (q)
;; to it.  Presumably I wanted to keep the list functions near
;; each other.
"eX9[q"a
"eXb[q"a
"\"function is not iterable\"`"a
"\"none is not iterable\"`"a
"\"true is not iterable\"`"a
"\"false is not iterable\"`"a
"\"<unknown> is not iterable\"`"a
;; bX1 = first argument, ZmZ[ is the index, ZmX1[ is the
;; accumulated result
"eXA Z bX1[l ZmZ[ =p +q"a
"Zm bZ[ ZL bX1[ ZmZ[ [a ZmX1[a . X1] Zm ZmZ[ ] eX9[q"a
"ZmZ[z"a
;; strings
"eXE Z bX1[_ ZmZ[ =p +q"a
"Zm bZ[ ZL bX1[ ZmZ[ wa ZmX1[a . X1] Zm ZmZ[ ] eXD[q"a
"ZmZ[z"a

;; start block
;; push local vector [index, result], branch on type of arg1
"ZL Za bX2[a eX1mhq"
\  ;; Construct function

It's clearly not pretty or easy to use - but there is no undefined behaviour or weird linking behaviour, and you can see how - stranded on a desert island with nothing but an m68k and a way to write the operations - you could bootstrap your way to a real language that you could work in. You could easily write a compiler to generate code for it, too.

Instruction Set Architectures

I'm a real chip geek, and it's an interesting time to be one: at the high end, there's real interest around RISC-V. On the lower end, devices with ARM processors in them are everywhere. You can even find some MIPS-based routers if you are prepared to do some ground work. I got into software right after 2006, during the rise of amd64 in the consumer space. The thing to know about that time is that, just like in the early 90s, you had to mortgage your house to get something interesting that still had the power to do what you wanted to do. The only real players at the time were the SPARC family and the Itanium II, and they were both priced well beyond anything I could afford.

Anyway, we now have the ability to design, test, and run different levels of emulation for any kind of hardware we want to build right from our desks - so it's well worth playing with. You can build stack architectures like the RTX2010 or the Burroughs B6000 series, or you can build wide vector architectures like AMD's GCN, which powered the Radeon HD 7xxx series of graphics cards.

Me - I'm interested in three things: flexible security, wide execution, and great performance tools available to sophisticated compilers and safe runtimes. Oh, and the less patent-hobbled the better. I'm still bitter at Intel for pricing the Itanium II beyond developers who wanted to develop for it, and for killing the i960 in the consumer market.

It's great fun to play with the concepts behind ISA design. For example: have you ever wondered why the different operations on a CPU take the time that they do? Multiplication typically takes three times as long as addition, which might sound puzzling until you find out that they are probably using a Dadda multiplier, which is a truly beautiful trick. Why, on a modern x86-64 CPU, does integer division take 29 mostly-pipelined cycles, but floating-point division take 9 unpipelined cycles? Because there is only one hardware low-latency division unit, and it's in the floating-point path. Fun exercise: figure out how they implemented the different division units.

For myself, writing a compiler that is good at figuring out how to remove false data dependencies, it's fun to look at wide architectures and figure out how to make them work with real-world programming languages. These are languages which are semantically about graph walking and control flow, and require all sorts of dynamic allocation patterns - but if we can figure out how to execute that sort of code on vector architectures, we have a great path to real performance improvements.

As a compiler architect, there are a few things I think are missing from the GPU that would make it a great tool for parallelising typical inner loops. These include a way to represent an append to the result list within each iteration that may be conditional, ways to handle function pointers easily, and a way to partition the different streams by a value and handle those values one at a time.
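The conditional-append point is the easiest to picture. Here's the scalar loop next to the map/mask/compact shape a wide machine has to use instead (plain Python, purely illustrative):

xs = range(10)
keep = lambda x: x % 3 != 0
f = lambda x: x * x

# The scalar loop a compiler would like to hand to the GPU.
out = []
for x in xs:
    if keep(x):
        out.append(f(x))

# The vector-style equivalent: evaluate every lane, then compact by the mask.
mask = [keep(x) for x in xs]
vals = [f(x) for x in xs]
out_vec = [v for v, m in zip(vals, mask) if m]

assert out == out_vec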

I can't say yet that I've figured it out - until I can see the sort of changes I can make to the way we lay out objects in the compiler - but I'm having a lot of fun exploring the possibilities.

A good place to start playing would actually be the old Moxie Logic blog posts about adding GCC support. I have often wondered whether GCC could be augmented to graph the cost/benefit of different numbers of registers and to see how they actually get used.

Securing Capability Web Services

I recently found myself implementing some capability web services, and got to thinking about how little there is online about how to do so securely. It is one thing to stay within an existing capability system such as Waterken or E, and another to implement capabilities on top of a REST framework or WebSockets. Here I'll describe the technique that I like to use, and why.

What is a Capability Web Service?

The web is rife with authorisation schemes, from basic authentication through to OpenID. Capability Web Services are built around unguessable URLs. If the secret URL has been shared with you, perhaps sent to you via email, stashed in your bookmarks, or stored in a dashboard, then you have the authority to access the resource at that address. Capability URLs are often used for password resets, but they can be used for general authorisation and authentication, too; they can be much more secure than passwords when done right.

Making all web services Capability Web Services is not yet common practice, but capabilities themselves are a well-researched field of security with a wide range of usability benefits. Let's see how we can create one of these services in a popular framework.

The API

Building a capability service requires that we identify the resource in the URL string and also authenticate it. We might reason that we can use the same large random number both to designate the resource and to authenticate it. Here is a statically named resource built in that fashion:

@app.route("/widgets/nLinRVtJOKB0ow2yqOXLiD1fSTk1twHXAC7XoGBVOHQ=")
def my_resource():
    "bad example 0"
    return "static secret details"

This resource has a 256-bit random number identifying it - too long to be guessable. Nevertheless, a simple timing attack on this service will discover the number easily. What gives?

Flask uses a regular expression to match the request path against the route - and regular expressions do not run in constant time. The service will take a little longer to 404 when more of the path matches. Let's look at a dynamic resource.

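# (assumes the Flask `app` from above, plus a SQLAlchemy Core connection
#  `db`, a `widget` table, and `select` and `make_response` imported from
#  sqlalchemy and flask respectively)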
@app.route("/widgets/<uuid:token>")
def my_resource(token):
    "bad example 2"
    for widget_id, secret in db.execute(
            select([widget]).where(widget.c.id == token)):
        return secret
    return make_response("No such widget", 404)

In this example, the UUID that identifies the resource is also its identifier in the database. I imagine that most people, when first tasked with developing a Capability Web Service, do something like the above. Unfortunately, it too is subject to a timing attack. This time the attack takes a little longer; however, it can still determine which resources are stored in the database and then access them.
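To see why, it helps to picture the measuring side. Something like the following is all an attacker needs to start collecting data (a sketch only - the URL is a placeholder, and a real attack needs many samples per candidate and some statistics to pull the signal out of the noise):

import time
import uuid
import requests

def time_request(url):
    start = time.perf_counter()
    requests.get(url)
    return time.perf_counter() - start

# Requests that hit a real row take a measurably different amount of time
# to answer than ones that 404 straight away.
for candidate in (uuid.uuid4() for _ in range(100)):
    print(candidate, time_request("https://example.com/widgets/%s" % candidate))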

Let's see if we can build a page that is robust against timing attacks. This requires that the part of the URL that must be kept secret is not used as the identifier for the lookup. Here's how I do it in my web services:

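# (same setup as above, plus b64decode from the base64 module and `sha` as
#  a hash constructor such as hashlib.sha256; the widget table here stores
#  a digest and a salt alongside each secret)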
def is_equal(s, t):
    """Constant-time string equal function
    """
    if len(s) == len(t):
        result = 1
    else:
        result = 0
        t = s
    for x, y in zip(s, t):
        result &= int(x == y)
    return result


@app.route("/widgets/<string:token>")
def my_resource(token):
    "correct example"
    # parse the token into uniform bytes
    try:
        raw_token = b64decode(token)
    except (TypeError, ValueError):  # binascii.Error is a ValueError on Python 3
        return make_response("Invalid resource", 404)
    # grab the id and auth out of the token
    xid = raw_token[-8:]
    xauth = raw_token[:-8]
    # find the resource in the database
    for widget_id, digest, salt, secret in db.execute(
            select([widget]).where(widget.c.id == xid)):
        # validate the provided auth against the resource itself
        xdigest = sha(xauth + salt).hexdigest()
        if is_equal(digest, xdigest):
            return secret
        else:
            break
    return make_response("Invalid resource", 404)

Generating such a token requires concatenating N random bytes of auth with 8 bytes of id. I recommend using 32 bytes of auth, though you could get away with less. Make sure to save the hexdigest and the salt to the database when you create the resource.

While an attacker might be able to guess your 8-byte ids, they will not reasonably be able to guess an entire 40-byte token.
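For completeness, here's a sketch of the minting side, matching the layout the lookup code expects (the helper name is illustrative; this assumes `sha` above is something like hashlib.sha256, and note that for URLs you would normally prefer the urlsafe base64 variants on both ends):

import os
from base64 import b64encode
from hashlib import sha256

def make_token(widget_id):
    """widget_id: the widget's 8-byte database id."""
    auth = os.urandom(32)                       # the secret part of the token
    salt = os.urandom(16)                       # per-resource salt
    digest = sha256(auth + salt).hexdigest()    # store digest and salt with the widget
    token = b64encode(auth + widget_id).decode("ascii")
    return token, digest, salt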

Conclusion

We saw how to build a Capability Web Service that is robust against timing attacks. The process should be the same when using something like CapTP or WAMP over WebSockets: even though the identifier and the auth are combined into one token, let's not use the same data both for the lookup and for performing the authorisation.

Introduction

I have had a Blogspot blog for a long time, but it was never easy to post there. For example, I like to format my paragraphs with M-q when writing them, which lays them out in a nice readable way. Unfortunately, on Blogspot this looks like a badly wrapped email. Similarly, Python code was really hard to make look right - it would ignore my indentation, even within pre tags!

So this is a static site now, which should get in my way a lot less. I get to write reStructuredText, and Nikola turns that into static pages. Great stuff.