<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title>Python - Tag - Ateon</title><link>https://ateon.ch/tags/python/</link><description>Python - Tag - Ateon</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Fri, 29 May 2026 19:00:00 +0200</lastBuildDate><atom:link href="https://ateon.ch/tags/python/" rel="self" type="application/rss+xml"/><item><title>Formally Verified ASN.1 Encoders and Decoders</title><link>https://ateon.ch/posts/formally-verified-asn1/</link><pubDate>Fri, 29 May 2026 19:00:00 +0200</pubDate><author>Luca Schafroth</author><guid>https://ateon.ch/posts/formally-verified-asn1/</guid><description><![CDATA[<p>When a satellite sends telemetry data to a ground station, both sides need to agree on exactly how that data is structured in binary. The same goes for network protocols, aircraft systems, and anything else where machines exchange precisely formatted messages. Get the encoding wrong and you get garbage. Get the decoding wrong and you may silently recover incorrect data.</p>
<h2 id="why-does-asn1-need-formal-verification">Why Does ASN.1 Need Formal Verification?</h2>
<p>In safety-critical systems, bugs in communication protocol implementations can produce incorrect encodings or silent decoding errors that a finite test suite may not catch. <strong>ASN.1</strong> (Abstract Syntax Notation One) is an international standard for describing data structures independently of any programming language or platform.<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup> You define your types once (integers, strings, sequences, enumerations). A separate set of encoding rules then determines how those types map to bytes on the wire.</p>
<p>A simple definition looks like this:</p>
<div class="code-block code-line-numbers open" style="counter-reset: code-block 0">
    <div class="code-header language-asn1"><span class="code-title has-title"><i class="arrow fas fa-angle-right fa-fw" aria-hidden="true"></i>
                <span class="code-filename">ASN1</span>
            </span>
        <span class="ellipses"><i class="fas fa-ellipsis-h fa-fw" aria-hidden="true"></i></span>
        <span class="copy" title="Copy to clipboard"><i class="far fa-copy fa-fw" aria-hidden="true"></i></span>
    </div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">Temperature ::= INTEGER (0..100)</span></span></code></pre></div></div><p>This says <code>Temperature</code> is an integer constrained to the range 0–100. The encoding rules then pack that value into as few bits as possible.</p>
<p>The encoding rules used in my thesis are <strong>uPER</strong> (Unaligned Packed Encoding Rules). uPER is compact, which makes it a natural fit for embedded systems and satellite communication where bandwidth is limited. The European Space Agency (ESA) uses ASN.1 with uPER for telemetry and telecommand data between spacecraft and ground stations.</p>
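<p>To make &ldquo;as few bits as possible&rdquo; concrete, here is a minimal sketch of the bit-width arithmetic for a simple constrained integer with no extension marker. The function names are my own for illustration and are not part of ASN1SCC.</p>

```python
def constrained_bit_width(lower: int, upper: int) -> int:
    """Bits needed to distinguish every value in lower..upper (inclusive)."""
    if lower > upper:
        raise ValueError("empty range")
    # The offset value - lower is what gets stored; the largest is upper - lower.
    return (upper - lower).bit_length()


def offset_bits(value: int, lower: int, upper: int) -> str:
    """Render value - lower as a fixed-width binary string (illustration only)."""
    if not (upper >= value >= lower):
        raise ValueError("value violates its constraint")
    width = constrained_bit_width(lower, upper)
    return format(value - lower, "b").zfill(width) if width else ""


temperature_width = constrained_bit_width(0, 100)  # 101 values fit in 7 bits
temperature_42 = offset_bits(42, 0, 100)           # "0101010"
```

<p>A range containing a single value needs zero bits, which is why <code>constrained_bit_width(5, 5)</code> returns 0.</p>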
<h2 id="what-is-asn1scc">What Is ASN1SCC?</h2>
<p>Writing encoders and decoders by hand for every type in a specification is tedious and error-prone. ESA developed <a href="https://github.com/esa/asn1scc" target="_blank" rel="noopener noreffer ">ASN1SCC</a> to automate this: an open-source compiler that takes an ASN.1 specification as input and generates the corresponding encoding and decoding code.<sup id="fnref:2"><a href="#fn:2" class="footnote-ref" role="doc-noteref">2</a></sup> It supports C, Ada, and Scala, with a Python backend under active development.</p>
<p>Given the <code>Temperature</code> definition above, ASN1SCC generates a Python class with <code>encode</code> and <code>decode</code> methods. Using them looks roughly like this:</p>
<div class="code-block code-line-numbers open" style="counter-reset: code-block 0">
    <div class="code-header language-python"><span class="code-title"><i class="arrow fas fa-angle-right fa-fw" aria-hidden="true"></i></span>
        
        <span class="ellipses"><i class="fas fa-ellipsis-h fa-fw" aria-hidden="true"></i></span>
        <span class="copy" title="Copy to clipboard"><i class="far fa-copy fa-fw" aria-hidden="true"></i></span>
    </div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="n">encoder</span> <span class="o">=</span> <span class="n">UPEREncoder</span><span class="o">.</span><span class="n">of_size</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">val</span> <span class="o">=</span> <span class="n">Temperature</span><span class="p">(</span><span class="mi">42</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">val</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="n">encoder</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">data</span> <span class="o">=</span> <span class="n">encoder</span><span class="o">.</span><span class="n">get_bitstream_buffer</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">decoder</span> <span class="o">=</span> <span class="n">UPERDecoder</span><span class="o">.</span><span class="n">from_buffer</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">result</span> <span class="o">=</span> <span class="n">Temperature</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="n">decoder</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1"># Is result == val?</span></span></span></code></pre></div></div><p>The question my thesis set out to answer: can we <strong>prove</strong> that <code>result == val</code> holds for every valid input, not just the ones we happened to test?</p>
<h2 id="why-testing-alone-isnt-enough">Why Testing Alone Isn&rsquo;t Enough</h2>
<p>Tests are the standard answer. Write inputs, check outputs, add edge cases. Done carefully, this catches a lot of bugs.</p>
<p>But tests only cover the cases you thought to write. A <code>Temperature</code> value of 42 passes. For a type with a wider range, what about 127? What about 128, where the bit-packing crosses a byte boundary? What about the exact edges of the constraint range? What about a complex nested structure with a dozen fields, where the encoder for each field must leave the stream in exactly the right state for the next one?</p>
<div class="details admonition info">
    <div class="details-summary admonition-title">
        <i class="icon fas fa-info" aria-hidden="true"></i>Automated testing variants<i class="details-icon fas fa-angle-right" aria-hidden="true"></i>
    </div>
    <div class="details-content">
        <div class="admonition-content"><p>Testing has more sophisticated variants. <strong>Symbolic execution</strong> (e.g., KLEE) treats inputs as symbolic variables and automatically generates concrete inputs to cover different code paths, which is far more systematic than writing tests by hand. <strong>Fuzzing</strong> generates large volumes of random or mutation-based inputs and can find bugs that deterministic test suites miss entirely.</p>
<p>Both techniques close some of the coverage gap. But they still explore a finite set of execution paths. For programs with unbounded inputs or complex loop structures, neither can guarantee that every case has been covered. Formal verification closes that gap.</p>
</div>
    </div>
</div>
<p>Formal verification is a different game. You write a mathematical statement about what the code must do for <em>all</em> inputs, and a tool proves or disproves it automatically. No enumeration of cases.</p>
<h2 id="what-i-set-out-to-prove">What I Set Out to Prove</h2>
<p>The property I focused on is <strong>round-trip correctness</strong>:</p>
<blockquote>
<p>For all valid inputs, decoding the output of an encoder recovers the original value.</p></blockquote>
<p>Formally: $\forall x.\ \mathit{valid}(x) \Rightarrow \mathit{decode}(\mathit{encode}(x)) = x$</p>
<p>The proof is scoped to valid inputs: the precondition requires constraint-satisfying values on the encoder side and a well-formed buffer on the decoder side. It says nothing about how the decoder handles malformed input from an untrusted source. But within that scope it gives a precise, unconditional correctness statement: the encoder cannot silently corrupt a value, and the decoder cannot misread what the encoder wrote.</p>
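<p>For a type as small as <code>Temperature</code>, the property can still be checked by brute force. The sketch below uses a toy bit-list codec of my own (not the ASN1SCC runtime) and enumerates all 101 valid values. That only works because the domain is tiny, which is exactly the limitation a proof removes.</p>

```python
from typing import List

LOWER, UPPER = 0, 100
WIDTH = (UPPER - LOWER).bit_length()  # 7 bits for 0..100


def is_valid(x: int) -> bool:
    return UPPER >= x >= LOWER


def encode(x: int) -> List[int]:
    """Toy encoder: fixed width, most significant bit first."""
    assert is_valid(x), "precondition: value satisfies its constraint"
    offset = x - LOWER
    return [(offset // 2 ** i) % 2 for i in reversed(range(WIDTH))]


def decode(bits: List[int]) -> int:
    """Toy decoder: assumes the buffer holds exactly one encoded value."""
    assert len(bits) == WIDTH, "precondition: well-formed buffer"
    offset = 0
    for bit in bits:
        offset = offset * 2 + bit
    return offset + LOWER


def round_trip_holds_everywhere() -> bool:
    # Exhaustive check over the whole (small) valid domain.
    return all(decode(encode(x)) == x for x in range(LOWER, UPPER + 1))
```

<p>Note that <code>decode([1] * 7)</code> returns 127, a value outside the constraint. No valid encoder output looks like that, so the round-trip statement is unaffected, but it illustrates why the property says nothing about malformed input.</p>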
<h2 id="nagini-formal-verification-for-python">Nagini: Formal Verification for Python</h2>
<p>The verifier I used is <a href="https://github.com/marcoeilers/nagini" target="_blank" rel="noopener noreffer ">Nagini</a>, a static verifier for Python developed by Marco Eilers at ETH Zurich.<sup id="fnref:3"><a href="#fn:3" class="footnote-ref" role="doc-noteref">3</a></sup> Nagini lets you annotate Python functions with preconditions and postconditions, then uses an SMT solver to prove those statements hold for every possible execution. Under the hood it translates Python to Viper, an intermediate verification language.<sup id="fnref:4"><a href="#fn:4" class="footnote-ref" role="doc-noteref">4</a></sup></p>
<p>An annotated encode function looks like this:</p>
<div class="code-block code-line-numbers open" style="counter-reset: code-block 0">
    <div class="code-header language-python"><span class="code-title"><i class="arrow fas fa-angle-right fa-fw" aria-hidden="true"></i></span>
        
        <span class="ellipses"><i class="fas fa-ellipsis-h fa-fw" aria-hidden="true"></i></span>
        <span class="copy" title="Copy to clipboard"><i class="far fa-copy fa-fw" aria-hidden="true"></i></span>
    </div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">encode</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">codec</span><span class="p">:</span> <span class="n">UPEREncoder</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="kc">None</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="n">Requires</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">is_constraint_valid</span><span class="p">())</span>
</span></span><span class="line"><span class="cl">    <span class="n">Ensures</span><span class="p">(</span><span class="n">codec</span><span class="o">.</span><span class="n">segments</span> <span class="o">==</span> <span class="n">Old</span><span class="p">(</span><span class="n">codec</span><span class="o">.</span><span class="n">segments</span><span class="p">)</span> <span class="o">+</span> <span class="n">segments_of</span><span class="p">(</span><span class="bp">self</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># ... implementation ...</span></span></span></code></pre></div></div><p><code>Requires</code> is the precondition: the value being encoded must satisfy its constraints. <code>Ensures</code> is the postcondition: the encoder&rsquo;s state has been extended by exactly the segments representing this value. Once Nagini accepts this, no test needs to cover that contract. It holds for every call that satisfies the precondition.</p>
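<p>To get a feel for what such a contract says, the same shape can be emulated with runtime assertions in plain Python. This is only an analogy: the classes below are hypothetical stand-ins, and the assertions are checked per call at run time, whereas Nagini proves the contract statically for all calls.</p>

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Segment = Tuple[int, int]  # (value, length in bits)


@dataclass
class ToyEncoder:
    """Hypothetical stand-in for UPEREncoder; it only keeps a segment log."""
    segments: List[Segment] = field(default_factory=list)


@dataclass(frozen=True)
class Temperature:
    value: int

    def is_constraint_valid(self) -> bool:
        return 100 >= self.value >= 0

    def segments_of(self) -> List[Segment]:
        return [(self.value, 7)]  # 0..100 fits in 7 bits

    def encode(self, codec: ToyEncoder) -> None:
        # Plays the role of Requires(...), but checked at run time.
        assert self.is_constraint_valid()
        old_segments = list(codec.segments)  # snapshot, like Old(...)

        codec.segments.append((self.value, 7))  # the "implementation"

        # Plays the role of Ensures(...): the log grew by exactly these segments.
        assert codec.segments == old_segments + self.segments_of()


codec = ToyEncoder()
Temperature(42).encode(codec)
Temperature(7).encode(codec)
```

<p>Calling <code>Temperature(101).encode(codec)</code> fails the first assertion here. Under a static verifier, a call site that cannot establish the precondition is rejected before the program ever runs.</p>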
<h2 id="the-segment-abstraction">The Segment Abstraction</h2>
<p>Encoding writes bits into a shared byte buffer. A single write can span two bytes, and you need to reason precisely about which bits changed and which didn&rsquo;t. Tracking this at the bit level throughout the whole proof would be unmanageable.</p>
<p>The approach I used is a three-layer architecture. The bottom two layers handle actual bit manipulation: individual bits within a byte, then multi-bit writes across the full buffer. Above those sits a <strong>segment abstraction</strong> used purely for verification. Instead of tracking which bits changed, each write is recorded as a <code>(value, length)</code> pair called a segment.</p>
<div class="mermaid" id="id-1">graph TB
    A[&#34;Segment abstraction&lt;br/&gt;Encoders and decoders reason at this level&#34;]
    B[&#34;Byte-sequence layer&lt;br/&gt;Tracks bit writes across the buffer&#34;]
    C[&#34;Bit-level layer&lt;br/&gt;Individual bit read/write within a byte&#34;]
    A --- B --- C</div>
<p>Once the bottom layers are proved correct, the segment abstraction guarantees that the sequence of segments corresponds exactly to the buffer contents. Encoder and decoder proofs then work entirely with segments, without bit arithmetic. That separation is what makes the round-trip proofs tractable, and it distinguishes this approach from the bit-list intermediate representation used in the prior Scala verification work.<sup id="fnref:5"><a href="#fn:5" class="footnote-ref" role="doc-noteref">5</a></sup></p>
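<p>A small sketch of the idea, with names of my own choosing rather than the ones in the verified code: a list of <code>(value, length)</code> segments determines the buffer contents, so reasoning about the list is enough once that correspondence has been established.</p>

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (value, length in bits)


def segment_bits(segment: Segment) -> List[int]:
    """Expand one segment into its bits, most significant bit first."""
    value, length = segment
    if 0 > value or value >= 2 ** length:
        raise ValueError("value does not fit in the given length")
    return [(value // 2 ** i) % 2 for i in reversed(range(length))]


def flatten(segments: List[Segment]) -> List[int]:
    """Concatenate the bits of all segments in write order."""
    bits: List[int] = []
    for segment in segments:
        bits.extend(segment_bits(segment))
    return bits


def pack(bits: List[int]) -> bytes:
    """Pack bits into bytes, zero-padding the final byte."""
    padded = bits + [0] * (-len(bits) % 8)
    out = bytearray()
    for start in range(0, len(padded), 8):
        byte = 0
        for bit in padded[start:start + 8]:
            byte = byte * 2 + bit
        out.append(byte)
    return bytes(out)


log = [(5, 3), (42, 7)]      # a 3-bit write followed by a 7-bit write
buffer = pack(flatten(log))  # b"\xaa\x80"
```

<p>The 7-bit segment straddles the first and second byte of <code>buffer</code>, yet at the segment level it is still a single list element.</p>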
<p>I&rsquo;ll cover the segment abstraction and compositional proof structure in more detail in a follow-up post.</p>
<h2 id="what-was-formally-verified">What Was Formally Verified</h2>
<p>The first component verified was <code>BitStream</code>, the core data structure shared by all generated codecs. The verification establishes absence of runtime errors (index out-of-bounds, overflow) and full functional correctness of all read and write operations: each written value is correctly retrieved by a subsequent read, and previously written data is unchanged. Everything else rests on this.</p>
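<p>The two functional properties can be illustrated on a toy bit buffer. The class below is a hypothetical sketch, not the ASN1SCC <code>BitStream</code>. It uses integer arithmetic instead of bitwise operators and raises on out-of-bounds access, where the verified code proves such accesses cannot happen.</p>

```python
class ToyBitStream:
    """Hypothetical MSB-first bit buffer over a bytearray."""

    def __init__(self, size_in_bytes: int) -> None:
        self.buf = bytearray(size_in_bytes)
        self.write_pos = 0  # index of the next bit to write

    def _get_bit(self, index: int) -> int:
        weight = 2 ** (7 - index % 8)
        return (self.buf[index // 8] // weight) % 2

    def _set_bit(self, index: int, bit: int) -> None:
        weight = 2 ** (7 - index % 8)
        old = self._get_bit(index)
        self.buf[index // 8] += (bit - old) * weight  # touches one bit only

    def write_bits(self, value: int, length: int) -> int:
        """Append value using `length` bits; return the start bit index."""
        if not (2 ** length > value >= 0):
            raise ValueError("value does not fit in the given length")
        if self.write_pos + length > 8 * len(self.buf):
            raise IndexError("write past end of buffer")
        start = self.write_pos
        for i in range(length):
            self._set_bit(start + i, (value // 2 ** (length - 1 - i)) % 2)
        self.write_pos += length
        return start

    def read_bits(self, start: int, length: int) -> int:
        """Read `length` bits beginning at bit index `start`."""
        if 0 > start or start + length > self.write_pos:
            raise IndexError("read outside written region")
        value = 0
        for i in range(length):
            value = value * 2 + self._get_bit(start + i)
        return value


stream = ToyBitStream(2)
first = stream.write_bits(5, 3)
second = stream.write_bits(42, 7)  # spans the byte boundary
```

<p>After both writes, <code>stream.read_bits(second, 7)</code> returns 42 and <code>stream.read_bits(first, 3)</code> still returns 5. Those are the two properties for this one trace; the verification establishes them for every sequence of operations.</p>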
<p>Building on <code>BitStream</code>, six ASN.1 types were proved to have round-trip correctness under uPER:</p>
<ul>
<li><code>BOOLEAN</code></li>
<li><code>NULL</code></li>
<li><code>ENUMERATED</code></li>
<li><code>INTEGER</code> (constrained range)</li>
<li><code>OCTET STRING</code> (fixed size)</li>
<li><code>SEQUENCE</code> (with fixed-size, non-optional fields)</li>
</ul>
<p>Types like <code>SEQUENCE OF</code>, <code>CHOICE</code>, <code>BIT STRING</code>, and <code>REAL</code> were not verified. Most follow the same proof pattern and are primarily a matter of implementing type-specific auxiliary functions. <code>REAL</code> is the exception: it requires further development in Nagini&rsquo;s floating-point support before it can be tackled at the codec level.</p>
<h2 id="the-cost-annotation-overhead">The Cost: Annotation Overhead</h2>
<p>Formal verification is not free. Proofs require writing specifications alongside the implementation. Across the verified runtime files, annotation lines account for <strong>39.9%</strong> of the codebase: 1,636 specification lines alongside 2,461 lines of implementation.</p>
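<p>The headline figure is a share of all lines, not a ratio of specification to implementation. A quick check of the arithmetic, using only the two line counts quoted above:</p>

```python
def annotation_share(spec_lines: int, impl_lines: int) -> float:
    """Fraction of all lines that are specification."""
    return spec_lines / (spec_lines + impl_lines)


def spec_to_impl_ratio(spec_lines: int, impl_lines: int) -> float:
    """Specification lines per implementation line."""
    return spec_lines / impl_lines


share = annotation_share(1636, 2461)    # about 0.399, i.e. 39.9%
ratio = spec_to_impl_ratio(1636, 2461)  # about 0.665
```

<p>Put differently, roughly two lines of specification were written for every three lines of implementation.</p>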
<p>The distribution is uneven by design. <code>bitstream.py</code>, which establishes the segment abstraction at the byte-sequence level, has more specification than implementation (68% of its lines are annotations). The encoder and decoder, working at the segment level rather than at the bit level, need far less: 12.6% and 13.5% respectively. The annotation burden concentrates at the foundation, so the higher-level proofs stay comparatively lightweight.</p>
<p><code>segment.py</code> and <code>verification.py</code> consist entirely of specification code with no runtime counterparts; they exist solely to support the proof.</p>
<p>The generated data classes sit at 54% specification, since each class needs its own postconditions and helper lemmas. That&rsquo;s the cost of annotating code you didn&rsquo;t write.</p>
<h2 id="two-bugs-found-before-running-the-prover">Two Bugs Found Before Running the Prover</h2>
<p>Writing formal specifications sometimes finds bugs before the prover even runs. Precisely stating what the code <em>should</em> do exposes gaps between that and what it <em>actually</em> does. Two bugs turned up in the ASN1SCC Python backend this way:</p>
<ol>
<li>The <code>is_constraint_valid</code> check for <code>INTEGER</code> was missing the lower bound of zero, accepting negative values as valid.</li>
<li>The <code>is_constraint_valid</code> check for <code>OCTET STRING</code> did not enforce the fixed-size constraint, accepting arrays of any length.</li>
</ol>
<p>Both were caught just from writing the specification, before running a single proof.</p>
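<p>The following is a reconstruction of the <em>kind</em> of gap described above, not the actual ASN1SCC code. The function names, default bounds, and the fixed size of 4 are assumptions for illustration.</p>

```python
def integer_valid_buggy(value: int, upper: int = 100) -> bool:
    # Only the upper bound is checked, so negative values are accepted.
    return upper >= value


def integer_valid_fixed(value: int, lower: int = 0, upper: int = 100) -> bool:
    return upper >= value >= lower


def octets_valid_buggy(data: bytes) -> bool:
    # No length check at all: any array passes.
    return True


def octets_valid_fixed(data: bytes, size: int = 4) -> bool:
    return len(data) == size


negative_slips_through = integer_valid_buggy(-1)      # True
negative_rejected = not integer_valid_fixed(-1)       # True
wrong_size_rejected = not octets_valid_fixed(b"abc")  # True
```

<p>A round-trip specification cannot be stated against the buggy versions: an encoder that packs a non-negative offset has no sensible output for <code>-1</code>, so the precondition has to exclude it, and writing that precondition down is what exposes the missing check.</p>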
<h2 id="how-i-extended-nagini">How I Extended Nagini</h2>
<p>The verification also required extending Nagini to handle Python features it couldn&rsquo;t verify before:</p>
<ul>
<li><code>bytearray</code>: a mutable heap-allocated type, modelled as a <code>Seq[Int]</code> in Viper with a permission predicate governing access, plus a pure <code>PByteSeq</code> counterpart for use in specifications</li>
<li>Shift operators (<code>&lt;&lt;</code> and <code>&gt;&gt;</code>): encoded via integer arithmetic, since SMT integers don&rsquo;t support bitwise shifts directly; left shift by <em>k</em> becomes multiplication by 2<sup><em>k</em></sup>, resolved through a case distinction over the shift amount</li>
<li>Dataclasses: <code>@dataclass</code>-decorated classes with implicit <code>__init__</code>, supporting frozen and non-frozen forms and factory defaults</li>
<li><code>IntEnum</code>: integer-backed enumerations, encoded with boxing/unboxing functions that enforce the set of valid values at the type level</li>
</ul>
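<p>The arithmetic identity behind the shift encoding can be checked directly in Python. This sketch only demonstrates the identity on Python integers. It uses <code>2 ** k</code> rather than a case distinction and says nothing about how the Viper encoding is actually written.</p>

```python
import operator


def shift_left(x: int, k: int) -> int:
    """Left shift expressed as multiplication by a power of two."""
    if 0 > k:
        raise ValueError("negative shift count")
    return x * 2 ** k


def shift_right(x: int, k: int) -> int:
    """Right shift expressed as floor division by a power of two."""
    if 0 > k:
        raise ValueError("negative shift count")
    return x // 2 ** k


def agrees_with_builtin(limit: int = 64, max_shift: int = 16) -> bool:
    # operator.lshift / operator.rshift are the function forms of the operators.
    return all(
        shift_left(x, k) == operator.lshift(x, k)
        and shift_right(x, k) == operator.rshift(x, k)
        for x in range(-limit, limit + 1)
        for k in range(max_shift + 1)
    )
```

<p>Floor division matters for negative operands: Python&rsquo;s right shift rounds toward negative infinity, so <code>shift_right(-5, 1)</code> is -3, matching the built-in operator.</p>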
<p>Beyond new features, six crashes and three soundness issues in Nagini were identified and reported to the <a href="https://github.com/marcoeilers/nagini/issues" target="_blank" rel="noopener noreffer ">issue tracker</a>, each with a minimal reproducing test case. All were subsequently fixed. One soundness bug was particularly subtle: because a Python integer subclass satisfies <code>A(5) == 5</code>, Nagini was misled into accepting the trivially false assertion <code>assert 2 == 1</code> as valid, which I found while writing tests that were supposed to fail.</p>
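<p>The Python behaviour that sentence refers to is easy to reproduce. The snippet below shows only the language semantics, not the Nagini bug itself: an instance of an <code>int</code> subclass compares equal to the plain integer even though the two objects have different types.</p>

```python
class A(int):
    """An integer subclass that overrides nothing."""


a = A(5)

subclass_equals_int = (a == 5)        # True: comparison is inherited from int
types_differ = type(a) is not int     # True: a is an A, not a plain int
hashes_match = hash(a) == hash(5)     # True: equal values hash equally
```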
<p>I&rsquo;ll cover these extensions in more detail in a follow-up post.</p>
<h2 id="artifacts-and-prior-work">Artifacts and Prior Work</h2>
<p>The full thesis is available on the <a href="https://www.pm.inf.ethz.ch/education/student-projects/completedprojects.html" target="_blank" rel="noopener noreffer ">completed projects page</a> of the Programming Methodology Group at ETH Zurich.
Changes to Nagini have been committed to the <a href="https://github.com/marcoeilers/nagini" target="_blank" rel="noopener noreffer ">Nagini repository</a> directly.
ASN1SCC is open source on <a href="https://github.com/esa/asn1scc" target="_blank" rel="noopener noreffer ">GitHub</a>.</p>
<p>This work builds on a prior project that applied the same round-trip verification approach to ASN1SCC&rsquo;s Scala backend.<sup id="fnref1:5"><a href="#fn:5" class="footnote-ref" role="doc-noteref">5</a></sup> The aim was to show that the same class of correctness guarantee is achievable in Python.</p>
<div class="footnotes" role="doc-endnotes">
<hr>
<ol>
<li id="fn:1">
<p>ITU-T, <em>X.680: Information Technology – Abstract Syntax Notation One (ASN.1)</em>, 2021. <a href="https://www.itu.int/rec/T-REC-X.680/" target="_blank" rel="noopener noreffer ">https://www.itu.int/rec/T-REC-X.680/</a>&#160;<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:2">
<p>G. Mamais, T. Tsiodras, D. Lesens, M. Perrotin, &ldquo;An ASN.1 compiler for embedded/space systems,&rdquo; <em>ERTS 2012</em>, Toulouse, France. <a href="https://hal.science/hal-02263447" target="_blank" rel="noopener noreffer ">https://hal.science/hal-02263447</a>&#160;<a href="#fnref:2" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:3">
<p>M. Eilers, P. Müller, &ldquo;Nagini: A Static Verifier for Python,&rdquo; <em>Computer Aided Verification (CAV)</em>, 2018, pp. 596–603.&#160;<a href="#fnref:3" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:4">
<p>P. Müller, M. Schwerhoff, A. J. Summers, &ldquo;Viper: A Verification Infrastructure for Permission-Based Reasoning,&rdquo; <em>VMCAI</em>, 2016. <a href="https://viper.ethz.ch" target="_blank" rel="noopener noreffer ">https://viper.ethz.ch</a>&#160;<a href="#fnref:4" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:5">
<p>M. Bucev, S. Chassot, S. Felix, F. Schramka, V. Kunčak, &ldquo;Formally Verifiable Generated ASN.1/ACN Encoders and Decoders: A Case Study,&rdquo; arXiv:2412.07235, 2024. <a href="https://arxiv.org/abs/2412.07235" target="_blank" rel="noopener noreffer ">https://arxiv.org/abs/2412.07235</a>&#160;<a href="#fnref:5" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a>&#160;<a href="#fnref1:5" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
</ol>
</div>]]></description></item></channel></rss>