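While developing, I keep running into all sorts of encodings,
so I assumed the RLP encoding in Ethereum was borrowed from some existing encoding,
but it turns out it was created specifically for storing Ethereum data.
In any case, you can see that a lot of thought went into it.
If you are interested in development, give it a read.
I may translate parts of it along the way, so for now I am copying the original here. Thank you for your understanding.
The article adds further explanation on top of the wiki documentation, so it is well explained.
Original link: https://medium.com/coinmonks/data-structure-in-ethereum-episode-1-recursive-length-prefix-rlp-encoding-decoding-d1016832f919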
--------------------------------------------------------
Data structure in Ethereum | Episode 1: Recursive Length Prefix (RLP) Encoding/Decoding.
There are a lot of papers and blogs that explain how Ethereum organizes its data, but they all seem disconnected, and it is really hard to get an overall picture. To help you, and to confirm my own understanding along the way, this series will explain the data structures in Ethereum one problem at a time.
I will break this subject into 5 main episodes and 1 extra episode (I will call it the 1+ episode):
- Recursive Length Prefix (RLP) Encoding/Decoding. And 1+ for Hex Prefix Encoding.
- Trie: Radix and Merkle.
- Trie: Patricia.
- Examples.
- State Tree Pruning.
First of all, we are going to make sense of RLP. So what is the purpose of RLP in Ethereum?
Recursive Length Prefix (RLP)
In computer science, data serialization is what allows complex data structures to be stored or transmitted in a single, well-defined format. RLP is the encoding/decoding algorithm that Ethereum uses to serialize its data so that the data can be reconstructed quickly.
RLP Encoding
As the Ethereum wiki describes, the RLP encoding function takes in an item. An item is defined as follows:
- A string (will be converted to byte array) is an item
- A list of items is an item
For example, all of the objects below are items:
- “dog”
- []
- [“dog”]
- [[], “dog”, [“cat”], “ ”]
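The recursive definition of an item can be written down as a small type check. This is only a sketch: the names Item, is_item and examples are hypothetical, and it assumes strings have already been converted to byte arrays.

```python
from typing import List, Union

# An item is either a byte string or a list of items (recursive definition).
# "Item" is a hypothetical alias used only for illustration.
Item = Union[bytes, List["Item"]]


def is_item(x) -> bool:
    """Return True if x is a byte string or a (possibly nested) list of items."""
    if isinstance(x, (bytes, bytearray)):
        return True
    if isinstance(x, list):
        return all(is_item(e) for e in x)
    return False


# The four example items listed above, with strings written as bytes.
examples = [b"dog", [], [b"dog"], [[], b"dog", [b"cat"], b" "]]
```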
RLP encoding is defined as follows:
- If the input is a single byte in the [0x00, 0x7f] range, the byte itself is its RLP encoding.
- If the input is a non-value (uint(0), []byte{}, string(“”), empty pointer …), the RLP encoding is 0x80. Note that a byte with value 0x00 is not a non-value.
- If the input is a single byte in the [0x80, 0xff] range, the RLP encoding is 0x81 followed by that byte: [0x81, the_byte].
- If the input is a string 2 to 55 bytes long, the RLP encoding consists of a single byte with value 0x80 plus the length of the string in bytes, followed by the bytes of the string. The first byte is therefore in the [0x82, 0xb7] range.
  For example: “hello world” = [0x8b, 0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x77, 0x6f, 0x72, 0x6c, 0x64]. “hello world” is 11 bytes long (0x0b in hex), so the first byte of the RLP encoding is 0x80 + 0x0b = 0x8b, and the bytes of “hello world” follow it.
- If the input is a string more than 55 bytes long, the RLP encoding consists of 3 parts, from left to right. The first part is a single byte with value 0xb7 plus the length in bytes of the second part. The second part is the length of the string, written as bytes. The last part is the string itself in bytes. The first byte is in the [0xb8, 0xbf] range.
  For example: for a string of 1024 “a” characters, the encoding is “aaa…” = [0xb9, 0x04, 0x00, 0x61, 0x61, …]. Everything from the fourth element (0x61) to the end is the string in bytes, which is the third part. The second part is 0x04, 0x00, the length of the string: 0x0400 = 1024. The first part is 0xb9 = 0xb7 + 0x02, where 0x02 is the length of the second part.
- If the input is an empty list, the RLP encoding is the single byte 0xc0.
- If the input is a non-empty list whose total payload is up to 55 bytes long, the RLP encoding consists of a single byte with value 0xc0 plus the length of the payload, followed by the concatenation of the RLP encodings of the items in the list. The first byte is in the [0xc1, 0xf7] range.
  For example: [“hello”, “world”] = [0xcc, 0x85, 0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x85, 0x77, 0x6f, 0x72, 0x6c, 0x64]. In this encoding, [0x85, 0x68, 0x65, 0x6c, 0x6c, 0x6f] is the RLP encoding of “hello”, [0x85, 0x77, 0x6f, 0x72, 0x6c, 0x64] is the RLP encoding of “world”, and 0xcc = 0xc0 + 0x0c, where 0x0c = 0x06 + 0x06 is the length of the total payload.
- If the input is a list whose total payload is more than 55 bytes long, the RLP encoding consists of 3 parts. The first part is a single byte with value 0xf7 plus the length in bytes of the second part. The second part is the length of the total payload, written as bytes. The last part is the concatenation of the RLP encodings of the items in the list. The first byte is in the [0xf8, 0xff] range.
- One more thing that is not mentioned in the Ethereum wiki but appears in the Golang source code: for the boolean type, true = 0x01 and false = 0x80.
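The string and list rules above can be sketched in a few lines of Python. This is a minimal illustration, not the Golang implementation: the function names encode_length and rlp_encode are hypothetical, strings are assumed to be UTF-8, and integers, booleans and pointers are deliberately left out.

```python
def encode_length(length: int, offset: int) -> bytes:
    """Build the prefix for a payload of `length` bytes.

    offset is 0x80 for strings and 0xc0 for lists.
    """
    if length <= 55:
        # Short form: a single byte, offset + length.
        return bytes([offset + length])
    # Long form: (offset + 55 + size of the length) followed by the length itself.
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes


def rlp_encode(item) -> bytes:
    """Encode a string, a byte string, or a (nested) list of those."""
    if isinstance(item, str):
        item = item.encode("utf-8")  # assumption: strings are UTF-8
    if isinstance(item, (bytes, bytearray)):
        data = bytes(item)
        if len(data) == 1 and data[0] <= 0x7f:
            return data  # a single byte in [0x00, 0x7f] is its own encoding
        return encode_length(len(data), 0x80) + data
    if isinstance(item, list):
        payload = b"".join(rlp_encode(e) for e in item)
        return encode_length(len(payload), 0xc0) + payload
    raise TypeError(f"not an RLP item in this sketch: {type(item).__name__}")


print(rlp_encode("hello world").hex())       # 8b68656c6c6f20776f726c64
print(rlp_encode(["hello", "world"]).hex())  # cc8568656c6c6f85776f726c64
print(rlp_encode("a" * 1024)[:4].hex())      # b9040061
```

Because the same helper builds the prefix for both strings and lists, the only difference between the two families of rules is the offset (0x80 versus 0xc0).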
RLP Decoding
RLP decoding is easier once you have figured out how RLP encoding works. RLP decoding simply takes the encoded input and works out the type and the length of the data it contains.
- According to the first byte of the input, RLP decoding determines the type of the data, the length of the actual data, and its offset.
- According to the type and the offset of the data, decode the data accordingly.
- Continue to decode the rest of the input if still possible.
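The three steps above could look roughly like this in Python. It is a sketch under assumptions: decode_item and rlp_decode are hypothetical names, strings are returned as raw bytes, and it does not reject non-canonical encodings the way a production decoder would.

```python
def decode_item(data: bytes, pos: int = 0):
    """Decode one item starting at data[pos]; return (item, next_position)."""
    if pos >= len(data):
        raise ValueError("unexpected end of input")
    first = data[pos]

    # Step 1: the first byte gives the type, the length and the data offset.
    if first <= 0x7f:
        return data[pos:pos + 1], pos + 1  # single byte, its own encoding
    if first <= 0xb7:
        is_list, start, length = False, pos + 1, first - 0x80
    elif first <= 0xbf:
        n = first - 0xb7  # number of bytes that hold the length
        is_list, start = False, pos + 1 + n
        length = int.from_bytes(data[pos + 1:start], "big")
    elif first <= 0xf7:
        is_list, start, length = True, pos + 1, first - 0xc0
    else:
        n = first - 0xf7
        is_list, start = True, pos + 1 + n
        length = int.from_bytes(data[pos + 1:start], "big")

    end = start + length
    if end > len(data):
        raise ValueError("input shorter than declared length")

    # Step 2: decode according to the type and the offset.
    if not is_list:
        return data[start:end], end

    # Step 3: keep decoding the rest of the list payload while any remains.
    items, cursor = [], start
    while cursor < end:
        item, cursor = decode_item(data, cursor)
        items.append(item)
    if cursor != end:
        raise ValueError("list item overruns list payload")
    return items, end


def rlp_decode(data: bytes):
    """Decode a complete RLP input that contains exactly one top-level item."""
    item, end = decode_item(data)
    if end != len(data):
        raise ValueError("trailing bytes after top-level item")
    return item


print(rlp_decode(bytes.fromhex("cc8568656c6c6f85776f726c64")))  # [b'hello', b'world']
```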
RLP decoding is fully explained in the Ethereum wiki, and I don’t want to waste our time repeating it unnecessarily. I will put the references below.
Diving into RLP
Actually, the Ethereum wiki explains RLP in a way that is extremely easy to understand, so I have just restated it in my own writing style. What I really want from this article is to dive into RLP and gain a deep understanding. Hmmm, again
At the link above you can also see how an actual transaction is encoded with RLP.