Problem statement
Solution
To solve this problem, we need to come up with a way to separate the strings that have been encoded into a single string.
We could try to separate the strings using a simple separator such as a comma , or
hash symbol #. However, because the strings can contain any UTF-8 character, then any
single separator we use will have the same problem: the same character used to separate
the strings can exist in one of the strings, so decoding the strings would not work.
To fix this problem, we include an additional symbol next to the separator for each string. This symbol must be a property of the string that is next to it to ensure we don’t face the same aforementioned problem.
One possible property to use is the length of the current string, which is what we will
use in our solution. So, we encode each string as <length of string><separator><string>.
Pseudocode
Encoding:
- Initialize an empty list.
- For each string
sinstrs:
a. Append the string that consists of the length ofs, followed by the delimiter"#", and then the stringsitself to the empty list. - Return the encoded string.
Decoding:
- Initialize an empty list
resto store the decoded strings. - Initialize pointers
iandjasi = 0andj = 0. - While
i < len(s):
a. Move pointerjto the next instance of the delimiter"#"using a while loop.
b. Extract the length of the current substring asint(s[i : j])(which was previously encoded intos).
c. Let the start and end of the substring bej + 1andj + 1 + length of substring, respectively.
d. Extact the substring ass[start of substring : end of substring]and append it tores.
e. Move pointersiandjto the end of the substring. - Return
res.
Let $m$ be the sum of the lengths of all the strings.
For encoding, we require $O(m)$ time because we iterate over each character once to build the output and computing lengths and appending separators is constant-time per string. We also need $O(m)$ space because output storage is proportional to the total length of the final encoded string.
For decoding, we need $O(m)$ time because the decoding pointer moves through the encoded string linearly and without revisiting any character. We also need $O(m)$ space because we rebuild the original list of strings, whose cumulative size is $m$.
Python code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
class Solution:
def encode(self, strs: list[str]) -> str:
encoded = []
for s in strs:
encoded.append(f"{len(s)}#{s}")
return "".join(encoded)
def decode(self, s: str) -> list[str]:
strs = []
i = j = 0
while i < len(s):
while s[j] != "#":
j += 1
len_s = int(s[i:j])
start = j + 1
end = start + len_s
strs.append(s[start:end])
i = j = end
return strs