Twitter text compression challenge

一生所求 2021-02-06 08:06

Rules

Your program must have two modes: encoding and decoding.
When encoding:
1. Your p

4 answers

臣服心动 (OP)

2021-02-06 08:16
Here is my variant for actual English.

Each code point has roughly 1,100,000 possible states. Well, that's a lot of space.
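To make the "1.1M states" figure concrete, here is a minimal sketch of how an integer state could be mapped to a single character and back. The helper names `to_char` and `from_char` are made up for illustration. The only assumption is that the surrogate range (U+D800 to U+DFFF) is skipped, since those code points cannot stand alone in well-formed text:

```python
# Unicode has 0x110000 code points; 0x800 of them are surrogates.
USABLE_STATES = 0x110000 - 0x800  # 1,112,064 usable values per character

def to_char(state):
    """Map an integer state to one character, skipping the surrogate block."""
    if not 0 <= state < USABLE_STATES:
        raise ValueError("state does not fit in one code point")
    return chr(state if state < 0xD800 else state + 0x800)

def from_char(char):
    """Inverse of to_char: recover the integer state from a character."""
    code_point = ord(char)
    return code_point if code_point < 0xD800 else code_point - 0x800
```

A real encoder would probably also want to avoid control characters and unassigned code points, which shrinks the usable range further.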

So, we stem the original text and look up the WordNet synset for each word. Numbers are converted to their English names ("forty-two"). The 1.1M states let us hold the synset ID (between 0 and 82114), the position inside the synset (~10 variants, I suppose), and the synset type (one of four: noun, verb, adjective, adverb). We may even have enough space to store the original form of the word (such as a verb tense ID).
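The packing described here amounts to a mixed-radix encoding, which can be sketched as follows. The field order and the names `pack` and `unpack` are assumptions, not part of the answer; the ranges are the ones quoted above. Worth noting: with those figures taken at face value, the naive product of the three ranges comes to about 3.3M, more than one code point offers, so a working version would presumably need tighter bounds (for example, fewer positions per synset, or IDs numbered separately per type).

```python
# Ranges quoted in the answer; the layout below is an illustrative assumption.
SYNSET_IDS = 82115   # ids 0..82114
POSITIONS = 10       # "~10 variants" inside a synset
POS_TYPES = 4        # noun, verb, adjective, adverb

# Code points available per character, excluding surrogates.
USABLE_CODE_POINTS = 0x110000 - 0x800

def pack(synset_id, position, pos_type):
    """Combine the three fields into one integer (mixed-radix)."""
    if not (0 <= synset_id < SYNSET_IDS
            and 0 <= position < POSITIONS
            and 0 <= pos_type < POS_TYPES):
        raise ValueError("field out of range")
    return (synset_id * POSITIONS + position) * POS_TYPES + pos_type

def unpack(value):
    """Split a packed integer back into (synset_id, position, pos_type)."""
    value, pos_type = divmod(value, POS_TYPES)
    synset_id, position = divmod(value, POSITIONS)
    return synset_id, position, pos_type

# Budget check: how many distinct states does this layout need?
states_needed = SYNSET_IDS * POSITIONS * POS_TYPES
fits_in_one_code_point = states_needed <= USABLE_CODE_POINTS
```

With these numbers `states_needed` is 3,284,600, so `fits_in_one_code_point` is `False`; dropping the per-synset position to 3 variants, for instance, would bring the total under the limit.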

The decoder just feeds the synsets to WordNet and retrieves the corresponding words.

Source text:
```
A white dwarf is a small star composed mostly of electron-degenerate matter. Because a
white dwarf's mass is comparable to that of the Sun and its volume is comparable to that 
of the Earth, it is very dense.
```
Becomes:
```
A white dwarf be small star composed mostly electron degenerate matter because white
dwarf mass be comparable sun IT volume be comparable earth IT be very dense
```
(Tested with Online WordNet.) This "code" should take 27 code points. Of course, all "gibberish" like 'lol' and 'L33T' will be lost forever.
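The 27-code-point figure can be checked with a toy round trip over the stemmed text. This is only a sketch: a small vocabulary built from the text itself stands in for the WordNet lookup, and the `BASE` offset and the function names are arbitrary choices for illustration.

```python
STEMMED = (
    "A white dwarf be small star composed mostly electron degenerate matter "
    "because white dwarf mass be comparable sun IT volume be comparable "
    "earth IT be very dense"
)

# Stand-in for WordNet: every distinct token gets a small integer id.
VOCAB = sorted(set(STEMMED.split()))
TOKEN_TO_ID = {token: index for index, token in enumerate(VOCAB)}

BASE = 0x4E00  # arbitrary starting code point, chosen only for the demo

def encode(text):
    """Emit one character per token."""
    return "".join(chr(BASE + TOKEN_TO_ID[token]) for token in text.split())

def decode(message):
    """Map each character back to its token."""
    return " ".join(VOCAB[ord(char) - BASE] for char in message)

encoded = encode(STEMMED)
code_points_used = len(encoded)
```

Each token costs exactly one character, so `code_points_used` equals the number of tokens in the stemmed text, and `decode(encoded)` returns the stemmed text unchanged.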