Hacker News
lovasoa
on Aug 21, 2023
| on:
Transcoding Latin 1 strings to UTF-8 strings at 12...
Not sure whether that was sarcastic, but ISO-8859-1 (Latin 1) encodes most European languages, not just Latin.
https://en.wikipedia.org/wiki/ISO/IEC_8859-1
ko27
on Aug 21, 2023
But where do you find it? Almost the entirety of the internet is UTF-8. You can always transcode to Latin 1 for testing purposes, but that raises the question of what practical benefit this algorithm has.
tgv
on Aug 21, 2023
Older corpora are probably still in Latin-1 or some variant. That could include decades of newspaper publications.
lovasoa
on Aug 22, 2023
All of Europe has written in Latin 1 for a decade. There are billions of files encoded in Latin 1 everywhere.
ko27
on Aug 22, 2023
Where?