Logs: liberachat/#haskell
| 2021-05-23 17:13:31 | × | SIben quits (~SIben@ns3106586.ip-5-135-191.eu) (Quit: leaving) |
| 2021-05-23 17:13:49 | → | SIben joins (~SIben@ns3106586.ip-5-135-191.eu) |
| 2021-05-23 17:14:20 | × | SIben quits (~SIben@ns3106586.ip-5-135-191.eu) (Client Quit) |
| 2021-05-23 17:14:26 | → | henrylaxen joins (~user@177.239.36.179) |
| 2021-05-23 17:16:00 | → | SIben joins (~SIben@ns3106586.ip-5-135-191.eu) |
| 2021-05-23 17:19:49 | <boxscape> | % writeFile "foo" "—" |
| 2021-05-23 17:19:49 | <yahb> | boxscape: *** Exception: foo: commitBuffer: invalid argument (invalid character) |
| 2021-05-23 17:19:55 | <boxscape> | why can haskell not write m-dashes? |
| 2021-05-23 17:20:39 | <boxscape> | (or read, for that matter) |
| 2021-05-23 17:21:09 | <lyxia> | that probably depends on the encoding and the default for yahb is probably not UTF-8 |
| 2021-05-23 17:21:15 | <boxscape> | hm, I see |
| 2021-05-23 17:23:34 | <boxscape> | weirdly my haskell script was able to read a file containing an m-dash yesterday but now refuses to do so |
| 2021-05-23 17:23:56 | <boxscape> | on the same machine and everything |
| 2021-05-23 17:24:13 | → | Mark_ joins (uid14803@user/mark/x-9597255) |
| 2021-05-23 17:25:04 | <lyxia> | You need to call an exorcist. |
| 2021-05-23 17:25:32 | <boxscape> | % System.IO.hGetEncoding System.IO.stdin |
| 2021-05-23 17:25:32 | <yahb> | boxscape: Just UTF-8 |
| 2021-05-23 17:25:35 | <boxscape> | I guess I will |
| 2021-05-23 17:25:53 | <koz> | The singularity is here. |
| 2021-05-23 17:26:01 | → | sondre joins (~sondrelun@cm-84.212.100.140.getinternet.no) |
| 2021-05-23 17:26:49 | <hpc> | % writeFile "foo" "-" |
| 2021-05-23 17:26:49 | <yahb> | hpc: |
| 2021-05-23 17:27:07 | <hpc> | just to be sure :P |
| 2021-05-23 17:27:14 | <monochrom> | In fact... |
| 2021-05-23 17:27:14 | <boxscape> | hah yes |
| 2021-05-23 17:27:19 | <monochrom> | % readFile "foo" |
| 2021-05-23 17:27:19 | <yahb> | monochrom: "-" |
| 2021-05-23 17:27:35 | <geekosaur> | that doesn't look like the same dash to me? |
| 2021-05-23 17:27:52 | <monochrom> | No, not intended to be the same dash. |
| 2021-05-23 17:27:54 | <hpc> | it isn't, i just wanted to make sure it wasn't lying and it was somehow an IO error of some sort |
| 2021-05-23 17:28:03 | <monochrom> | Instead, testing what writeFile does "normally". |
| 2021-05-23 17:28:08 | <monochrom> | control experiment |
| 2021-05-23 17:28:55 | <ptrcmd> | %readFile "foo" |
| 2021-05-23 17:29:01 | <ptrcmd> | % readFile "foo" |
| 2021-05-23 17:29:01 | <yahb> | ptrcmd: "-" |
| 2021-05-23 17:29:47 | × | geekosaur quits (~geekosaur@069-135-003-034.biz.spectrum.com) (Remote host closed the connection) |
| 2021-05-23 17:30:46 | <lyxia> | % localeEncoding |
| 2021-05-23 17:30:46 | <yahb> | lyxia: ASCII |
| 2021-05-23 17:30:50 | × | Fare quits (~fare@c-66-31-47-143.hsd1.ma.comcast.net) (Ping timeout: 264 seconds) |
| 2021-05-23 17:31:44 | × | xaotuk quits (~xaotuk@89.110.231.41) (Quit: WeeChat 3.1) |
| 2021-05-23 17:31:49 | <monochrom> | Hrm that's strange. It takes manual effort to set that. |
| 2021-05-23 17:34:11 | <boxscape> | % withFile "foo23" WriteMode (\h -> hGetEncoding h >>= print >> hSetEncoding h utf8 >> hGetEncoding h >>= print >> hPutStrLn h "—") >> withFile "foo23" ReadMode (\h -> hSetEncoding h utf8 >> hGetContents h >>= putStrLn) |
| 2021-05-23 17:34:12 | <yahb> | boxscape: Just ASCII; Just UTF-8; — |
| 2021-05-23 17:34:13 | <boxscape> | I guess that works |
| 2021-05-23 17:35:11 | → | geekosaur joins (~geekosaur@069-135-003-034.biz.spectrum.com) |
| 2021-05-23 17:35:26 | <boxscape> | Ah! I figured out why it worked yesterday |
| 2021-05-23 17:35:44 | <boxscape> | yesterday, I ran the runghc I got from ghc.nix, but today I used the one I got from `nix-shell -p ghc` |
| 2021-05-23 17:36:52 | → | marinelli joins (~marinelli@2a01:4f8:211:ae5::2) |
| 2021-05-23 17:37:00 | → | janislago joins (~user@c-24-98-52-54.hsd1.ga.comcast.net) |
| 2021-05-23 17:37:04 | ← | janislago parts (~user@c-24-98-52-54.hsd1.ga.comcast.net) (ERC (IRC client for Emacs 27.2)) |
| 2021-05-23 17:39:14 | × | gambpang quits (~ian@c-69-246-197-46.hsd1.il.comcast.net) (Ping timeout: 264 seconds) |
| 2021-05-23 17:40:48 | <tomsmeding> | Locale settings are the worst |
| 2021-05-23 17:41:21 | <tomsmeding> | Beware the day you're naively reading a floating point value from a string and it fails because you're now in a European locale where it expects , instead of . |
| 2021-05-23 17:41:31 | <boxscape> | the correct encoding always depends only on the file you read, right? |
| 2021-05-23 17:41:35 | <tomsmeding> | C does that, I hope haskell doesn't |
| 2021-05-23 17:42:26 | <tomsmeding> | I wrote a C++ application once that broke when I added gtk+ as a dependency, because on initialisation that set the locale to a local thing that made floating point IO ops use , |
| 2021-05-23 17:42:33 | <monochrom> | Files have no metadata stating their encoding. |
| 2021-05-23 17:42:46 | × | merijn quits (~merijn@83-160-49-249.ip.xs4all.nl) (Ping timeout: 264 seconds) |
| 2021-05-23 17:43:34 | <boxscape> | monochrom right, but the characters are going to be encoded in some encoding anyway, so you might not be able to figure out what the right encoding is, but it shouldn't depend on anything except the file you're reading, is my understanding |
| 2021-05-23 17:43:58 | × | fabfianda quits (~fabfianda@net-93-148-125-174.cust.dsl.teletu.it) (Ping timeout: 264 seconds) |
| 2021-05-23 17:44:07 | <monochrom> | Yeah, ideally, someone tells you via a back channel. |
| 2021-05-23 17:44:11 | <boxscape> | okay |
| 2021-05-23 17:44:23 | → | fabfianda joins (~fabfianda@mob-5-90-249-226.net.vodafone.it) |
| 2021-05-23 17:44:34 | <monochrom> | In practice, everyone disagrees what the back channel should be. |
| 2021-05-23 17:44:39 | <boxscape> | nice |
| 2021-05-23 17:44:41 | <monochrom> | Naw, it's worse. |
| 2021-05-23 17:45:18 | <monochrom> | In practice, everyone agrees there should be one universally adopted encoding, and disagrees what it should be. |
| 2021-05-23 17:48:14 | × | sander quits (~sander@164.89-11-223.nextgentel.com) (Quit: So long! :)) |
| 2021-05-23 17:48:27 | → | sander joins (~sander@164.89-11-223.nextgentel.com) |
| 2021-05-23 17:59:33 | → | ku joins (~ku@2601:280:c780:7ea0:f045:534c:8a14:7395) |
| 2021-05-23 18:04:04 | <c_wraith> | let's settle on UTF-16, the worst of all worlds |
| 2021-05-23 18:05:46 | → | haskman joins (~haskman@106.215.24.177) |
| 2021-05-23 18:06:46 | × | raehik1 quits (~raehik@cpc95906-rdng25-2-0-cust156.15-3.cable.virginm.net) (Ping timeout: 264 seconds) |
| 2021-05-23 18:06:49 | <hpc> | https://en.wikipedia.org/wiki/UTF-7 |
| 2021-05-23 18:08:23 | <hpc> | no matter how bad UTF-16 is, it doesn't have security flaws :D |
| 2021-05-23 18:08:57 | → | pretty_dumm_guy joins (~trottel@188.241.83.100) |
| 2021-05-23 18:09:29 | ← | safinaskar parts (~user@109.252.90.89) () |
| 2021-05-23 18:10:40 | <davean> | it sure would be harder, but do you think something similar is entirely impossible with UTF-16? |
| 2021-05-23 18:11:05 | <davean> | That's a fundamental data-handling flaw in the program handling the data. |
| 2021-05-23 18:11:15 | × | henrylaxen quits (~user@177.239.36.179) (Remote host closed the connection) |
| 2021-05-23 18:14:52 | <hpc> | unicode in general does have normalization issues, but they're limited to things like ä |
| 2021-05-23 18:15:21 | <davean> | hpc: but I can read UTF-16 as ascii all I want. |
| 2021-05-23 18:15:38 | <davean> | it's not like that fails - and that's all that attack is s/utf-16/utf-7/ |
| 2021-05-23 18:15:47 | <davean> | it just happens that for HTML this is a convenient confusion |
| 2021-05-23 18:15:58 | <davean> | other encodings can have convenient confusions for different things. |
| 2021-05-23 18:16:22 | <hpc> | oh well sure, for that first point on the wiki page |
| 2021-05-23 18:16:32 | <hpc> | i was thinking more the XSS thing, where "<" has multiple representations |
| 2021-05-23 18:16:41 | <davean> | ah |
| 2021-05-23 18:17:03 | <davean> | I mean same issue really - that just makes it easier to not notice you've fucked up your encoding handling. |
| 2021-05-23 18:17:59 | <hpc> | yeah, the real answer is normalizing your input since that generalizes to all encodings of unicode |
| 2021-05-23 18:19:47 | → | eggplantade joins (~Eggplanta@108-201-191-115.lightspeed.sntcca.sbcglobal.net) |
| 2021-05-23 18:19:50 | <davean> | right |
| 2021-05-23 18:20:07 | <davean> | utf-7 just has a convenient bad example |
| 2021-05-23 18:21:00 | <hpc> | well, the point is that issue exists for every codepoint |
| 2021-05-23 18:21:07 | <hpc> | not just the stuff that's equivalent to combining characters |
| 2021-05-23 18:26:53 | → | hnOsmium0001 joins (uid453710@id-453710.stonehaven.irccloud.com) |
| 2021-05-23 18:28:16 | → | WikiLycurgus joins (~juan@cpe-45-46-140-49.buffalo.res.rr.com) |
| 2021-05-23 18:29:42 | × | biberu quits (~biberu@user/biberu) (Read error: Connection reset by peer) |
| 2021-05-23 18:30:27 | → | biberu joins (~biberu@user/biberu) |
| 2021-05-23 18:30:31 | × | river quits (~river@user/river) (Read error: Connection reset by peer) |
All times are in UTC.
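
Editor's sketch, not part of the log: the workaround boxscape tried at 17:34 (calling hSetEncoding on the handle before writing or reading) can be wrapped in two small helpers so that file I/O no longer depends on the locale encoding. The names writeFileUtf8 and readFileUtf8 and the file name emdash-demo.txt are made up for illustration; withFile, hSetEncoding, utf8, hPutStr and hGetContents come from System.IO in base. This is a minimal sketch that assumes the file really is UTF-8, which, as monochrom notes, the file itself cannot tell you.

```haskell
import System.IO

-- Hypothetical helper: write a String as UTF-8, ignoring the locale encoding.
writeFileUtf8 :: FilePath -> String -> IO ()
writeFileUtf8 path contents =
  withFile path WriteMode $ \h -> do
    hSetEncoding h utf8
    hPutStr h contents

-- Hypothetical helper: read a whole file as UTF-8.
readFileUtf8 :: FilePath -> IO String
readFileUtf8 path =
  withFile path ReadMode $ \h -> do
    hSetEncoding h utf8
    contents <- hGetContents h
    -- hGetContents is lazy; force the whole string before withFile closes the handle.
    length contents `seq` return contents

main :: IO ()
main = do
  let path = "emdash-demo.txt"   -- hypothetical file name
  writeFileUtf8 path "\8212\n"   -- U+2014, the dash from the log
  s <- readFileUtf8 path
  hSetEncoding stdout utf8       -- stdout also follows the locale by default
  putStr s
```

An alternative, if every handle in the program should behave this way, is to call setLocaleEncoding utf8 (from GHC.IO.Encoding) once at startup; handles opened after that call pick up UTF-8 by default. Which of the two is preferable depends on whether the program is meant to respect the user's locale elsewhere.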