Logs: liberachat/#haskell
| 2026-03-19 16:48:26 | <Freakie> | I know, I was just off by a byte factor |
| 2026-03-19 16:48:35 | <EvanR> | and ghc usually uses a whole word for any field |
| 2026-03-19 16:49:13 | <Freakie> | but basically I have two algorithms that work on each other. both consume two lists of one type and accumulate two other lists of another type as output |
| 2026-03-19 16:49:33 | <EvanR> | accumulate? |
| 2026-03-19 16:49:45 | <Freakie> | I mean, just consing |
| 2026-03-19 16:49:50 | <EvanR> | by iteratively pushing to the beginning |
| 2026-03-19 16:49:58 | <Freakie> | yes? |
| 2026-03-19 16:50:02 | <EvanR> | ok |
| 2026-03-19 16:50:05 | <EvanR> | then what |
| 2026-03-19 16:50:31 | <Freakie> | then that continues till the input lists have been consumed |
| 2026-03-19 16:51:03 | <EvanR> | it matters how these accumulated lists are consumed |
| 2026-03-19 16:51:28 | <Freakie> | the output lists are only ever added to during the algorithms, and the input lists are only ever traversed linearly |
| 2026-03-19 16:51:46 | <EvanR> | if the output list is never consumed |
| 2026-03-19 16:52:03 | <Freakie> | I mean they have to by construction of my benchmark |
| 2026-03-19 16:52:15 | <EvanR> | it's either unobservable or a memory leak |
| 2026-03-19 16:52:27 | <Freakie> | they're fed straight into another iteration of the algorithms |
| 2026-03-19 16:53:26 | <Freakie> | heap profiling shows that 80-90% of the data recorded comes from building the output in the one algorithm, but I suspect it shows that because the lists are being retained in the other algorithm and possibly not deallocated on the run? |
| 2026-03-19 16:53:43 | <EvanR> | yes that would be my suspicion |
| 2026-03-19 16:54:08 | <Freakie> | by design the algorithms only work on lists and trees so it's bound to be inefficient, but scaling shouldn't be this bad |
| 2026-03-19 16:55:55 | <int-e> | sm: Well, I implemented a stupid thing: https://silicon.int-e.eu/lambdabot/State/where.html (static page, decodes where file in browser, no search, also not tested much) |
| 2026-03-19 16:56:13 | <Freakie> | I aborted the program after an hour, and at that point only 5% of the runtime was not spent on garbage collection, which is not surprising given how much data is live and I don't have any compaction |
| 2026-03-19 16:59:10 | <EvanR> | retaining long lists live when they don't have to be is usually caused by the algorithm consuming lists in a suboptimal way. But it's also possible that's unavoidable because of the algorithm |
| 2026-03-19 16:59:43 | <Freakie> | I can most likely refactor it to include compaction but that doesn't really help me reduce the size of the live memory to begin with |
| 2026-03-19 17:00:01 | <monochrom> | [Int] of length n, even in the worst case (which doesn't happen often), takes only n*(24+8) bytes. (Often the +8 doesn't happen because GHC-compiled code actually does a flyweight pattern for small Ints.) |
| 2026-03-19 17:00:37 | <Freakie> | I've been trying for a while to find some way of actually tracking the size of data during runtime |
| 2026-03-19 17:00:46 | <Freakie> | it seems like I have precious few alternatives at least |
| 2026-03-19 17:01:53 | <monochrom> | Profiling is only as good as the person who interprets the profiling report. >:) |
| 2026-03-19 17:02:15 | <Freakie> | well in that case I am an imposter and a cad |
| 2026-03-19 17:05:01 | <EvanR> | monochrom, that sounds like an underestimate for a list node + Int box? or is list node / Int specially implemented to reduce space |
| 2026-03-19 17:06:01 | <monochrom> | 24 is already the node, 8 is already the Int |
| 2026-03-19 17:06:02 | <EvanR> | 8+8+8+8, would be the bare minimum without any runtime info |
| 2026-03-19 17:06:28 | <monochrom> | Oh oops s/8/16/ haha |
| 2026-03-19 17:07:04 | <gentauro> | btw, do you have re-use of pattern-matching cases in Haskell (like we have in OCaml/F#)? Example: `type FooBarBazQux = Foo | Bar | Baz | Qux;; let fooBarBazQux = function | Foo | Bar -> "foobar" | Baz | Qux -> "bazqux";;` |
| 2026-03-19 17:07:09 | <monochrom> | But that still doesn't cause n=10000 to become 1GB. So the hypothesis that n=10000 is rejected. |
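The back-of-envelope behind monochrom's point, assuming the 64-bit GHC heap layout the two of them settle on (a 3-word cons cell at 24 bytes plus a 2-word boxed Int at 16 bytes, after the s/8/16/ correction):

```haskell
-- Rough heap cost of [Int] on 64-bit GHC:
-- a (:) cell is 3 words (header, head ptr, tail ptr) = 24 bytes,
-- a boxed Int is 2 words (header, payload)           = 16 bytes.
bytesPerElem :: Int
bytesPerElem = 24 + 16

listBytes :: Int -> Int
listBytes n = n * bytesPerElem

main :: IO ()
main = do
  print (listBytes 10000)   -- ~0.4 MB, nowhere near 1 GB
  print (listBytes 100000)  -- ~4 MB, still far from 1 GB
```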
| 2026-03-19 17:07:42 | <Freakie> | I mean I have empirical data on the total number of nodes generated |
| 2026-03-19 17:07:45 | <gentauro> | it seems that `case … of` requires to define duplicate code for `Foo and Bar` and then `Baz and Qux` |
| 2026-03-19 17:08:01 | <Freakie> | which, ok, is closer to 100k when I reach 1 GB |
| 2026-03-19 17:08:21 | <EvanR> | gentauro, probably for the best. I really like reading haskell's "redundant" code |
| 2026-03-19 17:08:35 | <Freakie> | gentauro haskell doesn't have disjoint patterns but I tend to use guard patterns instead |
| 2026-03-19 17:08:45 | <EvanR> | instead of mentally decompressing it after someone was "too lazy to type it all out" |
| 2026-03-19 17:09:01 | <Freakie> | that feels a little like a strawman but ok |
| 2026-03-19 17:09:21 | <gentauro> | EvanR: I'm kind of used to the `SML` way :) |
| 2026-03-19 17:09:43 | <Freakie> | I tend to prefer haskell's syntax for the most part |
| 2026-03-19 17:09:55 | <gentauro> | you know `dry` (Don't repeat yourself) |
| 2026-03-19 17:09:56 | <EvanR> | the difference between a straight list of cases, and the ruby way of having algorithms generate the code or values |
| 2026-03-19 17:10:11 | <EvanR> | yeah I have often seen "dry" be bad in haskell |
| 2026-03-19 17:11:06 | <haskellbridge> | <sm> int-e https://silicon.int-e.eu/lambdabot/State/where.html - very nice! Much simpler than https://haskell-links.org . How to make it discoverable ? |
| 2026-03-19 17:11:26 | <EvanR> | you do have pattern synonyms and view patterns |
| 2026-03-19 17:11:40 | <haskellbridge> | <sm> @where+ links https://silicon.int-e.eu/lambdabot/State/where.html I guess |
| 2026-03-19 17:12:21 | <Leary> | gentauro: https://downloads.haskell.org/ghc/latest/docs/users_guide/exts/or_patterns.html |
| 2026-03-19 17:12:49 | <Freakie> | oh neat |
| 2026-03-19 17:14:31 | <int-e> | @where+ where https://silicon.int-e.eu/lambdabot/State/where.html |
| 2026-03-19 17:14:31 | <lambdabot> | I will never forget. |
| 2026-03-19 17:15:04 | <gentauro> | Leary: nice |
| 2026-03-19 17:15:39 | <gentauro> | this will only work well with minimalistic sum types, due to -> "If we want the compiler to aid us in Haskell2010, we must write out all cases explicitly". |
| 2026-03-19 17:15:58 | <gentauro> | unless "underscore wildcard" can be used. |
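The portable Haskell2010 spelling of gentauro's OCaml example is the duplicated form EvanR defends, with one clause per constructor (the OrPatterns extension Leary linked, available in recent GHC, lets the duplicated right-hand sides collapse):

```haskell
data FooBarBazQux = Foo | Bar | Baz | Qux

-- Haskell2010 version of the OCaml
--   function | Foo | Bar -> "foobar" | Baz | Qux -> "bazqux"
-- Each constructor gets its own (duplicated) right-hand side, so
-- the exhaustiveness checker sees every case explicitly.
name :: FooBarBazQux -> String
name Foo = "foobar"
name Bar = "foobar"
name Baz = "bazqux"
name Qux = "bazqux"

main :: IO ()
main = putStrLn (name Baz)
```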
| 2026-03-19 17:15:59 | <int-e> | (not opposed to changing the `links` entry, but for now it still works) |
| 2026-03-19 17:17:54 | <Leary> | Freakie: BTW, if you want good scaling, you probably need to write your algorithm to stream output as input comes in. You might be able to achieve that by building the output lists outside-in rather than inside-out, but the safer path would be to use a streaming library. |
| 2026-03-19 17:18:13 | <sm> | 👍🏻 and now added at https://joyful.com/Haskell+map#other+docs |
| 2026-03-19 17:18:27 | <Freakie> | technically I'm doing a thesis on algorithms designed to work efficiently on external memory which implies streaming by design |
| 2026-03-19 17:18:51 | <Freakie> | how do you mean outside-in though? |
| 2026-03-19 17:21:42 | <Leary> | When a recursive function `go` wants to yield an element `x`, do so by producing `x : go (...)`. |
| 2026-03-19 17:23:10 | <Freakie> | so the whole tail recursion modulo cons business? |
| 2026-03-19 17:24:25 | <Leary> | Maybe; I don't think from that angle. The important part is that the recursive call is guarded behind the `:` constructor, so it doesn't run until after `x` is consumed. |
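A minimal sketch of the outside-in style Leary describes (illustrative names, not Freakie's algorithm): the recursive call sits behind the `:` constructor, so nothing beyond the next element is computed until a consumer demands it, and the output streams as the input is consumed.

```haskell
-- Guarded recursion: cons first, recurse lazily. The call to
-- doubles xs does not run until the consumer moves past (2 * x).
doubles :: [Int] -> [Int]
doubles []       = []
doubles (x : xs) = 2 * x : doubles xs

-- Contrast: an accumulator ("inside-out") version must traverse
-- the whole input, retaining it, before yielding anything.
doublesAcc :: [Int] -> [Int]
doublesAcc = go []
  where
    go acc []       = reverse acc
    go acc (x : xs) = go (2 * x : acc) xs

main :: IO ()
main = print (take 3 (doubles [1 ..]))  -- even works on infinite input
```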
| 2026-03-19 17:24:26 | <haskellbridge> | <sm> A new entry for the Haskell Book list, tcard__ : https://www.reddit.com/r/haskell/comments/1ry38dn/haskell_brain_teasers_is_now_available/ |
| 2026-03-19 17:24:43 | <Freakie> | i'll try to keep it in mind |
| 2026-03-19 17:25:03 | <Freakie> | at any rate I'm probably going to have to pivot by necessity to another part of the project and let the performance be what it is |
| 2026-03-19 17:25:18 | <Freakie> | it's just been bothering me for a while that I can't explain how hungry the algorithms are |
| 2026-03-19 17:26:20 | <pipsquak-bird> | why is computer job market so jacked now? india outsourcing and vendors trying to stamp out linux? |
| 2026-03-19 17:26:39 | <Freakie> | more like executives bought into the AI hype |
| 2026-03-19 17:26:44 | <pipsquak-bird> | heh |
| 2026-03-19 17:26:53 | <pipsquak-bird> | AI creates houses and megadorms! |
| 2026-03-19 17:27:02 | <pipsquak-bird> | free room food and net! |
| 2026-03-19 17:27:39 | ChanServ | sets mode +o monochrom |
| 2026-03-19 17:27:44 | monochrom | sets mode +b *!*@c-71-235-170-34.hsd1.ma.comcast.net |
| 2026-03-19 17:27:44 | pipsquak-bird | is kicked by monochrom (pipsquak-bird) |
| 2026-03-19 17:27:56 | <Freakie> | lmao tho |
| 2026-03-19 17:28:05 | <Franciman> | megadorms |
| 2026-03-19 17:28:24 | <Freakie> | Leary so I finally found a way to track the memory of values in runtime which also triggers a garbage collection before measuring and it seems to have completely squashed memory usage (at least within reason) at the cost of, uh, time |
| 2026-03-19 17:28:38 | <Freakie> | so I guess it's just a space leak somewhere after all |
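One way to sample live heap from inside a program, along the lines of what Freakie describes (a sketch, assuming the stats machinery in `GHC.Stats`; the program must be run with `+RTS -T` for the RTS to collect these figures): force a major GC, then read the live-bytes count from that collection.

```haskell
import System.Mem (performGC)
import GHC.Stats (getRTSStats, getRTSStatsEnabled, gc,
                  gcdetails_live_bytes)

-- Force a major collection, then report bytes that survived it.
liveBytes :: IO Word
liveBytes = do
  performGC
  stats <- getRTSStats
  pure (fromIntegral (gcdetails_live_bytes (gc stats)))

main :: IO ()
main = do
  enabled <- getRTSStatsEnabled
  if enabled
    then liveBytes >>= print
    else putStrLn "run with +RTS -T to enable RTS stats"
```

As Freakie notes, the forced major GC before every measurement is exactly where the time cost comes from.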
| 2026-03-19 17:30:16 | <Freakie> | one complication of what you suggested is that I always return a pair where only one of the elements is updated |
| 2026-03-19 17:30:27 | <Freakie> | will that matter in terms of what you said? |
| 2026-03-19 17:30:53 | monochrom | sets mode -o monochrom |
| 2026-03-19 17:33:43 | <Leary> | Freakie: Probably, but you can change out `([a], [b])` for `[Either a b]`. |
| 2026-03-19 17:34:58 | <Freakie> | how will that relate to guarding behind a constructor? |
| 2026-03-19 17:37:35 | <Leary> | Well it's a single list, so the rest of the computation is always behind the next cons cell, and it knows what it's already done. With two lists, you would have two independent computations instead. |
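A sketch of Leary's `[Either a b]` suggestion (illustrative only; as Freakie points out below, it doesn't apply directly when one component is a priority queue rather than a list): interleaving both outputs into one stream keeps every yield guarded behind a single `:`, and `partitionEithers` can lazily split the stream back apart.

```haskell
import Data.Either (partitionEithers)

-- One guarded stream instead of a pair of lists: each element is
-- tagged Left or Right, and the recursive call stays behind (:).
tagged :: [Int] -> [Either Int Int]
tagged []       = []
tagged (x : xs)
  | even x    = Left x  : tagged xs
  | otherwise = Right x : tagged xs

main :: IO ()
main = print (partitionEithers (tagged [1 .. 6]))
```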
| 2026-03-19 17:38:23 | <Freakie> | oh you meant a list of eithers |
| 2026-03-19 17:38:45 | <Freakie> | I don't think that's sound |
| 2026-03-19 17:39:37 | <Freakie> | only one of the pair elements is a list, the other is a priority queue |
| 2026-03-19 17:41:36 | <Freakie> | but yeah it turns out that stuff isn't being garbage collected |
| 2026-03-19 17:41:58 | <Freakie> | which is progress at least |
| 2026-03-19 17:42:28 | <Leary> | Does it actually need to be returned? I suspect you can just thread the queue state as an argument to the recursive call (or other continuation). |
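What Leary means by threading the state, as a minimal sketch (a plain list stands in for the priority queue here; the real queue operations are Freakie's): the state travels down as an argument to the recursive call instead of coming back up in a returned pair, so the output can still stream through guarded recursion.

```haskell
-- The "queue" is threaded as an argument; output stays guarded
-- behind (:), so consumers can pull elements incrementally.
go :: [Int] -> [Int] -> [Int]
go _pq []       = []
go pq  (x : xs) = x : go (x : pq) xs

main :: IO ()
main = print (take 3 (go [] [10 ..]))
```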
| 2026-03-19 17:43:43 | <Freakie> | do you mean with a stateful priority queue? |
| 2026-03-19 17:45:05 | <Freakie> | both need to be returned because the priority queues make up the invariants of the algorithm |
All times are in UTC.