Israel bombs Iranian state TV during live broadcast

brucethemoose@lemmy.world · 2 days ago

Also a crime. Not just a great game in their niche, but a long history of them.

brucethemoose@lemmy.world · 2 days ago

Never underestimate Phil Spencer.

brucethemoose@lemmy.world · edit-2 4 days ago

…iOS forces uses Apple services including getting apps through Apple…

Can’t speak to the rest of the claims, but Android practically does too. If one has to sideload an app, you’ve lost 99% of users, if not more.

It makes me suspect they’re not talking about the stock systems OEMs ship.

Relevant XKCD: https://xkcd.com/2501/

brucethemoose@lemmy.world · edit-2 4 days ago

deleted by creator

brucethemoose@lemmy.world · 6 days ago

Chrome?

It’s probably a security risk like Android, and the vast majority use it.

Also, apps are better at sending notifications (like ICE warnings). IMO this is a pretty decent justification for an ‘iOS only’ app.

brucethemoose@lemmy.world · 7 days ago

even if it meant publishing on F-Droid instead of Google Play.

Sadly this means it’s not accessible to like 95% of people, even if driven to install it.

brucethemoose@lemmy.world · edit-2 7 days ago

One thing about Anthropic/OpenAI models is they go off the rails with lots of conversation turns or long contexts. Like when they need to remember a lot of vending machine conversation I guess.

A more objective look: https://arxiv.org/abs/2505.06120v1

https://github.com/NVIDIA/RULER

Gemini is much better. TBH the only models I’ve seen that are half decent at this are:

“Alternate attention” models like Gemini, Jamba Large or Falcon H1, depending on the iteration. Some recent versions of Gemini kinda lose this, then get it back.
Models finetuned specifically for this, like roleplay models or the Samantha model trained on therapy-style chat.

But most models are overtuned for oneshots like fix this table or write me a function, and don’t invest much in long context performance because it’s not very flashy.

brucethemoose@lemmy.world · edit-2 8 days ago

As opposed to the 24/7 Trump show?

I get your point this though.

brucethemoose@lemmy.world · edit-2 8 days ago

What @mierdabird@lemmy.dbzer0.com said, but the adapters arent cheap. You’re going to end up spending more than the 1060 is worth.

A used desktop to slap it in, that you turn on as needed, might make sense? Doubly so if you can find one with an RTX 3060, which would open up 32B models with TabbyAPI instead of ollama. Some configure them to wake on LAN and boot an LLM server.

brucethemoose@lemmy.world · edit-2 8 days ago

ChatGPT (last time I tried it) is extremely sycophantic though. Its high default sampling also leads to totally unexpected/random turns.

Google Gemini is now too.

And they log and use your dark thoughts.

I find that less sycophantic LLMs are way more helpful. Hence I bounce between Nemotron 49B and a few 24B-32B finetunes (or task vectors for Gemma) and find them way more helpful.

…I guess what I’m saying is people should turn towards more specialized and “openly thinking” free tools, not something generic, corporate, and purposely overpleasing like ChatGPT or most default instruct tunes.

brucethemoose@lemmy.world · edit-2 8 days ago

TBH this is a huge factor.

I don’t use ChatGPT much less use it like it’s a person, but I’m socially isolated at the moment. So I bounce dark internal thoughts off of locally run LLMs.

It’s kinda like looking into a mirror. As long as I know I’m talking to a tool, it’s helpful, sometimes insightful. It’s private. And I sure as shit can’t afford to pay a therapist out of the gazoo for that.

It was one of my previous problems with therapy: payment depending on someone else, at preset times (not when I need it). Many sessions feels like they end when I’m barely scratching the surface. Yes therapy is great in general and for deeper feedback/guidance, but still.

To be clear, I don’t think this is a good solution in general. Tinkering with LLMs is part of my living, I understand the jist of how they work, I tend to use raw completion syntax or even base pretrains.

But most people anthropomorphize them because that’s how chat apps are presented. That’s problematic.

brucethemoose@lemmy.world · 10 days ago

You can still use the IGP, which might be faster in some cases.

brucethemoose@lemmy.world · edit-2 11 days ago

Oh actually that’s a great card for LLM serving!

Use the llama.cpp server from source, it has better support for Pascal cards than anything else:

https://github.com/ggml-org/llama.cpp/blob/master/docs/multimodal.md

Gemma 3 is a hair too big (like 17-18GB), so I’d start with InternVL 14B Q5K XL: https://huggingface.co/unsloth/InternVL3-14B-Instruct-GGUF

Or Mixtral 24B IQ4_XS for more ‘text’ intelligence than vision: https://huggingface.co/unsloth/Mistral-Small-3.2-24B-Instruct-2506-GGUF

I’m a bit ‘behind’ on the vision model scene, so I can look around more if they don’t feel sufficient, or walk you through setting up the llama.cpp server. Basically it provides an endpoint which you can hit with the same API as ChatGPT.

brucethemoose@lemmy.world · edit-2 13 days ago

1650

You mean GPU? Yeah, it’s good, I was strictly talking about purchasing a laptop for LLM usage, as most are less than ideal for the money. Laptop vram pools are relatively small and SO-DIMMS are usually very slow.

Things will get much better once the “Max” AMD SKUs proliferate.

brucethemoose@lemmy.world · edit-2 13 days ago

Yeah, just paying for LLM APIs is dirt cheap, and they (supposedly) don’t scrape data. Again I’d recommend Openrouter and Cerebras! And you get your pick of models to try from them.

Even a framework 16 is not good for LLMs TBH. The Framework desktop is (as it uses a special AMD chip), but it’s very expensive. Honestly the whole hardware market is so screwed up, hence most ‘local LLM enthusiasts’ buy a used RTX 3090 and stick them in desktops or servers, as no one wants to produce something affordable apparently :/

brucethemoose@lemmy.world · edit-2 13 days ago

I was a bit mistaken, these are the models you should consider:

https://huggingface.co/mlx-community/Qwen3-4B-4bit-DWQ

https://huggingface.co/AnteriorAI/gemma-3-4b-it-qat-q4_0-gguf

https://huggingface.co/unsloth/Jan-nano-GGUF (specifically the UD-Q4 or UD-Q5 file)

they are state-of-the-art at this size, as far as I know.

brucethemoose@lemmy.world · edit-2 13 days ago

8GB?

You might be able to run Qwen3 4B: https://huggingface.co/mlx-community/Qwen3-4B-4bit-DWQ/tree/main

But honestly you don’t have enough RAM to spare, and even a small model might bog things down. I’d run Open Web UI or LM Studio with a free LLM API, like Gemini Flash, or pay a few bucks for something off openrouter. Or maybe Cerebras API.

…Unfortunely, LLMs are very RAM intensive, and >4GB (more realistically like 2GB) is not going to be a good experience :(

brucethemoose@lemmy.world · edit-2 13 days ago

Actually, to go ahead and answer, the “fastest” path would be LM Studio (which supports MLX quants natively and is not time intensive to install), and a DWQ quantization (which is a newer, higher quality variant of MLX models).

Hopefully one of these models, depending on how much RAM you have:

https://huggingface.co/mlx-community/Qwen3-14B-4bit-DWQ-053125

https://huggingface.co/mlx-community/Magistral-Small-2506-4bit-DWQ

https://huggingface.co/mlx-community/Qwen3-30B-A3B-4bit-DWQ-0508

https://huggingface.co/mlx-community/GLM-4-32B-0414-4bit-DWQ

With a bit more time invested, you could try to set up Open Web UI as an alterantive interface (which has its own built in web search like Gemini): https://openwebui.com/

And then use LM Studio (or some other MLX backend, or even free online API models) as the ‘engine’

Alternatively, especially if you have a small RAM pool, Gemma 12B QAT Q4_0 is quite good, and you can run it with LM Studio or anything else that supports a GGUF. Not sure about 12B-ish thinking models off the top of my head, I’d have to look around.

brucethemoose@lemmy.world · edit-2 13 days ago

Honestly perplexity, the online service, is pretty good.

As for local running, one question first: how much RAM does your Mac have? This is basically the factor for what model you can and should run.

brucethemoose@lemmy.world · edit-2 14 days ago

I don’t understand.

Ollama is not actually docker, right? It’s running the same llama.cpp engine, it’s just embedded inside the wrapper app, not containerized. It has a docker preset you can use, yeah.

And basically every LLM project ships a docker container. I know for a fact llama.cpp, TabbyAPI, Aphrodite, Lemonade, vllm and sglang do. It’s basically standard. There’s all sorts of wrappers around them too.

You are 100% right about security though, in fact there’s a huge concern with compromised Python packages. This one almost got me: https://pytorch.org/blog/compromised-nightly-dependency/

This is actually a huge advantage for llama.cpp, as it’s free of python and external dependencies by design. This is very unlike ComfyUI which pulls in a gazillian external repos. Theoretically the main llama.cpp git could be compromised, but it’s a single, very well monitored point of failure there, and literally every “outside” architecture and feature is implemented from scratch, making it harder to sneak stuff in.

brucethemoose@lemmy.world · edit-2 22 days ago

Israel bombs Iranian state TV during live broadcast

brucethemoose@lemmy.world · edit-2 2 months ago

Israel plans to occupy and flatten all of Gaza if no deal by Trump's trip

brucethemoose@lemmy.world · edit-2 3 months ago

[Meta] How do y'all post clips/animations on Lemmy? Only GIF seems to work.

brucethemoose@lemmy.world · edit-2 7 months ago

[Rumor] Shipping Listing Suggests 24GB+ Intel Arc B580

brucethemoose@lemmy.world · edit-2 9 months ago

Guide to Self Hosting LLMs Faster/Better than Ollama

brucethemoose@lemmy.world · edit-2 10 months ago

How does Lemmy feel about "open source" machine learning, akin to the Fediverse vs Social Media?

brucethemoose@lemmy.world · 11 months ago

Pressure grows as "last chance" negotiations for Gaza deal resume

brucethemoose@lemmy.world · 11 months ago

Hostage-ceasefire deal talks stall over new Netanyahu demands, Israeli officials say

brucethemoose@lemmy.world · 1 year ago

Alleged AMD Strix Halo APU Appears in Benchmark