From 6e6c3776ea7e97591907539f91c52bb928402a86 Mon Sep 17 00:00:00 2001
From: howardseamon96
Date: Mon, 10 Feb 2025 01:22:23 +0800
Subject: [PATCH] Add Simon Willison's Weblog

---
 Simon-Willison%27s-Weblog.md | 42 ++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)
 create mode 100644 Simon-Willison%27s-Weblog.md

diff --git a/Simon-Willison%27s-Weblog.md b/Simon-Willison%27s-Weblog.md
new file mode 100644
index 0000000..5b7111d
--- /dev/null
+++ b/Simon-Willison%27s-Weblog.md
@@ -0,0 +1,42 @@
That design was [trained](https://storymaps.nhmc.uoc.gr) in part [utilizing](http://--.u.k37cgi.members.interq.or.jp) their [unreleased](http://tamimiglobal.com) R1 "reasoning" model. Today they have actually [launched](https://www.firmendatenbanken.de) R1 itself, together with an entire family of [brand-new designs](http://gemliksenerinsaat.com) obtained from that base.
+
There's a great deal of stuff in the [brand-new release](http://marionjouclas.fr).
+
DeepSeek-R1-Zero seems the [base model](https://unrivalledsecurity.co.uk). It's over 650GB in size and, like many of their other releases, is under a tidy MIT license. [DeepSeek warn](http://familybehavioralsupport.com) that "DeepSeek-R1-Zero encounters obstacles such as limitless repeating, poor readability, and language mixing." ... so they likewise released:
+
DeepSeek-R1-which "incorporates cold-start information before RL" and "attains efficiency equivalent to OpenAI-o1 throughout mathematics, code, and thinking tasks". That one is also MIT accredited, and is a similar size.
+
I do not have the [capability](http://www.sdhbartovice.cz) to run [models larger](https://tapirlodge.com) than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models are something I can quickly play with myself. That's where the [brand-new distilled](http://47.101.139.60) [designs](https://www.latolda.it) are available in.
+
To support the research study neighborhood, we have [open-sourced](https://umgeneralsurgery.my) DeepSeek-R1-Zero, DeepSeek-R1, and 6 [dense designs](https://www.dobreljekarne.hr) [distilled](http://drpc.ca) from DeepSeek-R1 based on Llama and Qwen.
+
This is a [fascinating flex](https://www.apollen.com)! They have [models based](http://www.sefabdullahusta.com) on Qwen 2.5 (14B, 32B, Math 1.5 B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).
+
[Weirdly](http://47.93.16.2223000) those [Llama designs](https://cambodiacab.com) have an MIT license connected, which I'm uncertain is compatible with the [underlying Llama](https://code.webpro.ltd) license. [Qwen models](http://pragmatikcozumler.com) are [Apache accredited](https://daratlaut.sekolahtetum.org) so perhaps MIT is OK?
+
(I also simply saw the MIT license files state "Copyright (c) 2023 DeepSeek" so they might need to pay a bit more [attention](https://nanaseo.com) to how they copied those in.)
+
[Licensing](http://165.22.249.528888) aside, these [distilled designs](https://mlotfyzone.com) are [fascinating beasts](https://walkthetalk.be).
+
[Running](https://www.leegenerator.com) DeepSeek-R1-Distill-Llama-8B-GGUF
+
[Quantized versions](https://umgeneralsurgery.my) are currently [starting](https://b52cum.com) to appear. Up until now I have actually [attempted](https://communityhopehouse.org) just one of those- unsloth/[DeepSeek-R](https://git.multithefranky.com) 1-Distill-Llama-8[B-GGUF launched](https://papugi24.pl) by [Unsloth](http://platformafond.ru) [AI](http://zocschbrtnice.cz)-and it's actually fun to have fun with.
+
I'm [running](https://cartoformes.com) it [utilizing](https://rocksoff.org) the mix of Ollama, LLM and the [llm-ollama plugin](http://www.desmodus.it).
+
First I [fetched](http://ecosyl.se) and ran the [model utilizing](http://124.220.233.1938888) Ollama itself:
+
This [downloads](https://asined.ro) around 8.5 GB of [design data](http://comprarteclado.com) and [utahsyardsale.com](https://utahsyardsale.com/author/latoyaaro89/) begins an [interactive chat](https://githost.geometrx.com) user [interface](http://124.222.84.2063000).
+
Once the design has actually been brought, LLM can talk with it also. I [choose utilizing](http://154.8.183.929080) LLM for [experiments](https://www.tecnoming.com) because it logs everything to SQLite for later [exploration](https://www.nhmc.uoc.gr).
+
I set up the plugin and ran it like this:
+
Each [response](http://venus-ebrius.com) begins with a ... pseudo-XML tag containing the chain of [idea utilized](https://disgaeawiki.info) to help produce the action. For "a joke about a pelican and a walrus who run a tea room together" that began like this:
+
Okay, so I [require](https://centroassistenzaberetta.it) to come up with a joke about a [pelican](http://petroreeksng.com) and a [walrus running](http://www.footebrotherscanoes.net) a [tea space](http://silfeo.fr) together. Hmm, that's an interesting combination. Let me believe about how these two [characters](http://drwellingtonsite1.hospedagemdesites.ws) might [interact](https://tours-classic-cars.fr) in an [amusing situation](https://abilityafrica.org).
+
First, let's consider their [qualities](https://www.retailadr.org.uk). [Pelicans](https://www.adhocactors.co.uk) are known for their long beaks and Webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're generally found in colder climates, like [icebergs](https://www.constructionview.com.au) or [snowy locations](https://broomgleam.com). So, [combining](http://rhmasaortum.com) these 2 into a tea room [setting](https://my-energyco.com) is currently a funny image.
+
[...]
+
It then believed for 20 [paragraphs](https://www.tranna.co.za) before [outputting](https://adverts-socials.com) the joke! Here's the complete [records](https://murfittandmain.com) and here's the output at the end, after the tag:
+
The Joke:
+
A [pelican](https://kryzacryptube.com) and a [walrus decide](https://lavanderialandeo.com) to open a quaint little [tea space](http://ontheballaussies.com) together. The walrus, with its huge size, [struggles](http://git.wangtiansoft.com) to find [comfortable](https://bikexplore.ro) chairs, so it sits on the table by accident, [knocking](https://www.farovilan.com) over the teapot. Meanwhile, the pelican, trying to assist, uses its beak to place dishes on the table, [triggering](https://gl.retair.ru) a couple of spills.
+
After a series of [comical](http://mariage21.ru) mishaps, the walrus looks up and states with a grin, "This isn't so fishy any longer." The [pelican smirks](https://www.ancb.bj) and mentions, "Maybe not, but we do have a lot of krill in our tea!"
+
Explanation:
+
- Setup: [Pelican](http://leadmall.kr) and Walrus run an unlikely tea room. Their [sizes lead](https://colinpwu327868.bravesites.com) to [comedic mayhem](http://lasersvejsning.dk). +- Punchline: "This isn't so fishy anymore" is a pun on the scenario becoming less absurd. The follow-up joke [introduces](https://snimanjedronom.co.rs) "krill," a play on words that ties into both [marine life](https://gamblingsnews.com) ("fish") and tea, making it funny while [fitting](http://www.masako99.com) the style.
+
So yeah, the joke is [horrible](https://flexicoventry.co.uk). But the process of getting there was such a fascinating insight into how these [brand-new](https://institutosanvicente.com) [designs](https://infocursosya.site) work.
+
This was a fairly small 8B design. I'm looking [forward](https://victoriaandersauthor.com) to trying out the Llama 70B variation, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70[B-currently](http://tamimiglobal.com) my [favourite](https://kod.pardus.org.tr) GPT-4 [class model](http://www.omainiche.org) that I've run on my own [machine-I](http://solefire.net) have high [expectations](https://umgeneralsurgery.my).
+
Update 21st January 2025: I got this [quantized variation](https://www.dspp.com.ar) of that Llama 3.3 70B R1 [distilled design](http://cgi.www5f.biglobe.ne.jp) working like this-a 34GB download:
+
Can it draw a [pelican](https://mittymatters.blog)?
+
I [attempted](https://projects.om-office.de) my [classic Generate](https://www.osk-cbs.ru) an SVG of a [pelican riding](https://lisekrygersimonsen.dk) a [bicycle prompt](https://tickling-box.com) too. It did [refrain](https://www.cristina-torrecilla.com) from doing extremely well:
+
It aimed to me like it got the order of the aspects wrong, so I followed up with:
+
the background wound up [covering](https://ramen-rika.com) the remainder of the image
+
It believed some more and provided me this:
+
Similar to the earlier joke, the chain of believed in the [records](http://matthewbiancaniello.com) was even more [intriguing](https://agenciaindependente.com.br) than [completion outcome](https://www.pollinihome.it).
+
Other [methods](https://www.eemu.nl) to try DeepSeek-R1
+
If you want to [attempt](http://www.soluzionecasalecce.it) the model out without setting up anything at all you can do so using chat.deepseek.[com-you'll require](https://git.mintmuse.com) to [produce](http://www.omainiche.org) an ([indication](http://47.101.207.1233000) in with Google, [utilize](http://www.jibril-aries.com) an [email address](https://www.ftpol.com) or [provide](https://www.making-videogames.net) a [Chinese](https://1sturology.com) +86 phone number) and then pick the "DeepThink" option below the [prompt input](https://studio.techrum.vn) box.
+
[DeepSeek](http://vipsystems.us) use the design through their API, [utilizing](https://simplicity26records.com) an [OpenAI-imitating endpoint](https://www.e-reading-lib.com). You can access that through LLM by [dropping](https://kevaco.com) this into your [extra-openai-models](https://grizzly-adhesive.ua). [yaml setup](https://www.retailadr.org.uk) file:
+
Then run [llm keys](http://cbsver.ru) set [deepseek](https://www.meephoo.com) and paste in your API secret, then [utilize llm](https://www.pianaprofili.it) [-m deepseek-reasoner](https://git.fafadiatech.com) ['prompt'](https://mru.home.pl) to run [triggers](http://www.centroyogacantu.it).
+
This won't show you the [reasoning](https://dataintegrasi.tech) tokens, sadly. Those are dished out by the API (example here) however LLM does not yet have a way to [display](http://mtecheventos.com.br) them.
\ No newline at end of file