The quality of custom models trained on proper reasoning datasets[0] is incredible now, even at small parameter counts (3-7B is the sweet spot).
[0]: cartesien.io or Salesforce's WebscaleRL

After a quick skim, my understanding is that this is more like a very compressed diff vector: applied to a multi-billion-parameter model, it lets the model be 'retrained' to reason (score) better on a specific topic. The paper used math as its example.
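
To make the "compressed diff vector" reading concrete, here is a minimal sketch of the general idea, not the paper's actual method. Everything in it is an assumption for illustration: the names `expand` and `apply_diff`, the fixed seeded random projection, the size of the tiny vector `z`, and the toy list standing in for the frozen base weights. Only `z` would be trained; the base model and the projection stay fixed.

```python
import random

def expand(z, n_params, seed=0):
    """Map a tiny vector z to a full-size delta via a fixed, seeded random projection.

    Hypothetical helper: the projection is regenerated from the seed,
    so only z (a handful of numbers) would need to be stored or trained.
    """
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) * zj for zj in z) for _ in range(n_params)]

def apply_diff(base, z, seed=0, scale=1.0):
    """Return base + scale * expand(z); the base weights themselves are not modified."""
    delta = expand(z, len(base), seed)
    return [w + scale * d for w, d in zip(base, delta)]

# Toy stand-in for frozen pretrained weights (a real model has billions).
base = [0.1 * i for i in range(1000)]

# The "compressed diff": five trainable numbers in this toy setup.
z = [0.01, -0.02, 0.0, 0.03, 0.005]

tuned = apply_diff(base, z)
```

A zero vector leaves the model unchanged, and the same `z` with the same seed always reproduces the same tuned weights, which is what makes the diff so cheap to ship.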

Reasoning capability might just be some specific combination of mirror neurons.
Even advanced math usually evolves by applying patterns found elsewhere to new topics.

Except learning to reason is a far cry from curve fitting. Our brains have more than five parameters.

With four parameters I can fit an elephant, and with five I can make him wiggle his trunk, so there is still room for improvement.