Off-policy temporal difference learning with distribution adaptation in fast mixing chains
Crossref DOI link: https://doi.org/10.1007/s00500-017-2490-1
Published Online: 2017-01-30
Published Print: 2018-02
Update policy: https://doi.org/10.1007/springer_crossmark_policy
Authors: Givchi, Arash; Palhang, Maziar
License valid from 2017-01-30